Skip to content

Investigate Spark SQL test suites that load Comet without registering CometShuffleManager #4377

@andygrove

Description

@andygrove

What is the problem the feature request solves?

After #4328, CometSparkSessionExtensions.isCometLoaded now returns false and logs a warning when spark.comet.exec.shuffle.enabled is true (the default) but spark.shuffle.manager is not set to CometShuffleManager.

This surfaces a long-standing asymmetry in the Spark SQL test diffs:

  • dev/diffs/{3.4.3,3.5.8,4.0.2,4.1.1}.diff patch SparkSession.applyExtensions to globally add CometSparkSessionExtensions to every SparkSession built in the JVM whenever ENABLE_COMET=true.
  • The same diffs set spark.shuffle.manager=CometShuffleManager only inside SharedSparkSessionBase.sparkConf and TestHiveContext.

Test suites that build their own SparkConf/SparkSession outside those traits therefore get Comet installed but no shuffle manager. Before #4328 these suites still ran with Comet, just falling back to Spark's default shuffle. After #4328 Comet disables itself entirely for these suites and logs a warning per isCometLoaded invocation.

The once-per-session log dedup landing alongside this issue keeps the log volume manageable, but the underlying questions are unanswered:

  1. Are these suites supposed to be exercising Comet at all? If yes, the diff needs to register CometShuffleManager for them too (currently a coverage regression).
  2. If no, should we suppress the warning when Comet was auto-injected by the diff rather than explicitly enabled by the user (i.e. only warn when the user set spark.comet.enabled=true themselves)?
  3. Should we revive the spark.comet.exec.shuffle.required opt-out that feat: disable Comet by default when CometShuffleManager is not registered #4328's description promised but the merged code did not implement?

Spark 4.1.1 suites observed emitting the warning

Attribution is from the in-progress run https://github.com/apache/datafusion-comet/actions/runs/26177223270 (six of seven 4.1.1 jobs complete; sql_core-1 still running at file time). Counts are pre-dedup.

Warnings Suite
~304 BroadcastJoinSuiteAE
~132 BroadcastJoinSuite
58 ExecutorSideSQLConfSuite
30 ExpressionInfoSuite
24 DisableUnnecessaryBucketedScanWithoutHiveSupportSuite
18 SparkSessionJobTaggingAndCancellationSuite
18 SparkSessionBuilderSuite
~12 UISeleniumSuite
6 UISeleniumWithRocksDBBackendSuite
6 ParquetCommitterSuite
4 OrcFilterSuite
2 ParquetV1SchemaPruningSuite
2 OrcV2SchemaPruningSuite

The Spark 3.5.8, 4.0.2, and 3.4.3 diffs have the same shape, so the same suites should surface there.

Describe the potential solution

Investigate and decide whether this matters. Possible directions:

  • Accept as-is — the warning is informative and the per-session dedup keeps log volume manageable. Test coverage for those suites without CometShuffleManager is small.
  • Extend the diffs to register CometShuffleManager for the affected suites (per-suite patches, or a broader hook such as a system property picked up by every SparkConf).
  • Add an opt-out config (e.g. spark.comet.exec.shuffle.required) so users / tests can keep Comet enabled with Spark's default shuffle manager.
  • Restrict the disable-on-missing-shuffle-manager logic to sessions where the user explicitly set spark.comet.enabled=true, leaving auto-injected sessions in the prior "Comet on, default shuffle" mode.

Additional context

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions