Background
`spark.comet.schemaEvolution.enabled` is an internal config that gates whether Comet's Parquet scan paths permit certain widening type promotions (`INT32 -> INT64`, `FLOAT -> DOUBLE`, and on Spark 4+ `INT32 -> DOUBLE`).
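As a rough illustration of what the flag gates (the function and parameter names below are hypothetical, not Comet's actual internals), the scan path effectively performs a promotion check of this shape:

```scala
import org.apache.spark.sql.types._

// Hypothetical sketch: the kind of file-type vs. read-schema check the
// config gates. Names are illustrative only.
def isAllowedPromotion(
    fileType: DataType,             // type as written in the Parquet file
    readType: DataType,             // type requested by the read schema
    schemaEvolutionEnabled: Boolean,
    isSpark4Plus: Boolean): Boolean =
  (fileType, readType) match {
    case (a, b) if a == b          => true                    // no promotion needed
    case (IntegerType, LongType)   => schemaEvolutionEnabled  // INT32 -> INT64
    case (FloatType, DoubleType)   => schemaEvolutionEnabled  // FLOAT -> DOUBLE
    case (IntegerType, DoubleType) => schemaEvolutionEnabled && isSpark4Plus // Spark 4+ only
    case _                         => false
  }
```

With the per-version defaults, this check already answers `false` for the widenings on Spark 3.x and `true` on Spark 4.x without any user input.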
Defaults today (via `ShimCometConf`):
- Spark 3.x: `false`
- Spark 4.x: `true`
@mbutrovich raised the question in #4229 (comment):

> why do we have spark.comet.schemaEvolution.enabled config anymore? Maybe we should deprecate that first and help us simplify the story. I think it's legacy from when Comet's Parquet decoder could be called from Iceberg, which has different schema evolution semantics.
Why this is worth investigating
The flag now exists primarily to model the Spark version's permissiveness rather than to serve as a user-tunable knob: Spark 3.x rejects these widenings, Spark 4.x accepts them. If that is its only remaining purpose now that the Iceberg decoder path has been removed, the per-version default already encodes the right answer, and a user-tunable internal config adds little beyond surface area to reason about.
Scope of this issue
Investigate and report back:
1. Are there any code paths today that flip this flag away from the per-version default (Iceberg integration, tests, callers outside the Comet codebase)?
2. Does keeping the flag enable any correct behavior that we'd lose by hardcoding per-version defaults?
3. If neither (1) nor (2), propose a deprecation path: rename to a non-tunable internal constant, fold the check into the version-specific shim, and update the contributor docs.
No code changes required up front; a writeup on the above is enough to decide next steps.
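For concreteness, the deprecation path in (3) could look roughly like the following sketch (the trait name follows `ShimCometConf` from above, but the constant name and source-tree layout are assumptions, not the actual Comet code):

```scala
// Sketch only: replace the user-facing config with a non-tunable constant
// defined once per version-specific shim source tree.

// In the spark-3.x shim sources: widenings rejected, matching Spark 3.x.
trait ShimCometConf {
  protected val schemaEvolutionSupported: Boolean = false
}

// In the spark-4.x shim sources (separate source tree, same trait name):
// widenings accepted, matching Spark 4.x.
//
//   trait ShimCometConf {
//     protected val schemaEvolutionSupported: Boolean = true
//   }
```

Callers would then read the constant through the shim instead of consulting `CometConf`, so the per-version behavior is fixed at build time rather than exposed as a knob.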
Related