
Consider deprecating spark.comet.schemaEvolution.enabled #4298

@andygrove

Background

spark.comet.schemaEvolution.enabled is an internal config that gates whether Comet's Parquet scan paths permit certain widening type promotions (INT32 -> INT64, FLOAT -> DOUBLE, and on Spark 4+ INT32 -> DOUBLE).

Defaults today (via ShimCometConf):

  • Spark 3.x: false
  • Spark 4.x: true
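To make the gate concrete, here is a minimal, hypothetical Java sketch of a flag-gated promotion check. The class and method names (`PromotionCheck`, `isAllowed`, `defaultSchemaEvolutionEnabled`) and the `sparkMajorVersion` parameter are illustrative assumptions, not Comet's actual API. For simplicity, both the file's physical type and the requested read type are modeled as plain strings using Parquet physical type names. The sketch only encodes the three widenings and the per-version defaults described above.

```java
/** Hypothetical sketch of a flag-gated Parquet widening check (not Comet's real code). */
final class PromotionCheck {

    /**
     * Returns true if a column stored as parquetType may be read as sparkType.
     * Types are simplified to Parquet physical names, e.g. "INT32", "DOUBLE".
     */
    static boolean isAllowed(String parquetType, String sparkType,
                             boolean schemaEvolutionEnabled, int sparkMajorVersion) {
        if (parquetType.equals(sparkType)) {
            return true; // identical types: no promotion involved
        }
        if (!schemaEvolutionEnabled) {
            return false; // flag off: every widening is rejected
        }
        switch (parquetType + "->" + sparkType) {
            case "INT32->INT64":
            case "FLOAT->DOUBLE":
                return true; // widenings permitted whenever the flag is on
            case "INT32->DOUBLE":
                return sparkMajorVersion >= 4; // only on Spark 4+
            default:
                return false; // anything else is not a supported widening
        }
    }

    /** Per-version default, mirroring the ShimCometConf values listed above. */
    static boolean defaultSchemaEvolutionEnabled(int sparkMajorVersion) {
        return sparkMajorVersion >= 4;
    }
}
```

With the default plugged in, `isAllowed(p, s, defaultSchemaEvolutionEnabled(v), v)` rejects every widening on Spark 3.x and accepts all three on Spark 4.x. The flag only changes the outcome when something overrides that default, which is exactly what the investigation needs to establish.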

@mbutrovich raised the question in a comment on #4229:

why do we have spark.comet.schemaEvolution.enabled config anymore? Maybe we should deprecate that first and help us simplify the story. I think it's legacy from when Comet's Parquet decoder could be called from Iceberg, which has different schema evolution semantics.

Why this is worth investigating

The flag now exists primarily to model each Spark version's permissiveness rather than to serve as a user-tunable knob: Spark 3.x rejects these widenings, while Spark 4.x accepts them. If that is its only purpose after the removal of the Iceberg decoder path, the per-version default already encodes the right answer, and a user-tunable internal config adds little beyond extra surface area to reason about.

Scope of this issue

Investigate and report back:

  1. Are there any code paths today that flip this flag away from the per-version default (Iceberg integration, tests, callers outside the Comet codebase)?
  2. Does keeping the flag enable any correct behavior that we'd lose by hardcoding per-version defaults?
  3. If neither (1) nor (2), propose a deprecation path: rename to a non-tunable internal constant, fold the check into the version-specific shim, and update the contributor docs.
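If neither (1) nor (2) turns anything up, step 3 could look roughly like the following hypothetical Java sketch. The tunable config becomes a constant baked into each version's shim, so a user or test has nothing left to flip. `TypePromotionShim`, `Spark3Shim`, `Spark4Shim`, and their method names are invented for illustration. In Comet, the per-version split would come from separate shim source directories rather than two classes in one file, and the real shim may well be Scala rather than Java.

```java
/** Hypothetical shim: in a real build, exactly one implementation exists per Spark version. */
interface TypePromotionShim {
    /** Non-tunable replacement for spark.comet.schemaEvolution.enabled. */
    boolean allowTypePromotion();

    /** Whether INT32 -> DOUBLE is accepted (Spark 4+ only). */
    boolean allowIntToDouble();

    /** Same widening rules as before, but driven only by the shim's constants. */
    default boolean isAllowed(String parquetType, String sparkType) {
        if (parquetType.equals(sparkType)) {
            return true; // identical types: no promotion involved
        }
        if (!allowTypePromotion()) {
            return false; // this Spark version rejects all widenings
        }
        switch (parquetType + "->" + sparkType) {
            case "INT32->INT64":
            case "FLOAT->DOUBLE":
                return true;
            case "INT32->DOUBLE":
                return allowIntToDouble();
            default:
                return false;
        }
    }
}

/** Stand-in for the Spark 3.x shim source tree. */
final class Spark3Shim implements TypePromotionShim {
    public boolean allowTypePromotion() { return false; }
    public boolean allowIntToDouble() { return false; }
}

/** Stand-in for the Spark 4.x shim source tree. */
final class Spark4Shim implements TypePromotionShim {
    public boolean allowTypePromotion() { return true; }
    public boolean allowIntToDouble() { return true; }
}
```

The design choice here is that behavior is fixed at build time per Spark version. That collapses the (version, flag) matrix to one row per version. The trade-off is losing any escape hatch that question (2) might reveal.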

No code changes are required up front; a write-up addressing the points above is enough to decide next steps.
