
Reduce Iceberg CI matrix: pin JDK per Spark version #4101

@andygrove

Description

Describe the proposed change

The Iceberg CI workflow (.github/workflows/iceberg_spark_test.yml) currently runs a fully-crossed matrix:

  • 3 job types (iceberg-spark, iceberg-spark-extensions, iceberg-spark-runtime)
  • 3 Iceberg versions (1.8.1, 1.9.1, 1.10.0)
  • 2 Spark versions (3.4.3, 3.5.8)
  • 2 JDK versions (11, 17)
  • 1 Scala version (2.13)

That's 36 jobs per PR. Adding Spark 4.0.1 / 4.1.1 to the same pattern would push it higher.

The JDK dimension is the easiest to trim. In practice, each Spark version has one preferred JDK:

  • Spark 3.4 → JDK 11
  • Spark 3.5 → JDK 17 (and Spark 4.x will require 17+ anyway)

Pinning JDK per Spark version (using a matrix include: rather than full cross-product) cuts the matrix in half: 36 → 18 jobs, removing 18 redundant combinations.
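The arithmetic can be checked with a small Python sketch. It only models the job list and is not part of the workflow. The constant names are hypothetical; the job types, versions, and JDK pins are the ones listed above.

```python
from itertools import product

# Hypothetical model of the current Iceberg CI matrix (values from this issue).
JOB_TYPES = ["iceberg-spark", "iceberg-spark-extensions", "iceberg-spark-runtime"]
ICEBERG = ["1.8.1", "1.9.1", "1.10.0"]
SPARK = ["3.4.3", "3.5.8"]
JDK = [11, 17]
SCALA = ["2.13"]

# Preferred JDK per Spark version, as proposed in this issue.
PINNED_JDK = {"3.4.3": 11, "3.5.8": 17}

# Current behaviour: full cross-product of every dimension.
full = list(product(JOB_TYPES, ICEBERG, SPARK, JDK, SCALA))
# Proposed: keep only combinations whose JDK matches the Spark version's pin.
pinned = [job for job in full if PINNED_JDK[job[2]] == job[3]]
dropped = [job for job in full if job not in pinned]

print(len(full), len(pinned), len(dropped))  # → 36 18 18
print(sorted({(spark, jdk) for _, _, spark, jdk, _ in dropped}))
# → [('3.4.3', 17), ('3.5.8', 11)]
```

The dropped jobs are exactly the two non-preferred Spark/JDK pairs, each repeated across 3 job types × 3 Iceberg versions.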

Rationale

  • Iceberg's own CI does not exhaustively cross JDK with every Spark version; we are overcovering relative to upstream.
  • JDK 11 vs 17 differences that affect Comet are caught elsewhere: pr_build_linux.yml and spark_sql_test.yml both run JDK 11 + 17 against multiple Spark versions.
  • The Iceberg suites exercise Iceberg/Comet integration, not JVM-level behavior; a JDK-specific failure is unlikely to appear there without also appearing in the broader matrix.
  • Frees CI capacity to add Spark 4.0.1 and 4.1.1 to the matrix without a net increase in PR runtime.

Proposed change

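Replace the fully-crossed matrix in each of the three Iceberg jobs with one that keeps Spark as a matrix dimension and uses an include: list to attach the pinned JDK to each Spark version. Spark has to stay a dimension: an include: entry can only match on original matrix values, so an entry that sets only spark-version and java-version is merged into every combination, and the later entry overwrites the earlier one (leaving Spark 3.5/JDK 17 only). For example:

strategy:
  matrix:
    iceberg-version: [{short: '1.8', full: '1.8.1'}, {short: '1.9', full: '1.9.1'}, {short: '1.10', full: '1.10.0'}]
    # Scalar key that include: entries can match on
    spark: ['3.4', '3.5']
    include:
      # Attach the full Spark version and its pinned JDK to each Spark combination
      - spark: '3.4'
        spark-version: {short: '3.4', full: '3.4.3'}
        java-version: 11
      - spark: '3.5'
        spark-version: {short: '3.5', full: '3.5.8'}
        java-version: 17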

Net effect: 36 → 18 PR jobs (3 job types × 3 Iceberg versions × 2 Spark/JDK pairs).
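One subtlety is worth checking before editing the workflow. Under GitHub's documented matrix rules, an include: entry is merged into every original combination whose dimension values it does not contradict, and later entries may overwrite keys that earlier entries added. The Python sketch below is an unofficial model of those rules. The expand helper is hypothetical, not a GitHub API, and it ignores exclude:. It shows that include: only pins JDK correctly when Spark is itself a matrix dimension.

```python
from itertools import product


def expand(dims, include=()):
    """Hypothetical, unofficial model of GitHub Actions matrix expansion.

    `dims` maps each matrix dimension to its values; `include` is a list of
    dicts applied in order. exclude: is not modelled.
    """
    keys = list(dims)
    # Original combinations: the full cross product of the declared dimensions.
    originals = [dict(zip(keys, vals)) for vals in product(*(dims[k] for k in keys))]
    extra = []
    for entry in include:
        matched = False
        for combo in originals:
            # An entry may not change an original dimension value, but it may
            # overwrite keys added by earlier include: entries.
            if all(combo[k] == v for k, v in entry.items() if k in dims):
                combo.update(entry)
                matched = True
        if not matched:
            # Entries that fit no original combination become new combinations.
            extra.append(dict(entry))
    return originals + extra


ICEBERG = ["1.8.1", "1.9.1", "1.10.0"]
PIN = [{"spark": "3.4.3", "java": 11}, {"spark": "3.5.8", "java": 17}]

# include: alone, with Spark not a dimension: both entries merge into every
# combination, so the second entry wins everywhere.
bare = expand({"iceberg": ICEBERG}, include=PIN)
print(len(bare), sorted({(c["spark"], c["java"]) for c in bare}))
# → 3 [('3.5.8', 17)]

# Spark kept as a dimension: each entry matches only its own Spark version.
keyed = expand({"iceberg": ICEBERG, "spark": ["3.4.3", "3.5.8"]}, include=PIN)
print(len(keyed), sorted({(c["spark"], c["java"]) for c in keyed}))
# → 6 [('3.4.3', 11), ('3.5.8', 17)]
```

Under this model, multiplying the keyed result by the three job types gives 18 jobs. The bare variant gives 9 jobs and never runs Spark 3.4.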

Additional context

Part of a broader CI cleanup ahead of adding Spark 4.0.1 and 4.1.1 to the test matrix. Other potential follow-ups (separate issues): drop Iceberg 1.9 (boundary coverage only), reduce macOS matrix, tier nightly vs PR-blocking tests.

Metadata

Assignees

No one assigned

    Labels

    area:Iceberg, area:ci (CI/CD, GitHub Actions, build tooling), enhancement (New feature or request), good first issue (Good for newcomers), priority:low (Minor issues, test failures, tooling, cosmetic)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
