CI: Fix JMH benchmark workflows #15800

Merged
kevinjqliu merged 8 commits into apache:main from kevinjqliu:kevinjqliu/improve-jmh on Mar 28, 2026
@kevinjqliu kevinjqliu (Contributor) commented Mar 27, 2026

Fix both the jmh-benchmarks.yml and recurring-jmh-benchmarks.yml GitHub workflows, and update site/docs/benchmarks.md.

The JMH benchmark workflows were broken because:

  1. The default Spark version changed to 4.1, so Spark 3.5 projects are no longer registered unless explicitly requested via -DsparkVersions
  2. The Gradle task path used a stale project name (iceberg-spark-3.5) missing the Scala suffix
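Both causes come down to how the Gradle invocation is assembled. The following is a minimal sketch, assuming the project naming pattern `iceberg-spark-<spark>_<scala>` seen in the commands in this PR. The helper names `jmh_task` and `jmh_command` are hypothetical, and the script only prints the command rather than running Gradle.

```shell
#!/usr/bin/env bash
set -eu

# Defaults mirror the workflow inputs described in this PR (Spark 3.5, Scala 2.12).
SPARK_VERSION="${SPARK_VERSION:-3.5}"
SCALA_VERSION="${SCALA_VERSION:-2.12}"

# Hypothetical helper: build the Gradle task path, including the Scala suffix
# that the stale project name (iceberg-spark-3.5) was missing.
jmh_task() {
  local spark="$1" scala="$2"
  printf ':iceberg-spark:iceberg-spark-%s_%s:jmh' "$spark" "$scala"
}

# Hypothetical helper: print the full command instead of executing it.
# -DsparkVersions and -DscalaVersion ask the build to register the matching Spark project.
jmh_command() {
  local spark="$1" scala="$2" benchmark="$3"
  printf './gradlew -DsparkVersions=%s -DscalaVersion=%s %s -PjmhIncludeRegex=%s -PjmhOutputPath=benchmark/%s.txt\n' \
    "$spark" "$scala" "$(jmh_task "$spark" "$scala")" "$benchmark" "$benchmark"
}

jmh_command "$SPARK_VERSION" "$SCALA_VERSION" IcebergSourceNestedListParquetDataWriteBenchmark
```

Deriving the `-D` flags and the task path from the same two variables keeps them from drifting apart, which is the kind of mismatch that broke the workflows.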

Changes

jmh-benchmarks.yml

  • Split spark_version input into separate spark_version (default 3.5) and scala_version (default 2.12)
  • Pass -DsparkVersions and -DscalaVersion to Gradle to register the correct Spark project
  • Fix show-matrix to read spark_version from github.event.inputs instead of a non-existent job output
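With the two inputs split, a manual run could be dispatched from the GitHub CLI. This is a sketch only: `spark_version` and `scala_version` are the input names listed above, while any other inputs the workflow may require are not shown in this PR description and are omitted here. The helper name `dispatch_command` is hypothetical, and the script prints the command for review instead of executing it.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: format a `gh workflow run` command that passes the two
# version inputs as raw string fields (-f key=value).
dispatch_command() {
  local spark="${1:-3.5}" scala="${2:-2.12}"
  printf 'gh workflow run jmh-benchmarks.yml -f spark_version=%s -f scala_version=%s\n' \
    "$spark" "$scala"
}

# Dry run: print the command. Executing it would need an authenticated GitHub CLI
# and possibly additional inputs that this sketch does not know about.
dispatch_command "${SPARK_VERSION:-3.5}" "${SCALA_VERSION:-2.12}"
```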

recurring-jmh-benchmarks.yml

  • Use separate spark and scala matrix parameters with -DsparkVersions and -DscalaVersion
  • Remove the invalid repository/ref checkout parameters (a schedule trigger has no inputs)
  • Add workflow_dispatch trigger for manual runs

site/docs/benchmarks.md

  • Add -DsparkVersions=3.5 -DscalaVersion=2.12 to all local benchmark commands
  • Document Spark/Scala version inputs for the GitHub Actions workflow

Testing

Tested on a forked repo with multiple benchmarks:
https://github.com/kevinjqliu/iceberg/actions/runs/23672440749

@github-actions github-actions bot added the INFRA label Mar 27, 2026
@github-actions github-actions bot added the docs label Mar 27, 2026
A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

- `./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=IcebergSourceNestedListParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-list-parquet-data-write-benchmark-result.txt`
+ `./gradlew -DsparkVersions=3.5 -DscalaVersion=2.12 :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=IcebergSourceNestedListParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-list-parquet-data-write-benchmark-result.txt`
@kevinjqliu kevinjqliu (Contributor, Author) commented

ran and validated this locally

@kevinjqliu kevinjqliu requested a review from huaxingao March 27, 2026 23:10

  - name: Run Benchmark
-   run: ./gradlew :iceberg-spark:${{ matrix.spark_version }}:jmh -PjmhIncludeRegex=${{ matrix.benchmark }} -PjmhOutputPath=benchmark/${{ matrix.benchmark }}.txt -PjmhJsonOutputPath=benchmark/${{ matrix.benchmark }}.json
+   run: ./gradlew -DsparkVersions=${{ matrix.spark }} -DscalaVersion=${{ matrix.scala }} :iceberg-spark:iceberg-spark-${{ matrix.spark }}_${{ matrix.scala }}:jmh -PjmhIncludeRegex=${{ matrix.benchmark }} -PjmhOutputPath=benchmark/${{ matrix.benchmark }}.txt -PjmhJsonOutputPath=benchmark/${{ matrix.benchmark }}.json
@ebyhr ebyhr (Member) commented Mar 27, 2026

I sent a similar PR, #15729, a few days ago and noticed that the actions/upload-artifact step below fails because of a conflicting artifact name. My PR added another step to change the artifact name. Does this PR work without such a change?

I'll close my PR. Please feel free to copy the change if needed :)

@ebyhr ebyhr (Member) commented

I've tested this PR at #15802. The job failed with:

Failed to CreateArtifact: Received non-retryable error: Failed request: (409) Conflict: an artifact with this name already exists on the workflow run

@kevinjqliu kevinjqliu (Contributor, Author) commented

Thanks! I ran into this: https://github.com/apache/iceberg/actions/runs/23669409378/job/68959473079

Cherry-picked your commit and fixed it up.
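The 409 above indicates that two uploads in the same workflow run used the same artifact name. The thread does not show the exact step that was cherry-picked, so the following is only a sketch of one common approach: derive the name from every matrix value so that each job uploads under a distinct name. The naming scheme, the `artifact_name` helper, and the matrix values are all hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical naming scheme: one artifact per (benchmark, Spark, Scala) combination.
artifact_name() {
  local benchmark="$1" spark="$2" scala="$3"
  printf 'benchmark-results-%s-spark-%s-scala-%s' "$benchmark" "$spark" "$scala"
}

# Generate names for a small, made-up matrix (not the real workflow matrix).
names="$(
  for benchmark in BenchmarkA BenchmarkB; do
    for spark in 3.5 4.1; do
      artifact_name "$benchmark" "$spark" 2.13
      printf '\n'
    done
  done
)"
printf '%s\n' "$names"

# Fail early if any two matrix combinations would collide on the same name.
duplicates="$(printf '%s\n' "$names" | sort | uniq -d)"
if [ -n "$duplicates" ]; then
  echo "duplicate artifact names: $duplicates" >&2
  exit 1
fi
```

A check like this can run before the upload step, so a naming collision shows up as a clear message rather than a failed upload request.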

@huaxingao huaxingao (Contributor) left a comment

LGTM. Thanks @kevinjqliu

@kevinjqliu kevinjqliu merged commit 4eee56c into apache:main Mar 28, 2026
37 checks passed
@kevinjqliu kevinjqliu deleted the kevinjqliu/improve-jmh branch March 28, 2026 03:35
manuzhang pushed a commit to manuzhang/iceberg that referenced this pull request Mar 30, 2026