[GLUTEN-11559][Build] Improve incremental build time for test-compile phase #11560
|
Run Gluten Clickhouse CI on x86 |
Pull request overview
This PR targets faster incremental mvn test-compile runs by deferring build-info generation to later lifecycle phases and enabling incremental protobuf codegen.
Changes:
- Bump protobuf-maven-plugin version in parent pom.xml from 0.5.1 to 0.6.1.
- Enable incremental protobuf generation via <checkStaleness>true</checkStaleness> in affected modules.
- Move maven-antrun-plugin build-info executions from generate-resources to prepare-package in gluten-core and gluten-substrait.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pom.xml | Upgrades protobuf-maven-plugin version in pluginManagement. |
| gluten-substrait/pom.xml | Moves backend build-info generation later; enables protobuf staleness checking. |
| gluten-core/pom.xml | Enables protobuf staleness checking; moves core build-info generation later. |
| backends-velox/pom.xml | Enables protobuf staleness checking for velox backend protos. |
gluten-core/pom.xml
  <goal>run</goal>
  </goals>
- <phase>generate-resources</phase>
+ <phase>prepare-package</phase>
Moving the build-info antrun execution to prepare-package means gluten-build-info.properties is generated after the default process-resources phase. Since this file is written under ${project.build.directory}/generated-resources (see dev/gluten-build-info.sh) and resources are only copied into ${project.build.outputDirectory} during process-resources, the gluten-core artifact (and reactor classpath for downstream modules) will not contain the build-info resource in a clean build. This can break runtime code that loads gluten-build-info.properties (e.g., shims/common/.../GlutenBuildInfo.scala throws if missing) and any tests that touch that code.
Consider either keeping this execution before process-resources, or adding a resources copy step after this execution (e.g., a maven-resources-plugin execution in prepare-package) / adjusting the script to write directly into ${project.build.outputDirectory} so the file ends up on the classpath before packaging and downstream module tests.
Suggested change:
- <phase>prepare-package</phase>
+ <phase>generate-resources</phase>
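The ordering concern in this review comment can be checked with a small model of Maven's default lifecycle. The sketch below is illustrative only: the phase list is Maven's standard default lifecycle, while the helper names `runs_during` and `copied_to_output` are hypothetical and not part of Maven or Gluten.

```python
# Phases of Maven's default lifecycle, in execution order.
DEFAULT_LIFECYCLE = [
    "validate", "initialize",
    "generate-sources", "process-sources",
    "generate-resources", "process-resources",
    "compile", "process-classes",
    "generate-test-sources", "process-test-sources",
    "generate-test-resources", "process-test-resources",
    "test-compile", "process-test-classes", "test",
    "prepare-package", "package",
    "pre-integration-test", "integration-test", "post-integration-test",
    "verify", "install", "deploy",
]


def runs_during(target_phase: str, bound_phase: str) -> bool:
    """True if an execution bound to bound_phase runs for `mvn <target_phase>`."""
    return DEFAULT_LIFECYCLE.index(bound_phase) <= DEFAULT_LIFECYCLE.index(target_phase)


def copied_to_output(generate_phase: str) -> bool:
    """True if a file generated in generate_phase already exists when the
    default resources copy runs in process-resources (hypothetical helper)."""
    copy_phase = DEFAULT_LIFECYCLE.index("process-resources")
    return DEFAULT_LIFECYCLE.index(generate_phase) < copy_phase


if __name__ == "__main__":
    for phase in ("generate-resources", "prepare-package"):
        print(phase, copied_to_output(phase), runs_during("test-compile", phase))
```

Under this model, a file generated in prepare-package is produced after the resources copy, and the execution does not run at all for `mvn test-compile`, which matches the reviewer's concern.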
…leness
Enable <checkStaleness>true</checkStaleness> in all protobuf-maven-plugin executions (gluten-core, gluten-substrait, backends-velox) so protobuf compilation is skipped when .proto files haven't changed, improving incremental build speed.
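The idea behind a staleness check can be sketched as a timestamp comparison between inputs and generated outputs. This is a minimal sketch under that assumption, not the plugin's actual implementation; the function name `is_stale` and the file names in the usage note are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def is_stale(inputs, outputs):
    """Return True when code generation should run again.

    Assumption: generation is needed if any output is missing, or if the
    newest input is more recent than the oldest output.
    """
    inputs = list(inputs)
    outputs = list(outputs)
    if not inputs:
        return False  # nothing to generate from
    if not outputs or any(not p.exists() for p in outputs):
        return True
    newest_input = max(p.stat().st_mtime for p in inputs)
    oldest_output = min(p.stat().st_mtime for p in outputs)
    return newest_input > oldest_output
```

For example, `is_stale([Path("plan.proto")], [Path("Plan.java")])` would return True until the generated file exists and is at least as new as the `.proto` file.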
1. Upgrade scala-maven-plugin from 4.8.0 to 4.9.2 (aligned with Spark)
2. Change scala.recompile.mode from 'all' to 'incremental'
3. Skip javac compilation: Zinc already handles Java sources in incremental mode (same approach as Apache Spark)
4. Add -Ybackend-parallelism 8 for both Scala 2.12 and 2.13 profiles
5. Update gluten-it to use incremental mode and 4.9.2 (hardcoded since it's a standalone third-party module without parent POM properties)
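The difference between the 'all' and 'incremental' modes can be illustrated with a dependency graph. The sketch below computes a coarse upper bound, every file that transitively depends on a changed file; Zinc's real invalidation is finer-grained than this, and the function and file names are hypothetical.

```python
from collections import deque


def affected_sources(changed, depends_on):
    """Return the changed files plus everything that transitively uses them.

    depends_on maps each source file to the files it uses (assumed input).
    """
    # Invert the graph: for each file, which files depend on it?
    dependents = {}
    for src, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(src)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, ()):
            if user not in affected:
                affected.add(user)
                queue.append(user)
    return affected


# Hypothetical four-file module: C uses B, B uses A, D stands alone.
GRAPH = {"A.scala": [], "B.scala": ["A.scala"], "C.scala": ["B.scala"], "D.scala": []}
```

In 'all' mode every key of `GRAPH` is recompiled on each run; in this simplified incremental model, editing only `D.scala` recompiles a single file.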
Merge build-info and build-info-with-backends into a single execution in gluten-core, eliminating the separate call from gluten-substrait:
- Remove build-info-with-backends execution from gluten-substrait/pom.xml
- Remove redundant backend profile definitions from gluten-substrait
- Add --backend parameter to gluten-core's build-info execution
- Modify gluten-build-info.sh to compute backend paths internally based on backend_type (no longer needs external path argument)
- Remove DO_REMOVAL flag; always regenerate the file from scratch
- dev/run-scala-test.sh: Run ScalaTest like IntelliJ IDEA from CLI with auto classpath resolution, profiler support, and mvnd integration
- build/mvnd: Maven Daemon wrapper (auto-downloads mvnd 1.0.3) for persistent JVM that keeps Zinc's JIT caches across builds
- build/mvn: Increase ReservedCodeCacheSize from 1g to 2g
- dev/analyze-build-profile.py: Analyze Maven profiler JSON reports
- .gitignore: Add build/mvnd, .run-scala-test-cache/, .profiler/, .mvn/
Remove the <phase>test-compile</phase> override from the prepare-test-jar execution in all 18 modules. The test-jar goal defaults to the package phase, so test-jars are no longer rebuilt during mvn test-compile.
This eliminates a Zinc cascade recompilation issue: previously, test-jars were repackaged at every test-compile invocation (even with no changes), causing downstream modules to detect classpath changes and triggering full recompilation of their test sources.
Trade-off: cross-module test-jar dependencies in the reactor are now resolved from the local repository (~/.m2) during test-compile. Run 'mvn install -DskipTests' after changing upstream test APIs.
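The cascade described above can be modeled in a few lines, assuming classpath entries are compared by modification time. That assumption is for illustration only: Zinc's actual stamping is configurable and more involved, and the names `classpath_stamp` and `needs_full_recompile` are hypothetical.

```python
def classpath_stamp(entries):
    """Build a comparable stamp from {jar_name: (mtime, content_hash)}.

    Assumption: only the modification time is considered, not the content.
    """
    return tuple(sorted((name, mtime) for name, (mtime, _hash) in entries.items()))


def needs_full_recompile(previous, current):
    """True when the downstream module sees a different classpath stamp."""
    return classpath_stamp(previous) != classpath_stamp(current)


# Hypothetical jar: repackaged with identical content but a new timestamp.
BEFORE = {"gluten-core-tests.jar": (100, "abc")}
REPACKAGED = {"gluten-core-tests.jar": (200, "abc")}
```

Here `needs_full_recompile(BEFORE, REPACKAGED)` is True even though the content hash is unchanged, which is the kind of spurious invalidation that no longer occurs once the test-jar is not rebuilt during test-compile.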
Since Scala 2.13.15 (scala/scala#10708), the semantics of combined -Wconf rules changed: in '-Wconf:x,y', y now takes priority over x (last-match-wins), whereas before 2.13.15, x took priority (first-match-wins).
This means '-Wconf:cat=deprecation:wv,any:e' now treats deprecation warnings as errors (any:e overrides cat=deprecation:wv), breaking Scala 2.13 compilation when -Pdelta is enabled.
Split into separate -Wconf flags where later flags have higher priority:
-Wconf:any:e (baseline)
-Wconf:msg=While parsing annotations in:silent (override)
-Wconf:cat=deprecation:wv (override)
This aligns with Apache Spark's approach in SPARK-49746 (983f6f43). Gluten uses Scala 2.13.17, which is affected by this change.
Reference:
- scala/scala#10708
- apache/spark#48192
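The priority change described in this commit message can be reproduced with a toy rule resolver. This is a sketch of the first-match-wins versus last-match-wins behavior as the message describes it, not the compiler's implementation; `resolve` and the rule lists are hypothetical names.

```python
def resolve(rules, warning, last_match_wins):
    """Pick the action for a warning from an ordered list of (matcher, action)."""
    matching = [action for matches, action in rules if matches(warning)]
    if not matching:
        return None
    return matching[-1] if last_match_wins else matching[0]


def is_deprecation(warning):
    return warning["cat"] == "deprecation"


def matches_any(warning):
    return True


# Models '-Wconf:cat=deprecation:wv,any:e' (rules in written order).
COMBINED = [(is_deprecation, "wv"), (matches_any, "e")]
# Models the split flags: any:e as baseline, then cat=deprecation:wv as override.
SPLIT = [(matches_any, "e"), (is_deprecation, "wv")]

DEPRECATION = {"cat": "deprecation"}
OTHER = {"cat": "unused"}
```

With `COMBINED`, a deprecation warning resolves to `wv` under first-match-wins but to `e` under last-match-wins; with `SPLIT` and last-match-wins it resolves to `wv` again, while other warnings remain errors.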
AI tools can perform build profile analysis on-demand without requiring a committed script. Moved to fix/improve-incremental-build-tmp branch for reference.
@baibaichen I ran into an issue that seems to have been introduced by the incremental mode:
  <os.full.name>unknown</os.full.name>
  <!-- To build built-in backend c++ codes -->
- <scala.recompile.mode>all</scala.recompile.mode>
+ <scala.recompile.mode>incremental</scala.recompile.mode>
Can we use a separate profile for this?
Could we add a -Dxxx property to control this behavior, and restore the default value to the original all?
What changes are proposed in this pull request?
This PR dramatically improves incremental mvn test-compile time through a series of build system optimizations, and introduces dev/run-scala-test.sh, a CLI tool that replicates IntelliJ IDEA's ScalaTest execution, enabling AI agents (Claude, GitHub Copilot, etc.) to run individual ScalaTest methods for automated bug fixing.
fix #11559
Motivation
Gluten's incremental mvn test-compile takes ~2.5 minutes even for single-file changes (32-core machine), making the AI-driven edit → compile → test → fix loop impractically slow. With Maven Daemon (mvnd), incremental builds now complete in 20 seconds and zero-change builds in 3 seconds.
Additionally, standard Maven cannot run individual ScalaTest methods: the -Dsuites and -am flags conflict. run-scala-test.sh solves this by building the classpath via Maven and then launching ScalaTest directly, exactly as IntelliJ does.
Changes
Commit 1: Upgrade protobuf-maven-plugin and enable checkStaleness
- Upgrade protobuf-maven-plugin from 0.5.1 to 0.6.1
- Enable <checkStaleness>true</checkStaleness> in all protobuf executions (gluten-core, gluten-substrait, backends-velox)
- Skip protobuf compilation when .proto files are unchanged
Commit 2: Enable Scala incremental compilation
- Upgrade scala-maven-plugin from 4.8.0 to 4.9.2 (aligned with Apache Spark)
- Change scala.recompile.mode from all to incremental: Zinc now recompiles only affected files
- Skip maven-compiler-plugin compilation: Zinc already handles Java sources in incremental mode (same approach as Apache Spark)
- Add -Ybackend-parallelism 8 for parallel code generation in both Scala 2.12/2.13
Commit 3: Consolidate build-info generation
- Merge build-info and build-info-with-backends into a single antrun execution in gluten-core
- Remove the build-info-with-backends execution from gluten-substrait
- gluten-build-info.sh now computes backend paths internally based on the --backend parameter
Commit 4: Add dev tooling for AI agent integration
- dev/run-scala-test.sh (674 lines): Run ScalaTest like IntelliJ IDEA from CLI
  - … target/classes dirs for instant code changes
  - --mvnd for Maven Daemon, --profile for profiling, --export-only for classpath inspection
  - … (-t "test name") to verify fixes
- build/mvnd: Maven Daemon wrapper (auto-downloads mvnd 1.0.3); the persistent JVM preserves Zinc's JIT caches
- build/mvn: Increase ReservedCodeCacheSize from 1g to 2g
- dev/analyze-build-profile.py: Analyze maven-profiler JSON reports with comparison mode
- .gitignore: Add build/mvnd, .run-scala-test-cache/, .profiler/, .mvn/
Commit 5: Move test-jar from test-compile to package phase
- Remove <phase>test-compile</phase> from the prepare-test-jar execution in all 18 modules
- The test-jar goal defaults to the package phase, so test-jars are no longer rebuilt during mvn test-compile
- Previously, test-jars were repackaged at every test-compile invocation (even with no changes), causing downstream modules to detect classpath changes and triggering full recompilation of their test sources
- Trade-off: cross-module test-jar dependencies in the reactor are now resolved from the local repository (~/.m2) during test-compile. Run mvn install -DskipTests after changing upstream test APIs
Benchmark Results
Machine 1
run-scala-test.sh Usage (for AI Agents)
The --mvnd flag uses Maven Daemon for persistent JVM, reducing repeat compilations to ~3s. The built-in cache detects source file changes and skips Maven entirely when nothing changed.
Fixes #11559
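The change-detection cache mentioned in the usage notes can be sketched as a content fingerprint over the source tree. This is a hypothetical model, not the logic in run-scala-test.sh: the function names, the tracked file suffixes, and the cache file layout are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def source_fingerprint(root, suffixes=(".scala", ".java", ".xml")):
    """Hash the relative path and content of every tracked source file."""
    root = Path(root)
    digest = hashlib.sha256()
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            digest.update(str(path.relative_to(root)).encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()


def needs_build(root, cache_file):
    """True when the tree differs from the fingerprint recorded last time."""
    cache_file = Path(cache_file)
    if not cache_file.exists():
        return True
    return cache_file.read_text() != source_fingerprint(root)


def record_build(root, cache_file):
    """Store the current fingerprint after a successful build."""
    Path(cache_file).write_text(source_fingerprint(root))
```

A wrapper script following this model would call `needs_build` first, invoke Maven only when it returns True, and call `record_build` once the build succeeds.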
How was this patch tested?
- mvn test-compile -pl backends-velox -am (BUILD SUCCESS)
- --export-only classpath comparison: IDENTICAL before and after Commit 5 changes
- dev/benchmark-build.sh across 11 scenarios (mvn/mvnd × clean/incremental/zero-change/cache/force)
Was this patch authored or co-authored using generative AI tooling?
Generated-by: GitHub Copilot (Claude Opus 4.6)