
ci: reduce repeated setup and nightly rebuild work #4119

Merged: kerwin612 merged 4 commits into apache:master from Aias00:ci-stage1-optimizations on Apr 16, 2026

Conversation

Aias00 (Contributor) commented Apr 15, 2026

What's changed?

This PR now contains two CI optimization stages that keep the workflow scope intact but reduce repeated setup work and shorten the backend PR critical path:

Stage 1

  • cache Maven dependencies in the shared Java setup action
  • cache the downloaded mvnd distribution in the shared Java setup action
  • add concurrency to cancel superseded workflow runs for the same branch/PR
  • add pnpm cache setup for frontend/docs workflows
  • align the mvnd cache key with the actual Linux/X64 artifact and fail fast on unsupported runners
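A minimal sketch of the concurrency item above, assuming a workflow-level block keyed on the workflow name plus the PR number (falling back to the ref for push events). The PR does not quote its exact group expression, so treat the key below as an assumption rather than the one actually merged.

```yaml
# Sketch only: the group key is an assumed convention, not necessarily the one this PR uses.
concurrency:
  # One group per workflow and per PR number (or per ref for push events).
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  # When a newer run joins the same group, cancel the older in-progress run.
  cancel-in-progress: true
```

With cancel-in-progress: true, a push to master also cancels an older master run. If that is undesirable, the value can be an expression such as ${{ github.event_name == 'pull_request' }} so that only PR runs are cancelled.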

Stage 2

  • split the backend PR workflow so the heavy Maven hertzbeat-e2e subtree runs in its own parallel required job
  • keep the core backend build focused on non-E2E modules while still producing dist/ for the Docker image + API E2E lane
  • upload Codecov coverage from both the core backend lane and the Maven-E2E lane
  • switch monitor/backend compose-based flows to the native docker compose plugin instead of downloading the legacy standalone docker-compose binary on each run
  • keep nightly backend tests enabled while collapsing duplicated packaging work into a single Maven invocation
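The stage 2 topology can be sketched roughly as three jobs: a core Maven lane that uploads dist/, a parallel Maven E2E lane, and a Docker image + API E2E lane that waits only on the core lane. This is a hypothetical outline, not the workflow from this PR: the job names, the two E2E modules standing in for the full leaf-module selector, the dist/ location, the setup-deps invocation, and the Codecov token and flags are all assumptions.

```yaml
# Hypothetical sketch of the split backend topology. Job names, the E2E module
# list, the dist/ location and the Codecov inputs are illustrative assumptions.
jobs:
  backend-core:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./script/ci/github-actions/setup-deps
      # Build and test everything except the listed E2E leaf modules.
      - run: mvn -B clean install -pl '!:hertzbeat-log-e2e,!:hertzbeat-collector-basic-e2e'
      - uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          flags: backend-core
      # Hand the packaged distribution to the Docker + API E2E lane.
      - uses: actions/upload-artifact@v4
        with:
          name: backend-dist
          path: dist/

  maven-e2e:
    # No "needs:" entry, so this lane starts in parallel with backend-core.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ./script/ci/github-actions/setup-deps
      # -am also builds the upstream modules that these E2E modules depend on.
      - run: mvn -B install -pl ':hertzbeat-log-e2e,:hertzbeat-collector-basic-e2e' -am
      - uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          flags: maven-e2e

  docker-api-e2e:
    # Starts as soon as the core lane has uploaded dist/.
    needs: backend-core
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: backend-dist
          path: dist/
      # Docker image build and API E2E steps omitted in this sketch.
```

Because maven-e2e has no needs: entry, overall wall-clock time becomes roughly the longer of backend-core followed by docker-api-e2e, or maven-e2e on its own, instead of their sum. One trade-off of -am is that the E2E lane rebuilds the upstream modules (and by default reruns their tests), which costs runner time but stays off the core lane's critical path.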

This branch intentionally does not modify MCP Bash Server CI: editing that workflow currently surfaces an unrelated enterprise action-allowlist failure (dtolnay/rust-toolchain@stable), which should be handled separately.

Why this should help

Baseline measurements from the repository before stage 2:

  • green Backend CI runs took about 14m35s, of which about 12m33s was spent in a single Maven step
  • that Maven step also built the hertzbeat-e2e subtree, whose reactor summary included several long-running modules such as hertzbeat-log-e2e (3m42s) and hertzbeat-collector-basic-e2e (1m38s)
  • the Docker image build and API E2E together took only about 84s
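A quick worked check of the baseline numbers, assuming they are wall-clock times from the same run and that the Docker image build and API E2E ran outside the Maven step:

```latex
\begin{aligned}
14\,\text{m}\,35\,\text{s} - 12\,\text{m}\,33\,\text{s} &= 875\,\text{s} - 753\,\text{s} = 122\,\text{s} \\
122\,\text{s} - 84\,\text{s} &= 38\,\text{s} \\
3\,\text{m}\,42\,\text{s} + 1\,\text{m}\,38\,\text{s} &= 222\,\text{s} + 98\,\text{s} = 320\,\text{s}
\end{aligned}
```

Under that assumption only about two minutes of the run fell outside the Maven step, most of it Docker + API E2E work, leaving roughly 38 s for checkout, setup and other steps. If the reactor built the two named E2E modules one after the other, they alone account for 320 s of the 753 s Maven step, which is consistent with moving the E2E subtree into its own lane.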

The intended result is:

  • the core backend lane reports much sooner because it no longer waits on the Maven E2E subtree
  • the Docker image + API E2E lane starts from the uploaded dist/ artifact as soon as the core build is done
  • the Maven E2E subtree still runs as a separate required parallel check instead of being dropped
  • repeat pushes waste less runner time because older runs are auto-cancelled

Validation

  • parsed changed workflow YAML files and the shared composite action with python3 + yaml.safe_load
  • ran git diff --check
  • locally validated that the new explicit Maven E2E module selector expands to the intended leaf modules before failing on the local machine's JDK < 25
  • confirmed local docker compose version availability for the compose-command migration

Checklist

  • I have read the Contributing Guide
  • I have written the necessary doc or comment.
  • I have added the necessary unit tests and all cases have passed.

Add or update API

  • I have added the necessary e2e tests and all cases have passed.

Add dependency caching where workflows repeatedly download toolchains and
packages, cancel superseded runs for the same ref, and remove duplicated
nightly Maven work that rebuilt artifacts after the main release build had
already produced them.

Constraint: Keep existing workflow coverage and trigger scope unchanged in this pass
Rejected: Split Backend CI into separate jobs now | larger behavior change than the first-stage optimization pass
Confidence: medium
Scope-risk: narrow
Reversibility: clean
Directive: Validate warm-cache runtime on GitHub Actions before changing job topology so speedups stay attributable
Tested: Parsed all workflow YAML files and the shared composite action with python3 yaml.safe_load
Tested: git diff --check
Not-tested: Remote GitHub Actions execution, cache hit rates, and nightly artifact publication
Copilot AI review requested due to automatic review settings April 15, 2026 01:53
Drop the MCP Bash Server workflow edit from this branch so the PR does not
surface an unrelated enterprise action-allowlist failure from that existing
workflow definition.

Constraint: apache/hertzbeat rejects dtolnay/rust-toolchain@stable at workflow startup
Rejected: Fix the MCP workflow in this PR | separate policy/remediation task outside the first-stage CI optimization scope
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If the MCP workflow is optimized later, replace the blocked action reference with an org-allowed equivalent first
Tested: Parsed all workflow YAML files and the shared composite action with python3 yaml.safe_load
Tested: git diff --check
Not-tested: Remote GitHub Actions execution after the PR resync

Copilot AI left a comment

Pull request overview

Optimizes GitHub Actions CI execution by adding dependency caching and concurrency controls to reduce repeated setup work and redundant runs.

Changes:

  • Enable Maven dependency caching and cache the mvnd distribution in the shared setup-deps composite action.
  • Add workflow concurrency to cancel superseded runs for the same branch/PR.
  • Add pnpm caching to frontend/docs workflows and streamline the nightly backend packaging steps.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Summary per file:

  • script/ci/github-actions/setup-deps/action.yml: Adds a Maven cache via setup-java and caches/restores the mvnd install directory.
  • .github/workflows/nightly-build.yml: Adds concurrency and pnpm cache setup, and switches the backend build to mvnd with fewer, combined steps.
  • .github/workflows/monitor-e2e-test.yml: Adds concurrency to reduce redundant runs.
  • .github/workflows/mcp-bashserver-test.yml: Adds concurrency cancellation for PR/branch updates.
  • .github/workflows/license-checker.yml: Adds concurrency cancellation for PR/branch updates.
  • .github/workflows/frontend-build-test.yml: Adds concurrency and pnpm caching via setup-node.
  • .github/workflows/doc-deploy.yml: Adds concurrency and pnpm caching configuration to the deploy workflow.
  • .github/workflows/doc-build-test.yml: Adds concurrency and pnpm caching via setup-node.
  • .github/workflows/backend-build-test.yml: Adds concurrency and removes a trailing YAML artifact line.
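For the setup-deps entry, Maven dependency caching via setup-java can be sketched as a composite action like the one below. The Java distribution and version are assumptions (25 is only inferred from the JDK < 25 note in the validation list); the real action also installs mvnd, which is omitted here.

```yaml
# Sketch of a composite action; distribution and java-version are assumptions.
name: setup-deps
description: Set up a JDK with a Maven dependency cache
runs:
  using: composite
  steps:
    - uses: actions/setup-java@v4
      with:
        distribution: temurin
        java-version: '25'
        # Caches ~/.m2/repository, keyed on a hash of the pom.xml files.
        cache: maven
```

Because the cache key follows the pom.xml hash, a run that does not touch any pom reuses the previous dependency download instead of fetching it again.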
Comments suppressed due to low confidence (1)

.github/workflows/doc-deploy.yml:63

  • actions/setup-node with cache: pnpm expects pnpm to be available when the action runs (it shells out to pnpm to determine the store path). In this workflow, pnpm is installed after setup-node, so enabling pnpm caching here is likely to fail or be a no-op. Reorder the steps so pnpm/action-setup runs before actions/setup-node (or use corepack-enabled pnpm explicitly) to ensure caching works reliably.
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: pnpm
          cache-dependency-path: home/pnpm-lock.yaml

      # Setup pnpm
      - name: Setup pnpm
        uses: pnpm/action-setup@v4
        with:
          version: latest
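Following the reordering this comment suggests, the same two steps can be swapped so pnpm is already on PATH when setup-node resolves the pnpm store for caching. The inputs are copied from the quoted snippet; only the order changes.

```yaml
# Install pnpm first so setup-node can locate the pnpm store when restoring the cache.
- name: Setup pnpm
  uses: pnpm/action-setup@v4
  with:
    version: latest

- name: Setup Node.js
  uses: actions/setup-node@v4
  with:
    node-version: '20'
    cache: pnpm
    # Cache key is derived from this lockfile.
    cache-dependency-path: home/pnpm-lock.yaml
```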


Comment on lines 35 to 46
        path: ~/.local/mvnd
        key: ${{ runner.os }}-${{ runner.arch }}-mvnd-2.0.0-rc-3

    - name: Install mvnd
      if: steps.mvnd-cache.outputs.cache-hit != 'true'
      shell: bash
      run: |
        MVND_VERSION=2.0.0-rc-3
        curl -sL https://dlcdn.apache.org/maven/mvnd/${MVND_VERSION}/maven-mvnd-${MVND_VERSION}-linux-amd64.zip -o mvnd.zip
        unzip -q mvnd.zip
        mkdir -p $HOME/.local
        mv maven-mvnd-${MVND_VERSION}-linux-amd64 $HOME/.local/mvnd

Copilot AI Apr 15, 2026

The mvnd download/extract is hard-coded to the linux-amd64 distribution (...-linux-amd64.zip / maven-mvnd-...-linux-amd64), but the cache key includes ${{ runner.arch }}. If this composite action ever runs on ARM64 (or non-Linux), it will download the wrong binary and/or fail to find the extracted directory. Consider selecting the mvnd artifact based on runner.os/runner.arch (or explicitly fail fast when unsupported) and keeping the cache key aligned with the selected artifact.

Contributor Author

Fixed in 06d57f1. The shared action now fails fast unless the runner is Linux/X64, and the mvnd cache key is pinned to the actual linux-amd64 artifact that gets downloaded and extracted.
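A sketch of what such a fail-fast guard plus an artifact-pinned cache key might look like in a composite action. The step names, the exact key string, and the final PATH step are illustrative assumptions, not the contents of 06d57f1.

```yaml
# Sketch only: step names, the key string and the PATH step are assumptions.
- name: Check runner platform
  # runner.os / runner.arch report values such as Linux and X64.
  if: runner.os != 'Linux' || runner.arch != 'X64'
  shell: bash
  run: |
    echo "setup-deps only supports Linux/X64 (got ${{ runner.os }}/${{ runner.arch }})" >&2
    exit 1

- name: Cache mvnd
  id: mvnd-cache
  uses: actions/cache@v4
  with:
    path: ~/.local/mvnd
    # The key names the exact artifact the install step downloads.
    key: mvnd-2.0.0-rc-3-linux-amd64

- name: Install mvnd
  if: steps.mvnd-cache.outputs.cache-hit != 'true'
  shell: bash
  run: |
    MVND_VERSION=2.0.0-rc-3
    # -f makes curl fail on HTTP errors instead of saving an error page as the zip.
    curl -fsSL "https://dlcdn.apache.org/maven/mvnd/${MVND_VERSION}/maven-mvnd-${MVND_VERSION}-linux-amd64.zip" -o mvnd.zip
    unzip -q mvnd.zip
    mkdir -p "$HOME/.local"
    mv "maven-mvnd-${MVND_VERSION}-linux-amd64" "$HOME/.local/mvnd"

- name: Add mvnd to PATH
  shell: bash
  run: echo "$HOME/.local/mvnd/bin" >> "$GITHUB_PATH"
```

dlcdn.apache.org generally serves only current releases, so a pinned version may eventually need to be fetched from archive.apache.org instead; a warm cache hides that until the key changes.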

.github/workflows/nightly-build.yml (outdated)
Comment on lines +53 to +56
      - name: Build the Backend
        run: |
-         mvn clean install
-         mvn clean package -Prelease -DskipTests
-         cd hertzbeat-collector
-         mvn clean package -Pcluster -DskipTests
+         mvnd -B clean install -Prelease -DskipTests
+         mvnd -B -f hertzbeat-collector/hertzbeat-collector-collector/pom.xml package -Pcluster -DskipTests

Copilot AI Apr 15, 2026

This changes the nightly backend build from running tests (mvn clean install previously) to skipping them (-DskipTests on the clean install invocation). That’s a functional coverage change and seems to conflict with the PR description (“keeps the existing workflow coverage intact”). Either restore test execution here or update the PR description / workflow intent to reflect that nightly no longer runs backend tests.

Contributor Author

Fixed in 06d57f1. Nightly backend coverage now preserves the original test-running behavior via mvnd -B clean install, and the follow-up packaging work is scoped to hertzbeat-startup and the collector module so the duplicate full-reactor rebuild is still removed.
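The shape described in this reply (one full test-running install, then packaging scoped to the startup and collector modules) might look roughly like the step below. This is a sketch under assumptions: that hertzbeat-startup is a top-level module directory and that the release profile belongs on the startup packaging call.

```yaml
# Sketch only: the hertzbeat-startup path and profile placement are assumptions.
- name: Build the Backend
  run: |
    # Full reactor build with tests, preserving the original nightly coverage.
    mvnd -B clean install
    # Re-package only the modules that need release or cluster output; tests already ran.
    mvnd -B -pl hertzbeat-startup package -Prelease -DskipTests
    mvnd -B -f hertzbeat-collector/hertzbeat-collector-collector/pom.xml package -Pcluster -DskipTests
```

Since the first invocation installs every module into the local repository, the scoped packaging calls can resolve their dependencies from there instead of rebuilding the whole reactor.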

Aias00 added 2 commits April 15, 2026 10:02
… coverage

Fail fast when the shared setup action is used on an unsupported runner so the
cached mvnd binary and downloaded archive stay consistent, and restore nightly
backend test execution while keeping the duplicate release packaging work scoped
to the startup and collector packaging modules.

Constraint: setup-deps currently downloads only the linux-amd64 mvnd distribution
Constraint: Nightly CI should retain backend test coverage from the previous workflow behavior
Rejected: Add multi-platform mvnd selection in this PR | current callers are all ubuntu-latest and this pass only needs to remove the cache/key mismatch safely
Rejected: Keep nightly tests skipped | conflicts with the existing workflow behavior and PR intent
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If setup-deps is reused on non-Linux or non-X64 runners later, add explicit artifact selection before broadening support
Tested: Parsed all workflow YAML files and the shared composite action with python3 yaml.safe_load
Tested: git diff --check
Not-tested: Remote GitHub Actions execution after the review fix
The backend PR workflow spent most of its time inside one root reactor build
that combined core backend verification, release packaging, and the heavy
hertzbeat-e2e subtree. Split the Maven E2E subtree into its own parallel job,
keep the dist-producing core build focused on non-E2E modules, and remove
legacy docker-compose bootstrapping from the remaining E2E-oriented flows.

Constraint: Backend PR CI must still produce dist artifacts for Docker E2E and preserve Maven E2E coverage
Constraint: Nightly CI must continue running backend tests while producing both release and cluster packaging outputs
Rejected: Keep backend as one monolithic Maven job | leaves the 12m+ reactor as the dominant critical path
Rejected: Select only the hertzbeat-e2e aggregator module | does not include the leaf E2E modules that actually carry the tests
Confidence: medium
Scope-risk: moderate
Reversibility: clean
Directive: If branch protection starts requiring the new parallel jobs individually, review required-check configuration after merge
Tested: Parsed changed workflow YAML files with python3 yaml.safe_load
Tested: git diff --check
Tested: Local Maven selector probe showed the new leaf-module -pl command expands to the intended E2E subtree before failing on local JDK < 25
Tested: docker compose version
Not-tested: Remote GitHub Actions execution for the new split backend workflow and combined Codecov behavior across parallel jobs
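The compose migration amounts to calling the Compose v2 plugin that ships with GitHub's Ubuntu runner images rather than downloading a standalone docker-compose binary first. A sketch, where the compose file path is a hypothetical placeholder:

```yaml
# Sketch only: path/to/docker-compose.yml is a hypothetical placeholder.
- name: Check Compose
  # Same check as the local validation: confirms the v2 plugin is present.
  run: docker compose version

- name: Start services
  run: docker compose -f path/to/docker-compose.yml up -d

- name: Stop services
  # Tear down even when an earlier step failed.
  if: always()
  run: docker compose -f path/to/docker-compose.yml down
```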
kerwin612 merged commit e57d617 into apache:master on Apr 16, 2026
5 checks passed

3 participants