-
Notifications
You must be signed in to change notification settings - Fork 627
ci: conditional backend github workflow #3141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
WalkthroughAdds two composite GitHub Actions (Docker build and Pytest), introduces centralized path filters, and refactors the container validation workflow to a multi-stage, change-detected execution with per-backend jobs (vllm, sglang, trtllm), concurrency control, and a status aggregation job. Updates trigger_ci to use the centralized filters. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant GH as GitHub Workflow
participant CF as changed-files (paths-filter)
participant V as job: vllm
participant S as job: sglang
participant T as job: trtllm
participant BSC as backend-status-check
GH->>CF: Detect code changes via .github/filters.yaml
CF-->>GH: outputs.has_code_changes (true/false)
alt has_code_changes == true
par Backend jobs (conditional)
GH->>V: Build & Test (uses composite actions)
GH->>S: Build & Test (uses composite actions)
GH->>T: Build & Test (uses composite actions)
end
else No code changes
Note over GH: Skip backend jobs
end
GH->>BSC: Summarize results (always runs)
BSC-->>GH: Final status
sequenceDiagram
autonumber
participant WB as Workflow Job
participant DBA as docker-build-composite-action
participant BX as Docker Buildx
participant NGC as nvcr.io Registry
participant BS as ./container/build.sh
WB->>DBA: inputs (framework, target, image_tag, tokens, AWS/SCCache)
DBA->>BX: Setup Buildx
opt NGC login (conditional on event/repo)
DBA->>NGC: docker login with ngc_ci_access_token
NGC-->>DBA: login success/fail
end
DBA->>DBA: Cleanup (pre)
DBA->>DBA: Compute IMAGE_TAG (custom or framework:latest)
DBA->>BS: Build with tag/target/framework, env (CI/AWS/SCCache)
BS-->>DBA: Build complete
DBA-->>WB: outputs.image_tag
DBA->>DBA: Cleanup (post)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Pre-merge checks✅ Passed checks (3 passed)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal). Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (10)
.github/actions/docker-build-composite-action/action.yml (2)
60-60
: Fix YAML lint: remove extra space after colon.Conforms to yamllint and avoids noisy warnings.
- SCCACHE_S3_BUCKET: ${{ inputs.sccache_s3_bucket }} + SCCACHE_S3_BUCKET: ${{ inputs.sccache_s3_bucket }}
49-54
: Avoid aggressive docker prune before build; move a safer prune after build.
docker system prune -af
on shared/self‑hosted runners can nuke caches/images used by parallel jobs and kills BuildKit caching.- - name: Cleanup - if: always() - shell: bash - run: | - docker system prune -af + # (moved after Build image step)Add this step after the Build image step:
+ - name: Cleanup (prune dangling images only) + if: always() + shell: bash + run: | + docker image prune -af.github/filters.yaml (2)
27-33
: Consider expanding backend filters to include shared test/utils if applicable.If vllm logic lives beyond the single test file, widen globs now to avoid under‑triggering.
1-10
: Unused anchors (docs, vllm, sglang, trtllm).Since aliases aren’t used (and sequence merge isn’t supported), either remove anchors or reference them as full lists (not as items).
.github/actions/pytest-composite-action/action.yml (1)
23-27
: Mount workspace and pass env into container.Without
-v
, tests rely on code baked into the image. If that’s not guaranteed, mount the repo. Also export HF_HOME if needed.- docker run --runtime=nvidia --rm --gpus all -w /workspace \ - --network host \ + docker run --runtime=nvidia --rm --gpus all -w /workspace \ + --network host \ + --shm-size 1g \ + -e HF_HOME=${{ env.HF_HOME }} \ + -v "$GITHUB_WORKSPACE:/workspace" \ --name ${{ env.CONTAINER_ID }}_pytest \ ${{ inputs.image_tag }} \ bash -c "pytest -xsv --basetemp=/tmp --junitxml=${{ env.PYTEST_XML_FILE }} -m \"${{ inputs.pytest_marks }}\""Confirm whether images already contain the repo; if yes, the volume mount is optional.
.github/workflows/container-validation-backends.yml (5)
17-28
: Expose per-backend filter outputs for targeted execution.Currently only
has_code_changes
is exported; all backends run on any code change. Export per-backend outputs to gate jobs precisely.changed-files: runs-on: ubuntu-latest outputs: - has_code_changes: ${{ steps.filter.outputs.has_code_changes }} + has_code_changes: ${{ steps.filter.outputs.has_code_changes }} + vllm: ${{ steps.filter.outputs.vllm }} + sglang: ${{ steps.filter.outputs.sglang }} + trtllm: ${{ steps.filter.outputs.trtllm }} steps:
39-43
: Gate vLLM job on vLLM changes only.Avoids unnecessary builds/tests.
- if: needs.changed-files.outputs.has_code_changes == 'true' + if: needs.changed-files.outputs.vllm == 'true'
65-68
: Gate SGLang job on SGLang changes only.- if: needs.changed-files.outputs.has_code_changes == 'true' + if: needs.changed-files.outputs.sglang == 'true'
90-93
: Gate TRT‑LLM job on TRT‑LLM changes only.- if: needs.changed-files.outputs.has_code_changes == 'true' + if: needs.changed-files.outputs.trtllm == 'true'
4-4
: Optional: tighten default permissions.Add minimal permissions at workflow root for better security posture.
name: Docker Build and Test +permissions: + contents: read
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
.github/actions/docker-build-composite-action/action.yml
(1 hunks).github/actions/pytest-composite-action/action.yml
(1 hunks).github/filters.yaml
(1 hunks).github/workflows/container-validation-backends.yml
(1 hunks).github/workflows/trigger_ci.yml
(1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
.github/actions/docker-build-composite-action/action.yml
[warning] 60-60: too many spaces after colon
(colons)
🪛 actionlint (1.7.7)
.github/workflows/container-validation-backends.yml
40-40: label "gpu-l40-amd64" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
65-65: label "gpu-l40-amd64" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
90-90: label "gpu-l40-amd64" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
🔇 Additional comments (6)
.github/actions/docker-build-composite-action/action.yml (1)
72-77
: Double‑check secret propagation and masking in build script.Ensure
./container/build.sh
doesn't echo AWS creds/CI token. If it prints, mask or pass via build args/credentials file..github/filters.yaml (1)
41-48
: Consistency: include TRT‑LLM deps files analogous to vLLM if needed.Double‑check coverage for wheel/requirements changes beyond current globs.
.github/workflows/container-validation-backends.yml (3)
12-15
: Concurrency group/cancel policy looks good.Reasonable cancellation strategy for non-main branches.
30-38
: Status check logic is fine; ensure jq is available.Ubuntu images usually have jq, but if self-hosted, install it or use
node -e
equivalent.
40-40
: Verify custom runner label presence.
gpu-l40-amd64
is fine for self-hosted; ensure runners are registered with this label..github/workflows/trigger_ci.yml (1)
54-54
: LGTM — centralized filters reference verifiedAll downstream filter keys (vllm, sglang, trtllm, sdk, has_code_changes) are present in .github/filters.yaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any specific workflow links to show the testing?
- A workflow where you changed a filtered file, and the expected jobs are skipped
- A workflow where you change a target file, and the expected jobs are triggered
If this hasn't been done, you can comment out the filters temporarily and add some test filters just to ensure its working properly.
First one, you can check here - https://github.com/ai-dynamo/dynamo/actions/runs/17862125429 Second is just this PR (as we trigger on ci changes rn) so any latest commit can verify that |
Signed-off-by: Anant Sharma <anants@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! Thank you Anant! 👏
Overview:
Update GitHub workflow for backend validation to run based on source code changes, along with reusable composite actions and centralized path filtering.
Right now we are skipping all backend checks for doc only and other top level config file changes
TODO: properly define filters for each backend and use that to trigger relevant workflow
Files changes that will trigger all backend-checks
Tested skip flow here - https://github.com/ai-dynamo/dynamo/actions/runs/17862125429
Post merge,
backend-status-check
will be a required check on github PR, instead of vllm/framework specific checkKey Changes
closes: OPS-862
Summary by CodeRabbit
New Features
Chores