ci: build container once and share across downstream tests #661

Merged
chtruong814 merged 13 commits into main from chtruong/container-test on May 7, 2026

Conversation

@chtruong814
Contributor

@chtruong814 chtruong814 commented May 5, 2026

Summary

  • Adds a cicd-container-build job that builds a single inframework base container image and pushes it to ECR (evaluator:<sha>)
  • All downstream test jobs pull the pre-built image instead of building independently, then run uv sync --extra <framework> to install framework-specific deps at runtime
  • Hardcodes runner prefixes, test data paths, and registries in pre-flight config (matching NeMo-LM pattern)
  • Removes all Azure credential plumbing since we're on AWS H100 runners exclusively
  • Removes the MLM inframework test; the MBridge test provides sufficient coverage
  • Updates the MBridge checkpoints used for testing
  • Installs a specific version of nvrx, which was needed to work with this MBridge
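
The build-once flow in the summary can be sketched as a few shell helpers. This is a minimal dry-run sketch, not the workflow from this PR: the registry host `registry.example.com`, the extra name `example-framework`, and all function names are hypothetical. Only the `evaluator:<sha>` tag and the `uv sync --extra <framework>` command come from the summary above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands rather than executing them.
set -euo pipefail

# Compose the image reference shared by the build job and every test job.
# Assumption: the registry host is passed in; the tag is the commit SHA.
image_ref() {
  local registry="$1" sha="$2"
  printf '%s/evaluator:%s\n' "$registry" "$sha"
}

# Commands the single build job would run once per commit.
build_commands() {
  local ref="$1"
  printf 'docker build -t %s .\n' "$ref"
  printf 'docker push %s\n' "$ref"
}

# Command a downstream test job would run to fetch the pre-built image.
pull_command() {
  local ref="$1"
  printf 'docker pull %s\n' "$ref"
}

# Command run inside the pulled container to add framework-specific deps.
sync_command() {
  local framework="$1"
  printf 'uv sync --extra %s\n' "$framework"
}

# Hypothetical values; GITHUB_SHA is set by GitHub Actions when available.
ref="$(image_ref "registry.example.com" "${GITHUB_SHA:-d12cb14}")"
build_commands "$ref"
pull_command "$ref"
sync_command "example-framework"
```

Because every job derives the same reference from the commit SHA, the test jobs need no build step of their own; they only depend on the build job having pushed that tag. Registry authentication is omitted here and would have to happen before the push and pull steps.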

Test plan

  • Verify cicd-container-build job builds and pushes to ECR successfully
  • Verify downstream GPU tests pull the image and install framework extras correctly
  • Verify CPU unit tests can pull from ECR
  • Verify coverage collection still works end-to-end

🤖 Generated with Claude Code

@chtruong814 chtruong814 requested a review from a team as a code owner May 5, 2026 01:49
@github-actions github-actions Bot added the CI label May 5, 2026
Instead of having each test job build its own container, add a
cicd-container-build job that builds a single inframework base image,
pushes it to ECR, and all downstream tests pull that image. Each test
then runs `uv sync --extra <framework>` to install its specific
inference framework deps at runtime.

Also hardcodes runner prefixes/paths/registries in pre-flight config
(matching NeMo-LM pattern) and removes Azure credential plumbing
since we're using AWS H100 runners exclusively.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
@chtruong814
Contributor Author

/ok to test d12cb14

Signed-off-by: Charlie Truong <chtruong@nvidia.com>
@chtruong814
Contributor Author

/ok to test f9009c0

Signed-off-by: Charlie Truong <chtruong@nvidia.com>
@github-actions github-actions Bot added the tests label May 5, 2026
@chtruong814
Contributor Author

/ok to test e25cdfc

Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
@chtruong814
Contributor Author

/ok to test 9eabc10

Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
@chtruong814
Contributor Author

/ok to test 84327c7

Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
@chtruong814
Contributor Author

/ok to test a670fee

2 participants