with zstd compression lets just run the steps individually#1117
Conversation
WalkthroughThe workflow .github/workflows/unit-tests-recipes.yml now emits a JSON array of {dir, image} objects, drives a matrix-based unit-tests job per recipe with container.image from each object, uses sparse checkout per directory, installs dependencies per-recipe, and runs pytest in each recipe directory. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Dev as Developer
participant CI as GitHub Actions
participant CD as Job: changed-dirs
participant UT as Job: unit-tests (matrix)
participant C as Container (per-recipe)
participant Repo as Repo (sparse)
Dev->>CI: Push / PR
CI->>CD: Run set-dirs
CD->>CD: Build DIRS_WITH_IMAGES = [{dir, image}, ...]
CD-->>UT: Output dirs (array of {dir, image})
rect rgba(230,245,255,0.5)
note over UT: Matrix fan-out per recipe
loop for each recipe in dirs
UT->>C: Start container using recipe.image
UT->>Repo: Checkout with sparse path = recipe.dir (cone=false)
UT->>UT: Show GPU info
UT->>UT: Install deps in recipe.dir (editable or requirements)
UT->>UT: Run pytest -v . in recipe.dir
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
.github/workflows/unit-tests-recipes.yml (3)
92-115: Confirm image availability and consider a small guard.
- Verify both Docker Hub tags exist and are readable by CI; a missing/private tag will hard-fail matrix setup.
- Optional: guard against empty DIRS to avoid confusing logs.
Apply this minimal guard:
DIRS_WITH_IMAGES=$(echo "$DIRS" | jq -c ' - map({ + ( . // [] ) | map({ dir: ., image: ( if . == "models/amplify" then "svcbionemo023/bionemo-framework:amplify-model-devcontainer-082025" else "svcbionemo023/bionemo-framework:pytorch25.06-py3-squashed-zstd" end ) }) ')Also please confirm via a one-off CI run that both images pull successfully on the gpu runner.
145-148: Sparse checkout may miss shared root files; consider adding fetch-depth and patterns if needed.If any recipe relies on repo-root config (e.g., .coveragerc, tox.ini, pyproject.toml), sparse-checkout will omit it.
Suggestion (only if you hit missing-file errors):
uses: actions/checkout@v4 with: + fetch-depth: 0 - sparse-checkout: "${{ matrix.recipe.dir }}" + sparse-checkout: | + ${{ matrix.recipe.dir }} + .coveragerc + pyproject.toml sparse-checkout-cone-mode: falseConfirm whether any recipe imports across dirs or reads root-level configs before changing this.
169-170: Emit JUnit for easier triage (optional).JUnit output per-matrix job makes failures easier to inspect in the UI.
Apply:
- - name: Run tests - working-directory: ${{ matrix.recipe.dir }} - run: pytest -v . + - name: Run tests + working-directory: ${{ matrix.recipe.dir }} + run: pytest -v . --maxfail=1 --durations=25 --junitxml=/tmp/pytest.xml + - name: Upload test report + if: always() + uses: actions/upload-artifact@v4 + with: + name: junit-${{ matrix.recipe.dir }} + path: /tmp/pytest.xmlNote: If slashes in artifact names are a problem in your setup, replace with a sanitized name first.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
.github/workflows/unit-tests-recipes.yml(2 hunks)
| container: | ||
| image: ${{ matrix.recipe.image }} | ||
| strategy: | ||
| matrix: | ||
| recipe: ${{ fromJson(needs.changed-dirs.outputs.dirs) }} | ||
| fail-fast: false |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
GPU passthrough likely needs explicit Docker options.
Containers on GPU runners typically need --gpus all and larger shared memory for PyTorch.
Apply:
container:
image: ${{ matrix.recipe.image }}
+ options: --gpus all --ipc=host --shm-size=16gPlease verify CUDA is visible inside the job by adding a quick probe:
python -c "import torch; print('cuda:', torch.cuda.is_available(), 'devices:', torch.cuda.device_count())"🏁 Script executed:
#!/bin/bash
set -e
echo "=== runs-on entries in unit-tests-recipes.yml ==="
rg -n '^ runs-on:' -n .github/workflows/unit-tests-recipes.yml
echo
echo "=== container block around line 132 ==="
sed -n '120,160p' .github/workflows/unit-tests-recipes.yml
echo
echo "=== Searching for torch.cuda usage in the repository ==="
rg -nP --type=py 'torch\.cuda' -C2 || echo "No torch.cuda references detected"Length of output: 183
🏁 Script executed:
#!/bin/bash
set -e
echo "=== runs-on entries in .github/workflows/unit-tests-recipes.yml ==="
rg -n '^ *runs-on:' .github/workflows/unit-tests-recipes.yml
echo
echo "=== container block around matrix definition ==="
rg -nP 'container:' -C5 .github/workflows/unit-tests-recipes.yml
echo
echo "=== Any GPU-related labels or runner names ==="
rg -nP 'gpu|cuda' -n .github/workflows/unit-tests-recipes.yml || echo "No GPU or CUDA labels found"Length of output: 1072
Add Docker GPU and IPC options to the container
By default the Actions container won’t passthrough GPUs or enlarge shared memory—add the following under container to enable GPU support and sufficient /dev/shm:
container:
image: ${{ matrix.recipe.image }}
+ options: --gpus all --ipc=host --shm-size=16gThen confirm CUDA inside the job:
python -c "import torch; print('cuda:', torch.cuda.is_available(), 'devices:', torch.cuda.device_count())"🤖 Prompt for AI Agents
In .github/workflows/unit-tests-recipes.yml around lines 132 to 137, the
container configuration lacks Docker options to expose GPUs and enlarge shared
memory; add a container.options entry with Docker flags like --gpus all
--ipc=host --shm-size=1g (or your preferred size) so the runner will passthrough
GPUs and increase /dev/shm, and then add a job step that runs python -c "import
torch; print('cuda:', torch.cuda.is_available(), 'devices:',
torch.cuda.device_count())" to confirm CUDA availability inside the job.
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
c279cce to
7f31546
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (2)
.github/workflows/unit-tests-recipes.yml (2)
141-143: Augment GPU probe to verify torch CUDA availability.
Keep nvidia-smi, but also check the Python stack loads CUDA.Apply:
- name: Show GPU info run: nvidia-smi + - name: Verify torch CUDA + run: | + python - <<'PY' +import torch +print("cuda_available:", torch.cuda.is_available(), "device_count:", torch.cuda.device_count()) +assert torch.cuda.is_available(), "CUDA not available inside container" +PY
132-134: GPU not passed through to the container; add Docker options.
Without these, nvidia-smi/torch CUDA will likely fail inside the container on GPU runners.Apply:
container: image: ${{ matrix.recipe.image }} + options: --gpus all --ipc=host --shm-size=16g
🧹 Nitpick comments (3)
.github/workflows/unit-tests-recipes.yml (3)
92-115: Dir→image mapping looks good; pin images by digest for reproducibility.
Tag-only versions can drift. Consider pinning both images with @sha256 digests and documenting the provenance of the squashed image.
152-165: Harden dependency install; use the container’s interpreter and ensure pytest.
Avoid barepip; invoke via python, upgrade tooling, and guarantee pytest exists. Also prefer unsetting PIP_CONSTRAINT vs setting it empty.Apply:
- - name: Install dependencies + - name: Install dependencies working-directory: ${{ matrix.recipe.dir }} run: | - if [ -f pyproject.toml ] || [ -f setup.py ]; then - PIP_CONSTRAINT= pip install -e . - echo "Installed ${{ matrix.recipe.dir }} as editable package" - elif [ -f requirements.txt ]; then - PIP_CONSTRAINT= pip install -r requirements.txt - echo "Installed ${{ matrix.recipe.dir }} from requirements.txt" - else - echo "No pyproject.toml, setup.py, or requirements.txt found in ${{ matrix.recipe.dir }}" - exit 1 - fi + python --version + python -m pip install -U pip setuptools wheel + unset PIP_CONSTRAINT || true + if [ -f pyproject.toml ] || [ -f setup.py ]; then + python -m pip install -e . + echo "Installed ${{ matrix.recipe.dir }} as editable package" + elif [ -f requirements.txt ]; then + python -m pip install -r requirements.txt + echo "Installed ${{ matrix.recipe.dir }} from requirements.txt" + else + echo "No pyproject.toml, setup.py, or requirements.txt found in ${{ matrix.recipe.dir }}" + exit 1 + fi + python - <<'PY' +try: + import pytest # noqa + print("pytest present") +except Exception: + import sys, subprocess + subprocess.check_call([sys.executable, "-m", "pip", "install", "pytest"]) +PY
167-168: Consider clearer test output and quicker failure.
Optional: add --maxfail=1 and -r a for actionable logs; or emit JUnit XML for parsing.Example:
- run: pytest -v . + run: pytest -v -r a --maxfail=1 .
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
.github/workflows/unit-tests-recipes.yml(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit-tests (recipes/esm2_native_te_mfsdp, svcbionemo023/bionemo-framework:pytorch25.06-py3-squash...
🔇 Additional comments (1)
.github/workflows/unit-tests-recipes.yml (1)
149-151: Sparse checkout: confirm recipes are fully self-contained.
If tests import shared modules outside ${{ matrix.recipe.dir }}, sparse checkout will break them. Either include shared paths in sparse-checkout or verify no cross-dir imports.
7f31546 to
11e6464
Compare
Reverts part of #1108 since the output can be very hard to parse. with zstd compression and dockerhub caching the image pull doesn't take as much time
Summary by CodeRabbit
New Features
Tests
Chores