Skip to content

run all recipes on a single node#1108

Merged
pstjohn merged 10 commits into
NVIDIA-BioNeMo:mainfrom
pstjohn:pstjohn/ci-run-recipes-serially
Sep 4, 2025
Merged

run all recipes on a single node#1108
pstjohn merged 10 commits into
NVIDIA-BioNeMo:mainfrom
pstjohn:pstjohn/ci-run-recipes-serially

Conversation

@pstjohn
Copy link
Copy Markdown
Collaborator

@pstjohn pstjohn commented Sep 3, 2025

This PR does two things:

  • We want changed-files to show the files changed in this PR, not necessarily all differences between ToT main and the current branch. The merge queue will make sure that these run at least once where ToT main is the merge base
  • we spend almost all our time currently in CI pulling the base image, since we have to pull the pytorch base image independently on every CI runner. This change uses a single runner and just uses docker run for each test suite, so we can re-use that image pull

Summary by CodeRabbit

  • Chores

    • CI now compares changes against a merge-base for more accurate diffs.
    • Consolidated unit tests into a single runner for simpler, faster execution.
    • Streamlined outputs and removed unnecessary per-directory and matrix steps.
  • Tests

    • Switched to a slimmer container image for test runs.
    • Added pip caching to speed dependency installs.
    • Ensured a clean environment by unsetting constraint variables before installs.
    • Standardized test invocation and non-interactive Docker runs.
    • Improved logging to show cache location and key test context.

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Sep 3, 2025

Walkthrough

Compute merge-base against origin/main for changed-files detection, use that baseline to select directories, replace per-recipe test steps with a single ci/recipes_local_test.py runner, and update the Docker test runner to mount a pip cache, use a squashed base image, unset PIP_CONSTRAINT before installs, and adjust install/test commands.

Changes

Cohort / File(s) Summary
Recipes workflow overhaul
.github/workflows/unit-tests-recipes.yml
Add "Get merge-base" step and use its output as base_sha for changed-files; remove per-directory image mapping and matrix/container strategy; simplify outputs; run a single ci/recipes_local_test.py invocation with derived dirs instead of per-recipe checkout/install/pytest steps.
Framework workflow merge-base update
.github/workflows/unit-tests-framework.yml
Add "Get merge-base" step (id merge-base), set changed-files base_sha to ${{ steps.merge-base.outputs.merge-base }}, exclude ci/scripts/recipes_local_test.py from the changed-files filter, and ensure the merge-base step runs before changed-files.
Docker test runner script updates
ci/scripts/recipes_local_test.py
Add PIP_CACHE_DIR via platformdirs.user_cache_dir; mount it into Docker (add volume) and remove -it from DOCKER_RUN_ARGS; change DEFAULT_CONTAINER to svcbionemo023/bionemo-framework:pytorch25.06-py3-squashed; unset PIP_CONSTRAINT before installs and prefix installs with PIP_CACHE_DIR=/workspace/.cache/pip; run tests with python -m pytest -v; log the cache path at runtime.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant GH as GitHub Actions (unit-tests-recipes)
  participant Repo as Repository
  participant CF as changed-files action
  participant Runner as ci/recipes_local_test.py
  participant Verify as verify-recipe-tests

  GH->>Repo: Checkout
  GH->>Repo: Get merge-base (git merge-base HEAD origin/main)
  GH->>CF: Compute changed files (base_sha = merge-base)
  CF-->>GH: changed-files JSON + dirs
  GH->>GH: Convert DIRS_JSON -> DIRS_ARGS
  GH->>Runner: Run `ci/recipes_local_test.py` with DIRS_ARGS
  Runner-->>GH: Exit status & logs
  GH->>Verify: Validate overall results
  Verify-->>GH: Status
  note over Runner: Docker run mounts pip cache and uses updated image/install/test commands
Loading
sequenceDiagram
  autonumber
  participant GH as GitHub Actions (unit-tests-framework)
  participant Repo as Repository
  participant CF as changed-files action

  GH->>Repo: Checkout
  GH->>Repo: Get merge-base (git merge-base HEAD origin/main)
  GH->>CF: changed-files (base_sha = merge-base, exclude ci/scripts/recipes_local_test.py)
  CF-->>GH: changed-files outputs
  note over GH,CF: Dynamic base_sha and updated file filter
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Poem

I twitch my nose at merge-base trails,
One hop finds what each branch unveils.
A pip-cache burrow, Docker snug,
Tests now run from one warm mug.
Hoppity hooray — fewer tails to chase!

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/unit-tests-recipes.yml (1)

84-84: Use printf when writing to GITHUB_OUTPUT to avoid quoting/formatting pitfalls

Minor robustness: printf avoids issues with shells interpreting quotes or special chars in JSON.

-          echo "dirs=$DIRS" >> $GITHUB_OUTPUT
+          printf 'dirs=%s\n' "$DIRS" >> "$GITHUB_OUTPUT"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4a47964 and 0173c0d.

📒 Files selected for processing (1)
  • .github/workflows/unit-tests-recipes.yml (2 hunks)

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/unit-tests-recipes.yml (1)

58-60: Fix: ls + set -e will fail when no directories exist

With bash -e, ls -d recipes// models// exits non‑zero if a glob doesn’t match, aborting the step (e.g., if models/ or recipes/ is absent). Use nullglob or guard the command.

-          ALL_DIRS=$(ls -d recipes/*/ models/*/ 2>/dev/null | jq -R -s -c 'split("\n")[:-1] | map(rtrimstr("/"))')
+          shopt -s nullglob
+          dirs=(recipes/*/ models/*/)
+          shopt -u nullglob
+          ALL_DIRS=$(printf '%s\n' "${dirs[@]}" | jq -R -s -c 'split("\n")[:-1] | map(rtrimstr("/"))')
🧹 Nitpick comments (5)
.github/workflows/unit-tests-recipes.yml (5)

78-81: Normalize changed dir names to avoid false negatives

If the changed-files output contains trailing slashes or mixed forms, index($dir) can miss matches. Normalize both sides in jq.

-              DIRS=$(echo "$ALL_DIRS" | jq -c --argjson changed "$CHANGED_FILES" '
-                map(select(. as $dir | $changed | index($dir) != null))
-              ')
+              DIRS=$(echo "$ALL_DIRS" | jq -c --argjson changed "$CHANGED_FILES" '
+                ([$changed[] | rtrimstr("/")]) as $norm_changed
+                | map(select(. as $dir | $norm_changed | index($dir) != null))
+              ')

84-84: Use multiline GITHUB_OUTPUT to safely emit JSON

Echoing JSON inline risks parsing issues. Prefer the EOF form.

-          echo "dirs=$DIRS" >> $GITHUB_OUTPUT
+          {
+            echo "dirs<<EOF"
+            echo "$DIRS"
+            echo "EOF"
+          } >> "$GITHUB_OUTPUT"

96-109: Call script via Python to avoid exec-bit/shebang pitfalls; guard empty args

Directly executing the script can fail if it’s not executable or lacks a shebang. Also, fail early if no dirs slipped through unexpectedly.

       - name: Run tests
         env:
           DIRS_JSON: ${{ needs.changed-dirs.outputs.dirs }}
         run: |
           # Convert JSON array to space-separated arguments
           DIRS_ARGS=$(echo "$DIRS_JSON" | jq -r '.[]' | tr '\n' ' ')
-          ./ci/recipes_local_test.py $DIRS_ARGS
+          if [[ -z "$DIRS_ARGS" ]]; then
+            echo "No directories to test; exiting."
+            exit 0
+          fi
+          python -u ci/recipes_local_test.py $DIRS_ARGS

91-95: Set an explicit timeout for the consolidated job

Now that all recipes run on a single node, add a job-level timeout to prevent runaway executions.

   unit-tests:
     needs: changed-dirs
     runs-on: linux-amd64-gpu-l4-latest-1
     if: ${{ needs.changed-dirs.outputs.dirs != '[]' }}
+    timeout-minutes: 240

111-124: Rename messages to drop “matrix” wording

The job no longer uses a matrix; update wording for clarity.

-    # This job checks the status of the unit-tests matrix and fails if any matrix job failed or was cancelled.
-    # Use this job as the required check for PRs.
+    # This job checks the status of the consolidated unit-tests job and fails if it failed or was cancelled.
+    # Use this job as the required check for PRs.
       - name: Check unit-tests matrix status
         run: |
           if [[ "${{ needs.unit-tests.result }}" == "failure" || "${{ needs.unit-tests.result }}" == "cancelled" ]]; then
-            echo "Some unit-tests matrix jobs have failed or been cancelled!"
+            echo "Unit-tests job failed or was cancelled!"
             exit 1
           else
-            echo "All unit-tests matrix jobs have completed successfully or were skipped!"
+            echo "Unit-tests job completed successfully or was skipped!"
             exit 0
           fi
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0173c0d and 14d0086.

📒 Files selected for processing (1)
  • .github/workflows/unit-tests-recipes.yml (2 hunks)

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
ci/recipes_local_test.py (1)

66-66: Fix incorrect docstring.

The docstring mentions returning tuples but the function returns a list of strings.

-    Returns list of (directory_path, docker_image) tuples.
+    Returns list of directory paths.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e1d955a and e0b4260.

📒 Files selected for processing (2)
  • .github/workflows/unit-tests-recipes.yml (3 hunks)
  • ci/recipes_local_test.py (3 hunks)
🔇 Additional comments (15)
ci/recipes_local_test.py (9)

26-29: LGTM: Persistent pip cache implementation looks good.

The addition of platformdirs import and PIP_CACHE_DIR configuration will improve build performance by persisting pip cache across test runs.


43-44: LGTM: Docker volume mount for pip cache.

Correctly mounts the host pip cache directory to the container's pip cache location.


59-59: LGTM: Cache directory creation.

Properly creates the pip cache directory structure within the git root to ensure the mount point exists.


63-87: LGTM: Well-implemented directory filtering logic.

The get_test_directories function correctly:

  • Scans both models/ and recipes/ directories
  • Filters based on input directories when provided
  • Validates that input directories exist
  • Returns absolute paths for consistency

96-96: LGTM: Added error handling.

The set -e addition ensures the script fails fast on any error, which is appropriate for CI environments.


102-102: LGTM: Consistent pip cache environment variable usage.

Properly sets PIP_CACHE_DIR and clears PIP_CONSTRAINT for both installation scenarios.

Also applies to: 106-106


114-114: LGTM: More explicit pytest invocation.

Using python -m pytest instead of just pytest is more reliable and explicit.


174-174: LGTM: Informative logging.

Adding cache location logging helps with debugging and transparency.


50-51: Ensure the new internal container image is pullable in CI and includes all required dependencies.
I can’t verify the pull here—please confirm in your CI environment that nvcr.io/nvidian/cvai_bnmo_trng/bionemo:pytorch25.06-py3-squashed is accessible (credentials, network) and contains everything the previous nvcr.io/nvidia/pytorch:25.06-py3 image provided. Consider adding a CI pre-pull check to fail fast if the image isn’t available.

.github/workflows/unit-tests-recipes.yml (6)

38-44: LGTM: Merge-base calculation for changed files.

Using merge-base against origin/main provides a more accurate baseline for detecting changes compared to just using the main branch head. This prevents issues where main has moved ahead since the PR was created.


52-52: LGTM: Using merge-base for change detection.

Correctly uses the calculated merge-base instead of hardcoded "main" for more accurate change detection.


92-92: LGTM: Simplified output format.

The simplified output format aligns with the new single-script approach that no longer needs per-directory image mapping.


95-102: LGTM: Enhanced debugging output.

The expanded logging provides valuable debugging information including current branch, merge-base commit, and changed files analysis.


117-122: LGTM: Streamlined test execution.

The new approach correctly:

  • Converts JSON array to space-separated arguments
  • Passes all changed directories to a single script invocation
  • Eliminates the complexity of per-directory matrix jobs

This aligns perfectly with the PR objective to "run all recipes on a single node".


117-122: Script already executable — no action required. Verified execute permissions on ci/recipes_local_test.py (-rwxr-xr-x).

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
ci/scripts/recipes_local_test.py (2)

26-30: Ensure host cache dir exists and align with the mount (stop creating an unused repo-local cache).

Create PIP_CACHE_DIR on the host; the current mkdir in get_git_root() targets repo/.cache/pip instead.

 from platformdirs import user_cache_dir
 
 
 PIP_CACHE_DIR = user_cache_dir(appname="bionemo-pip-cache", appauthor="nvidia")
+Path(PIP_CACHE_DIR).mkdir(parents=True, exist_ok=True)

Additional change (outside this hunk): remove the unused mkdir in get_git_root():

# Remove this line in get_git_root():
# (Path(git_root) / ".cache" / "pip").mkdir(parents=True, exist_ok=True)

43-45: Propagate PIP_CACHE_DIR into the container env for all pip invocations.

Mount is good; also pass -e PIP_CACHE_DIR so any pip call (including transitive tooling) uses the cache without needing per-command env.

   "--ulimit",
   "stack=67108864",
   "-v",
   f"{PIP_CACHE_DIR}:/workspace/.cache/pip",
+  "-e",
+  "PIP_CACHE_DIR=/workspace/.cache/pip",
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e0b4260 and a995136.

📒 Files selected for processing (3)
  • .github/workflows/unit-tests-framework.yml (2 hunks)
  • .github/workflows/unit-tests-recipes.yml (3 hunks)
  • ci/scripts/recipes_local_test.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/unit-tests-recipes.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: changed-dirs
🔇 Additional comments (5)
.github/workflows/unit-tests-framework.yml (2)

44-44: Using merge-base as base_sha: LGTM.

This correctly scopes changed-files to the common ancestor.


53-53: Excluding the test runner may skip CI when it changes—confirm intent.

If only ci/scripts/recipes_local_test.py changes, any_changed becomes false and build/tests may skip. If you want CI to run on runner changes, drop this exclusion.

-            !ci/scripts/recipes_local_test.py
ci/scripts/recipes_local_test.py (3)

50-52: New default image: verify availability and permissions in all environments.

Confirm this NGC path is accessible in CI and for developers (token/ACLs) and that CUDA/driver compatibility matches CI runners.


114-115: Switch to python -m pytest: LGTM.

This avoids PATH issues with pytest entrypoints in various images.


174-175: Helpful cache log: LGTM.

Nice visibility for where the pip cache lives on the host.

Comment thread .github/workflows/unit-tests-framework.yml
Comment thread ci/scripts/recipes_local_test.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
ci/scripts/recipes_local_test.py (1)

105-117: Good fix on unsetting PIP_CONSTRAINT; minor cleanup: export PIP_CACHE_DIR once.

This implements the prior suggestion; you can also export the cache env once to avoid repeating it for each pip call.

-        unset PIP_CONSTRAINT || true
+        unset PIP_CONSTRAINT || true
+        export PIP_CACHE_DIR=/workspace/.cache/pip

         echo "Checking for dependency files..."
         # Install dependencies based on available files
         if [ -f pyproject.toml ] || [ -f setup.py ]; then
             echo "Installing package in editable mode..."
-            PIP_CACHE_DIR=/workspace/.cache/pip pip install -e .
+            pip install -e .
             echo "Installed package as editable package"
         elif [ -f requirements.txt ]; then
             echo "Installing from requirements.txt..."
-            PIP_CACHE_DIR=/workspace/.cache/pip pip install -r requirements.txt
+            pip install -r requirements.txt
             echo "Installed from requirements.txt"
🧹 Nitpick comments (5)
ci/scripts/recipes_local_test.py (5)

26-29: Pre-create the host pip cache directory to avoid root-owned creation and flakiness.

Docker may create the source directory as root if it doesn’t exist; proactively mkdir the user cache to ensure correct perms and fewer CI surprises.

 PIP_CACHE_DIR = user_cache_dir(appname="bionemo-pip-cache", appauthor="nvidia")
+Path(PIP_CACHE_DIR).mkdir(parents=True, exist_ok=True)

43-45: Volume mount LGTM; consider SELinux compatibility.

On RHEL/Fedora runners, add :Z to the bind to handle SELinux labeling (no-op elsewhere): f"{PIP_CACHE_DIR}:/workspace/.cache/pip:Z".


51-58: Pin the base image and allow env override to reduce tag drift risk.

Expose DEFAULT_CONTAINER via env and (optionally) pin by digest in CI to ensure reproducibility.

-DEFAULT_CONTAINER = "svcbionemo023/bionemo-framework:pytorch25.06-py3-squashed"
+DEFAULT_CONTAINER = os.getenv(
+    "BIONEMO_RECIPES_IMAGE",
+    "svcbionemo023/bionemo-framework:pytorch25.06-py3-squashed"
+)

Follow-up:

  • If you have a known digest, set it in CI via BIONEMO_RECIPES_IMAGE to lock the image.

184-185: Nice visibility; optionally ensure the cache path exists here, too.

If you prefer not to mkdir at import time, add a mkdir here before kicking off runs.


71-75: Docstring mismatch and fragile input validation/order.

  • Docstring says the function returns (dir, image) tuples, but it returns a list of dirs.
  • Using assert for user input validation can be stripped with -O; raise a clear exception instead.
  • Current intersection loses the caller-provided order; preserve order for reproducible CI.

Example (outside this hunk):

# Fix docstring to: "Returns list of directory paths."
if input_dirs:
    absolute = [os.path.abspath(d) for d in input_dirs]
    missing = [d for d in absolute if d not in directories]
    if missing:
        raise FileNotFoundError(f"Input directory not found: {missing}")
    # preserve order from input_dirs
    directories = [d for d in absolute if d in directories]

Also applies to: 87-93

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6c63f1b and 6094b76.

📒 Files selected for processing (1)
  • ci/scripts/recipes_local_test.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: changed-files
🔇 Additional comments (1)
ci/scripts/recipes_local_test.py (1)

124-124: LGTM on test invocation.

Switching to python -m pytest -v is more robust w.r.t. PATH.

Comment thread .github/workflows/unit-tests-recipes.yml
Copy link
Copy Markdown
Collaborator

@cspades cspades left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the issue before was that we were detecting file changes in ToT instead of the merge base, which triggers unit tests for those files, but how did the tests fail?

@pstjohn pstjohn enabled auto-merge September 4, 2025 15:58
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
@pstjohn pstjohn force-pushed the pstjohn/ci-run-recipes-serially branch from 6094b76 to 92b1a44 Compare September 4, 2025 16:53
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
ci/scripts/recipes_local_test.py (1)

101-107: Unset PIP_CONSTRAINT: LGTM and matches prior feedback.

This prevents empty-value misinterpretation by pip. Good fix.

🧹 Nitpick comments (10)
ci/scripts/recipes_local_test.py (5)

26-30: Create the host cache dir and align it with the mounted path.

Right now we mkdir a different path under the repo (Line 66) but don’t ensure PIP_CACHE_DIR exists. Create it so Docker doesn’t create it as root, and consider dropping the repo-local mkdir.

Apply:

 PIP_CACHE_DIR = user_cache_dir(appname="bionemo-pip-cache", appauthor="nvidia")
+Path(PIP_CACHE_DIR).mkdir(parents=True, exist_ok=True)

Additionally (outside this hunk), remove the unused repo-local cache mkdir:

# In get_git_root(), delete this line:
# (Path(git_root) / ".cache" / "pip").mkdir(parents=True, exist_ok=True)

43-45: Propagate proxy env into the container (helps installs behind proxies).

Pass HTTP(S)_PROXY/NO_PROXY through to Docker so pip can use the runner’s proxy/cache.

 DOCKER_RUN_ARGS = [
@@
-    f"{PIP_CACHE_DIR}:/workspace/.cache/pip",
+    f"{PIP_CACHE_DIR}:/workspace/.cache/pip",
+    "-e", "HTTP_PROXY",
+    "-e", "HTTPS_PROXY",
+    "-e", "NO_PROXY",
 ]

108-116: Prefer python -m pip for reliability.

Invoking pip via the interpreter avoids PATH/shim issues across images.

-            PIP_CACHE_DIR=/workspace/.cache/pip pip install -e .
+            PIP_CACHE_DIR=/workspace/.cache/pip python -m pip install -e .
@@
-            PIP_CACHE_DIR=/workspace/.cache/pip pip install -r requirements.txt
+            PIP_CACHE_DIR=/workspace/.cache/pip python -m pip install -r requirements.txt

103-103: Harden the bash script flags.

Include -u and -o pipefail to fail fast on unset vars and pipeline errors.

-        set -e  # Exit on any error
+        set -euo pipefail  # Exit on error, unset vars are errors, and pipeline errors propagate

184-185: Nice visibility on cache path; consider logging image selection per dir.

Add an info log when a custom container is selected vs DEFAULT_CONTAINER to aid debugging.

.github/workflows/unit-tests-recipes.yml (5)

38-45: Make merge-base resolution robust to default-branch changes and ensure refs exist.

Fetch the default branch explicitly and compute merge-base against it, not a hardcoded origin/main.

-      - name: Get merge-base commit
+      - name: Get merge-base commit
         id: merge-base
         run: |
-          # Get the merge-base between current branch and main
-          MERGE_BASE=$(git merge-base HEAD origin/main)
+          set -euo pipefail
+          DEFAULT_BRANCH="${{ github.event.repository.default_branch }}"
+          # Ensure the ref exists locally
+          git fetch origin "$DEFAULT_BRANCH" --deepen=1 || git fetch origin "$DEFAULT_BRANCH" --depth=1
+          # Compute merge-base between HEAD and origin/<default-branch>
+          MERGE_BASE=$(git merge-base HEAD "origin/$DEFAULT_BRANCH")
           echo "merge-base=$MERGE_BASE" >> $GITHUB_OUTPUT
           echo "Merge-base commit: $MERGE_BASE"

66-91: Avoid brittle ls globs; use find to list dirs safely.

ls -d recipes/*/ models/*/ can fail under set -e if one path is missing. find is safer.

-          ALL_DIRS=$(ls -d recipes/*/ models/*/ 2>/dev/null | jq -R -s -c 'split("\n")[:-1] | map(rtrimstr("/"))')
+          ALL_DIRS=$(find recipes models -mindepth 1 -maxdepth 1 -type d 2>/dev/null \
+            | sed 's:/*$::' \
+            | jq -R -s -c 'split("\n")[:-1]')

117-126: Minimal Python setup for the helper script: LGTM.

Optionally cache this tiny install to avoid repeated downloads on self-hosted runners.

-      - name: Setup python
+      - name: Setup python
         uses: actions/setup-python@v5
         with:
           python-version: "3.12"
+          cache: "pip"
+          cache-dependency-path: "ci/scripts/requirements-recipes-local-test.txt"
+# (Optional) if you add a small requirements file with `platformdirs` pinned.

128-134: Dir args plumbing to the script: LGTM.

Consider guarding empty results defensively, though the job-level if already handles [].


136-151: Update wording that still mentions “matrix”.

The unit-tests job is no longer a matrix; adjust messages to avoid confusion.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6094b76 and 92b1a44.

📒 Files selected for processing (3)
  • .github/workflows/unit-tests-framework.yml (2 hunks)
  • .github/workflows/unit-tests-recipes.yml (3 hunks)
  • ci/scripts/recipes_local_test.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/unit-tests-framework.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-bionemo-image
  • GitHub Check: changed-dirs
🔇 Additional comments (4)
ci/scripts/recipes_local_test.py (2)

124-124: Use of python -m pytest -v .: LGTM.

This is robust across environments.


51-58: Pin container image to immutable digest
Use a sha256 digest for DEFAULT_CONTAINER instead of a mutable tag and record the squash provenance. Run in an environment with Docker CLI:

docker buildx imagetools inspect svcbionemo023/bionemo-framework:pytorch25.06-py3-squashed \
  | awk -F': ' '/Digest:/ {print $2; exit}'

then update to:

DEFAULT_CONTAINER = "svcbionemo023/bionemo-framework@sha256:<digest>"

and add comments citing the source image and commit used for squashing.

.github/workflows/unit-tests-recipes.yml (2)

52-52: Changed-files now keyed to merge-base: LGTM.

Correctly scopes diffs to the PR baseline.


95-101: Expanded debug output: LGTM.

The extra context (branch, merge-base, file count) will help triage.

@pstjohn pstjohn added this pull request to the merge queue Sep 4, 2025
Merged via the queue into NVIDIA-BioNeMo:main with commit 0284fdd Sep 4, 2025
15 checks passed
@pstjohn pstjohn deleted the pstjohn/ci-run-recipes-serially branch September 4, 2025 21:37
pstjohn added a commit that referenced this pull request Sep 5, 2025
Reverts part of #1108 since the output can be very hard to parse. with
zstd compression and dockerhub caching the image pull doesn't take as
much time

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- New Features
  - None

- Tests
  - Parallelized per-recipe unit tests using containerized images.
  - Switched test execution to PyTest for each recipe.
- Added per-recipe dependency installation with automatic detection of
install method.
  - Introduced targeted/sparse checkouts and a pre-test GPU info check.

- Chores
- Refined CI to pass structured per-directory metadata to test jobs and
updated outputs/logging to reflect per-recipe images.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants