
ci: redirect uv cache to node-local TMPDIR self-hosted runners #1385

Merged: sbryngelson merged 1 commit into MFlowCode:master from sbryngelson:uv-lock-fix on Apr 30, 2026

Conversation

@sbryngelson

Member

Summary

  • On GitHub Actions self-hosted runners, the default uv cache (~/.cache/uv) often lives on a shared NFS $HOME (e.g. OLCF /ccs/home), where uv's file-lock implementation hits os error 524 when concurrent runners on different nodes contend for the same .lock file.
  • Redirects UV_CACHE_DIR to node-local ${TMPDIR:-/tmp}/uv-cache-${USER} when running in CI, leaving non-CI users' normal reusable cache unaffected.

Test plan

  • Verify CI passes on self-hosted runners (OLCF/Frontier or similar NFS-home systems)
  • Confirm local builds are unaffected (no GITHUB_ACTIONS env var set)
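
One way to check the premise of the test plan on a given runner is to ask which filesystem backs the uv cache directory. The snippet below is a minimal sketch, not part of this PR: the helper names `fs_type_of`, `nearest_existing`, and `is_on_nfs` are hypothetical, it assumes GNU coreutils `stat` (Linux), and it simplifies uv's default cache location to `~/.cache/uv` as described in the summary.

```shell
# Hypothetical helper: print the filesystem type backing a path.
# Assumes GNU coreutils stat; BSD/macOS stat uses different flags.
fs_type_of() {
    stat -f -c %T -- "$1"
}

# Hypothetical helper: print the nearest ancestor of a path that exists,
# so a cache directory that has not been created yet can still be probed.
nearest_existing() {
    probe="$1"
    while [ ! -e "$probe" ] && [ "$probe" != "/" ]; do
        probe="$(dirname "$probe")"
    done
    printf '%s\n' "$probe"
}

# Hypothetical helper: succeed if the path appears to live on NFS.
is_on_nfs() {
    case "$(fs_type_of "$(nearest_existing "$1")")" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Assumption: without UV_CACHE_DIR, the cache is taken to be ~/.cache/uv.
cache_dir="${UV_CACHE_DIR:-${HOME:-/}/.cache/uv}"
if is_on_nfs "$cache_dir"; then
    echo "uv cache ($cache_dir) is on NFS"
else
    echo "uv cache ($cache_dir) is not on NFS"
fi
```

Running this once with `GITHUB_ACTIONS` unset and once inside a CI job should show whether the redirect moved the cache off the shared home directory.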

@sbryngelson sbryngelson marked this pull request as ready for review April 29, 2026 19:24

@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Contributor
Walkthrough

The script toolchain/bootstrap/python.sh now configures the uv package manager's cache for CI. When USE_UV=1 and the script runs under GitHub Actions (GITHUB_ACTIONS=true) with a writable ${TMPDIR:-/tmp}, it exports UV_CACHE_DIR pointing to a directory under that path. The directory name is keyed by USER, so each OS user gets a separate cache on each node. Outside CI, the default cache behavior is unchanged. The change adds 8 lines.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description explains the problem, solution, and test plan clearly, but does not follow the repository's template structure with required sections like 'Type of change', 'Testing', and the checklist. | Restructure the description to follow the template: add 'Type of change' section, formal 'Testing' section with how changes were tested, and complete the 'Checklist' items. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: redirecting the uv cache to a node-local TMPDIR on GitHub Actions self-hosted runners. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1


📥 Commits

Reviewing files that changed from the base of the PR and between 00aa603 and bee5332.

📒 Files selected for processing (1)
  • toolchain/bootstrap/python.sh

Comment on lines +188 to +190
if [ "${GITHUB_ACTIONS:-}" = "true" ] && [ -w "${TMPDIR:-/tmp}" ]; then
    export UV_CACHE_DIR="${TMPDIR:-/tmp}/uv-cache-${USER:-$(id -un)}"
fi

Contributor

⚠️ Potential issue | 🟠 Major

Harden CI cache-root fallback to avoid silently reverting to shared ~/.cache/uv.

At Line 188, if TMPDIR is set but unusable, the condition fails and UV_CACHE_DIR is left unset, so CI can still hit the shared-home lock path this PR is fixing.

Suggested patch
-        if [ "${GITHUB_ACTIONS:-}" = "true" ] && [ -w "${TMPDIR:-/tmp}" ]; then
-            export UV_CACHE_DIR="${TMPDIR:-/tmp}/uv-cache-${USER:-$(id -un)}"
-        fi
+        if [ "${GITHUB_ACTIONS:-}" = "true" ]; then
+            uv_cache_root="${TMPDIR:-}"
+            if [ -z "$uv_cache_root" ] || [ ! -d "$uv_cache_root" ] || [ ! -w "$uv_cache_root" ]; then
+                uv_cache_root="/tmp"
+            fi
+            if [ -d "$uv_cache_root" ] && [ -w "$uv_cache_root" ]; then
+                export UV_CACHE_DIR="${uv_cache_root}/uv-cache-${USER:-$(id -un)}"
+            fi
+        fi
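
The fallback order in the suggested patch (try TMPDIR, then /tmp) can also be written as a small candidate loop. This is a sketch of the same idea, not the code that was merged: the function name `pick_uv_cache_root` is hypothetical, and the warning on failure is an addition the patch does not include.

```shell
# Hypothetical helper: print the first candidate that is a non-empty,
# existing, writable directory. Returns 1 if no candidate qualifies.
pick_uv_cache_root() {
    for candidate in "$@"; do
        if [ -n "$candidate" ] && [ -d "$candidate" ] && [ -w "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if [ "${GITHUB_ACTIONS:-}" = "true" ]; then
    if uv_cache_root="$(pick_uv_cache_root "${TMPDIR:-}" /tmp)"; then
        export UV_CACHE_DIR="${uv_cache_root}/uv-cache-${USER:-$(id -un)}"
    else
        # Assumption: a visible warning is preferable to silently falling
        # back to the shared default cache.
        echo "warning: no writable node-local directory; uv will use its default cache" >&2
    fi
fi
```

Listing candidates as arguments keeps the usability test in one place, so another node-local path could be tried before /tmp without adding nested conditionals.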

@codecov

codecov Bot commented Apr 29, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.76%. Comparing base (00aa603) to head (bee5332).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1385   +/-   ##
=======================================
  Coverage   64.76%   64.76%           
=======================================
  Files          71       71           
  Lines       18713    18713           
  Branches     1549     1549           
=======================================
  Hits        12119    12119           
  Misses       5638     5638           
  Partials      956      956           

@sbryngelson sbryngelson changed the title from "ci: redirect uv cache to node-local TMPDIR on self-hosted runners" to "ci: redirect uv cache to node-local TMPDIR self-hosted runners" Apr 30, 2026
@sbryngelson sbryngelson merged commit 9ce86a8 into MFlowCode:master Apr 30, 2026
127 of 128 checks passed
@sbryngelson sbryngelson deleted the uv-lock-fix branch May 6, 2026 19:57
sbryngelson added a commit that referenced this pull request Jul 4, 2026
Self-hosted Frontier/Frontier-AMD matrix legs (acc/omp/cpu x shards) run
their "Fetch Dependencies" step directly on the same login node as the
same OS user, all pointed at the same UV_CACHE_DIR (introduced in #1385
to dodge NFS file-lock errors on ~/.cache/uv). uv's own cache lock
guards individual entries, but concurrent installs from separate uv
processes can still race while one extracts/prunes the shared
archive-v0 store, leaving a corrupted entry behind (e.g. a missing
dist-info METADATA file) that fails every subsequent install until the
cache is cleared by hand -- as happened on PR #1414's Frontier gpu-acc
[2/2] job.

Serialize the actual `uv pip install` call with flock so only one
process touches a given cache dir at a time, while keeping the cache
itself shared and warm across runs.
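
The serialization described in the commit message above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code from that commit: the wrapper name `with_uv_cache_lock` and the lock-file location next to the cache directory are hypothetical, and it assumes util-linux `flock(1)` is available on the runner.

```shell
# Hypothetical wrapper: run a command while holding an exclusive lock
# tied to the uv cache directory. Assumes util-linux flock(1).
with_uv_cache_lock() {
    # Assumption: fall back to the cache path pattern from #1385 when
    # UV_CACHE_DIR is not already set.
    cache_dir="${UV_CACHE_DIR:-${TMPDIR:-/tmp}/uv-cache-${USER:-$(id -un)}}"
    mkdir -p "$cache_dir"
    # flock creates the lock file if needed, blocks until the lock is
    # free, runs the command, and returns the command's exit status.
    flock "${cache_dir}.lock" "$@"
}

# Example use (requirements.txt is a placeholder):
# with_uv_cache_lock uv pip install -r requirements.txt
```

Because the lock file sits on the same node-local filesystem as the cache, it avoids the NFS locking problem that motivated #1385, and the cache contents stay warm between runs.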