ci: redirect uv cache to node-local TMPDIR on self-hosted runners #1385
Actionable comments posted: 1
📒 Files selected for processing (1)
toolchain/bootstrap/python.sh
if [ "${GITHUB_ACTIONS:-}" = "true" ] && [ -w "${TMPDIR:-/tmp}" ]; then
    export UV_CACHE_DIR="${TMPDIR:-/tmp}/uv-cache-${USER:-$(id -un)}"
fi
Harden CI cache-root fallback to avoid silently reverting to shared ~/.cache/uv.
At Line 188, if TMPDIR is set but unusable, the condition fails and UV_CACHE_DIR is left unset, so CI can still hit the shared-home lock path this PR is fixing.
Suggested patch
- if [ "${GITHUB_ACTIONS:-}" = "true" ] && [ -w "${TMPDIR:-/tmp}" ]; then
- export UV_CACHE_DIR="${TMPDIR:-/tmp}/uv-cache-${USER:-$(id -un)}"
- fi
+ if [ "${GITHUB_ACTIONS:-}" = "true" ]; then
+ uv_cache_root="${TMPDIR:-}"
+ if [ -z "$uv_cache_root" ] || [ ! -d "$uv_cache_root" ] || [ ! -w "$uv_cache_root" ]; then
+ uv_cache_root="/tmp"
+ fi
+ if [ -d "$uv_cache_root" ] && [ -w "$uv_cache_root" ]; then
+ export UV_CACHE_DIR="${uv_cache_root}/uv-cache-${USER:-$(id -un)}"
+ fi
+ fi
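The fallback in the suggested patch can also be read as a small function, which makes the "TMPDIR set but unusable" case easy to exercise in isolation. This is only a sketch: `pick_uv_cache_root` is a hypothetical helper name that does not appear in the PR, and it assumes `/tmp` is the right last-resort location on the runner.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): print a usable cache root on
# stdout, or return non-zero when neither the candidate nor /tmp is usable.
pick_uv_cache_root() {
    candidate="${1:-}"
    # Fall back to /tmp when the candidate is empty, missing, or not writable.
    if [ -z "$candidate" ] || [ ! -d "$candidate" ] || [ ! -w "$candidate" ]; then
        candidate="/tmp"
    fi
    # Only report a root that really is a writable directory.
    if [ -d "$candidate" ] && [ -w "$candidate" ]; then
        printf '%s\n' "$candidate"
        return 0
    fi
    return 1
}

# Same guard as the suggested patch: only redirect the cache in CI.
if [ "${GITHUB_ACTIONS:-}" = "true" ]; then
    if uv_cache_root="$(pick_uv_cache_root "${TMPDIR:-}")"; then
        export UV_CACHE_DIR="${uv_cache_root}/uv-cache-${USER:-$(id -un)}"
    fi
fi
```

With this shape, a `TMPDIR` that points at a missing or read-only directory resolves to `/tmp` instead of leaving `UV_CACHE_DIR` unset, which is the case the review comment is concerned about.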
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##           master    #1385   +/-   ##
=======================================
  Coverage   64.76%   64.76%
=======================================
  Files          71       71
  Lines       18713    18713
  Branches     1549     1549
=======================================
  Hits        12119    12119
  Misses       5638     5638
  Partials      956      956
Self-hosted Frontier/Frontier-AMD matrix legs (acc/omp/cpu x shards) run their "Fetch Dependencies" step directly on the same login node as the same OS user, so they all point at the same UV_CACHE_DIR (introduced in #1385 to dodge NFS file-lock errors on ~/.cache/uv).

uv's own cache lock guards individual entries, but concurrent installs from separate uv processes can still race while one of them extracts or prunes the shared archive-v0 store. The race can leave a corrupted entry behind (e.g. a missing dist-info METADATA file) that fails every subsequent install until the cache is cleared by hand -- as happened on PR #1414's Frontier gpu-acc [2/2] job.

Serialize the actual `uv pip install` call with flock so that only one process touches a given cache dir at a time, while keeping the cache itself shared and warm across runs.
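A minimal sketch of the serialization described above, assuming util-linux `flock(1)` is available on the runner. The helper names `with_cache_lock` and `locked_uv_install`, the lock-file location next to the cache directory, and the `requirements.txt` argument in the usage note are illustrative assumptions, not taken from the actual change.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command while holding an exclusive lock on
# lock_file. flock(1) blocks until the lock is free and releases it when
# the wrapped command exits.
with_cache_lock() {
    lock_file="$1"
    shift
    flock "$lock_file" "$@"
}

# Hypothetical entry point: serialize `uv pip install` per cache directory.
locked_uv_install() {
    cache_dir="${UV_CACHE_DIR:-${TMPDIR:-/tmp}/uv-cache-${USER:-$(id -un)}}"
    mkdir -p "$cache_dir"
    # Assumption: keep the lock file beside the cache rather than inside it,
    # so clearing the cache directory does not remove a lock that is in use.
    with_cache_lock "${cache_dir}.install.lock" uv pip install "$@"
}
```

A CI step would then call something like `locked_uv_install -r requirements.txt`. Because every matrix leg derives the same lock path from the same cache directory, the installs queue up behind one another while still sharing one warm cache.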
Summary
uv's default cache (~/.cache/uv) often lives on a shared NFS $HOME (e.g. OLCF /ccs/home), where uv's file-lock implementation hits os error 524 when concurrent runners on different nodes contend for the same .lock file.
Redirect UV_CACHE_DIR to node-local ${TMPDIR:-/tmp}/uv-cache-${USER} when running in CI, leaving non-CI users' normal reusable cache unaffected.
Test plan
GITHUB_ACTIONS env var set)