fix(rf3): bind Fabric to caller-specified GPU #222
Conversation
RF3Wrapper previously ignored the device passed down from the guidance dispatcher. Lightning Fabric's default `devices=1` always resolves to GPU 0, so parallel grid-search workers on a multi-GPU box all landed on cuda:0 and serialised. Accept a `device` argument on RF3Wrapper and forward the CUDA index to RF3InferenceEngine as `devices_per_node=[idx]`, matching the Boltz and Protenix wrappers. Drop the `getattr(model_wrapper, "device", device)` fallback in guidance_script_utils since the wrapper now honours what was asked for. Closes #37
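A minimal sketch of the index plumbing described above (the helper name is hypothetical; in the PR itself the logic lives in `RF3Wrapper` and the result is passed straight to `RF3InferenceEngine`):

```python
import torch

def resolve_devices_per_node(device: torch.device | str | None) -> list[int]:
    """Map a caller-specified device onto Fabric's devices_per_node list."""
    if device is None:
        # Mirrors Fabric's default: devices=1 resolves to GPU 0.
        return [0]
    device = torch.device(device)
    # A bare "cuda" carries no index; treat it as the first GPU.
    return [device.index if device.index is not None else 0]
```

With this, `RF3InferenceEngine(devices_per_node=resolve_devices_per_node(device), ...)` lands on the requested GPU instead of silently falling back to `cuda:0`.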
📝 Walkthrough: RF3Wrapper gained an optional `device` argument, validated as CUDA and forwarded to RF3InferenceEngine as `devices_per_node=[idx]`; guidance_script_utils drops its `getattr(model_wrapper, "device", device)` fallback.
k-chrispens left a comment
Looks good. I think it'd be worth adding a few extra comments, though, in case these things break longer term.
    msa_manager: MSAManager | None
        MSA manager for retrieving MSAs for input structures.
    device: torch.device | str | None
        CUDA device to bind the underlying Lightning Fabric to (e.g. ``"cuda:3"``).
To make sure I understood this, I went searching in the foundry repo and found this line: https://github.com/RosettaCommons/foundry/blob/b071919caa19ff334bc04b1b41145cac61eba819/src/foundry/trainers/fabric.py#L92
Would probably be worth referencing this here for posterity.
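For context, this is roughly how Lightning Fabric consumes an explicit device list (a sketch of the upstream API, not foundry's exact call site; the foundry trainer linked above ultimately feeds into this):

```python
from lightning.fabric import Fabric

# devices takes a list of local GPU indices; [3] pins everything to cuda:3.
# The default, devices=1, means "one device" and resolves to index 0.
fabric = Fabric(accelerator="cuda", devices=[3])
fabric.launch()
print(fabric.device)  # expected: cuda:3
```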
    Raises
    ------
    ValueError
        If ``device`` is not a CUDA device. RF3's Fabric-backed accelerator
        only supports GPU execution.
It does appear that the latest foundry supports MPS, but my fork (which this repo depends on) does not, and sampleworks doesn't support non-CUDA systems in general right now. I think it'd be worth clarifying that here.
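A guard along these lines would surface that constraint early (a hypothetical sketch; the function name and message are illustrative, not the wrapper's actual code):

```python
import torch

def require_cuda(device: torch.device | str) -> torch.device:
    """Hypothetical guard for the ValueError documented above."""
    device = torch.device(device)
    if device.type != "cuda":  # rejects "cpu", "mps", "xpu", ...
        raise ValueError(
            f"RF3 requires a CUDA device, got {device.type!r}: upstream "
            "foundry has MPS support, but the fork pinned here is CUDA-only."
        )
    return device
```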
Actionable comments posted: 1
Inline comments:
In `src/sampleworks/models/rf3/wrapper.py`:
- Around lines 227-236: the docstring for the `device` parameter implies load-balanced selection ("When ``None``, Fabric picks the first available device"). Passing `device=None` actually falls through to Fabric's default `devices=1`, which binds to GPU 0. Reword the paragraph to say so explicitly, keeping the existing references to Lightning Fabric and the foundry link so readers see the authoritative sources.
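Applied to the diff excerpt above, the corrected paragraph might read (a sketch of the suggested wording, not the committed text):

```
device: torch.device | str | None
    CUDA device to bind the underlying Lightning Fabric to (e.g. ``"cuda:3"``).
    When ``None``, Fabric's default (``devices=1``) binds to GPU 0; see the
    Lightning Fabric docs and the foundry trainer linked above.
```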
Closes #37.

RF3Wrapper didn't accept a `device` argument, so the grid-search dispatcher's `cuda:N` assignment was silently dropped. Lightning Fabric's default `devices=1` always resolves to GPU 0, which meant eight parallel workers all piled onto `cuda:0`.

The wrapper now takes a `device` and forwards the CUDA index to Fabric as `devices_per_node=[idx]`, matching the Boltz and Protenix wrappers.

Reproduction & verification

Two RF3 wrappers launched in parallel inside the same container (different processes, no `CUDA_VISIBLE_DEVICES` hacks):

Before — both pinned to the same GPU:

After (passing `device="cuda:0"` and `device="cuda:3"`):

Unit + regression tests:
Summary by CodeRabbit

New Features
- RF3Wrapper accepts an optional `device` argument that is forwarded to the inference engine.

Bug Fixes
- Parallel workers on multi-GPU machines no longer all bind to GPU 0.

Tests
- Unit and regression coverage for the device plumbing.