Updated template to not request GPU resources on CPU builds #1055
base: master
Conversation
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
Multiple Mako templates now use a template-level boolean, `is_gpu_active`, to gate GPU-specific directives.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Renderer as Template renderer
    participant Helpers as include/helpers.mako
    participant Template as .mako template
    participant SBATCH as SBATCH / module loader
    rect #E8F4FF
    Note over Renderer,Helpers: render start: helpers compute GPU state
    Renderer->>Helpers: import gpuConfigOptions<br/>compute is_gpu_active = (gpu != NONE)
    end
    Renderer->>Template: render template (context includes is_gpu_active)
    alt is_gpu_active
        Template->>SBATCH: emit GPU SBATCH directives<br/>invoke mfc.sh load with 'g'
    else not is_gpu_active
        Template->>SBATCH: omit GPU SBATCH directives<br/>invoke mfc.sh load with 'c'
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
@prathi-wind does this need to be a draft or is it ready to merge?
PR Reviewer Guide 🔍 Here are some key observations to aid the review process:
It is ready to merge.
High-level Suggestion
The logic for checking GPU usage is duplicated across 14 template files. This should be centralized into a single helper function or variable to improve maintainability and consistency. [High-level, importance: 8]
Solution Walkthrough:

Before:

```mako
// In toolchain/templates/bridges2.mako
<%! from mfc.state import gpuConfigOptions %>
...
% if gpu != gpuConfigOptions.NONE.value:
#SBATCH --gres=gpu:v100-16:${tasks_per_node}
% endif

// In toolchain/templates/delta.mako
<%! from mfc.state import gpuConfigOptions %>
...
% if gpu != gpuConfigOptions.NONE.value:
#SBATCH --gpus-per-node=${tasks_per_node}
% endif

// ... and so on for many other template files
```

After:

```mako
// In a central Python file that renders templates:
from mfc.state import gpuConfigOptions
...
is_gpu_active = (gpu_mode != gpuConfigOptions.NONE.value)
template.render(..., is_gpu_active=is_gpu_active)

// In toolchain/templates/bridges2.mako (and all others)
// No import or complex check needed
...
% if is_gpu_active:
#SBATCH --gres=gpu:v100-16:${tasks_per_node}
% endif

// In toolchain/templates/delta.mako (and all others)
...
% if is_gpu_active:
#SBATCH --gpus-per-node=${tasks_per_node}
% endif
```
Bug: GPU load command uses stale boolean check
The line `. ./mfc.sh load -c cc -m ${'g' if gpu else 'c'}` still uses the old boolean check for `gpu`. Since `gpu` is now a string (`'no'`, `'acc'`, or `'mp'`), the string `'no'` is truthy and will incorrectly load GPU modules on CPU builds. This should use `gpu != gpuConfigOptions.NONE.value` to match the pattern in other fixes.
toolchain/templates/carpenter-cray.mako, lines 30 to 31 in 985a861:

```mako
. ./mfc.sh load -c cc -m ${'g' if gpu else 'c'}
cd - > /dev/null
```
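The failure mode here is plain Python truthiness. A minimal sketch (the `'no'` sentinel value is taken from the bug report above; `NONE_VALUE` is a stand-in for `gpuConfigOptions.NONE.value`):

```python
# After the OpenMP change, gpu is a string, not a bool.
gpu = "no"  # CPU build

# Old template expression: any non-empty string is truthy,
# so even 'no' selects the GPU module set.
old_mode = "g" if gpu else "c"

# Fixed expression: compare against the NONE sentinel explicitly.
NONE_VALUE = "no"  # stands in for gpuConfigOptions.NONE.value
new_mode = "g" if gpu != NONE_VALUE else "c"

print(old_mode)  # 'g' -- wrong: GPU modules loaded on a CPU build
print(new_mode)  # 'c' -- correct
```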
toolchain/templates/frontier.mako (outdated)

```diff
 echo
-% if gpu:
+% if gpu != gpuConfigOptions.NONE.value:
```
Bug: GPU load command uses stale boolean check
The expression `. ./mfc.sh load -c f -m ${'g' if gpu else 'c'}` still uses the old boolean check for `gpu`. Since `gpu` is now a string (`'no'`, `'acc'`, or `'mp'`), the string `'no'` evaluates to truthy, causing CPU builds to incorrectly load GPU modules. Should be updated to `${'g' if gpu != gpuConfigOptions.NONE.value else 'c'}` to match other fixed conditions in this PR.
Some AI reviews that seem reasonable.
💡 Codex Review
Here are some automated review suggestions for this pull request.
toolchain/templates/frontier.mako (outdated)

```mako
% if gpu != gpuConfigOptions.NONE.value:
    export MPICH_GPU_SUPPORT_ENABLED=1
% else:
    export MPICH_GPU_SUPPORT_ENABLED=0
```
Update remaining GPU conditionals to handle enum values
Changing the job header to `gpu != gpuConfigOptions.NONE.value` avoids requesting GPUs via SLURM, but other logic in this template still relies on `if gpu` (e.g. the module load line `. ./mfc.sh load -c f -m ${'g' if gpu else 'c'}`). Because `gpu` is now a non-empty string (`'no'`, `'acc'`, etc.), those expressions are always truthy and CPU builds will still load GPU modules and pass GPU runtime flags. This leaves CPU jobs misconfigured despite the header change. Please apply the same explicit comparison for the module selection here and in the other templates touched by this commit (bridges2.mako, carpenter*.mako, default.mako, hipergator.mako, summit.mako, etc.).
Actionable comments posted: 1
🧹 Nitpick comments (3)
toolchain/templates/carpenter.mako (1)

4-5: Unused import can be removed. The `gpuConfigOptions` import is not used in this template since it doesn't contain GPU-related conditional logic. Consider removing it to keep the imports clean. Apply this diff to remove the unused import:

```diff
-<%! from mfc.state import gpuConfigOptions %>
```

toolchain/templates/default.mako (1)

5-6: Unused import can be removed. The `gpuConfigOptions` import is not used in this template. While line 53 references the `gpu` variable, it doesn't use `gpuConfigOptions.NONE.value` for comparison. Consider removing the unused import. Apply this diff to remove the unused import:

```diff
-<%! from mfc.state import gpuConfigOptions %>
```

toolchain/templates/carpenter-cray.mako (1)

4-5: Unused import can be removed. The `gpuConfigOptions` import is not used in this template since it doesn't contain GPU-related conditional logic. Apply this diff to remove the unused import:

```diff
-<%! from mfc.state import gpuConfigOptions %>
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
- toolchain/templates/bridges2.mako (2 hunks)
- toolchain/templates/carpenter-cray.mako (1 hunks)
- toolchain/templates/carpenter.mako (1 hunks)
- toolchain/templates/default.mako (1 hunks)
- toolchain/templates/delta.mako (2 hunks)
- toolchain/templates/deltaai.mako (2 hunks)
- toolchain/templates/frontier.mako (4 hunks)
- toolchain/templates/hipergator.mako (2 hunks)
- toolchain/templates/nautilus.mako (2 hunks)
- toolchain/templates/oscar.mako (2 hunks)
- toolchain/templates/phoenix-bench.mako (2 hunks)
- toolchain/templates/phoenix.mako (2 hunks)
- toolchain/templates/santis.mako (2 hunks)
- toolchain/templates/summit.mako (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Cursor Bugbot
🔇 Additional comments (22)
toolchain/templates/nautilus.mako (2)

4-4: LGTM: Import enables explicit GPU configuration check. The import is correctly used at line 18 to gate GPU resource requests.

18-21: The code change is correct and verified. The template properly imports `gpuConfigOptions` at line 4, and the explicit comparison `gpu != gpuConfigOptions.NONE.value` correctly gates GPU resource requests. When `gpu` is `'no'` (the NONE value), the GPU directives are skipped; otherwise they're included. This pattern is consistent across all compute cluster templates in the codebase.

toolchain/templates/delta.mako (2)
4-4: LGTM: Import enables explicit GPU configuration check.
18-22: LGTM: Explicit NONE check prevents unintended GPU requests. The change correctly gates GPU-specific SBATCH directives (gpus-per-node, mem, gpu-bind) to only activate when a real GPU configuration is selected.
toolchain/templates/bridges2.mako (2)
4-4: LGTM: Import enables explicit GPU configuration check.
18-21: LGTM: Explicit NONE check prevents unintended GPU requests. The condition correctly prevents GPU resource allocation on CPU builds.
toolchain/templates/phoenix.mako (2)
4-4: LGTM: Import enables explicit GPU configuration check.
21-24: LGTM: Explicit NONE check prevents unintended GPU requests. The condition correctly gates GPU resource directives for the Phoenix cluster.

Note: Line 23 has a trailing backslash after `16G\` which may be unintentional but appears to be pre-existing.
4-4: LGTM: Import enables explicit GPU configuration check.
21-24: LGTM: Explicit NONE check prevents unintended GPU requests. The condition correctly gates GPU resource directives. Same trailing backslash note as in phoenix.mako.
toolchain/templates/deltaai.mako (2)
4-4: LGTM: Import added for GPU configuration. The import of `gpuConfigOptions` enables explicit GPU gating in this template.

18-22: LGTM: GPU resources correctly gated on non-NONE configuration. The explicit comparison against `gpuConfigOptions.NONE.value` ensures SBATCH GPU directives are only emitted when GPU is actually configured, preventing CPU builds from incorrectly requesting GPU resources.

toolchain/templates/oscar.mako (2)
4-4: LGTM: Import added for GPU configuration. The import enables explicit GPU gating in this template.

18-22: LGTM: GPU resources correctly gated. The explicit comparison ensures GPU directives are only emitted for non-NONE GPU configurations, preventing unintended GPU resource requests.
toolchain/templates/santis.mako (2)
4-4: LGTM: Import added for GPU configuration.
78-80: LGTM: GPU flag correctly gated in srun command. The explicit comparison ensures `--gpus-per-task` is only passed to srun when GPU is actually configured, preventing CPU builds from requesting GPU resources.
4-4: LGTM: Import added for GPU configuration.
14-17: LGTM: GPU resources correctly gated in batch mode. The explicit comparison ensures SBATCH GPU directives are only emitted for non-NONE GPU configurations.

43-47: LGTM: MPICH_GPU_SUPPORT_ENABLED explicitly controlled. Setting `MPICH_GPU_SUPPORT_ENABLED` based on GPU configuration is good practice and ensures MPI behavior aligns with resource allocation.
70-72: LGTM: GPU flags correctly gated in interactive mode.toolchain/templates/hipergator.mako (2)
4-4: LGTM: Import added for GPU configuration.
13-16: LGTM: GPU resources correctly gated. The explicit comparison against `gpuConfigOptions.NONE.value` ensures GPU directives are only emitted when GPU is configured, fixing the issue where CPU builds incorrectly requested GPU resources.
Pull Request Overview
This PR fixes a bug where CPU builds incorrectly requested GPU resources on SLURM/PBS batch systems. The issue arose after OpenMP support changed the state variable's representation - the gpu variable now holds string values ('no', 'acc', or 'mp') instead of boolean values. Since any non-empty string is truthy in Python/Mako templates, the previous % if gpu: conditionals were evaluating to True even for CPU builds (gpu = 'no'), causing SLURM/PBS to request GPU resources unnecessarily.
The fix updates all batch scheduler templates to properly check gpu != gpuConfigOptions.NONE.value instead of relying on truthiness, ensuring GPU resources are only requested when actually needed.
- Updated conditional checks from `% if gpu:` to `% if gpu != gpuConfigOptions.NONE.value:` across all affected templates
- Added `from mfc.state import gpuConfigOptions` import to access the enum value
- Applied changes consistently across 13 batch scheduler template files
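The audit the reviewers performed (finding conditionals that still rely on bare truthiness) can be sketched as a small scan. The template text below is an inline stand-in with one stale check, not the real file:

```python
import re

# Inline stand-in for a .mako template still containing one stale check.
template_text = """\
% if gpu != gpuConfigOptions.NONE.value:
#SBATCH --gpus-per-node=${tasks_per_node}
% endif
. ./mfc.sh load -c f -m ${'g' if gpu else 'c'}
"""

# Patterns that rely on bare truthiness of the `gpu` string.
stale = re.findall(r"%\s*if gpu:|'g' if gpu else", template_text)
print(stale)  # the module-load line is flagged; the fixed SBATCH check is not
```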
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| toolchain/templates/summit.mako | Added gpuConfigOptions import to support future GPU-related conditionals (no direct GPU resource requests in this template) |
| toolchain/templates/santis.mako | Fixed conditional to prevent GPU resource requests on CPU builds |
| toolchain/templates/phoenix.mako | Fixed conditional to prevent GPU resource requests on CPU builds |
| toolchain/templates/phoenix-bench.mako | Fixed conditional to prevent GPU resource requests on CPU builds |
| toolchain/templates/oscar.mako | Fixed conditional to prevent GPU resource requests on CPU builds |
| toolchain/templates/nautilus.mako | Fixed conditional to prevent GPU resource requests on CPU builds |
| toolchain/templates/hipergator.mako | Fixed conditional to prevent GPU resource requests on CPU builds |
| toolchain/templates/frontier.mako | Fixed conditionals for both SBATCH directives and runtime environment variables to prevent GPU requests on CPU builds |
| toolchain/templates/deltaai.mako | Fixed conditional to prevent GPU resource requests on CPU builds |
| toolchain/templates/delta.mako | Fixed conditional to prevent GPU resource requests on CPU builds |
| toolchain/templates/default.mako | Added gpuConfigOptions import to support future GPU-related conditionals |
| toolchain/templates/carpenter.mako | Added gpuConfigOptions import to support future GPU-related conditionals |
| toolchain/templates/carpenter-cray.mako | Added gpuConfigOptions import to support future GPU-related conditionals |
| toolchain/templates/bridges2.mako | Fixed conditional to prevent GPU resource requests on CPU builds |
```mako
<%namespace name="helpers" file="helpers.mako"/>
<%! from mfc.state import gpuConfigOptions %>


```

Copilot AI, Nov 19, 2025
[nitpick] Inconsistent blank line spacing after the import statement. This file has two blank lines after <%! from mfc.state import gpuConfigOptions %> (lines 5-6), while other template files like phoenix.mako, frontier.mako, etc. have only one blank line. For consistency, remove one of the blank lines.
```mako
<%namespace name="helpers" file="helpers.mako"/>
<%! from mfc.state import gpuConfigOptions %>


```

Copilot AI, Nov 19, 2025
[nitpick] Inconsistent blank line spacing after the import statement. This file has two blank lines after <%! from mfc.state import gpuConfigOptions %> (lines 5-6), while other template files like phoenix.mako, frontier.mako, etc. have only one blank line. For consistency, remove one of the blank lines.
toolchain/templates/default.mako (outdated)

```mako
%>
<%! from mfc.state import gpuConfigOptions %>
```

Copilot AI, Nov 19, 2025
[nitpick] The import statement is placed in a separate `<%! ... %>` block from the existing imports (lines 1-4). For consistency and cleaner code organization, consider consolidating all module-level imports into a single `<%! ... %>` block:

```mako
<%!
import os
from mako.exceptions import RuntimeException
from mfc.state import gpuConfigOptions
%>
```

Suggested change:

```diff
-%>
-<%! from mfc.state import gpuConfigOptions %>
+from mfc.state import gpuConfigOptions
+%>
```
Actually, I realized I missed a few things I need to update.
2 issues found across 14 files
Prompt for AI agents (all 2 issues)
Understand the root cause of the following 2 issues and fix them.
<file name="toolchain/templates/delta.mako">
<violation number="1" location="toolchain/templates/delta.mako:18">
This condition still allocates GPUs for OpenMP-only builds; it should only request GPUs when the configuration is explicitly the accelerator option.</violation>
</file>
<file name="toolchain/templates/oscar.mako">
<violation number="1" location="toolchain/templates/oscar.mako:18">
This condition still requests GPUs for OpenMP CPU builds (gpu == gpuConfigOptions.MP.value); it should only request GPUs for the ACC (actual GPU) mode to avoid consuming GPU resources on CPU jobs.</violation>
</file>
toolchain/templates/delta.mako

```diff
 #SBATCH --account="${account}"
 % endif
-% if gpu:
+% if gpu != gpuConfigOptions.NONE.value:
```
This condition still allocates GPUs for OpenMP-only builds; it should only request GPUs when the configuration is explicitly the accelerator option.
Prompt for AI agents
Address the following comment on toolchain/templates/delta.mako at line 18:
<comment>This condition still allocates GPUs for OpenMP-only builds; it should only request GPUs when the configuration is explicitly the accelerator option.</comment>
<file context>
@@ -14,7 +15,7 @@
#SBATCH --account="${account}"
% endif
-% if gpu:
+% if gpu != gpuConfigOptions.NONE.value:
#SBATCH --gpus-per-node=${tasks_per_node}
#SBATCH --mem=208G
</file context>
```diff
-% if gpu != gpuConfigOptions.NONE.value:
+% if gpu == gpuConfigOptions.ACC.value:
```
toolchain/templates/oscar.mako (outdated)

```diff
 #SBATCH --account="${account}"
 % endif
-% if gpu:
+% if gpu != gpuConfigOptions.NONE.value:
```
This condition still requests GPUs for OpenMP CPU builds (gpu == gpuConfigOptions.MP.value); it should only request GPUs for the ACC (actual GPU) mode to avoid consuming GPU resources on CPU jobs.
Prompt for AI agents
Address the following comment on toolchain/templates/oscar.mako at line 18:
<comment>This condition still requests GPUs for OpenMP CPU builds (gpu == gpuConfigOptions.MP.value); it should only request GPUs for the ACC (actual GPU) mode to avoid consuming GPU resources on CPU jobs.</comment>
<file context>
@@ -14,7 +15,7 @@
#SBATCH --account="${account}"
% endif
-% if gpu:
+% if gpu != gpuConfigOptions.NONE.value:
#SBATCH --gpus-per-node=${tasks_per_node}
#SBATCH --mem=64G
</file context>
```diff
-% if gpu != gpuConfigOptions.NONE.value:
+% if gpu == gpuConfigOptions.ACC.value:
```
Actionable comments posted: 0
♻️ Duplicate comments (1)
toolchain/templates/oscar.mako (1)

4-4: GPU gating now centralized via `is_gpu_active`, but please re-confirm its semantics for CPU/OpenMP builds.

Using `is_gpu_active` here to guard `--gpus-per-node`, `--mem`, `--gpu-bind`, and the `-m g`/`c` switch in `mfc.sh` is consistent and keeps the site template itself simple. However, because `is_gpu_active` is currently defined as `gpu != gpuConfigOptions.NONE.value` in helpers.mako, this still treats every non-NONE GPU mode as "GPU active".

If your `gpu` state includes non-GPU CPU-only modes (e.g., an OpenMP/MP option distinct from the real GPU/ACC option), this condition will continue to request GPUs for those CPU-only builds, the same behavior this PR is trying to eliminate. In that case, `is_gpu_active` should instead be tied explicitly to the actual GPU mode(s) (e.g., `gpu == gpuConfigOptions.ACC.value` or a small set of GPU modes), or delegated to a helper in `mfc.state` so the mapping lives in one place.

Please double-check the `gpu` enum and `is_gpu_active` implementation against your intended state model and confirm that CPU/OpenMP configurations no longer trigger these GPU directives on Oscar.

Also applies to: 18-22, 36-36
🧹 Nitpick comments (1)
toolchain/templates/include/helpers.mako (1)
3-3: Clarify and possibly tightenis_gpu_activeso it truly means “GPU hardware in use”Centralizing GPU gating as:
<% is_gpu_active = (gpu != gpuConfigOptions.NONE.value) %>is a good step, but a few details are worth double‑checking:
Semantic scope: If
gpuis an enum with more than two states (e.g.,NONE,MPfor OpenMP‑only CPU builds,ACCfor actual GPU builds), thengpu != gpuConfigOptions.NONE.valuewill be true for all non‑NONEmodes, including CPU‑only ones. That would meanis_gpu_activeis still true for OpenMP CPU builds, and all the site templates that rely on it would continue to request GPUs for those jobs. If that’s not intended, defineis_gpu_activein terms of the specific GPU mode(s), e.g., equality against the GPU enum member(s) or via a small Python helper inmfc.state(so the mapping between state and “uses GPUs” lives in one place).Context dependencies: This expression assumes both
gpuandgpuConfigOptionsare always present and well‑typed in the template context for every engine/site that pulls inhelpers.mako. Please verify that all render paths set these consistently; otherwise this comparison could behave unexpectedly or fail at render time.Given that every template in this PR now keys its SBATCH GPU options and module mode selection off
is_gpu_active, it’s worth nailing down this definition so it exactly matches your conceptual “GPU enabled” state.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- toolchain/templates/bridges2.mako (3 hunks)
- toolchain/templates/carpenter-cray.mako (2 hunks)
- toolchain/templates/carpenter.mako (2 hunks)
- toolchain/templates/default.mako (2 hunks)
- toolchain/templates/delta.mako (3 hunks)
- toolchain/templates/deltaai.mako (3 hunks)
- toolchain/templates/frontier.mako (4 hunks)
- toolchain/templates/hipergator.mako (3 hunks)
- toolchain/templates/include/helpers.mako (1 hunks)
- toolchain/templates/nautilus.mako (3 hunks)
- toolchain/templates/oscar.mako (3 hunks)
- toolchain/templates/phoenix-bench.mako (3 hunks)
- toolchain/templates/phoenix.mako (3 hunks)
- toolchain/templates/santis.mako (3 hunks)
- toolchain/templates/summit.mako (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- toolchain/templates/carpenter.mako
- toolchain/templates/bridges2.mako
- toolchain/templates/summit.mako
- toolchain/templates/phoenix.mako
- toolchain/templates/frontier.mako
- toolchain/templates/carpenter-cray.mako
- toolchain/templates/delta.mako
- toolchain/templates/deltaai.mako
- toolchain/templates/santis.mako
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Build & Publish
🔇 Additional comments (6)
toolchain/templates/default.mako (1)
5-5: `jsrun` GPU count wiring via `is_gpu_active` looks correct; just ensure the helper matches your GPU enum.

Switching `--gpu_per_rs` to `${1 if is_gpu_active else 0}` is the right pattern: the launcher now cleanly depends on a single boolean that encodes "this run actually wants GPUs".

Given that `is_gpu_active` is defined in helpers.mako as `gpu != gpuConfigOptions.NONE.value`, please confirm that every non-NONE value of `gpu` truly corresponds to a GPU-using configuration. If there are CPU-only modes encoded in `gpu` (e.g., OpenMP), you may still end up with `--gpu_per_rs 1` for CPU jobs unless `is_gpu_active` is tightened to check only the real GPU mode(s).

Also applies to: 53-53
toolchain/templates/nautilus.mako (1)
4-4: Nautilus GPU directives correctly gated on `is_gpu_active`; behavior depends on helper definition.

On Nautilus, the SBATCH GPU options (`--gpu-bind`, `--gres=gpu:...`) and the `mfc.sh load -m g`/`c` switch are now consistently guarded by `is_gpu_active`, which is exactly what you want at the template level.

Just as with the other templates, please verify that `is_gpu_active` in helpers.mako accurately captures "this job actually uses GPUs" for all combinations of your build options. If `gpu` encodes CPU-only modes as non-NONE, consider tightening `is_gpu_active` to match only the actual GPU mode(s) to avoid continuing to request V100 resources for CPU/OpenMP cases on Nautilus.

Also applies to: 18-21, 35-35
toolchain/templates/phoenix-bench.mako (1)
4-4: Phoenix-bench GPU resource flags are cleanly gated on `is_gpu_active`.

Here too, the SBATCH GPU options (`--gres=gpu:V100:...`, `--mem-per-gpu`) and the `mfc.sh load -m g`/`c` mode now consistently depend on `is_gpu_active`, which is the right abstraction for the template.

As with the other templates, please confirm that `is_gpu_active` (currently `gpu != gpuConfigOptions.NONE.value` in helpers.mako) is false for all CPU-only Phoenix bench configurations, including any OpenMP-only modes, so these jobs don't continue to consume V100 GPUs unintentionally.

Also applies to: 21-24, 35-35
toolchain/templates/hipergator.mako (3)
13-16: GPU directive gating looks correct. The change from `gpu` to `is_gpu_active` properly gates the GPU-specific SBATCH directives, which should resolve the issue where CPU builds incorrectly requested GPU resources. Ensure that `is_gpu_active` correctly evaluates to `False` for CPU builds. The verification script for line 4 will confirm that `is_gpu_active` is properly defined in helpers.mako.

39-39: LGTM! Consistent with GPU directive gating. The change from `gpu` to `is_gpu_active` in the module load command is consistent with the conditional on line 13, ensuring CPU builds correctly load CPU mode (`'c'`) instead of GPU mode (`'g'`).
4-4: Remove the unused import or verify if it's necessary for context passing.

The import `<%! from mfc.state import gpuConfigOptions %>` on line 4 does not appear to be directly used in this file; only `is_gpu_active` is referenced (lines 13, 39). Code within `<%! %>` tags is executed at the module level of the template, not within the rendering function, and each template module maintains its own module-level imports independently. Since `is_gpu_active` is defined in helpers.mako using `gpuConfigOptions`, verify whether:

- helpers.mako also imports `gpuConfigOptions` independently, or
- the import here is needed to pass `gpuConfigOptions` through the context to helpers.mako

If the import is truly unused after verification, it should be removed as dead code.
Actionable comments posted: 1
♻️ Duplicate comments (1)
toolchain/templates/summit.mako (1)
32-35: Inconsistent GPU conditionals remain in jsrun flags. While line 20 correctly uses `is_gpu_active` for module loading, lines 32 and 35 still use truthiness checks (`if gpu`) for the jsrun GPU flags. This is inconsistent with the PR objective and the pattern used in other templates. The `gpu` variable can hold a truthy string even for CPU-only builds (e.g., `'no'`), which would incorrectly trigger the GPU flags.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- toolchain/templates/bridges2.mako (2 hunks)
- toolchain/templates/carpenter-cray.mako (1 hunks)
- toolchain/templates/carpenter.mako (1 hunks)
- toolchain/templates/default.mako (2 hunks)
- toolchain/templates/delta.mako (2 hunks)
- toolchain/templates/deltaai.mako (2 hunks)
- toolchain/templates/frontier.mako (3 hunks)
- toolchain/templates/hipergator.mako (2 hunks)
- toolchain/templates/include/helpers.mako (1 hunks)
- toolchain/templates/nautilus.mako (2 hunks)
- toolchain/templates/oscar.mako (2 hunks)
- toolchain/templates/phoenix-bench.mako (2 hunks)
- toolchain/templates/phoenix.mako (2 hunks)
- toolchain/templates/santis.mako (2 hunks)
- toolchain/templates/summit.mako (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- toolchain/templates/include/helpers.mako
- toolchain/templates/carpenter.mako
- toolchain/templates/santis.mako
- toolchain/templates/delta.mako
- toolchain/templates/oscar.mako
- toolchain/templates/nautilus.mako
- toolchain/templates/phoenix-bench.mako
- toolchain/templates/carpenter-cray.mako
- toolchain/templates/default.mako
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Build & Publish
🔇 Additional comments (13)
toolchain/templates/frontier.mako (5)
13-16: LGTM: SBATCH GPU directives now properly gated. The GPU resource requests are now correctly conditional on `is_gpu_active`, preventing CPU builds from requesting GPU resources via SLURM.

37-37: Past review issue resolved: Module loading now uses `is_gpu_active`. The module loading command now correctly uses `is_gpu_active` instead of the boolean `gpu` check, ensuring CPU builds load CPU modules. This addresses the issues flagged in previous reviews.

42-46: LGTM: MPICH GPU support flag correctly configured. The `MPICH_GPU_SUPPORT_ENABLED` environment variable is now properly set based on `is_gpu_active`, ensuring it matches the actual GPU configuration.

69-71: LGTM: Interactive srun GPU flags properly gated. The GPU flags for interactive execution are correctly conditional on `is_gpu_active`, maintaining consistency with the batch execution path.

3-3: Verification confirmed: `is_gpu_active` is correctly defined in helpers.mako. The definition in toolchain/templates/include/helpers.mako (line 5) correctly sets `is_gpu_active = (gpu != gpuConfigOptions.NONE.value)`, matching the intended logic across all template changes.

toolchain/templates/summit.mako (1)
20-20: Module load update looks correct. The change from `if gpu` to `if is_gpu_active` correctly aligns with the PR objective to prevent CPU builds from incorrectly requesting GPU resources.

toolchain/templates/bridges2.mako (2)

17-20: GPU SBATCH directives correctly updated. The condition change from `if gpu` to `if is_gpu_active` correctly ensures GPU resources are only requested when GPU is explicitly configured, addressing the PR's objective.

34-34: Module load update looks correct. The change to use `is_gpu_active` for module selection aligns with the updated SBATCH directives and PR objective.

toolchain/templates/deltaai.mako (2)
17-21: GPU SBATCH directives correctly updated. The condition change from `if gpu` to `if is_gpu_active` properly gates all GPU-related SBATCH options, preventing CPU builds from incorrectly requesting GPU resources.

35-35: Module load update looks correct. The change to use `is_gpu_active` for mode selection (`'g'` vs `'c'`) aligns with the updated SBATCH directives.

toolchain/templates/hipergator.mako (2)
12-15: GPU SBATCH directives correctly updated. The condition change from `if gpu` to `if is_gpu_active` properly gates GPU resource requests, addressing the PR's objective.

38-38: Module load update looks correct. The change to use `is_gpu_active` for module selection aligns with the updated SBATCH directives.

toolchain/templates/phoenix.mako (1)
34-34: Module load update looks correct. The change to use `is_gpu_active` for module selection aligns with the updated SBATCH directives.
| % if is_gpu_active: | ||
| #SBATCH --gres=gpu:V100:${tasks_per_node} | ||
| #SBATCH --mem-per-gpu=16G\ | ||
| % endif |
GPU SBATCH directives updated, but trailing backslash needs removal.
The condition change from `if gpu` to `if is_gpu_active` is correct. However, line 22 has a stray trailing backslash (`16G\`) that serves no continuation purpose and could interfere with SLURM's parsing of the directive.
Apply this diff to remove the trailing backslash:
-#SBATCH --mem-per-gpu=16G\
+#SBATCH --mem-per-gpu=16G

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| % if is_gpu_active: | |
| #SBATCH --gres=gpu:V100:${tasks_per_node} | |
| #SBATCH --mem-per-gpu=16G\ | |
| % endif | |
| % if is_gpu_active: | |
| #SBATCH --gres=gpu:V100:${tasks_per_node} | |
| #SBATCH --mem-per-gpu=16G | |
| % endif |
🤖 Prompt for AI Agents
In toolchain/templates/phoenix.mako around lines 20 to 23, the GPU SBATCH block
contains a trailing backslash on the --mem-per-gpu directive ("16G\") which
should be removed; edit that line to remove the backslash so it reads a normal
SLURM directive (e.g., "--mem-per-gpu=16G") and ensure no unintended line
continuation characters remain in the SBATCH directives.
Pull Request Feedback 🔍
| #SBATCH --account="${account}" | ||
| % endif | ||
| % if gpu: | ||
| % if is_gpu_active: |
Suggestion: Use a guarded lookup such as globals().get('is_gpu_active', False) for the GPU block so that rendering still works even if only the old gpu flag is provided. [possible bug]
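The guarded lookup amounts to treating the render context as a dictionary with a default. A minimal sketch of the idea in plain Python (not Mako's actual `Context` API; the SBATCH string here is illustrative):

```python
def render_gpu_block(context: dict) -> str:
    # Fall back to False when the template is rendered from an older call
    # site that supplies only the legacy `gpu` flag and no `is_gpu_active`.
    if context.get("is_gpu_active", False):
        return "#SBATCH --gres=gpu:V100:4\n"
    return ""

assert render_gpu_block({"is_gpu_active": True}) == "#SBATCH --gres=gpu:V100:4\n"
assert render_gpu_block({"gpu": "no"}) == ""  # missing key -> no GPU request, no crash
```

The trade-off is that a silently defaulted `False` can mask a genuine wiring bug, which is presumably why the templates instead centralize the definition in helpers.mako.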
| #SBATCH --account="${account}" | ||
| % endif | ||
| % if gpu: | ||
| % if is_gpu_active: |
Suggestion: Guard the SBATCH GPU directives with a context lookup so missing is_gpu_active values don't crash the template. [possible bug]
CodeAnt AI finished reviewing your PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 issues found across 15 files (reviewed changes from recent commits).
Prompt for AI agents (all 3 issues)
Understand the root cause of the following 3 issues and fix them.
<file name="toolchain/templates/delta.mako">
<violation number="1" location="toolchain/templates/delta.mako:35">
`gpuConfigOptions` is referenced here but the template no longer imports it, so rendering the delta SLURM template now fails with a NameError instead of launching jobs. Reintroduce the state import before using the enum value.</violation>
</file>
<file name="toolchain/templates/include/helpers.mako">
<violation number="1" location="toolchain/templates/include/helpers.mako:2">
The newly added `from mfc.state import gpuConfigOptions` line is indented even though it is top-level Python inside `<%! ... %>`, so the template module fails to compile with `IndentationError`, breaking every template that includes helpers.mako.</violation>
</file>
<file name="toolchain/templates/phoenix.mako">
<violation number="1" location="toolchain/templates/phoenix.mako:20">
`is_gpu_active` is referenced here, but the phoenix template never defines or imports that name (it only exists inside the helpers namespace), so rendering this template will raise `NameError` before the batch script is produced.</violation>
</file>
Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
| ok ":) Loading modules:\n" | ||
| cd "${MFC_ROOT_DIR}" | ||
| . ./mfc.sh load -c d -m ${'g' if gpu else 'c'} | ||
| . ./mfc.sh load -c d -m ${'g' if gpu != gpuConfigOptions.NONE.value else 'c'} |
gpuConfigOptions is referenced here but the template no longer imports it, so rendering the delta SLURM template now fails with a NameError instead of launching jobs. Reintroduce the state import before using the enum value.
Prompt for AI agents
Address the following comment on toolchain/templates/delta.mako at line 35:
<comment>`gpuConfigOptions` is referenced here but the template no longer imports it, so rendering the delta SLURM template now fails with a NameError instead of launching jobs. Reintroduce the state import before using the enum value.</comment>
<file context>
@@ -33,7 +32,7 @@ ${helpers.template_prologue()}
ok ":) Loading modules:\n"
cd "${MFC_ROOT_DIR}"
-. ./mfc.sh load -c d -m ${'g' if gpu else 'c'}
+. ./mfc.sh load -c d -m ${'g' if gpu != gpuConfigOptions.NONE.value else 'c'}
cd - > /dev/null
echo
</file context>
✅ Addressed in 3503f95
| @@ -1,4 +1,8 @@ | |||
| <%! import os %> | |||
| <%! import os | |||
| from mfc.state import gpuConfigOptions | |||
The newly added from mfc.state import gpuConfigOptions line is indented even though it is top-level Python inside <%! ... %>, so the template module fails to compile with IndentationError, breaking every template that includes helpers.mako.
Prompt for AI agents
Address the following comment on toolchain/templates/include/helpers.mako at line 2:
<comment>The newly added `from mfc.state import gpuConfigOptions` line is indented even though it is top-level Python inside `<%! ... %>`, so the template module fails to compile with `IndentationError`, breaking every template that includes helpers.mako.</comment>
<file context>
@@ -1,4 +1,8 @@
-<%! import os %>
+<%! import os
+ from mfc.state import gpuConfigOptions
+%>
+
</file context>
| from mfc.state import gpuConfigOptions | |
| from mfc.state import gpuConfigOptions |
✅ Addressed in 214ee17
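Mako compiles the body of a `<%! ... %>` block as module-level Python, so a stray indent on the second line fails exactly as it would in a plain module. A quick way to see the failure mode with the standard `compile` builtin (which parses but does not execute the imports, so `mfc.state` need not be installed):

```python
# Module-level source roughly as Mako would assemble it from <%! ... %>:
broken = "import os\n    from mfc.state import gpuConfigOptions\n"  # stray indent
fixed = "import os\nfrom mfc.state import gpuConfigOptions\n"

def compiles(src: str) -> bool:
    try:
        compile(src, "<helpers.mako module block>", "exec")
        return True
    except IndentationError:  # "unexpected indent" on line 2
        return False

assert compiles(broken) is False  # indented top-level import is rejected
assert compiles(fixed) is True    # dedented version compiles cleanly
```

This is why the fix is simply to left-align every line inside the `<%! ... %>` block.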
| #SBATCH --qos=${quality_of_service} | ||
| % endif | ||
| % if gpu: | ||
| % if is_gpu_active: |
is_gpu_active is referenced here, but the phoenix template never defines or imports that name (it only exists inside the helpers namespace), so rendering this template will raise NameError before the batch script is produced.
Prompt for AI agents
Address the following comment on toolchain/templates/phoenix.mako at line 20:
<comment>`is_gpu_active` is referenced here, but the phoenix template never defines or imports that name (it only exists inside the helpers namespace), so rendering this template will raise `NameError` before the batch script is produced.</comment>
<file context>
@@ -18,7 +17,7 @@
#SBATCH --qos=${quality_of_service}
% endif
-% if gpu != gpuConfigOptions.NONE.value:
+% if is_gpu_active:
#SBATCH --gres=gpu:V100:${tasks_per_node}
#SBATCH --mem-per-gpu=16G\
</file context>
1 issue found across 15 files (reviewed changes from recent commits).
Prompt for AI agents (all 1 issues)
Understand the root cause of the following 1 issues and fix them.
<file name="toolchain/templates/delta.mako">
<violation number="1" location="toolchain/templates/delta.mako:35">
`gpuConfigOptions` is used in the updated module-load command even though this template no longer imports it, so Delta submissions will crash with a NameError before running; reintroduce the import (or qualify the reference) so the CPU/GPU selection works.</violation>
</file>
Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
| ok ":) Loading modules:\n" | ||
| cd "${MFC_ROOT_DIR}" | ||
| . ./mfc.sh load -c d -m ${'g' if gpu else 'c'} | ||
| . ./mfc.sh load -c d -m ${'g' if gpu != gpuConfigOptions.NONE.value else 'c'} |
gpuConfigOptions is used in the updated module-load command even though this template no longer imports it, so Delta submissions will crash with a NameError before running; reintroduce the import (or qualify the reference) so the CPU/GPU selection works.
Prompt for AI agents
Address the following comment on toolchain/templates/delta.mako at line 35:
<comment>`gpuConfigOptions` is used in the updated module-load command even though this template no longer imports it, so Delta submissions will crash with a NameError before running; reintroduce the import (or qualify the reference) so the CPU/GPU selection works.</comment>
<file context>
@@ -33,7 +32,7 @@ ${helpers.template_prologue()}
ok ":) Loading modules:\n"
cd "${MFC_ROOT_DIR}"
-. ./mfc.sh load -c d -m ${'g' if gpu else 'c'}
+. ./mfc.sh load -c d -m ${'g' if gpu != gpuConfigOptions.NONE.value else 'c'}
cd - > /dev/null
echo
</file context>
/improve
| % if gpu != gpuConfigOptions.NONE.value: | ||
| #SBATCH --gpus-per-node=${tasks_per_node} | ||
| #SBATCH --mem=208G | ||
| #SBATCH --gpu-bind=closest |
Suggestion: To ensure consistency and prevent potential NameError exceptions, replace the direct GPU check in delta.mako with the centralized is_gpu_active variable. [possible issue, importance: 9]
New proposed code:
-% if gpu != gpuConfigOptions.NONE.value:
+% if is_gpu_active:
#SBATCH --gpus-per-node=${tasks_per_node}
#SBATCH --mem=208G
#SBATCH --gpu-bind=closest
% endif

| ok ":) Loading modules:\n" | ||
% endif| ok ":) Loading modules:\n" | ||
| cd "${MFC_ROOT_DIR}" | ||
| . ./mfc.sh load -c d -m ${'g' if gpu else 'c'} | ||
| . ./mfc.sh load -c d -m ${'g' if gpu != gpuConfigOptions.NONE.value else 'c'} |
Suggestion: To maintain consistency and prevent potential errors, replace the inline GPU check with the is_gpu_active variable for module loading. [possible issue, importance: 9]
| . ./mfc.sh load -c d -m ${'g' if gpu != gpuConfigOptions.NONE.value else 'c'} | |
| . ./mfc.sh load -c d -m ${'g' if is_gpu_active else 'c'} |
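The `${...}` expression reduces to a simple conditional at render time. A sketch of the substitution, with plain string formatting standing in for Mako:

```python
def module_load_line(is_gpu_active: bool) -> str:
    # Mirrors `. ./mfc.sh load -c d -m ${'g' if is_gpu_active else 'c'}`:
    # 'g' selects the GPU module set, 'c' the CPU one.
    mode = "g" if is_gpu_active else "c"
    return f". ./mfc.sh load -c d -m {mode}"

assert module_load_line(True) == ". ./mfc.sh load -c d -m g"
assert module_load_line(False) == ". ./mfc.sh load -c d -m c"
```

With `is_gpu_active` a real boolean, the rendered shell line can no longer pick `'g'` for a CPU-only configuration.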
| ${helpers.template_prologue()} | ||
| ok ":) Loading modules:\n" | ||
| cd "${MFC_ROOTDIR}" |
Suggestion: Correct the typo in the environment variable from MFC_ROOTDIR to MFC_ROOT_DIR to prevent a script failure. [possible issue, importance: 10]
| cd "${MFC_ROOTDIR}" | |
| cd "${MFC_ROOT_DIR}" |
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@ Coverage Diff @@
## master #1055 +/- ##
=======================================
Coverage 44.34% 44.34%
=======================================
Files 71 71
Lines 20578 20578
Branches 1990 1990
=======================================
Hits 9125 9125
Misses 10308 10308
Partials 1145 1145

☔ View full report in Codecov by Sentry.

🚀 New features to boost your workflow:
hmmm it's failing a bunch of tests

hi @prathi-wind can you explain what's going on with this PR?
User description
Description
Adding OpenMP support changed the state variable. This caused CPU builds to request GPU resources on SLURM since the template was not updated. I updated the templates to no longer request GPU resources when not needed.
Fixes #(issue) [optional]
Type of change
Please delete options that are not relevant.
Scope
If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration
Test Configuration:
I tested on Phoenix by printing out what the template expanded out to in the different build types.
Checklist
- I have updated the documentation (in `docs/`)
- I have added examples (in `examples/`) that demonstrate my new feature performing as expected. They run to completion and demonstrate "interesting physics"
- I ran `./mfc.sh format` before committing my code

If your code changes any code source files (anything in `src/simulation`):

To make sure the code is performing as expected on GPU devices, I have:
- Checked for missing `nvtx` ranges so that they can be identified in profiles
- Run `./mfc.sh run XXXX --gpu -t simulation --nsys`, and have attached the output file (`.nsys-rep`) and plain text results to this PR
- Run `./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace`, and have attached the output file and plain text results to this PR

PR Type
Bug fix
Description
- Replace boolean `gpu` checks with a `gpuConfigOptions.NONE.value` comparison
- Import `gpuConfigOptions` from `mfc.state` in all SLURM templates
- Fix CPU builds incorrectly requesting GPU resources on SLURM
- Ensure GPU directives only apply when GPU is explicitly configured
Diagram Walkthrough
File Walkthrough
14 files
- Import gpuConfigOptions and fix GPU check
- Import gpuConfigOptions and add blank line
- Import gpuConfigOptions and add blank line
- Import gpuConfigOptions with blank line
- Import gpuConfigOptions and fix GPU check
- Import gpuConfigOptions and fix GPU check
- Import gpuConfigOptions and fix multiple GPU checks
- Import gpuConfigOptions and fix GPU check
- Import gpuConfigOptions and fix GPU check
- Import gpuConfigOptions and fix GPU check
- Import gpuConfigOptions and fix GPU check
- Import gpuConfigOptions and fix GPU check
- Import gpuConfigOptions and fix GPU check
- Import gpuConfigOptions module
New Features
Bug Fixes
Documentation
CodeAnt-AI Description
Stop requesting GPU resources when running CPU-only templates
What Changed
Impact
✅ No GPU requests during CPU-only builds
✅ Fewer GPU-related submission errors on SLURM
✅ More accurate resource allocations for CPU jobs

💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.