Change unknown job device #978
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| Metric | master | #978 | +/- |
|---|---|---|---|
| Coverage | 43.23% | 43.23% | |
| Files | 70 | 70 | |
| Lines | 20109 | 20109 | |
| Branches | 2516 | 2513 | -3 |
| Hits | 8695 | 8695 | |
| Misses | 9877 | 9877 | |
| Partials | 1537 | 1537 | |
Pull Request Overview
This PR simplifies the SLURM job submission scripts for the Phoenix cluster by removing complex job polling logic and making device detection more consistent. The changes aim to ensure Phoenix properly recognizes GPU-enabled environments for MFC testing and benchmarking.
Key changes:
- Simplified SLURM job submission by removing job polling and exit code handling
- Standardized device variable naming from `device` to `job_device`
- Cleaned up temporary directory management in test scripts
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `.github/workflows/phoenix/test.sh` | Removed temporary directory setup and cleanup, simplified test execution |
| `.github/workflows/phoenix/submit.sh` | Simplified SLURM submission by removing job polling logic and using synchronous execution |
| `.github/workflows/phoenix/submit-bench.sh` | Applied same simplification as `submit.sh` for benchmarking jobs |
| `.github/workflows/phoenix/bench.sh` | Updated device variable reference from `device` to `job_device` |
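
For illustration, a script that receives the device through `job_device` might derive its build options as sketched below. This is not taken from the PR diff: the device values `gpu` and `cpu`, the `--gpu` option, and the helper name `select_build_opts` are assumptions made for this sketch.

```shell
# Hypothetical helper: map a device name to build options.
# The "gpu"/"cpu" values and the --gpu option are assumptions for this sketch.
select_build_opts() {
    case "$1" in
        gpu) echo "--gpu" ;;
        cpu) echo "" ;;
        *)   echo "unknown device: $1" >&2; return 1 ;;
    esac
}

# Fall back to cpu when job_device is unset or empty.
job_device="${job_device:-cpu}"
build_opts=$(select_build_opts "$job_device")
echo "device=$job_device build_opts=$build_opts"
```

Rejecting unrecognized values explicitly, rather than silently defaulting, makes an unknown device visible in the job log.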
./mfc.sh test --dry-run -j 8 $build_opts

n_test_threads=8
The variable n_test_threads is defined after it's used on line 8. This creates a logical inconsistency where the hardcoded value 8 is used before the variable is defined with the same value.
Suggested change: define the variable first, then use it in the dry run:
n_test_threads=8
./mfc.sh test --dry-run -j $n_test_threads $build_opts
# read the body of the user script
sbatch_body=$(<"$sbatch_script")
if [ ! -z "$1" ]; then
[nitpick] Use `-n` instead of `! -z` to check for non-empty strings. The condition should be `if [ -n "$1" ]; then`, which is more readable and follows shell scripting best practices.
Suggested change: replace `if [ ! -z "$1" ]; then` with `if [ -n "$1" ]; then`.
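
The equivalence behind this nitpick can be checked directly. The sketch below runs both forms against an empty and a non-empty string; the helper names `has_arg_n` and `has_arg_not_z` are hypothetical and exist only for the comparison.

```shell
# Both helpers succeed only when their first argument is a non-empty string.
has_arg_n()     { [ -n "$1" ]; }
has_arg_not_z() { [ ! -z "$1" ]; }

for value in "" "gpu"; do
    if has_arg_n "$value"; then n=yes; else n=no; fi
    if has_arg_not_z "$value"; then z=yes; else z=no; fi
    echo "value='$value' -n:$n !-z:$z"
done
```

In both forms the quotes around `"$1"` matter: unquoted, an empty argument collapses `[ -n $1 ]` to `[ -n ]`, which evaluates as true.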
#SBATCH -o $job_slug.out # stdout+stderr
#SBATCH --mem-per-cpu=2G # default mem (overridden below)
"
if [ ! -z "$1" ]; then
[nitpick] Use `-n` instead of `! -z` to check for non-empty strings. The condition should be `if [ -n "$1" ]; then`, which is more readable and follows shell scripting best practices.
Suggested change: replace `if [ ! -z "$1" ]; then` with `if [ -n "$1" ]; then`.
#SBATCH -t 03:00:00 # Duration of the job (Ex: 15 mins)
#SBATCH -q embers # QOS Name
#SBATCH -o$job_slug.out # Combined output and error messages file
#SBATCH -W # Do not exit until the submitted job terminates.
The `-W` flag makes sbatch wait for job completion, but this removes the ability to handle job failures gracefully. The original polling mechanism provided better error handling and cleanup capabilities.
#SBATCH -W # Do not exit until the submitted job terminates.
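
One possible way to keep some failure handling with a blocking submission is to check the exit status of the submit command: according to the sbatch documentation, with `-W` (`--wait`) sbatch exits with the exit code of the submitted job. The sketch below is not part of this PR; `run_blocking` is a hypothetical wrapper, `job.sbatch` is a placeholder file name, and stand-in commands replace sbatch so the snippet runs without SLURM.

```shell
# Hypothetical wrapper: run a blocking command and report a failure.
run_blocking() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "command failed with exit code $status: $*" >&2
    fi
    return "$status"
}

# On a SLURM cluster this would be called as (not run here):
#   run_blocking sbatch -W job.sbatch
# Stand-in commands that show the behavior without SLURM:
run_blocking true && echo "job succeeded"
run_blocking sh -c 'exit 3' 2>/dev/null || echo "job failed"
```

Any cleanup that the earlier polling loop performed would still need to be added around such a call, for example in the failure branch.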
#SBATCH -t 02:00:00 # Duration of the job (Ex: 15 mins)
#SBATCH -q embers # QOS Name
#SBATCH -o$job_slug.out # Combined output and error messages file
#SBATCH -W # Do not exit until the submitted job terminates.
The `-W` flag makes sbatch wait for job completion, but this removes the ability to handle job failures gracefully. The original polling mechanism provided better error handling and cleanup capabilities.
#SBATCH -W # Do not exit until the submitted job terminates.
User description
Make it so phoenix knows to run with proper device (GPU enabled possible)
PR Type
Bug fix
Description
- Replace `device` variable with `job_device` for consistency
- Simplify SLURM job submission scripts
- Remove complex job polling and cleanup logic
- Clean up test script temporary directory handling
File Walkthrough
- bench.sh (`.github/workflows/phoenix/bench.sh`): Update device variable name
  - Replace `$device` variable references with `$job_device`
- submit-bench.sh (`.github/workflows/phoenix/submit-bench.sh`): Simplify SLURM job submission script
  - `device` to `job_device`
  - `-W` flag for synchronous job execution
- submit.sh (`.github/workflows/phoenix/submit.sh`): Simplify SLURM job submission script
  - `device` to `job_device`
  - `-W` flag for synchronous job execution
- test.sh (`.github/workflows/phoenix/test.sh`): Clean up test script structure
  - `n_test_threads` variable declaration after usage