Update make_bcs
change constraint option
biljanaorescanin committed Mar 23, 2023
1 parent b4141a8 commit b3b18cb
Showing 1 changed file with 11 additions and 11 deletions.
@@ -801,10 +801,10 @@ cat << _EOF_ > $BCJOB
 #SBATCH --output=$EXPDIR/$OUTDIR/logs/$BCNAME.log
 #SBATCH --error=$EXPDIR/$OUTDIR/logs/$BCNAME.err
 #SBATCH --account=$group
-#SBATCH --time=03:00:00
-#SBATCH --ntasks=28
+#SBATCH --time=12:00:00
+#SBATCH --nodes=1

gmao-rreichle (Contributor) commented Mar 23, 2023:

@biljanaorescanin, are you sure this is what we're supposed to do? Here's an excerpt from the SI team's email:
"...be sure you ask for --ntasks= rather than nodes since the tasks per node on sky|cas varies (make sure any run scripts use the Slurm environment variable $SLURM_CPUS_ON_NODE rather than hardwiring as 36 or 45)."
Maybe we need to do something like:
#SBATCH --ntasks=$SLURM_CPUS_ON_NODE
Or would we automatically get ntasks=SLURM_CPUS_ON_NODE?
Also, what happened to the instruction from a year or so ago that suggested using fewer than the max number of CPUs on the newer (cas?) nodes?
I might well be misunderstanding the email from the SI team.
cc: @weiyuan-jiang

biljanaorescanin (Author, Contributor) commented Mar 23, 2023:

#SBATCH --ntasks=$SLURM_CPUS_ON_NODE will not work; I've tried that and got "SLURM_CPUS_ON_NODE: Undefined variable."

@mathomp4 what is best thing to do here?

mathomp4 (Member) commented Mar 23, 2023:

Well, from my read of the script, you are always running with 20 threads? If so, you could do:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20

or even:

#SBATCH --ntasks=20

Since both Skylake and Cascade Lake have at least 20 cores, you would be safe.

gmao-rreichle (Contributor) commented Mar 23, 2023:

Thanks, @mathomp4. But don't we get charged for the entire node? If that's still true, then wouldn't it be better to use all (or nearly all?) of the CPUs on the node we're getting?

#SBATCH --ntasks=$SLURM_CPUS_ON_NODE this will not work.

The email from the SI Team to GMAO-All just said:
"...make sure any run scripts use the Slurm environment variable $SLURM_CPUS_ON_NODE..."
Perhaps whoever wrote this email for the SI Team could clarify how exactly this can be done?

mathomp4 (Member) commented Mar 23, 2023:

@gmao-rreichle Hmm. I don't remember sending that email. When was it sent?

But you can use $SLURM_CPUS_ON_NODE only inside an allocation, and the allocation is controlled by those #SBATCH pragmas or options you pass to sbatch.

If you want to use $SLURM_CPUS_ON_NODE for controlling OpenMP threads, then the best way is to specify:

#SBATCH --nodes=1

and then, in the heredocs that generate your scripts, do:

NCPUS=$SLURM_CPUS_ON_NODE

If you get an allocation on each type using just --nodes=1 you'll see:

SLURM_CPUS_ON_NODE=40

on Skylake and:

SLURM_CPUS_ON_NODE=46

on Cascade Lake. (NCCS by fiat won't allow you to use all 48 cores on the Cascade Lake nodes.)
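To make that concrete, here is a minimal sketch (in POSIX sh rather than the csh that make_bcs uses; the fallback value of 1 is an arbitrary choice for runs outside an allocation):

```shell
#!/bin/sh
# SLURM_CPUS_ON_NODE is only defined inside a SLURM allocation.
# Use a guarded expansion so the script also works outside one.
# (The csh equivalent guard is: if ( $?SLURM_CPUS_ON_NODE ) ...)
NCPUS=${SLURM_CPUS_ON_NODE:-1}   # fallback of 1 is arbitrary
echo "Running with $NCPUS CPUs"
```

With --nodes=1, the same script would report 40 on Skylake and 46 on Cascade Lake without any hardwired core count.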

gmao-rreichle (Contributor) commented Mar 23, 2023:

Thanks, @mathomp4! I forwarded the SI Team email to you separately.

@biljanaorescanin: Based on what @mathomp4 wrote, I understand that it is indeed best to just use #SBATCH --nodes=1, but we'd then have to use NCPUS=$SLURM_CPUS_ON_NODE here:


Does that make sense?

mathomp4 (Member) commented Mar 23, 2023:

@gmao-rreichle You can't do it there; that's before you are in SLURM. What you need to do, at some point after each set of #SBATCH pragmas in a heredoc, is:

set NCPUS = $SLURM_CPUS_ON_NODE

because that variable is only set inside a SLURM allocation.

So pretty much somewhere after each of the file changes in this PR.
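For illustration, a sketch of that pattern with generic stand-in names (not the actual make_bcs variables); the key detail is the backslash, which keeps the unquoted heredoc from expanding the variable at generation time:

```shell
#!/bin/sh
# Hypothetical stand-ins for the make_bcs variables.
BCNAME=demo
BCJOB=$BCNAME.j

# Unquoted heredoc: $BCNAME expands now; \$SLURM_CPUS_ON_NODE is written
# literally into the job script and expands later, inside the allocation.
cat << _EOF_ > $BCJOB
#!/bin/csh -f
#SBATCH --job-name=$BCNAME.j
#SBATCH --nodes=1
#SBATCH --constraint=sky|cas
set NCPUS = \$SLURM_CPUS_ON_NODE
_EOF_
```

The generated demo.j then picks up the node's actual core count at run time instead of a hardwired 36 or 45.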

gmao-rreichle (Contributor) commented Mar 23, 2023:

Thanks again, @mathomp4! Makes eminent sense, of course. I generally should refrain from coding and commenting after 5pm... my brain is fried.
@biljanaorescanin, @weiyuan-jiang: The important thing is to update the python version accordingly. Fixing this in the c-shell make_bcs is only so helpful.

 #SBATCH --job-name=$BCNAME.j
-#SBATCH --constraint=sky
+#SBATCH --constraint=sky|cas
 cd $BCDIR
@@ -1003,9 +1003,9 @@ cat << _EOF_ > $BCJOB
 #SBATCH --error=$EXPDIR/$OUTDIR/logs/$BCNAME.err
 #SBATCH --account=$group
 #SBATCH --time=12:00:00
-#SBATCH --ntasks=1
+#SBATCH --nodes=1
 #SBATCH --job-name=$BCNAME.j
-#SBATCH --constraint=sky
+#SBATCH --constraint=sky|cas
 cd $BCDIR
@@ -1070,9 +1070,9 @@ cat << _EOF_ > $BCJOB-2
 #SBATCH --error=$EXPDIR/$OUTDIR/logs/$BCNAME-2.err
 #SBATCH --account=$group
 #SBATCH --time=12:00:00
-#SBATCH --ntasks=28
+#SBATCH --nodes=1
 #SBATCH --job-name=$BCNAME-2.j
-#SBATCH --constraint=sky
+#SBATCH --constraint=sky|cas
 cd $BCDIR
@@ -1215,9 +1215,9 @@ cat << _EOF_ > $BCJOB
 #SBATCH --error=$EXPDIR/$OUTDIR/logs/$BCNAME.err
 #SBATCH --account=$group
 #SBATCH --time=12:00:00
-#SBATCH --ntasks=28
+#SBATCH --nodes=1
 #SBATCH --job-name=$BCNAME.j
-#SBATCH --constraint=sky
+#SBATCH --constraint=sky|cas
 cd $BCDIR
@@ -1430,9 +1430,9 @@ cat << _EOF_ > $BCJOB
 #SBATCH --error=$EXPDIR/$OUTDIR/logs/$BCNAME.err
 #SBATCH --account=$group
 #SBATCH --time=12:00:00
-#SBATCH --ntasks=28
+#SBATCH --nodes=1
 #SBATCH --job-name=$BCNAME.j
-#SBATCH --constraint=sky
+#SBATCH --constraint=sky|cas
 cd $BCDIR
