@sean-smith
Contributor

Signed-off-by: Sean Smith <seaam@amazon.com>

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Sean Smith and others added 7 commits July 27, 2018 12:27
Signed-off-by: Sean Smith <seaam@amazon.com>
Signed-off-by: CfnCluster AMI bot <ec2-ds9-dev@amazon.com>
Signed-off-by: Sean Smith <seaam@amazon.com>
After fixing the configuration of the compute nodes in a Slurm cluster
and setting the CPU as a consumable resource, we also need to fix job
submission in the integration tests.
To properly test scale-up, a single job submission should allocate all
the slots available on a compute node.
The fix has been tested.
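A minimal sketch of the submission side, assuming the test harness builds its `sbatch` command line in Python. The helper `build_full_node_submit` and its parameters are hypothetical; `--nodes` and `--ntasks` are standard `sbatch` options, and with the default of one CPU per task, asking for as many tasks as the node has CPUs fills the node.

```python
from typing import List


def build_full_node_submit(script: str, slots_per_node: int) -> List[str]:
    """Return an sbatch argv that asks for every CPU slot on a single node.

    Hypothetical helper. With CPUs configured as a consumable resource, a job
    that takes all slots on one node leaves no room for a second job of the
    same shape, so that job stays pending (PD) until another node is available.
    """
    if slots_per_node < 1:
        raise ValueError("slots_per_node must be >= 1")
    return [
        "sbatch",
        "--nodes=1",                   # keep the whole job on one node
        f"--ntasks={slots_per_node}",  # one task (one CPU by default) per slot
        script,
    ]


if __name__ == "__main__":
    # The compute nodes in the logs report CPUTot=2.
    print(" ".join(build_full_node_submit("job1.sh", 2)))
```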

Stage 1: two jobs submitted (one running and one pending)
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
3   compute  job2.sh   centos PD       0:00      1 (Resources)
2   compute  job1.sh   centos  R       5:18      1 ip-10-0-82-245

- one node with both CPUs allocated:
[centos@ip-10-0-235-160 ~]$ scontrol show nodes --all
NodeName=ip-10-0-82-245 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=2 CPUErr=0 CPUTot=2 CPULoad=0.11
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=ip-10-0-82-245 NodeHostName=ip-10-0-82-245 Version=16.05
   OS=Linux RealMemory=3711 AllocMem=0 FreeMem=3022 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=14989 Weight=1 Owner=N/A MCS_label=N/A
   BootTime=2018-07-31T14:37:49 SlurmdStartTime=2018-07-31T14:41:31
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
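The Stage 1 check (one node with CPUAlloc equal to CPUTot) can be automated by parsing the `scontrol show nodes` record. This is a sketch under the assumption that every value is a single whitespace-free token, which holds for the fields shown above; `parse_node_record` and `is_fully_allocated` are hypothetical names, not part of Slurm or CfnCluster.

```python
import re
from typing import Dict

# A Key=Value token as printed by `scontrol show nodes`; the value runs up to
# the next whitespace, which holds for every field in the record above.
_PAIR = re.compile(r"(\w+)=(\S+)")


def parse_node_record(record: str) -> Dict[str, str]:
    """Flatten one node record (possibly spanning several lines) into a dict."""
    return dict(_PAIR.findall(record))


def is_fully_allocated(record: str) -> bool:
    """True when every CPU on the node is allocated (CPUAlloc == CPUTot)."""
    fields = parse_node_record(record)
    return int(fields["CPUAlloc"]) == int(fields["CPUTot"])


if __name__ == "__main__":
    # Excerpt of the Stage 1 record shown above.
    record = """NodeName=ip-10-0-82-245 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=2 CPUErr=0 CPUTot=2 CPULoad=0.11
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=14989 Weight=1 Owner=N/A"""
    print(is_fully_allocated(record), parse_node_record(record)["State"])
```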

Stage 2: the second compute node joins the cluster and the two jobs
run on two different hosts:

[centos@ip-10-0-235-160 ~]$ squeue --states=all
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
2   compute  job1.sh   centos  R       6:14      1 ip-10-0-82-245
3   compute  job2.sh   centos  R       0:34      1 ip-10-0-121-16

[centos@ip-10-0-235-160 ~]$ scontrol show nodes --all
NodeName=ip-10-0-82-245 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=2 CPUErr=0 CPUTot=2 CPULoad=0.11
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=ip-10-0-82-245 NodeHostName=ip-10-0-82-245 Version=16.05
   OS=Linux RealMemory=3711 AllocMem=0 FreeMem=3022 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=14989 Weight=1 Owner=N/A MCS_label=N/A
   BootTime=2018-07-31T14:37:49 SlurmdStartTime=2018-07-31T14:41:31
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

NodeName=ip-10-0-121-16 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=2 CPUErr=0 CPUTot=2 CPULoad=0.37
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=ip-10-0-121-16 NodeHostName=ip-10-0-121-16 Version=(null)
   OS=Linux RealMemory=3711 AllocMem=0 FreeMem=3035 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=14989 Weight=1 Owner=N/A MCS_label=N/A
   BootTime=2018-07-31T14:43:46 SlurmdStartTime=2018-07-31T14:47:35
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
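The Stage 2 condition (both jobs running, each on its own host) can be checked from `squeue` output in the same spirit. This sketch assumes the default column layout shown above, job names without spaces, and single-node jobs, so comparing NODELIST strings is enough; multi-node jobs with compressed host lists would need expanding first. All names here are hypothetical.

```python
from typing import List, NamedTuple


class QueueEntry(NamedTuple):
    job_id: str
    state: str
    nodelist: str


def parse_squeue(output: str) -> List[QueueEntry]:
    """Parse `squeue` output in its default column layout, header line first."""
    entries: List[QueueEntry] = []
    for line in output.strip().splitlines()[1:]:  # skip the JOBID ... header
        cols = line.split()
        # JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON); a reason
        # may contain spaces, so everything after NODES is rejoined.
        entries.append(QueueEntry(cols[0], cols[4], " ".join(cols[7:])))
    return entries


def running_on_distinct_hosts(output: str) -> bool:
    """True when at least one job is listed, all are running (ST == R),
    and no two jobs share a node list."""
    entries = parse_squeue(output)
    hosts = [e.nodelist for e in entries]
    return (
        bool(entries)
        and all(e.state == "R" for e in entries)
        and len(set(hosts)) == len(hosts)
    )
```

Applied to the Stage 1 listing this returns False (job 3 is still PD), and applied to the Stage 2 listing it returns True.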

Signed-off-by: Maurizio Melato <mmelato@amazon.com>
"ec2:CreatePlacementGroup" and "ec2:DeletePlacementGroup" used
when setting the placement group config to be DYNAMIC

"iam:GetRole" and "iam:SimulatePrincipalPolicy" used when setting
a custom instance role
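As an illustration of how these four actions might be granted only when the corresponding feature is configured, here is a hypothetical helper that assembles an IAM policy document. The action names come from the commit message; the helper, the `Sid` values, and the wildcard `Resource` are assumptions for the sketch, not the policy this PR actually ships.

```python
import json
from typing import Any, Dict, List

# IAM actions named in the commit message above.
PLACEMENT_GROUP_ACTIONS = ["ec2:CreatePlacementGroup", "ec2:DeletePlacementGroup"]
CUSTOM_ROLE_ACTIONS = ["iam:GetRole", "iam:SimulatePrincipalPolicy"]


def build_policy(dynamic_placement_group: bool, custom_instance_role: bool) -> Dict[str, Any]:
    """Assemble a policy document granting only the actions a config needs.

    Hypothetical helper: the Sids and the wildcard Resource are placeholders.
    If both flags are False the statement list is empty, and callers should
    not attach such a policy.
    """
    statements: List[Dict[str, Any]] = []
    if dynamic_placement_group:
        statements.append({"Sid": "PlacementGroup", "Effect": "Allow",
                           "Action": PLACEMENT_GROUP_ACTIONS, "Resource": "*"})
    if custom_instance_role:
        statements.append({"Sid": "CustomInstanceRole", "Effect": "Allow",
                           "Action": CUSTOM_ROLE_ACTIONS, "Resource": "*"})
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(build_policy(True, True), indent=2))
```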

Signed-off-by: Luca Carrogu <carrogu@amazon.com>
Signed-off-by: Sean Smith <seaam@amazon.com>
Adds support for the GovCloud (us-gov-west-1) region
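One detail that commonly matters when adding a GovCloud region is that its ARNs use the `aws-us-gov` partition rather than `aws`, so a hardcoded `arn:aws:` prefix breaks there. Whether this PR needed that change is not stated above; the following hypothetical helpers only illustrate the region-to-partition mapping.

```python
def arn_partition(region: str) -> str:
    """Map a region name to its ARN partition (hypothetical helper)."""
    if region.startswith("us-gov-"):
        return "aws-us-gov"  # GovCloud, e.g. us-gov-west-1
    if region.startswith("cn-"):
        return "aws-cn"      # China regions
    return "aws"             # standard commercial regions


def managed_policy_arn(region: str, policy_name: str) -> str:
    """ARN of an AWS managed IAM policy, with the partition taken from the region."""
    return f"arn:{arn_partition(region)}:iam::aws:policy/{policy_name}"
```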

Signed-off-by: Sean Smith <seaam@amazon.com>
@sean-smith sean-smith merged commit 687118b into master Aug 9, 2018

4 participants