updating slurm scripts
JPRichings committed Jul 6, 2023
1 parent 5076c30 commit 8272049
Showing 13 changed files with 181 additions and 28 deletions.
2 changes: 1 addition & 1 deletion content/Part0_Introduction/contents.md
# Introduction to High Performance Computing

This course aims to introduce the principles of high performance computing (HPC) systems, how they operate and how we can utilise the computing power they can offer to model complex systems.

This will start with a discussion of supercomputers (HPC systems): their purpose, hardware, and trends in computing. We will then cover material on the fundamentals of modern computers and how we can perform computations in parallel. This will lead into a discussion of how supercomputers derive their performance and how distributed memory architecture is used in HPC systems.

78 changes: 60 additions & 18 deletions content/Part6_Exercises/exercise0/part2.md
This example is meant to get you used to the command line environment of a high performance computing system.

In the following we are going to look at a variety of hello world programs covering the two most common types of parallelism in the HPC world.

One that takes advantage of shared memory and one that uses distributed memory in an HPC system. We will then look at how they can be combined, but first we start with a simple serial code.

## Serial

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <limits.h>

int main(int argc, char* argv[])
{
    // Check input argument
    if(argc != 2)
    {
        printf("Required one argument `name`.\n");
        return 1;
    }

    // Receive argument (+1 leaves room for the null terminator)
    char* iname = (char *)malloc(strlen(argv[1])+1);
    strcpy(iname, argv[1]);

    // Get the name of the node we are running on
    char hostname[HOST_NAME_MAX];
    gethostname(hostname, HOST_NAME_MAX);

    // Hello World message
    printf("Hello World!\n");

    // Message from the node to the user
    printf("Hello %s, this is %s.\n", iname, hostname);
}
```

This is a simple C code, but it will say hello to you and report which node it is running on.

To try this example yourself you will first need to compile the example code.

If the file that contains the above code is called `helloWorldSerial.c` then to compile on {{ machine_name }} use the command,

{{ '```{include} ../../substitutions/substitutions_REPLACE/Exercise0/Hello-SER-Compile.md\n```'.replace("REPLACE",machine_name) }}
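
For reference, the substitution included above is a one-line compiler invocation; the `Hello-SER-Compile.md` file added later in this commit contains,

```
cc helloWorldSerial.c -o hello-SER
```

where `cc` is the system's C compiler wrapper.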

To run this example using the compute nodes via the job queue, use the following bash script written for {{ machine_name }},

{{ '```{include} ../../substitutions/substitutions_REPLACE/Exercise0/Hello-SER-Slurm.md\n```'.replace("REPLACE",machine_name) }}

This example is small enough that it can be run on the login nodes of {{ machine_name }} by running,

```
./hello-SER your-name
```
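
This should print something like the following, where the reported hostname (here the hypothetical `ln01`) will be whichever login node you are on,

```
Hello World!
Hello your-name, this is ln01.
```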

How does this differ from when you run using the batch script?


---
## Threaded

The code is a little more complex than the last example in order to run multiple threads.

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <limits.h>
#include <omp.h>

int main(int argc, char* argv[])
{
    // Check input argument
    if(argc != 2)
    {
        printf("Required one argument `name`.\n");
        return 1;
    }

    // Receive argument (+1 leaves room for the null terminator)
    char* iname = (char *)malloc(strlen(argv[1])+1);
    strcpy(iname, argv[1]);

    // Get the name of the node we are running on
    char hostname[HOST_NAME_MAX];
    gethostname(hostname, HOST_NAME_MAX);

    // Hello World message
    printf("Hello World!\n");

    // Message from each thread on the node to the user
    #pragma omp parallel
    {
        printf("Hello %s, this is node %s responding from thread %d\n",
               iname, hostname, omp_get_thread_num());
    }
}
```

To try this example yourself you will first need to compile the example code.

If the file that contains the above code is called `helloWorldThreaded.c` then to compile on {{ machine_name }} use the command,

{{ '```{include} ../../substitutions/substitutions_REPLACE/Exercise0/Hello-THRD-Compile.md\n```'.replace("REPLACE",machine_name) }}

In order to run this on a {{ machine_name }} node we can use the following script,

{{ '```{include} ../../substitutions/substitutions_REPLACE/Exercise0/Hello-THRD-Slurm.md\n```'.replace("REPLACE",machine_name) }}

As in the serial case we have run a single process, but now the process runs a number of threads. Threaded codes can take advantage of the shared memory aspect of an HPC system to pass data between threads but cannot communicate between distinct nodes.

If you run this code on multiple processes it will still work, but without MPI communication these processes will be entirely independent and are not able to communicate information.
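
As with the serial example, a run this small can be tried on the login nodes. `OMP_NUM_THREADS` is the standard OpenMP variable controlling the thread count (in the batch scripts below it is set from `$SLURM_CPUS_PER_TASK` instead); for example,

```
export OMP_NUM_THREADS=4
./hello-THRD your-name
```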

---

## MPI

MPI stands for Message Passing Interface; it allows messages to be sent between multiple instances of the program running on different nodes. Each instance of the program is controlled by a separate instance of the operating system.
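
As a minimal sketch of what message passing looks like (this is not part of the course code, just an illustration, and it needs at least two ranks to run), rank 0 can send a single integer to rank 1:

```
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
    {
        value = 42;
        // Send one int to rank 1, with message tag 0
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    }
    else if (rank == 1)
    {
        // Receive the int from rank 0
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d from rank 0\n", value);
    }
    MPI_Finalize();
}
```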

In this MPI example each process says hello and states which node it is running on and which process in the group it is.

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    // Check input argument
    if(argc != 2)
    {
        printf("Required one argument `name`.\n");
        return 1;
    }

    // Receive argument; iname gets extra room for the "@" and node name added below
    char* iname = (char *)malloc(strlen(argv[1]) + MPI_MAX_PROCESSOR_NAME + 2);
    char* iname2 = (char *)malloc(strlen(argv[1])+1);
    strcpy(iname, argv[1]);
    strcpy(iname2, iname);

    // MPI Setup
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);

    // Create message from rank 0 to broadcast to all processes.
    strcat(iname, "@");
    strcat(iname, name);
    int inameSize = strlen(iname);

    // Broadcast the message length, then the message itself, from rank 0
    MPI_Bcast(&inameSize, 1, MPI_INT, 0, MPI_COMM_WORLD);
    char* buff = (char *)malloc(inameSize+1);
    if (rank == 0)
    {
        strcpy(buff, iname);
    }
    MPI_Bcast(buff, inameSize+1, MPI_CHAR, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);

    // Send hello from rank 0
    if (rank == 0)
    {
        printf("Hello world, my name is %s, I am sending this message from process %d of %d total processes executing, which is running on node %s. \n", iname2, rank, size, name);
    }

    // Send response from the other ranks
    if (rank != 0)
    {
        printf("Hello, %s I am process %d of %d total processes executing and I am running on node %s.\n", buff, rank, size, name);
    }

    MPI_Finalize();
}
```

To try this example yourself you will first need to compile the example code.

If the file that contains the above code is called `helloWorldMPI.c` then to compile on {{ machine_name }} use the command,

{{ '```{include} ../../substitutions/substitutions_REPLACE/Exercise0/Hello-MPI-Compile.md\n```'.replace("REPLACE",machine_name) }}

We can run this executable using this bash script,

{{ '```{include} ../../substitutions/substitutions_REPLACE/Exercise0/Hello-MPI-SlurmA.md\n```'.replace("REPLACE",machine_name) }}

The output will look something like,

```
Hello, your-name@nid001059 I am process 3 of 4 total processes executing and I am running on node nid001098.
```

We can, however, have multiple processes per node if we update our bash script to,

{{ '```{include} ../../substitutions/substitutions_REPLACE/Exercise0/Hello-MPI-SlurmB.md\n```'.replace("REPLACE",machine_name) }}
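
The difference between the two scripts is in the `#SBATCH` resource requests; illustratively (these values are examples only, the real ones are in the substitution file above), something like,

```
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
```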

content/substitutions/.../Exercise0/Hello-HYB-Compile.md

```
CC helloWorldHYB.c -fopenmp -o hello-HYB
```

content/substitutions/.../Exercise0/Hello-HYB-Slurm.md

```
#SBATCH --qos=standard
#SBATCH --partition=standard
# Set the number of threads to the CPUs per task
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
NODES=$SLURM_JOB_NUM_NODES
CORES=$((NODES*128))
THREADS=$OMP_NUM_THREADS
export OMP_PLACES=cores
echo "job start"
# Launch the parallel job
srun --hint=nomultithread --distribution=block:block ./hello-HYB YOUR-NAME-HERE > HYBRID-${NODES}nodes-${CORES}cores-${THREADS}threads-run.${SLURM_JOBID}.out
echo "job complete"
```

Place this bash code into a file called `Hello_hybrid_Slurm.sh` in the same directory as the previous code and replace `YOUR-NAME-HERE` with your own input.

To submit this job run,

```
sbatch Hello_hybrid_Slurm.sh
```
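
While the job is queued or running you can check its state with the standard Slurm command,

```
squeue -u $USER
```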

This should return two files as output:

- The first file, whose name begins with `HYBRID-`, is the log file from the job and contains the message produced by the code at run time.
- The second file, whose name begins with `slurm`, is the output from the script used to submit the job.

content/substitutions/.../Exercise0/Hello-MPI-Compile.md

```
cc helloWorldMPI.c -o hello-MPI
```

content/substitutions/.../Exercise0/Hello-MPI-SlurmA.md

```
#SBATCH --qos=standard
# Set the number of threads to the CPUs per task
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
NODES=$SLURM_JOB_NUM_NODES
CORES=$((NODES*128))
THREADS=$OMP_NUM_THREADS
export OMP_PLACES=cores
echo "job start"
# Launch the parallel job
srun --hint=nomultithread --distribution=block:block ./hello-MPI YOUR-NAME-HERE > MPI-${NODES}nodes-${CORES}cores-${THREADS}threads.${SLURM_JOBID}.out
echo "job complete"
```

Place this bash code into a file called `Hello_MPI_Slurm.sh` in the same directory as the previous code and replace `YOUR-NAME-HERE` with your own input.

To submit this job run,

```
sbatch Hello_MPI_Slurm.sh
```

This should return two files as output:

- The first file, whose name begins with `MPI-`, is the log file from the job and contains the message produced by the code at run time.
- The second file, whose name begins with `slurm`, is the output from the script used to submit the job.

content/substitutions/.../Exercise0/Hello-MPI-SlurmB.md

```
#SBATCH --qos=standard
# Set the number of threads to the CPUs per task
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
NODES=$SLURM_JOB_NUM_NODES
CORES=$((NODES*128))
THREADS=$OMP_NUM_THREADS
export OMP_PLACES=cores
echo "job start"
# Launch the parallel job
srun --hint=nomultithread --distribution=block:block ./hello-MPI YOUR-NAME-HERE > MPI-${NODES}nodes-${CORES}cores-${THREADS}threads.${SLURM_JOBID}.out
echo "job complete"
```

content/substitutions/.../Exercise0/Hello-SER-Compile.md

```
cc helloWorldSerial.c -o hello-SER
```

content/substitutions/.../Exercise0/Hello-SER-Slurm.md

```
#SBATCH --qos=standard
# Set the number of threads to the CPUs per task
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
NODES=$SLURM_JOB_NUM_NODES
CORES=$((NODES*128))
THREADS=$OMP_NUM_THREADS
export OMP_PLACES=cores
echo "job start"
# Launch the parallel job
srun --hint=nomultithread --distribution=block:block ./hello-SER YOUR-NAME-HERE > SERIAL-${NODES}nodes-${CORES}cores-${THREADS}threads.${SLURM_JOBID}.out
echo "job complete"
```

Place this bash code into a file called `Hello_Serial_Slurm.sh` in the same directory as the previous code and replace `YOUR-NAME-HERE` with your own input.

To submit this job run,

```
sbatch Hello_Serial_Slurm.sh
```

This should return two files as output:

- The first file, whose name begins with `SERIAL-`, is the log file from the job and contains the message produced by the code at run time.
- The second file, whose name begins with `slurm`, is the output from the script used to submit the job.

Have a look in both files and identify the source of the different messages.
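
For example, to view them both (the glob patterns match the job ID in the file names),

```
cat SERIAL-*.out
cat slurm-*.out
```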

content/substitutions/.../Exercise0/Hello-THRD-Compile.md

```
cc helloWorldThreaded.c -fopenmp -o hello-THRD
```

Where `-fopenmp` is a flag that tells the compiler that we are using OpenMP, a library that allows us to write threaded code.