updating slurm scripts
JPRichings committed Jul 6, 2023
1 parent 5076c30 commit 8272049
Showing 13 changed files with 181 additions and 28 deletions.
2 changes: 1 addition & 1 deletion content/Part0_Introduction/contents.md
# Introduction to High Performance Computing

This course aims to introduce the principles of high performance computing (HPC) systems, how they operate and how we can utilise the computing power they can offer to model complex systems.

This will start with a discussion of supercomputers (HPC systems): their purpose, hardware, and trends in computing. We will then cover material on the fundamentals of modern computers and how we can perform computations in parallel. This will lead into a discussion of how supercomputers derive their performance and how distributed memory architecture is used in HPC systems.

78 changes: 60 additions & 18 deletions content/Part6_Exercises/exercise0/part2.md
This example is meant to get you used to the command line environment of a high performance computing system.

In the following we are going to look at a variety of hello world programs covering the two most common types of parallelism in the HPC world.

One that takes advantage of shared memory and one that uses distributed memory in an HPC system. We will then look at how they can be combined, but first we start with a simple serial code.

## Serial

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <limits.h>

int main(int argc, char* argv[])
{
    // Check input argument
    if(argc != 2)
    {
        printf("Required one argument `name`.\n");
        return 1;
    }

    // Receive argument (+1 leaves room for the null terminator)
    char* iname = (char *)malloc(strlen(argv[1])+1);
    strcpy(iname, argv[1]);

    // Get the name of the node we are running on
    char hostname[HOST_NAME_MAX];
    gethostname(hostname, HOST_NAME_MAX);

    // Hello World message
    printf("Hello World!\n");

    // Message from the node to the user
    printf("Hello %s, this is %s.\n", iname, hostname);
}
```

This is a simple C code, but it will say hello to you and report which node it is running on.

To try this example yourself you will first need to compile the example code.

If the file that contains the above code is called `helloWorldSerial.c` then to compile on {{ machine_name }} use the command,

{{ '```{include} ../../substitutions/substitutions_REPLACE/Exercise0/Hello-SER-Compile.md\n```'.replace("REPLACE",machine_name) }}
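
For reference, the substitution included above is a one-line compiler invocation; the `Hello-SER-Compile.md` file added later in this commit contains,

```
cc helloWorldSerial.c -o hello-SER
```

where `cc` is the system's C compiler wrapper.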

To run this example using the compute nodes via the job queue, use the following bash script written for {{ machine_name }},

{{ '```{include} ../../substitutions/substitutions_REPLACE/Exercise0/Hello-SER-Slurm.md\n```'.replace("REPLACE",machine_name) }}

This example is small enough that it can be run on the login nodes of {{ machine_name }} by running,

```
./hello-SER your-name
```
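
This should print something like the following, where the reported hostname (here the hypothetical `ln01`) will be whichever login node you are on,

```
Hello World!
Hello your-name, this is ln01.
```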

How does this differ from when you run using the batch script?


---
## Threaded

The code is a little more complex than the last example in order to run multiple threads.

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <limits.h>
#include <omp.h>

int main(int argc, char* argv[])
{
    // Check input argument
    if(argc != 2)
    {
        printf("Required one argument `name`.\n");
        return 1;
    }

    // Receive argument (+1 leaves room for the null terminator)
    char* iname = (char *)malloc(strlen(argv[1])+1);
    strcpy(iname, argv[1]);

    // Get the name of the node we are running on
    char hostname[HOST_NAME_MAX];
    gethostname(hostname, HOST_NAME_MAX);

    // Hello World message
    printf("Hello World!\n");

    // Message from each thread on the node to the user
    #pragma omp parallel
    {
        printf("Hello %s, this is node %s responding from thread %d\n",
               iname, hostname, omp_get_thread_num());
    }
}
```

To try this example yourself you will first need to compile the example code.

If the file that contains the above code is called `helloWorldThreaded.c` then to compile on {{ machine_name }} use the command,

{{ '```{include} ../../substitutions/substitutions_REPLACE/Exercise0/Hello-THRD-Compile.md\n```'.replace("REPLACE",machine_name) }}

In order to run this on a {{ machine_name }} node we can use the following script,

{{ '```{include} ../../substitutions/substitutions_REPLACE/Exercise0/Hello-THRD-Slurm.md\n```'.replace("REPLACE",machine_name) }}

As in the serial case we have run a single process, but now the process runs a number of threads. Threaded codes can take advantage of the shared memory aspect of an HPC system to pass data between threads but cannot communicate between distinct nodes.

If you run this code on multiple processes it will still work, but without MPI communication these processes will be entirely independent and are not able to communicate information.
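
As with the serial example, a run this small can be tried on the login nodes. `OMP_NUM_THREADS` is the standard OpenMP variable controlling the thread count (in the batch scripts below it is set from `$SLURM_CPUS_PER_TASK` instead); for example,

```
export OMP_NUM_THREADS=4
./hello-THRD your-name
```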

---

## MPI

MPI stands for Message Passing Interface; it allows messages to be sent between multiple instances of the program running on different nodes. Each instance of the program is controlled by a separate instance of the operating system.
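
As a minimal sketch of what message passing looks like (this is not part of the course code, just an illustration, and it needs at least two ranks to run), rank 0 can send a single integer to rank 1:

```
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
    {
        value = 42;
        // Send one int to rank 1, with message tag 0
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    }
    else if (rank == 1)
    {
        // Receive the int from rank 0
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d from rank 0\n", value);
    }
    MPI_Finalize();
}
```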

In this MPI example each process says hello and states which node it is running on and which process in the group it is.

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    // Check input argument
    if(argc != 2)
    {
        printf("Required one argument `name`.\n");
        return 1;
    }

    // Receive argument; iname gets extra room for the "@" and node name added below
    char* iname = (char *)malloc(strlen(argv[1]) + MPI_MAX_PROCESSOR_NAME + 2);
    char* iname2 = (char *)malloc(strlen(argv[1])+1);
    strcpy(iname, argv[1]);
    strcpy(iname2, iname);

    // MPI Setup
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);

    // Create message from rank 0 to broadcast to all processes.
    strcat(iname, "@");
    strcat(iname, name);
    int inameSize = strlen(iname);

    // Broadcast the message length, then the message itself, from rank 0
    MPI_Bcast(&inameSize, 1, MPI_INT, 0, MPI_COMM_WORLD);
    char* buff = (char *)malloc(inameSize+1);
    if (rank == 0)
    {
        strcpy(buff, iname);
    }
    MPI_Bcast(buff, inameSize+1, MPI_CHAR, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);

    // Send hello from rank 0
    if (rank == 0)
    {
        printf("Hello world, my name is %s, I am sending this message from process %d of %d total processes executing, which is running on node %s. \n", iname2, rank, size, name);
    }

    // Send response from the other ranks
    if (rank != 0)
    {
        printf("Hello, %s I am process %d of %d total processes executing and I am running on node %s.\n", buff, rank, size, name);
    }

    MPI_Finalize();
}
```

To try this example yourself you will first need to compile the example code.

If the file that contains the above code is called `helloWorldMPI.c` then to compile on {{ machine_name }} use the command,

{{ '```{include} ../../substitutions/substitutions_REPLACE/Exercise0/Hello-MPI-Compile.md\n```'.replace("REPLACE",machine_name) }}

We can run this executable using this bash script,

{{ '```{include} ../../substitutions/substitutions_REPLACE/Exercise0/Hello-MPI-SlurmA.md\n```'.replace("REPLACE",machine_name) }}

The output will look something like,

```
Hello, your-name@nid001059 I am process 3 of 4 total processes executing and I am running on node nid001098.
```

We can, however, have multiple processes per node if we update our bash script to,

{{ '```{include} ../../substitutions/substitutions_REPLACE/Exercise0/Hello-MPI-SlurmB.md\n```'.replace("REPLACE",machine_name) }}
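
The difference between the two scripts is in the `#SBATCH` resource requests; illustratively (these values are examples only, the real ones are in the substitution file above), something like,

```
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
```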

content/substitutions/.../Exercise0/Hello-HYB-Compile.md

```
CC helloWorldHYB.c -fopenmp -o hello-HYB
```

content/substitutions/.../Exercise0/Hello-HYB-Slurm.md

```
#SBATCH --qos=standard
#SBATCH --partition=standard
# Set the number of threads to the CPUs per task
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
NODES=$SLURM_JOB_NUM_NODES
CORES=$((NODES*128))
THREADS=$OMP_NUM_THREADS
export OMP_PLACES=cores
echo "job start"
# Launch the parallel job
srun --hint=nomultithread --distribution=block:block ./hello-HYB YOUR-NAME-HERE > HYBRID-${NODES}nodes-${CORES}cores-${THREADS}threads-run.${SLURM_JOBID}.out
echo "job complete"
```

Place this bash code into a file called `Hello_hybrid_Slurm.sh` in the same directory as the previous code and replace `YOUR-NAME-HERE` with your own input.

To submit this job run,

```
sbatch Hello_hybrid_Slurm.sh
```
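
While the job is queued or running you can check its state with the standard Slurm command,

```
squeue -u $USER
```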

This should return two files as output:

- The first file, whose name begins with `HYBRID-`, is the log file from the job and contains the message produced by the code at run time.
- The second file, whose name begins with `slurm`, is the output from the script used to submit the job.

content/substitutions/.../Exercise0/Hello-MPI-Compile.md

```
cc helloWorldMPI.c -o hello-MPI
```

content/substitutions/.../Exercise0/Hello-MPI-SlurmA.md

```
#SBATCH --qos=standard
# Set the number of threads to the CPUs per task
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
NODES=$SLURM_JOB_NUM_NODES
CORES=$((NODES*128))
THREADS=$OMP_NUM_THREADS
export OMP_PLACES=cores
echo "job start"
# Launch the parallel job
srun --hint=nomultithread --distribution=block:block ./hello-MPI YOUR-NAME-HERE > MPI-${NODES}nodes-${CORES}cores-${THREADS}threads.${SLURM_JOBID}.out
echo "job complete"
```

Place this bash code into a file called `Hello_MPI_Slurm.sh` in the same directory as the previous code and replace `YOUR-NAME-HERE` with your own input.

To submit this job run,

```
sbatch Hello_MPI_Slurm.sh
```

This should return two files as output:

- The first file, whose name begins with `MPI-`, is the log file from the job and contains the message produced by the code at run time.
- The second file, whose name begins with `slurm`, is the output from the script used to submit the job.

content/substitutions/.../Exercise0/Hello-MPI-SlurmB.md

```
#SBATCH --qos=standard
# Set the number of threads to the CPUs per task
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
NODES=$SLURM_JOB_NUM_NODES
CORES=$((NODES*128))
THREADS=$OMP_NUM_THREADS
export OMP_PLACES=cores
echo "job start"
# Launch the parallel job
srun --hint=nomultithread --distribution=block:block ./hello-MPI YOUR-NAME-HERE > MPI-${NODES}nodes-${CORES}cores-${THREADS}threads.${SLURM_JOBID}.out
echo "job complete"
```

content/substitutions/.../Exercise0/Hello-SER-Compile.md

```
cc helloWorldSerial.c -o hello-SER
```

content/substitutions/.../Exercise0/Hello-SER-Slurm.md

```
#SBATCH --qos=standard
# Set the number of threads to the CPUs per task
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
NODES=$SLURM_JOB_NUM_NODES
CORES=$((NODES*128))
THREADS=$OMP_NUM_THREADS
export OMP_PLACES=cores
echo "job start"
# Launch the parallel job
srun --hint=nomultithread --distribution=block:block ./hello-SER YOUR-NAME-HERE > SERIAL-${NODES}nodes-${CORES}cores-${THREADS}threads.${SLURM_JOBID}.out
echo "job complete"
```

Place this bash code into a file called `Hello_Serial_Slurm.sh` in the same directory as the previous code and replace `YOUR-NAME-HERE` with your own input.

To submit this job run,

```
sbatch Hello_Serial_Slurm.sh
```

This should return two files as output:

- The first file, whose name begins with `SERIAL-`, is the log file from the job and contains the message produced by the code at run time.
- The second file, whose name begins with `slurm`, is the output from the script used to submit the job.

Have a look in both files and identify the source of the different messages.
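
For example, to view them both (the glob patterns match the job ID in the file names),

```
cat SERIAL-*.out
cat slurm-*.out
```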

content/substitutions/.../Exercise0/Hello-THRD-Compile.md

```
cc helloWorldThreaded.c -fopenmp -o hello-THRD
```

Where `-fopenmp` is a flag that tells the compiler that we are using OpenMP, a library that allows us to write threaded code.