# CIP203 - Maximizing GPU usage with MIGs, MPS, and Time-Slicing: 
## NVIDIA MIGs

**Questions**
* How can I speed up my code by using MIGs?

**Objectives**
* Become familiar with the concept of NVIDIA MIG
* Learn how to change your submission script to request a MIG instead of a full GPU
* Practice running examples on MIG instances and benchmarking them

### What is MIG?

Multi-Instance GPU (MIG) is a technology that partitions a single GPU into multiple smaller, isolated GPU instances. A single GPU can then be shared between different jobs and users.

### Some features:
- GPUs are securely partitioned into up to 7 instances
- Resource allocation is guaranteed (compute, memory, cache)
- Performance is predictable for workloads like AI training, inference, and HPC
- Each instance gets a unique and dedicated set of hardware resources
- Users can choose MIG instances of various sizes to match the requirements of different workloads, optimizing resource allocation

![alt text](./images/gpu-mig-overview.jpg "Title")

#### Any MIG-capable GPU, starting with the A100, can be split into up to 7 physically discrete instances:
- memory is split into 8 equal-size segments
- compute units are also split into 8 segments, but only 7 are available to MIGs
- this implies an overhead of roughly 10% of the compute power

### Available MIG configurations

While there are many possible MIG configurations and profiles, the supported profiles are system dependent. For example, for the A100 GPU the list of MIG configurations is the following:

![alt text](./images/A100-migs.png "Title")

The profile name describes the size of the instance. For example, a 3g.20gb instance has 20 GB of GPU memory and offers 3/8 of the computing performance of a full GPU. To list all the MIG flavours (plus the names of the full-size GPUs) available on a given cluster, run the following command:

In [1]:
! sinfo -o "%G"|grep gpu|sed 's/gpu://g'|sed 's/,/\n/g'|cut -d: -f1|sort|uniq

1g.5gb
3g.20gb
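
The naming convention described above can be decoded programmatically. The following is a minimal sketch, not part of the course material: the helper `parse_mig_profile` and the `MigProfile` type are hypothetical names, and the compute share is expressed in eighths only because the text above uses that convention.

```python
import re
from typing import NamedTuple, Optional


class MigProfile(NamedTuple):
    compute_slices: int  # the "3" in 3g.20gb
    memory_gb: int       # the "20" in 3g.20gb


def parse_mig_profile(name: str) -> Optional[MigProfile]:
    """Parse a name such as '3g.20gb'; return None for anything else
    (for example the name of a full-size GPU)."""
    match = re.fullmatch(r"(\d+)g\.(\d+)gb", name)
    if match is None:
        return None
    return MigProfile(int(match.group(1)), int(match.group(2)))


# Profile names as printed by the sinfo command above
for name in ["1g.5gb", "3g.20gb"]:
    profile = parse_mig_profile(name)
    # Assumption: compute share counted in eighths, as in the text above
    print(f"{name}: {profile.memory_gb} GB of memory, "
          f"{profile.compute_slices}/8 of the compute")
```

Names that do not follow the `<N>g.<M>gb` pattern return `None`, so the same helper can be used to separate MIG profiles from full-size GPU names in the `sinfo` output.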


### What changes are required to switch to using MIGs?

You don't have to change anything in your code. You only need to request a MIG instead of a full GPU. For example, a request for an interactive job looks like this:

salloc --account=def-someuser --gpus=<font color='red'>**3g.20gb**</font>:1 --cpus-per-task=2 --mem=40gb --time=1:0:0

Pay attention to the profile name here, <font color='red'>**3g.20gb**</font>. It may be different on other clusters. For example, on Narval the batch submission script with a MIG request looks like this:

![alt text](./images/mig-script-narval.png "Title")

### Why should I use MIG instead of a full GPU?

- Using GPU instances is less wasteful
- Your usage is billed accordingly
- Jobs submitted on such instances use less of your allocated priority compared to a full GPU
- You will then be able to run more jobs and have shorter wait times

### How do I know whether I should choose MIG or a full GPU?

- Jobs that use less than half of the computing power of a full GPU and less than half of the available memory should be evaluated and tested on an instance. 
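
The rule of thumb above can be written as a small check. This is only a sketch: `should_try_mig` is a hypothetical helper, and the two fractions are assumed to come from your own measurements of a run on a full GPU (for example, utilization and memory usage as reported by `nvidia-smi`).

```python
def should_try_mig(compute_fraction: float, memory_fraction: float) -> bool:
    """Return True if a job is a candidate for a MIG instance.

    Both arguments are fractions of a full GPU, between 0 and 1.
    The job qualifies when it uses less than half of the compute
    AND less than half of the memory.
    """
    for value in (compute_fraction, memory_fraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must be between 0 and 1")
    return compute_fraction < 0.5 and memory_fraction < 0.5


# Hypothetical measurements, for illustration only
print(should_try_mig(0.30, 0.20))  # light on both: worth testing on a MIG
print(should_try_mig(0.30, 0.70))  # memory hungry: stay on a full GPU
```

A `True` result only means the job should be evaluated on an instance; the tests described in the next section still decide whether it performs well there.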

### After migrating to MIGs: test your code to make sure it performs well

Test your code by running it on different MIG profiles:
1. Check how much GPU memory your code requires
2. Run your code with the smallest possible MIG profile
3. If your code fails with an out-of-memory error, go to step 4
4. Run your code using a bigger MIG profile with more GPU memory
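
Once the memory requirement from step 1 is known, the choice of a starting profile can be sketched as below. The dictionary `PROFILES_GB` and the function `smallest_fitting_profile` are hypothetical; the two entries are simply the profiles mentioned in this lesson, with the memory size read from the profile name. The usable memory of an instance may be slightly lower than this nominal size, so treat the result as a starting point rather than a guarantee.

```python
# Assumption: profile name -> nominal GPU memory in GB, read from the name
PROFILES_GB = {"1g.5gb": 5, "3g.20gb": 20}


def smallest_fitting_profile(required_gb, profiles=PROFILES_GB):
    """Return the smallest profile whose nominal memory covers
    required_gb, or None if no MIG profile is large enough."""
    # Walk the profiles from the smallest to the largest memory size
    for name, memory_gb in sorted(profiles.items(), key=lambda item: item[1]):
        if memory_gb >= required_gb:
            return name
    return None  # nothing fits: a full GPU is needed


# Hypothetical memory requirements in GB
for required in (3.2, 12, 30):
    print(required, "->", smallest_fitting_profile(required))
```

If the selected profile still fails with an out-of-memory error, move to the next larger one, as in steps 3 and 4.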

### Exercise 3: Testing matrix multiplication code on different MIG profiles

Our virtual machine (VM) is built with Magic Castle and has the following MIG profiles available for our tests:<br>
<font color='red'>**1g.5gb**</font><br>
<font color='red'>**3g.20gb**</font>

We will be using a Terminal to modify the PyTorch code and submit sbatch jobs.<br>
1. To open a Terminal, select File->New Launcher->Terminal
2. In the terminal, go to ~/cq-formation-cip203-main
3. Then open the file matmul-mig.py and modify it if needed
4. Then open the submission script submit-mig.sh and modify it if needed

For the <font color='red'>**1g.5gb**</font> profile, run the script directly on the node:<br>
python ./matmul-mig.py

For the <font color='red'>**3g.20gb**</font> profile, submit the sbatch job:<br>
sbatch ./submit-mig.sh <br>

The result is written to the output file.

#### Tests to be performed:
1. Change the matrix size to make the problem more or less GPU-memory hungry
2. See if your code runs on the 1g.5gb profile. If not, try 3g.20gb. If it still fails, you may need the full GPU.
3. Run the same problem on the 1g.5gb and 3g.20gb profiles. Notice the difference in execution time. If the execution time is about the same, then 1g.5gb is all you need.
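
To plan these tests, it can help to estimate the memory footprint before submitting, and to fix in advance what "about the same" means. The sketch below makes several assumptions that are not taken from `matmul-mig.py`: the matrices are square, stored as 32-bit floats, and only the two inputs and the result are counted; the 10% tolerance is an arbitrary choice. Both function names are hypothetical.

```python
def matmul_memory_gib(n: int, bytes_per_element: int = 4) -> float:
    """Estimated memory in GiB for A, B and C = A @ B, all n x n.

    Assumption: float32 elements (4 bytes); workspace used by the
    framework is not counted, so the real usage will be higher.
    """
    return 3 * n * n * bytes_per_element / 2**30


def small_profile_is_enough(t_small: float, t_large: float,
                            tolerance: float = 0.10) -> bool:
    """True if the run on the small profile is at most `tolerance`
    slower than the run on the large profile (times in seconds)."""
    return t_small <= t_large * (1.0 + tolerance)


# Estimated footprint for a few matrix sizes
for n in (8192, 16384, 32768):
    print(f"n = {n}: about {matmul_memory_gib(n):.2f} GiB")

# Hypothetical timings in seconds, for illustration only
print(small_profile_is_enough(t_small=10.4, t_large=10.0))
print(small_profile_is_enough(t_small=25.0, t_large=10.0))
```

The estimate is a lower bound, so a size that is close to the limit of a profile may still run out of memory in practice.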

## Key Points

* **What is MIG**
* **MIG: full physical isolation**
* **Who profits from using MIGs**
* **Testing code for better MIG configuration**