# CIP203 - Maximizing GPU usage with MIGs, MPS, and Time-Slicing: 
## How to waste GPU cycles

**Questions**
* Why are GPU cycles wasted?
* What are the potential reasons that codes under-utilize GPUs?

**Objectives**
* Be aware of the GPU usage on the clusters
* Understand the problem of GPU under-utilization
* Become motivated to make your code more GPU-efficient

### Most common cases when GPUs are wasted on the Alliance clusters

1. Users only assume that they need a GPU. They are not sure whether their code supports GPUs at all, but request one anyway.
2. Users know that their code has GPU support, but do not know what that support covers or whether their problem is suitable for a GPU, and request one anyway.
3. Users have moved their calculations to the GPU and parallelized their code, but GPU utilization is low for various reasons (small data set, slow I/O, inadequate CPU performance, etc.).
4. GPU utilization is sporadic: the code is not written well, so the GPUs stay idle for very long periods (sometimes days).
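A quick way to tell whether a job falls into one of these cases is to look at the GPU utilization while it runs. The following is a minimal sketch, assuming `nvidia-smi` is available on the compute node. The helper names (`parse_gpu_report`, `looks_idle`, `query_gpus`), the 10% threshold, and the sample text are illustrative assumptions, not measurements from a real cluster.

```python
import subprocess

# nvidia-smi query that prints one CSV line per GPU, without header or units
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_report(text):
    """Parse lines of the form 'index, util_percent, mem_used_MiB, mem_total_MiB'."""
    gpus = []
    for line in text.strip().splitlines():
        index, util, used, total = [field.strip() for field in line.split(",")]
        gpus.append({
            "index": int(index),
            "util_percent": float(util),
            "mem_fraction": float(used) / float(total),
        })
    return gpus


def looks_idle(gpu, util_threshold=10.0):
    """Flag a GPU whose utilization is below an (arbitrary) threshold in percent."""
    return gpu["util_percent"] < util_threshold


def query_gpus():
    """Run nvidia-smi on the current node and parse its output (needs a GPU node)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_report(result.stdout)


# Illustrative sample text in the same format, so the sketch runs without a GPU
sample = "0, 3, 512, 40960\n1, 97, 30720, 40960\n"
report = parse_gpu_report(sample)
for gpu in report:
    status = "mostly idle" if looks_idle(gpu) else "busy"
    print(f"GPU {gpu['index']}: {gpu['util_percent']:.0f}% utilization, "
          f"{gpu['mem_fraction']:.0%} memory used ({status})")
```

On a real node, `query_gpus()` would replace the sample text. A single reading is only a snapshot, so sampling repeatedly over the lifetime of a job gives a more honest picture. Some fields may be reported as `[N/A]` on certain configurations, which this sketch does not handle.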

### Why are GPU cycles wasted?

1. **I/O bottlenecks**
   - slow data loading
   - data stalls
   - many small files
2. **CPU bottlenecks**
   - CPU-GPU communication (CPU can't keep up with the GPU's demand)
   - synchronous operations
3. **The problem is too small for the GPU**
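The effect of these bottlenecks can be illustrated with a toy model of a processing loop in which each step first prepares a batch (I/O and CPU work) and then processes it on the GPU. This is only a sketch under simplifying assumptions: constant per-step times, and the hypothetical function `gpu_utilization` and the timings below are made up for illustration.

```python
def gpu_utilization(t_load, t_gpu, overlapped):
    """Fraction of wall time the GPU is busy in a steady-state loop.

    t_load     -- time the CPU and I/O need to prepare one batch
    t_gpu      -- time the GPU needs to process that batch
    overlapped -- True if the next batch is prepared while the GPU is working,
                  False if loading and computing strictly alternate (synchronous)
    """
    if overlapped:
        step_time = max(t_load, t_gpu)   # the slower stage sets the pace
    else:
        step_time = t_load + t_gpu       # GPU waits during every load
    return t_gpu / step_time


# Illustrative timings in milliseconds (assumed, not measured)
cases = [
    ("slow I/O, synchronous", 9.0, 1.0, False),
    ("slow I/O, overlapped", 9.0, 1.0, True),
    ("balanced, synchronous", 1.0, 1.0, False),
    ("balanced, overlapped", 1.0, 1.0, True),
]
for label, t_load, t_gpu, overlapped in cases:
    u = gpu_utilization(t_load, t_gpu, overlapped)
    print(f"{label:24s} GPU busy {u:.0%} of the time")
```

In this model a small problem corresponds to a small `t_gpu`: the GPU finishes its share quickly and then waits for the rest of the pipeline. Overlapping loading with computation helps only until the loading time itself becomes the limit.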

### What is the solution?
<font color='blue'>**Increase the SIZE of the problem?**</font>

Use a higher resolution (e.g. in computational fluid dynamics), a larger batch size in deep learning, etc.

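According to Gustafson's law, increasing the problem size together with the number of computing cores can give much better scaling than keeping the problem size fixed: the parallel part of the work grows with the core count, while the serial part does not.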
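To make the difference concrete, the sketch below compares the scaled speedup of Gustafson's law, S(N) = s + (1 - s) N, with the fixed-size speedup of Amdahl's law, S(N) = 1 / (s + (1 - s) / N). The serial fraction s = 0.05 and the core counts are assumed values chosen purely for illustration.

```python
def amdahl_speedup(serial_fraction, n):
    """Fixed problem size: the serial fraction is measured on the serial run."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def gustafson_speedup(serial_fraction, n):
    """Problem size grows with n: the serial fraction is measured on the parallel run."""
    return serial_fraction + (1.0 - serial_fraction) * n


s = 0.05  # assumed serial fraction, for illustration only
for n in (1, 10, 100, 1000):
    print(f"N = {n:5d}   Amdahl: {amdahl_speedup(s, n):8.2f}   "
          f"Gustafson: {gustafson_speedup(s, n):8.2f}")
```

With a fixed problem size the speedup saturates near 1 / s, whereas the scaled speedup keeps growing almost linearly with N. The two laws define the serial fraction differently, so the numbers are a qualitative comparison rather than a like-for-like one.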

#### ... well... sometimes it's not the best solution!
Indeed, it does not make sense to increase the size of the data set just so that the GPU runs more efficiently. In deep learning applications this may mean increasing the batch size, but that often does not help.

### What else can we do?

One can move the calculations entirely to CPU cores, excluding GPUs from the setup completely. However, this often requires users to rewrite their code, which can be challenging.

### Another solution: why not <font color='red'>**share a single GPU**</font>, either between one user's processes or between different users?

#### Fortunately, NVIDIA provides several solutions to achieve this

- CUDA streams
- NVIDIA Multi-Instance GPU (MIG)
- Multi-Process Service (MPS)
- Time-Slicing

## Key Points

* **GPUs are wasted due to slow I/O, small problem size, etc.**
* **Increase efficiency = increase problem size?**
* **Instead, why not share GPUs?**