# CIP203 - Maximizing GPU usage with MIGs, MPS, and Time-Slicing: 
## Time-Slicing

**Questions**
* What is Time-Slicing technology?

**Objectives**
* Become familiar with the concept of NVIDIA time-slicing
* Learn the differences between MIG, MPS, and time-slicing

### What is Time-Slicing?

**Definition**
- Time-slicing is a software-based technique that allows GPU compute resources to be shared across multiple virtual GPUs (vGPUs). With time-slicing, a physical GPU executes only one vGPU task at a time. The GPU scheduler assigns each vGPU a slice of time in which it can execute, while the other vGPUs wait in a queue for their turn.
- This lets multiple tasks make progress on the same GPU by sharing its computational power in time-sequenced intervals. The tasks appear to run concurrently, although only one executes at any given instant.
- Time-slicing is designed to manage GPU over-subscription efficiently. It relies on CUDA time-slicing, which lets over-subscribed workloads interleave on the GPU and take turns using its resources.


**Mechanism**
- The GPU's resources are shared by rapidly switching between different tasks, giving each one a short period to execute before moving on to the next.
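
The switching described above can be pictured with a small round-robin simulation. This is only a sketch: the task names, their workloads, and the 2 ms slice length are made-up values, `time_slice` is a hypothetical helper, and the real GPU scheduler (including its context-switch overhead) is not modelled.

```python
from collections import deque

def time_slice(tasks, slice_ms):
    """Round-robin schedule for a single GPU.

    tasks maps a task name to its remaining work in milliseconds.
    Returns the timeline as a list of (name, start_ms, end_ms) tuples.
    """
    queue = deque(tasks.items())
    timeline = []
    clock = 0
    while queue:
        name, remaining = queue.popleft()
        run = min(slice_ms, remaining)          # run for at most one slice
        timeline.append((name, clock, clock + run))
        clock += run
        if remaining > run:                     # unfinished: back of the queue
            queue.append((name, remaining - run))
    return timeline

# Illustrative workloads (assumed values, not measurements)
timeline = time_slice({"vgpu-a": 5, "vgpu-b": 3, "vgpu-c": 4}, slice_ms=2)
for name, start, end in timeline:
    print(f"{start:2d}-{end:2d} ms  {name}")
```

Each task gets the whole GPU during its slice, and no two entries in the timeline overlap, which is the defining property of time-slicing.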

**Purpose**
- It gives GPU access to workloads that need the GPU only intermittently, and to environments where many tasks must share it, similar to how an operating system time-slices CPU access between processes.

### Characteristics of Time-Slicing

- No hardware partitioning: All jobs share the same GPU memory and compute resources without dedicated isolation.
- Higher user density: Supports many users by quickly switching between jobs.
- Limited isolation: Workloads can impact each other through memory contention or delayed scheduling.
- Use case: Suitable for bursty, low-priority tasks or general-purpose GPU access where absolute performance isolation is unnecessary.
- Broad hardware support: Time-slicing can also extend GPU sharing to older GPU generations that do not support MIG.

![Time-slicing diagram](./images/time-slicing-strict.png "Time-slicing")

### What's the most obvious example of time-slicing?

The most familiar case is multiple Python threads (e.g. in PyTorch) targeting the same GPU. Because the threads originate from the same application, the GPU scheduler executes their tasks from the same CUDA context. However, without MPS those thread calls are serialized (in other words, time-sliced) and gain no real concurrency.
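
The serialization can be illustrated with a toy model that uses only the Python standard library. Here a `threading.Lock` stands in for the single GPU and `time.sleep` stands in for a kernel; both are assumptions made for illustration, not CUDA or PyTorch calls, and the thread count and timings are arbitrary.

```python
import threading
import time

gpu = threading.Lock()   # stand-in for the single GPU (not a CUDA API)
log = []                 # (thread name, start, end) for each pretend kernel

def worker(name, n_kernels, kernel_s=0.01):
    for _ in range(n_kernels):
        with gpu:                          # only one thread holds the GPU at a time
            start = time.perf_counter()
            time.sleep(kernel_s)           # pretend a kernel is running
            log.append((name, start, time.perf_counter()))

threads = [
    threading.Thread(target=worker, args=(f"thread-{i}", 3))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Count kernels that started before the previous one had finished
log.sort(key=lambda entry: entry[1])
overlaps = sum(1 for prev, cur in zip(log, log[1:]) if cur[1] < prev[2])
print(f"{len(log)} kernels ran, {overlaps} overlapped")
```

All nine pretend kernels complete, but none of them overlap: the threads take turns on the shared resource instead of running side by side.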

### When is Time-Slicing most applicable?

1. vGPU (Virtual GPU) environments
2. Kubernetes clusters
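
For the Kubernetes case, time-slicing is commonly enabled through the sharing configuration of the NVIDIA device plugin. The sketch below mirrors the shape of that configuration as a Python dictionary (the plugin itself reads it as YAML) and estimates how many schedulable GPUs a node would then advertise. The `replicas` value of 4 and the two physical GPUs are illustrative assumptions, and `advertised_gpus` is a hypothetical helper, not part of any NVIDIA tool.

```python
# Shape of a time-slicing config for the NVIDIA Kubernetes device plugin,
# written as a Python dict. The values are assumed for illustration.
time_slicing_config = {
    "version": "v1",
    "sharing": {
        "timeSlicing": {
            "resources": [
                {"name": "nvidia.com/gpu", "replicas": 4},
            ]
        }
    },
}

def advertised_gpus(config, physical_gpus):
    """Hypothetical helper: resources a node would advertise per resource name.

    Each physical GPU is exposed as `replicas` schedulable GPUs.
    """
    resources = config["sharing"]["timeSlicing"]["resources"]
    return {r["name"]: physical_gpus * r["replicas"] for r in resources}

advertised = advertised_gpus(time_slicing_config, physical_gpus=2)
print(advertised)
```

The replicas are not isolated slices of the hardware: containers scheduled onto replicas of the same physical GPU share its memory and compute and take turns executing.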

## Key Points

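* **Time-slicing is a software-based technique that shares a physical GPU by giving each workload a short turn to execute**
* **Time-slicing manages GPU over-subscription and extends GPU sharing to older GPUs that do not support MIG**
* **Bursty, low-priority workloads that do not need strict performance isolation benefit most from time-slicing**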
* **Multiple Python threads running on the same GPU are also Time-Sliced**
* **Time-Slicing is mostly used when multiple containers share the same GPU (Kubernetes)**