![snap](https://media.giphy.com/media/xTiJ4cVWew0klLuY96/giphy.gif)

# Run training jobs on GPU

## What you will learn in this course 🧐🧐

As you gain experience in Deep Learning, you may need tools other than Google Colab to train your models. Colab might not always be free, and the GPUs it offers are among the cheapest and slowest on the market.

In this course, we will show you a simple way to run your training jobs on VMs equipped with NVIDIA GPUs, using AWS EC2 instances. You will learn how to:

* Launch a specific EC2 AMI
* Run Docker containers on GPUs
* Run a remote training job on GPUs


## **Step-1**: Choose Amazon EC2 AMI

Let's get started. The first thing you need to do is launch an EC2 instance from this specific AMI:

* *Deep Learning AMI (Amazon Linux 2) Version 56.0 - ami-0afac37ebdacee753*

![crack](https://full-stack-assets.s3.eu-west-3.amazonaws.com/Deployment/Deep_learning_AMI.png)

> 👋 AMI stands for **Amazon Machine Image**. It is a preconfigured template used to launch an EC2 instance, with a lot of presets that you won't have to set up yourself. The AMI above comes with the NVIDIA driver, Docker, etc. preinstalled. This is extremely useful because it saves you the pain of installing them yourself!

Follow the normal setup process (described in the course *Introduction to EC2*). Simply **make sure that SSH is enabled in your security group**.
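
If you prefer the command line to the console, the same launch can be sketched with the AWS CLI. This is only a sketch under assumptions that are not part of this course: the `g4dn.xlarge` instance type, the key pair name, the security group ID and the IP range are hypothetical placeholders that you must replace with your own values. Keep in mind that AMI IDs are region-specific, so the ID above may not exist in your region. By default the script only prints the commands; set `DRY_RUN=0` to actually run them.

```bash
# Sketch: launch the Deep Learning AMI from the AWS CLI.
# All values except AMI_ID are hypothetical placeholders.
AMI_ID="ami-0afac37ebdacee753"            # AMI from this course (region-specific)
INSTANCE_TYPE="g4dn.xlarge"               # assumption: a GPU instance type
KEY_NAME="my-key-pair"                    # hypothetical key pair name
SECURITY_GROUP_ID="sg-0123456789abcdef0"  # hypothetical security group ID
MY_IP_CIDR="203.0.113.10/32"              # hypothetical: your own public IP

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Allow SSH (TCP port 22) from your IP in the security group.
run aws ec2 authorize-security-group-ingress \
  --group-id "$SECURITY_GROUP_ID" \
  --protocol tcp --port 22 --cidr "$MY_IP_CIDR"

# Launch one instance from the AMI.
run aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --instance-type "$INSTANCE_TYPE" \
  --key-name "$KEY_NAME" \
  --security-group-ids "$SECURITY_GROUP_ID" \
  --count 1
```

Printing first makes it easy to review the exact command before any billable resource is created.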

## **Step-2**: Install `mlflow`

You can either do this when setting up your EC2 instance, in step *3. Configure Instance > User Data*, or SSH into your instance and run:

* `pip install mlflow` 
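
For the User Data option, a minimal script might look like the sketch below. User data runs as root the first time the instance boots. The file name `user-data.sh` is an arbitrary choice, and it is an assumption that `pip` on the AMI targets the Python environment you will train with, so check this on your instance.

```bash
# Write a minimal user data script to a local file (the file name is arbitrary).
cat > user-data.sh <<'EOF'
#!/bin/bash
# Runs as root on first boot: install mlflow.
# Assumption: `pip` on the AMI targets the Python you will use.
pip install mlflow
EOF

# Show the script so you can paste it into the User Data field.
cat user-data.sh
```

You can paste the content into the User Data field of the console, or pass the file to `aws ec2 run-instances` with `--user-data file://user-data.sh`.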

## **Step-3**: Run your project

Now this is where the magic begins! To run your training job on GPU, you simply need to specify a few additional arguments in your `mlflow run` command. Here it is:

```bash
# Replace GITHUB_URL with the URL of your project repository
mlflow run GITHUB_URL \
  -A gpus=all \
  -A runtime=nvidia
```

MLflow forwards both arguments, `-A gpus=all` and `-A runtime=nvidia`, to `docker run`. They tell Docker to give the container access to the GPUs of the host machine.
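
To make the `-A` arguments less magical: as far as the MLflow CLI documentation describes it, `-A name=value` is passed on as `docker run --name value`, and a bare `-A name` as `docker run --name`. This only has an effect when the project runs in a Docker environment. The `docker_flag` helper below is a hypothetical illustration of that mapping for long option names, not MLflow's actual code.

```bash
# Hypothetical helper: mimic how `-A name=value` maps to a `docker run` flag.
docker_flag() {
  case "$1" in
    *=*) echo "--${1%%=*} ${1#*=}" ;;  # name=value -> --name value
    *)   echo "--$1" ;;                # name       -> --name
  esac
}

docker_flag "gpus=all"        # prints: --gpus all
docker_flag "runtime=nvidia"  # prints: --runtime nvidia
```

In other words, the container is started roughly as if you had typed `docker run --gpus all --runtime nvidia ...` yourself.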

Now enjoy the power of GPU! 😉

![I've got the power](https://media.giphy.com/media/A9grgCQ0Dm012/giphy-downsized-large.gif)

---

## Troubleshooting 

You might run into some issues. Here are two leads to explore:

1. Make sure that your instance type actually comes with a GPU.

2. Verify that Docker can access the GPUs by running:
    * `sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi`

You should see output similar to the following (the driver version and GPU model may differ):

```bash
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
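
The two checks above can be preceded by a small preflight script run on the instance. This is a sketch: `check_tool` is a hypothetical helper name, and it only verifies that the commands are on the `PATH`, not that the GPU actually works, so the `docker run` test above is still needed.

```bash
# Hypothetical helper: report whether a command is available on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# nvidia-smi ships with the NVIDIA driver; docker is needed to run containers.
for tool in nvidia-smi docker; do
  check_tool "$tool" || echo "-> install or fix $tool before running mlflow"
done
```

If `nvidia-smi` is reported as missing, the instance most likely has no NVIDIA driver, which points back to the first lead.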

## Resources 📚📚

* [Install GPU support for Docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-amazon-linux)
* [Which EC2 instances have GPUs](https://aws.amazon.com/fr/ec2/instance-types/g4/)