# Machine Learning on Compute Canada Systems

Now that we've seen how to do Machine Learning with scikit-learn and PyTorch, we'll look at the best practices for running ML code on Compute Canada systems. This tutorial is divided into two parts:

* Developing/testing/debugging your code

* Resources: Do you really need a GPU?


## Developing/testing/debugging your code

So you are just getting started exploring a new dataset, testing out different algorithms/model architectures and trying to figure out how you will predict some variable Y from a set of variables X. 

We recommend that, unless you are exploring a really huge dataset, you do the first steps outlined in the **scikit-learn** notebook on your own computer, or in an interactive session on Compute Canada with very low resource requests. You generally won't need more than a single CPU and a few GBs of memory to carry out some exploratory analysis, and maybe try out some simple algorithms to establish a performance baseline that you will try to improve on with more complex ones.

In fact, the same applies to more complicated models - unless you are trying to develop a HUGE model (as in, the model itself is so big that it does not fit in most laptops' memory), you should do development on your own computer, or in an interactive session on Compute Canada with low resource requests. By "doing development" we mean the process of figuring out the steps your code needs to perform, writing the (serial) code and making sure that, given a small input dataset, your code runs without errors.

Once your code runs without errors on a small dataset, you are ready to start thinking about tweaking its performance, parallelizing work and/or preparing to crunch large amounts of data on a Compute Canada cluster.

Before we do that, however, let's look at the best practices to follow if you decide to do your development in an interactive session on a Compute Canada cluster.

### Developing/Testing/Debugging on an Interactive Session

The first step is to log into a Compute Canada cluster with:
```shell

ssh -Y username@clustername.computecanada.ca

```

Here is what you should see after logging in:

![](./images/beluga.png)

This location is known as the **Login Node**: it is your gateway to the cluster and it is **NOT** where you should run your machine learning code! Instead, we will request **Compute Resources** on the actual cluster to do our work. In this example, imagine we are dealing with a very small dataset like the Iris dataset, so we will request a single CPU and 4 GB of memory for an hour with:

```shell

salloc --time=1:0:0 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=4096M

```

Here is what you will see after running this:

![](./images/salloc.png)

Now you are on a **Compute Node** - this is the ideal place to do your work!

The first thing we will do is set up a **Python Virtual Environment**. With a **Python Virtual Environment** you will be able to install your own Python libraries to use in your code. To get this done, run the following commands:

```shell

module load python/3.6

virtualenv --no-download $SLURM_TMPDIR/my_virtual_environment

```

Here is what you should see after running this command:

![](./images/virtualenv.png)

What we did here is: load Python as a module, so that we can run the ```python``` command on the command line and execute Python scripts; then we created a virtual environment at a special location called **$SLURM_TMPDIR**. This special location allows you to write to the local disk of the **Compute Node**. We will also write our data to this location later on.

Now that we've created our **Python Virtual Environment** we are ready to install our libraries and start writing our code. Here is how you do that:

First, activate your **Python Virtual Environment:**

```shell
source $SLURM_TMPDIR/my_virtual_environment/bin/activate
```

Here's what you should see after activating your virtual environment:

![](./images/start_venv.png)

Note how the name of your virtual environment is now showing up on the left-hand side of the shell prompt. This means you are now inside your virtual environment.

Then you can install your libraries with:

```shell
pip install --no-index <libraryname>
```

Notice the use of the ```--no-index``` option. If you had simply typed ```pip install <libraryname>```, then ```pip``` would have searched for your library on the internet. **Compute Nodes** are not connected to the internet, however, so this command would fail. Adding the ```--no-index``` option tells ```pip``` not to search for your library on the internet and to look for it in Compute Canada's internal repository of Python libraries (called a Wheelhouse) instead.

If the library you are attempting to install does not exist on Compute Canada's wheelhouse, write to **support@computecanada.ca** and let us know. If you are in a hurry, you can follow the steps above to create a virtual environment on the **Login Node** (i.e., before running the ```salloc``` step), where there is a connection to the internet and you should be able to install your library with ```pip install <libraryname>```. Then once on a **Compute Node**, activate the virtual environment and try to use your library normally.
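
A quick way to check that a library actually made it into your virtual environment is to ask Python whether it can find it. Here is a minimal sketch using only the standard library; the helper name `library_is_available` is our own and is not part of pip or virtualenv. Keep in mind that the name you import can differ from the name you install (for example, `scikit-learn` is imported as `sklearn`):

```python
import importlib.util


def library_is_available(name):
    """Return True if the top-level module `name` can be imported."""
    # find_spec looks the module up without actually importing it.
    return importlib.util.find_spec(name) is not None


# The module names below are only examples; list the ones your code needs.
for name in ["json", "sklearn", "torch"]:
    status = "found" if library_is_available(name) else "MISSING"
    print(f"{name}: {status}")
```

Running this right after activating your virtual environment on the **Compute Node** tells you early on whether anything is missing, before you start a long piece of work.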

Once your libraries have been installed, you can copy your data from wherever it is located (we are assuming you have already loaded your data on the **Login Node** either in /home, /project or /scratch) to ```$SLURM_TMPDIR``` with:

```shell
mkdir $SLURM_TMPDIR/work
cp /path/to/dataset $SLURM_TMPDIR/work
```

Copying your data to the local disk on the **Compute Node** is an especially good idea if you are developing on an image dataset. These datasets are usually made up of a large number of small files, even when the total size of the dataset on disk is small. Moving them to the local disk will make loading them from inside your code a lot more efficient, and it is a good habit to acquire for when you start running experiments on actual huge datasets!
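
If you would rather do this copy from inside your Python code, the sketch below shows one possible way. It assumes the `SLURM_TMPDIR` environment variable is set (as it is inside a job) and falls back to the system temporary directory otherwise, so that it also runs on your own computer. The function name `stage_dataset` and the `work` folder name are just illustrative choices:

```python
import os
import shutil
import tempfile
from pathlib import Path


def stage_dataset(source, workdir_name="work"):
    """Copy a dataset file or folder to node-local storage; return the new path."""
    # Inside a job SLURM_TMPDIR points to the local disk of the Compute Node.
    # Elsewhere we fall back to the system temporary directory.
    local_root = Path(os.environ.get("SLURM_TMPDIR", tempfile.gettempdir()))
    workdir = local_root / workdir_name
    workdir.mkdir(parents=True, exist_ok=True)

    source = Path(source)
    target = workdir / source.name
    if target.exists():
        # Already staged earlier in this job; nothing to do.
        return target
    if source.is_dir():
        shutil.copytree(str(source), str(target))
    else:
        shutil.copy2(str(source), str(target))
    return target
```

Your code can then read from the path returned by `stage_dataset("/path/to/dataset")` instead of reading from /home, /project or /scratch directly.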

## Developing your code on JupyterHub

Another, presumably easier, option to use Compute Canada to develop your ML code is **JupyterHub on Beluga**. You can access this at: <a href=https://jupyterhub.beluga.computecanada.ca>https://jupyterhub.beluga.computecanada.ca</a>
    
Use your Compute Canada credentials to log in:
    
![](./images/JH_beluga.png)
    

Then keep your **resource requests low**... and **NO GPUs** at this point!
    
![](./images/JH_options.png) 
    
    
To install your libraries on a JupyterHub session, run a cell with:
    
```python
   !pip install --no-index <libraryname> 
```
    
Then restart the Python kernel and you will be able to import the libraries you just installed.
    
Before you start working, open a terminal tab on JupyterHub by clicking on "file > new > terminal" and follow the steps outlined above to move your data to the local disk of the **Compute Node** where your notebook will be running.
    
Now that you've seen options to develop your code, let's turn to evaluating whether or not your code would benefit from requesting a larger amount of resources! 
    
## Resources: GPU or no GPU?
    
So you've finished writing your code and you've made sure it runs without errors on a small subset of your potentially large dataset. Now it is time to look into making your code more performant and asking whether or not you can achieve this by making more resources available to it!
    
Let's start by answering the question "Do you need a GPU?"
    
As we've warned in the **PyTorch** notebook, sometimes attempting to use one or more GPUs can introduce overhead that results in your code taking *more* time, not *less*, to run. That happens whenever you try to use a GPU to train:
    
a) a model that is so small that the time it would take to load it on a GPU and then train it is greater than the time a CPU would take to train it.

b) a model on a small dataset.
    
Here are some quick rules-of-thumb to check if your work falls in one of these two categories:
    
a) your model has fewer than several hundred thousand trainable parameters.

b) the examples in your dataset, when combined into your desired batch size (or even the entire dataset), add up to just a few MBs.
    
If you think your code falls in one of these categories, then you should not use a GPU.
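
To make these two rules-of-thumb concrete, here is a rough sketch that estimates both quantities for a small fully connected network. Everything in it is an assumption made for illustration: the layer sizes `[4, 50, 3]` are a made-up Iris-sized network (not necessarily the one from the **PyTorch** notebook), and the two cut-off constants are simply our own stand-ins for "several hundred thousand parameters" and "a few MBs":

```python
def count_mlp_parameters(layer_sizes):
    """Count the weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first, e.g. [4, 50, 3].
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


def batch_size_in_mb(batch_size, values_per_example, bytes_per_value=4):
    """Size of one batch in MB, assuming 32-bit floats by default."""
    return batch_size * values_per_example * bytes_per_value / 1e6


# Hypothetical cut-offs; adjust them to your own judgement.
MIN_PARAMETERS = 300_000
MIN_BATCH_MB = 5.0


def probably_too_small_for_gpu(n_parameters, batch_mb):
    """True if the workload falls into either of the two categories."""
    return n_parameters < MIN_PARAMETERS or batch_mb < MIN_BATCH_MB


n_params = count_mlp_parameters([4, 50, 3])   # made-up Iris-sized network
batch_mb = batch_size_in_mb(batch_size=150, values_per_example=4)
print(n_params, batch_mb, probably_too_small_for_gpu(n_params, batch_mb))
```

With these made-up numbers the model has only a few hundred parameters and the whole dataset fits in a tiny fraction of a MB, so the check reports that it is probably too small for a GPU.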
        
That being said, these are just rules-of-thumb. The best way to verify whether or not your code could benefit from a GPU is to profile it. Let's look at an example profiling our code from the **PyTorch** notebook that trains a neural network on the Iris Dataset to see what that means:
    
    
First we use the ```%%timeit``` cell magic to get the average execution time of the code block below using a GPU:
    
![](./images/profiling_1.png) 
    
    
Then we time the execution of the same block using only the CPU:
    
![](./images/profiling_2.png) 
    
    
As you can see, in our case training on the CPU is actually faster than on a GPU. This is a strong indication that this model will not benefit from using a GPU.
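
The ```%%timeit``` magic only works inside a notebook. If you want to run the same comparison from a plain Python script, a small timing helper like the sketch below is one option. The helper names are our own. The `sync` argument matters for GPU code: CUDA work is launched asynchronously, so you would pass `torch.cuda.synchronize` there to make sure the clock only stops once the GPU has actually finished:

```python
import time


def mean_runtime(fn, repeats=5, sync=None):
    """Average wall-clock seconds per call of fn over `repeats` calls.

    sync is an optional callable run right before each clock reading
    (for GPU code, pass torch.cuda.synchronize).
    """
    fn()  # warm-up call, not timed (absorbs one-off setup costs)
    total = 0.0
    for _ in range(repeats):
        if sync is not None:
            sync()  # make sure earlier work is finished before starting
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for this call's work to finish
        total += time.perf_counter() - start
    return total / repeats


def gpu_speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU run is (values below 1 mean slower)."""
    return cpu_seconds / gpu_seconds
```

Assuming you have wrapped your training code in two functions of your own, say `train_on_cpu` and `train_on_gpu` (hypothetical names), you would compare `mean_runtime(train_on_cpu)` with `mean_runtime(train_on_gpu, sync=torch.cuda.synchronize)` and feed both results to `gpu_speedup`.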
    
Another test you can run to decide whether or not your code would benefit from a GPU is shown below, using PyTorch's ```profiler``` to compute how much time in total both the CPU and the GPU spend running your code:
    
![](./images/profiling_3.png) 
    

The output of the profiler is a potentially very long table describing the CPU and GPU times of each operation in your code. The result we are interested in is at the very bottom:
    
![](./images/profiling_out.png) 
    
    
This tells us that the CPU and the GPU spend a similar amount of time doing work on our code. Ideally, we would like to see most of the time being spent on the GPU as this would mean it is doing most of the work. In our case, the similar times are another indication that this code would not use a GPU efficiently.
    
Finally, you can check your code's GPU utilisation by opening a terminal tab on JupyterHub and running the ```nvidia-smi``` command *while your Python code is running on the other tab*:
    
![](./images/nvidia-smi.png) 

This shows us that GPU utilisation is only around 19%, with only about 1 GB of its 16 GB of memory in use. This is another indication that our code does not use a GPU efficiently.
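
If you prefer to collect these numbers from Python rather than read them off the screen, ```nvidia-smi``` can print them as plain CSV through its ```--query-gpu``` and ```--format``` options. The sketch below parses that output; the function names and dictionary keys are our own choices, and the sample line in the usage example is made up rather than taken from a real run:

```python
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_stats(text):
    """Turn the CSV text into one dict per GPU (memory values in MiB)."""
    stats = []
    for line in text.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        stats.append({"utilisation_pct": util,
                      "memory_used_mib": used,
                      "memory_total_mib": total,
                      "memory_pct": 100.0 * used / total})
    return stats


def query_gpu_stats():
    """Run nvidia-smi on the current node; only works where a GPU is present."""
    output = subprocess.run(QUERY, stdout=subprocess.PIPE, check=True,
                            universal_newlines=True).stdout
    return parse_gpu_stats(output)


# Made-up sample line in the format the query above produces.
print(parse_gpu_stats("19, 1024, 16384\n"))
```

Calling `query_gpu_stats()` a few times while your training code is running gives you a rough picture of how busy the GPU is over time.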

In summary, you should run these three tests using a **small sample from your dataset** to get a sense of whether or not you should consider using a GPU when running your code on your full dataset. Consider using a GPU if:
    
a) Execution is **at the very least** more than 2x faster on the GPU than on the CPU.

b) In the profiler output, total GPU time is **considerably longer** than total CPU time.

c) GPU utilisation % and/or GPU memory usage are high.
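
As a way of keeping track of the three tests, you could collect your measurements in a small helper like the sketch below. The 2x speedup threshold comes from criterion a); the other two thresholds (`min_time_ratio` and `min_utilisation_pct`) are hypothetical placeholders for "considerably longer" and "high", and the numbers in the usage example are made up:

```python
def gpu_checks(cpu_seconds, gpu_seconds,
               profiler_cpu_total, profiler_gpu_total,
               utilisation_pct,
               min_speedup=2.0, min_time_ratio=2.0,
               min_utilisation_pct=50.0):
    """Evaluate the three criteria and return them as a dict of booleans.

    min_time_ratio and min_utilisation_pct are hypothetical placeholders.
    """
    return {
        "a) faster on GPU": cpu_seconds / gpu_seconds > min_speedup,
        "b) GPU does most of the work":
            profiler_gpu_total / profiler_cpu_total >= min_time_ratio,
        "c) GPU is kept busy": utilisation_pct >= min_utilisation_pct,
    }


# Made-up measurements for a small model that does not suit a GPU.
results = gpu_checks(cpu_seconds=1.2, gpu_seconds=1.9,
                     profiler_cpu_total=0.9, profiler_gpu_total=1.0,
                     utilisation_pct=19.0)
for criterion, passed in results.items():
    print(criterion, "->", "yes" if passed else "no")
```

The helper deliberately reports each criterion separately instead of giving a single verdict, since how much weight to give each test is a judgement call.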
    
    
If your model is big but the tests above still show poor GPU efficiency, try tweaking your code to use the GPU more efficiently and increase your performance gains. You can try:
    
* Increasing the batch size.

* Increasing the number of workers on your DataLoader (the number of CPUs you request should be 1 more than the number of workers you wish to use).

* Moving your data to $SLURM_TMPDIR, in case you forgot to do it.
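
For the DataLoader tip, one way to avoid hard-coding the number of workers is to derive it from the resources you were actually given. The sketch below assumes Slurm has set the `SLURM_CPUS_PER_TASK` environment variable (which it does when you pass ```--cpus-per-task```); the function name `dataloader_workers` is our own:

```python
import os


def dataloader_workers(default_cpus=1):
    """Number of DataLoader workers to use: allocated CPUs minus one."""
    # SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested.
    cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", default_cpus))
    # Keep one CPU for the main training process; never go below zero.
    return max(cpus - 1, 0)


print(dataloader_workers())
```

You would then pass the result as the `num_workers` argument when creating your DataLoader. With a single CPU the function returns 0, which tells the DataLoader to load data in the main process.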
    

The reasoning for deciding whether or not to use multiple GPUs is similar, and you can run the tests above in an interactive session where you request multiple GPUs. **Make sure your code is GPU efficient on ONE GPU before trying it on multiple GPUs!**

