# Run Single GPU-Based PyTorch Experiments on Azure Machine Learning Service

## Demo Data: Dog Breed Classification

Have you ever seen a dog and not been able to tell the breed? Some dogs look so similar that telling them apart is nearly impossible. For instance, these breeds are difficult to tell apart:

#### Alaskan Malamutes vs Siberian Huskies
![Image of Alaskan Malamute vs Siberian Husky](http://cdn.akc.org/content/article-body-image/malamutehusky.jpg)

#### Whippet vs Italian Greyhound 
![Image of Whippet vs Italian Greyhound](http://cdn.akc.org/content/article-body-image/whippetitalian.jpg)

Sites like http://what-dog.net use Microsoft Cognitive Services to make this easier.

In this tutorial, you will learn how to train a PyTorch image classification model using transfer learning with the Azure Machine Learning service. The Azure Machine Learning Python SDK's [PyTorch estimator](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-pytorch) enables you to easily submit PyTorch training jobs for both single-node and distributed runs on Azure compute. The model is trained to classify dog breeds using the [Stanford Dog dataset](http://vision.stanford.edu/aditya86/ImageNetDogs/) and is based on a pretrained ResNet18 model, which was built using images and annotations from ImageNet. The Stanford Dog dataset contains 120 classes (i.e., dog breeds). To save time, however, most of the tutorial uses a subset of this dataset that includes only 10 dog breeds.

## What is Azure Machine Learning service?
Azure Machine Learning service is a cloud service that you can use to develop and deploy machine learning models. Using Azure Machine Learning service, you can track your models as you build, train, deploy, and manage them, all at the broad scale that the cloud provides.
![](img/aml-overview.png)


## How can we use it for training image classification models?
Training machine learning models, particularly deep neural networks, is often a time- and compute-intensive task. Once you've finished writing your training script and running it on a small subset of data on your local machine, you will likely want to scale up your workload.

To facilitate training, the Azure Machine Learning Python SDK provides a high-level abstraction, the estimator class, which allows users to easily train their models in the Azure ecosystem. You can create and use an Estimator object to submit any training code you want to run on remote compute, whether it's a single-node run or distributed training across a GPU cluster. For PyTorch and TensorFlow jobs, Azure Machine Learning also provides respective custom PyTorch and TensorFlow estimators to simplify using these frameworks.

## Prerequisites:

1. Create an Azure Machine Learning Service (AMLS) workspace in a new Resource Group in Azure. This will also create some other resources, including a Storage Account, which you will use to house the data.
2. Create a Notebook VM in the AMLS workspace to run this Jupyter Notebook. (_Note: You can create a small Notebook VM without a GPU as you will send your experiments to a remote GPU cluster._)
3. Clone this tutorial's git repository onto your Notebook VM.

## Setup

1. Download the [dogbreeds data](https://github.com/heatherbshapiro/pycon-canada/) to your local machine. (_Note: For this tutorial, you only need the data in the `breeds-10` folder_)
2. Navigate to the datastore (Blob Storage Account) that was created with your AMLS workspace. It will be in the same Resource Group as your AMLS workspace and will have the name of your AMLS workspace and then some additional letters after it in its name. (_For example: If my AMLS workspace is called "myamls", then my storage account might be named something like "myamls6357236127"._)
3. Within this Storage Account, navigate to the Blobs, then to the container whose name begins with "code-\*". (_For example: the container might be named something like "code-b0789435-d8ce-459c-8421-e1f8cea47210"._) This is the mounted storage that your Notebook VM is using as its storage.
4. Navigate to the `Users` folder and upload the entire `breeds-10` folder that you previously downloaded. This will take quite some time as it is uploading ~1700 files.

## Let's Begin!


### Create a Workspace for the Experiment
(_You will be asked to log in during this step. Please use your Microsoft AAD credentials._)

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()

print('https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource' + ws.get_details()['id'])

Loading the workspace can take a few minutes, so let's talk about what a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) is while we wait.
![](aml-workspace.png)
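
`Workspace.from_config()` reads the workspace coordinates from a `config.json` file, which a Notebook VM normally provides for you. If you run the notebook somewhere else, a minimal sketch for writing that file yourself might look like the following. The helper name `write_workspace_config` and the values passed to it are illustrative assumptions and are not part of this tutorial's repository.

```python
import json
from pathlib import Path


def write_workspace_config(root, subscription_id, resource_group, workspace_name):
    """Write a workspace config file to <root>/.azureml/config.json.

    Hypothetical helper for illustration; the caller supplies the real values.
    """
    config_dir = Path(root) / ".azureml"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.json"
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    config_path.write_text(json.dumps(config, indent=4))
    return config_path
```

Calling `write_workspace_config('.', '<subscription-id>', '<resource-group>', 'myamls')` from the notebook's folder should produce a file that `Workspace.from_config()` can find.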

### Create an `outputs` directory for the model outputs.

In [None]:
!mkdir -p outputs

## Create a Remote Compute Target
For this tutorial, we will create an AML Compute cluster of NC-series, GPU-based machines to use as the [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) on which to execute your training script.

**Creation of the cluster takes approximately 5 minutes, but you do not have to wait for it to complete before proceeding**

If the cluster is already in your workspace, this code will skip the cluster creation process. Note that the code is not waiting for completion of the cluster creation.

In [None]:
from azureml.core.compute import AmlCompute, ComputeTarget

# choose a name for your cluster
cluster_name = "k80cluster"

try:
    compute_target = ws.compute_targets[cluster_name]
    print('Found existing compute target.')
except KeyError:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_NC6', 
                                                           idle_seconds_before_scaledown=1800,
                                                           min_nodes=0, 
                                                           max_nodes=4)


    # create the cluster; provisioning continues in the background, so we do
    # not block on compute_target.wait_for_completion(show_output=True) here
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

__Note: You can create clusters with better GPUs. Simply replace the `vm_size` parameter above with one of the following.__

|        VM Size        |         CPU Type         | Number of vCPUs | Memory (GB) |      GPU Type     | Number of GPUs | GPU Memory (GB) |
|:------------------:|:------------------------:|:---------------:|:-----------:|:-----------------:|:--------------:|:---------------:|
| Standard_NC6       | Intel Xeon E5-2690 v3    | 6               | 56          | NVIDIA Tesla K80  | 1              | 12              |
| Standard_NC12      | Intel Xeon E5-2690 v3    | 12              | 112         | NVIDIA Tesla K80  | 2              | 24              |
| Standard_NC24      | Intel Xeon E5-2690 v3    | 24              | 224         | NVIDIA Tesla K80  | 4              | 48              |
| Standard_NC24r     | Intel Xeon E5-2690 v3    | 24              | 224         | NVIDIA Tesla K80  | 4              | 48              |
| Standard_NC6s_v2   | Intel Xeon E5-2690 v4    | 6               | 112         | NVIDIA Tesla P100 | 1              | 16              |
| Standard_NC12s_v2  | Intel Xeon E5-2690 v4    | 12              | 224         | NVIDIA Tesla P100 | 2              | 32              |
| Standard_NC24s_v2  | Intel Xeon E5-2690 v4    | 24              | 448         | NVIDIA Tesla P100 | 4              | 64              |
| Standard_NC24rs_v2 | Intel Xeon E5-2690 v4    | 24              | 448         | NVIDIA Tesla P100 | 4              | 64              |
| Standard_NC6s_v3   | Intel Xeon E5-2690 v4    | 6               | 112         | NVIDIA Tesla V100 | 1              | 16              |
| Standard_NC12s_v3  | Intel Xeon E5-2690 v4    | 12              | 224         | NVIDIA Tesla V100 | 2              | 32              |
| Standard_NC24s_v3  | Intel Xeon E5-2690 v4    | 24              | 448         | NVIDIA Tesla V100 | 4              | 64              |
| Standard_NC24rs_v3 | Intel Xeon E5-2690 v4    | 24              | 448         | NVIDIA Tesla V100 | 4              | 64              |
| Standard_ND40s_v2  | Intel Xeon Platinum 8168 | 40              | 672         | NVIDIA Tesla V100 | 8              | 128             |
| Standard_ND6s      | Intel Xeon E5-2690 v4    | 6               | 112         | NVIDIA Tesla P40  | 1              | 24              |
| Standard_ND12s     | Intel Xeon E5-2690 v4    | 12              | 224         | NVIDIA Tesla P40  | 2              | 48              |
| Standard_ND24s     | Intel Xeon E5-2690 v4    | 24              | 448         | NVIDIA Tesla P40  | 4              | 96              |
| Standard_ND24rs    | Intel Xeon E5-2690 v4    | 24              | 448         | NVIDIA Tesla P40  | 4              | 96              |
| Standard_NV6       | Intel Xeon E5-2690 v3    | 6               | 56          | NVIDIA Tesla M60  | 1              | 8               |
| Standard_NV12      | Intel Xeon E5-2690 v3    | 12              | 112         | NVIDIA Tesla M60  | 2              | 16              |
| Standard_NV24      | Intel Xeon E5-2690 v3    | 24              | 224         | NVIDIA Tesla M60  | 4              | 32              |
| Standard_NV12s_v3  | Intel Xeon E5-2690 v4    | 12              | 112         | NVIDIA Tesla M60  | 1              | 8               |
| Standard_NV24s_v3  | Intel Xeon E5-2690 v4    | 24              | 224         | NVIDIA Tesla M60  | 2              | 16              |
| Standard_NV48s_v3  | Intel Xeon E5-2690 v4    | 48              | 448         | NVIDIA Tesla M60  | 4              | 32              |

For more information, see [GPU optimized virtual machine sizes](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu).
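
If you want to choose a `vm_size` programmatically instead of reading the table by eye, a small lookup like the one below could help. It is only a sketch: the `GpuVmSize` type, the `pick_vm_size` helper, and the handful of rows copied from the table are illustrative assumptions, and availability of each size still depends on your subscription and region.

```python
from typing import NamedTuple, Optional


class GpuVmSize(NamedTuple):
    name: str
    gpu_type: str        # short GPU model name, e.g. "K80"
    gpu_count: int
    gpu_memory_gb: int   # total GPU memory across all GPUs


# A few rows copied from the table above (not an exhaustive list).
VM_SIZES = [
    GpuVmSize("Standard_NC6", "K80", 1, 12),
    GpuVmSize("Standard_NC6s_v2", "P100", 1, 16),
    GpuVmSize("Standard_NC6s_v3", "V100", 1, 16),
    GpuVmSize("Standard_NC12s_v3", "V100", 2, 32),
    GpuVmSize("Standard_ND6s", "P40", 1, 24),
    GpuVmSize("Standard_NV6", "M60", 1, 8),
]


def pick_vm_size(min_gpus: int = 1,
                 min_gpu_memory_gb: int = 0,
                 gpu_type: Optional[str] = None) -> Optional[str]:
    """Return the smallest listed VM size that meets the requirements, or None."""
    candidates = [
        vm for vm in VM_SIZES
        if vm.gpu_count >= min_gpus
        and vm.gpu_memory_gb >= min_gpu_memory_gb
        and (gpu_type is None or vm.gpu_type == gpu_type)
    ]
    if not candidates:
        return None
    # "Smallest" here means fewest GPUs first, then least GPU memory.
    smallest = min(candidates, key=lambda vm: (vm.gpu_count, vm.gpu_memory_gb))
    return smallest.name
```

The returned string can be passed as the `vm_size` argument of `AmlCompute.provisioning_configuration`, for example `vm_size=pick_vm_size(gpu_type="V100")`.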

## Attach the blobstore with the training data to the workspace
While the cluster is still being created, let's attach some data to our workspace.

The dataset we will use contains roughly 150 images per class, although some breeds have more and others fewer. Each class has about 100 training images and about 50 validation images. We will look at 10 classes in this tutorial.
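
To check these numbers against your own copy of the data, you could count the images in each class folder. The sketch below assumes a layout of `breeds-10/<split>/<breed>/<image>` (for example `breeds-10/train/...` and `breeds-10/val/...`); that layout and the helper name `count_images_per_class` are assumptions, so adjust them to match the folder you downloaded.

```python
from pathlib import Path

# File extensions treated as images (an assumption; extend as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(data_dir):
    """Return {split: {class_name: image_count}} for a <split>/<class>/ tree."""
    counts = {}
    for split_dir in sorted(Path(data_dir).iterdir()):
        if not split_dir.is_dir():
            continue
        counts[split_dir.name] = {
            class_dir.name: sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts
```

For example, `count_images_per_class('breeds-10')` returns one dictionary per split, which makes it easy to spot classes with unusually few images.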

To make the data accessible for remote training, you will need to keep the data in the cloud. AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data, and interact with it from your remote compute targets. It is an abstraction over Azure Storage. The datastore can reference either an Azure Blob container or Azure file share as the underlying storage. 

You can view the subset of the data used [here](https://github.com/heatherbshapiro/pycon-canada/tree/master/breeds-10), or download it as a zip file from [here](https://github.com/heatherbshapiro/pycon-canada/master/breeds-10.zip).

We already copied the data to an Azure blob storage container. To reference this blob container as a datastore from your workspace, use the following code:

In [None]:
from azureml.core import Datastore

ds = Datastore(ws, 'workspaceblobstore')

path_on_datastore = 'breeds-10'
ds_data = ds.path(path_on_datastore)
print(ds_data)

## Upload the Data

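The cell below uploads the local `breeds-10` folder to the `breeds-10` path on the datastore. If you instead want to download the data from the datastore to your local machine, you can run `ds.download(".", 'breeds-10')`. Either operation might take several minutes.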

In [None]:
ds.upload('breeds-10', 'breeds-10')


## Train model on the remote compute
Now that you have your data and training script prepared, you are ready to train on your remote compute cluster. You can take advantage of Azure compute to leverage GPUs to cut down your training time. 

### Create an experiment
Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this transfer learning PyTorch tutorial. 

In [None]:
from azureml.core import Experiment

experiment_name = 'pytorch-dogs' 
experiment = Experiment(ws, name=experiment_name)

### Create a PyTorch estimator
The AML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, see the [documentation](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch). The following code defines a single-node PyTorch job.

In [None]:
from azureml.train.dnn import PyTorch

script_params = {
    '--data_dir': ds_data.as_mount(),
    '--num_epochs': 10,
    '--output_dir': './outputs',
    '--log_dir': './logs',
    '--mode': 'fine_tune'
}

estimator10 = PyTorch(source_directory='.',
                      script_params=script_params,
                      compute_target=compute_target,
                      entry_script='pytorch_train.py',
                      pip_packages=['tensorboardX'],
                      use_gpu=True)


You can also see the configuration for this experiment:

In [None]:
# See the Docker image that's being used
print(estimator10.run_config.environment.docker.base_image)

# See all the dependencies for this environment
print(estimator10.conda_dependencies.serialize_to_string())

The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`.

Please note the following:
- We passed our training data reference `ds_data` to our script's `--data_dir` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the training data `breeds-10` on our datastore.
- We specified the output directory as `./outputs`. The `outputs` directory is specially treated by AML in that all the content in this directory gets uploaded to your workspace as part of your run history. The files written to this directory are therefore accessible even once your remote run is over. In this tutorial, we will save our trained model to this output directory.

To leverage the Azure VM's GPU for training, we set `use_gpu=True`.
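
The training script `pytorch_train.py` is not shown in this notebook, so the following is only a hypothetical sketch of how such an entry script could read the arguments passed through `script_params`. The argument names mirror the dictionary above; the defaults, the help text, and the function names `parse_args` and `prepare_dirs` are assumptions.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the command-line arguments that the estimator passes to the script."""
    parser = argparse.ArgumentParser(description="Dog breed training (sketch)")
    parser.add_argument("--data_dir", type=str, required=True,
                        help="path where the datastore is mounted on the compute")
    parser.add_argument("--num_epochs", type=int, default=10,
                        help="number of training epochs")
    parser.add_argument("--output_dir", type=str, default="./outputs",
                        help="directory uploaded to the run history by AML")
    parser.add_argument("--log_dir", type=str, default="./logs",
                        help="directory for training logs")
    parser.add_argument("--mode", type=str, default="fine_tune",
                        help="training mode, e.g. 'fine_tune'")
    return parser.parse_args(argv)


def prepare_dirs(args):
    """Create the output and log directories if they do not exist yet."""
    os.makedirs(args.output_dir, exist_ok=True)
    os.makedirs(args.log_dir, exist_ok=True)
```

A script built this way would save its trained model under `args.output_dir`, so that the file ends up in the run history as described above.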

## Submit the Experiment to Run

In [None]:
run = experiment.submit(estimator10)

In [None]:
from azureml.widgets import RunDetails
RunDetails(run).show()

### What happens during a run?
If you are running this for the first time, the compute target will need to pull the Docker image, which will take about 2 minutes. This gives us time to go over how a **Run** is executed in Azure Machine Learning.

Note: had we not created the workspace with an existing ACR, we would also have had to wait for the image to be created, which takes an extra 10-20 minutes for big GPU images like this one. This is a one-time cost for a given Python configuration, and subsequent runs will then be faster. We are working on ways to make this image creation faster.

![](aml-run.png)