# Train a PyTorch Classification Model with Azure ML and AML Compute
## Using zip file and Azure ML Dataset as data source


This driver notebook uses the Azure ML Python SDK to create an AmlCompute target (a compute cluster for training) and a PyTorch estimator that tells Azure ML where to find the right resources and how to train.  The dataset used here for image classification with deep learning is a small subset of the <a href="https://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1/" target="_blank">CAVIAR dataset</a>.  The dataset is kept small for demo purposes; for better accuracy, use a larger one.

IMPORTANT:
* Please use the **"Python 3.6 - Azure ML" kernel** on the DSVM for this notebook, or install the appropriate library versions below (to change the kernel, go to Kernel in the menu bar and select "Change kernel").
* You will need the `config.json` from your Azure ML Workspace in the **same folder** as this notebook.  An interactive login is performed later on, so have your Azure login info ready.  You may wish to work with these notebooks in an **Incognito or Private browser window** in case you have other Azure accounts.
* See the `alternative` folder for another way of connecting to data, using a Datastore that wraps Azure Blob Storage.

## Imports

Tip 1:  To ensure that packages are installed into the correct kernel for a notebook, we use `{sys.prefix}` (or, alternatively, `{sys.executable} -m`) as the path to the correct Python environment.
<br><br>
Tip 2:  It's a good idea to make sure the releases of `torch` and `torchvision` match (you can check on https://pypi.org/ for the release dates).
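
As a rough illustration of Tip 2, the check below compares a `torch`/`torchvision` pair against a small table of releases that shipped together.  The table is a partial, hand-written list for this sketch and the helper names (`normalize_version`, `versions_match`) are hypothetical; confirm any pair against the official release notes before relying on it.

```python
# Partial, hand-written table of torch -> torchvision releases that shipped together.
# Assumption: extend or correct this against the official release notes.
PAIRED_RELEASES = {
    "1.2.0": "0.4.0",
    "1.3.0": "0.4.1",
    "1.3.1": "0.4.2",
    "1.4.0": "0.5.0",
}


def normalize_version(version):
    """Strip local build tags such as '+cu92' and pad '1.3' to '1.3.0'."""
    parts = version.split("+")[0].split(".")
    while len(parts) < 3:
        parts.append("0")
    return ".".join(parts[:3])


def versions_match(torch_version, torchvision_version):
    """Return True if the pair appears in PAIRED_RELEASES (hypothetical helper)."""
    expected = PAIRED_RELEASES.get(normalize_version(torch_version))
    return expected == normalize_version(torchvision_version)


# The pair installed by the cell below in this notebook
print(versions_match("1.3", "0.4.1"))
```

Once both packages are installed, `torch.__version__` and `torchvision.__version__` can be passed to the same helper.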

In [None]:
# Set warnings and logging level
import warnings
warnings.filterwarnings('ignore')

import logging
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

# Install packages
import sys

# Install/upgrade the Azure ML SDK using: 
# pip and the correct Python kernel with sys.executable
! {sys.executable} -m pip install -q --upgrade azureml-sdk[notebooks,automl,contrib]==1.5.0
# ! {sys.prefix}/bin/pip install matplotlib

# If running locally, or if PyTorch needs upgrading, install PyTorch and Torchvision with
! {sys.prefix}/bin/pip install -q --upgrade torch==1.3 torchvision==0.4.1

In [None]:
from azureml.core import Workspace, Experiment, Datastore, Model
from azureml.exceptions import ProjectSystemException
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core import Dataset
from azureml.train.dnn import PyTorch
from azureml.widgets import RunDetails

from torchvision import transforms, datasets

import shutil
import os
import json
import time

In [None]:
# Check core SDK version number
import azureml.core
import torch

print("SDK version: ", azureml.core.VERSION)
print("PyTorch version: ", torch.__version__)

In [None]:
# Give your experiment a suffix: replace *** with a short quoted string
my_nickname = ***

## Diagnostics
Opt-in diagnostics for better experience, quality, and security of future releases.

In [None]:
from azureml.telemetry import set_diagnostics_collection

set_diagnostics_collection(send_diagnostics=True)

## Initialize workspace
Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`.  This will trigger an interactive login so please follow the instructions after running this cell.

In [None]:
from azureml.core.workspace import Workspace

ws = Workspace.from_config(path='config.json')
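
Before calling `Workspace.from_config()`, it can help to confirm that `config.json` contains the three fields the workspace lookup relies on.  The sketch below uses only the standard library; the helper name `missing_config_keys` is hypothetical and the demo values are placeholders, not real Azure identifiers.

```python
import json
import os
import tempfile

# Fields stored in the config.json downloaded from an Azure ML Workspace
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_config_keys(path="config.json"):
    """Return the required keys that are absent or empty in the config file."""
    with open(path) as f:
        config = json.load(f)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Demo on a throwaway file with placeholder values
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "config.json")
    with open(demo_path, "w") as f:
        json.dump({"subscription_id": "<subscription-id>",
                   "resource_group": "<resource-group>"}, f)
    demo_missing = missing_config_keys(demo_path)

print(demo_missing)  # the demo file deliberately omits workspace_name
```

An empty list means all three fields are present.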

## Create or Attach existing AmlCompute
You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.

**Creation of AmlCompute could take several minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.

As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

In [None]:
# choose a name for your cluster - under 16 characters
cluster_name = "gpuforpytorch"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    # AmlCompute config - if max_nodes is set, the cluster becomes a persistent/dedicated resource that scales
    # Set min_nodes to 0 so that it scales down to 0 VMs when not in use, to avoid incurring costs
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                        min_nodes=0,
                                                        max_nodes=1)
    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True)

Check the provisioning status of the cluster.

In [None]:
# use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())

## Create the project directory

The location where this notebook runs can be thought of as a local machine, while the compute target just created is where the training will actually take place.  Here, we create a `project` folder.  Its contents are special: the whole folder is uploaded to the target compute when a run of this experiment begins, so the training script must be in it.

In [None]:
# Create a project directory and copy training script to it
project_folder = os.path.join(os.getcwd(), 'project')
os.makedirs(project_folder, exist_ok=True)
shutil.copy(os.path.join(os.getcwd(), 'pytorch_train_transfer_dataset.py'), project_folder)
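
Because everything in the project folder is uploaded with each run, it is worth checking what the folder actually contains before submitting.  The following is a minimal standard-library sketch; `list_snapshot_files` is a hypothetical helper, and the demo runs on a temporary folder standing in for `project`.

```python
import os
import tempfile


def list_snapshot_files(folder):
    """Return sorted (relative_path, size_in_bytes) pairs for every file under folder."""
    entries = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, folder)
            entries.append((rel_path, os.path.getsize(full_path)))
    return sorted(entries)


# Demo on a temporary folder standing in for `project`
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "train.py"), "w") as f:
        f.write("print('hello')\n")
    demo_listing = list_snapshot_files(tmp)

for rel_path, size in demo_listing:
    print(f"{size:>8d} bytes  {rel_path}")
```

Calling `list_snapshot_files(project_folder)` in the notebook would show the same listing for the real folder; large or unrelated files are best kept out of it.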

## Create an experiment

Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this transfer learning PyTorch tutorial.

Think of an experiment like a scenario such as "finding images of people fighting in CCTV feeds".  An experiment usually will have many "runs" which could entail updates to the data, hyperparameters, training code itself, and other optimizations.

In [None]:
# Create an experiment
experiment_name = 'suspicious-behavior-'+my_nickname
experiment = Experiment(ws, name=experiment_name)

## Set up Dataset

The data source is a small subset of the CAVIAR dataset, packaged as a single zip file at the URL in the cell below.  The following steps create a File Dataset from that URL and register it in the Workspace so that scripts and compute can access it.

In [None]:
# Create a File Dataset from 1 URL path
url_path = ['https://github.com/harris-soh-copeland-puca/SampleFiles/raw/master/caviar_small.zip']
behavior_ds = Dataset.File.from_files(path=url_path)

# Register the dataset so that scripts and compute may access
behavior_ds = behavior_ds.register(workspace=ws,
                                 name='behavior_ds_'+my_nickname,
                                 description='Subset of CAVIAR dataset')
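
Since the Dataset points at a zip file, the training script has to extract it once the Dataset is mounted.  The training script is not shown here, so the following is only a sketch of one way to do that with the standard library; `extract_first_zip`, the folder names, and the demo archive contents are all hypothetical.

```python
import glob
import os
import tempfile
import zipfile


def extract_first_zip(mounted_dir, extract_to):
    """Find the first .zip under mounted_dir and extract it into extract_to."""
    pattern = os.path.join(mounted_dir, "**", "*.zip")
    zip_paths = sorted(glob.glob(pattern, recursive=True))
    if not zip_paths:
        raise FileNotFoundError(f"No .zip file found under {mounted_dir}")
    with zipfile.ZipFile(zip_paths[0]) as archive:
        archive.extractall(extract_to)
    return extract_to


# Demo: build a tiny archive in a stand-in "mounted" folder, then extract it
with tempfile.TemporaryDirectory() as tmp:
    mounted = os.path.join(tmp, "mounted")
    os.makedirs(mounted)
    with zipfile.ZipFile(os.path.join(mounted, "demo.zip"), "w") as archive:
        archive.writestr("train/normal/frame_0001.txt", "placeholder")
    data_root = extract_first_zip(mounted, os.path.join(tmp, "data"))
    extracted_ok = os.path.isfile(
        os.path.join(data_root, "train", "normal", "frame_0001.txt"))

print(extracted_ok)
```

Extracting to a local working folder rather than into the mount keeps the mounted Dataset read-only.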

## Train

To train the PyTorch model we use an Azure ML Estimator specific to PyTorch; see [Train models with Azure Machine Learning using estimator](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-ml-models) for more on Estimators.  We pass in the Dataset registered earlier, which is mounted on the remote compute target for training.

To learn more about where to read and write files on a local or remote compute target, see [Where to save and write files for Azure Machine Learning experiments](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-save-write-experiment-files).

In [None]:
# Set up for training ("transfer" flag means - use transfer learning and 
# this should download a model on target compute)
script_params = {
    '--data_dir': behavior_ds.as_named_input('behavior_ds_'+my_nickname).as_mount(),
    '--num_epochs': 10,
    '--learning_rate': 0.01,
    '--output_dir': './outputs',
    '--transfer': 'True'
}
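
Each key in `script_params` reaches the entry script as a command-line argument, and every value arrives as a string.  The entry script itself is not shown in this notebook, so the block below is only a sketch of how such a script might parse these flags; `str_to_bool` and `parse_args` are hypothetical helpers, and the defaults are assumptions.

```python
import argparse


def str_to_bool(value):
    """Interpret 'True'/'False' style strings, since values arrive as strings."""
    return str(value).strip().lower() in ("true", "1", "yes")


def parse_args(argv=None):
    """Parse the flags passed through script_params (defaults are assumptions)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_dir", type=str, help="path where the Dataset is mounted")
    parser.add_argument("--num_epochs", type=int, default=10)
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--output_dir", type=str, default="./outputs")
    parser.add_argument("--transfer", type=str_to_bool, default=False)
    return parser.parse_args(argv)


# Demo with the same values this notebook passes (the data path is a placeholder)
demo_args = parse_args(["--data_dir", "/tmp/data", "--num_epochs", "10",
                        "--learning_rate", "0.01", "--transfer", "True"])
print(demo_args)
```

Using an explicit converter for `--transfer` avoids the common pitfall where `bool('False')` evaluates to `True`.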

In [None]:
# Instantiate the PyTorch estimator; files the training script writes to
# ./outputs are uploaded to the run record automatically
estimator = PyTorch(source_directory=project_folder, 
                    script_params=script_params,
                    compute_target=compute_target,
                    entry_script='pytorch_train_transfer_dataset.py',
                    use_gpu=True,
                    pip_packages=['matplotlib==3.1.1',
                                  'opencv-python==4.1.1.26', 
                                  'Pillow==6.2.1'],
                    framework_version='1.3')

# Submit experiment run to Azure ML
run = experiment.submit(estimator)

Check the run status; re-run the cell below to recheck.  With the data provided in this notebook as-is, training takes roughly 30-40 minutes to complete.

In [None]:
print(run.get_details()['status'])
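
Instead of re-running the cell by hand, the status can be polled in a loop.  The sketch below is independent of the SDK: it accepts any callable that returns a status string, and the set of terminal states is an assumption based on the statuses Azure ML typically reports.  `wait_for_terminal_status` is a hypothetical helper.

```python
import time

# Assumed terminal states for a run
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_terminal_status(get_status, poll_seconds=30, max_polls=200):
    """Call get_status() until it returns a terminal state or max_polls is reached."""
    status = None
    for _ in range(max_polls):
        status = get_status()
        print("Run status:", status)
        if status in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    return status


# Demo with a stand-in status source; with a real run, pass
# lambda: run.get_details()['status'] instead.
fake_statuses = iter(["Queued", "Running", "Completed"])
final_status = wait_for_terminal_status(lambda: next(fake_statuses), poll_seconds=0)
```

The SDK's own `run.wait_for_completion(show_output=True)` blocks in a similar way and also streams the logs.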

Check run status, interactively:

In [None]:
RunDetails(run).show()

## Register the trained model to workspace

Registering makes the model accessible through the SDK in other runs or experiments.  It will also be available for download.

You could even have registered the model in the training script directly:

```python
model = run.register_model(model_name='pt-dnn', model_path='outputs/model_finetuned.pth')
```

Below, we register from this notebook.  After running, check the Azure Portal that the model does indeed appear registered to the Workspace.

In [None]:
## Alternatively, register within this notebook 
# (the model_path is the Azure ML workspace model path, not local)

## Get one particular Run using run id found in Azure Portal
# from azureml.core import Run
# run = Run(experiment, run_id='suspicious-behavior-...')

# Register model to Models in workspace
model = run.register_model(model_name='behavior-pytorch-'+my_nickname, model_path='outputs/model_finetuned.pth',
                           description='Squeezenet PyTorch model; 10 epochs; 0.01 LR')

## Evaluate model

- Get the test images (file:  `caviar_test_images.zip`) from the Releases page for this repo on GitHub (https://github.com/Azure/Azure-AI-Camp/releases).
- Upload the file to the `Azure-AI-Camp/day1/2.1.ImageClassificationWithPyTorch/` folder.
- Unzip it by running `unzip caviar_test_images.zip` on the command line (in JupyterHub, go to "New" and select "Terminal" from the drop-down).

You will need test images in a folder named `test` with the following folder structure:

```
\test
    \normal
    \suspicious
```


In [None]:
# The "test" folder is in the current working directory
data_dir = '.'

Download the model from the Azure ML Workspace.  Note that `version` is optional; if it is omitted, the latest registered version is used.

In [None]:
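# Download the registered model file; download() returns the local path
model_path = Model(ws, 'behavior-pytorch-'+my_nickname, version=1).download(exist_ok=True)
model = torch.load(model_path, map_location=torch.device('cpu'))
model.eval()  # put dropout/batch-norm layers into inference mode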

Data transforms with `torchvision` (similar to transforms during training minus any data augmentation).

In [None]:
data_transforms = {
    'test': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
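
To make the `Normalize` step concrete: after `ToTensor` scales pixel values to [0, 1], each channel is transformed as `(value - mean) / std`.  The plain-Python sketch below applies that arithmetic to a single RGB pixel, using the same means and standard deviations as the cell above; `normalize_pixel` is a hypothetical helper for illustration only.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply (value - mean) / std per channel to one RGB pixel scaled to [0, 1]."""
    return tuple((value - m) / s for value, m, s in zip(rgb, mean, std))


print(normalize_pixel(MEAN))             # a pixel equal to the mean maps to zeros
print(normalize_pixel((1.0, 1.0, 1.0)))  # a white pixel ends up above 2 in every channel
```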

PyTorch datasets and dataloaders.  Using PyTorch's `ImageFolder` to read a folder of images for classification.  It uses the names of the folders as class names.  Here, reading the `test` folder.

In [None]:
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['test']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=1,
                                              shuffle=False, num_workers=0)
               for x in ['test']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['test']}
class_names = image_datasets['test'].classes
print(dataset_sizes['test'])
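
`ImageFolder` assigns label indices by sorting the class folder names, which is why `class_names` comes out in alphabetical order.  The standard-library sketch below mirrors that mapping on a temporary folder so the label numbering is easy to see; `folder_class_to_idx` is a hypothetical helper, not part of `torchvision`.

```python
import os
import tempfile


def folder_class_to_idx(root):
    """Map each sub-folder name under root to an index, in sorted order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: index for index, name in enumerate(classes)}


# Demo: create the two class folders in reverse order to show that sorting decides the index
with tempfile.TemporaryDirectory() as tmp:
    for class_name in ("suspicious", "normal"):
        os.makedirs(os.path.join(tmp, "test", class_name))
    demo_mapping = folder_class_to_idx(os.path.join(tmp, "test"))

print(demo_mapping)  # {'normal': 0, 'suspicious': 1}
```

On the real dataset, `image_datasets['test'].class_to_idx` holds the equivalent mapping.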

Perform inference on the test dataset to evaluate the model.  This may take a few minutes, depending upon your CPU compute and the number of test images.

In [None]:
# Iterate over data.
running_corrects = 0
for inputs, labels in dataloaders['test']:

    # Don't need to track history in the tensors
    with torch.set_grad_enabled(False):
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)
        
    # Statistics
    running_corrects += torch.sum(preds == labels.data)
    
overall_acc = running_corrects.double() / dataset_sizes['test']

With the small CAVIAR dataset, the accuracy will be fairly low (~50%).  Try the experiment again with more data or a different dataset, and tune the hyperparameters.

In [None]:
print('Accuracy = ', overall_acc.item())
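
Overall accuracy can hide a model that predicts one class almost all the time, so a per-class breakdown is a useful companion.  The sketch below works on plain lists of label indices; the labels in the demo are made up for illustration and `per_class_accuracy` is a hypothetical helper.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels, class_names):
    """Return {class_name: accuracy} computed from parallel lists of label indices."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {
        class_names[index]: correct[index] / totals[index]
        for index in sorted(totals)
    }


# Made-up labels for illustration only (0 = normal, 1 = suspicious)
demo_true = [0, 0, 0, 1, 1]
demo_pred = [0, 0, 1, 1, 0]
demo_accuracy = per_class_accuracy(demo_true, demo_pred, ["normal", "suspicious"])
print(demo_accuracy)
```

In the evaluation loop above, the two lists could be filled by appending `labels.item()` and `preds.item()` on each iteration, since the batch size is 1.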

## Exercise

Update your training experiment to use hyperparameter tuning with HyperDrive (the `azureml.train.hyperdrive` package) to discover the best learning rate and number of epochs.

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters

## Extra notes


### Upload files to a Datastore and create a Dataset with the SDK

This assumes there is a folder called `test` containing sets of images.

In [None]:
from azureml.core.workspace import Workspace
from azureml.core import Dataset
import glob

ws = Workspace.from_config(path='config.json')

datastore = ws.get_default_datastore()
# Uploads a folder of files to a folder path in the default Azure ML Blob storage
datastore.upload_files(files=glob.glob('./test/**/*.*', recursive=True),
                       target_path='caviar-small-testset/',
                       overwrite=False,
                       show_progress=False)

# Instantiates a Dataset to use in training, etc.
dataset = Dataset.File.from_files(path = [(datastore, 'caviar-small-testset/')])
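
The pattern `./test/**/*.*` with `recursive=True` selects files at any depth under `test`, but only those whose names contain a dot.  The standard-library sketch below demonstrates this on a temporary folder with made-up file names, so it is clear which files would be handed to `upload_files`.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Made-up files: two images plus one file without an extension
    for rel_path in ("test/normal/a.jpg", "test/suspicious/b.jpg", "test/normal/README"):
        full_path = os.path.join(tmp, *rel_path.split("/"))
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        open(full_path, "w").close()

    pattern = os.path.join(tmp, "test", "**", "*.*")
    demo_matches = sorted(os.path.relpath(path, tmp).replace(os.sep, "/")
                          for path in glob.glob(pattern, recursive=True))

print(demo_matches)  # ['test/normal/a.jpg', 'test/suspicious/b.jpg']
```

Whether the `normal`/`suspicious` sub-folder structure is preserved at the target depends on how `upload_files` is called; its `relative_root` argument is worth checking in the SDK reference if the class folders matter downstream.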