# Transfer Learning for Image Classification

In this notebook, we demonstrate transfer learning on the Azure ML platform. More information on transfer learning can be found at http://cs231n.github.io/transfer-learning/

General overview of the notebook:

We use the transfer learning code from the PyTorch tutorials, with slight modifications to take advantage of Azure ML's resources. The modifications point the script to the correct location of the data in Azure Storage. The dataloaders dictionary has also been modified to correctly load the training and test data of the CIFAR-100 dataset.

In this tutorial, we create an Experiment that runs in the Azure ML Workspace. The dataset is loaded from the Datastore, and training runs on the compute resources provided by Azure Machine Learning compute.
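
The training script itself is not shown in this notebook, so the following is only a minimal sketch of how the training and test splits of the CIFAR-100 python archive could be read before being wrapped in dataloaders. It assumes the standard archive layout (files named `train` and `test`, each a pickled dictionary with `b"data"` and `b"fine_labels"` entries). The helper names `load_cifar100_split` and `to_channels` are hypothetical, and the demo uses a tiny synthetic archive rather than the real data. Unpickling the real archive also requires NumPy to be installed, because its `b"data"` entry is a NumPy array.

```python
import os
import pickle
import tempfile


def load_cifar100_split(root, split):
    """Load one split ('train' or 'test') of a CIFAR-100 python archive.

    Returns (images, labels): images is a list of 3072-byte rows and
    labels is the list of fine-grained class indices.
    """
    with open(os.path.join(root, split), "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    images = [bytes(row) for row in batch[b"data"]]
    labels = list(batch[b"fine_labels"])
    return images, labels


def to_channels(row):
    """Split a flat 3072-byte row into red, green and blue planes (1024 bytes each)."""
    return row[0:1024], row[1024:2048], row[2048:3072]


# Build a tiny synthetic archive so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as root:
    fake = {
        b"data": [bytes([i]) * 3072 for i in range(4)],
        b"fine_labels": [0, 1, 2, 3],
    }
    for split in ("train", "test"):
        with open(os.path.join(root, split), "wb") as f:
            pickle.dump(fake, f)

    # One entry per split, mirroring the shape of a dataloaders dictionary.
    datasets = {s: load_cifar100_split(root, s) for s in ("train", "test")}

images, labels = datasets["train"]
red, green, blue = to_channels(images[2])
```

In a PyTorch script, each `(images, labels)` pair would typically be wrapped in a `Dataset` and then a `DataLoader`. That step is omitted here to keep the sketch free of dependencies.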

In [1]:
#Initial set up for the Azure workspace
import numpy as np
import matplotlib.pyplot as plt

import azureml.core
from azureml.core import Workspace

# check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

Azure ML SDK Version:  1.0.10


In [2]:
# Import the PyTorch estimator from the training library
from azureml.train.dnn import PyTorch

In [None]:
# Load the Workspace configuration from the config file
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, sep='\t')


In [4]:
#Create an experiment to run in the Azure platform 
experiment_name = 'cifar100-classification'

from azureml.core import Experiment
exp = Experiment(workspace=ws, name=experiment_name)

In [None]:
# Get the Datastore associated with the current workspace
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)

#ds.upload(src_dir='../cifar-100-python', target_path='cifar-100-python', overwrite=True, show_progress=True)

In [6]:
#Get the compute resources to perform the training
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
import os

# choose a name for your cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "gpucluster")
# environment variables are strings, so convert the node counts to int
compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0))
compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4))

# The default SKU below (STANDARD_D2_V2) is a CPU VM. For a GPU VM, set the SKU to STANDARD_NC6.
vm_size = os.environ.get("AML_COMPUTE_CLUSTER_STANDARD_D2_V2", "STANDARD_D2_V2")


if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
                                                                min_nodes = compute_min_nodes, 
                                                                max_nodes = compute_max_nodes)

    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
    
    # can poll for a minimum number of nodes and for a specific timeout. 
    # if no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
    
    # For a more detailed view of current AmlCompute status, use get_status()
    print(compute_target.get_status().serialize())

found compute target. just use it. gpucluster
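
Values read from environment variables are always strings, so the node counts used for the cluster need converting before they are passed on as numbers. As a minimal sketch, the conversion and a basic sanity check could be combined in one helper. The function name `resolve_node_range` is hypothetical and not part of the Azure ML SDK; the variable names and defaults mirror the cell above.

```python
def resolve_node_range(environ, default_min=0, default_max=4):
    """Read the min/max node counts from a mapping such as os.environ.

    Environment values are strings, so they are converted to int here,
    and the resulting range is checked before it is used.
    """
    min_nodes = int(environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", default_min))
    max_nodes = int(environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", default_max))
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError(f"invalid node range: min={min_nodes}, max={max_nodes}")
    return min_nodes, max_nodes


# Plain dictionaries stand in for os.environ to keep the example reproducible.
defaults = resolve_node_range({})
overridden = resolve_node_range({"AML_COMPUTE_CLUSTER_MAX_NODES": "8"})
```

In the notebook, the call would be `resolve_node_range(os.environ)`.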


In [7]:
# Set up the PyTorch estimator, which wraps the original training script. In this step, we use
# distributed training with Horovod (MPI backend) to make use of all the nodes in the cluster.
script_params = {
    '--data-folder': ds.as_mount()
}

pt_est = PyTorch(source_directory='./',
                 script_params=script_params,
                 compute_target=compute_target,
                 entry_script='finetuning_vgg_cifar100.py',
                 node_count=4,
                 process_count_per_node=1,
                 distributed_backend='mpi',
                 conda_packages=['matplotlib'],
                 use_gpu=True)
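
The `--data-folder` entry in `script_params` reaches the entry script as a command-line argument whose value is the path where the datastore is mounted on the compute node. The contents of `finetuning_vgg_cifar100.py` are not shown here, so the following is only a sketch of how such a script might read that argument. The mount path `/mnt/datastore` is a made-up placeholder, and the `cifar-100-python` subfolder is assumed from the commented-out upload call earlier in the notebook.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the command-line arguments passed in by the estimator."""
    parser = argparse.ArgumentParser(description="Fine-tuning entry script (sketch)")
    parser.add_argument("--data-folder", dest="data_folder", type=str, required=True,
                        help="path where the datastore is mounted on the compute node")
    return parser.parse_args(argv)


# Hypothetical invocation; on the cluster the arguments come from sys.argv.
args = parse_args(["--data-folder", "/mnt/datastore"])

# Assumed location of the dataset inside the datastore.
data_dir = os.path.join(args.data_folder, "cifar-100-python")
```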

In [8]:
# Submit the run in the cluster
run = exp.submit(pt_est)
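
With `node_count=4` and `process_count_per_node=1`, the submitted job starts four worker processes. Assuming the entry script follows the usual data-parallel pattern (the script is not shown, so this is an assumption), each worker trains on its own slice of the dataset. The sketch below illustrates strided sharding in plain Python; `shard_indices` is a hypothetical helper. In a PyTorch script this role is normally played by `torch.utils.data.distributed.DistributedSampler`, which additionally shuffles the indices and pads them so that every worker gets the same number of samples.

```python
def shard_indices(num_samples, rank, world_size):
    """Return the sample indices assigned to worker `rank` out of `world_size`."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # Strided assignment: worker r takes samples r, r + world_size, r + 2 * world_size, ...
    return list(range(rank, num_samples, world_size))


world_size = 4 * 1  # node_count * process_count_per_node from the estimator
shards = [shard_indices(10, rank, world_size) for rank in range(world_size)]
```

Every sample index lands in exactly one shard, so together the workers cover the whole dataset once per epoch.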

In [9]:
#Monitor the remote run
from azureml.widgets import RunDetails
RunDetails(run).show()


In [None]:
#run1 = get_run(exp, "cifar100-classification_1551033439_77164d05")
#for r in exp.get_runs():
#    print(r.id, r.get_status())

In [53]:
#from azureml.core import get_run
#r = get_run(experiment=exp, run_id="cifar100-classification_1551033439_77164d05")

In [54]:
#r.cancel()

In [None]:
run.wait_for_completion(show_output=False) # specify True for a verbose log