# Producing CNTK and TensorFlow models for image classification

In this notebook, we show how to produce residual networks (ResNets) that classify aerial images by land use type (developed, forested, cultivated, etc.). We include an example of training a ResNet from scratch with Microsoft Cognitive Toolkit (CNTK), as well as an example of retraining the logits layer of an ImageNet-pretrained ResNet in TensorFlow (TF).

Training an image classification DNN from scratch requires more time than retraining an existing model, but may be advisable when the image features relevant for classification are unlikely to have been learned by available pretrained models. We explored both approaches for two reasons: (i) it was unclear whether a model pretrained on ImageNet would be appropriate for aerial imagery classification, and (ii) producing trained models in two different deep learning frameworks (CNTK and TF) allows us to demonstrate how both frameworks can be deployed on Spark in a subsequent notebook.

This notebook is part of the [Embarrassingly Parallel Image Classification](https://github.com/Azure/Embarrassingly-Parallel-Image-Classification) git repository. For more information on how the training dataset was prepared, please see the [Image Set Preparation](https://github.com/Azure/Embarrassingly-Parallel-Image-Classification/blob/master/image_set_preparation.ipynb) notebook. For instructions on applying the trained models to large image sets using Spark, see the [Scoring on Spark](https://github.com/Azure/Embarrassingly-Parallel-Image-Classification/blob/master/scoring_on_spark.ipynb) notebook.

## Outline
- [(Optional) Set up an Azure N-Series GPU Deep Learning VM](#prepare)
   - [Provision the VM](#provision)
   - [Connect to the VM by remote desktop](#rd)
   - [Clone/download scripts and supporting files](#repo)
   - [Download training set data locally](#trainingset)
   - [(Optional) Access the VM remotely via Jupyter Notebook](#jupyter)
- [Training a ResNet from Scratch with Microsoft Cognitive Toolkit (CNTK)](#cntk)
- [Retraining a pretrained ResNet with TensorFlow](#tensorflow)
   - [Downloading a pretrained model](#tfmodel)
   - [Running the training script](#tfrun)
- [Next Steps](#nextsteps)

<a name="prepare"></a>
## (Optional) Set up an Azure N-Series GPU Deep Learning VM

Training deep neural networks is a compute- and time-intensive task. To keep your local computer free for other work, and to achieve speed improvements through the use of a GPU, we recommend performing training on an Azure N-Series GPU Deep Learning VM. This VM also comes with CNTK, Python, and a Jupyter Notebooks server pre-configured. (If you prefer to perform training locally, please install [Python 3.5](https://www.python.org/) and [CNTK 2.0 beta version 10 (or newer)](https://github.com/Microsoft/CNTK/releases), [prepare the input dataset](https://github.com/Azure/Embarrassingly-Parallel-Image-Classification/blob/master/image_set_preparation.ipynb), and proceed to the next section.)

<a name="provision"></a>
### Provision the VM

1. In the [Azure Portal](https://ms.portal.azure.com), start creation of a new Deep Learning VM.
    1. Click the "+ New" button at upper left to launch a search pane.
    1. Type in "Deep Learning Toolkit for the DSVM" and press Enter.
    1. In the search results, choose the "Deep Learning Toolkit for the DSVM" published by Microsoft.
    1. After reading the description, press "Create" to begin customization.
1. In the "Basics" pane, choose a username, password, and resource group.
    - We recommend creating a new resource group so that you can easily delete all associated resources, like network interfaces and IP addresses, when you are finished with the VM.
    - Note that GPU VMs are not available in all regions.
1. In the "Settings" pane, choose a [virtual machine size](https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-sizes) that includes a GPU and fits your needs. (The default will suffice for this tutorial.)
1. Confirm your settings on the "Summary" pane, then click "OK" on the "Buy" pane to provision the VM.

<a name="rd"></a>
### Connect to the VM by remote desktop

After the VM deployment is finished, you can connect to the VM by remote desktop as follows:
1. Navigate to the VM's pane in Azure Portal (e.g. by searching for the VM's name).
1. Click "Connect" along the bar on top of the pane to download an RDP file.
1. Double-click the RDP file to start the connection.
1. Supply the username and password you chose earlier. You may need to specify the "domain" (VM name) as well as your username, e.g. "myvmname\myusername", so that the connection doesn't attempt to use your computer's default domain.

<a name="repo"></a>
### Clone/download scripts and supporting files
Download the contents of this repo and copy the contents of the `tf` and `cntk` subfolders to appropriate locations. We have used locations on the temporary drive, e.g. `D:\tf` and `D:\cntk`.
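
The copy step can also be scripted. The sketch below is only an illustration: the function name `copy_subfolders` and its arguments are our own and not part of the repo, and it assumes the repo has already been downloaded to a local folder. It copies the `tf` and `cntk` subfolders into a destination root and refuses to overwrite an existing copy.

```python
import os
import shutil


def copy_subfolders(repo_root, dest_root, subfolders=('tf', 'cntk')):
    """Copy the named subfolders of repo_root into dest_root.

    Hypothetical helper: returns the list of destination paths created.
    """
    copied = []
    for name in subfolders:
        src = os.path.join(repo_root, name)
        dst = os.path.join(dest_root, name)
        if not os.path.isdir(src):
            raise FileNotFoundError('Expected subfolder not found: ' + src)
        # Refuse to overwrite an earlier copy rather than merge into it
        if os.path.exists(dst):
            raise FileExistsError('Destination already exists: ' + dst)
        shutil.copytree(src, dst)
        copied.append(dst)
    return copied
```

For the locations used in this notebook, the call would look like `copy_subfolders('path_to_repo', 'D:\\')`, where `path_to_repo` stands for wherever you downloaded the repo.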

<a name="trainingset"></a>
### Download training set data locally
During image set preparation, a training image set and descriptive files were created for use with CNTK and TensorFlow. Transfer these files to the GPU VM and store them in an appropriate location. (We have used the `D:\combined\train_subsample` folder created in the [image set preparation](https://github.com/Azure/Embarrassingly-Parallel-Image-Classification/blob/master/image_set_preparation.ipynb) notebook.) If you did not generate a larger training set earlier, you can use the small training set included in [this git repo](https://github.com/Azure/Embarrassingly-Parallel-Image-Classification). If the image paths have changed, you may need to regenerate the CNTK map file according to the instructions in that notebook.
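
For orientation, a CNTK image map file is a text file with one line per image, holding the image path and an integer label separated by a tab. The sketch below shows one way such a file could be rebuilt after moving the images. It is not the code from the image set preparation notebook: the function name is hypothetical, and it assumes that each class has its own subfolder of `.png` files and that label indices follow the sorted folder names. Check both assumptions against the original map file before training, since a different label order would silently mislabel the images.

```python
import os


def write_cntk_map_file(image_root, map_path, extension='.png'):
    """Write '<image path><TAB><label index>' lines for every image found.

    Hypothetical helper. Returns (label lookup dict, number of lines written).
    """
    # Assumption: each immediate subfolder of image_root holds one class
    class_names = sorted(
        d for d in os.listdir(image_root)
        if os.path.isdir(os.path.join(image_root, d)))
    # Assumption: label indices follow the sorted class folder names
    label_of = {name: index for index, name in enumerate(class_names)}

    written = 0
    with open(map_path, 'w') as map_file:
        for name in class_names:
            class_dir = os.path.join(image_root, name)
            for filename in sorted(os.listdir(class_dir)):
                if filename.lower().endswith(extension):
                    image_path = os.path.join(class_dir, filename)
                    map_file.write('{}\t{}\n'.format(image_path, label_of[name]))
                    written += 1
    return label_of, written
```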

<a name="jupyter"></a>
### (Optional) Access the VM remotely via Jupyter Notebook

Follow these steps if you wish to be able to access the notebook server remotely:
1. In the [Azure Portal](https://portal.azure.com), navigate to the deployed VM's pane and determine its IP address.
1. In the [Azure Portal](https://portal.azure.com), navigate to the deployed VM's Network Security Group's pane and add inbound/outbound rules permitting traffic on port 9999.
1. While connected to the VM via remote desktop, launch a command prompt (press Windows key + R, type `cmd`, and press Enter) and type the following commands:

   ```
   cd C:\dsvm\tools\setup
   JupyterSetPasswordAndStart.cmd
   ```

   Follow the prompts to set your remote access password.
   
1. Connect to your VM remotely via Jupyter Notebooks using the IP address you determined earlier and port 9999, e.g. `https://[__.__.__.__]:9999`. The default directory on login will be `C:\dsvm\notebooks`.

<a name="cntk"></a>
## Training a ResNet from Scratch with Microsoft Cognitive Toolkit (CNTK)

The `train.py` script in the `cntk` subfolder of this repo can be used to train a 20-layer ResNet for image classification from scratch. The training script is adapted from the [CNTK ResNet/CIFAR10 image classification example](https://github.com/Microsoft/CNTK/tree/master/Examples/Image/Classification/ResNet/Python). If you are training on a multi-GPU VM, see that example's code for distributed training. To run the script, type the following at an Anaconda prompt:

In [None]:
activate cntk-py34
python path_to_repo\cntk\train.py

For details of the model evaluation process, please see the scoring notebook in the [Embarrassingly Parallel Image Classification](https://github.com/Azure/Embarrassingly-Parallel-Image-Classification) repository.

<a name="tensorflow"></a>
## Retraining a pretrained ResNet with TensorFlow

We used the [`tf-slim` API](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim) for TensorFlow, which provides pretrained ResNet models and helpful scripts for retraining and scoring. During training set preparation, we converted raw PNG images to the [TFRecords](https://www.tensorflow.org/how_tos/reading_data/#file_formats) files that those scripts expect as input. For more details on the training data, please see the image preparation notebook in the [Embarrassingly Parallel Image Classification](https://github.com/Azure/Embarrassingly-Parallel-Image-Classification) repository.

Our training script is a modified version of `train_image_classifier.py` from the [TensorFlow models repo's slim subdirectory](https://github.com/tensorflow/models/tree/master/slim). We have also changed some of that script's dependencies. We recommend that you clone this repo and transfer the `tf` subfolder, including dependencies, to a suitable location (indicated by the variable `repo_dir` below).

<a name="tfmodel"></a>
### Downloading a pretrained model

We obtained a 50-layer ResNet pretrained on ImageNet from a link in the [TensorFlow models repo's slim subdirectory](https://github.com/tensorflow/models/tree/master/slim). The pretrained model can be downloaded and unpacked with the code snippet below:

In [None]:
import os
import tarfile
import urllib.request

# Folder holding the contents of this repo's `tf` subfolder
repo_dir = 'D:\\tf'

archive_url = 'http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz'
archive_path = os.path.join(repo_dir, os.path.basename(archive_url))

# Download the archive, unpack the checkpoint into repo_dir, then delete the archive
urllib.request.urlretrieve(archive_url, archive_path)
with tarfile.open(archive_path, 'r:gz') as f:
    f.extractall(path=repo_dir)
os.remove(archive_path)
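
Before moving on, it can be worth confirming that the checkpoint was actually unpacked. The following is a minimal sketch with a hypothetical helper name. It only checks for `resnet_v1_50.ckpt`, the file name that the training cell later in this notebook expects, and makes no claim about any other files in the archive.

```python
import os


def missing_files(directory, expected_names=('resnet_v1_50.ckpt',)):
    """Return the expected file names that are not present in directory."""
    return [name for name in expected_names
            if not os.path.isfile(os.path.join(directory, name))]
```

Calling `missing_files(repo_dir)` should return an empty list if the download and extraction succeeded.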

<a name="tfrun"></a>
### Running the training script

We recommend that you run the training script from an Anaconda prompt. The code cell below will help you generate the appropriate command based on your file locations.

In [None]:
# path where retrained model and logs will be saved during training
train_dir = os.path.join(repo_dir, 'models')
if not os.path.exists(train_dir):
    os.makedirs(train_dir)
    
# location of the unpacked pretrained model
checkpoint_path = os.path.join(repo_dir, 'resnet_v1_50.ckpt')

# Location of the TFRecords and other files generated during image set preparation
image_dir = 'D:\\combined\\train_subsample'

command = '''activate py35
python {0} --train_dir={1} --dataset_name=aerial --dataset_split_name=train --dataset_dir={2} --checkpoint_path={3}
'''.format(os.path.join(repo_dir, 'retrain.py'),
           train_dir,
           image_dir,
           checkpoint_path)

print(command)
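
If any of your paths contain spaces, the printed command would need quoting before you paste it into a prompt. One alternative, sketched below, is to build the command as an argument list, which `subprocess.run` passes through without shell quoting. The helper name is hypothetical; the flags are the ones used in the cell above. The sketch assumes that the interpreter given as `python_exe` belongs to the `py35` environment with TensorFlow installed, which replaces the `activate py35` step.

```python
import os
import sys


def build_retrain_args(repo_dir, image_dir, train_dir, checkpoint_path,
                       python_exe=sys.executable):
    """Return the retraining command as an argument list (hypothetical helper)."""
    return [
        python_exe,  # assumption: an interpreter from the py35 environment
        os.path.join(repo_dir, 'retrain.py'),
        '--train_dir=' + train_dir,
        '--dataset_name=aerial',
        '--dataset_split_name=train',
        '--dataset_dir=' + image_dir,
        '--checkpoint_path=' + checkpoint_path,
    ]
```

The resulting list can be handed to `subprocess.run(args, check=True)`. Running from an Anaconda prompt, as recommended above, remains the simpler option when you want to watch the training logs interactively.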

<a name="nextsteps"></a>
## Next Steps

For details of the model evaluation process, please see the scoring notebook in the [Embarrassingly Parallel Image Classification](https://github.com/Azure/Embarrassingly-Parallel-Image-Classification) repository.