# Train a land classification model from scratch

In this notebook, you will train a neural network model to predict land cover from aerial imagery using Microsoft's Cognitive Toolkit (CNTK). Later notebooks will illustrate how you can apply the trained model to new images, both in Jupyter notebooks and in ESRI's ArcGIS Pro.

This tutorial assumes that you have already provisioned a Geo AI Data Science Virtual Machine and are using this Jupyter notebook while connected to that VM via remote desktop. If not, please see our guide to [provisioning and connecting to a Geo AI DSVM](https://github.com/Azure/pixel_level_land_classification/blob/master/geoaidsvm/setup.md).

## Download supporting files

The following command uses the [AzCopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy) utility to download sample data, a pre-trained model, and code to your VM. It may take a few minutes to complete. When it finishes, you should see a transfer summary indicating that all files were transferred successfully.

```
!AzCopy /Source:https://aiforearthcollateral.blob.core.windows.net/imagesegmentationtutorial /SourceSAS:"?st=2018-01-16T10%3A40%3A00Z&se=2028-01-17T10%3A40%3A00Z&sp=rl&sv=2017-04-17&sr=c&sig=KeEzmTaFvVo2ptu2GZQqv5mJ8saaPpeNRNPoasRS0RE%3D" /Dest:D:\pixellevellandclassification /S
```

```
[2018/02/08 22:31:40] Transfer summary:
-----------------
Total files transferred: 32
Transfer successfully:   32
Transfer skipped:        0
Transfer failed:         0
Elapsed time:            00.00:00:28
```


If you like, you can navigate to the `D:\pixellevellandclassification` directory to examine the files we have transferred. You will find that the sample data are composed of paired files of [National Agriculture Imagery Program](https://www.fsa.usda.gov/programs-and-services/aerial-photography/imagery-programs/naip-imagery/) (NAIP) aerial images and land cover labels produced by the [Chesapeake Conservancy](http://chesapeakeconservancy.org/). Although these data are stored in the common TIFF format, they are not readily viewable because they do not have the usual three (RGB) color channels.
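To see what "not the usual three channels" means in practice, here is a minimal sketch that turns a four-channel (red, green, blue, near-infrared) aerial image into a displayable RGB array by dropping the near-infrared channel and stretching the contrast. It assumes the image is already a NumPy array in channels-last layout, shape `(height, width, 4)`; the array below is synthetic, and `to_rgb_preview` is a hypothetical helper. With the real files you could load the array with `tifffile.imread` (installed below), but check its shape first and transpose if the channels come first.

```python
import numpy as np

def to_rgb_preview(image):
    """Return a displayable uint8 RGB array from a (height, width, 4) R-G-B-NIR array.

    Hypothetical helper; assumes channels-last layout in the order R, G, B, NIR.
    """
    if image.ndim != 3 or image.shape[2] != 4:
        raise ValueError("expected an array of shape (height, width, 4)")
    rgb = image[:, :, :3].astype(np.float64)  # drop the near-infrared channel
    # Stretch each channel to 0-255 so low-contrast imagery is still visible.
    lo = rgb.min(axis=(0, 1), keepdims=True)
    hi = rgb.max(axis=(0, 1), keepdims=True)
    scaled = (rgb - lo) / np.maximum(hi - lo, 1e-12) * 255.0
    return scaled.astype(np.uint8)

# Synthetic stand-in for a NAIP tile.
tile = np.random.RandomState(0).randint(0, 256, size=(64, 64, 4)).astype(np.uint8)
preview = to_rgb_preview(tile)
print(preview.shape, preview.dtype)  # (64, 64, 3) uint8
```

The resulting array can be passed to an image viewer such as `matplotlib.pyplot.imshow`. The single-channel label files need a color map instead, since their pixel values are class IDs rather than intensities.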

## Install Python packages

Most of the Python packages used by our code -- CNTK, numpy, scipy, etc. -- are pre-installed on the Geo AI Data Science VM. However, we will need to install a few less-common packages:
- `tifffile`: load and save TIFF images.
- `gdal`: read specialized headers in our TIFF files that contain information on the region shown, geospatial coordinate system used, etc.
- `pyproj`: read PROJ.4-formatted geospatial projection information.
- `basemap`: help convert between lat-lon coordinates and row/column positions in our data files.

Special thanks to [Christoph Gohlke](https://www.lfd.uci.edu/~gohlke/pythonlib) for preparation of the gdal, pyproj, and basemap wheels.

```
!C:\Anaconda\envs\py35\python -m pip install tifffile
!C:\Anaconda\envs\py35\python -m pip install D:\pixellevellandclassification\wheels\GDAL-2.2.3-cp35-cp35m-win_amd64.whl
!C:\Anaconda\envs\py35\python -m pip install D:\pixellevellandclassification\wheels\pyproj-1.9.5.1-cp35-cp35m-win_amd64.whl
!C:\Anaconda\envs\py35\python -m pip install D:\pixellevellandclassification\wheels\basemap-1.1.0-cp35-cp35m-win_amd64.whl
```

```
Collecting tifffile
  Downloading tifffile-0.13.5.tar.gz (93kB)
Building wheels for collected packages: tifffile
  Running setup.py bdist_wheel for tifffile: started
  Running setup.py bdist_wheel for tifffile: finished with status 'done'
  Stored in directory: C:\Users\mawah\AppData\Local\pip\Cache\wheels\2c\94\c4\9ebc22e2fa6c509fc86e645f0e9154d4200bf1d6de755527c6
Successfully built tifffile
Installing collected packages: tifffile
Successfully installed tifffile-0.13.5
Processing d:\pixellevellandclassification\wheels\gdal-2.2.3-cp35-cp35m-win_amd64.whl
Installing collected packages: GDAL
Successfully installed GDAL-2.2.3
Processing d:\pixellevellandclassification\wheels\pyproj-1.9.5.1-cp35-cp35m-win_amd64.whl
Installing collected packages: pyproj
Successfully installed pyproj-1.9.5.1
Processing d:\pixellevellandclassification\wheels\basemap-1.1.0-cp35-cp35m-win_amd64.whl
Collecting pyshp>=1.2.0 (from basemap==1.1.0)
  Downloading pyshp-1.2.12.tar.gz (193kB)
Building wheels for collec
```
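Once the installs finish, you may want to confirm that each package can be found from the `py35` environment before moving on. The sketch below uses the standard library's `importlib.util.find_spec`. The import names are assumptions worth checking against your wheels: GDAL's Python bindings are usually imported as `osgeo.gdal`, and basemap as `mpl_toolkits.basemap`.

```python
import importlib.util

# Import names assumed for the packages installed above.
REQUIRED_MODULES = ["tifffile", "osgeo.gdal", "pyproj", "mpl_toolkits.basemap"]

def check_modules(names):
    """Map each module name to True if Python can locate it, else False."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec imports parent packages, so a missing parent raises here.
            status[name] = False
    return status

for name, ok in check_modules(REQUIRED_MODULES).items():
    print("{:25s} {}".format(name, "OK" if ok else "MISSING"))
```

Note that `find_spec` only locates a module. A wheel built for the wrong Python version or architecture can still fail when it is actually imported, so a final `import` of each package is the definitive check.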

## Perform training

Before starting training, ensure that you do not have any running processes making use of GPUs. (This may be the case if you have other programs or Jupyter notebooks running.) To do so, execute the code cell below to check your GPU status and running processes using `nvidia-smi`:

```
import subprocess

# Print the GPU status table, including any processes currently using the GPUs.
print(subprocess.check_output(['nvidia-smi']).decode())
```

```
Fri Feb 09 17:49:27 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 385.08                 Driver Version: 385.08                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K80           TCC  | 00000CF1:00:00.0 Off |                    0 |
| N/A   36C    P8    34W / 149W |    233MiB / 11447MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           TCC  | 0000BCF2:00:00.0 Off |                    0 |
| N/A   33C    P8    32W / 149W |      1MiB / 11447MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
```
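If you would rather check for competing GPU processes programmatically, `nvidia-smi` can report just the compute processes in CSV form through its `--query-compute-apps` option. The sketch below shells out to it and parses the result. `parse_compute_apps` and `gpu_processes` are hypothetical helpers, and the supported query fields may vary with driver version.

```python
import subprocess

def parse_compute_apps(csv_text):
    """Parse output of
    `nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader`
    into a list of (pid, process_name) tuples."""
    apps = []
    for line in csv_text.splitlines():
        fields = line.strip().split(",", 1)  # the process name may contain commas
        if len(fields) != 2 or not fields[0].strip().isdigit():
            continue  # skip blank lines or informational messages
        apps.append((int(fields[0]), fields[1].strip()))
    return apps

def gpu_processes():
    """Return the compute processes currently using any GPU (needs nvidia-smi on PATH)."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-compute-apps=pid,process_name", "--format=csv,noheader"])
    return parse_compute_apps(output.decode())

# On the VM: an empty list means no other process is using the GPUs.
# print(gpu_processes())
```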

To run the training script, edit the command below by replacing `%num_gpus%` with the number of GPUs on your VM:

Geo AI DSVM SKU name | Number of GPUs
:----:|:----:
NC6 | 1
NC12 | 2
NC24 | 4

```
mpiexec -n %num_gpus% C:\Anaconda\envs\py35\python ^
    D:\pixellevellandclassification\scripts\train_distributed.py ^
    --input_dir D:\pixellevellandclassification\training_data ^
    --model_dir D:\pixellevellandclassification\models ^
    --num_epochs 1
```
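Instead of looking up your SKU, you can count the GPUs with `nvidia-smi -L`, which prints one `GPU <index>: ...` line per device, and fill in `%num_gpus%` automatically. This is a minimal sketch: `count_gpus`, `build_train_command`, and `detected_train_command` are hypothetical helpers, and the paths are the ones used in this tutorial. The result is a single-line version of the command above, so you can paste it into a Command Prompt in the same way as the hand-edited version.

```python
import subprocess

def count_gpus(listing):
    """Count devices in `nvidia-smi -L` output (one 'GPU <index>: ...' line per GPU)."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))

def build_train_command(num_gpus, num_epochs=1, root=r"D:\pixellevellandclassification"):
    """Return the tutorial's mpiexec training command as a single line."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    return ("mpiexec -n {n} C:\\Anaconda\\envs\\py35\\python "
            "{root}\\scripts\\train_distributed.py "
            "--input_dir {root}\\training_data "
            "--model_dir {root}\\models "
            "--num_epochs {epochs}").format(n=num_gpus, root=root, epochs=num_epochs)

def detected_train_command(num_epochs=1):
    """Build the command using the number of GPUs reported by nvidia-smi."""
    listing = subprocess.check_output(["nvidia-smi", "-L"]).decode()
    return build_train_command(count_gpus(listing), num_epochs)

# On the VM: print(detected_train_command())
```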

Then, open a Windows command prompt (e.g., click the Start menu, type "Command Prompt", and press Enter), paste in the edited command, and run it. The script will create a new model from scratch, train it for one epoch, and save it to `D:\pixellevellandclassification\models\trained.model`. Training takes about 25 minutes with a single GPU, about 15 minutes with two GPUs, and so on.

During this time, you can finish reading this notebook and monitor progress as follows:
- In the command prompt where you launched the training, you should soon see output messages indicating the number of GPUs ("nodes") participating.
- Using Task Manager, observe that a Python process has been spawned for each GPU and is using a substantial amount of memory.
    - This tutorial uses eight pairs of training files. They occupy more space in memory than they do on disk due to decompression on loading.
    - Because it takes so long to load files of this size, we've chosen to load the files once at the beginning of training and hold them in memory for fast access. This is especially beneficial when training for more than one epoch.
- Re-run the `nvidia-smi` cell above: you should see utilization of all GPUs (eventually resulting in high temperature and GPU memory usage) and one running process per GPU.
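To see why the loaded files need more memory than disk space, note that a decompressed raster occupies roughly height x width x channels x bytes per value. The quick calculation below is a sketch: the 8,000 x 8,000 pixel tile size is a hypothetical example, not the actual size of the tutorial's files, and 8-bit (`uint8`) storage is assumed for both the four-channel image and the one-channel label file.

```python
import numpy as np

def in_memory_bytes(height, width, channels, dtype=np.uint8):
    """Bytes needed to hold an uncompressed (height, width, channels) raster."""
    return height * width * channels * np.dtype(dtype).itemsize

# Hypothetical tile size; check the shapes of your own arrays after loading them.
height, width = 8000, 8000
pair_bytes = in_memory_bytes(height, width, 4) + in_memory_bytes(height, width, 1)
print("One image pair: {:.0f} MB".format(pair_bytes / 1e6))      # 320 MB
print("Eight pairs:    {:.1f} GB".format(8 * pair_bytes / 1e9))  # 2.6 GB
```

Because the image pairs are split among the workers, each Python process holds only its share of this total.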

When training is complete, the output messages at the command prompt should indicate the duration of the training epoch and the error rate on the training set during the epoch, e.g.
```
Finished Epoch[1 of 1]: [Training] loss = 0.127706 * 16000, metric = 3.59% * 16000 1421.583s ( 11.3 samples/s);
```
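If you train for several epochs and want to track these numbers, you can extract them from the log with a regular expression. The sketch below assumes that the summary line keeps exactly the format shown above. Other CNTK versions may word it differently, and `parse_epoch_summary` is a hypothetical helper.

```python
import re

EPOCH_RE = re.compile(
    r"Finished Epoch\[(?P<epoch>\d+) of (?P<total>\d+)\]: \[Training\] "
    r"loss = (?P<loss>[\d.]+) \* (?P<samples>\d+), "
    r"metric = (?P<metric>[\d.]+)% \* \d+ "
    r"(?P<seconds>[\d.]+)s \(\s*(?P<rate>[\d.]+) samples/s\)")

def parse_epoch_summary(line):
    """Return the numbers in a CNTK 'Finished Epoch' line, or None if it doesn't match."""
    match = EPOCH_RE.search(line)
    if match is None:
        return None
    return {
        "epoch": int(match.group("epoch")),
        "total_epochs": int(match.group("total")),
        "loss": float(match.group("loss")),
        "samples": int(match.group("samples")),
        "error_pct": float(match.group("metric")),  # training error rate, in percent
        "seconds": float(match.group("seconds")),
        "samples_per_sec": float(match.group("rate")),
    }

line = ("Finished Epoch[1 of 1]: [Training] loss = 0.127706 * 16000, "
        "metric = 3.59% * 16000 1421.583s ( 11.3 samples/s);")
summary = parse_epoch_summary(line)
print(summary["error_pct"], round(summary["samples"] / summary["seconds"], 1))  # 3.59 11.3
```

The recomputed throughput (16000 samples / 1421.583 s) matches the `samples/s` figure that CNTK reports.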

## Understand the training script

While training runs, take a moment to explore the training script and model definition in your favorite text editor:
```
D:\pixellevellandclassification\scripts\train_distributed.py
D:\pixellevellandclassification\scripts\model_mini_pub.py
```

Below we provide some additional explanation of selected sections of these scripts.

### Training data access

Near the beginning of the training script is a custom minibatch source specifying how the training data should be read and used. Our training data comprise pairs of TIFF images. The first image in each pair is a four-channel (red, green, blue, near-infrared) aerial image of a region of the Chesapeake Bay watershed. The second image is a single-channel "image" of the same region, in which each pixel's value corresponds to a land cover label:
- 0: Unknown land type
- 1: Water
- 2: Trees and shrubs
- 3: Herbaceous vegetation
- 4+: Barren and impervious (roads, buildings, etc.); we lump these labels together

These two images in each pair correspond to the features and labels of the data, respectively. The minibatch source specifies that the available image pairs should be partitioned evenly between the workers, and each worker should load its set of image pairs into memory at the beginning of training. This ensures that the slow process of reading the input images is performed only once per training job. To produce each minibatch, subregions of a given image pair are sampled randomly. Training proceeds by cycling through the image pairs.
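The data-access strategy described above can be sketched in NumPy. This is not the code in `train_distributed.py`. It is a simplified illustration that assumes channels-last arrays, a 256-pixel patch size, round-robin assignment of image pairs to workers, and label lumping by clipping values at 4. The names `assign_pairs`, `lump_labels`, `sample_patch`, and `minibatches` are hypothetical.

```python
import numpy as np

def assign_pairs(pairs, worker_rank, num_workers):
    """Round-robin split so each worker loads an (almost) equal share of image pairs."""
    return pairs[worker_rank::num_workers]

def lump_labels(labels):
    """Collapse every label value of 4 or more into the single class 4."""
    return np.minimum(labels, 4)

def sample_patch(image, labels, patch_size, rng):
    """Crop the same random patch_size x patch_size window from an image/label pair."""
    height, width = labels.shape
    row = rng.randint(0, height - patch_size + 1)
    col = rng.randint(0, width - patch_size + 1)
    window = (slice(row, row + patch_size), slice(col, col + patch_size))
    return image[window], lump_labels(labels[window])

def minibatches(pairs, batch_size, patch_size, rng):
    """Yield (features, labels) minibatches, each drawn from one pair, cycling through pairs."""
    index = 0
    while True:
        image, labels = pairs[index % len(pairs)]
        patches = [sample_patch(image, labels, patch_size, rng) for _ in range(batch_size)]
        yield np.stack([p[0] for p in patches]), np.stack([p[1] for p in patches])
        index += 1

rng = np.random.RandomState(0)
# Eight synthetic pairs standing in for the tutorial's training files, held in memory.
all_pairs = [(np.zeros((512, 512, 4), np.uint8),
              rng.randint(0, 7, size=(512, 512)).astype(np.uint8)) for _ in range(8)]
my_pairs = assign_pairs(all_pairs, worker_rank=0, num_workers=2)
features, labels = next(minibatches(my_pairs, batch_size=4, patch_size=256, rng=rng))
print(len(my_pairs), features.shape, labels.shape)  # 4 (4, 256, 256, 4) (4, 256, 256)
```

Loading everything once and then only slicing arrays keeps per-minibatch cost low, which is the point of holding the pairs in memory.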

### The model architecture

The [model definition script](https://aiforearthcollateral.blob.core.windows.net/imagesegmentationtutorial/scripts/model_mini_pub.py) specifies the model architecture: a form of [U-Net](https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/). The input for this model will be a 256 pixel x 256 pixel four-channel aerial image (corresponding to a 256 meter x 256 meter region), and the output will be predicted land cover labels for the 128 m x 128 m region at the center of the input region. (Predictions are not provided at the boundaries due to edge effects.)
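Because the model predicts only the central 128 m x 128 m of each 256 m x 256 m input (one pixel per meter, as implied above), any labels you compare its output against must be cropped to the same central window. A minimal sketch of that arithmetic follows. `center_crop` is a hypothetical helper, not a function from `model_mini_pub.py`.

```python
import numpy as np

INPUT_SIZE = 256   # input patch side, in pixels (1 pixel = 1 m here)
OUTPUT_SIZE = 128  # side of the central region the model predicts

def center_crop(array, size):
    """Return the central size x size window of a (height, width, ...) array."""
    height, width = array.shape[:2]
    top = (height - size) // 2
    left = (width - size) // 2
    return array[top:top + size, left:left + size]

offset = (INPUT_SIZE - OUTPUT_SIZE) // 2
print("Margin discarded on each side: {} pixels".format(offset))  # 64 pixels
labels = np.arange(INPUT_SIZE * INPUT_SIZE).reshape(INPUT_SIZE, INPUT_SIZE)
target = center_crop(labels, OUTPUT_SIZE)
print(target.shape, target[0, 0] == labels[offset, offset])  # (128, 128) True
```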

## Next steps

Now that you have produced a trained model, you can test its performance in the following notebook on [applying your model to new aerial images](./evaluate.md). You may later wish to return to this section to:
- Train a model for more than one epoch to improve its performance
- Train with fewer GPUs to confirm the runtime scaling achieved with distributed training (NC12 and NC24 VMs only)
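If you do compare runs with different GPU counts, speedup and parallel efficiency summarize the scaling. The sketch below uses the approximate timings quoted earlier (about 25 minutes on one GPU and 15 on two) as example inputs. `scaling` is a hypothetical helper.

```python
def scaling(single_gpu_minutes, multi_gpu_minutes, num_gpus):
    """Return (speedup, parallel efficiency) of a multi-GPU run relative to one GPU."""
    speedup = single_gpu_minutes / multi_gpu_minutes
    return speedup, speedup / num_gpus

speedup, efficiency = scaling(25.0, 15.0, num_gpus=2)
print("speedup {:.2f}x, efficiency {:.0%}".format(speedup, efficiency))  # speedup 1.67x, efficiency 83%
```

Perfect scaling would give a speedup equal to the number of GPUs (100% efficiency). Work that does not parallelize, such as gradient synchronization between workers, typically keeps real runs below that.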

When you are done using your Geo AI Data Science VM, we recommend that you stop or delete it to prevent further charges.

For comments and suggestions regarding this notebook, please post a [GitHub issue](https://github.com/Azure/pixel_level_land_classification/issues/new) or submit a pull request in the [pixel-level land classification repository](https://github.com/Azure/pixel_level_land_classification).