# Deep learning workflow with `arcgis.learn`
 
Deep learning models 'learn' by looking at many examples of imagery together with the expected outputs. The workflow below walks you through the steps involved in training a model with `arcgis.learn`.

The statement below imports one of the deep learning models available in `arcgis.learn`.
Currently, the Python API supports `SingleShotDetector`, `RetinaNet`, `UnetClassifier`, `FeatureClassifier`, `MaskRCNN` and `PSPNet`.

This guide uses `SingleShotDetector` as the example.

```python
from arcgis.learn import SingleShotDetector
```

## Data Preparation

Data preparation can be a time-consuming process. It typically involves splitting the data into training and validation sets, applying various data augmentation techniques, creating the data structures needed to load the data into the model, and managing memory by using appropriately sized mini-batches. The `prepare_data()` method can directly read the training samples exported by ArcGIS and automates the entire process.
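The first of these steps splits the samples into training and validation sets. It can be sketched in plain Python. The `split_samples()` helper and its `val_pct` argument are hypothetical and only illustrate the idea. `prepare_data()` performs its own split internally.

```python
import random

def split_samples(samples, val_pct=0.1, seed=42):
    """Shuffle a copy of `samples` and split it into (train, valid) lists.

    Hypothetical helper for illustration; not part of arcgis.learn.
    """
    if not 0.0 <= val_pct < 1.0:
        raise ValueError("val_pct must be in [0, 1)")
    shuffled = list(samples)                # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded shuffle -> reproducible split
    n_val = int(round(len(shuffled) * val_pct))
    return shuffled[n_val:], shuffled[:n_val]

chips = [f"chip_{i:03d}.tif" for i in range(50)]
train, valid = split_samples(chips, val_pct=0.1)
print(len(train), len(valid))  # 45 5
```

Fixing the random seed makes the split reproducible, so the same chips end up in the validation set on every run.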

By default, `prepare_data()` applies a set of data augmentation transforms that work well for satellite imagery. These transforms randomly rotate, scale and flip the images, so the model sees a slightly different version of each image every time. Alternatively, you can compose your own transforms using [fast.ai transforms](https://docs.fast.ai/vision.transform.html) for the specific data augmentations you want to perform.
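The random flips and rotations described above can be shown on a tiny image stored as a nested list of pixel values. The function names below are hypothetical. The actual fast.ai transforms also handle scaling, interpolation and the matching transformation of bounding boxes, all of which this sketch omits.

```python
import random

def hflip(img):
    """Mirror each row left-to-right (horizontal flip)."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def random_augment(img, rng):
    """Apply a random horizontal flip and a random multiple of 90-degree rotation."""
    if rng.random() < 0.5:
        img = hflip(img)
    for _ in range(rng.randrange(4)):
        img = rot90(img)
    return img

image = [[1, 2],
         [3, 4]]
print(rot90(image))  # [[3, 1], [4, 2]]
print(random_augment(image, random.Random(0)))
```

The pixel values never change; only their arrangement does. That is why such transforms give the model "new" images without new labelling effort.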

```python
from arcgis.learn import prepare_data

# replace DATA_PATH with the folder containing the training samples exported from ArcGIS
data = prepare_data(r'DATA_PATH')
```


The `show_batch()` method can be used to visualize the exported training samples, along with labels, after data augmentation transformations have been applied.

```python
data.show_batch()
```

## Model Training

`arcgis.learn` includes support for training deep learning models for object detection.

The models in `arcgis.learn` are based on pretrained Convolutional Neural Networks (CNNs, or convnets for short) that have been trained on millions of common images, such as those in the [ImageNet](http://www.image-net.org/) dataset. The intuition behind a CNN is that it uses a hierarchy of layers. The earlier layers learn to identify simple features like edges and blobs. The middle layers combine these primitive features to identify corners and object parts. The later layers combine the inputs from these in unique ways to grasp what the whole image is about. The final layer in a typical convnet is a fully connected layer that looks at all the extracted features and essentially computes a weighted sum of them to determine the probability of each object class (for example, whether it's an image of a cat or a dog).
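The last step described above can be written out directly: a fully connected layer computes a weighted sum of the features, and the result is turned into class probabilities. The feature values, weights and the two classes below are made up for illustration. Softmax is assumed here as the function that turns scores into probabilities, since it is the usual choice for classification layers.

```python
import math

def fully_connected(features, weights, biases):
    """One score per class: a weighted sum of all features plus a bias."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

features = [0.9, 0.1, 0.4]                   # hypothetical extracted features
weights = [[2.0, -1.0, 0.5],                 # class 0: "cat"
           [-1.0, 2.0, 0.5]]                 # class 1: "dog"
biases = [0.0, 0.0]

scores = fully_connected(features, weights, biases)
probs = softmax(scores)
print([round(p, 3) for p in probs])  # [0.917, 0.083]
```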

A convnet trained on a huge corpus of images such as ImageNet can therefore be treated as a ready-to-use feature extractor. In practice, we can replace the last layer of such a convnet with something else that uses those features for other useful tasks (e.g. object detection and pixel classification). This is called transfer learning. The advantage of transfer learning is that we don't need nearly as much data to train an excellent model.

The `arcgis.learn` module is based on `PyTorch` and `fast.ai` and enables fine-tuning of pretrained [torchvision models](https://pytorch.org/docs/stable/torchvision/models.html) on satellite imagery. The `arcgis.learn` models leverage fast.ai's learning rate finder and one-cycle learning. These allow for much faster training and remove much of the guesswork in picking hyperparameters.

```python
ssd = SingleShotDetector(data)
```

### Find a good learning rate

Now that we have defined a model architecture, we can start to train it. This process involves setting a good [learning rate](https://towardsdatascience.com/understanding-learning-rates-and-how-it-improves-performance-in-deep-learning-d0d4059c1c10). The `lr_find()` function available in `arcgis.learn` plots the loss vs. learning rate graph. In the current version, it also returns a suggested learning rate value and marks that point on the graph. This helps in selecting an appropriate learning rate without having to experiment with several learning rates and pick from among them.
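The idea behind a learning rate finder (a learning rate range test) can be sketched on a toy problem. The sketch takes one training step per candidate learning rate and increases the rate exponentially. It records the loss after each step and suggests the rate at which the loss falls most steeply. The toy one-weight regression and the helper names are assumptions made for illustration. `lr_find()` runs a similar test on the real model and data, and its exact suggestion rule may differ.

```python
import math

# Toy regression data y = 3x; the "model" is a single weight w.
XS = [0.5, 1.0, 1.5, 2.0]
YS = [3.0 * x for x in XS]

def loss_and_grad(w):
    """Mean squared error and its derivative with respect to w."""
    n = len(XS)
    loss = sum((w * x - y) ** 2 for x, y in zip(XS, YS)) / n
    grad = sum(2 * (w * x - y) * x for x, y in zip(XS, YS)) / n
    return loss, grad

def lr_range_test(start_lr=1e-4, end_lr=1.0, num_steps=40, w=0.0):
    """Take one SGD step per learning rate while increasing the rate exponentially."""
    mult = (end_lr / start_lr) ** (1 / (num_steps - 1))
    lrs, losses = [], []
    lr, best = start_lr, float("inf")
    for _ in range(num_steps):
        _, grad = loss_and_grad(w)
        w -= lr * grad                       # one training step at this learning rate
        loss, _ = loss_and_grad(w)
        lrs.append(lr)
        losses.append(loss)
        best = min(best, loss)
        if loss > 4 * best:                  # stop once the loss clearly diverges
            break
        lr *= mult
    return lrs, losses

def suggest_lr(lrs, losses):
    """Learning rate at which the loss drops fastest per unit of log(lr)."""
    slopes = [(losses[i + 1] - losses[i]) / (math.log(lrs[i + 1]) - math.log(lrs[i]))
              for i in range(len(lrs) - 1)]
    return lrs[slopes.index(min(slopes))]

lrs, losses = lr_range_test()
print(f"suggested learning rate: {suggest_lr(lrs, losses):.2e}")
```

For this toy loss, SGD diverges above a learning rate of about 0.53. The suggestion therefore lands below that point, in the region where the loss is still falling quickly.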

```python
ssd.lr_find()
```

### Train the model

As discussed earlier, the idea of transfer learning is to fine-tune the earlier layers of the pretrained model only slightly while focusing on training the newly added layers. This means we need two different learning rates to fit the model well. We can use the learning rate suggested above to train the later layers of the model.
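In fast.ai, the two learning rates are often given as a range such as `slice(low, high)`. That range is spread across the model's layer groups, so the earliest (backbone) layers get the smallest rate and the head gets the largest. Below is a plain-Python sketch of that geometric spreading. The helper name and the three-group split are assumptions, not the `arcgis.learn` implementation.

```python
def spread_lrs(low, high, n_groups):
    """Geometrically spaced learning rates from `low` (first group) to `high` (last group)."""
    if n_groups == 1:
        return [high]
    step = (high / low) ** (1 / (n_groups - 1))
    return [low * step ** i for i in range(n_groups)]

# Hypothetical layer groups: early backbone, late backbone, newly added detection head.
groups = ["backbone_early", "backbone_late", "head"]
for name, lr in zip(groups, spread_lrs(1e-5, 1e-3, len(groups))):
    print(f"{name}: {lr:.0e}")
# prints 1e-05, 1e-04 and 1e-03 for the three groups
```

Geometric spacing keeps the ratio between neighbouring groups constant. This suits learning rates, which usually vary over orders of magnitude.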

Training the network is an iterative process. We can keep training the model with its `fit()` method for as long as the validation loss (or error rate) continues to go down with each training pass, also known as an epoch. This indicates that the model is learning the task. We will train our model for __ epochs.
Optionally, if we pass `early_stopping=True` as a parameter to the `fit()` method, training stops when the validation loss doesn't decrease for 5 consecutive epochs. Moreover, the `checkpoint=True` parameter saves the best model based on validation loss during training.
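The behaviour of `early_stopping=True` and `checkpoint=True` described above can be sketched as a loop over per-epoch validation losses. It remembers the best epoch (the checkpoint) and stops after five epochs without improvement. The function and the loss values are hypothetical. In `arcgis.learn` this logic runs inside `fit()`, and its exact rules, such as the minimum improvement that counts, may differ.

```python
def train_with_early_stopping(val_losses, patience=5):
    """Return (best_epoch, stopped_epoch) for a sequence of per-epoch validation losses.

    best_epoch is the epoch whose weights a checkpoint would keep.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # checkpoint: keep these weights
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch          # stop training early
    return best_epoch, len(val_losses) - 1

# Hypothetical validation losses: improvement stalls after epoch 3.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.54, 0.56, 0.60, 0.61]
print(train_with_early_stopping(losses))  # (3, 8)
```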

Note: You may also choose not to pass the `lr` parameter. If `lr` is not set, the method automatically calls `lr_find()` to find an optimum learning rate.

```python
# here we are training the model for 10 epochs
ssd.fit(10)
```

As each epoch progresses, the loss (the error we are trying to minimize) is reported for both the training data and the validation set. In the training output, we can see the losses going down for both datasets, indicating that the model is learning to recognize the well pads. We continue training the model for several iterations like this until we observe the validation loss going up. That indicates that the model is starting to overfit to the training data and is not generalizing well enough to the validation data. When that happens, we can try several remedies. We can add more data (or data augmentations), increase regularization by raising the dropout probability (the `drop` parameter) of the `SingleShotDetector` model, or reduce the model complexity.
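Dropout, mentioned above as a way to increase regularization, randomly zeroes a fraction of activations during training so that the network cannot rely on any single feature. Below is a plain-Python sketch of the common "inverted dropout" formulation. It scales the surviving activations by 1/(1-p) so that their expected value is unchanged. This illustrates the technique and is not the `SingleShotDetector` code.

```python
import random

def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each value with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)          # dropout is disabled at inference time
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
acts = [1.0] * 10
print(dropout(acts, p=0.3, rng=rng))      # roughly 30% zeros, the rest scaled up
```

A higher `p` removes more information on every training step. This makes the model harder to overfit, at the cost of slower convergence.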


### Unfreezing the backbone and fine-tuning
By default, the earlier layers of the model (i.e. the backbone or encoder) are frozen, and their weights are not updated while the model is being trained. This allows the model to take advantage of the (ImageNet) pretrained weights while training the 'head' of the network. Once the later layers have been sufficiently trained, calling `unfreeze()` often improves performance and accuracy. It allows the weights of the earlier layers to be fine-tuned to the nuances of the particular satellite imagery, which differs from the photos of everyday objects (from ImageNet) that the backbone was trained on. The learning rate finder can be used to identify the optimum learning rate between the different training phases of the model. This step is optional. Note that if we don't call `unfreeze()`, the lower learning rate we specify in `fit()` (the one intended for the backbone layers) won't be used.
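Freezing means that the optimizer skips the backbone parameters when it updates weights. The toy parameters and SGD step below are a hypothetical plain-Python sketch of that idea. `set_backbone_trainable()` is an illustration-only stand-in, not the `arcgis.learn` `unfreeze()` method. In PyTorch, the same effect is usually achieved by setting `requires_grad` on the parameters.

```python
# Hypothetical two-parameter "model": one backbone weight and one head weight.
params = {
    "backbone": {"value": 1.0, "trainable": False, "lr": 1e-4},  # frozen by default
    "head": {"value": 1.0, "trainable": True, "lr": 1e-2},
}

def sgd_step(params, grads):
    """Plain SGD update that skips frozen parameters."""
    for name, p in params.items():
        if p["trainable"]:
            p["value"] -= p["lr"] * grads[name]

def set_backbone_trainable(params, trainable):
    """Hypothetical stand-in for freezing/unfreezing the backbone."""
    params["backbone"]["trainable"] = trainable

grads = {"backbone": 0.5, "head": 0.5}   # made-up gradients

sgd_step(params, grads)                  # phase 1: only the head is updated
print(params["backbone"]["value"], params["head"]["value"])

set_backbone_trainable(params, True)     # unfreeze the backbone
sgd_step(params, grads)                  # phase 2: both updated, backbone with the smaller lr
print(params["backbone"]["value"], params["head"]["value"])
```

While the backbone is frozen, its per-group learning rate has no effect. This mirrors the note above that the lower learning rate is unused unless the model is unfrozen.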


### Visualize results
How well the model has learnt can be observed visually using the model's `show_results()` method. The ground truth is shown in the left column and the corresponding predictions from the model in the right column. In this example, the model has learnt to detect well pads fairly well. In some cases, it is even able to detect well pads that are missing from the ground truth data (due to inaccuracies in labelling or in the records).

```python
ssd.show_results(rows=25, thresh=0.05)
```

### Save and load trained models

Once you are satisfied with the model, you can save it using the `save()` method. This creates an Esri Model Definition (EMD file) that can be used for inferencing in ArcGIS Pro. It also creates a Deep Learning Package (DLPK zip) that can be deployed to ArcGIS Enterprise for distributed inferencing across a large geographical area using raster analytics. Saved models can also be loaded back using the `load()` method for further fine-tuning.

```python
# save the trained model
saved_ssd = ssd.save('wellpad_model_planet_2501')
```

## Deploy trained model
Once a model has been trained, it can be added to ArcGIS Enterprise as a deep learning package.

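```python
# `gis` is assumed to be an authenticated arcgis.gis.GIS connection to your ArcGIS Enterprise portal
ssd.save('Well Pad Detection Model Planet 2501', publish=True, gis=gis)
```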

## Model management
The `arcgis.learn` module includes the `install_model()` method to install the uploaded model package (*.dlpk) to the raster analytics server.

Optionally, once the necessary information has been extracted from the imagery with the model, the model can be uninstalled using `uninstall_model()`. The models deployed on an Image Server can be queried using the `list_models()` method. The uploaded model package is also installed automatically on first use. Here we query specific settings of the deep learning model using the model object:

```python
from arcgis.learn import Model

# model_package refers to the uploaded deep learning package (*.dlpk)
detect_objects_model = Model(model_package)
detect_objects_model.install()
detect_objects_model.query_info()
```

Here we can see that `threshold` and `nms_overlap` are model arguments with default values of 0.5 and 0.1, respectively. These values may be changed in the inference function call.
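These two arguments correspond to two standard post-processing steps for object detectors. First, detections whose confidence is below `threshold` are discarded. Then non-maximum suppression (NMS) drops any box that overlaps a higher-scoring box by more than `nms_overlap`, with overlap measured as intersection over union (IoU). Below is a plain-Python sketch for detections of a single class. It assumes axis-aligned boxes given as `(xmin, ymin, xmax, ymax)`. The deployed model's own implementation may differ in detail.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def postprocess(detections, threshold=0.5, nms_overlap=0.1):
    """Confidence filtering followed by greedy NMS; detections are (score, box) tuples."""
    candidates = sorted((d for d in detections if d[0] >= threshold),
                        key=lambda d: d[0], reverse=True)
    kept = []
    for score, box in candidates:
        # keep a box only if it does not overlap any already kept (higher-scoring) box too much
        if all(iou(box, k[1]) <= nms_overlap for k in kept):
            kept.append((score, box))
    return kept

detections = [
    (0.95, (10, 10, 50, 50)),
    (0.80, (12, 12, 52, 52)),      # near-duplicate of the first box
    (0.70, (100, 100, 140, 140)),
    (0.30, (200, 200, 240, 240)),  # below the confidence threshold
]
print(postprocess(detections))
# [(0.95, (10, 10, 50, 50)), (0.7, (100, 100, 140, 140))]
```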

## Predict Video

The `predict_video` function can be used to run prediction on a video and append the output VMTI (Video Moving Target Indicator) predictions to the metadata file passed as an input parameter. By passing `visualize=True`, it can also generate a video that contains bounding box predictions around the objects detected by the specified deep learning model.

The metadata file is a comma-separated values (CSV) file containing metadata about the video frames at specific times. To learn more about it, see the [Video Multiplexer](https://pro.arcgis.com/en/pro-app/tool-reference/image-analyst/video-multiplexer.htm) tool documentation.


You can pass `track=True` to apply tracking to the video. The output video can be saved by specifying `output_video_path`. By default, the output video is saved in the original video's directory.

You can set three tracking parameters: the threshold used to assign detections to trackers, the number of frames a track should persist when its object is not found, and the number of frames an object must be detected before tracking starts. These values are passed as a dictionary in the tracking options argument.
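The three tracking options described above can be illustrated with a deliberately simple, greedy IoU tracker. A detection is assigned to an existing track when their IoU exceeds the assignment threshold. A track is dropped after it goes unmatched for more than a given number of frames. A track is only reported once it has been matched in a minimum number of frames. Everything below is a hypothetical plain-Python sketch of the concept, not the tracker used by `predict_video`. That includes the function name, the parameter names `assignment_iou`, `vanish_frames` and `detect_frames`, and the greedy matching, none of which may match the keys `predict_video` expects.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def track(frames, assignment_iou=0.3, vanish_frames=2, detect_frames=2):
    """Greedy IoU tracker (hypothetical sketch).

    frames: list of frames, each a list of boxes.
    Returns one dict per frame mapping track id -> box for confirmed tracks seen in that frame.
    """
    tracks = {}   # track id -> {"box": ..., "hits": ..., "misses": ...}
    next_id = 0
    output = []
    for boxes in frames:
        unmatched = list(boxes)
        for tid, t in list(tracks.items()):
            # candidate detection with the largest overlap with this track
            best = max(unmatched, key=lambda b: iou(t["box"], b), default=None)
            if best is not None and iou(t["box"], best) > assignment_iou:
                unmatched.remove(best)
                t.update(box=best, hits=t["hits"] + 1, misses=0)
            else:
                t["misses"] += 1
                if t["misses"] > vanish_frames:   # missing for too many frames
                    del tracks[tid]
        for box in unmatched:                     # leftover detections start new tracks
            tracks[next_id] = {"box": box, "hits": 1, "misses": 0}
            next_id += 1
        output.append({tid: t["box"] for tid, t in tracks.items()
                       if t["misses"] == 0 and t["hits"] >= detect_frames})
    return output

frames = [
    [(0, 0, 10, 10)],   # object appears
    [(1, 0, 11, 10)],   # seen again: track confirmed (detect_frames=2)
    [],                 # missed once: track kept but not reported
    [(2, 0, 12, 10)],   # re-acquired by the same track
    [], [], [],         # missed three times: track dropped (vanish_frames=2)
    [(2, 0, 12, 10)],   # same place, but now a new, unconfirmed track
]
result = track(frames)
print(result)
```

Raising the persistence value keeps identities stable through short occlusions. Raising the detection value suppresses one-frame false positives at the cost of reporting new objects a little later.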

 

A set of visual options lets you customize the debug video. You can choose whether to display scores and labels, and set the color and thickness of the predictions and the font face used for the labels.

The final saved VMTI data can be multiplexed into the input video by passing the flag `multiplex=True`. The multiplexed video is saved at the path specified in `multiplexed_video_path`. By default, it is saved in the original video's directory.