Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Training an Object Detection model using AutoML
In this notebook, we show how to use AutoML to train an Object Detection model. We train the model on a small dataset, show how you can tune its hyperparameters to improve performance, and deploy the model for use in inference scenarios.

## Setup
To use this notebook, you will need to install the private preview package for AutoML for images from the private index.

### Note: Only Python 3.6 and 3.7 are supported for this feature.

In [49]:
%pip install --upgrade "azureml-train-core<0.1.1" "azureml-train-automl<0.1.1" "azureml-contrib-dataset<0.1.1"  --extra-index-url "https://azuremlsdktestpypi.azureedge.net/automl_for_images_private_preview/"

Looking in indexes: https://pypi.org/simple, https://azuremlsdktestpypi.azureedge.net/automl_for_images_private_preview/
Requirement already up-to-date: azureml-train-core<0.1.1 in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (0.1.0.36994775)
Requirement already up-to-date: azureml-train-automl<0.1.1 in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (0.1.0.36994775)
Requirement already up-to-date: azureml-contrib-dataset<0.1.1 in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (0.1.0.36994775)
Note: you may need to restart the kernel to use updated packages.


### Licensing Information
This preview software is made available to you on the condition that you agree to
[your agreement][1] governing your use of Azure, and to the [Supplemental Terms of Use for Microsoft Azure Previews][2], which supplement your agreement governing your use of Azure.
If you do not have an existing agreement governing your use of Azure, you agree that 
your agreement governing use of Azure is the [Microsoft Online Subscription Agreement][3]
(which incorporates the [Online Services Terms][4]).
By using the software you agree to these terms. This software may collect data
that is transmitted to Microsoft. Please see the [Microsoft Privacy Statement][5]
to learn more about how Microsoft processes personal data.

[1]: https://azure.microsoft.com/en-us/support/legal/
[2]: https://azure.microsoft.com/en-us/support/legal/preview-supplemental-terms/
[3]: https://azure.microsoft.com/en-us/support/legal/subscription-agreement/
[4]: http://www.microsoftvolumelicensing.com/DocumentSearch.aspx?Mode=3&DocumentTypeId=46
[5]: http://go.microsoft.com/fwlink/?LinkId=248681 


## Workspace setup
In order to train and deploy models in Azure ML, you will first need to set up a workspace.

An [Azure ML Workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-azure-machine-learning-architecture#workspace) is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, deployment, inference, and the monitoring of deployed models.

Create an Azure ML Workspace within your Azure subscription, or load an existing workspace.

In [50]:
from azureml.core.workspace import Workspace

In [51]:
ws = Workspace.from_config()
ws.get_details()

## specify workspace parameters
#subscription_id=''   
#resource_group='amldatalabeling'   
#workspace_name='Computer_Vision_Pipeline'

#from azureml.core.workspace import Workspace
#ws = Workspace.create(name=workspace_name,
#                      subscription_id=subscription_id,
#                      resource_group=resource_group, 
#                      exist_ok=True)

{'id': '/subscriptions/83da1f6b-22c0-4300-aa13-bd260eab57e5/resourceGroups/amldatalabeling/providers/Microsoft.MachineLearningServices/workspaces/Computer_Vision_Pipeline',
 'name': 'Computer_Vision_Pipeline',
 'identity': {'principal_id': '91200a66-f1b4-4077-860a-4d115e873bc5',
  'tenant_id': '72f988bf-86f1-41af-91ab-2d7cd011db47',
  'type': 'SystemAssigned'},
 'location': 'eastus',
 'type': 'Microsoft.MachineLearningServices/workspaces',
 'tags': {},
 'sku': 'Basic',
 'workspaceid': 'd869dd80-1333-484d-99ae-5869dc1b041a',
 'sdkTelemetryAppInsightsKey': 'f5784ccd-178d-4ecc-9998-b05841b44ae9',
 'description': '',
 'friendlyName': 'Computer_Vision_Pipeline',
 'creationTime': '2021-05-17T20:54:23.0413693+00:00',
 'containerRegistry': '/subscriptions/83da1f6b-22c0-4300-aa13-bd260eab57e5/resourceGroups/amldatalabeling/providers/Microsoft.ContainerRegistry/registries/d869dd801333484d99ae5869dc1b041a',
 'keyVault': '/subscriptions/83da1f6b-22c0-4300-aa13-bd260eab57e5/resourcegroups/amldatala

## Compute target setup
You will need to provide a [Compute Target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) that will be used for your AutoML model training. AutoML models for image tasks require GPU SKUs and support the NC and ND families. We recommend the NCsv3-series (with V100 GPUs) for faster training. A compute target with a multi-GPU VM SKU uses the multiple GPUs to speed up training. Additionally, a compute target with multiple nodes speeds up hyperparameter tuning by running tuning trials in parallel across the nodes.

In [52]:
from azureml.core.compute import AmlCompute, ComputeTarget

cluster_name = "gpu-cluster-nc6"

try:
    compute_target = ws.compute_targets[cluster_name]
    print('Found existing compute target.')
except KeyError:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_NC6', 
                                                           idle_seconds_before_scaledown=1800,
                                                           min_nodes=0, 
                                                           max_nodes=4)

    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    
# Can poll for a minimum number of nodes and for a specific timeout.
# If no min_node_count is provided, it will use the scale settings for the cluster.
compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

Found existing compute target.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Experiment Setup
Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) in your workspace to track your model training runs.

In [53]:
from azureml.core import Experiment

experiment_name = 'automl-image-object-detection_soda' 
experiment = Experiment(ws, name=experiment_name)

## Dataset with input Training Data
To generate models for computer vision, you need to bring in labeled image data as input for model training, in the form of an Azure ML Labeled Dataset. You can either use a Labeled Dataset that you have exported from a Data Labeling project, or create a new Labeled Dataset from your labeled training data.

### Convert annotation file from COCO to JSONL
If you want to try a dataset in COCO format, the script below shows how to convert it to `jsonl` format. The file `soda.json` contains the annotation information for the soda dataset used in this notebook.

In [54]:
!python coco2jsonl.py --input_coco_file_path "./soda.json" --output_dir "./sodaObjects" --output_file_name "sodaObjects_from_coco.jsonl"  --task_type "ObjectDetection"  --base_url "AmlDatastore://workspaceblobstore/sodaObjects/images/"

Converting for ObjectDetection
Conversion completed. Converted 241 lines.
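The `coco2jsonl.py` script itself is not shown in this notebook. As a rough sketch of the kind of conversion it performs, the snippet below turns a COCO-style dictionary into one JSON record per image, rescaling pixel boxes to fractions of the image size. The function name `coco_to_jsonl_lines` and the sample data are hypothetical, and the `topY`, `bottomX`, and `bottomY` keys are assumed by analogy with the `topX` key visible in the dataset preview; the real script may differ.

```python
import json


def coco_to_jsonl_lines(coco, base_url):
    """Convert a COCO-style dict into JSON Lines records, one per image."""
    categories = {c["id"]: c["name"] for c in coco["categories"]}

    # Group annotations by the image they belong to.
    boxes_by_image = {}
    for ann in coco["annotations"]:
        boxes_by_image.setdefault(ann["image_id"], []).append(ann)

    lines = []
    for img in coco["images"]:
        width, height = img["width"], img["height"]
        labels = []
        for ann in boxes_by_image.get(img["id"], []):
            x, y, w, h = ann["bbox"]  # COCO: top-left corner plus size, in pixels
            labels.append({
                "label": categories[ann["category_id"]],
                # Assumed output schema: corners as fractions of the image size.
                "topX": x / width,
                "topY": y / height,
                "bottomX": (x + w) / width,
                "bottomY": (y + h) / height,
            })
        record = {
            "image_url": base_url + img["file_name"],
            "image_details": {
                "format": img["file_name"].rsplit(".", 1)[-1],
                "width": width,
                "height": height,
            },
            "label": labels,
        }
        lines.append(json.dumps(record))
    return lines


# Made-up one-image COCO file, for illustration only.
sample_coco = {
    "images": [{"id": 1, "file_name": "photo1.jpg", "width": 800, "height": 600}],
    "categories": [{"id": 7, "name": "sprite"}],
    "annotations": [{"image_id": 1, "category_id": 7, "bbox": [200, 150, 400, 300]}],
}

jsonl_lines = coco_to_jsonl_lines(
    sample_coco, "AmlDatastore://workspaceblobstore/sodaObjects/images/"
)
print(jsonl_lines[0])
```

Writing `"\n".join(jsonl_lines)` to a file would give one record per line, which is the shape a `.jsonl` annotation file takes.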


### Upload the JSONL file and images to Datastore  
In order to use the data for training in Azure ML, we upload it to our Azure ML Workspace via a [Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/concept-azure-machine-learning-architecture#datasets-and-datastores). The datastore provides a mechanism for you to upload/download data, and interact with it from your remote compute targets. It is an abstraction over Azure Storage.



In [55]:
# Retrieve the default datastore that was created automatically when the workspace was set up
ds = ws.get_default_datastore()
ds.upload(src_dir='./sodaObjects', target_path='sodaObjects')

Uploading an estimated of 1 files
Target already exists. Skipping upload for sodaObjects/sodaObjects_from_coco.jsonl
Uploaded 0 files


$AZUREML_DATAREFERENCE_3e8ae2868b3346cc8761e4334511fdd0

Finally, we need to create an Azure ML [Dataset](https://docs.microsoft.com/en-us/azure/machine-learning/concept-azure-machine-learning-architecture#datasets-and-datastores) from the data we uploaded to the Datastore. We register a dataset for training here; the equivalent code for a separate validation dataset is included but commented out.

In [56]:
from azureml.contrib.dataset.labeled_dataset import _LabeledDatasetFactory, LabeledDatasetTask
from azureml.core import Dataset

training_dataset_name = 'sodaObjects'
if training_dataset_name in ws.datasets:
    training_dataset = ws.datasets.get(training_dataset_name)
    print('Found the training dataset', training_dataset_name)
else:
    # create training dataset
    training_dataset = _LabeledDatasetFactory.from_json_lines(
        task=LabeledDatasetTask.OBJECT_DETECTION,
        path=ds.path('sodaObjects/sodaObjects_from_coco.jsonl'))
    training_dataset = training_dataset.register(workspace=ws, name=training_dataset_name)
    
# create validation dataset
#validation_dataset_name = "appleObjectsValidationDataset"
#if validation_dataset_name in ws.datasets:
#    validation_dataset = ws.datasets.get(validation_dataset_name)
#    print('Found the validation dataset', validation_dataset_name)
#else:
#    validation_dataset = _LabeledDatasetFactory.from_json_lines(
#       task=LabeledDatasetTask.OBJECT_DETECTION, path=ds.path('appleObjects/validation_annotations.jsonl'))
#    validation_dataset = validation_dataset.register(workspace=ws, name=validation_dataset_name)
    
    
#print("Training dataset name: " + training_dataset.name)
#print("Validation dataset name: " + validation_dataset.name)

Found the training dataset sodaObjects


A validation dataset is optional. If no validation dataset is specified, 20% of your training data is used for validation by default. You can control the percentage using the `split_ratio` argument; please refer to the documentation for more details.

This is what the training dataset looks like:

### View the training set

In [57]:
import pandas as pd
training_dataset.to_pandas_dataframe()

| | image_url | image_details | label |
|---|---|---|---|
| 0 | StreamInfo(AmlDatastore://sodaObjects/images/e... | {'format': 'jpg', 'width': 816.0, 'height': 61... | [{'label': 'sprite', 'topX': 0.153960129310344... |
| 1 | StreamInfo(AmlDatastore://sodaObjects/images/8... | {'format': 'jpg', 'width': 816.0, 'height': 61... | [{'label': 'sprite', 'topX': 0.185364070197044... |
| 2 | StreamInfo(AmlDatastore://sodaObjects/images/a... | {'format': 'jpg', 'width': 816.0, 'height': 61... | [{'label': 'sprite', 'topX': 0.221694119458128... |
| 3 | StreamInfo(AmlDatastore://sodaObjects/images/b... | {'format': 'jpg', 'width': 816.0, 'height': 61... | [{'label': 'coke', 'topX': 0.3115955972906404,... |
| 4 | StreamInfo(AmlDatastore://sodaObjects/images/0... | {'format': 'jpg', 'width': 816.0, 'height': 61... | [{'label': 'diet_coke', 'topX': 0.317753232758... |
| ... | ... | ... | ... |
| 160 | StreamInfo(AmlDatastore://sodaObjects/images/6... | {'format': 'jpg', 'width': 816.0, 'height': 61... | [{'label': 'sprite', 'topX': 0.033886237684729... |
| 161 | StreamInfo(AmlDatastore://sodaObjects/images/2... | {'format': 'jpg', 'width': 816.0, 'height': 61... | [{'label': 'sprite', 'topX': 0.205068503694581... |
| 162 | StreamInfo(AmlDatastore://sodaObjects/images/d... | {'format': 'jpg', 'width': 816.0, 'height': 61... | [{'label': 'coke', 'topX': 0.22785175492610837... |
| 163 | StreamInfo(AmlDatastore://sodaObjects/images/a... | {'format': 'jpg', 'width': 816.0, 'height': 61... | [{'label': 'diet_coke', 'topX': 0.200142395320... |
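Before training, it can help to check how many boxes each class has. The sketch below counts labels, assuming each entry of the `label` column is a list of dictionaries with a `label` key, as the preview above suggests. The helper name `count_labels` and the sample rows are made up for illustration.

```python
from collections import Counter


def count_labels(label_column):
    """Count how many boxes of each class appear across all images."""
    counts = Counter()
    for boxes in label_column:
        counts.update(box["label"] for box in boxes)
    return counts


# Made-up rows in the same shape as the `label` column in the preview above.
sample_labels = [
    [{"label": "sprite", "topX": 0.15}, {"label": "coke", "topX": 0.60}],
    [{"label": "coke", "topX": 0.31}],
    [{"label": "diet_coke", "topX": 0.32}],
]

label_counts = count_labels(sample_labels)
print(label_counts.most_common())
```

With the real data, the same helper could be applied to `training_dataset.to_pandas_dataframe()['label']`, provided the column has the assumed shape.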


## Configuring your AutoML run for image tasks
AutoML allows you to easily train models for Image Classification, Object Detection & Instance Segmentation on your image data. You can control the model algorithm to be used, specify hyperparameter values for your model as well as perform a sweep across the hyperparameter space to generate an optimal model. Parameters for configuring your AutoML runs for image related tasks are specified using the `AutoMLImageConfig` - please refer to the [documentation](https://github.com/swatig007/automlForImages/blob/main/README.md) for the details on the parameters that can be used and their values.

When using AutoML for image tasks, you need to specify the model algorithms using the `model_name` parameter. You can either specify a single model or choose to sweep over multiple ones. 
Currently supported model algorithms for object detection: `yolov5`, `fasterrcnn_resnet50_fpn`, `fasterrcnn_resnet34_fpn`, `fasterrcnn_resnet18_fpn`, `retinanet_resnet50_fpn`.

### Using default hyperparameter values for the specified algorithm
Before doing a large sweep to search for the optimal models and hyperparameters, we recommend trying the default values to get a first baseline. Next, you can explore multiple hyperparameters for the same model before sweeping over multiple models and their parameters. This iterative approach helps because, with multiple models and multiple hyperparameters for each (as we showcase in the next section), the search space grows exponentially and you need more iterations to find optimal configurations.
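To get a feel for how quickly a grid grows, the sketch below enumerates every combination in a small search space using only the standard library. The model names come from the supported list above; the `learning_rate` and `optimizer` entries and their values are hypothetical and serve only to show the arithmetic.

```python
from itertools import product

# Hypothetical search space, for illustration only.
search_space = {
    "model_name": ["yolov5", "fasterrcnn_resnet50_fpn", "retinanet_resnet50_fpn"],
    "learning_rate": [0.0001, 0.001, 0.01],
    "optimizer": ["sgd", "adam"],
}


def grid_configurations(space):
    """Return every combination of values as a list of dicts."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


configs = grid_configurations(search_space)
print(len(configs))  # 3 * 3 * 2 = 18
```

Adding one more hyperparameter with three values would triple the count to 54, which is why starting from a single model with defaults is the cheaper first step.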

If you wish to use the default hyperparameter values for a given algorithm (say `yolov5`), you can specify the config for your AutoML Image runs as follows:

In [58]:
from azureml.train.automl import AutoMLImageConfig
from azureml.train.hyperdrive import GridParameterSampling, choice

image_config_yolov5 = AutoMLImageConfig(task='image-object-detection',
                                        compute_target=compute_target,
                                        training_data=training_dataset,
                                        #validation_data=validation_dataset,
                                        hyperparameter_sampling=GridParameterSampling({'model_name': choice('yolov5')}))

### Submitting an AutoML run for Image tasks 
Once you've created the config settings for your run, you can submit an AutoML run using the config in order to train an image model using your training dataset.

In [59]:
automl_image_run = experiment.submit(image_config_yolov5)

Submitting remote run.


Experiment,Id,Type,Status,Details Page,Docs Page
automl-image-object-detection_soda,AutoML_f546f754-3c46-4492-922b-7c9e53b9aa33,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation
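Training runs remotely, and the status shown above is still `NotStarted` right after submission, so the steps that follow need the run to have finished first. A minimal sketch of a guard, assuming the object returned by `experiment.submit` exposes the `wait_for_completion` and `get_status` methods of the Azure ML `Run` class; the helper name `wait_until_completed` is hypothetical.

```python
def wait_until_completed(run, show_output=False):
    """Block until the run finishes and fail loudly if it did not complete."""
    run.wait_for_completion(show_output=show_output)
    status = run.get_status()
    if status != "Completed":
        raise RuntimeError(f"Run ended with status {status!r}")
    return run


# Intended use in this notebook (requires a live workspace and run):
# automl_image_run = wait_until_completed(automl_image_run, show_output=True)
```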


## Register the optimal model from the AutoML run
Once the run completes, we can register the model created by the best child run (the configuration that produced the best primary metric).

In [66]:
# Register the model from the best run

best_child_run = automl_image_run.get_best_child()
model_name = best_child_run.properties['model_name']
model = best_child_run.register_model(model_name=model_name, model_path='train_artifacts/model.onnx')

## Download the model and associated files (e.g., labels)

In [72]:
# Reference: https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/machine-learning/how-to-inference-onnx-automl-image-models.md
import os

# Create a model folder in the current directory
os.makedirs('./model', exist_ok=True)

# Download the ONNX model from run history
best_child_run.download_file(name='train_artifacts/model.onnx',
                             output_file_path='./model/model.onnx')

# Download the class labels that go with the model
best_child_run.download_file(name='train_artifacts/labels.json',
                             output_file_path='./model/labels.json')
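After the download, a quick check that the labels file parses can catch a failed or partial download early. The sketch below assumes `labels.json` holds a plain JSON list of class names; that layout is an assumption and is not confirmed by this notebook. The helper `load_labels` is hypothetical, and the demo writes its own made-up file to a temporary folder so that it runs without the real artifacts.

```python
import json
import os
import tempfile


def load_labels(model_dir):
    """Read the class names stored next to the downloaded model."""
    labels_path = os.path.join(model_dir, "labels.json")
    with open(labels_path) as f:
        return json.load(f)


# Demo with a made-up labels file; the class order here is illustrative only.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "labels.json"), "w") as f:
        json.dump(["coke", "diet_coke", "sprite"], f)
    classes = load_labels(demo_dir)

print(classes)
```

With the real files, `load_labels('./model')` would read the labels downloaded by the cell above.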
