Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.
# Kakadu Fish AI
## Training an Object Detection model using AutoML

This notebook was adapted from [Microsoft's Azure Machine Learning examples](https://github.com/Azure/azureml-examples/tree/main/v1/python-sdk/tutorials/automl-with-azureml).

In this notebook, we use AutoML for training an Object Detection model. We will use a custom dataset in COCO format to train the model, tune hyperparameters of the model to optimize model performance and deploy the model to use in inference scenarios.

### Workspace setup

In order to train and deploy models in Azure ML, you will first need to set up a workspace.

An Azure ML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, deployment, inference, and the monitoring of deployed models.

Create an Azure ML Workspace within your Azure subscription, or load an existing workspace.


In [None]:
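# Specify the workspace parameters; these can be obtained from your Azure subscription and AML account.

subscription_id = '<insert subscription id>'
resource_group = '<insert resource group name>'
workspace_name = '<insert AML workspace name>'

from azureml.core.workspace import Workspace

# from_config() reads the workspace details from a config.json file and does not
# use the variables above. To use them instead, replace the call with:
# ws = Workspace(subscription_id, resource_group, workspace_name)
ws = Workspace.from_config()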
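
Workspace.from_config() takes the workspace details from a config.json file (by default it starts searching in the current directory) rather than from variables in the notebook. The following is a minimal sketch of writing such a file with the standard library, assuming the placeholder values are replaced with your own. The helper name write_workspace_config is hypothetical, and the file is written to a temporary directory only to keep the example side-effect free.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json holding the three workspace identifiers."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=4))
    return path


# Placeholder values; replace them with the details of your own workspace.
config_path = write_workspace_config(
    tempfile.mkdtemp(),
    subscription_id="<insert subscription id>",
    resource_group="<insert resource group name>",
    workspace_name="<insert AML workspace name>",
)
print(config_path.read_text())
```

Placing the resulting file next to the notebook should let the same from_config() call work across several notebooks without repeating the identifiers in each one.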

### Compute target setup

You will need to provide a Compute Target that will be used for your AutoML model training. AutoML models for image tasks require GPU SKUs and support NC and ND families. We recommend using the NCsv3-series (with V100 GPUs) for faster training. Using a compute target with a multi-GPU VM SKU will leverage the multiple GPUs to speed up training. Additionally, setting up a compute target with multiple nodes allows faster model training by running trials in parallel when tuning hyperparameters for your model.


In [None]:
from azureml.core.compute import AmlCompute, ComputeTarget

cluster_name = "gpu-cluster-nc6"

try:
    compute_target = ws.compute_targets[cluster_name]
    print('Found existing compute target.')
except KeyError:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_NC6', 
                                                           idle_seconds_before_scaledown=1800,
                                                           min_nodes=0, 
                                                           max_nodes=4)

    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    
# Can poll for a minimum number of nodes and for a specific timeout.
# If no min_node_count is provided, it will use the scale settings for the cluster.
compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

### Experiment Setup
Create an Experiment in your workspace to track your model training runs. Give your experiment a name. This name will appear in the experiments section in the AML Studio. 

In [None]:
from azureml.core import Experiment

experiment_name = '<create name for your experiment>' 
experiment = Experiment(ws, name=experiment_name)

### Dataset with input Training Data
In order to generate models for computer vision, you will need to bring in labeled image data as input for model training in the form of an AzureML Labeled Dataset. You can either use a Labeled Dataset that you have exported from a Data Labeling project, or create a new Labeled Dataset with your labeled training data.



In [None]:
from IPython.display import Image
Image(filename='000000001020.jpg') 

#### Convert annotation file from COCO to JSONL
If you want to try with a dataset in COCO format, the script below shows how to convert it to JSONL format.

In [None]:
#provide the credentials for the storage account where your training dataset and COCO .json file reside.

from azure.storage.blob import BlobServiceClient
import pandas as pd
import json

STORAGEACCOUNTURL= 'https://<insert account name here>.blob.core.windows.net'
STORAGEACCOUNTKEY= '<insert storage account key here>'
CONTAINERNAME= '<specify blob container name>'
BLOBNAME= '<insert .json file name here>.json'

In [None]:
# Download the COCO annotation file from blob storage and save a local copy.
# Change the file name in open(...) below to your .json file name.

blob_service_client_instance = BlobServiceClient(account_url=STORAGEACCOUNTURL, credential=STORAGEACCOUNTKEY)
blob_client_instance = blob_service_client_instance.get_blob_client(CONTAINERNAME, BLOBNAME, snapshot=None)

# Read the blob contents into memory and parse them as JSON.
blob_data = blob_client_instance.download_blob()
coco_data = json.loads(blob_data.readall())

with open("<insert json file name here>.json", "w") as outfile:
    json.dump(coco_data, outfile)

In [None]:
# Generate jsonl file from coco file. Change the input_coco_file_path string to the name of your COCO .json file, the output name to the new name of your .jsonl file, and the base_url to the name of the datastore where your images and COCO .json file are.

!python coco2jsonl.py --input_coco_file_path "./<insert json file name here>.json" --output_dir "." --output_file_name "<insert new file name here>.jsonl" --task_type "ObjectDetection" --base_url "AmlDatastore://<insert datastore name here>/"
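
The conversion script itself is not shown in this notebook. As a rough illustration of what such a conversion involves, the sketch below turns a tiny in-memory COCO dictionary into JSONL records. It assumes the AutoML object detection schema (image_url, image_details, and a label list whose topX, topY, bottomX and bottomY values are normalized by the image size). The function name coco_to_jsonl_records, the sample data and the datastore name my_datastore are hypothetical, and this is not the code of coco2jsonl.py.

```python
import json


def coco_to_jsonl_records(coco, base_url):
    """Convert a COCO dict into one record per image with normalized boxes."""
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    records = {}
    for image in coco["images"]:
        records[image["id"]] = {
            "image_url": base_url + image["file_name"],
            "image_details": {
                "format": image["file_name"].rsplit(".", 1)[-1],
                "width": image["width"],
                "height": image["height"],
            },
            "label": [],
        }
    for ann in coco["annotations"]:
        record = records[ann["image_id"]]
        width = record["image_details"]["width"]
        height = record["image_details"]["height"]
        # COCO boxes are [x, y, box_width, box_height] in pixels.
        x, y, box_w, box_h = ann["bbox"]
        record["label"].append({
            "label": categories[ann["category_id"]],
            "topX": x / width,
            "topY": y / height,
            "bottomX": (x + box_w) / width,
            "bottomY": (y + box_h) / height,
            "isCrowd": ann.get("iscrowd", 0),
        })
    return list(records.values())


# Hypothetical sample data: one 640x480 image with a single annotated fish.
sample_coco = {
    "images": [{"id": 1, "file_name": "fish_001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 7, "name": "barramundi"}],
    "annotations": [{"image_id": 1, "category_id": 7, "bbox": [64, 48, 320, 240], "iscrowd": 0}],
}

jsonl_records = coco_to_jsonl_records(sample_coco, "AmlDatastore://my_datastore/")
# JSONL means one JSON object per line.
jsonl_text = "\n".join(json.dumps(r) for r in jsonl_records)
print(jsonl_text)
```

The key step is the change of box convention: COCO stores a corner plus width and height in pixels, while the records above store two corners as fractions of the image size.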

### Upload the JSONL file and images to Datastore
In order to use the data for training in Azure ML, we upload it to our Azure ML Workspace via a Datastore. The datastore provides a mechanism for you to upload/download data, and interact with it from your remote compute targets. It is an abstraction over Azure Storage.

In [None]:
# Retrieve the datastore by name (a default datastore is created automatically when the workspace is set up). Ensure the datastore name and .jsonl file name (used above) are correct below.

from azureml.core import Workspace, Datastore
ds = Datastore.get(ws, '<insert datastore name here>')
print(ds.name)
ds.upload_files(files = ['<insert jsonl file name created above here>.jsonl'], overwrite=True)

Finally, we need to create an Azure ML Dataset from the data we uploaded to the Datastore. This notebook creates a single dataset for training.

In [None]:
# give a name to the dataset that'll be created from your converted .jsonl file and ensure the name of the .jsonl file specified below is correct.

from azureml.contrib.dataset.labeled_dataset import _LabeledDatasetFactory, LabeledDatasetTask
from azureml.core import Dataset

training_dataset_name = '<create dataset name here>'
if training_dataset_name in ws.datasets:
    training_dataset = ws.datasets.get(training_dataset_name)
    print('Found the training dataset', training_dataset_name)
else:
    # create training dataset
    training_dataset = _LabeledDatasetFactory.from_json_lines(
        task=LabeledDatasetTask.OBJECT_DETECTION, path=ds.path('<insert jsonl name here>.jsonl'))
    training_dataset = training_dataset.register(workspace=ws, name=training_dataset_name)
    
    
print("Training dataset name: " + training_dataset.name)

A validation dataset is optional. If no validation dataset is specified, 20% of your training data will be used for validation by default. You can control the percentage using the split_ratio argument; please refer to the documentation for more details.

This is what the training dataset looks like:

In [None]:
# check the dataframe looks correct

training_dataset.to_pandas_dataframe()

### Configuring your AutoML run for image tasks
AutoML allows you to easily train models for Image Classification, Object Detection and Instance Segmentation on your image data. You can control the model algorithm to be used, specify hyperparameter values for your model, and perform a sweep across the hyperparameter space to generate an optimal model. Parameters for configuring your AutoML runs for image-related tasks are specified using the AutoMLImageConfig; please refer to the documentation for details on the parameters that can be used and their values.



When using AutoML for image tasks, you need to specify the model algorithms using the model_name parameter. You can either specify a single model or choose to sweep over multiple ones. Currently supported model algorithms for object detection: yolov5, fasterrcnn_resnet50_fpn, fasterrcnn_resnet34_fpn, fasterrcnn_resnet18_fpn, retinanet_resnet50_fpn.

#### Using default hyperparameter values for the specified algorithm
Before doing a large sweep to search for the optimal models and hyperparameters, we recommend trying the default values to get a first baseline. Next, you can explore multiple hyperparameters for the same model before sweeping over multiple models and their parameters. This is for employing a more iterative approach, because with multiple models and multiple hyperparameters for each (as we showcase in the next section), the search space grows exponentially and you need more iterations to find optimal configurations.
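
To make the growth of the search space concrete, the sketch below counts the combinations in a small, hypothetical grid. The hyperparameter and model names come from this notebook, but the candidate values are illustrative assumptions rather than recommended settings.

```python
from itertools import product

# Hypothetical candidate values for a single model.
single_model_grid = {
    "learning_rate": [0.0001, 0.001, 0.01],
    "optimizer": ["sgd", "adam", "adamw"],
    "lr_scheduler": ["warmup_cosine", "step"],
}

# Every combination of the values above is one configuration to train.
single_model_configs = list(product(*single_model_grid.values()))
print(len(single_model_configs))  # 3 * 3 * 2 = 18

# Sweeping the same grid over several models multiplies the count again.
model_names = ["yolov5", "fasterrcnn_resnet50_fpn", "retinanet_resnet50_fpn"]
all_configs = list(product(model_names, *single_model_grid.values()))
print(len(all_configs))  # 3 models * 18 = 54
```

Each extra hyperparameter or model multiplies the total, which is why starting from the defaults and widening the search step by step keeps the number of runs manageable.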

If you wish to use the default hyperparameter values for a given algorithm (say yolov5), you can specify the config for your AutoML Image runs as follows:

In [None]:

from azureml.train.automl import AutoMLImageConfig
from azureml.train.hyperdrive import GridParameterSampling, choice

image_config_yolov5 = AutoMLImageConfig(task='image-object-detection',
                                        compute_target=compute_target,
                                        training_data=training_dataset,
                                        hyperparameter_sampling=GridParameterSampling({'model_name': choice('yolov5')}))

#### Submitting an AutoML run for Image tasks
Once you've created the config settings for your run, you can submit an AutoML run using the config in order to train an image model using your training dataset.

#### Add tags to the experiment

Add tags to the experiment so the metadata can be used to parse training runs. At a minimum, pass the tags listed below through the tags argument: experiment.submit(image_config_yolov5, tags={}).


In [None]:
# This submits the experiment to the GPU cluster specified earlier. You can track its progress in the experiments section of AML Studio.

automl_image_run = experiment.submit(image_config_yolov5)

### Hyperparameter sweeping for your AutoML models for image tasks
In this example, we use the AutoMLImageConfig to train an Object Detection model using yolov5 and fasterrcnn_resnet50_fpn, both of which are pretrained on COCO, a large-scale object detection, segmentation, and captioning dataset that contains over 200K labeled images with over 80 label categories.

When using AutoML for image tasks, you can perform a hyperparameter sweep over a defined parameter space to find the optimal model. In this example, we sweep over the hyperparameters for each algorithm, choosing from a range of values for learning_rate, optimizer, lr_scheduler, etc., to generate a model with the optimal primary metric. If hyperparameter values are not specified, then default values are used for the specified algorithm.

We use Random Sampling to pick samples from this parameter space and try a total of 20 iterations with these different samples, running 4 iterations at a time on our compute target, which has been previously set up using 4 nodes. Please note that the more parameters the space has, the more iterations you need to find optimal models.
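
As a rough picture of what random sampling over a conditional space means, the sketch below draws configurations with Python's random module. It mirrors the shape of the parameter space used in this section (pick one model block, then sample each of its hyperparameters), but it is only an illustration under those assumptions and not how HyperDrive is implemented. The helper sample_config and the tuple/list encoding of ranges and choices are hypothetical.

```python
import random

# Each entry is one model block; tuples are uniform ranges, lists are discrete choices.
search_space = [
    {
        "model_name": ["yolov5"],
        "learning_rate": (0.0001, 0.01),
        "img_size": [640, 704, 768],
    },
    {
        "model_name": ["fasterrcnn_resnet50_fpn"],
        "learning_rate": (0.0001, 0.001),
        "optimizer": ["sgd", "adam", "adamw"],
        "min_size": [600, 800],
    },
]


def sample_config(space, rng):
    """Pick one model block at random, then sample each hyperparameter in it."""
    block = rng.choice(space)
    config = {}
    for name, values in block.items():
        if isinstance(values, tuple):
            low, high = values
            config[name] = rng.uniform(low, high)
        else:
            config[name] = rng.choice(values)
    return config


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sampled_configs = [sample_config(search_space, rng) for _ in range(20)]
for config in sampled_configs[:3]:
    print(config)
```

Because model-specific settings such as img_size only appear in the block of the model they belong to, a sampled configuration never mixes hyperparameters from different algorithms.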

We also leverage the Bandit early termination policy, which terminates poorly performing configs (those that are not within 20% slack of the best performing config), thus significantly saving compute resources.
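
The slack rule can be written out in a few lines. The sketch below assumes a primary metric that is maximized and the rule described in the Azure ML documentation for BanditPolicy, where a run is cancelled if its metric falls below the best metric divided by (1 + slack_factor). The function name and the sample metric values are hypothetical.

```python
def bandit_should_terminate(run_metric, best_metric, slack_factor):
    """Return True if a run falls outside the allowed slack of the best run.

    Assumes the primary metric is maximized (as with mean average precision).
    """
    threshold = best_metric / (1 + slack_factor)
    return run_metric < threshold


best_map = 0.60  # hypothetical best mean average precision so far
slack = 0.2      # same slack_factor as used in this notebook

print(round(best_map / (1 + slack), 3))                # cutoff: 0.5
print(bandit_should_terminate(0.45, best_map, slack))  # True, below the cutoff
print(bandit_should_terminate(0.55, best_map, slack))  # False, within the slack
```

In the actual policy this check is only applied at the configured evaluation_interval and after the delay_evaluation period has passed, so early, noisy metric values do not cancel a run prematurely.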



In [None]:
from azureml.train.automl import AutoMLImageConfig
from azureml.train.hyperdrive import GridParameterSampling, RandomParameterSampling, BayesianParameterSampling
from azureml.train.hyperdrive import BanditPolicy, HyperDriveConfig, PrimaryMetricGoal
from azureml.train.hyperdrive import choice, uniform

parameter_space = {
    'model': choice(
        {
            'model_name': choice('yolov5'),
            'learning_rate': uniform(0.0001, 0.01),
            #'model_size': choice('small', 'medium'), # model-specific
            'img_size': choice(640, 704, 768), # model-specific
        },
        {
            'model_name': choice('fasterrcnn_resnet50_fpn'),
            'learning_rate': uniform(0.0001, 0.001),
            #'warmup_cosine_lr_warmup_epochs': choice(0, 3),
            'optimizer': choice('sgd', 'adam', 'adamw'),
            'min_size': choice(600, 800), # model-specific
        }
    )
}

tuning_settings = {
    'iterations': 20, 
    'max_concurrent_iterations': 4, 
    'hyperparameter_sampling': RandomParameterSampling(parameter_space),  
    'policy': BanditPolicy(evaluation_interval=2, slack_factor=0.2, delay_evaluation=6)
}


automl_image_config = AutoMLImageConfig(task='image-object-detection',
                                        compute_target=compute_target,
                                        training_data=training_dataset,
                                        primary_metric='mean_average_precision',
                                        **tuning_settings)

#### Add tags to the experiment

Add tags to the experiment so the metadata can be used to parse training runs. At a minimum, pass the tags listed below through the tags argument: experiment.submit(automl_image_config, tags={}).


In [None]:
# This submits the experiment to the GPU cluster specified earlier. You can track its progress in the experiments section of AML Studio.

automl_image_run = experiment.submit(automl_image_config)

When doing a hyperparameter sweep, it can be useful to visualize the different configurations that were tried using the HyperDrive UI. You can navigate to this UI by going to the 'Child runs' tab in the UI of the main automl_image_run from above, which is the HyperDrive parent run. Then you can go into the 'Child runs' tab of this one. Alternatively, the cell below displays the HyperDrive parent run directly, and from there you can navigate to its 'Child runs' tab:

In [None]:
# Running this cell displays the HyperDrive parent run.

from azureml.core import Run
hyperdrive_run = Run(experiment=experiment, run_id=automl_image_run.id + '_HD')
hyperdrive_run

## Register the optimal model from the AutoML run
Once the run completes, we can register the model that was created from the best run (the configuration that resulted in the best primary metric).

In [None]:
# Register the model from the best run

best_child_run = automl_image_run.get_best_child()
model_name = best_child_run.properties['model_name']
model = best_child_run.register_model(model_name = model_name, model_path='outputs/model.pt')

#### Deploy model as a web service
Once you have your trained model, you can deploy it on Azure as a web service on Azure Container Instances (ACI) or Azure Kubernetes Service (AKS). ACI is a good option for testing deployments, while AKS is better suited for high-scale, production usage.
In this tutorial, we will deploy the model as a web service in AKS.

You will first need to create an AKS compute cluster, or use an existing AKS cluster. You can use either GPU or CPU VM SKUs for your deployment cluster.

In [None]:
from azureml.core.compute import ComputeTarget, AksCompute
from azureml.exceptions import ComputeTargetException

# Choose a name for your cluster
aks_name = "cluster-aks-gpu"

# Check to see if the cluster already exists
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    # Provision AKS cluster with GPU machine
    prov_config = AksCompute.provisioning_configuration(vm_size="STANDARD_NC6", 
                                                        location="australiaeast")
    # Create the cluster
    aks_target = ComputeTarget.create(workspace=ws, 
                                      name=aks_name, 
                                      provisioning_configuration=prov_config)
    aks_target.wait_for_completion(show_output=True)

Next, you will need to define the inference configuration, which describes how to set up the web service containing your model. You can use the scoring script and the environment from the training run in your inference config.

Note: To change the model's settings, open the downloaded scoring script and modify the model_settings variable before deploying the model.

In [None]:
from azureml.core.model import InferenceConfig

best_child_run.download_file('outputs/scoring_file_v_1_0_0.py', output_file_path='score.py')
environment = best_child_run.get_environment()
inference_config = InferenceConfig(entry_script='score.py', environment=environment)

You can then deploy the model as an AKS web service.

In [None]:
# Deploy the model from the best run as an AKS web service.
# Remember to give your AKS service a name below.

from azureml.core.webservice import AksWebservice
from azureml.core.webservice import Webservice
from azureml.core.model import Model
from azureml.core.environment import Environment

aks_config = AksWebservice.deploy_configuration(autoscale_enabled=True,                                                    
                                                cpu_cores=1,
                                                memory_gb=50,
                                                enable_app_insights=True)

aks_service = Model.deploy(ws,
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=aks_config,
                           deployment_target=aks_target,
                           name='kakadufish-aks-endpoint',
                           overwrite=True)
aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)

## Test the web service
Finally, let's test our deployed web service by predicting on new images. You can pass in any image. In this case, we'll use a random image from the dataset and pass it to the scoring URI.

In [None]:
import requests

# URL for the web service
scoring_uri = aks_service.scoring_uri

# If the service is authenticated, set the key or token
key, _ = aks_service.get_keys()

sample_image = './000000001020.jpg'

# Load image data (the context manager closes the file after reading)
with open(sample_image, 'rb') as image_file:
    data = image_file.read()

# Set the content type
headers = {'Content-Type': 'application/octet-stream'}

# If authentication is enabled, set the authorization header
headers['Authorization'] = f'Bearer {key}'

# Make the request and display the response
resp = requests.post(scoring_uri, data, headers=headers)
print(resp.text)

#### Visualize detections
Now that we have scored a test image, we can visualize the bounding boxes for this image

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import matplotlib.patches as patches
from PIL import Image
import numpy as np
import json

IMAGE_SIZE = (18,12)
plt.figure(figsize=IMAGE_SIZE)
img_np=mpimg.imread(sample_image)
img = Image.fromarray(img_np.astype('uint8'),'RGB')
x, y = img.size

fig,ax = plt.subplots(1, figsize=(15,15))
# Display the image
ax.imshow(img_np)

# draw box and label for each detection 
detections = json.loads(resp.text)
for detect in detections['boxes']:
    label = detect['label']
    box = detect['box']
    conf_score = detect['score']
    if conf_score > 0.6:
        ymin, xmin, ymax, xmax =  box['topY'],box['topX'], box['bottomY'],box['bottomX']
        topleft_x, topleft_y = x * xmin, y * ymin
        width, height = x * (xmax - xmin), y * (ymax - ymin)
        print('{}: [{}, {}, {}, {}], {}'.format(detect['label'], round(topleft_x, 3), 
                                                round(topleft_y, 3), round(width, 3), 
                                                round(height, 3), round(conf_score, 3)))

        color = np.random.rand(3) #'red'
        rect = patches.Rectangle((topleft_x, topleft_y), width, height, 
                                 linewidth=3, edgecolor=color,facecolor='none')

        ax.add_patch(rect)
        plt.text(topleft_x, topleft_y - 10, label, color=color, fontsize=20)

plt.show()
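
The coordinate handling in the loop above can be separated from the plotting. The sketch below assumes the response layout used in this notebook (a boxes list whose entries carry label, score, and a box with topX, topY, bottomX and bottomY normalized to the range 0 to 1). The helper name to_pixel_boxes and the sample response are hypothetical.

```python
def to_pixel_boxes(detections, image_width, image_height, min_score=0.6):
    """Convert normalized boxes to (label, x, y, width, height, score) in pixels."""
    results = []
    for detect in detections["boxes"]:
        if detect["score"] <= min_score:
            continue  # skip low-confidence detections, as in the plotting loop
        box = detect["box"]
        x = image_width * box["topX"]
        y = image_height * box["topY"]
        width = image_width * (box["bottomX"] - box["topX"])
        height = image_height * (box["bottomY"] - box["topY"])
        results.append((detect["label"], x, y, width, height, detect["score"]))
    return results


# Hypothetical response for a 640x480 image.
sample_response = {
    "boxes": [
        {"label": "fish", "score": 0.9,
         "box": {"topX": 0.25, "topY": 0.5, "bottomX": 0.75, "bottomY": 1.0}},
        {"label": "fish", "score": 0.3,
         "box": {"topX": 0.0, "topY": 0.0, "bottomX": 0.5, "bottomY": 0.5}},
    ]
}

pixel_boxes = to_pixel_boxes(sample_response, image_width=640, image_height=480)
print(pixel_boxes)
```

Keeping the conversion in its own function makes it possible to check the box arithmetic on a small hand-written response before calling the deployed service.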