# Object Segmentation on Azure Stack Hub Clusters

In this tutorial, we fine-tune a pre-trained [Mask R-CNN](https://arxiv.org/abs/1703.06870) model on the [Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/). The dataset contains 170 images with 345 pedestrian instances. We use it to train an instance segmentation model through a custom dataset class, `PennFudanDataset`, defined in `aml_src/obj_segment_step_training.py`. For more details, see the
[PyTorch TorchVision fine-tuning tutorial](https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html).
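
As the linked PyTorch tutorial describes, each Penn-Fudan mask image stores one integer ID per pedestrian, with 0 as background. The sketch below shows one way to turn such a mask into per-instance binary masks and bounding boxes. The helper name `masks_and_boxes` and the toy array are illustrative assumptions, not code taken from `aml_src`.

```python
import numpy as np


def masks_and_boxes(mask):
    """Split an instance-ID mask into binary masks and [xmin, ymin, xmax, ymax] boxes."""
    obj_ids = np.unique(mask)
    obj_ids = obj_ids[obj_ids != 0]  # drop the background ID 0
    # Broadcasting yields one boolean (H, W) mask per instance ID.
    masks = mask == obj_ids[:, None, None]
    boxes = []
    for instance_mask in masks:
        ys, xs = np.where(instance_mask)
        boxes.append([int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())])
    return masks, boxes


# Toy 4x4 mask with two "pedestrians" (IDs 1 and 2); hypothetical data.
toy_mask = np.array(
    [
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 2],
        [0, 0, 0, 2],
    ],
    dtype=np.uint8,
)
masks, boxes = masks_and_boxes(toy_mask)
print(masks.shape)
print(boxes)
```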


You will use [Azure Machine Learning Pipelines](https://aka.ms/aml-pipelines) to define two pipeline steps: a data processing step that splits the data into training and test sets, and a training step that trains and evaluates the model. The trained model is then registered in your AML workspace.


After the model is registered, you deploy it for testing on an Azure Kubernetes Service (AKS) cluster.

This notebook uses Azure Stack Hub (ASH) storage and an ASH cluster (Arc compute) for training. Make sure the following prerequisites are met.

## Prerequisites

* A Kubernetes cluster deployed on Azure Stack Hub and connected to Azure through Azure Arc.

  For details on how to deploy a Kubernetes cluster on Azure Stack Hub and enable the Arc connection to Azure, follow [this guide](https://github.com/Azure/AML-Kubernetes/blob/master/docs/ASH/AML-ARC-Compute.md).

* A datastore set up in the Azure Machine Learning workspace, backed by an Azure Stack Hub storage account.

  [This document](https://github.com/Azure/AML-Kubernetes/blob/master/docs/ASH/Train-AzureArc.md) is a detailed guide on how to create an Azure Machine Learning workspace, create an Azure Stack Hub storage account, and set up a datastore in the AML workspace with the ASH storage account.

* Last but not least, you need to be able to run a notebook.

  If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, first go through the configuration notebooks in the [MachineLearningNotebooks repository](https://github.com/Azure/MachineLearningNotebooks) if you haven't already. This sets you up with a working config file that has information on your workspace, subscription ID, etc.

In [None]:
import os
from azureml.core import Workspace, Environment, Experiment, Datastore

from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep
from azureml.core.runconfig import RunConfiguration

### Create Workspace

Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`. 

If you haven't already done so, open the `config.json` file and fill in your workspace information.

In [None]:
ws = Workspace.from_config()
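
If `Workspace.from_config()` fails, a quick check of the configuration contents can help. `config.json` is expected to carry the `subscription_id`, `resource_group`, and `workspace_name` keys. The helper below is a minimal sketch that reports which of those are missing or empty in an already-loaded dictionary; the function name and the sample values are hypothetical.

```python
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_config_keys(config):
    """Return the required workspace keys that are absent or empty in `config`."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Hypothetical config contents with the resource group left blank.
sample_config = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "",
    "workspace_name": "my-workspace",
}
print(missing_config_keys(sample_config))
```

To check a real file, load it first with `json.load(open("config.json"))` and pass the resulting dictionary to the helper.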

## Create or attach existing ArcKubernetesCompute

The attach code below depends on the Python package `azureml-contrib-k8s`, which is currently in private preview. Install the private preview branch of the AzureML SDK by running the following command:

<pre>
pip install --disable-pip-version-check --extra-index-url https://azuremlsdktestpypi.azureedge.net/azureml-contrib-k8s-preview/D58E86006C65 azureml-contrib-k8s
</pre>

In [None]:
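from azureml.contrib.core.compute.arckubernetescompute import ArcKubernetesCompute

# Azure resource ID of the Arc-connected Kubernetes cluster
resource_id = "<resource_id>"
attach_name = "arcattached"

if attach_name in ws.compute_targets:
    # Reuse the compute target if it is already attached to the workspace
    print("Found existing compute target:", attach_name)
else:
    attach_config = ArcKubernetesCompute.attach_configuration(resource_id=resource_id)
    attach_result = ArcKubernetesCompute.attach(ws, attach_name, attach_config)
    attach_result.wait_for_completion(show_output=True)
    print(attach_result)

compute_target = ws.compute_targets[attach_name]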

## Register Dataset

After downloading and extracting the zip file from [Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/) to your local machine, you should have the following folder structure:

<pre>
PennFudanPed/
  PedMasks/
    FudanPed00001_mask.png
    FudanPed00002_mask.png
    FudanPed00003_mask.png
    FudanPed00004_mask.png
    ...
  PNGImages/
    FudanPed00001.png
    FudanPed00002.png
    FudanPed00003.png
    FudanPed00004.png
</pre>

Here, `PennFudanPed` is a sub-folder directly under the working folder of this notebook.
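
Before uploading, it can be worth confirming that every image has a matching mask. The sketch below pairs file names using the `<name>.png` / `<name>_mask.png` convention visible in the folder listing. The helper `unmatched_pairs` and the sample lists are illustrative assumptions.

```python
import os


def unmatched_pairs(image_names, mask_names, mask_suffix="_mask"):
    """Return (images without a mask, masks without an image) as sorted ID lists."""
    image_ids = {os.path.splitext(n)[0] for n in image_names if n.endswith(".png")}
    mask_ids = set()
    for name in mask_names:
        stem = os.path.splitext(name)[0]
        if name.endswith(".png") and stem.endswith(mask_suffix):
            mask_ids.add(stem[: -len(mask_suffix)])
    return sorted(image_ids - mask_ids), sorted(mask_ids - image_ids)


# Hypothetical listing in which the second image has no mask.
images = ["FudanPed00001.png", "FudanPed00002.png"]
masks = ["FudanPed00001_mask.png"]
images_without_mask, masks_without_image = unmatched_pairs(images, masks)
print(images_without_mask, masks_without_image)
```

In the notebook, the two lists could come from `os.listdir("PennFudanPed/PNGImages")` and `os.listdir("PennFudanPed/PedMasks")`.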

Now you are ready to upload the data and register it as a dataset in the AML workspace with the following code (it takes about 1 minute):

In [None]:
from azureml.core import Workspace, Dataset, Datastore

dataset_name = "pennfudan"
if dataset_name not in ws.datasets:

    datastore_name = "ashstore"
    datastore = Datastore.get(ws, datastore_name)
    
    src_dir, target_path = 'PennFudanPed', 'PennFudanPed'
    datastore.upload(src_dir, target_path, overwrite=True)

    # register data uploaded as AML dataset
    datastore_paths = [(datastore, target_path)]
    pd_ds = Dataset.File.from_files(path=datastore_paths)
    pd_ds.register(ws, dataset_name, "for Pedestrian Detection and Segmentation")

## Create a Train-Test Split Data Processing Step

For this pipeline run, you will use two pipeline steps. The first step splits the dataset into training and test sets.

In [None]:
# create run_config first
datastore_name = "ashstore"
datastore = Datastore.get(ws, datastore_name)

env = Environment.from_dockerfile(
        name='pytorch-obj-seg',
        dockerfile='./aml_src/Dockerfile.gpu',
        conda_specification='./aml_src/conda-env.yaml')

aml_run_config = RunConfiguration()
aml_run_config.target = compute_target.name
aml_run_config.environment = env

source_directory = './aml_src'


# add a data process step

dataset = ws.datasets[dataset_name]

from azureml.data import OutputFileDatasetConfig

dest = (datastore, None)

train_split_data = OutputFileDatasetConfig(name="train_split_data", destination=dest).as_upload(overwrite=False)
test_split_data = OutputFileDatasetConfig(name="test_split_data", destination=dest).as_upload(overwrite=False)

split_step = PythonScriptStep(
    name="Train Test Split",
    script_name="obj_segment_step_data_process.py",
    arguments=["--data-path", dataset.as_named_input('pennfudan_data').as_mount(),
               "--train-split", train_split_data, "--test-split", test_split_data,
               "--test-size", 50],
    compute_target=compute_target,
    runconfig=aml_run_config,
    source_directory=source_directory,
    allow_reuse=False
)
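
The script `obj_segment_step_data_process.py` is not shown here, so the following is only a sketch of what a split driven by `--test-size 50` could look like: shuffle the sample indices with a fixed seed and hold out the last 50. The function name, the seed, and the shuffle strategy are assumptions.

```python
import random


def split_indices(num_items, test_size, seed=0):
    """Shuffle indices reproducibly and hold out the last `test_size` for testing."""
    if not 0 < test_size < num_items:
        raise ValueError("test_size must be between 1 and num_items - 1")
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    return indices[:-test_size], indices[-test_size:]


# The dataset has 170 images, so a test size of 50 leaves 120 for training.
train_idx, test_idx = split_indices(num_items=170, test_size=50)
print(len(train_idx), len(test_idx))
```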

## Create Training Step

In [None]:
train_step = PythonScriptStep(
        name="training_step",
        script_name="obj_segment_step_training.py",
        arguments=[
            "--train-split", train_split_data.as_input(), "--test-split", test_split_data.as_input(),
            '--epochs', 1,  # 80
        ],

        compute_target=compute_target,
        runconfig=aml_run_config,
        source_directory=source_directory,
        allow_reuse=True
    )
    

## Create Experiment and Submit Pipeline Run

The split step takes about 8 minutes. The training step takes about 25 minutes per epoch on a VM comparable to Standard_DS3_v2.
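
These figures give a rough way to budget a longer run. The sketch below simply adds the split time to the per-epoch training time. It assumes that timings scale linearly with the number of epochs, which may not hold on your hardware. With the commented-out value of 80 epochs from the training step, the estimate is 8 + 80 × 25 = 2008 minutes.

```python
def estimated_minutes(epochs, split_minutes=8, minutes_per_epoch=25):
    """Rough pipeline duration: one split step plus `epochs` training epochs."""
    return split_minutes + epochs * minutes_per_epoch


for epochs in (1, 80):
    total = estimated_minutes(epochs)
    print(f"{epochs} epoch(s): about {total} minutes")
```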

In [None]:
experiment_name = 'obj_seg_step'
experiment = Experiment(workspace=ws, name=experiment_name)
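# List both steps explicitly; the data dependency makes split_step run before train_step
pipeline_steps = [split_step, train_step]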

pipeline = Pipeline(workspace=ws, steps=pipeline_steps)
print("Pipeline is built.")

pipeline_run = experiment.submit(pipeline, regenerate_outputs=False)
pipeline_run.wait_for_completion()


## Register Model

The model saved at `outputs/obj_segmentation.pkl` is registered as `obj_seg_model_aml`. It contains both the model parameters and the network definition, which are used by AML deployment and serving.

In [None]:
train_step_run = pipeline_run.find_step_run(train_step.name)[0]

model_name = 'obj_seg_model_aml'  # must match the name used when deploying the model
train_step_run.register_model(model_name=model_name, model_path='outputs/obj_segmentation.pkl')

## Test Registered Model

To test the trained model, you can create (or reuse) an AKS cluster and serve the model through an AML deployment.

### Create (or use an existing) AKS cluster for serving the model

In [None]:
from azureml.core import Environment, Workspace, Model, ComputeTarget
from azureml.core.compute import AksCompute
from azureml.core.model import InferenceConfig
from azureml.core.webservice import Webservice, AksWebservice
from azureml.core.compute_target import ComputeTargetException
from PIL import Image
from torchvision.transforms import functional as F
import numpy as np
import json

In [None]:
ws = Workspace.from_config()

# Choose a name for your AKS cluster
aks_name = 'aks-service-2'

if aks_name not in ws.compute_targets:
    # Use the default configuration (can also provide parameters to customize)
    prov_config = AksCompute.provisioning_configuration()

    # Create the cluster
    aks_target = ComputeTarget.create(workspace=ws,
                                      name=aks_name,
                                      provisioning_configuration=prov_config)
    is_new_compute = True

    if aks_target.get_status() != "Succeeded":
        aks_target.wait_for_completion(show_output=True)
else:
    aks_target = ws.compute_targets[aks_name]
    is_new_compute = False
    
print("using compute target: ", aks_target.name)

### Deploy the model

In [None]:
env = Environment.from_dockerfile(
        name='pytorch-obj-seg',
        dockerfile='./aml_src/Dockerfile.gpu',
        conda_specification='./aml_src/conda-env.yaml')

env.inferencing_stack_version='latest'

inference_config = InferenceConfig(entry_script='score.py', environment=env)
deploy_config = AksWebservice.deploy_configuration()

deployed_model = "obj_seg_model_aml" # model_name
model = ws.models[deployed_model]

service_name = 'objservice'

service = Model.deploy(workspace=ws,
                       name=service_name,
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=deploy_config,
                       deployment_target=aks_target,
                       overwrite=True)

service.wait_for_deployment(show_output=True)


### Test the trained model using the deployed service

In [None]:
img_nums = ["00001"]
# Forward slashes work on both Windows and POSIX systems
image_paths = ["PennFudanPed/PNGImages/FudanPed{}.png".format(item) for item in img_nums]
image_np_list = []
for image_path in image_paths:
    img = Image.open(image_path)
    img.show("input_image")
    img_rgb = img.convert("RGB")
    img_tensor = F.to_tensor(img_rgb)
    img_np = img_tensor.numpy()
    image_np_list.append(img_np.tolist())

inputs = json.dumps({"instances": image_np_list})
resp = service.run(inputs)
predicts = resp["predictions"]

for instance_pred in predicts:
    print("labels", instance_pred["labels"])
    print("boxes", instance_pred["boxes"])
    print("scores", instance_pred["scores"])
    
    image_data = instance_pred["masks"]
    img_np = np.array(image_data)
    output = Image.fromarray(img_np)
    output.show()
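
The exact mask format depends on `score.py`, which is not shown here. If the service returns soft masks, that is, per-pixel scores between 0 and 1 with an extra leading channel dimension as torchvision's Mask R-CNN produces, `Image.fromarray` may reject the array or show a nearly black image. Under that assumption, the sketch below thresholds a mask and converts it to an 8-bit array that displays cleanly. The helper name and the 0.5 threshold are illustrative choices.

```python
import numpy as np


def mask_to_uint8(mask, threshold=0.5):
    """Threshold a soft mask and return a 2-D uint8 array with values 0 or 255."""
    arr = np.squeeze(np.asarray(mask, dtype=np.float32))  # drop size-1 dimensions
    if arr.ndim != 2:
        raise ValueError("expected a single mask, got shape {}".format(arr.shape))
    return ((arr >= threshold) * 255).astype(np.uint8)


# Hypothetical 1x2x2 soft mask, shaped like one instance from the response.
toy_soft_mask = [[[0.1, 0.9], [0.7, 0.2]]]
binary = mask_to_uint8(toy_soft_mask)
print(binary.tolist())
```

The result can be passed to `Image.fromarray(binary)` for display. Note that `np.squeeze` also drops a height or width of 1, so this sketch is meant for ordinary image sizes.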

### Delete the newly created cluster

Note: Delete the cluster if you created it only for this tutorial; otherwise it continues to incur costs.

In [None]:
if is_new_compute:
    aks_target.delete()