# Object Segmentation with PyTorch Using Transfer Learning

For this tutorial, we will fine-tune a pre-trained [Mask R-CNN](https://arxiv.org/abs/1703.06870) model on the [Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/). The database contains 170 images with 345 pedestrian instances, and we will use it to train an instance segmentation model on a custom dataset, defined as `PennFudanDataset` in `aml_src/obj_segment_step_training.py`. You can learn more details [here](https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html).
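The linked torchvision tutorial explains that each mask in `PedMasks/` stores the background as 0 and gives every pedestrian its own integer id. `PennFudanDataset` itself is not shown in this notebook, so the following is only a sketch, assuming it follows the tutorial, of how such a mask can be turned into per-instance binary masks and bounding boxes. The function name `decode_instance_mask` is hypothetical.

```python
import numpy as np

def decode_instance_mask(mask: np.ndarray):
    """Split a Penn-Fudan label mask into per-instance binary masks and boxes.

    Assumes the torchvision tutorial convention: 0 is background and each
    pedestrian is labelled with its own positive integer id.
    """
    obj_ids = np.unique(mask)
    obj_ids = obj_ids[obj_ids != 0]           # drop the background id
    # Broadcast-compare to get one boolean mask per instance: shape (N, H, W)
    masks = mask == obj_ids[:, None, None]
    boxes = []
    for m in masks:
        ys, xs = np.where(m)
        boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])  # [x0, y0, x1, y1]
    return masks, np.array(boxes, dtype=np.float32).reshape(-1, 4)
```

In the real dataset the mask would come from something like `np.array(Image.open(mask_path))`; the boxes use the `[x0, y0, x1, y1]` layout that torchvision detection models expect.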


You will use [Azure Machine Learning Pipelines](https://aka.ms/aml-pipelines) to define two pipeline steps: a data processing step that splits the data into training and test sets, and a training step that trains and evaluates the model. The trained model is then registered to your AML workspace.


After the model is registered, you deploy it for testing on Azure Kubernetes Service (AKS) on Azure Stack HCI (AKS-HCI).

This notebook uses Azure Stack HCI (ASH) storage and an ASH cluster (Arc compute) for training. Please make sure the following prerequisites are met.

## Prerequisites

* [Setup Azure Arc-enabled Machine Learning Training and Inferencing on AKS on Azure Stack HCI](https://github.com/Azure/AML-Kubernetes/tree/master/docs/AKS-HCI/AML-ARC-Compute.md)

* [Setup NFS Server on Azure Stack HCI and Use your Data and run managed Machine Learning Experiments On-Premises](https://github.com/Azure/AML-Kubernetes/tree/master/docs/AKS-HCI/Train-AzureArc.md)


* Last but not least, you need to be able to run a notebook.

  If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, first go through the configuration notebook located [here](https://github.com/Azure/MachineLearningNotebooks). This sets you up with a working config file that contains information about your workspace, subscription ID, etc.

In [None]:
import os
from azureml.core import Workspace, Environment, Experiment, Datastore

from azureml.pipeline.core import Pipeline, StepSequence
from azureml.pipeline.steps import PythonScriptStep
from azureml.core.runconfig import RunConfiguration

### Initialize the AzureML workspace

Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`. 

If you haven't done so already, open the `config.json` file and fill in your workspace information.
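For reference, `config.json` is a small JSON file holding the workspace's `subscription_id`, `resource_group` and `workspace_name`. Here is a minimal sketch that writes one from placeholder values; the helper name `write_workspace_config` is hypothetical, and the placeholder strings must be replaced with your own values.

```python
import json
from pathlib import Path

# Placeholder values: replace them with your own workspace details.
config = {
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>",
}

def write_workspace_config(cfg: dict, path: str = "config.json") -> Path:
    """Write a workspace config file in the format Workspace.from_config() reads."""
    out = Path(path)
    out.write_text(json.dumps(cfg, indent=4))
    return out
```

Calling `write_workspace_config(config)` writes `config.json` to the current folder, which is one of the places `Workspace.from_config()` searches.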

In [None]:
ws = Workspace.from_config()

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')

## Set up the compute target

Find the attach name of the Arc-enabled AKS-HCI cluster in your AzureML workspace.

`attach_name` is the name you gave your AKS-HCI cluster when you attached it as a compute target in [this step](https://github.com/Azure/AML-Kubernetes/blob/master/docs/AKS-HCI/AML-ARC-Compute.md#attach-your-azure-arc-enabled-cluster-to-your-azure-machine-learning-workspace-as-a-compute-target).

In [None]:
from azureml.core.compute import KubernetesCompute

attach_name = "arc-compute3"
arcK_target = KubernetesCompute(ws, attach_name)
print(f"compute target id in endpoint yaml: azureml:{arcK_target.name}, instance type name in deployment yaml: {arcK_target.default_instance_type}")

## Copy the dataset to the NFS server (optional)

After downloading the zip file from the [Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/) and extracting it on your local machine, make sure you have the following folder structure:

<pre>
PennFudanPed/
  PedMasks/
    FudanPed00001_mask.png
    FudanPed00002_mask.png
    FudanPed00003_mask.png
    FudanPed00004_mask.png
    ...
  PNGImages/
    FudanPed00001.png
    FudanPed00002.png
    FudanPed00003.png
    FudanPed00004.png
</pre>

Here, `PennFudanPed` is a subfolder directly under the working folder of this notebook.
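Before copying, you may want to sanity-check the layout. This sketch (the function name `find_missing_masks` is hypothetical) assumes the naming pattern shown above, where `FudanPed00001.png` pairs with `FudanPed00001_mask.png`, and lists every image that has no matching mask.

```python
from pathlib import Path

def find_missing_masks(root: str) -> list:
    """Return names of images in PNGImages/ with no matching *_mask.png in PedMasks/."""
    root_path = Path(root)
    images = sorted(p.name for p in (root_path / "PNGImages").glob("*.png"))
    masks = {p.name for p in (root_path / "PedMasks").glob("*.png")}
    # Image "X.png" is expected to have a mask called "X_mask.png"
    return [name for name in images if f"{Path(name).stem}_mask.png" not in masks]
```

For a complete download, `find_missing_masks("PennFudanPed")` should return an empty list.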

In [None]:
import os, shutil

# Replace with the path where the NFS share is mounted on the machine running this notebook
nfs_mount_path = "<NFS Mount Point on notebook execution machine>"
downloaded_folder = os.path.join(os.getcwd(), 'PennFudanPed')

# Start from a clean copy of the dataset on the NFS share
penn_dir = os.path.join(nfs_mount_path, 'PennFudanPed')
shutil.rmtree(penn_dir, ignore_errors=True)

def copyFiles(source_folder, dest_folder):
    """Recursively copy the contents of source_folder into dest_folder."""
    os.makedirs(dest_folder, exist_ok=True)
    for filename in os.listdir(source_folder):
        filepath = os.path.join(source_folder, filename)
        destpath = os.path.join(dest_folder, filename)
        if os.path.isdir(filepath):
            copyFiles(filepath, destpath)
        else:
            print(f"Copying files from {filepath} to {destpath}")
            shutil.copyfile(filepath, destpath)

copyFiles(downloaded_folder, penn_dir)
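The recursive `copyFiles` helper can also be expressed with the standard library's `shutil.copytree`. A minimal sketch, assuming Python 3.8 or later for `dirs_exist_ok`; the name `copy_dataset` is hypothetical. Files are copied with `shutil.copyfile` (contents only) to mirror `copyFiles`, although `copytree` still copies directory permissions, which some NFS setups may reject.

```python
import shutil

def copy_dataset(source_folder: str, dest_folder: str) -> None:
    """Recursively copy source_folder into dest_folder, creating it if needed."""
    shutil.copytree(
        source_folder,
        dest_folder,
        copy_function=shutil.copyfile,  # copy file contents only, like copyFiles
        dirs_exist_ok=True,             # Python 3.8+: tolerate an existing dest_folder
    )
```

Usage would be `copy_dataset(downloaded_folder, penn_dir)`, without the per-file progress output.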

## Create a train-test split data processing step

For this pipeline run, you will use two pipeline steps. The first step splits the dataset into training and test sets.
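The contents of `obj_segment_step_data_process.py` are not shown in this notebook. As a hypothetical illustration of what `--test-size 50` plausibly means (the torchvision tutorial likewise holds out 50 images), here is a reproducible split that keeps the last 50 shuffled file names for testing; with 170 images, that leaves 120 for training. The function `split_file_names` and its `seed` parameter are assumptions, not the script's actual interface.

```python
import random

def split_file_names(names, test_size=50, seed=0):
    """Shuffle file names reproducibly and hold out the last `test_size` for testing."""
    if not 0 <= test_size <= len(names):
        raise ValueError("test_size must be between 0 and the number of files")
    shuffled = sorted(names)  # sort first so the result does not depend on listing order
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - test_size
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed means that rerunning the split step produces the same train and test sets.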

In [None]:
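# Create the run configuration first.
# data_folder is the dataset path as seen from the training pod ("<MountPathOnTrainingPod>" + "/PennFudanPed");
# this example assumes the NFS share is mounted at /nfs_share inside the pod.
data_folder = "/nfs_share"+"/PennFudanPed"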

env = Environment.from_dockerfile(
        name='pytorch-obj-seg',
        dockerfile='./aml_src/Dockerfile.gpu',
        conda_specification='./aml_src/conda-env.yaml')

aml_run_config = RunConfiguration()
aml_run_config.target = arcK_target
aml_run_config.environment = env

source_directory = './aml_src'

# add a data process step
import helpers

output_folder = "/nfs_share" + "/" + helpers.randFolderName()
print(f"output_folder: {output_folder}")

train_split_data = output_folder + "/" + "train_split_data"
test_split_data = output_folder + "/" + "test_split_data"

split_step = PythonScriptStep(
    name="Train Test Split",
    script_name="obj_segment_step_data_process.py",
    arguments=["--data-path", data_folder,
               "--train-split", train_split_data, "--test-split", test_split_data,
               "--test-size", 50],
    compute_target=arcK_target,
    runconfig=aml_run_config,
    source_directory=source_directory,
    allow_reuse=False
)
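`helpers.randFolderName()` comes from the repository's `helpers.py`, which is not shown. Because the two steps exchange data through plain NFS paths rather than pipeline data objects, a fresh folder per submission presumably keeps reruns from overwriting each other. The following is a hypothetical equivalent; it builds the paths with `posixpath`, since they are resolved inside the Linux training pod even if the notebook runs on Windows.

```python
import posixpath
import uuid

def rand_folder_name(prefix: str = "run") -> str:
    """Hypothetical stand-in for helpers.randFolderName(): a collision-resistant name."""
    return f"{prefix}-{uuid.uuid4().hex[:12]}"

def split_output_paths(share_root: str = "/nfs_share"):
    """Return (output_folder, train_split_data, test_split_data) under a fresh folder."""
    output_folder = posixpath.join(share_root, rand_folder_name())
    return (output_folder,
            posixpath.join(output_folder, "train_split_data"),
            posixpath.join(output_folder, "test_split_data"))
```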

## Create training step

In [None]:
train_step = PythonScriptStep(
    name="training_step",
    script_name="obj_segment_step_training.py",
    arguments=[
        "--train-split", train_split_data, "--test-split", test_split_data,
        '--epochs', 1,  # 1 epoch for a quick test; increase (e.g. to 80) for full training
    ],
    compute_target=arcK_target,
    runconfig=aml_run_config,
    source_directory=source_directory,
    allow_reuse=True
)

## Create experiment and submit pipeline run

The split step takes about 8 minutes. The training step takes about 25 minutes per epoch on a VM comparable to Standard_DS3_v2.
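Using only the figures quoted above (about 8 minutes for the split and about 25 minutes per training epoch), the total run time grows linearly with `--epochs`. A small sketch of that arithmetic; it ignores image build and queueing time, and the function name is hypothetical.

```python
def estimated_pipeline_minutes(epochs: int, split_minutes: float = 8,
                               minutes_per_epoch: float = 25) -> float:
    """Rough wall-clock estimate for the split step plus `epochs` training epochs."""
    return split_minutes + epochs * minutes_per_epoch

print(estimated_pipeline_minutes(1))         # the notebook's quick-test setting
print(estimated_pipeline_minutes(80) / 60)   # hours for an 80-epoch run
```

With 1 epoch this gives 33 minutes; with 80 epochs it gives 2008 minutes, roughly 33.5 hours.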

In [None]:
experiment_name = 'obj_seg_seq_step'
experiment = Experiment(workspace=ws, name=experiment_name)

step_sequence = StepSequence(steps=[split_step, train_step])

pipeline = Pipeline(workspace=ws, steps=step_sequence)
print("Pipeline is built.")

pipeline_run = experiment.submit(pipeline, regenerate_outputs=False)
pipeline_run.wait_for_completion()


## Register the model

Register the trained model.

In [None]:
train_step_run = pipeline_run.find_step_run(train_step.name)[0]

model_name = 'obj_seg_model_aml' 
train_step_run.register_model(model_name=model_name, model_path='outputs/obj_segmentation.pkl')

## Get the model

In [None]:
from azureml.core.model import Model
model = Model(ws, model_name)
model_id = f"azureml:{model.name}:{model.version}"
print(f"Get {model.name}, latest version {model.version}, id in deployment.yml: {model_id}")

The model named `obj_seg_model_aml` should now be registered in your AzureML workspace.

## Test the registered model

To test the trained model, you can serve it on the AKS-HCI cluster with an AzureML online deployment.

### Deploy the model

In [None]:
# endpoint = '<pytorch-obj-seg endpoint name>'
endpoint = 'pytorch-obj-seg-jiadu'

import os
from pathlib import Path
prefix = Path(os.getcwd())
endpoint_file = str(prefix.joinpath("endpoint.yml"))
deployment_file = str(prefix.joinpath("deployment.yml"))
print(f"Using endpoint file: {endpoint_file} and deployment file: {deployment_file}. Please replace <modelId> (e.g. azureml:obj_seg_model_aml:1), <instanceTypeName> (e.g. defaultInstanceType) and <computeTargetName> (e.g. azureml:amlarc-compute) according to the output above.")

You need to **replace the following placeholders in deployment.yml**:
* `<modelId>`: example value: azureml:obj_seg_model_aml:1
* `<instanceTypeName>`: example value: defaultInstanceType

You need to **replace the following placeholder in endpoint.yml**:
* `<computeTargetName>`: example value: azureml:amlarc-compute
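If you prefer to fill these placeholders from the notebook instead of editing the files by hand, here is a minimal sketch. The helper `fill_placeholders` is hypothetical; it does plain text substitution and assumes the YAML files contain no other `<Name>`-style tokens, which it treats as unfilled placeholders.

```python
import re
from pathlib import Path

def fill_placeholders(path: str, values: dict) -> str:
    """Replace <key> tokens in a text file in place and return the new text."""
    p = Path(path)
    text = p.read_text()
    for key, value in values.items():
        text = text.replace(f"<{key}>", str(value))
    # Refuse to write a file that still contains unfilled <placeholders>
    leftover = re.findall(r"<[A-Za-z]+>", text)
    if leftover:
        raise ValueError(f"unfilled placeholders in {path}: {leftover}")
    p.write_text(text)
    return text

# Example use with the variables defined earlier in this notebook:
# fill_placeholders(deployment_file, {"modelId": model_id,
#                                     "instanceTypeName": arcK_target.default_instance_type})
# fill_placeholders(endpoint_file, {"computeTargetName": f"azureml:{arcK_target.name}"})
```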

In [None]:
import helpers
from azureml.core.workspace import Workspace

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')


In [None]:
helpers.run(f"az ml online-endpoint create -n {endpoint} -f {endpoint_file} -w {ws.name} -g {ws.resource_group}")

In [None]:
helpers.run(f"az ml online-endpoint show -n {endpoint} -w {ws.name} -g {ws.resource_group}")

In [None]:
helpers.run(f"az ml online-deployment create -n blue --endpoint {endpoint} -f {deployment_file} -w {ws.name} -g {ws.resource_group} --all-traffic")

### Test with inputs

For testing, you can use the first image, FudanPed00001.png, as an example. It looks like this: ![FudanPed00001](FudanPed00001.png)

In [None]:
# get score_url and access_token from AZ CLI
import helpers
from azureml.core.workspace import Workspace
ws = Workspace.from_config()
cmd = f"az ml online-endpoint show -n {endpoint} -w {ws.name} -g {ws.resource_group}"
properties = helpers.run(cmd, return_output=True, no_output=True)

cmd = f"az ml online-endpoint get-credentials -n {endpoint} -w {ws.name} -g {ws.resource_group}"
credentials = helpers.run(cmd, return_output=True, no_output=True)

print("Got endpoint and credentials.")

In [None]:
import json
prop_response = json.loads(properties.replace(os.linesep,""))
score_uri = prop_response["scoring_uri"]

cred_response = json.loads(credentials.replace(os.linesep, ""))
access_token = cred_response["accessToken"]

In [None]:
from PIL import Image
from torchvision.transforms import functional as F

image_paths = ["FudanPed00001.png"]
image_np_list = []
for image_path in image_paths:
    img = Image.open(image_path)
    img_rgb = img.convert("RGB")
    img_tensor = F.to_tensor(img_rgb)
    img_np = img_tensor.numpy()
    image_np_list.append(img_np.tolist())

inputs = json.dumps({"instances": image_np_list})

import requests
headers = {'Content-Type': 'application/json', 'Authorization': f"Bearer {access_token}"}
r = requests.post(score_uri, data=inputs, headers=headers)
predicts = r.json()["predictions"]

import numpy as np
for instance_pred in predicts:
    print("labels", instance_pred["labels"])
    print("boxes", instance_pred["boxes"])
    print("scores", instance_pred["scores"])
    
    image_data = instance_pred["masks"]
    img_np = np.array(image_data)
    output_mask = Image.fromarray(img_np)
    output_mask.show() #show the image
    output_mask.save("predict_mask.png")
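The exact response format depends on the scoring script behind the endpoint, which is not shown here. torchvision's Mask R-CNN returns per-instance soft masks of shape `[N, 1, H, W]` with values in `[0, 1]`; if the endpoint passes those through unchanged, `Image.fromarray` may not be able to render them directly. The following sketch (function names hypothetical) shows a NumPy-only version of the `F.to_tensor` preprocessing for 8-bit RGB images and one way to collapse such masks into a single black-and-white image, using a 0.5 threshold as an assumption.

```python
import numpy as np

def to_chw_float(rgb_hwc: np.ndarray) -> np.ndarray:
    """NumPy equivalent of F.to_tensor for an 8-bit RGB image:
    HWC uint8 in [0, 255] -> CHW float32 in [0, 1]."""
    return np.asarray(rgb_hwc, dtype=np.float32).transpose(2, 0, 1) / 255.0

def masks_to_uint8(masks, threshold: float = 0.5) -> np.ndarray:
    """Collapse per-instance soft masks into one uint8 image (0 or 255).

    Assumes masks shaped (N, 1, H, W), (N, H, W) or (H, W) with values in [0, 1].
    """
    arr = np.asarray(masks, dtype=np.float32)
    if arr.ndim == 4:
        arr = arr[:, 0]        # drop the singleton channel axis
    if arr.ndim == 2:
        arr = arr[None]        # a single mask without an instance axis
    combined = (arr > threshold).any(axis=0)  # union of all instances
    return combined.astype(np.uint8) * 255

# Example with the response above:
# Image.fromarray(masks_to_uint8(instance_pred["masks"])).save("predict_mask.png")
```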