# LGP defect detection in AI Core -  Part 2

The figure below summarizes the steps required to train and deploy an ML model in AI Core.

<img src="../../../../resources/AICoreMLOps.png" width="900">

In the [previous notebook](defect-detection-part1.ipynb), we already took care of Sections 0 and 1: we connected a GitHub repository and a Docker registry to the AI Core instance and created a resource group dedicated to our defect detection task. We also connected an AWS S3 storage bucket containing our input image data to this resource group.

In this notebook we will see how to **train, deploy and use the model for inference in SAP AI Core**.

### Before getting started: color conventions

The comments within the notebook will guide you to the required steps. Pay attention to the color conventions:

* <span style="color:magenta"> **Magenta text**  </span> indicates that you have to open certain JSON files and modify them according to your own setup. For instance, you may be asked to enter credentials for a certain system or to change variable names.
* <span style="color:blue"> **Blue text**  </span> indicates that you have to execute commands on a terminal. 
* <span style="color:green"> **Green text** </span> indicates that you are asked to modify something in the following notebook cell. 


# Create an AI API client instance


In [None]:
import sys, os
import json
from json import dumps
import requests
import base64
from base64 import b64encode, b64decode
import time
import yaml
from IPython.display import clear_output
from pprint import pprint
import ast
import re
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import itertools
import cv2
import glob
import io


from ai_api_client_sdk.ai_api_v2_client import AIAPIV2Client
from ai_api_client_sdk.models.artifact import Artifact
from ai_api_client_sdk.models.status import Status
from ai_api_client_sdk.models.target_status import TargetStatus
from ai_api_client_sdk.models.parameter_binding import ParameterBinding
from ai_api_client_sdk.models.input_artifact_binding import InputArtifactBinding


First of all, we need to create an AI API client instance, which will allow us to interact with our SAP AI Core tenant. You might remember that we did the same at the beginning of the [previous notebook](defect-detection-part1.ipynb). <span style="color:magenta">Before executing the code, double check that </span> **[aic_service_key.json](./files/aic_service_key.json)** <span style="color:magenta">contains the correct credentials</span>. 

In [None]:
aic_service_key = "./files/aic_service_key.json" 
with open(aic_service_key) as ask:
    aic_s_k = json.load(ask)

ai_api_client = AIAPIV2Client(
    base_url=aic_s_k["serviceurls"]["AI_API_URL"] + "/v2",
    auth_url=aic_s_k["url"] + "/oauth/token",
    client_id=aic_s_k['clientid'],
    client_secret=aic_s_k['clientsecret']
)

## Train the defect detection model

### Create a training docker image

In order to execute a training pipeline in AI Core, you need to dockerize your training code and push the docker image to your docker registry. You should have already linked your docker registry to SAP AI Core during the previous exercises. <span style="color:blue"> You should then be ready to build and push your docker image to the docker registry. To do that, open a terminal in Jupyter (go to Files at the top-left of the Jupyter Notebook, then select New > Terminal) and type the following commands line by line. Note that after the first command, you will be asked for a password to log in to Docker. Copy and paste your Docker access token (do not use your Docker password). After pasting, the cursor will not move or change, even though the token has been entered. Press Enter. You should see the response Login Succeeded.</span>
 
```sh
cd YOUR_PATH_TO/btp-ai-sustainability-bootcamp/src/ai-models/defect-detection/code/train_seg    
docker login docker.io -u YOUR_DOCKER_USERNAME
docker buildx build -o type=docker --platform=linux/amd64 -t YOUR_DOCKER_USERNAME/image-seg-train:mobilenet .
docker push docker.io/YOUR_DOCKER_USERNAME/image-seg-train:mobilenet
```

### Create a training workflow and register it as an application

After having prepared the docker image, you need to create a training workflow in the GitHub repository associated with our AI Core instance. The yaml file needs to be uploaded to a dedicated folder of the GitHub repository. Then you can register your application as shown below. Before executing the code, you need to:

* <span style="color:magenta">Edit your training workflow yaml file </span> in [./files/training_workflow.yaml](./files/training_workflow.yaml). It should point to your own AI scenario metadata, your own docker registry etc., so make sure you adapt the following line:
    - line 37 - <span style="color:magenta">"docker.io/YOUR_DOCKER_USERNAME/image-seg-train:mobilenet"</span> -  enter your Docker username


 
* <span style="color:blue">Create a dedicated folder in your GitHub repository and add the training workflow yaml file to it. To do that, navigate to the terminal tab you previously opened in Jupyter and type the following commands line by line.</span>

```sh
cd PATH_TO_YOUR_GITHUB_REPO
mkdir workflows_defect_det
cp ../btp-ai-sustainability-bootcamp/src/ai-models/defect-detection/exercises/files/training_workflow.yaml \
workflows_defect_det
git pull
git add workflows_defect_det
git commit -m "add a new workflow folder"
git push
```


* <span style="color:magenta"> Open </span>[./files/git_setup.json](./files/git_setup.json) <span style="color:magenta"> and adjust the app section to reflect your workflow folder name, app name, and GitHub repository URL. </span>
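
The application cell further down reads four fields from the `app` section of git_setup.json: `applicationName`, `repositoryUrl`, `revision` and `path`. As a minimal sketch, the check below reports which of these fields are missing or empty before any request is sent. The helper name `missing_app_fields` and all values in `example_setup` are placeholders introduced for illustration; they are not taken from your setup.

```python
REQUIRED_APP_FIELDS = ("applicationName", "repositoryUrl", "revision", "path")


def missing_app_fields(setup):
    """Return the required 'app' fields that are absent or empty."""
    app = setup.get("app", {})
    return [field for field in REQUIRED_APP_FIELDS if not app.get(field)]


# Placeholder content (assumption): replace with your own values
example_setup = {
    "app": {
        "applicationName": "defect-det-app",
        "repositoryUrl": "https://github.com/YOUR_GITHUB_USER/YOUR_REPO",
        "revision": "HEAD",
        "path": "workflows_defect_det",
    }
}

print(missing_app_fields(example_setup))           # []
print(missing_app_fields({"app": {"path": "x"}}))  # ['applicationName', 'repositoryUrl', 'revision']
```

In the notebook, the same function could be called on the dictionary returned by `json.load` for `./files/git_setup.json`.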

#### Check out the available repositories

In [None]:
ai_api_client.rest_client.get(
    path="/admin/repositories"
)

#### Create a new application

In [None]:
# Loads your git_setup.json
with open('./files/git_setup.json') as gs:
    setup_json = json.load(gs)
    
# Registers the directory as app
app_json = setup_json["app"]
response = ai_api_client.rest_client.post(
    path="/admin/applications",
    body={
        "applicationName": app_json["applicationName"],
        "repositoryUrl": app_json["repositoryUrl"],
        "revision": app_json["revision"],
        "path": app_json["path"]
    }
)

It is good practice to check that the workflows have been synchronized. 
Keep in mind that the synchronization is triggered by any change to the yaml files pushed to GitHub and that AI Core checks every 3 minutes for new files or changes.

In [None]:
with open('./files/git_setup.json') as gs:
    setup_json = json.load(gs)
app_json = setup_json["app"]
app_name = app_json["applicationName"]

ai_api_client.rest_client.get(
    path=f"/admin/applications/{app_name}/status"
)

### Choose a scenario and register the input dataset as an artifact

We now need to create a second API client linked to our resource group. <span style="color:green"> Please check that **resource_group** matches the name of the resource group you created in the first notebook</span>.

In [None]:
resource_group = 'defect-det'

aic_service_key = "./files/aic_service_key.json" # ENSURE YOU HAVE THE FILE PLACED CORRECTLY
with open(aic_service_key) as ask:
    aic_s_k = json.load(ask)

ai_api_lm = AIAPIV2Client(
    base_url=aic_s_k["serviceurls"]["AI_API_URL"] + "/v2/lm",
    auth_url=aic_s_k["url"] + "/oauth/token",
    client_id=aic_s_k['clientid'],
    client_secret=aic_s_k['clientsecret'],
    resource_group=resource_group)
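
The two clients in this notebook are built from the same service key and differ only in the base URL suffix (`/v2` versus `/v2/lm`) and in the resource group. As a minimal sketch, the shared logic could be factored into one helper. The name `client_kwargs` is hypothetical, and the values in `sample_key` are placeholders rather than real endpoints or credentials.

```python
def client_kwargs(service_key, lm=False, resource_group=None):
    """Build keyword arguments for AIAPIV2Client from a service-key dict."""
    suffix = "/v2/lm" if lm else "/v2"
    kwargs = {
        "base_url": service_key["serviceurls"]["AI_API_URL"] + suffix,
        "auth_url": service_key["url"] + "/oauth/token",
        "client_id": service_key["clientid"],
        "client_secret": service_key["clientsecret"],
    }
    if resource_group is not None:
        kwargs["resource_group"] = resource_group
    return kwargs


# Placeholder values for illustration only (assumption)
sample_key = {
    "serviceurls": {"AI_API_URL": "https://api.example.invalid"},
    "url": "https://auth.example.invalid",
    "clientid": "my-client-id",
    "clientsecret": "my-client-secret",
}

admin_kwargs = client_kwargs(sample_key)
lm_kwargs = client_kwargs(sample_key, lm=True, resource_group="defect-det")
print(admin_kwargs["base_url"])
print(lm_kwargs["base_url"])
```

With the real key loaded into `aic_s_k`, the second client could then be created as `AIAPIV2Client(**client_kwargs(aic_s_k, lm=True, resource_group=resource_group))`.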

The workflow yaml file that we have uploaded to GitHub contains specifications for an AI scenario, which is created as soon as AI Core synchronizes with the GitHub repo. The scenario also appears in AI Launchpad. 
The cell below registers our input data as an artifact under this scenario.

In [None]:
# Load training_workflow.yaml
training_workflow_file = './files/training_workflow.yaml'
with open(training_workflow_file) as twf:
    training_workflow = yaml.safe_load(twf)

# Load the scenario id from training_workflow.yaml
scenario_id = training_workflow['metadata']['labels']['scenarios.ai.sap.com/id']

# Set the artifact configuration
artifact = {
        "name": "image-data", # Modifiable name
        "kind": Artifact.Kind.DATASET,
    
        "url": "ai://default/data",  
    
        "description":  "Light guide plate dataset",
        "scenario_id": scenario_id
    }
# Store the artifact response to retrieve the id for the training configuration
artifact_resp = ai_api_lm.artifact.create(**artifact)
print(f"Artifacts registered for {scenario_id} scenario!")
pprint(vars(artifact_resp))
# Check that the response message is the expected one
assert artifact_resp.message == 'Artifact acknowledged'

### Create a training configuration

Everything is now ready to create a training configuration. This will instruct AI Core about the scenario, the executable, and the input data we want to be used for the execution.

In [None]:
# Load training_workflow.yaml
training_workflow_file =  "./files/training_workflow.yaml"
with open(training_workflow_file) as twf:
    training_workflow = yaml.safe_load(twf)
    
# Load the scenario id from training_workflow.yaml
scenario_id = training_workflow['metadata']['labels']['scenarios.ai.sap.com/id']
    

<span style="color:red"> **Please wait ~ 5 minutes before you execute the following cell.**  </span> For the configuration to be created successfully, AI Core must have completed the synchronization with the GitHub repo where we have created the template. If AI Core is not yet synced, you will get an error. In that case, try again after a few minutes.  

In [None]:
input_artifact_name = training_workflow['spec']['templates'][0]['inputs']['artifacts'][0]['name']
executable_name = training_workflow['metadata']['name']

artifact_binding = {
    "key": input_artifact_name,
    "artifact_id": vars(artifact_resp)['id']
}

train_configuration = {
    "name": "image-training-configuration",
    "scenario_id": scenario_id,
    "executable_id": executable_name,
    "parameter_bindings": [],
    "input_artifact_bindings": [ InputArtifactBinding(**artifact_binding) ]
}

# store the configuration response to access the id to create an execution
train_config_resp = ai_api_lm.configuration.create(**train_configuration)
pprint(vars(train_config_resp))

assert train_config_resp.message == 'Configuration created'

print("Configuration created for running the training")
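
If the configuration cell above fails because the synchronization has not finished yet, re-running it by hand works, but the wait-and-retry step can also be scripted. The following is a minimal sketch. `retry` and `flaky_action` are hypothetical names introduced for illustration, and the broad `except` is an assumption made because the exact exception type raised by the SDK is not shown in this notebook.

```python
import time


def retry(action, attempts=5, delay_seconds=60, sleep=time.sleep):
    """Call action() until it succeeds or the attempts are exhausted."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # broad on purpose: the SDK error type is not assumed
            last_error = error
            print(f"Attempt {attempt}/{attempts} failed: {error}")
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error


# Demonstration with a stand-in action that fails twice before succeeding
calls = {"count": 0}


def flaky_action():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("executable not synced yet")
    return "Configuration created"


result = retry(flaky_action, attempts=5, delay_seconds=0)
print(result)
```

In the notebook, the action could be `lambda: ai_api_lm.configuration.create(**train_configuration)`. Retrying blindly can hide genuine configuration errors, so it is worth reading the printed error message before waiting for more attempts.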

### Create a training execution

Let's use the execution API to launch the training:

In [None]:
execution_resp = ai_api_lm.execution.create(train_config_resp.id)
pprint(vars(execution_resp))

#### Observe the training status

We can also use the **execution.get** API to monitor the status of the training. This operation will take several minutes. Notice that the execution produces an output artifact, the trained model, which gets its own id, name and url. This artifact will be used as input for the model deployment. 

In [None]:
status = None
while status != Status.COMPLETED and status != Status.DEAD:
    # Sleep for 5 secs to avoid overwhelming the API with requests
    time.sleep(5)
    # Clear outputs to reduce clutter
    clear_output(wait=True)

    execution = ai_api_lm.execution.get(execution_resp.id)
    status = execution.status
    print('...... execution status ......', flush=True)
    print(f"Training status: {execution.status}")
    pprint(f"Training status details: {execution.status_details}")

if execution.status == Status.COMPLETED:
    print(f"Training complete for execution [{execution_resp.id}]!")
    output_artifact = execution.output_artifacts[0]
    training_output = {
        "id": output_artifact.id,
        "name": output_artifact.name,
        "url": output_artifact.url
    }
    with open('training_output.json', 'w') as fp: #Save the reference to the model stored in S3
        json.dump(training_output, fp)
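
The polling loop above runs until the execution reaches `COMPLETED` or `DEAD` and has no upper time limit. As a minimal sketch under that observation, the same pattern can be wrapped in a helper with a timeout. `poll_status` is a hypothetical name, and the scripted list of states stands in for a real execution so that the example runs without an AI Core tenant.

```python
import time


def poll_status(get_status, terminal_states, interval_seconds=5,
                timeout_seconds=3600, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal state is seen or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Still in state {status!r} after {timeout_seconds} s")
        sleep(interval_seconds)


# Demonstration with a scripted sequence of states instead of a real execution
states = iter(["PENDING", "RUNNING", "RUNNING", "COMPLETED"])
final = poll_status(lambda: next(states), {"COMPLETED", "DEAD"}, interval_seconds=0)
print(final)  # COMPLETED
```

In the notebook, the first argument could be `lambda: ai_api_lm.execution.get(execution_resp.id).status` and the terminal states `{Status.COMPLETED, Status.DEAD}`.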

##### Metrics and performance

The metrics.query API allows us to inspect the training performance. 
In our training code, we have registered the loss function and the IoU metric as metrics objects for the training and validation steps at each epoch of the training process. 
The behavior of the loss and of the metric as a function of the epoch is commonly used to check whether the model training proceeded as expected. Let's plot them. 

In [None]:
# Query the metrics logged by the training execution
metric_resp = ai_api_lm.metrics.query(execution_ids=execution_resp.id)

for m in metric_resp.resources:
    for metric in m.metrics:
        print(metric.name)
        print(metric.value)

In [None]:
all_metrics = []
for m in metric_resp.resources:
    for custom_info in m.custom_info:
        #print(custom_info.name)
        #print(custom_info.value)
        all_metrics.append(custom_info.value)

In [None]:
training_metrics = ast.literal_eval(all_metrics[0])
fig, axs = plt.subplots(1, 2, figsize=(20,5))

a = ast.literal_eval(training_metrics[0].get("loss"))
b = ast.literal_eval(training_metrics[1].get("val_loss"))
c = ast.literal_eval(training_metrics[2].get("iou"))
d = ast.literal_eval(training_metrics[3].get("val_iou"))

axs[0].plot(a)
axs[0].plot(b)
#axs[0].title.set_text('Training Loss vs Validation Loss')
axs[0].legend(['Train', 'Validation'], prop={'size': 20})

axs[1].plot(c)
axs[1].plot(d)
#axs[1].title.set_text('Training IoU vs Validation IoU')
axs[1].legend(['Train', 'Validation'], prop={'size': 20})

e=axs[0].set_xlabel('Epoch',fontsize=25)
e=axs[0].set_ylabel('Loss',fontsize=25)
e=axs[1].set_xlabel('Epoch',fontsize=25)
e=axs[1].set_ylabel('IoU',fontsize=25)
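
For reference, IoU (Intersection over Union) compares a predicted mask with the ground-truth mask: the number of pixels marked in both, divided by the number of pixels marked in at least one. The sketch below implements this textbook definition for binary masks. The training code is not shown in this notebook, so its exact IoU variant (thresholding, averaging over a batch) may differ, and the function name `iou` and the toy masks are illustrative only.

```python
import numpy as np


def iou(pred_mask, true_mask):
    """Intersection over Union of two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(np.logical_and(pred, true).sum() / union)


pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
true = np.array([[1, 0, 0],
                 [0, 1, 1]])
print(iou(pred, true))  # intersection 2, union 4 -> 0.5
```

A value close to 1 means the predicted defect region overlaps almost exactly with the annotated one, which is why a rising validation IoU is a useful sanity check alongside a falling loss.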

## Deploy the model

Now that the model is trained, let's see how the deployment works. The steps are similar to the ones we went through for the training phase: 
* create a docker image with the deployment code
* add a serving workflow by adding a dedicated yaml file on the GitHub repository
* specify the scenario for our deployment
* create a configuration and launch the deployment

### Create serving docker image

<span style="color:blue">In order to execute a deployment in AI Core, you need to dockerize your serving code and push the docker image to your docker registry. 
You can do so by executing the following commands:</span>

```sh
cd YOUR_PATH_TO/btp-ai-sustainability-bootcamp/src/ai-models/defect-detection/code/infer_seg
docker login docker.io -u YOUR_DOCKER_USERNAME
docker buildx build -o type=docker --platform=linux/amd64 -t YOUR_DOCKER_USERNAME/image-seg-infer:mobilenet .
docker push docker.io/YOUR_DOCKER_USERNAME/image-seg-infer:mobilenet
```

### Create a serving workflow and register it in SAP AI Core

After having prepared the docker image, you need to create a serving workflow in the GitHub repository associated with our AI Core instance. 
We can upload the yaml file to the same folder we created for the training workflow. Before executing the code, you need to:

* <span style="color:magenta">Edit your serving workflow yaml file </span> in [./files/serving_workflow.yaml](./files/serving_workflow.yaml). 
It should point to your own AI scenario name, your own docker registry and your own docker image, so make sure you adapt the following line:
    - line 36 - <span style="color:magenta">"docker.io/YOUR_DOCKER_USERNAME/image-seg-infer:mobilenet"</span> -  enter your Docker username



* <span style="color:blue">Copy the yaml file into your GitHub repository:</span> 

```sh
cd PATH_TO_YOUR_GITHUB_REPO
cp ../btp-ai-sustainability-bootcamp/src/ai-models/defect-detection/exercises/files/serving_workflow.yaml \
workflows_defect_det
git pull
git add workflows_defect_det/serving_workflow.yaml 
git commit -m "add a new serving template"
git push
```
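
The manual edit of the image line in the workflow yaml file can also be scripted, as long as it is done before the file is copied and pushed. The sketch below works on the file content as plain text, which leaves comments and formatting untouched. `set_docker_username` is a hypothetical helper and `jdoe` is a made-up username.

```python
def set_docker_username(workflow_text, username, placeholder="YOUR_DOCKER_USERNAME"):
    """Replace the username placeholder in a workflow yaml given as text."""
    if placeholder not in workflow_text:
        raise ValueError(f"Placeholder {placeholder!r} not found")
    return workflow_text.replace(placeholder, username)


# A single string in the style of the image reference, for illustration
sample_text = "docker.io/YOUR_DOCKER_USERNAME/image-seg-infer:mobilenet"
updated_text = set_docker_username(sample_text, "jdoe")
print(updated_text)  # docker.io/jdoe/image-seg-infer:mobilenet
```

To apply it to the real file, read `./files/serving_workflow.yaml` into a string, pass it through the function, and write the result back. The `ValueError` guards against running the substitution twice or on a file that was already edited by hand.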


Let's check the synchronization of the new workflows:

In [None]:
with open('./files/git_setup.json') as gs:
    setup_json = json.load(gs)
app_json = setup_json["app"]
app_name = app_json["applicationName"]

ai_api_client.rest_client.get(
    path=f"/admin/applications/{app_name}/status"
)

Notice that in the serving yaml file we have specified the same AI scenario that we created for the training template. As soon as AI Core synchronizes with the GitHub repo, a new serving executable will be available under our defect detection scenario. This can also be double-checked in AI Launchpad. 

### Create a serving configuration
Everything is now ready to create a serving configuration. This will instruct AI Core about the scenario, the executable, and the input artifact (trained model) we want to be used for the deployment. 

<span style="color:red"> **Please wait ~ 5 minutes before you execute the following cell.**  </span> For the configuration to be created successfully, AI Core must have completed the synchronization with the GitHub repo where we have created the template. If AI Core is not yet synced, you will get an error. In that case, try again after a few minutes.  

In [None]:
serving_workflow_file = "./files/serving_workflow.yaml"
with open(serving_workflow_file) as swf:
    serving_workflow = yaml.safe_load(swf)

scenario_id = serving_workflow['metadata']['labels']['scenarios.ai.sap.com/id']
input_artifact_name = serving_workflow['spec']['inputs']['artifacts'][0]['name']
executable_name = serving_workflow['metadata']['name']


In [None]:
artifact_binding = {
    "key": input_artifact_name,
    "artifact_id": training_output["id"]
}

serve_configuration = {
    "name": "image-serving-configuration",
    "scenario_id": scenario_id,
    "executable_id": executable_name,
    "parameter_bindings": [],
    "input_artifact_bindings": [ InputArtifactBinding(**artifact_binding) ]
}

serve_config_resp = ai_api_lm.configuration.create(**serve_configuration)

assert serve_config_resp.message == 'Configuration created'

pprint(vars(serve_config_resp))
print("configuration for serving the model created")

We can now trigger the deployment and check its status:

In [None]:
deployment_resp = ai_api_lm.deployment.create(serve_config_resp.id)
pprint(vars(deployment_resp))

In [None]:
# Poll deployment status
status = None
while status != Status.RUNNING and status != Status.DEAD:
    time.sleep(5)
    clear_output(wait=True)
    deployment = ai_api_lm.deployment.get(deployment_resp.id)
    status = deployment.status
    print('...... deployment status ......', flush=True)
    print(deployment.status)
    pprint(deployment.status_details)

    if deployment.status == Status.RUNNING:
        print(f"Deployment with {deployment_resp.id} complete!")

# Allow some time for deployment URL to get ready
time.sleep(10)

## Using the deployed ML model

The deployment creates an endpoint which we can submit new images to. 
The API will respond to each request with the result of the defect detection. 
Let's see how to use the model with an example.

Let's define the local path to the image dataset:

In [None]:
path_normal_images = sorted(glob.glob("../data/Images/OK/*"))
path_abnormal_images = sorted(glob.glob("../data/Images/NG/*"))
path_normal_masks = sorted(glob.glob("../data/Images/OK_MSK/*"))
path_abnormal_masks = sorted(glob.glob("../data/Images/NG_MSK/*"))

First, let's visualize an example of a defective LGP together with the corresponding ground-truth defect mask.

In [None]:
i = 0
fig, axs = plt.subplots(1, 2, figsize=(10,10))
title = ['Input Image', 'Ground Truth']
axs[0].title.set_text(title[0])
axs[0].imshow(mpimg.imread(path_abnormal_images[i]), interpolation='nearest')
axs[1].title.set_text(title[1])
axs[1].imshow(mpimg.imread(path_abnormal_masks[i]), interpolation='nearest')

In order to perform the inference step, let's transform one of the images into a base64-encoded string (this will constitute the body of the API call):

In [None]:
ENCODING = 'utf-8'

# Step 1: read the image file as raw bytes (note the 'rb' flag)
with open(path_abnormal_images[0], 'rb') as open_file:
    byte_content = open_file.read()

# Step 2: base64-encode the raw bytes (the result is still bytes)
base64_bytes = b64encode(byte_content)

# Step 3: decode the base64 bytes into a UTF-8 string
base64_string = base64_bytes.decode(ENCODING)

# Step 4: wrap the string in a dict that will be sent as the JSON request body
raw_data = {"image": base64_string}
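
The reason for the base64 step is that raw bytes cannot be placed directly into a JSON document, while a base64 string can, and the receiver can recover the original bytes exactly. The sketch below shows this round trip on a few stand-in bytes instead of a real image. How the serving code decodes the field is not shown in this notebook, so the decoding half is an assumption about the receiving side.

```python
import json
from base64 import b64encode, b64decode

# Stand-in for the bytes read from an image file
byte_content = bytes([0, 255, 16, 128, 7])

# Encode to base64 text so the payload can be serialized as JSON
base64_string = b64encode(byte_content).decode("utf-8")
body = json.dumps({"image": base64_string})

# What a receiver could do to get the original bytes back (assumption)
recovered = b64decode(json.loads(body)["image"])
print(recovered == byte_content)  # True
```

Base64 represents every 3 bytes as 4 characters, so the request body is roughly one third larger than the image file itself.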

We can now post our request to the model endpoint:

In [None]:
# Prepare the inference request
# The image encoded above is taken from the defective (NG) samples

endpoint = f"{deployment.deployment_url}/v1/models/imagesegmodel:predict"
print(endpoint)

headers = {
        "Authorization": ai_api_lm.rest_client.get_token(),
        'ai-resource-group': resource_group,
        "Content-Type": "application/json"}
response = requests.post(endpoint, headers=headers, json=raw_data)

print('Inference result:', response.json())
#pprint(vars(response))
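
Calling `response.json()` directly assumes that the request succeeded. If the deployment URL is not ready yet, the endpoint may answer with an error status and a non-JSON body. As a hedged sketch, the helper below checks the status code first and only then extracts the `segmented_image` field used later in this notebook. `extract_mask_b64` is a hypothetical name and the sample bodies are made up.

```python
import json


def extract_mask_b64(status_code, body_text):
    """Return the base64 mask string from a prediction response, or raise."""
    if status_code != 200:
        raise RuntimeError(f"Inference failed with HTTP {status_code}: {body_text[:200]}")
    payload = json.loads(body_text)
    if "segmented_image" not in payload:
        raise KeyError("segmented_image")
    return payload["segmented_image"]


# Made-up response body for illustration
ok = extract_mask_b64(200, '{"segmented_image": "AAEC"}')
print(ok)  # AAEC
```

With the `requests` response from the cell above, the call would be `extract_mask_b64(response.status_code, response.text)`.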

Let's decode the prediction and visualize it:

In [None]:
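def load_image(data, IMG_WIDTH, IMG_HEIGHT, preproc):
    """Decode an encoded image buffer, optionally enhance it, resize it and scale it to [0, 1]."""
    # Flag 0 decodes the buffer as a single-channel grayscale image
    image = cv2.imdecode(data, 0)
    if preproc:
        # Contrast enhancement (CLAHE) followed by a 3x3 dilation
        clahe = cv2.createCLAHE(clipLimit=5.0, tileGridSize=(8, 8))
        image = clahe.apply(image)
        kernel = np.ones((3, 3), np.uint8)
        image = cv2.dilate(image, kernel, iterations=1)
        # Go back to 3 channels
        image = np.expand_dims(image, axis=-1)
        image = image.repeat(3, axis=-1)
    # cv2.resize expects the target size as (width, height)
    image = cv2.resize(image, (IMG_WIDTH, IMG_HEIGHT), interpolation=cv2.INTER_AREA)
    image = np.array(image)
    image = image.astype('float32')
    image /= 255
    return image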
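
The two NumPy steps in `load_image` that are easiest to misread are the return to 3 channels in the `preproc` branch and the final scaling. The sketch below reproduces only those steps on a tiny made-up array, without OpenCV, so that the shapes and value range can be inspected directly.

```python
import numpy as np

# 2x2 stand-in for a decoded grayscale image
gray = np.array([[0, 128], [255, 64]], dtype=np.uint8)

# Add a trailing channel axis and copy the single channel three times
rgb_like = np.expand_dims(gray, axis=-1).repeat(3, axis=-1)

# Convert to float32 and scale from [0, 255] to [0, 1]
scaled = rgb_like.astype("float32") / 255

print(gray.shape, rgb_like.shape)  # (2, 2) (2, 2, 3)
print(scaled.min(), scaled.max())  # 0.0 1.0
```

All three channels hold identical values, so the result is still a grayscale picture stored in a 3-channel layout.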

In [None]:
image_file_as_binary = base64.b64decode(response.json()['segmented_image'])
nparr = np.frombuffer(image_file_as_binary, np.uint8)
x_inference = load_image(nparr, 224, 224, False)

In [None]:
fig, axs = plt.subplots(1, 3, figsize=(15,20))
title = ['Input Image', 'Ground Truth', 'Predicted Mask']
axs[0].title.set_text(title[0])
axs[0].imshow(mpimg.imread(path_abnormal_images[i]), interpolation='nearest')
axs[1].title.set_text(title[1])
axs[1].imshow(mpimg.imread(path_abnormal_masks[i]), interpolation='nearest')
axs[2].title.set_text(title[2])
axs[2].imshow(x_inference, interpolation='nearest')

## Stop the deployed model

Once you are done experimenting with the API, you can stop the deployment to save resources:

In [None]:
delete_resp = ai_api_lm.deployment.modify(deployment_resp.id, target_status=TargetStatus.STOPPED)

status = None
while status != Status.STOPPED:
    time.sleep(5)
    clear_output(wait=True)
    deployment = ai_api_lm.deployment.get(deployment_resp.id)
    status = deployment.status
    print('...... killing deployment ......', flush=True)
    print(f"Deployment status: {deployment.status}")