# Active Learning Demo with Datature Inference API and Datature SDK

In [None]:
#!/usr/bin/env python
# -*-coding:utf-8 -*-
"""
  ████
██    ██   Datature
  ██  ██   Powering Breakthrough AI
    ██
 
@File    :   active_learning_demo.ipynb
@Author  :   Leonard So & Wei Loon Cheng
@Version :   1.0
@Contact :   hello@datature.io
@License :   Apache License 2.0
@Desc    :   Active Learning Demo with Datature Inference API and Datature SDK
"""

### Introduction

This notebook is an introduction to using the Datature [Inference API](https://www.datature.io/blog/how-to-use-api-deployment-for-trained-model-inference) for active learning. When you run inference on an image with a trained model deployed on our servers, you can use our active learning metric to identify predictions that are likely to be inaccurate or misclassified. Entropy values are calculated for the predictions, and the image is uploaded to your project for manual annotation and re-training if the entropy value exceeds a certain threshold. To learn more about the active learning metric and other routines available in our Inference API, check out our [developer docs](https://developers.datature.io/docs/making-api-calls-to-your-deployed-api).
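
To make the idea of an entropy-based uncertainty score more concrete, here is a minimal sketch that computes a normalized Shannon entropy over a vector of class probabilities. It only illustrates the general concept: the exact formula behind the active learning metric is not described in this notebook, and the function name `normalized_entropy` is hypothetical.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a probability vector, scaled to the range [0, 1].

    Illustration only: this is an assumed, generic uncertainty measure,
    not necessarily the formula used by the Inference API.
    """
    if len(probs) < 2:
        return 0.0
    # Terms with zero probability contribute nothing to the entropy.
    entropy = sum(-p * math.log(p) for p in probs if p > 0)
    # Dividing by log(number of classes) maps the result into [0, 1].
    return entropy / math.log(len(probs))


# A confident prediction has low entropy; a uniform one has the maximum.
print(normalized_entropy([1.0, 0.0, 0.0]))
print(normalized_entropy([1 / 3, 1 / 3, 1 / 3]))
```

Under this measure, a prediction spread evenly across classes scores close to 1, which is the kind of image an active learning loop would flag for manual annotation.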

Instead of creating a deployment instance for active learning on the Nexus platform, we can use the Datature Python SDK as a more convenient way to interact with Nexus. For more information on the Datature SDK, do check out our [SDK docs](https://developers.datature.io/reference/getting-started).

### Prerequisites

This notebook assumes that you already have a trained model on [Datature Nexus](https://nexus.datature.io/). If not, you can follow this [tutorial](https://developers.datature.io/docs/building-your-first-model) to train your very own model!

You will also need to create a hosted deployment and make inference calls using our [Inference API](https://www.datature.io/blog/how-to-use-api-deployment-for-trained-model-inference). API deployment is an add-on feature that can be enabled regardless of which subscription tier you are on. If you would like to sign up for API deployment, do [contact us](mailto:sales@datature.io) and we will be happy to help you get started.

### Install & Import Necessary Pip Packages

In [1]:
!pip3 install -U datature
!pip3 install -U numpy

Looking in indexes: https://pypi.org/simple, https://asia-python.pkg.dev/datature-puppeteer/python/simple
Looking in indexes: https://pypi.org/simple, https://asia-python.pkg.dev/datature-puppeteer/python/simple


In [2]:
import base64
import csv
import os
import time
from pprint import pprint

import datature
import numpy as np
import requests

### Define Constants

The project secret can be found in [API Management](https://developers.datature.io/docs/hub-and-api#locating-the-project-secret).

As this implementation uses local files, the code converts the image to base64 first. Take care that the resulting base64 string is small enough to fit in the JSON request.
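
As a rough guide to how large the encoded string will be, the sketch below compares the raw byte count with the base64 length. Padded base64 encoding produces 4 characters for every 3 input bytes, so the payload is about a third larger than the file. The helper name `base64_length` is hypothetical, and the actual request size limit is not stated here, so check the docs for it.

```python
import base64
import math


def base64_length(num_bytes):
    """Length in characters of the padded base64 encoding of num_bytes bytes."""
    return 4 * math.ceil(num_bytes / 3)


# Stand-in for the contents of an image file.
raw = b"\x00" * 1000
encoded = base64.b64encode(raw).decode("utf-8")

print(f"raw bytes: {len(raw)}")
print(f"base64 characters: {len(encoded)}")
print(f"predicted base64 characters: {base64_length(len(raw))}")
```

Running the same check on `os.path.getsize(IMAGE_FILE_PATH)` before encoding gives an early warning when an image is likely to be too large to send.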

The image type and image input each have a few different options, which you can check in our [docs](https://developers.datature.io/docs/making-api-calls-to-your-deployed-api).

Feel free to change the values of the constants below for your own use case.

In [3]:
PROJECT_SECRET = "539037bcd03da96aebd90b134631d9541f919ecddcf0e99ddc4dc585f355c267"

IMAGE_FILE_PATH = "image.png"
IMAGE_TYPE = "base_64"
PREDICTIONS_FILE_PATH = "predictions.csv"

## List of asset group names. Assets will be added to these asset groups upon upload.
ASSET_GROUP = ["active-learning-demo"]

## Average entropy threshold for active learning
AVERAGE_ENTROPY_THRESHOLD = 0.5

### Define Helper Functions

`above_threshold` is a helper function that returns a boolean value indicating whether the average entropy value of any class is above the entropy threshold.

`convert_predictions_to_4cornercsv` is a helper function that converts the predictions from the Inference API to a 4-corner CSV format that is accepted for upload to Nexus.

`upload_image_to_nexus` is a helper function that uploads an image to Nexus using Datature SDK.

`upload_predictions_to_nexus` is a helper function that uploads the predictions, converted to 4-corner CSV, as annotations to Nexus using Datature SDK.

In [4]:
def above_threshold(json_resp):
    """Check if the average entropy of any class is above the threshold.

    Args:
        json_resp: JSON response containing the average entropy per class.

    Returns:
        True if the average entropy of any class is above the threshold, False otherwise.
    """
    return np.any(
        np.array(list(json_resp["avgPerClass"].values())) >
        AVERAGE_ENTROPY_THRESHOLD)

In [5]:
def convert_predictions_to_4cornercsv(predict_json):
    """Writes the predictions JSON response to a CSV file in the 4-corner format.

    Args:
        predict_json: JSON response containing the predictions.
    """
    ## newline="" stops the csv module from writing blank rows on Windows
    with open(PREDICTIONS_FILE_PATH, "w", newline="") as f:
        writer = csv.writer(f)
        ## Define the header
        header = ["filename", "xmin", "ymin", "xmax", "ymax", "label"]
        writer.writerow(header)

        for prediction in predict_json["predictions"]:
            ## Only write predictions whose confidence meets the cutoff.
            ## Note: this demo reuses AVERAGE_ENTROPY_THRESHOLD as the confidence cutoff.
            if prediction["confidence"] >= AVERAGE_ENTROPY_THRESHOLD:
                label = prediction["tag"]["name"]
                xmin = prediction["bound"][0][0]
                ymin = prediction["bound"][0][1]
                xmax = prediction["bound"][2][0]
                ymax = prediction["bound"][2][1]
                
                row = [
                    os.path.basename(IMAGE_FILE_PATH), xmin, ymin, xmax, ymax, label
                ]
                writer.writerow(row)

In [6]:
def upload_image_to_nexus():
    """Upload the image at IMAGE_FILE_PATH to Nexus.

    The image is added to the asset groups listed in ASSET_GROUP, and the
    function blocks until the upload operation has finished.
    """
    print(f"Uploading image '{IMAGE_FILE_PATH}' to Nexus...")
    upload_session = datature.Asset.upload_session()
    upload_session.add(IMAGE_FILE_PATH)
    asset_upload_op_link = upload_session.start(cohorts=ASSET_GROUP, early_return=True)["op_link"]

    while datature.Operation.retrieve(asset_upload_op_link)["status"][
            "progress"]["with_status"]["finished"] != 1:
        time.sleep(1)
    print(f"Uploaded image '{IMAGE_FILE_PATH}' to Nexus!")

In [7]:
def upload_predictions_to_nexus(predict_json):
    """Upload predictions to Nexus as annotations.

    Args:
        predict_json: JSON response containing the predictions.
    """
    print(
        f"Uploading predictions for image '{IMAGE_FILE_PATH}' to platform...")
    convert_predictions_to_4cornercsv(predict_json)

    annotation_upload_op_link = datature.Annotation.upload(
        "csv_fourcorner", PREDICTIONS_FILE_PATH, early_return=True)["op_link"]

    while datature.Operation.retrieve(annotation_upload_op_link)["status"][
            "progress"]["with_status"]["finished"] != 1:
        time.sleep(1)
    print(f"Uploaded predictions for image '{IMAGE_FILE_PATH}' to Nexus!")

### Create a New Deployment with a Trained Model

You can obtain the details of all artifacts in your project and choose one based on the name or timestamp among other variables. You will need the artifact ID to select a model format for deployment. In this demo, we have exported our model in the ONNX format and obtained the model ID to be used for deployment.

In [8]:
## Set the project secret
datature.project_secret = PROJECT_SECRET

## Obtain an artifact id from Nexus, in this case, we assume that there is only one artifact
all_artifacts = datature.Artifact.list()
artifact_id = all_artifacts[-1]["id"]

## Obtain an exported model id from Nexus in ONNX format
all_models = datature.Artifact.list_exported(artifact_id)
model_id = [model for model in all_models
            if model["format"] == "onnx"][-1]["id"]

print(f"Artifact ID: {artifact_id}")
print(f"Model ID: {model_id}")

Artifact ID: artifact_63ea0006b03587e371e708cd
Model ID: model_e83103efcb8c3a56cb17868cc55906a6


Once we have the model ID, we can create a deployment instance. We can use the Datature SDK to periodically poll for the status of the creation, and we print an output once the deployment has been successful. This may take a few minutes, so you can grab a cup of coffee in the meantime!

Please note that deployment names are not unique: running this code block more than once creates multiple deployments with the same name and model ID. If you would like to delete a deployment, you can do so in the [API Management](https://developers.datature.io/docs/making-api-calls-to-your-deployed-api#deleting-your-deployment) page.

In [9]:
## Create a model deployment using the model id obtained earlier
deploy_create_response = datature.Deploy.create({
    "name": "Active Learning Deployment",
    "model_id": model_id,
    "num_of_instances": 1,
})

In [10]:
## Wait for the model deployment to be ready
while datature.Operation.retrieve(
        deploy_create_response["op_link"])["status"]["overview"] != "Finished":
    time.sleep(5)
print("Deployed model to Datature Inference API!")

Deployed model to Datature Inference API!
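
The polling loops in this notebook run until the operation reports that it has finished, so they would wait indefinitely if an operation never completes. The sketch below shows one way to bound the wait with a timeout. The helper `wait_until` and its parameters are hypothetical and not part of the Datature SDK; the status check is replaced by a stand-in so that the example runs on its own.

```python
import time


def wait_until(is_done, timeout_s=600.0, interval_s=5.0):
    """Poll is_done() until it returns True, or raise after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while not is_done():
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Operation not finished after {timeout_s} seconds")
        time.sleep(interval_s)


# Stand-in for an SDK status check that succeeds on the third poll.
_poll_results = iter([False, False, True])
wait_until(lambda: next(_poll_results), timeout_s=5.0, interval_s=0.01)
print("Operation finished")
```

In this notebook, `is_done` could be a lambda wrapping the same check used in the cell above, for example `lambda: datature.Operation.retrieve(deploy_create_response["op_link"])["status"]["overview"] == "Finished"`.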


In [11]:
## Obtain the API URL of the model deployment
active_learning_deployment = [
    deployment for deployment in datature.Deploy.list()
    if deployment["name"] == "Active Learning Deployment"
][-1]

deployment_id = active_learning_deployment["id"]
API_URL = active_learning_deployment["url"]

print(f"Deployment ID: {deployment_id}")
print(f"API URL: {API_URL}")

Deployment ID: deploy_f5f49be7-aaf6-4558-9445-f0ac0a51db1b
API URL: https://inference.datature.io/neural/f5f49be7-aaf6-4558-9445-f0ac0a51db1b/predict


### Load the Image for Inference

For this demo, we load the image as a base64 string. However, you can load the image in other formats as described in the table below.

| Image Type | Data | Example |
| --- | --- | --- | 
| url | String containing your URL | image_input = "<YOUR_IMAGE_PATH>" |
| base64 | String containing your base64 image encoding | with open(<YOUR_IMAGE_PATH>, "rb") as img_file: <br /> &emsp;&emsp;base64_byte = base64.b64encode(img_file.read()) <br /> &emsp;&emsp;image_input = base64_byte.decode("utf-8") |
| array | Nested array representing your image data in array form | image_input = np.array(PIL.Image.open(<YOUR_IMAGE_PATH>)) |

In [12]:
## The with statement closes the file automatically, so no explicit close() is needed
with open(IMAGE_FILE_PATH, "rb") as img_file:
    base64_byte = base64.b64encode(img_file.read())
    image_input = base64_byte.decode("utf-8")

print(type(image_input))

<class 'str'>


### Generate Payload and Headers for the Inference API Request

The prediction payload is used to make an inference call to obtain prediction results. The payload with active learning is used to make an inference call to obtain active learning entropy values for the predictions. The headers are used to authenticate the request.

In [13]:
prediction_payload = {
    "image_type": IMAGE_TYPE,
    "data": image_input,
}

payload_with_active_learning = {
    "image_type":
    IMAGE_TYPE,
    "data":
    image_input,
    "routines": [
        {
            "name": "ActiveLearningMetric",
            "arguments": {
                "class_name": ["Platelets", "RBC", "WBC"],
            },
        },
    ],
}

headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "Authorization": "Bearer " + PROJECT_SECRET,
}
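
If you send requests for many images, it can help to build both payloads from one function. The sketch below is one possible way to do this; the helper name `build_payload` is hypothetical, while the field names and the `ActiveLearningMetric` routine are taken from the cell above.

```python
def build_payload(image_type, data, class_names=None):
    """Build an inference payload, optionally with the active learning routine.

    Hypothetical helper: mirrors the dictionaries defined in this notebook.
    """
    payload = {"image_type": image_type, "data": data}
    if class_names:
        # Request entropy values for the listed classes.
        payload["routines"] = [
            {
                "name": "ActiveLearningMetric",
                "arguments": {"class_name": list(class_names)},
            },
        ]
    return payload


plain = build_payload("base_64", "<BASE64_STRING>")
with_metric = build_payload("base_64", "<BASE64_STRING>",
                            class_names=["Platelets", "RBC", "WBC"])
print(sorted(plain.keys()))
print(sorted(with_metric.keys()))
```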

### Fetch the Prediction Results of the Image

The prediction results are returned as a JSON object with the following fields:

#### For Object Detection Bounding Box Models

 - `annotationId`: Running index of annotation
 - `bound`: Bounding box `x,y` coordinates in the format of
    ```
    [ [xmin, ymin], [xmin, ymax], [xmax, ymax], [xmax, ymin] ]
    ```
 - `boundType`: Type of prediction shape from model output. Can be either rectangle or polygon
 - `confidence`: Prediction confidence score, expressed as a fraction between 0 and 1
 - `tag`: Object containing class label information that includes
   - `id`: Index of class label according to model tag map
   - `name`: Name of class label assigned to the prediction

#### Additional Information for Segmentation Mask Models

- `contourType`: Output format of segmentation predictions
- `contour`: Polygonal `x, y` coordinates in the format of
  ```
  [ [x1, y1], [x2, y2], [x3, y3], ... [xn, yn] ]
  ```
  where n is the number of polygon vertices
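
The sample output in this notebook shows `bound` coordinates between 0 and 1, which suggests they are normalized to the image size. Assuming that is the case, the sketch below converts a `bound` into pixel coordinates. The helper name `bound_to_pixel_box` and the image dimensions are hypothetical.

```python
def bound_to_pixel_box(bound, image_width, image_height):
    """Convert a normalized 4-corner bound to (xmin, ymin, xmax, ymax) in pixels.

    Assumes bound is [[xmin, ymin], [xmin, ymax], [xmax, ymax], [xmax, ymin]]
    with every value normalized to the range [0, 1].
    """
    xmin, ymin = bound[0]
    xmax, ymax = bound[2]
    return (
        round(xmin * image_width),
        round(ymin * image_height),
        round(xmax * image_width),
        round(ymax * image_height),
    )


# Made-up bound for a 640x480 image.
bound = [[0.25, 0.1], [0.25, 1.0], [1.0, 1.0], [1.0, 0.1]]
print(bound_to_pixel_box(bound, image_width=640, image_height=480))
# prints (160, 48, 640, 480)
```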

In [14]:
## Send a POST request to the API URL with the image payload
response = requests.post(
    API_URL,
    json=prediction_payload,
    headers=headers,
)

## Obtain the JSON response containing the predictions
predict_json = response.json()
print("Prediction Output:")
print("==================")
pprint(predict_json)

Prediction Output:
{'predictions': [{'annotationId': 0,
                  'bound': [[0.2666337490081787, 0.08433949947357178],
                            [0.2666337490081787, 1.0],
                            [1.0, 1.0],
                            [1.0, 0.08433949947357178]],
                  'boundType': 'rectangle',
                  'confidence': 0.7772380709648132,
                  'contour': None,
                  'contourType': None,
                  'tag': {'id': 2, 'name': 'RBC'}},
                 {'annotationId': 1,
                  'bound': [[0.17402619123458862, 0.0],
                            [0.17402619123458862, 0.9257490634918213],
                            [1.0, 0.9257490634918213],
                            [1.0, 0.0]],
                  'boundType': 'rectangle',
                  'confidence': 0.770164430141449,
                  'contour': None,
                  'contourType': None,
                  'tag': {'id': 2, 'name': 'RBC'}},
                 {

### Fetch the Entropy Values of the Prediction Classes

In [15]:
## Send a POST request to the API URL with the image payload and active learning routine
response = requests.post(
    API_URL,
    json=payload_with_active_learning,
    headers=headers,
)

## Obtain the JSON response containing the entropy values for each class
json_resp = response.json()
print("Average Entropy Values Per Class (lower is better):")
print("===================================================")
pprint(json_resp)

Average Entropy Values Per Class (lower is better):
{'avgPerClass': {'Platelets': 0.0,
                 'RBC': 0.777565731453065,
                 'WBC': 0.9397180680879296},
 'totalEntropy': 5.151851398622984}
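
To see which classes would trigger an upload, you can list the entries of `avgPerClass` that exceed the threshold. The sketch below uses the values from the output above and the threshold of 0.5 defined earlier; the helper name `classes_above_threshold` is hypothetical.

```python
def classes_above_threshold(avg_per_class, threshold):
    """Return the class names whose average entropy exceeds the threshold."""
    return [name for name, value in avg_per_class.items() if value > threshold]


# Values copied from the sample response above.
sample_avg_per_class = {
    "Platelets": 0.0,
    "RBC": 0.777565731453065,
    "WBC": 0.9397180680879296,
}
print(classes_above_threshold(sample_avg_per_class, threshold=0.5))
# prints ['RBC', 'WBC']
```

Because at least one class exceeds the threshold, `above_threshold` returns `True` for this response and the image qualifies for upload.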


### Upload the Image and Predictions to Nexus if the Average Entropy Exceeds the Threshold

In [16]:
if above_threshold(json_resp):
    print("Warning: Entropy value(s) above threshold")
    upload_image_to_nexus()
    upload_predictions_to_nexus(predict_json)

Uploading image 'image.png' to Nexus...
Uploaded image 'image.png' to Nexus!
Uploading predictions for image 'image.png' to platform...
Uploaded predictions for image 'image.png' to Nexus!


### Delete Deployment (Optional)

If you are no longer using your deployment instance, you can delete it to free up resources used by the deployment instance.

In [17]:
deploy_delete_response = datature.Deploy.delete(deployment_id)

if deploy_delete_response["deleted"]:
    print("Deleted deployment instance!")
else:
    print("Failed to delete deployment instance!")

Deleted deployment instance!
