# Active Learning Demo with Datature Inference API and Datature SDK

In [None]:
#!/usr/bin/env python
# -*-coding:utf-8 -*-
"""
  ████
██    ██   Datature
  ██  ██   Powering Breakthrough AI
    ██

@File    :   active_learning_demo.ipynb
@Author  :   Leonard So & Wei Loon Cheng
@Version :   1.0
@Contact :   hello@datature.io
@License :   Apache License 2.0
@Desc    :   Active Learning Demo with Datature Inference API and Datature Python SDK
"""

### Introduction

This notebook introduces how you can use the Datature [Inference API](https://www.datature.io/blog/how-to-use-api-deployment-for-trained-model-inference) to perform active learning. When you run inference on an image with your trained model deployed on our servers, you can use our active learning metric to flag predictions that may be inaccurate or misclassified. Entropy values are calculated for the predictions, and if the entropy value exceeds a set threshold, the image is uploaded to your project for manual annotation and re-training. To learn more about the active learning metric and other routines available in our Inference API, check out our [developer docs](https://developers.datature.io/docs/making-api-calls-to-your-deployed-api).

Instead of creating the deployment instance for active learning manually on the Nexus platform, we use the Datature Python SDK, which offers a more convenient way to interact with Nexus programmatically. For more information on the Datature Python SDK, check out our [SDK docs](https://developers.datature.io/reference/getting-started).

### Prerequisites

This notebook assumes that you already have a trained model on [Datature Nexus](https://nexus.datature.io/). If not, you can follow this [tutorial](https://developers.datature.io/docs/building-your-first-model) to train your very own model!

You will also need to create a hosted deployment and make inference calls using our [Inference API](https://www.datature.io/blog/how-to-use-api-deployment-for-trained-model-inference). API deployment is an add-on feature that can be enabled regardless of which subscription tier you are on. If you would like to sign up for API deployment, do [contact us](mailto:sales@datature.io) and we will be happy to help you get started.

### Install & Import Necessary Pip Packages

In [1]:
!pip3 install -U datature
!pip3 install -U numpy



In [2]:
import base64
import csv
import os
import time
from pprint import pprint

from datature.nexus import Client
import numpy as np
import requests

### Define Constants

The project Secret Key can be found in the **Integrations** tab on your Nexus project page. Check out how to locate it [here](https://developers.datature.io/docs/project-keys-and-secret-keys).

Because this implementation reads a local image file, the code first converts the image to a `base64` string. Make sure the image is small enough that the resulting `base64` text fits within the JSON request.
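
As a rough guide, base64 output is about 4/3 the size of the raw file, since every 3 input bytes become 4 output characters. The sketch below encodes some stand-in bytes and checks the result against a size budget. `MAX_BASE64_CHARS` is a hypothetical limit chosen for illustration, not a documented Inference API value; check your deployment's actual request size limit.

```python
import base64
import math


def encode_image_bytes(raw: bytes) -> str:
    """Encode raw image bytes as a UTF-8 base64 string for a JSON payload."""
    return base64.b64encode(raw).decode("utf-8")


def expected_base64_length(num_bytes: int) -> int:
    """Length of the padded base64 encoding of `num_bytes` bytes.

    Every 3 input bytes become 4 output characters, and padding
    rounds the final group up to a full 4 characters.
    """
    return 4 * math.ceil(num_bytes / 3)


# Hypothetical payload budget for illustration only.
MAX_BASE64_CHARS = 10 * 1024 * 1024

raw = b"\x89PNG" + bytes(1000)  # stand-in bytes, not a real image
encoded = encode_image_bytes(raw)
print(len(encoded), expected_base64_length(len(raw)))  # 1340 1340
print("fits:", len(encoded) <= MAX_BASE64_CHARS)
```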

The image type and image input each support a few different options, which are listed in our [docs](https://developers.datature.io/docs/making-api-calls-to-your-deployed-api).

Feel free to change the values of the constants below for your own use case.

In [3]:
## Change this to your project secret key on Nexus
SECRET_KEY = "<YOUR_SECRET_KEY>"

# Change this to your project ID on Nexus. This can be found via two methods:
# 1. In the URL of the project page (https://nexus.datature.io/project/<YOUR_PROJECT_ID>)
# 2. Project Key in the Integrations page
PROJECT_ID = "proj_<YOUR_PROJECT_ID>"

## Change this to your image file path
IMAGE_FILE_PATH = "assets/image.png"
IMAGE_TYPE = "base_64"

## Change this to the CSV file path that you want to save the predictions to
PREDICTIONS_FILE_PATH = "assets/predictions.csv"

## List of asset group names. Assets will be added to these asset groups upon upload.
ASSET_GROUP = ["active-learning-demo"]

## Average entropy threshold for active learning
AVERAGE_ENTROPY_THRESHOLD = 0.5

In [4]:
## Initialize the SDK client with the project secret key
client = Client(SECRET_KEY)

# Select an active project using the project ID.
project = client.get_project(PROJECT_ID)

### Define Helper Functions

`above_threshold` is a helper function that returns a boolean value indicating whether the average entropy value of any class is above the entropy threshold.

`convert_predictions_to_4cornercsv` is a helper function that converts the predictions from the Inference API to a 4-corner CSV format that is accepted for upload to Nexus.

`upload_image_to_nexus` is a helper function that uploads an image to Nexus using the Datature SDK.

`upload_predictions_to_nexus` is a helper function that uploads the predictions, converted to 4-corner CSV, as annotations to Nexus using the Datature SDK.

In [5]:
def above_threshold(json_resp):
    """Check if the average entropy of any class is above the threshold.

    Args:
        json_resp: JSON response containing the average entropy per class.

    Returns:
        True if the average entropy of any class is above the threshold, False otherwise.
    """
    return np.any(
        np.array(list(json_resp["avgEntropyForClass"].values()))
        > AVERAGE_ENTROPY_THRESHOLD
    )
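
The helper above reads the threshold from the global `AVERAGE_ENTROPY_THRESHOLD`. A minimal variant, sketched here with the hypothetical name `above_threshold_with`, takes the threshold as an argument, tolerates a response without `avgEntropyForClass`, and returns a plain Python `bool` instead of a NumPy boolean. The example responses are made up for illustration.

```python
import numpy as np


def above_threshold_with(json_resp, threshold):
    """Return True if any class's average entropy exceeds `threshold`."""
    per_class = json_resp.get("avgEntropyForClass", {})
    if not per_class:
        # Missing or empty per-class entropy: nothing to compare.
        return False
    return bool(np.any(np.array(list(per_class.values())) > threshold))


# Hypothetical responses used only to illustrate the check.
confident = {"avgEntropyForClass": {"cat": 0.10, "dog": 0.20}}
uncertain = {"avgEntropyForClass": {"cat": 0.10, "dog": 0.70}}
print(above_threshold_with(confident, 0.5))  # False
print(above_threshold_with(uncertain, 0.5))  # True
```

Passing the threshold explicitly makes it easy to try several thresholds on the same response before settling on a value.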

In [6]:
def convert_predictions_to_4cornercsv(predict_json):
    """Writes the predictions JSON response to a CSV file in the 4-corner format.

    Args:
        predict_json: JSON response containing the predictions.
    """
    # newline="" stops the csv module from writing blank rows on Windows
    with open(PREDICTIONS_FILE_PATH, "w", newline="") as f:
        writer = csv.writer(f)
        header = ["filename", "xmin", "ymin", "xmax", "ymax", "label"]
        writer.writerow(header)

        for prediction in predict_json["predictions"]:
            label = prediction["tag"]["name"]
            xmin = prediction["bound"][0][0]
            ymin = prediction["bound"][0][1]
            xmax = prediction["bound"][2][0]
            ymax = prediction["bound"][2][1]

            row = [
                os.path.basename(IMAGE_FILE_PATH), xmin, ymin, xmax, ymax,
                label
            ]
            writer.writerow(row)
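
To check the 4-corner conversion without touching disk, a variant can take the image path and an open file object as arguments and write to an in-memory `io.StringIO` buffer. This is a sketch: `write_4corner_csv` is a hypothetical name, it assumes rectangular `bound` values with corners ordered as `[[xmin, ymin], [xmin, ymax], [xmax, ymax], [xmax, ymin]]`, and the sample prediction is made up.

```python
import csv
import io
import os


def write_4corner_csv(predict_json, image_path, file_obj):
    """Write predictions as filename,xmin,ymin,xmax,ymax,label rows.

    Assumes each `bound` has exactly four corners in the order
    [[xmin, ymin], [xmin, ymax], [xmax, ymax], [xmax, ymin]].
    """
    writer = csv.writer(file_obj)
    writer.writerow(["filename", "xmin", "ymin", "xmax", "ymax", "label"])
    filename = os.path.basename(image_path)
    for prediction in predict_json["predictions"]:
        # Corners 0 and 2 hold the top-left and bottom-right points.
        (xmin, ymin), _, (xmax, ymax), _ = prediction["bound"]
        writer.writerow([filename, xmin, ymin, xmax, ymax,
                         prediction["tag"]["name"]])


# Hypothetical single prediction for illustration.
sample = {"predictions": [{
    "bound": [[0.1, 0.2], [0.1, 0.4], [0.3, 0.4], [0.3, 0.2]],
    "tag": {"name": "boat"},
}]}
buffer = io.StringIO()
write_4corner_csv(sample, "assets/image.png", buffer)
print(buffer.getvalue())
```

When writing to a real file instead, open it with `newline=""` as recommended for the `csv` module.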

In [7]:
def upload_image_to_nexus():
    """Upload the image at IMAGE_FILE_PATH to Nexus.

    The uploaded asset is added to the asset groups listed in ASSET_GROUP.
    """
    print(f"Uploading image '{IMAGE_FILE_PATH}' to Nexus...")
    upload_session = project.assets.create_upload_session(groups=ASSET_GROUP)
    with upload_session:
        upload_session.add_path(IMAGE_FILE_PATH)
    upload_session.wait_until_done()
    print(f"Uploaded image '{IMAGE_FILE_PATH}' to Nexus!")

In [8]:
def upload_predictions_to_nexus(predict_json):
    """Upload predictions to Nexus as annotations.

    Args:
        predict_json: JSON response containing the predictions.
    """
    print(
        f"Uploading predictions for image '{IMAGE_FILE_PATH}' to Nexus...")
    convert_predictions_to_4cornercsv(predict_json)

    import_session = project.annotations.create_import_session()
    with import_session:
        import_session.add_path(PREDICTIONS_FILE_PATH)
    import_session.wait_until_done()
    print(f"Uploaded predictions for image '{IMAGE_FILE_PATH}' to Nexus!")

### Create a New Deployment with a Trained Model

You can obtain the details of all artifacts in your project and choose one based on its name or timestamp, among other attributes. You will need the artifact ID to export the model in a given format and to deploy it. In this demo, we export our model in the ONNX format.
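
If your project has more than one artifact, picking by timestamp is more robust than taking the last list entry. The sketch below assumes each artifact is a dict with an `"id"` key (as the code below indexes it) and a numeric timestamp under a field we call `create_date`; that field name is an assumption, so inspect one entry of `project.artifacts.list()` to find the real one. The artifact records are hypothetical.

```python
def pick_latest_artifact(artifacts, time_key="create_date"):
    """Return the ID of the artifact with the largest timestamp.

    `time_key` is an assumed field name; check the real artifact
    entries returned by the SDK for the actual timestamp field.
    """
    if not artifacts:
        raise ValueError("No artifacts found in this project.")
    latest = max(artifacts, key=lambda artifact: artifact[time_key])
    return latest["id"]


# Hypothetical artifact records for illustration only.
artifacts = [
    {"id": "artifact_a", "create_date": 1700000000000},
    {"id": "artifact_b", "create_date": 1710000000000},
]
print(pick_latest_artifact(artifacts))  # artifact_b
```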

In [9]:
## Obtain an artifact ID from Nexus. Here we assume there is only one artifact,
## so we simply take the last entry in the list.
all_artifacts = project.artifacts.list(include_exports=True)
artifact_id = all_artifacts[-1]["id"]

## Export the model to the specified format for the given artifact ID.
## This call returns a 409 error if an export is already in progress,
## or if an export in the same format already exists.
try:
    export_options = {
        "format": "ONNX",
    }
    project.artifacts.create_export(artifact_id, export_options)
except Exception:
    # Ignore errors such as the 409 above so the notebook can be re-run.
    pass

API response errored: 409 Model has already been exported.


Once we have the artifact ID, we can create a deployment instance. We use the Datature SDK to poll the deployment status periodically and print a message once the deployment succeeds. This may take a few minutes, so feel free to grab a cup of coffee in the meantime!
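
The polling loop below has no upper bound, so it would spin forever if the deployment got stuck. A minimal sketch of the same idea with a timeout is shown here. `wait_for_status` is a hypothetical helper that takes any zero-argument callable returning a status string; the simulated status sequence, including the `"Available"` value, is made up for illustration.

```python
import time


def wait_for_status(get_status, pending=("Creating",),
                    interval_s=5.0, timeout_s=900.0):
    """Poll `get_status()` until it leaves the pending states.

    Returns the first non-pending status, or raises TimeoutError
    if the status is still pending after `timeout_s` seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status not in pending:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout_s} seconds")
        time.sleep(interval_s)


# Simulated status sequence standing in for the real deployment lookup.
statuses = iter(["Creating", "Creating", "Available"])
print(wait_for_status(lambda: next(statuses), interval_s=0.01))  # Available
```

With the SDK, the callable could be `lambda: project.deployments.get(deploy_create_response["id"])["status"]["overview"]`, mirroring the loop in the next cells.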

Please note that multiple deployment instances can be created with the same name and model ID, so running this code block more than once may unintentionally create duplicate deployments. If you would like to delete a deployment, you can do so on the [API Management](https://developers.datature.io/docs/making-api-calls-to-your-deployed-api#deleting-your-deployment) page.

In [10]:
## Create a model deployment using the artifact ID obtained earlier
deployment_options = {
    "name": "Active Learning Deployment",
    "artifact_id": artifact_id,
    "version_tag": "v1",
}
deploy_create_response = project.deployments.create(deployment_options)

In [11]:
## Wait for the model deployment to be ready
while project.deployments.get(
        deploy_create_response["id"])["status"]["overview"] == "Creating":
    time.sleep(5)
print("Deployed model to Datature Inference API!")
print("Deployment ID:", deploy_create_response["id"])

Deployed model to Datature Inference API!
Deployment ID: deploy_32926938-b25f-4331-beb1-d90848b8e51c


In [12]:
## Obtain the API URL of the model deployment
active_learning_deployment = project.deployments.get(deploy_create_response["id"])
API_URL = active_learning_deployment['url']

print("Deployment ID:", deploy_create_response['id'])
print("API URL:", API_URL)

Deployment ID: deploy_32926938-b25f-4331-beb1-d90848b8e51c
API URL: https://asia-inference-22a05c9e.nip.io/32926938-b25f-4331-beb1-d90848b8e51c


### Load the Image for Inference

For this demo, we load the image as a base64 string. However, you can also pass the image in the other formats described in the table below.

| Image Type | Data | Example |
| --- | --- | --- |
| url | String containing your image URL | image_input = "<YOUR_IMAGE_URL>" |
| base64 | String containing your base64 image encoding | with open(<YOUR_IMAGE_PATH>, "rb") as img_file: <br /> &emsp;&emsp;base64_byte = base64.b64encode(img_file.read()) <br /> &emsp;&emsp;image_input = base64_byte.decode("utf-8") |
| array | Nested array representing your image data in array form | image_input = np.array(PIL.Image.open(<YOUR_IMAGE_PATH>)).tolist() |
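
For the array option, note that Python's `json` module (which `requests` relies on when you pass `json=`) cannot serialize a NumPy array directly; converting it with `.tolist()` produces the nested lists that JSON can represent. The sketch uses a tiny zero-filled array as a stand-in for a real image.

```python
import json

import numpy as np

# A tiny 2x2 RGB "image" standing in for np.array(PIL.Image.open(...)).
image_array = np.zeros((2, 2, 3), dtype=np.uint8)

# json cannot serialize NumPy arrays directly...
try:
    json.dumps({"data": image_array})
except TypeError as err:
    print("TypeError:", err)

# ...so convert to nested Python lists first.
nested = image_array.tolist()
payload_json = json.dumps({"image_type": "array", "data": nested})
print(len(nested), len(nested[0]), len(nested[0][0]))  # 2 2 3
```

Nested lists are much larger on the wire than base64 for real images, which is one reason to prefer base64 or a URL for anything but small inputs.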

In [13]:
# The with block closes the file automatically, so no explicit close() is needed
with open(IMAGE_FILE_PATH, "rb") as img_file:
    base64_byte = base64.b64encode(img_file.read())
    image_input = base64_byte.decode("utf-8")

print(type(image_input))

<class 'str'>


### Generate Payload and Headers for the Inference API Request

The prediction payload with active learning is used to make an inference call to obtain both prediction results and active learning entropy values. The headers are used to authenticate the request.
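
If you plan to call the API for many images, the payload and headers can be built by a small function. This is a sketch: `build_active_learning_request` is a hypothetical helper, and it simply reproduces the dictionaries defined in the next cell, with `"base_64"` as the default image type to match `IMAGE_TYPE` above.

```python
def build_active_learning_request(image_b64, secret_key, image_type="base_64"):
    """Return (payload, headers) for a prediction with the ActiveLearning routine."""
    payload = {
        "image_type": image_type,
        "data": image_b64,
        "routines": [{"name": "ActiveLearning", "arguments": {}}],
    }
    headers = {
        "accept": "application/json",
        "content-type": "application/json",
        "Authorization": f"Bearer {secret_key}",
    }
    return payload, headers


payload, headers = build_active_learning_request("aGVsbG8=", "<YOUR_SECRET_KEY>")
print(payload["routines"], headers["Authorization"])
```

The result can then be sent with `requests.post(f"{API_URL}/predict", json=payload, headers=headers, timeout=60)`; setting a `timeout` keeps the notebook from hanging if the deployment stops responding.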

In [14]:
payload_with_active_learning = {
    "image_type": IMAGE_TYPE,
    "data": image_input,
    "routines": [
        {
            "name": "ActiveLearning",
            "arguments": {},
        },
    ],
}

headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "Authorization": "Bearer " + SECRET_KEY,
}

### Fetch the Prediction Results of the Image

The prediction results are returned as a JSON object with the following fields:

#### For Object Detection Bounding Box Models

 - `annotationId`: Running index of the annotation
 - `bound`: Bounding box `x,y` coordinates in the format of
    ```
    [ [xmin, ymin], [xmin, ymax], [xmax, ymax], [xmax, ymin] ]
    ```
 - `boundType`: Type of prediction shape from model output. Can be either rectangle or polygon
 - `confidence`: Prediction confidence score between 0 and 1
 - `tag`: Object containing class label information that includes
   - `id`: Index of class label according to model tag map
   - `name`: Name of class label assigned to the prediction

#### Additional Information for Segmentation Mask Models

- `contourType`: Output format of segmentation predictions
- `contour`: Polygonal `x, y` coordinates in the format of
  ```
  [ [x1, y1], [x2, y2], [x3, y3], ... [xn, yn] ]
  ```
  where n is the number of polygon vertices
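
Since both `bound` and `contour` are lists of `[x, y]` points, an axis-aligned box can be recovered from either by taking the minimum and maximum coordinates. This sketch, using the hypothetical helper name `points_to_box`, does not depend on vertex order; the triangle contour is a made-up example.

```python
def points_to_box(points):
    """Axis-aligned (xmin, ymin, xmax, ymax) enclosing a list of [x, y] points.

    Works for a 4-corner `bound` or an n-vertex `contour`,
    regardless of the order in which the vertices are listed.
    """
    if not points:
        raise ValueError("Need at least one point.")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical triangle contour.
print(points_to_box([[0.2, 0.5], [0.6, 0.1], [0.4, 0.9]]))  # (0.2, 0.1, 0.6, 0.9)
```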

#### Entropy Values for Active Learning

Entropy is a statistical measure of inter-class instance diversity or intra-image diversity. For active learning, higher values are preferred: they flag images whose predictions are less certain, making them better candidates for manual annotation. The entropy values are returned together with the prediction results in the fields `avgEntropy` and `avgEntropyForClass`.
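
For intuition, the textbook Shannon entropy of a class-probability distribution is shown below. This is a generic sketch, not necessarily the exact formula behind `avgEntropy` or `avgEntropyForClass`, which the Inference API may compute, average, or normalize differently. The probability vectors are made up.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy (natural log) of a discrete probability distribution."""
    # Terms with p == 0 contribute nothing, and log(0) is undefined.
    return -sum(p * math.log(p) for p in probabilities if p > 0)


# A confident prediction has low entropy; a uniform one has the maximum.
confident_probs = [0.97, 0.02, 0.01]
uniform_probs = [1 / 3, 1 / 3, 1 / 3]
print(round(shannon_entropy(confident_probs), 3))
print(round(shannon_entropy(uniform_probs), 3))  # 1.099, i.e. ln(3)
```

The more evenly the model spreads its probability across classes, the higher the entropy, which is why high-entropy images are the ones worth sending back for annotation.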

In [15]:
## Send a POST request to the API URL with the image payload
ROUTE = "/predict"
response = requests.post(
    f"{API_URL}{ROUTE}",
    json=payload_with_active_learning,
    headers=headers,
)

## Obtain the JSON response containing the predictions
predict_json = response.json()
print("Prediction Output:")
print("==================")
pprint(predict_json)

Prediction Output:
{'avgEntropy': 0.23934526020067676,
 'avgEntropyForClass': {'boat': 0.46014659457431745,
                        'fake boat': 0.01854392582703603},
 'predictions': [{'annotationId': 0,
                  'bound': [[0.8157919645309448, 0.4269965589046478],
                            [0.8157919645309448, 0.4870314300060272],
                            [0.8470613956451416, 0.4870314300060272],
                            [0.8470613956451416, 0.4269965589046478]],
                  'boundType': 'rectangle',
                  'confidence': 0.6744793057441711,
                  'contourType': None,
                  'tag': {'id': 2, 'name': 'boat'}},
                 {'annotationId': 1,
                  'bound': [[0.551198422908783, 0.4105452001094818],
                            [0.551198422908783, 0.470993310213089],
                            [0.5814647078514099, 0.470993310213089],
                            [0.5814647078514099, 0.4105452001094818]],
             

### Upload the Image and Predictions to Nexus if the Average Entropy Per Class Exceeds the Threshold

In [16]:
if above_threshold(predict_json):
    print("Active Learning triggered!")
    upload_image_to_nexus()
    upload_predictions_to_nexus(predict_json)

Active Learning triggered!
Uploading image 'assets/image.png' to Nexus...
Uploaded image 'assets/image.png' to Nexus!
Uploading predictions for image 'assets/image.png' to Nexus...
Uploaded predictions for image 'assets/image.png' to Nexus!


### Delete Deployment (Optional)

If you are no longer using your deployment instance, you can delete it to free up resources used by the deployment instance.

In [17]:
deploy_delete_response = project.deployments.delete(deploy_create_response["id"])

if deploy_delete_response["deleted"]:
    print("Deleted deployment instance!")
else:
    print("Failed to delete deployment instance!")

Deleted deployment instance!
