# Guide to Real-time Inference on a Custom MONAI Bundle with NVIDIA Cloud APIs

This guide walks you through setting up a real-time inference system on a custom MONAI bundle with the MONAI Cloud APIs. We'll cover setting up the environment, performing on-the-fly predictions, and managing the output so that the result is an efficient, real-time decision-making pipeline.

## Table of Contents

- Introduction
- Setup
- Dataset Creation
- Custom MONAI Bundle Creation
- Configuring Experiment to enable the real-time inference
- Prepare the image ID for the inference request
- Triggering Inference on a Specified Image
- Stopping the experiment from Realtime Inference mode
- Cleaning up
- Conclusion

## Introduction

Transitioning to real-time inference can substantially elevate the responsiveness and applicability of AI models in healthcare. Analyzing and interpreting medical images as they are generated, and instantly providing insights, can be transformative, offering benefits such as improved patient outcomes and more efficient use of medical resources.

<a id='Setup'></a>

## Setup

In [None]:
import json
import requests
import os

In [None]:
# API Endpoint and Credentials
host_url = "https://api.monai.ngc.nvidia.com"
ngc_api_key = os.environ.get('MONAI_API_KEY')
# Object storage info
access_id = "<user id>"
access_secret = "<storage secret>"
container_url = "<remote object storage address>"
inference_image_id = "<inference image id>"

In [None]:
# Exchange NGC_API_KEY for JWT
data = json.dumps({"ngc_api_key": ngc_api_key})
response = requests.post(f"{host_url}/api/v1/login", data=data)
print(response.status_code)
assert response.status_code == 201, f"Login failed, got status code: {response.status_code}."
login = response.json()  # parse the response body once and reuse it
assert "user_id" in login, "user_id is not in response."
user_id = login["user_id"]
print("User ID", user_id)
assert "token" in login, "token is not in response."
token = login["token"]
print("JWT", token)

# Construct the URL and Headers
base_url = f"{host_url}/api/v1/orgs/iasixjqzw1hj"
print("API Calls will be forwarded to", base_url)

headers = {"Authorization": f"Bearer {token}"}
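
Because `os.environ.get` returns `None` when a variable is unset, a missing `MONAI_API_KEY` would only surface later as a failed login. The following is a minimal sketch of an early check. `require_env` is a hypothetical helper name used for illustration and is not part of any MONAI or NGC API.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value


# Example (assumes MONAI_API_KEY has been exported in the shell):
# ngc_api_key = require_env("MONAI_API_KEY")
```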

## Dataset Creation

### **1. Remote Object as Data Sources**

The MONAI Cloud platform supports a range of cloud storage solutions, including Azure Blob Storage, Google Cloud Storage (GCS), and Amazon S3, so you can choose the service that best fits your project's needs. The example below uses Azure:

**Steps:**
1. Creating a Storage Account and Container
   - **Storage Account**: Start by creating a new storage account in your Azure portal. This account will host your blob storage containers.
   - **Container Creation**: Within your storage account, create a new container. This container will hold your datasets.

2. Container URL
   - Once the container is created, it is assigned a unique URL. You will need this URL to access your data; it is the value of `container_url` in the Setup section.

### Obtaining Credentials

- **Access Keys**: Access your storage account and navigate to the 'Access keys' section. Here, you will find the necessary credentials to access your Blob Storage programmatically.
- **Shared Access Signature (SAS)**: Alternatively, you can create a SAS for more granular control over permissions and access duration.

### Creating a Manifest JSON File

In the root of your Azure container, create a manifest JSON file to keep track of your datasets. The file format is as follows:

For a segmentation task:
```json
{
    "root_path": "https://[your-storage-account-name].blob.core.windows.net/[your-container-name]/[subfolder-path]",
    "data": [
        {
            "image": {
                "path": ["path/to/your/image_1"],
                "id": "unique-uuid-1"
            },
            "label": {
                "path": ["path/to/your/label_1"],
                "id": "unique-uuid-2"
            }
        },
        // Additional data objects follow the same format
    ]
}
```

For a non-segmentation task:
```json
{
    "root_path": "https://[your-storage-account-name].blob.core.windows.net/[your-container-name]",
    "label_key": ["bbox", "label"],
    "data": [
        {
            "image": {
                "path": ["path/to/your/image_1"],
                "id": "unique-uuid-1"
            },
            "bbox": ...,
            "label": ...
        }
        // Additional data objects should follow the same format
    ]
}
```

- Each dataset (training, testing, etc.) should have its own root directory.
- All data for a dataset should sit under its root directory.
- The root directory should contain a `manifest.json` file.
- The `manifest.json` file should contain a "data" field, which is a list of all the data entries.
- Each data entry should contain "image" and "label" fields.
- Each "image"/"label" field should contain a "path" field, which is a list of relative paths to the image/label files.
- Provide an "id" field for each "image"/"label". If your data has no IDs, use a random UUID generated with the `uuid` package.
- The `label_key` field is optional and defaults to `["label"]`.
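
The rules above can be sketched in Python for the segmentation case. This is a minimal illustration, not an official tool: `build_manifest` is a hypothetical helper, and the root path and file names are placeholders to replace with your own.

```python
import json
import uuid


def build_manifest(root_path, pairs):
    """Build a segmentation-style manifest dict from (image_path, label_path) pairs.

    Paths are relative to `root_path`. A random UUID is generated for every
    image and label, for data that has no existing IDs.
    """
    data = []
    for image_path, label_path in pairs:
        data.append(
            {
                "image": {"path": [image_path], "id": str(uuid.uuid4())},
                "label": {"path": [label_path], "id": str(uuid.uuid4())},
            }
        )
    return {"root_path": root_path, "data": data}


# Placeholder root path and file names (assumptions for illustration).
manifest = build_manifest(
    "https://[your-storage-account-name].blob.core.windows.net/[your-container-name]",
    [("images/case_1.nii.gz", "labels/case_1.nii.gz")],
)
manifest_json = json.dumps(manifest, indent=4)
print(manifest_json)
```

Save `manifest_json` as `manifest.json` and upload it to the root directory of the dataset.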

After preparing your dataset, please modify the following variables in [Setup](#Setup):

```python
access_id = ...
access_secret = ...
container_url = ...
inference_image_id = ...
```

### Using the Remote Object to Create Datasets

After you've completed the steps above, call the API to create your dataset. Below is an example request with its parameters.

In [None]:
data = {
    "name": "MONAI_CLOUD",
    "description":"Object storage dataset",
    "type": "semantic_segmentation",
    "format": "monai",
    "client_url": container_url,
    "client_id": access_id,
    "client_secret": access_secret,
}

endpoint = f"{base_url}/datasets"
response = requests.post(endpoint, json=data, headers=headers)

assert response.status_code == 201, f"Create dataset failed, got {response.json()}."

res = response.json()
dataset_id = res["id"]
print("Dataset creation succeeded with dataset ID: ", dataset_id)
print("---------------------------------\n")
print(json.dumps(res, indent=2))

## Custom MONAI Bundle Creation

1. **MONAI Bundle**: We're using the Endoscopic Inbody Classification MONAI bundle from the MONAI Model Zoo as an example. Users can build their own MONAI bundles fitting their applications.
2. **Dataset Setup**: All data is under one dataset ID for this demo. Adjust as per your data structure.
3. **Pretrained Weights**: The official MONAI bundles come with pretrained weights.

Here are some notes about the payload used to create the experiment:

- `name`: A user-defined name for the training experiment, here "my_inbody_clf".
- `description`: A brief description of the experiment. Optional.
- `network_arch`: The architecture of the network. The value "monai_custom" indicates that a custom network architecture is being used, in which case the user must also provide `bundle_url`.
- `bundle_url`: The location of the MONAI bundle to be used in this experiment.
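
The dependency between `network_arch` and `bundle_url` described above can be checked locally before the request is sent. This is a minimal sketch: `check_experiment_payload` is a hypothetical helper, and the server's own validation may cover more than this.

```python
def check_experiment_payload(payload: dict) -> None:
    """Raise ValueError if a custom architecture is requested without a bundle URL."""
    if payload.get("network_arch") == "monai_custom" and not payload.get("bundle_url"):
        raise ValueError("'bundle_url' is required when network_arch is 'monai_custom'.")


# Passes silently: the custom architecture comes with a bundle URL (placeholder value).
check_experiment_payload(
    {
        "name": "my_inbody_clf",
        "network_arch": "monai_custom",
        "bundle_url": "<bundle url>",
    }
)
```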

## Configuring Experiment to enable the real-time inference

**Note:** We set the `realtime_infer` parameter when creating the experiment, which automatically loads the experiment and makes sure it is ready for the real-time inference workflow.


In [None]:
bundle_url = "https://api.ngc.nvidia.com/v2/models/nvidia/monaihosting/endoscopic_inbody_classification/versions/0.4.6/files/endoscopic_inbody_classification_v0.4.6.zip"
    
data = {
  "name": "my_inbody_clf",
  "description": "from MONAI model zoo",
  "network_arch": "monai_custom",
  "train_datasets": [ dataset_id ],
  "eval_dataset": dataset_id,
  "realtime_infer": True,
  "bundle_url": bundle_url,
  "model_params": {"override": {"output_filename": "realtime_inference_predictions.csv"}}
}

endpoint = f"{base_url}/experiments"
response = requests.post(endpoint, json=data, headers=headers)
assert response.status_code == 201, f"Create experiment failed, got {response.json()}."
res = response.json()
experiment_id = res["id"]
model_network = res["network_arch"]
print("Experiment creation succeeded with experiment ID:", experiment_id)
print("---------------------------------\n")
print(json.dumps(res, indent=2))

## Prepare the image ID for the inference request

Prepare the image to process:
- The code sends a "cacheimage" action request to the dataset's jobs endpoint. The image ID must be specified manually, using the `inference_image_id` value from the Setup section.

In [None]:
# Send a "cacheimage" request for the manually specified inference image ID
data = {
    "action": "cacheimage",
    "specs": {"image": inference_image_id}
}
endpoint = f"{base_url}/datasets/{dataset_id}/jobs"
response = requests.post(endpoint, json=data, headers=headers)

assert response.status_code == 201, f"cache image failed, got {response.json()}."

## Triggering Inference on a Specified Image

Run inference on a specific image within the experiment.

In [None]:
data = {
    "action": "inference",
    "specs": {
        "image": inference_image_id
    }
}

endpoint = f"{base_url}/experiments/{experiment_id}/jobs"
response = requests.post(endpoint, json=data, headers=headers)
assert response.status_code == 201, f"Run inference failed, got {response.json()}."
print("Inference Successful.  Label is returned")
print(response.headers)

`MultipartDecoder` from the `requests_toolbelt` package is used to decode the response data. If the package is not installed, you can install it with the following command:

```Bash
pip install requests_toolbelt==1.0.0
```

In [None]:
from requests_toolbelt.multipart.decoder import MultipartDecoder

multipart_data = MultipartDecoder.from_response(response)
for part in multipart_data.parts:
    filename = part.headers[b"Content-Disposition"].decode().split(";")[1].split("=")[1].strip('"')

    with open(filename, 'wb') as f:
        f.write(part.content)
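
The filename extraction above relies on the position of the parameters within the `Content-Disposition` header. A more defensive variant can be sketched with the standard library's `email.message.Message`, which looks up the `filename` parameter by name. `filename_from_content_disposition` and the `part.bin` fallback are hypothetical names chosen for this illustration, and the exact headers returned by the service are an assumption.

```python
import os
from email.message import Message


def filename_from_content_disposition(header_value: str, default: str = "part.bin") -> str:
    """Extract the filename parameter from a Content-Disposition header value."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    filename = msg.get_filename()
    if not filename:
        return default
    # Keep only the final path component so a part cannot be written outside the current directory.
    return os.path.basename(filename) or default
```

Inside the loop, it could be called as `filename_from_content_disposition(part.headers[b"Content-Disposition"].decode())`.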

## Stopping the experiment from Realtime Inference mode

When the experiment is created with `realtime_infer` as `True`, it will reserve one GPU to process the inference requests.

Once inference is finished, the GPU should be released so that it is available for other tasks.

To do this, switch `realtime_infer` from `True` to `False`.

Note: This step is irreversible. You cannot switch `realtime_infer` back from `False` to `True`; to run real-time inference again, you have to create another experiment.

In [None]:
data = {
    "realtime_infer": False,
}

endpoint = f"{base_url}/experiments/{experiment_id}"
response = requests.patch(endpoint, json=data, headers=headers)
assert response.status_code == 200, f"stop job failed, got {response.json()}."
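
Since a real-time experiment holds a GPU until `realtime_infer` is switched off, it may help to guarantee the release even when an inference call raises an error. This is a minimal sketch using a context manager. `realtime_session` and its `patch` argument are hypothetical names, and the PATCH payload mirrors the request above.

```python
from contextlib import contextmanager


@contextmanager
def realtime_session(patch, endpoint, headers):
    """Yield control for inference calls, then always switch realtime_infer off.

    `patch` is any callable with the same signature as `requests.patch`.
    """
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, so the GPU is released either way.
        patch(endpoint, json={"realtime_infer": False}, headers=headers)
```

With the variables defined earlier, the inference requests would go inside `with realtime_session(requests.patch, f"{base_url}/experiments/{experiment_id}", headers):`.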

## Cleaning up
Delete the experiment and dataset after jobs are done.

In [None]:
endpoint = f"{base_url}/experiments/{experiment_id}"
response = requests.delete(endpoint, headers=headers)
assert response.status_code == 200, f"Delete experiment failed, got {response.json()}."
print(response)

endpoint = f"{base_url}/datasets/{dataset_id}"
response = requests.delete(endpoint, headers=headers)
assert response.status_code == 200, f"Delete dataset failed, got {response.json()}."
print(response)

## Conclusion

The code snippets above show a streamlined approach to real-time inference on a custom MONAI bundle using the NVIDIA MONAI Cloud APIs. The platform handles image caching and inference tasks, so users can focus on model refinement and analysis while bringing AI into real-time decision-making workflows.