# Guide to Real-time Inference with NVIDIA Cloud APIs

This guide walks you through setting up a real-time inference system with the MONAI Cloud APIs. It covers setting up the experiment, making on-the-fly predictions, and managing the outputs to build a seamless, efficient, real-time decision-making pipeline.

## Table of Contents

- Introduction
- Dataset Setup
- Configuring Experiment to Enable the Real-time Inference
- Prepare the image ID for the inference request
- Triggering Inference on a Specified Image
- Stopping the experiment from Real-Time Inference mode
- Cleaning up
- Conclusion

## Introduction

Transitioning to real-time inference can substantially elevate the responsiveness and applicability of AI models in healthcare. Analyzing and interpreting medical images as they are generated, and instantly providing insights, can be transformative, offering benefits such as improved patient outcomes and more efficient use of medical resources.

## Dataset Setup

We'll start by creating a new dataset for inference image sources. The dataset, hosted on a DICOMweb server, will be accessed using the `dicomweb` protocol.

In [None]:
import json
import requests
import os

In [None]:
# API Endpoint and Credentials
host_url = "https://api.monai.ngc.nvidia.com"
ngc_api_key = os.environ.get("MONAI_API_KEY", "<YOUR_API_KEY>")  # we recommend using environment variables for API keys, but you can also hardcode them here
# Dicom Server
dicom_web_endpoint = "<DICOMWeb address>" # For example "http://127.0.0.1:8042/dicom-web".
dicom_client_id = "<DICOMWeb user ID>"    # If Authentication is enabled, then provide username
dicom_client_secret = "<DICOMWeb secret>" # If Authentication is enabled, then provide password

In [None]:
# Exchange NGC_API_KEY for JWT
data = json.dumps({"ngc_api_key": ngc_api_key})
response = requests.post(f"{host_url}/api/v1/login", data=data)
print(response.status_code)
assert response.status_code == 201, f"Login failed, got status code: {response.status_code}."
login_info = response.json()  # parse the body once and reuse it
assert "user_id" in login_info, "user_id is not in response."
user_id = login_info["user_id"]
print("User ID", user_id)
assert "token" in login_info, "token is not in response."
token = login_info["token"]
print("JWT", token)  # the JWT is a credential; avoid printing it in shared notebooks

# Construct the URL and Headers
base_url = f"{host_url}/api/v1/orgs/iasixjqzw1hj"
print("API Calls will be forwarded to",base_url)

headers = {"Authorization": f"Bearer {token}"}

data = {
    "name": "mydataset",
    "description":"a demo dataset",
    "type": "semantic_segmentation",
    "format": "monai",
    "client_url": f"{dicom_web_endpoint}",
    "client_id": f"{dicom_client_id}",
    "client_secret": f"{dicom_client_secret}",
}

endpoint = f"{base_url}/datasets"
response = requests.post(endpoint, json=data, headers=headers)
assert response.status_code == 201, f"Create dataset failed, got {response.json()}."
res = response.json()
dataset_id = res["id"]
print("Dataset creation succeeded with dataset ID: ", dataset_id)
print("---------------------------------\n")
print(json.dumps(res, indent=2))
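
The cells in this guide check status codes with `assert`, and the failure message calls `response.json()`, which itself raises if the server returns an error body that is not JSON. A minimal sketch of a more defensive check follows. `check_response` is a hypothetical helper name introduced here, not part of the MONAI Cloud APIs, and it relies only on the `status_code`, `json()`, and `text` members of a `requests` response.

```python
def check_response(response, expected_status, action):
    """Return the parsed JSON body, or raise with whatever detail the server sent."""
    if response.status_code != expected_status:
        try:
            detail = response.json()
        except ValueError:  # the error body was not valid JSON
            detail = response.text
        raise RuntimeError(
            f"{action} failed with status {response.status_code}: {detail}"
        )
    return response.json()
```

With this helper, the dataset cell above could read `res = check_response(response, 201, "Create dataset")`. Unlike `assert`, the check also stays active when Python runs with the `-O` flag.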

## Configuring Experiment to Enable the Real-time Inference

**Note:** We set the `realtime_infer` parameter when creating the experiment. This automatically loads the experiment and makes it ready for the real-time inference workflow.


In [None]:
endpoint = f"{base_url}/experiments"
response = requests.get(endpoint, headers=headers)
assert response.status_code == 200, f"List Base Experiment failed, got {response.json()}."
res = response.json()

# VISTA-3D: pick the pretrained entry, i.e. the one with no base experiment of its own
ptm_vista = [p for p in res if p["network_arch"] == "monai_vista3d" and not p["base_experiment"]][0]["id"]
print(f"Base Experiment ID for VISTA Experiment: {ptm_vista}")

data = {
  "name": "my_vista",
  "description": "based on vista",
  "network_arch": "monai_vista3d",
  "base_experiment": [ ptm_vista ],
  "inference_dataset": dataset_id,
  "eval_dataset": dataset_id,
  "train_datasets": [ dataset_id ],
  "realtime_infer": True, # Auto loads MONAI bundle and enables real-time inference
  "model_params": {
    "labels": {
      "1": "liver",
      "2": "kidney",
      "3": "spleen",
      "4": "pancreas",
      "5": "right kidney"
    }
  }
}

endpoint = f"{base_url}/experiments"
response = requests.post(endpoint, json=data, headers=headers)
assert response.status_code == 201, f"Create experiment failed, got {response.json()}."
res = response.json()
experiment_id = res["id"]
model_network = res["network_arch"]
print("Experiment creation succeeded with experiment ID:", experiment_id)
print("---------------------------------\n")
print(json.dumps(res, indent=2))

## Prepare the image ID for the inference request

To get the ID of the image to process:
- The code sends a `nextimage` action to the dataset's jobs endpoint, asking the system to select and recommend the next image for processing.
- This is useful for workflows that process images sequentially, or when you prefer to let the system determine the processing order.
- The `nextimage` action is optional: you can instead specify an `image_id` manually if you need to process a particular image.
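
The choice between a manual ID and the recommended one can be sketched as a small helper. `choose_image_id` and its parameters are hypothetical names introduced for illustration. The sketch assumes a manually supplied ID has the same string form as the `image` value returned by `nextimage`, which this guide does not spell out.

```python
from typing import Callable, Optional


def choose_image_id(manual_image_id: Optional[str],
                    fetch_next: Callable[[], str]) -> str:
    """Prefer an explicitly supplied image ID; otherwise ask for a recommendation."""
    if manual_image_id:
        return manual_image_id
    # fetch_next is only called when no manual ID was given,
    # so no "nextimage" request is sent unnecessarily.
    return fetch_next()
```

In this notebook, `fetch_next` would wrap the `nextimage` request from the next cell and return `res["image"]`.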

In [None]:
# get an inference image id with nextimage api
data = {
    "action": "nextimage"
}
endpoint = f"{base_url}/datasets/{dataset_id}/jobs"
response = requests.post(endpoint, json=data, headers=headers)

assert response.status_code == 201, f"Recommend image failed, got {response.json()}."
res = response.json()
inference_image_id = res["image"]
print(f"Recommended image for inference: {inference_image_id}")
print(json.dumps(res, indent=2))

## Triggering Inference on a Specified Image

Send an `inference` action to the experiment's jobs endpoint to run inference on the selected image. The predicted label is returned in the response body.

In [None]:
data = {
    "action": "inference",
    "specs": {
        "image": inference_image_id,
        "bundle_params": {
            "label_prompt": list(range(1, 118))  # run inference on all 117 classes (indices 1-117)
        },
    }
}

endpoint = f"{base_url}/experiments/{experiment_id}/jobs"
response = requests.post(endpoint, json=data, headers=headers)
assert response.status_code == 201, f"Run inference failed, got {response.json()}."
print("Inference Successful.  Label is returned")
print(response.headers)
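
The request above asks for all 117 classes. If only a few structures are of interest, a shorter `label_prompt` may be enough. The sketch below builds the same request body for a chosen set of class indices. `build_inference_request` is a hypothetical helper, and it assumes the bundle accepts any subset of the indices 1 to 117, which this guide does not confirm.

```python
def build_inference_request(image_id, class_indices, max_class=117):
    """Build the body of an "inference" job for a subset of class indices."""
    # Accept ints or numeric strings (the experiment's label keys are strings),
    # drop duplicates, and sort for a stable request body.
    indices = sorted({int(i) for i in class_indices})
    if not indices:
        raise ValueError("at least one class index is required")
    if indices[0] < 1 or indices[-1] > max_class:
        raise ValueError(f"class indices must be between 1 and {max_class}")
    return {
        "action": "inference",
        "specs": {
            "image": image_id,
            "bundle_params": {"label_prompt": indices},
        },
    }
```

For example, `build_inference_request(inference_image_id, ["1", "3"])` would request only the classes that the experiment above labels as liver and spleen.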

In [None]:
attachment_data = response.content
tmp_save_path = "pred.nrrd"

with open(tmp_save_path, 'wb') as f:
    f.write(attachment_data)
print(f"Inference result downloaded to {tmp_save_path}")
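
The cell above always saves the result as `pred.nrrd`. If the server names the attachment in a `Content-Disposition` header, that name could be used instead. This is an assumption: the guide prints `response.headers` but does not say which headers are set. The sketch below uses the standard library's `email.message.Message` to parse the header and falls back to the default name. `filename_from_headers` is a hypothetical helper name.

```python
import os
from email.message import Message


def filename_from_headers(headers, default="pred.nrrd"):
    """Return the attachment filename from Content-Disposition, or a default."""
    value = headers.get("Content-Disposition")
    if not value:
        return default
    msg = Message()
    msg["Content-Disposition"] = value
    name = msg.get_filename()
    # Keep only the final path component so a server-supplied name
    # cannot point outside the current directory.
    return os.path.basename(name) if name else default
```

It could replace the hardcoded path with `tmp_save_path = filename_from_headers(response.headers)`.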

## Stopping the experiment from Real-Time Inference mode

When the experiment is created with `realtime_infer` set to `True`, it reserves one GPU to process inference requests.

Once inference is finished, release that GPU for other tasks by switching `realtime_infer` from `True` to `False`.

Note: this step is irreversible. You cannot set `realtime_infer` back from `False` to `True`; to run real-time inference again, you have to create another experiment.

In [None]:
data = {
    "realtime_infer": False,
}

endpoint = f"{base_url}/experiments/{experiment_id}"
response = requests.patch(endpoint, json=data, headers=headers)
assert response.status_code == 200, f"stop job failed, got {response.json()}."
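
Because the experiment holds a GPU until `realtime_infer` is switched off, it can help to guarantee that the switch happens even when an inference request raises. One possible sketch wraps the work in a context manager. `realtime_session` is a hypothetical name, and the `patch` argument is expected to behave like `requests.patch`; it is passed in so the helper has no hard dependency on a live server.

```python
from contextlib import contextmanager


@contextmanager
def realtime_session(patch, experiment_url, headers):
    """Run a block of inference calls, then always turn real-time mode off."""
    try:
        yield
    finally:
        # Runs on normal exit and when the body raises.
        response = patch(
            experiment_url, json={"realtime_infer": False}, headers=headers
        )
        response.raise_for_status()
```

In this notebook, the inference cells would run inside `with realtime_session(requests.patch, f"{base_url}/experiments/{experiment_id}", headers):`. Since the switch is irreversible, the block should cover all inference requests planned for the experiment.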

## Cleaning up

Delete the experiment and dataset once the jobs are done.

In [None]:
endpoint = f"{base_url}/experiments/{experiment_id}"
response = requests.delete(endpoint, headers=headers)
assert response.status_code == 200, f"Delete experiment failed, got {response.json()}."
print(response)

endpoint = f"{base_url}/datasets/{dataset_id}"
response = requests.delete(endpoint, headers=headers)
assert response.status_code == 200, f"Delete dataset failed, got {response.json()}."
print(response)

## Conclusion

This tutorial showed a streamlined approach to real-time inference with the NVIDIA MONAI Cloud APIs: creating a DICOMweb-backed dataset, creating an experiment with `realtime_infer` enabled, letting the system recommend the next image, running inference on it, and releasing the GPU afterwards. With image selection and inference handled by the system, users can focus on model refinement and analysis.