# Running Inference with a MONAI Segmentation Bundle

This tutorial shows how to run inference on a dataset with a pretrained or fine-tuned MONAI segmentation bundle on NVIDIA DGX Cloud. The example uses the `spleen_deepedit_annotation` bundle.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NVIDIA/monai-cloud-api/blob/main/notebooks/Training%20a%20MONAI%20Segmentation%20Bundle.ipynb)

## Table of Contents

- Setup
- Login with NGC Key
- Dataset Creation
- Experiment Creation
- Monitoring Job Status and Downloading Job
- Cleaning Up

<a id='Setup'></a>

## Setup

In [None]:
import requests
import json
import time
import os

In [None]:
# Provide the following parameters before running this notebook.
host_url = "<monai service API address>"
ngc_api_key = os.environ.get('MONAI_API_KEY')
# Object storage info
client_id = "<user name for the object storage>"
client_secret = "<secret for the object storage>"
inference_manifest_url = "<inference manifest url>"


## Login with NGC Key

In [None]:
# Exchange NGC_API_KEY for JWT
data = json.dumps({"ngc_api_key": ngc_api_key})
response = requests.post(f"{host_url}/api/v1/login", data=data)
print(response.status_code)
assert response.status_code == 201, f"Login failed, got status code: {response.status_code}."
assert "user_id" in response.json().keys(), "user_id is not in response."
user_id = response.json()["user_id"]
print("User ID",user_id)
assert "token" in response.json().keys(), "token is not in response."
token = response.json()["token"]
print("JWT",token)

# Set base URL
base_url = f"{host_url}/api/v1/users/{user_id}"
print("API Calls will be forwarded to",base_url)

headers = {"Authorization": f"Bearer {token}"}


## Dataset Creation

### **1. Remote Object as Data Sources**

The MONAI Cloud platform supports several cloud storage services, including Azure Blob Storage, Google Cloud Storage, and Amazon S3, so you can choose the one that best fits your project. The example below uses Azure:

**Steps:**
1. Creating a Storage Account and Container
   - **Storage Account**: Start by creating a new storage account in your Azure portal. This account will host your blob storage containers.
   - **Container Creation**: Within your storage account, create a new container. This container will hold your datasets.

2. Container URL
   - Once the container is created, Azure provides a unique URL for it. You will need this URL to access your data.

#### Obtaining Credentials

- **Access Keys**: Access your storage account and navigate to the 'Access keys' section. Here, you will find the necessary credentials to access your Blob Storage programmatically.
- **Shared Access Signature (SAS)**: Alternatively, you can create a SAS for more granular control over permissions and access duration.

#### Creating a Manifest JSON File

In the root of your Azure container, create a manifest JSON file to keep track of your datasets. The file format is as follows:

For a segmentation task:
```json
{
    "root_path": "https://[your-storage-account-name].blob.core.windows.net/[your-container-name]/[subfolder-path]",
    "data": [
        {
            "image": {
                "path": ["path/to/your/image_1"],
                "id": "unique-uuid-1"
            }
        }
    ]
}
```

Additional entries in `data` follow the same format.

- Each dataset (training, testing, etc.) should have its own root directory.
- All data for a dataset should be under its root directory.
- The root directory should contain a `manifest.json` file.
- The `manifest.json` file should contain a "data" field, which is a list of all data entries.
- Each data entry should contain an "image" field.
- Each "image" field should contain a "path" field, which is a list of relative paths to the image files.
- Each "image" (and "label", if present) should have an "id" field. If the data has no ID of its own, generate a random UUID with Python's `uuid` module.
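
The rules above can be sketched as a small helper that builds the manifest and assigns a random UUID to each image. This is a minimal sketch, not part of the MONAI Cloud API: the function name `build_manifest` and the placeholder paths are assumptions for illustration, and it assumes one image file per data entry.

```python
import json
import uuid


def build_manifest(root_path, image_paths):
    """Build a manifest dict with one data entry per image path.

    Paths are expected to be relative to root_path. Each image gets a
    random UUID as its "id" (hypothetical helper, for illustration only).
    """
    return {
        "root_path": root_path,
        "data": [
            {"image": {"path": [path], "id": str(uuid.uuid4())}}
            for path in image_paths
        ],
    }


# Placeholder values; replace them with your own container URL and relative paths.
manifest = build_manifest(
    "https://[your-storage-account-name].blob.core.windows.net/[your-container-name]/[subfolder-path]",
    ["path/to/your/image_1", "path/to/your/image_2"],
)
print(json.dumps(manifest, indent=4))
```

Save the printed JSON as `manifest.json` in the root directory of the dataset.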

After preparing your dataset, please modify the following variables in [Setup](#Setup):

```python
client_id = ...
client_secret = ...
inference_manifest_url = ...
```
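
Instead of pasting secrets into the notebook, one option is to read them from environment variables, as the Setup cell already does for the NGC key. The sketch below is an assumption rather than a requirement of the platform: `read_required_env` is a hypothetical helper, and the environment variable names in the comments are made up for illustration.

```python
import os


def read_required_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value


# Hypothetical variable names; export them in your shell before starting the notebook:
# client_id = read_required_env("MONAI_STORAGE_CLIENT_ID")
# client_secret = read_required_env("MONAI_STORAGE_CLIENT_SECRET")
# inference_manifest_url = read_required_env("MONAI_INFERENCE_MANIFEST_URL")
```

Failing early with a named variable makes a missing credential easier to diagnose than a rejected request later in the notebook.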

### **2. Create a dataset for inference**

In [None]:
# Inference dataset
data = {
    "name": "MONAI_seg_infer",
    "description": "Object storage dataset for inference",
    "type": "semantic_segmentation",
    "format": "monai",
    "client_url": inference_manifest_url,
    "client_id": client_id,
    "client_secret": client_secret,
}
data=json.dumps(data)

endpoint = f"{base_url}/datasets"
print(endpoint)
print(headers)
response = requests.post(endpoint, data=data, headers=headers)
print(response.json())

assert response.status_code == 201, f"Create inference dataset failed, got {response.json()}."
res = response.json()
inference_dataset_id = res["id"]
print("Inference dataset creation succeeded with dataset ID:", inference_dataset_id)
print("---------------------------------\n")
print(json.dumps(res, indent=2))

## Experiment Creation

Create an experiment based on a MONAI segmentation bundle. In this notebook, we will use the `spleen_deepedit_annotation` bundle.

### **1. List Available Base Experiments**

In [None]:
endpoint = f"{base_url}/experiments"
response = requests.get(endpoint, headers=headers)
assert response.status_code == 200, f"List Base Experiments failed, got {response.json()}."
res = response.json()

# VISTA-3D
ptm_vista = [p for p in res if p["network_arch"] == "monai_vista3d" and not len(p["base_experiment"])][0]["id"]
print(f"Base Experiment ID for VISTA Experiment: {ptm_vista}")

# DeepEdit
ptm_annotation = [p for p in res if p["network_arch"] == "monai_annotation" and not len(p["base_experiment"])][0]["id"]
print(f"Base Experiment ID for DeepEdit(Annotation) Experiment: {ptm_annotation}")
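
The list comprehensions above raise a bare `IndexError` when no matching base experiment exists. A minimal sketch of a lookup with a clearer error is shown below. `find_base_experiment` is a hypothetical helper, and it assumes, as the cell above does, that a base experiment is one whose `base_experiment` list is empty.

```python
def find_base_experiment(experiments, network_arch):
    """Return the ID of the first base experiment for the given network architecture.

    A base experiment is taken to be one whose "base_experiment" list is empty,
    mirroring the filter used in the cell above (hypothetical helper).
    """
    for experiment in experiments:
        is_base = not experiment.get("base_experiment")
        if experiment.get("network_arch") == network_arch and is_base:
            return experiment["id"]
    raise LookupError(f"No base experiment found for network_arch={network_arch!r}.")


# Usage with the `res` list returned by GET {base_url}/experiments:
# ptm_annotation = find_base_experiment(res, "monai_annotation")
```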


### **2. Create Experiment**

In [None]:
data = {
  "name": "my_deepedit",
  "description": "based on spleen_deepedit",
  "network_arch": "monai_annotation",
  "type": "medical",
  "base_experiment": [ ptm_annotation ],
  "inference_dataset": inference_dataset_id,
}

endpoint = f"{base_url}/experiments"
response = requests.post(endpoint, json=data, headers=headers)
assert response.status_code == 201, f"Create experiment failed, got {response.json()}."
res = response.json()
experiment_id = res["id"]
model_network = res["network_arch"]
print("Experiment creation succeeded with experiment ID: ", experiment_id)
print("---------------------------------\n")
print(json.dumps(res, indent=2))


### **3. Run a DGX Inference Job**

In [None]:
inference_spec = {}
data = {"name": "deepedit_inference", "action": "batchinfer", "specs": inference_spec}
endpoint = f"{base_url}/experiments/{experiment_id}/jobs"
response = requests.post(endpoint, json=data, headers=headers)

assert response.status_code == 201, f"Run dgx inference job failed, got {response.json()}."
job_id = response.json()
print("Job creation succeeded with job ID: ", job_id)


## Monitoring Job Status and Downloading Job

Job monitoring reports the current state of each job. There are two levels of detail:

1. **Basic Status Overview**: The job status tells you whether a job is pending, running, done, or in an error state, so you can quickly assess progress and spot problems that need attention.

Status interpretation:
- "Pending": MONAI Cloud is looking for resources and preparing the datasets. This can take a while, depending on the size of the dataset.
- "Running": MONAI Cloud has submitted the job to the DGX.
- "Done": The job is complete.
- "Error": The job has failed. Download the job as a `.tar.gz` archive and inspect the detailed log.

2. **Detailed Logging Through Download API**: The Download API gives access to detailed logs, model checkpoints, and data outputs. It is most useful when a job fails or when you need to examine its behavior and performance in detail.

In [None]:
# Helper functions for running jobs
def wait_for_job(endpoint, headers, timeout):
    start_time = time.time()
    response = requests.get(endpoint, headers=headers)
    assert response.status_code == 200, f"Failed to get job status, got {response.json()}."
    status = response.json()["status"].title()
    print("Waiting for job to complete...")
    print(status, end="", flush=True)
    while True:
        if status not in ["Pending", "Running"]:
            assert status == "Done", f"Job failed with status: {status}"
            break
        time.sleep(5)
        response = requests.get(endpoint, headers=headers)
        assert response.status_code == 200, f"Failed to get job status, got {response.json()}."
        status_new = response.json()["status"].title()
        if status_new != status:
            status = status_new
            print(f"\n{status}", end="", flush=True)
        else:
            print(".", end="", flush=True)
        if time.time() - start_time > timeout:
            print(f"Job timeout after {timeout} seconds.")
            break
    print(f"\nJob status: {status}")

# While the job is running
endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"
response = requests.get(endpoint, headers=headers)

assert response.status_code == 200, f"Failed to get job status, got {response.json()}."
for k, v in response.json().items():
    if k != "result":
        print(f"{k}: {v}")
    else:
        print("result:")
        for k1, v1 in v.items():
            print(f"    {k1}: {v1}")

print("------------------------------------------------------------------------")
wait_for_job(endpoint, headers, timeout=1800)
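
`wait_for_job` asserts on failure and only prints a message on timeout, so the caller cannot tell the two outcomes apart programmatically. The sketch below is one possible variant that returns the outcome instead. `poll_status` and its parameters are hypothetical names, and the status strings are assumed to be the four states listed above.

```python
import time


def poll_status(fetch_status, timeout, interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the job leaves Pending/Running or the timeout elapses.

    fetch_status is any zero-argument callable returning the job status string.
    Returns a (status, timed_out) tuple (hypothetical helper, for illustration).
    """
    deadline = clock() + timeout
    status = fetch_status().title()
    while status in ("Pending", "Running"):
        if clock() >= deadline:
            return status, True
        sleep(interval)
        status = fetch_status().title()
    return status, False


# Usage with the job endpoint from the cell above:
# status, timed_out = poll_status(
#     lambda: requests.get(endpoint, headers=headers).json()["status"],
#     timeout=1800,
# )
```

Passing the status lookup as a callable keeps the polling logic independent of the HTTP client, which also makes it easy to exercise without a live job.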

## Cleaning Up

Delete the experiment after all jobs are done.

In [None]:
endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}"
response = requests.get(endpoint, headers=headers)
# If the job is not done, cancel it first (status is normalized as in wait_for_job)
if response.json()["status"].title() != "Done":
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{job_id}:cancel"
    response = requests.post(endpoint, headers=headers)
    assert response.status_code == 200, f"Cancel job failed, got {response.json()}."
    print(response)

endpoint = f"{base_url}/experiments/{experiment_id}"
response = requests.delete(endpoint, headers=headers)
assert response.status_code == 200, f"Delete experiment failed, got {response.json()}."
print(response)

Delete datasets after the experiment is done.

In [None]:
# delete inference dataset
endpoint = f"{base_url}/datasets/{inference_dataset_id}"
response = requests.delete(endpoint, headers=headers)
assert response.status_code == 200, f"Delete inference dataset failed, got {response.json()}."
print(response)

## Conclusion

Congratulations! You have created an inference dataset from object storage, created an experiment from a base MONAI bundle, run a batch inference job on DGX Cloud, monitored it, and cleaned up the resources. You can now apply the inference features of the NVIDIA MONAI Cloud APIs to your own medical imaging projects.