## Guide to Annotation and Continual Learning with NVIDIA MONAI Cloud APIs

This guide covers annotation and continual learning with NVIDIA MONAI Cloud APIs. Accurate annotations underpin medical imaging models, and continually refining those models on new annotations keeps their results accurate over time. We'll walk through the steps and considerations involved in this process.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NVIDIA/monai-cloud-api/blob/main/notebooks/Annotation%20and%20Continual%20Learning%20Overview.ipynb)

## Table of Contents

- Introduction
- Setup
- Creating a New Dataset for Annotation
- Creating a New Experiment for Annotation
- Configuring Annotation and Continual Learning Parameters
- VISTA Workflows
- Annotation Workflow
- Stopping a Continual Learning Job
- Stopping the Experiment from Realtime Inference mode
- Check the Training Job Results
- Cleaning Up
- Conclusion

## Introduction

Annotation and Continual Learning are core features of NVIDIA MONAI Cloud APIs, streamlining the process of refining datasets and enhancing model performance progressively. Continual learning leverages accumulated annotations to improve the model iteratively. This guide will assist you in setting up and optimizing these critical tasks.

Before diving into annotation and continual learning, we're going to quickly create our dataset and experiment that will be used for the annotation workflow.  

### What You Can Expect to Learn

The objective is to demonstrate how you can utilize the APIs to ensure your models adapt and improve over time with new data inputs. We will show you how to configure your datasets, manage annotation tasks, and effectively employ continual learning strategies to maximize the accuracy and efficiency of your models. By the end of this notebook, you will have a solid understanding of the annotation process and continual learning mechanisms within the MONAI Cloud API platform, empowering you to initiate these practices in your own projects.


**Note:** We're going to use the `realtime_infer` parameter when creating our experiment as that will automatically load the experiment and make sure it's ready for our annotation and continual learning workflow.

We've covered these steps in depth in the notebooks listed below. If you haven't already gone through them, we encourage you to review them first.

- [Generating and Managing Your Credentials](./Generating%20and%20Managing%20Your%20Credentials.ipynb)
- [Dataset Creation and Experiment Selection](./Dataset%20Creation%20and%20Experiment%20Selection.ipynb)
- [Perform Real-time Inference](./Perform%20Real-time%20Inference.ipynb)

## Setup

In [None]:
!python -c "import requests" || pip install -q "requests"

import json
import os
import time

import requests

#### Required Parameters

In [None]:
# API Endpoint and Credentials
host_url = "https://api.monai.ngc.nvidia.com"
ngc_api_key = os.environ.get("MONAI_API_KEY", "<YOUR_API_KEY>")  # we recommend using environment variables for API keys, but you can also hardcode them here

# DICOM server
dicom_web_endpoint = "<DICOMWeb address>" # Fill in the actual endpoint (it usually ends with /dicom-web), for example "http://127.0.0.1:8042/dicom-web".
dicom_client_id = "<DICOMWeb user ID>"    # If authentication is enabled, provide the username; otherwise use the default username "orthanc"
dicom_client_secret = "<DICOMWeb secret>" # If authentication is enabled, provide the password; otherwise use the default password "orthanc"

# The cloud storage type used in this notebook. Currently only `aws` and `azure` are supported.
cloud_type = "azure" # cloud storage provider: aws or azure
cloud_account = "account_name" # if cloud_type == "aws"  should be "access_key"
cloud_secret = "access_key" # if cloud_type == "aws" should be "secret_key"

# Cloud storage credentials. Needed for storing the data and results of the experiments.
access_id = "<user name for the remote storage object>"  # Please fill it with the actual Access ID
access_secret = "<secret for the remote storage object>"  # Please fill it with the actual Access Secret

# Experiment Cloud Storage. This is the storage where your jobs and experiments data will be stored.
cs_bucket = "<bucket or container name to push experiment job data to>"  # Please fill it with the actual bucket name
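
Before running the remaining cells, it can help to confirm that none of the placeholder values above were left unfilled. The helper below is a minimal sketch: `find_unfilled` is a hypothetical name, and it assumes placeholders follow the `<...>` pattern used in this notebook. The example values are made up.

```python
import re

# Matches values that still look like "<placeholder text>".
_PLACEHOLDER = re.compile(r"^<.*>$")


def find_unfilled(settings):
    """Return the names of settings whose values still look like placeholders."""
    return sorted(
        name
        for name, value in settings.items()
        if isinstance(value, str) and _PLACEHOLDER.match(value.strip())
    )


# Example with made-up values (not real endpoints or credentials).
example = {
    "dicom_web_endpoint": "http://127.0.0.1:8042/dicom-web",
    "dicom_client_id": "orthanc",
    "cs_bucket": "<bucket or container name to push experiment job data to>",
}
print(find_unfilled(example))
```

In the notebook you could pass a dictionary built from the variables defined above and stop early if the returned list is not empty.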

#### Log in to NGC and Set Up the API

In [None]:
# Exchange NGC_API_KEY for JWT
api_url = f"{host_url}/api/v1"
response = requests.post(f"{api_url}/login", json={"ngc_api_key": ngc_api_key})
response.raise_for_status()
assert "user_id" in response.json(), "user_id is not in response."
assert "token" in response.json(), "token is not in response."
user_id = response.json()["user_id"]
token = response.json()["token"]

# Construct the URL and Headers
ngc_org = "iasixjqzw1hj"  # This is the default org for MONAI users. Please select the correct org if you are not using the default one.
base_url = f"{api_url}/orgs/{ngc_org}"
headers = {"Authorization": f"Bearer {token}"}
print("API Calls will be forwarded to", base_url)

# MLFlow server
use_mlflow = False  # If you want to use MLFlow, set this to True.
mlflow_server_address = ""  # For example "http://127.0.0.1:5000".
mlflow_experiment_name = ""  # For example "my_experiment"

## Creating a New Dataset for Annotation

We'll start by creating a new dataset for annotation. The dataset, hosted on a DICOMweb server, will be accessed using the `dicomweb` protocol.

In [None]:
data = {
    "name": "mydataset",
    "description":"a demo dataset",
    "type": "semantic_segmentation",
    "format": "monai",
    "client_url": f"{dicom_web_endpoint}",
    "client_id": f"{dicom_client_id}",
    "client_secret": f"{dicom_client_secret}",
}

endpoint = f"{base_url}/datasets"
response = requests.post(endpoint, json=data, headers=headers)
assert response.status_code == 201, f"Create dataset failed, got {response.json()}."
res = response.json()
dataset_id = res["id"]
print("Dataset creation succeeded with dataset ID: ", dataset_id)
print("---------------------------------\n")
print(json.dumps(res, indent=2))

## Creating a New Experiment for Annotation

#### Find the base experiment for VISTA-3D

In [None]:
endpoint = f"{base_url}/experiments:base"
response = requests.get(endpoint, headers=headers)
assert response.status_code == 200, f"List base experiments failed, got {response.text}."
res = response.json()

# VISTA-3D
vista3d_base_exps = [p for p in res["experiments"] if p["network_arch"] == "monai_vista3d"]
assert len(vista3d_base_exps) > 0, "No base experiment found for VISTA 3D bundle"
print("List of available base experiments for VISTA 3D bundle:")
for exp in vista3d_base_exps:
    print(f"  {exp['id']}: {exp['name']} v{exp['version']}")
base_experiment = sorted(vista3d_base_exps, key=lambda x: x["version"])[-1]  # Take the latest version
version = base_experiment["version"]
base_exp_vista = base_experiment["id"]
print("-----------------------------------------------------------------------------------------")
print(f"Base experiment ID for '{base_experiment['name']}' v{base_experiment['version']}: {base_exp_vista}")
print("-----------------------------------------------------------------------------------------")

### Create workspace for experiments to upload results

In [None]:
cloud_data = {
    "name": "Azure workspace info",  # A representative name for this cloud info
    "cloud_type": cloud_type,
    "cloud_specific_details": {
        "cloud_bucket_name": cs_bucket,
        cloud_account: access_id,
        cloud_secret: access_secret,
    },
}

endpoint = f"{base_url}/workspaces"
response = requests.post(endpoint, json=cloud_data, headers=headers)

assert response.status_code == 201, f"Create workspace failed, got {response.text}."
res = response.json()  # Refresh `res` so the printout below shows the workspace, not the earlier base experiment list
workspace_id = res["id"]
print("Workspace creation succeeded with workspace ID: ", workspace_id)
print("---------------------------------\n")
print(json.dumps(res, indent=2))
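
The credential key names inside `cloud_specific_details` depend on the storage provider. The sketch below builds the workspace payload for either provider, using the key names given in the comments of the Required Parameters cell (`account_name`/`access_key` for Azure, `access_key`/`secret_key` for AWS). `build_workspace_payload` is a hypothetical helper, not part of the API.

```python
# Key names follow the comments in the Required Parameters cell.
_CREDENTIAL_KEYS = {
    "azure": ("account_name", "access_key"),
    "aws": ("access_key", "secret_key"),
}


def build_workspace_payload(name, cloud_type, bucket, access_id, access_secret):
    """Build the JSON body for the workspace creation request."""
    if cloud_type not in _CREDENTIAL_KEYS:
        raise ValueError(f"Unsupported cloud_type: {cloud_type!r}")
    id_key, secret_key = _CREDENTIAL_KEYS[cloud_type]
    return {
        "name": name,
        "cloud_type": cloud_type,
        "cloud_specific_details": {
            "cloud_bucket_name": bucket,
            id_key: access_id,
            secret_key: access_secret,
        },
    }


# Example with made-up values.
payload = build_workspace_payload("demo workspace", "aws", "my-bucket", "AKIA-EXAMPLE", "not-a-real-secret")
print(sorted(payload["cloud_specific_details"]))
```

Deriving the key names from `cloud_type` avoids keeping `cloud_account` and `cloud_secret` in sync by hand when switching providers.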

Next, we create a new experiment tailored for annotation, using the `realtime_infer` parameter so that it is ready for inference and continual learning. We also specify `labels` to indicate which labels the model should continually learn.

In [None]:
data = {
    "name": "my_vista",
    "description": "based on vista",
    "network_arch": "monai_vista3d",
    "base_experiment": [ base_exp_vista ],
    "inference_dataset": dataset_id,
    "eval_dataset": dataset_id,
    "train_datasets": [ dataset_id ],
    "realtime_infer": True, # Auto loads MONAI bundle and enables real-time inference
    "workspace": workspace_id,
    "model_params":{
        "labels": {
            "1": "liver",
            "2": "kidney",
            "3": "spleen",
            "4": "pancreas",
            "5": "right kidney"
        }
    }
}

endpoint = f"{base_url}/experiments"
response = requests.post(endpoint, json=data, headers=headers)
assert response.status_code == 201, f"Create experiment failed, got {response.json()}."
res = response.json()
experiment_id = res["id"]
print("Experiment creation succeeded with experiment ID:", experiment_id)
print("---------------------------------\n")
print(json.dumps(res, indent=2))

## Configuring Annotation and Continual Learning Parameters

Continual learning keeps our models accurate and up to date: as new data is annotated, the model can learn and adapt. To start this process, we define parameters that tell the system how and when to refine the model.

With this job set up, the model is fine-tuned on newly labeled samples after a specific number of notifications. A fine-tuned model can produce better annotation results, which makes annotation more efficient.

*Note: If you prefer to only annotate data without the continual learning process, you can simply skip this step. You can still use the annotation tools and workflows outlined in the upcoming sections independently.*

### API Call for Continual Learning Job

Below is the API call required to initiate a continual learning job for a model:

In [None]:
train_spec = {
    "epochs": 2,
    "val_interval": 1,
}


if use_mlflow:
    mlflow_spec = {
        "tracking": "mlflow",
        "tracking_uri": f"{mlflow_server_address}",
        "experiment_name": f"{mlflow_experiment_name}",
        "save_execute_config": False
    }
    train_spec.update(mlflow_spec)

data = {
    "action": "annotation",
    "specs": {
        "round_size": 1,  # round_size: number of newly annotated images (distinct image_ids notified) that triggers a fine-tuning round
        "stop_criteria": {
            "max_rounds": 2,
            "key_metric": 0.9,
        },
        "train_spec": train_spec,
    }
}

endpoint = f"{base_url}/experiments/{experiment_id}/jobs"
response = requests.post(endpoint, json=data, headers=headers)

assert response.status_code == 201, f"Run job failed, got {response.json()}."
cl_job_id = response.json()
print("Job creation succeeded with job ID: ", cl_job_id)

**Parameter Details**:
- `round_size`: Specifies how many new annotations are needed to trigger a new fine-tuning round for the model.
- `stop_criteria`: Criteria to decide when the continual learning job should cease. 
    - `max_rounds`: Determines the maximum rounds the job should run.
    - `key_metric`: (Optional) If specified, the job will keep running until the designated evaluation metric reaches the value set.
- `train_spec`: Overrides certain parameters in the model for this particular training. If you have an MLflow server set up, you can add its tracking parameters (`tracking`, `tracking_uri`, `experiment_name`) here to log metrics with MLflow.
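
To make these parameters concrete, here is a small sketch of the decisions they describe. It is an illustration only, not the service's implementation: `round_is_due` and `should_stop` are hypothetical names, and the sketch assumes that a round is triggered by distinct image IDs and that the job stops when either criterion is met.

```python
def round_is_due(notified_image_ids, round_size):
    """True once at least `round_size` distinct images have been notified."""
    return len(set(notified_image_ids)) >= round_size


def should_stop(rounds_completed, max_rounds, latest_metric=None, key_metric=None):
    """True when the round limit is hit or the optional metric target is reached."""
    if rounds_completed >= max_rounds:
        return True
    if key_metric is not None and latest_metric is not None:
        return latest_metric >= key_metric
    return False


# Notifying the same image twice does not count as two annotated images.
print(round_is_due(["img-a", "img-a"], round_size=2))
print(round_is_due(["img-a", "img-b"], round_size=2))
# With max_rounds=2 and key_metric=0.9, as in the request above.
print(should_stop(1, max_rounds=2, latest_metric=0.95, key_metric=0.9))
```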

#### Check Job Status

Ensure the continual learning job is up and running as expected:

In [None]:
def wait_for_job(endpoint, headers, timeout=1800, interval=5, target_status="Done"):
    """Helper function to wait for job to reach target status."""
    expected = ["Pending", "Running", "Done"]
    assert target_status in expected, f"Invalid target status: {target_status}"
    status_before_target = expected[:expected.index(target_status)]
    start_time = time.time()
    print(f"Waiting for job to reach state {target_status} ...")
    status = None
    while True:
        response = requests.get(endpoint, headers=headers)
        response.raise_for_status()
        status_new = response.json()["status"].title()
        if time.time() - start_time > timeout:
            print(f"\nJob timeout after {timeout} seconds with last status {status_new}.")
            break
        elif status_new not in status_before_target:
            assert status_new == target_status, f"Job failed with status: {status_new}"
            print(f"\nJob reached target status: {status_new}")
            break
        # Print the status when it changes; otherwise print a progress dot.
        if status_new != status:
            print(f"\n{status_new}", end="", flush=True)
        else:
            print(".", end="", flush=True)
        status = status_new
        time.sleep(interval)

 
endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{cl_job_id}"
wait_for_job(endpoint, headers, timeout=60, interval=1, target_status="Running")

### Using MLflow to Monitor Metrics

If you've set up MLflow and included the relevant parameters in your continual learning job, you can actively monitor the training metrics through the platform. This is invaluable for gauging the performance of your model in real-time and making timely interventions when necessary.

![MLflow Metrics](./end2end_pic/mlflow.png)

## VISTA Workflows

VISTA-3D supports three workflows for interacting with the model:

1. **Segment All Classes**: Users can analyze an entire image without specific prompts, offering a comprehensive overview.
2. **Using Class Prompts**: Users direct the model's focus towards one or more specific classes. Class-based segmentation can enable a specialized focus on a particular disease/organ.
3. **Using Point Prompts**: Users specify a sequence of background and foreground clicks to guide the model’s focus, particularly when used together with class prompts.

These workflows also integrate with the OHIF Plugin for an enhanced visual experience. We'll walk through the OHIF experience below, along with the API call that each step uses in the background.

### Using Segment Everything
By default, the VISTA-3D experiment provides 132 classes. Using the Auto Segmentation panel, you can run inference on all available classes.

**Steps**
1. Click the `Run` button under the `Auto Segmentation` panel to obtain the segmentation mask for all classes.

![Inference Auto Segmentation](./end2end_pic/inference_as.png)

The API calls that run when you click the `Run` button are shown below:

In [None]:
# get an inference image id with nextimage api
data = {
    "action": "nextimage"
}
endpoint = f"{base_url}/datasets/{dataset_id}/jobs"
response = requests.post(endpoint, json=data, headers=headers)

assert response.status_code == 201, f"Recommend image failed, got {response.json()}."
res = response.json()
inference_image_id = res["image"]
print(f"Recommended Image to annotate: {inference_image_id}")
print(json.dumps(res, indent=2))

data = {
    "action": "inference",
    "specs": {
        "image": inference_image_id,
        "bundle_params": {
            "label_prompt": list(range(1, 133))  # inference all 132 classes
        },
    }
}

endpoint = f"{base_url}/experiments/{experiment_id}/jobs"
response = requests.post(endpoint, json=data, headers=headers)
assert response.status_code == 201, f"Run inference failed, got {response.json()}."
print("Inference Successful.  Label is returned")
print(response.headers)

### Using Class Prompts
Instead of using all 132 labels, you can select a few labels that you're interested in and run inference only on those classes. If you're using a customized version of VISTA-3D as referenced in our [Dataset Creation and Experiment Selection](./Dataset%20Creation%20and%20Experiment%20Selection.ipynb) notebook, only the classes you defined for that model are listed in this section.

**Steps**
 1. Click the `Class Prompts` panel.
 2. Select the classes you want to run inference on.
 3. Click the `Run` button to get the inference result.

![Inference Class Prompts](./end2end_pic/inference_class_prompts.png)

After a few seconds, you will see the inference result.

![Inference Class Prompts Result](./end2end_pic/inference_class_prompts_res.png)

The API call that runs when you click the `Run` button is shown below:

In [None]:
bundle_params = {
    "label_prompt": [1, 2, 3, 4, 5], # Whichever classes were selected
}

data = {
    "action": "inference",
    "specs": {
        "image": inference_image_id,
        "bundle_params": bundle_params,
    }
}

endpoint = f"{base_url}/experiments/{experiment_id}/jobs"
response = requests.post(endpoint, json=data, headers=headers)
assert response.status_code == 201, f"Run inference failed, got {response.json()}."
print("Inference Successful.  Label is returned")
print(response.headers)

### Using Point Prompts
Finally, instead of using only class prompts, you can combine point prompts with class prompts. This lets you add points to the indicated classes to guide the model and refine your segmentation interactively.

**Steps**
 1. Click the `Point Prompts` panel.
 2. Select the class you want to run inference on with point prompts.
 3. Click on the image to add one or more points where you want the mask.
 4. Click the `Run` button to get the inference result.

 ![Inference Point Prompts](./end2end_pic/inference_point.png)

After a few seconds, you will see the inference result.

 ![Inference Point Prompts Result](./end2end_pic/inference_point_res.png)

If you want to clear some points, you can either clear specific class points or clear all points by clicking the `Clear Points` or `Clear All Points` button.

![Clear Points](./end2end_pic/clearpoints.png)

The API call that runs when you click the `Run` button is shown below:

In [None]:
bundle_params = {
    "points": [[20,20,20], [20, 40, 60]],
    "point_labels": [2, 2],
    "label_prompt": [3],
}

data = {
    "action": "inference",
    "specs": {
        "image": inference_image_id,
        "bundle_params": bundle_params,
    }
}

endpoint = f"{base_url}/experiments/{experiment_id}/jobs"
response = requests.post(endpoint, json=data, headers=headers)
assert response.status_code == 201, f"Run inference failed, got {response.json()}."
print("Inference Successful.  Label is returned")
print(response.headers)
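
The three inference requests above differ only in their `bundle_params`. The sketch below gathers that logic into one place. `build_bundle_params` and `build_inference_request` are hypothetical helpers; the field names (`label_prompt`, `points`, `point_labels`) and the 132 default classes come from the cells above, while the length check on points is an assumption added for safety.

```python
ALL_CLASSES = list(range(1, 133))  # the 132 default VISTA-3D classes


def build_bundle_params(label_prompt=None, points=None, point_labels=None):
    """Build bundle_params for all-class, class-prompt, or point-prompt inference."""
    classes = ALL_CLASSES if label_prompt is None else label_prompt
    params = {"label_prompt": list(classes)}
    if points is not None:
        if point_labels is None or len(points) != len(point_labels):
            raise ValueError("Each point needs exactly one point label.")
        params["points"] = [list(p) for p in points]
        params["point_labels"] = list(point_labels)
    return params


def build_inference_request(image_id, bundle_params):
    """Wrap bundle_params in the JSON body posted to the experiment jobs endpoint."""
    return {"action": "inference", "specs": {"image": image_id, "bundle_params": bundle_params}}


# Segment all classes, class prompts, and point prompts, in that order.
print(len(build_bundle_params()["label_prompt"]))
print(build_bundle_params(label_prompt=[1, 2, 3, 4, 5]))
print(build_bundle_params(label_prompt=[3], points=[[20, 20, 20], [20, 40, 60]], point_labels=[2, 2]))
```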

## Annotation Workflow

Annotating medical images efficiently and precisely is a multi-step process. Here's a breakdown of the typical workflow you'd employ when using NVIDIA MONAI Cloud APIs and OHIF. We'll cover any relevant APIs not already covered as we walk through the workflow.

`Load Image` --> `Run Inference` --> `Annotate/Fix Annotation` --> `Save/Notify` --> `Repeat`

### 1. **Load Image**

Begin by loading the medical image that you wish to annotate. If you're using OHIF, you'll see the study list and can select a patient to annotate. Make sure to use the `MONAI Service` to load the NVIDIA MONAI Cloud API plugin.

![Select an image](end2end_pic/selectanimage.png)

If you're using the API directly, you can use the `nextimage` endpoint.

### 2. **Run Inferencing Using Selected Method**

Choose one of the inferencing methods discussed above:

1. **Segment All Classes**
2. **Using Class Prompts**
3. **Using Point Prompts**

Once you've picked your preferred method, run the inference to get an initial annotation.

![allclass](./end2end_pic/allclassohif.png)

### 3. **Annotate / Refine Annotations**

With the initial mask in place, you might notice areas that require manual tweaking. Use the provided annotation tools to:

- Refine boundaries
- Add or remove regions

This step ensures that your annotations are as accurate as possible.

**Steps**
1. Click the Segmentation button.
2. Select a class of segmentation that needs to be updated.
3. Select a segmentation tool.
4. Update the segmentation with this tool.

![Annotate](./end2end_pic/annotate.png)

### 4. **Save and Notify the Server**

Once you're satisfied with your annotations, save the annotated image so that your work is captured. This writes the annotation back to your datastore using the DICOMweb protocol.

![Save Label](./end2end_pic/savelabel.png)

Next, notify the server that an image has been annotated. This step is crucial for continual learning: the system records the new annotation and, once the configured number of annotated images (`round_size`) has been reached, uses them to fine-tune the model.

![Notify](end2end_pic/notify.png)

The API call that runs when you click the `Notify Server` button is shown below:

```python
# After uploading a DICOM Seg into DICOM Web
endpoint = f"{base_url}/datasets/{dataset_id}/jobs"
label_id = "<series_id_1>"
data = {
    "action": "notify",
    "specs": {
        "added": {
            "image": inference_image_id,
            "label": label_id,
        },
        "updated": [],
        "removed": [],
    }
}

response = requests.post(endpoint, json=data, headers=headers)
if response.status_code == 201:
    print("Notified.")
else:
    print(response.json())
    print(response)
```

### 5. **Repeat**

Continue the process for all the images in your dataset. With each iteration, not only do you expand your annotated dataset, but you also contribute to the model's learning, making future annotations even more accurate.
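
If you drive the workflow through the API rather than OHIF, one pass of the loop can be sketched as a single function. This is a minimal sketch under stated assumptions: `annotate_one_image`, `post`, and `save_label` are hypothetical names, the request bodies mirror the `nextimage`, `inference`, and `notify` calls shown earlier, and status-code checks are omitted for brevity.

```python
def annotate_one_image(post, base_url, dataset_id, experiment_id, save_label):
    """Run one Load -> Infer -> Annotate -> Save/Notify pass for a single image.

    post(url, payload) sends a POST request and returns the response object.
    save_label(image_id) stores the corrected label and returns its ID.
    """
    dataset_jobs = f"{base_url}/datasets/{dataset_id}/jobs"
    experiment_jobs = f"{base_url}/experiments/{experiment_id}/jobs"

    # 1. Load: ask the service which image to annotate next.
    image_id = post(dataset_jobs, {"action": "nextimage"}).json()["image"]

    # 2. Infer: request an initial mask for all 132 default classes.
    post(experiment_jobs, {
        "action": "inference",
        "specs": {"image": image_id, "bundle_params": {"label_prompt": list(range(1, 133))}},
    })

    # 3. Annotate and save: done outside this function (for example in OHIF).
    label_id = save_label(image_id)

    # 4. Notify: tell the service that a new label exists for this image.
    post(dataset_jobs, {
        "action": "notify",
        "specs": {"added": {"image": image_id, "label": label_id}, "updated": [], "removed": []},
    })
    return image_id, label_id
```

With `requests`, `post` could be `lambda url, payload: requests.post(url, json=payload, headers=headers)`. Passing it in as an argument keeps the function easy to exercise with a stub before pointing it at the real service.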

You can check the log of the continual learning job as follows:

In [None]:
endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{cl_job_id}"
response = requests.get(endpoint, headers=headers)
assert response.status_code == 200, f"Failed to get job status, got {response.json()}."
status = response.json()["status"].title()
if status in ["Running", "Done", "Error"]:
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{cl_job_id}/logs"
    response = requests.get(endpoint, headers=headers)
    assert response.status_code == 200, f"Failed to get job logs, got {response.text}."
    print(response.text)
else:
    print(f"Job status: {status}, logs are not available.")

## Stopping a Continual Learning Job

As your model refines itself over time using continual learning, there might come a point where you need to halt the ongoing CL job. Whether you're satisfied with the model's performance or have other reasons, here's how you can stop the CL job:

In [None]:
# Manually stop the CL job. No need to execute this cell if the job has reached the stop criteria.
endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{cl_job_id}"
response = requests.get(endpoint, headers=headers)
if response.json()["status"] != "Done":
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{cl_job_id}:cancel"
    response = requests.post(endpoint, headers=headers)
    assert response.status_code == 200, f"cancel job failed, got {response.json()}."

## Stopping the Experiment from Realtime Inference mode

When the experiment is created with `realtime_infer` as `True`, it will reserve one GPU to process the inference requests.

After we have finished the inference process, we would like to release the GPU resource for other tasks.

To achieve this, we can switch the `realtime_infer` from `True` to `False`.

Note: this step is irreversible, which means you can't set the `realtime_infer` from `False` to `True`. To bootstrap another inference, you will have to create another experiment.

In [None]:
data = {
    "realtime_infer": False,
}

endpoint = f"{base_url}/experiments/{experiment_id}"
response = requests.patch(endpoint, json=data, headers=headers)
assert response.status_code == 200, f"stop job failed, got {response.json()}."

## Check the Training Job Results

After the model has been trained, you can inspect the status and key metric of each job with the following API calls:

In [None]:
# List all jobs and pick one job that meets your requirement.
endpoint = f"{base_url}/experiments/{experiment_id}/jobs"
response = requests.get(endpoint, headers=headers)

assert response.status_code == 200, f"List all jobs failed, got {response.json()}."
job_metas = response.json()["jobs"]
for job_meta in job_metas:
    if job_meta["id"] == cl_job_id:
        print("Continual Learning Job status: ", job_meta["status"])
    else:
        train_job_id = job_meta["id"]
        print(f"Training Job {train_job_id} status: ", job_meta["status"])
        if job_meta["status"] == "Done":
            print(f"Training Job {train_job_id} completed with key metric: ", job_meta["result"]["key_metric"])
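
When several fine-tuning rounds have finished, you may want the job with the best result. The sketch below picks it from a job list shaped like `job_metas` above. `best_training_job` is a hypothetical helper, and it assumes that a higher `key_metric` is better and that finished jobs expose it under `result`, as in the previous cell. The sample data is made up.

```python
def best_training_job(job_metas, cl_job_id):
    """Return the finished training job with the highest key metric, or None."""
    finished = [
        job
        for job in job_metas
        if job["id"] != cl_job_id  # skip the continual learning job itself
        and job["status"] == "Done"
        and "key_metric" in (job.get("result") or {})
    ]
    return max(finished, key=lambda job: job["result"]["key_metric"], default=None)


# Made-up job list for illustration.
sample_jobs = [
    {"id": "cl-job", "status": "Running"},
    {"id": "train-1", "status": "Done", "result": {"key_metric": 0.81}},
    {"id": "train-2", "status": "Done", "result": {"key_metric": 0.88}},
    {"id": "train-3", "status": "Running", "result": None},
]
best = best_training_job(sample_jobs, "cl-job")
print(best["id"], best["result"]["key_metric"])
```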

In [None]:
# Cancel a training job that has not finished. Pick a job ID from the previous cell's output; here we use train_job_id.
if "train_job_id" in locals():
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{train_job_id}"
    response = requests.get(endpoint, headers=headers)
    if response.json()["status"] != "Done":
        endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{train_job_id}:cancel"
        response = requests.post(endpoint, headers=headers)
        assert response.status_code == 200, f"cancel train job failed, got {response.json()}."

### Detailed Logging Through Download API

For a more comprehensive view of your jobs, the platform offers a Download API. This API lets you access in-depth logs and gain insight into the specifics of a job's execution. It is particularly useful if a job encounters an error or if you need to understand its performance and behavior in greater detail.

In [None]:
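# Set download_job_id to the ID of the job whose log you want (for example, train_job_id) before running this cell.
if "download_job_id" in locals():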
    endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{download_job_id}"
    response = requests.get(endpoint, headers=headers)
    if response.json()["status"] in ["Running", "Done", "Error"]:
        # Download the job log
        endpoint = f"{base_url}/experiments/{experiment_id}/jobs/{download_job_id}/logs"
        response = requests.get(endpoint, headers=headers)
        assert response.status_code == 200, f"Failed to download job log, got {response.json()}."
        print(response.text)

### Check the job results (checkpoint, scripts, logs, etc.)

You'll find the results in the cloud storage bucket you specified when creating the workspace. The results include the model checkpoints, scripts, logs, and other relevant data.

The path to the results will be in the following format:

```python
f"{bucket_name}/results/{job_id}"
```

## Cleaning Up

After completing your jobs, it's good practice to clean up any experiments, datasets, and workspaces that are no longer needed. This keeps your account organized and frees resources.

In [None]:
endpoint = f"{base_url}/experiments/{experiment_id}"
response = requests.delete(endpoint, headers=headers)
assert response.status_code == 200, f"Delete experiment failed, got {response.json()}."
print(response)

endpoint = f"{base_url}/datasets/{dataset_id}"
response = requests.delete(endpoint, headers=headers)
assert response.status_code == 200, f"Delete dataset failed, got {response.json()}."
print(response)

endpoint = f"{base_url}/workspaces/{workspace_id}"
response = requests.delete(endpoint, headers=headers)
assert response.status_code == 200, f"Delete workspace failed, got {response.text}."
print(response)
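
The cell above stops at the first failed assertion, which can leave later resources undeleted. As an alternative, the sketch below attempts every deletion and reports what failed. `clean_up` and `delete` are hypothetical names; the URLs and the expected status code 200 are taken from the cell above.

```python
def clean_up(delete, base_url, experiment_id, dataset_id, workspace_id):
    """Delete the experiment, dataset, and workspace; return any failures.

    delete(url) sends a DELETE request and returns the HTTP status code.
    """
    targets = [
        ("experiment", f"{base_url}/experiments/{experiment_id}"),
        ("dataset", f"{base_url}/datasets/{dataset_id}"),
        ("workspace", f"{base_url}/workspaces/{workspace_id}"),
    ]
    failures = {}
    for kind, url in targets:
        try:
            status = delete(url)
        except Exception as exc:  # keep going so later resources still get deleted
            failures[kind] = str(exc)
            continue
        if status != 200:
            failures[kind] = f"HTTP {status}"
    return failures
```

With `requests`, `delete` could be `lambda url: requests.delete(url, headers=headers).status_code`. An empty dictionary means everything was removed.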

## Conclusion

Remember, NVIDIA MONAI Cloud APIs are designed to make the process intuitive and efficient, allowing you to concentrate on the quality of your annotations while the technical details are managed in the background. Take full advantage of continual learning and annotation with NVIDIA MONAI Cloud APIs to achieve excellence in medical imaging.