# Dataset Creation and Experiment Selection using NVIDIA MONAI Cloud APIs

In this guide, we'll walk you through the essential steps for creating a dataset and selecting a suitable base experiment for your medical imaging projects using NVIDIA MONAI Cloud APIs. These foundational steps are crucial for the success of any medical imaging project.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NVIDIA/monai-cloud-api/blob/main/notebooks/Dataset%20Creation%20and%20Experiment%20Selection.ipynb)

## Table of Contents

- Introduction
- Setup
- Dataset Creation
- Experiment Selection
- Deleting Datasets and Experiments
- Conclusion

## Introduction

Creating a coherent dataset and selecting the correct base experiment are cornerstones of any medical imaging project. NVIDIA MONAI Cloud APIs streamline this process, allowing you to focus on what's essential. This guide provides step-by-step instructions to facilitate these foundational steps.

### What You Can Expect to Learn

This notebook introduces how to create and manage datasets and how to select experiments before running actual jobs. By following this guide, you will gain an overall understanding of the role that `datasets` and `experiments` play in NVIDIA MONAI Cloud APIs, and of how to use them to organize data and experiments for a new project.

If you have not yet generated your API key, or if you are unsure about the process, please follow our step-by-step guide, [Generating and Managing Your Credentials](./Generating%20and%20Managing%20Your%20Credentials.ipynb).

## Setup

In [None]:
!python -c "import requests" || pip install -q "requests"

import json
import os

import requests

### Parameters

The following cell contains all parameters that need to be replaced when executing.

In [None]:
# API Endpoint and Credentials
host_url = "https://api.monai.ngc.nvidia.com"
ngc_api_key = os.environ.get("MONAI_API_KEY", "<YOUR_API_KEY>")  # we recommend using environment variables for API keys, but you can also hardcode them here

# dicomweb parameters (will be introduced in Section: Dataset Creation)
dicom_web_endpoint = "<DICOMWeb address>" # Please fill it with the actual endpoint (it usually ends with /dicom-web). For example "http://127.0.0.1:8042/dicom-web".
dicom_client_id = "<DICOMWeb user ID>"    # If Authentication is enabled, then provide username, otherwise fill it with the default username "orthanc"
dicom_client_secret = "<DICOMWeb secret>" # If Authentication is enabled, then provide password, otherwise fill it with the default password "orthanc"

# The cloud storage type used in this notebook. Currently only `aws` and `azure` are supported.
cloud_type = "azure" # cloud storage provider: aws or azure
cloud_account = "account_name" # key name for the account credential; use "access_key" if cloud_type == "aws"
cloud_secret = "access_key" # key name for the secret credential; use "secret_key" if cloud_type == "aws"

# Cloud storage credentials. Needed for storing the data and results of the experiments.
access_id = "<user name for the remote storage object>"  # Please fill it with the actual Access ID
access_secret = "<secret for the remote storage object>"  # Please fill it with the actual Access Secret

# Experiment Cloud Storage. This is the storage where your jobs and experiments data will be stored.
cs_bucket = "<bucket or container name to push experiment job data to>"  # Please fill it with the actual bucket name
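
The key names held in `cloud_account` and `cloud_secret` depend on the provider, as the comments above describe. A minimal sketch of deriving them from `cloud_type` instead of editing them by hand is shown below; `CLOUD_KEY_NAMES` and `cloud_key_names` are hypothetical helpers introduced here for illustration and are not part of the MONAI Cloud APIs.

```python
# Hypothetical helper (not part of the MONAI Cloud APIs): maps a provider to the
# credential key names described in the comments of the parameters cell.
CLOUD_KEY_NAMES = {
    "aws": ("access_key", "secret_key"),
    "azure": ("account_name", "access_key"),
}


def cloud_key_names(cloud_type):
    """Return (account_key_name, secret_key_name) for the given provider."""
    try:
        return CLOUD_KEY_NAMES[cloud_type.lower()]
    except KeyError:
        raise ValueError(
            f"Unsupported cloud_type: {cloud_type!r}; expected 'aws' or 'azure'."
        ) from None


# Same values as the defaults set above for Azure.
cloud_account, cloud_secret = cloud_key_names("azure")
print(cloud_account, cloud_secret)
```

Raising on an unknown provider surfaces a typo in `cloud_type` before any request is sent.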

In [None]:
# Exchange NGC_API_KEY for JWT
api_url = f"{host_url}/api/v1"
response = requests.post(f"{api_url}/login", data=json.dumps({"ngc_api_key": ngc_api_key}))
assert response.status_code == 201, f"Login failed, got status code: {response.status_code}."
assert "user_id" in response.json(), "user_id is not in response."
assert "token" in response.json(), "token is not in response."

uid = response.json()["user_id"]
token = response.json()["token"]

# Construct the URL and Headers
ngc_org = "iasixjqzw1hj"  # This is the default org for MONAI users. Please select the correct org if you are not using the default one.
base_url = f"{api_url}/orgs/{ngc_org}"
headers = {"Authorization": f"Bearer {token}"}

## Dataset Creation

### Using a DICOMWeb Endpoint to Create Datasets

Below you'll find an example request along with associated parameters and description.

In [None]:
data = {
    "name": "mydataset",
    "description":"a demo dataset",
    "type": "semantic_segmentation",
    "format": "monai",
    "client_url": dicom_web_endpoint,
    "client_id": dicom_client_id,
    "client_secret": dicom_client_secret,
}

endpoint = f"{base_url}/datasets"
response = requests.post(endpoint, json=data, headers=headers)
assert response.status_code == 201, f"Create dataset failed, got {response.text}."
res = response.json()
dataset_id = res["id"]
print("Dataset creation succeeded with dataset ID: ", dataset_id)
print("---------------------------------\n")
print(json.dumps(res, indent=2))
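
A common cause of a failed dataset request is a parameter that still holds its `<...>` placeholder text. The sketch below shows one way to check a payload before posting it; `unfilled_fields` is a hypothetical helper, and it assumes placeholders follow the angle-bracket convention used in the Parameters cell.

```python
import re

# Matches values that consist solely of an angle-bracket placeholder,
# such as "<DICOMWeb address>" (the convention used in the Parameters cell).
PLACEHOLDER = re.compile(r"^<[^<>]+>$")


def unfilled_fields(payload):
    """Return the keys whose string values still look like '<placeholder>'."""
    return sorted(
        key
        for key, value in payload.items()
        if isinstance(value, str) and PLACEHOLDER.match(value)
    )


example = {
    "name": "mydataset",
    "client_url": "<DICOMWeb address>",  # not yet replaced
    "client_id": "orthanc",
}
print(unfilled_fields(example))  # ['client_url']
```

In the notebook this could be called as `unfilled_fields(data)` just before `requests.post`, with an assertion that the result is empty.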

## Experiment Selection

### Available Base Experiments

NVIDIA MONAI Cloud APIs offer a variety of base experiments (pre-trained models and algorithm templates), each suited to a different task. These include **DeepEdit**, **VISTA-3D**, and **Auto3DSeg**.

**Recommendation:** Start with VISTA-3D. Its versatile design allows you to branch out and customize as your requirements evolve.

### List Available Base Experiments

Some API calls require a Base Experiment ID. You can list all available base experiments, along with their IDs, by calling the experiments API endpoint.

In [None]:
endpoint = f"{base_url}/experiments:base"
response = requests.get(endpoint, headers=headers)
assert response.status_code == 200, f"List base experiments failed, got {response.text}."
res = response.json()

# VISTA-3D
vista3d_base_exps = [p for p in res["experiments"] if p["network_arch"] == "monai_vista3d"]
assert len(vista3d_base_exps) > 0, "No base experiment found for VISTA 3D bundle"
print("List of available base experiments for VISTA 3D bundle:")
for exp in vista3d_base_exps:
    print(f"  {exp['id']}: {exp['name']} v{exp['version']}")
base_experiment = sorted(vista3d_base_exps, key=lambda x: x["version"])[-1]  # Take the latest version
version = base_experiment["version"]
base_exp_vista = base_experiment["id"]
print("-----------------------------------------------------------------------------------------")
print(f"Base experiment ID for '{base_experiment['name']}' v{base_experiment['version']}: {base_exp_vista}")
print("-----------------------------------------------------------------------------------------")


# DeepEdit (MONAI Annotation)
deepedit_base_exps = [p for p in res["experiments"] if p["network_arch"] == "monai_annotation" and not p["base_experiment"]]
assert len(deepedit_base_exps) > 0, "No base experiment found for MONAI Annotation (DeepEdit) bundle"
print("List of available base experiments for MONAI Annotation (DeepEdit) bundle:")
for exp in deepedit_base_exps:
    print(f"  {exp['id']}: {exp['name']} v{exp['version']}")
base_experiment = sorted(deepedit_base_exps, key=lambda x: x["version"])[-1]  # Take the latest version
version = base_experiment["version"]
base_exp_annotation = base_experiment["id"]
print("-----------------------------------------------------------------------------------------")
print(f"Base experiment ID for '{base_experiment['name']}' v{base_experiment['version']}: {base_exp_annotation}")
print("-----------------------------------------------------------------------------------------")
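
The cells above pick the latest version with a plain `sorted` on the `version` field. If that field is a dotted string, a string sort places `"0.10.0"` before `"0.9.0"`. The sketch below compares versions numerically instead. It assumes versions consist only of dot-separated integers, which this guide does not confirm; `version_key` and `latest` are hypothetical helpers.

```python
def version_key(version):
    """Turn a dotted version such as '0.10.2' into a tuple of ints.

    Assumes every dot-separated part is an integer; int() raises
    ValueError for anything else (for example '1.0rc1').
    """
    return tuple(int(part) for part in str(version).split("."))


def latest(experiments):
    """Return the experiment dict with the numerically highest version."""
    return max(experiments, key=lambda exp: version_key(exp["version"]))


sample = [
    {"id": "a", "version": "0.9.0"},
    {"id": "b", "version": "0.10.0"},
]
print(latest(sample)["id"])  # b
```

With the notebook's variables this would read `base_experiment = latest(vista3d_base_exps)`.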

### Create Experiment

1. **MONAI Bundle**: We're using the VISTA-3D bundle as an example. Choose the one fitting your application.
2. **Dataset Setup**: All data is under one dataset ID for this demo. Adjust as per your data structure.
3. **Pretrained Weights**: Opt for a pretrained model to enhance performance.
4. **Real-time Inference**: For real-time inference during annotation jobs or auto segmentation, set `realtime_infer` to **True** and provide an `inference_dataset`; otherwise, set it to **False**. In this example, we set it to **False** because we are not starting an annotation job.

In [None]:
experiment_cloud_details = {
    "cloud_type": cloud_type,
    "cloud_file_type": "folder",  # Use "file" if the data is a tar.gz archive, otherwise "folder"
    "cloud_specific_details": {
        "cloud_bucket_name": cs_bucket,  # Bucket link to save files
        cloud_account: access_id,  # Access and Secret for Azure
        cloud_secret: access_secret,  # Access and Secret for Azure
    }
}

data = {
    "name": "monai_vista",
    "description": "Based on vista",
    "network_arch": "monai_vista3d",
    "type": "medical",
    "base_experiment": [ base_exp_vista ],
    "inference_dataset": dataset_id,
    "eval_dataset": dataset_id,
    "train_datasets": [ dataset_id ],
    "cloud_details": experiment_cloud_details,
    "realtime_infer": False,
}

endpoint = f"{base_url}/experiments"
response = requests.post(endpoint, json=data, headers=headers)
assert response.status_code == 201, f"Create experiment failed, got {response.text}."
res = response.json()
experiment_id = res["id"]
print("Experiment creation succeeded with experiment ID:", experiment_id)
print("---------------------------------\n")
print(json.dumps(res, indent=2))
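
Step 4 above ties `realtime_infer` to `inference_dataset`. A small local check before posting can catch a payload that enables one without the other. This is only a sketch: `check_realtime_settings` is a hypothetical helper, and the service may apply its own validation that differs from it.

```python
def check_realtime_settings(payload):
    """Raise ValueError if realtime_infer is enabled without an inference_dataset.

    Returns the payload unchanged so the call can be chained.
    """
    if payload.get("realtime_infer") and not payload.get("inference_dataset"):
        raise ValueError("realtime_infer=True requires an inference_dataset.")
    return payload


# Placeholder dataset ID, for illustration only.
ok = check_realtime_settings(
    {"name": "monai_vista", "realtime_infer": True, "inference_dataset": "dataset-123"}
)
print(ok["name"])  # monai_vista
```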

#### **Customize VISTA-3D Experiment**

The VISTA-3D model provides a comprehensive set of 132 classes. However, there might be scenarios where you need a subset of these classes or want to introduce new ones. Customizing is made easy with the MONAI Cloud APIs:

1. **Selecting a Subset of Classes**

If you're interested in specific classes such as liver, kidney, and spleen, you can choose them without using the entire set by modifying the request payload to add a `model_params` key, along with the `labels` you want included from the base 132 classes.

In [None]:
data = {
    "name": "my_vista_3_organ",
    "description": "based on vista",
    "network_arch": "monai_vista3d",
    "base_experiment": [ base_exp_vista ],
    "inference_dataset": dataset_id,
    "eval_dataset": dataset_id,
    "train_datasets": [ dataset_id ],
    "realtime_infer": True,
    "model_params":{
       "labels":{
           "1": "liver",
           "2": "kidney",
           "3": "spleen"
        }
    }
}

2. **Adding Custom Classes**

If you have specific classes not present in the base VISTA-3D model, you can easily add them. This customization allows developers to tailor the experiment to their specific needs, ensuring that only relevant classes are present, while also offering the flexibility to introduce new classes as needed.

In [None]:
data = {
    "model_params":{
        "labels":{
            "1": "liver",
            "2": "kidney",
            "133": "myorgan" # add customized class
        }
    }
}
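
The two customizations can be combined into one `model_params` block. The sketch below merges a subset of base classes with custom ones. It assumes that custom classes must use an index above 132, as the `"133"` example suggests; this guide does not state that as a rule, and `build_labels` is a hypothetical helper.

```python
BASE_CLASS_COUNT = 132  # number of VISTA-3D classes stated in this guide


def build_labels(base_subset, custom=None):
    """Merge base-class and custom labels into a model_params dict.

    Assumption: custom classes use indices above BASE_CLASS_COUNT so they
    cannot collide with the base classes.
    """
    labels = {str(index): name for index, name in base_subset.items()}
    for index, name in (custom or {}).items():
        if int(index) <= BASE_CLASS_COUNT:
            raise ValueError(
                f"Custom class index {index} overlaps the {BASE_CLASS_COUNT} base classes."
            )
        labels[str(index)] = name
    return {"model_params": {"labels": labels}}


params = build_labels({"1": "liver", "2": "kidney"}, {"133": "myorgan"})
print(params["model_params"]["labels"]["133"])  # myorgan
```

The result can be merged into an experiment payload with `data.update(params)`.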

## Deleting Datasets and Experiments

If you have created test datasets or experiments that are no longer needed, you can easily remove them using the MONAI Cloud APIs. Let's walk through the cleanup process.

### Deleting an Experiment

To delete an experiment, send a `DELETE` request to that experiment's endpoint. The cell below uses the `experiment_id` variable set when the experiment was created; assign a different ID to it if you want to delete another experiment.

In [None]:
endpoint = f"{base_url}/experiments/{experiment_id}"
response = requests.delete(endpoint, headers=headers)
assert response.status_code == 200, f"Delete experiment failed, got {response.text}."
print(response.json())
print(response)

### Deleting a Dataset

To delete a dataset, send a `DELETE` request to that dataset's endpoint. The cell below uses the `dataset_id` variable set when the dataset was created; assign a different ID to it if you want to remove another dataset.

In [None]:
endpoint = f"{base_url}/datasets/{dataset_id}"
response = requests.delete(endpoint, headers=headers)
assert response.status_code == 200, f"Delete dataset failed, got {response.text}."
print(response.json())
print(response)

Deleting datasets and experiments you no longer need keeps your workspace uncluttered and makes the remaining resources easier to manage.

## Conclusion

Bravo! You have successfully created a dataset and selected an experiment, setting the stage to harness the full capabilities of the NVIDIA MONAI Cloud APIs. Always keep your workspace organized, and you'll find that managing complex projects becomes significantly more straightforward. The subsequent notebooks will cover executing annotations and continual learning tasks, or utilizing platforms like the OHIF Viewer.