# Fetching the Brain Tumor Segmentation Dataset

In this notebook, we will learn:
- how we can use [MONAI Core APIs](https://github.com/Project-MONAI/MONAI) to download the brain tumor segmentation data from the [Medical Segmentation Decathlon](http://medicaldecathlon.com) challenge.
- how we can upload the dataset to Weights & Biases and use it as a dataset artifact.

## 🌴 Setup and Installation

First, let us install the latest versions of MONAI and Weights & Biases.

In [1]:
!pip install -q -U "monai[nibabel, tqdm]"
!pip install -q -U wandb


In [2]:
import os
import wandb
from monai.apps import DecathlonDataset
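
Because the install cell pulls the latest releases, it can help to record which versions were actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of MONAI or W&B.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment
            found[name] = None
    return found


if __name__ == "__main__":
    # Packages requested by the install cell above
    for name, ver in installed_versions(["monai", "wandb", "nibabel", "tqdm"]).items():
        print(f"{name}: {ver or 'not installed'}")
```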



## 🌳 Initialize a W&B Run

We will start a new W&B run to track our experiment.

In [3]:
wandb.init(
    project="brain-tumor-segmentation",
    entity="lifesciences",
    job_type="fetch_dataset"
)

[34m[1mwandb[0m: Currently logged in as: [33mgeekyrakshit[0m ([33mlifesciences[0m). Use [1m`wandb login --relogin`[0m to force relogin


## 🍁 Fetching the Dataset using MONAI

The [`monai.apps.DecathlonDataset`](https://docs.monai.io/en/stable/apps.html#monai.apps.DecathlonDataset) class lets us automatically download data from the [Medical Segmentation Decathlon challenge](http://medicaldecathlon.com/) and generate items for training, validation, or testing. We will use this API in later notebooks to load and transform our datasets automatically.

In [4]:
# Make the dataset directory
os.makedirs("./dataset/", exist_ok=True)

# Fetch the training split of the brain tumor segmentation dataset
train_dataset = DecathlonDataset(
    root_dir="./dataset/",
    task="Task01_BrainTumour",
    section="training",
    download=True,
    cache_rate=0.0,
    num_workers=4,
)

# Load the validation split (the archive was already downloaded above, so download=False)
val_dataset = DecathlonDataset(
    root_dir="./dataset/",
    task="Task01_BrainTumour",
    section="validation",
    download=False,
    cache_rate=0.0,
    num_workers=4,
)

# Load the test split (the archive was already downloaded above, so download=False)
test_dataset = DecathlonDataset(
    root_dir="./dataset/",
    task="Task01_BrainTumour",
    section="test",
    download=False,
    cache_rate=0.0,
    num_workers=4,
)

Task01_BrainTumour.tar: 7.09GB [05:55, 21.4MB/s]

2024-04-18 22:17:59,197 - INFO - Downloaded: dataset/Task01_BrainTumour.tar
2024-04-18 22:18:10,984 - INFO - Verified 'Task01_BrainTumour.tar', md5: 240a19d752f0d9e9101544901065d872.
2024-04-18 22:18:10,985 - INFO - Writing into directory: dataset.
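
Once the archive is extracted, the task metadata can be inspected without loading any image volumes. The sketch below assumes the extracted folder contains a Decathlon-style `dataset.json` at `./dataset/Task01_BrainTumour/dataset.json` with `name`, `modality`, `labels`, `training`, and `test` keys; both the path and the helper `summarize_decathlon_metadata` are assumptions for illustration, so adjust them if your layout differs.

```python
import json
from pathlib import Path


def summarize_decathlon_metadata(json_path):
    """Return a few headline fields from a Decathlon-style dataset.json file."""
    with open(json_path, "r", encoding="utf-8") as f:
        meta = json.load(f)
    return {
        "name": meta.get("name"),
        "modality": meta.get("modality"),
        "labels": meta.get("labels"),
        # Count the listed entries; missing keys are treated as empty lists
        "num_training": len(meta.get("training", [])),
        "num_test": len(meta.get("test", [])),
    }


# Assumed location after extraction; the summary is skipped if the file is absent
metadata_path = Path("./dataset/Task01_BrainTumour/dataset.json")
if metadata_path.exists():
    print(summarize_decathlon_metadata(metadata_path))
```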


In [5]:
print("Train Set Size:", len(train_dataset))
print("Validation Set Size:", len(val_dataset))
print("Test Set Size:", len(test_dataset))

Train Set Size: 388
Validation Set Size: 96
Test Set Size: 266
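
The training and validation sizes above are consistent with an 80/20 split of the 484 labelled volumes. The sketch below reproduces that arithmetic, assuming a validation fraction of 0.2 (which, to the best of our knowledge, is the default `val_frac` of `DecathlonDataset`) and a validation count that is truncated to an integer. The helper `split_sizes` is hypothetical and only illustrates the calculation; it does not reproduce which cases land in each split.

```python
def split_sizes(num_labeled, val_frac=0.2):
    """Return (train_size, val_size) for a split that truncates the validation share."""
    val_size = int(num_labeled * val_frac)
    return num_labeled - val_size, val_size


num_labeled = 388 + 96  # training + validation sizes printed above
print(split_sizes(num_labeled))  # (388, 96)
```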


## 💿 Upload the Dataset to W&B as an Artifact

[W&B Artifacts](https://docs.wandb.ai/guides/artifacts) can be used to track and version any serialized data as the inputs and outputs of your W&B Runs. For example, a model training run might take a dataset as its input and produce a trained model as its output.

Let us now see how we can upload this dataset as a W&B artifact.

In [6]:
artifact = wandb.Artifact(name="decathlon_brain_tumor", type="dataset")
artifact.add_dir(local_path="./dataset/")
wandb.log_artifact(artifact)

[34m[1mwandb[0m: Adding directory to artifact (./dataset)... Done. 24.3s


<Artifact decathlon_brain_tumor>

Now we end the experiment by calling `wandb.finish()`.

In [7]:
wandb.finish()

14510.691 MB of 14510.691 MB uploaded
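
Once the run has finished, a later run can declare the uploaded dataset as an input and download it. The following is a minimal sketch, assuming the artifact is addressed as `entity/project/name:alias` and that the `latest` alias points at the version logged above. The helpers `artifact_reference` and `download_dataset_artifact` and the `job_type` value are hypothetical names chosen for illustration.

```python
def artifact_reference(entity, project, name, alias="latest"):
    """Build a fully qualified W&B artifact reference string."""
    return f"{entity}/{project}/{name}:{alias}"


def download_dataset_artifact(reference, project, entity, job_type="use_dataset"):
    """Start a run, declare the artifact as an input, and download it locally."""
    import wandb  # imported lazily so artifact_reference works without wandb installed

    with wandb.init(project=project, entity=entity, job_type=job_type) as run:
        artifact = run.use_artifact(reference, type="dataset")
        # download() returns the local directory holding the artifact files
        return artifact.download()


reference = artifact_reference(
    "lifesciences", "brain-tumor-segmentation", "decathlon_brain_tumor"
)
print(reference)

# To actually download (requires W&B credentials and network access):
# local_dir = download_dataset_artifact(
#     reference, project="brain-tumor-segmentation", entity="lifesciences"
# )
```

Calling `use_artifact` inside a run records the dataset as an input of that run, so the lineage between this upload and later training runs is visible in W&B.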