# Auto3Dseg Data Analyzer

Data analysis is one of the MONAI Auto3Dseg modules. It produces a comprehensive analysis report through the `DataAnalyzer` class. This notebook shows how to use `DataAnalyzer` on a simulated dataset and on a real-world dataset.

## 1 Set up environment, imports and datasets
### 1.1 Set up Environment


In [1]:
!python -c "import monai" || pip install -q "monai-weekly[nibabel]"

### 1.2 Set up imports

In [2]:
import os
import nibabel as nib
import numpy as np
import tempfile

from monai.apps import download_and_extract
from monai.apps.auto3dseg import DataAnalyzer
from monai.data import create_test_image_3d

### 1.3 Simulate a dataset and Auto3Dseg datalist using MONAI functions
#### 1.3.1 Create a datalist for the simulated datasets

In [3]:
sim_datalist = {
    "testing": [
        {"image": "val_001.fake.nii.gz"},
        {"image": "val_002.fake.nii.gz"},
        {"image": "val_003.fake.nii.gz"},
        {"image": "val_004.fake.nii.gz"},
        {"image": "val_005.fake.nii.gz"},
    ],
    "training": [
        {"fold": 0, "image": "tr_image_001.fake.nii.gz", "label": "tr_label_001.fake.nii.gz"},
        {"fold": 0, "image": "tr_image_002.fake.nii.gz", "label": "tr_label_002.fake.nii.gz"},
        {"fold": 0, "image": "tr_image_003.fake.nii.gz", "label": "tr_label_003.fake.nii.gz"},
        {"fold": 0, "image": "tr_image_004.fake.nii.gz", "label": "tr_label_004.fake.nii.gz"},
        {"fold": 0, "image": "tr_image_005.fake.nii.gz", "label": "tr_label_005.fake.nii.gz"},
        {"fold": 0, "image": "tr_image_006.fake.nii.gz", "label": "tr_label_006.fake.nii.gz"},
        {"fold": 0, "image": "tr_image_007.fake.nii.gz", "label": "tr_label_007.fake.nii.gz"},
        {"fold": 0, "image": "tr_image_008.fake.nii.gz", "label": "tr_label_008.fake.nii.gz"},
        {"fold": 0, "image": "tr_image_009.fake.nii.gz", "label": "tr_label_009.fake.nii.gz"},
        {"fold": 0, "image": "tr_image_010.fake.nii.gz", "label": "tr_label_010.fake.nii.gz"},
        {"fold": 1, "image": "tr_image_006.fake.nii.gz", "label": "tr_label_006.fake.nii.gz"},
        {"fold": 1, "image": "tr_image_007.fake.nii.gz", "label": "tr_label_007.fake.nii.gz"},
        {"fold": 1, "image": "tr_image_008.fake.nii.gz", "label": "tr_label_008.fake.nii.gz"},
        {"fold": 1, "image": "tr_image_009.fake.nii.gz", "label": "tr_label_009.fake.nii.gz"},
        {"fold": 1, "image": "tr_image_010.fake.nii.gz", "label": "tr_label_010.fake.nii.gz"},
        {"fold": 1, "image": "tr_image_011.fake.nii.gz", "label": "tr_label_011.fake.nii.gz"},
        {"fold": 1, "image": "tr_image_012.fake.nii.gz", "label": "tr_label_012.fake.nii.gz"},
        {"fold": 1, "image": "tr_image_013.fake.nii.gz", "label": "tr_label_013.fake.nii.gz"},
        {"fold": 1, "image": "tr_image_014.fake.nii.gz", "label": "tr_label_014.fake.nii.gz"},
        {"fold": 1, "image": "tr_image_015.fake.nii.gz", "label": "tr_label_015.fake.nii.gz"},
    ],
}
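
Before generating data, it can help to confirm that a datalist has the shape used above. The following is a minimal sketch, not part of MONAI: `check_datalist` is a hypothetical helper, and the rule it applies (every training entry needs `image` and `label`, every testing entry needs `image`) is an assumption read off the datalist in this notebook rather than a documented schema.

```python
def check_datalist(datalist):
    """Return a list of problems found in an Auto3Dseg-style datalist (hypothetical helper)."""
    problems = []
    # Assumption: training entries carry both an image and a label.
    for i, item in enumerate(datalist.get("training", [])):
        for key in ("image", "label"):
            if key not in item:
                problems.append(f"training[{i}] is missing '{key}'")
    # Assumption: testing entries only need an image.
    for i, item in enumerate(datalist.get("testing", [])):
        if "image" not in item:
            problems.append(f"testing[{i}] is missing 'image'")
    return problems


example_datalist = {
    "testing": [{"image": "val_001.fake.nii.gz"}],
    "training": [
        {"fold": 0, "image": "tr_image_001.fake.nii.gz", "label": "tr_label_001.fake.nii.gz"},
        {"fold": 1, "image": "tr_image_002.fake.nii.gz"},  # label left out on purpose
    ],
}
print(check_datalist(example_datalist))
```

Calling `check_datalist(sim_datalist)` on the datalist above should return an empty list.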

#### 1.3.2 Generate image data

In [4]:
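def simulate():
    # Keep a reference to the TemporaryDirectory object (it is returned below):
    # the directory is deleted once the object is cleaned up.
    test_dir = tempfile.TemporaryDirectory()
    dataroot = test_dir.name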

    # Generate a fake dataset
    for d in sim_datalist["testing"] + sim_datalist["training"]:
        im, seg = create_test_image_3d(39, 47, 46, rad_max=10)
        nib_image = nib.Nifti1Image(im, affine=np.eye(4))
        image_fpath = os.path.join(dataroot, d["image"])
        nib.save(nib_image, image_fpath)

        if "label" in d:
            nib_image = nib.Nifti1Image(seg, affine=np.eye(4))
            label_fpath = os.path.join(dataroot, d["label"])
            nib.save(nib_image, label_fpath)

    return dataroot, test_dir


sim_dataroot, test_dir = simulate()
print("data are generated and saved in this directory: ", sim_dataroot)

data are generated and saved in this directory:  /tmp/tmpiw4ai2hg
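
A quick way to confirm that every file named in a datalist exists under the data root is sketched below. `find_missing_files` is a hypothetical helper written for this notebook, and the demo uses empty placeholder files with made-up names instead of real NIfTI volumes.

```python
import os
import tempfile


def find_missing_files(datalist, dataroot):
    """Return the sorted file names from the datalist that are absent under dataroot."""
    missing = set()
    for section in ("testing", "training"):
        for item in datalist.get(section, []):
            for key in ("image", "label"):
                if key in item and not os.path.isfile(os.path.join(dataroot, item[key])):
                    missing.add(item[key])
    return sorted(missing)


# Demo with placeholder files (names are illustrative only).
with tempfile.TemporaryDirectory() as demo_root:
    demo_datalist = {
        "testing": [{"image": "a.nii.gz"}],
        "training": [{"fold": 0, "image": "a.nii.gz", "label": "b.nii.gz"}],
    }
    open(os.path.join(demo_root, "a.nii.gz"), "w").close()  # only "a" exists
    demo_missing = find_missing_files(demo_datalist, demo_root)

print(demo_missing)
```

For the simulated data, `find_missing_files(sim_datalist, sim_dataroot)` should come back empty.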


### 1.4 Run the DataAnalyzer on the simulated dataset

In [5]:
analyser = DataAnalyzer(sim_datalist, sim_dataroot)
datastat = analyser.get_all_case_stats()
# from pprint import pprint; pprint(datastat)  # uncomment to print all the stats

100%|██████████| 20/20 [00:00<00:00, 28.26it/s]
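
The returned statistics can be large, so printing only their outline is often more readable than a full dump. The sketch below makes no assumption about the keys that `DataAnalyzer` produces: `outline` is a hypothetical helper that walks any nested dict or list, and the `toy` input uses invented keys purely for illustration.

```python
def outline(obj, depth=2, indent=0):
    """Return lines describing the nesting of dicts and lists, down to `depth` levels."""
    pad = "  " * indent
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            if depth > 1:
                lines.extend(outline(value, depth - 1, indent + 1))
    elif isinstance(obj, (list, tuple)) and obj:
        # Describe a sequence by its length and the type of its first element.
        lines.append(f"{pad}[{len(obj)} items, first is {type(obj[0]).__name__}]")
        if depth > 1:
            lines.extend(outline(obj[0], depth - 1, indent + 1))
    return lines


# Invented structure, not the actual DataAnalyzer output.
toy = {"summary": {"n_cases": 2}, "cases": [{"image": "a"}, {"image": "b"}]}
print("\n".join(outline(toy)))
```

With the analyzer result in hand, `print("\n".join(outline(datastat)))` gives a compact view of its layout.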


## 2 Perform data analysis on a real-world dataset

### 2.1 Setup data directory and download data

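Choose a directory in which to save the downloaded dataset and outputs. The cell below sets `root_dir` to the current directory; edit it to use another location, such as the path held in a `MONAI_DATA_DIRECTORY` environment variable. The dataset comes from the Medical Segmentation Decathlon (http://medicaldecathlon.com/).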
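
One way to honor a `MONAI_DATA_DIRECTORY` environment variable is to read it and fall back to a temporary directory when it is unset. This is a minimal sketch under that assumption; `resolve_root_dir` and `data_dir` are hypothetical names, and the cell that follows does not depend on them.

```python
import os
import tempfile


def resolve_root_dir(env_var="MONAI_DATA_DIRECTORY"):
    """Return the directory named by env_var, or a fresh temporary directory if unset."""
    directory = os.environ.get(env_var)
    if directory:
        os.makedirs(directory, exist_ok=True)  # create it on first use
        return directory
    return tempfile.mkdtemp()


data_dir = resolve_root_dir()
print(f"data dir is: {data_dir}")
```

Assigning the result to `root_dir` instead of `'./'` would send the download to that location.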

In [6]:
root_dir = './'  # can also specify your own!
print(f"root dir is: {root_dir}")
msd_task = "Task05_Prostate"
resource = "https://msd-for-monai.s3-us-west-2.amazonaws.com/" + msd_task + ".tar"

compressed_file = os.path.join(root_dir, msd_task + ".tar")
dataroot = os.path.join(root_dir, msd_task)
if not os.path.exists(dataroot):
    download_and_extract(resource, compressed_file, root_dir)

root dir is: ./


Task05_Prostate.tar: 229MB [00:08, 27.4MB/s]                              


2022-09-28 16:32:41,231 - INFO - Downloaded: Task05_Prostate.tar
2022-09-28 16:32:41,233 - INFO - Expected md5 is None, skip md5 check for file Task05_Prostate.tar.
2022-09-28 16:32:41,234 - INFO - Writing into directory: ./.


In [8]:
datalist_file = os.path.join("..", "tasks", "msd", msd_task, "msd_" + msd_task.lower() + "_folds.json")

analyser = DataAnalyzer(datalist_file, dataroot)
datastat = analyser.get_all_case_stats()
# from pprint import pprint; pprint(datastat)  # uncomment to print all the stats

100%|██████████| 30/30 [00:07<00:00,  4.07it/s]






### 2.2 Run the data analyzer in shell (via Python Fire)

If you downloaded Task05_Prostate in the previous step into the data directory `/workspace/data`, you can run the same analysis from a terminal:

```bash
python -m monai.apps.auto3dseg DataAnalyzer get_all_case_stats \
            --datalist="../tasks/msd/Task05_Prostate/msd_task05_prostate_folds.json" \
            --dataroot="/workspace/data/Task05_Prostate"
```
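
The same command can be assembled from Python, for example to launch it from a script. The sketch below only builds and prints the argument list; it assumes the module is invoked with `python -m` and reuses the two flags from the shell command above. `build_analyzer_command` is a hypothetical helper, not a MONAI function.

```python
import shlex
import sys


def build_analyzer_command(datalist, dataroot):
    """Build the argv list for the command-line DataAnalyzer call."""
    return [
        sys.executable, "-m", "monai.apps.auto3dseg",
        "DataAnalyzer", "get_all_case_stats",
        f"--datalist={datalist}",
        f"--dataroot={dataroot}",
    ]


cmd = build_analyzer_command(
    "../tasks/msd/Task05_Prostate/msd_task05_prostate_folds.json",
    "/workspace/data/Task05_Prostate",
)
print(shlex.join(cmd))
# To launch it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues when paths contain spaces.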
