# Brain Tumor Segmentation (BraTS) <br> EDA and Data Visualization

- **Author:** *Marcus Ng*
- **Date created:** *1st March, 2023*

## Overview

The goal of this project is to segment tumors in brain MRI scans.

All BraTS multimodal scans are available as NIfTI files (.nii.gz) and describe:

- native (**T1**),
- post-contrast T1-weighted (**T1Gd**),
- T2-weighted (**T2**),
- T2 Fluid Attenuated Inversion Recovery (**T2-FLAIR**)

volumes, and were acquired with different clinical protocols and various scanners from multiple (n=19) institutions.

All the imaging datasets have been segmented manually, by one to four raters, following the same annotation protocol, and their annotations were approved by experienced neuro-radiologists. Annotations comprise:

- the necrotic and non-enhancing tumor core (**NCR/NET — label 1**),
- the peritumoral edema (**ED — label 2**),
- the GD-enhancing tumor (**ET — label 4**),
 
as described both in the [BraTS 2012-2013 TMI paper](https://ieeexplore.ieee.org/document/6975210) and in the [latest BraTS summarizing paper](https://arxiv.org/abs/1811.02629). The provided data are distributed after their pre-processing, i.e., co-registered to the same anatomical template, interpolated to the same resolution ($1\,\mathrm{mm}^3$), and skull-stripped.

## References

- [📄 The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)](https://ieeexplore.ieee.org/document/6975210)
- [📄 Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features](https://www.nature.com/articles/sdata2017117)
- [📄 Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge](https://arxiv.org/abs/1811.02629)

In [1]:
# Imports 

import os
from glob import glob

import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import nibabel as nib
import numpy as np
import pandas as pd
from matplotlib import animation, cm, colors, rc
from scipy import ndimage

**MASK_LABELS:**

A list of names for the tumor regions annotated in the segmentation masks:

- 'Non-Enhancing Tumor Core'
- 'Peritumoral Edema'
- 'GD-Enhancing Tumor'

These names give semantic meaning to the regions of interest in each mask.

**MASK_VALUES:**

A list of the integer values that voxels in the segmentation masks can take. Following the annotation labels listed in the Overview, a voxel belonging to 'Non-Enhancing Tumor Core' has the value 1, 'Peritumoral Edema' has the value 2, and 'GD-Enhancing Tumor' has the value 4. The value 0 marks background, i.e. voxels that belong to none of the tumor regions. `MASK_VALUES` therefore has four entries, while `MASK_LABELS` names only the three tumor regions.
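To make the relationship between mask values and region names concrete, here is a minimal sketch that counts voxels per label. It assumes the label numbering from the Overview (1, 2, 4) with 0 as background. The names `LABEL_NAMES`, `count_voxels_per_label`, and `toy_mask` are hypothetical and exist only for this illustration. The mask is a small synthetic array, not real BraTS data.

```python
import numpy as np

# Label numbering taken from the annotation list in the Overview.
# 0 is assumed to be background (no tumor region).
LABEL_NAMES = {
    0: 'Background',
    1: 'Non-Enhancing Tumor Core',
    2: 'Peritumoral Edema',
    4: 'GD-Enhancing Tumor',
}


def count_voxels_per_label(mask):
    """Return {label name: voxel count} for every value present in the mask."""
    values, counts = np.unique(mask, return_counts=True)
    return {
        LABEL_NAMES.get(int(v), f'Unknown ({int(v)})'): int(c)
        for v, c in zip(values, counts)
    }


# Synthetic 4x4x4 mask standing in for a real segmentation volume.
toy_mask = np.zeros((4, 4, 4), dtype=np.uint8)
toy_mask[0, 0, :2] = 1    # 2 voxels of tumor core
toy_mask[1, :2, :2] = 2   # 4 voxels of edema
toy_mask[2, 0, 0] = 4     # 1 voxel of enhancing tumor

voxel_counts = count_voxels_per_label(toy_mask)
print(voxel_counts)
```

With a real mask, the same function could be applied to the array loaded from a `*_seg.nii` file. Any value outside `LABEL_NAMES` would show up as `Unknown (...)`, which makes unexpected labels easy to spot.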

## Exploratory Data Analysis

In [2]:
DATA_PATH = '../Dataset'
TRAIN_PATH = f'{DATA_PATH}/BraTS2020_TrainingData/MICCAI_BraTS2020_TrainingData'
TEST_PATH = f'{DATA_PATH}/BraTS2020_ValidationData/MICCAI_BraTS2020_ValidationData'
DATA_TYPES = ['flair', 't1', 't1ce', 't2', 'seg']
MASK_LABELS = ['Non-Enhancing Tumor Core',
               'Peritumoral Edema', 'GD-Enhancing Tumor']
MASK_VALUES = [0, 1, 2, 4]

The next cell collects the file paths of the NIfTI volumes (the file format used here to store the MRI data) for each data type, for both the training set and the validation set (referred to as the test set in the code), and prints how many files were found per type. One training segmentation file does not follow the `*_seg.nii` naming pattern, so its path is appended by hand before the counts are printed.

In [3]:
train_data_paths = {
    data_type: sorted(
        glob(f'{TRAIN_PATH}/**/*_{data_type}.nii')
    ) for data_type in DATA_TYPES
}

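# The mask for BraTS20_Training_355 does not match the '*_seg.nii' pattern,
# so add its path manually and restore the sorted order.
train_data_paths['seg'].append(f'{TRAIN_PATH}/BraTS20_Training_355/W39_1998.09.19_Segm.nii')
train_data_paths['seg'] = sorted(train_data_paths['seg'])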

for k, v in train_data_paths.items():
    print(f'[TRAIN] Number of {k} images: {len(v)}')
print()

test_data_paths = {
    data_type: sorted(
        glob(f'{TEST_PATH}/**/*_{data_type}.nii')
    ) for data_type in DATA_TYPES
}

for k, v in test_data_paths.items():
    print(f'[TEST] Number of {k} images: {len(v)}')

[TRAIN] Number of flair images: 369
[TRAIN] Number of t1 images: 369
[TRAIN] Number of t1ce images: 369
[TRAIN] Number of t2 images: 369
[TRAIN] Number of seg images: 369

[TEST] Number of flair images: 125
[TEST] Number of t1 images: 125
[TEST] Number of t1ce images: 125
[TEST] Number of t2 images: 125
[TEST] Number of seg images: 0
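
Equal counts per modality do not by themselves guarantee that every case has all of its files. The sketch below shows one way to cross-check case IDs across modalities. It assumes file names of the form `<case_id>_<modality>.nii`, which is consistent with the glob patterns used here but is not verified against the dataset. The helpers `case_id` and `find_incomplete_cases` and the example paths are hypothetical.

```python
import os


def case_id(path, modality):
    """Strip the '_<modality>.nii' suffix from the file name to get the case ID.

    Returns None if the file name does not follow the assumed pattern.
    """
    name = os.path.basename(path)
    suffix = f'_{modality}.nii'
    return name[:-len(suffix)] if name.endswith(suffix) else None


def find_incomplete_cases(paths_by_modality):
    """Return the sorted case IDs that are missing from at least one modality."""
    ids = {
        modality: {case_id(p, modality) for p in paths} - {None}
        for modality, paths in paths_by_modality.items()
    }
    all_ids = set().union(*ids.values())
    return sorted(c for c in all_ids if any(c not in s for s in ids.values()))


# Made-up paths for illustration: case 002 has no 't1' file.
example_paths = {
    'flair': [
        'data/BraTS20_Training_001/BraTS20_Training_001_flair.nii',
        'data/BraTS20_Training_002/BraTS20_Training_002_flair.nii',
    ],
    't1': [
        'data/BraTS20_Training_001/BraTS20_Training_001_t1.nii',
    ],
}

incomplete = find_incomplete_cases(example_paths)
print(incomplete)
```

An empty result would indicate that every case appears in every modality list. Files that do not follow the assumed naming pattern, such as the manually added mask for volume 355, are ignored by `case_id` and would need separate handling.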


In [4]:
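# Report training volumes whose folder contains no segmentation mask.
for path in sorted(glob(f'{TRAIN_PATH}/*')):
    if os.path.isdir(path):
        if not any(f.endswith(('seg.nii', 'Segm.nii')) for f in os.listdir(path)):
            # os.path.basename works with any path separator, unlike split("/").
            print(f'Missing segmentation mask for volume: {os.path.basename(path)}.')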