# Objectives:

> In this file, you will learn the following:
>    
><ol>
>  <li>How to store and access large heterogeneous medical imaging data</li>
>  <li>Characteristics and composition of an MRI scan</li>
>  <li>Brain as a 3D structure</li>
>  <li>The Segmentation Problem</li>
></ol>

## Import Modules

In [5]:
import tensorflow as tf
import os
import h5py
import numpy as np

## 1. Data Storage and Access

Throughout this project, you will work with the [Medical Segmentation Decathlon Challenge, Brain Tumor Dataset](https://arxiv.org/abs/2106.05735). The dataset consists of clinically acquired MRI scans of patients with glioma, a type of brain tumor: either glioblastoma (GBM/HGG) or lower-grade glioma (LGG). The scans are multi-parametric, meaning each scan is composed of four types of images: T1-weighted (T1), post-Gadolinium (Gd) contrast T1-weighted (T1-Gd), native T2-weighted (T2), and T2 Fluid-Attenuated Inversion Recovery (FLAIR). The data were acquired at 19 different institutions.
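
To make "multi-parametric" concrete, the sketch below treats a single 2D slice as an array with a trailing channel axis, one channel per MRI sequence. The slice is random synthetic data, and both the 144×144 size and the channel order in `MODALITIES` are assumptions made for illustration; the real order should be read from the dataset's `modalities` entry rather than taken from this example. The helper `split_modalities` is hypothetical and not part of any library.

```python
import numpy as np

# Assumed channel order, for illustration only; check the dataset's
# "modalities" entry for the real mapping.
MODALITIES = ("FLAIR", "T1", "T1-Gd", "T2")


def split_modalities(slice_hwc, names=MODALITIES):
    """Return a dict mapping each modality name to its 2D image."""
    if slice_hwc.shape[-1] != len(names):
        raise ValueError("expected one channel per modality")
    return {name: slice_hwc[..., i] for i, name in enumerate(names)}


rng = np.random.default_rng(0)
fake_slice = rng.random((144, 144, 4), dtype=np.float32)  # synthetic stand-in
channels = split_modalities(fake_slice)
for name, image in channels.items():
    print(name, image.shape)
```

Each value in `channels` is a view onto one channel of the same slice, so the four images are spatially aligned pixel for pixel.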


We have converted the raw MRI scans to an HDF5 file that you can [download from here](https://drive.google.com/file/d/19EYI_R-0ea5LbRdxettsy1uinJAqjfZy/view?usp=sharing). Download the file and store it in the same folder as this notebook.

In [13]:
import json

# Read the dataset description and close the file when done
with open('dataset.json') as f:
    data = json.load(f)
data['numTraining']

484

In [7]:
data_path = 'Task01_BrainTumour.h5'
hdf5_filename = data_path  # the file sits next to this notebook
with h5py.File(hdf5_filename, "r") as f:
    # List all top-level keys
    print("Keys: %s" % f.keys())
    a_group_key = list(f.keys())[0]

    # Read the first entry; a separate name keeps the JSON `data` intact
    first_entry = list(f[a_group_key])

Keys: <KeysViewHDF5 ['description', 'imgs_testing', 'imgs_train', 'imgs_validation', 'license', 'modalities', 'msks_testing', 'msks_train', 'msks_validation', 'name', 'reference', 'release', 'tensorImageSize', 'testing_input_files', 'testing_label_files', 'training_input_files', 'training_label_files', 'validation_input_files', 'validation_label_files']>
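
The key listing shows names only. A minimal sketch of how one might also inspect the shape and dtype of each top-level dataset without loading any array into memory is given below. It runs against a tiny stand-in file created on the spot; the sizes in that file are made up and say nothing about the real dataset, and `summarize_hdf5` is a hypothetical helper, not part of h5py.

```python
import os
import tempfile

import h5py
import numpy as np


def summarize_hdf5(path):
    """Return {dataset name: (shape, dtype)} for top-level datasets."""
    summary = {}
    with h5py.File(path, "r") as f:
        for name, item in f.items():
            # Shape and dtype are metadata; no array data is read here.
            if isinstance(item, h5py.Dataset):
                summary[name] = (item.shape, str(item.dtype))
    return summary


# Build a tiny stand-in file (hypothetical sizes, not the real dataset).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("imgs_train", data=np.zeros((6, 8, 8, 4), dtype=np.float32))
    f.create_dataset("msks_train", data=np.zeros((6, 8, 8, 1), dtype=np.uint8))

summary = summarize_hdf5(demo_path)
for name, (shape, dtype) in summary.items():
    print(f"{name}: shape={shape}, dtype={dtype}")
```

Pointing `summarize_hdf5` at `Task01_BrainTumour.h5` instead of the stand-in file should report the image and mask arrays alongside the metadata entries.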


In [8]:
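# Keep the file open: the objects below are lazy h5py datasets that read
# from disk on access, not in-memory arrays
df = h5py.File(hdf5_filename, "r")

# Training split
X_train = df["imgs_train"]
y_train = df["msks_train"]

# Testing split
X_test = df["imgs_testing"]
y_test = df["msks_testing"]

# Validation split
X_valid = df["imgs_validation"]
y_valid = df["msks_validation"]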
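
Because the arrays above stay on disk until they are indexed, one way to work through a large split is to read it in contiguous batches. The sketch below is an illustration under assumptions: `iter_batches` is a hypothetical helper, and small NumPy arrays stand in for `X_train` and `y_train` so the example runs without the data file. Slicing an h5py dataset with `[start:stop]` returns a NumPy array in the same way.

```python
import numpy as np


def iter_batches(images, masks, batch_size):
    """Yield (image_batch, mask_batch) pairs read with contiguous slices."""
    if len(images) != len(masks):
        raise ValueError("images and masks must have the same length")
    for start in range(0, len(images), batch_size):
        stop = min(start + batch_size, len(images))
        # With an h5py dataset, only this range is read from disk.
        yield np.asarray(images[start:stop]), np.asarray(masks[start:stop])


# Synthetic stand-ins for X_train / y_train (hypothetical shapes).
demo_images = np.zeros((10, 8, 8, 4), dtype=np.float32)
demo_masks = np.zeros((10, 8, 8, 1), dtype=np.uint8)
batch_shapes = [
    (x.shape, y.shape) for x, y in iter_batches(demo_images, demo_masks, 4)
]
print(batch_shapes)
```

The last batch is shorter whenever the number of entries is not a multiple of `batch_size`, so downstream code should not assume a fixed batch length.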

In [15]:
# Ratio of entries along the first axis of imgs_train to the training cases listed in dataset.json
X_train.shape[0]/data['numTraining']

120.79338842975207

In [None]:
# np.save('X_train.npy',X_train)
# np.save('y_train.npy',y_train)

# np.save('X_test.npy',X_test)
# np.save('y_test.npy',y_test)

# np.save('X_valid.npy',X_valid)
# np.save('y_valid.npy',y_valid)

## 2. Characteristics and composition of an MRI scan

## 3. Brain as a 3D Structure

## 4. The Segmentation Problem

---

## Additional Resources:

[1] B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, et al. "The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)", IEEE Transactions on Medical Imaging 34(10), 1993-2024 (2015) DOI: 10.1109/TMI.2014.2377694

[2] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J.S. Kirby, et al., "Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features", Nature Scientific Data, 4:170117 (2017) DOI: 10.1038/sdata.2017.117

[3] S. Bakas, M. Reyes, A. Jakab, S. Bauer, M. Rempfler, A. Crimi, et al., "Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge", arXiv preprint arXiv:1811.02629 (2018)

[4] Dougherty MT, Folk MJ, Zadok E, et al. Unifying Biological Image Formats with HDF5. Commun ACM. 2009;52(10):42-47. [doi:10.1145/1562764.1562781](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3016045/)