<a href="https://colab.research.google.com/github/PadmarajBhat/Machine-Learning/blob/master/BrainTumorClassification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Detection of 3 Brain Tumors (Meningioma, Glioma and Pituitary) in T1-weighted contrast enhanced images

### - Revisitng the Udacity Capstone Project in pursuit of better accuracy



# Import Packages
* read the input MRI images (.mat) files through ***h5py***
* ***pandas*** for data analysis and preprocessing
* ***tensorflow*** for modelling and predicting

In [0]:
import os
import zipfile
import h5py
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd

# Load Data
* Mount Google Drive
* Unzip it in colab disk
* load mat attributes to list of tuples
* create a panda dataframe for analysis

##### Issues Faced:
* loading to panda with image took half(6GB) of RAM
* loading tumor along with mri image as in mat file crashed the colab
  * Solution: let us load image but save only 5 point summary for both mri image and tumor

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


In [4]:
!ls /content/drive/'My Drive'/1512427

brainTumorDataPublic_1533-2298.zip  brainTumorDataPublic_767-1532.zip
brainTumorDataPublic_1-766.zip	    cvind.mat
brainTumorDataPublic_2299-3064.zip  README.txt


In [5]:
!ls /content/drive/'My Drive'/1512427/brainTumorDataPublic_1-766.zip

'/content/drive/My Drive/1512427/brainTumorDataPublic_1-766.zip'


In [0]:
def return_imageInfo_from_mat_file(file_name):
    f = h5py.File(file_name,'r')

    mri_image = np.array(f['cjdata']['image'],dtype=np.int)
    mri_quartiles = np.percentile(mri_image, [25, 50, 75])

    tumor_image = mri_image * np.array(f['cjdata']['tumorMask'], dtype=np.int)
    tumor_quartiles = np.percentile(tumor_image, [25, 50, 75])

    return np.array(f['cjdata']['PID'],dtype=np.int)[0][0] \
            ,mri_image.min() \
            ,mri_image.max() \
            ,mri_quartiles[1] \
            ,mri_quartiles[0] \
            ,mri_quartiles[2] \
            ,tumor_image.min() \
            ,tumor_image.max() \
            ,tumor_quartiles[1] \
            ,tumor_quartiles[0] \
            ,tumor_quartiles[2] \
            ,np.array(f['cjdata']['label'], dtype=np.int)[0][0] 

In [0]:
def loadDf():
  patients_details = []
  for root, dirs, files in os.walk("/content/drive/My Drive/1512427/", topdown = False):
    for f in files:
      if ".zip" in f:
          file = zipfile.ZipFile(root+f, "r")
          for name in file.namelist():
            file.extract(name,".")
            patients_details.append(return_imageInfo_from_mat_file(name))
          #break
      #break
  mri_col_names = ["mri_min","mri_max","mri_median","mri_1q", "mri_3q"]
  tumor_col_names = ["t_min","t_max","t_median","t_1q","t_3q"]
  col_names = ["pid"] + mri_col_names + tumor_col_names+ ["label"]
  return pd.DataFrame(patients_details,columns=col_names)


In [30]:
tumor_names = ["","meningioma","glioma","pituitary"]
df = loadDf()
df.sample(20)

Unnamed: 0,pid,mri_min,mri_max,mri_median,mri_1q,mri_3q,t_min,t_max,t_median,t_1q,t_3q,label
535,49,0,2031,363.0,35.0,689.0,0,1498,0.0,0.0,0.0,1
2648,49,0,3255,216.0,0.0,991.0,0,2409,0.0,0.0,0.0,3
2344,49,0,578,144.0,25.0,199.0,0,578,0.0,0.0,0.0,3
2298,49,0,951,41.0,21.0,241.0,0,531,0.0,0.0,0.0,3
1574,77,0,4195,128.0,31.0,1058.0,0,1249,0.0,0.0,0.0,2
1140,77,0,7093,96.0,0.0,1509.0,0,3256,0.0,0.0,0.0,2
1094,77,0,4978,73.0,0.0,1290.0,0,1837,0.0,0.0,0.0,2
453,49,0,3177,411.0,43.0,1237.0,0,3059,0.0,0.0,0.0,1
2868,57,0,2606,339.0,33.0,856.0,0,1703,0.0,0.0,0.0,2
145,49,0,3760,47.0,0.0,884.0,0,2179,0.0,0.0,0.0,1


# Analysis


In [0]:
df.pid.unique()

There are only 5 patients info present !!!!

In [0]:
df.groupby("pid").agg("count")

In [0]:
df.groupby(["pid","label"]).agg("count").reset_index()

In [0]:
df.groupby("label").agg("count")

# Preprocessing


Preprocessing ideas:

1.  Dataset has tumor region indicator which would allow us to get the average brightness of the area.

2. It is said that brightest region is skull and skull is not important for the tumor detection. It is only brain position determines the tumor class. If we remove skull remaining image is brain ?

3. if we start with a window of image which would maximize the presence of tumor and expand to include some brain region around the tumor then i guess it is the best data for training(and predicting). Because tumor position in brain is THE factor that decides the tumor class.