<a href="https://colab.research.google.com/github/PadmarajBhat/Machine-Learning/blob/master/BrainTumorClassification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Detection of 3 Brain Tumors (Meningioma, Glioma and Pituitary) in T1-weighted contrast enhanced images

### - Revisitng the Udacity Capstone Project in pursuit of better accuracy



# What is the problem statement?
  * predict the tumor class given only MRI image
  * OR predict the tumor class when both MRI and Tumor region is given !!!
      * tumor region is identified and put in input dataset by experts
          * can we have Image Segmentation problem ?


  * I think this is the order of problem from easy level to difficult level
    * Identify the tumor class from raw MRI image (here accuracy may be low)
    * Identify the tumor class from raw MRI image + tumor region identified (here accuracy may be better)
    * Auto detect the tumor segment in a MRI image and classify the tumor (ideal application for a radiologist)

    Let us try all the 3 !!!

# Import Packages
* read the input MRI images (.mat) files through ***h5py***
* ***pandas*** for data analysis and preprocessing
* ***tensorflow*** for modelling and predicting

In [1]:
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

TensorFlow 2.x selected.


In [0]:
import os
import zipfile
import h5py
import numpy as np

import pandas as pd

from sklearn.preprocessing import MinMaxScaler

from matplotlib import pyplot as plt
from bokeh.io import output_notebook, show
from bokeh.layouts import row
from bokeh.plotting import figure
output_notebook()

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [3]:
tf.__version__

'2.0.0-rc1'

# Load Data
* Mount Google Drive
* Unzip it in colab disk
* load mat attributes to list of tuples
* create a panda dataframe for analysis

##### Issues Faced:
1. loading to panda with image took half(6GB) of RAM
* loading tumor along with mri image (as in mat file) crashed the colab
  * Solution: let us load image but save only 5 point summary for both mri image and tumor

2. How do we scale/normalize the data?
  * would tumor region have 0 in it ?
    * only way to know is through the value present in the binary indicator == 1
        * implementation through 2 for loops takes forever !!!
          * need to implement throuhg np.where
        

3. Some images are less than 512
    * pad the difference with 0s.
    
4. Should tumor image be scaled between 0 -1? For now, brightness values are relative to that of the whole image to which it belongs to.

5. Epoch run failed due to no data generated by the custom generator.
  * Going to try the ImageGenerator from the TF.

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [5]:
!ls /content/drive/'My Drive'/1512427

brainTumorDataPublic_1533-2298.zip  brainTumorDataPublic_767-1532.zip
brainTumorDataPublic_1-766.zip	    cvind.mat
brainTumorDataPublic_2299-3064.zip  README.txt


In [6]:
!ls /content/drive/'My Drive'/1512427/brainTumorDataPublic_1-766.zip

'/content/drive/My Drive/1512427/brainTumorDataPublic_1-766.zip'


### Load Image Array

In [0]:
def retrieveImage(file_name):
  f = h5py.File(file_name,'r')
  mri_image = np.array(f['cjdata']['image'],dtype=np.float64)
  if mri_image.shape[0] < 512:
      print("Shape of the image : ", mri_image.shape)
      mri_image = np.pad(mri_image,(512 - mri_image.shape[0])//2,'constant',constant_values=0)
  return mri_image/mri_image.max()

### Load Tumor Array

In [0]:
def retrieveTumorImage(file_name):
  f = h5py.File(file_name,'r')
  mri_image = np.array(f['cjdata']['tumorMask'],dtype=np.float128)
  if mri_image.shape[0] < 512:
      print("Shape of the image : ", mri_image.shape)
      mri_image = np.pad(mri_image,(512 - mri_image.shape[0])//2,'constant',constant_values=0)
  return mri_image/mri_image.max()

### Load the images from ImageGenerator

In [9]:
!mkdir "data"
!mkdir "data/1"
!mkdir "data/2"
!mkdir "data/3"

mkdir: cannot create directory ‘data’: File exists
mkdir: cannot create directory ‘data/1’: File exists
mkdir: cannot create directory ‘data/2’: File exists
mkdir: cannot create directory ‘data/3’: File exists


In [10]:

import imageio

def loadImageSaveJpg(file_name,label):
  image = retrieveImage(file_name)

  imageio.imwrite("data/"+label+"/"+file_name.split(".")[0]+'.jpg', image)


loadImageSaveJpg("1.mat","1")



### Load Image and Tumor Statistics to Panda

In [0]:
def return_imageInfo_from_mat_file(file_name):
    f = h5py.File(file_name,'r')

    mri_image = np.array(f['cjdata']['image'],dtype=np.float128)
    #scaler = MinMaxScaler(feature_range=(1,2))
    #mri_image = scaler.fit(mri_image)
    mri_image = mri_image/mri_image.max()

    if mri_image.shape[0] < 512:
      print("Shape of the image : ", mri_image.shape)
      mri_image = np.pad(mri_image,(512 - mri_image.shape[0])//2,'constant',constant_values=0)
    
    temp_mri_image = np.copy(mri_image)
    temp_mri_image[temp_mri_image == 0 ] = 2

    mri_quartiles = np.percentile(mri_image[mri_image > 0], [25, 50, 75,80,85,90,95,96,97,98,99])

    tumor_image = np.array(f['cjdata']['tumorMask'], dtype=np.float128)
    if tumor_image.shape[0] < 512:
      print("Shape of the tumor image : ", tumor_image.shape)
      tumor_image = np.pad(tumor_image,(512 - tumor_image.shape[0])//2,'constant',constant_values=0)
    
    tumor_image = temp_mri_image * tumor_image
    tumor_image = tumor_image[tumor_image > 0]
    tumor_image[tumor_image == 2] = 0

    '''tumor_array =[]
    for i in range(0,512):
      for j in range(0,512):
        if tumor_image[i][j]:
          tumor_array.append(mri_image[i][j])

    tumor_image = np.array(tumor_array, dtype=np.float)'''

    tumor_quartiles = np.percentile(tumor_image, [25, 50, 75,80,85,90,95,96,97,98,99])

    label=np.array(f['cjdata']['label'], dtype=np.int)[0][0]
    imageio.imwrite("data/"+str(label)+"/"+file_name.split(".")[0]+'.jpg', np.array(f['cjdata']['image'],dtype=np.uint8))

    return np.array(f['cjdata']['PID'],dtype=np.int)[0][0] \
            ,mri_image.min() \
            ,mri_image.max() \
            ,mri_quartiles[0] \
            ,mri_quartiles[1] \
            ,mri_quartiles[2] \
            ,mri_quartiles[3] \
            ,mri_quartiles[4] \
            ,mri_quartiles[5] \
            ,mri_quartiles[6] \
            ,mri_quartiles[7] \
            ,mri_quartiles[8] \
            ,mri_quartiles[9] \
            ,mri_quartiles[10] \
            ,tumor_image.min() \
            ,tumor_image.max() \
            ,tumor_quartiles[0] \
            ,tumor_quartiles[1] \
            ,tumor_quartiles[2] \
            ,tumor_quartiles[3] \
            ,tumor_quartiles[4] \
            ,tumor_quartiles[5] \
            ,tumor_quartiles[6] \
            ,tumor_quartiles[7] \
            ,tumor_quartiles[8] \
            ,tumor_quartiles[9] \
            ,tumor_quartiles[10] \
            ,tumor_image.shape \
            ,file_name\
            ,np.array(f['cjdata']['label'], dtype=np.int)[0][0] 

In [12]:

mri_col_names = ["mri_min","mri_max","mri_1q","mri_median", "mri_3q","mri_80","mri_85","mri_90","mri_95","mri_96","mri_97","mri_98","mri_99"]
tumor_col_names = ["t_min","t_max","t_1q","t_median","t_3q","t_80","t_85","t_90","t_95","t_96","t_97","t_98","t_99","tumor_size"]
col_names = ["pid"] + mri_col_names + tumor_col_names+ ["file_name","label"]
len(col_names)

30

In [0]:
def loadDf():
  patients_details = []
  for root, dirs, files in os.walk("/content/drive/My Drive/1512427/", topdown = False):
    for f in files:
      if ".zip" in f:
          file = zipfile.ZipFile(root+f, "r")
          for name in file.namelist():
            file.extract(name,".")
            patients_details.append(return_imageInfo_from_mat_file(name))
          #break
      #break                                                              25, 50, 75,80,85,90,95,96,97,98,99
  mri_col_names = ["mri_min","mri_max","mri_1q","mri_median", "mri_3q","mri_80","mri_85","mri_90","mri_95","mri_96","mri_97","mri_98","mri_99"]
  tumor_col_names = ["t_min","t_max","t_1q","t_median","t_3q","t_80","t_85","t_90","t_95","t_96","t_97","t_98","t_99","tumor_size"]
  col_names = ["pid"] + mri_col_names + tumor_col_names+ ["file_name","label"]
  return pd.DataFrame(patients_details,columns=col_names)


In [0]:

tumor_names = ["","Meningioma","Glioma","Pituitary"]

In [0]:

df = loadDf()
df["square_shape"] = df.tumor_size.apply(lambda x: np.sqrt(x[0]))
df.sample(20)

In [0]:
!rm -rf "test"
!mkdir "test"
!mkdir "test/1"
!mkdir "test/2"
!mkdir "test/3"

In [0]:
!ls -l /content/data/2/3046.jpg
!ls -l /content/test

In [0]:
!ls -l /content/data/2/2404.jpg

In [0]:
import shutil
import random

for root, dirs, files in os.walk("/content/data", topdown = False):
  
    
    if len(files) > 0:
      print(root, dirs, files)

      #indices = np.random.randint(0,len(files),size=round(len(files)*.2))
      rand_files = random.choices(files,k=round(len(files)*.2))
      
      for f in rand_files:
        print(f)
        try:
          shutil.move(root+"/"+f, "/content/test/"+root.split("/")[-1]+"/"+f)
        except :
          print("Ignoring : ",f)

#list(os.walk("/content/data")) /content/test

### ImageGenerators

In [0]:
train_datagen = ImageDataGenerator(
        #samplewise_std_normalization=True
        ,horizontal_flip=True
        ,vertical_flip= True)

test_datagen = ImageDataGenerator(samplewise_std_normalization=True)

train_generator = train_datagen.flow_from_directory(
        '/content/data',
        target_size=(512,512),
        batch_size=32,
        class_mode='categorical')


# Analysis


## Statistical Analysis
* Number of patients in the dataset
* Patient wise distribution of tumor classes
* Comparison of below attributes for the 3 tumor classes
  * 1st quantile of MRI image
  * Median of the MRI image
  * 3rd quantile of the MRI image
  * min value distribution of the Tumor
  * max value distribution of the Tumor
  * 1st quantile of the Tumor
  * median of the Tumor
  * 3rd quantile of the Tumor
    * Analysis: 
      * All tumors have darkest area which may indicate the tumor itself
      * All tumors have uniform distribution of brightness (apart from the dark area)
      * MRI images have darker area outside the skull (non scan area)
          * will this influence the model ?
          * should the color of the tumor and the non important area of the MRI scan be different ?
          
* 256x256 image size distribution (any bias in there ?)

##### Issues Faced:
* Bokeh plots are interactive but they consume a lot of space(>100mb) in the notebook 
  * markdown for now, when interested can be seen by enabling it as code cell

In [0]:
df.pid.unique()

There are only 5 patients info present !!!!

In [0]:
df.groupby("pid").agg("count").reset_index()[['pid','mri_min']]

In [0]:
df.groupby(["pid","label"]).agg("count").reset_index()[['pid','label','mri_min']]

In [0]:
df.groupby("label").agg("count").reset_index()[['label','pid']]

In [0]:
def plotStatistics(df, tumor_name):
  df = df[["mri_1q","mri_median","mri_3q","t_min","t_1q","t_median","t_3q","t_max"]]
  df=(df-df.min())/(df.max()-df.min())
  fig, ax = plt.subplots(1, 8,sharex=True,sharey=True,tight_layout=True)
  fig.set_figheight(4)
  fig.set_figwidth(13)
  
  fig.suptitle(tumor_name+" Tumor")
  #plt.subplot(1,8,1)
  ax[0].hist(df.mri_1q.tolist())
  ax[0].set_title("mri_1q")
  #plt.subplot(1,8,2)
  ax[1].hist(df.mri_median.tolist())
  ax[1].set_title("mri_median")
  #plt.subplot(1,8,3)
  ax[2].hist(df.mri_3q.tolist())
  ax[2].set_title("mri_3q")
  #plt.subplot(1,8,4)
  ax[3].hist(df.t_min.tolist())
  ax[3].set_title("t_min")
  #plt.subplot(1,8,5)
  ax[4].hist(df.t_1q.tolist())
  ax[4].set_title("t_1q")
  #plt.subplot(1,8,6)
  ax[5].hist(df.t_median.tolist())
  ax[5].set_title("t_median")
  #plt.subplot(1,8,7)
  ax[6].hist(df.t_3q.tolist())
  ax[6].set_title("t_3q")
  #plt.subplot(1,8,8)
  ax[7].hist(df.t_max.tolist())
  ax[7].set_title("t_max")
  plt.show()

plotStatistics(df[df.label ==1], tumor_names[1])
plotStatistics(df[df.label ==2], tumor_names[2])
plotStatistics(df[df.label ==3], tumor_names[3])
plotStatistics(df, "All")

## Can we reduce the image size ?
* Why?
  * faster model building
  * lower convolution experiment iterations
  * lower ram usage and hence higher batch size



### Approach 1 : Can we segregate the skull ?
  * removing the unwanted area 
    * percentile approach: identify the percentile and see if any of the tumor percentile is always less that the MRI percentile. 
        i.e. to prove mri_99 > t_99. This failed as indicated below
    * brightness based skull identification:
      * nearest neighbor ???

In [0]:
df[df.mri_99 < df.t_99]

In [0]:
plt.imshow(retrieveImage("120.mat"))
plt.show()

### Approach 2: PCA
* Note that if we do the PCA transformation, we 2D image will be reduced to 1D. Therefore, we can not use it for the Convolution approach.
  * we can note here that just by 2 features (components) and the trained PCA model, we are able to recreate the image with not much difference. [See the last 3 images] This showcases the PCA strength.

* Withhelding the PCA approach as we are going to pursuit the Convolution 


In [0]:
import numpy as np
from sklearn.decomposition import PCA
pca = PCA(n_components=5,whiten=True)
image=[]
image.append(retrieveImage("120.mat").reshape(-1))
image.append(retrieveImage("1.mat").reshape(-1))
image.append(retrieveImage("2.mat").reshape(-1))
image.append(retrieveImage("3.mat").reshape(-1))
image.append(retrieveImage("4.mat").reshape(-1))
image.append(retrieveImage("5.mat").reshape(-1))
image.append(retrieveImage("6.mat").reshape(-1))
image.append(retrieveImage("7.mat").reshape(-1))

#print(image.shape)

pca.fit(image)
plt.imshow(pca.mean_.reshape((512,512)),
           cmap=plt.cm.bone)
plt.show()

print(pca.noise_variance_)
print(image[0].reshape((1,-1)).shape)
pca.transform(image[0].reshape((1,-1)))

#plt.imshow(pca.transform(image[1].reshape(1,-1)).reshape((512,512)),cmap=plt.cm.bone)


In [0]:
components = pca.transform(image[0].reshape(1,-1))
projected = pca.inverse_transform(components)
plt.imshow(projected.reshape((512,512)))
plt.show()
plt.imshow(image[0].reshape((512,512)))
plt.show()
plt.imshow(retrieveImage("120.mat"))
plt.show()
print("doe it match :", projected.reshape((512,512)) == retrieveImage("120.mat"))

## Visual Analysis

### Smallest Tumor Sample

plt.imshow(retrieveImage(list(df[df.tumor_size == df.tumor_size.min()]['file_name'])[0]));
plt.imshow(retrieveTumorImage(list(df[df.tumor_size == df.tumor_size.min()]['file_name'])[0]),alpha=0.5);
plt.show()


### Biggest Tumor in the Dataset

plt.imshow(retrieveImage(list(df[df.tumor_size == df.tumor_size.max()]['file_name'])[0]));
plt.imshow(retrieveTumorImage(list(df[df.tumor_size == df.tumor_size.max()]['file_name'])[0]),alpha=0.5);
plt.show()


### Numpy Resize failed

plt.imshow(np.resize(retrieveImage(list(df[df.tumor_size == df.tumor_size.max()]['file_name'])[0]),(256,256)));
plt.show()

### Bokeh Plot

def bokehPlot(file_name, tumor_label):
  tumor_names = ["","Meningioma","Glioma","Pituitary"]
  im = retrieveImage(file_name)
  s1 = figure(width=512, plot_height=512, title=tumor_names[tumor_label]+" MRI Image")
  s1.image([im],x=[0],y=[0],dw=[512],dh=[512])

  im2 = retrieveTumorImage(file_name)

  s2 = figure(width=500, plot_height=500, title=tumor_names[tumor_label]+" MRI Image with Tumor Highlighted")
  s2.image([im2],x=[0],y=[0],dw=[512],dh=[512])
  s2.image([im],x=[0],y=[0],dw=[512],dh=[512],global_alpha=0.5)

  show(row(s1,s2))

bokehPlot(list(df[df.tumor_size == df.tumor_size.max()]['file_name'])[0], list(df[df.tumor_size == df.tumor_size.max()]['label'])[0])

#### Meningioma Plots

for fname in list(df[df.label == 1].sample(3)["file_name"]):
  bokehPlot(fname,1)

#### Glioma Plots

for fname in list(df[df.label == 2].sample(3)["file_name"]):
  bokehPlot(fname,2)

#### Pituitary Plots

for fname in list(df[df.label == 3].sample(3)["file_name"]):
  bokehPlot(fname,3)

# Preprocessing


Preprocessing ideas:

1.  Dataset has tumor region indicator which would allow us to get the average brightness of the area.

2. It is said that brightest region is skull and skull is not important for the tumor detection. It is only brain position determines the tumor class. If we remove skull remaining image is brain ?

3. if we start with a window of image which would maximize the presence of tumor and expand to include some brain region around the tumor then i guess it is the best data for training(and predicting). Because tumor position in brain is THE factor that decides the tumor class.

4. what is the optimum batch size for training?

5. what is the overall Image augumented training dataset size ?



## Train & Test split



In [0]:
def getSplit(df):
  df_test=df.sample(frac=.2)
  df = df.drop(df_test.index)
  return df, df_test

df_orig = df.copy()
df,df_test = getSplit(df)

## Batch Creation

In [0]:
df.groupby("label").agg("count").reset_index()

In [0]:
def returnBatchIndices(df,batch_size):
  label_1 = df[df.label == 1].index.tolist()
  label_2 = df[df.label == 2].index.tolist()
  label_3 = df[df.label == 3].index.tolist()

  label_list = []
  #print(len(label_1), len(label_2),len(label_3),list(range(0,max(len(label_1),len(label_2),len(label_3)),batch_size)))
  for i in range(0,max(len(label_1),len(label_2),len(label_3)),batch_size):
    label_list.append(label_1[i:i+batch_size] + label_2[i:i+batch_size] + label_3[i:i+batch_size])
  return label_list

#yieldbatch(df,5)
for batch in returnBatchIndices(df,5):
  print(batch)

print("Total Number of Batches: ", len(returnBatchIndices(df,5)))

In [0]:
df[df.index == 648]

In [0]:
for i in [1,2,3,4,5]:
  print(i)

### For Convolution

In [0]:
def returnABatch(df,batch_size):
  #returns a balanced label mri images
  index_list = returnBatchIndices(df,batch_size)
  #print("index list",len(list(index_list)))
  df2 = pd.get_dummies(df['label'], prefix = 'label')
  df = pd.concat([df,df2],axis=1)
  for j in index_list:
    batch_images=[]
    batch_labels=[]
    #print("j",j)
    for i in j:
      #print("i",i)
      label_list=[]
      image = retrieveImage(list(df[df.index == i]['file_name'])[0])
      transformed_image = image.reshape((512,512,1))
      batch_images.append(transformed_image)
      label_list.append(df[df.index == i]['label_1'].tolist()[0])
      label_list.append(df[df.index == i]['label_2'].tolist()[0])
      label_list.append(df[df.index == i]['label_3'].tolist()[0])
      batch_labels.append(label_list)
      #print("Batches :",len(batch_images),len(batch_labels))

    #from keras.utils import to_categorical
    #batch_labels = to_categorical(batch_labels)
    yield np.array(batch_images), np.array(batch_labels)

for i in returnABatch(df.reset_index(),2)  :
  if i[1].shape[0] < 6:
    print(len(i),len(i[1]))
    print(i[0].shape)
    print(i[1])
    break

### For Logistic Regression

In [0]:
def returnABatch1d(df,batch_size):
  #returns a balanced label mri images
  index_list = returnBatchIndices(df,batch_size)
  #print("index list",len(list(index_list)))
  df2 = pd.get_dummies(df['label'], prefix = 'label')
  df = pd.concat([df,df2],axis=1)
  for j in index_list:
    batch_images=[]
    batch_labels=[]
    #print("j",j)
    for i in j:
      #print("i",i)
      label_list=[]
      image = retrieveImage(list(df[df.index == i]['file_name'])[0])
      transformed_image = image.reshape(512*512)
      batch_images.append(transformed_image)
      label_list.append(df[df.index == i]['label_1'].tolist()[0])
      label_list.append(df[df.index == i]['label_2'].tolist()[0])
      label_list.append(df[df.index == i]['label_3'].tolist()[0])
      batch_labels.append(label_list)
      #print("Batches :",len(batch_images),len(batch_labels))

    #from keras.utils import to_categorical
    #batch_labels = to_categorical(batch_labels)
    yield np.array(batch_images), np.array(batch_labels)

for i in returnABatch1d(df.reset_index(),2)  :
  if i[1].shape[0] < 6:
    print(len(i),len(i[1]))
    print(i[0].shape)
    print(i[1])
    break

#Model Building


## CNN Approach using Tensorflow keras

In [0]:
model = tf.keras.models.Sequential([
            tf.keras.layers.Conv2D(64, (3,3), activation="relu", input_shape=(512,512,1))
            ,tf.keras.layers.MaxPooling2D(2,2)
            ,tf.keras.layers.Conv2D(64, (3,3), activation="relu")
            ,tf.keras.layers.MaxPooling2D(2,2)
            ,tf.keras.layers.Conv2D(64, (3,3), activation="relu")
            ,tf.keras.layers.MaxPooling2D(2,2)
            ,tf.keras.layers.Conv2D(64, (3,3), activation="relu")
            ,tf.keras.layers.MaxPooling2D(2,2)
            ,tf.keras.layers.Conv2D(64, (3,3), activation="relu")
            ,tf.keras.layers.MaxPooling2D(2,2)
            ,tf.keras.layers.Flatten()
            ,tf.keras.layers.Dropout(0.5)
            ,tf.keras.layers.Dense(512, activation="relu")
            ,tf.keras.layers.Dense(3,activation="softmax")            
])

model.compile(loss="categorical_crossentropy"
              ,optimizer= "adam"
              ,metrics=["accuracy"])
model.summary()

In [47]:
for t  in train_generator:
  print(t.shape)

(array([[[[-1.016162  , -1.016162  , -1.016162  ],
         [-1.016162  , -1.016162  , -1.016162  ],
         [-1.016162  , -1.016162  , -1.016162  ],
         ...,
         [-1.016162  , -1.016162  , -1.016162  ],
         [-1.016162  , -1.016162  , -1.016162  ],
         [-1.016162  , -1.016162  , -1.016162  ]],

        [[-1.016162  , -1.016162  , -1.016162  ],
         [-1.016162  , -1.016162  , -1.016162  ],
         [-1.016162  , -1.016162  , -1.016162  ],
         ...,
         [-1.016162  , -1.016162  , -1.016162  ],
         [-1.016162  , -1.016162  , -1.016162  ],
         [-1.016162  , -1.016162  , -1.016162  ]],

        [[-1.016162  , -1.016162  , -1.016162  ],
         [-1.016162  , -1.016162  , -1.016162  ],
         [-1.016162  , -1.016162  , -1.016162  ],
         ...,
         [-1.016162  , -1.016162  , -1.016162  ],
         [-1.016162  , -1.016162  , -1.016162  ],
         [-1.016162  , -1.016162  , -1.016162  ]],

        ...,

        [[-1.016162  , -1.016162  , -

KeyboardInterrupt: ignored

In [45]:
history=model.fit_generator(train_generator
                  #,steps_per_epoch=286
                  #, epochs=5
                  )

ValueError: ignored

## Logistic Regression using Tensorflow Keras

In [38]:
model = tf.keras.models.Sequential([
            tf.keras.layers.Dense(3,input_dim=512*512, activation="softmax")            
])

model.compile(loss="categorical_crossentropy"
              ,optimizer= "adam"
              ,metrics=["accuracy"])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 3)                 786435    
Total params: 786,435
Trainable params: 786,435
Non-trainable params: 0
_________________________________________________________________


batch_size = 64
steps_per_epoch = round(df.groupby("label").agg("count").reset_index()['pid'].max()/batch_size)

print("Total Training Dataset : ", df.shape[0])
print("Batch Size : ", batch_size)
print("Steps per epoch : ", steps_per_epoch)
print("Test Datasize shape : ", df_test.shape[0])

history=model.fit_generator(returnABatch1d(df,batch_size)
                  ,steps_per_epoch=steps_per_epoch
                  , epochs=5
                  )

In [0]:
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# Model Testing
* is the model bias for 256 size images ?
* is there any imbalance in 256 size images ?
* converting 512x512 to 256x256 size would definitely speed up the process but would it impact the accuracy ?
* is the model has better accuracy for any type of tumor class? (as we have imbalanced set ?

# Observations / Lesson Learnt:

* Iteration 1:
  * CNN of 512x512 took half an hour even on TPU
  * more and more convolution layer decreases the neurons required for training( duh!!!) and hence the batch size can be increased.
  * testing result was 49%. Not Acceptable.
  