# Clinical Heart Failure Detection Using Whole-Slide Images of H&E Tissue

## Version

- **0.05**: Migrated to Google Colab/Drive; changed the loss function to 'binary_crossentropy' and epochs to 1. Train/Val/Test Acc = 55.7/55.9/55.2
- **0.04**: Class label info in the list was not accurate; fixed it. Train/Validate/Test accuracy is now more realistic, ~55%
- **0.03**: Tweaked a CNN model built for MNIST to work with this dataset, giving a working end-to-end CNN model. Train/Validate/Test Accuracy = ~100%
- **0.02**: Prepare Train/Validate/Test Labels and Images
- **0.01**: Prepare Train/Validate/Test Images

## Improvement Opportunity

- Convert code sections in data preparation for train/validation/test to functions
- As this is a 2-class classification problem, the loss function can be changed from categorical_crossentropy to binary_crossentropy (applied in version 0.05)
- Reduce parameters, epochs.

## Download Dataset

### Download Train, Validate and Test Images
- Source Link to the Dataset / Annotation File: https://idr.openmicroscopy.org/webclient/?show=project-402
- Follow the instructions at the link above and install the IBM Aspera Desktop Client to download the dataset.
- Copy the following downloaded folders to the '**data/images**' folder in the working directory that contains this Jupyter Notebook:
  - 'held-out_validation'
  - 'training'

### Download Label Information for Train, Validate and Test Images 
- The IDR project page links to the GitHub repository that hosts the annotation file: https://idr.openmicroscopy.org/webclient/?show=project-402
- Source Link for the Annotation File: https://github.com/IDR/idr0042-nirschl-wsideeplearning/tree/master/experimentA
- Download the file '**idr0042-experimentA-annotation.csv**' and copy it to the '**data/labels/**' folder in the working directory that contains this Jupyter Notebook

## References

#### Data Preparation
- Access Google Drive files from Google Colab
  - https://www.youtube.com/watch?reload=9&v=lHRC5gFvQnA
- Reading an image
  - matplotlib: https://stackoverflow.com/questions/9298665/cannot-import-scipy-misc-imread
  - pathlib: https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
  - OpenCV: https://www.geeksforgeeks.org/python-opencv-cv2-imread-method/
- Load multiple images into a numpy array
  - glob / os.listdir: https://stackoverflow.com/questions/39195113/how-to-load-multiple-images-in-a-numpy-array
  - glob / cv2: https://medium.com/@muskulpesent/create-numpy-array-of-images-fecb4e514c4b
- Load a CSV file
  - Datacamp: https://www.datacamp.com/community/tutorials/pandas-read-csv
- Split a String
  - Python Central: https://www.pythoncentral.io/cutting-and-slicing-strings-in-python/


## Understand Dataset

### Understand Images Folder Structure and Number of Images Available

Training/Validation
- \..\training\fold_1: 770 images for training
- \..\training\test_fold_1: 374 images for validation
- Total = 770 + 374 = 1144 images

Test
- \..\held-out_validation: 1155 images for testing

### Understand Annotation File and Label Information Available

Relevant columns of interest:
- Column A: Dataset Name: Classifies each row/instance as 'training' or 'test'
- Column B: Image Name: Specifies filename of the image for the row/instance
- Column Z: Experimental Condition [Diagnosis]: has 3 classes:
  - 'chronic heart failure'
  - 'heart tissue pathology' - We will treat this as 'not chronic heart failure'
  - 'not chronic heart failure'
- Column AA: Channels: 'RGB', i.e. the images are color images with 3 channels (red, green, blue), which become the 3 input channels of the CNN.
  
Breakup of training/test instances:
- training
  - 'chronic heart failure' = 517
  - 'not chronic heart failure' = 627
- test
  - 'chronic heart failure' = 517
  - 'not chronic heart failure' = 638

Total 'training' = 517 + 627 = 1144  (Note: 'validate' is a portion of this 'training' set.)

Total 'test' = 517 + 638 = 1155

## Load Libraries

We need to read the train, validation and test images into arrays that we can feed to our CNN model, and import the annotation file into a dataframe so that we can access the label information.

In [None]:
# install the OpenCV package if it is missing (required only once; Colab usually ships with it)
# in a notebook cell, prefix the command with '!': !pip install opencv-python

In [None]:
# aids in reading image files
import cv2
import glob

In [None]:
# aids in working with arrays
import numpy as np

In [None]:
# aids in working with dataframes
import pandas as pd

We need to mount Google Drive so that we can access its files from Colab.

In [None]:
# run this cell, open the link it prints, get the authorization code, paste it into the cell and press Enter
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


## Get labels info into a dataframe

In [None]:
# Google Drive / Colab
filepath_annotation_file = r'/content/gdrive/MyDrive/Colab Notebooks/Clinical Heart Failure using H&E Images /data/labels/idr0042-experimentA-annotation.csv'
labels = pd.read_csv(filepath_annotation_file)

# Local Drive / Jupyter
# labels = pd.read_csv('data/labels/idr0042-experimentA-annotation.csv')

### Explore and Understand

In [None]:
# check that the contents of labels are as expected
labels

Unnamed: 0,Dataset Name,Image Name,Characteristics [Organism],Term Source 1 REF,Term Source 1 Accession,Characteristics [Organism Part],Term Source 2 REF,Term Source 2 Accession,Characteristics [Diagnosis],Term Source 3 REF,Term Source 3 Accession,Characteristics [Disease Subtype],Term Source 4 REF,Term Source 4 Accession,Characteristics [Sex],Term Source 5 REF,Term Source 5 Accession,Characteristics [Ethnic or Racial Group],Term Source 6 REF,Term Source 6 Accession,Characteristics [Age],Characteristics [Individual],Characteristics [Clinical History],Protocol REF,Protocol REF.1,Experimental Condition [Diagnosis],Channels
0,training,33381_0_fal_10_0.png,Homo sapiens,NCBITaxon,NCBITaxon_9606,heart,UBERON,UBERON_0000948,chronic heart failure,SNOMED,SNOMED_D3-16007,ischemic cardiomyopathy,CVDO,CVDO_0000558,male,PATO,PATO_0000384,African American,SNOMED,SNOMED_S-62310,65 years,33381,ischemic cardiomyopathy,treatment protocol,image acquisition,chronic heart failure,RGB
1,training,33381_0_fal_14_0.png,Homo sapiens,NCBITaxon,NCBITaxon_9606,heart,UBERON,UBERON_0000948,chronic heart failure,SNOMED,SNOMED_D3-16007,ischemic cardiomyopathy,CVDO,CVDO_0000558,male,PATO,PATO_0000384,African American,SNOMED,SNOMED_S-62310,65 years,33381,ischemic cardiomyopathy,treatment protocol,image acquisition,chronic heart failure,RGB
2,training,33381_0_fal_16_0.png,Homo sapiens,NCBITaxon,NCBITaxon_9606,heart,UBERON,UBERON_0000948,chronic heart failure,SNOMED,SNOMED_D3-16007,ischemic cardiomyopathy,CVDO,CVDO_0000558,male,PATO,PATO_0000384,African American,SNOMED,SNOMED_S-62310,65 years,33381,ischemic cardiomyopathy,treatment protocol,image acquisition,chronic heart failure,RGB
3,training,33381_0_fal_18_0.png,Homo sapiens,NCBITaxon,NCBITaxon_9606,heart,UBERON,UBERON_0000948,chronic heart failure,SNOMED,SNOMED_D3-16007,ischemic cardiomyopathy,CVDO,CVDO_0000558,male,PATO,PATO_0000384,African American,SNOMED,SNOMED_S-62310,65 years,33381,ischemic cardiomyopathy,treatment protocol,image acquisition,chronic heart failure,RGB
4,training,33381_0_fal_25_0.png,Homo sapiens,NCBITaxon,NCBITaxon_9606,heart,UBERON,UBERON_0000948,chronic heart failure,SNOMED,SNOMED_D3-16007,ischemic cardiomyopathy,CVDO,CVDO_0000558,male,PATO,PATO_0000384,African American,SNOMED,SNOMED_S-62310,65 years,33381,ischemic cardiomyopathy,treatment protocol,image acquisition,chronic heart failure,RGB
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2294,test,36175_1_nrm_18_0.png,Homo sapiens,NCBITaxon,NCBITaxon_9606,heart,UBERON,UBERON_0000948,not chronic heart failure,SNOMED,SNOMED_F-30001,,,,female,PATO,PATO_0000383,Caucasian,SNOMED,SNOMED_S-0003D,53 years,36175,normal cardiovascular function by cardiac cath...,treatment protocol,image acquisition,not chronic heart failure,RGB
2295,test,36175_1_nrm_1_0.png,Homo sapiens,NCBITaxon,NCBITaxon_9606,heart,UBERON,UBERON_0000948,not chronic heart failure,SNOMED,SNOMED_F-30001,,,,female,PATO,PATO_0000383,Caucasian,SNOMED,SNOMED_S-0003D,53 years,36175,normal cardiovascular function by cardiac cath...,treatment protocol,image acquisition,not chronic heart failure,RGB
2296,test,36175_1_nrm_20_0.png,Homo sapiens,NCBITaxon,NCBITaxon_9606,heart,UBERON,UBERON_0000948,not chronic heart failure,SNOMED,SNOMED_F-30001,,,,female,PATO,PATO_0000383,Caucasian,SNOMED,SNOMED_S-0003D,53 years,36175,normal cardiovascular function by cardiac cath...,treatment protocol,image acquisition,not chronic heart failure,RGB
2297,test,36175_1_nrm_21_0.png,Homo sapiens,NCBITaxon,NCBITaxon_9606,heart,UBERON,UBERON_0000948,not chronic heart failure,SNOMED,SNOMED_F-30001,,,,female,PATO,PATO_0000383,Caucasian,SNOMED,SNOMED_S-0003D,53 years,36175,normal cardiovascular function by cardiac cath...,treatment protocol,image acquisition,not chronic heart failure,RGB


In [None]:
print(labels['Dataset Name'])

0       training
1       training
2       training
3       training
4       training
          ...   
2294        test
2295        test
2296        test
2297        test
2298        test
Name: Dataset Name, Length: 2299, dtype: object


In [None]:
print(labels['Dataset Name'][0])

training


In [None]:
type(labels['Dataset Name'][0])

str

In [None]:
print(labels['Image Name'])

0       33381_0_fal_10_0.png
1       33381_0_fal_14_0.png
2       33381_0_fal_16_0.png
3       33381_0_fal_18_0.png
4       33381_0_fal_25_0.png
                ...         
2294    36175_1_nrm_18_0.png
2295     36175_1_nrm_1_0.png
2296    36175_1_nrm_20_0.png
2297    36175_1_nrm_21_0.png
2298     36175_1_nrm_2_0.png
Name: Image Name, Length: 2299, dtype: object


In [None]:
print(labels['Image Name'][0])

33381_0_fal_10_0.png


In [None]:
type(labels['Image Name'][0])

str

In [None]:
print(labels['Experimental Condition [Diagnosis]'])

0           chronic heart failure
1           chronic heart failure
2           chronic heart failure
3           chronic heart failure
4           chronic heart failure
                  ...            
2294    not chronic heart failure
2295    not chronic heart failure
2296    not chronic heart failure
2297    not chronic heart failure
2298    not chronic heart failure
Name: Experimental Condition [Diagnosis], Length: 2299, dtype: object


In [None]:
print(labels['Experimental Condition [Diagnosis]'][0])

chronic heart failure


In [None]:
type(labels['Experimental Condition [Diagnosis]'][0])

str

In [None]:
# confirm that empty cells have been read as NaN by checking one entry
print(labels['Characteristics [Disease Subtype]'][463])

nan


## Prepare Train Images and Train Labels

All the images are PNG files. We will read the file path of every file with the extension '.png' into a list. Here, 'file path' means 'directory + filename'.

In [None]:
# Google Drive / Colab
filepathlist_train = glob.glob('/content/gdrive/MyDrive/Colab Notebooks/Clinical Heart Failure using H&E Images /data/images/training/fold_1/*.png')

# Local Drive / Jupyter
# filepathlist_train = glob.glob('data/images/training/fold_1/*.png')
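`glob.glob` returns matches in an arbitrary, filesystem-dependent order, so the order of the images (and of the labels built from them) can differ between machines. A minimal sketch, assuming a reproducible order is wanted: list the PNG files with `pathlib` and sort them. The helper `list_png_files` and the temporary demo folder are hypothetical, for illustration only.

```python
import tempfile
from pathlib import Path


def list_png_files(folder):
    """Return the paths of all '*.png' files in `folder`, sorted by path."""
    # sorting makes the image order (and hence the label order) reproducible
    return sorted(str(p) for p in Path(folder).glob('*.png'))


# demo on a throwaway folder with two PNG files and one non-PNG file
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ['b_0.png', 'a_0.png', 'notes.txt']:
        (Path(demo_dir) / name).touch()
    demo_names = [Path(p).name for p in list_png_files(demo_dir)]

print(demo_names)  # ['a_0.png', 'b_0.png']
```

For the local layout this would be called as `list_png_files('data/images/training/fold_1')`.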

We will extract the filename of each image from its file path. The filename can then be used to look up the label information in the annotation file.

In [None]:
# confirm the list has the desired total number of items
len(filepathlist_train)

770

In [None]:
# check what an element in the file list contains
# it has both the directory information and the filename; we need to extract the filename
# the filename can then be used to look up the label info in the labels dataframe
filepathlist_train[0]

'/content/gdrive/MyDrive/Colab Notebooks/Clinical Heart Failure using H&E Images /data/images/training/fold_1/33392_0_fal_4_0.png'

Local Drive / Jupyter 

In [None]:
# on Windows, glob returns '\\' between the directory and the filename
# split the string
# directory, filename = filepathlist_train[0].split('\\')
# gives the directory info
# directory
# gives the filename we need
# filename

Google Drive / Colab

In [None]:
len(filepathlist_train[0])

128

In [None]:
filepathlist_train[0][-19:]

'33392_0_fal_4_0.png'

In [None]:
filepathlist_train[0][len(filepathlist_train[0]) - 1]

'g'

In [None]:
filepathlist_train[0][-1]

'g'

In [None]:
# POC
idx = -1
while (filepathlist_train[0][idx] != '/'):
  idx = idx - 1 
# index currently points to '/' location, we need to start reading from next location to get file name
print(idx)
filename = filepathlist_train[0][idx + 1:]
print(filename)

-20
33392_0_fal_4_0.png
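The backwards character scan above works, but the standard library can do the same in one call. A minimal sketch, assuming only the two path styles used in this notebook (Colab's '/' separators and the local Windows '\\' separator): `pathlib.PureWindowsPath` accepts both separators, so its `.name` returns the filename for either style, regardless of the OS the notebook runs on. The helper name `extract_filename` is hypothetical.

```python
from pathlib import PureWindowsPath


def extract_filename(filepath):
    """Return the last path component of a '/'- or '\\'-separated path."""
    # PureWindowsPath treats both '/' and '\\' as separators
    return PureWindowsPath(filepath).name


colab_path = '/content/gdrive/MyDrive/Colab Notebooks/Clinical Heart Failure using H&E Images /data/images/training/fold_1/33392_0_fal_4_0.png'
local_path = 'data/images/training/fold_1\\33392_0_fal_4_0.png'

print(extract_filename(colab_path))  # 33392_0_fal_4_0.png
print(extract_filename(local_path))  # 33392_0_fal_4_0.png
```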


In [None]:
# POC
index_filepathlist = 0
for filepath in filepathlist_train:
    #print(index_filepathlist, filepath)
    index_filepathlist += 1

In [None]:
# read a file using the list containing the file path
img = cv2.imread(filepathlist_train[0])

In [None]:
# define the empty lists that need to be populated
train_images = []
train_labels = []
index_filepathlist = 0
# iterate for all items in the file path list
for filepath in filepathlist_train:
    # prepare image list
    img = cv2.imread(filepath)
    train_images.append(img)
    # prepare labels list
    # extract filename from the file path
    # Local Drive / Jupyter
    # directory, filename = filepath.split('\\')
    # Google Drive / Colab
    index_character = -1
    while (filepathlist_train[index_filepathlist][index_character] != '/'):
      index_character = index_character - 1 
    # character index currently points to '/' location, we need to start reading from next location to get file name
    filename = filepathlist_train[index_filepathlist][index_character + 1:]
    index_filepathlist += 1
    # iterate for all items in our labels dataframe to search for the label
    for index in range(len(labels)):
        # we will compare the filename with all the filenames in the 'Image Name' column of the labels dataframe
        # when there is a match, we will copy the label from the 'Experimental Condition [Diagnosis]' column
        if (filename == labels['Image Name'][index]):
            label = labels['Experimental Condition [Diagnosis]'][index]
            # encode Class1 and Class0 as applicable
            if (label == 'chronic heart failure'):
                label = 1
            elif (label == 'not chronic heart failure'):
                label = 0
            elif (label == 'heart tissue pathology'):
                label = 0
            # append the label to the list
            train_labels.append(label) 
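The inner loop scans all 2,299 annotation rows for every image, and an image with no matching row is silently skipped, which would misalign the image and label lists. A minimal sketch of a function-based alternative, in the spirit of the "convert to functions" item under Improvement Opportunity and assuming the same 0/1 encoding as the loop above: build a filename-to-label dictionary once, then look each filename up and fail loudly if one is missing. The names `LABEL_ENCODING`, `build_label_lookup` and `encode_labels` are hypothetical.

```python
# same encoding as the loop above: 1 = chronic heart failure, 0 = otherwise
LABEL_ENCODING = {
    'chronic heart failure': 1,
    'not chronic heart failure': 0,
    'heart tissue pathology': 0,
}


def build_label_lookup(image_names, diagnoses):
    """Map each image filename to its encoded 0/1 label (one pass over the annotations)."""
    return {name: LABEL_ENCODING[diag] for name, diag in zip(image_names, diagnoses)}


def encode_labels(filenames, lookup):
    """Return labels in the same order as `filenames`; raise if any is unannotated."""
    missing = [f for f in filenames if f not in lookup]
    if missing:
        raise KeyError(f'{len(missing)} file(s) have no annotation, e.g. {missing[0]}')
    return [lookup[f] for f in filenames]


# toy stand-ins for labels['Image Name'] and labels['Experimental Condition [Diagnosis]']
demo_lookup = build_label_lookup(
    ['33381_0_fal_10_0.png', '36175_1_nrm_18_0.png'],
    ['chronic heart failure', 'not chronic heart failure'],
)
print(encode_labels(['36175_1_nrm_18_0.png', '33381_0_fal_10_0.png'], demo_lookup))  # [0, 1]
```

In the notebook, the lookup would be built once from the two dataframe columns and reused for the train, validation and test splits.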

In [None]:
# train_labels

Convert the images to a numpy array and confirm the shape is as required for the CNN.

In [None]:
# confirm you have got the desired total number of images in the list
len(train_images)

770

In [None]:
# train_images is a list
type(train_images)

list

In [None]:
# convert list to a numpy array and the values to float
train_images = np.array(train_images, dtype = 'float32')

In [None]:
# check the shape to confirm it is ready for the CNN
# (number of instances, height, width, number of channels)
# number of instances = number of images
# number of channels = 3 ... as these are color images
train_images.shape

(770, 250, 250, 3)
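`np.array` stacks the list of per-image arrays along a new first axis, which is where the leading 770 comes from. A small sketch with hypothetical toy images (all zeros, 4x5 pixels) shows the resulting layout. Each `cv2.imread` result is shaped (height, width, channels), and OpenCV stores the channels in BGR rather than RGB order; that is harmless here as long as every split is read the same way.

```python
import numpy as np

# three hypothetical 4x5 color "images", shaped like cv2.imread output: (height, width, channels)
toy_images = [np.zeros((4, 5, 3), dtype=np.uint8) for _ in range(3)]

stacked = np.array(toy_images, dtype='float32')
print(stacked.shape)  # (3, 4, 5, 3) -> (instances, height, width, channels)
print(stacked.dtype)  # float32
```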

Convert the labels to a numpy array and confirm the shape is as required for the CNN.

In [None]:
len(train_labels)

770

In [None]:
type(train_labels)

list

In [None]:
train_labels[0]

1

In [None]:
train_labels[432]

0

In [None]:
# convert list to a numpy array and the values to int64
train_labels = np.array(train_labels, dtype = 'int64')

In [None]:
# check the shape to confirm it is ready for CNN
train_labels.shape

(770,)

In [None]:
len(train_labels)

770

Let us check how many Class 0 and Class 1 labels we have.

In [None]:
count_ones = 0
count_zeroes = 0
for i in range(len(train_labels)):
    if (train_labels[i] == 1):
        count_ones += 1
    elif (train_labels[i] == 0):
        count_zeroes += 1
print('Total Labels:', (count_ones + count_zeroes))
print('# of Class 1:', count_ones)
print('# of Class 0:', count_zeroes)

Total Labels: 770
# of Class 1: 352
# of Class 0: 418
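The same counts can be obtained without an explicit loop. A minimal sketch using `np.bincount` on hypothetical toy labels; in the notebook it would be applied to the integer `train_labels` array, before `to_categorical`.

```python
import numpy as np

# hypothetical toy labels; in the notebook, use the 0/1 train_labels array
toy_labels = np.array([1, 0, 0, 1, 0], dtype='int64')

counts = np.bincount(toy_labels, minlength=2)  # counts[k] = number of labels equal to k
print('Total Labels:', counts.sum())  # 5
print('# of Class 1:', counts[1])     # 2
print('# of Class 0:', counts[0])     # 3
```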


We will one-hot encode the labels as 2-element vectors: [1, 0] for Class 0 and [0, 1] for Class 1. This matches the model's 2-unit output layer, so that we can train and test it.

In [None]:
from keras.utils import to_categorical

In [None]:
# convert labels to categorical
train_labels = to_categorical(train_labels)
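For two classes, `to_categorical` produces one row per label with a 1 at the label's index. A NumPy sketch of the same encoding on toy labels, meant to illustrate the result rather than reproduce Keras's implementation:

```python
import numpy as np

toy_labels = np.array([1, 0, 0, 1], dtype='int64')

# row k of the 2x2 identity matrix is the one-hot vector for class k
one_hot = np.eye(2, dtype='float32')[toy_labels]
print(one_hot)
# [[0. 1.]
#  [1. 0.]
#  [1. 0.]
#  [0. 1.]]
```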

## Prepare Validation Images and Validation Labels

In [None]:
# read filepath for all "filenames with extension as 'png'" into a list
# here filepath means 'relative directory + filename'

# Google Drive / Colab
filepathlist_validation = glob.glob('/content/gdrive/MyDrive/Colab Notebooks/Clinical Heart Failure using H&E Images /data/images/training/test_fold_1/*.png')

# Local Drive / Jupyter
# filepathlist_validation = glob.glob('data/images/training/test_fold_1/*.png')

In [None]:
# define the empty lists that need to be populated
validation_images = []
validation_labels = []
index_filepathlist = 0
# iterate for all items in the file path list
for filepath in filepathlist_validation:
    # prepare image list
    img = cv2.imread(filepath)
    validation_images.append(img)
    # prepare labels list
    # extract filename from the file path
    # Local Drive / Jupyter
    # directory, filename = filepath.split('\\')
    # Google Drive / Colab
    index_character = -1
    while (filepathlist_validation[index_filepathlist][index_character] != '/'):
      index_character = index_character - 1 
    # character index currently points to '/' location, we need to start reading from next location to get file name
    filename = filepathlist_validation[index_filepathlist][index_character + 1:]
    index_filepathlist += 1
    # iterate for all items in our labels dataframe to search for the label
    for index in range(len(labels)):
        # we will compare the filename with all the filenames in the 'Image Name' column of the labels dataframe
        # when there is a match, we will copy the label from the 'Experimental Condition [Diagnosis]' column
        if (filename == labels['Image Name'][index]):
            label = labels['Experimental Condition [Diagnosis]'][index]
            # encode Class1 and Class0 as applicable
            if (label == 'chronic heart failure'):
                label = 1
            elif (label == 'not chronic heart failure'):
                label = 0
            elif (label == 'heart tissue pathology'):
                label = 0
            # append the label to the list
            validation_labels.append(label) 

In [None]:
# convert list to a numpy array and the values to float
validation_images = np.array(validation_images, dtype = 'float32')

In [None]:
# check the shape to confirm it is ready for CNN
validation_images.shape

(374, 250, 250, 3)

In [None]:
# convert list to a numpy array and the values to int64
validation_labels = np.array(validation_labels, dtype = 'int64')

In [None]:
# check the shape to confirm it is ready for CNN
validation_labels.shape

(374,)

Let us check how many Class 0 and Class 1 labels we have.

In [None]:
count_ones = 0
count_zeroes = 0
for i in range(len(validation_labels)):
    if (validation_labels[i] == 1):
        count_ones += 1
    elif (validation_labels[i] == 0):
        count_zeroes += 1
print('Total Labels:', (count_ones + count_zeroes))
print('# of Class 1:', count_ones)
print('# of Class 0:', count_zeroes)

Total Labels: 374
# of Class 1: 165
# of Class 0: 209


In [None]:
# convert labels to categorical
validation_labels = to_categorical(validation_labels)

## Prepare Test Images and Test Labels

In [None]:
# read filepath for all "filenames with extension as 'png'" into a list
# here filepath means 'relative directory + filename'

# Google Drive / Colab
filepathlist_test = glob.glob('/content/gdrive/MyDrive/Colab Notebooks/Clinical Heart Failure using H&E Images /data/images/held-out_validation/*.png')

# Local Drive / Jupyter
# filepathlist_test = glob.glob('data/images/held-out_validation/*.png')

In [None]:
# define the empty lists that need to be populated
test_images = []
test_labels = []
index_filepathlist = 0
# iterate for all items in the file path list
for filepath in filepathlist_test:
    # prepare image list
    img = cv2.imread(filepath)
    test_images.append(img)
    # prepare labels list
    # extract filename from the file path
    # Local Drive / Jupyter
    # directory, filename = filepath.split('\\')
    # Google Drive / Colab
    index_character = -1
    while (filepathlist_test[index_filepathlist][index_character] != '/'):
      index_character = index_character - 1 
    # character index currently points to '/' location, we need to start reading from next location to get file name
    filename = filepathlist_test[index_filepathlist][index_character + 1:]
    index_filepathlist += 1
    # iterate for all items in our labels dataframe to search for the label
    for index in range(len(labels)):
        # we will compare the filename with all the filenames in the 'Image Name' column of the labels dataframe
        # when there is a match, we will copy the label from the 'Experimental Condition [Diagnosis]' column
        if (filename == labels['Image Name'][index]):
            label = labels['Experimental Condition [Diagnosis]'][index]
            # encode Class1 and Class0 as applicable
            if (label == 'chronic heart failure'):
                label = 1
            elif (label == 'not chronic heart failure'):
                label = 0
            elif (label == 'heart tissue pathology'):
                label = 0
            # append the label to the list
            test_labels.append(label) 


In [None]:
# convert list to a numpy array and the values to float
test_images = np.array(test_images, dtype = 'float32')

In [None]:
# check the shape to confirm it is ready for CNN
test_images.shape

(1155, 250, 250, 3)

In [None]:
# convert list to a numpy array and the values to int64
test_labels = np.array(test_labels, dtype = 'int64')

In [None]:
# check the shape to confirm it is ready for CNN
test_labels.shape

(1155,)

Let us check how many Class 0 and Class 1 labels we have.

In [None]:
count_ones = 0
count_zeroes = 0
for i in range(len(test_labels)):
    if (test_labels[i] == 1):
        count_ones += 1
    elif (test_labels[i] == 0):
        count_zeroes += 1
print('Total Labels:', (count_ones + count_zeroes))
print('# of Class 1:', count_ones)
print('# of Class 0:', count_zeroes)

Total Labels: 1155
# of Class 1: 517
# of Class 0: 638


In [None]:
# convert labels to categorical
test_labels = to_categorical(test_labels)

## Define Model

In [None]:
from keras import models

In [None]:
from keras import layers

In [None]:
model_cnn = models.Sequential()

Layer Details:
- 2-dimensional Convolution Layer
- Number of filters/kernels = 32
- Filter/Kernel Size = 3x3
- Activation Function = relu (introduces non-linearity)
- Input Shape = 250x250 pixels with 3 channels (as we have color images)

In [None]:
model_cnn.add(layers.Conv2D(32, (3,3), activation='relu', input_shape=(250,250,3)))

Layer Details:
- Downsample the output of the previous layer
- Take the maximum value in each non-overlapping 2x2 window (pool size 2, stride 2)

In [None]:
model_cnn.add(layers.MaxPooling2D(2,2))
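A minimal NumPy sketch of 2x2 max pooling with stride 2 on a single-channel toy input, to show what this layer computes. `MaxPooling2D` applies the same operation to each channel independently; the helper `max_pool_2x2` is hypothetical and, like the default `padding='valid'`, drops a trailing odd row or column.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling, stride 2, on a (height, width) array; odd edges are dropped."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    # group pixels into non-overlapping 2x2 blocks, then take each block's maximum
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


toy = np.array([[1, 3, 2, 0],
                [4, 2, 1, 5],
                [0, 1, 7, 2],
                [6, 2, 3, 1]])
print(max_pool_2x2(toy))
# [[4 5]
#  [6 7]]
```

Applied to a 248x248 feature map this gives 124x124, matching the first pooling layer in the model summary.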

Layer Details:
- 2-dimensional Convolution Layer
- Number of filters/kernels = 64
- Filter/Kernel Size = 3x3
- Activation Function = relu (introduces non-linearity)

In [None]:
model_cnn.add(layers.Conv2D(64, (3,3), activation = 'relu'))

Layer Details:
- Downsample the output of the previous layer
- Take the maximum value in each non-overlapping 2x2 window (pool size 2, stride 2)

In [None]:
model_cnn.add(layers.MaxPooling2D(2,2))

Layer Details:
- 2-dimensional Convolution Layer
- Number of filters/kernels = 64
- Filter/Kernel Size = 3x3
- Activation Function = relu (introduces non-linearity)

In [None]:
model_cnn.add(layers.Conv2D(64, (3,3), activation='relu'))

At this stage the data is a 3-dimensional tensor (59x59x64 per image). We will flatten it into a vector to feed to the fully connected (dense) layers.

In [None]:
model_cnn.add(layers.Flatten())

This dense layer has 64 outputs (units), with relu as the activation function (to learn non-linear relationships).

In [None]:
model_cnn.add(layers.Dense(64, activation = 'relu'))

This is the final layer. Hence, it has 2 outputs, corresponding to the 2 classes:
- chronic heart failure = yes: 1
- chronic heart failure = no: 0

The activation function chosen here is softmax, so that the 2 outputs are non-negative, sum to 1, and can be read as class probabilities.

In [None]:
model_cnn.add(layers.Dense(2, activation = 'softmax'))
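A minimal NumPy sketch of the softmax function on a hypothetical pair of raw scores, showing how the two outputs become probabilities. Subtracting the maximum score first is a standard numerical-stability trick and does not change the result.

```python
import numpy as np


def softmax(z):
    """Convert a vector of raw scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtracting the max avoids overflow; result unchanged
    return e / e.sum()


scores = np.array([2.0, 0.5])  # hypothetical raw outputs for [class 0, class 1]
probs = softmax(scores)
print(probs.round(3))  # [0.818 0.182]
```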

In [None]:
model_cnn.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 248, 248, 32)      896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 124, 124, 32)      0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 122, 122, 64)      18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 61, 61, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 59, 59, 64)        36928     
_________________________________________________________________
flatten (Flatten)            (None, 222784)            0         
_________________________________________________________________
dense (Dense)                (None, 64)                1
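The shapes and parameter counts in the summary follow from simple arithmetic. A sketch, assuming the `Conv2D` defaults of stride 1 and `padding='valid'` (so a 3x3 kernel trims 2 pixels from each side length) and pooling with size and stride 2: each filter has 3 × 3 × (input channels) weights plus one bias. The same arithmetic gives the first dense layer's parameter count, which shows that almost all of the model's parameters sit in that layer (relevant to the "Reduce parameters" item under Improvement Opportunity). The helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Output side length of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def conv_params(kernel, in_ch, filters):
    """Weights (kernel * kernel * in_ch per filter) plus one bias per filter."""
    return (kernel * kernel * in_ch + 1) * filters


side = conv_out(250)           # conv2d:          248
p1 = conv_params(3, 3, 32)     #                  896
side = side // 2               # max_pooling2d:   124
side = conv_out(side)          # conv2d_1:        122
p2 = conv_params(3, 32, 64)    #                  18496
side = side // 2               # max_pooling2d_1: 61
side = conv_out(side)          # conv2d_2:        59
p3 = conv_params(3, 64, 64)    #                  36928
flat = side * side * 64        # flatten:         222784
dense_params = flat * 64 + 64  # dense: weights + biases

print(side, flat, p1, p2, p3, dense_params)
# 59 222784 896 18496 36928 14258240
```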

## Define Optimizer, Loss Function and Metrics to be used for the Model
- Optimizer: rmsprop, a commonly used optimizer, kept as-is at this stage
- Loss: binary_crossentropy (see version 0.05)
- Metric: accuracy, to track the validation / test accuracy of the model

In [None]:
model_cnn.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
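With a 2-unit softmax output and one-hot labels, `binary_crossentropy` treats each output unit as a separate binary prediction and averages over the units. Because the two softmax outputs are p and 1 - p, both units contribute the same term, so in this two-class setup the result equals categorical cross-entropy (up to numerical clipping). A NumPy sketch with hypothetical toy values; the clipping constant `EPS` is an assumption, similar in spirit to what Keras does.

```python
import numpy as np

EPS = 1e-7  # assumed clipping constant: keeps probabilities away from 0 and 1 before taking logs


def binary_crossentropy(y_true, y_pred):
    """Per-sample BCE, averaged over the output units (last axis)."""
    p = np.clip(y_pred, EPS, 1 - EPS)
    return np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)), axis=-1)


def categorical_crossentropy(y_true, y_pred):
    """Per-sample CCE: minus the log of the probability given to the true class."""
    p = np.clip(y_pred, EPS, 1 - EPS)
    return -np.sum(y_true * np.log(p), axis=-1)


y_true = np.array([[0.0, 1.0], [1.0, 0.0]])  # one-hot: class 1, class 0
y_pred = np.array([[0.3, 0.7], [0.6, 0.4]])  # softmax outputs (rows sum to 1)

print(binary_crossentropy(y_true, y_pred).round(4))       # [0.3567 0.5108]
print(categorical_crossentropy(y_true, y_pred).round(4))  # [0.3567 0.5108]
```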

## Train and Validate the Model
#### We will now train the model using train images and train labels.
- We will use a batch size = 10.
- 1 epoch = 770 / 10 = 77 batches
- 1 epoch = 1 complete run over all train samples for training the model
- We will train for a total of 1 epoch (see version 0.05), i.e. 1 complete run over all train samples
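The number of batches per epoch is the number of samples divided by the batch size, rounded up, with any remainder forming a smaller final batch. A quick check using the split sizes listed earlier; the validation count assumes the validation batch size defaults to the training `batch_size` (the default in recent Keras `fit`; worth checking for the version you use).

```python
import math

batch_size = 10  # as passed to model_cnn.fit
steps = {}
for split, n in [('train', 770), ('validation', 374)]:
    steps[split] = math.ceil(n / batch_size)    # a final partial batch still counts as a step
    last = n - (steps[split] - 1) * batch_size  # size of the final batch
    print(split, steps[split], last)
# train 77 10
# validation 38 4
```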

#### We will validate the model using validation images and validation labels.

In [None]:
model_cnn.fit(train_images, train_labels, epochs = 1, batch_size = 10, validation_data = (validation_images, validation_labels))



<tensorflow.python.keras.callbacks.History at 0x7fa7b005f7b8>

## Test the Model
We will now test model's performance with the test data.
- We predict the class of each of the 1155 test images using the model.
- We then check the test accuracy.

In [None]:
test_loss, test_acc = model_cnn.evaluate(test_images, test_labels)



In [None]:
print('test accuracy:', (test_acc*100))

test accuracy: 55.23809790611267
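For context, a trivial classifier that always predicts Class 0 ('not chronic heart failure') would score the Class 0 share of each split. A quick check, using only the class counts printed earlier in this notebook:

```python
# class counts printed earlier in this notebook
splits = {
    'validation': {'class_0': 209, 'total': 374},
    'test': {'class_0': 638, 'total': 1155},
}

baselines = {}
for name, c in splits.items():
    baselines[name] = 100 * c['class_0'] / c['total']
    print(f'{name}: always-Class-0 accuracy = {baselines[name]:.2f}%')
# validation: always-Class-0 accuracy = 55.88%
# test: always-Class-0 accuracy = 55.24%
```

The test baseline, 638 / 1155 = 55.238...%, agrees with the reported test accuracy to several decimal places, and the validation baseline agrees with the 55.9% reported for version 0.05. This suggests, though it does not prove, that after 1 epoch the model predicts Class 0 for essentially every image, so this baseline is a useful sanity check when comparing future versions.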
