# **LifeHack 2020 - Problem #2**

Vision impairment can seriously affect one’s health. According to key facts published by the World Health Organization, “Globally, at least 2.2 billion people have a vision impairment or blindness, of whom at least 1 billion have a vision impairment that could have been prevented or has yet to be addressed.” Can you better detect potential vision issues and help change thousands, if not millions, of lives?

For example, the given dataset contains cataract and normal eye images for cataract detection. Participants are encouraged to use other available datasets to analyse and share their findings on detecting the potential risk of visual problems, preferably relating to the Asia region. This problem statement covers a wide range of visual impairments and does not need to focus solely on cataract; participants are free to explore the detection of other eye diseases.

* Present insight into the data by means of visualization and/or interactive tools
* Build a prediction model for eye disease

For reference:
https://www.kaggle.com/jr2ngb/cataractdataset?Vision 

In [23]:
import os
import numpy as np
import shutil
import matplotlib                  # 2D Plotting Library
import matplotlib.pyplot as plt
import seaborn as sns              # Python Data Visualization Library based on matplotlib
from sklearn.metrics import classification_report
#import geopandas as gpd            # Python Geospatial Data Library
plt.style.use('fivethirtyeight')
%matplotlib inline

In [24]:
# Importing all necessary libraries 
from keras.preprocessing.image import ImageDataGenerator 
from keras.preprocessing import image
from keras.models import Model, Sequential 
from keras.layers import Dense, Conv2D, GlobalAveragePooling2D
from keras.layers import Activation, Dropout, Flatten
from keras import backend as K
from keras.optimizers import SGD

from keras.applications import ResNet50

In [4]:
img_width, img_height = 224, 224

In [5]:
# Dataset root directory and per-class subfolder names
root_dir = 'cataract_dataset_bundle_archive'
normal = '/1_normal'
cataract = '/2_cataract'
glaucoma = '/2_glaucoma'
retina_disease = '/3_retina_disease'

In [17]:
#Creating Train / Val / Test folders 
root_dir = 'cataract_dataset_bundle_archive'
classes = ['/1_normal', '/2_cataract', '/2_glaucoma', '/3_retina_disease']

for class_type in classes:
    os.makedirs(root_dir + '/train' + class_type)
    os.makedirs(root_dir + '/val' + class_type)
    os.makedirs(root_dir + '/test' + class_type)

    src = root_dir + class_type

    allFiles = os.listdir(src)
    np.random.shuffle(allFiles)
    # Split the shuffled file names 70% / 15% / 15% into train / val / test
    train_Files, val_Files, test_Files = np.split(np.array(allFiles),
                                                  [int(len(allFiles)*0.7), int(len(allFiles)*0.85)])

    train_Files = [src+'/'+ name for name in train_Files.tolist()]
    val_Files = [src+'/' + name for name in val_Files.tolist()]
    test_Files = [src+'/' + name for name in test_Files.tolist()]

    print('Class: ', class_type)
    print('Total images: ', len(allFiles))
    print('Training: ', len(train_Files))
    print('Validation: ', len(val_Files))
    print('Testing: ', len(test_Files))

    for name in train_Files:
        shutil.copy(name, root_dir + "/train" + class_type)
    
    for name in val_Files:
        shutil.copy(name, root_dir + "/val" + class_type)
    
    for name in test_Files:
        shutil.copy(name, root_dir + "/test" + class_type)

Class:  /1_normal
Total images:  300
Training:  210
Validation:  45
Testing:  45
Class:  /2_cataract
Total images:  100
Training:  70
Validation:  15
Testing:  15
Class:  /2_glaucoma
Total images:  101
Training:  70
Validation:  15
Testing:  16
Class:  /3_retina_disease
Total images:  100
Training:  70
Validation:  15
Testing:  15
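
The per-class counts printed above follow directly from the two cut points passed to `np.split`. The sketch below reproduces that arithmetic with plain Python slicing and adds a fixed seed so the split is repeatable. The helper name `split_names`, its `seed` parameter, and the file names are illustrative assumptions, not part of the original notebook.

```python
import random

def split_names(names, train_frac=0.7, val_end_frac=0.85, seed=0):
    """Shuffle names reproducibly and cut them into train / val / test lists."""
    names = list(names)
    random.Random(seed).shuffle(names)        # seeded, so re-runs give the same split
    train_end = int(len(names) * train_frac)  # first cut point (70%)
    val_end = int(len(names) * val_end_frac)  # second cut point (85%)
    return names[:train_end], names[train_end:val_end], names[val_end:]

# Hypothetical file names standing in for the 101 glaucoma images
train, val, test = split_names([f"img_{i}.png" for i in range(101)])
print(len(train), len(val), len(test))
```

With 101 names this gives 70 / 15 / 16, matching the `/2_glaucoma` output: both cut points are truncated by `int`, so the leftover image lands in the test set.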


### Pre-processed data is stored in a new directory - "Data"

**Running ResNet50**

In [7]:
train_data_dir = 'data/train'
validation_data_dir = 'data/val'
nb_train_samples = 420 
nb_validation_samples = 90
epochs = 10
batch_size = 32

In [8]:
train_datagen = ImageDataGenerator(rescale=1. / 255,
    shear_range=0.2, zoom_range=0.2, horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size, class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(validation_data_dir,
    shuffle=False,
    target_size=(img_width, img_height),
    batch_size=batch_size, class_mode='categorical')

Found 420 images belonging to 4 classes.
Found 90 images belonging to 4 classes.


In [9]:
base_model = ResNet50(weights='imagenet', include_top=False)

In [10]:
base_model.summary()

                 
__________________________________________________________________________________________________
res4a_branch2a (Conv2D)         (None, None, None, 2 131328      activation_22[0][0]              
__________________________________________________________________________________________________
bn4a_branch2a (BatchNormalizati (None, None, None, 2 1024        res4a_branch2a[0][0]             
__________________________________________________________________________________________________
activation_23 (Activation)      (None, None, None, 2 0           bn4a_branch2a[0][0]              
__________________________________________________________________________________________________
res4a_branch2b (Conv2D)         (None, None, None, 2 590080      activation_23[0][0]              
__________________________________________________________________________________________________
bn4a_branch2b (BatchNormalizati (None, None, None, 2 1024        res4a_branch2b[0][0]      

In [11]:
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(64, activation='relu')(x)
predictions = Dense(4, activation='softmax')(x)

In [26]:
model = Model(inputs=base_model.input, outputs=predictions)
# Freeze the pre-trained ResNet50 layers so only the new head is trained
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
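
Because every ResNet50 layer is frozen, only the two new `Dense` layers are trained. The sketch below counts their weights, assuming the standard 2048-channel output of ResNet50's final convolutional block (so `GlobalAveragePooling2D` yields a vector of length 2048). `dense_params` is a helper introduced here purely for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

pooled_features = 2048                       # assumed: channels in ResNet50's last feature map
hidden = dense_params(pooled_features, 64)   # Dense(64, activation='relu')
output = dense_params(64, 4)                 # Dense(4, activation='softmax')
print(hidden, output, hidden + output)
```

Under that assumption the head has roughly 131k trainable parameters, fitted on 420 training images.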

In [27]:
%%time
model.fit_generator(train_generator, train_generator.n // batch_size, epochs=5, workers=4,
        validation_data=validation_generator, validation_steps=validation_generator.n // batch_size)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Wall time: 1h 5min 37s


<keras.callbacks.History at 0x2010663ddd8>
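
The step counts passed to `fit_generator` use floor division, so a final partial batch is not counted. The sketch below compares floor and ceiling step counts for the image counts reported by `flow_from_directory`; `steps` is a hypothetical helper, not a Keras function.

```python
import math

batch_size = 32

def steps(n_images, batch_size):
    """Return (floor, ceil) batch counts for n_images."""
    return n_images // batch_size, math.ceil(n_images / batch_size)

for split, n in [("train", 420), ("val", 90), ("test", 91)]:
    floor_steps, ceil_steps = steps(n, batch_size)
    covered = min(n, floor_steps * batch_size)   # images reachable in floor_steps batches
    print(f"{split}: floor={floor_steps} (covers at most {covered}/{n}), ceil={ceil_steps}")
```

With floor division a validation pass draws 2 batches, which is at most 64 of the 90 validation images. The test cell uses 3 steps, which is enough to cover all 91 test images.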

Adam seems to give the best results.

Optimizers tried: Adam, SGD (lr=1e-1, momentum=0.9, decay=1e-1 / epochs), and RMSprop.

In [17]:
test_data_dir = 'data/test'

testing_generator = test_datagen.flow_from_directory(test_data_dir,
    shuffle=False,
    target_size=(img_width, img_height),
    batch_size=batch_size, class_mode='categorical')

Found 91 images belonging to 4 classes.


In [25]:
# test model
print("-------------------------------  TESTING MULTI CLASSIFICATION MODEL  ----------------------------------------")
testing_generator.reset()
# ceil(n / batch_size) steps so that every test image is predicted exactly once
probabilities = model.predict_generator(testing_generator,
                                        steps=int(np.ceil(testing_generator.n / batch_size)))

# for each image in the testing set we need to find the index of the
# label with corresponding largest predicted probability
prediction_index = np.argmax(probabilities, axis=1)

# show a nicely formatted classification report
print(classification_report(testing_generator.classes, prediction_index,
                            target_names=list(testing_generator.class_indices.keys())))

-------------------------------  TESTING MULTI CLASSIFICATION MODEL  ----------------------------------------
                  precision    recall  f1-score   support

        1_normal       0.49      1.00      0.66        45
      2_cataract       0.00      0.00      0.00        15
      2_glaucoma       0.00      0.00      0.00        16
3_retina_disease       0.00      0.00      0.00        15

        accuracy                           0.49        91
       macro avg       0.12      0.25      0.17        91
    weighted avg       0.24      0.49      0.33        91
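
The report above is what a classifier produces when it assigns every test image to `1_normal`: recall is 1.00 for that class and 0.00 for the rest, and precision equals the share of normal images in the test set. The sketch below recomputes the headline numbers from the support counts alone. It is a sanity check on the table, not a re-run of the model.

```python
support = {"1_normal": 45, "2_cataract": 15, "2_glaucoma": 16, "3_retina_disease": 15}
total = sum(support.values())                      # 91 test images

# An always-"1_normal" predictor: every normal image is a true positive,
# every other image is a false positive for the normal class.
precision = support["1_normal"] / total
recall = 1.0
f1 = 2 * precision * recall / (precision + recall)

macro_precision = precision / len(support)         # the other classes contribute 0
macro_recall = recall / len(support)
macro_f1 = f1 / len(support)
weighted_f1 = f1 * support["1_normal"] / total     # weight = class share of support

print(round(precision, 2), round(f1, 2), round(macro_recall, 2), round(weighted_f1, 2))
```

These values match the table (0.49, 0.66, 0.25, 0.33), which suggests the network has collapsed onto the majority class rather than learned to separate the diseases.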

