**Problem Statement: Automated Diagnosis of Diabetic Retinopathy**

**Background**  
Diabetic Retinopathy (DR) is a serious complication of diabetes that damages the retina, potentially leading to vision loss. With the rising prevalence of diabetes, DR has become a significant public health concern. Early detection and timely intervention are vital for preventing severe vision impairment and improving patient outcomes. However, the current methods of diagnosing DR through manual interpretation of retinal images are labor-intensive and prone to human error, making them inefficient and inconsistent.

**Objective**  
The goal is to develop an automated and accurate tool for diagnosing Diabetic Retinopathy using retinal images. This tool should assist healthcare professionals by providing consistent and reliable grading of DR severity, thus facilitating early intervention and personalized treatment plans.

**Dataset Description**  
The dataset comprises a large collection of high-resolution retinal images captured under various imaging conditions. Each image has been assessed by a medical professional, who determined the presence of Diabetic Retinopathy and assigned a binary rating:

- 0: No Diabetic Retinopathy  
- 1: Diabetic Retinopathy  

**Challenges**  
- **Subjectivity and Manual Labor**: Current methods rely on subjective assessments, which can lead to inconsistencies and inefficiencies.  
- **Increasing Prevalence**: The rising number of diabetes cases and the limited availability of ophthalmologists exacerbate the need for timely and accurate screening.  
- **Imaging Variability**: The dataset includes images captured under different conditions, which may affect the consistency of the automated diagnosis.  

**Solution**  
To address these challenges, an automated system utilizing advanced machine learning techniques will be developed. This system aims to:

- Accurately detect Diabetic Retinopathy (DR vs. No DR) from retinal images.  
- Provide consistent and reliable results, reducing the dependence on subjective human interpretation.  
- Enable early detection and intervention, improving patient outcomes.  

**Impact**  
Implementing an automated DR diagnosis system will streamline the screening process, reduce the workload on healthcare professionals, and ensure timely and accurate diagnosis. This will ultimately lead to better management of Diabetic Retinopathy and improved vision health for patients with diabetes.

In [17]:
# Standard library
import os
import random
from pathlib import Path

# Third-party libraries
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models

In [18]:
# Set seeds for reproducibility
tf.random.set_seed(42)
np.random.seed(42)
random.seed(42)

In [19]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [20]:
data_root_path = '/content/drive/My Drive/Diabetic_Retinopathy/train'
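# os.listdir returns entries in arbitrary order; sort so the integer labels are deterministic
total_classes = sorted(os.listdir(data_root_path))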
print(total_classes)

['DR', 'No_DR']


In [21]:
# Load images and labels
def load_data(data_path):
    images = []
    labels = []
    for label, category in enumerate(total_classes):
        category_path = Path(data_path) / category
        for image_path in category_path.glob('*.jpg'):
            img = tf.keras.preprocessing.image.load_img(image_path, target_size=(224, 224))
            img_array = tf.keras.preprocessing.image.img_to_array(img)
            images.append(img_array)
            labels.append(label)
    return np.array(images), np.array(labels)
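Because `load_data` assigns labels with `enumerate(total_classes)`, the integer label of each class follows the order of the folder list, not the 0/1 convention given in the dataset description. The following is a minimal sketch that makes the mapping explicit, assuming the two folder names printed earlier; `build_label_map` is a hypothetical helper, not part of the notebook.

```python
def build_label_map(class_names):
    """Map each class folder name to the integer label enumerate() would assign."""
    return {name: index for index, name in enumerate(class_names)}

# Folder names as printed by the notebook; sorted() makes the order deterministic.
class_names = sorted(['No_DR', 'DR'])
label_map = build_label_map(class_names)
print(label_map)
```

With this order, `DR` maps to label 0 and `No_DR` to label 1, so code that selects a class by number is safer looking the index up (for example `label_map['DR']`) than hard-coding it.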

In [23]:
X, y = load_data(data_root_path)
print(f'Total images: {len(X)}, Total labels: {len(y)}')

Total images: 2076, Total labels: 2076


In [None]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
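For a float `test_size`, scikit-learn rounds the test-set size up and gives the remainder to the training set, so the split sizes can be worked out by hand. The sketch below applies that arithmetic to the 2076 images loaded above; `split_sizes` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size, rounding the test part up."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

n_train, n_test = split_sizes(2076, 0.2)
print(f'train: {n_train}, test: {n_test}')
```

The split in the notebook is not stratified. Passing `stratify=y` to `train_test_split` is one way to keep the DR/No_DR ratio the same in both parts.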

In [None]:
# Normalize the data
X_train = X_train / 255.0
X_test = X_test / 255.0

In [None]:
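# Look up the label that load_data assigned to the DR folder.
# Labels follow the order of total_classes (['DR', 'No_DR']), so DR is 0, not 1.
DR_label = total_classes.index('DR')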

# Filter images belonging to the DR class
DR_indices = np.where(y_train == DR_label)[0]

plt.figure(figsize=(10, 10))
for i in range(3):
  ax = plt.subplot(1, 3, i + 1)
  plt.imshow(X_train[DR_indices[i]])
  plt.title("DR")
  plt.axis("off")

plt.show()

In [None]:
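# Look up the label that load_data assigned to the No_DR folder.
# Labels follow the order of total_classes (['DR', 'No_DR']), so No_DR is 1, not 0.
NO_DR_label = total_classes.index('No_DR')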

# Filter images belonging to the NO_DR class
NO_DR_indices = np.where(y_train == NO_DR_label)[0]


plt.figure(figsize=(10, 10))
for i in range(3):
  ax = plt.subplot(1, 3, i + 1)
  plt.imshow(X_train[NO_DR_indices[i]])
  plt.title("NO_DR")
  plt.axis("off")
plt.show()

In [None]:
# Define the CNN model
model = models.Sequential([
    layers.Conv2D(64, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(len(total_classes))  # one logit per class (DR, No_DR)
])
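The feature-map sizes in this stack can be traced by hand, which shows where most of the weights sit. The sketch below assumes the Keras defaults used above (3x3 convolutions with `'valid'` padding and stride 1, 2x2 max pooling with stride 2); the helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after max pooling with stride equal to the pool size."""
    return size // pool

def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * out_channels + out_channels

size, channels = 224, 3
shapes = []            # (height/width, channels) after each layer
total_conv_params = 0
for kind, filters in [('conv', 64), ('pool', None), ('conv', 128),
                      ('pool', None), ('conv', 128)]:
    if kind == 'conv':
        total_conv_params += conv_params(3, channels, filters)
        size, channels = conv_out(size), filters
    else:
        size = pool_out(size)
    shapes.append((size, channels))

flat_features = size * size * channels           # input width of Dense(128)
dense_128_params = flat_features * 128 + 128     # weights + biases

print(shapes)
print(flat_features, total_conv_params, dense_128_params)
```

Under these assumptions the `Flatten` layer feeds 346,112 values into `Dense(128)`, so that single layer holds about 44.3 million parameters, far more than the three convolutional layers combined (223,232).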

In [None]:
# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
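`SparseCategoricalCrossentropy(from_logits=True)` takes integer class labels and raw, unnormalized scores (logits). It applies a softmax internally and returns the negative log-probability of the true class. The following plain-Python sketch computes that value for a single sample with two classes; the logits are made-up numbers for illustration, and Keras additionally averages the loss over the batch.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, logits):
    """Negative log-probability that the softmax assigns to the integer label."""
    return -math.log(softmax(logits)[label])

logits = [2.0, 0.0]  # made-up scores: the model favors class 0
loss_if_true_class_is_0 = sparse_categorical_crossentropy(0, logits)
loss_if_true_class_is_1 = sparse_categorical_crossentropy(1, logits)
print(round(loss_if_true_class_is_0, 4), round(loss_if_true_class_is_1, 4))
```

The loss is small when the highest logit belongs to the true class and grows as the model becomes confidently wrong, which is the quantity the optimizer minimizes during training.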

In [None]:
# Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=32,
                    validation_data=(X_test, y_test))

In [None]:
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=1)
print(f'Test accuracy: {test_acc}')

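The model reaches a test accuracy of 94.95%, meaning it assigns the correct label (DR or No_DR) to 94.95% of the held-out test images. This is the overall accuracy across both classes, not the detection rate for diabetic retinopathy alone.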

In [None]:
# Visualize training history
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()


plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.show()

In [None]:
# Predict on the test set
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)

# Confusion matrix (rows: true class, columns: predicted class)
class_ids = list(range(len(total_classes)))
cm = confusion_matrix(y_test, y_pred_classes, labels=class_ids)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=total_classes)

# Plot size
plt.figure(figsize=(12, 12))
disp.plot(cmap=plt.cm.Blues, ax=plt.gca(), xticks_rotation='vertical')

# Font sizes
plt.xticks(fontsize=10)
plt.yticks(fontsize=10)
plt.xlabel('Predicted Label', fontsize=12)
plt.ylabel('True Label', fontsize=12)
plt.title('Confusion Matrix', fontsize=14)
plt.show()

The model correctly classified 207 test images that showed no diabetic retinopathy and 188 test images that showed diabetic retinopathy.
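
The two diagonal counts can be tied back to the reported test accuracy. The sketch below assumes a test set of 416 images (20% of 2076, rounded up); that size is inferred from the split parameters and is not printed anywhere in the notebook.

```python
correct_no_dr = 207   # correct No_DR predictions, from the confusion matrix
correct_dr = 188      # correct DR predictions, from the confusion matrix
n_test = 416          # assumption: ceil(0.2 * 2076) test images

correct = correct_no_dr + correct_dr
accuracy = correct / n_test
misclassified = n_test - correct

print(f'accuracy = {accuracy:.4f}, misclassified = {misclassified}')
```

Under that assumption the counts reproduce the 94.95% accuracy and leave 21 misclassified images. How those errors divide between missed DR cases and false alarms has to be read from the off-diagonal cells of the confusion matrix, which matters for screening because a missed DR case is usually the costlier mistake.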
