# Malaria Detection

## Problem Summary

**The context:** Malaria is a life-threatening disease caused by Plasmodium parasites, transmitted through the bites of infected female Anopheles mosquitoes. It remains prevalent in tropical and subtropical regions, leading to over 500,000 deaths in 2023 alone, with 76% of these fatalities occurring in children under five. While malaria is not directly contagious between individuals, it can spread through blood transfusions or contaminated needles.

Early detection is crucial, as malaria is curable if treated promptly. Without timely intervention, however, it can become fatal within 24 hours. Symptoms range from mild (fever, chills, and headaches) to severe (extreme fatigue, confusion, seizures, and respiratory distress).
<br>
**The objectives:** Given the urgency of early diagnosis, this deep learning project aims to develop an efficient and accurate model to identify malaria-infected cells, ultimately supporting faster detection and improving patient outcomes.<br>
**The key questions:**

- Is the data for infected and uninfected cells balanced?

- Which deep learning architecture will be most appropriate for handling classification?

- How will overfitting be handled?

- What performance level should be prioritized?

- What is an appropriate measure of performance level for a medical diagnosis?

**The problem formulation:** Develop a deep learning-based image classification system that can accurately and efficiently identify malaria-infected cells from microscopic images, while ensuring the model is interpretable, unbiased, and scalable for real-world medical applications.

## Data Description

The dataset contains 24,958 training images and 2,600 test images, all in color, extracted from microscope images. Each image belongs to one of two categories:<br>


**Parasitized:** Parasitized cells contain the Plasmodium parasite, which causes malaria.<br>
**Uninfected:** Uninfected cells are free of Plasmodium parasites.<br>

## Import Libraries

In [2]:
# Data handling and image processing
import numpy as np
import os
import zipfile
import cv2  # installed as the opencv-python package, not as "cv2"
import h5py
import random


In [3]:
# Visualization and evaluation
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

In [4]:
# Deep learning
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization, LeakyReLU
from tensorflow.keras import backend
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

## Import Data

In [None]:
# Path to the HDF5 file holding the train and test splits
data_path = '/content/drive/MyDrive/PythonCourse/Data/MalariaDetection/data.h5'

# Read the arrays into memory; the file is closed when the block exits
with h5py.File(data_path, 'r') as f:
  train_data = np.array(f['train_data'])
  train_label = np.array(f['train_label'])
  test_data = np.array(f['test_data'])
  test_label = np.array(f['test_label'])

In [None]:
# Check shape of the data
train_data.shape, test_data.shape

In [None]:
# Check shape of the labels
train_label.shape, test_label.shape

In [None]:
# Normalize the data
train_data = train_data / 255.0
test_data = test_data / 255.0
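
As a quick sanity check on the scaling step above, the following sketch applies the same division to a small synthetic batch and confirms the result lies in [0, 1]. The array `fake_batch` and its 4x4 image size are made up for illustration. Casting to `float32` is an optional assumption, not something the notebook does; it halves memory relative to the `float64` that the division produces.

```python
import numpy as np

# Synthetic stand-in for image data: 2 RGB images of 4x4 pixels, 8-bit values
rng = np.random.default_rng(13)
fake_batch = rng.integers(0, 256, size=(2, 4, 4, 3), dtype=np.uint8)

# Same scaling as in the notebook; dividing a uint8 array by a float yields float64
scaled = fake_batch / 255.0

# Optional: store as float32 to halve the memory footprint
scaled32 = scaled.astype(np.float32)

print(scaled.dtype, scaled.min() >= 0.0, scaled.max() <= 1.0)
print(scaled32.nbytes, scaled.nbytes)
```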

## Define Functions

In [None]:
# Function for plotting individual images
def plot_image(data, num):

  # Plot the image at index num
  plt.imshow(data[num])

  # Remove ticks
  plt.xticks([])
  plt.yticks([])

  plt.show()

In [None]:
# Function for plotting an array of images
def plot_image_arr(data, label, num):

  # Define figure size
  plt.figure(figsize=(num, num))

  # Loop over the first num images (the 6x6 grid holds at most 36)
  for i in range(num):

    # Define subplots
    plt.subplot(6, 6, i+1)

    # Remove ticks
    plt.xticks([])
    plt.yticks([])

    # Add the class name as the y label
    if label[i] == 0:
      plt.ylabel('Uninfected')
    else:
      plt.ylabel('Parasitized')

    # Plot the image once, with its numeric label underneath
    plt.imshow(data[i])
    plt.xlabel(label[i])

  plt.show()
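
`plot_image_arr` places images on a fixed 6x6 grid, so it can show at most 36 images. A minimal sketch of deriving the grid from the number of images instead; `grid_shape` is a hypothetical helper, not part of the notebook.

```python
import math

def grid_shape(num, max_cols=6):
    """Return (rows, cols) for a grid of at most max_cols columns that fits num images."""
    if num < 1:
        raise ValueError("num must be at least 1")
    cols = min(num, max_cols)
    rows = math.ceil(num / cols)
    return rows, cols

print(grid_shape(12))  # (2, 6)
print(grid_shape(40))  # (7, 6)
print(grid_shape(4))   # (1, 4)
```

Inside the plotting loop, the result could be used as `plt.subplot(rows, cols, i + 1)`.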

In [None]:
# Function for plotting model recall and confusion matrix
# Uses the global test_data and the one-hot encoded test_label
def evaluate_model(history, model):
  classes = ['uninfected', 'parasitized']
  # Plot recall (assumes the model was compiled with a metric named 'recall')
  plt.plot(history.history['recall'])
  plt.plot(history.history['val_recall'])
  plt.title('Model Performance')
  plt.ylabel('Recall')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Validation'], loc='upper left')
  plt.show()

  # Plot confusion matrix
  pred = model.predict(test_data)
  pred = np.argmax(pred, axis=1)
  y_true = np.argmax(test_label, axis=1)
  cm = confusion_matrix(y_true, pred)
  sns.heatmap(cm, annot=True, fmt='d', xticklabels=classes, yticklabels=classes)
  plt.title('Confusion Matrix')
  plt.xlabel('Predicted Label')
  plt.ylabel('True Label')
  plt.show()
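
The confusion matrix plotted by `evaluate_model` has true labels on the rows and predicted labels on the columns, which is the layout `sklearn.metrics.confusion_matrix` returns. The sketch below rebuilds that layout with plain NumPy on made-up labels and derives recall (sensitivity) for the parasitized class, the quantity tracked by the training curve. The label arrays are invented for illustration, and `confusion_counts` is a hypothetical helper.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes=2):
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Made-up labels: 0 = uninfected, 1 = parasitized
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

cm = confusion_counts(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

recall = tp / (tp + fn)        # share of parasitized cells that were caught
specificity = tn / (tn + fp)   # share of uninfected cells correctly cleared
precision = tp / (tp + fp)     # share of positive calls that were right

print(cm)
print(recall, specificity, precision)
```

In a screening setting a missed infection is usually costlier than a false alarm, which is one reason to watch recall rather than accuracy alone.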

In [None]:
# Function for converting RGB to HSV for images
def rgb_to_hsv(data):
  arr = []

  for img in data:
    # Images were scaled to [0, 1] earlier; convert back to 8-bit before the color conversion
    img = (img * 255).astype(np.uint8)
    img_hsv = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    arr.append(img_hsv)

  # Convert to numpy array
  arr = np.array(arr)

  return arr
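
To see what the conversion does to a single pixel, here is a standard-library sketch that mimics the 8-bit convention OpenCV uses for `COLOR_RGB2HSV`: hue is stored as degrees divided by two (0 to 179), while saturation and value are scaled to 0 to 255. `hsv_8bit` is a hypothetical helper for illustration only, and its rounding may differ from OpenCV's by one unit on some pixels.

```python
import colorsys

def hsv_8bit(r, g, b):
    """Convert one RGB pixel with channels in [0, 1] to OpenCV-style 8-bit HSV."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # each component in [0, 1]
    # Hue covers 360 degrees; halving it lets it fit in one byte
    return (round(h * 180) % 180, round(s * 255), round(v * 255))

print(hsv_8bit(1.0, 0.0, 0.0))  # pure red
print(hsv_8bit(0.0, 1.0, 0.0))  # pure green
print(hsv_8bit(0.0, 0.0, 1.0))  # pure blue
```

Because hue uses a different range from saturation and value, arrays returned by `rgb_to_hsv` may need their own scaling before training: dividing every channel by 255 would leave hue in roughly [0, 0.70].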

In [None]:
# Function to create blurred images
def blur_image(data):
  blur = cv2.GaussianBlur(data, (5, 5), 0)
  return blur

## Exploratory Data Analysis

### Training Data

In [None]:
# Plot 1st image
plot_image(train_data, 0)

In [None]:
# Plot array of first 12 images
plot_image_arr(train_data, train_label, 12)

**Observations:**

### Testing Data

In [None]:
# Plot 1st image
plot_image(test_data, 0)

In [None]:
# Plot array of first 12 images
plot_image_arr(test_data, test_label, 12)

**Observations:**

### Check Balance of Data

In [None]:
# Plot pie chart for both training and test data
plt.pie(
    [np.sum(train_label == 0), np.sum(train_label == 1)],
    labels=['Uninfected', 'Parasitized'],
    autopct='%1.1f%%'
)

plt.title('Pie Chart for Training Data')
plt.show()

plt.pie(
    [np.sum(test_label == 0), np.sum(test_label == 1)],
    labels=['Uninfected', 'Parasitized'],
    autopct='%1.1f%%'
)

plt.title('Pie Chart for Test Data')
plt.show()
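
The pie charts show proportions; exact counts can be read off with `np.unique`. The sketch below uses a made-up label vector in place of `train_label`, and `class_balance` is a hypothetical helper.

```python
import numpy as np

def class_balance(labels):
    """Return {class_value: (count, fraction)} for an integer label array."""
    values, counts = np.unique(labels, return_counts=True)
    total = counts.sum()
    return {int(v): (int(c), float(c / total)) for v, c in zip(values, counts)}

# Made-up labels: 0 = uninfected, 1 = parasitized
fake_labels = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 1])
print(class_balance(fake_labels))
```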

## Data Transformation

In [None]:
# One hot encoding on train and test labels
train_label = to_categorical(train_label)
test_label = to_categorical(test_label)
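
For integer class labels, one-hot encoding maps label `k` to a vector with a 1 in position `k`. The NumPy sketch below reproduces that mapping on made-up labels and shows that `np.argmax` recovers the original integers, which is the step `evaluate_model` relies on. It illustrates the idea rather than Keras's implementation; `one_hot` is a hypothetical helper.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Map integer labels of shape (N,) to a float array of shape (N, n_classes)."""
    labels = np.asarray(labels).astype(int).ravel()
    return np.eye(n_classes, dtype=np.float32)[labels]

fake_labels = np.array([0, 1, 1, 0])
encoded = one_hot(fake_labels, n_classes=2)
decoded = np.argmax(encoded, axis=1)

print(encoded)
print(decoded)
```

Note that the class-balance cell compares labels to 0 and 1 directly, so it should run before this encoding step.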

## Building the Model

In [None]:
# Fix seed
random.seed(13)
np.random.seed(13)
tf.random.set_seed(13)
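
A small sketch of what seeding buys, using only `random` and NumPy so it runs without TensorFlow: reseeding with the same value reproduces the same draws. `seed_everything` is a hypothetical helper; in the notebook, `tf.random.set_seed(13)` plays the same role for TensorFlow's own generator.

```python
import random
import numpy as np

def seed_everything(seed):
    """Seed Python's and NumPy's global random generators."""
    random.seed(seed)
    np.random.seed(seed)

seed_everything(13)
first = (random.random(), np.random.rand(3).tolist())

seed_everything(13)
second = (random.random(), np.random.rand(3).tolist())

print(first == second)
```

Seeding makes weight initialization and shuffling repeatable, though some GPU operations can still be nondeterministic.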

### Model 1