# **Malaria Detection**

## <b>Problem Definition</b>



**The context:** Why is this problem important to solve?<br>
Malaria is a global health issue that affects millions of people, particularly in regions with limited access to healthcare resources. Early and accurate detection of malaria is crucial for effective treatment and prevention of complications. However, traditional laboratory diagnosis methods for malaria are time-consuming, require specialized expertise, and may suffer from inter-observer variability. By developing an automated system for malaria detection using computer vision and deep learning techniques, we can significantly improve the speed, accuracy, and accessibility of diagnosis. This can lead to earlier detection, prompt treatment, and ultimately save lives.


**The objectives:** What is the intended goal?<br>
The goal of this project is to build a computer vision model using deep learning algorithms that can accurately identify and classify red blood cells as parasitized (infected with the Plasmodium parasite) or uninfected (free of the parasite). The model should provide a reliable and efficient tool for malaria detection, enabling early diagnosis and timely intervention.

**The key questions:** What are the key questions that need to be answered?<br>

1.   How can we leverage machine learning and artificial intelligence techniques to automate malaria detection?
2.   What are the key features or patterns in the images of red blood cells that can distinguish between infected and uninfected cells?
3.   Which deep learning algorithms and computer vision techniques are most suitable for this task?
4.   How can we evaluate the performance of the developed model in terms of accuracy, precision, recall, and other relevant metrics?
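
For the fourth question, the usual starting point is the set of confusion-matrix counts. The sketch below is a minimal illustration in plain Python, assuming labels are encoded as 1 for parasitized and 0 for uninfected (an assumption; the encoding produced by a data loader may differ). The function name `binary_metrics` and the example labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only (1 = parasitized, 0 = uninfected)
example_true = [1, 1, 1, 0, 0, 0, 0, 1]
example_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(example_true, example_pred)
print(metrics)
```

For a screening task, recall on the parasitized class is often the metric to watch most closely, since a missed infection is usually costlier than a false alarm.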


**The problem formulation:** What is it that we are trying to solve using data science?<br>
Using data science, we aim to develop an efficient computer vision model that can accurately detect malaria by analyzing images of red blood cells. The model will be trained to differentiate between infected (parasitized) and uninfected red blood cells, based on visual patterns and features present in the images. The goal is to build a robust and reliable system that can automate the detection process, reducing the reliance on manual inspection and improving diagnostic accuracy. By leveraging machine learning and artificial intelligence techniques, we can enable early and accurate malaria diagnosis, particularly in resource-constrained areas where access to skilled healthcare professionals may be limited.




## <b>Data Description </b>

The dataset contains 24,958 training images and 2,600 test images, all in color, taken from microscope images of red blood cells. Each image belongs to one of the following categories:<br>


**Parasitized:** The parasitized cells contain the Plasmodium parasite which causes malaria<br>
**Uninfected:** The uninfected cells are free of the Plasmodium parasites<br>
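
A quick way to verify these totals is to count the files in each class folder. The sketch below assumes the common one-subfolder-per-class layout (for example `train/parasitized` and `train/uninfected`); the helper name `count_images_per_class` is hypothetical, and the demo runs on a temporary folder of empty placeholder files rather than on the real dataset.

```python
import tempfile
from pathlib import Path


def count_images_per_class(data_dir):
    """Return {class_folder_name: number_of_files} for a one-subfolder-per-class layout."""
    counts = {}
    for class_dir in sorted(Path(data_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    return counts


# Demo on a throwaway folder tree with empty placeholder files (not real images)
with tempfile.TemporaryDirectory() as tmp:
    for class_name, n_files in [("parasitized", 3), ("uninfected", 2)]:
        class_dir = Path(tmp) / class_name
        class_dir.mkdir()
        for i in range(n_files):
            (class_dir / f"cell_{i}.png").touch()
    demo_counts = count_images_per_class(tmp)

print(demo_counts)
```

On the real data, calling `count_images_per_class` on the train and test folders should give per-class counts that sum to the totals stated above.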

## <b>Start of the Screening </b>

Follow the general steps below to achieve the objective of this project.

In [24]:
# Upgrade pip through the interpreter; calling `pip` directly cannot replace its own executable on Windows
!python -m pip install --upgrade pip


In [7]:
# Mount Google Drive only when running on Colab; a local Jupyter kernel has no google.colab module
try:
    from google.colab import drive
    drive.mount('/content/drive')
except ModuleNotFoundError:
    print("google.colab is not available; using local data paths instead.")
#import tensorflow as tf
#from tensorflow.keras import models , layers

#dataset = tf.keras.preprocessing.image_dataset_from_directory(
#"cell_images",
#    shuffle=True,
    #image_size = (IMAGE_SIZE,IMAGE_SIZE)
#)

### <b>Loading libraries</b>

In [5]:
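import os

import cv2  # used later for the HSV conversion and Gaussian blurring
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns  # used later for the confusion-matrix heatmaps
import tensorflow as tf
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow.keras.preprocessing.image import ImageDataGenerator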


### <b>Load and reformat the data</b>

In [19]:
import zipfile

# Optional: extract the archives when running on Colab (left disabled for local runs)
# with zipfile.ZipFile("/content/drive/MyDrive/test.zip", "r") as zip_ref:
#     zip_ref.extractall("/content/drive/MyDrive/test_unzipped")
# with zipfile.ZipFile("/content/drive/MyDrive/train.zip", "r") as zip_ref:
#     zip_ref.extractall("/content/drive/MyDrive/train_unzipped")

# Local dataset folders, each containing one subfolder per class
train_data_dir = r'E:\Process\cell_images\train'
test_data_dir = r'E:\Process\cell_images\test'

### <b>Check the shape of train and test images</b>

In [21]:
import os
from PIL import Image

def first_image_size(data_dir):
    """Return the (width, height) of the first image inside the first class subfolder."""
    # The data folders hold one subfolder per class, so opening an entry of
    # data_dir directly would try to open a folder rather than an image.
    class_dir = os.path.join(data_dir, sorted(os.listdir(data_dir))[0])
    image_name = sorted(os.listdir(class_dir))[0]
    with Image.open(os.path.join(class_dir, image_name)) as image:
        return image.size

# Train data
print("Train Image Shape:", first_image_size(train_data_dir))

# Test data
print("Test Image Shape:", first_image_size(test_data_dir))

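### <b>Check the shape of train and test labels</b>

In [None]:
import os

def count_labels(data_dir):
    """Count one label per file across the class subfolders of data_dir."""
    # Listing data_dir alone only counts the class folders, not the labeled images.
    return sum(len(os.listdir(os.path.join(data_dir, class_name)))
               for class_name in os.listdir(data_dir))

# Train labels
train_labels_shape = count_labels(train_data_dir)
print("Train Labels Shape:", train_labels_shape)

# Test labels
test_labels_shape = count_labels(test_data_dir)
print("Test Labels Shape:", test_labels_shape)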


### <b>Check the minimum and maximum range of pixel values for train and test images</b>

In [None]:
# Note: train_generator and test_generator are created in the "Normalize the images" cell; run that cell first.
# Train images (first batch only)
train_min_value = np.min(train_generator[0][0])
train_max_value = np.max(train_generator[0][0])
print(f"Train Image Pixel Range: {train_min_value} - {train_max_value}")

# Test images (first batch only)
test_min_value = np.min(test_generator[0][0])
test_max_value = np.max(test_generator[0][0])
print(f"Test Image Pixel Range: {test_min_value} - {test_max_value}")


### <b>Count the number of values in both uninfected and parasitized</b>

In [None]:
# flow_from_directory assigns label indices in alphanumeric folder order,
# so look each index up by name instead of assuming uninfected = 0 and parasitized = 1
for class_name, class_index in train_generator.class_indices.items():
    class_count = np.sum(train_generator.classes == class_index)
    print(f"Count of {class_name} class:", class_count)


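### <b>Normalize the images</b>

In [None]:
# Image size and batch size are assumed values; the original notebook never defines them
img_width, img_height = 64, 64
batch_size = 32

# Training data: rescale pixels to [0, 1] and apply light augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# Test data: rescale only, no augmentation
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

# shuffle=False keeps predictions aligned with test_generator.classes
test_generator = test_datagen.flow_from_directory(
    test_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary',
    shuffle=False)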
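
The `rescale=1./255` argument multiplies every pixel value by 1/255, so 8-bit intensities in [0, 255] land in [0, 1]. The NumPy sketch below reproduces only that step on a made-up array; it does not reproduce the shear, zoom, or flip augmentations, and the helper name `rescale_batch` is hypothetical.

```python
import numpy as np


def rescale_batch(batch, factor=1.0 / 255):
    """Cast a batch of 8-bit images to float32 and scale it by `factor`."""
    return batch.astype(np.float32) * factor


# Made-up batch with one 1x1 RGB image covering the extremes of the 8-bit range
raw_batch = np.array([[[[0, 128, 255]]]], dtype=np.uint8)
scaled_batch = rescale_batch(raw_batch)
print(scaled_batch.min(), scaled_batch.max())
```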

### <b>Plot to check if the data is balanced</b>

In [None]:
# Get the class labels and the number of images in each class
class_labels = train_generator.class_indices
# bincount returns one count per label index; summing the labels would only count class 1
class_count = np.bincount(train_generator.classes)

# Plotting the class distribution
plt.figure(figsize=(8, 6))
plt.bar(list(class_labels.keys()), class_count)
plt.title("Class Distribution")
plt.xlabel("Class")
plt.ylabel("Count")
plt.show()


### <b>Data Exploration</b>
Visualize the images from the train data

In [None]:
# Visualize images from train data
images, labels = next(train_generator)

### <b>Visualize the images with subplot(6, 6) and figsize = (12, 12)</b>

In [None]:

# Plotting images
fig, axes = plt.subplots(6, 6, figsize=(12, 12))
axes = axes.ravel()

# Hide every axis first, then fill as many panels as the batch provides (at most 36)
for ax in axes:
    ax.axis('off')
for i in range(min(36, len(images))):
    axes[i].imshow(images[i])

plt.subplots_adjust(hspace=0.5)
plt.show()

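### <b>Plotting the mean images for parasitized and uninfected</b>

In [None]:
# Plotting mean images for parasitized and uninfected classes
# Look up label indices by folder name instead of assuming 0/1; the generator assigns
# indices in alphanumeric folder order. Folder names 'parasitized' and 'uninfected' are assumed.
class_index = {name.lower(): index for name, index in train_generator.class_indices.items()}
mean_parasitized_image = np.mean(images[labels == class_index['parasitized']], axis=0)
mean_uninfected_image = np.mean(images[labels == class_index['uninfected']], axis=0)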

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.imshow(mean_parasitized_image)
plt.title("Mean Parasitized Image")
plt.axis('off')

plt.subplot(1, 2, 2)
plt.imshow(mean_uninfected_image)
plt.title("Mean Uninfected Image")
plt.axis('off')

plt.tight_layout()
plt.show()

### <b>Converting Images from RGB to HSV using OpenCV</b>

###<b> Converting the train data

### <b>Converting the test data</b>

In [None]:
import cv2

# Converting test data (first batch only)
test_hsv_images = []
for image in test_generator[0][0]:
    hsv_image = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
    test_hsv_images.append(hsv_image)

test_hsv_images = np.array(test_hsv_images)

### <b>Processing Images using Gaussian Blurring</b>


###<b> Gaussian Blurring on train data

In [None]:
train_blurred_images = []
for image in train_generator[0][0]:
    blurred_image = cv2.GaussianBlur(image, (5, 5), 0)
    train_blurred_images.append(blurred_image)

train_blurred_images = np.array(train_blurred_images)


###<b> Gaussian Blurring on test data

In [None]:
test_blurred_images = []
for image in test_generator[0][0]:
    blurred_image = cv2.GaussianBlur(image, (5, 5), 0)
    test_blurred_images.append(blurred_image)

test_blurred_images = np.array(test_blurred_images)


## **Model Building**

### **Base Model**

**Note:** Build 3-5 models with CNN architectures, using custom or pretrained models of your choice. Start with a base model and iterate from there.

###<b> Importing the required libraries for building and training our base Model

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

####<B>One Hot Encoding the train and test labels

In [None]:
# One Hot Encoding the train and test labels
train_labels_encoded = tf.keras.utils.to_categorical(train_generator.classes)
test_labels_encoded = tf.keras.utils.to_categorical(test_generator.classes)

###<b> Building the model

In [None]:
# Building the model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(img_width, img_height, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

###<b> Compiling the model

In [None]:
# Compiling the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

<b> Using Callbacks

In [None]:
# Using Callbacks
callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint('base_model.h5', save_best_only=True)
]
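
To make the `patience=3` setting concrete, the sketch below is a simplified plain-Python model of early stopping on validation loss: training stops once the loss has failed to improve for `patience` consecutive epochs, and the best epoch is remembered so its weights could be restored. It is an illustration of the idea, not Keras's implementation; the function name and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Epochs are 0-indexed. Training stops at the first epoch where the loss
    has not improved for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering early stopping
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: the best value occurs at epoch 1
demo_losses = [0.50, 0.40, 0.45, 0.42, 0.41, 0.39]
stop_epoch, best_epoch = early_stopping_epoch(demo_losses, patience=3)
print(stop_epoch, best_epoch)
```

In this made-up run the later improvement at epoch 5 is never reached, which is the trade-off a small patience value makes.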

<b> Fit and train our Model

In [None]:
# Fit and train the model
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    epochs=10,
    validation_data=test_generator,
    validation_steps=test_generator.samples // batch_size,
    callbacks=callbacks
)
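
The `samples // batch_size` expressions use floor division, so a final partial batch is left out of each pass. The arithmetic below works this through for the dataset sizes given in the data description, assuming a batch size of 32 (an assumption; the notebook does not state its batch size). The helper name `steps_for` is hypothetical.

```python
import math


def steps_for(num_samples, batch_size):
    """Return (floor_steps, ceil_steps, samples_skipped_by_floor)."""
    floor_steps = num_samples // batch_size
    ceil_steps = math.ceil(num_samples / batch_size)
    skipped = num_samples - floor_steps * batch_size
    return floor_steps, ceil_steps, skipped


# 24,958 train and 2,600 test images; batch size 32 is assumed
train_steps = steps_for(24958, 32)
test_steps = steps_for(2600, 32)
print("train:", train_steps)
print("test:", test_steps)
```

If every test image should contribute to validation, using the ceiling value (or omitting `validation_steps` so the whole generator is consumed) is one option to consider.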

###<b> Evaluating the model on test data

In [None]:
# Evaluating the model on test data
test_loss, test_accuracy = model.evaluate(test_generator)
print(f'Test Loss: {test_loss:.4f}')
print(f'Test Accuracy: {test_accuracy:.4f}')

<b> Plotting the confusion matrix</b>

In [None]:
import seaborn as sns

# Predict the classes for test data
# The model has a single sigmoid unit, so threshold the probabilities at 0.5;
# argmax over an (N, 1) array would return 0 for every sample.
y_pred = model.predict(test_generator)
y_pred = (y_pred > 0.5).astype(int).ravel()

# Get the true classes for test data
y_true = test_generator.classes

# Plot the confusion matrix
cm = confusion_matrix(y_true, y_pred)
# Take class names from the generator so they match its label indices
class_names = list(test_generator.class_indices.keys())
plt.figure(figsize=(8, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False,
            xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
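
A model ending in `Dense(1, activation='sigmoid')` returns predictions of shape `(N, 1)`, so class labels come from thresholding the probability, not from `argmax`. The NumPy sketch below shows the difference on made-up probabilities; the 0.5 threshold is a conventional default, not a tuned value.

```python
import numpy as np

# Made-up sigmoid outputs for four samples, shape (4, 1)
probabilities = np.array([[0.91], [0.08], [0.55], [0.30]], dtype=np.float32)

# argmax along axis 1 sees a single column, so it can only ever return index 0
argmax_labels = np.argmax(probabilities, axis=1)

# Thresholding gives one 0/1 label per sample
threshold_labels = (probabilities > 0.5).astype(int).ravel()

print("argmax:   ", argmax_labels)
print("threshold:", threshold_labels)
```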

<b>Plotting the train and validation curves

In [None]:
# Plot the train and validation curves
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Accuracy/Loss')
plt.title('Train/Validation Curves')
plt.legend()
plt.show()

###<b> Model 1
####<b> Try to improve the performance of our model by tuning hyperparameters or building a different model using other techniques


###<b> Building the Model

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# Building the Model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(img_width, img_height, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

###<b> Compiling the model

In [None]:
# Compiling the model
model.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])

<b> Using Callbacks

In [None]:
# Using Callbacks
early_stop = EarlyStopping(patience=3, restore_best_weights=True)


<b>Fit and Train the model





In [None]:
# Fit and Train the model
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    epochs=10,
    validation_data=test_generator,
    validation_steps=test_generator.samples // batch_size,
    callbacks=[early_stop]
)

###<b> Evaluating the model

In [None]:
# Evaluating the model
loss, accuracy = model.evaluate(test_generator)
print("Test Loss:", loss)
print("Test Accuracy:", accuracy)

<b> Plotting the confusion matrix

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Predict classes for test data
y_pred = model.predict(test_generator)
y_pred_classes = np.round(y_pred)

# Get true classes
y_true = test_generator.classes

# Compute confusion matrix
confusion_mtx = confusion_matrix(y_true, y_pred_classes)

# Plot confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(confusion_mtx, annot=True, fmt="d", cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

<b> Plotting the train and the validation curves

In [None]:
# Plot training and validation curves
plt.figure(figsize=(10, 5))

# Plot training accuracy
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Train')
plt.plot(history.history['val_accuracy'], label='Validation')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

# Plot training loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Train')
plt.plot(history.history['val_loss'], label='Validation')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()

###<b> Model 2
####<b> Try to improve the performance of our model by tuning hyperparameters or building a different model using other techniques such as Augmentation, Batch Normalization, etc
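
As background for the Batch Normalization option, the sketch below shows the training-time computation in NumPy: each feature is standardized with the batch mean and variance, then scaled by `gamma` and shifted by `beta`. It is a simplified illustration under stated assumptions; a framework layer also learns `gamma` and `beta` and tracks moving averages for inference, which is not shown. The default `epsilon=1e-3` mirrors the Keras layer's default, and the input data is random.

```python
import numpy as np


def batch_norm_forward(x, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Standardize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)
    variance = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(variance + epsilon) + beta


# Random activations with a non-zero mean and a wide spread (256 samples, 8 features)
rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(256, 8))
normalized = batch_norm_forward(activations)

print(normalized.mean(axis=0).round(3))
print(normalized.std(axis=0).round(3))
```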

###<b> Use image data generator if augmentation is used

In [None]:
# Data augmentation and normalization
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

# Data generators
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

test_generator = test_datagen.flow_from_directory(
    test_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary',
    shuffle=False)

####<B>Visualizing images

In [None]:
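# Visualizing images
# class_indices maps name -> index; list its keys so a label index can look up its name
class_names = list(train_generator.class_indices.keys())
num_classes = len(class_names)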

sample_images, sample_labels = next(train_generator)

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i+1)
    plt.imshow(sample_images[i])
    plt.title(class_names[int(sample_labels[i])])
    plt.axis('off')
plt.show()

###<b>Building the Model

In [None]:
# Building the Model
from tensorflow.keras.layers import BatchNormalization  # not imported in the earlier cells

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(img_width, img_height, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(num_classes, activation='softmax'))

# Compiling the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

<b>Using Callbacks

In [None]:
# Callbacks
callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=3),
    tf.keras.callbacks.ModelCheckpoint('model.h5', save_best_only=True)
]


<b> Fit and Train the model

In [None]:
# Fit and train the model
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    epochs=10,  # `epochs` was never defined; 10 matches the earlier models
    validation_data=test_generator,
    validation_steps=test_generator.samples // batch_size,
    callbacks=callbacks)

###<B>Evaluating the model

In [None]:
# Evaluating the model
test_loss, test_accuracy = model.evaluate(test_generator)
print("Test Loss:", test_loss)
print("Test Accuracy:", test_accuracy)

<b>Plot the train and validation accuracy

In [None]:
# Plotting the train and validation accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()

<B>Plotting the classification report and confusion matrix

In [None]:
# Plotting the classification report and confusion matrix
test_predictions = model.predict(test_generator)
test_pred_labels = np.argmax(test_predictions, axis=1)

test_true_labels = test_generator.classes
class_labels = list(test_generator.class_indices.keys())

# Store the result under a new name so the imported classification_report function is not overwritten
report = classification_report(test_true_labels, test_pred_labels, target_names=class_labels)
confusion_mat = confusion_matrix(test_true_labels, test_pred_labels)

print("Classification Report:")
print(report)
print("Confusion Matrix:")
print(confusion_mat)


### **Model 3: Use a Pre-trained model (VGG16) or other**
- Import VGG16 or another pre-trained model up to any layer you choose
- Add Fully Connected Layers on top of it
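
One way to take VGG16 "up to" a chosen layer is to look the layer up by name and build the new head on its output. The sketch below is a hedged illustration: the cut point `block4_pool`, the 64x64 input size, and the two-class softmax head are all assumptions, and `weights=None` is used only so the sketch runs without downloading anything (a transfer-learning run would use `weights='imagenet'`).

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

# weights=None keeps the sketch offline; use weights='imagenet' for transfer learning
backbone = VGG16(weights=None, include_top=False, input_shape=(64, 64, 3))

# Freeze the convolutional layers so only the new head is trained
for layer in backbone.layers:
    layer.trainable = False

# Cut the network at an intermediate layer (assumed choice: block4_pool)
cut_layer = backbone.get_layer('block4_pool')

# Fully connected head on top of the truncated backbone
x = Flatten()(cut_layer.output)
x = Dense(128, activation='relu')(x)
outputs = Dense(2, activation='softmax')(x)

truncated_model = Model(inputs=backbone.input, outputs=outputs)
print(truncated_model.output_shape)
```

Cutting earlier keeps more generic features and a larger spatial map; cutting later keeps features more specific to the original ImageNet task. Which works better here is an empirical question.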

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import classification_report, confusion_matrix


# Load the pre-trained VGG16 model without the top classification layer
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(img_width, img_height, 3))

# Freeze the weights of the pre-trained layers
for layer in base_model.layers:
    layer.trainable = False

# Add your own fully connected layers on top of the pre-trained base model
x = Flatten()(base_model.output)
x = Dense(128, activation='relu')(x)
x = Dense(num_classes, activation='softmax')(x)

# Create the final model
model = Model(inputs=base_model.input, outputs=x)


###<b>Compiling the model

<b> using callbacks

In [None]:
# Compile the model
model.compile(optimizer=Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Callbacks
callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=3),
    tf.keras.callbacks.ModelCheckpoint('model.h5', save_best_only=True)
]

<b>Fit and Train the model

In [None]:
# Fit and train the model
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    epochs=10,  # `epochs` was never defined; 10 matches the earlier models
    validation_data=test_generator,
    validation_steps=test_generator.samples // batch_size,
    callbacks=callbacks)

<b>Plot the train and validation accuracy

In [None]:
# Plotting the train and validation accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()

###**Observations and insights: _____**

*   What can be observed from the validation and train curves?




###<b> Evaluating the model

In [None]:
# Evaluating the model
test_loss, test_accuracy = model.evaluate(test_generator)
print("Test Loss:", test_loss)
print("Test Accuracy:", test_accuracy)

<b>Plotting the classification report and confusion matrix

In [None]:
# Plotting the classification report and confusion matrix
test_predictions = model.predict(test_generator)
test_pred_labels = np.argmax(test_predictions, axis=1)

test_true_labels = test_generator.classes
class_labels = list(test_generator.class_indices.keys())

# Store the result under a new name so the imported classification_report function is not overwritten
report = classification_report(test_true_labels, test_pred_labels, target_names=class_labels)
confusion_mat = confusion_matrix(test_true_labels, test_pred_labels)

print("Classification Report:")
print(report)
print("Confusion Matrix:")
print(confusion_mat)

####<b> Observations and Conclusions drawn from the final model: _____








**Improvements that can be done:**<br>


*  **Can the model performance be improved using other pre-trained models or different CNN architecture?**


*  **Use other filtering or other techniques to improve performance, if you would like**



#### **Insights**

####**Refined insights**:
- What are the most meaningful insights from the data relevant to the problem?


####**Comparison of various techniques and their relative performance**:
- How do different techniques perform? Which one is performing relatively better? Is there scope to improve the performance further?


####**Proposal for the final solution design**:
- What model do you propose to be adopted? Why is this the best solution to adopt?
