<a href="https://colab.research.google.com/github/amitgal21/Final_Project/blob/main/Identify_Gram_Pos%26Neg.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

At this stage, we train our model on 80% of the dataset and reserve the remaining 20% for validation. With this split, we train a classifier on images of Gram-stained bacteria and then use it to predict the class of such images.
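To make the split concrete, the sketch below works through the arithmetic for a hypothetical dataset of 1,000 images (the real dataset size is not stated here). It assumes the validation count is rounded up, which to our understanding matches how scikit-learn's `train_test_split` treats a fractional `test_size`, and it uses a batch size of 32, the value passed to the data generator later in this notebook. The helper name `split_sizes` is ours.

```python
import math

def split_sizes(n_images, val_fraction=0.2):
    """Return (n_train, n_val) for a fractional validation split."""
    n_val = math.ceil(n_images * val_fraction)  # validation share, rounded up
    n_train = n_images - n_val                  # everything else is used for training
    return n_train, n_val

# Hypothetical dataset size, for illustration only
n_train, n_val = split_sizes(1000)

# Number of full batches of 32 images per epoch
steps_per_epoch = n_train // 32

print(n_train, n_val, steps_per_epoch)
```

With 1,000 images this gives 800 training images, 200 validation images, and 25 steps per epoch.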

Using the VGG16 model, we train on our dataset to distinguish between Gram-positive and Gram-negative bacteria. Training is too computationally intensive for a personal computer, so we use Google Colab and pay for additional compute.

This brings the training time down to approximately 3 hours, compared with an entire day on a regular computer.


In [None]:
from google.colab import drive
drive.mount('/content/drive')
# Connect to Google Drive to access our dataset of bacteria images
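Once the drive is mounted, it can help to confirm that the dataset folders are reachable and to see how many images each class contains before starting a long training run. The following is a minimal sketch using only the standard library; the helper names `count_tiffs` and `report_class_counts` are ours, and the folder paths are the ones used in the training cell further down.

```python
import os

def count_tiffs(folder):
    """Count files in `folder` whose name ends with .tif (case-insensitive)."""
    return sum(1 for name in os.listdir(folder) if name.lower().endswith('.tif'))

def report_class_counts(folders):
    """Print the number of TIFF images per class, or a note if a folder is missing."""
    for label, folder in folders.items():
        if os.path.isdir(folder):
            print(f"{label}: {count_tiffs(folder)} TIFF images")
        else:
            print(f"{label}: folder not found ({folder})")

report_class_counts({
    'Purple': '/content/drive/My Drive/Part_B/DataSet_Stage1/Purple_Bacteria',
    'Red': '/content/drive/My Drive/Part_B/DataSet_Stage1/Red_Bacteria',
})
```

A large difference between the two counts would suggest checking class balance before trusting the accuracy metric.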

This script trains a Convolutional Neural Network (CNN) to classify images of purple and red bacteria using transfer learning with the VGG16 architecture.

Here's a detailed explanation of each part of the code:

Importing necessary libraries: The script starts by importing required libraries for its operation, including NumPy for data processing, os for file system navigation, PIL for image processing, scikit-learn for data operations, Matplotlib for plotting graphs, and TensorFlow for model development and training.

Image preprocessing functions: Defines a function named preprocess_image for image preprocessing, and another function named load_tiff_images_from_folder for loading images from directories and assigning labels based on their folder names.

Paths to datasets: Defines paths to directories containing images of Positive and Negative Gram Staining photos.

Loading and labeling images: Utilizes the functions defined earlier to load images from the specified directories, preprocess them, and assign labels accordingly.

Combining datasets: Concatenates the loaded images and labels from both classes and converts them into NumPy arrays.

Splitting dataset into training and validation sets: Splits the data into training and validation sets using scikit-learn's train_test_split function.

Loading VGG16 model: Loads the VGG16 model pre-trained on the ImageNet dataset without its top layers.

Freezing base model layers: Freezes the layers of the pre-trained VGG16 model to prevent them from being updated during training.

Adding new layers: Adds new dense layers for binary classification on top of the VGG16 base model.

Compiling the model: Configures the model for training with an Adam optimizer and binary cross-entropy loss function.

Setting up Data Generator: Creates an ImageDataGenerator for data augmentation during training.

Preparing Data Generator for training: Configures the data generator for training using the flow method.

Training the model: Fits the model to the training data with the specified number of epochs, using the data generator for augmented images.

Saving the model: Saves the trained model to a file for future use.

In short, this script performs transfer learning with the VGG16 model to classify Gram-stained images as Gram-positive or Gram-negative.
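To give a sense of how much is actually trained once the VGG16 base is frozen, the sketch below counts the parameters of the new classification head. It assumes a 224x224 input and the head used in the code cell that follows: two dense layers of 4,096 units and a single sigmoid output. Dropout layers add no parameters. The helper name `dense_params` is ours.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# VGG16 has five 2x2 max-pooling stages, so a 224x224 input shrinks to 7x7,
# and the last convolutional block outputs 512 channels.
flattened = (224 // 2**5) ** 2 * 512  # 7 * 7 * 512

head = [
    dense_params(flattened, 4096),  # Flatten -> Dense(4096)
    dense_params(4096, 4096),       # Dense(4096) -> Dense(4096)
    dense_params(4096, 1),          # Dense(4096) -> Dense(1, sigmoid)
]
trainable = sum(head)

print(f"Flattened features: {flattened:,}")
print(f"Trainable head parameters: {trainable:,}")
```

The head alone has roughly 120 million trainable parameters, almost all of them in the first dense layer, which is one reason the training step benefits from the extra compute.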

In [None]:
# Importing necessary libraries
import numpy as np
import os
from PIL import Image
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Flatten, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam

# Function to preprocess image
def preprocess_image(img):
    img = img.resize((224, 224))  # Resize the image to match VGG16 input size
    img = np.array(img)
    if img.ndim == 2:  # Convert grayscale to RGB if necessary
        img = np.stack((img,)*3, axis=-1)
    img = img / 255.0  # Normalize the image
    return img

# Function to load images from a folder
def load_tiff_images_from_folder(folder, label):
    images = []
    labels = []
    for filename in os.listdir(folder):
        img_path = os.path.join(folder, filename)
        if img_path.lower().endswith('.tif'):
            img = Image.open(img_path)
            img = preprocess_image(img)
            images.append(img)
            labels.append(label)
    return images, labels

# Paths to datasets
train_path_purple = '/content/drive/My Drive/Part_B/DataSet_Stage1/Purple_Bacteria'
train_path_red = '/content/drive/My Drive/Part_B/DataSet_Stage1/Red_Bacteria'

# Loading and labeling images
purple_images, purple_labels = load_tiff_images_from_folder(train_path_purple, 0)
red_images, red_labels = load_tiff_images_from_folder(train_path_red, 1)

# Combining datasets and converting to numpy arrays
all_images = np.array(purple_images + red_images)
all_labels = np.array(purple_labels + red_labels)

# Splitting dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(all_images, all_labels, test_size=0.2, random_state=42)

# Loading VGG16 model without top layers
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freezing the weights of the base model
for layer in base_model.layers:
    layer.trainable = False

# Adding new layers to the model
x = Flatten()(base_model.output)
x = Dense(4096, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(4096, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(1, activation='sigmoid')(x)  # Binary classification layer

# Creating the new model
model = Model(inputs=base_model.input, outputs=predictions)

# Compiling the model
model.compile(optimizer=Adam(learning_rate=0.0001), loss='binary_crossentropy', metrics=['accuracy'])  # `lr` is deprecated in favor of `learning_rate`

# Setting up the Data Generator
data_gen = ImageDataGenerator(rotation_range=20, width_shift_range=0.2, height_shift_range=0.2, horizontal_flip=True)

# Preparing the Data Generator for training
train_gen = data_gen.flow(X_train, y_train, batch_size=32)

# Training the model
model.fit(train_gen, epochs=10, validation_data=(X_val, y_val), steps_per_epoch=len(X_train) // 32)

# Saving the model
model.save('/content/drive/My Drive/trained_vgg16_model.h5')

The code below loads the model trained above, which has learned to recognize red and purple bacteria. It then iterates over new bacteria images, preprocesses each one, and predicts its class: Gram-positive or Gram-negative.
Gram-positive bacteria stain purple, and Gram-negative bacteria stain red (pink).
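The model ends in a single sigmoid unit, so each prediction is one number between 0 and 1. Because the training script labels purple images 0 and red images 1, a value below 0.5 is read as purple and anything else as red. The sketch below isolates that decision rule; the function name `label_from_probability` and the example values are hypothetical.

```python
def label_from_probability(p, threshold=0.5):
    """Map a sigmoid output to a colour label (0 = Purple, 1 = Red in training)."""
    return 'Purple' if p < threshold else 'Red'

# Hypothetical sigmoid outputs, for illustration only
example_outputs = [0.03, 0.49, 0.5, 0.97]
labels = [label_from_probability(p) for p in example_outputs]

print(labels)
```

A value of exactly 0.5 falls on the red side of the rule, which mirrors the comparison used in the prediction loop.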



In [None]:
# Step 3.1: Load the Trained Model
from tensorflow.keras.models import load_model  # same Keras namespace as the training cell
from tensorflow.keras.preprocessing.image import load_img
import os
import numpy as np

# Load the trained model
model_path = '/content/drive/My Drive/trained_vgg16_model.h5'
model = load_model(model_path)

# Variant of preprocess_image that takes a file path and returns a batch of one image
def preprocess_image(image, target_size=(224, 224)):
    img = load_img(image, target_size=target_size)  # Load and resize the image
    img = np.array(img)
    if img.ndim == 2:  # Convert grayscale to RGB if necessary
        img = np.stack((img,) * 3, axis=-1)
    img = img / 255.0  # Normalize the image
    img = np.expand_dims(img, axis=0)
    return img

# Define the path to the verify images
verify_path = '/content/drive/MyDrive/Part_B/DataSet_Stage1/Red_Veirfy_test'

# Read each file in the verify directory and predict
for filename in os.listdir(verify_path):
    if filename.lower().endswith('.tif'):  # Case-insensitive extension check, as in the training loader
        image_path = os.path.join(verify_path, filename)
        new_image = preprocess_image(image_path)

        # Step 3.3: Use the Model to Predict the Class of the New Image
        prediction = model.predict(new_image)
        predicted_class = 'Purple' if prediction[0][0] < 0.5 else 'Red'
        print(f"File: {filename}, Predicted class: {predicted_class}")