# Problem Statement
TeamId: PTID-CDS-NOV-24-2192<br>
ProjectID: PRCP- 1001- RiceLeaf disease detection

The cultivation and harvesting of rice involves several processes, similar to those of other crops. However, rice crops are susceptible to diseases, which can be identified through their leaves. This notebook demonstrates how Convolutional Neural Networks (CNNs) can be utilized to detect diseases by training on pre-captured images of rice leaves, with a primary focus on bacterial damage, brown spot, and leaf smut.

In [None]:
filepath="/Data.zip"
# filepath must point to a .zip archive
from zipfile import ZipFile
try:
  with ZipFile(filepath,'r') as archive:  # named 'archive' to avoid shadowing the built-in zip()
    archive.extractall()
    print('Done')
except FileNotFoundError:
  print('File Not Found')

In [None]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.preprocessing.image import ImageDataGenerator as dg


# Structured data **preparation**

## Preparation for Training Dataset

In [None]:
path_for_training_data = "/content/Data/tRaning_Data"
path_for_validation_data="/content/Data/vAlidation_data"

In [None]:
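# Note: test_datagen and the prediction helpers in this notebook do not invert images,
# so training inputs are preprocessed differently from validation and prediction inputs.
train_Datagen=dg(rescale=1./255, # scale pixel values from [0, 255] to [0, 1]
                shear_range=0.2,
                zoom_range=0.2,
                horizontal_flip=True,
                preprocessing_function= lambda img : 255 - img ) # invert each image (negative); runs before rescale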


training_set = train_Datagen.flow_from_directory(path_for_training_data,
                                                 target_size=(64,64),
                                                 batch_size=32,
                                                 class_mode='categorical')

class_labels = list(training_set.class_indices.keys()) # get class labels from training_set
print(class_labels)
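
To make the pixel transforms of the two generators concrete, here is a minimal sketch in plain Python. It assumes the Keras `ImageDataGenerator` behaviour in which `preprocessing_function` runs before `rescale`, and it ignores the geometric augmentations (shear, zoom, flip). The names `train_pixel` and `validation_pixel` are illustrative and not part of the notebook; dividing by 255 stands in for multiplying by `1./255`.

```python
def train_pixel(p):
    """Training pipeline: invert the raw pixel (255 - p), then rescale to [0, 1]."""
    return (255 - p) / 255.0


def validation_pixel(p):
    """Validation pipeline: rescale only, no inversion."""
    return p / 255.0


# Compare what the two pipelines produce for the same raw pixel value.
for raw in (0, 64, 255):
    print(raw, round(train_pixel(raw), 3), round(validation_pixel(raw), 3))
```

For any raw value the two results add up to 1, which shows that the training images reach the network as negatives of what the validation images look like.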

## **Preparation for validation dataset**

In [None]:
test_datagen = dg(rescale = 1./255)



validation_dataset = test_datagen.flow_from_directory(path_for_validation_data,
                                                 target_size=(64,64),
                                                 batch_size=32,
                                                 class_mode='categorical')

# **Building A CNN network with very Basic construction**

In [None]:
seq = Sequential()
# Input layer: the first convolutional layer of the CNN
seq.add(Conv2D(filters=32,kernel_size=(3,3),activation='relu',input_shape=[64,64,3]))
seq.add(MaxPooling2D(pool_size=(2,2),strides=1))  # MaxPooling2D is the name imported above



## **Flattening**
### Everything above this block is the ***CNN*** (feature extraction) part; everything below is the ***ANN*** (dense classifier) part

In [None]:
seq.add(Flatten())

## **ANN**

In [None]:
# First dense (fully connected) layer of the ANN
seq.add(Dense(units=128,activation='relu'))

# Output layer: one unit per disease class
seq.add(Dense(units=3,activation='softmax'))

# Compile the model
seq.compile(optimizer='adam' , loss='categorical_crossentropy' , metrics=['accuracy'])
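
The output layer and loss used above can be illustrated without TensorFlow. The following is a minimal sketch of a softmax over three class scores and the categorical cross-entropy against a one-hot label. The helper names and the example scores are assumptions made for illustration; Keras computes these internally and with additional numerical safeguards.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


probs = softmax([2.0, 0.5, -1.0])  # illustrative scores for the 3 classes
loss = categorical_crossentropy([1, 0, 0], probs)
print([round(p, 3) for p in probs], round(loss, 3))
```

With equal scores for all three classes, each probability is 1/3 and the loss equals ln(3), which is the value to expect from an untrained model that guesses uniformly.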

# Simple **CNN** Architecture

In [None]:
def simple_Cnn():
  seq = tf.keras.models.Sequential()
  seq.add(Conv2D(filters=32,kernel_size=(3,3),activation='relu',input_shape=[64,64,3]))
  seq.add(MaxPooling2D(pool_size=(2,2),strides=1))
  seq.add(Flatten())
  seq.add(tf.keras.layers.Dense(units=128,activation='relu'))
  seq.add(tf.keras.layers.Dense(units=3,activation='softmax'))
  seq.compile(optimizer='adam' , loss='categorical_crossentropy' , metrics=['accuracy'])
  return seq
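
The size of this simple model can be checked by hand. The sketch below reproduces the shape and parameter arithmetic for the layers in `simple_Cnn()` using plain Python; `valid_out` and `simple_cnn_params` are illustrative helper names, not part of the notebook. It also shows what would change with a pooling stride of 2, which is the Keras default when `strides` is omitted.

```python
def valid_out(size, kernel, stride=1):
    """Output side length of an unpadded ('valid') conv or pooling layer."""
    return (size - kernel) // stride + 1


def simple_cnn_params(pool_stride):
    """Return (flattened length, total trainable parameters) for the simple CNN."""
    side = valid_out(64, 3)                 # Conv2D 3x3 on a 64x64 input
    conv = (3 * 3 * 3 + 1) * 32             # weights + bias for 32 filters
    side = valid_out(side, 2, pool_stride)  # MaxPooling2D 2x2
    flat = side * side * 32                 # vector length after Flatten
    dense1 = (flat + 1) * 128               # Dense(128)
    dense2 = (128 + 1) * 3                  # Dense(3) output layer
    return flat, conv + dense1 + dense2


print(simple_cnn_params(pool_stride=1))  # (119072, 15242627)
print(simple_cnn_params(pool_stride=2))  # (30752, 3937667)
```

Almost all parameters sit in the first dense layer, so the pooling stride has a large effect on model size.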



In [None]:
modelCnn = seq.fit(x=training_set,validation_data=validation_dataset,epochs=23)

In [None]:

import os

def iterate_and_predict(data_dir, datagen, model):
  for root, dirs, files in os.walk(data_dir):
    for file in files:
      if file.lower().endswith(('.png', '.jpg', '.jpeg')):  # Check for image files

        image_path = os.path.join(root, file)
        # Load and preprocess the image
        img = tf.keras.preprocessing.image.load_img(image_path, target_size=(64, 64))
        img_array = tf.keras.preprocessing.image.img_to_array(img)
        img_array = img_array / 255.0
        img_array = tf.expand_dims(img_array, 0)

        # Make prediction
        predictions = model.predict(img_array)
        # Process the prediction
        predicted_class = tf.argmax(predictions, axis=1)  # Get class index with highest probability
        class_labels = list(training_set.class_indices.keys()) # get class labels from training_set
        predicted_label = class_labels[predicted_class[0]]


        print(f"File: {file} || Predicted Class: || {predicted_label} || Predictions: {predictions}")


AlienDatapath= "/content/Data/Alien_data"
iterate_and_predict(AlienDatapath, test_datagen,modelCnn.model)
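
The step from a prediction vector to a label can be shown in isolation. This is a minimal sketch in plain Python; the class names in `class_indices` and the probability values are hypothetical placeholders, since the real mapping comes from `training_set.class_indices`.

```python
# Hypothetical mapping in the shape returned by flow_from_directory().class_indices
class_indices = {"class_a": 0, "class_b": 1, "class_c": 2}

# Order the label names by their index so that position i holds the label for class i.
class_labels = sorted(class_indices, key=class_indices.get)


def predicted_label(probabilities, labels):
    """Return the label whose probability is highest (a plain-Python argmax)."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best_index]


print(predicted_label([0.1, 0.7, 0.2], class_labels))  # class_b
```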

# Modified VGG-16 Architecture

## Key Characteristics:

- **Depth**: VGG-16 has 16 weight layers: 13 convolutional layers and 3 fully connected layers.
- **Uniformity**: The architecture stacks convolutional layers followed by max-pooling layers, with the number of filters increasing in deeper blocks. This uniform design makes it relatively simple to understand and implement.
- **Small Filter Sizes**: VGG-16 uses small 3x3 convolutional filters throughout the network. Two stacked 3x3 layers cover the same 5x5 receptive field as a single 5x5 layer while using fewer weights and adding an extra non-linearity.

**Advantages:**

- **Strong Performance**: VGG-16 has demonstrated strong performance on various computer vision tasks, including image classification and object recognition.
- **Simplicity**: The uniform architecture and small filter sizes make VGG-16 relatively easy to understand and implement.
- **Transfer Learning**: Pre-trained VGG-16 models can be used as a starting point for transfer learning, which can significantly improve the performance of models on new tasks with limited data.

**Disadvantages:**

- **Computational Cost**: The depth of VGG-16 and the large number of parameters can make it computationally expensive to train and use.
- **Memory Requirements**: The large number of parameters also requires significant memory resources.
- **Vanishing Gradients**: The deep architecture can be prone to vanishing gradients during training, which can slow down the learning process.
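
The point about small filters can be checked with simple arithmetic. The sketch below compares two stacked 3x3 convolutions with one 5x5 convolution at the same channel width; the helper names and the choice of 64 channels are assumptions for illustration, and bias terms are left out.

```python
def conv_weights(kernel, channels_in, channels_out):
    """Number of weights in one square convolution layer (biases ignored)."""
    return kernel * kernel * channels_in * channels_out


def receptive_field(kernel, n_layers):
    """Receptive field of n stacked stride-1 convolutions with the same kernel size."""
    return 1 + n_layers * (kernel - 1)


channels = 64
print(receptive_field(3, 2), 2 * conv_weights(3, channels, channels))  # 5 73728
print(receptive_field(5, 1), conv_weights(5, channels, channels))      # 5 102400
```

Both options see a 5x5 window of the input, but the stacked 3x3 pair needs 18 weights per channel pair instead of 25.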

In [None]:
def create_vgg16():
    model = Sequential()
    # With a 128x128 input the unpadded convolutions and poolings shrink the feature map
    # to 2x2 before the marked layer, so the model cannot be built (a shape error,
    # not a vanishing gradient). A 224x224 input leaves enough room.
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(256, (3, 3), activation='relu'))
    model.add(Conv2D(256, (3, 3), activation='relu'))
    model.add(Conv2D(256, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(512, (3, 3), activation='relu')) # marked layer: a 128x128 input is only 2x2 here
    model.add(Conv2D(512, (3, 3), activation='relu'))
    model.add(Conv2D(512, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(1024, activation='relu'))
    model.add(Dense(2048, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(optimizer='adam' , loss='categorical_crossentropy' , metrics=['accuracy'])
    return model
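
The effect of the input size on this network can be traced without building the model. The sketch below follows the side length of the feature map through the layer sequence of `create_vgg16()`, assuming the Keras defaults used there (no padding, pooling stride equal to the pool size). `LAYERS` and `trace_sides` are illustrative names.

```python
def valid_out(size, kernel, stride=1):
    """Output side length of an unpadded ('valid') conv or pooling layer."""
    return (size - kernel) // stride + 1


# 'c' = 3x3 convolution, 'p' = 2x2 max pooling with stride 2,
# in the same order as the layers in create_vgg16().
LAYERS = "ccp" "ccp" "cccp" "cccp" "cccp"


def trace_sides(side):
    """Return the side length after each layer, stopping once it drops below 1."""
    sides = []
    for layer in LAYERS:
        side = valid_out(side, 3) if layer == "c" else valid_out(side, 2, 2)
        sides.append(side)
        if side < 1:
            break
    return sides


print(trace_sides(224))  # ends at 1 after all 18 layers
print(trace_sides(128))  # stops at 0 on the first 512-filter convolution
```

The same trace shows that the 64x64 images produced by `training_set` are too small for this network as well.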


# Modified AlexNet Architecture

### Key Characteristics:

- **Depth**: AlexNet consists of eight layers: five convolutional layers and three fully connected layers.
- **ReLU Activation**: AlexNet used the ReLU (Rectified Linear Unit) activation function, which helps mitigate the vanishing gradient problem and improves training efficiency.
- **GPU Acceleration**: AlexNet was one of the first deep learning models to use GPUs for training, which significantly accelerated the training process.

**Advantages:**

- **State-of-the-Art Performance**: AlexNet achieved groundbreaking performance on the ImageNet dataset, significantly surpassing previous methods.
- **ReLU Activation**: ReLU improved training speed and accuracy compared to traditional activation functions like sigmoid and tanh.
- **GPU Acceleration**: Training on GPUs enabled faster and more efficient training of deep neural networks.

**Disadvantages:**

- **Large Model Size**: AlexNet is a relatively large model with a significant number of parameters, which can make it computationally expensive to train and deploy.
- **Overfitting**: AlexNet can be prone to overfitting, especially when trained on smaller datasets. Techniques like data augmentation and dropout were used to mitigate this issue.
- **Limited Applicability to Small Datasets**: Due to its size and complexity, AlexNet may need a large amount of data to avoid overfitting, so it may not suit small datasets.

In [None]:
from tensorflow.keras.layers import Dropout
def create_alexnet():
    model = Sequential()
    model.add(Conv2D(128, (11, 11), activation='relu', input_shape=(128, 128, 3)))
    model.add(MaxPooling2D(pool_size=(3, 3), strides=2))
    model.add(Conv2D(16, (5, 5), activation='relu'))
    model.add(MaxPooling2D(pool_size=(3, 3), strides=2))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(3, 3), strides=2))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(0.3))
    model.add(Dense(3, activation='softmax'))
    model.compile(optimizer='adam' , loss='categorical_crossentropy' , metrics=['accuracy'])
    return model
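
The `Dropout` layers above randomly silence units during training. The following is a minimal plain-Python sketch of inverted dropout, the scheme in which kept values are scaled by `1 / (1 - rate)` so that the expected activation stays the same. The function name and the example values are assumptions for illustration, not the Keras implementation.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale the survivors by 1 / (1 - rate)."""
    if rate == 0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 8
print(dropout(activations, 0.5, rng))
```

At inference time dropout is switched off and the activations pass through unchanged, which matches the `rate == 0` branch here.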


# Model Creation and summary

In [None]:
# Modified AlexNet Architecture
model_Alex = create_alexnet()
model_Alex.summary()

# Modified VGG-16 Architecture
model_vgg16 = create_vgg16()
model_vgg16.summary()

# Simple CNN Architecture
model_simple_CNN = simple_Cnn()
model_simple_CNN.summary()

# Model Evaluation

## Pre_Training_Function

In [None]:
def pre_trainingCheckpoints(T_set,V_set,model,C_P='pre_model'):

  from tensorflow.keras.callbacks import ModelCheckpoint

  C_P = C_P + '.keras'

  # Checkpoint callback: save the full model whenever val_accuracy improves
  model_checkpoint_callback = ModelCheckpoint(
    filepath = C_P,
    save_weights_only = False,
    monitor = 'val_accuracy',
    mode = 'max',
    save_best_only = True)

  # Short pre-training run. batch_size is not passed to fit() because the
  # generators already yield batches (batch_size=32 in flow_from_directory).
  preModel = model.fit(
    x=T_set,
    validation_data=V_set,
    epochs=5,
    callbacks=[model_checkpoint_callback])

  # preModel is the History object returned by fit(); C_P is the checkpoint path
  return preModel , C_P
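
The checkpoint settings `monitor='val_accuracy'`, `mode='max'` and `save_best_only=True` mean that the file is rewritten only when the validation accuracy strictly improves. The sketch below mimics that rule in plain Python; `epochs_that_save` is an illustrative name and the accuracy values are made up.

```python
def epochs_that_save(val_accuracies):
    """Return the epoch indices at which a 'save best only' checkpoint would be written."""
    best = float("-inf")
    saved = []
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:  # only a strict improvement triggers a save
            best = acc
            saved.append(epoch)
    return saved


print(epochs_that_save([0.50, 0.40, 0.60, 0.60, 0.70]))  # [0, 2, 4]
```

The file on disk therefore always holds the model from the best epoch so far, not necessarily the last one.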


In [None]:

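# The History objects get their own names so the model variables stay usable for later fit() calls.
# Each model also gets its own checkpoint name; otherwise all three runs overwrite the same file.
hist_simple_pre,filepathS = pre_trainingCheckpoints(training_set,validation_dataset,model_simple_CNN,'SimpleCnn_pre_model')

# NOTE: create_alexnet() expects 128x128 inputs and create_vgg16() expects 224x224,
# while training_set and validation_dataset yield 64x64 images. Build generators with a
# matching target_size before running the two calls below.
hist_alex_pre,filepathA = pre_trainingCheckpoints(training_set,validation_dataset,model_Alex,'AlexNet_pre_model')

hist_vgg16_pre,filepathV = pre_trainingCheckpoints(training_set,validation_dataset,model_vgg16,'VGG16_pre_model')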


# Fitting the model and Evaluation

In [None]:
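# Continue training from the checkpoint file saved during pre-training

saved_model = tf.keras.models.load_model(filepathS)

# batch_size is omitted: the generator already yields batches of 32
EvCNN = saved_model.fit(
                          x=training_set,
                          validation_data=validation_dataset,
                          epochs=5,
                          )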

In [None]:
# batch_size is omitted in the fit() calls below: the generator already yields batches of 32
SCnn = model_simple_CNN.fit(
                          x=training_set,
                          validation_data=validation_dataset,
                          epochs=35,
                          )

In [None]:
ACnn = model_Alex.fit(
                          x=training_set,
                          validation_data=validation_dataset,
                          epochs=21,
                          )

In [None]:
VCnn = model_vgg16.fit(
                          x=training_set,
                          validation_data=validation_dataset,
                          epochs=19,
                          )

# Model Graph Representation with Accuracy and Epochs

In [None]:
def model_evaluation(modelCnn):
  test_loss, test_acc = modelCnn.model.evaluate(validation_dataset)
  print(f"Validation Accuracy: {test_acc * 100:.2f}%")

  plt.plot(modelCnn.history['accuracy'], label='Training Accuracy')
  plt.plot(modelCnn.history['val_accuracy'], label='Validation Accuracy')
  plt.xlabel('Epochs')
  plt.ylabel('Accuracy')
  plt.legend()
  plt.show()
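
Besides plotting, the same history data can be summarised numerically. The sketch below works on a plain dictionary shaped like `History.history`; the accuracy values are made up for illustration, and `best_epoch` and `final_gap` are hypothetical helper names.

```python
# Made-up numbers in the same layout as History.history
history = {
    "accuracy": [0.60, 0.75, 0.88, 0.95],
    "val_accuracy": [0.58, 0.70, 0.72, 0.69],
}


def best_epoch(hist):
    """Index of the epoch with the highest validation accuracy."""
    val = hist["val_accuracy"]
    return max(range(len(val)), key=val.__getitem__)


def final_gap(hist):
    """Training minus validation accuracy at the last epoch; a large gap hints at overfitting."""
    return hist["accuracy"][-1] - hist["val_accuracy"][-1]


print(best_epoch(history), round(final_gap(history), 2))
```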

In [None]:
# Simple Model
model_evaluation(SCnn)
# AlexNet
model_evaluation(ACnn)
# VGG-16
model_evaluation(VCnn)

# Custom File Prediction
> Paste your image files into the Alien_Data folder



In [None]:

import os
path= "Data/Alien_Data/"

# saved_model = tf.keras.models.load_model('best_model.keras')

def dir_images(path,model):
    # Class labels in index order, taken once from the training generator
    class_labels = list(training_set.class_indices.keys())
    for fileName in os.listdir(path):
        if fileName.lower().endswith(('.png', '.jpg', '.jpeg')):
            # Load the image from the directory and scale it the same way as the validation data
            img = tf.keras.preprocessing.image.load_img(os.path.join(path, fileName), target_size=(64, 64))
            img_array = tf.keras.preprocessing.image.img_to_array(img)
            img_array = img_array / 255.0
            img_array = tf.expand_dims(img_array, 0)  # add the batch dimension

            # Predict and map the most probable class index to its label
            predictions = model.predict(img_array)
            predicted_class = int(tf.argmax(predictions, axis=1)[0])
            predicted_label = class_labels[predicted_class]
            print(f"Prediction for {fileName}: ----> {predicted_label}")

dir_images(path,saved_model) # pass a model directly, or load one from a .keras file if it exists

# Project Summary
>> A neural network is a powerful but complex method that is very sensitive to parameter changes.<br>
>> Image quality is crucial: each image must show the leaf clearly.<br>
>> For long training runs, the learning progress can be saved to a checkpoint file and training resumed from it, monitoring the accuracy at each epoch.<br>
>> The graph illustrates the validation and training performance throughout each epoch.<br>


>> Modified versions of different architectures are used here.<br>
>> The custom prediction functions let you run predictions on your own images.<br>

# Challenges
>>**Data Requirements**: CNNs typically require large amounts of labeled data for effective training. Acquiring and labeling such datasets can be time-consuming and expensive.<br>

>>**Computational Costs**: Training CNNs can be computationally expensive, especially for deep architectures. This necessitates powerful hardware resources and can make training time extensive.<br>
  
>>**Overfitting**: CNNs can be prone to overfitting, where the model performs well on the training data but poorly on unseen data. Techniques like regularization and data augmentation are used to mitigate this.<br>
 
>>**Interpretability**: CNNs are often considered "black box" models, making it difficult to understand why they make certain predictions. This lack of transparency can be a concern in critical applications.<br>
 
>>**Hardware Limitations**: Training and deploying large CNN models can be challenging due to memory and processing power limitations of available hardware.<br>

## Gradient Vanishing

In essence: Gradient vanishing is a phenomenon that occurs during the training of deep neural networks, particularly those with many layers. It refers to the situation where the gradients of the loss function with respect to the earlier layers in the network become extremely small.

**Impact:**

- **Slow Training**: As gradients approach zero, the weight updates during backpropagation become very small. This significantly slows down the training process, making it difficult for the network to learn effectively.
- **Difficulty in Learning Deeper Representations**: Smaller gradients hinder the flow of information from later layers to earlier layers. This prevents the network from learning meaningful representations in the initial layers, which are crucial for capturing essential features.

**Why it happens:**

- **Chain Rule**: Backpropagation relies on the chain rule to compute gradients for each layer. In deep networks, the repeated multiplication of small gradients across many layers leads to a product that quickly approaches zero.
- **Activation Functions**: Some activation functions, like sigmoid and tanh, tend to saturate (produce outputs close to their limits) in certain regions. This can result in very small derivatives, contributing to the vanishing gradient problem.

**Mitigation Strategies:**

- **ReLU (Rectified Linear Unit)**: ReLU is a popular activation function that helps alleviate vanishing gradients. It introduces non-linearity without saturating for positive inputs, leading to larger gradients.
- **Leaky ReLU, ELU, etc.**: Variants of ReLU (like Leaky ReLU, ELU) address the "dying ReLU" problem (where ReLU units can become inactive) and further improve gradient flow.
- **Batch Normalization**: This technique helps stabilize the training process by normalizing the activations of each layer, reducing internal covariate shift and improving gradient flow.
- **Residual Connections**: Skip connections in residual networks allow gradients to flow more directly through the network, bypassing some layers and preventing them from vanishing.
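
The chain-rule argument can be made concrete with a few lines of arithmetic. The sketch below is a simplified illustration that ignores the weight matrices and only multiplies the activation derivatives across layers; the helper names and the depth of 16 are assumptions for illustration.

```python
import math


def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def chained_gradient(local_grad, depth):
    """Product of the same local derivative repeated across `depth` layers."""
    return local_grad ** depth


depth = 16
print(chained_gradient(sigmoid_grad(0.0), depth))  # 0.25 ** 16, about 2.3e-10
print(chained_gradient(relu_grad(1.0), depth))     # 1.0
```

Even in the best case for the sigmoid, the factor that reaches the first layer is tiny, while active ReLU units pass the gradient through unchanged.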

# <center>The End</center>