# Plant Disease Identification

This Jupyter Notebook aims to implement a Convolutional Neural Network (CNN) for the identification of plant diseases based on images. Accurate and timely identification of plant diseases is crucial for effective agricultural management, enabling early intervention and mitigation strategies.

## About Dataset

This dataset was recreated using offline augmentation from the original dataset, which is available on GitHub. It consists of about 87K RGB images of healthy and diseased crop leaves, categorized into 38 classes. The full dataset is split 80/20 into training and validation sets, preserving the directory structure. A separate directory containing 33 test images was created later for prediction.

## Importing Necessary Libraries

In [1]:
import os
import warnings
warnings.filterwarnings('ignore')  # Ignore all warnings

import pandas as pd
import tensorflow as tf              # Import TensorFlow library

# Set TensorFlow logging verbosity to ERROR to suppress warnings
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
tf.get_logger().setLevel('ERROR')  # Set logging level to ERROR
from tensorflow.keras import layers, models  # Import specific modules from Keras, which is now part of TensorFlow
from tensorflow.keras.preprocessing.image import ImageDataGenerator  # Import image data generator for data augmentation
import matplotlib.pyplot as plt    # Import matplotlib for data visualization
import pickle                       # Import pickle for saving/loading data in binary format


2024-02-03 10:10:01.740403: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-03 10:10:01.740461: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-03 10:10:01.742596: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-03 10:10:01.754090: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


## Loading and Exploring the data

In [2]:
train_dir = "../data/raw/plant_diseases_data/train"
validation_dir = "../data/raw/plant_diseases_data/valid"
test_dir = "../data/raw/plant_diseases_data/test"

In [3]:
diseases = os.listdir(train_dir)
# Print the class (folder) names
print(diseases)

['Apple___Apple_scab', 'Apple___Black_rot', 'Apple___Cedar_apple_rust', 'Apple___healthy', 'Blueberry___healthy', 'Cherry_(including_sour)___healthy', 'Cherry_(including_sour)___Powdery_mildew', 'Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot', 'Corn_(maize)___Common_rust_', 'Corn_(maize)___healthy', 'Corn_(maize)___Northern_Leaf_Blight', 'Grape___Black_rot', 'Grape___Esca_(Black_Measles)', 'Grape___healthy', 'Grape___Leaf_blight_(Isariopsis_Leaf_Spot)', 'Orange___Haunglongbing_(Citrus_greening)', 'Peach___Bacterial_spot', 'Peach___healthy', 'Pepper,_bell___Bacterial_spot', 'Pepper,_bell___healthy', 'Potato___Early_blight', 'Potato___healthy', 'Potato___Late_blight', 'Raspberry___healthy', 'Soybean___healthy', 'Squash___Powdery_mildew', 'Strawberry___healthy', 'Strawberry___Leaf_scorch', 'Tomato___Bacterial_spot', 'Tomato___Early_blight', 'Tomato___healthy', 'Tomato___Late_blight', 'Tomato___Leaf_Mold', 'Tomato___Septoria_leaf_spot', 'Tomato___Spider_mites Two-spotted_spider_mite',

In [4]:
# Print the number of classes (this count includes the healthy categories)
print(f"Number of diseases : {len(diseases)} ")

Number of diseases : 38 


In [5]:
#finding number of unique diseases and unique plants
plants = []
number_of_diseases = 0
for plant in diseases:
    plant_name,disease = plant.split("___")
    
    if plant_name not in plants:
        plants.append(plant_name)
    
    if disease != "healthy":
        number_of_diseases +=1
        
#Print the count of unique plants
print(f"Number of Plants:{len(plants)}")

#Print unique plant names
print(f"Unique Plant Names:{plants}")

#Print the count of diseases
print(f"Number of diseases : {number_of_diseases}")


Number of Plants:14
Unique Plant Names:['Apple', 'Blueberry', 'Cherry_(including_sour)', 'Corn_(maize)', 'Grape', 'Orange', 'Peach', 'Pepper,_bell', 'Potato', 'Raspberry', 'Soybean', 'Squash', 'Strawberry', 'Tomato']
Number of diseases : 26
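
The same tally can be sketched with a set, which avoids the manual membership check. The helper name `summarize_classes` and the short sample list are illustrative assumptions, and folder names are assumed to follow the `Plant___Condition` pattern shown above.

```python
def summarize_classes(class_names):
    """Return (sorted unique plants, number of non-healthy classes)."""
    plants = set()
    disease_classes = 0
    for name in class_names:
        # Folder names are assumed to look like "Plant___Condition".
        plant, _, condition = name.partition("___")
        plants.add(plant)
        if condition != "healthy":
            disease_classes += 1
    return sorted(plants), disease_classes


# Illustrative subset of the class folders listed above.
sample = [
    "Apple___Apple_scab",
    "Apple___healthy",
    "Blueberry___healthy",
    "Tomato___Late_blight",
]
plants, disease_classes = summarize_classes(sample)
print(plants, disease_classes)
```

Passing the full `diseases` list instead of `sample` should give the plant and disease counts printed by the cell above.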


In [6]:
#Number of images for each disease using a dictionary comprehension
number_of_images = {disease: len(os.listdir(os.path.join(train_dir,disease))) for disease in diseases}

# Convert the number_of_images dictionary to a pandas DataFrame
img_per_class = pd.DataFrame(list(number_of_images.items()), columns = ["Disease Name", "No.of Images"])

#Display the DataFrame
img_per_class

Unnamed: 0,Disease Name,No.of Images
0,Apple___Apple_scab,2016
1,Apple___Black_rot,1987
2,Apple___Cedar_apple_rust,1760
3,Apple___healthy,2008
4,Blueberry___healthy,1816
5,Cherry_(including_sour)___healthy,1826
6,Cherry_(including_sour)___Powdery_mildew,1683
7,Corn_(maize)___Cercospora_leaf_spot Gray_leaf_...,1642
8,Corn_(maize)___Common_rust_,1907
9,Corn_(maize)___healthy,1859
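
Counting with `os.listdir` includes every entry in a class folder, whether or not it is an image. A slightly more defensive count might look like the sketch below. The helper `count_images` and the extension list are assumptions made for illustration, and the demo runs on a temporary directory rather than on the real dataset.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed set of image suffixes


def count_images(root):
    """Map each class sub-directory of `root` to its number of image files."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        class_dir = os.path.join(root, entry)
        if not os.path.isdir(class_dir):
            continue  # skip stray files sitting next to the class folders
        counts[entry] = sum(
            1 for f in os.listdir(class_dir)
            if f.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


# Demo on a throwaway directory so the sketch runs without the dataset.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "Apple___healthy"))
    for name in ("a.jpg", "b.JPG", "notes.txt"):
        open(os.path.join(root, "Apple___healthy", name), "w").close()
    demo_counts = count_images(root)

print(demo_counts)
```

Calling `count_images(train_dir)` would return a dictionary that can be passed to `pd.DataFrame` in the same way as `number_of_images`.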


## Data preprocessing and augmentation

In [8]:
# Data augmentation for the training data
train_datagen = ImageDataGenerator(
    rescale = 1./255,
    rotation_range=20,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    fill_mode = 'nearest'
)

In [9]:
#Creating a generator for the training data
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size = (150,150),
    batch_size = 32,
    class_mode = 'categorical'

)

Found 70295 images belonging to 38 classes.
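
To make the augmentation settings concrete, the sketch below converts the fractional ranges into pixel terms for the 150x150 target size. It assumes the documented Keras behaviour that a float shift range below 1 is read as a fraction of the image dimension and that a scalar `zoom_range` of z samples a zoom factor from [1 - z, 1 + z]. The variable names are illustrative.

```python
# Values copied from the ImageDataGenerator and generator cells above.
target_width, target_height = 150, 150
width_shift_range = 0.2
height_shift_range = 0.2
zoom_range = 0.2
rescale = 1.0 / 255

# Largest horizontal and vertical translation, in pixels.
max_dx = round(width_shift_range * target_width)
max_dy = round(height_shift_range * target_height)

# Interval from which the zoom factor is drawn.
zoom_low, zoom_high = 1 - zoom_range, 1 + zoom_range

# Rescaling maps the 8-bit range [0, 255] onto [0, 1].
brightest = 255 * rescale

print(f"shift up to {max_dx}px x {max_dy}px, zoom in [{zoom_low}, {zoom_high}]")
print(f"pixel value 255 becomes {brightest:.3f}")
```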


In [10]:
# Set up data rescaling for the validation set
validation_datagen = ImageDataGenerator(rescale=1./255)

In [11]:
# Create a generator for the validation set
validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='categorical'
)

Found 17572 images belonging to 38 classes.
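
The two counts reported by the generators can be checked against the 80/20 split described earlier, and they also determine how many batches make up one epoch. The sketch below is plain arithmetic on those reported numbers; the variable names are illustrative.

```python
train_samples = 70295        # reported by train_generator above
validation_samples = 17572   # reported by validation_generator above
batch_size = 32

total = train_samples + validation_samples
train_fraction = train_samples / total

# Floor division drops the final partial batch of each pass.
steps_per_epoch = train_samples // batch_size
validation_steps = validation_samples // batch_size
leftover_train = train_samples % batch_size

print(f"total images: {total}, training fraction: {train_fraction:.3f}")
print(f"steps per epoch: {steps_per_epoch}, validation steps: {validation_steps}")
print(f"training images left out of the last full batch: {leftover_train}")
```

The training fraction rounds to 0.800, and 2196 steps per epoch matches the progress counter shown in the training output later in the notebook.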


## Building a CNN Model

In [13]:
num_classes = 38
# Create a sequential model
model = models.Sequential()

# Layer 1: Conv2D -> BatchNormalization -> ReLU
# NOTE: input_shape is 256x256 here, while both generators above yield 150x150 images
model.add(layers.Conv2D(64, (3, 3), input_shape=(256, 256, 3)))
model.add(layers.BatchNormalization())
model.add(layers.Activation('relu'))

# Layer 2: Conv2D -> BatchNormalization -> ReLU -> MaxPooling
model.add(layers.Conv2D(128, (3, 3)))
model.add(layers.BatchNormalization())
model.add(layers.Activation('relu'))
model.add(layers.MaxPooling2D((2, 2)))

# Layer 3: Conv2D -> BatchNormalization -> ReLU -> Conv2D -> BatchNormalization -> ReLU
model.add(layers.Conv2D(128, (3, 3)))
model.add(layers.BatchNormalization())
model.add(layers.Activation('relu'))
model.add(layers.Conv2D(128, (3, 3)))
model.add(layers.BatchNormalization())
model.add(layers.Activation('relu'))

# Layer 4: Conv2D -> BatchNormalization -> ReLU -> MaxPooling
model.add(layers.Conv2D(256, (3, 3)))
model.add(layers.BatchNormalization())
model.add(layers.Activation('relu'))
model.add(layers.MaxPooling2D((2, 2)))

# Layer 5: Conv2D -> BatchNormalization -> ReLU
model.add(layers.Conv2D(512, (3, 3)))
model.add(layers.BatchNormalization())
model.add(layers.Activation('relu'))

# Layer 6: MaxPooling
model.add(layers.MaxPooling2D((2, 2)))

# Flatten the output and add dense layers
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(256, activation='relu'))

# Output layer
model.add(layers.Dense(num_classes, activation='softmax'))

# Display the model summary
model.summary()


2024-02-03 10:12:27.515493: W external/local_tsl/tsl/framework/bfc_allocator.cc:485] Allocator (GPU_0_bfc) ran out of memory trying to allocate 841.00MiB (rounded to 881852416)requested by op StatelessRandomUniformV2
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. 
Current allocation summary follows.
2024-02-03 10:12:27.515682: I external/local_tsl/tsl/framework/bfc_allocator.cc:1039] BFCAllocator dump for GPU_0_bfc
2024-02-03 10:12:27.515840: I external/local_tsl/tsl/framework/bfc_allocator.cc:1046] Bin (256): 	Total Chunks: 29, Chunks in use: 29. 7.2KiB allocated for chunks. 7.2KiB in use in bin. 2.6KiB client-requested in use in bin.
2024-02-03 10:12:27.515895: I external/local_tsl/tsl/framework/bfc_allocator.cc:1046] Bin (512): 	Total Chunks: 30, Chunks in use: 30. 15.0KiB allocated for chunks. 15.0KiB in use in bin. 15.0KiB client-requested in use in bin.
2024-0

ResourceExhaustedError: {{function_node __wrapped__StatelessRandomUniformV2_device_/job:localhost/replica:0/task:0/device:GPU:0}} OOM when allocating tensor with shape[430592,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:StatelessRandomUniformV2] name: 

In [14]:
model.summary()

ValueError: This model has not yet been built. Build the model first by calling `build()` or by calling the model on a batch of data.
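
The tensor shape `[430592, 512]` in the error is consistent with the weight matrix of the first `Dense(512)` layer that follows `Flatten`. The sketch below traces one spatial dimension through the unpadded 3x3 convolutions and 2x2 poolings defined above to show where 430592 comes from, and what the figure would be for the 150x150 images the generators produce. It is a back-of-the-envelope estimate of one weight tensor only, not a full memory model, and the helper names are illustrative.

```python
def trace_spatial_size(size, ops):
    """Follow one spatial dimension through 'conv' (3x3, valid) and 'pool' (2x2) ops."""
    for op in ops:
        if op == "conv":
            size -= 2      # 3x3 kernel, no padding, stride 1
        elif op == "pool":
            size //= 2     # 2x2 max pooling, stride 2
        else:
            raise ValueError(f"unknown op: {op}")
    return size


# Order of the spatial operations in the model defined above.
OPS = ["conv", "conv", "pool", "conv", "conv", "conv", "pool", "conv", "pool"]


def dense_kernel_size(input_size, channels=512, units=512, bytes_per_float=4):
    """Return (flattened features, Dense kernel size in MiB) for a square input."""
    side = trace_spatial_size(input_size, OPS)
    flat = side * side * channels
    return flat, flat * units * bytes_per_float / 2**20


flat_256, mib_256 = dense_kernel_size(256)
flat_150, mib_150 = dense_kernel_size(150)

print(f"256x256 input: {flat_256} features, {mib_256:.0f} MiB kernel")
print(f"150x150 input: {flat_150} features, {mib_150:.0f} MiB kernel")
```

For a 256x256 input this gives 430592 features and an 841 MiB kernel, matching the allocation in the log. With a 150x150 input the same layer would need about 225 MiB. Replacing `Flatten` with a global pooling layer would be another way to shrink this matrix, although the notebook does not try it.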

## Compile the Model

In [33]:
# Compile the model with specified optimizer, loss function, and metrics
model.compile(optimizer=tf.keras.optimizers.Adam(),  # native Keras Adam optimizer
              loss='categorical_crossentropy',
              metrics=['accuracy'])
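
As a reminder of what `categorical_crossentropy` measures, the sketch below computes the loss for a single one-hot label in plain Python. It is a simplified illustration with a hypothetical three-class example, not the Keras implementation, and the clipping constant `eps` is an assumption.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for one sample: -sum(y_true[i] * log(y_pred[i]))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


y_true = [0.0, 1.0, 0.0]            # one-hot label: class 1 is correct
confident = [0.05, 0.90, 0.05]      # most probability on the correct class
unsure = [0.40, 0.30, 0.30]         # probability spread across classes

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)

# Loss of a model that assigns equal probability to all 38 classes.
chance_level = math.log(38)

print(f"confident: {loss_confident:.4f}, unsure: {loss_unsure:.4f}")
print(f"uniform guess over 38 classes: {chance_level:.4f}")
```

A training loss well below the uniform-guess value of about 3.64 indicates the model has learned something beyond chance.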

## Train the model

In [40]:
# Train the model with the training and validation sets
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=15,  # Increase epochs for better training
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size
)


Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
 508/2196 [=====>........................] - ETA: 11:53 - loss: 0.6301 - accuracy: 0.8094

KeyboardInterrupt: 

## Saving the Model

In [41]:
model.save("../models/plant_disease_prediction.h5")

## Display the Training and Validation Accuracy/Loss Plots

In [42]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(len(acc))  # one point per completed epoch

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()


NameError: name 'history' is not defined