<center>Plant Disease Detection Using CNN</center>



In [1]:
! git clone https://github.com/spMohanty/PlantVillage-Dataset

Cloning into 'PlantVillage-Dataset'...
remote: Enumerating objects: 163235, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 163235 (delta 2), reused 1 (delta 0), pack-reused 163229 (from 1)
Receiving objects: 100% (163235/163235), 2.00 GiB | 31.09 MiB/s, done.
Resolving deltas: 100% (101/101), done.
Updating files: 100% (182401/182401), done.


The **PlantVillage-Dataset** repository on GitHub contains **54K+ images** of healthy and diseased plant leaves.

Set the path to the root directory where the raw color images are stored.

In [2]:
DATASET_ROOT = "/content/PlantVillage-Dataset/raw/color"

List the contents of the dataset directory to verify the download.

`head` limits the listing to the first ten entries; each folder is one class.

In [3]:
!ls $DATASET_ROOT | head

Apple___Apple_scab
Apple___Black_rot
Apple___Cedar_apple_rust
Apple___healthy
Blueberry___healthy
Cherry_(including_sour)___healthy
Cherry_(including_sour)___Powdery_mildew
Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot
Corn_(maize)___Common_rust_
Corn_(maize)___healthy
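The same check can be done from Python, which also gives a per-class image count. This is a minimal sketch: the helper name `count_images_per_class` is hypothetical, and it assumes one subfolder per class, as in the listing above.

```python
import os

def count_images_per_class(root):
    """Return {class_folder_name: number_of_files} for each subfolder of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if os.path.isdir(class_dir):
            counts[name] = sum(
                1 for f in os.listdir(class_dir)
                if os.path.isfile(os.path.join(class_dir, f))
            )
    return counts

root = "/content/PlantVillage-Dataset/raw/color"  # same path as DATASET_ROOT
if os.path.isdir(root):  # skip quietly when the dataset is not present
    for cls, n in count_images_per_class(root).items():
        print(f"{n:6d}  {cls}")
```

A very uneven count across classes would be worth knowing before training, since plain accuracy can hide poor performance on small classes.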


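Install the necessary Python libraries for deep learning and data handling.

The command is commented out here; uncomment it if any of these packages is missing from your environment, so that all dependencies are met for the rest of the code.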

In [4]:
# !pip install tensorflow tensorflow-datasets matplotlib seaborn scikit-learn

In [5]:
import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.metrics import classification_report, confusion_matrix
# Keras modules for building the neural network and loading images.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [6]:
img_size = (224, 224)
batch_size = 16

datagen = ImageDataGenerator(
    validation_split=0.2,
    rescale=1./255,
    rotation_range=20,
    zoom_range=0.2,
    horizontal_flip=True
)
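To make two of these options concrete, the sketch below applies `rescale` and a horizontal flip to a tiny array by hand with NumPy. It only illustrates the arithmetic and is not how `ImageDataGenerator` is implemented; the random rotation and zoom are left out, and the toy image is made up.

```python
import numpy as np

# A fake 2x3 RGB "image" with 8-bit pixel values (0, 14, 28, ..., 238).
img = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3) * 14

# rescale=1./255 multiplies every pixel by 1/255, mapping [0, 255] into [0, 1].
scaled = img.astype(np.float32) * (1.0 / 255)

# horizontal_flip=True mirrors the width axis (axis 1) for randomly chosen images.
flipped = scaled[:, ::-1, :]

print("value range:", scaled.min(), scaled.max())
print("first column of flipped equals last column of original:",
      np.array_equal(flipped[:, 0, :], scaled[:, -1, :]))
```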

In [7]:
train_gen = datagen.flow_from_directory(
    DATASET_ROOT,
    target_size=img_size,
    batch_size=batch_size,
    subset="training",
    class_mode="categorical"
)

Found 43456 images belonging to 38 classes.


In [8]:
val_gen = datagen.flow_from_directory(
    DATASET_ROOT,
    target_size=img_size,
    batch_size=batch_size,
    subset="validation",
    class_mode="categorical"
)

Found 10849 images belonging to 38 classes.
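As a quick sanity check on the two `Found ...` lines, the counts can be compared with the requested `validation_split=0.2`. The numbers below are copied from the outputs above.

```python
train_count = 43456  # "Found 43456 images" (subset="training")
val_count = 10849    # "Found 10849 images" (subset="validation")

total = train_count + val_count
val_fraction = val_count / total

print(f"total images: {total}")
print(f"validation fraction: {val_fraction:.4f}")
```

The fraction lands just under 0.2, which would be consistent with the split being applied per class with rounding. Because both subsets come from the same `datagen`, the validation images also pass through the random rotation, zoom, and flip configured above.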


In [9]:
num_classes = len(train_gen.class_indices)
print("Classes:", train_gen.class_indices)

Classes: {'Apple___Apple_scab': 0, 'Apple___Black_rot': 1, 'Apple___Cedar_apple_rust': 2, 'Apple___healthy': 3, 'Blueberry___healthy': 4, 'Cherry_(including_sour)___Powdery_mildew': 5, 'Cherry_(including_sour)___healthy': 6, 'Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot': 7, 'Corn_(maize)___Common_rust_': 8, 'Corn_(maize)___Northern_Leaf_Blight': 9, 'Corn_(maize)___healthy': 10, 'Grape___Black_rot': 11, 'Grape___Esca_(Black_Measles)': 12, 'Grape___Leaf_blight_(Isariopsis_Leaf_Spot)': 13, 'Grape___healthy': 14, 'Orange___Haunglongbing_(Citrus_greening)': 15, 'Peach___Bacterial_spot': 16, 'Peach___healthy': 17, 'Pepper,_bell___Bacterial_spot': 18, 'Pepper,_bell___healthy': 19, 'Potato___Early_blight': 20, 'Potato___Late_blight': 21, 'Potato___healthy': 22, 'Raspberry___healthy': 23, 'Soybean___healthy': 24, 'Squash___Powdery_mildew': 25, 'Strawberry___Leaf_scorch': 26, 'Strawberry___healthy': 27, 'Tomato___Bacterial_spot': 28, 'Tomato___Early_blight': 29, 'Tomato___Late_blight': 30
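For reading predictions later, it is often handy to invert this mapping (index to folder name) and to split each folder name into plant and condition. A minimal sketch, using a few entries copied from the output above; `split_label` is a hypothetical helper name.

```python
# A few entries copied from the printed class_indices above.
class_indices = {
    "Apple___Apple_scab": 0,
    "Apple___Black_rot": 1,
    "Apple___healthy": 3,
    "Corn_(maize)___Common_rust_": 8,
}

# Invert the mapping: class index -> folder name.
index_to_label = {idx: name for name, idx in class_indices.items()}

def split_label(label):
    """Split 'Plant___Condition' into (plant, condition)."""
    plant, _, condition = label.partition("___")
    return plant, condition

print(index_to_label[3])
print(split_label(index_to_label[8]))
```

In the notebook itself, the same inversion would be applied to `train_gen.class_indices` directly.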

Build Model (Transfer Learning)

Load a pre-trained EfficientNetB0 model.

`include_top=False` means we use only the feature-extracting base, not the final classification layers.

Freeze the base model's layers so their weights are not updated during training.

In [10]:
base_model = tf.keras.applications.EfficientNetB0(
    input_shape=(224,224,3),
    include_top=False,
    weights="imagenet"
)
base_model.trainable = False  # Freeze base initially


Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb0_notop.h5
16705208/16705208 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


Add a GlobalAveragePooling2D layer on top of the base model's output.

This averages the feature maps, preparing the data for the final classification layers.

In [11]:
x = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
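What "averaging the feature maps" means can be shown with plain NumPy. The sketch below uses random numbers as a stand-in for the base model's output; the 7x7x1280 shape is what EfficientNetB0 without its top produces for a 224x224 input.

```python
import numpy as np

# Stand-in for a batch of feature maps: (batch, height, width, channels).
features = np.random.default_rng(0).normal(size=(2, 7, 7, 1280))

# Global average pooling: average over the two spatial axes (height, width),
# leaving one number per channel for each image.
pooled = features.mean(axis=(1, 2))

print(features.shape, "->", pooled.shape)
```

Each image is reduced from 7 x 7 x 1280 values to a single 1280-element vector, which is what the following Dense layers expect.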

Add a BatchNormalization layer to stabilize and speed up training.

In [12]:
# Batch Normalisation
x = tf.keras.layers.BatchNormalization()(x)
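As a rough illustration of what the layer computes during training, the NumPy sketch below normalizes each feature over the batch and then applies a scale (`gamma`) and shift (`beta`). It is a simplification: the real layer learns `gamma` and `beta` and keeps moving averages of the statistics for use at inference time. The function name is hypothetical.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# 16 samples, 4 features, deliberately off-centre and with a large spread.
x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(16, 4))
y = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))

print("per-feature mean:", y.mean(axis=0).round(3))
print("per-feature std: ", y.std(axis=0).round(3))
```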

Add a Dropout layer to randomly ignore some neurons, which helps prevent overfitting.

In [13]:
# Dropout for regularization
x = tf.keras.layers.Dropout(0.4)(x)
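The effect of `Dropout(0.4)` during training can be sketched in NumPy as "inverted dropout": roughly 40% of the activations are set to zero and the survivors are scaled up by 1/(1 - 0.4) so the expected sum stays the same. At inference time the layer passes its input through unchanged. The function name below is hypothetical.

```python
import numpy as np

def dropout_train(x, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    keep_mask = rng.random(x.shape) >= rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones((4, 1000))
y = dropout_train(x, rate=0.4, rng=rng)

print("fraction zeroed:", (y == 0).mean())
print("mean activation:", y.mean())
```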

Add a fully connected (Dense) layer with 256 neurons and ReLU activation.

In [14]:
x = tf.keras.layers.Dense(256, activation="relu")(x)

Add another BatchNormalization layer.

In [15]:
# Batch Normalisation after dense layer
x = tf.keras.layers.BatchNormalization()(x)

In [16]:
x = tf.keras.layers.Dropout(0.4)(x)
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

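Create a new Sequential model, which stacks layers one after another. Two blocks of convolution and max-pooling layers extract features from the images, with the second block picking up more complex features; a Dropout layer follows each block.

Flatten the output of the convolutional layers into a 1D vector and add a fully connected layer with 128 neurons.

Add the final output layer with `num_classes` neurons for classification.

Note that the EfficientNetB0 layers built in the previous cells are never assembled into a model (`outputs` is not passed to `tf.keras.Model`), so this Sequential CNN is the model that is compiled and trained below.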

In [17]:
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    MaxPooling2D(2, 2),
    Dropout(0.2),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Dropout(0.2),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(num_classes, activation='softmax')
])
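It can help to work out the tensor sizes of this network by hand. The sketch below assumes the Keras defaults used in the cell above (3x3 convolutions with stride 1 and `valid` padding, 2x2 pooling with stride 2) and 38 classes, as reported by the generators; the helper names are hypothetical.

```python
def conv_valid(size, kernel=3):
    """Output size of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def pool(size, window=2):
    """Output size of non-overlapping pooling (partial windows are dropped)."""
    return size // window

num_classes = 38  # from the generator output above

s = 224
s = pool(conv_valid(s))  # Conv2D(32) then MaxPooling2D: 224 -> 222 -> 111
s = pool(conv_valid(s))  # Conv2D(64) then MaxPooling2D: 111 -> 109 -> 54
flat = s * s * 64        # length of the vector produced by Flatten

# Weights plus biases for each trainable layer.
params = {
    "conv1": 3 * 3 * 3 * 32 + 32,
    "conv2": 3 * 3 * 32 * 64 + 64,
    "dense": flat * 128 + 128,
    "output": 128 * num_classes + num_classes,
}

print("flattened features:", flat)
for name, n in params.items():
    print(f"{name:>6}: {n:,}")
print("total parameters:", f"{sum(params.values()):,}")
```

Almost all of the parameters sit in the first Dense layer, because Flatten hands it 186,624 inputs. `model.summary()` can be used to confirm the figures.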



Compile the model to configure training, using the Adam optimizer and `categorical_crossentropy` as the loss function for multiclass classification.

In [18]:
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
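For intuition about the loss, the sketch below computes softmax probabilities and categorical cross-entropy by hand in NumPy for two made-up examples with three classes. It mirrors the formula, not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-(y_true * np.log(y_pred)).sum(axis=1).mean())

logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]])
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # one-hot labels

probs = softmax(logits)
print("row sums:", probs.sum(axis=1))
print("loss:", categorical_crossentropy(y_true, probs))
```

A model that spreads its probability evenly over N classes has a loss of ln(N), which is about 3.64 for 38 classes; that is a useful baseline when reading the first epochs of a training log.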

Data Augmentation (Second, more detailed attempt)

Create a new ImageDataGenerator for training with a wider range of augmentations.

In [19]:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

Split the dataset into training and validation sets, each in its own folder. The model is trained on the images in the `train` folder.

In [20]:
import os
import shutil
import random

# Create train/val directories
train_dir = "/content/PlantVillage-Dataset/train"
val_dir = "/content/PlantVillage-Dataset/val"

os.makedirs(train_dir, exist_ok=True)
os.makedirs(val_dir, exist_ok=True)

Loop through each class folder and divide its images 80:20 between train and val.

In [21]:
dataset_dir = "/content/PlantVillage-Dataset/raw/color"
for class_name in os.listdir(dataset_dir):
    class_path = os.path.join(dataset_dir, class_name)
    if os.path.isdir(class_path):
        images = os.listdir(class_path)
        random.shuffle(images)
        split = int(len(images) * 0.8)  # 80% train, 20% val

        train_class_dir = os.path.join(train_dir, class_name)
        val_class_dir = os.path.join(val_dir, class_name)
        os.makedirs(train_class_dir, exist_ok=True)
        os.makedirs(val_class_dir, exist_ok=True)

        # Copy images (the originals stay in raw/color)
        for img in images[:split]:
            shutil.copy(os.path.join(class_path, img), train_class_dir)
        for img in images[split:]:
            shutil.copy(os.path.join(class_path, img), val_class_dir)
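The split above uses the global `random` state without a seed, so each run produces a different partition; since files are copied rather than moved, re-running the cell could leave the same image in both `train` and `val`. One way to make the partition repeatable is sketched below. The helper name `split_names` and the seed value are assumptions for illustration, not part of the original notebook.

```python
import random

def split_names(names, train_fraction=0.8, seed=42):
    """Return (train, val) lists; the same seed always gives the same split."""
    ordered = sorted(names)               # os.listdir order is not guaranteed
    random.Random(seed).shuffle(ordered)  # local RNG, global state untouched
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

files = [f"img_{i:03d}.jpg" for i in range(10)]
train, val = split_names(files)

print(len(train), "train /", len(val), "val")
print("overlap:", set(train) & set(val))
```

In the loop above, `split_names(os.listdir(class_path))` could stand in for the `random.shuffle` and slicing steps.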

Creating a new generator for the training data using the new `train` directory

In [22]:
train_generator = train_datagen.flow_from_directory(
    '/content/PlantVillage-Dataset/train',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)

Found 43429 images belonging to 38 classes.


Creating a separate generator for the validation data.

In [23]:
test_datagen = ImageDataGenerator(rescale=1./255)
val_generator = test_datagen.flow_from_directory(
    '/content/PlantVillage-Dataset/val',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)

Found 10876 images belonging to 38 classes.


Model training
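The training cell below passes `steps_per_epoch=train_generator.samples // train_generator.batch_size`. With the counts reported by the two generators above, the arithmetic works out as in this small sketch.

```python
import math

train_samples, val_samples, batch = 43429, 10876, 32  # from the outputs above

steps_floor = train_samples // batch           # value passed as steps_per_epoch
steps_full = math.ceil(train_samples / batch)  # batches needed to see every image
leftover = train_samples - steps_floor * batch

print("steps_per_epoch:", steps_floor)
print("batches in one full pass:", steps_full)
print("images in the final partial batch:", leftover)
print("validation_steps:", val_samples // batch)
```

Floor division leaves one short batch per pass unused. In some Keras versions a generator is not restarted at each epoch when `steps_per_epoch` is set this way, which can show up as every second epoch finishing after a single batch; omitting `steps_per_epoch` and `validation_steps` lets Keras take the length from the generator instead.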

In [None]:
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=30,  # the captured log below reads "Epoch n/50", so it comes from a run with epochs=50
    validation_data=val_generator,
    validation_steps=val_generator.samples // val_generator.batch_size
)



Epoch 1/50
1357/1357 ━━━━━━━━━━━━━━━━━━━━ 551s 400ms/step - accuracy: 0.2373 - loss: 3.2506 - val_accuracy: 0.4828 - val_loss: 1.8182
Epoch 2/50
1357/1357 ━━━━━━━━━━━━━━━━━━━━ 14s 10ms/step - accuracy: 0.3750 - loss: 2.1823 - val_accuracy: 0.4950 - val_loss: 1.7652
Epoch 3/50
1357/1357 ━━━━━━━━━━━━━━━━━━━━ 622s 416ms/step - accuracy: 0.4505 - loss: 1.9165 - val_accuracy: 0.6396 - val_loss: 1.1960
Epoch 4/50
1357/1357 ━━━━━━━━━━━━━━━━━━━━ 14s 11ms/step - accuracy: 0.4688 - loss: 1.4112 - val_accuracy: 0.6218 - val_loss: 1.2496
Epoch 5/50
1357/1357 ━━━━━━━━━━━━━━━━━━━━ 600s 410ms/step - accuracy: 0.5340 - loss: 1.5857 - val_accuracy: 0.6950 - val_loss: 1.0149
Epoch 6/50
1357/1357 ━━━━━━━━━━━━━━━━━━━━ 14s 11ms/step - accuracy: 0.4375 - loss: 1.5220 - val_accuracy: 0.7011 - val_loss: 0.9940
Epoch 7/50
1357/1357 ━━━━━━━━━━━━━━━━━━━━ 564s 416ms/step - acc

In [None]:
model.summary()

In [None]:
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

In [None]:
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

In [None]:
# Save the trained Keras model
model.save('/content/crop_disease_model.h5')
print("Model saved as crop_disease_model.h5")

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Create a folder inside Google Drive
save_path = "/content/drive/MyDrive/CropDiseaseModel"
import os
os.makedirs(save_path, exist_ok=True)

# Save Keras model (.h5)
model.save(f"{save_path}/crop_disease_model.h5")

# Save TFLite model (.tflite)
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open(f"{save_path}/crop_disease_model.tflite", "wb") as f:
    f.write(tflite_model)

print("✅ Model saved to Google Drive at:", save_path)
