# Overview

FungiScan AI: Automated Detection of Fungal Diseases in Crops

* Introduction


Fungal diseases pose a major threat to global agriculture, leading to reduced crop yields and economic losses. Early detection and accurate diagnosis are critical for mitigating their impact. This project leverages computer vision and machine learning to develop an AI-powered solution for identifying fungal infections in crops based on image data.

* Dataset Overview

We use the PlantVillage Dataset from Mendeley Data ([link](https://data.mendeley.com/datasets/tywbtsjrjv/1)), a widely used open-source dataset for plant disease classification. It contains thousands of labeled images of healthy and diseased plant leaves, covering a range of fungal infections.

In [None]:
!wget -O plant_disease_dataset.zip "https://data.mendeley.com/public-files/datasets/tywbtsjrjv/files/b4e3a32f-c0bd-4060-81e9-6144231f2520/file_downloaded"

--2025-02-07 21:27:15--  https://data.mendeley.com/public-files/datasets/tywbtsjrjv/files/b4e3a32f-c0bd-4060-81e9-6144231f2520/file_downloaded
Resolving data.mendeley.com (data.mendeley.com)... 162.159.130.86, 162.159.133.86
Connecting to data.mendeley.com (data.mendeley.com)|162.159.130.86|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://prod-dcd-datasets-public-files-eu-west-1.s3.eu-west-1.amazonaws.com/349ac012-2948-4172-bbba-3bf8f76596fd [following]
--2025-02-07 21:27:16--  https://prod-dcd-datasets-public-files-eu-west-1.s3.eu-west-1.amazonaws.com/349ac012-2948-4172-bbba-3bf8f76596fd
Resolving prod-dcd-datasets-public-files-eu-west-1.s3.eu-west-1.amazonaws.com (prod-dcd-datasets-public-files-eu-west-1.s3.eu-west-1.amazonaws.com)... 3.5.66.189, 52.92.32.66, 3.5.69.200, ...
Connecting to prod-dcd-datasets-public-files-eu-west-1.s3.eu-west-1.amazonaws.com (prod-dcd-datasets-public-files-eu-west-1.s3.eu-west-1.amazonaws.com)|3.5.66.189|:443... con

In [None]:
import zipfile

with zipfile.ZipFile("plant_disease_dataset.zip", 'r') as zip_ref:
    zip_ref.extractall("Plant_Disease_Data")


In [None]:
import os

data_path = "Plant_Disease_Data"
print(os.listdir(data_path))

['Plant_leave_diseases_dataset_with_augmentation']


In [None]:
# Step 1: Import necessary libraries
import pandas as pd
import numpy as np
import os
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

In [None]:
import os
import shutil

# Original dataset path
dataset_path = "/content/Plant_Disease_Data/Plant_leave_diseases_dataset_with_augmentation"
filtered_dataset_path = "/content/Fungal_Disease_Data"

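# Disease categories to keep.
# Note: late blight (Phytophthora infestans) is caused by an oomycete rather than
# a true fungus; it is included here alongside the fungal diseases.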
fungal_diseases = [
    "Apple___Apple_scab",
    "Apple___Cedar_apple_rust",
    "Cherry___Powdery_mildew",
    "Corn___Common_rust",
    "Corn___Northern_Leaf_Blight",
    "Grape___Black_rot",
    "Grape___Esca_(Black_Measles)",
    "Grape___Leaf_blight_(Isariopsis_Leaf_Spot)",
    "Potato___Early_blight",
    "Potato___Late_blight",
    "Squash___Powdery_mildew",
    "Strawberry___Leaf_scorch",
    "Tomato___Early_blight",
    "Tomato___Late_blight",
    "Tomato___Leaf_Mold",
    "Tomato___Septoria_leaf_spot",
    "Tomato___Target_Spot"
]


# Create new dataset directory
os.makedirs(filtered_dataset_path, exist_ok=True)

# Copy only relevant categories, reporting any folder that is missing
for disease in fungal_diseases:
    src = os.path.join(dataset_path, disease)
    dst = os.path.join(filtered_dataset_path, disease)

    if os.path.exists(src):
        shutil.copytree(src, dst, dirs_exist_ok=True)
    else:
        print(f"Warning: category folder not found: {src}")

print("Filtered dataset created successfully!")

Filtered dataset created successfully!
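
A quick sanity check after filtering is to count the files in each class folder. The sketch below is not part of the original notebook: `count_images_per_class` is a hypothetical helper, and the demo runs on a throwaway temporary directory rather than the real dataset.

```python
import os
import tempfile

def count_images_per_class(root_dir):
    """Return {class_name: number_of_files} for each subdirectory of root_dir."""
    counts = {}
    for name in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, name)
        if os.path.isdir(class_dir):
            counts[name] = len(os.listdir(class_dir))
    return counts

# Demo on a temporary directory with two made-up class folders
with tempfile.TemporaryDirectory() as tmp:
    for cls, n in [("Apple___Apple_scab", 3), ("Tomato___Leaf_Mold", 2)]:
        os.makedirs(os.path.join(tmp, cls))
        for i in range(n):
            open(os.path.join(tmp, cls, f"img_{i}.jpg"), "w").close()
    demo_counts = count_images_per_class(tmp)

print(demo_counts)
```

In the notebook itself, the same helper could be called as `count_images_per_class(filtered_dataset_path)` to spot empty or missing categories before splitting.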


In [None]:
from sklearn.model_selection import train_test_split
import random

# Define split paths
train_dir = "/content/Fungal_Train"
val_dir = "/content/Fungal_Val"
test_dir = "/content/Fungal_Test"

def split_dataset(source_dir, train_dir, val_dir, test_dir, train_size=0.8, val_size=0.1):
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(val_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)

    for category in os.listdir(source_dir):
        category_path = os.path.join(source_dir, category)
        images = sorted(os.listdir(category_path))
        random.Random(42).shuffle(images)  # fixed seed so the split is reproducible

        # Compute split indices
        train_idx = int(len(images) * train_size)
        val_idx = train_idx + int(len(images) * val_size)

        # Create subdirectories
        os.makedirs(os.path.join(train_dir, category), exist_ok=True)
        os.makedirs(os.path.join(val_dir, category), exist_ok=True)
        os.makedirs(os.path.join(test_dir, category), exist_ok=True)

        # Copy each image into its split directory (source files stay in place)
        for i, img in enumerate(images):
            src = os.path.join(category_path, img)
            if i < train_idx:
                dst = os.path.join(train_dir, category, img)
            elif i < val_idx:
                dst = os.path.join(val_dir, category, img)
            else:
                dst = os.path.join(test_dir, category, img)
            shutil.copy2(src, dst)

split_dataset(filtered_dataset_path, train_dir, val_dir, test_dir)
print("Dataset split into train, validation, and test sets.")

Dataset split into train, validation, and test sets.
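
The index arithmetic in `split_dataset` can be tried on plain filename lists. The following is a minimal sketch, not the notebook's own code: `split_names` and its `seed` parameter are hypothetical, and the filenames are invented. It uses the same 80/10/10 rule and shows that rounding leftovers land in the test split.

```python
import random

def split_names(names, train_size=0.8, val_size=0.1, seed=42):
    """Split a list of names into train/val/test with a seeded shuffle."""
    rng = random.Random(seed)
    shuffled = sorted(names)  # sort first so the result does not depend on input order
    rng.shuffle(shuffled)
    train_idx = int(len(shuffled) * train_size)
    val_idx = train_idx + int(len(shuffled) * val_size)
    return shuffled[:train_idx], shuffled[train_idx:val_idx], shuffled[val_idx:]

names = [f"img_{i}.jpg" for i in range(1007)]
train, val, test = split_names(names)
print(len(train), len(val), len(test))  # 805 100 102
```

Because both `int(...)` calls round down, the test split receives the remainder. That would be consistent with the test set ending up slightly larger than the validation set when the rule is applied per category.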


In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

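# Rescale pixel values from [0, 255] to [0, 1].
# Note: MobileNetV2's own preprocess_input maps pixels to [-1, 1] instead;
# the runs logged in this notebook used the [0, 1] rescaling below.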
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

# Load data
train_generator = train_datagen.flow_from_directory(
    train_dir, target_size=(224, 224), batch_size=32, class_mode='categorical'
)

val_generator = val_datagen.flow_from_directory(
    val_dir, target_size=(224, 224), batch_size=32, class_mode='categorical'
)

test_generator = test_datagen.flow_from_directory(
    test_dir, target_size=(224, 224), batch_size=32, class_mode='categorical',
    shuffle=False  # keep file order fixed so predictions can be matched to labels
)

print("Data preprocessing completed!")


Found 16725 images belonging to 17 classes.
Found 2087 images belonging to 17 classes.
Found 2099 images belonging to 17 classes.
Data preprocessing completed!
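
The image counts reported above determine how many batches each generator yields per pass. As a small worked example (the helper name `batches_per_epoch` is hypothetical), the number of batches is the image count divided by the batch size, rounded up:

```python
import math

def batches_per_epoch(num_images, batch_size=32):
    """Number of batches needed to cover num_images once (last batch may be partial)."""
    return math.ceil(num_images / batch_size)

for split, n in [("train", 16725), ("val", 2087), ("test", 2099)]:
    print(split, batches_per_epoch(n))
# train 523
# val 66
# test 66
```

The value 523 for the training set matches the `523/523` step counter in the training logs.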


In [None]:
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Load MobileNetV2 model without top layers
base_model = MobileNetV2(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze base layers (initial training)
base_model.trainable = False

# Add custom classification layers
x = GlobalAveragePooling2D()(base_model.output)
x = Dropout(0.5)(x)  # Reduce overfitting
x = Dense(128, activation="relu")(x)
x = Dense(len(train_generator.class_indices), activation="softmax")(x)  # Output layer

# Create model
model = Model(inputs=base_model.input, outputs=x)

# Compile model
model.compile(optimizer=Adam(learning_rate=0.001), loss="categorical_crossentropy", metrics=["accuracy"])

# Model summary
model.summary()


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224_no_top.h5
9406464/9406464 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
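
With the base frozen, only the classification head is trained. The arithmetic below estimates its size. It is a sketch under two assumptions: MobileNetV2 at its default width produces a 1280-channel feature map, and there are 17 output classes, as reported by the data generators. `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

feature_dim = 1280   # assumption: MobileNetV2 (alpha=1.0) output channels after pooling
hidden_units = 128   # Dense(128, activation="relu")
num_classes = 17     # classes found by flow_from_directory

# GlobalAveragePooling2D and Dropout have no trainable parameters
head_params = dense_params(feature_dim, hidden_units) + dense_params(hidden_units, num_classes)
print(head_params)  # 166161
```

This figure can be compared against the "Trainable params" line printed by `model.summary()`.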


In [None]:
# Train the model
history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=10,  # You can adjust epochs based on validation performance
    steps_per_epoch=len(train_generator),
    validation_steps=len(val_generator)
)




Epoch 1/10
523/523 ━━━━━━━━━━━━━━━━━━━━ 844s 2s/step - accuracy: 0.6747 - loss: 1.0601 - val_accuracy: 0.9200 - val_loss: 0.2463
Epoch 2/10
523/523 ━━━━━━━━━━━━━━━━━━━━ 833s 2s/step - accuracy: 0.8842 - loss: 0.3356 - val_accuracy: 0.9238 - val_loss: 0.2316
Epoch 3/10
523/523 ━━━━━━━━━━━━━━━━━━━━ 864s 2s/step - accuracy: 0.9071 - loss: 0.2731 - val_accuracy: 0.9229 - val_loss: 0.2212
Epoch 4/10
523/523 ━━━━━━━━━━━━━━━━━━━━ 948s 2s/step - accuracy: 0.9128 - loss: 0.2527 - val_accuracy: 0.9348 - val_loss: 0.1738
Epoch 5/10
523/523 ━━━━━━━━━━━━━━━━━━━━ 873s 2s/step - accuracy: 0.9193 - loss: 0.2246 - val_accuracy: 0.9391 - val_loss: 0.1719
Epoch 6/10
523/523 ━━━━━━━━━━━━━━━━━━━━ 856s 2s/step - accuracy: 0.9269 - loss: 0.2086 - val_accuracy: 0.9339 - val_loss: 0.1815
Epoch 7/10
523/523
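
The training cell suggests adjusting the number of epochs based on validation performance. One simple way to do that is to pick the epoch with the lowest validation loss. The sketch below uses only the six complete epochs from the log above; the variable names are illustrative and not part of the notebook.

```python
# Validation metrics copied from the first six logged epochs (frozen base)
val_loss = [0.2463, 0.2316, 0.2212, 0.1738, 0.1719, 0.1815]
val_acc = [0.9200, 0.9238, 0.9229, 0.9348, 0.9391, 0.9339]

# Index of the smallest validation loss, converted to a 1-based epoch number
best_index = min(range(len(val_loss)), key=val_loss.__getitem__)
best_epoch = best_index + 1
best_val_acc = val_acc[best_index]

print(best_epoch, val_loss[best_index], best_val_acc)  # 5 0.1719 0.9391
```

In a live run, the same lists are available as `history.history["val_loss"]` and `history.history["val_accuracy"]`.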

In [None]:
# Save the model trained with the frozen base (fine-tuning happens in the next cell)
model.save("mobilenetv2_fungal_disease.h5")



In [None]:
# Unfreeze layers for fine-tuning
base_model.trainable = True
model.compile(optimizer=Adam(learning_rate=0.0001), loss="categorical_crossentropy", metrics=["accuracy"])

# Continue training
history_fine_tune = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=10,  # You can adjust epochs
    steps_per_epoch=len(train_generator),
    validation_steps=len(val_generator)
)


Epoch 1/10
523/523 ━━━━━━━━━━━━━━━━━━━━ 838s 2s/step - accuracy: 0.9465 - loss: 0.1522 - val_accuracy: 0.9506 - val_loss: 0.1353
Epoch 2/10
523/523 ━━━━━━━━━━━━━━━━━━━━ 854s 2s/step - accuracy: 0.9517 - loss: 0.1351 - val_accuracy: 0.9516 - val_loss: 0.1369
Epoch 3/10
523/523 ━━━━━━━━━━━━━━━━━━━━ 858s 2s/step - accuracy: 0.9489 - loss: 0.1445 - val_accuracy: 0.9545 - val_loss: 0.1290
Epoch 4/10
523/523 ━━━━━━━━━━━━━━━━━━━━ 862s 2s/step - accuracy: 0.9559 - loss: 0.1253 - val_accuracy: 0.9530 - val_loss: 0.1277
Epoch 5/10
523/523 ━━━━━━━━━━━━━━━━━━━━ 861s 2s/step - accuracy: 0.9572 - loss: 0.1227 - val_accuracy: 0.9554 - val_loss: 0.1278
Epoch 6/10
523/523 ━━━━━━━━━━━━━━━━━━━━ 822s 2s/step - accuracy: 0.9541 - loss: 0.1322 - val_accuracy: 0.9530 - val_loss: 0.1335
Epoch 7/10
523/523
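
The fine-tuning log shows validation loss flattening after a few epochs. A patience rule, similar in spirit to Keras's `EarlyStopping` callback, would stop training once the loss has not improved for a set number of epochs. The sketch below replays such a rule on the six complete epochs logged above; `early_stop_epoch` and `patience=2` are illustrative assumptions, not settings used in the notebook.

```python
def early_stop_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch); stop_epoch is None if the rule never triggers."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Validation loss copied from the first six fine-tuning epochs
fine_tune_val_loss = [0.1353, 0.1369, 0.1290, 0.1277, 0.1278, 0.1335]
print(early_stop_epoch(fine_tune_val_loss, patience=2))  # (6, 4)
```

Under this rule, training would have stopped after epoch 6, with epoch 4 holding the lowest validation loss.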