<a href="https://colab.research.google.com/github/KAREN154/PlantPathoDetect-/blob/main/index.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PLANTPATHODETECT


<img src ="Images/main.jpeg" width = "1000" height ="300">

<h1>1. Business Understanding</h1>

<h2>1.1 Business Overview</h2>

Agriculture is crucial for food security and economic stability, yet essential crops like maize, potatoes, and tomatoes face rising threats from diseases that significantly reduce yields and impact farmers' livelihoods. In regions reliant on these crops, such challenges lead to economic strain, increased pesticide use, and food shortages, affecting entire communities. This project aims to address these issues by developing a machine learning-powered image classification system for early disease detection. By enabling farmers to upload crop images to a web-based platform for real-time analysis and treatment recommendations, the solution empowers them with vital insights, helping to protect yields and sustain food production.

<h2>1.2 Problem Statement</h2>

Crop diseases significantly reduce yields and incomes for farmers, impacting food security and economic stability, especially in regions with limited resources. In Kenya, farmers often lack access to timely and accurate disease detection, relying on traditional methods that delay intervention and increase losses. This project addresses the need for a more efficient, accessible solution by developing a machine learning-based system for early disease detection in maize, potatoes, and tomatoes, providing farmers with quick, actionable insights to improve crop health and productivity.

<h2>1.3 Proposed Solutions</h2>

1. Machine Learning-Based Image Classification: Developing a machine learning model, specifically a convolutional neural network (CNN), to classify images of maize, potatoes, and tomatoes for early disease detection. This model will be trained to recognize various crop diseases accurately.

2. Web-Based Platform: Creating a user-friendly web application where farmers can upload images of their crops for real-time analysis. The platform will offer diagnosis results and treatment recommendations, making disease management accessible to farmers with limited resources.

3. Data Collection and Preprocessing: Building a robust, diverse dataset of crop images to enhance model accuracy and ensure it can detect diseases across different crop species and environmental conditions.

4. Agile Development and Feedback Loop: Employing an iterative, agile development approach to continuously refine the model and application based on user feedback and evolving needs.

5. Integration with Local Agriculture Practices: Tailoring the solution to the specific agricultural challenges in Kenya, ensuring that the system aligns with local crop health practices and disease management strategies.

6. Educational Resources and Support: Providing guidance on disease prevention and sustainable farming practices through the web application to further support farmers in maintaining crop health.

## 1.4 Objectives

### 1.4.1 General Objective

To develop a web-based application that facilitates early detection and prediction of crop diseases in maize, potatoes, and tomatoes using machine learning-based image classification.

### 1.4.2 Specific Objectives

1. **Data Collection and Preprocessing**: To collect and preprocess a diverse dataset of images capturing common diseases affecting maize, potatoes, and tomatoes.
2. **Model Development**: To design and train a convolutional neural network (CNN) for accurate image classification of crop diseases.
3. **Application Deployment**: To implement the trained model within a user-friendly web application where farmers can upload images for disease diagnosis.
4. **System Evaluation and Feedback**: To assess the model’s performance, gather user feedback, and iteratively refine the application for practical use by local farmers.

<h2>1.5 Research Questions</h2>


1. What crop diseases can be effectively detected in maize, potatoes, and tomatoes using image classification?
2. How accurate is the proposed machine learning model in identifying diseases in these crops?
3. What features are necessary for a web-based application to aid farmers in disease diagnosis?
4. How can the web application provide actionable recommendations based on detected diseases?

<h2>1.6 Justification</h2>

This research addresses the pressing issue of crop diseases, which impacts farmers’ livelihoods and food security. Providing an accessible tool for disease detection empowers farmers with knowledge, enabling timely action. The project contributes to agricultural technology and highlights machine learning's potential in enhancing agricultural practices.

<h2>1.7 Proposed Research and System Methodologies</h2>


The project will collect data from public agricultural databases and farmer collaborations, focusing on maize, potatoes, and tomatoes. A CNN will be used for its effectiveness in image classification, and an Agile development process will ensure iterative feedback from users. Key tools include Python for model development, Flask for the web application, and TensorFlow for deep learning.

<h2>1.8 Scope</h2>


The research focuses on common diseases affecting maize, potatoes, and tomatoes in Kenya, targeting smallholder farmers who may lack resources for effective disease management. Limitations may include data variability and disease manifestation differences across crop species. This study is confined to digital image analysis and does not include physical disease testing.

<h2>1.9 Stakeholders</h2>

1. Farmers: Smallholder and commercial farmers, particularly in regions where crop diseases like those affecting maize, potatoes, and tomatoes are prevalent. They are the end-users who will directly benefit from the disease detection tool.

2. Agricultural Extension Officers: Professionals who work closely with farmers, providing advice and support in crop management. They may use this tool to assist farmers in disease identification and treatment recommendations.

3. Agricultural Research Institutions: Organizations focused on agricultural technology and innovation. They may be interested in the data, findings, and outcomes to support further research on crop diseases and digital agricultural solutions.

4. Government and Policymakers: Officials responsible for food security and agricultural productivity initiatives. They might use insights from the project to shape policies and support digital innovations that aid farmers.

5. Technology and Data Science Professionals: Individuals or teams involved in the development, maintenance, and improvement of the machine learning model and web application.

6. Non-Governmental Organizations (NGOs): Organizations that work with smallholder farmers to improve food security and promote sustainable farming practices could also benefit from the tool by integrating it into their support programs.

<h1>2. Data Understanding</h1>

The dataset used in this project is sourced from PlantVillage and comprises 70,000 high-quality images of both healthy and diseased plant leaves from nine distinct species. This dataset is meticulously organized into three splits—train, test, and validation—ensuring consistent categories across each split. It offers an excellent foundation for machine learning research and applications in plant disease detection and classification.

Ideal for both agricultural experts and machine learning practitioners, this diverse dataset captures a broad range of plant species, disease types, and growth stages. By leveraging this dataset, the project aims to advance research in plant pathology and support farmers in enhancing crop health and productivity, ultimately contributing to more sustainable agricultural practices.

Link to the PlantVillage website: [PlantVillage](https://plantvillage.psu.edu/)

In [1]:
# Load the drive helper and mount
from google.colab import drive
# This will prompt for authorization
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!ls '/content/drive/MyDrive/PlantPathoDetect-'

data		       data-processing.py  LICENSE.unknown		       README.md
data-augementation.py  index.ipynb	   PlantPathoDetectDocumentation.docx


In [4]:
!ls '/content/drive/MyDrive/PlantPathoDetect-/data'

'Bell Pepper'  'Corn (Maize)'   Potato	 Tomato


In [5]:
base_path = '/content/drive/MyDrive/PlantPathoDetect-'
data_path = f"{base_path}/data"

# List all subdirectories in 'data'
import os

print("Contents of the 'data' directory:")
if os.path.exists(data_path):
    for subdir in os.listdir(data_path):
        subdir_path = os.path.join(data_path, subdir)
        if os.path.isdir(subdir_path):
            print(f"\n{subdir}:")
            !ls "{subdir_path}"
else:
    print(f"Directory does not exist: {data_path}")


Contents of the 'data' directory:

Potato:
Test  Train  Val

Tomato:
Test  Train  Val

Bell Pepper:
Test  Train  Val

Corn (Maize):
Test  Train  Val
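
To go one level deeper than the listing above, the number of images in each split and class can be counted with the standard library. This is a minimal sketch, assuming the `data/<Crop>/<Split>/<Class>/<image>` layout seen in this notebook; the helper name `count_images` and the list of accepted extensions are hypothetical and not part of the project code.

```python
import os
from collections import defaultdict

def count_images(data_path, extensions=(".jpg", ".jpeg", ".png")):
    """Return {crop: {split: {class_name: n_images}}} for a data/<crop>/<split>/<class> layout."""
    counts = defaultdict(lambda: defaultdict(dict))
    for crop in sorted(os.listdir(data_path)):
        crop_dir = os.path.join(data_path, crop)
        if not os.path.isdir(crop_dir):
            continue
        for split in sorted(os.listdir(crop_dir)):
            split_dir = os.path.join(crop_dir, split)
            if not os.path.isdir(split_dir):
                continue
            for class_name in sorted(os.listdir(split_dir)):
                class_dir = os.path.join(split_dir, class_name)
                if not os.path.isdir(class_dir):
                    continue
                # Count only files whose extension looks like an image (case-insensitive)
                n = sum(1 for f in os.listdir(class_dir) if f.lower().endswith(extensions))
                counts[crop][split][class_name] = n
    return counts

# Example usage (assumes data_path points at the mounted 'data' folder):
# for crop, splits in count_images(data_path).items():
#     for split, classes in splits.items():
#         print(crop, split, sum(classes.values()), "images in", len(classes), "classes")
```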


***Import necessary libraries***

In [6]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# from tensorflow.keras.callbacks import ModelCheckpo
# from keras.models import sequential

from tensorflow.keras.preprocessing.image import ImageDataGenerator



***Inspect the Tomato directory and load its images into arrays***

In [8]:
import os

base_path = '/content/drive/MyDrive/PlantPathoDetect-'
tomato_path = f"{base_path}/data/Tomato"

print("Contents of the 'Tomato' directory:")
if os.path.exists(tomato_path):
    for item in os.listdir(tomato_path):
        item_path = os.path.join(tomato_path, item)
        if os.path.isdir(item_path):
            print(f"Directory: {item}")
        else:
            print(f"File: {item}")
else:
    print(f"The directory does not exist: {tomato_path}")


Contents of the 'Tomato' directory:
Directory: Train
Directory: Val
Directory: Test
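
Because the dataset is expected to have the same categories in every split, it can be worth confirming that before training. The following is a minimal sketch, assuming each split folder contains one subfolder per class; `check_split_classes` is a hypothetical helper name.

```python
import os

def check_split_classes(crop_dir, splits=("Train", "Val", "Test")):
    """Compare class folders across splits.

    Returns (common_classes, missing) where missing[split] lists the classes
    that appear in some other split but not in this one.
    """
    classes = {}
    for split in splits:
        split_dir = os.path.join(crop_dir, split)
        classes[split] = {
            name for name in os.listdir(split_dir)
            if os.path.isdir(os.path.join(split_dir, name))
        }
    all_classes = set().union(*classes.values())
    common = set.intersection(*classes.values())
    missing = {split: sorted(all_classes - found) for split, found in classes.items()}
    return sorted(common), missing

# Example usage (assumes tomato_path is defined as in the cell above):
# common, missing = check_split_classes(tomato_path)
# print(len(common), "classes shared by all splits; missing per split:", missing)
```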


In [11]:
import os
import numpy as np
from PIL import Image

# Path to the train set (adjust if necessary)
train_tomato = '/content/drive/MyDrive/PlantPathoDetect-/data/Tomato/Train'

def img_to_array(dir_path):
    images = []
    labels = []

    # Iterate through class folders in the training directory
    for folderName in os.listdir(dir_path):
        folder_dir = os.path.join(dir_path, folderName)
        if os.path.isdir(folder_dir):
            for fileName in os.listdir(folder_dir):
                if fileName.lower().endswith('.jpg'):  # Match .jpg and .JPG regardless of case
                    img_path = os.path.join(folder_dir, fileName)
                    try:
                        img = Image.open(img_path).convert('RGB')  # Ensure 3-channel RGB
                        resized_img = img.resize((128, 128))  # Resize to 128x128
                        img_arr = np.array(resized_img)
                        images.append(img_arr)
                        labels.append(folderName)
                    except Exception as e:
                        print(f"Error processing image {img_path}: {e}")

    # Convert lists to numpy arrays
    images = np.array(images)
    labels = np.array(labels)
    print(f"Images array shape: {images.shape}, Labels array shape: {labels.shape}")

    return images, labels

# Process the train set
images, labels = img_to_array(train_tomato)


Images array shape: (11105, 128, 128, 3), Labels array shape: (11105,)
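
With the labels in memory, a quick per-class count shows whether the training set is balanced. This is a small sketch using only the standard library; `class_distribution` is a hypothetical helper, and it accepts any iterable of label strings, such as the `labels` array returned above.

```python
from collections import Counter

def class_distribution(labels):
    """Return a list of (class_name, count, share) tuples sorted by descending count."""
    counts = Counter(str(label) for label in labels)
    total = sum(counts.values())
    return [(name, n, n / total) for name, n in counts.most_common()]

# Example usage with the array produced by img_to_array:
# for name, n, share in class_distribution(labels):
#     print(f"{name:30s} {n:6d}  {share:6.1%}")
```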


### Image processing and data loading for the Tomato data

In [None]:
# Paths to directories
train_tomato = '/content/drive/MyDrive/PlantPathoDetect-/data/Tomato/Train'
test_tomato = '/content/drive/MyDrive/PlantPathoDetect-/data/Tomato/Test'
val_tomato = '/content/drive/MyDrive/PlantPathoDetect-/data/Tomato/Val'

# Read and store images and labels for each dataset
print("Processing training data...")
train_images, train_labels = img_to_array(train_tomato)

print("\nProcessing testing data...")
test_images, test_labels = img_to_array(test_tomato)

print("\nProcessing validation data...")
val_images, val_labels = img_to_array(val_tomato)

Processing training data...


The images are read and converted into NumPy arrays. Inspecting the labels below confirms that each split was loaded with its class names attached.

In [None]:
print(train_labels)

['Bacterial Spot' 'Bacterial Spot' 'Bacterial Spot' ...
 'Yellow Leaf Curl Virus' 'Yellow Leaf Curl Virus'
 'Yellow Leaf Curl Virus']


In [None]:
test_labels

array(['Bacterial Spot', 'Bacterial Spot', 'Bacterial Spot', ...,
       'Yellow Leaf Curl Virus', 'Yellow Leaf Curl Virus',
       'Yellow Leaf Curl Virus'], dtype='<U22')

In [None]:
# # Normalize pixel values to the range of 0-1
# train_images = train_images.astype('float32') / 255.0
# val_images = val_images.astype('float32') / 255.0
# test_images = test_images.astype('float32') / 255.0
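
The commented-out normalization step scales pixel values from the 0-255 range to 0-1. Below is a minimal sketch of that step, assuming the images are `uint8` arrays like those returned by `img_to_array`; `normalize_images` is a hypothetical helper name.

```python
import numpy as np

def normalize_images(images):
    """Scale uint8 pixel values (0-255) to float32 values in [0, 1]."""
    return np.asarray(images).astype("float32") / 255.0

# Small self-check on a fake 2x2 RGB image
fake = np.array([[[0, 128, 255]] * 2] * 2, dtype=np.uint8)
scaled = normalize_images(fake)
print(scaled.dtype, scaled.min(), scaled.max())
```

Converting to `float32` makes the array four times larger than the `uint8` original, so it may be preferable to normalize batch by batch rather than the whole dataset at once.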

In [None]:
# # Save train_images
# np.save('train_images.npy', train_images)

# # Save val_images
# np.save('val_images.npy', val_images)

# # Save test_images
# np.save('test_images.npy', test_images)

In [None]:
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# 'train_images' (N, 128, 128, 3) and 'train_labels' (N,) come from img_to_array above

# Configure the augmentations: random rotations and vertical flips
datagen = ImageDataGenerator(
    rotation_range=60,  # Rotate by up to 60 degrees in either direction
    vertical_flip=True  # Randomly flip images top to bottom
)

# Generate augmented images
augmented_images = []
augmented_labels = []
for img, label in zip(train_images, train_labels):
    img_batch = np.expand_dims(img, axis=0)  # flow() expects a 4D batch
    aug_iter = datagen.flow(img_batch, batch_size=1)
    for _ in range(4):  # Generate 4 augmented images for each original image
        augmented_img = next(aug_iter)[0].astype(np.uint8)
        augmented_images.append(augmented_img)
        augmented_labels.append(label)

# Convert augmented images and labels to NumPy arrays
augmented_images = np.array(augmented_images)
augmented_labels = np.array(augmented_labels)

# Concatenate augmented and original images and labels
final_train_images = np.concatenate((train_images, augmented_images), axis=0)
final_train_labels = np.concatenate((train_labels, augmented_labels), axis=0)

# Shuffle the data
shuffle_indices = np.random.permutation(final_train_images.shape[0])
final_train_images = final_train_images[shuffle_indices]
final_train_labels = final_train_labels[shuffle_indices]
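
It helps to estimate how large the combined array will be before running the augmentation. The sketch below does that arithmetic for the 11,105 training images of shape 128x128x3 reported earlier, with 4 augmented copies each; `array_size_bytes` is a hypothetical helper.

```python
def array_size_bytes(n_images, height=128, width=128, channels=3, bytes_per_value=1):
    """Memory needed to hold n_images in a dense array."""
    return n_images * height * width * channels * bytes_per_value

n_original = 11105            # training images reported by img_to_array above
n_augmented = 4 * n_original  # 4 augmented copies per original
n_total = n_original + n_augmented

uint8_bytes = array_size_bytes(n_total)                       # stored as uint8
float32_bytes = array_size_bytes(n_total, bytes_per_value=4)  # after float32 normalization

print(f"{n_total} images")
print(f"uint8:   {uint8_bytes / 1024**3:.2f} GiB")
print(f"float32: {float32_bytes / 1024**3:.2f} GiB")
```

The combined set comes to 55,525 images, roughly 2.5 GiB as `uint8` and about 10.2 GiB as `float32`. That may be more than a notebook session can hold comfortably, which is one reason to augment batches on the fly with a generator instead of materializing every copy.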

#### Preprocessing

Handles all preprocessing steps.
Features:

- Label encoding and one-hot encoding
- Image normalization
- Detailed statistics and distribution reporting
- Organized class structure with helper methods
- Comprehensive documentation
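
Label encoding and one-hot encoding, as listed above, can be sketched with NumPy alone. This is an illustration under stated assumptions, not the project's preprocessing class; `encode_labels` is a hypothetical helper.

```python
import numpy as np

def encode_labels(labels):
    """Map string labels to integer ids and one-hot vectors.

    Returns (class_names, integer_ids, one_hot), where class_names[i] is the
    label that corresponds to id i (classes are sorted alphabetically).
    """
    class_names, integer_ids = np.unique(np.asarray(labels), return_inverse=True)
    integer_ids = integer_ids.reshape(-1)  # keep ids 1D regardless of NumPy version
    one_hot = np.eye(len(class_names), dtype=np.float32)[integer_ids]
    return class_names, integer_ids, one_hot

class_names, ids, one_hot = encode_labels(
    ["Healthy", "Bacterial Spot", "Healthy", "Yellow Leaf Curl Virus"]
)
print(class_names)
print(ids)
print(one_hot.shape)
```

The mapping in `class_names` should be computed once on the training labels and reused for the validation and test labels, so that the same id always refers to the same disease.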

#### Data Augmentation

Handles all data augmentation tasks
Features:

- Customizable augmentation parameters
- Generator creation for training
- Augmentation preview functionality
- Detailed documentation
- Visual inspection of augmentations
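
To make the idea of an augmentation generator concrete, here is a NumPy-only sketch that yields shuffled batches with random vertical flips. It is not the project's `DataAugmentor`, and it deliberately swaps Keras' `ImageDataGenerator` for a much simpler transform; `flip_batch_generator` and its parameters are hypothetical.

```python
import numpy as np

def flip_batch_generator(images, labels, batch_size=32, flip_prob=0.5, seed=0):
    """Yield (batch_images, batch_labels) forever, flipping some images vertically."""
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:
        order = rng.permutation(n)  # reshuffle at the start of every pass over the data
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            batch = images[idx]  # fancy indexing returns a copy, so the source array is untouched
            flip = rng.random(len(idx)) < flip_prob
            batch[flip] = batch[flip][:, ::-1]  # reverse the height axis of the selected images
            yield batch, labels[idx]

# Example usage (assumes NumPy arrays of images and labels):
# gen = flip_batch_generator(train_images, train_labels, batch_size=32)
# batch_images, batch_labels = next(gen)
```

Because the transform is applied per batch, only one batch of augmented images exists in memory at a time.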

In [None]:
# Initialize augmentor
augmentor = DataAugmentor()

# Create generator for training
train_generator = augmentor.create_generator(
    processed_images['train'],
    processed_labels['train'],
    batch_size=32
)

# Optionally, preview augmentations
sample_image = processed_images['train'][0]
augmentor.preview_augmentations(sample_image, num_augmentations=5)

# Set the paths to the train, test, and validation directories
train_dir = '/kaggle/input/plant-village-dataset-updated/Tomato/Train'
test_dir = '/kaggle/input/plant-village-dataset-updated/Tomato/Test'
val_dir = '/kaggle/input/plant-village-dataset-updated/Tomato/Val'