# Maize Plant Disease Detection Starter Notebook

# AI for Crop Health - Diagnosing Maize Plant Diseases in Zimbabwe Using Deep Learning

## 📌 Info Page

### About the Challenge
Maize is the staple crop that sustains millions of Zimbabweans, underpinning both food security and livelihoods across rural and urban communities. Yet, Zimbabwe’s maize production is persistently threatened by several devastating leaf diseases—primarily **Common Rust**, **Gray Leaf Spot**, and **Blight**—which cause significant yield reductions and economic losses.

These diseases are widespread across Zimbabwe’s diverse agro-ecological zones, including:
- High-rainfall areas like **Mashonaland East** and **Manicaland**.
- Drier regions like **Masvingo**.
- The central **Midlands**.

Their impact is exacerbated by limited access to timely and accurate disease diagnostics, especially for smallholder farmers who form the backbone of Zimbabwe’s agriculture.

### Your Task
Develop **deep learning models** that can accurately detect and classify maize diseases from leaf images. Leveraging AI for early and precise disease identification can transform farming practices by:
- Providing farmers with **real-time, accessible tools** to identify diseases before they spread widely.
- Reducing reliance on **manual inspection**, which is often subjective and slow.
- Enabling **targeted interventions** to minimize crop loss and reduce pesticide overuse.
- Contributing to **improved food security** and agricultural sustainability in Zimbabwe.

### Dataset Overview
The dataset includes images of maize leaves categorized into:
1. **Common Rust**
2. **Gray Leaf Spot**
3. **Blight**
4. **Healthy**

Your challenge is to design and train models robust to diverse field conditions (e.g., varying lighting, leaf angles, and disease severity).

### Impact
By addressing this challenge, you will contribute to a **high-impact solution** with direct applications in Zimbabwe’s farming communities and beyond, driving the adoption of AI-powered precision agriculture in sub-Saharan Africa.

---

## 📊 Evaluation
- **Metrics**: Accuracy, Precision, Recall, F1-Score.
- **Leaderboard**: Based on performance on the evaluation set.
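All four metrics above can be derived from a confusion matrix. The page does not say how per-class precision, recall, and F1 are averaged for the leaderboard. The sketch below assumes macro averaging (an unweighted mean over the four classes), which gives the smaller Gray Leaf Spot class the same weight as the others. The function name `classification_metrics` is hypothetical.

```python
import numpy as np


def classification_metrics(y_true, y_pred, num_classes=4):
    """Accuracy plus macro-averaged precision, recall and F1 (averaging scheme assumed)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion matrix: rows are true classes, columns are predicted classes
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)

    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # how often each class was predicted
    actual = cm.sum(axis=1)     # how often each class truly occurs

    # Avoid division by zero for classes never predicted or never present
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }


print(classification_metrics([0, 0, 1, 2, 3, 3], [0, 1, 1, 2, 3, 0]))
```

In this toy example 4 of 6 predictions are correct (accuracy 0.667), while macro F1 is lower (about 0.708 versus per-class values of 0.5 to 1.0), because each class's errors count equally regardless of class size.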
### 🏆 Prizes
- **Top 3 performers** will receive:
  - Official certification of achievement
  - Recognition on our community platforms
  - Priority consideration for future opportunities with:
    - The Deep Learning Indaba X Zimbabwe community
    - Agricultural sector partners

## ⏳ Timeline
- **Start Date**: [Insert Date]
- **Submission Deadline**: [Insert Date]
- **Results Announcement**: [Insert Date]

## 📜 Rules
- This challenge is **only open to the Deep Learning Indaba X Zimbabwe Community**.
- Teams must adhere to the **code of conduct** and **submission guidelines**.

---

## 📂 Data Page

### About the Data
The dataset contains 4,188 labelled images of maize leaves, categorized into four classes:

| Class          | Label | Number of Images |
|----------------|-------|------------------|
| Common Rust    | 0     | 1,306            |
| Gray Leaf Spot | 1     | 574              |
| Blight         | 2     | 1,146            |
| Healthy        | 3     | 1,162            |
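The table shows a class imbalance: Gray Leaf Spot has roughly half as many images as each of the other classes. One common mitigation, not part of the original notebook, is to weight the training loss by inverse class frequency. A minimal sketch using the counts from the table, indexed by the table's labels (if you train on `ImageFolder` indices instead, reorder the weights to match `dataset.class_to_idx`):

```python
import numpy as np

# Image counts from the table above, ordered by the challenge labels 0-3
class_names = ["Common Rust", "Gray Leaf Spot", "Blight", "Healthy"]
class_counts = np.array([1306, 574, 1146, 1162])

# Inverse-frequency ("balanced") weights: total / (num_classes * count_c)
num_classes = len(class_counts)
class_weights = class_counts.sum() / (num_classes * class_counts)

for name, w in zip(class_names, class_weights):
    print(f"{name:15s} {w:.3f}")
```

Gray Leaf Spot receives the largest weight (about 1.82) and the other classes land near 0.8 to 0.9. In PyTorch these could be passed as `nn.CrossEntropyLoss(weight=torch.tensor(class_weights, dtype=torch.float32))`; the explicit `float32` keeps the weights in the same dtype as typical model outputs.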

# 🌱 Starter Notebook: Maize Plant Disease Detection

## 📌 Overview
This notebook serves as a **starter template** for the *AI for Crop Health* hackathon challenge. It provides a foundational workflow for loading, preprocessing, and analyzing the maize disease dataset, as well as training a baseline deep learning model. Use this as a jumping-off point to build and refine your solution.

---

## 🎯 Objectives
By the end of this notebook, you will:
1. **Explore the dataset**: Visualize sample images and understand class distributions.
2. **Preprocess data**: Resize, normalize, and augment images for model training.
3. **Train a baseline model**: Implement a simple CNN or transfer learning model.
4. **Evaluate performance**: Calculate metrics (accuracy, F1-score) and identify areas for improvement.
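For objective 3, a "simple CNN" baseline could look like the sketch below. The class name `SimpleMaizeCNN` and the layer sizes are illustrative choices, not part of the challenge or the original notebook. The model accepts the `[3, 224, 224]` tensors produced by the preprocessing in this notebook and returns one raw logit per class.

```python
import torch
import torch.nn as nn


class SimpleMaizeCNN(nn.Module):
    """A small illustrative CNN baseline for 4-class leaf classification."""

    def __init__(self, num_classes: int = 4):
        super().__init__()
        # Three conv blocks; each max-pool halves the spatial size
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Global average pooling makes the head independent of input resolution
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.pool(x).flatten(1)  # [batch, 64]
        return self.classifier(x)    # raw logits, suitable for nn.CrossEntropyLoss


model = SimpleMaizeCNN()
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 4])
```

For the transfer-learning alternative, a pretrained torchvision backbone with its final layer replaced by a 4-class `nn.Linear` is a common pattern.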

---

## **Importing Libraries**

In [1]:
import os

import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import PIL
import torch
import torch.nn as nn
import torch.optim as optim
import torchinfo
import torchvision
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from torch.utils.data import DataLoader, random_split
from torchinfo import summary
from torchvision import datasets, transforms
from tqdm import tqdm

## Exploring and Preparing Our Data

In [2]:
data_dir = os.path.join("crop pictures", "data")

print("Data Directory:", data_dir)


Data Directory: crop pictures\data
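Before building any datasets, it can help to check that the folders on disk match the counts in the Data Page table. A minimal sketch, assuming one sub-folder per class directly under `data_dir`. The function name `count_images_per_class` is hypothetical, and the extension list is an assumption that roughly mirrors what `ImageFolder` accepts.

```python
import os

# Assumed set of image extensions (compared case-insensitively)
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp")


def count_images_per_class(root):
    """Return {class_folder: number_of_image_files} for each sub-folder of `root`."""
    with os.scandir(root) as it:
        # Only directories are classes; stray top-level files are ignored
        class_dirs = sorted((e for e in it if e.is_dir()), key=lambda e: e.name)
    counts = {}
    for entry in class_dirs:
        # Walk recursively, counting files whose extension looks like an image
        counts[entry.name] = sum(
            1
            for _, _, filenames in os.walk(entry.path)
            for name in filenames
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


# Example with the path used in this notebook:
# print(count_images_per_class(os.path.join("crop pictures", "data")))
```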


In [3]:
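# Create and print a sorted list of the class names (sub-folder names) in our data directory
# os.listdir returns entries in arbitrary order; sorting matches the order ImageFolder uses
classes = sorted(os.listdir(data_dir))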

print("List of classes:", classes)


List of classes: ['Blight', 'Common_Rust', 'Gray_Leaf_Spot', 'Healthy']
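`ImageFolder` assigns integer labels by sorting the class folder names, so with these folders it uses Blight = 0, Common_Rust = 1, Gray_Leaf_Spot = 2, Healthy = 3. That differs from the label column in the Data Page table (Common Rust = 0, Gray Leaf Spot = 1, Blight = 2, Healthy = 3). If submissions must use the table's labels (an assumption; check the submission guidelines), predictions need remapping. A minimal sketch, where `CHALLENGE_LABELS` and `build_label_remap` are hypothetical names:

```python
# Challenge labels from the Data Page table, keyed by the folder names listed above
CHALLENGE_LABELS = {"Common_Rust": 0, "Gray_Leaf_Spot": 1, "Blight": 2, "Healthy": 3}


def build_label_remap(class_to_idx, challenge_labels):
    """Map ImageFolder indices to challenge labels, e.g. for converting predictions."""
    return {idx: challenge_labels[name] for name, idx in class_to_idx.items()}


# ImageFolder-style mapping: indices follow the sorted folder names
folder_names = ["Blight", "Common_Rust", "Gray_Leaf_Spot", "Healthy"]
class_to_idx = {name: i for i, name in enumerate(sorted(folder_names))}

remap = build_label_remap(class_to_idx, CHALLENGE_LABELS)
print(remap)  # {0: 2, 1: 0, 2: 1, 3: 3}
```

With the real data, pass `dataset.class_to_idx` instead of the hand-built dictionary. The same mapping could also be applied at load time through `ImageFolder`'s `target_transform` argument.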


In [4]:
# Convert non-RGB images (e.g. grayscale "L", RGBA, or palette "P" modes) to RGB, since the model expects 3-channel input
# A 1-channel grayscale image would otherwise cause a dimension mismatch when batching or in the first convolution layer
# ImageFolder's default loader already converts images to RGB, so this mainly acts as a safeguard (e.g. for custom loaders)
def convert_to_rgb(img):
    """Convert PIL image to RGB format if it isn't already.
    
    Args:
        img: PIL Image object
    
    Returns:
        PIL Image object in RGB format
    """
    if img.mode != "RGB":
        img = img.convert("RGB")
    return img
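A quick illustration of what this conversion does, using small synthetic PIL images in place of real leaf photos (the images are made up for the example). Grayscale and RGBA inputs both come out with three channels:

```python
import numpy as np
from PIL import Image

# Synthetic 8x8 images standing in for leaf photos (illustrative only)
gray = Image.new("L", (8, 8), color=128)                    # single-channel grayscale
rgba = Image.new("RGBA", (8, 8), color=(40, 120, 40, 255))  # 4 channels including alpha

for img in (gray, rgba):
    rgb = img.convert("RGB")  # the same conversion convert_to_rgb applies
    print(img.mode, np.asarray(img).shape, "->", rgb.mode, np.asarray(rgb).shape)
```

This prints `L (8, 8) -> RGB (8, 8, 3)` and `RGBA (8, 8, 4) -> RGB (8, 8, 3)`. Note that PIL's RGBA-to-RGB conversion drops the alpha channel rather than blending it against a background.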

In [5]:
def get_mean_std(loader):
    """Computes the mean and standard deviation of image data.

    Input: a `DataLoader` producing tensors of shape [batch_size, channels, pixels_x, pixels_y]
    Output: the mean of each channel as a tensor, the standard deviation of each channel as a tensor
            formatted as a tuple (means[channels], std[channels])"""

    channels_sum, channels_squared_sum, num_batches = 0, 0, 0
    for data, _ in tqdm(loader, desc="Computing mean and std", leave=False):
        channels_sum += torch.mean(data, dim=[0, 2, 3])
        channels_squared_sum += torch.mean(data**2, dim=[0, 2, 3])
        num_batches += 1
    mean = channels_sum / num_batches
    std = (channels_squared_sum / num_batches - mean**2) ** 0.5

    return mean, std
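Because `get_mean_std` averages per-batch means, a smaller final batch is slightly over-weighted (4,188 images in batches of 32 leave 28 in the last batch). A sketch of an exact variant that accumulates pixel sums instead, in float64 to limit rounding error over many batches. `get_mean_std_exact` is a hypothetical name, and it assumes every image in a batch has the same height and width, which the `Resize((224, 224))` step guarantees.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def get_mean_std_exact(loader):
    """Per-channel mean and population std over every pixel yielded by `loader`."""
    channels_sum, channels_squared_sum, num_pixels = 0, 0, 0
    for data, _ in loader:
        data = data.double()  # float64 accumulation avoids drift over many batches
        # Sum (not mean) over batch, height and width so every image counts equally
        channels_sum += data.sum(dim=[0, 2, 3])
        channels_squared_sum += (data**2).sum(dim=[0, 2, 3])
        num_pixels += data.shape[0] * data.shape[2] * data.shape[3]
    mean = channels_sum / num_pixels
    std = (channels_squared_sum / num_pixels - mean**2).sqrt()
    return mean.float(), std.float()


# Toy check: 10 random "images" with an uneven final batch (4 + 4 + 2)
images = torch.rand(10, 3, 5, 5)
loader = DataLoader(TensorDataset(images, torch.zeros(10)), batch_size=4)
mean, std = get_mean_std_exact(loader)
print(torch.allclose(mean, images.mean(dim=[0, 2, 3]), atol=1e-5))  # expected: True
```

For this dataset the difference from `get_mean_std` should be small, so either is reasonable for normalization.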

In [7]:
# Resize images to 224x224 so every input has the same dimensions (a common size for CNNs)
# ToTensor converts PIL images to float tensors of shape [channels, height, width] with values in [0, 1]
# Normalize then standardizes each color channel separately. Pretrained ImageNet models typically use
# mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]; here we instead compute this dataset's own statistics
# First, build a basic transform (no normalization) to get images into tensor format for the mean/std calculation
temp_transform = transforms.Compose([
    transforms.Lambda(convert_to_rgb),
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])

# Create temporary dataset and loader for mean/std calculation
temp_dataset = datasets.ImageFolder(data_dir, transform=temp_transform)
temp_loader = DataLoader(temp_dataset, batch_size=32, shuffle=False)

# Calculate dataset statistics
mean, std = get_mean_std(temp_loader)

# Final transform with calculated normalization values
transform_normalized = transforms.Compose([
    transforms.Lambda(convert_to_rgb),  # First convert to RGB if needed
    transforms.Resize((224, 224)),      # Resize to 224x224
    transforms.ToTensor(),              # Convert to tensor
    transforms.Normalize(               # Normalize with calculated stats
        mean=mean,
        std=std
    )
])


print(type(transform_normalized))
print("-----------------")
print(transform_normalized)


<class 'torchvision.transforms.transforms.Compose'>
-----------------
Compose(
    Lambda()
    Resize(size=(224, 224), interpolation=bilinear, max_size=None, antialias=True)
    ToTensor()
    Normalize(mean=tensor([0.4379, 0.4983, 0.3759]), std=tensor([0.2096, 0.2157, 0.2093]))
)

In [8]:
# Build the normalized dataset with ImageFolder and print its length
dataset = datasets.ImageFolder(data_dir, transform=transform_normalized)
print('Length of dataset:', len(dataset))

Length of dataset: 4188
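The notebook imports `random_split`, which suggests holding out a validation set as a next step. A minimal sketch of a reproducible 80/20 split with a seeded generator follows. The 80/20 ratio and the seed are assumptions. It runs on a stand-in `TensorDataset` with the same length (4,188) so it works without the image folder; with the real data, pass `dataset` instead.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in with the same length as the maize dataset; replace with `dataset`
stand_in = TensorDataset(torch.arange(4188))

val_fraction = 0.2  # assumed hold-out ratio
val_size = int(len(stand_in) * val_fraction)
train_size = len(stand_in) - val_size

# A seeded generator makes the split identical across runs
generator = torch.Generator().manual_seed(42)
train_set, val_set = random_split(stand_in, [train_size, val_size], generator=generator)

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32, shuffle=False)
print(len(train_set), len(val_set))  # 3351 837
```

Two design notes: both subsets share the parent dataset's transform, so training-only augmentation would require a second `ImageFolder` over the same directory. A plain random split also does not preserve class proportions; with only 574 Gray Leaf Spot images, a stratified split is a reasonable alternative.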
