In [1]:
%pip install numpy


Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip is available: 24.3.1 -> 25.0
[notice] To update, run: python.exe -m pip install --upgrade pip


1.Defined paths for training and validation datasets (TRAIN_PATH, VALID_PATH) with image size (224, 224) 
2.load_and_preprocess_data_in_batches() loads images in batches, preprocesses them, and saves as .pkl files.
3.Splits the dataset into 80% training and 20% validation using sklearn.model_selection.train_test_split.
4.Data Processing: Converts images into CNN-ready tensors and stacks them for efficiency. Handles corrupted images by replacing them with placeholder tensors.
5.Data Visualization:
Displays one sample image per emotion category.
Visualizes batches of images using matplotlib for training verification.
7.DataLoader Creation: Implements batch processing for training using DataLoader.
8.Error Handling: Manages image loading errors and corrupted files.
9.Visualization create png files using matplotlib for visualization 
Efficient preprocessing optimizes model training and stores processed data to avoid redundancy.

In [None]:
import os
import numpy as np
import pickle
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
TRAIN_PATH = r"D:\bhachu\plant_project\Plant_Diseases\train"
VALID_PATH = r"D:\bhachu\plant_project\Plant_Diseases\valid"
IMG_SIZE = (224, 224) 
BATCH_SIZE = 5000 
def load_and_preprocess_data_in_batches(data_path, output_file):
    images = []
    labels = []
    class_names = []
    classes = sorted(os.listdir(data_path))
    class_to_label = {class_name: idx for idx, class_name in enumerate(classes)}
    print(f"Processing images from {data_path} in batches...")
    batch_counter = 0
    for class_name in tqdm(classes):
        class_names.append(class_name)
        class_path = os.path.join(data_path, class_name)
        for img_name in os.listdir(class_path):
            img_path = os.path.join(class_path, img_name)
            img = cv2.imread(img_path)
            if img is None:
                continue
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            img = cv2.resize(img, IMG_SIZE)  
            img = img.astype(np.float32) / 255.0  # Normalize to float32
            images.append(img)
            labels.append(class_to_label[class_name])
            if len(images) == BATCH_SIZE:
                batch_counter += 1
                batch_filename = f"{output_file}_batch_{batch_counter}.pkl"
                save_processed_data(images, labels, batch_filename)
                images.clear()
                labels.clear()
    if images:
        batch_counter += 1
        batch_filename = f"{output_file}_batch_{batch_counter}.pkl"
        save_processed_data(images, labels, batch_filename)

    print(f"Saved all batches to disk for {output_file}.")
    return class_names, class_to_label
def save_processed_data(images, labels, filename):
    data = {
        'images': images,
        'labels': labels
    }
    with open(filename, 'wb') as f:
        pickle.dump(data, f)
    print(f"Saved processed data batch to {filename}")
def main():
    train_class_names, class_to_label = load_and_preprocess_data_in_batches(TRAIN_PATH, 'train_cnn')
    valid_class_names, _ = load_and_preprocess_data_in_batches(VALID_PATH, 'ready_for_plant_valid_cnn')
    
if __name__ == "__main__":
    main()

Processing images from D:\bhachu\plant_project\Plant_Diseases\train in batches...


  5%|▌         | 2/38 [00:12<03:54,  6.52s/it]

Saved processed data batch to train_cnn_batch_1.pkl


 13%|█▎        | 5/38 [00:34<03:40,  6.68s/it]

Saved processed data batch to train_cnn_batch_2.pkl


 21%|██        | 8/38 [00:52<03:02,  6.08s/it]

Saved processed data batch to train_cnn_batch_3.pkl


 26%|██▋       | 10/38 [01:06<02:54,  6.24s/it]

Saved processed data batch to train_cnn_batch_4.pkl


 34%|███▍      | 13/38 [01:23<02:24,  5.77s/it]

Saved processed data batch to train_cnn_batch_5.pkl


 42%|████▏     | 16/38 [01:38<01:52,  5.10s/it]

Saved processed data batch to train_cnn_batch_6.pkl


 47%|████▋     | 18/38 [01:51<01:50,  5.55s/it]

Saved processed data batch to train_cnn_batch_7.pkl


 55%|█████▌    | 21/38 [02:09<01:35,  5.59s/it]

Saved processed data batch to train_cnn_batch_8.pkl


 63%|██████▎   | 24/38 [02:27<01:20,  5.78s/it]

Saved processed data batch to train_cnn_batch_9.pkl


 68%|██████▊   | 26/38 [02:43<01:19,  6.64s/it]

Saved processed data batch to train_cnn_batch_10.pkl


 76%|███████▋  | 29/38 [03:05<00:59,  6.57s/it]

Saved processed data batch to train_cnn_batch_11.pkl


 84%|████████▍ | 32/38 [03:22<00:32,  5.39s/it]

Saved processed data batch to train_cnn_batch_12.pkl


 92%|█████████▏| 35/38 [03:35<00:12,  4.24s/it]

Saved processed data batch to train_cnn_batch_13.pkl


 97%|█████████▋| 37/38 [03:45<00:04,  4.53s/it]

Saved processed data batch to train_cnn_batch_14.pkl


100%|██████████| 38/38 [03:54<00:00,  6.17s/it]


Saved processed data batch to train_cnn_batch_15.pkl
Saved all batches to disk for train_cnn.
Processing images from D:\bhachu\plant_project\Plant_Diseases\valid in batches...


 29%|██▉       | 11/38 [00:16<01:24,  3.14s/it]

Saved processed data batch to ready_for_plant_valid_cnn_batch_1.pkl


 55%|█████▌    | 21/38 [00:24<00:14,  1.18it/s]

Saved processed data batch to ready_for_plant_valid_cnn_batch_2.pkl


 84%|████████▍ | 32/38 [00:40<00:04,  1.22it/s]

Saved processed data batch to ready_for_plant_valid_cnn_batch_3.pkl


100%|██████████| 38/38 [00:53<00:00,  1.40s/it]


Saved processed data batch to ready_for_plant_valid_cnn_batch_4.pkl
Saved all batches to disk for ready_for_plant_valid_cnn.

Dataset Processing Completed.


1.Model Architecture:
`Custom CNN with three convolutional blocks.
`Filter sizes: 64, 128, and 256.
`Fully connected layers: 512 neurons and output layer for multi-class classification (38 classes).
2.Activation Functions:
`Used ReLU (Rectified Linear Unit) in all layers to introduce non-linearity.
3.Pooling and Regularization:
`MaxPooling: Applied after each convolutional layer to reduce feature map dimensions.
`Batch Normalization: Used for faster convergence and better generalization.
`Dropout: Applied (0.5 and 0.3) to prevent overfitting.
4.Loss Function:
`CrossEntropy Loss for multi-class classification.
5.Optimizer:
`AdamW with a learning rate of 0.001 and weight decay of 0.01 for adaptive learning and regularization.
6.Learning Rate Scheduler:
`ReduceLROnPlateau: Dynamically reduced the learning rate when validation loss plateaued.
7.Data Augmentation:
`Resizing to (224, 224), random horizontal flips, random rotations (10 degrees), and normalization using ImageNet statistics.
8.Training Configuration:
Epochs: 5 (changed from initial 20 to optimize training time).
Batch Size: 64 for both training and validation.
Device: Trained using CUDA (GPU) if available.
9.Training and Validation Performance:
`Separate loops for training and validation.
`Computed loss and accuracy in each epoch.
10.Accuracy:
`Best model weights saved as plant_disease_model_acc_{best_acc}.pth.
11.Performance Summary:
`The structured approach demonstrated efficient learning, optimized architecture, and best model selection based on validation accuracy.

In [None]:
#main
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, datasets, models
from torch.utils.data import DataLoader
from tqdm import tqdm
import copy
class AdvancedPlantClassifier(nn.Module):
    def __init__(self, num_classes=38):
        super(AdvancedPlantClassifier, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2)
        )
        
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(256, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

def train_model(model, train_loader, valid_loader, criterion, optimizer, scheduler, num_epochs=5):  #5 epochs
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print(f'Epoch {epoch+1}/{num_epochs}')
        
        # Training
        model.train()
        train_loss, train_corrects = 0.0, 0

        for inputs, labels in tqdm(train_loader, desc='Training'):
            inputs, labels = inputs.to(device), labels.to(device)
            
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            _, preds = torch.max(outputs, 1)
            train_loss += loss.item() * inputs.size(0)
            train_corrects += torch.sum(preds == labels.data)

        train_loss = train_loss / len(train_loader.dataset)
        train_acc = train_corrects.double() / len(train_loader.dataset)
        print(f'Train Loss: {train_loss:.4f} Acc: {train_acc:.4f}')

        # Validation
        model.eval()
        val_loss, val_corrects = 0.0, 0

        with torch.no_grad():
            for inputs, labels in tqdm(valid_loader, desc='Validation'):
                inputs, labels = inputs.to(device), labels.to(device)
                
                outputs = model(inputs)
                loss = criterion(outputs, labels)

                _, preds = torch.max(outputs, 1)
                val_loss += loss.item() * inputs.size(0)
                val_corrects += torch.sum(preds == labels.data)

        val_loss = val_loss / len(valid_loader.dataset)
        val_acc = val_corrects.double() / len(valid_loader.dataset)
        print(f'Val Loss: {val_loss:.4f} Acc: {val_acc:.4f}')

        # Update best model
        if val_acc > best_acc:
            best_acc = val_acc
            best_model_wts = copy.deepcopy(model.state_dict())

        scheduler.step(val_loss)

    print(f'Best Validation Accuracy: {best_acc:.4f}')
    model.load_state_dict(best_model_wts)
    return model, best_acc

def main():
    # Data transformations
    data_transforms = {
        'train': transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.RandomHorizontalFlip(),
            transforms.RandomRotation(10),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
        'valid': transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])
    }
    train_folder_path = r"D:\bhachu\plant_project\Plant_Diseases\train"
    valid_folder_path = r"D:\bhachu\plant_project\Plant_Diseases\valid"
    train_dataset = datasets.ImageFolder(train_folder_path, transform=data_transforms['train'])
    valid_dataset = datasets.ImageFolder(valid_folder_path, transform=data_transforms['valid'])
    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=0) # data  loaders 
    valid_loader = DataLoader(valid_dataset, batch_size=64, shuffle=False, num_workers=0)
    print("Classes:", train_dataset.classes)
    print("Number of classes:", len(train_dataset.classes))

    model = AdvancedPlantClassifier(num_classes=len(train_dataset.classes))
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=3)
    trained_model, best_acc = train_model(model, train_loader, valid_loader, criterion, optimizer, scheduler)
    model_filename = f'plant_disease_model_acc_{best_acc:.4f}.pth'
    torch.save(trained_model.state_dict(), model_filename)
    print(f"\nBest Model Saved: {model_filename}")

if __name__ == "__main__":
    main()

Classes: ['Apple___Apple_scab', 'Apple___Black_rot', 'Apple___Cedar_apple_rust', 'Apple___healthy', 'Blueberry___healthy', 'Cherry_(including_sour)___Powdery_mildew', 'Cherry_(including_sour)___healthy', 'Corn_(maize)___Cercospora_leaf_spot Gray_leaf_spot', 'Corn_(maize)___Common_rust_', 'Corn_(maize)___Northern_Leaf_Blight', 'Corn_(maize)___healthy', 'Grape___Black_rot', 'Grape___Esca_(Black_Measles)', 'Grape___Leaf_blight_(Isariopsis_Leaf_Spot)', 'Grape___healthy', 'Orange___Haunglongbing_(Citrus_greening)', 'Peach___Bacterial_spot', 'Peach___healthy', 'Pepper,_bell___Bacterial_spot', 'Pepper,_bell___healthy', 'Potato___Early_blight', 'Potato___Late_blight', 'Potato___healthy', 'Raspberry___healthy', 'Soybean___healthy', 'Squash___Powdery_mildew', 'Strawberry___Leaf_scorch', 'Strawberry___healthy', 'Tomato___Bacterial_spot', 'Tomato___Early_blight', 'Tomato___Late_blight', 'Tomato___Leaf_Mold', 'Tomato___Septoria_leaf_spot', 'Tomato___Spider_mites Two-spotted_spider_mite', 'Tomato___

Training: 100%|██████████| 1099/1099 [2:01:09<00:00,  6.61s/it] 


Train Loss: 1.4203 Acc: 0.5663


Validation: 100%|██████████| 275/275 [11:42<00:00,  2.56s/it]


Val Loss: 0.6806 Acc: 0.7855
Epoch 2/5


Training: 100%|██████████| 1099/1099 [1:59:28<00:00,  6.52s/it]


Train Loss: 0.7793 Acc: 0.7509


Validation: 100%|██████████| 275/275 [11:36<00:00,  2.53s/it]


Val Loss: 0.4491 Acc: 0.8566
Epoch 3/5


Training: 100%|██████████| 1099/1099 [2:00:27<00:00,  6.58s/it] 


Train Loss: 0.5970 Acc: 0.8073


Validation: 100%|██████████| 275/275 [11:41<00:00,  2.55s/it]


Val Loss: 0.3623 Acc: 0.8821
Epoch 4/5


Training: 100%|██████████| 1099/1099 [2:00:14<00:00,  6.56s/it] 


Train Loss: 0.4962 Acc: 0.8395


Validation: 100%|██████████| 275/275 [11:35<00:00,  2.53s/it]


Val Loss: 0.2647 Acc: 0.9146
Epoch 5/5


Training: 100%|██████████| 1099/1099 [1:58:29<00:00,  6.47s/it]


Train Loss: 0.4255 Acc: 0.8604


Validation: 100%|██████████| 275/275 [11:38<00:00,  2.54s/it]

Val Loss: 0.2041 Acc: 0.9352
Best Validation Accuracy: 0.9352

Best Model Saved: plant_disease_model_acc_0.9352.pth





In [None]:
# this points i was not break to creating the project 

Key Components:
	1.	User Interface Development:
	•	Design a Streamlit Application: Create a web interface that allows users to upload images of plant leaves (check for the type of file uploaded).
	•	Interface Usability: Ensure the application is intuitive and user-friendly, with clear instructions and feedback for users.
	2.	Image Preprocessing:
	•	Data Preparation: Implement image preprocessing steps such as resizing, normalization, and augmentation to improve model performance.
	•	Dataset Handling: Use the New Plant Diseases Dataset from the Kaggle, which contains images of plant leaves with labeled diseases.
	3.	Disease Classification:
	•	CNN Model: Develop and train a Convolutional Neural Network (CNN) model to classify plant diseases based on the uploaded images.
	•	Model Training: Utilize the dataset from Kaggle for training and testing, applying techniques such as data augmentation and transfer learning to enhance model accuracy.
Compare the performance of your model with at least 3 pretrained models. Your model should outperform the existing models
	4.	Performance and Optimization:
	•	Model Evaluation: Assess the CNN model’s performance using metrics like accuracy, precision, and recall.
	•	System Optimization: Ensure the application performs efficiently with minimal latency for real-time predictions.
	5.	Deployment and Testing:
	•	Application Deployment: Deploy the Streamlit application for accessibility by end-users.
	•	Testing: Conduct extensive testing to ensure the application correctly predicts plant diseases and handles various image inputs effectively.
