# Cassava Leaf Disease Classification Using Vision Transformer (ViT)

## Introduction

Cassava is a vital carbohydrate source for millions of households in Sub-Saharan Africa and a cornerstone of food security for smallholder farmers. However, viral diseases significantly threaten cassava yields, with substantial economic and nutritional consequences. Traditional disease detection relies on visual inspection by agricultural experts, which is labor-intensive, costly, and out of reach for many farmers, especially those in remote areas with limited resources. To address this, the project builds an automated system that classifies cassava leaf diseases from images using a Vision Transformer (ViT) model. The goal is timely, reliable diagnosis that lets farmers act before losses become severe.


In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        # Check if the file extension is not '.jpg'
        if not filename.lower().endswith('.jpg'):
            print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/cassava-leaf-disease-classification/sample_submission.csv
/kaggle/input/cassava-leaf-disease-classification/label_num_to_disease_map.json
/kaggle/input/cassava-leaf-disease-classification/train.csv
/kaggle/input/cassava-leaf-disease-classification/train_tfrecords/ld_train14-1338.tfrec
/kaggle/input/cassava-leaf-disease-classification/train_tfrecords/ld_train13-1338.tfrec
/kaggle/input/cassava-leaf-disease-classification/train_tfrecords/ld_train04-1338.tfrec
/kaggle/input/cassava-leaf-disease-classification/train_tfrecords/ld_train01-1338.tfrec
/kaggle/input/cassava-leaf-disease-classification/train_tfrecords/ld_train08-1338.tfrec
/kaggle/input/cassava-leaf-disease-classification/train_tfrecords/ld_train00-1338.tfrec
/kaggle/input/cassava-leaf-disease-classification/train_tfrecords/ld_train10-1338.tfrec
/kaggle/input/cassava-leaf-disease-classification/train_tfrecords/ld_train02-1338.tfrec
/kaggle/input/cassava-leaf-disease-classification/train_tfrecords/ld_train15-1327.tf

## Methodology

The project employs a comprehensive deep learning pipeline centered around the Vision Transformer (ViT) architecture to classify cassava leaf diseases. The key components of the methodology are outlined below:

### 1. **Configuration and Reproducibility**
   - **Configuration Class (`Config`)**: Encapsulates all hyperparameters and settings required for model training and inference, including model architecture details, training parameters, data augmentation settings, and device configurations.
   - **Seed Setting Function (`seed_everything`)**: Ensures reproducibility by setting fixed seeds across various libraries (`random`, `numpy`, `torch`) and configuring PyTorch's backend for deterministic behavior.
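
As a minimal illustration of why seeding matters, the sketch below reseeds Python's `random` module and checks that the same draws come back. It uses only the standard library so it runs anywhere; `draw_sample` is a hypothetical helper that is not part of the notebook, and the NumPy and PyTorch generators are assumed to behave analogously once seeded.

```python
import random


def draw_sample(seed: int, n: int = 5) -> list:
    """Seed the stdlib generator, then draw n pseudo-random floats."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


first = draw_sample(719)    # 719 is the seed used in Config
second = draw_sample(719)   # same seed again
other = draw_sample(720)    # a different seed

print(first == second)  # identical seeds reproduce the same sequence
print(first == other)   # a different seed yields a different sequence
```

Seeding fixes the pseudo-random streams only. Full run-to-run reproducibility on a GPU also depends on the backend settings, which is why `seed_everything` touches the cuDNN flags as well.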

### 2. **Data Handling and Preprocessing**
   - **Dataset Class (`CassavaDataset`)**: Custom PyTorch `Dataset` class responsible for loading and preprocessing images. It handles image path construction, loading images using OpenCV, applying transformations, and managing labels.
   - **Data Augmentation (`ViTDataTransforms`)**: Uses the Albumentations library to apply augmentations tailored for Vision Transformers. Training transformations include random resized cropping, transposition, flipping, shift/scale/rotation, color adjustments, normalization, and coarse dropout to improve generalization. Validation images are only resized and normalized.
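
The normalization step can be worked through by hand. The sketch below assumes the usual convention, which Albumentations' `Normalize` follows when `max_pixel_value=255.0`: scale each channel to [0, 1], subtract the channel mean, and divide by the channel standard deviation. `normalize_pixel` is a hypothetical helper for illustration only; the statistics are the ImageNet values stored in `Config`.

```python
MEAN = [0.485, 0.456, 0.406]  # ImageNet channel means (RGB), as in Config
STD = [0.229, 0.224, 0.225]   # ImageNet channel standard deviations (RGB)


def normalize_pixel(rgb, mean=MEAN, std=STD, max_pixel_value=255.0):
    """Map one 8-bit RGB pixel to normalized floats, channel by channel."""
    return [
        (value / max_pixel_value - m) / s
        for value, m, s in zip(rgb, mean, std)
    ]


black = normalize_pixel([0, 0, 0])
white = normalize_pixel([255, 255, 255])
mean_color = normalize_pixel([m * 255.0 for m in MEAN])

print([round(v, 3) for v in black])
print([round(v, 3) for v in white])
print([round(v, 3) for v in mean_color])
```

One consequence of this mapping is that a normalized value of 0 corresponds to the channel mean rather than to black. Since the training pipeline applies `CoarseDropout` with `fill_value=0` after `Normalize`, the masked rectangles are effectively filled with the mean color.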

### 3. **Model Architecture**
   - **Vision Transformer Model (`CassavaViT`)**: Wraps a Vision Transformer (ViT) created with the `timm` library. The model is initialized with pre-trained weights and configured to output logits for the five classes (four diseases and one healthy class).
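
The input geometry implied by the model name `vit_base_patch16_384` can be worked out by hand. The sketch below assumes the standard ViT layout of non-overlapping square patches plus a single class token; `vit_sequence_length` is a hypothetical helper, not a `timm` function.

```python
def vit_sequence_length(image_size: int, patch_size: int, extra_tokens: int = 1) -> int:
    """Number of tokens fed to the transformer: patches plus extra tokens."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + extra_tokens


patches_per_side = 384 // 16             # patches along each image edge
num_patches = patches_per_side ** 2      # patches per image
seq_len = vit_sequence_length(384, 16)   # patches plus one class token (assumed)
patch_dim = 16 * 16 * 3                  # values in one flattened RGB patch

print(patches_per_side, num_patches, seq_len, patch_dim)
```

With a 384-pixel input and 16-pixel patches this gives 24 patches per side, 576 patches, and a sequence of 577 tokens. Each flattened RGB patch holds 768 values, which equals the `hidden_size` listed in `Config`.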

### 4. **Training Strategy**
   - **Trainer Class (`ViTTrainer`)**: Manages the training and validation processes. It sets up the loss function (`CrossEntropyLoss`), optimizer (`AdamW`), and learning rate scheduler (`CosineAnnealingLR`). The trainer also handles mixed-precision training using PyTorch's Automatic Mixed Precision (AMP) to accelerate computations and reduce memory usage.
   - **Cross-Validation (`StratifiedKFold`)**: Implements Stratified K-Fold cross-validation to ensure each fold maintains the same class distribution as the entire dataset, enhancing the robustness of the model evaluation.
   - **Training Loop (`train_model`)**: Orchestrates training over the folds listed in `Config.used_folds` for the configured number of epochs. For each fold, it builds the datasets and dataloaders, trains and validates the model every epoch, and saves a checkpoint whenever the validation loss improves on the best value seen so far for that fold.
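
To make the learning rate schedule concrete, the sketch below evaluates the closed-form cosine annealing curve (no restarts) with the values used in this notebook: a base rate of `1e-4`, `eta_min=1e-6`, and `T_max=10`. It is a hand-written approximation of what `CosineAnnealingLR` produces when stepped once per epoch, not a call into PyTorch; `cosine_annealing_lr` is a hypothetical helper.

```python
import math


def cosine_annealing_lr(step: int, base_lr: float, eta_min: float, t_max: int) -> float:
    """Closed-form cosine annealing without restarts."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))


BASE_LR, ETA_MIN, T_MAX = 1e-4, 1e-6, 10  # values from Config and ViTTrainer

# Learning rate after 0, 1, ..., T_MAX scheduler steps.
schedule = [cosine_annealing_lr(s, BASE_LR, ETA_MIN, T_MAX) for s in range(T_MAX + 1)]
for steps_taken, lr in enumerate(schedule):
    print(f"after {steps_taken} scheduler steps: lr = {lr:.2e}")
```

Because `train_epoch` calls `scheduler.step()` at the end of each epoch, the first epoch trains at the base rate and the rate decays smoothly toward `eta_min`, reaching it only after the tenth step.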

### 5. **Error Handling and Optimization**
   - **Robust Image Loading**: The `CassavaDataset` class appends a `.jpg` extension to image IDs that lack a recognized one (`.jpg`, `.jpeg`, `.png`). If an image is missing or corrupted, it logs the error and substitutes a blank 384×384 image so that training can continue.
   - **Memory Management**: After training each fold, the model and trainer instances are deleted, and CUDA caches are cleared to optimize memory usage, especially when training on GPUs with limited resources.
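
The blank-image fallback can be sketched in isolation. The snippet below assumes a loader that returns `None` on failure, as `cv2.imread` does; `load_or_blank` and `missing_file_loader` are hypothetical names used only for this illustration, and the loader is injected so the example runs without OpenCV or any image files.

```python
import numpy as np


def load_or_blank(path, loader, size=384):
    """Return loader(path), or a black RGB image if loading fails."""
    try:
        image = loader(path)
        if image is None:  # cv2.imread signals failure by returning None
            raise ValueError(f"Failed to load image at {path}")
        return image
    except Exception as exc:
        print(f"Error loading image {path}: {exc}")
        return np.zeros((size, size, 3), dtype=np.uint8)


def missing_file_loader(path):
    """Hypothetical stand-in for cv2.imread on an unreadable file."""
    return None


blank = load_or_blank("does_not_exist.jpg", missing_file_loader)
print(blank.shape, blank.dtype, int(blank.sum()))
```

The fallback keeps a batch intact when a file cannot be read, but it also pairs an all-black image with a real label. Counting how often the fallback fires is a cheap way to check that it stays rare.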


In [2]:
import os
import json
import random
from pathlib import Path
from typing import Optional, Tuple, Dict

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import timm
import cv2
from sklearn.model_selection import StratifiedKFold
from torch.cuda.amp import autocast, GradScaler
from tqdm import tqdm

from albumentations import (
    Compose, RandomResizedCrop, Transpose, HorizontalFlip, VerticalFlip,
    ShiftScaleRotate, HueSaturationValue, RandomBrightnessContrast,
    CoarseDropout, Normalize, CenterCrop, Resize
)
from albumentations.pytorch import ToTensorV2

# ---------------------------------------------------
# Define paths for data and outputs
# ---------------------------------------------------
PATHS = {
    'TRAIN_CSV': '/kaggle/input/cassava-leaf-disease-classification/train.csv',  # Path to training CSV
    'TEST_CSV': '/kaggle/input/cassava-leaf-disease-classification/sample_submission.csv',  # Path to test submission CSV
    'DISEASE_MAP': '/kaggle/input/cassava-leaf-disease-classification/label_num_to_disease_map.json',  # Path to disease mapping JSON
    'TRAIN_IMAGES': '/kaggle/input/cassava-leaf-disease-classification/train_images',  # Directory containing training images
    'TEST_IMAGES': '/kaggle/input/cassava-leaf-disease-classification/test_images',  # Directory containing test images
    'OUTPUT': '/kaggle/working/submission.csv',  # Path to save the final submission
    'WEIGHTS': '/kaggle/working/weights'  # Directory to save model weights
}

class Config:
    """Configuration class for ViT model training and inference."""
    
    def __init__(self):
        # Check CUDA availability
        self.device_type = 'cuda' if torch.cuda.is_available() else 'cpu'  # Device type
        self.device = torch.device(self.device_type)  # PyTorch device
        
        # Model Configuration
        self.model_name: str = 'vit_base_patch16_384'  # Vision Transformer model name
        self.image_size: int = 384  # Input image size
        self.patch_size: int = 16  # Patch size for ViT
        self.hidden_size: int = 768  # Hidden size of the transformer
        self.num_heads: int = 12  # Number of attention heads
        self.num_layers: int = 12  # Number of transformer layers
        self.pretrained: bool = True  # Whether to use pretrained weights
        
        # Training Configuration
        self.seed: int = 719  # Random seed for reproducibility
        self.num_epochs: int = 10  # Number of training epochs
        self.train_batch_size: int = 8 if self.device_type == 'cuda' else 4  # Training batch size based on device
        self.valid_batch_size: int = 16 if self.device_type == 'cuda' else 8  # Validation batch size based on device
        self.learning_rate: float = 1e-4  # Learning rate for optimizer
        self.weight_decay: float = 0.01  # Weight decay for optimizer
        self.num_workers: int = 4 if self.device_type == 'cuda' else 2  # Number of workers for data loading
        self.grad_accum_steps: int = 2  # Gradient accumulation steps
        
        # Mixed Precision
        self.fp16: bool = self.device_type == 'cuda'  # Use mixed precision only if CUDA is available
        
        # Cross Validation
        self.num_folds: int = 5  # Number of cross-validation folds
        self.tta_steps: int = 3  # Number of Test Time Augmentation steps
        self.used_epochs: list = [7, 8, 9]  # Epochs to use for inference
        self.used_folds: list = [0, 2, 3]  # Folds to use for inference
        
        # Normalization
        self.mean: list = [0.485, 0.456, 0.406]  # Mean values for normalization (ImageNet)
        self.std: list = [0.229, 0.224, 0.225]  # Standard deviation values for normalization (ImageNet)

class CassavaDataset(Dataset):
    """Dataset class for Cassava Leaf Disease Classification."""
    
    def __init__(
        self,
        df: pd.DataFrame,
        data_root: str,
        transforms: Optional[Compose] = None,
        output_label: bool = True
    ):
        super().__init__()
        self.df = df.reset_index(drop=True)  # Reset DataFrame index for consistency
        self.transforms = transforms         # Data augmentation and preprocessing transforms
        self.data_root = Path(data_root)     # Root directory for image data
        self.output_label = output_label     # Flag to determine if labels are returned
    
    def __len__(self) -> int:
        return len(self.df)  # Return the total number of samples
    
    def __getitem__(self, index: int) -> Tuple[torch.Tensor, Optional[int]]:
        # Get image ID from the DataFrame
        image_id = self.df.iloc[index]['image_id']
        
        # Ensure image has extension
        if not image_id.lower().endswith(('.jpg', '.jpeg', '.png')):
            image_id = f"{image_id}.jpg"  # Add .jpg extension if missing
        
        # Construct image path
        image_path = self.data_root / image_id
        
        try:
            image = self._load_image(str(image_path))  # Attempt to load the image
        except Exception as e:
            print(f"Error loading image {image_path}: {str(e)}")  # Print error message
            # Return a blank image in case of error
            image = np.zeros((384, 384, 3), dtype=np.uint8)
        
        if self.transforms:
            image = self.transforms(image=image)['image']  # Apply transformations if any
        
        if self.output_label:
            target = self.df.iloc[index]['label']  # Get label if output_label is True
            return image, target  # Return image and label
        return image  # Return only image for inference
    
    @staticmethod
    def _load_image(path: str) -> np.ndarray:
        """Load and convert BGR image to RGB."""
        image = cv2.imread(path)  # Read image using OpenCV
        if image is None:
            raise ValueError(f"Failed to load image at {path}")  # Raise error if image is not found
        return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)            # Convert BGR to RGB format

def print_dataset_info(train_df: pd.DataFrame) -> None:
    """Print dataset information."""
    print(f"Total training samples: {len(train_df)}")      # Print total number of training samples
    print("\nLabel distribution:")                         # Header for label distribution
    print(train_df['label'].value_counts(normalize=True))  # Print normalized label counts
    print("\nSample image IDs:")                           # Header for sample image IDs
    print(train_df['image_id'].head())                     # Print first few image IDs

def seed_everything(seed: int) -> None:
    """Set random seeds for reproducibility."""
    random.seed(seed)                          # Set Python random seed
    os.environ['PYTHONHASHSEED'] = str(seed)   # Set environment variable for Python hash seed
    np.random.seed(seed)                       # Set NumPy random seed
    torch.manual_seed(seed)                    # Set PyTorch random seed
    torch.cuda.manual_seed(seed)               # Set CUDA random seed
    torch.backends.cudnn.deterministic = True  # Ensure deterministic behavior
    torch.backends.cudnn.benchmark = False     # Disable cuDNN autotuning, which can pick non-deterministic algorithms

class CassavaViT(nn.Module):
    """Vision Transformer model for Cassava disease classification."""
    
    def __init__(
        self,
        num_classes: int,
        pretrained: bool = True,
        model_name: str = 'vit_base_patch16_384'
    ):
        super().__init__()
        self.model = timm.create_model(
            model_name,
            pretrained=pretrained,   # Use pretrained weights if True
            num_classes=num_classes  # Set number of output classes
        )
    
    def forward(self, x):
        return self.model(x)  # Forward pass through the ViT model

class ViTDataTransforms:
    """Data augmentation and preprocessing transforms optimized for ViT."""
    
    @staticmethod
    def get_train_transforms(config: Config) -> Compose:
        """Return training data augmentation transforms."""
        return Compose([
            RandomResizedCrop(
                height=config.image_size,
                width=config.image_size,
                scale=(0.8, 1.0)
            ),  # Randomly crop and resize image
            Transpose(p=0.5),       # Randomly transpose image dimensions
            HorizontalFlip(p=0.5),  # Random horizontal flip
            VerticalFlip(p=0.5),    # Random vertical flip
            ShiftScaleRotate(
                shift_limit=0.2,
                scale_limit=0.2,
                rotate_limit=30,
                p=0.5
            ),  # Random shift, scale, and rotation
            HueSaturationValue(
                hue_shift_limit=20,
                sat_shift_limit=30,
                val_shift_limit=20,
                p=0.5
            ),  # Randomly change hue, saturation, and value
            RandomBrightnessContrast(
                brightness_limit=0.2,
                contrast_limit=0.2,
                p=0.5
            ),  # Random brightness and contrast adjustments
            Normalize(
                mean=config.mean,
                std=config.std,
                max_pixel_value=255.0,
                p=1.0
            ),  # Normalize image with mean and std
            CoarseDropout(
                max_holes=8,
                max_height=config.image_size // 16,
                max_width=config.image_size // 16,
                min_holes=5,
                min_height=config.image_size // 32,
                min_width=config.image_size // 32,
                fill_value=0,
                p=0.5
            ),  # Randomly mask 5 to 8 small rectangles (12 to 24 px per side at image_size=384)
            ToTensorV2(p=1.0),  # Convert image to PyTorch tensor
        ], p=1.)
    
    @staticmethod
    def get_valid_transforms(config: Config) -> Compose:
        """Return validation data preprocessing transforms."""
        return Compose([
            Resize(config.image_size, config.image_size),  # Resize image to desired size
            Normalize(
                mean=config.mean,
                std=config.std,
                max_pixel_value=255.0,
                p=1.0
            ),  # Normalize image with mean and std
            ToTensorV2(p=1.0),  # Convert image to PyTorch tensor
        ], p=1.)

class ViTTrainer:
    """Trainer class optimized for Vision Transformer."""
    
    def __init__(self, model: nn.Module, config: Config):
        self.model = model                             # Assign the model to an instance variable
        self.config = config                           # Store the configuration parameters
        self.criterion = nn.CrossEntropyLoss()         # Define the loss function (Cross-Entropy Loss)
        
        # Optimizer with weight decay
        self.optimizer = optim.AdamW(
            model.parameters(),
            lr=config.learning_rate,                   # Set learning rate from config
            weight_decay=config.weight_decay           # Apply weight decay regularization
        )
        
        # Cosine annealing scheduler
        self.scheduler = optim.lr_scheduler.CosineAnnealingLR(
            self.optimizer,
            T_max=config.num_epochs,                   # Maximum number of iterations (epochs)
            eta_min=1e-6                               # Minimum learning rate after decay
        )
        
        # Initialize GradScaler for mixed precision training if CUDA is available
        self.scaler = torch.amp.GradScaler('cuda') if config.fp16 else None  # torch.cuda.amp.GradScaler() is deprecated in recent PyTorch
    
    def train_epoch(self, train_loader: DataLoader, device: torch.device) -> float:
        """Train the model for one epoch."""
        self.model.train()                                     # Set the model to training mode
        total_loss = 0                                         # Initialize total loss for the epoch
        
        with tqdm(train_loader, desc='Training') as pbar:      # Create a progress bar for the training loop
            for batch_idx, (images, targets) in enumerate(pbar):
                images = images.to(device)                     # Move images to the specified device (GPU or CPU)
                targets = targets.to(device)                   # Move targets to the device
                
                # Mixed precision training if enabled
                if self.config.fp16:
                    with torch.amp.autocast('cuda'):                # Enable autocasting for mixed precision
                        outputs = self.model(images)                # Forward pass through the model
                        loss = self.criterion(outputs, targets)     # Compute loss between outputs and targets
                        loss = loss / self.config.grad_accum_steps  # Normalize loss for gradient accumulation
                    
                    self.scaler.scale(loss).backward()         # Backward pass with scaled loss for mixed precision
                    
                    if ((batch_idx + 1) % self.config.grad_accum_steps == 0
                            or batch_idx + 1 == len(train_loader)):  # Also step on the final, possibly partial, group
                        self.scaler.step(self.optimizer)       # Update model parameters
                        self.scaler.update()                   # Update the scaler for next iteration
                        self.optimizer.zero_grad()             # Reset gradients
                else:
                    outputs = self.model(images)               # Forward pass through the model
                    loss = self.criterion(outputs, targets)    # Compute loss
                    loss = loss / self.config.grad_accum_steps # Normalize loss for gradient accumulation
                    
                    loss.backward()                            # Backward pass
                    
                    if ((batch_idx + 1) % self.config.grad_accum_steps == 0
                            or batch_idx + 1 == len(train_loader)):  # Also step on the final, possibly partial, group
                        self.optimizer.step()                  # Update model parameters
                        self.optimizer.zero_grad()             # Reset gradients
                
                total_loss += loss.item() * self.config.grad_accum_steps  # Accumulate total loss
                pbar.set_postfix({'loss': loss.item() * self.config.grad_accum_steps})  # Update progress bar with current loss
        
        self.scheduler.step()                                  # Update learning rate scheduler
        return total_loss / len(train_loader)                  # Return average loss for the epoch
    
    @torch.no_grad()
    def validate(self, valid_loader: DataLoader, device: torch.device) -> Tuple[float, float]:
        """Evaluate the model on the validation set."""
        self.model.eval()                                      # Set the model to evaluation mode
        total_loss = 0                                         # Initialize total loss for the validation
        predictions = []                                       # List to store predicted labels
        targets = []                                           # List to store true labels
        
        with tqdm(valid_loader, desc='Validating') as pbar:    # Create a progress bar for the validation loop
            for images, batch_targets in pbar:
                images = images.to(device)                     # Move images to the specified device
                batch_targets = batch_targets.to(device)       # Move targets to the device
                
                if self.config.fp16:
                    with torch.amp.autocast('cuda'):                   # Enable autocasting for mixed precision
                        outputs = self.model(images)                   # Forward pass through the model
                        loss = self.criterion(outputs, batch_targets)  # Compute loss
                else:
                    outputs = self.model(images)                   # Forward pass through the model
                    loss = self.criterion(outputs, batch_targets)  # Compute loss
                
                total_loss += loss.item()                          # Accumulate total loss
                predictions.extend(torch.argmax(outputs, dim=1).cpu().numpy())  # Store predicted labels
                targets.extend(batch_targets.cpu().numpy())        # Store true labels
                
                pbar.set_postfix({'loss': loss.item()})            # Update progress bar with current loss
        
        accuracy = np.mean(np.array(predictions) == np.array(targets))  # Calculate accuracy
        return total_loss / len(valid_loader), accuracy            # Return average loss and accuracy

def train_model(config: Config, train_df: pd.DataFrame):
    """Train the ViT model using cross-validation."""
    print(f"Using device: {config.device_type}")  # Inform about the device being used
    print(f"Mixed precision training: {'enabled' if config.fp16 else 'disabled'}")  # Inform about mixed precision
    
    # Create Stratified K-Fold splits to maintain class distribution
    skf = StratifiedKFold(
        n_splits=config.num_folds,
        shuffle=True,
        random_state=config.seed
    )
    
    for fold, (train_idx, valid_idx) in enumerate(skf.split(train_df, train_df.label)):
        if fold not in config.used_folds:
            continue  # Skip folds that are not used for inference
        
        print(f'Training fold {fold}')  # Inform about the current fold being trained
        
        # Split data into training and validation sets based on indices
        train_data = train_df.iloc[train_idx].reset_index(drop=True)
        valid_data = train_df.iloc[valid_idx].reset_index(drop=True)
        
        # Create training dataset with data augmentation
        train_dataset = CassavaDataset(
            train_data,
            PATHS['TRAIN_IMAGES'],
            transforms=ViTDataTransforms.get_train_transforms(config)
        )
        
        # Create validation dataset without data augmentation
        valid_dataset = CassavaDataset(
            valid_data,
            PATHS['TRAIN_IMAGES'],
            transforms=ViTDataTransforms.get_valid_transforms(config)
        )
        
        # Create DataLoader for training data
        train_loader = DataLoader(
            train_dataset,
            batch_size=config.train_batch_size,
            shuffle=True,  # Shuffle training data
            num_workers=config.num_workers,
            pin_memory=config.device_type == 'cuda'  # Enable pin memory for CUDA
        )
        
        # Create DataLoader for validation data
        valid_loader = DataLoader(
            valid_dataset,
            batch_size=config.valid_batch_size,
            shuffle=False,  # Do not shuffle validation data
            num_workers=config.num_workers,
            pin_memory=config.device_type == 'cuda'  # Enable pin memory for CUDA
        )
        
        # Initialize the Vision Transformer model and move it to the device
        model = CassavaViT(
            num_classes=train_df.label.nunique(),  # Number of unique classes
            pretrained=config.pretrained,  # Use pretrained weights
            model_name=config.model_name  # Specify model architecture
        ).to(config.device)
        
        trainer = ViTTrainer(model, config)  # Initialize the trainer
        best_loss = float('inf')  # Initialize best loss for checkpointing
        
        for epoch in range(config.num_epochs):
            print(f'Epoch {epoch + 1}/{config.num_epochs}')  # Inform about the current epoch
            
            train_loss = trainer.train_epoch(train_loader, config.device)  # Train for one epoch
            valid_loss, accuracy = trainer.validate(valid_loader, config.device)  # Validate the model
            
            print(f'Train Loss: {train_loss:.4f}')  # Print training loss
            print(f'Valid Loss: {valid_loss:.4f}, Accuracy: {accuracy:.4f}')  # Print validation loss and accuracy
            
            if valid_loss < best_loss:
                best_loss = valid_loss  # Update best loss if current validation loss is lower
                checkpoint_path = os.path.join(
                    PATHS['WEIGHTS'],
                    f'{config.model_name}_fold_{fold}_{epoch}'
                )  # Define checkpoint path
                torch.save(model.state_dict(), checkpoint_path)  # Save model weights
                print(f'Saved checkpoint: {checkpoint_path}')  # Inform about saved checkpoint
        
        del model, trainer  # Delete model and trainer to free memory
        if config.device_type == 'cuda':
            torch.cuda.empty_cache()  # Clear CUDA cache to free up GPU memory

def main():
    """Main training function."""
    # Create weights directory if it doesn't exist
    os.makedirs(PATHS['WEIGHTS'], exist_ok=True)
    
    config = Config()  # Initialize configuration
    seed_everything(config.seed)  # Set random seeds for reproducibility
    
    # Load training data from CSV
    train_df = pd.read_csv(PATHS['TRAIN_CSV'])
    print(f"Training data shape: {train_df.shape}")  # Print shape of training data
    
    # Print dataset information
    print_dataset_info(train_df)  # Display dataset statistics
    
    # Train the Vision Transformer model using cross-validation
    train_model(config, train_df)
    
    print("Training completed successfully!")  # Inform that training is complete

if __name__ == '__main__':
    main()  # Execute the main function



Training data shape: (21397, 2)
Total training samples: 21397

Label distribution:
label
3    0.614946
4    0.120437
2    0.111511
1    0.102304
0    0.050802
Name: proportion, dtype: float64

Sample image IDs:
0    1000015157.jpg
1    1000201771.jpg
2     100042118.jpg
3    1000723321.jpg
4    1000812911.jpg
Name: image_id, dtype: object
Using device: cuda
Mixed precision training: enabled
Training fold 0


model.safetensors:   0%|          | 0.00/347M [00:00<?, ?B/s]


Epoch 1/10


Training: 100%|██████████| 2140/2140 [16:36<00:00,  2.15it/s, loss=0.307]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.81]


Train Loss: 0.8871
Valid Loss: 0.6046, Accuracy: 0.7757
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_0_0
Epoch 2/10


Training: 100%|██████████| 2140/2140 [16:35<00:00,  2.15it/s, loss=1.02]
Validating: 100%|██████████| 268/268 [01:17<00:00,  3.44it/s, loss=0.703]


Train Loss: 0.6102
Valid Loss: 0.5364, Accuracy: 0.8065
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_0_1
Epoch 3/10


Training: 100%|██████████| 2140/2140 [16:35<00:00,  2.15it/s, loss=0.756]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.837]


Train Loss: 0.5499
Valid Loss: 0.4915, Accuracy: 0.8201
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_0_2
Epoch 4/10


Training: 100%|██████████| 2140/2140 [16:37<00:00,  2.15it/s, loss=1.18]
Validating: 100%|██████████| 268/268 [01:17<00:00,  3.44it/s, loss=0.72]


Train Loss: 0.5042
Valid Loss: 0.4829, Accuracy: 0.8285
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_0_3
Epoch 5/10


Training: 100%|██████████| 2140/2140 [16:38<00:00,  2.14it/s, loss=0.123]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.724]


Train Loss: 0.4661
Valid Loss: 0.4590, Accuracy: 0.8280
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_0_4
Epoch 6/10


Training: 100%|██████████| 2140/2140 [16:39<00:00,  2.14it/s, loss=0.0466]
Validating: 100%|██████████| 268/268 [01:17<00:00,  3.44it/s, loss=0.411]


Train Loss: 0.4209
Valid Loss: 0.4122, Accuracy: 0.8533
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_0_5
Epoch 7/10


Training: 100%|██████████| 2140/2140 [16:43<00:00,  2.13it/s, loss=0.229]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.351]


Train Loss: 0.3844
Valid Loss: 0.4116, Accuracy: 0.8542
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_0_6
Epoch 8/10


Training: 100%|██████████| 2140/2140 [16:45<00:00,  2.13it/s, loss=0.0401]
Validating: 100%|██████████| 268/268 [01:17<00:00,  3.44it/s, loss=0.298]


Train Loss: 0.3463
Valid Loss: 0.3726, Accuracy: 0.8650
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_0_7
Epoch 9/10


Training: 100%|██████████| 2140/2140 [16:47<00:00,  2.12it/s, loss=0.88]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.491]


Train Loss: 0.3108
Valid Loss: 0.3664, Accuracy: 0.8724
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_0_8
Epoch 10/10


Training: 100%|██████████| 2140/2140 [16:48<00:00,  2.12it/s, loss=0.15]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.369]


Train Loss: 0.2888
Valid Loss: 0.3635, Accuracy: 0.8750
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_0_9
Training fold 2
Epoch 1/10


Training: 100%|██████████| 2140/2140 [16:48<00:00,  2.12it/s, loss=0.536]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.531]


Train Loss: 0.8187
Valid Loss: 0.4760, Accuracy: 0.8343
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_2_0
Epoch 2/10


Training: 100%|██████████| 2140/2140 [16:49<00:00,  2.12it/s, loss=0.599]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.406]


Train Loss: 0.5752
Valid Loss: 0.5081, Accuracy: 0.8147
Epoch 3/10


Training: 100%|██████████| 2140/2140 [16:49<00:00,  2.12it/s, loss=0.827]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.971]


Train Loss: 0.5351
Valid Loss: 0.4407, Accuracy: 0.8427
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_2_2
Epoch 4/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.279]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.648]


Train Loss: 0.4855
Valid Loss: 0.4669, Accuracy: 0.8362
Epoch 5/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.114]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=1.02]


Train Loss: 0.4421
Valid Loss: 0.4980, Accuracy: 0.8331
Epoch 6/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.494]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.818]


Train Loss: 0.4111
Valid Loss: 0.4278, Accuracy: 0.8530
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_2_5
Epoch 7/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.184]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=1.03]


Train Loss: 0.3687
Valid Loss: 0.3728, Accuracy: 0.8659
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_2_6
Epoch 8/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.112]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.766]


Train Loss: 0.3233
Valid Loss: 0.3733, Accuracy: 0.8738
Epoch 9/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.571]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=1.06]


Train Loss: 0.2908
Valid Loss: 0.3603, Accuracy: 0.8771
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_2_8
Epoch 10/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.136]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=1.36]


Train Loss: 0.2660
Valid Loss: 0.3701, Accuracy: 0.8766
Training fold 3
Epoch 1/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.344]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.952]


Train Loss: 0.8228
Valid Loss: 0.5014, Accuracy: 0.8163
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_3_0
Epoch 2/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.0178]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.339]


Train Loss: 0.6004
Valid Loss: 0.5017, Accuracy: 0.8186
Epoch 3/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=2.06]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.677]


Train Loss: 0.5351
Valid Loss: 0.4756, Accuracy: 0.8343
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_3_2
Epoch 4/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.535]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.463]


Train Loss: 0.5036
Valid Loss: 0.4839, Accuracy: 0.8294
Epoch 5/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.0871]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.55]


Train Loss: 0.4617
Valid Loss: 0.4417, Accuracy: 0.8444
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_3_4
Epoch 6/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.209]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.538]


Train Loss: 0.4165
Valid Loss: 0.3962, Accuracy: 0.8607
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_3_5
Epoch 7/10


Training: 100%|██████████| 2140/2140 [16:51<00:00,  2.12it/s, loss=0.587]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.487]


Train Loss: 0.3823
Valid Loss: 0.3806, Accuracy: 0.8698
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_3_6
Epoch 8/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.173]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.444]


Train Loss: 0.3406
Valid Loss: 0.3739, Accuracy: 0.8712
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_3_7
Epoch 9/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.791]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.42it/s, loss=0.468]


Train Loss: 0.3053
Valid Loss: 0.3567, Accuracy: 0.8766
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_3_8
Epoch 10/10


Training: 100%|██████████| 2140/2140 [16:50<00:00,  2.12it/s, loss=0.259]
Validating: 100%|██████████| 268/268 [01:18<00:00,  3.43it/s, loss=0.247]


Train Loss: 0.2846
Valid Loss: 0.3530, Accuracy: 0.8839
Saved checkpoint: /kaggle/working/weights/vit_base_patch16_384_fold_3_9
Training completed successfully!

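The per-epoch summaries in the output above are easier to compare once they are reduced to one row per fold. The sketch below is not part of the original notebook: it assumes the log text has been captured as a string, and the helper name `best_epoch_per_fold` and the `SAMPLE_LOG` excerpt (copied from the last epochs of folds 2 and 3 above) are illustrative only. It picks, for each fold, the epoch with the lowest validation loss.

```python
import re

# Patterns for the three kinds of summary lines printed by the training loop above.
FOLD_RE = re.compile(r"^Training fold (\d+)")
EPOCH_RE = re.compile(r"^Epoch (\d+)/(\d+)")
VALID_RE = re.compile(r"^Valid Loss: ([\d.]+), Accuracy: ([\d.]+)")

# Excerpt copied from the output above (folds 2 and 3, final epochs only).
SAMPLE_LOG = """\
Training fold 2
Epoch 8/10
Train Loss: 0.3233
Valid Loss: 0.3733, Accuracy: 0.8738
Epoch 9/10
Train Loss: 0.2908
Valid Loss: 0.3603, Accuracy: 0.8771
Epoch 10/10
Train Loss: 0.2660
Valid Loss: 0.3701, Accuracy: 0.8766
Training fold 3
Epoch 9/10
Train Loss: 0.3053
Valid Loss: 0.3567, Accuracy: 0.8766
Epoch 10/10
Train Loss: 0.2846
Valid Loss: 0.3530, Accuracy: 0.8839
"""


def best_epoch_per_fold(log_text):
    """Return {fold: {"epoch", "valid_loss", "accuracy"}} for the lowest valid loss."""
    best = {}
    fold = None
    epoch = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        fold_match = FOLD_RE.match(line)
        epoch_match = EPOCH_RE.match(line)
        valid_match = VALID_RE.match(line)
        if fold_match:
            fold = int(fold_match.group(1))
        elif epoch_match:
            epoch = int(epoch_match.group(1))  # 1-based, as printed in the log
        elif valid_match and fold is not None and epoch is not None:
            loss = float(valid_match.group(1))
            acc = float(valid_match.group(2))
            # Keep the epoch with the lowest validation loss seen so far.
            if fold not in best or loss < best[fold]["valid_loss"]:
                best[fold] = {"epoch": epoch, "valid_loss": loss, "accuracy": acc}
    return best


if __name__ == "__main__":
    for fold, row in sorted(best_epoch_per_fold(SAMPLE_LOG).items()):
        print(f"fold {fold}: epoch {row['epoch']}, "
              f"valid loss {row['valid_loss']:.4f}, accuracy {row['accuracy']:.4f}")
```

Epoch numbers in the log are 1-based, while the checkpoint suffixes are 0-based, so the best epoch 9 of fold 2 corresponds to the checkpoint ending in `_fold_2_8`. The log also shows a checkpoint being saved only after epochs where the validation loss improves on the fold's previous best, which is why selecting by lowest validation loss lines up with the last saved checkpoint per fold.
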

## Conclusion

This project applies a Vision Transformer (ViT) to cassava leaf disease classification. The pipeline combines a pre-trained ViT model (checkpointed as `vit_base_patch16_384`), data augmentation, cross-validation, and mixed-precision training. In the training output above, validation accuracy on the logged folds reaches roughly 0.875 to 0.884 by epoch 10, with validation loss falling from about 0.48 to 0.50 after the first epoch to about 0.35 to 0.37. These are validation figures from the cross-validation folds, not results on a separate test set. An automated classifier of this kind could give farmers a scalable and accessible way to distinguish diseased from healthy leaves, enabling timely interventions that reduce crop losses. Future work may explore integrating additional data sources, refining the model architecture, and deploying the model as a mobile application for real-time diagnostics in resource-constrained environments.
