# 🍽️ YOLOv8 - Food Classification (Using Our First Pretrained Model)

**YOLOv8** is a CNN-based model family used for object detection and classification.  
We'll take its **classification variant** (`yolov8n-cls.pt`), pretrained on ImageNet,  
and fine-tune it on a **5-class subset of the Food-101 dataset** for fast, accurate food recognition.

### 🎯 Goal  
Build a production-grade food classification system using YOLOv8’s classification head.  
We’ll fine-tune it on the following **5 food classes**:
- pizza  
- grilled_chicken  
- sushi  
- ice_cream  
- hamburger  

We'll use **advanced training techniques** to boost performance.

### ⚙️ Techniques Used  
- **Albumentations**: Advanced image augmentation for better generalization  
- **Cosine Annealing LR**: Smooth learning rate decay  
- **Model Ensembling** *(optional)*: Combines predictions for higher accuracy  
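
To make the cosine schedule concrete, here is a minimal sketch of the closed-form cosine annealing curve. The function name `cosine_lr` and the `lr_min` floor are illustrative assumptions; the Ultralytics trainer applies its own variant internally when `cos_lr=True`, so treat this as a picture of the curve's shape rather than the library's exact code.

```python
import math


def cosine_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Closed-form cosine annealing: starts at lr_max and decays smoothly to lr_min."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Assumed values mirroring this notebook: 18 epochs, initial learning rate 0.001
schedule = [cosine_lr(epoch, 18, 0.001) for epoch in range(19)]
for epoch in (0, 9, 18):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.6f}")
```

The decay is slow at the start and end and fastest in the middle, which is what "smooth" means here compared with step-wise schedules.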


### 📁 Directory Structure  
```
📦 Project Root
├── Food_Classification_YOLOv8_Module12.ipynb
├── generate_yaml.py
├── food-101/
│   ├── images/
│   │   ├── pizza/
│   │   ├── grilled_chicken/
│   │   ├── sushi/
│   │   ├── ice_cream/
│   │   ├── hamburger/
│   ├── meta/
│   └── food101.yaml
```
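
Before training, it can help to confirm that the class folders are where the layout above says they are. The helper below is a small sketch; `missing_class_dirs` is a hypothetical name, and the `./food-101` path is assumed from the tree shown above.

```python
from pathlib import Path
from typing import List


def missing_class_dirs(data_dir: str, classes: List[str]) -> List[str]:
    """Return the class names that have no folder under <data_dir>/images."""
    images_dir = Path(data_dir) / "images"
    return [name for name in classes if not (images_dir / name).is_dir()]


classes = ["pizza", "grilled_chicken", "sushi", "ice_cream", "hamburger"]
missing = missing_class_dirs("./food-101", classes)
print("All class folders found" if not missing else f"Missing folders: {missing}")
```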

### 📦 Install Requirements  
Make sure to install all dependencies before running the code:
```bash
pip install torch==2.4.1 torchvision==0.19.1
pip install "opencv-python-headless==4.10.0.*"
pip install ultralytics==8.2.28
pip install albumentations==1.4.8
```
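
After installing, a quick check of what actually ended up in the environment can save debugging time later. This is a sketch using only the standard library; the `PINNED` dictionary simply repeats the versions listed above, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Versions copied from the install commands above
PINNED = {
    "torch": "2.4.1",
    "torchvision": "0.19.1",
    "opencv-python-headless": "4.10.0",
    "ultralytics": "8.2.28",
    "albumentations": "1.4.8",
}


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is not installed."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, got in installed_versions(PINNED).items():
    # Prefix match, because builds may carry suffixes such as "+cu121"
    status = "ok" if got is not None and got.startswith(PINNED[name]) else "check"
    print(f"{name:25s} wanted {PINNED[name]:8s} found {got} [{status}]")
```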

### 🔧 Basic Setup (with PyTorch + YOLOv8)

In [None]:
# I generally use PyTorch so I can understand exactly what’s happening

# Import libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader, Dataset, Subset
import cv2
import numpy as np
import logging
import os
from ultralytics import YOLO
import albumentations as A
from albumentations.pytorch import ToTensorV2
from typing import List, Dict, Optional, Tuple


* Logging setup to track execution and record output, both in the console and in a log file.

In [None]:
# Logging setup to track execution and errors

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),                   # show logs in the console
        logging.FileHandler('yolo_training.log'),  # save logs to a file
    ],
)

logger = logging.getLogger(__name__)

### Config Class
The `Config` class stores the project settings:

* `data_dir`: path to the `./food-101` dataset.
* `classes`: list of the 5 classes.
* `max_images_per_class`: 100 images per class for training.
* `test_images_per_class`: 20 images per class for testing.
* `epochs`, `batch_size`, `img_size`: training parameters.
* `model_path_nano`, `model_path_small`: save paths for the nano and small models.
* `yaml_path`: path to the YAML file.
* `device`: GPU or CPU.
* `lr`: initial learning rate for cosine annealing.
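
As a quick sanity check on these settings, the arithmetic below works out how much data one epoch sees. It is only a sketch: the values are the ones listed above, and it assumes every class actually has enough images to reach its cap.

```python
import math

classes = ["pizza", "grilled_chicken", "sushi", "ice_cream", "hamburger"]
max_images_per_class = 100
test_images_per_class = 20
batch_size = 5

train_images = len(classes) * max_images_per_class   # 5 * 100
test_images = len(classes) * test_images_per_class   # 5 * 20
steps_per_epoch = math.ceil(train_images / batch_size)

print(f"train images: {train_images}")
print(f"test images: {test_images}")
print(f"batches per epoch: {steps_per_epoch}")
```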

In [None]:
# configurations class for project settings

class Config:
    def __init__(self):
        self.data_dir = "./food-101"
        self.classes = ['pizza','grilled_chicken','sushi','ice_cream','hamburger']
        self.max_images_per_class = 100
        self.test_images_per_class = 20
        self.epochs = 18
        self.batch_size = 5  # batch size chosen for the GPU
        self.img_size = 224  # input image size for the YOLOv8 classifier
        self.model_path_nano = "yolov8n_food_classifier.pt"  # nano model save path
        self.model_path_small = "yolov8s_food_classifier.pt"  # small model save path
        self.yaml_path = os.path.join(self.data_dir, 'food101.yaml')
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.lr = 0.001
        logger.info(f"Using device: {self.device}")


### Custom Dataset Class for Food-101

**FoodDataset Class**

What is it?  
A custom dataset class built to load images and labels from the Food-101 dataset.

Key functions:

1. `__init__`: stores the image paths and labels, and sets up the image transformations using Albumentations.
2. `__len__`: returns the total number of images in the dataset.
3. `__getitem__`: loads an image using cv2, converts it from BGR to RGB, applies the transformations, and returns the image along with its label.

Error handling:  
If an image fails to load (e.g., file not found), the error is logged and `(None, None)` is returned instead of crashing the program. Note that PyTorch's default batch collation cannot stack `None` entries, so the `DataLoader` needs a way to drop such samples.

Why is it different?  
It uses Albumentations instead of the traditional torchvision transforms. Albumentations offers a broader set of augmentations and is widely used in real-world, production-level image pipelines.
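
Because `__getitem__` can return `(None, None)` for an unreadable image, the batches need a collate step that filters those samples out. The function below is one possible sketch, not part of the original notebook; `skip_failed_collate` is a hypothetical name, and it relies on `torch.utils.data.default_collate` for the actual stacking.

```python
from typing import List, Optional, Tuple

import torch
from torch.utils.data import default_collate

Sample = Tuple[Optional[torch.Tensor], Optional[int]]


def skip_failed_collate(batch: List[Sample]):
    """Drop (None, None) samples, then collate the remaining ones as usual."""
    kept = [(img, lbl) for img, lbl in batch if img is not None and lbl is not None]
    if not kept:
        # Every sample in this batch failed to load
        return None, None
    return default_collate(kept)


# Toy batch: two valid samples and one failed load
toy_batch = [(torch.zeros(3, 4, 4), 1), (None, None), (torch.ones(3, 4, 4), 0)]
images, labels = skip_failed_collate(toy_batch)
print(images.shape, labels.tolist())
```

It would be passed to the loader as `DataLoader(dataset, batch_size=5, collate_fn=skip_failed_collate)`. A loop that consumes the loader should still skip batches where both values are `None`.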


In [None]:

class FoodDataset(Dataset):
    def __init__(self, image_paths: List[str], labels: List[int], transform: Optional[A.Compose] = None):
        self.image_paths = image_paths
        self.labels = labels  # integer class labels, 0 to 4
        self.transform = transform  # Albumentations transform pipeline
        logger.info(f"Initialized dataset with {len(image_paths)} images")

    def __len__(self) -> int:
        return len(self.image_paths)  # total number of images

    def __getitem__(self, idx: int) -> Tuple[Optional[torch.Tensor], Optional[int]]:
        """Return one sample (image tensor, label), or (None, None) if loading fails."""
        image_path = self.image_paths[idx]  # resolved before the try so the except block can log it
        try:
            image = cv2.imread(image_path)  # returns None if the file is missing or unreadable
            if image is None:
                raise ValueError(f"Failed to load image: {image_path}")
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; the pipeline expects RGB

            if self.transform:
                # Apply the image transformations (e.g., resize, flip, normalize) if defined
                augmented = self.transform(image=image)
                image = augmented['image']
            label = self.labels[idx]  # class label for this image
            return image, label

        except Exception as e:
            logger.error(f"Error loading image {image_path}: {str(e)}")
            return None, None
        

### 🔧 Advanced Image Preprocessing using Albumentations

What is it?  
An image preprocessing and augmentation pipeline built with the Albumentations library to improve model robustness.

Pipeline explanation:

1. `A.Resize`: resizes the image to 224x224.
2. `A.HorizontalFlip`: flips the image horizontally with a 50% probability.
3. `A.Rotate`: randomly rotates the image by up to ±15 degrees.
4. `A.RandomBrightnessContrast`: randomly adjusts brightness and contrast.
5. `A.GaussNoise`: adds Gaussian noise to make the model robust to noisy inputs.
6. `A.MotionBlur`: adds motion blur to simulate real-world camera shake or object movement.
7. `A.Normalize`: normalizes the image using the ImageNet mean and std.
8. `ToTensorV2`: converts the image to a PyTorch tensor.

Why is it advanced?  
Albumentations provides a wider range of augmentations than `torchvision.transforms`, which helps the model generalize better to varied inputs.
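
To show what the last two steps do numerically, here is a NumPy-only sketch of ImageNet normalization followed by the channel reordering that `ToTensorV2` performs. The function name `normalize_and_to_chw` is made up for illustration, and the result is a NumPy array rather than a PyTorch tensor.

```python
import numpy as np

# Standard ImageNet channel statistics (RGB order)
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_and_to_chw(image_rgb: np.ndarray) -> np.ndarray:
    """Scale uint8 RGB pixels to [0, 1], standardize each channel, and move channels first."""
    scaled = image_rgb.astype(np.float32) / 255.0
    standardized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    return np.transpose(standardized, (2, 0, 1))  # HWC -> CHW


# A flat mid-gray image stands in for a real photo
image = np.full((224, 224, 3), 128, dtype=np.uint8)
out = normalize_and_to_chw(image)
print(out.shape, out.dtype)
print(out[:, 0, 0])  # per-channel value of the first pixel
```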



In [None]:
transform = A.Compose([
    A.Resize(224, 224),
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=15, p=0.5),  # randomly rotate by up to ±15 degrees
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
    A.GaussNoise(var_limit=(10.0, 50.0), p=0.3),
    A.MotionBlur(blur_limit=7, p=0.3),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # ImageNet normalization
    ToTensorV2()
])

### 🔹 load_food101_subset Function
📌 What is it?  
Creates a subset of the Food-101 dataset containing 5 specific classes.

* `datasets.Food101`: loads the train and test splits. `download=False` assumes the dataset is already available.
* `class_to_idx`: maps the selected class names to their corresponding label indices.
* `train_image_paths`, `test_image_paths`: collect a total of 500 train images (100 per class) and 100 test images (20 per class), storing both image paths and labels.
* `FoodDataset`: creates custom dataset objects with Albumentations transforms applied.
* `class_names`: maps the remapped labels 0-4 to class names for cleaner model training.
* Exception handling: logs an error message if dataset loading fails (clean failover).
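
The selection logic described above (cap each class, remap labels to 0-4) can be tried on toy data without the real dataset. This is a simplified sketch; `select_per_class` and the toy file names are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_per_class(
    paths: Sequence[str],
    labels: Sequence[int],
    all_classes: Sequence[str],
    wanted: List[str],
    per_class: int,
) -> Tuple[List[str], List[int]]:
    """Keep at most `per_class` items for each wanted class and remap labels to 0..len(wanted)-1."""
    counts = {name: 0 for name in wanted}
    picked_paths: List[str] = []
    picked_labels: List[int] = []
    for path, label in zip(paths, labels):
        name = all_classes[label]
        if name in counts and counts[name] < per_class:
            picked_paths.append(path)
            picked_labels.append(wanted.index(name))  # new label = position in the wanted list
            counts[name] += 1
    return picked_paths, picked_labels


# Toy stand-in for the full dataset: three classes, five images
all_classes = ["apple_pie", "pizza", "sushi"]
paths = ["a1.jpg", "p1.jpg", "p2.jpg", "s1.jpg", "p3.jpg"]
labels = [0, 1, 1, 2, 1]

picked_paths, picked_labels = select_per_class(paths, labels, all_classes, ["pizza", "sushi"], per_class=2)
print(picked_paths)   # ['p1.jpg', 'p2.jpg', 's1.jpg']
print(picked_labels)  # [0, 0, 1]
```

The third pizza image is skipped because the cap of 2 is already reached, and `apple_pie` is ignored because it is not in the wanted list.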

In [None]:
# Food-101 dataset load and subset functions

def collect_subset(dataset: datasets.Food101, classes: List[str], per_class: int) -> Tuple[List[str], List[int]]:
    """Pick up to `per_class` image paths per wanted class and remap labels to 0..len(classes)-1."""
    image_paths: List[str] = []
    labels: List[int] = []
    class_counts = {cls: 0 for cls in classes}
    # NOTE: _image_files and _labels are private torchvision attributes. Reading them gives
    # file paths directly; iterating the dataset itself would yield decoded PIL images instead.
    for image_file, label in zip(dataset._image_files, dataset._labels):
        class_name = dataset.classes[label]
        if class_name in class_counts and class_counts[class_name] < per_class:
            image_paths.append(str(image_file))
            labels.append(classes.index(class_name))
            class_counts[class_name] += 1
            if sum(class_counts.values()) >= per_class * len(classes):
                break
    return image_paths, labels


def load_food101_subset(config: Config) -> Tuple[Optional[Dataset], Optional[Dataset], Optional[Dict[int, str]]]:
    try:
        logger.info("Loading Food-101 dataset")

        # torchvision looks for <root>/food-101, so pass the parent folder of data_dir as root
        root = os.path.dirname(os.path.abspath(config.data_dir))
        train_dataset = datasets.Food101(root=root, split="train", download=False)
        test_dataset = datasets.Food101(root=root, split="test", download=False)

        # Filtering classes (we selected only 5 classes)
        class_to_idx = {name: idx for idx, name in enumerate(train_dataset.classes) if name in config.classes}
        if len(class_to_idx) != len(config.classes):
            missing = [name for name in config.classes if name not in class_to_idx]
            raise ValueError(f"Some classes not found in dataset: {missing}")

        # Collect image paths and labels for the training and test sets
        train_image_paths, train_labels = collect_subset(train_dataset, config.classes, config.max_images_per_class)
        test_image_paths, test_labels = collect_subset(test_dataset, config.classes, config.test_images_per_class)

        train_subset = FoodDataset(train_image_paths, train_labels, transform)
        test_subset = FoodDataset(test_image_paths, test_labels, transform)

        class_names = {i: cls for i, cls in enumerate(config.classes)}
        logger.info(f"Loaded {len(train_image_paths)} train and {len(test_image_paths)} test images")
        return train_subset, test_subset, class_names

    except Exception as e:
        logger.error(f"Error loading Food-101 dataset: {str(e)}")
        return None, None, None


### ✅ validate_model Function
What does it do?  
Evaluates the model's accuracy on the test set to make sure it's actually performing, not just vibing.

🧠 How it works:
1. Inference runs in evaluation mode, so dropout and other train-time tricks are switched off.
2. `torch.no_grad()` disables gradient tracking, which saves memory and speeds things up during inference.
3. It loops through the test data, runs predictions using YOLOv8, and compares them with the true labels.
4. It uses `torch.tensor([r.probs.top1 for r in results])` to collect one top-1 class index per image in the batch.
5. It includes basic exception handling to log errors without crashing the process.
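
The accuracy calculation itself is simple enough to check by hand. The sketch below uses plain Python lists instead of tensors; `top1_accuracy` and `per_class_accuracy` are hypothetical helpers, and the per-class breakdown is an addition that the notebook's function does not compute.

```python
from typing import Dict, Sequence


def top1_accuracy(predicted: Sequence[int], labels: Sequence[int]) -> float:
    """Percentage of positions where the predicted class equals the true class."""
    if len(predicted) != len(labels):
        raise ValueError("predicted and labels must have the same length")
    if len(labels) == 0:
        return 0.0
    correct = sum(1 for p, t in zip(predicted, labels) if p == t)
    return 100.0 * correct / len(labels)


def per_class_accuracy(predicted: Sequence[int], labels: Sequence[int], class_names: Dict[int, str]) -> Dict[str, float]:
    """Accuracy per class, computed over the samples whose true label is that class."""
    report: Dict[str, float] = {}
    for idx, name in class_names.items():
        preds_for_class = [p for p, t in zip(predicted, labels) if t == idx]
        if preds_for_class:
            hits = sum(1 for p in preds_for_class if p == idx)
            report[name] = 100.0 * hits / len(preds_for_class)
    return report


class_names = {0: "pizza", 1: "grilled_chicken", 2: "sushi", 3: "ice_cream", 4: "hamburger"}
predicted = [0, 1, 2, 2, 4]  # toy predictions: the ice_cream image is mistaken for sushi
labels = [0, 1, 2, 3, 4]

print(top1_accuracy(predicted, labels))  # 80.0
print(per_class_accuracy(predicted, labels, class_names))
```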

 

In [None]:
def validate_model(model: YOLO, dataloader: DataLoader, class_names: Dict[int, str], config: Config) -> float:
    try:
        logger.info("Starting validation")
        # No explicit model.eval() here: Ultralytics handles eval mode when predicting, and in
        # some versions calling .eval() on the YOLO wrapper dispatches to its train() method.
        correct = 0
        total = 0
        with torch.no_grad():  # disable gradient tracking
            for images, labels in dataloader:
                if images is None or labels is None:
                    continue
                images, labels = images.to(config.device), labels.to(config.device)
                results = model(images)  # YOLOv8 predictions, one result per image

                # Top-1 class index per image, created on the same device as the labels
                predicted = torch.tensor([r.probs.top1 for r in results], device=labels.device)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
        accuracy = 100 * correct / total if total > 0 else 0.0
        logger.info(f"Validation accuracy: {accuracy:.2f}%")
        return accuracy

    except Exception as e:
        logger.error(f"Error in validation: {str(e)}")
        return 0.0



### 📦 Training Function
Fine-tunes the nano and small YOLOv8 classifiers with the same settings, validates both on the test set, and saves them.

In [None]:
def train_yolo_model(config: Config, train_dataset: Dataset, test_dataset: Dataset) -> Tuple[Optional[YOLO], Optional[YOLO]]:
    try:
        logger.info("Setting up YOLOv8 models (nano and small)")

        model_nano = YOLO('yolov8n-cls.pt')
        model_small = YOLO('yolov8s-cls.pt')

        # Ensure the YAML file exists
        if not os.path.exists(config.yaml_path):
            raise FileNotFoundError(f"YAML file not found at {config.yaml_path}. Run generate_yaml.py first.")

        # Build the DataLoaders
        train_loader = DataLoader(
            train_dataset,
            batch_size=config.batch_size,
            shuffle=True,
            num_workers=2,
            pin_memory=True,
        )
        test_loader = DataLoader(
            test_dataset,
            batch_size=config.batch_size,
            shuffle=False,
            num_workers=2,
            pin_memory=True,
        )

        # Training settings shared by both models
        train_args = dict(
            data=config.yaml_path,
            epochs=config.epochs,
            batch=config.batch_size,
            imgsz=config.img_size,
            device=config.device,
            patience=5,
            augment=True,
            save=True,
            project="runs/train",
            optimizer='SGD',
            lr0=config.lr,
            cos_lr=True,  # cosine annealing LR
        )

        # Train the nano model
        logger.info("Starting YOLOv8 nano training")
        model_nano.train(name="food_classifier_nano", **train_args)

        # Train the small model
        logger.info("Starting YOLOv8 small training")
        model_small.train(name="food_classifier_small", **train_args)

        # Validate both models
        class_names = {i: cls for i, cls in enumerate(config.classes)}
        nano_accuracy = validate_model(model_nano, test_loader, class_names, config)
        small_accuracy = validate_model(model_small, test_loader, class_names, config)

        logger.info(f"Nano model validation accuracy: {nano_accuracy:.2f}%")
        logger.info(f"Small model validation accuracy: {small_accuracy:.2f}%")

        model_nano.save(config.model_path_nano)
        model_small.save(config.model_path_small)

        logger.info(f"Nano model saved at {config.model_path_nano}")
        logger.info(f"Small model saved at {config.model_path_small}")

        return model_nano, model_small
    except Exception as e:
        logger.error(f"Error training YOLOv8: {str(e)}")
        return None, None
    
    
# Ensemble prediction function
def ensemble_predict(models: List[YOLO], image: torch.Tensor, class_names: Dict[int, str], config: Config) -> Optional[str]:
    try:
        logger.info("Starting ensemble prediction")
        probs = []
        for model in models:
            # No model.eval() here: Ultralytics handles eval mode during prediction
            with torch.no_grad():
                results = model(image)
                probs.append(results[0].probs.data.cpu().numpy())  # probabilities for all classes
        # Average the probabilities across models
        avg_probs = np.mean(probs, axis=0)
        predicted_idx = int(np.argmax(avg_probs))  # top prediction index as a plain int
        confidence = float(avg_probs[predicted_idx])
        logger.info(f"Ensemble prediction: {class_names[predicted_idx]} (Confidence: {confidence:.2f})")
        return class_names[predicted_idx]
    except Exception as e:
        logger.error(f"Error in ensemble prediction: {str(e)}")
        return None

# Inference function for a single image
def classify_image(models: List[YOLO], image_path: str, transform: A.Compose, class_names: Dict[int, str], config: Config) -> Optional[str]:
    try:
        logger.info(f"Classifying image: {image_path}")
        image = cv2.imread(image_path)  # load the image
        if image is None:
            raise ValueError(f"Failed to load image: {image_path}")
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
        augmented = transform(image=image)
        image_tensor = augmented['image'].unsqueeze(0).to(config.device)  # apply the transform and add a batch dimension
        prediction = ensemble_predict(models, image_tensor, class_names, config)
        return prediction
    except Exception as e:
        logger.error(f"Error classifying image: {str(e)}")
        return None

# Main execution
if __name__ == "__main__":
    # Initialize configuration
    config = Config()

    # Check if the YAML file exists
    if not os.path.exists(config.yaml_path):
        logger.error(f"YAML file not found at {config.yaml_path}. Run generate_yaml.py first.")
        raise SystemExit(1)

    # Load dataset
    train_dataset, test_dataset, class_names = load_food101_subset(config)
    if train_dataset is None or test_dataset is None:
        logger.error("Failed to load dataset")
        raise SystemExit(1)

    # Train and validate models
    model_nano, model_small = train_yolo_model(config, train_dataset, test_dataset)
    if model_nano is None or model_small is None:
        logger.error("Failed to train models")
        raise SystemExit(1)

    # Test inference with the ensemble
    test_image = "./food-101/images/pizza/1001116.jpg"
    prediction = classify_image([model_nano, model_small], test_image, transform, class_names, config)
    print(f"Predicted class: {prediction}")
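
The averaging step inside `ensemble_predict` can be seen in isolation with made-up numbers. In this sketch the two probability vectors are invented for illustration (they are not real model outputs), and `average_probabilities` is a hypothetical helper name.

```python
from typing import Sequence, Tuple

import numpy as np


def average_probabilities(prob_vectors: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Average per-model class probabilities and return (top class index, its averaged probability)."""
    stacked = np.stack([np.asarray(p, dtype=np.float64) for p in prob_vectors])
    avg = stacked.mean(axis=0)
    idx = int(np.argmax(avg))
    return idx, float(avg[idx])


class_names = {0: "pizza", 1: "grilled_chicken", 2: "sushi", 3: "ice_cream", 4: "hamburger"}

# Invented outputs: the nano model leans towards pizza, the small model towards sushi
nano_probs = [0.50, 0.10, 0.30, 0.05, 0.05]
small_probs = [0.20, 0.05, 0.60, 0.10, 0.05]

idx, confidence = average_probabilities([nano_probs, small_probs])
print(class_names[idx], round(confidence, 2))  # sushi 0.45
```

Here the nano model alone would have picked pizza, but the small model's stronger vote for sushi wins after averaging. That is the behaviour an ensemble is meant to provide when the models disagree.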