In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Project: Pizza Classification and Ingredient Recognition

## Context
This project utilizes Kaggle datasets and Python notebooks running on GPUs to train two distinct models:
1. An image classification model to detect the presence of a pizza.
2. A model to identify the ingredients of a pizza.

The final objective is to integrate these two models into an application that retrieves an image, determines whether it shows a pizza, and lists the pizza's main ingredients.

---

## Project Steps

### 1. Retrieving the "Pizza or Not Pizza" Dataset
- **Objective**: Download the "Pizza or Not Pizza" dataset from Kaggle for training the first model.
- **Steps**:
  - Import the dataset via the Kaggle API.
  - Prepare images for training (preprocessing, resizing).

### 2. Training and Optimization of the "Pizza or Not Pizza" Model
- **Objective**: Train a classification model to detect whether an image contains a pizza.
- **Steps**:
  - Define the model architecture (e.g., CNN).
  - Use GPUs to accelerate training.
  - Optimize hyperparameters (e.g., learning rate, number of epochs).

### 3. Model Evaluation and Visualization Tool
- **Objective**: Evaluate the model's performance and visualize some example predictions.
- **Steps**:
  - Calculate performance metrics (precision, recall, F1-score).
  - Use a notebook to visualize examples of correct and incorrect predictions.
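
As a rough illustration of the metrics listed in this step, the sketch below computes precision, recall, and F1-score for the "pizza" class from two label lists. The function name `binary_metrics` and the sample labels are hypothetical and not part of the project code; in practice a library routine such as scikit-learn's `precision_recall_fscore_support` would probably be used instead.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = pizza, 0 = not pizza
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
precision, recall, f1 = binary_metrics(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With these sample labels there are three true positives, one false positive, and one false negative, so all three metrics come out equal.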

### 4. Retrieving the "Pizza Ingredients" Dataset
- **Objective**: Download the pizza ingredients dataset for training the second model.
- **Steps**:
  - Import the dataset via Kaggle.
  - Process the data to make it compatible with the ingredient identification model.
  

### 5. Training and Optimization of the "Pizza Ingredients" Model
- **Objective**: Train a model to recognize ingredients in pizza images.
- **Steps**:
  - Design the model for ingredient identification. The dataset imported in step 4 comes with labels, so the model must learn to place localization (bounding) boxes on these images. The labeled boxes let us measure the quality of the model's output during training by minimizing the distance between each prediction (box and class) and its ground-truth annotation.
  - Optimize training using GPUs on Kaggle.
  - Adjust hyperparameters to improve prediction accuracy.
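
To make the idea of minimizing the distance between prediction and label more concrete, here is a minimal pure-Python sketch of a two-part loss for a single sample: mean squared error on the four box coordinates plus cross-entropy on the predicted class probabilities. It mirrors the two losses configured later in this notebook (`mse` and `sparse_categorical_crossentropy`), but the helper names and the numbers are illustrative assumptions, and Keras averages these quantities over a batch rather than computing them per sample.

```python
import math


def box_mse(pred_box, true_box):
    """Mean squared error over the box coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred_box, true_box)) / len(true_box)


def class_cross_entropy(probs, true_index, eps=1e-7):
    """Cross-entropy for one sample, given predicted class probabilities."""
    return -math.log(max(probs[true_index], eps))


def detection_loss(pred_box, true_box, probs, true_index):
    """Unweighted sum of the box loss and the classification loss."""
    return box_mse(pred_box, true_box) + class_cross_entropy(probs, true_index)


# Hypothetical sample: one coordinate is off by 0.2, true class has probability 0.7
true_box = [0.10, 0.20, 0.50, 0.60]
pred_box = [0.10, 0.20, 0.50, 0.80]
probs = [0.1, 0.7, 0.2]
loss = detection_loss(pred_box, true_box, probs, true_index=1)
print(f"total loss = {loss:.4f}")
```

Summing the two terms with equal weight is only one possible choice; because box errors measured in pixels can dwarf the classification term, normalizing coordinates or weighting the losses may be worth considering.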

### 6. Test and Visualization of Ingredient Predictions
- **Objective**: Test the ingredient model and visualize the predictions.
- **Steps**:
  - **Evaluation**: Calculate performance metrics such as precision, recall, and F1-score.
  - **Prediction Visualization**: Implement an algorithm to take an image as input and return it with labeled or segmented ingredients.
    - **Algorithm**:
      1. Load the trained ingredient model.
      2. Preprocess the input image (resize, normalize, etc.).
      3. Apply the model to predict ingredients.
      4. Create a segmentation map based on predicted ingredients.
      5. Overlay the segmentation map on the original image.
      6. Return the labeled image.
  - **Example Visualization**: Display an example image with ingredient labels.
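
Steps 4 and 5 of the algorithm above (build a map, then overlay it) can be sketched with plain NumPy alpha blending. This is only an illustration under stated assumptions: `overlay_mask` is a hypothetical helper, the boolean `mask` stands in for whatever region the ingredient model predicts, and the tiny synthetic image replaces a real photo.

```python
import numpy as np


def overlay_mask(image, mask, color, alpha=0.4):
    """Blend `color` into `image` wherever `mask` is True.

    image: uint8 array of shape (H, W, 3)
    mask:  boolean array of shape (H, W)
    color: RGB triple
    alpha: opacity of the overlay, between 0 and 1
    """
    blended = image.astype(np.float32)  # astype returns a copy, so the input is untouched
    color = np.asarray(color, dtype=np.float32)
    blended[mask] = (1 - alpha) * blended[mask] + alpha * color
    return blended.round().astype(np.uint8)


# Synthetic 4x4 grey image with a 2x2 "ingredient" region in the middle
image = np.full((4, 4, 3), 100, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

result = overlay_mask(image, mask, color=(200, 0, 0), alpha=0.5)
print(result[1, 1], result[0, 0])
```

Pixels outside the mask are left unchanged, which keeps the original image readable underneath the highlighted regions.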

### 7. Integration of Both Models
- **Objective**: Integrate the "Pizza or Not Pizza" and "Pizza Ingredients" models for a comprehensive classification.
- **Steps**:
  - Create a function that first applies the pizza detection model, followed by the ingredient model.
  - Handle cases where the image does not contain a pizza (detected as "not pizza").
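
The two-stage logic described in this step could look roughly like the sketch below. Everything here is hypothetical: `analyze_image`, the `threshold` value, and the two stand-in callables are placeholders for the trained models, which would be wrapped so that the detector returns a pizza probability and the ingredient model returns a list of label names.

```python
def analyze_image(image, pizza_detector, ingredient_model, threshold=0.5):
    """Run the pizza detector first, and the ingredient model only if a pizza is found."""
    pizza_score = pizza_detector(image)
    if pizza_score < threshold:
        # Not a pizza: skip the second model entirely
        return {"is_pizza": False, "score": pizza_score, "ingredients": []}
    return {"is_pizza": True, "score": pizza_score, "ingredients": ingredient_model(image)}


# Stand-in models used only to exercise the control flow
def fake_detector(image):
    return 0.9 if image == "pizza.jpg" else 0.1


def fake_ingredient_model(image):
    return ["Cheese", "Basil"]


print(analyze_image("pizza.jpg", fake_detector, fake_ingredient_model))
print(analyze_image("cat.jpg", fake_detector, fake_ingredient_model))
```

Passing the models in as callables keeps the pipeline independent of the framework used to train them.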

### 8. Building a Complete Example
- **Objective**: Set up an end-to-end workflow for pizza and ingredient recognition.
- **Steps**:
  - Create a function to retrieve a pizza image (from a URL or by upload).
  - Pass the image through the model pipeline to detect pizza and list ingredients.
  - Present the final result in a notebook, with visualizations of the intermediate steps.

---

## Technologies and Tools
- **Language**: Python
- **Frameworks**: TensorFlow or PyTorch
- **Execution Environment**: Kaggle Notebooks with GPU support
- **Visualization**: Matplotlib, Seaborn for results display
- **Dataset Access**: Kaggle API

---

## Deliverables
- **Kaggle Notebooks**: Containing each step of the process.
- **Trained Models**: Exported model files for pizza detection and ingredient identification.
- **Documentation**: Explanation of each step and results.

---

This project will provide hands-on experience in training image classification models and integrating them into a complete image analysis pipeline.


# Utilities for Preprocessing the "Pizza Toppings" Object Detection Dataset

This notebook defines and uses a set of utility functions to streamline the loading, conversion, and visualization of data for object detection. These functions are designed to efficiently prepare images and annotations from the **Pizza Toppings Object Detection** dataset, simplifying the workflow for training an object detection model with Keras and TensorFlow. Below is an overview of the primary utilities:

### Utility Functions

1. **`load_imgs(path, no_of_images, img_size, do_display)`**  
   This function loads a specified number of images from a folder, resizes each image, and optionally displays them. It returns a dictionary of images and a list of their paths, standardizing image sizes for use in a deep learning model.

2. **`convert_bbox(coordinates, image_width, image_height)`**  
   This function converts bounding box coordinates from the YOLOv8 format (center coordinates and relative width/height) to a more conventional format (top-left and bottom-right corners). This conversion is essential for visualizing annotations on images and ensuring compatibility with libraries like PIL or OpenCV.

3. **`load_annotations(path, labels, no_of_annotations, do_display)`**  
   This function loads annotations from text files and associates them with the dataset’s label definitions. It processes and converts coordinates, returning a dictionary of annotations for each image. This utility links each image with its bounding box information, simplifying data preparation for model training.

4. **`plot_image_annotations(image_paths, annotations, label_colors, target_size, do_display, text_display, font_scale, alpha)`**  
   This function displays images with their annotations by drawing bounding boxes around detected objects. It uses predefined colors for each label and offers customization options like text scaling and box transparency. This utility is key for visually inspecting the annotated data and ensuring data quality before training.

These utilities provide a structured way to load, process, and handle images and annotations, making the data workflow efficient and consistent for object detection model training.
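
As a quick worked example of the coordinate conversion described for `convert_bbox`, the sketch below applies the same arithmetic to one box. The function name `yolo_to_corners` and the sample values are illustrative assumptions chosen so the result is easy to verify by hand; the notebook's own implementation follows in the next cell.

```python
def yolo_to_corners(box, image_width, image_height):
    """Convert [x_center, y_center, width, height] (all relative, 0..1)
    into pixel corners [x1, y1, x2, y2]."""
    x_center, y_center, width, height = box
    x1 = int((x_center - width / 2) * image_width)
    y1 = int((y_center - height / 2) * image_height)
    x2 = int((x_center + width / 2) * image_width)
    y2 = int((y_center + height / 2) * image_height)
    return [x1, y1, x2, y2]


# A box centered in a 512x512 image, a quarter of the width and half of the height
corners = yolo_to_corners([0.5, 0.5, 0.25, 0.5], 512, 512)
print(corners)
```

Here the center lands at pixel (256, 256) and the box measures 128 by 256 pixels, so the corners sit 64 pixels left and right and 128 pixels above and below the center.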


In [2]:
!pip install roboflow

import os
import cv2
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image, ImageDraw, ImageFont
import numpy as np
from roboflow import Roboflow

# Global Variables

INPUT_SHAPE = (512, 512, 3)
IMG_SIZE = (512, 512)
# Set this variable to True if you want to save the trained model locally
SAVE_MODEL = True

def load_imgs(path, no_of_images, img_size=IMG_SIZE, do_display=False):
    """Function which loads images from the given path

    Args:
        path (str): path to the folder with images.
        no_of_images (int): number of images to load.
        img_size (tuple, optional): size of the image. Defaults to IMG_SIZE.
        do_display (bool): flag to display images. Defaults to False.

    Returns:
        images (dict): dictionary with images
        image_paths (list): list of image paths
    """
    # Declaring necessary variables
    images = {}
    image_paths = []

    # Iterating through all images in the given path
    for img_no, img_name in enumerate(os.listdir(path)):
        # Breaking from loop if no_of_images is reached
        if img_no == no_of_images:
            break

        # Loading, resizing and storing images in a dictionary
        image_paths.append(os.path.join(path, img_name))
        img = Image.open(os.path.join(path, img_name))
        img = img.resize(img_size)
        images[img_no] = img

        # Displaying images if do_display is True
        if do_display:
            print('\033[1m' + 'Image {}  Image name: {}'.format(img_no+1, img_name) + '\033[0m')
            display(img)
    
    # Returning images and image paths
    return images, image_paths


def load_annotations(path, labels, no_of_annotations, do_display=False):
    """Function which loads annotations from the given path

    Args:
        path (str): path to the folder with annotations.
        labels (dict): dictionary with labels and their corresponding string names.
        no_of_annotations (int): number of annotations to load.
        do_display (bool): flag to display annotations. Defaults to False.

    Returns:
        annotations (dict): dictionary with annotations.
    """
    # Declaring necessary variables
    annotations = {}

    # Iterating over all files in the directory
    for filename in os.listdir(path):
        # Breaking from loop if no_of_annotations is reached
        if len(annotations) == no_of_annotations:
            break

        # Displaying filename if do_display is True
        if do_display:
            print('\033[35m' + 'Filename: {}'.format(filename) + '\033[0m')

        # Constructing the full file path
        file_path = os.path.join(path, filename)

        # Removing the final file extension from the filename
        filename = filename.rsplit('.', 1)[0]
        
        # Declaring empty list for the current filename
        annotations[filename] = []

        # Opening the file and reading its contents
        with open(file_path, 'r') as file:
        # Iterating through each line in the file
            for line in file:
                # Splitting the line into label and coordinates
                parts = line.strip().split(' ')
                label = labels[int(parts[0])]
                coordinates = [float(coord) for coord in parts[1:]]

                # Storing the data in the dictionary
                annotations[filename].append({label: coordinates})

                # Displaying annotations if do_display is True
                if do_display:
                    print('\033[33m' + 'Label:' + '\033[0m' + ' {}'.format(label) + '\033[32m' + ' Coordinates:' + '\033[0m' + ' {}'.format(coordinates))
        
        if do_display:
            print()

    # Returning annotations
    return annotations

def convert_bbox(coordinates, image_width, image_height):
    """Function to convert bounding box coordinates from YOLOv8 format to PIL format

    Args:
        coordinates (list): list of bounding box coordinates in YOLOv8 format [x_center, y_center, width, height]
        image_width (int): width of the original image
        image_height (int): height of the original image

    Returns:
        new_coordinates (list): list of bounding box coordinates in PIL format [x1, y1, x2, y2]
    """
    # Unpacking coordinates from YOLOv8 format
    x_center, y_center, width, height = coordinates

    # Calculating new width and height
    new_width = width * image_width
    new_height = height * image_height

    # Calculating new center
    new_x_center = x_center * image_width
    new_y_center = y_center * image_height

    # Calculating new coordinates
    new_x1 = int(new_x_center - (new_width / 2))
    new_y1 = int(new_y_center - (new_height / 2))
    new_x2 = int(new_x_center + (new_width / 2))
    new_y2 = int(new_y_center + (new_height / 2))

    # Returning new coordinates
    return [new_x1, new_y1, new_x2, new_y2]

def plot_image_annotations(image_paths, annotations, label_colors, target_size, do_display=False, text_display=True, font_scale=1, alpha=0.4):
    """Function which plots images with annotations

    Args:
        image_paths (list): list of image paths.
        annotations (dict): dictionary with annotations.
        label_colors (dict): dictionary with annotation label colors.
        target_size (tuple): target size of the image.
        do_display (bool): flag to display images with annotations. Defaults to False.
        text_display (bool): flag to display annotation labels. Defaults to True.
        font_scale (float): font scale for the annotation labels. Defaults to 1.
        alpha (float): alpha value for the bounding boxes. Defaults to 0.4.
    """
    # Iterating through all the images
    for img_no, image_path in enumerate(image_paths):

        # Reading the image
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = cv2.resize(image, target_size)

        # Retrieving the image name
        image_name = image_path.split('/')[-1]
        image_name = image_name.split('\\')[-1]

        # Removing only last file extension from the image name
        image_name = image_name.rsplit('.', 1)[0]

        # Retrieving the image annotations
        image_annotations = annotations[image_name]

        # Drawing the bounding boxes around the image
        for annotation in image_annotations:
            # Retrieving the label and coordinates of the bounding box
            label = list(annotation.keys())[0]
            bbox = annotation[label]

            # Converting the bounding box coordinates from YOLOv8 format to PIL format
            bbox = convert_bbox(bbox, target_size[0], target_size[1])
            
            # Retrieving the coordinates of the bounding box
            x1, y1, x2, y2 = bbox

            # Creating a copy of the image
            image_with_rectangle = image.copy()

            # Creating a filled rectangle for the bounding box
            cv2.rectangle(image_with_rectangle, (x1, y1), (x2, y2), label_colors[label], cv2.FILLED)

            # Blending the image with the rectangle using cv2.addWeighted
            image = cv2.addWeighted(image, 1 - alpha, image_with_rectangle, alpha, 0)

            # Adding a rectangle outline to the image
            cv2.rectangle(image, (x1, y1), (x2, y2), label_colors[label], 3)

            # Adding the label to the image
            if text_display:
                # Setting the font
                font = cv2.FONT_HERSHEY_SIMPLEX

                # Drawing the label on the image, whilst ensuring that it doesn't go out of bounds
                (label_width, label_height), baseline = cv2.getTextSize(label, font, font_scale, 2)

                # Ensuring that the label doesn't go out of bounds
                if y1 - label_height - baseline < 0:
                    y1 = label_height + baseline
                if x1 + label_width > image.shape[1]:
                    x1 = image.shape[1] - label_width
                if y1 + label_height + baseline > image.shape[0]:
                    y1 = image.shape[0] - label_height - baseline
                if x1 < 0:
                    x1 = 0

                # Creating a filled rectangle for the label background
                cv2.rectangle(image, (x1, y1 - label_height - baseline), (x1 + label_width, y1), label_colors[label], -1)

                # Adding the label text to the image
                cv2.putText(image, label, (x1, y1 - 5), font, font_scale, (255, 255, 255), 2, cv2.LINE_AA)

        # Displaying images with annotations if do_display is True
        if do_display:
            plt.figure(figsize=(10,10))
            plt.imshow(image)
            plt.axis('off')
            plt.show()

def plot_model_evaluation(history):
    """
    Plots the training and validation losses for bounding box and classification outputs.

    Args:
        history: History object from model training.
    """
    # Plot bounding box loss
    plt.figure(figsize=(12, 6))
    plt.subplot(1, 2, 1)
    plt.plot(history.history['bounding_box_loss'], label='Train Bounding Box Loss')
    plt.plot(history.history['val_bounding_box_loss'], label='Val Bounding Box Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.title('Bounding Box Loss')
    plt.legend()

    # Plot classification accuracy
    plt.subplot(1, 2, 2)
    plt.plot(history.history['class_label_accuracy'], label='Train Classification Accuracy')
    plt.plot(history.history['val_class_label_accuracy'], label='Val Classification Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.title('Classification Accuracy')
    plt.legend()
    
    plt.tight_layout()
    plt.show()



^C


ModuleNotFoundError: No module named 'roboflow'

# Step 4 : Retrieving the "Pizza Ingredients" Dataset

In [None]:
# Import libraries
import os
import zipfile
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.optimizers import Adam
from kaggle.api.kaggle_api_extended import KaggleApi
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import cv2
import numpy as np

# Use Kaggle API
kaggle_api = KaggleApi()
kaggle_api.authenticate()

# Download the dataset from Kaggle
kaggle_api.dataset_download_files('matthiasbartolo/pizza-toppings-object-detection', path='./data', unzip=True)

# Set up meta variables and labels definition
image_path = 'Pizza-Object-Detector-7/train/images'
annotation_path = 'Pizza-Object-Detector-7/train/labels'
LABELS = {
    0: "Arugula", 1: "Bacon", 2: "Basil", 3: "Broccoli", 4: "Cheese",
    5: "Chicken", 6: "Corn", 7: "Ham", 8: "Mushroom", 9: "Olives",
    10: "Onion", 11: "Pepperoni", 12: "Peppers", 13: "Pineapple", 14: "Pizza", 15: "Tomatoes"
}

# Loading images and annotations
images, image_paths = load_imgs(path=image_path, no_of_images=9000, do_display=False)
annotations = load_annotations(path=annotation_path, labels=LABELS, no_of_annotations=None, do_display=False)


The dataset is now imported into our workspace. Before training, every image must be resized to a common size and normalized, and every bounding box must be converted from YOLOv8 format to PIL-style corner coordinates.

In [None]:
def preprocess_data(image_paths, annotations, target_size=(224, 224)):
    """
    Preprocesses images and associated bounding box and class annotations for model training.

    The model defined below predicts a single box and class per image, so only the
    first annotation of each image is kept and images without annotations are
    skipped. This keeps X, y_boxes, and y_classes the same length.

    Args:
        image_paths (list): List of paths to the images.
        annotations (dict): Dictionary of bounding box annotations for each image.
                            Keys are image names, values are lists of bounding boxes with classes.
        target_size (tuple): Target size to resize each image (default is (224, 224)).

    Returns:
        tuple: A tuple containing:
            - X (np.array): Array of preprocessed images, normalized to [0, 1].
            - y_boxes (np.array): Array with one bounding box per image, in PIL format.
            - y_classes (np.array): Array with one class index per image.
    """
    # Reverse lookup: label name -> class index (LABELS maps index -> name)
    label_to_index = {name: index for index, name in LABELS.items()}

    X = []
    y_boxes = []
    y_classes = []

    for path in image_paths:
        # Strip only the final extension so the name matches the keys built by load_annotations
        image_name = os.path.basename(path).rsplit('.', 1)[0]

        # Skip images that have no annotations
        image_annotations = annotations.get(image_name)
        if not image_annotations:
            continue

        # Load the image, force three channels, resize, and normalize to [0, 1]
        img = Image.open(path).convert('RGB').resize(target_size)
        X.append(np.array(img, dtype=np.float32) / 255.0)

        # Keep the first annotation only (one box and one class per image)
        annotation = image_annotations[0]
        label = list(annotation.keys())[0]  # Get label
        bbox_yolo = annotation[label]  # Bounding box in YOLOv8 format
        bbox_pil = convert_bbox(bbox_yolo, target_size[0], target_size[1])

        y_boxes.append(bbox_pil)
        y_classes.append(label_to_index[label])  # Convert label name to class index

    return np.array(X), np.array(y_boxes, dtype=np.float32), np.array(y_classes)

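# Prepare training data, resized to IMG_SIZE so the arrays match the INPUT_SHAPE the model is built with
X_train, y_train_boxes, y_train_classes = preprocess_data(image_paths, annotations, target_size=IMG_SIZE)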


# Step 5 : Training and Optimization of the "Pizza Ingredients" Model

## Why Use a Backbone?

Using a pre-trained backbone saves computation and time because we leverage the learned representations from another dataset (like ImageNet) rather than training a network from scratch.

## Why ResNet50 as the Backbone in Our Case?

In our object detection task, we aim to identify different toppings on a pizza. ResNet50 is well-suited as a backbone because:
1. **Feature-rich**: It is deep enough to capture complex and hierarchical features, essential for distinguishing various objects in a single image.
2. **Efficient**: Its residual (skip) connections make a 50-layer network practical to train, giving a good balance between accuracy and computational cost.
3. **Transfer Learning Compatibility**: Pre-trained on ImageNet, it can transfer well to our object detection task, as it has already learned to recognize basic object shapes and patterns.

By freezing the backbone during training, we retain the valuable general features while focusing the training on the detection-specific layers (the bounding box and classification heads). This approach is efficient, reduces the risk of overfitting, and helps the model adapt to our smaller dataset.


In [None]:
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.optimizers import Adam

def create_backbone(input_shape=(224, 224, 3)):
    """
    Loads a pre-trained ResNet50 model as the backbone, with the top layers removed.

    Args:
        input_shape (tuple): Shape of the input images (default is (224, 224, 3)).

    Returns:
        Model: A Keras model instance representing the backbone for feature extraction.
    """
    backbone = ResNet50(weights="imagenet", include_top=False, input_shape=input_shape)
    backbone.trainable = False  # Freeze the backbone to retain pre-trained weights
    return backbone

def add_detection_head(backbone, num_classes, dense_units=(512, 256), activation='relu'):
    """
    Adds bounding box and classification layers to the backbone for ingredient detection.

    Args:
        backbone (Model): The pre-trained backbone model without top layers.
        num_classes (int): Number of ingredient classes for classification.
        dense_units (tuple): Number of units in each dense layer (default is (512, 256)).
        activation (str): Activation function for dense layers (default is 'relu').

    Returns:
        Model: The Keras model with bounding box and classification prediction layers.
    """
    # Flatten backbone output
    x = layers.Flatten()(backbone.output)
    for units in dense_units:
        x = layers.Dense(units, activation=activation)(x)

    # Bounding box output (the layer must be applied to x to produce a tensor)
    bbox_output = layers.Dense(4, activation="linear", name="bounding_box")(x)

    # Classification output
    class_output = layers.Dense(num_classes, activation="softmax", name="class_label")(x)  # Softmax for multi-class classification

    # Create the final model with two outputs
    model = Model(inputs=backbone.input, outputs=[bbox_output, class_output])
    return model

def compile_model(model):
    """
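    Compiles the model with the Adam optimizer, a mean squared error loss and mean absolute error
    metric for the bounding box output, and a sparse categorical cross-entropy loss and accuracy
    metric for the class label output.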

    Args:
        model (Model): The Keras model instance to compile.

    Returns:
        Model: The compiled Keras model ready for training.
    """
    # Compile the model with two loss functions and metrics
    model.compile(
        optimizer=Adam(),
        loss={'bounding_box': 'mse', 'class_label': 'sparse_categorical_crossentropy'},
        metrics={'bounding_box': 'mae', 'class_label': 'accuracy'}
    ) # Mean Squared Error and Sparse Categorical Crossentropy for loss function and Mean Absolute Error and Accuracy for metrics
    return model

# Create and configure the model for bounding box prediction
backbone = create_backbone(input_shape=INPUT_SHAPE)
model = add_detection_head(backbone, num_classes=len(LABELS))
model = compile_model(model)

# Training the model
history = model.fit(
    X_train, 
    {'bounding_box': y_train_boxes, 'class_label': y_train_classes}, 
    epochs=10, 
    batch_size=8, 
    validation_split=0.2
)

# Call the function to plot model evaluation
plot_model_evaluation(history)

# Save the model if SAVE_MODEL is set to True
if SAVE_MODEL:
    model_save_path = './trained_pizza_topping_model.h5'
    model.save(model_save_path)
    print(f"Model saved locally at: {model_save_path}")


# Step 6 : Test and Visualization of Ingredient Predictions

In [None]:
def predict_and_plot(model, image_paths, label_colors, target_size=IMG_SIZE):
    """
    Predicts bounding boxes and labels for a set of images using the trained model, 
    and plots the images with predicted bounding boxes and labels.

    Args:
        model (Model): The trained Keras model for bounding box and label prediction.
        image_paths (list): List of paths to the test images.
        label_colors (dict): Dictionary of colors for each label for visualization.
        target_size (tuple): Target size to resize each image (default is IMG_SIZE).
    """
    for path in image_paths:
        # Load and preprocess image
        img = Image.open(path)
        img_resized = img.resize(target_size)
        img_array = np.array(img_resized) / 255.0  # Normalize
        
        # Predict bounding box and class label
        bbox_pred, class_pred = model.predict(np.expand_dims(img_array, axis=0))
        bbox_pred = bbox_pred[0]  # Get bounding box prediction for the image
        class_pred = np.argmax(class_pred[0])  # Get predicted class label
        
        # Convert bounding box to integer coordinates
        x1, y1, x2, y2 = map(int, bbox_pred)
        
        # Draw bounding box and label on the image
        fig, ax = plt.subplots(1)
        ax.imshow(img_resized)
        
        # Draw bounding box
        rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, 
                                 edgecolor=label_colors[LABELS[class_pred]], facecolor='none')
        ax.add_patch(rect)
        
        # Draw label text
        ax.text(x1, y1 - 10, LABELS[class_pred], color=label_colors[LABELS[class_pred]], 
                fontsize=12, weight='bold', bbox=dict(facecolor='white', alpha=0.6))
        
        plt.axis('off')
        plt.show()

        
test_image_paths = [image_paths[i] for i in range(5)]  # Select 5 images for testing
# One matplotlib color per entry in LABELS ('Pineapple' was missing, and 'peach' is not a named matplotlib color)
label_colors = {
    'Arugula': 'blue', 'Bacon': 'red', 'Basil': 'yellow', 'Broccoli': 'purple', 
    'Cheese': 'cyan', 'Chicken': 'olive', 'Corn': 'magenta', 'Ham': 'teal', 
    'Mushroom': 'maroon', 'Olives': 'lime', 'Onion': 'grey', 'Pepperoni': 'navy', 
    'Peppers': 'black', 'Pineapple': 'gold', 'Pizza': 'orange', 'Tomatoes': 'tomato'
}
        
predict_and_plot(model, test_image_paths, label_colors, target_size=IMG_SIZE)