# 🌾 KerasCV YOLOv8 for Global Wheat Head Detection

This notebook documents the training methodology for the **Global Wheat Detection Kaggle Competition**, addressing the critical task of accurate **object localization of wheat heads** within diverse agricultural imagery. The challenge requires robust generalization across significant variations in lighting, wheat variety, growth stage, and camera perspective.

The competition page, which also hosts the dataset for download, is:  
[Global Wheat Detection](https://www.kaggle.com/competitions/global-wheat-detection)

---

## I. Methodology and Architecture
The core of this solution utilizes the **YOLOv8 object detection architecture**, implemented via the accelerated and standardized API provided by the **KerasCV library**. This choice facilitates rapid iteration while maintaining access to state-of-the-art model structures and leveraging optimized TensorFlow backends.

## II. Training Strategy: Stratified Optimization
To ensure optimal convergence and performance, a **three-phase, stratified training regime** is employed:

1.  **Warmup Phase:** A short initial training period with a controlled learning rate increase to stabilize model weights and prevent early-stage divergence.
2.  **Mid-Tune Phase:** The primary training period, utilizing an optimized learning rate schedule (e.g., cosine decay) for comprehensive feature learning.
3.  **Fine-Tune Phase:** A final, low-learning-rate stage dedicated to subtle refinement of weights, maximizing the model's predictive accuracy.
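
The three phases can be pictured as a single per-epoch learning-rate curve. The sketch below is illustrative only: the constants mirror the values in the hyperparameter cell of this notebook, while the linear warmup ramp, the cosine decay ending at `FINE_TUNE_MODEL_LR`, and the helper name `lr_for_epoch` are assumptions rather than the exact schedule used in training.

```python
import math

# Values copied from the hyperparameter cell of this notebook
WARMUP_LR = 1e-3
FINE_TUNE_MODEL_LR = 1e-5
WARMUP_EPOCH = 10
INTERMEDIATE_EPOCH = WARMUP_EPOCH + 20
FINAL_EPOCH = INTERMEDIATE_EPOCH + 50


def lr_for_epoch(epoch):
    """Hypothetical learning rate for a zero-based epoch index."""
    if epoch < WARMUP_EPOCH:
        # Warmup (assumed linear): reaches WARMUP_LR on the last warmup epoch
        return WARMUP_LR * (epoch + 1) / WARMUP_EPOCH
    if epoch < INTERMEDIATE_EPOCH:
        # Mid-Tune (assumed cosine decay): WARMUP_LR down towards FINE_TUNE_MODEL_LR
        progress = (epoch - WARMUP_EPOCH) / (INTERMEDIATE_EPOCH - WARMUP_EPOCH)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return FINE_TUNE_MODEL_LR + (WARMUP_LR - FINE_TUNE_MODEL_LR) * cosine
    # Fine-Tune: constant low learning rate
    return FINE_TUNE_MODEL_LR


for epoch in (0, 9, 10, 20, 29, 30, FINAL_EPOCH - 1):
    print(f"epoch {epoch:2d}: lr = {lr_for_epoch(epoch):.2e}")
```

Plotting `lr_for_epoch` over `range(FINAL_EPOCH)` gives a quick visual check that the phase boundaries fall where intended.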

## III. Evaluation Framework
Performance is measured using the standard competition metric: the **mean Average Precision (mAP)**. For accurate and compliant calculation of these metrics, we leverage the **cocometrics** library. This package provides a highly robust and widely adopted implementation of the COCO (Common Objects in Context) evaluation protocol, ensuring reliable and standard analysis of object detection performance across various Intersection over Union (IoU) thresholds.
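
As a reminder of what an IoU threshold means in practice, here is a minimal, library-free sketch. It is not the evaluation library's API: `iou_xyxy` is a hypothetical helper, the two boxes are made-up values, and the threshold list assumes the usual COCO range of 0.50 to 0.95 in steps of 0.05.

```python
def iou_xyxy(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap region (zero if the boxes do not overlap)
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (100.0, 100.0, 160.0, 160.0)  # hypothetical 60x60 box
prediction = (110.0, 110.0, 170.0, 170.0)    # same size, shifted by 10 px

iou = iou_xyxy(ground_truth, prediction)

# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95
thresholds = [0.5 + 0.05 * i for i in range(10)]
matched_at = [round(t, 2) for t in thresholds if iou >= t]

print(f"IoU = {iou:.3f}")
print("Counts as a match at thresholds:", matched_at)
```

A 10-pixel shift on a 60-pixel box already drops the IoU to about 0.53, so the prediction only counts as a match at the loosest threshold. This is why small objects make the stricter thresholds hard to satisfy.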

## IV. Workflow and Artifact Management
This notebook is strictly dedicated to the **training process**. To maintain a clean, efficient workflow and simplify the final submission environment:

* **Model Checkpointing:** Successful model weights are saved at various stages throughout the training process.
* **Artifact Export:** The final, trained model is systematically **exported and saved** to disk. This saved artifact will be loaded into a **separate, dedicated inference notebook** responsible solely for prediction generation, test-time augmentation (TTA) application, and submission file creation.

**Note:**
All trained models are uploaded to my Kaggle Models repository for reuse; I use them myself for inference:
[Wheat Detection Models](https://www.kaggle.com/models/amirmohamadaskari/wheat-detection)


---

## ⚙️ Section 1: Environment and Hardware Configuration

This initial section establishes the necessary environment, imports core dependencies, and performs crucial hardware checks to ensure efficient, accelerated, and reproducible model training.

---

### 1.1 Library Imports and System Settings

This subsection imports all necessary **libraries** for data manipulation, visualization, deep learning (**TensorFlow** and **KerasCV**), and general utility. It also configures environment variables to manage TensorFlow verbosity and warnings.

In [None]:
import os
import warnings

# Configure logging before TensorFlow is imported:
# these environment variables are read when the library is first loaded
warnings.filterwarnings("ignore")
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "0"
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"

import sys
import math
import gc
import ast
import random

import numpy as np
import pandas as pd
import cv2
from PIL import Image
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, TensorBoard
import keras_cv

### 1.2 Hardware Verification

This step verifies the **availability of high-performance computing devices** (GPU or TPU) within the TensorFlow environment, which is crucial for efficient execution of the deep learning pipeline.

In [None]:
print("Available devices: \n")
# List and print all logical devices configured for TensorFlow
for device in tf.config.list_logical_devices():
    print(device.name, device.device_type)

### 1.3 Distribution Strategy Initialization

The `get_strategy()` function implements a robust mechanism to automatically detect and configure the appropriate **TensorFlow distribution strategy**. This is critical for maximizing training speed by leveraging multiple devices (TPUs or GPUs) if available. The strategy object returned dictates how the model and data are distributed across available cores/devices.

In [None]:
def get_strategy():
    """
    Detects and returns the best TensorFlow distribution strategy.
    - TPUStrategy for TPU(s)
    - MirroredStrategy for GPU(s)
    - Default strategy for CPU
    """
    try:
        # Try TPU first: Initialize and connect to the TPU cluster
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='local')
        tf.config.experimental_connect_to_cluster(tpu)
        tf.tpu.experimental.initialize_tpu_system(tpu)
        strategy = tf.distribute.TPUStrategy(tpu)
        print("Using TPU strategy:", type(strategy).__name__)
    except Exception:
        # If TPU not available, try GPU
        gpus = tf.config.list_physical_devices('GPU')
        if gpus:
            # Use MirroredStrategy for distributed training across multiple GPUs
            strategy = tf.distribute.MirroredStrategy()
            print("Using GPU strategy:", type(strategy).__name__)
        else:
            # Fallback to CPU/default strategy
            strategy = tf.distribute.get_strategy()
            print("No TPU/GPU found. Using CPU strategy:", type(strategy).__name__)

    # Report the number of replicas, which equals the number of devices used for training
    print("REPLICAS:", strategy.num_replicas_in_sync)
    return strategy

# Execute the function to set the global distribution strategy
strategy = get_strategy()

### 1.4 Strategy Confirmation and Version Summary

This final cell in the setup section confirms the number of **replicas** (devices) being utilized for parallel training and verifies the exact **TensorFlow version** to ensure environment consistency and reproducibility.

In [None]:
# Print the confirmed number of synchronous replicas (devices) being utilized
print("REPLICAS:", strategy.num_replicas_in_sync)
# Print the TensorFlow version for reproducibility documentation
print("TensorFlow version:", tf.__version__)

### 1.5 Setting Global Seeds for Reproducibility

To ensure that the results from this notebook can be reliably reproduced, a fixed **random seed** is set for the core libraries involved: `random`, `numpy`, and `tensorflow`. This is essential as operations like data splitting, augmentation, and model weight initialization often involve randomness.

In [None]:
SEED = 28
def seed_everything(seed):
    # Set seed for the standard 'random' library
    random.seed(seed)
    # Set seed for TensorFlow's global random operations
    tf.random.set_seed(seed)
    # Set seed for NumPy's random operations
    np.random.seed(seed)
    print('Random seeds set for reproducibility.')

# Execute the seeding function with the defined constant
seed_everything(SEED)

## 📊 Section 2: Data Loading and Exploratory Data Analysis (EDA)

This section focuses on loading the dataset, understanding its core statistics, and visualizing key characteristics of the images and bounding box annotations. A thorough EDA is essential for making informed decisions regarding model architecture, data augmentation, and training parameters.

---

### 2.1 Dataset Path Configuration and Initial Counts

Define the file paths for the raw data, including the training images, test images, and the CSV file containing bounding box annotations. Initial counts of images in the respective directories and a shape check of a sample image are performed.

In [None]:
# Define base directory and paths for all resources
DATA_DIR = '/kaggle/input/global-wheat-detection'
TRAIN_DIR = os.path.join(DATA_DIR, 'train')
TEST_DIR = os.path.join(DATA_DIR, 'test')
CSV_PATH = os.path.join(DATA_DIR, 'train.csv')

In [None]:
# Count the number of files in the train and test directories
num_train_images = len(os.listdir(TRAIN_DIR))
num_test_images = len(os.listdir(TEST_DIR))
print(f'Number of total images on Train directory: {num_train_images}')
print(f'Number of test images on Test directory: {num_test_images}')

In [None]:
# Load a sample image to check the default image resolution
img_path = os.path.join(TRAIN_DIR, os.listdir(TRAIN_DIR)[0])
img = cv2.imread(img_path, cv2.IMREAD_COLOR)
print(img.shape)

In [None]:
# Load the annotation CSV file into a pandas DataFrame
df = pd.read_csv(CSV_PATH)
# Display the first few rows of the DataFrame
df.head()

### 2.2 Annotation Summary and Bounding Box Density

Analyze the overall size of the annotation DataFrame and calculate the density of wheat heads per image to understand the scale of the object detection task. The statistics help characterize the target variance.

The key observation here is the **high variance** in the number of wheat heads per image (demonstrated by the large standard deviation and maximum value). The mean of $\approx 43$ bounding boxes per image confirms this is a **dense detection problem**, requiring a model with strong capabilities for handling object clustering and overlap.

In [None]:
# Check the total number of bounding box annotations
df.shape

In [None]:
# Calculate the average number of bounding boxes per image
averaged_bbox_per_img = df.groupby('image_id').size().mean()
print(f'Average number of bounding boxes per image: {int(averaged_bbox_per_img)}')

In [None]:
# Get detailed statistics on the count of wheat heads per image
bbox_counts = df.groupby('image_id').size()
print('Statistics of wheat head per image:')
print(bbox_counts.describe().T)

---
Visualize the distribution of the bounding box counts per image. The histogram visually confirms the heavy tail of the distribution, indicating a small subset of images are exceptionally dense.

In [None]:
plt.figure(figsize= (12, 6))
# Create a histogram to visualize the distribution of wheat head counts
sns.histplot(bbox_counts, bins= 30, kde= True, color= 'purple')
plt.title('Number of Bounding Boxes per Image')
plt.xlabel('Number of Bounding Boxes')
plt.ylabel('Number of images')

plt.show()

### 2.3 Analysis of Unannotated Images (Negative Samples)

Determine the proportion of images that contain no annotations (i.e., no visible wheat heads). This check is crucial to ensure the class imbalance regarding negative samples is manageable and that the model is exposed to a sufficient number of true negative examples.

The result shows that **unannotated images do not dominate the dataset**: they account for only about 1% of the training images. Negative samples are therefore a small minority, so there is little risk of the model collapsing into a trivial bias towards predicting background.

In [None]:
# Identify all image IDs that have at least one annotation
annotated_ids = set(df['image_id'].unique())
print(f'Number of images with wheat heads: {len(annotated_ids)}')

In [None]:
# Compare all image files with annotated IDs to find empty images
all_images = [f.replace('.jpg', '') for f in os.listdir(TRAIN_DIR)]
empty_images = [f for f in all_images if f not in annotated_ids]
print(f'Number of images without annotations (no wheat heads): {len(empty_images)}')
print(f'Example of empty image: {empty_images[0]}')

In [None]:
# Calculate the share of annotated and unannotated images
empty_img_frac = len(empty_images) / len(all_images)
annotated_img_frac = len(annotated_ids) / len(all_images)

print(f'Empty images: {empty_img_frac:.2%}')
print(f'Annotated images: {annotated_img_frac:.2%}')
print("Empty images are a small minority, so they need no special handling.")

---
Display a sample of an unannotated image, confirming the appearance of a true negative example (e.g., bare soil, field edges, or non-wheat crops).

In [None]:
img_path = os.path.join(TRAIN_DIR, empty_images[0] + '.jpg')
img = Image.open(img_path)

plt.imshow(img)
plt.axis('off')
plt.title(f'Example of empty: {empty_images[0]}.jpg')
plt.show()

### 2.4 Initial Image Visualisation

Define and call a utility function to display a few random training images. This provides an initial visual inspection of the dataset's **diversity in lighting, perspective, and background clutter**, which directly influences the design of the data augmentation pipeline.

In [None]:
def show_images(num_images= 6, cols= 3):
    # Determine the list of files to display
    files = os.listdir(TRAIN_DIR)[:num_images]
    rows = (num_images + cols - 1) // cols

    fig = plt.figure(figsize= (cols* 4, rows* 4))
    
    for i, fname in enumerate(files):
        img_path = os.path.join(TRAIN_DIR, fname)
        img = Image.open(img_path)
        img = img.resize((256, 256)) # Resize for consistent display

        plt.subplot(rows, cols, i+1)
        plt.imshow(img)
        plt.axis('off')
        plt.title(fname)
        
    plt.tight_layout()
    plt.show()

In [None]:
# Display 6 sample images
show_images(num_images= 6, cols= 3)

### 2.5 Bounding Box Transformation and Size Analysis

Transform the `bbox` string column into numerical coordinates (`x_min`, `y_min`, `x_max`, `y_max`) and calculate the dimensions (`width`, `height`) of all bounding boxes. This allows for an analysis of the object scale. The coordinate system is converted from **[x_min, y_min, width, height] (COCO-style format)** to **[x_min, y_min, x_max, y_max] (PASCAL VOC-style corner format)**, which is often preferred for internal object detection calculations.
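
For a single annotation, the conversion amounts to adding the width and height to the top-left corner. The sketch below uses a made-up annotation string and a hypothetical helper name, `xywh_to_xyxy`, purely for illustration.

```python
import ast


def xywh_to_xyxy(bbox_str):
    """Parse '[x_min, y_min, width, height]' and return corner coordinates."""
    x_min, y_min, width, height = ast.literal_eval(bbox_str)
    return [x_min, y_min, x_min + width, y_min + height]


example = "[100.0, 200.0, 50.0, 30.0]"  # hypothetical annotation string
converted = xywh_to_xyxy(example)
print(converted)  # [100.0, 200.0, 150.0, 230.0]
```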

In [None]:
# Convert the string representation of the list in 'bbox' column to an actual list
df['bbox'] = df['bbox'].apply(ast.literal_eval)
# Each list is [x_min, y_min, width, height]; derive the corner coordinates from it
df['x_min'] = df['bbox'].apply(lambda b: b[0])
df['y_min'] = df['bbox'].apply(lambda b: b[1])
df['x_max'] = df['bbox'].apply(lambda b: b[0] + b[2])
df['y_max'] = df['bbox'].apply(lambda b: b[1] + b[3])

In [None]:
# Display the modified DataFrame structure
df.head()

In [None]:
# Calculate the actual width and height of each bounding box
df['width'] = df['x_max'] - df['x_min']
df['height'] = df['y_max'] - df['y_min']
print(df[['width' ,'height']].describe().T)

In [None]:
df.head()

---
Visualize the distributions of bounding box widths and heights. The statistics and plot confirm that the wheat heads are generally **small objects** (median dimensions are around 60x60 pixels in a 1024x1024 image). This small object size necessitates the use of high-resolution input and potentially multi-scale feature maps in the YOLOv8 backbone.
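
To put the object size in perspective, the short calculation below shows how much of the image a 60x60 box covers and how many feature-map cells it spans. The strides of 8, 16, and 32 are an assumption (the detection scales commonly used by YOLO-style heads), and the 512-pixel alternative resolution is a hypothetical comparison point.

```python
IMAGE_SIDE = 1024          # training image side length in pixels
BOX_SIDE = 60              # approximate median box side quoted above
STRIDES = (8, 16, 32)      # assumed feature-map strides of the detection head

# Fraction of the image area covered by a single median-sized box
area_fraction = BOX_SIDE ** 2 / IMAGE_SIDE ** 2

# Number of feature-map cells the box spans along one side at each stride
cells_per_side = {stride: BOX_SIDE / stride for stride in STRIDES}

# Side length of the same box if the image were downscaled to 512x512
downscaled_side = BOX_SIDE * 512 / IMAGE_SIDE

print(f"Area fraction: {area_fraction:.4%}")
for stride, cells in cells_per_side.items():
    print(f"stride {stride:2d}: {cells:.2f} cells per side")
print(f"Box side at 512x512 input: {downscaled_side:.0f} px")
```

At the coarsest assumed stride, a median box spans fewer than two cells per side. Halving the input resolution would shrink it below one cell, which supports keeping the full 1024x1024 input.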

In [None]:
fig, ax = plt.subplots(1, 2, figsize= (12, 6))
for i, col in enumerate(['width', 'height']):
    # Plot histogram for width and height
    sns.histplot(df[col], bins= 50, kde= True, ax= ax[i])
    ax[i].set_title(f'Bounding Boxes {col} distribution')
    ax[i].set_xlim((0, 250)) # Limit x-axis for better visibility of the main distribution
    ax[i].set_xlabel(f'{col} pixels')
    ax[i].set_ylabel('Count')

plt.tight_layout()
plt.show()

### 2.6 Annotated Image Verification

Define and call a function to display images with their corresponding ground-truth bounding boxes drawn. This step visually confirms the accuracy and quality of the annotations and the coordinate transformation process. This is the **final check** before proceeding to data preprocessing.

In [None]:
def show_images_with_bboxes(df, image_dir, nrows, ncols):
    # Pick random images from the train dir to display
    files = os.listdir(image_dir)
    selected_files = random.sample(files, nrows * ncols)

    fig, axs = plt.subplots(nrows, ncols, figsize=(4*ncols, 4*nrows))

    for ax, fname in zip(axs.flatten(), selected_files):
        image_id = fname.replace('.jpg', '')

        # Load image using OpenCV (BGR format)
        img_path = os.path.join(image_dir, fname)
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # Convert to RGB for display

        # Get bboxes if they exist for the current image
        if image_id in df['image_id'].values:
            bboxes = df[df['image_id'] == image_id][['x_min', 'y_min', 'x_max', 'y_max']].values
            for (x_min, y_min, x_max, y_max) in bboxes:
                # Draw the bounding box onto the image
                start_point = (int(x_min), int(y_min))
                end_point = (int(x_max), int(y_max))
                color = (255, 0, 0) # Red color (RGB)
                thickness = 2
                cv2.rectangle(img, start_point, end_point, color, thickness)

        # Show image and title
        ax.imshow(img)
        ax.axis('off')
        ax.set_title(fname, fontsize=8)

    plt.tight_layout()
    plt.show()

In [None]:
# Display 4 images with their ground-truth bounding boxes
show_images_with_bboxes(df, TRAIN_DIR, 2, 2)

## ⚙️ Section 3: Data Preparation and Train/Validation Split

This section transforms the raw annotation DataFrame into a format suitable for the KerasCV object detection API. It groups bounding boxes by image, structures the data into dictionary objects, performs the train/validation split, and strategically integrates the unannotated images into the training set.

---

### 3.1 Structuring Data for Model Input

The bounding box data is grouped by `image_id` and converted into a list of dictionaries. Each dictionary represents a single image and contains its file path and a NumPy array of all associated bounding boxes in the required **`[x_min, y_min, x_max, y_max]`** format.

In [None]:
# Group bounding box coordinates by image_id, resulting in a list of bboxes per image
grouped = df.groupby('image_id')[['x_min', 'y_min', 'x_max', 'y_max']].apply(
    lambda x: x.values.tolist()
)

In [None]:
data_dicts = []
for image_id, bboxes in grouped.items():
    img_path = os.path.join(TRAIN_DIR, f'{image_id}.jpg')
    # Convert the list of bboxes into a float32 NumPy array with shape (N, 4)
    bboxes = np.array(bboxes, dtype=np.float32).reshape(-1, 4)
    data_dicts.append({
        'image_path': img_path,
        # 'bboxes' holds the (N, 4) array that prepare_inputs reads later
        'bboxes': bboxes
    })

### 3.2 Train-Validation Split

The dataset containing all positive samples is randomly divided into training (80%) and validation (20%) sets. The `random_state` is fixed using the global `SEED` for reproducibility.

In [None]:
# Split the data_dicts into train and validation sets (80/20 split)
train_dicts, val_dicts = train_test_split(
    data_dicts,
    test_size= 0.2,
    random_state= SEED,
    shuffle= True
)
print('Train and Validation dicts created successfully! 20% of data stored for validation')

---
### 3.3 Integration of Negative Samples

The unannotated images (`empty_images`) are explicitly added to the **training set only**. For these negative samples, the `bboxes` array is set to an empty **`(0, 4)`** array. This ensures the model learns to recognize and correctly classify true negative images, improving robustness.

In [None]:
for fname in empty_images:
    img_path = os.path.join(TRAIN_DIR, f'{fname}.jpg')
    # Create an empty bounding box array for images without wheat heads
    bboxes = np.zeros((0, 4), dtype=np.float32)
    train_dicts.append({
        'image_path': img_path,
        'bboxes': bboxes
    })

# Randomly shuffle the final training dictionary list after adding negative samples
random.shuffle(train_dicts)

## 🛠️ Section 4: Configuration, Data Pipeline, and Augmentation Strategy

This section establishes all essential **hyperparameters** for the YOLOv8 model and its training phases. It then defines the **input pipeline** using `tf.data` and **KerasCV layers** to efficiently load, preprocess, and augment the training and validation data. This structure ensures high throughput and effective generalization.

---

### 4.1 Hyperparameter Definitions

Key hyperparameters governing model size, training stability, learning rate schedule, and input pipeline efficiency are defined. The use of a small number of classes (`NUM_CLASSES=1`) is due to the nature of the competition (only 'wheat head' is detected).

In [None]:
# --- Model and Input Configuration ---
IMG_SIZE = (1024, 1024) # Target input resolution for the model
NUM_CLASSES = 1 # Only one class: 'wheat head'
GLOBAL_CLIPNORM = 10.0 # Gradient clipping value for training stability (prevents exploding gradients)

# --- Learning Rate Configuration (Three-Phase Strategy) ---
WARMUP_LR= 1e-3 # Learning rate for the initial Warmup phase
FINE_TUNE_BB_LR = 1e-4 # Learning rate for bounding box head during Fine-Tune (if separate training is desired)
FINE_TUNE_MODEL_LR = 1e-5 # Very low final learning rate for subtle refinement during Fine-Tune phase

# --- Epoch Configuration (Three-Phase Strategy) ---
WARMUP_EPOCH = 10 # Duration of the Warmup phase
INTERMEDIATE_EPOCH = WARMUP_EPOCH + 20 # Total epochs up to the end of the Mid-Tune phase
FINAL_EPOCH = INTERMEDIATE_EPOCH + 50 # Total epochs up to the end of the Fine-Tune phase

# --- tf.data Pipeline Configuration ---
AUTO = tf.data.AUTOTUNE # Optimal setting for parallel execution
BATCH_SIZE_PER_REPLICA = 4 # Batch size allocated to each available device (GPU/TPU core)
BUFFER_SHUFFLE_SIZE = 512 # Size of the buffer used for shuffling the dataset

# Global Batch Size calculation: crucial for scaling learning rates later
BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync 
print(f'Global Batch size: {BATCH_SIZE}')

---
### 4.2 Data Input Preparation Functions

These helper functions are designed to bridge the gap between the Python list-of-dictionaries format (`train_dicts`, `val_dicts`) and the required `tf.data.Dataset` and KerasCV formats.

* `prepare_inputs`: Converts the Python list of dictionaries into the tuple `(image_paths, classes, boxes)`. The classes and boxes are **Ragged Tensors**, which are essential here because each image has a variable number of bounding boxes.
* `load_image`: Standard TensorFlow function to decode a JPEG file from the path.
* `load_dataset`: Combines the loaded image tensor with the bounding boxes and classes into a dictionary, which is the standard input format for KerasCV augmentation layers.
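
To see why variable box counts need special handling, the plain-Python sketch below contrasts per-image row lengths (the information a ragged tensor keeps) with padding every image to the same number of boxes. It does not show how `tf.RaggedTensor` is implemented; `pad_boxes`, the `-1.0` sentinel, and the sample boxes are illustrative assumptions.

```python
def pad_boxes(boxes_per_image, pad_value=-1.0):
    """Pad each image's box list to the longest list in the batch."""
    max_boxes = max((len(boxes) for boxes in boxes_per_image), default=0)
    padded = []
    for boxes in boxes_per_image:
        missing = max_boxes - len(boxes)
        # Real boxes first, then sentinel rows marking "no box here"
        padded.append([list(b) for b in boxes] + [[pad_value] * 4 for _ in range(missing)])
    return padded


# Hypothetical batch: two boxes, no boxes (negative sample), one box
boxes_per_image = [
    [[10.0, 20.0, 60.0, 80.0], [100.0, 120.0, 150.0, 170.0]],
    [],
    [[5.0, 5.0, 40.0, 45.0]],
]

row_lengths = [len(boxes) for boxes in boxes_per_image]
dense = pad_boxes(boxes_per_image)

print("Row lengths (ragged view):", row_lengths)
print("Dense shape after padding:", (len(dense), len(dense[0]), 4))
```

The ragged representation stores only the real boxes plus their row lengths, so an image without wheat heads simply contributes a row of length zero.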

In [None]:
def prepare_inputs(dicts):
    # Convert the list of image paths into a string tensor (a flat list has no ragged dimension)
    image_paths = tf.ragged.constant(
        [s["image_path"] for s in dicts], dtype=tf.string
    )

    bbox_list = [
        np.array(s["bboxes"], dtype=np.float32).reshape(-1, 4)
        for s in dicts
    ]

    # Assign a class ID of 0 (since NUM_CLASSES=1, representing "wheat head")
    classes_list = [
        np.zeros((len(b)), dtype=np.float32) for b in bbox_list
    ]

    # Convert bounding box and class lists into Ragged Tensors
    bboxes  = tf.ragged.constant(bbox_list, ragged_rank=1, dtype=tf.float32)
    classes = tf.ragged.constant(classes_list, ragged_rank=1, dtype=tf.float32)

    return image_paths, classes, bboxes


In [None]:
# Load and decode JPEG image
def load_image(image_path):
    image = tf.io.read_file(image_path)
    image = tf.image.decode_jpeg(image, channels=3)
    return image

In [None]:
# Package into the dictionary format expected by KerasCV
def load_dataset(image_path, classes_rt, boxes_rt):
    image = load_image(image_path)
    bounding_boxes = {"boxes": boxes_rt, "classes": classes_rt}
    return {"images": image, "bounding_boxes": bounding_boxes}

### 4.3 Multi-Phase Data Augmentation Strategies

Three distinct `tf.keras.Sequential` pipelines are defined using **KerasCV layers**. Using different augmentation strengths for different training phases is a standard technique to prevent overfitting early on (strong augmentation) and enable subtle optimization later (light augmentation).

* **Strong Augmenter (Warmup/Early Mid-Tune):** Uses aggressive geometric and color transformations, including **Mosaic**, to significantly diversify the training data.
* **Light Augmenter (Late Mid-Tune/Fine-Tune):** Reduces the magnitude of transformations, providing a more stable input distribution closer to the validation data.
* **Validation Augmenter:** Only performs necessary deterministic operations (resizing) without any random transformations.
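
Geometric augmentations must transform the boxes together with the pixels, which is why the layers need to know the bounding box format. As a library-free illustration (not KerasCV's implementation), a horizontal flip of `xyxy` boxes mirrors the x coordinates around the image width. The helper name and sample box are hypothetical.

```python
def flip_boxes_horizontally(boxes, image_width):
    """Mirror [x_min, y_min, x_max, y_max] boxes around the vertical axis."""
    # The old right edge becomes the new left edge, and vice versa
    return [
        [image_width - x_max, y_min, image_width - x_min, y_max]
        for x_min, y_min, x_max, y_max in boxes
    ]


boxes = [[100.0, 200.0, 160.0, 260.0]]  # hypothetical 60x60 box
flipped = flip_boxes_horizontally(boxes, image_width=1024)
print(flipped)  # [[864.0, 200.0, 924.0, 260.0]]
```

Flipping twice returns the original coordinates, which makes a handy sanity check for any custom box transform.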

In [None]:
# Strong augmentations: Aggressive data diversification for early training
augmenter_strong = tf.keras.Sequential([
    # JitteredResize: Randomly scales the image before resizing to introduce scale variation
    keras_cv.layers.JitteredResize(
        target_size=IMG_SIZE, scale_factor=(0.9, 1.1), bounding_box_format="xyxy"
    ),
    # Mosaic: Combines 4 images into 1, increasing the object count and context diversity of each sample
    keras_cv.layers.Mosaic(bounding_box_format="xyxy", name= 'mosaic'),
    # Standard horizontal flipping
    keras_cv.layers.RandomFlip(
        mode="horizontal", bounding_box_format="xyxy"
    ),
    # Strong color distortion
    keras_cv.layers.RandomColorJitter(
        value_range=(0.0, 255.0),
        brightness_factor=0.2, contrast_factor=0.2,
        saturation_factor=0.2, hue_factor=0.1
    ),
    # Randomly desaturates colors (simulating sensor noise/weather)
    keras_cv.layers.RandomColorDegeneration(
        factor=(0.2, 0.7), seed=SEED
    ),
])

# Light augmentations: Milder transformations for stable convergence in later phases
augmenter_light = tf.keras.Sequential([
    # Reduced jitter scale
    keras_cv.layers.JitteredResize(
        target_size=IMG_SIZE, scale_factor=(0.95, 1.05), bounding_box_format="xyxy"
    ),
    keras_cv.layers.RandomFlip(
        mode="horizontal", bounding_box_format="xyxy"
    ),
    # Reduced color distortion
    keras_cv.layers.RandomColorJitter(
        value_range=(0.0, 255.0),
        brightness_factor=0.1, contrast_factor=0.1,
        saturation_factor=0.1, hue_factor=0.05
    ),
    keras_cv.layers.RandomColorDegeneration(
        factor=(0.1, 0.4), seed=SEED
    ),
])

# Validation (deterministic) resizing: Only standard resizing for evaluation
augmenter_val = tf.keras.Sequential([
    # Fixed resize scale
    keras_cv.layers.JitteredResize(
        target_size=IMG_SIZE, scale_factor=(1.0, 1.0), bounding_box_format="xyxy"
    )
])


### 4.4 Data Loading and Dataset Creation Functions

These functions orchestrate the data pipeline, applying the correct sequence of loading, batching, shuffling, and augmentation based on the desired training phase.

* `dict_to_tuple`: Converts the KerasCV output dictionary back into a simple `(images, bounding_boxes)` tuple required by the Keras `model.fit()` method.
* `create_strong_dataset`: Builds the dataset for the Warmup phase, incorporating `Mosaic` and strong color jitter. **Ragged batching** is used before augmentation, as KerasCV layers handle ragged tensors seamlessly.
* `create_light_dataset`: Builds the dataset for the Mid-Tune and Fine-Tune phases, applying either the light training augmentations or the deterministic validation augmentations.

In [None]:
def dict_to_tuple(inputs):
    # Convert KerasCV dictionary output to standard Keras (x, y) tuple
    return inputs['images'], inputs['bounding_boxes']

In [None]:
def create_strong_dataset(dict_list, batch_size=BATCH_SIZE):
    
    image_paths, classes, bboxes = prepare_inputs(dict_list)

    # 1. Start with tensor slices
    ds = tf.data.Dataset.from_tensor_slices((image_paths, classes, bboxes))
    # 2. Shuffle paths
    ds = ds.shuffle(BUFFER_SHUFFLE_SIZE)
    # 3. Load image from path
    ds = ds.map(load_dataset, num_parallel_calls=AUTO)
    # 4. Batch before augmentation (required for Mosaic and efficient augmentation)
    ds = ds.ragged_batch(batch_size, drop_remainder=True)
    # 5. Apply strong augmentation
    ds = ds.map(augmenter_strong, num_parallel_calls=AUTO)
    # 6. Convert to tuple format
    ds = ds.map(dict_to_tuple, num_parallel_calls=AUTO)

    # Pre-fetch data to overlap CPU work (loading/augmenting) with GPU work (training)
    return ds.prefetch(AUTO)

In [None]:
def create_light_dataset(dict_list, batch_size=BATCH_SIZE, is_training= False):

    image_paths, classes, bboxes = prepare_inputs(dict_list)
    
    ds = tf.data.Dataset.from_tensor_slices((image_paths, classes, bboxes))

    # 1. Load image from path
    ds = ds.map(load_dataset, num_parallel_calls=AUTO)

    if is_training:
        ds = ds.shuffle(BUFFER_SHUFFLE_SIZE)
        ds = ds.ragged_batch(batch_size, drop_remainder=True)
        # Apply light training augmentation
        ds = ds.map(augmenter_light, num_parallel_calls=AUTO)
    else:
        # No shuffle for validation
        ds = ds.ragged_batch(batch_size, drop_remainder=True)
        # Apply deterministic validation augmentation
        ds = ds.map(augmenter_val, num_parallel_calls=AUTO)
    
    # 2. Convert to tuple format
    ds = ds.map(dict_to_tuple, num_parallel_calls=AUTO)
    
    # Pre-fetch for efficiency
    return ds.prefetch(AUTO)

### 4.5 Final Dataset Initialization

Instantiate the three required datasets based on the previously defined functions and data splits. The availability of these distinct datasets enables the implementation of the three-phase training strategy.

In [None]:
# Initialize the datasets for the three training phases
train_strong_dataset = create_strong_dataset(train_dicts)
# Validation set is consistent across all phases
val_dataset = create_light_dataset(val_dicts, is_training= False)
# Light training set for Mid-Tune and Fine-Tune
train_light_dataset = create_light_dataset(train_dicts, is_training= True)

print('✅ Train and Validation datasets are ready!')
print('Light augmented dataset for Mid-Tune and Fine-Tune phases created!')

---
### 4.6 Dataset Verification and Augmentation Visualization

This subsection performs two vital checks:
1.  **Shape Verification:** Confirms that the output tensors from the `tf.data` pipeline (`train_strong_dataset`) have the expected shapes, particularly noting the ragged batching for bounding boxes.
2.  **Visual Validation:** Defines and utilizes a utility function (`visualize_dataset`) to visually inspect the effect of the applied strong augmentations on the training data (e.g., Mosaic, color jitter) and the deterministic transformations on the validation data.

The visualization uses the convenient **KerasCV `plot_bounding_box_gallery`** function to render the ground-truth boxes on the augmented images.

In [None]:
# Check the shapes of the output tensors from the data pipeline
for images, bounding_boxes in train_strong_dataset.take(1):
    bboxes = bounding_boxes["boxes"]
    classes = bounding_boxes["classes"]

    # Image shape should be (BATCH_SIZE, 1024, 1024, 3)
    print("Images shape:", images.shape)
    # Boxes shape will be RaggedTensor, showing (BATCH_SIZE, None, 4)
    print("Boxes shape:", bboxes.shape)
    # Classes shape will be RaggedTensor, showing (BATCH_SIZE, None)
    print("Classes shape:", classes.shape)

In [None]:
def visualize_dataset(dataset, rows=2, cols=2, 
                      value_range=(0, 255), bounding_box_format="xyxy"):
    # Take a single batch for visualization
    batch = next(iter(dataset.take(1)))
    # Our dataset is already (images, bounding_boxes) from the final map function
    images, bounding_boxes = batch
    
    num_images = rows * cols

    # 1. Plot raw augmented images
    fig, axs = plt.subplots(rows, cols, figsize= (4* cols, 4* rows))
    axs = axs.flatten()
    for i in range(num_images):
        # Convert tensor to NumPy array for plotting
        img = images[i].numpy().astype('uint8')

        axs[i].imshow(img)
        axs[i].set_title('Raw Image')
        axs[i].axis('off')
    plt.tight_layout()
    plt.show()
         
    # 2. Plot images with ground-truth bounding boxes
    keras_cv.visualization.plot_bounding_box_gallery(
        images,                 # images
        y_true= bounding_boxes, # ground-truth boxes from the (images, bounding_boxes) pipeline
        value_range=value_range,# range of image values
        rows=rows,
        cols=cols,
        scale=5,
        font_scale=0.7,
        bounding_box_format=bounding_box_format,
    )

    plt.tight_layout()
    plt.show()

# Usage
visualize_dataset(train_strong_dataset, rows=2, cols=2)
visualize_dataset(val_dataset, rows=2, cols=2)

### 4.7 Calculation of Training and Validation Steps

Determine the total number of batches per epoch for both the training and validation sets. These values are required inputs for the Keras `fit` method, ensuring the model iterates over the entire dataset exactly once per epoch. The `math.ceil` function is used to account for any partial final batch.

In [None]:
# Get the total number of samples in each split
NUM_TRAIN_IMAGES = len(train_dicts)
NUM_VAL_IMAGES   = len(val_dicts)

# Calculate the number of batches (steps) per training epoch
steps_per_epoch  = math.ceil(NUM_TRAIN_IMAGES / BATCH_SIZE)
# Calculate the number of batches (steps) for validation
validation_steps = math.ceil(NUM_VAL_IMAGES / BATCH_SIZE)

print(f"Steps per Epoch: {steps_per_epoch}")
print(f"Validation Steps: {validation_steps}")

## 🧠 Section 5: YOLOv8 Model Definition and Three-Phase Training

This section defines the **YOLOv8 object detection model** using KerasCV, sets up the **optimizer and loss functions**, implements a **custom COCO metrics callback** for rigorous evaluation, and executes the three-phase training strategy: **Warmup**, **Mid-Tune**, and **Fine-Tune**.

**Kaggle Artifact Management Note:**
All model checkpoints are saved to the `/kaggle/working/` directory in this notebook. The best-performing model from each phase is then uploaded and versioned as a Kaggle Model on my account, and both the later training phases and the separate inference notebook load it from there. You are free to load these models directly on Kaggle or download them for your own use: [Wheat Detection Models](https://www.kaggle.com/models/amirmohamadaskari/wheat-detection).

---

### 5.1 Model Initialization and Compilation (Phase 1 Setup)

The model is defined within the `strategy.scope()` to ensure weights are correctly initialized across all available devices (TPU/GPU cores). We utilize the **YOLOv8-M** backbone, leveraging pre-trained weights from the COCO dataset to accelerate convergence.

**Key Configuration:**
* **Backbone Freezing:** The entire backbone is initially **frozen** (`model.backbone.trainable = False`) during the Warmup phase. This stabilizes the large pre-trained weights and allows the newly initialized head/neck layers to learn effective feature representations first.
* **Batch Normalization (BN) Freezing:** **Batch Normalization layers** are explicitly set to `trainable = False` across the entire backbone. This is a common practice when fine-tuning with a small batch size per replica (common on TPU/multi-GPU) to prevent BN running average statistics from becoming unstable.
* **Optimizer:** The **AdamW** optimizer is chosen, featuring decoupled weight decay for better regularization.
* **Loss Functions:** **Focal Loss** is used for classification (addressing class imbalance between background and wheat heads), and **CIoU Loss** (Complete IoU) is used for robust bounding box regression.
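
For reference, the two losses named above are usually written as follows. These are the standard published forms, not a transcription of the KerasCV internals, and the symbols are introduced here for illustration only: $p_t$ is the predicted probability of the true class, $\alpha_t$ and $\gamma$ are the focal-loss balancing and focusing parameters, $\rho$ is the distance between the centers of the predicted box $\mathbf{b}$ and the ground-truth box $\mathbf{b}^{gt}$, $c$ is the diagonal of the smallest box enclosing both, and $w, h$ are box width and height.

$$
\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \, \log(p_t)
$$

$$
\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{\rho^2(\mathbf{b}, \mathbf{b}^{gt})}{c^2} + \alpha v,
\qquad
v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2,
\qquad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}
$$

Note that the trade-off weight $\alpha$ in the CIoU term is unrelated to the focal-loss $\alpha_t$.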

In [None]:
def create_model():
    # Instantiate YOLOv8-M backbone with COCO pre-trained weights
    backbone = keras_cv.models.YOLOV8Backbone.from_preset(
        'yolo_v8_m_backbone_coco',
        name= 'yolov8_backbone'
    )
    # Instantiate the full YOLOv8 Detector model
    model = keras_cv.models.YOLOV8Detector(
        num_classes= NUM_CLASSES,
        bounding_box_format= 'xyxy',
        fpn_depth= 3, # Standard Feature Pyramid Network (FPN) depth
        backbone= backbone,
        name= 'yolov8_detector'
    )
    return model

In [None]:
with strategy.scope():
    
    model = create_model()

    # Initially freeze the backbone weights
    model.backbone.trainable = False

    # Explicitly freeze Batch Normalization layers across the backbone
    for layer in model.backbone.layers:
        if isinstance(layer, tf.keras.layers.BatchNormalization):
            layer.trainable = False

    model.summary()
    
    # Define the AdamW optimizer with the Warmup learning rate
    optimizer = tf.keras.optimizers.AdamW(
    learning_rate= WARMUP_LR,
    weight_decay= 1e-3, # Decoupled weight decay
    beta_1= 0.9,
    beta_2= 0.999,
    global_clipnorm= GLOBAL_CLIPNORM) # Apply gradient clipping

    classification_loss = keras_cv.losses.FocalLoss() # Used for classification

    # Compile the model with specified loss functions
    model.compile(
        optimizer= optimizer,
        classification_loss= classification_loss,
        box_loss= 'ciou',
        # Steps per execution: Increase performance on TPU/XLA by compiling a larger training graph
        steps_per_execution= 32 if isinstance(strategy, tf.distribute.TPUStrategy) else 1
    )

---
### 5.2 Custom COCO Metrics Callback for Evaluation

A custom Keras callback, `EvaluateCOCOMetricsCallback`, is implemented to calculate the competition's primary metric, **Mean Average Precision (MaP)**, at the end of each epoch.

This implementation uses `keras_cv.metrics.BoxCOCOMetrics`, but it first collects all predictions and ground truths from the entire validation dataset (`self.data`) into single concatenated tensors and then calls `update_state()` once. The notebook takes this approach on the grounds that calling `metrics.update_state()` sequentially on batches can yield inaccurate results for a global metric such as COCO MaP.

The callback also handles **saving the best model checkpoint** based on the highest achieved MaP score.
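
As background for the metric names reported by the callback, COCO-style MaP averages Average Precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05, while `MaP@[IoU=50]` and `MaP@[IoU=75]` use a single threshold. The sketch below is a minimal pure-Python illustration of IoU for `xyxy` boxes and of which thresholds a given prediction would satisfy. The function names and the example boxes are hypothetical and are not part of KerasCV or of this notebook's pipeline.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_matched(pred_box, gt_box):
    """Return the COCO thresholds at which pred_box counts as a match for gt_box."""
    iou = iou_xyxy(pred_box, gt_box)
    return [t for t in COCO_IOU_THRESHOLDS if iou >= t]


# Hypothetical example: a prediction shifted 10 px to the right of a 100x100 box
gt_box = (0, 0, 100, 100)
pred_box = (10, 0, 110, 100)
example_iou = iou_xyxy(pred_box, gt_box)
example_matches = thresholds_matched(pred_box, gt_box)
print(f"IoU = {example_iou:.4f}, matched thresholds = {example_matches}")
```

A box like this counts as correct at IoU=50 and IoU=75 but not at the strictest thresholds, which is why the averaged MaP is always lower than `MaP@[IoU=50]`.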

In [None]:
class EvaluateCOCOMetricsCallback(tf.keras.callbacks.Callback):
    def __init__(self, data, save_path):
        super().__init__()
        self.data = data
        self.metrics = keras_cv.metrics.BoxCOCOMetrics(
            bounding_box_format="xyxy",
            evaluate_freq=1e9,  # Set to a high number as evaluation is triggered manually in on_epoch_end
        )
        self.save_path = save_path
        self.best_map = -1.0 # Tracks the highest MaP achieved

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.metrics.reset_state()

        # ---- START: MODIFIED SECTION ----
        # 1. Create lists to hold all ground truth and prediction data
        y_true_list = []
        y_pred_list = []

        # 2. Iterate through the entire validation dataset to collect data
        for images, y_true in self.data:
            # Predict on the current batch
            y_pred = self.model.predict(images, verbose=0)
            y_true_list.append(y_true)
            y_pred_list.append(y_pred)

        # 3. Concatenate all batches into single, large ragged tensors
        # Concatenate ground truth (boxes and classes)
        y_true_concat = {
            'boxes': tf.concat([item['boxes'] for item in y_true_list], axis=0),
            'classes': tf.concat([item['classes'] for item in y_true_list], axis=0)
        }
        # Concatenate predictions (boxes, classes, and confidence)
        y_pred_concat = {
            'boxes': tf.concat([item['boxes'] for item in y_pred_list], axis=0),
            'classes': tf.concat([item['classes'] for item in y_pred_list], axis=0),
            'confidence': tf.concat([item['confidence'] for item in y_pred_list], axis=0)
        }
        # ---- END: MODIFIED SECTION ----

        # 4. Update the metric's state ONCE with the full dataset for accurate COCO metrics
        self.metrics.update_state(y_true_concat, y_pred_concat)

        # 5. Get the final results
        metrics = self.metrics.result(force=True)
        logs.update(metrics)

        current_map = metrics["MaP"]
        
        # Manually print the validation metrics for visibility
        print(f"\nEpoch {epoch+1}: Validation Metrics")
        for key, value in metrics.items():
            print(f"  {key}: {value:.4f}")

        # Model checkpointing logic  
        if current_map > self.best_map:
            self.best_map = current_map
            # Save the model to the Kaggle working directory
            self.model.save(self.save_path)
            print(f"✅ Validation MaP improved to {current_map:.4f}. Model saved to {self.save_path}")

        return logs

### 5.3 Phase 1: Warmup Training

This phase trains the model for a short duration (`WARMUP_EPOCH=10`), starting from a relatively high learning rate (`WARMUP_LR`) that the `ReduceLROnPlateau` callback can lower if validation MaP plateaus. The strong augmentation pipeline (`train_strong_dataset`) is used here.

The primary goal is to quickly train the uninitialized layers (head and neck) of the detector while the large pre-trained backbone remains stable (frozen). This prevents catastrophic forgetting and allows the model to find a reasonable starting point in the weight space.

In [None]:
# Define the save path for the best model of the Warmup phase(you can use your own path)
phase1_saved_path = "/kaggle/working/warmup_best_model.keras"
# Instantiate Callbacks
coco_cb = EvaluateCOCOMetricsCallback(val_dataset, 
                                      save_path= phase1_saved_path,
                                      )
early_stopping_cb = EarlyStopping(
    monitor= 'MaP',
    patience= 3, # Stop if MaP doesn't improve for 3 epochs
    restore_best_weights= True,
    mode= 'max'
)

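reduce_lr_cb = ReduceLROnPlateau(
    monitor= 'MaP',
    mode= 'max', # MaP is higher-is-better; the default 'auto' mode treats non-accuracy metrics as 'min'
    patience= 3, # Note: same patience as EarlyStopping above, so training may stop before a reduced LR takes effect
    factor= 0.66,
    min_lr= WARMUP_LR * 0.1,
    verbose= 1
)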

# TensorBoard Object if you want to use
tb_cb = TensorBoard(
    log_dir= '/kaggle/working/logs',
    histogram_freq= 1
)

callbacks = [
    coco_cb,
    early_stopping_cb,
    reduce_lr_cb,
    tb_cb
]

In [None]:
print("--- Starting Phase 1: Warmup Training ---")
# Train the Warmup Model
history = model.fit(train_strong_dataset.repeat(), 
                    validation_data= val_dataset.repeat(),
                    epochs= WARMUP_EPOCH,
                    callbacks= callbacks, # `callbacks` is already a list; do not nest it
                    steps_per_epoch= steps_per_epoch,
                    validation_steps= validation_steps)

---
### 5.4 Phase 2: Mid-Tune Training

In this phase, training transitions from updating only the detector head and neck to also **unfreezing the deeper layers of the backbone** for more specialized feature extraction.

**Key Configuration:**
* **Model Loading:** The best model saved during the Warmup phase is loaded. Note that the path `/kaggle/input/wheat-detection/keras/warmup/1/warmup_best_model.keras` points to the pre-uploaded version of that model, ensuring a stable starting point.
* **Layer Unfreezing:** The backbone is partially unfrozen: layers before `stack4_downsample_conv` stay frozen, while that layer and every layer after it become trainable. This selectively trains the deeper feature extraction layers.
* **Learning Rate Schedule:** A **Cosine Decay schedule** is implemented. It starts at `FINE_TUNE_BB_LR` and decreases smoothly to 10% of that value (`alpha=0.1`) over the duration of the phase, allowing for fine optimization without large destructive updates.
* **Augmentation:** The strong augmentation is replaced with the milder `train_light_dataset` to further stabilize training.
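
To make the schedule concrete, here is a minimal pure-Python sketch of the cosine decay formula with a floor set by `alpha`, which is how the behavior of `tf.keras.optimizers.schedules.CosineDecay` (without warmup) is usually described. The helper name `cosine_decay_lr` and the numbers used (an initial rate of `1e-4`, inferred from the "1e-5" end-value comment in the code, and 100 steps per epoch over 20 epochs) are assumptions for illustration, not values read from this notebook's configuration.

```python
import math


def cosine_decay_lr(step, initial_lr, decay_steps, alpha=0.1):
    """Learning rate at `step` for cosine decay ending at alpha * initial_lr."""
    step = min(step, decay_steps)  # the rate stays at the floor after decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / decay_steps))
    return initial_lr * ((1.0 - alpha) * cosine + alpha)


# Hypothetical settings for illustration only
initial_lr = 1e-4
hypothetical_steps_per_epoch = 100
hypothetical_num_epochs = 20
decay_steps = hypothetical_steps_per_epoch * hypothetical_num_epochs

for epoch in (0, 5, 10, 15, 20):
    lr = cosine_decay_lr(epoch * hypothetical_steps_per_epoch, initial_lr, decay_steps)
    print(f"epoch {epoch:2d}: lr = {lr:.2e}")
```

Under these assumptions the rate starts at the initial value, passes through 55% of it at the midpoint, and ends at 10% of it, with the slowest change at the start and end of the phase.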

In [None]:
START_UNFREEZE_LAYER_NAME = 'stack4_downsample_conv'
with strategy.scope():
    print("Loading model from warmup phase...")
    # Load the best model from the previous Warmup phase
    model = tf.keras.models.load_model(
        # The model is loaded from a pre-uploaded Kaggle Model
        '/kaggle/input/wheat-detection/keras/warmup/1/warmup_best_model.keras',
            custom_objects = {
                'YOLOV8Detector': keras_cv.models.YOLOV8Detector,
                'YOLOV8Backbone': keras_cv.models.YOLOV8Backbone
            }
    )
    print("Model loaded successfully. Ready for Mid-Tune phase !")
    
    # 1. Unfreeze the entire backbone
    model.backbone.trainable = True
    unfreeze_checkpoint = False

    # 2. Implement partial unfreezing (freeze the first few stacks)
    for layer in model.backbone.layers:
        if layer.name == START_UNFREEZE_LAYER_NAME:
            unfreeze_checkpoint = True
        if unfreeze_checkpoint:
            layer.trainable = True
        else:
            layer.trainable = False

        # Ensure BN layers remain frozen regardless of backbone trainable status
        if isinstance(layer, tf.keras.layers.BatchNormalization):
            layer.trainable = False

    # Also check and freeze BN layers in the Neck and Head of the detector
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.BatchNormalization):
            layer.trainable = False
    
    # Configure Cosine Decay Learning Rate schedule
    num_phase2_epochs = INTERMEDIATE_EPOCH - WARMUP_EPOCH
    decay_steps = int(steps_per_epoch * num_phase2_epochs)
    learning_rate = tf.keras.optimizers.schedules.CosineDecay(
        initial_learning_rate=FINE_TUNE_BB_LR,
        decay_steps=decay_steps,
        alpha=0.1 # End LR will be 10% of initial LR (1e-5)
    )

    optimizer = tf.keras.optimizers.AdamW(
        learning_rate = learning_rate,
        weight_decay = 1e-4,
        beta_1 = 0.9,
        beta_2 = 0.999,
        global_clipnorm = GLOBAL_CLIPNORM
    )

    classification_loss = keras_cv.losses.FocalLoss()
    
    # Recompile the model with the new, scheduled optimizer
    model.compile(
        optimizer = optimizer,
        classification_loss = classification_loss,
        box_loss = 'ciou',
        steps_per_execution= 32 if isinstance(strategy, tf.distribute.TPUStrategy) else 1
    )
    print("\n--- Model configured for Phase 2: Mid-Tune ---")

In [None]:
# Define Phase 2 callbacks
phase2_saved_path = "/kaggle/working/midtune_best_model.keras"
coco_cb = EvaluateCOCOMetricsCallback(val_dataset, 
                                       phase2_saved_path)
early_stopping_cb = EarlyStopping(
    monitor= 'MaP',
    patience= 5, # Increased patience for finer tuning
    restore_best_weights= True,
    mode= 'max'
)

tb_cb = TensorBoard(
    log_dir= '/kaggle/working/logs',
    histogram_freq= 1
)

callbacks = [
    coco_cb,
    early_stopping_cb,
    tb_cb
]

In [None]:
print("--- Starting Phase 2: Mid-Tune Training ---")
# Train the model for Mid-Tune, starting from WARMUP_EPOCH
final_history = model.fit(
    train_light_dataset.repeat(),
    epochs= INTERMEDIATE_EPOCH,
    initial_epoch= WARMUP_EPOCH,
    validation_data= val_dataset.repeat(),
    steps_per_epoch= steps_per_epoch,
    validation_steps= validation_steps,
    callbacks= callbacks
)

---
### 5.5 Phase 3: Fine-Tune Training

This final phase focuses on maximizing the model's performance by making small, global adjustments to all layers.

**Key Configuration:**
* **Model Loading:** The best model from the Mid-Tune phase is loaded. This is Version 2 of the Kaggle Model linked earlier.
* **Full Unfreezing:** All layers in the entire model (backbone, neck, and head) are set to **trainable**, except for the BN layers, which remain frozen.
* **Learning Rate Schedule:** The learning rate is set to a very low value (`FINE_TUNE_MODEL_LR`) and decays further. This ensures that weight updates are minimal, allowing the model to settle into the best possible configuration without destructive jumps.
* **Augmentation:** The mild `train_light_dataset` is maintained.

In [None]:
with strategy.scope():
    print("Loading model from mid-tune phase...")
    # Load the best model from the previous Mid-Tune phase
    model = tf.keras.models.load_model(
        '/kaggle/input/wheat-detection/keras/warmup/2/midtune_best_model.keras',
            custom_objects = {
                'YOLOV8Detector': keras_cv.models.YOLOV8Detector,
                'YOLOV8Backbone': keras_cv.models.YOLOV8Backbone
            }
    )
    print("Model loaded successfully. Ready for Fine-Tune phase !")
    
    # Set all layers in the backbone to trainable (Full Unfreezing)
    model.backbone.trainable = True

    # Ensure BN layers are frozen across the entire detector model (backbone, neck, head)
    for layer in model.backbone.layers:
        if isinstance(layer, tf.keras.layers.BatchNormalization):
            layer.trainable = False

    for layer in model.layers: # Iterate through all layers of the detector model
        # Note: We re-check for BN to catch those in the Neck and Head
        if isinstance(layer, tf.keras.layers.BatchNormalization):
            layer.trainable = False
    
    # Configure Cosine Decay Learning Rate schedule with a very low initial LR
    num_phase3_epochs = FINAL_EPOCH - INTERMEDIATE_EPOCH
    decay_steps = int(steps_per_epoch * num_phase3_epochs)
    learning_rate = tf.keras.optimizers.schedules.CosineDecay(
        initial_learning_rate=FINE_TUNE_MODEL_LR,
        decay_steps=decay_steps,
        alpha=0.1 # End LR will be 10% of initial LR (5e-6)
    )

    optimizer = tf.keras.optimizers.AdamW(
        learning_rate = learning_rate,
        weight_decay = 1e-4,
        beta_1 = 0.9,
        beta_2 = 0.999,
        global_clipnorm = GLOBAL_CLIPNORM
    )

    classification_loss = keras_cv.losses.FocalLoss()
    
    # Recompile the model with the final optimizer settings
    model.compile(
        optimizer = optimizer,
        classification_loss = classification_loss,
        box_loss = 'ciou',
        steps_per_execution= 32 if isinstance(strategy, tf.distribute.TPUStrategy) else 1
    )
    print("\n--- Model configured for Phase 3: Fine-Tune ---")

In [None]:
# Define Phase 3 callbacks
phase3_saved_path = "/kaggle/working/finetune_best_model.keras"
coco_cb = EvaluateCOCOMetricsCallback(val_dataset, 
                                       phase3_saved_path)
early_stopping_cb = EarlyStopping(
    monitor= 'MaP',
    patience= 8, # Highest patience to allow for slow, subtle improvements
    restore_best_weights= True,
    mode= 'max'
)

tb_cb = TensorBoard(
    log_dir= '/kaggle/working/logs',
    histogram_freq= 1
)

callbacks = [
    coco_cb,
    early_stopping_cb,
    tb_cb
]

In [None]:
print("--- Starting Phase 3: Fine-Tune Training ---")
# Train the model, continuing from INTERMEDIATE_EPOCH, Final Model.
final_history = model.fit(
    train_light_dataset.repeat(),
    epochs= FINAL_EPOCH,
    initial_epoch= INTERMEDIATE_EPOCH,
    validation_data= val_dataset.repeat(),
    steps_per_epoch= steps_per_epoch,
    validation_steps= validation_steps,
    callbacks= callbacks
)

---

## 📈 6. Final Performance Analysis

This section summarizes model performance across the three training phases (Sections 6.1 through 6.4).

### 6.1 Phase 1: Warmup Training Summary (Epochs 1-10) 🚀
This phase focused on training the detector head while the pre-trained backbone was frozen.

| Metric | Start (Epoch 1) | Best (Epoch 10) | Improvement |
| :--- | :---: | :---: | :---: |
| **Validation MaP** | 0.2530 | 0.4617 | +0.2087 |
| MaP@[IoU=50] | 0.5888 | 0.8272 | +0.2384 |
| Training Box Loss | 1.9803 | 1.2328 | -0.7475 |

**Analysis:** Rapid and substantial convergence, establishing a strong baseline performance. The high $\text{MaP}@[IoU=50]$ indicated the model quickly learned to locate objects.

### 6.2 Phase 2: Mid-Tune Training Summary (Epochs 11-30) ⚙️
This phase partially unfroze the deeper backbone layers and used a Cosine Decay learning rate schedule.

| Metric | Start (Epoch 11) | Peak (Epoch 28) | End (Epoch 30) |
| :--- | :---: | :---: | :---: |
| **Validation MaP** | 0.4816 | **0.5063** | 0.5052 |
| MaP@[IoU=75] | 0.4894 | 0.5262 | 0.5246 |
| Training Box Loss | 1.1631 | 1.0507 | 1.0480 |

**Analysis:** The model crossed the $0.50$ MaP threshold and specialized, showing significant gains in $\text{MaP}@[IoU=75]$, which translates to **higher spatial accuracy and tighter bounding boxes**. The model was well-converged for this set of trainable parameters.

### 6.3 Phase 3: Fine-Tune Training Summary (Epochs 31-51) ✨
This phase involved fully unfreezing the entire backbone and continuing the low learning rate cosine decay schedule, aiming for marginal gains by fine-tuning all weights. (The provided logs stop at Epoch 51 out of 80).

| Metric | Start (Epoch 31) | **Peak (Epoch 43)** | End (Epoch 51) | Change (Start to Peak) |
| :--- | :---: | :---: | :---: | :---: |
| **Validation MaP** | 0.4983 | **0.5127** | 0.5101 | **+0.0144** |
| MaP@[IoU=50] | 0.8570 | 0.8704 | 0.8694 | +0.0134 |
| MaP@[IoU=75] | 0.5142 | 0.5331 (E46) | 0.5285 | +0.0189 |
| Training Box Loss | 1.0589 | 1.0081 | 0.9734 | -0.0508 (-0.0855 Start to End) |

**Observations (Epochs 31-51):**

* **Peak Performance:** The $\text{MaP}$ reached its highest point yet at **$0.5127$ in Epoch 43**. This small but crucial improvement confirms that further fine-tuning all layers slightly enhanced performance.
* **Loss Reduction:** The training loss continued its steady decline, dropping below $1.0$ (to $0.9734$ by Epoch 51).
* **Specialization:** Noticeable improvements were seen in $\text{MaP}@[area=small]$, indicating the deeper fine-tuning helped the model detect **tiny or distant wheat heads more effectively**.

### 6.4 Final Conclusion and Contextual Evaluation

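The table below summarizes the MaP gain in each phase. The Warmup and Fine-Tune gains are measured from each phase's first epoch to its peak, while the Mid-Tune gain is measured from the Warmup best (0.4617) to the Mid-Tune peak (0.5063):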

| Phase | Epoch Range | MaP Improvement | Key Takeaway |
| :---: | :---: | :---: | :--- |
| **Warmup** | 1-10 | +0.2087 | Rapid baseline establishment (Head Training). |
| **Mid-Tune** | 11-30 | +0.0446 | Bounding box refinement (Partial Backbone Fine-Tuning). |
| **Fine-Tune** | 31-51 | +0.0144 | Final performance ceiling, improved detection of small objects (Full Backbone Fine-Tuning). |

The **best recorded $\text{MaP}$** achieved throughout the entire training process is **$0.5127$ (Epoch 43)**. This validates the multi-stage fine-tuning strategy for adapting the large pre-trained YOLOv8 model.
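
The per-phase gains can be re-derived from the validation MaP values reported in Sections 6.1 to 6.3. The short check below uses only those reported numbers; the dictionary keys and the `gain` helper are hypothetical names chosen for this sketch.

```python
# Validation MaP values copied from the tables in Sections 6.1 to 6.3
val_map = {
    "warmup_start": 0.2530,
    "warmup_best": 0.4617,
    "midtune_start": 0.4816,
    "midtune_peak": 0.5063,
    "finetune_start": 0.4983,
    "finetune_peak": 0.5127,
}


def gain(start_key, end_key):
    """MaP difference between two reported checkpoints, rounded to 4 decimals."""
    return round(val_map[end_key] - val_map[start_key], 4)


print("Warmup (start to best):             ", gain("warmup_start", "warmup_best"))
print("Mid-Tune (warmup best to peak):     ", gain("warmup_best", "midtune_peak"))
print("Fine-Tune (phase start to peak):    ", gain("finetune_start", "finetune_peak"))
print("Fine-Tune (mid-tune peak to peak):  ", gain("midtune_peak", "finetune_peak"))
```

The last line shows the gain of the final phase when measured against the previous phase's peak rather than against its own first epoch.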

**Contextual Note:** Achieving better results is difficult because of the competition's data characteristics, namely the high density of small, often overlapping wheat heads. This likely imposes a practical ceiling on the achievable $\text{MaP}$. Techniques like **CutMix, MixUp, or Random Erasing** could be explored in future runs to improve generalization against occlusion and scale challenges.

---

## 💾 7. Submission Strategy & Next Steps

This section details the final plan for model deployment and submission.

### 7.1 Model Saving and Checkpoints

The best model from each phase of this training pipeline (**Warmup\_Best\_Model, Midtune\_Best\_Model, and Finetune\_Best\_Model**) has been saved at its peak-performance epoch and versioned.

These models are publicly available in my Kaggle Models repository, which serves as the source for the submission notebook:  
[Wheat Detection Models](https://www.kaggle.com/models/amirmohamadaskari/wheat-detection)

### 7.2 Final Submission Workflow

For the final submission, the following workflow is used to ensure **clarity and separation of work**:

1.  A **new, dedicated inference notebook** will be created, separate from the primary training script.
2.  This notebook will load the **Final Fine-Tuned Model** (corresponding to the best $\text{MaP}$ of $0.5127$ from Epoch 43) directly from the **Wheat Detection Models** repository.
3.  The notebook will then apply **Test-Time Augmentation (TTA)**, which involves running the model multiple times on augmented versions of the test images and combining the results (e.g., via non-maximum suppression or weighted box fusion).
4.  This strategic separation ensures the submission process is clean, optimized for inference speed, and clearly distinguishes the final prediction logic from the training experiments.
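
The TTA step in the workflow above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the inference notebook's actual code: it assumes a single horizontal-flip augmentation, `xyxy` boxes on 1024-pixel-wide images, and greedy non-maximum suppression as the merge step. All function names and the example predictions are hypothetical.

```python
def hflip_boxes(boxes, image_width):
    """Map xyxy boxes predicted on a horizontally flipped image back to the original frame."""
    return [(image_width - x2, y1, image_width - x1, y2) for (x1, y1, x2, y2) in boxes]


def iou_xyxy(a, b):
    """Intersection over Union of two xyxy boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_xyxy(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical predictions: one pass on the original image, one on its flipped copy
IMAGE_WIDTH = 1024
orig_boxes, orig_scores = [(100, 100, 200, 200)], [0.90]
flip_boxes, flip_scores = [(822, 102, 922, 198), (500, 500, 560, 560)], [0.80, 0.70]

# Bring the flipped-pass boxes back into the original frame, then merge both passes
merged_boxes = orig_boxes + hflip_boxes(flip_boxes, IMAGE_WIDTH)
merged_scores = orig_scores + flip_scores
kept = nms(merged_boxes, merged_scores, iou_threshold=0.5)

for i in kept:
    print(merged_boxes[i], merged_scores[i])
```

In this example the flipped pass re-detects the first wheat head (suppressed as a duplicate) and contributes one detection the original pass missed. Weighted box fusion would instead average the overlapping boxes rather than discarding the lower-scoring one.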