# BoXYZ - Ultralytics Evaluate YOLO Box

⚠️⚠️⚠️ WARNING ⚠️⚠️⚠️ **MAKE SURE YOU DOWNLOADED AND PROCESSED THE SCD CARTON DATASET BY [RUNNING NOTEBOOK 2.1](./2.1_download_preprocess_datasets.ipynb)**

Here I evaluate YOLOv9 (compact) and YOLOv11 (small and medium) on the instance segmentation task of the SCD carton dataset.


Following the approach in the [SCD paper](https://www.mdpi.com/1424-8220/22/10/3617), I fine-tuned on the LSCD after training on the OSCD. Although this is how the authors describe the training for the object detection task, it is not clear how they trained or pretrained the models for the instance segmentation task: OSCD followed by LSCD, or both combined? I went with the former (OSCD -> LSCD).

In [None]:
!pip install ultralytics

In [None]:
import os

from ultralytics import YOLO

DS_LOCATION = os.environ.get('DS_LOCATION', 
                             "/media/abawi/e38fddf9-a92e-4c73-b905-995771f8fc3a/datasets/segmentation")

## YOLOv9c Segmentation

In [None]:
model_name = "train_2.2B_2_ft_lscd_yolo9c_epoch17:50"

## Overview and results

I used the [YOLOv9c model (Generalized Efficient Layer Aggregation Network [GELAN])](https://arxiv.org/abs/2402.13616), fine-tuning it on the OSCD dataset followed by the LSCD dataset. YOLOv9c is the "compact" variant of the YOLOv9 family; despite the name, it is larger (25.5M parameters) than the medium (20.1M parameters) and small (7.2M parameters) variants. I chose most hyper-parameters based on non-exhaustive trial and error, YOLOv9c's default settings, SCD's settings as reported in the [SCD paper](https://www.mdpi.com/1424-8220/22/10/3617), best practices (my own earlier observations), or resource limitations (I'm using an RTX 3080Ti to run everything locally, as specified in the requirements).

**(TODO (fabawi): image size defaults to 1024x1024 in YOLOv11, check if similar here)**

During the OSCD pretraining, no layers were frozen, and the model was trained using the AdamW optimizer with an effective batch size of 64 (mini-batch of 8 with gradient accumulation up to 64). The learning rate was automatically determined as 0.00125, with a momentum of 0.9. The remaining hyper-parameters were left at their defaults based on the YOLOv9c settings in [ultralytics](https://www.ultralytics.com/). The image size was set to [600,1000], following SCD's training and fine-tuning specs for the object detection task: the pretraining image size was [600,1000] in the SCD paper, although the task was different.
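The following is a minimal sketch of how the OSCD pretraining settings above could be expressed as ultralytics training arguments. It is not the exact call used for this run: the dataset YAML path, the `yolov9c-seg.pt` starting weights, `optimizer="auto"` (to let ultralytics pick the optimizer and learning rate), and the single `imgsz=1000` value standing in for [600,1000] are all assumptions.

```python
def oscd_pretrain_args(data_yaml):
    """Collect the OSCD pretraining settings as ultralytics train() kwargs."""
    return dict(
        data=data_yaml,     # hypothetical path to the OSCD dataset YAML
        epochs=17,          # pretraining length reported in this notebook
        batch=8,            # mini-batch that fits on the GPU
        nbs=64,             # nominal batch size reached via gradient accumulation
        optimizer="auto",   # assumption: automatic optimizer / learning-rate choice
        imgsz=1000,         # assumption: longest side of the [600,1000] setting
        freeze=None,        # no frozen layers during pretraining
    )


def accumulation_steps(nbs, batch):
    """Mini-batches accumulated before each optimizer step."""
    return max(round(nbs / batch), 1)


def pretrain_oscd(data_yaml, weights="yolov9c-seg.pt"):
    """Run the pretraining; ultralytics is imported lazily so the helpers
    above can be used without it installed."""
    from ultralytics import YOLO

    model = YOLO(weights)  # assumption: start from the released segmentation weights
    return model.train(**oscd_pretrain_args(data_yaml))


args = oscd_pretrain_args("oscd.yaml")  # "oscd.yaml" is a placeholder name
steps = accumulation_steps(args["nbs"], args["batch"])
print(steps, args["batch"] * steps)  # accumulation steps and effective batch size
```

With a mini-batch of 8 and a nominal batch size of 64, gradients are accumulated over 8 mini-batches per optimizer step.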

During the LSCD fine-tuning phase, 10 layers were frozen (because the images are larger and the OSCD low-level features should not need to change significantly to work with LSCD), and the remaining parameters were again trained using the AdamW optimizer. The batch size was set to 4 (following SCD's default mini-batch size for training models). Here we also enabled cosine annealing with a final learning rate of $10^{-5}$, since the classes and distribution of OSCD and LSCD are similar and we want to escape local minima without catastrophically forgetting the features learned from OSCD.
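To illustrate the schedule, here is a small sketch of cosine annealing together with one way the fine-tuning settings could be passed to ultralytics. The starting learning rate (reused from the OSCD phase) and the dataset YAML name are assumptions; the initial rate used for fine-tuning is not stated above. Note that in ultralytics `lrf` is a fraction of `lr0`, so a final rate of $10^{-5}$ corresponds to `lrf = lr_final / lr0`.

```python
import math


def cosine_lr(epoch, total_epochs, lr_start, lr_final):
    """Cosine-annealed learning rate: lr_start at epoch 0, lr_final at the end."""
    cos_term = (1 + math.cos(math.pi * epoch / total_epochs)) / 2
    return lr_final + (lr_start - lr_final) * cos_term


def lscd_finetune_args(data_yaml, lr0, lr_final):
    """Collect the LSCD fine-tuning settings as ultralytics train() kwargs."""
    return dict(
        data=data_yaml,       # hypothetical path to the LSCD dataset YAML
        epochs=50,            # fine-tuning length reported in this notebook
        batch=4,              # SCD's default mini-batch size
        freeze=10,            # freeze the first 10 layers
        optimizer="AdamW",
        lr0=lr0,              # assumption: initial rate is not stated in the text
        lrf=lr_final / lr0,   # ultralytics expects the final rate as a fraction of lr0
        cos_lr=True,          # enable the cosine schedule
    )


LR_START = 1.25e-3  # assumption: same rate as in the OSCD phase
LR_FINAL = 1e-5
for epoch in (0, 25, 50):
    print(epoch, f"{cosine_lr(epoch, 50, LR_START, LR_FINAL):.6f}")
```

The rate decays slowly at first, fastest around the midpoint, and flattens out as it approaches the final value, which keeps late updates small.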

The model was pretrained for 17 epochs on the OSCD dataset ($m\!A\!P\!_{seg} = 81.3$), after which fine-tuning was resumed for 50 epochs on the LSCD dataset ($m\!A\!P\!_{seg} = 86.5$); both values are for the segmentation mask (all). This under-performs the [SCD paper's](https://www.mdpi.com/1424-8220/22/10/3617) results on the full SCD (combined evaluation and training on both OSCD and LSCD) for SOLOv2 ($m\!A\!P\!_{seg} = 88.9$) and HTC ($m\!A\!P\!_{seg} = 89.6$).
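For a quick sense of the gap, the numbers above can be compared directly. This is only arithmetic on the reported values; the paper's models were trained and evaluated on the combined SCD, so the comparison is indicative rather than like-for-like.

```python
# mAP_seg after LSCD fine-tuning, as reported in this notebook
yolov9c_map = 86.5

# mAP_seg values quoted from the SCD paper (full SCD)
paper_maps = {"SOLOv2": 88.9, "HTC": 89.6}

# Gap in mAP points, rounded to one decimal to hide floating-point noise
gaps = {name: round(score - yolov9c_map, 1) for name, score in paper_maps.items()}
print(gaps)  # {'SOLOv2': 2.4, 'HTC': 3.1}
```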