# Fine Tune YOLOv8 for Object Detection on Custom Dataset from Roboflow

## Contents

1. [Setup Environment](#setup-environment)
2. [Download dataset from Roboflow Universe](#downloading-a-dataset-from-roboflow-universe)
3. [Create and Export `data.yaml`](#create--export-datayaml)
4. [Initiate Model & Hyperparameters](#initiate-model--hyperparameters)
5. [Start Finetuning](#start-the-training-process)
6. [Evaluate Training](#validate-the-model)
7. [Export Model to Other Formats](#export-the-model-in-a-different-format)

## Setup Environment

To fine-tune a YOLOv8 model, we need to install the following packages:

1. [__ultralytics__](https://docs.ultralytics.com/): Provides the backend code for fine-tuning and running inference with YOLOv8 models.
2. [__roboflow__](https://universe.roboflow.com/): Python client for Roboflow Universe, an online repository with a wide selection of computer vision datasets. You can upload your own dataset, annotate your images, perform data augmentation and export the dataset into your training session.
3. __PyYAML__: Used to write the `data.yaml` file, which the dataloader needs in order to feed images and labels to the YOLOv8 model.

### Creating a Virtual Environment

To run this notebook on a local machine or a cloud VM with an __NVIDIA GPU__, it is recommended to create a virtual environment with either `conda` or `venv`. Google Colab users do not need a virtual environment and can simply install the dependencies.

#### `conda`

```bash
$ conda create -n yolo python=3.11
$ conda activate yolo
$ conda install jupyter ipython
```

#### `venv`

```bash
$ python -m venv venv
$ source venv/bin/activate
$ pip install jupyter ipython
```

In [None]:
# install dependencies
! pip install --upgrade --no-cache-dir --quiet ultralytics roboflow pyyaml tensorboard
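
After the install cell finishes, it can help to confirm that the packages are importable from the notebook kernel. The following is a minimal sketch that uses only the standard library; the helper name `missing_packages` is made up for this example. Note that PyYAML is imported under the name `yaml`.

```python
from importlib.util import find_spec

# Import names of the packages installed above (PyYAML is imported as `yaml`).
REQUIRED = ["ultralytics", "roboflow", "yaml", "tensorboard"]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("missing packages:", missing or "none")
```

If the list is not empty, re-run the install cell or check that the notebook is using the kernel of the environment you created.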

## Downloading a Dataset from Roboflow Universe

1. Visit [Roboflow Universe](https://universe.roboflow.com/) and create an account. An account is necessary to obtain an API key, which is required for using the `roboflow` package.

2. Browse the available datasets on Roboflow Universe and select the one you wish to download.

3. Click on the **Download Dataset** button for your chosen dataset. This will redirect you to the dataset's download page.

4. From the available export formats, choose **YOLOv8**.

5. Click on **Show Download Code** and then hit **Continue**. 

6. Copy the code provided on the download page and paste it in the cell below. Remove the first line, `!pip install roboflow`, as the `roboflow` package should already be installed. 

7. Add the `location` parameter to the line `dataset = version.download("yolov8")` and set its value to the string `"datasets"`.

**Note:** _I have found that naming the parent directory `datasets` reduces the chance of minor bugs. It also makes the dataset usable from the `yolo` CLI._

In [None]:
from roboflow import Roboflow
rf = Roboflow(api_key="API-KEY")
project = rf.workspace("large-benchmark-datasets").project("logistics-sz9jr")
version = project.version(2)
dataset = version.download("yolov8", location="datasets")
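
To avoid pasting the API key into the notebook, one option is to read it from an environment variable. This is a minimal sketch, not part of the Roboflow download code; the helper `get_api_key` and the variable name `ROBOFLOW_API_KEY` are choices made for this example.

```python
import os


def get_api_key(var_name="ROBOFLOW_API_KEY"):
    """Read the Roboflow API key from an environment variable.

    The variable name is an assumption for this example; use any name you like.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage in the download cell, assuming the variable is set:
# rf = Roboflow(api_key=get_api_key())
```

This keeps the key out of the saved notebook, which matters if the notebook is shared or committed to a repository.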

## Create & Export `data.yaml`

The `data.yaml` file is the configuration file that tells the training process where to find the data and how to interpret it. It should contain the following keys: `path` (the dataset root), `train`, `val` and `test` (the split folders), `nc` (the number of classes) and `names` (the list of class names). You can create the `data.yaml` file manually, but I prefer using the `pyyaml` library, which converts a Python dictionary into a YAML file.

**Note**: Certain Roboflow datasets don't have a train-valid-test split but only a train-valid or a train-test split. In such scenarios, set the value of the `val` key to the name of the test folder, or set the value of the `test` key to the name of the validation folder.

In [None]:
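# get the list of labels
# NOTE: the order of `labels` must match the class indices used in the label files;
# if the Roboflow export includes its own data.yaml, compare against its `names` entry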
labels = list(project.classes.keys())
nc = len(labels)

# prepare the dictionary
data_yaml = {
    "path": ".",
    "train": "train",
    "val": "valid",
    "test": "test",
    "nc": nc,
    "names": labels
}

# print(data_yaml)

In [None]:
import yaml

with open("data.yaml", "w") as fp:
    yaml.dump(data_yaml, fp, yaml.SafeDumper)
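
The fallback described in the note about missing splits can be automated by checking which split folders exist. This is a minimal sketch under the assumption that the dataset was downloaded to `datasets` with `train`, `valid` and `test` subfolders; the helper `resolve_splits` is made up for this example.

```python
from pathlib import Path


def resolve_splits(root="datasets"):
    """Map the data.yaml split keys to the split folders that actually exist."""
    root = Path(root)
    found = {name for name in ("train", "valid", "test") if (root / name).is_dir()}
    if "train" not in found:
        raise FileNotFoundError(f"no train folder under {root}")
    if not found & {"valid", "test"}:
        raise FileNotFoundError(f"no valid or test folder under {root}")
    # Fall back to the other split when one of valid/test is missing.
    val = "valid" if "valid" in found else "test"
    test = "test" if "test" in found else "valid"
    return {"train": "train", "val": val, "test": test}


# Usage, before writing data.yaml:
# data_yaml.update(resolve_splits("datasets"))
```

Keep in mind that when one folder serves as both `val` and `test`, the test metrics are no longer measured on held-out data.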

## Initiate Model & Hyperparameters

### Model

Ultralytics provides five YOLOv8 object detection models of different sizes. Their accuracy and speed are compared in the [Ultralytics detection docs](https://docs.ultralytics.com/tasks/detect/#models). To train a model from scratch instead of fine-tuning it, change the file extension of the model variant from `.pt` to `.yaml`. The __nano model__ (`yolov8n.pt`) is the fastest but least accurate variant, while the __medium model__ (`yolov8m.pt`) offers a good balance between speed and accuracy. The cell below loads the small model (`yolov8s.pt`).

In [None]:
from ultralytics import YOLO

model = YOLO("yolov8s.pt", task="detect")
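
The naming rule described above (size letter plus `.pt` for pretrained weights or `.yaml` for training from scratch) can be written as a small helper. This is only an illustrative sketch; `model_file` is a made-up name and is not part of the `ultralytics` package.

```python
# The five YOLOv8 detection sizes: nano, small, medium, large, extra-large.
SIZES = ("n", "s", "m", "l", "x")


def model_file(size="s", pretrained=True):
    """Build the file name to pass to YOLO(...).

    `.pt` loads pretrained weights for fine-tuning,
    `.yaml` builds the architecture with fresh weights.
    """
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}, got {size!r}")
    return f"yolov8{size}.{'pt' if pretrained else 'yaml'}"


print(model_file("s"))                    # yolov8s.pt
print(model_file("n", pretrained=False))  # yolov8n.yaml
```

The result can be passed straight to the constructor, for example `YOLO(model_file("m"), task="detect")`.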

### Hyperparameters

The full list of training hyperparameters is available in the [Ultralytics train settings docs](https://docs.ultralytics.com/usage/cfg/#train-settings). Here the hyperparameters are collected in a Python dictionary, which is later unpacked into keyword arguments for `model.train()`.

In [None]:
simple_hyperparameters = dict(
    device = 0,
    data = "data.yaml",
    epochs = 20,
    batch = 16,
    seed = 1337
)

In [None]:
hyperparameters = dict(
    device = 0,
    data = "data.yaml",
    epochs = 20,
    batch = 16,
    optimizer = "AdamW",
    cos_lr = True,
    lr0 = 1e-4,
    warmup_epochs = 5,
    project = "object-detection",
    name = "training-1",
    verbose = True,
    seed = 1337
)
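
To get a feel for what `cos_lr = True` does to the learning rate, here is a sketch of a generic cosine decay from `lr0` down to `lr0 * lrf`. It ignores the warmup epochs, and the final fraction `lrf = 0.01` is an assumption for this example; the exact schedule inside `ultralytics` may differ in its details.

```python
import math


def cosine_lr(epoch, epochs=20, lr0=1e-4, lrf=0.01):
    """Cosine decay from lr0 at epoch 0 to lr0 * lrf at the final epoch."""
    cos_factor = (1 + math.cos(math.pi * epoch / epochs)) / 2  # goes from 1 to 0
    return lr0 * (lrf + (1 - lrf) * cos_factor)


# Print the learning rate at a few points of a 20-epoch run.
for epoch in (0, 5, 10, 15, 20):
    print(epoch, f"{cosine_lr(epoch):.2e}")
```

The rate changes slowly at the start and at the end of training and fastest in the middle, which is the main difference from a linear decay.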

## Start the Training Process

In [None]:
_ = model.train(**hyperparameters)

## Validate the Model

In [None]:
_ = model.val()
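
Instead of discarding the return value, the validation metrics can be kept and summarized. The sketch below assumes the returned object exposes `box.map50` and `box.map` (mAP at IoU 0.5 and mAP averaged over IoU 0.5 to 0.95), as the `ultralytics` detection metrics do to my knowledge. The helper `summarize` is made up for this example, and the stand-in object uses placeholder numbers, not real results.

```python
from types import SimpleNamespace


def summarize(metrics):
    """Collect the headline box metrics into a plain dict."""
    return {
        "mAP50": round(float(metrics.box.map50), 4),
        "mAP50-95": round(float(metrics.box.map), 4),
    }


# Stand-in for the object returned by model.val(); the values are placeholders.
placeholder = SimpleNamespace(box=SimpleNamespace(map50=0.5, map=0.25))
print(summarize(placeholder))
```

With a trained model, the call would be `summarize(model.val())`.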

## Export the model in a different format

All supported export formats are listed in the [Ultralytics export docs](https://docs.ultralytics.com/modes/export/#arguments). The `model` argument must point to the weights saved by your training run.

In [None]:
! yolo export model=object-detection/training-1/weights/best.pt format=tflite
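
The weights path in the export command depends on the `project` and `name` values used for training. The sketch below builds the path and the command string from those two values, assuming the run is saved under `<project>/<name>/weights/best.pt`. That layout can vary between `ultralytics` versions, so check the save directory printed at the end of training. The helper `best_weights` is made up for this example.

```python
from pathlib import Path


def best_weights(project, name):
    """Path to the best checkpoint, assuming a <project>/<name>/weights layout."""
    return Path(project) / name / "weights" / "best.pt"


weights = best_weights("object-detection", "training-1")
print(f"yolo export model={weights.as_posix()} format=tflite")
```

Before exporting, `weights.is_file()` is a quick way to confirm that the checkpoint exists at the expected location.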