# This is a simplified version of How to Train YOLOv8 on a Custom Dataset
You can access the original version [here](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/train-yolov8-object-detection-on-custom-dataset.ipynb).


## The original tutorial uses Google Colab, but I recommend using Kaggle instead for faster training.

If you are running this notebook in ~~Google Colab~~ Kaggle, navigate to ~~`Edit` -> `Notebook settings` -> `Hardware accelerator`~~ `Right of Page` -> `Notebook options` -> `Accelerator` -> `GPU T4x2`. This will enable GPU acceleration. Also don't forget to turn on the `internet connection`.

## Steps in this Tutorial

In this tutorial, we are going to cover:

- Creating a Roboflow Dataset
- Exporting a Roboflow Dataset
- Training YOLOv8 on a Custom Dataset
- Running Inference on Test Images

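## For a quick start, edit the following variables:
- The Roboflow download snippet in the dataset cell
- `epochs`, `batch`, `imgsz`, and `device` in the training command
- `NUMBER_OF_MODEL` in the Result section
- `NUMBER_OF_PREDICTION` in the Inference section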

**Let's begin!**

## Install YOLOv8

⚠️ YOLOv8 is still under heavy development. Breaking changes are being introduced almost weekly. We strive to make our YOLOv8 notebooks work with the latest version of the library. Last tests took place on **27.01.2023** with version **YOLOv8.0.20**.

If you notice that our notebook behaves incorrectly - especially if you experience errors that prevent you from going through the tutorial - don't hesitate! Let us know and open an [issue](https://github.com/roboflow/notebooks/issues) on the Roboflow Notebooks repository.

YOLOv8 can be installed in two ways - from the source and via pip. This is because it is the first iteration of YOLO to have an official package.

In [None]:
# Pip install method (recommended)

!pip install ultralytics==8.0.20

from IPython import display
display.clear_output()

import ultralytics
ultralytics.checks()

from ultralytics import YOLO

from IPython.display import display, Image

## Preparing a custom dataset

Building a custom dataset can be a painful process. It might take dozens or even hundreds of hours to collect images, label them, and export them in the proper format. Fortunately, Roboflow makes this process as straightforward and fast as possible. Let me show you how!

### Step 1: Creating project

Before you start, you need to create a Roboflow [account](https://app.roboflow.com/login). Once you do that, you can create a new project in the Roboflow [dashboard](https://app.roboflow.com/). Make sure to choose the right project type, which in our case is Object Detection.

<div align="center">
  <img
    width="640"
    src="https://media.roboflow.com/preparing-custom-dataset-example/creating-project.gif?ik-sdk-version=javascript-1.4.3&updatedAt=1672929799852"
  >
</div>

### Step 2: Uploading images

Next, add the data to your newly created project. You can do it via API or through our [web interface](https://docs.roboflow.com/adding-data/object-detection).

If you drag and drop a directory with a dataset in a supported format, the Roboflow dashboard will automatically read the images and annotations together.

<div align="center">
  <img
    width="640"
    src="https://media.roboflow.com/preparing-custom-dataset-example/uploading-images.gif?ik-sdk-version=javascript-1.4.3&updatedAt=1672929808290"
  >
</div>

### Step 3: Labeling

If you only have images, you can label them in [Roboflow Annotate](https://docs.roboflow.com/annotate).

<div align="center">
  <img
    width="640"
    src="https://user-images.githubusercontent.com/26109316/210901980-04861efd-dfc0-4a01-9373-13a36b5e1df4.gif"
  >
</div>

### Step 4: Generate new dataset version

Now that we have our images and annotations added, we can generate a dataset version. When generating a version, you may elect to add preprocessing and augmentations. This step is completely optional, but it can significantly improve the robustness of your model.

<div align="center">
  <img
    width="640"
    src="https://media.roboflow.com/preparing-custom-dataset-example/generate-new-version.gif?ik-sdk-version=javascript-1.4.3&updatedAt=1673003597834"
  >
</div>

### Step 5: Exporting dataset

Once the dataset version is generated, we have a hosted dataset we can load directly into our notebook for easy training. Click `Export` and select the `YOLO v8 PyTorch` dataset format.

<div align="center">
  <img
    width="640"
    src="https://media.roboflow.com/preparing-custom-dataset-example/export.gif?ik-sdk-version=javascript-1.4.3&updatedAt=1672943313709"
  >
</div>

### Step 6: Copying dataset link

Finally, we need to copy the dataset link. We will use it in the next step to load the dataset into our notebook.


In [None]:
import os
HOME = os.getcwd()
print(HOME)

# -p avoids an error when the folder already exists (for example, on a re-run)
!mkdir -p {HOME}/datasets
%cd {HOME}/datasets

### Paste your Roboflow Download Link Here 👇
#TODO: paste your Roboflow download link here
# !pip install roboflow

# from roboflow import Roboflow
# rf = Roboflow(api_key="ABCDEFGHIJKLMNOPQRSTUVWXYZ")
# project = rf.workspace("pusing-rrwop").project("pusing-safety")
# dataset = project.version(1).download("yolov8")
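
Once the download snippet has run, a quick check of the dataset folder can catch a wrong export format before training starts. The sketch below is only an illustration: `missing_dataset_paths` is a hypothetical helper, `data.yaml` and `test/images` are the paths used later in this notebook, and `train/images` and `valid/images` are assumed from the usual YOLOv8 export layout.

```python
from pathlib import Path

# data.yaml and test/images are used by later cells in this notebook.
# train/images and valid/images are an assumption about the export layout.
EXPECTED = ["data.yaml", "test/images", "train/images", "valid/images"]

def missing_dataset_paths(dataset_dir, expected=EXPECTED):
    """Return the expected entries that do not exist under dataset_dir."""
    root = Path(dataset_dir)
    return [rel for rel in expected if not (root / rel).exists()]

# Example: after the download cell has run, dataset.location is the folder to check.
# print(missing_dataset_paths(dataset.location))
```

An empty list means every expected path was found.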

## Before we start training
Let's check whether a GPU is available by running the following code. Remember the index of the GPU that you are going to use.

In [None]:
!nvidia-smi
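
If you prefer to read the GPU indices from Python instead of the `nvidia-smi` table, something like the following should work. It is a minimal sketch: `parse_gpu_list` and `list_gpus` are hypothetical helper names, and it assumes `nvidia-smi` is on the path (it returns an empty list otherwise).

```python
import shutil
import subprocess

def parse_gpu_list(csv_text):
    """Turn 'index, name, memory' CSV lines into (index, name, memory) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, memory = [field.strip() for field in line.split(",", 2)]
        gpus.append((int(index), name, memory))
    return gpus

def list_gpus():
    """Return the GPUs reported by nvidia-smi, or an empty list if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_list(result.stdout) if result.returncode == 0 else []

# Each index printed here is a value you can pass to the device= argument.
for index, name, memory in list_gpus():
    print(f"device={index}: {name} ({memory})")
```
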

## Custom Training
Tips for training:
1. To train on a GPU, pass the `device=` argument. Use the GPU index reported by `nvidia-smi` in the previous cell, for example `device=0` for a single GPU or `device=0,1` for two GPUs.

2. For faster training, use `cache=True`. It loads your whole dataset into memory, so watch your system RAM usage.

3. GPU RAM usage is affected mainly by the batch size and the image size, and with `cache=True` the number of images in the dataset adds to system RAM usage. The higher each number, the higher the memory usage, so choose wisely. (I used 1,100 images with `batch=16` and `imgsz=1080`, and it took about 15 GB of the 16 GB of GPU RAM given by Kaggle.)

4. `epochs` is the number of full passes over the training set. More epochs usually give a better result up to a point, but training takes longer and too many epochs can cause overfitting.

5. `batch` is the number of samples processed at once. A larger batch usually trains faster but uses more GPU RAM, and a batch that is too large causes an out-of-memory error.

6. `imgsz` is the image size in pixels. The longest side of each image is resized to this value. It should be a multiple of 32 (the model's maximum stride), not of the batch size; other values such as 1080 are adjusted automatically with a warning.
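
To make the numbers in these tips concrete, here is a small sketch of the arithmetic. The helper names and the `datasets/my-dataset/data.yaml` path are hypothetical. The rounding assumes that `imgsz` must be a multiple of the maximum stride of 32 and that Ultralytics rounds other values up, which is my understanding of its behavior rather than something this notebook verifies.

```python
import math

def round_up_to_stride(imgsz, stride=32):
    """Round imgsz up to the nearest multiple of stride."""
    return math.ceil(imgsz / stride) * stride

def iterations_per_epoch(num_images, batch):
    """Number of batches needed to pass over the whole training set once."""
    return math.ceil(num_images / batch)

def build_train_command(data_yaml, epochs, batch, imgsz, device,
                        model="yolov8s.pt", cache=True):
    """Assemble the yolo CLI call used in this notebook from plain Python values."""
    return (
        f"yolo task=detect mode=train model={model} data={data_yaml} "
        f"epochs={epochs} batch={batch} imgsz={round_up_to_stride(imgsz)} "
        f"plots=True device={device} cache={cache}"
    )

print(round_up_to_stride(1080))        # 1088
print(iterations_per_epoch(1100, 16))  # 69
print(build_train_command("datasets/my-dataset/data.yaml",
                          epochs=100, batch=16, imgsz=1080, device="0,1"))
```

With 1,100 training images and `batch=16`, each epoch is 69 iterations, so doubling the batch size roughly halves the iterations per epoch at the cost of more GPU RAM.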


In [None]:
%cd {HOME}

#Train
#TODO: Modify epochs, batch, imgsz, and device as desired
!yolo task=detect mode=train model=yolov8s.pt data={dataset.location}/data.yaml epochs=500 batch=8 imgsz=1080 plots=True device=0,1 cache=True

## Result
Before we look at the results, let's point the code at the model we want. This is needed if you set your notebook to use Persistent Files, because earlier runs stay in the output folder. You can choose the model by looking in `/kaggle/working/runs/detect/`. For the latest model, use the highest number.

<img src="./assets/choosing-model.png"></img>
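
As an alternative to reading the folder by hand, the run number can be looked up after training has finished. This is a sketch with a hypothetical helper, `latest_run_suffix`, and it assumes the folder naming seen in this notebook: the first run is `train`, later runs are `train2`, `train3`, and so on.

```python
import re
from pathlib import Path

def latest_run_suffix(detect_dir, prefix="train"):
    """Return the suffix ("" or "2", "3", ...) of the newest run folder, or None."""
    root = Path(detect_dir)
    if not root.is_dir():
        return None
    pattern = re.compile(rf"^{re.escape(prefix)}(\d*)$")
    best = None
    for entry in root.iterdir():
        match = pattern.match(entry.name)
        if entry.is_dir() and match:
            number = int(match.group(1) or 1)  # plain "train" counts as run 1
            if best is None or number > best[0]:
                best = (number, match.group(1))
    return None if best is None else best[1]

# Example (HOME comes from an earlier cell):
# NUMBER_OF_MODEL = latest_run_suffix(f"{HOME}/runs/detect")
```

Passing `prefix="predict"` gives the same lookup for prediction folders.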

In [None]:
#TODO: Set this to the number of the training run you want to use.
#If you use Run All and want the model from the new training run, use the current highest number + 1.
#Use an empty string for the first model.
NUMBER_OF_MODEL = "8"

In [None]:
%cd {HOME}
Image(filename=f'{HOME}/runs/detect/train{NUMBER_OF_MODEL}/confusion_matrix.png', width=1080)

In [None]:
%cd {HOME}
Image(filename=f'{HOME}/runs/detect/train{NUMBER_OF_MODEL}/results.png', width=1080)

In [None]:
%cd {HOME}
Image(filename=f'{HOME}/runs/detect/train{NUMBER_OF_MODEL}/val_batch0_pred.jpg', width=1080)

## Inference with Custom Model
Before we start inference, let's adjust a variable so that the notebook displays the latest inference results. You can find the value by looking at `/kaggle/working/runs/detect/predict<number of prediction>`. For the latest prediction, use the **highest number + 1**.

In [None]:
#TODO: Set this to the number of the prediction run you want to display.
#If you use Run All and want the newest prediction, use the current highest number + 1.
#Use an empty string for the first prediction.
NUMBER_OF_PREDICTION = "5"

In [None]:
%cd {HOME}
!yolo task=detect mode=predict model={HOME}/runs/detect/train{NUMBER_OF_MODEL}/weights/best.pt conf=0.25 source={dataset.location}/test/images save=True

**NOTE:** Let's take a look at a few results.

In [None]:
import glob
from IPython.display import Image, display

for image_path in glob.glob(f'{HOME}/runs/detect/predict{NUMBER_OF_PREDICTION}/*.jpg'):
    display(Image(filename=image_path, width=600))
    print("\n")

## Save the model
To save the model, locate it in `/kaggle/working/runs/detect/train{NUMBER_OF_MODEL}/weights/` and download the files.

<img src="./assets/download-model.png"></img>
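
If you would rather download a single file, the weights folder can be zipped first. The snippet below is a sketch: `zip_weights` is a hypothetical helper, and it assumes the weights live in a `weights` folder inside the training run folder, as in the inference command above.

```python
import shutil
from pathlib import Path

def zip_weights(run_dir, archive_name="weights"):
    """Zip run_dir/weights into <archive_name>.zip and return the archive path."""
    weights_dir = Path(run_dir) / "weights"
    if not weights_dir.is_dir():
        raise FileNotFoundError(f"No weights folder found in {run_dir}")
    return shutil.make_archive(archive_name, "zip", root_dir=str(weights_dir))

# Example (HOME and NUMBER_OF_MODEL come from the earlier cells):
# zip_weights(f"{HOME}/runs/detect/train{NUMBER_OF_MODEL}")
```

The archive is written to the current working directory unless `archive_name` includes a path.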