# YOLOv8 Model Training on Google Colab

This notebook trains a YOLOv8 model on Google Colab using a custom dataset stored on Google Drive.
It covers:
1. Mounting Google Drive.
2. Installing the `ultralytics` library.
3. Setting up paths for the dataset and project output.
4. Training the YOLOv8 model.
5. Locating the trained model and results after training.

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Install the ultralytics library
!pip install ultralytics

## Setup Paths

Define the paths to your dataset's `data.yaml` file and the directory where you want to save your training results (e.g., model weights, logs).
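
For reference, a `data.yaml` in the Ultralytics detection format typically declares a dataset root, train and validation image folders relative to that root, and a class-name mapping. The sketch below writes such a file using only the standard library. The folder layout (`images/train`, `images/val`), the helper name `write_data_yaml`, and the class names are assumptions for illustration, not taken from this notebook's dataset.

```python
import os
import tempfile


def write_data_yaml(dataset_root, class_names, yaml_name="data.yaml"):
    """Write a minimal Ultralytics-style data.yaml and return its path.

    Hypothetical helper: assumes images live in images/train and images/val
    under dataset_root. Adjust the sub-paths to match your own layout.
    """
    lines = [
        f"path: {dataset_root}",   # dataset root directory
        "train: images/train",     # training images, relative to 'path'
        "val: images/val",         # validation images, relative to 'path'
        "names:",                  # class index -> class name
    ]
    lines += [f"  {idx}: {name}" for idx, name in enumerate(class_names)]

    yaml_path = os.path.join(dataset_root, yaml_name)
    with open(yaml_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return yaml_path


# Demo in a temporary directory so nothing on Drive is touched.
demo_root = tempfile.mkdtemp()
demo_yaml = write_data_yaml(demo_root, ["ball", "player"])  # assumed example class names
with open(demo_yaml, encoding="utf-8") as f:
    demo_text = f.read()
print(demo_text)
```

If your dataset already ships with a `data.yaml`, there is no need to generate one; comparing it against this shape can still help when a path inside it is wrong.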

In [None]:
import os

# --- IMPORTANT: SET THESE PATHS ACCORDING TO YOUR GOOGLE DRIVE STRUCTURE ---

# Path to your dataset's data.yaml file on Google Drive
# Example: '/content/drive/MyDrive/datasets/tennis_data/data.yaml'
dataset_yaml_path = '/content/drive/MyDrive/your_dataset_folder/data.yaml' # ★★★ MODIFY THIS ★★★

# Directory on Google Drive to save training outputs (weights, logs, etc.)
# Example: '/content/drive/MyDrive/YOLOv8_Tennis_Project'
project_output_dir = '/content/drive/MyDrive/YOLOv8_Training_Outputs' # ★★★ MODIFY THIS ★★★

# Name for this specific training run (a subdirectory will be created under project_output_dir)
experiment_name = 'tennis_detection_run1'

# --- END OF PATH SETUP ---

# Create the project output directory if it doesn't exist
os.makedirs(project_output_dir, exist_ok=True)

print(f"Dataset YAML path: {dataset_yaml_path}")
print(f"Project output directory: {project_output_dir}")
print(f"Experiment name: {experiment_name}")

# Verify that the dataset YAML file exists (this also fails if Drive is not mounted)
if not os.path.isfile(dataset_yaml_path):
    print(f"ERROR: Dataset YAML file not found at {dataset_yaml_path}")
    print("Please double-check the path and confirm that Google Drive is mounted.")
else:
    print("Dataset YAML file found.")

## Initialize and Train the YOLOv8 Model

In [None]:
from ultralytics import YOLO

# Load a pre-trained YOLOv8 model
# Available sizes: 'yolov8n.pt', 'yolov8s.pt', 'yolov8m.pt', 'yolov8l.pt', 'yolov8x.pt'
# To start, 'yolov8n.pt' (Nano) or 'yolov8s.pt' (Small) are good choices.
model_name = 'yolov8s.pt'
model = YOLO(model_name)

# Define training hyperparameters
num_epochs = 100  # Number of training epochs (e.g., 50, 100, 200)
batch_size = 16   # Batch size (adjust based on GPU memory, e.g., 8, 16, 32)
img_size = 640    # Input image size (e.g., 640, 1280)

print(f"Starting training with model: {model_name}")
print(f"Epochs: {num_epochs}, Batch Size: {batch_size}, Image Size: {img_size}")

# Start training
try:
    results = model.train(
        data=dataset_yaml_path,
        epochs=num_epochs,
        imgsz=img_size,
        batch=batch_size,
        project=project_output_dir, # Main directory for saving all runs
        name=experiment_name,       # Subdirectory for this specific run
        exist_ok=True,              # Reuse this run directory (overwriting its contents) instead of creating a new numbered one
        # device=0,                 # Force a device (0 for the first GPU, 'cpu' for CPU)
                                    # If omitted, a GPU is used when available; in Colab, select a GPU runtime first.
        # workers=8,                # Number of dataloader workers (adjust based on Colab's CPU)
        # patience=30,              # Early stopping patience (e.g., stop if no improvement after 30 epochs)
        # lr0=0.01,                 # Initial learning rate
        # optimizer='AdamW',        # Optimizer (e.g., 'SGD', 'Adam', 'AdamW')
    )
    print("\nTraining completed successfully!")
    print(f"Results, logs, and model weights saved in: {os.path.join(project_output_dir, experiment_name)}")
except Exception as e:
    print(f"\nAn error occurred during training: {e}")
    raise  # Re-raise so the full traceback is shown and "Run all" stops here

## Training Results and Saved Models

After training, the results, including metrics, confusion matrices, and model weights, will be saved in the directory you specified:
`{project_output_dir}/{experiment_name}`

Key files to look for:
- **Weights:** Inside the `weights` subdirectory (e.g., `best.pt`, `last.pt`).
  - `best.pt`: The weights from the epoch with the best validation fitness score (driven mainly by mAP50-95). This is typically the model you'll use for inference.
  - `last.pt`: The model weights from the very last epoch of training.
- **Results CSV:** `results.csv` contains a summary of metrics per epoch.
- **Plots:** Various plots like confusion matrix, P-R curve, etc. (e.g., `confusion_matrix.png`, `PR_curve.png`).
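
To pick out the strongest epoch from `results.csv` without opening the plots, the file can be read with the standard `csv` module. This is a sketch: the column name `metrics/mAP50-95(B)` is an assumption based on what Ultralytics detection runs commonly write, so check the header of your own file. Some versions pad header names with spaces, which the code strips. `best_epoch` is a hypothetical helper name, and the demo rows are synthetic values used only to exercise the function.

```python
import csv
import os
import tempfile


def best_epoch(results_csv, metric="metrics/mAP50-95(B)"):
    """Return (epoch, value) for the row with the highest `metric`."""
    best = None
    with open(results_csv, newline="", encoding="utf-8") as f:
        for raw in csv.DictReader(f):
            # Strip whitespace padding from header names and cell values.
            row = {k.strip(): v.strip() for k, v in raw.items()
                   if k and isinstance(v, str)}
            value = float(row[metric])
            if best is None or value > best[1]:
                best = (int(float(row["epoch"])), value)
    if best is None:
        raise ValueError(f"No data rows in {results_csv}")
    return best


# Synthetic demo file (made-up numbers, not real training results).
demo_csv = os.path.join(tempfile.mkdtemp(), "results.csv")
with open(demo_csv, "w", encoding="utf-8") as f:
    f.write("epoch,   metrics/mAP50-95(B)\n")
    f.write("1, 0.21\n2, 0.35\n3, 0.33\n")
demo_best = best_epoch(demo_csv)
print(demo_best)
```

In this notebook the real file would be at `os.path.join(project_output_dir, experiment_name, 'results.csv')`.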

You can download these files from your Google Drive.
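
Because a run directory holds many small files, it can be easier to bundle it into a single archive before downloading. Below is a minimal sketch using `shutil.make_archive`. `zip_run_dir` is a hypothetical helper, and the demo uses a temporary directory with a placeholder file rather than the real `project_output_dir`.

```python
import os
import shutil
import tempfile
import zipfile


def zip_run_dir(run_dir, archive_base=None):
    """Zip run_dir and return the path of the resulting .zip file.

    By default the archive is written next to run_dir, e.g.
    .../tennis_detection_run1 -> .../tennis_detection_run1.zip
    """
    run_dir = os.path.abspath(run_dir)
    if not os.path.isdir(run_dir):
        raise FileNotFoundError(f"Run directory not found: {run_dir}")
    if archive_base is None:
        archive_base = run_dir  # make_archive appends the '.zip' suffix
    return shutil.make_archive(archive_base, "zip", root_dir=run_dir)


# Demo with a throwaway directory that mimics the run layout.
demo_run = os.path.join(tempfile.mkdtemp(), "demo_run")
os.makedirs(os.path.join(demo_run, "weights"))
with open(os.path.join(demo_run, "weights", "best.pt"), "wb") as f:
    f.write(b"placeholder")  # stand-in bytes, not a real model
demo_zip = zip_run_dir(demo_run)
with zipfile.ZipFile(demo_zip) as zf:
    demo_names = zf.namelist()
print(demo_zip, demo_names)
```

In Colab, the returned path can then be passed to `files.download` from `google.colab`, or the archive can simply be downloaded from the Drive web interface.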

In [None]:
# Example: List files in the experiment's weights directory
weights_dir = os.path.join(project_output_dir, experiment_name, 'weights')
if os.path.exists(weights_dir):
    print(f"\nFiles in the weights directory ({weights_dir}):")
    for f_name in os.listdir(weights_dir):
        print(f"- {f_name}")
else:
    print(f"\nWeights directory not found: {weights_dir}")
    print("This might be because training hasn't completed or an error occurred.")

# Path to the best trained model (useful for later inference)
best_model_path = os.path.join(weights_dir, 'best.pt')
print(f"\nPath to the best model: {best_model_path}")
if os.path.exists(best_model_path):
    print("Best model file (best.pt) found.")
else:
    print("Best model file (best.pt) not found. Training might have failed or not produced it.")
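
To use the trained weights afterwards, the checkpoint can be loaded back into `YOLO` and called on an image. The following is a sketch, assuming `ultralytics` is installed as in the install cell above. `load_trained_model` is a hypothetical wrapper, and the image path in the commented usage is a placeholder you would replace with your own.

```python
import os


def load_trained_model(weights_path):
    """Load a trained YOLO checkpoint, failing early if the file is missing."""
    if not os.path.isfile(weights_path):
        raise FileNotFoundError(f"Weights file not found: {weights_path}")
    # Imported lazily so the path check runs even before ultralytics is installed.
    from ultralytics import YOLO
    return YOLO(weights_path)


# Usage in this notebook (uncomment after training has produced best.pt):
# model = load_trained_model(best_model_path)
# results = model.predict(source="/content/drive/MyDrive/some_image.jpg", conf=0.25)  # placeholder path
# print(results[0].boxes.xyxy)  # predicted boxes as (x1, y1, x2, y2)
```

Checking the path before constructing the model gives a clear error when training did not finish, instead of a less direct failure inside the library.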