# Combined Notebook: Animal Detection Data Processing and YOLOv10 Training

This notebook combines the data processing and YOLOv10 training steps.

**Workflow:**
1.  **Setup:** Import libraries, define paths, set up Kaggle download (optional).
2.  **Dataset Download (Optional):** Download dataset from Kaggle.
3.  **Data Loading & Filtering:** Load annotations, filter for relevant animals.
4.  **Data Augmentation:** Define and apply augmentations to the training and validation sets.
5.  **YOLO Preparation:** Convert annotations to YOLO format and create `data.yaml`.
6.  **YOLOv10 Training:** Train the YOLOv10 model using the prepared dataset.
7.  **Validation:** Evaluate the trained model on the validation set.
8.  **Inference:** Run detection on a sample video.


The dataset is available on Kaggle:
https://www.kaggle.com/datasets/goelyash/animal-dataset

## 1. Setup: Imports and Configuration

In [1]:
# Install necessary libraries
!pip install torch torchvision torchaudio ultralytics pandas opencv-python matplotlib pyyaml albumentations tqdm






In [2]:
import os
import cv2
import pandas as pd
import albumentations as A
import numpy as np
import matplotlib.pyplot as plt
import yaml
import shutil
import torch
from ultralytics import YOLO
import warnings
#import kaggle

# Suppress specific warnings from albumentations if needed (optional)
warnings.filterwarnings("ignore", category=UserWarning, module='albumentations')

# --- Configuration ---
# !!! IMPORTANT: Update this path to your main project directory !!!
base_dir = r'C:\Users\Tomer\Documents\DATASET\Notebooks' # Use raw string for Windows paths

# Define subdirectories relative to base_dir
annotation_dir = os.path.join(base_dir, "Annotation")
train_img_dir_orig = os.path.join(base_dir, "Train")  # Original Train images
val_img_dir_orig = os.path.join(base_dir, "Val")    # Original Validation images
aug_train_dir = os.path.join(base_dir, "Augmented_Train") # Augmented Train images
aug_val_dir = os.path.join(base_dir, "Augmented_Val")   # Augmented Val images
yolo_dir = os.path.join(base_dir, "YOLO")            # YOLOv10 specific files
#kaggle_dataset_path = 'your-kaggle-username/your-dataset-name' # !!! IMPORTANT: Update this !!!
#kaggle_download_dir = base_dir # Download directly into base_dir or a subdirectory

# Create directories if they don't exist
os.makedirs(annotation_dir, exist_ok=True)
os.makedirs(aug_train_dir, exist_ok=True)
os.makedirs(aug_val_dir, exist_ok=True)
os.makedirs(yolo_dir, exist_ok=True)

# Define original annotation file paths
train_ann_file_orig = os.path.join(annotation_dir, "train_annotation.csv")
val_ann_file_orig = os.path.join(annotation_dir, "val_annotation.csv")

# Define augmented annotation file paths
aug_train_ann_file = os.path.join(annotation_dir, "aug_train_annotation.csv")
aug_val_ann_file = os.path.join(annotation_dir, "aug_val_annotation.csv")

# List of animals (classes) to monitor/detect
animals_to_monitor = [
    'African crocodile',
    'African elephant',
    'American alligator',
    'American black bear',
    'Arctic fox',
    'baboon',
    'badger',
    'bear',
    'beaver',
    'bison',
    'brown bear',
    'capuchin',
    'cheetah',
    'cougar',
    'coyote',
    'crocodile',
    'dingo',
    'grey fox',
    'hare',
    'hog',
    'hyena',
    'ice bear',
    'jaguar',
    'Komodo dragon',
    'leopard',
    'lion',
    'lynx',
    'macaque',
    'marmot',
    'mink',
    'Old World buffalo',
    'otter',
    'ox',
    'porcupine',
    'python',
    'rabbit',
    'ram',
    'red fox',
    'red wolf',
    'skunk',
    'sloth bear',
    'snow leopard',
    'squirrel',
    'tiger',
    'timber wolf',
    'warthog',
    'water buffalo',
    'weasel',
    'white wolf',
    'wild boar',
    'wild dog',
    'wildcat',
    'wolf',
    'wood rabbit'
]


# Create mapping from animal name to class index
class_map = {animal: i for i, animal in enumerate(animals_to_monitor)}
print(f"Number of classes: {len(class_map)}")
#print("Class Map:", class_map) # Uncomment to view map

# --- Training Configuration ---
YOLO_MODEL_SIZE = 's' # e.g., 'n', 's', 'm', 'l', 'x'
PRETRAINED_WEIGHTS_FILE = f'yolov10{YOLO_MODEL_SIZE}.pt'
PRETRAINED_WEIGHTS_PATH = os.path.join(yolo_dir, PRETRAINED_WEIGHTS_FILE)
EPOCHS = 100 # Adjust as needed
DEVICE = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {DEVICE}")



Number of classes: 54
Using device: cpu


## 2. Dataset Download (Optional)
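
The notebook leaves this step without code. Below is a minimal sketch of the optional download, assuming the official `kaggle` package is installed and API credentials are configured (for example in `~/.kaggle/kaggle.json`). The helper name `download_dataset` and its `api` parameter are hypothetical; the dataset slug is taken from the Kaggle URL given in the introduction.

```python
import os

# Slug taken from https://www.kaggle.com/datasets/goelyash/animal-dataset
KAGGLE_DATASET = "goelyash/animal-dataset"


def download_dataset(dataset, dest_dir, api=None):
    """Download and unzip a Kaggle dataset into dest_dir (hypothetical helper)."""
    if api is None:
        # Imported lazily: importing `kaggle` typically authenticates right away
        # and fails when no credentials are configured.
        import kaggle
        api = kaggle.api
    os.makedirs(dest_dir, exist_ok=True)
    # dataset_download_files fetches the archive and, with unzip=True, extracts it
    api.dataset_download_files(dataset, path=dest_dir, unzip=True)
    return dest_dir
```

Calling `download_dataset(KAGGLE_DATASET, base_dir)` would place the files under the project directory; whether the extracted layout matches the `Train`, `Val`, and `Annotation` folders used here should be checked after download.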

## 3. Data Loading & Filtering

Load the original annotation CSV files and filter them to include only images containing the `animals_to_monitor`.
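
The filtering step can be sketched with `pandas` as below. The column names (`filename`, `class`, `xmin`, `ymin`, `xmax`, `ymax`) are assumptions for illustration and may not match the dataset's actual CSV schema; the toy rows stand in for `train_annotation.csv`.

```python
import pandas as pd

# Shortened stand-in for the notebook's animals_to_monitor list
animals_to_monitor = ["wolf", "lion", "tiger"]

# Toy annotations; the column names are assumed, not confirmed by the dataset
annotations = pd.DataFrame({
    "filename": ["a.jpg", "a.jpg", "b.jpg", "c.jpg"],
    "class": ["wolf", "zebra", "lion", "zebra"],
    "xmin": [10, 50, 5, 0],
    "ymin": [20, 60, 5, 0],
    "xmax": [110, 90, 200, 40],
    "ymax": [120, 95, 180, 30],
})

# Keep only the bounding boxes whose label is a monitored animal
mask = annotations["class"].isin(animals_to_monitor)
filtered = annotations[mask].reset_index(drop=True)

# Images that still have at least one box after filtering
kept_images = sorted(filtered["filename"].unique())
print(f"Kept {len(filtered)} of {len(annotations)} boxes across {len(kept_images)} images")
```

Filtering on rows rather than on whole images drops unmonitored boxes from mixed images such as `a.jpg`, so every remaining label has an entry in `class_map`.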

## 4. Data Augmentation

Define augmentation pipelines using `albumentations` and apply them to the filtered training and validation datasets.

### Visualize an Example Augmented Image (Optional)

## 5. YOLO Preparation

1.  **Convert CSV Annotations to YOLO Format:** Create a `.txt` file for each image with normalized bounding box coordinates (`class_id x_center y_center width height`).
2.  **Organize Files:** Copy augmented images to the YOLO directory structure (`YOLO/train/images`, `YOLO/val/images`).
3.  **Create `data.yaml`:** Generate the configuration file required by YOLO for training.
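
The conversion in step 1 can be sketched as follows, assuming the CSV stores each box as pixel-space corners (`xmin`, `ymin`, `xmax`, `ymax`) and that the image width and height are known. The function name `to_yolo_line` is hypothetical.

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space corner box to a normalized YOLO label line."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 100x50 px box with its top-left corner at (50, 100) in a 200x400 image
line = to_yolo_line(class_id=3, xmin=50, ymin=100, xmax=150, ymax=150, img_w=200, img_h=400)
print(line)  # 3 0.500000 0.312500 0.500000 0.125000
```

Each image gets one `.txt` file with one such line per box, and the file must share the image's base name so YOLO can pair `images/foo.jpg` with `labels/foo.txt`.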

In [None]:
# Define YOLO directory structure paths
yolo_train_images_dir = os.path.join(yolo_dir, "train", "images")
yolo_train_labels_dir = os.path.join(yolo_dir, "train", "labels")
yolo_val_images_dir = os.path.join(yolo_dir, "val", "images")
yolo_val_labels_dir = os.path.join(yolo_dir, "val", "labels")
data_yaml_path = os.path.join(yolo_dir, "data.yaml")

# Create directories if they don't exist
for d in [yolo_train_images_dir, yolo_train_labels_dir, yolo_val_images_dir, yolo_val_labels_dir]:
    os.makedirs(d, exist_ok=True)

print("YOLO directory structure created.")
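
The cell above only creates the folders. A minimal sketch of writing `data.yaml` in the layout ultralytics expects (`path`, `train`, `val`, `names`) is shown below. The names `demo_root` and `demo_classes` are stand-ins for `yolo_dir` and `animals_to_monitor`, used so the sketch does not overwrite the notebook's variables.

```python
import os
import tempfile

import yaml

# Stand-ins for the notebook's yolo_dir and animals_to_monitor
demo_root = tempfile.mkdtemp()
demo_classes = ["African crocodile", "African elephant", "wolf"]

data_config = {
    "path": demo_root,        # dataset root directory
    "train": "train/images",  # training images, relative to path
    "val": "val/images",      # validation images, relative to path
    # Index order must match the class ids written into the label files
    "names": {i: name for i, name in enumerate(demo_classes)},
}

demo_yaml_path = os.path.join(demo_root, "data.yaml")
with open(demo_yaml_path, "w", encoding="utf-8") as f:
    yaml.safe_dump(data_config, f, sort_keys=False)

print(f"Wrote {demo_yaml_path}")
```

Building `names` from the same list that produced `class_map` keeps the class indices in the label files and in `data.yaml` consistent.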

## 6. YOLOv10 Training

Load a pre-trained YOLOv10 model and fine-tune it on the augmented dataset.

In [None]:
# Optional: Check GPU availability again
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Device name: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA not available, training will use CPU.")

In [None]:
# Download pre-trained weights if they don't exist (using YOLO class)
if not os.path.exists(PRETRAINED_WEIGHTS_PATH):
    print(f"Pre-trained weights '{PRETRAINED_WEIGHTS_FILE}' not found. Attempting to download...")
    try:
        # This will download the weights if not found locally in standard paths
        temp_model = YOLO(PRETRAINED_WEIGHTS_FILE) 
        # Check if download worked (ultralytics might place it elsewhere, find and move if necessary)
        # Usually downloads to a cache dir, let's assume YOLO handles finding it
        # If it needs to be explicitly in yolo_dir, add logic to find and move it.
        print(f"Downloaded {PRETRAINED_WEIGHTS_FILE} successfully (likely to cache). Training will use this.")
        # If you need it strictly in PRETRAINED_WEIGHTS_PATH, uncomment and adapt:
        # downloaded_path = ... # Find where ultralytics saved it
        # shutil.move(downloaded_path, PRETRAINED_WEIGHTS_PATH)
    except Exception as e:
        print(f"Error downloading pre-trained weights: {e}")
        print("Please download the weights manually and place them at:")
        print(PRETRAINED_WEIGHTS_PATH)
        # Optionally raise error to stop execution
        # raise
        PRETRAINED_WEIGHTS_PATH = PRETRAINED_WEIGHTS_FILE # Fallback to just name if download failed
else:
    print(f"Using existing pre-trained weights: {PRETRAINED_WEIGHTS_PATH}")

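# Load the YOLOv10 model for training
# Prefer the local copy in yolo_dir; otherwise pass the bare filename (e.g., 'yolov10s.pt')
# so that ultralytics resolves or downloads it
weights_to_load = PRETRAINED_WEIGHTS_PATH if os.path.exists(PRETRAINED_WEIGHTS_PATH) else PRETRAINED_WEIGHTS_FILE
model = YOLO(weights_to_load)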

# Check if data.yaml exists
if not os.path.exists(data_yaml_path):
    print(f"Error: data.yaml not found at {data_yaml_path}. Cannot start training.")
    # raise FileNotFoundError(f"data.yaml not found at {data_yaml_path}")
else:
    print(f"Starting YOLOv10 training for {EPOCHS} epochs...")
    print(f"Using data configuration: {data_yaml_path}")
    print(f"Using device: {DEVICE}")
    
    try:
        # Start training
        # Common arguments: data, epochs, imgsz, device, project, name, batch, patience, workers
        results = model.train(
            data=data_yaml_path,
            epochs=EPOCHS,
            imgsz=640, # Default, adjust if needed
            device=DEVICE,
            project=yolo_dir, # Save results within YOLO directory
            name=f'yolov10_{YOLO_MODEL_SIZE}_custom_train', # Experiment name
            exist_ok=True, # Allow overwriting previous runs with the same name
            # batch=16, # Adjust based on GPU memory
            # patience=20, # Early stopping patience
            # workers=8 # Adjust based on CPU cores and system
        )
        print("\nTraining completed!")
        print(f"Model weights and results saved in: {results.save_dir}")
        # The best weights are usually saved as 'best.pt' in the experiment directory
        best_model_path = os.path.join(results.save_dir, 'weights', 'best.pt') 
        print(f"Best model saved at: {best_model_path}")
        
    except Exception as e:
        print(f"An error occurred during training: {e}")
        # You might want to investigate the error based on the traceback

# Define path to the best trained model for later use
# This assumes training ran successfully and results object is available
try:
    trained_model_path = os.path.join(results.save_dir, 'weights', 'best.pt')
    if not os.path.exists(trained_model_path):
        print(f"Warning: Best model path {trained_model_path} not found after training. Validation/Inference might fail.")
        # Fallback or define manually if needed
        trained_model_path = os.path.join(yolo_dir, f'yolov10_{YOLO_MODEL_SIZE}_custom_train', 'weights', 'best.pt')
except NameError:
    print("Warning: Training results object not found. Define 'trained_model_path' manually if needed for validation/inference.")
    # Define manually if training was skipped or failed
    trained_model_path = os.path.join(yolo_dir, f'yolov10_{YOLO_MODEL_SIZE}_custom_train', 'weights', 'best.pt') # Adjust path if needed

## 7. Validation

Evaluate the performance of the trained model on the validation dataset.

In [None]:
print(f"Path to trained model for validation: {trained_model_path}")

if not os.path.exists(trained_model_path):
    print(f"Error: Trained model not found at {trained_model_path}. Cannot run validation.")
elif not os.path.exists(data_yaml_path):
    print(f"Error: data.yaml not found at {data_yaml_path}. Cannot run validation.")
else:
    print(f"\nLoading trained model from {trained_model_path} for validation...")
    try:
        # Load the best trained model
        model = YOLO(trained_model_path)
        
        # Run validation
        print(f"Running validation using device: {DEVICE}...") 
        # Note: You might want to use CPU ('cpu') for validation if GPU memory is limited after training
        validation_results = model.val(
            data=data_yaml_path,
            imgsz=640,
            device=DEVICE, 
            split='val', # Explicitly use the validation set
            project=yolo_dir,
            name=f'yolov10_{YOLO_MODEL_SIZE}_validation', # Separate folder for validation results
            exist_ok=True 
        )
        
        print("\n--- Validation Results Summary ---")
        # Access metrics through the results object (attributes might change slightly between ultralytics versions)
        # Common metrics are typically in validation_results.box
        map50_95 = getattr(validation_results.box, 'map', None) # mAP@0.5:0.95 (Primary metric)
        map50 = getattr(validation_results.box, 'map50', None)   # mAP@0.5
        # precision = getattr(validation_results.box, 'mp', None) # Mean precision over all classes
        # recall = getattr(validation_results.box, 'mr', None)    # Mean recall over all classes
        
        if map50_95 is not None:
            print(f"mAP@0.5:0.95: {map50_95:.4f}")
        if map50 is not None:
            print(f"mAP@0.5:      {map50:.4f}")
        # if precision is not None:
        #     print(f"Precision:    {precision:.4f}") # Per-class values are in validation_results.box.p
        # if recall is not None:
        #     print(f"Recall:       {recall:.4f}")    # Per-class values are in validation_results.box.r
        
        print(f"\nValidation metrics saved in: {validation_results.save_dir}")
        print("-------------------------------")
        
    except Exception as e:
        print(f"An error occurred during validation: {e}")
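
With 54 classes, the aggregate mAP can hide a few weak categories. The sketch below ranks classes by per-class score. It assumes the per-class mAP@0.5:0.95 values are read from `validation_results.box.maps` and the index-to-name mapping from `model.names`; the helper name `rank_classes_by_map` and the sample numbers are hypothetical.

```python
def rank_classes_by_map(class_names, per_class_map, worst_k=5):
    """Return the worst_k (name, mAP) pairs, lowest mAP first."""
    pairs = [(class_names[i], float(score)) for i, score in enumerate(per_class_map)]
    return sorted(pairs, key=lambda pair: pair[1])[:worst_k]


# Made-up values standing in for model.names and validation_results.box.maps
names = {0: "wolf", 1: "lion", 2: "tiger"}
maps = [0.62, 0.35, 0.48]

for name, score in rank_classes_by_map(names, maps, worst_k=2):
    print(f"{name}: {score:.2f}")
```

In the notebook this would be called as `rank_classes_by_map(model.names, validation_results.box.maps)` after validation finishes. Classes near the bottom are candidates for more data or targeted augmentation.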


## 8. Inference on Video

Run the trained model on a sample video to detect animals.

In [None]:
def detect_animals_in_video_ultralytics(model_path, input_video_path, output_video_path, 
                                       confidence_threshold=0.25, device='cpu'):
    """
    Detects animals in a video using a trained YOLO model (via ultralytics)
    and saves an annotated output video.

    Args:
        model_path (str): Path to the trained YOLO model file (.pt).
        input_video_path (str): Path to the input video file.
        output_video_path (str): Path to save the annotated output video.
        confidence_threshold (float): Minimum confidence score for detections.
        device (str): Device to run inference on ('cpu', 'cuda:0', etc.).
    """

    if not os.path.exists(model_path):
        print(f"Error: Model file not found at {model_path}")
        return
        
    if not os.path.exists(input_video_path):
        print(f"Error: Input video not found at {input_video_path}")
        return

    try:
        # Load the trained model
        model = YOLO(model_path)
        print(f"Model loaded successfully from {model_path}")

        # Run inference on the video stream
        print(f"Starting inference on {input_video_path}...")
        # With stream=True, predict() returns a generator that yields one Results object per frame;
        # drawing and saving are handled manually below
        # Key args: source, save, show, conf, device, project, name, stream=True
        results_generator = model.predict(
            source=input_video_path,
            save=False, # We will save manually for better control
            stream=True, # Process video frame by frame
            conf=confidence_threshold,
            device=device,
            verbose=False # Reduce console output during prediction
        )

        # --- Video Writer Setup ---
        cap = cv2.VideoCapture(input_video_path)
        if not cap.isOpened():
            print(f"Error: Cannot open video capture for {input_video_path}")
            return
        
        # Some files report 0 FPS; fall back to 30 so the VideoWriter stays valid
        fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fourcc = cv2.VideoWriter_fourcc(*'mp4v') # Codec for .mp4
        out = cv2.VideoWriter(output_video_path, fourcc, fps, (width, height))
        print(f"Output video setup: {output_video_path} ({width}x{height} @ {fps:.2f} FPS)")
        cap.release() # Release capture, ultralytics handles reading
        # --- Process frames --- 
        frame_count = 0
        for results in results_generator:
            # The 'results' object contains detections for the current frame
            # Use the built-in plot() method to get the annotated frame
            annotated_frame = results.plot() 
            
            # Write the annotated frame to the output video
            out.write(annotated_frame)
            frame_count += 1
            if frame_count % 100 == 0:
                print(f"Processed {frame_count} frames...")
            
        # --- Clean Up ---
        out.release()
        cv2.destroyAllWindows()
        print(f"\nFinished processing video ({frame_count} frames).")
        print(f"Annotated video saved to: {output_video_path}")

    except Exception as e:
        print(f"An error occurred during video inference: {e}")
        # Clean up resources if an error occurs mid-process
        if 'out' in locals() and out.isOpened():
            out.release()
        cv2.destroyAllWindows()

# --- Example Usage ---
# Define paths for video inference
input_video = os.path.join(base_dir, 'Wolves.mp4')   # !!! IMPORTANT: Update with your video file !!!
output_video = os.path.join(base_dir, 'Analyzed_Wolves_ultralytics.mp4')

# Check if input video exists
if not os.path.exists(input_video):
    print(f"\nInput video for inference not found: {input_video}")
    print("Please place your test video file at this location or update the 'input_video' path.")
elif not os.path.exists(trained_model_path):
    print(f"\nTrained model path not found: {trained_model_path}")
    print("Cannot run video inference without a trained model.")
else:
    # Run detection on the video
    # Use the same device as training/validation or specify 'cpu'
    detect_animals_in_video_ultralytics(
        model_path=trained_model_path, 
        input_video_path=input_video, 
        output_video_path=output_video,
        confidence_threshold=0.4, # Adjust confidence as needed
        device=DEVICE # Use the globally defined device
    )
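
Besides the annotated video, a per-class tally of detections is often useful for monitoring. The sketch below assumes that, inside the frame loop, `results.boxes.cls.tolist()` yields the class indices detected in a frame and `results.names` maps indices to labels. The helper name `tally_detections` and the sample data are hypothetical.

```python
from collections import Counter


def tally_detections(per_frame_class_ids, class_names):
    """Count detections per class label across all frames."""
    counts = Counter()
    for class_ids in per_frame_class_ids:
        # Class ids arrive as floats when read from a tensor, so cast to int
        counts.update(class_names[int(c)] for c in class_ids)
    return counts


# Made-up data: three frames, the second with no detections
frames = [[0.0, 0.0, 1.0], [], [1.0]]
names = {0: "wolf", 1: "lion"}
print(tally_detections(frames, names))
```

These are per-frame counts, so an animal visible for 100 frames contributes 100 detections; counting distinct individuals would require tracking, which this sketch does not attempt.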

## End of Notebook