# 🐛 YOLOv8 Insect Detection Training on Google Colab

**Project**: Insect Detection Training Project  
**Purpose**: Train custom YOLOv8 models for beetle detection using a Roboflow dataset  
**Environment**: Google Colaboratory with GPU acceleration  

---

## 📋 Overview

This notebook provides an interactive training pipeline for YOLOv8 insect detection models. It includes:

- ✅ Automated environment setup
- ✅ GPU configuration and verification
- ✅ Dataset preparation and validation
- ✅ Interactive model training
- ✅ Real-time progress monitoring
- ✅ Model evaluation and export
- ✅ Results visualization

---

## ⚡ Quick Start

1. **Enable GPU**: Go to Runtime → Change runtime type → Select GPU
2. **Run all cells**: Click Runtime → Run all
3. **Provide the dataset**: Enter your Roboflow API key when prompted, or uncomment Option A to upload a ZIP file
4. **Monitor training**: Watch real-time training progress
5. **Save results**: Results are copied to Google Drive and the weights are offered as a browser download

---

## 🛠️ Step 1: Environment Setup and Library Installation

In [None]:
# Install required libraries
print("🔧 Installing required libraries...")

!pip install ultralytics roboflow supervision
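# Colab already ships a CUDA-matched PyTorch build, so upgrading is usually unnecessary
# and may require a runtime restart. Uncomment only if you need a newer version.
# !pip install --upgrade torch torchvision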

print("✅ Installation completed!")

In [None]:
# Import required libraries
import os
import sys
import time
import shutil
import zipfile
from pathlib import Path
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Deep learning libraries
import torch
import torchvision
from ultralytics import YOLO

# Data manipulation and visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Image, display, HTML, clear_output
import cv2

# Google Colab specific
from google.colab import files, drive
import yaml

print("📚 Libraries imported successfully!")
print(f"🐍 Python version: {sys.version}")
print(f"🔥 PyTorch version: {torch.__version__}")
print(f"👁️ OpenCV version: {cv2.__version__}")

## 🚀 Step 2: GPU Configuration and System Verification

In [None]:
# Check GPU availability and configuration
def check_gpu_setup():
    print("🔍 Checking GPU configuration...")
    print("="*50)
    
    # Check CUDA availability
    cuda_available = torch.cuda.is_available()
    print(f"CUDA Available: {cuda_available}")
    
    if cuda_available:
        device_count = torch.cuda.device_count()
        print(f"GPU Count: {device_count}")
        
        for i in range(device_count):
            gpu_name = torch.cuda.get_device_name(i)
            gpu_memory = torch.cuda.get_device_properties(i).total_memory / 1e9
            print(f"GPU {i}: {gpu_name} ({gpu_memory:.1f} GB)")
        
        # Set device
        device = torch.device('cuda:0')
        print(f"\n✅ Using device: {device}")
        
        # Test GPU with a simple operation
        test_tensor = torch.rand(1000, 1000).to(device)
        result = torch.mm(test_tensor, test_tensor.t())
        print("✅ GPU test operation successful!")
        
    else:
        print("⚠️ GPU not available. Training will use CPU (slower).")
        device = torch.device('cpu')
    
    print("="*50)
    return device

# Run GPU check
training_device = check_gpu_setup()

In [None]:
# Display system information
def display_system_info():
    print("💻 System Information")
    print("="*40)
    
    # CPU information
    print(f"CPU cores: {os.cpu_count()}")
    
    # Memory information (approximate)
    import psutil
    memory = psutil.virtual_memory()
    print(f"RAM: {memory.total / 1e9:.1f} GB (Available: {memory.available / 1e9:.1f} GB)")
    
    # Disk space
    disk = psutil.disk_usage('/')
    print(f"Disk: {disk.total / 1e9:.1f} GB (Free: {disk.free / 1e9:.1f} GB)")
    
    print("\n🔧 Software Versions")
    print("="*40)
    print(f"Python: {sys.version.split()[0]}")
    print(f"PyTorch: {torch.__version__}")
    print(f"Torchvision: {torchvision.__version__}")
    print(f"OpenCV: {cv2.__version__}")
    print(f"NumPy: {np.__version__}")
    
display_system_info()
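
The `torch.device` returned by `check_gpu_setup()` is convenient for raw PyTorch code, but the Ultralytics `train()` call used later takes a GPU index such as `0` or the string `'cpu'` for its `device` argument. A minimal sketch of that mapping follows. `pick_device` is a hypothetical helper, not part of this notebook or of Ultralytics.

```python
from typing import Union


def pick_device(cuda_available: bool, device_count: int = 0) -> Union[int, str]:
    """Map a CUDA check onto a value for the Ultralytics `device` argument.

    Returns 0 (the first GPU) when CUDA is usable, otherwise the string "cpu".
    """
    if cuda_available and device_count > 0:
        return 0
    return "cpu"


# In the notebook this would be called with the real torch values, e.g.
# pick_device(torch.cuda.is_available(), torch.cuda.device_count())
print(pick_device(True, 1))   # GPU runtime
print(pick_device(False, 0))  # CPU-only runtime
```

Keeping the decision in a pure function makes it easy to check without a GPU attached.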

## 📁 Step 3: Google Drive Integration and Dataset Setup

In [None]:
# Mount Google Drive
print("📁 Mounting Google Drive...")
drive.mount('/content/drive')

# Create project directory in Google Drive
project_dir = Path('/content/drive/MyDrive/insect_detection_training')
project_dir.mkdir(parents=True, exist_ok=True)

# Create subdirectories
(project_dir / 'datasets').mkdir(exist_ok=True)
(project_dir / 'models').mkdir(exist_ok=True)
(project_dir / 'results').mkdir(exist_ok=True)
(project_dir / 'logs').mkdir(exist_ok=True)

print(f"✅ Project directory created: {project_dir}")
print(f"📂 Working directory: {os.getcwd()}")

# Set working directory
os.chdir('/content')
print(f"📁 Changed to working directory: {os.getcwd()}")

## 📊 Step 4: Dataset Preparation and Upload

### Option A: Upload from Local Computer

In [None]:
# Option A: Upload dataset from local computer
def upload_dataset_local():
    print("📤 Upload your dataset ZIP file")
    print("Expected structure inside ZIP:")
    print("""
    dataset.zip
    ├── train/
    │   ├── images/
    │   └── labels/
    ├── valid/
    │   ├── images/
    │   └── labels/
    ├── test/
    │   ├── images/
    │   └── labels/
    └── data.yaml
    """)
    
    uploaded = files.upload()
    
    # Extract uploaded files
    for filename in uploaded.keys():
        if filename.endswith('.zip'):
            print(f"📦 Extracting {filename}...")
            with zipfile.ZipFile(filename, 'r') as zip_ref:
                zip_ref.extractall('datasets')
            print("✅ Dataset extracted successfully!")
            break
    else:
        print("❌ No ZIP file found. Please upload a ZIP file containing your dataset.")
        return False
    
    return True

# Uncomment the line below to upload dataset
# upload_success = upload_dataset_local()
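
ZIP exports often wrap the dataset in a top-level folder. In that case `data.yaml` ends up one level below `datasets/`, and later cells that look for `datasets/data.yaml` will not find it. The sketch below uses only the standard library to extract an archive and report the folder that actually contains `data.yaml`. `extract_dataset` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path
from typing import Optional


def extract_dataset(zip_path: str, dest: str = "datasets") -> Optional[Path]:
    """Extract a dataset ZIP and return the folder holding data.yaml, if any."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    # Prefer the shallowest data.yaml in case the archive contains several
    matches = sorted(dest_dir.rglob("data.yaml"), key=lambda p: len(p.parts))
    return matches[0].parent if matches else None
```

If the returned folder is not `datasets/` itself, either move its contents up one level or point `dataset_path` and `config.data_yaml` at the returned folder.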

### Option B: Download from Roboflow (Recommended)

In [None]:
# Option B: Download from Roboflow
def download_roboflow_dataset():
    print("🤖 Downloading beetle dataset from Roboflow...")
    
    try:
        from roboflow import Roboflow
        
        # Initialize Roboflow (you may need to set API key)
        # Get your API key from: https://app.roboflow.com/settings/api
        print("🔑 Please enter your Roboflow API key (or press Enter to skip):")
        api_key = input("API Key: ").strip()
        
        if api_key:
            rf = Roboflow(api_key=api_key)
            project = rf.workspace("z-algae-bilby").project("beetle")
            dataset = project.version(1).download("yolov8", location="datasets")
            print("✅ Dataset downloaded from Roboflow!")
            return True
        else:
            print("⚠️ No API key provided. You can download manually from:")
            print("https://universe.roboflow.com/z-algae-bilby/beetle/dataset/1")
            return False
            
    except Exception as e:
        print(f"❌ Error downloading from Roboflow: {e}")
        print("💡 Alternative: Download manually and upload using Option A")
        return False

# Download from Roboflow
download_success = download_roboflow_dataset()
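
Typing the key into `input()` echoes it into the cell output, which is then saved with the notebook. One alternative, sketched here under the assumption that the key is stored in an environment variable named `ROBOFLOW_API_KEY` (a name chosen for illustration), is to read it from the environment and fall back to a hidden prompt. `get_roboflow_api_key` is a hypothetical helper.

```python
import os
from getpass import getpass
from typing import Optional


def get_roboflow_api_key(env_var: str = "ROBOFLOW_API_KEY",
                         prompt: bool = True) -> Optional[str]:
    """Return the API key from the environment, or ask for it without echo."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    if prompt:
        # getpass hides the typed characters, so the key is not stored in the output
        key = getpass("Roboflow API key (input hidden, Enter to skip): ").strip()
    return key or None
```

The result could be passed to `Roboflow(api_key=...)` in place of the value read by `input()`. A return value of `None` corresponds to the "no API key provided" branch.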

### Option C: Manual Dataset Setup (For Testing)

In [None]:
# Option C: Create sample dataset structure for testing
def create_sample_dataset():
    print("🧪 Creating sample dataset structure for testing...")
    
    # Create directory structure
    base_dir = Path('datasets')
    for split in ['train', 'valid', 'test']:
        (base_dir / split / 'images').mkdir(parents=True, exist_ok=True)
        (base_dir / split / 'labels').mkdir(parents=True, exist_ok=True)
    
    # Create sample data.yaml
    data_yaml = {
        'train': './train/images',
        'val': './valid/images', 
        'test': './test/images',
        'nc': 1,
        'names': ['beetle'],
        'roboflow': {
            'workspace': 'z-algae-bilby',
            'project': 'beetle',
            'version': 1,
            'license': 'CC BY 4.0',
            'url': 'https://universe.roboflow.com/z-algae-bilby/beetle/dataset/1'
        }
    }
    
    with open(base_dir / 'data.yaml', 'w') as f:
        yaml.dump(data_yaml, f, default_flow_style=False)
    
    print("✅ Sample dataset structure created!")
    print("⚠️ Note: This is just a structure. You still need to add actual images and labels.")
    return True

# Uncomment to create sample structure
# create_sample_dataset()
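
The empty structure still needs label files. Each `.txt` file in `labels/` holds one line per object in the YOLO format `class x_center y_center width height`, with the four box values normalized to the 0-1 range by the image width and height. The sketch below converts a pixel-space box to such a line. `to_yolo_line` is a hypothetical helper.

```python
from typing import Tuple


def to_yolo_line(class_id: int, box: Tuple[float, float, float, float],
                 img_w: int, img_h: int) -> str:
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A beetle (class 0) occupying the central half of a 640x480 image
print(to_yolo_line(0, (160, 120, 480, 360), 640, 480))
# prints: 0 0.500000 0.500000 0.500000 0.500000
```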

## ✅ Step 5: Dataset Validation and Analysis

In [None]:
# Validate dataset structure and contents
def validate_dataset(dataset_path='datasets'):
    print("🔍 Validating dataset structure...")
    print("="*50)
    
    dataset_dir = Path(dataset_path)
    
    # Check if data.yaml exists
    data_yaml_path = dataset_dir / 'data.yaml'
    if not data_yaml_path.exists():
        print("❌ data.yaml not found!")
        return False, None, None  # keep the 3-tuple shape the caller unpacks
    
    # Load and display data.yaml
    with open(data_yaml_path, 'r') as f:
        data_config = yaml.safe_load(f)
    
    print("📄 Dataset Configuration (data.yaml):")
    for key, value in data_config.items():
        if key != 'roboflow':
            print(f"  {key}: {value}")
    
    # Check directory structure and count files
    results = {}
    for split in ['train', 'valid', 'test']:
        images_dir = dataset_dir / split / 'images'
        labels_dir = dataset_dir / split / 'labels'
        
        if images_dir.exists() and labels_dir.exists():
            image_files = list(images_dir.glob('*.[jp][pn]g')) + list(images_dir.glob('*.jpeg'))
            label_files = list(labels_dir.glob('*.txt'))
            
            results[split] = {
                'images': len(image_files),
                'labels': len(label_files)
            }
            
            status = "✅" if len(image_files) == len(label_files) and len(image_files) > 0 else "⚠️"
            print(f"{status} {split.upper()}: {len(image_files)} images, {len(label_files)} labels")
        else:
            print(f"❌ {split.upper()}: Directory not found")
            results[split] = {'images': 0, 'labels': 0}
    
    # Calculate total
    total_images = sum(split['images'] for split in results.values())
    total_labels = sum(split['labels'] for split in results.values())
    
    print(f"\n📊 TOTAL: {total_images} images, {total_labels} labels")
    
    if total_images > 0 and total_images == total_labels:
        print("✅ Dataset validation successful!")
        return True, data_config, results
    else:
        print("❌ Dataset validation failed!")
        return False, None, None

# Run validation
validation_success, dataset_config, dataset_stats = validate_dataset()
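
Comparing counts can hide problems: a split with one unlabeled image and one orphaned label file still has equal totals. A complementary check, sketched below with a hypothetical `find_unpaired` helper, matches files by name stem. Note that Ultralytics treats an image without a label file as a background image, so a missing label is not necessarily an error.

```python
from pathlib import Path
from typing import List, Tuple

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unpaired(split_dir) -> Tuple[List[str], List[str]]:
    """Return (images without a label, labels without an image) for one split."""
    split = Path(split_dir)
    images = {p.stem for p in (split / "images").iterdir()
              if p.suffix.lower() in IMAGE_SUFFIXES}
    labels = {p.stem for p in (split / "labels").glob("*.txt")}
    return sorted(images - labels), sorted(labels - images)
```

For example, `find_unpaired('datasets/train')` returns two lists of stems that can be printed or used to clean up the split.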

In [None]:
# Visualize dataset statistics
def visualize_dataset_stats(stats):
    if not stats:
        print("❌ No dataset statistics to display")
        return
    
    print("📊 Dataset Statistics Visualization")
    
    # Create bar plot
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
    
    # Plot 1: Images per split
    splits = list(stats.keys())
    image_counts = [stats[split]['images'] for split in splits]
    
    bars1 = ax1.bar(splits, image_counts, color=['#FF6B6B', '#4ECDC4', '#45B7D1'])
    ax1.set_title('Images per Split', fontsize=14, fontweight='bold')
    ax1.set_ylabel('Number of Images')
    
    # Add value labels on bars
    for bar, count in zip(bars1, image_counts):
        ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1, 
                str(count), ha='center', va='bottom', fontweight='bold')
    
    # Plot 2: Split distribution pie chart
    total = sum(image_counts)
    percentages = [count/total*100 for count in image_counts]
    
    ax2.pie(percentages, labels=[f'{split}\n({count} images)' for split, count in zip(splits, image_counts)], 
            autopct='%1.1f%%', startangle=90, colors=['#FF6B6B', '#4ECDC4', '#45B7D1'])
    ax2.set_title('Dataset Split Distribution', fontsize=14, fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
    # Print summary
    print(f"\n📈 Dataset Summary:")
    print(f"Total Images: {total}")
    for split, count in zip(splits, image_counts):
        percentage = count/total*100
        print(f"{split.upper()}: {count} images ({percentage:.1f}%)")

# Visualize if validation was successful
if validation_success:
    visualize_dataset_stats(dataset_stats)
else:
    print("⚠️ Cannot visualize dataset - validation failed")

In [None]:
# Display sample images from dataset
def display_sample_images(dataset_path='datasets', num_samples=6):
    if not validation_success:
        print("⚠️ Cannot display samples - dataset validation failed")
        return
    
    print(f"🖼️ Displaying {num_samples} sample images from training set")
    
    dataset_dir = Path(dataset_path)
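    train_dir = dataset_dir / 'train' / 'images'
    # Include .jpeg so the file set matches what validate_dataset() counts
    train_images = list(train_dir.glob('*.[jp][pn]g')) + list(train_dir.glob('*.jpeg'))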
    
    if len(train_images) == 0:
        print("❌ No images found in training set")
        return
    
    # Select random samples
    sample_images = np.random.choice(train_images, min(num_samples, len(train_images)), replace=False)
    
    # Create subplot
    cols = 3
    rows = (num_samples + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(15, 5*rows))
    
    if rows == 1:
        axes = axes.reshape(1, -1)
    
    for i, img_path in enumerate(sample_images):
        row = i // cols
        col = i % cols
        
        # Load and display image
        img = cv2.imread(str(img_path))
        img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        
        axes[row, col].imshow(img_rgb)
        axes[row, col].set_title(f"Sample {i+1}: {img_path.name}", fontsize=10)
        axes[row, col].axis('off')
    
    # Hide empty subplots
    for i in range(len(sample_images), rows * cols):
        row = i // cols
        col = i % cols
        axes[row, col].axis('off')
    
    plt.tight_layout()
    plt.show()

# Display sample images
display_sample_images()

## 🎯 Step 6: Training Configuration and Model Selection

In [None]:
# Training configuration
class TrainingConfig:
    """Training configuration class"""
    
    def __init__(self):
        # Model configuration
        self.model_size = 'n'  # n, s, m, l, x (nano, small, medium, large, extra-large)
        self.pretrained_model = f'yolov8{self.model_size}.pt'
        
        # Training parameters
        self.epochs = 100
        self.batch_size = 16  # Adjust based on GPU memory
        self.image_size = 640
        self.device = 0 if torch.cuda.is_available() else 'cpu'  # GPU index (0, 1, ...) or 'cpu'
        
        # Data configuration
        self.data_yaml = 'datasets/data.yaml'
        
        # Output configuration
        self.project_name = 'training_results'
        self.experiment_name = 'beetle_detection_colab'
        
        # Advanced settings
        self.patience = 50  # Early stopping patience
        self.save_period = 10  # Save checkpoint every N epochs
        self.workers = 2  # Number of dataloader workers
        
        # Optimization
        self.optimizer = 'AdamW'  # SGD, Adam, AdamW
        self.lr0 = 0.01  # Initial learning rate (Ultralytics default, tuned for SGD; Adam-family optimizers typically use ~0.001)
        self.weight_decay = 0.0005
        
    def display_config(self):
        """Display current configuration"""
        print("🎯 Training Configuration")
        print("="*40)
        print(f"Model: {self.pretrained_model}")
        print(f"Epochs: {self.epochs}")
        print(f"Batch Size: {self.batch_size}")
        print(f"Image Size: {self.image_size}")
        print(f"Device: {self.device}")
        print(f"Dataset: {self.data_yaml}")
        print(f"Project: {self.project_name}/{self.experiment_name}")
        print(f"Optimizer: {self.optimizer}")
        print(f"Learning Rate: {self.lr0}")
        print(f"Weight Decay: {self.weight_decay}")
        print("="*40)

# Create configuration
config = TrainingConfig()
config.display_config()

# GPU memory optimization
if torch.cuda.is_available():
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"\n🎮 GPU Memory: {gpu_memory:.1f} GB")
    
    # Adjust batch size based on GPU memory
    if gpu_memory < 8:
        config.batch_size = 8
        print("⚡ Reduced batch size to 8 for GPU memory optimization")
    elif gpu_memory >= 16:
        config.batch_size = 32
        print("🚀 Increased batch size to 32 for better GPU utilization")
    
print(f"\n📊 Final Batch Size: {config.batch_size}")
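
The memory thresholds above can be expressed as a small pure function, which makes the rule easy to test and reuse. This is a sketch of the same heuristic, not a tuned recommendation. `suggest_batch_size` is a hypothetical name. Ultralytics also documents `batch=-1` (AutoBatch) as an alternative that estimates a batch size from available GPU memory.

```python
def suggest_batch_size(gpu_memory_gb: float, default: int = 16) -> int:
    """Mirror the notebook's rule: <8 GB -> 8, >=16 GB -> 32, otherwise default."""
    if gpu_memory_gb < 8:
        return 8
    if gpu_memory_gb >= 16:
        return 32
    return default


for mem in (6.0, 15.8, 16.0, 40.0):
    print(f"{mem:>5.1f} GB -> batch {suggest_batch_size(mem)}")
```

A card that reports just under 16 GB stays at the default of 16, because the memory figure is computed with decimal gigabytes (`/ 1e9`).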

## 🚀 Step 7: Model Training Execution

In [None]:
# Load pre-trained model
def load_pretrained_model(model_name):
    print(f"📥 Loading pre-trained model: {model_name}")
    
    try:
        model = YOLO(model_name)
        print(f"✅ Model loaded successfully!")
        
        # Display model info
        print(f"\n📋 Model Information:")
        print(f"Model file: {model_name}")
        print(f"Task: {model.task}")
        
        return model
    
    except Exception as e:
        print(f"❌ Error loading model: {e}")
        return None

# Load the model
model = load_pretrained_model(config.pretrained_model)

In [None]:
# Execute model training
def train_model(model, config):
    if model is None:
        print("❌ Cannot start training - model not loaded")
        return None
    
    if not validation_success:
        print("❌ Cannot start training - dataset validation failed")
        return None
    
    print("🚀 Starting model training...")
    print("⏱️ This may take a while depending on your configuration")
    print("📊 Training progress will be displayed below")
    print("="*60)
    
    # Record start time
    start_time = time.time()
    
    try:
        # Start training
        results = model.train(
            data=config.data_yaml,
            epochs=config.epochs,
            batch=config.batch_size,
            imgsz=config.image_size,
            device=config.device,
            project=config.project_name,
            name=config.experiment_name,
            save=True,
            save_period=config.save_period,
            patience=config.patience,
            workers=config.workers,
            optimizer=config.optimizer,
            lr0=config.lr0,
            weight_decay=config.weight_decay,
            val=True,
            plots=True,
            verbose=True
        )
        
        # Calculate training time
        training_time = time.time() - start_time
        hours = int(training_time // 3600)
        minutes = int((training_time % 3600) // 60)
        seconds = int(training_time % 60)
        
        print("="*60)
        print(f"✅ Training completed successfully!")
        print(f"⏱️ Total training time: {hours:02d}:{minutes:02d}:{seconds:02d}")
        print(f"📁 Results saved to: {config.project_name}/{config.experiment_name}")
        
        return results
        
    except Exception as e:
        print(f"❌ Training failed: {e}")
        return None

# Start training (this will take a while!)
print("⚠️ Warning: Training will start in 5 seconds...")
time.sleep(5)

training_results = train_model(model, config)
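
Colab sessions can disconnect during long runs. Ultralytics can continue from a `last.pt` checkpoint when `train(resume=True)` is called on a model loaded from it. The sketch below locates the most recently written checkpoint under the project folder. The search is needed because a run directory may receive a numeric suffix if the name already exists. Both helper names are hypothetical. Resuming only works if the checkpoint still exists, so in practice the weights would have to live on Google Drive rather than in `/content`.

```python
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(project: str = "training_results") -> Optional[Path]:
    """Return the newest <project>/<run>/weights/last.pt, or None if absent."""
    candidates = list(Path(project).glob("*/weights/last.pt"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def resume_training(checkpoint: Path):
    """Continue an interrupted run from its last checkpoint."""
    from ultralytics import YOLO  # lazy import: only needed when resuming
    return YOLO(str(checkpoint)).train(resume=True)
```

A typical use would be `ckpt = find_latest_checkpoint()`, followed by `resume_training(ckpt)` when `ckpt` is not `None`.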

## 📊 Step 8: Training Results Analysis and Visualization

In [None]:
# Display training results
def display_training_results(results, config):
    if results is None:
        print("❌ No training results to display")
        return
    
    print("📊 Training Results Summary")
    print("="*50)
    
    # Results directory
    results_dir = Path(config.project_name) / config.experiment_name
    
    # Display key metrics if available
    if hasattr(results, 'results_dict'):
        metrics = results.results_dict
        print("🎯 Final Metrics:")
        for key, value in metrics.items():
            if isinstance(value, (int, float)):
                print(f"  {key}: {value:.4f}")
    
    # Check for results plots
    plots_to_show = [
        ('results.png', '📈 Training/Validation Curves'),
        ('confusion_matrix.png', '🎯 Confusion Matrix'),
        ('labels.jpg', '📊 Label Distribution'),
        ('val_batch0_pred.jpg', '🔍 Validation Predictions')
    ]
    
    for plot_file, title in plots_to_show:
        plot_path = results_dir / plot_file
        if plot_path.exists():
            print(f"\n{title}")
            display(Image(str(plot_path)))
        else:
            print(f"⚠️ {title} not found: {plot_path}")

# Display results
if training_results:
    display_training_results(training_results, config)
else:
    print("⚠️ No training results available to display")
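
Besides the plots, Ultralytics writes per-epoch metrics to `results.csv` in the run directory. The sketch below finds the epoch with the highest value of one metric. It assumes the column is named `metrics/mAP50(B)` and strips whitespace because some versions pad the header. `best_epoch` is a hypothetical helper, and the column name should be checked against the actual file.

```python
import csv
from typing import Optional, Tuple


def best_epoch(results_csv: str,
               metric: str = "metrics/mAP50(B)") -> Optional[Tuple[int, float]]:
    """Return (epoch, value) for the row with the highest metric, or None."""
    with open(results_csv, newline="") as f:
        reader = csv.DictReader(f, skipinitialspace=True)
        rows = [{k.strip(): v.strip() for k, v in row.items() if k and v is not None}
                for row in reader]
    if not rows:
        return None
    best = max(rows, key=lambda r: float(r[metric]))
    return int(float(best["epoch"])), float(best[metric])
```

For this notebook the file would be at `training_results/beetle_detection_colab/results.csv`. The epoch number is reported exactly as it is stored in the file.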

In [None]:
# Load best model and run validation
def validate_trained_model(config):
    print("🧪 Loading best model for validation...")
    
    # Path to best model
    best_model_path = Path(config.project_name) / config.experiment_name / 'weights' / 'best.pt'
    
    if not best_model_path.exists():
        print(f"❌ Best model not found: {best_model_path}")
        return None, None  # keep the 2-tuple shape the caller unpacks
    
    try:
        # Load best model
        best_model = YOLO(str(best_model_path))
        print(f"✅ Best model loaded from: {best_model_path}")
        
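        # Run validation (val() evaluates the 'val' split from data.yaml by default;
        # pass split='test' to score the held-out test images instead)
        print("\n🎯 Running validation on the validation split...")
        val_results = best_model.val(data=config.data_yaml)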
        
        # Display validation metrics
        if hasattr(val_results, 'box'):
            box_metrics = val_results.box
            print("\n📊 Validation Metrics:")
            print(f"  mAP@0.5: {box_metrics.map50:.4f}")
            print(f"  mAP@0.5:0.95: {box_metrics.map:.4f}")
            print(f"  Precision: {box_metrics.mp:.4f}")
            print(f"  Recall: {box_metrics.mr:.4f}")
        
        return best_model, val_results
        
    except Exception as e:
        print(f"❌ Validation failed: {e}")
        return None, None

# Run validation if training was successful
if training_results:
    best_model, validation_results = validate_trained_model(config)
else:
    print("⚠️ Skipping validation - training was not completed")
    best_model, validation_results = None, None
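
The mAP figures depend on intersection over union (IoU). At mAP@0.5 a predicted box counts as a match when its IoU with a ground-truth box is at least 0.5, and mAP@0.5:0.95 averages the result over IoU thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below computes IoU for two boxes in `(x_min, y_min, x_max, y_max)` form. It is an illustration of the definition, not the code Ultralytics runs.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width/height, clamped to zero when the boxes do not intersect
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes shifted by 5 px: overlap 50, union 150, IoU = 1/3
print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 4))
```

With an IoU of about 0.33, this pair would not count as a match at the 0.5 threshold.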

## 🔍 Step 9: Model Inference and Testing

In [None]:
# Test model inference on sample images
def test_model_inference(model, config, num_samples=4):
    if model is None:
        print("❌ No model available for testing")
        return
    
    print(f"🔍 Testing model inference on {num_samples} sample images...")
    
    # Get test images
    test_images_dir = Path('datasets/test/images')
    if not test_images_dir.exists():
        # Fallback to validation images
        test_images_dir = Path('datasets/valid/images')
    
    if not test_images_dir.exists():
        print("❌ No test images found")
        return
    
    # Get sample images
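    # Include .jpeg so the file set matches what validate_dataset() counts
    image_files = list(test_images_dir.glob('*.[jp][pn]g')) + list(test_images_dir.glob('*.jpeg'))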
    if len(image_files) == 0:
        print("❌ No image files found")
        return
    
    sample_images = np.random.choice(image_files, min(num_samples, len(image_files)), replace=False)
    
    # Create subplot for results
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    axes = axes.flatten()
    
    for i, img_path in enumerate(sample_images):
        if i >= len(axes):
            break
        
        try:
            # Run inference
            results = model(str(img_path))
            
            # Get annotated image
            annotated_img = results[0].plot()
            
            # Convert BGR to RGB for matplotlib
            annotated_img_rgb = cv2.cvtColor(annotated_img, cv2.COLOR_BGR2RGB)
            
            # Display image
            axes[i].imshow(annotated_img_rgb)
            
            # Get detection info
            detections = len(results[0].boxes) if results[0].boxes is not None else 0
            confidence = results[0].boxes.conf.max().item() if detections > 0 else 0
            
            axes[i].set_title(f"{img_path.name}\nDetections: {detections}, Max Conf: {confidence:.3f}", 
                            fontsize=10)
            axes[i].axis('off')
            
        except Exception as e:
            print(f"❌ Error processing {img_path.name}: {e}")
            axes[i].text(0.5, 0.5, f"Error: {str(e)[:50]}...", 
                        ha='center', va='center', transform=axes[i].transAxes)
            axes[i].axis('off')
    
    # Hide unused subplots
    for i in range(len(sample_images), len(axes)):
        axes[i].axis('off')
    
    plt.tight_layout()
    plt.suptitle('🔍 Model Inference Results', fontsize=16, fontweight='bold', y=1.02)
    plt.show()

# Test inference
if best_model:
    test_model_inference(best_model, config)
else:
    print("⚠️ Skipping inference test - no trained model available")

## 💾 Step 10: Model Export and Download

In [None]:
# Export model to different formats
def export_trained_model(model, config, formats=('onnx', 'torchscript')):
    if model is None:
        print("❌ No model available for export")
        return []
    
    print(f"📦 Exporting model to formats: {formats}")
    
    exported_files = []
    
    for format_type in formats:
        try:
            print(f"\n🔄 Exporting to {format_type.upper()}...")
            export_path = model.export(format=format_type, imgsz=config.image_size)
            exported_files.append(export_path)
            print(f"✅ {format_type.upper()} export successful: {export_path}")
            
        except Exception as e:
            print(f"❌ {format_type.upper()} export failed: {e}")
    
    if exported_files:
        print(f"\n🎉 Successfully exported {len(exported_files)} model formats")
        for file_path in exported_files:
            print(f"  📄 {file_path}")
    
    return exported_files

# Export model
if best_model:
    exported_models = export_trained_model(best_model, config)
else:
    print("⚠️ Skipping model export - no trained model available")
    exported_models = []

In [None]:
# Copy results to Google Drive
def copy_results_to_drive(config):
    print("💾 Copying training results to Google Drive...")
    
    # Source directory
    source_dir = Path(config.project_name) / config.experiment_name
    
    # Destination directory in Google Drive
    drive_dir = Path('/content/drive/MyDrive/insect_detection_training/results') / config.experiment_name
    
    if not source_dir.exists():
        print(f"❌ Source directory not found: {source_dir}")
        return False
    
    try:
        # Create destination directory
        drive_dir.mkdir(parents=True, exist_ok=True)
        
        # Copy entire results directory
        import shutil
        shutil.copytree(source_dir, drive_dir, dirs_exist_ok=True)
        
        print(f"✅ Results copied to: {drive_dir}")
        
        # List important files
        important_files = [
            'weights/best.pt',
            'weights/last.pt', 
            'results.png',
            'confusion_matrix.png'
        ]
        
        print("\n📁 Important files in Google Drive:")
        for file_path in important_files:
            full_path = drive_dir / file_path
            if full_path.exists():
                size_mb = full_path.stat().st_size / (1024 * 1024)
                print(f"  ✅ {file_path} ({size_mb:.1f} MB)")
            else:
                print(f"  ❌ {file_path} (not found)")
        
        return True
        
    except Exception as e:
        print(f"❌ Error copying to Google Drive: {e}")
        return False

# Copy results to Google Drive
if training_results:
    copy_success = copy_results_to_drive(config)
else:
    print("⚠️ Skipping copy to Google Drive - no training results available")
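
The copy step reports file sizes but does not confirm that the bytes on Drive match the local files. A simple way to check, sketched below with hypothetical helper names, is to compare SHA-256 digests of the source and the copy.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large weight files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(src, dst) -> bool:
    """True if dst exists and has the same content as src."""
    src, dst = Path(src), Path(dst)
    return dst.exists() and sha256_of(src) == sha256_of(dst)
```

For example, `copies_match(source_dir / 'weights/best.pt', drive_dir / 'weights/best.pt')` could be evaluated for each entry in `important_files`.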

In [None]:
# Download trained models to local computer
def download_models(config):
    print("⬇️ Preparing model files for download...")
    
    results_dir = Path(config.project_name) / config.experiment_name
    weights_dir = results_dir / 'weights'
    
    if not weights_dir.exists():
        print(f"❌ Weights directory not found: {weights_dir}")
        return
    
    # Files to download
    download_files = {
        'best.pt': 'Best model weights',
        'last.pt': 'Last epoch weights'
    }
    
    print("\n📥 Available for download:")
    
    for filename, description in download_files.items():
        file_path = weights_dir / filename
        if file_path.exists():
            size_mb = file_path.stat().st_size / (1024 * 1024)
            print(f"  📄 {filename}: {description} ({size_mb:.1f} MB)")
            
            # Trigger download
            try:
                files.download(str(file_path))
                print(f"  ✅ {filename} download initiated")
            except Exception as e:
                print(f"  ❌ Error downloading {filename}: {e}")
        else:
            print(f"  ❌ {filename}: Not found")
    
    # Also download results plot
    results_plot = results_dir / 'results.png'
    if results_plot.exists():
        try:
            files.download(str(results_plot))
            print(f"  ✅ results.png download initiated")
        except Exception as e:
            print(f"  ❌ Error downloading results.png: {e}")

# Download models
if training_results:
    download_models(config)
else:
    print("⚠️ No models available for download")
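
Each `files.download()` call triggers a separate browser download, and some browsers block several downloads in a row. One alternative is to bundle the whole run directory into a single ZIP first, as sketched below. `archive_run` and the default archive name are hypothetical.

```python
import shutil
from pathlib import Path


def archive_run(run_dir, out_base: str = "beetle_run") -> Path:
    """Zip the contents of run_dir into <out_base>.zip and return its path."""
    run_dir = Path(run_dir)
    if not run_dir.is_dir():
        raise FileNotFoundError(f"Run directory not found: {run_dir}")
    # out_base should live outside run_dir so the archive does not include itself
    return Path(shutil.make_archive(out_base, "zip", root_dir=str(run_dir)))
```

In the notebook the returned path could then be passed to a single `files.download(str(path))` call.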

## 📋 Step 11: Training Summary and Next Steps

In [None]:
# Generate training summary
def generate_training_summary(config, training_results, validation_results):
    print("📋 TRAINING SUMMARY")
    print("="*60)
    
    # Basic information
    print(f"🎯 Project: {config.project_name}/{config.experiment_name}")
    print(f"🤖 Model: {config.pretrained_model}")
    print(f"📊 Dataset: {config.data_yaml}")
    print(f"⚙️ Configuration:")
    print(f"   - Epochs: {config.epochs}")
    print(f"   - Batch Size: {config.batch_size}")
    print(f"   - Image Size: {config.image_size}")
    print(f"   - Device: {config.device}")
    
    # Training status
    if training_results:
        print(f"\n✅ Training Status: COMPLETED")
        
        # Validation metrics
        if validation_results and hasattr(validation_results, 'box'):
            box_metrics = validation_results.box
            print(f"\n📊 Final Metrics:")
            print(f"   - mAP@0.5: {box_metrics.map50:.4f}")
            print(f"   - mAP@0.5:0.95: {box_metrics.map:.4f}")
            print(f"   - Precision: {box_metrics.mp:.4f}")
            print(f"   - Recall: {box_metrics.mr:.4f}")
            
            # Performance evaluation
            if box_metrics.map50 >= 0.7:
                print(f"   🎉 EXCELLENT: Model meets target performance (mAP@0.5 ≥ 0.7)")
            elif box_metrics.map50 >= 0.5:
                print(f"   ✅ GOOD: Model shows good performance (mAP@0.5 ≥ 0.5)")
            else:
                print(f"   ⚠️ FAIR: Model needs improvement (mAP@0.5 < 0.5)")
    else:
        print(f"\n❌ Training Status: FAILED or INCOMPLETE")
    
    # File locations
    results_dir = Path(config.project_name) / config.experiment_name
    drive_dir = Path('/content/drive/MyDrive/insect_detection_training/results') / config.experiment_name
    
    print(f"\n📁 Output Locations:")
    print(f"   - Local: {results_dir}")
    print(f"   - Google Drive: {drive_dir}")
    
    # Next steps
    print(f"\n🚀 Next Steps:")
    print(f"   1. Download model files (best.pt) for deployment")
    print(f"   2. Test model on new images")
    print(f"   3. Deploy to production environment")
    print(f"   4. Monitor performance on real data")
    
    if training_results:
        print(f"\n💡 Optimization Tips:")
        if validation_results and hasattr(validation_results, 'box'):
            if validation_results.box.map50 < 0.7:
                print(f"   - Try training for more epochs")
                print(f"   - Increase model size (yolov8s or yolov8m)")
                print(f"   - Add more training data")
                print(f"   - Adjust data augmentation")
            else:
                print(f"   - Model performance is good!")
                print(f"   - Consider model compression for deployment")
                print(f"   - Test on edge devices (Raspberry Pi)")
    
    print("="*60)

# Generate summary
generate_training_summary(config, training_results, validation_results)
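
The summary above is only printed, so it is lost when the runtime resets. A small sketch of persisting it as JSON follows. The target could be the `logs` folder created on Google Drive during setup. `write_summary` and the payload layout are assumptions made for illustration.

```python
import json
from datetime import datetime
from pathlib import Path


def write_summary(log_dir, config: dict, metrics: dict) -> Path:
    """Write config and metrics to a timestamped JSON file and return its path."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    out = log_dir / f"summary_{stamp}.json"
    payload = {"created": stamp, "config": config, "metrics": metrics}
    out.write_text(json.dumps(payload, indent=2))
    return out
```

In this notebook the call might look like `write_summary(project_dir / 'logs', vars(config), {'map50': float(validation_results.box.map50)})`, provided every value in `config` is JSON-serializable.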

In [None]:
# Usage examples for trained model
def show_usage_examples(config):
    print("💻 USAGE EXAMPLES")
    print("="*50)
    
    model_path = f"{config.project_name}/{config.experiment_name}/weights/best.pt"
    
    print("\n🐍 Python Usage:")
    print("```python")
    print("from ultralytics import YOLO")
    print("")
    print(f"# Load trained model")
    print(f"model = YOLO('{model_path}')")
    print("")
    print("# Run inference on single image")
    print("results = model('path/to/image.jpg')")
    print("")
    print("# Run inference on multiple images")
    print("results = model(['img1.jpg', 'img2.jpg'])")
    print("")
    print("# Save results with annotations (one file per image)")
    print("for i, r in enumerate(results):")
    print("    r.save(filename=f'result_{i}.jpg')")
    print("```")
    
    print("\n🖥️ Command Line Usage:")
    print("```bash")
    print(f"# Single image prediction")
    print(f"yolo predict model={model_path} source=image.jpg")
    print("")
    print(f"# Batch prediction")
    print(f"yolo predict model={model_path} source=images_folder/")
    print("")
    print(f"# Video prediction")
    print(f"yolo predict model={model_path} source=video.mp4")
    print("```")
    
    print("\n🌐 Integration with detect_insect.py:")
    print("```bash")
    print(f"# Use trained model with detection script")
    print(f"python detect_insect.py \\")
    print(f"    --input input_images/ \\")
    print(f"    --output output_images/ \\")
    print(f"    --model {model_path}")
    print("```")
    
    print("\n📱 Export for Edge Deployment:")
    print("```python")
    print("# Export to ONNX for cross-platform deployment")
    print(f"model = YOLO('{model_path}')")
    print("model.export(format='onnx')")
    print("")
    print("# Export to TensorRT for NVIDIA GPUs")
    print("model.export(format='engine')")
    print("```")

# Show usage examples
if training_results:
    show_usage_examples(config)
else:
    print("⚠️ No usage examples available - training was not completed")

---

## 🎉 Training Complete!

**Congratulations!** You have successfully completed the YOLOv8 training pipeline for insect detection.

### 📋 What You've Accomplished:
- ✅ Set up GPU-accelerated training environment
- ✅ Prepared and validated beetle detection dataset
- ✅ Trained custom YOLOv8 model
- ✅ Evaluated model performance
- ✅ Exported model for deployment
- ✅ Saved results to Google Drive

### 🚀 Next Steps:
1. **Download your trained model** (`best.pt`) for local use
2. **Test the model** on new beetle images
3. **Deploy to production** using the provided usage examples
4. **Monitor performance** and retrain as needed

### 📚 Resources:
- [YOLOv8 Documentation](https://docs.ultralytics.com/)
- [Model Deployment Guide](https://docs.ultralytics.com/modes/export/)
- [Performance Optimization](https://docs.ultralytics.com/guides/model-optimization/)

---

*🐛 Happy beetle detecting! 🐛*