# AI Takeoff MVP - Training Notebook

This notebook trains a YOLOv8 model to detect construction elements (e.g., outlets, doors) on blueprint images.

## Prerequisites
- At least 5 labeled images (10 or more is better) in `data_labeled/images/`
- A matching YOLO-format label file (`.txt`) for each image in `data_labeled/labels/`
- `data_labeled/dataset.yaml` configured with the correct class names
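
For reference, each line of a YOLO-format detection label describes one box as `class_id x_center y_center width height`, with the four coordinates normalized to the range 0 to 1. The sketch below checks a single line against that layout. `parse_yolo_label_line` and its `num_classes` argument are hypothetical helpers written for this notebook, not part of Ultralytics.

```python
def parse_yolo_label_line(line, num_classes):
    """Parse one YOLO label line into (class_id, x_center, y_center, width, height).

    Raises ValueError if the line does not match the expected layout.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")

    class_id = int(parts[0])  # int() itself raises ValueError on non-integer text
    if not 0 <= class_id < num_classes:
        raise ValueError(f"class id {class_id} is outside 0..{num_classes - 1}")

    coords = [float(p) for p in parts[1:]]
    if any(not 0.0 <= c <= 1.0 for c in coords):
        raise ValueError(f"coordinates must be normalized to [0, 1]: {line!r}")

    return (class_id, *coords)


# Example: class 0 centered in the image, 10% of its width and 5% of its height
print(parse_yolo_label_line("0 0.5 0.5 0.1 0.05", num_classes=2))
```

Running a check like this over every file in `data_labeled/labels/` before training can catch labels exported in pixel coordinates or with class ids that `dataset.yaml` does not define.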

## 1. Import Libraries

In [None]:
from ultralytics import YOLO
import os
from pathlib import Path
import yaml

print("✅ Libraries imported successfully")
print(f"Current working directory: {os.getcwd()}")

## 2. Verify Data Setup

In [None]:
# Check if data directories exist
data_labeled_path = Path('../data_labeled')
images_path = data_labeled_path / 'images'
labels_path = data_labeled_path / 'labels'
dataset_yaml = data_labeled_path / 'dataset.yaml'

print("📁 Checking data directories...")
print(f"Images directory exists: {images_path.exists()}")
print(f"Labels directory exists: {labels_path.exists()}")
print(f"Dataset YAML exists: {dataset_yaml.exists()}")

# Count images and labels.
# Start from empty lists so the checks below still work if a directory is missing.
image_files = []
label_files = []

if images_path.exists():
    image_files = [p for ext in ('*.png', '*.jpg', '*.jpeg') for p in images_path.glob(ext)]
    print(f"\n📊 Found {len(image_files)} images")

if labels_path.exists():
    label_files = list(labels_path.glob('*.txt'))
    print(f"📊 Found {len(label_files)} label files")

if len(image_files) < 5:
    print("\n⚠️  WARNING: You have fewer than 5 images. Consider adding more for better training.")
else:
    print("\n✅ Data setup looks good!")

# Display dataset configuration
if dataset_yaml.exists():
    with open(dataset_yaml, 'r') as f:
        config = yaml.safe_load(f)
    print("\n📋 Dataset Configuration:")
    print(f"Classes: {config.get('names', {})}")
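
Counting files does not show whether every image actually has a label file with the same name. The following is a minimal sketch of such a pairing check. `find_unpaired` is a hypothetical helper, and it assumes images and labels share a file stem (for example `plan_01.png` and `plan_01.txt`).

```python
from pathlib import Path

IMAGE_EXTENSIONS = {'.png', '.jpg', '.jpeg'}


def find_unpaired(images_dir, labels_dir):
    """Return (images_without_labels, labels_without_images) as sorted lists of file stems."""
    images_dir, labels_dir = Path(images_dir), Path(labels_dir)
    image_stems = {p.stem for p in images_dir.iterdir() if p.suffix.lower() in IMAGE_EXTENSIONS}
    label_stems = {p.stem for p in labels_dir.glob('*.txt')}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


# Only run the check when both directories are present
images_dir = Path('../data_labeled/images')
labels_dir = Path('../data_labeled/labels')
if images_dir.is_dir() and labels_dir.is_dir():
    missing_labels, orphan_labels = find_unpaired(images_dir, labels_dir)
    print(f"Images without labels: {missing_labels}")
    print(f"Labels without images: {orphan_labels}")
```

An image without a label file is not necessarily an error, since a sheet with no objects of interest may legitimately have none, but an unexpected entry in either list usually points to a naming mismatch.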

## 3. Load Pre-trained YOLOv8 Model

We'll use YOLOv8n (nano), the smallest and fastest YOLOv8 variant, which keeps training quick for an MVP.

In [None]:
# Load a pre-trained YOLOv8n model
print("🔄 Loading pre-trained YOLOv8n model...")
model = YOLO('yolov8n.pt')  # This will download the model if not present
print("✅ Model loaded successfully!")

## 4. Train the Model

Training parameters:
- **epochs=20**: Number of full passes over the training set (enough for an MVP)
- **imgsz=640**: Target image size in pixels for training
- **batch=8**: Batch size (reduce it if you run out of memory)
- **patience=5**: Stop early if validation metrics do not improve for 5 consecutive epochs
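
To make the `patience` setting concrete, here is a minimal sketch of patience-based early stopping. It illustrates the general idea rather than Ultralytics' exact implementation. `epochs_until_stop` is a hypothetical function and the scores are made up.

```python
def epochs_until_stop(fitness_per_epoch, patience):
    """Return the 1-based epoch at which training would stop.

    Training stops once `patience` consecutive epochs pass without beating
    the best fitness seen so far; otherwise it runs to the last epoch.
    """
    best = float('-inf')
    epochs_since_best = 0
    for epoch, fitness in enumerate(fitness_per_epoch, start=1):
        if fitness > best:
            best = fitness
            epochs_since_best = 0
        else:
            epochs_since_best += 1
        if epochs_since_best >= patience:
            return epoch
    return len(fitness_per_epoch)


# Made-up validation scores: the best value arrives at epoch 3 and is never beaten
scores = [0.10, 0.20, 0.30, 0.29, 0.28, 0.30, 0.27, 0.26, 0.31]
print(epochs_until_stop(scores, patience=5))  # prints 8
```

In this example the improvement at epoch 9 is never reached, which is the trade-off of a small `patience` value: shorter runs, at the risk of stopping before a late improvement.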

In [None]:
# Train the model
print("🚀 Starting training...\n")

results = model.train(
    data='../data_labeled/dataset.yaml',  # Path to dataset configuration
    epochs=20,                             # Number of training epochs
    imgsz=640,                             # Image size
    batch=8,                               # Batch size (reduce if out of memory)
    name='takeoff_mvp',                    # Experiment name
    patience=5,                            # Early stopping patience
    save=True,                             # Save checkpoints
    plots=True,                            # Generate training plots
    device='cpu'                           # Use 'cuda' if you have GPU, 'cpu' otherwise
)

print("\n✅ Training completed!")
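
Rather than hard-coding `device='cpu'`, the device can be chosen at runtime. The sketch below assumes PyTorch's `torch.cuda.is_available()` is the right availability check for your setup. `pick_device` is a hypothetical helper, and the import fallback only exists so the snippet also runs where PyTorch is not installed.

```python
def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so fall back to the CPU
        return 'cpu'
    return 'cuda' if torch.cuda.is_available() else 'cpu'


device = pick_device()
print(f"Training would run on: {device}")
```

The returned value can then be passed to training as `device=device`.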

## 5. Save the Trained Model

In [None]:
# The best model is automatically saved during training
# Let's copy it to our models directory for easy access

import shutil

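# By default the run is written to runs/detect/takeoff_mvp/, but Ultralytics appends a
# numeric suffix (takeoff_mvp2, ...) when that folder already exists. Prefer the
# directory recorded by the trainer and fall back to the default path.
trainer = getattr(model, 'trainer', None)
run_dir = Path(trainer.save_dir) if trainer is not None else Path('runs/detect/takeoff_mvp')
source_model = run_dir / 'weights' / 'best.pt'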
target_model = Path('../models/best.pt')

if source_model.exists():
    target_model.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(source_model, target_model)
    print(f"✅ Model saved to: {target_model.absolute()}")
    print(f"📊 Model size: {target_model.stat().st_size / (1024*1024):.2f} MB")
else:
    print("⚠️  Could not find trained model. Check training output above.")

## 6. View Training Results

In [None]:
from IPython.display import Image, display
import matplotlib.pyplot as plt

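# Locate the run directory. Ultralytics appends a numeric suffix (takeoff_mvp2, ...)
# when the folder already exists, so prefer the directory recorded by the trainer.
trainer = getattr(model, 'trainer', None)
run_dir = Path(trainer.save_dir) if trainer is not None else Path('runs/detect/takeoff_mvp')

# Display training results plot
results_plot = run_dir / 'results.png'

if results_plot.exists():
    print("📈 Training Results:")
    display(Image(filename=str(results_plot)))
else:
    print("⚠️  Results plot not found")

# Display confusion matrix
confusion_matrix_plot = run_dir / 'confusion_matrix.png'
if confusion_matrix_plot.exists():
    print("\n📊 Confusion Matrix:")
    display(Image(filename=str(confusion_matrix_plot)))
else:
    print("⚠️  Confusion matrix not found")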

## 7. Quick Test on Training Data

In [None]:
# Load the trained model
trained_model = YOLO('../models/best.pt')

# Test on one of the training images
if len(image_files) > 0:
    test_image = image_files[0]
    print(f"🔍 Testing on: {test_image.name}")
    
    results = trained_model(str(test_image))
    
    # Display results
    for result in results:
        # Get detection count
        num_detections = len(result.boxes)
        print(f"\n✅ Detected {num_detections} objects")
        
        # Show image with detections
        result_img = result.plot()
        plt.figure(figsize=(12, 8))
        plt.imshow(result_img[..., ::-1])  # Convert BGR to RGB
        plt.axis('off')
        plt.title(f'Detection Results: {num_detections} objects found')
        plt.tight_layout()
        plt.show()
else:
    print("⚠️  No images found for testing")

## Summary

✅ Training complete! Your model is saved at `../models/best.pt`

### Next Steps:
1. Open `test_mvp.ipynb` to run inference on new images
2. If accuracy is low, consider:
   - Adding more labeled training data
   - Training for more epochs
   - Adjusting the confidence threshold
3. Check the training plots above to understand model performance
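
To see what adjusting the confidence threshold does, the sketch below filters a list of detections at several thresholds. The detections are made-up `(label, score)` pairs and `filter_by_confidence` is a hypothetical helper, so treat this as an illustration of the trade-off rather than real model output.

```python
# Made-up detections as (class label, confidence score) pairs
detections = [("outlet", 0.91), ("door", 0.62), ("outlet", 0.34), ("door", 0.12)]


def filter_by_confidence(detections, threshold):
    """Keep only detections whose score is at least `threshold`."""
    return [(label, score) for label, score in detections if score >= threshold]


for threshold in (0.25, 0.5, 0.8):
    kept = filter_by_confidence(detections, threshold)
    print(f"conf >= {threshold}: {len(kept)} detections kept")
```

A higher threshold drops uncertain boxes, which tends to raise precision and lower recall. With the trained model, the threshold can be set at prediction time through the `conf` argument, for example `trained_model(str(test_image), conf=0.5)`.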

### Key Metrics to Watch:
- **mAP50**: Mean Average Precision at an IoU threshold of 0.5 (higher is better)
- **Precision**: Fraction of detections that were correct
- **Recall**: Fraction of actual objects that were detected
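
The definitions above can be sketched in a few lines. The helpers `iou` and `precision_recall` are hypothetical names, boxes are assumed to be `(x1, y1, x2, y2)` corner coordinates, and the counts are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Two 10x10 boxes overlapping by half: intersection 50, union 150
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))

# Made-up counts: 8 correct detections, 2 false alarms, 4 missed objects
print(precision_recall(tp=8, fp=2, fn=4))
```

At the mAP50 threshold, a detection counts as a true positive only when its IoU with a ground-truth box of the same class is at least 0.5, so the first example (IoU of one third) would count as a miss.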