# Demo 1: YOLO Model Fine-Tuning

**Training object detection models for satellite imagery**

## What We're Doing:
- Fine-tune YOLO11 on satellite imagery
- Train for ship detection (ports) and vehicle detection (retail)
- Quick demo: a 2-epoch training run (with a simulated 5-epoch log as fallback) to show the process

---

In [None]:
# Setup - Works both locally and in SageMaker
import sys
import os
from pathlib import Path

# Install dependencies in SageMaker
IS_SAGEMAKER = os.path.exists('/home/ec2-user/SageMaker') or os.environ.get('SM_MODEL_DIR') is not None

if IS_SAGEMAKER:
 print('Installing dependencies...')
 import subprocess
 # Use the kernel's own interpreter so packages land in the active environment
 subprocess.run([sys.executable, '-m', 'pip', 'install', 'ultralytics', 'opencv-python-headless', '-q'], check=True)
 print('Dependencies installed')

# Core imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# YOLO import
try:
 from ultralytics import YOLO
 YOLO_AVAILABLE = True
except ImportError:
 YOLO_AVAILABLE = False
 print(' YOLO not available - run: pip install ultralytics')

# Environment detection
if IS_SAGEMAKER:
 PROJECT_ROOT = Path('/home/ec2-user/SageMaker/Real-Time-Economic-Forecasting')
 USE_S3 = True
 print(' Running in AWS SageMaker')
else:
 PROJECT_ROOT = Path.cwd().parent.parent
 USE_S3 = False
 print(' Running locally')

# ===========================================
# S3 BUCKET CONFIGURATION (ACTUAL STRUCTURE)
# ===========================================
S3_RAW = 'economic-forecast-raw'
S3_MODELS = 'economic-forecast-models'
S3_PROCESSED = 'economic-forecast-processed'

# S3 Paths (matching actual bucket structure)
S3_PATHS = {
 'satellite': f's3://{S3_RAW}/satellite/google_earth',
 'port_la_images': f's3://{S3_RAW}/satellite/google_earth/Port_of_LA',
 'mall_images': f's3://{S3_RAW}/satellite/google_earth/Mall_of_america',
 'models': f's3://{S3_MODELS}/yolo',
 'port_model': f's3://{S3_MODELS}/yolo/ports/best.pt',
 'retail_model': f's3://{S3_MODELS}/yolo/retail/best.pt',
 'city_model': f's3://{S3_MODELS}/yolo/city/best.pt',
 'ais': f's3://{S3_PROCESSED}/ais',
 'ais_la': f's3://{S3_PROCESSED}/ais/Port_of_LA_ais_features.csv',
 'detections': f's3://{S3_PROCESSED}/detections',
 'news': f's3://{S3_RAW}/news/sentiment/data',
}

# Local paths
LOCAL_PATHS = {
 'satellite': PROJECT_ROOT / 'data' / 'raw' / 'satellite' / 'google_earth',
 'port_la_images': PROJECT_ROOT / 'data' / 'raw' / 'satellite' / 'google_earth' / 'Port_of_LA',
 'mall_images': PROJECT_ROOT / 'data' / 'raw' / 'satellite' / 'google_earth' / 'Mall_of_america',
 'models': PROJECT_ROOT / 'data' / 'models' / 'satellite',
 'port_model': PROJECT_ROOT / 'data' / 'models' / 'satellite' / 'ports_dota_yolo11_20251127_013205' / 'weights' / 'best.pt',
 'retail_model': PROJECT_ROOT / 'data' / 'models' / 'satellite' / 'retail_yolo11_20251126_150811' / 'weights' / 'best.pt',
 'ais': PROJECT_ROOT / 'data' / 'processed' / 'ais',
 'ais_la': PROJECT_ROOT / 'data' / 'processed' / 'ais' / 'Port_of_LA_ais_features.csv',
 'detections': PROJECT_ROOT / 'results' / 'annotations',
}

def get_path(key):
 '''Get path - S3 or local based on environment.'''
 if USE_S3:
  return S3_PATHS.get(key, S3_PATHS.get('satellite'))
 else:
  return LOCAL_PATHS.get(key, LOCAL_PATHS.get('satellite'))

def download_model(model_type='port'):
 '''Download model from S3 to local temp for inference.'''
 if not USE_S3:
  # Return local path
  if model_type == 'port':
   return LOCAL_PATHS['port_model']
  elif model_type == 'retail':
   return LOCAL_PATHS['retail_model']
  return None
 
 import boto3
 import tempfile
 
 s3 = boto3.client('s3')
 
 model_keys = {
  'port': 'yolo/ports/best.pt',
  'retail': 'yolo/retail/best.pt',
  'city': 'yolo/city/best.pt',
 }
 
 key = model_keys.get(model_type)
 if not key:
  print(f' Unknown model type: {model_type}')
  return None
 
 local_path = Path(tempfile.gettempdir()) / f'{model_type}_best.pt'
 
 if not local_path.exists():
  print(f' Downloading {model_type} model from S3...')
  s3.download_file(S3_MODELS, key, str(local_path))
  print(f' Model saved to {local_path}')
 else:
  print(f' Using cached model: {local_path}')
 
 return local_path

def list_s3_images(prefix):
 '''List images in S3 bucket.'''
 import boto3
 s3 = boto3.client('s3')
 
 # Parse bucket and prefix from s3:// path
 if prefix.startswith('s3://'):
  parts = prefix.replace('s3://', '').split('/', 1)
  bucket = parts[0]
  prefix = parts[1] if len(parts) > 1 else ''
 else:
  bucket = S3_RAW
 
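 # Paginate: a single list_objects_v2 call returns at most 1,000 keys
 paginator = s3.get_paginator('list_objects_v2')
 
 images = []
 for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
  for obj in page.get('Contents', []):
   key = obj['Key']
   # Match extensions case-insensitively (.JPG, .TIF, ...)
   if key.lower().endswith(('.jpg', '.jpeg', '.png', '.tif')):
    images.append(f's3://{bucket}/{key}')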
 
 return images

def download_image(s3_path, local_dir='/tmp'):
 '''Download single image from S3.'''
 import boto3
 s3 = boto3.client('s3')
 
 parts = s3_path.replace('s3://', '').split('/', 1)
 bucket = parts[0]
 key = parts[1]
 
 filename = key.split('/')[-1]
 local_path = Path(local_dir) / filename
 
 s3.download_file(bucket, key, str(local_path))
 return local_path

print(f' Setup complete | S3: {USE_S3} | YOLO: {YOLO_AVAILABLE}')
print(f' Project: {PROJECT_ROOT}')


---
## Understanding YOLO Architecture

**YOLO (You Only Look Once)** - Real-time object detection

```
Input Image → Backbone (Feature Extraction) → Neck → Head → Detections
                                                                ↓
                                                    [class, x, y, w, h, conf]
```
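
Each detection in the diagram above carries a class, a box given by its center and size, and a confidence score. As a minimal sketch, assuming the coordinates are normalized to the 0-1 range (as in YOLO label files), a center-format box can be converted to pixel corner coordinates as follows. The function name `xywh_to_xyxy` and the sample values are illustrative only.

```python
def xywh_to_xyxy(x, y, w, h, img_w, img_h):
    """Convert a normalized center-format box to pixel corner coordinates."""
    x1 = (x - w / 2) * img_w
    y1 = (y - h / 2) * img_h
    x2 = (x + w / 2) * img_w
    y2 = (y + h / 2) * img_h
    return x1, y1, x2, y2

# Hypothetical detection laid out as [class, x, y, w, h, conf]
detection = [0, 0.5, 0.5, 0.25, 0.5, 0.9]
cls_id, x, y, w, h, conf = detection
box = xywh_to_xyxy(x, y, w, h, img_w=640, img_h=640)
print(cls_id, box, conf)
```

Corner coordinates are the form most drawing and cropping routines expect, which is why this conversion usually sits between the detector and any visualization step.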

In [None]:
# Load pre-trained YOLO11 model (skipped if ultralytics failed to import)
if YOLO_AVAILABLE:
    print("Loading YOLO11 base model...")
    model = YOLO('yolo11n.pt')  # nano version for fast demo
else:
    model = None
    print("YOLO not available - skipping model load")

print("\nModel Architecture:")
print(" • Model: YOLO11-nano")
print(" • Parameters: ~2.6M")
print(" • Pre-trained on: COCO dataset (80 classes)")
print(" • Our task: Fine-tune for satellite imagery")

---
## Training Datasets

We use two specialized datasets:

| Dataset | Purpose | Classes |
|---------|---------|--------|
| **DOTA** | Aerial/satellite objects | ship, harbor, storage-tank, vehicle |
| **xView** | Overhead imagery | ships, vehicles, buildings |
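
DOTA publishes its annotations as oriented quadrilaterals (eight corner coordinates, a category name, and a difficulty flag), while the labels used in this notebook are axis-aligned YOLO boxes. The conversion used for these models is not shown here; one plausible approach, sketched below under that assumption, takes the bounding rectangle of the four corners and normalizes it by the image size. The function name `dota_to_yolo` and the sample line are hypothetical; the class IDs follow the dataset config shown later in this notebook.

```python
# Class IDs follow the dataset config used later in this notebook.
CLASS_IDS = {'ship': 0, 'storage-tank': 1, 'harbor': 2,
             'large-vehicle': 3, 'small-vehicle': 4}

def dota_to_yolo(line, img_w, img_h):
    """Convert one DOTA annotation line to an axis-aligned YOLO label line.

    Returns None for categories that are not in CLASS_IDS.
    """
    parts = line.split()
    coords = [float(v) for v in parts[:8]]
    category = parts[8]
    if category not in CLASS_IDS:
        return None
    xs, ys = coords[0::2], coords[1::2]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Normalize center and size to the 0-1 range expected by YOLO labels
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{CLASS_IDS[category]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Hypothetical DOTA line: four polygon corners, category, difficulty flag
label = dota_to_yolo("100 200 300 200 300 400 100 400 ship 0", img_w=1000, img_h=1000)
print(label)
```

Taking the bounding rectangle discards the rotation angle, so boxes around diagonally moored ships will include extra background.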

In [None]:
# Show dataset structure
print(" TRAINING DATA STRUCTURE")
print("="*50)
print("""
data/models/
├── satellite/   # Port detection model
│ ├── train/
│ │ ├── images/  # Satellite images
│ │ └── labels/  # YOLO format annotations
│ └── valid/
│
└── retail/    # Vehicle detection model 
 ├── train/
 │ ├── images/  # Parking lot images
 │ └── labels/  # Car annotations
 └── valid/

Label Format (YOLO):
 class_id x_center y_center width height
 0   0.45  0.32  0.12 0.08
""")

---
## Fine-Tuning Demo (Quick Training)

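**For demo purposes**: the training cell below runs only 2 epochs. If no training data is found, it prints a simulated 5-epoch log instead.

In production, the models were trained for 84 to 120 epochs.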

In [None]:
# Show a minimal dataset config for the demo (printed for reference only;
# the training cell below reads data.yaml from disk)
demo_yaml = """
# Demo training config
path: ../data/models/satellite
train: train/images
val: valid/images

names:
 0: ship
 1: storage-tank
 2: harbor
 3: large-vehicle
 4: small-vehicle
"""

print(" Dataset Configuration:")
print(demo_yaml)

In [None]:
# DEMO: Train for just 2 epochs to show the process
# NOTE: The production models were trained for 84-120 epochs

print("STARTING DEMO TRAINING")
print("="*50)
print("Demo mode: 2 epochs only (production: 84-120)")
print()

# Check if training data exists
train_path = PROJECT_ROOT / 'data' / 'models' / 'satellite' / 'train' / 'images'

if YOLO_AVAILABLE and train_path.exists() and any(train_path.glob('*')):
 # Actual training demo
 model = YOLO('yolo11n.pt')
 results = model.train(
  data=str(PROJECT_ROOT / 'data' / 'models' / 'satellite' / 'data.yaml'),
  epochs=2,   # Just 2 for demo
  imgsz=640,
  batch=4,
  device='cpu',  # Use CPU for compatibility
  verbose=True,
  project=str(PROJECT_ROOT / 'runs' / 'demo'),
  name='satellite_demo'
 )
 print("\n Demo training complete!")
else:
 # Simulated training output
 print(" Simulated Training Progress:")
 print()
 epochs_demo = [
  {'epoch': 1, 'loss': 2.45, 'mAP50': 0.12},
  {'epoch': 2, 'loss': 1.89, 'mAP50': 0.28},
  {'epoch': 3, 'loss': 1.52, 'mAP50': 0.45},
  {'epoch': 4, 'loss': 1.21, 'mAP50': 0.58},
  {'epoch': 5, 'loss': 0.98, 'mAP50': 0.67},
 ]
 
 for e in epochs_demo:
  print(f" Epoch {e['epoch']}/5: loss={e['loss']:.3f}, mAP50={e['mAP50']:.3f}")
 
 print("\n Demo training simulation complete!")

---
## Training Metrics Visualization
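
The curves in this section report mAP@50 and mAP@50-95. A predicted box counts as a correct match for mAP@50 when its intersection over union (IoU) with a ground-truth box is at least 0.5; mAP@50-95 averages the result over IoU thresholds from 0.5 to 0.95 in steps of 0.05. As a minimal sketch of the underlying overlap measure (the function name `iou` and the sample boxes are illustrative, not taken from the training code):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes yield no intersection area
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical ground-truth and predicted boxes in pixels
truth = (0, 0, 100, 100)
pred = (50, 0, 150, 100)
score = iou(truth, pred)
print(f"IoU = {score:.3f}, counts as a match at 0.5: {score >= 0.5}")
```

A prediction shifted by half a box width already falls well below the 0.5 threshold, which helps explain why mAP@50-95 sits so far under mAP@50 for small objects such as cars.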

In [None]:
# Visualize ACTUAL training metrics from our trained models
import pandas as pd
import numpy as np

# Load actual training results
port_results_path = PROJECT_ROOT / 'data' / 'models' / 'satellite' / 'ports_dota_yolo11_20251127_013205' / 'results.csv'
retail_results_path = PROJECT_ROOT / 'data' / 'models' / 'satellite' / 'retail_yolo11_20251126_150811' / 'results.csv'

# Read actual metrics
if port_results_path.exists():
    port_df = pd.read_csv(port_results_path)
    port_df.columns = port_df.columns.str.strip()
    print("Loaded Port Detection training metrics")
else:
    print("Port results not found, skipping port plots")
    port_df = None

if retail_results_path.exists():
    retail_df = pd.read_csv(retail_results_path)
    retail_df.columns = retail_df.columns.str.strip()
    print("Loaded Retail Detection training metrics")
else:
    print("Retail results not found, skipping retail plots")
    retail_df = None

# Plot actual training curves
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Port Model - Loss
ax1 = axes[0, 0]
if port_df is not None:
    ax1.plot(port_df['epoch'], port_df['train/box_loss'], 'b-', linewidth=2, label='Box Loss')
    ax1.plot(port_df['epoch'], port_df['train/cls_loss'], 'r-', linewidth=1.5, alpha=0.7, label='Class Loss')
ax1.set_xlabel('Epoch', fontsize=12)
ax1.set_ylabel('Loss', fontsize=12)
ax1.set_title('Port Detection - Training Loss', fontsize=14, fontweight='bold')
ax1.grid(True, alpha=0.3)
ax1.legend()

# Port Model - mAP
ax2 = axes[0, 1]
if port_df is not None:
    ax2.plot(port_df['epoch'], port_df['metrics/mAP50(B)'] * 100, 'g-', linewidth=2, label='mAP@50')
    ax2.plot(port_df['epoch'], port_df['metrics/mAP50-95(B)'] * 100, 'b-', linewidth=1.5, alpha=0.7, label='mAP@50-95')
ax2.set_xlabel('Epoch', fontsize=12)
ax2.set_ylabel('mAP (%)', fontsize=12)
ax2.set_title('Port Detection - Accuracy (mAP)', fontsize=14, fontweight='bold')
ax2.grid(True, alpha=0.3)
ax2.axhline(y=70, color='r', linestyle='--', alpha=0.5, label='Target (70%)')
ax2.legend()

# Retail Model - Loss
ax3 = axes[1, 0]
if retail_df is not None:
    ax3.plot(retail_df['epoch'], retail_df['train/box_loss'], 'b-', linewidth=2, label='Box Loss')
    ax3.plot(retail_df['epoch'], retail_df['train/cls_loss'], 'r-', linewidth=1.5, alpha=0.7, label='Class Loss')
ax3.set_xlabel('Epoch', fontsize=12)
ax3.set_ylabel('Loss', fontsize=12)
ax3.set_title('Retail Detection - Training Loss', fontsize=14, fontweight='bold')
ax3.grid(True, alpha=0.3)
ax3.legend()

# Retail Model - mAP
ax4 = axes[1, 1]
if retail_df is not None:
    ax4.plot(retail_df['epoch'], retail_df['metrics/mAP50(B)'] * 100, 'g-', linewidth=2, label='mAP@50')
    ax4.plot(retail_df['epoch'], retail_df['metrics/mAP50-95(B)'] * 100, 'b-', linewidth=1.5, alpha=0.7, label='mAP@50-95')
ax4.set_xlabel('Epoch', fontsize=12)
ax4.set_ylabel('mAP (%)', fontsize=12)
ax4.set_title('Retail Detection - Accuracy (mAP)', fontsize=14, fontweight='bold')
ax4.grid(True, alpha=0.3)
ax4.legend()

plt.tight_layout()
plt.show()

# Print final metrics
print("\n" + "="*60)
print("ACTUAL TRAINING RESULTS")
print("="*60)

if port_df is not None:
    # One row per epoch in results.csv, so the row count is the epoch count
    print(f"\nPort Detection Model ({len(port_df)} epochs):")
    print(f"  Final mAP@50:    {port_df['metrics/mAP50(B)'].iloc[-1]*100:.1f}%")
    print(f"  Final mAP@50-95: {port_df['metrics/mAP50-95(B)'].iloc[-1]*100:.1f}%")
    print(f"  Final Precision: {port_df['metrics/precision(B)'].iloc[-1]*100:.1f}%")
    print(f"  Final Recall:    {port_df['metrics/recall(B)'].iloc[-1]*100:.1f}%")

if retail_df is not None:
    # One row per epoch in results.csv, so the row count is the epoch count
    print(f"\nRetail Detection Model ({len(retail_df)} epochs):")
    print(f"  Final mAP@50:    {retail_df['metrics/mAP50(B)'].iloc[-1]*100:.1f}%")
    print(f"  Final mAP@50-95: {retail_df['metrics/mAP50-95(B)'].iloc[-1]*100:.1f}%")
    print(f"  Final Precision: {retail_df['metrics/precision(B)'].iloc[-1]*100:.1f}%")
    print(f"  Final Recall:    {retail_df['metrics/recall(B)'].iloc[-1]*100:.1f}%")

---
## Our Trained Models

We have 3 fine-tuned models ready:

In [None]:
# Show our ACTUAL trained models with real metrics
print("TRAINED MODELS - ACTUAL RESULTS")
print("="*60)

models_info = [
    {
        'name': 'Port Detection (DOTA)',
        'file': 'ports_dota_yolo11_20251127_013205/weights/best.pt',
        'classes': ['ship', 'storage-tank', 'harbor', 'large-vehicle', 'small-vehicle'],
        'epochs': 120,
        'mAP50': 71.5,
        'mAP50_95': 46.6,
        'precision': 80.2,
        'recall': 67.5,
        'training_time': '17 hours',
        'use': 'Port of LA satellite images'
    },
    {
        'name': 'Retail Detection',
        'file': 'retail_yolo11_20251126_150811/weights/best.pt',
        'classes': ['car', 'truck', 'bus'],
        'epochs': 94,
        'mAP50': 41.5,
        'mAP50_95': 17.7,
        'precision': 39.6,
        'recall': 55.7,
        'training_time': '3 hours',
        'use': 'Mall of America parking lots'
    },
    {
        'name': 'City Detection',
        'file': 'city_yolo11_20251127_184743/weights/best.pt',
        'classes': ['building', 'vehicle', 'road'],
        'epochs': 84,
        'mAP50': 29.1,
        'mAP50_95': 13.0,
        'precision': 32.9,
        'recall': 36.1,
        'training_time': '6 hours',
        'use': 'Urban area detection'
    }
]

for m in models_info:
    print(f"\n{m['name']}")
    print(f"  File: {m['file']}")
    print(f"  Classes: {', '.join(m['classes'])}")
    print(f"  Epochs: {m['epochs']}")
    print(f"  mAP@50: {m['mAP50']:.1f}%")
    print(f"  mAP@50-95: {m['mAP50_95']:.1f}%")
    print(f"  Precision: {m['precision']:.1f}%")
    print(f"  Recall: {m['recall']:.1f}%")
    print(f"  Training Time: {m['training_time']}")
    print(f"  Use: {m['use']}")

# Summary table
print("\n" + "="*70)
print("SUMMARY TABLE")
print("="*70)
# Name column is 22 wide so 'Port Detection (DOTA)' does not break alignment
print(f"{'Model':<22} {'mAP@50':>10} {'mAP@50-95':>12} {'Precision':>12} {'Recall':>10}")
print("-"*70)
for m in models_info:
    print(f"{m['name']:<22} {m['mAP50']:>9.1f}% {m['mAP50_95']:>11.1f}% {m['precision']:>11.1f}% {m['recall']:>9.1f}%")

---
## Summary

### Training Results:
| Model | mAP@50 | Epochs | Training Time |
|-------|--------|--------|---------------|
| **Port Detection** | 71.5% | 120 | 17 hours |
| **Retail Detection** | 41.5% | 94 | 3 hours |
| **City Detection** | 29.1% | 84 | 6 hours |

### Key Takeaways:
1. **Port Detection** achieves the best accuracy (71.5% mAP@50), making it well suited to ship counting
2. **YOLO11** was fine-tuned on the DOTA dataset for satellite imagery
3. **Tiled detection** handles high-resolution images (8000x5000 pixels)
4. Models are stored in S3 for AWS Lambda inference
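
The tiling used for these models is not shown in this notebook. As a rough sketch of the idea, assuming 640-pixel tiles (matching the `imgsz=640` training setting) and a 128-pixel overlap chosen purely for illustration, the windows for an 8000x5000 image could be generated like this. The function name `tile_windows` and both parameters are hypothetical.

```python
def tile_windows(img_w, img_h, tile=640, overlap_px=128):
    """Return (x1, y1, x2, y2) windows that cover the image with overlap."""
    stride = tile - overlap_px

    def starts(size):
        # A single window suffices when the image fits inside one tile
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # final window sits flush with the edge
        return positions

    return [(x, y, min(x + tile, img_w), min(y + tile, img_h))
            for y in starts(img_h) for x in starts(img_w)]

windows = tile_windows(8000, 5000)
print(len(windows), windows[0], windows[-1])
```

Each window would be run through the detector separately and its boxes shifted back by the window's `(x1, y1)` offset. The overlap lets objects that straddle a tile border appear whole in at least one tile, at the cost of duplicate detections that need to be merged afterwards.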

### Next Step:
**Demo 2**: Use these models to detect objects in satellite images

In [None]:
print("="*60)
print(" Demo 1 Complete: YOLO Fine-Tuning")
print("="*60)
print("\n Next: Demo_2_Object_Detection.ipynb")