# Mamba-YOLO Training from Scratch - Google Colab

**Notebook for training Mamba-YOLO from scratch, without pre-trained weights**

## Overview
- Trains from scratch on COCO-1000, a 1,000-image person-only training set built from COCO128
- GPU requirement: Tesla T4 or better
- Estimated time: 2-3 hours for setup and training

## Quick Start
1. Upload this notebook to Google Colab
2. Runtime → Change runtime type → GPU
3. Run all cells in order (Runtime → Run all)

## Step 1: Verify GPU

In [2]:
import torch
import sys

print('System Information:')
print(f'  Python: {sys.version.split()[0]}')
print(f'  PyTorch: {torch.__version__}')
print(f'  CUDA Available: {torch.cuda.is_available()}')

if torch.cuda.is_available():
    print(f'  CUDA Version: {torch.version.cuda}')
    print(f'  GPU: {torch.cuda.get_device_name(0)}')
    print(f'  GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB')
    print('\nStatus: GPU Ready')
else:
    print('\nERROR: GPU not detected!')
    print('Please enable GPU: Runtime > Change runtime type > GPU')
    raise RuntimeError('GPU required')

System Information:
  Python: 3.12.12
  PyTorch: 2.8.0+cu126
  CUDA Available: False

ERROR: GPU not detected!
Please enable GPU: Runtime > Change runtime type > GPU


RuntimeError: GPU required

## Step 2: Clone Repository

In [None]:
# Clone Mamba-YOLO repository
!git clone https://github.com/HZAI-ZJNU/Mamba-YOLO.git
%cd Mamba-YOLO
!ls -la

## Step 3: Install PyTorch 2.3.0 + CUDA 12.1

In [None]:
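# Install the PyTorch build required by the repository.
# If torch was already imported in this session (Step 1), restart the runtime
# afterwards so that Step 4 picks up the newly installed version.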
!pip3 install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121

## Step 4: Verify PyTorch + CUDA

In [None]:
import torch

print('PyTorch Verification:')
print(f'  Version: {torch.__version__}')
print(f'  CUDA Available: {torch.cuda.is_available()}')
print(f'  CUDA Version: {torch.version.cuda}')

if torch.cuda.is_available():
    print(f'  GPU: {torch.cuda.get_device_name(0)}')
    print('\nStatus: PyTorch + CUDA OK')
else:
    raise RuntimeError('CUDA not available!')
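
The Step 1 output above shows Colab shipping PyTorch 2.8.0+cu126, while Step 3 pins 2.3.0 built for CUDA 12.1. The sketch below is not part of the original notebook: it splits a version string into its release and CUDA tag so the installed build can be compared against the pin. `split_torch_version` and `matches_pin` are hypothetical helper names.

```python
def split_torch_version(version: str) -> tuple[str, str]:
    """Split '2.3.0+cu121' into ('2.3.0', 'cu121'); the tag is '' if absent."""
    release, _, local_tag = version.partition('+')
    return release, local_tag


def matches_pin(version: str, release: str = '2.3.0', cuda_tag: str = 'cu121') -> bool:
    """Return True when the version string matches the build pinned in Step 3."""
    return split_torch_version(version) == (release, cuda_tag)


# Version strings taken from the Step 1 output and the Step 3 install command.
print(matches_pin('2.8.0+cu126'))  # Colab default build
print(matches_pin('2.3.0+cu121'))  # build installed in Step 3
```

In the notebook, pass `torch.__version__` to `matches_pin`. A `False` result after Step 3 may simply mean the runtime still holds the previously imported version and needs a restart.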

## Step 5: Install Dependencies

In [None]:
# Install required libraries
!pip install seaborn thop timm einops

## Step 6: Install Selective Scan (CUDA Extension)

This step compiles the CUDA extensions for the Mamba SSM. Wait for it to finish (10-20 minutes).

In [None]:
import os
import time

print('Installing Selective Scan (CUDA Extension)...')
print('This will take 10-20 minutes. Please wait.\n')

# Set the CUDA architectures to compile for, for compatibility across GPU types
os.environ['TORCH_CUDA_ARCH_LIST'] = '7.0;7.5;8.0;8.6;8.9;9.0'

# Install selective_scan
%cd selective_scan

start_time = time.time()
!pip install -v . 2>&1 | tee /tmp/selective_scan_install.log
elapsed = time.time() - start_time

%cd ..

print(f'\nInstallation time: {elapsed/60:.1f} minutes')

# Verify the installation by importing the CUDA modules that are actually used
print('\nVerifying CUDA modules...')
cuda_modules_ok = True

try:
    import selective_scan_cuda_core
    print('  [OK] selective_scan_cuda_core')
except ImportError as e:
    print(f'  [FAIL] selective_scan_cuda_core: {e}')
    cuda_modules_ok = False

try:
    import selective_scan_cuda_oflex
    print('  [OK] selective_scan_cuda_oflex')
except ImportError as e:
    print(f'  [FAIL] selective_scan_cuda_oflex: {e}')
    cuda_modules_ok = False

try:
    import selective_scan_cuda_ndstate
    print('  [OK] selective_scan_cuda_ndstate')
except ImportError as e:
    print(f'  [FAIL] selective_scan_cuda_ndstate: {e}')
    cuda_modules_ok = False

if cuda_modules_ok:
    print('\nStatus: Selective Scan CUDA modules installed successfully')
else:
    print('\nERROR: Some CUDA modules failed to compile')
    print('Check log: /tmp/selective_scan_install.log')
    raise ImportError('Selective Scan installation incomplete')
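
The cell above compiles for six GPU architectures, which likely contributes to the long build. One option, sketched here under the assumption that the extension only needs to run on the GPU attached to this session, is to build for that single architecture. `arch_list_for` and `detect_arch_list` are hypothetical helpers; `torch.cuda.get_device_capability` is the PyTorch call that reports the `(major, minor)` pair.

```python
def arch_list_for(capability: tuple[int, int]) -> str:
    """Format a (major, minor) compute capability as a TORCH_CUDA_ARCH_LIST entry."""
    major, minor = capability
    return f'{major}.{minor}'


def detect_arch_list(device_index: int = 0) -> str:
    """Query the attached GPU; torch is imported lazily, only when this is called."""
    import torch
    return arch_list_for(torch.cuda.get_device_capability(device_index))


# A Tesla T4 reports compute capability (7, 5).
print(arch_list_for((7, 5)))  # → 7.5
```

To use it, set `os.environ['TORCH_CUDA_ARCH_LIST'] = detect_arch_list()` before running `pip install`. The trade-off is that the compiled extension would then target only that GPU family.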

## Step 7: Install Ultralytics (Mamba-YOLO)

In [None]:
# Install ultralytics in development (editable) mode
!pip install -e .

print('\nStatus: Ultralytics installed')

## Step 8: Final Verification

In [None]:
import torch
import selective_scan_cuda_core
from ultralytics import YOLO

print('Final Verification:')
print(f'  PyTorch: {torch.__version__}')
print(f'  CUDA: {torch.version.cuda}')
print(f'  GPU: {torch.cuda.get_device_name(0)}')
print('  Selective Scan CUDA: OK')
print('  Ultralytics: OK')

# Test load model
try:
    model = YOLO('ultralytics/cfg/models/mamba-yolo/Mamba-YOLO-T.yaml')
    print('  Mamba-YOLO-T: OK')
    print('\nStatus: All components ready')
except Exception as e:
    print(f'  Model load error: {e}')

---

## Step 9: Prepare COCO Dataset (Person Detection Only)

Download COCO128, keep only the person class, and duplicate images to reach 1,000 training images.


In [None]:
import os
import yaml
from pathlib import Path
import shutil

print('Preparing COCO-1000 Dataset (Person Detection Only)...')
print('=' * 60)

# Create directory structure
base_dir = Path('coco1000_person')
for split in ['train', 'val']:
    (base_dir / 'images' / split).mkdir(parents=True, exist_ok=True)
    (base_dir / 'labels' / split).mkdir(parents=True, exist_ok=True)

# Download COCO128 dataset
print('\nDownload COCO128 dataset...')
!wget -q https://ultralytics.com/assets/coco128.zip
!unzip -q coco128.zip

# Function to filter person annotations (class 0)
def filter_person_label(label_path):
    """Read label file and keep only person annotations (class 0)"""
    if not label_path.exists():
        return None
    
    person_lines = []
    with open(label_path, 'r') as f:
        for line in f:
            parts = line.strip().split()
            if parts and int(parts[0]) == 0:  # Class 0 = person
                person_lines.append(line.strip())
    
    return person_lines if person_lines else None

# Collect all 128 images from COCO128
source_images = list(Path('coco128/images/train2017').glob('*.jpg'))
print(f'Found {len(source_images)} images in COCO128')

# Keep only the images that have person annotations
person_images = []
for img_path in source_images:
    label_path = Path('coco128/labels/train2017') / img_path.with_suffix('.txt').name
    person_annotations = filter_person_label(label_path)
    if person_annotations:
        person_images.append((img_path, person_annotations))

print(f'Found {len(person_images)} images with person annotations')

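# Target: 1000 images for train, 200 for val
train_target = 1000
val_target = 200
total_available = len(person_images)
if total_available == 0:
    # Without this guard the copy loops below would never terminate
    raise RuntimeError('No images with person annotations were found in COCO128')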

print(f'\nCreating dataset with {train_target} train + {val_target} val images...')
print(f'Using {total_available} unique person images (will duplicate to reach target)')

# Copy images for training (duplicating when needed)
train_count = 0
while train_count < train_target:
    for img_path, person_annotations in person_images:
        if train_count >= train_target:
            break
        
        # Add a suffix to keep filenames unique when duplicating
        suffix = f"_{train_count // total_available}" if train_count >= total_available else ""
        new_name = img_path.stem + suffix + img_path.suffix
        
        # Copy image
        shutil.copy(img_path, base_dir / 'images' / 'train' / new_name)
        
        # Save filtered person annotations
        new_label = img_path.stem + suffix + '.txt'
        label_save_path = base_dir / 'labels' / 'train' / new_label
        with open(label_save_path, 'w') as f:
            for line in person_annotations:
                f.write(line + '\n')
        
        train_count += 1

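# Copy images for validation. These are drawn from the same source images as the
# training set, so validation metrics will be optimistic.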
val_count = 0
while val_count < val_target:
    for img_path, person_annotations in person_images:
        if val_count >= val_target:
            break
        
        suffix = f"_v{val_count // total_available}" if val_count >= total_available else "_v"
        new_name = img_path.stem + suffix + img_path.suffix
        
        # Copy image
        shutil.copy(img_path, base_dir / 'images' / 'val' / new_name)
        
        # Save filtered person annotations
        new_label = img_path.stem + suffix + '.txt'
        label_save_path = base_dir / 'labels' / 'val' / new_label
        with open(label_save_path, 'w') as f:
            for line in person_annotations:
                f.write(line + '\n')
        
        val_count += 1

# Verify counts
actual_train = len(list((base_dir / 'images' / 'train').glob('*.jpg')))
actual_val = len(list((base_dir / 'images' / 'val').glob('*.jpg')))
actual_train_labels = len(list((base_dir / 'labels' / 'train').glob('*.txt')))
actual_val_labels = len(list((base_dir / 'labels' / 'val').glob('*.txt')))

print(f'\nDataset Created:')
print(f'  Train images: {actual_train} (labels: {actual_train_labels})')
print(f'  Val images: {actual_val} (labels: {actual_val_labels})')
print(f'  Total: {actual_train + actual_val}')

# Create dataset YAML for single class (person only)
dataset_yaml = {
    'path': str(base_dir.absolute()),
    'train': 'images/train',
    'val': 'images/val',
    'nc': 1,  # Number of classes = 1 (person only)
    'names': {
        0: 'person'
    }
}

yaml_path = base_dir / 'coco1000_person.yaml'
with open(yaml_path, 'w') as f:
    yaml.dump(dataset_yaml, f, default_flow_style=False)

print(f'\nDataset YAML: {yaml_path}')
print('=' * 60)
print('\nStatus: Person detection dataset ready!')
print(f'\nDataset Details:')
print(f'  Classes: 1 (person only)')
print(f'  Unique person images: {total_available}')
print(f'  Total images (with duplicates): {actual_train + actual_val}')
print('\nNOTE: This dataset covers person detection only.')
print('For the actual final-year project training, use a full dataset with head annotations.')
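
The cell above draws both the train and val copies from the same `person_images` list, so every validation image also appears in training. One alternative, sketched below, is to split the unique images into disjoint groups first and only then duplicate within each group. `split_unique` is a hypothetical helper, and the 80/20 ratio and fixed seed are assumptions.

```python
import random


def split_unique(items, val_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into disjoint (train, val) lists."""
    if not 0 < val_fraction < 1:
        raise ValueError('val_fraction must be between 0 and 1')
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Stand-in names; in the notebook these would be the (img_path, annotations) pairs.
unique_images = [f'img_{i:03d}.jpg' for i in range(10)]
train_items, val_items = split_unique(unique_images)
print(len(train_items), len(val_items))
print(set(train_items) & set(val_items))  # empty set: no image is in both splits
```

Passing `train_items` to the training copy loop and `val_items` to the validation copy loop would keep the two splits free of shared source images.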

## Step 9b: Download Full COCO (Optional)

**SKIP this cell** - the cell above is enough for a demo.

To train for real on the full dataset, uncomment the code below:

In [None]:
# # Download full COCO train2017 (118K images, ~18GB)
# # WARNING: This downloads a large file and takes a long time!

# print('Downloading full COCO train2017 dataset...')
# print('Size: ~18GB, Time: ~30-60 minutes')
# print('=' * 60)

# # Download images
# !wget http://images.cocodataset.org/zips/train2017.zip
# !unzip -q train2017.zip

# # Download annotations
# !wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
# !unzip -q annotations_trainval2017.zip

# print('\nConverting COCO annotations to YOLO format...')
# # Use the ultralytics converter
# from ultralytics.data.converter import convert_coco

# convert_coco(
#     labels_dir='annotations',
#     save_dir='coco_yolo',
#     use_segments=False,
#     use_keypoints=False,
#     cls91to80=True
# )

# print('\nFull COCO dataset ready!')
# print('Update the path in the dataset YAML to the coco_yolo folder')

## Step 10: Train Mamba-YOLO from Scratch

Train the model from scratch, without pre-trained weights (100 epochs, ~1-2 hours).

In [None]:
from ultralytics import YOLO

# Load the model architecture only (no weights)
model = YOLO('ultralytics/cfg/models/mamba-yolo/Mamba-YOLO-T.yaml')

# Training configuration for the 1000-image person detection dataset
results = model.train(
    data='coco1000_person/coco1000_person.yaml',  # Dataset YAML (1000 person images)
    epochs=100,                          # More epochs for the larger dataset
    imgsz=640,                           # Image size
    batch=8,                             # Batch size (up from 4)
    device='0',                          # GPU device
    project='mamba_scratch',             # Output directory
    name='person_detection',             # Experiment name
    patience=30,                         # Early stopping patience
    save=True,                           # Save checkpoints
    save_period=10,                      # Save every N epochs
    workers=4,                           # Dataloader workers (up from 2)
    optimizer='AdamW',                   # Optimizer
    lr0=0.001,                           # Initial learning rate
    lrf=0.01,                            # Final learning rate factor
    momentum=0.937,                      # Momentum (used as beta1 by AdamW)
    weight_decay=0.0005,                 # Weight decay
    warmup_epochs=5,                     # Warmup epochs (up from 3)
    warmup_momentum=0.8,                 # Warmup momentum
    box=7.5,                             # Box loss weight
    cls=0.5,                             # Class loss weight
    dfl=1.5,                             # DFL loss weight
    plots=True,                          # Generate plots
    verbose=True,                        # Verbose output
    amp=True,                            # Automatic Mixed Precision
    cache=True,                          # Cache images in RAM to speed up training
    single_cls=True                      # Single class mode (person only)
)

print('\nTraining completed!')
print(f'Results saved in: mamba_scratch/person_detection')
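
Colab sessions can disconnect during a run of this length. The notebook later downloads `weights/last.pt`, so a checkpoint is available to continue from. The sketch below assumes this fork keeps upstream Ultralytics' `resume=True` training argument; `find_last_checkpoint` and `resume_training` are hypothetical helper names.

```python
from pathlib import Path


def find_last_checkpoint(run_dir):
    """Return the path to weights/last.pt inside run_dir, or None if it is missing."""
    candidate = Path(run_dir) / 'weights' / 'last.pt'
    return candidate if candidate.is_file() else None


def resume_training(run_dir='mamba_scratch/person_detection'):
    """Resume an interrupted run; ultralytics is imported lazily, only when called."""
    checkpoint = find_last_checkpoint(run_dir)
    if checkpoint is None:
        raise FileNotFoundError(f'No last.pt under {run_dir}; start training from Step 10')
    from ultralytics import YOLO
    model = YOLO(str(checkpoint))
    return model.train(resume=True)  # reuses the settings stored in the checkpoint


# Prints the checkpoint path if a previous run exists, otherwise None.
print(find_last_checkpoint('mamba_scratch/person_detection'))
```

Because the Colab filesystem is wiped when a session ends, this only helps if the run directory was copied somewhere persistent, for example to Google Drive.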

## Step 11: Evaluate Model

### Step 11a: Check Training Results (Alternative)

If the Step 11 evaluation cell further below fails, use this cell to read the training results from the CSV file.

In [None]:
import pandas as pd
from pathlib import Path

# Path to the training results (update to match the experiment name)
results_dir = Path('mamba_scratch/person_detection')

print('Training Results Summary (Person Detection):')
print('=' * 60)

# 1. Check whether results.csv exists
results_csv = results_dir / 'results.csv'
if results_csv.exists():
    df = pd.read_csv(results_csv)
    df.columns = df.columns.str.strip()  # some Ultralytics versions pad column names with spaces
    print('\nLast Epoch Metrics:')
    last_row = df.iloc[-1]
    
    # Show the key metrics
    metrics_to_show = [
        ('metrics/mAP50(B)', 'mAP50'),
        ('metrics/mAP50-95(B)', 'mAP50-95'),
        ('metrics/precision(B)', 'Precision'),
        ('metrics/recall(B)', 'Recall'),
        ('train/box_loss', 'Box Loss'),
        ('train/cls_loss', 'Class Loss'),
        ('train/dfl_loss', 'DFL Loss')
    ]
    
    for col, label in metrics_to_show:
        if col in df.columns:
            print(f'  {label}: {last_row[col]:.4f}')
        elif col.replace('(B)', '') in df.columns:
            # Try without (B) suffix
            print(f'  {label}: {last_row[col.replace("(B)", "")]:.4f}')
    
    print(f'\nTotal epochs: {len(df)}')
    print(f'\nFull results: {results_csv}')
    
    # Show training progress
    if len(df) > 5:
        print('\nTraining Progress (First 5 vs Last 5 epochs):')
        print('First 5 epochs mAP50:', df['metrics/mAP50(B)'].head().mean() if 'metrics/mAP50(B)' in df.columns else 'N/A')
        print('Last 5 epochs mAP50:', df['metrics/mAP50(B)'].tail().mean() if 'metrics/mAP50(B)' in df.columns else 'N/A')
else:
    print('results.csv not found')

# 2. List trained weights
weights_dir = results_dir / 'weights'
if weights_dir.exists():
    print(f'\nTrained Weights:')
    for weight_file in weights_dir.glob('*.pt'):
        size_mb = weight_file.stat().st_size / (1024 * 1024)
        print(f'  {weight_file.name}: {size_mb:.1f} MB')

# 3. Check whether any plots exist
print(f'\nTraining Plots: {results_dir}')
plot_files = list(results_dir.glob('*.png'))
if plot_files:
    print(f'  Found {len(plot_files)} plot files')
    for plot in plot_files[:5]:  # Show first 5
        print(f'    - {plot.name}')
else:
    print('  No plot files found')

print('=' * 60)
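
The summary above reports the last epoch, which is not necessarily the best one when early stopping is in play. As a sketch, the epoch with the highest mAP50 can be picked out of `results.csv` with the standard library alone. `best_epoch` is a hypothetical helper, and the sample rows are synthetic, not results from this notebook.

```python
import csv
import io


def best_epoch(csv_text, metric='metrics/mAP50(B)'):
    """Return (epoch, value) for the row with the highest metric value."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]  # headers may be space-padded
    epoch_col, metric_col = header.index('epoch'), header.index(metric)
    rows = [row for row in reader if row]
    best = max(rows, key=lambda row: float(row[metric_col]))
    return int(float(best[epoch_col])), float(best[metric_col])


# Synthetic rows for illustration only; these are not results from this notebook.
sample = """   epoch,   metrics/mAP50(B)
       1,   0.10
       2,   0.30
       3,   0.25
"""
print(best_epoch(sample))  # → (2, 0.3)
```

For a real run, pass the file contents instead, for example `best_epoch(results_csv.read_text())`.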

In [None]:
# Load trained model
model = YOLO('mamba_scratch/person_detection/weights/best.pt')

# Evaluate on validation set
try:
    metrics = model.val(
        data='coco1000_person/coco1000_person.yaml',
        split='val',
        device='0'
    )
    
    # Print metrics
    print('\nValidation Metrics (Person Detection):')
    print(f'  mAP50: {metrics.box.map50:.4f}')
    print(f'  mAP50-95: {metrics.box.map:.4f}')
    print(f'  Precision: {metrics.box.mp:.4f}')
    print(f'  Recall: {metrics.box.mr:.4f}')
    
except AttributeError as e:
    print(f'\nNote: AttributeError while accessing metrics (ultralytics bug): {e}')
    print('Validation itself has finished. Check the results in: mamba_scratch/person_detection')
    print('\nValidation metrics:')
    print('  mAP50: see results.csv or the console output above')
    print('  The model is still saved and can be used for inference')

## Explanation: 1000-Image Dataset - Person Detection

**Dataset Configuration:**
- Training images: 1000 (person only)
- Validation images: 200 (person only)
- Total: 1200 images
- Source: COCO128, filtered to the person class
- Classes: 1 (person)

**Training Settings:**
- Epochs: 100
- Batch size: 8
- Workers: 4
- Image size: 640x640
- Cache: Enabled (speeds up data loading)
- AMP: Enabled (faster training)
- Single class mode: Enabled

**Expected Results:**
With a single class (person), you **should see better results** than with multi-class training:
- mAP50: ~0.30 - 0.50 (higher because of the single class)
- mAP50-95: ~0.15 - 0.30
- Precision/Recall: higher for person detection
- Training time: ~1-2 hours on a Tesla T4

Because the validation images are copies of the same COCO128 images used for training, these metrics overstate how well the model generalizes to unseen data.

**Advantages of Single Class (Person):**
1. The model is more focused and specific
2. Faster convergence
3. Better results for the specific task
4. Suited to applications such as crowd counting and person tracking

**Important Notes:**
- This dataset only covers person detection (class 0)
- Annotations for other classes have been filtered out
- Suited to a final-year project focused on person/head detection

**For Best Results:**
- Use a real dataset with more unique images
- Train longer: `epochs=300`
- Or start from pre-trained weights: `model = YOLO('yolov8n.pt')` (note that this loads YOLOv8n, not the Mamba-YOLO architecture)
- Fine-tune on your own task-specific dataset (head detection)

## Step 12: Test Inference

In [None]:
from pathlib import Path

import cv2
from google.colab.patches import cv2_imshow
from ultralytics import YOLO

# Load trained model
model = YOLO('mamba_scratch/person_detection/weights/best.pt')

# Test on one of the validation images
test_image = list(Path('coco1000_person/images/val').glob('*.jpg'))[0]
print(f'Testing on: {test_image}')

# Run inference (person detection only)
results = model.predict(
    source=str(test_image),
    device='0',
    conf=0.25,
    iou=0.45,
    save=True,
    project='mamba_scratch',
    name='person_predictions'
)

# Display result (save_dir may be a plain string, so wrap it in Path before joining)
result_img = cv2.imread(str(Path(results[0].save_dir) / test_image.name))
cv2_imshow(result_img)

print(f'\nDetections (Person): {len(results[0].boxes)}')
print(f'Results saved in: mamba_scratch/person_predictions')
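
The `conf=0.25` and `iou=0.45` arguments above control which boxes survive post-processing: low-confidence boxes are discarded, then overlapping boxes are pruned by non-maximum suppression (NMS). The following is a simplified pure-Python sketch of greedy NMS for intuition, not the Ultralytics implementation. The detections are made up.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, conf=0.25, iou_thres=0.45):
    """Keep high-confidence boxes, dropping any that overlap a kept box too much."""
    candidates = sorted((d for d in detections if d[4] >= conf),
                        key=lambda d: d[4], reverse=True)
    kept = []
    for det in candidates:
        if all(iou(det[:4], k[:4]) <= iou_thres for k in kept):
            kept.append(det)
    return kept


# Made-up detections: (x1, y1, x2, y2, score).
detections = [
    (10, 10, 110, 210, 0.90),   # strong detection
    (12, 14, 112, 208, 0.60),   # near-duplicate of the first box
    (300, 50, 380, 220, 0.40),  # separate person
    (500, 60, 560, 200, 0.10),  # below the confidence threshold
]
for det in greedy_nms(detections):
    print(det)
```

Here the second box is suppressed because it overlaps the first almost entirely, and the fourth never reaches NMS because its score is below `conf`. Raising `iou` keeps more overlapping boxes, which can matter in crowded scenes.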

## Step 13: Download Trained Model

In [None]:
from google.colab import files

# Download best model (person detection)
print('Downloading trained model (person detection)...')
files.download('mamba_scratch/person_detection/weights/best.pt')

# Download last checkpoint
print('Downloading last checkpoint...')
files.download('mamba_scratch/person_detection/weights/last.pt')

print('\nDownload complete!')
print('This model was trained for person detection only (single class)')
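
As an alternative to triggering one browser download per file, the weights folder can be bundled into a single archive first. This is a sketch using the standard library's `shutil.make_archive`; `archive_weights` and the archive name are hypothetical.

```python
import shutil
from pathlib import Path


def archive_weights(weights_dir, archive_stem='person_detection_weights'):
    """Zip every file in weights_dir and return the path of the created archive."""
    weights_dir = Path(weights_dir)
    if not weights_dir.is_dir():
        raise FileNotFoundError(f'Weights directory not found: {weights_dir}')
    return Path(shutil.make_archive(archive_stem, 'zip', root_dir=weights_dir))
```

In the notebook this could be combined with the download call, for example `files.download(str(archive_weights('mamba_scratch/person_detection/weights')))`.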

In [None]:
from pathlib import Path

from ultralytics import YOLO
import torch
import time
import numpy as np

print('='*60)
print('📊 MODEL PROFILING - MAMBA-YOLO-T (PERSON DETECTION)')
print('='*60)

# Load trained model
model = YOLO('mamba_scratch/person_detection/weights/best.pt')

# ============================================================================
# 1. MODEL COMPLEXITY (Parameters & GFLOPs)
# ============================================================================
print('\n🔍 1. MODEL COMPLEXITY ANALYSIS')
print('-'*60)

# Get model info using the ultralytics built-in helper
model_info = model.model.info(verbose=False)

# Manual count for more detail
def count_parameters(model):
    """Count total and trainable parameters"""
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total_params, trainable_params

total_params, trainable_params = count_parameters(model.model)

print(f'📦 Model Parameters:')
print(f'   Total Parameters: {total_params:,} ({total_params/1e6:.2f}M)')
print(f'   Trainable Parameters: {trainable_params:,} ({trainable_params/1e6:.2f}M)')

# GFLOPs calculation using thop library
try:
    from copy import deepcopy
    from thop import profile, clever_format
    
    # Profile a copy on the GPU so thop's counters do not alter the loaded model
    profile_model = deepcopy(model.model).to('cuda').eval()
    
    # Create dummy input (batch_size=1, channels=3, height=640, width=640)
    dummy_input = torch.randn(1, 3, 640, 640, device='cuda')
    
    # thop reports multiply-accumulate operations (MACs); 1 MAC is counted as 2 FLOPs.
    # Operations inside custom CUDA kernels may not be included in the count.
    with torch.no_grad():
        macs, params = profile(profile_model, inputs=(dummy_input,), verbose=False)
    flops, params = clever_format([2 * macs, params], "%.3f")
    
    print(f'\n⚡ Computational Complexity:')
    print(f'   FLOPs: {flops}')
    print(f'   Parameters (thop): {params}')
    
except ImportError:
    print('\n⚠️  thop not installed. Install with: pip install thop')
    print('   Skipping GFLOPs calculation')

# ============================================================================
# 2. INFERENCE SPEED (FPS & Latency)
# ============================================================================
print('\n'+ '-'*60)
print('⚡ 2. INFERENCE SPEED BENCHMARK')
print('-'*60)

# Prepare test image
test_image_path = list(Path('coco1000_person/images/val').glob('*.jpg'))[0]

# Warmup (lets the GPU reach a steady state)
print('\n🔥 Warming up GPU...')
for _ in range(10):
    _ = model.predict(test_image_path, device='0', verbose=False)

# Benchmark inference time.
# Each measurement covers the full predict() call: image loading from disk,
# pre-processing, the forward pass, and post-processing.
print('📏 Running speed benchmark (100 iterations)...')
latencies = []
n_iterations = 100

for i in range(n_iterations):
    torch.cuda.synchronize()  # make sure earlier GPU work is not counted
    start_time = time.perf_counter()
    results = model.predict(test_image_path, device='0', verbose=False)
    torch.cuda.synchronize()  # wait for the GPU before stopping the clock
    end_time = time.perf_counter()
    
    latency_ms = (end_time - start_time) * 1000  # Convert to milliseconds
    latencies.append(latency_ms)
    
    if (i + 1) % 20 == 0:
        print(f'   Progress: {i+1}/{n_iterations}')

# Calculate statistics
latencies = np.array(latencies)
mean_latency = np.mean(latencies)
std_latency = np.std(latencies)
min_latency = np.min(latencies)
max_latency = np.max(latencies)
fps = 1000 / mean_latency  # FPS from milliseconds

print(f'\n📊 Inference Speed Results:')
print(f'   Mean Latency: {mean_latency:.2f} ms (± {std_latency:.2f} ms)')
print(f'   Min Latency: {min_latency:.2f} ms')
print(f'   Max Latency: {max_latency:.2f} ms')
print(f'   FPS (Frames Per Second): {fps:.2f}')
print(f'   Throughput: {fps:.2f} images/second (batch size 1)')

# ============================================================================
# 3. MEMORY USAGE
# ============================================================================
print('\n'+ '-'*60)
print('💾 3. GPU MEMORY USAGE')
print('-'*60)

# Get GPU memory info
if torch.cuda.is_available():
    # Memory before inference
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    
    # Run inference
    _ = model.predict(test_image_path, device='0', verbose=False)
    
    # Memory after inference
    memory_allocated = torch.cuda.memory_allocated(0) / (1024**2)  # MB
    memory_reserved = torch.cuda.memory_reserved(0) / (1024**2)    # MB
    max_memory = torch.cuda.max_memory_allocated(0) / (1024**2)    # MB
    
    print(f'📦 GPU Memory ({torch.cuda.get_device_name(0)}):')
    print(f'   Allocated: {memory_allocated:.2f} MB')
    print(f'   Reserved: {memory_reserved:.2f} MB')
    print(f'   Peak Usage: {max_memory:.2f} MB')

# ============================================================================
# 4. MODEL SIZE
# ============================================================================
print('\n'+ '-'*60)
print('📁 4. MODEL FILE SIZE')
print('-'*60)

model_path = Path('mamba_scratch/person_detection/weights/best.pt')
model_size_mb = model_path.stat().st_size / (1024 * 1024)

print(f'💾 Model Weight File:')
print(f'   File: {model_path.name}')
print(f'   Size: {model_size_mb:.2f} MB')

# ============================================================================
# 5. SUMMARY TABLE (for the final-year project report)
# ============================================================================
print('\n' + '='*60)
print('📋 SUMMARY - MAMBA-YOLO-T PROFILING')
print('='*60)

WIDTH = 58  # inner width of the box, in characters

def row(text):
    """Left-align one line of text inside the box."""
    return f'║ {text:<{WIDTH - 2}} ║'

top = '╔' + '═' * WIDTH + '╗'
rule = '╠' + '═' * WIDTH + '╣'
bottom = '╚' + '═' * WIDTH + '╝'
flops_text = flops if 'flops' in locals() else 'N/A'
peak_mb = max_memory if 'max_memory' in locals() else 0

summary_table = '\n'.join([
    '',
    top,
    f'║{"MAMBA-YOLO-T MODEL PROFILING RESULTS":^{WIDTH}}║',
    rule,
    row('Model Architecture: Mamba-YOLO-T (Tiny)'),
    row('Task: Person Detection (Single Class)'),
    row('Input Size: 640x640'),
    row(f'Device: {torch.cuda.get_device_name(0)}'),
    rule,
    row('COMPLEXITY METRICS'),
    rule,
    row(f'Parameters: {total_params/1e6:>6.2f} M'),
    row(f'FLOPs: {flops_text}'),
    row(f'Model Size: {model_size_mb:>6.2f} MB'),
    rule,
    row('SPEED METRICS'),
    rule,
    row(f'Mean Latency: {mean_latency:>6.2f} ms'),
    row(f'FPS: {fps:>6.2f}'),
    row(f'Throughput: {fps:>6.2f} images/sec'),
    rule,
    row('MEMORY USAGE'),
    rule,
    row(f'GPU Memory (Peak): {peak_mb:>6.2f} MB'),
    bottom,
    '',
])

print(summary_table)

# Save profiling results to file
profiling_results = {
    'model': 'Mamba-YOLO-T',
    'task': 'Person Detection',
    'parameters_M': total_params / 1e6,
    'gflops': flops if 'flops' in locals() else 'N/A',
    'model_size_MB': model_size_mb,
    'mean_latency_ms': mean_latency,
    'fps': fps,
    'gpu_memory_peak_MB': max_memory if 'max_memory' in locals() else 0
}

import json
profiling_path = Path('mamba_scratch/person_detection/profiling_results.json')
with open(profiling_path, 'w') as f:
    json.dump(profiling_results, f, indent=2)

print(f'\n✅ Profiling results saved to: {profiling_path}')
print('='*60)
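
The benchmark above reports the mean latency, which a single slow iteration can skew. As a complementary view, the sketch below computes the median and a nearest-rank 95th percentile with the standard library. `summarize_latencies` is a hypothetical helper and the sample values are synthetic.

```python
import statistics


def summarize_latencies(latencies_ms):
    """Return median, 95th percentile, and median-based FPS for a list of latencies."""
    ordered = sorted(latencies_ms)
    median = statistics.median(ordered)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integer arithmetic.
    rank = max(1, -(-95 * len(ordered) // 100))
    p95 = ordered[rank - 1]
    return {'median_ms': median, 'p95_ms': p95, 'fps_from_median': 1000 / median}


# Synthetic latencies for illustration: 19 fast iterations and one slow outlier.
sample = [20.0] * 19 + [200.0]
print(summarize_latencies(sample))
```

With these synthetic values the mean would be 29 ms, while the median stays at 20 ms. In the notebook, the measured values can be passed as `summarize_latencies(latencies.tolist())`.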

---

## Important Notes

### 1. Dataset - Person Detection
- This notebook focuses on **person detection only** (single class)
- Dataset: 1000 training images built from COCO128, filtered to person annotations and duplicated
- Dataset format: YOLO format (txt annotations)
- All annotations other than person have been removed
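
For reference, each line of a YOLO-format label file holds `class x_center y_center width height`, with the four box values normalized to the image size. The sketch below converts one such line to pixel corner coordinates. `yolo_line_to_pixels` is a hypothetical helper and the sample annotation is made up.

```python
def yolo_line_to_pixels(line, img_width, img_height):
    """Convert 'cls xc yc w h' (normalized) to (cls, x1, y1, x2, y2) in pixels."""
    cls_id, xc, yc, w, h = line.split()[:5]
    xc, w = float(xc) * img_width, float(w) * img_width
    yc, h = float(yc) * img_height, float(h) * img_height
    return int(cls_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


# Made-up annotation: a person box centered in a 640x480 image.
print(yolo_line_to_pixels('0 0.5 0.5 0.25 0.5', 640, 480))
# → (0, 240.0, 120.0, 400.0, 360.0)
```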

### 2. Training from Scratch - Single Class
Training from scratch for person detection requires:
- A dataset with many person images (at least 1000)
- Epochs: 100-300
- A GPU with plenty of memory (at least 8 GB VRAM)
- Training time: ~1-2 hours for 1000 images

**Advantages of Single Class:**
- The model is more focused and specific
- Training converges faster
- Higher mAP for the target class
- Suited to specific applications (crowd counting, person tracking, etc.)

### 3. Hyperparameters
- `batch`: adjust to the available GPU memory (4-16)
- `epochs`: 100-300 for training from scratch
- `lr0`: initial learning rate (0.001)
- `patience`: early stopping patience (30)
- `imgsz`: image size (640 is standard)
- `single_cls`: True (single class mode)
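
The training cell also sets `lrf=0.01`, which is a fraction of `lr0` rather than an absolute rate: the final learning rate is `lr0 * lrf`. The sketch below assumes the default linear decay used by upstream Ultralytics when cosine scheduling is off, and it ignores warmup. `linear_lr` is a hypothetical helper.

```python
def linear_lr(epoch, epochs=100, lr0=0.001, lrf=0.01):
    """Learning rate at a given epoch under a linear decay from lr0 to lr0 * lrf."""
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor


# Start, midpoint, and end of the 100-epoch run configured in Step 10.
for epoch in (0, 50, 100):
    print(epoch, f'{linear_lr(epoch):.6f}')
```

With the Step 10 settings this decays from 0.001 at the start to 0.00001 at the end of training.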

### 4. Model Variants
- `Mamba-YOLO-T`: 5.8M params (fastest, for demos)
- `Mamba-YOLO-B`: 19.1M params (balanced)
- `Mamba-YOLO-L`: 57.6M params (most accurate, needs a powerful GPU)

### 5. Save the Model to Google Drive
```python
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Save the training results to Drive
!cp -r mamba_scratch /content/drive/MyDrive/

# Or copy only the best model
!cp mamba_scratch/person_detection/weights/best.pt /content/drive/MyDrive/
```

### 6. Troubleshooting

**Error: CUDA out of memory**
- Reduce the `batch` size (try 4 or 2)
- Reduce `imgsz` (try 320 or 480)

**Error: Slow training**
- Reduce `workers` (try 2 or 1)
- Make sure the GPU is in use (device='0')

**Error: selective_scan import failed**
- Make sure the PyTorch CUDA version matches the installed CUDA Toolkit
- Check the log: `/tmp/selective_scan_install.log`
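
One way to check the version match mentioned above is to compare `torch.version.cuda` with the release reported by `nvcc --version`. The sketch below only parses text, so it runs without a GPU. `parse_nvcc_release` and `cuda_versions_match` are hypothetical helpers, and the sample line mirrors typical `nvcc` output.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract the 'major.minor' CUDA release from `nvcc --version` output, or None."""
    match = re.search(r'release (\d+\.\d+)', nvcc_output)
    return match.group(1) if match else None


def cuda_versions_match(torch_cuda, nvcc_output):
    """Compare torch.version.cuda (e.g. '12.1') with the toolkit release from nvcc."""
    return torch_cuda is not None and torch_cuda == parse_nvcc_release(nvcc_output)


# Example nvcc output line; the exact build suffix varies between installs.
sample = 'Cuda compilation tools, release 12.1, V12.1.105'
print(parse_nvcc_release(sample))
print(cuda_versions_match('12.1', sample))
print(cuda_versions_match('12.6', sample))
```

In Colab, the real inputs would be `torch.version.cuda` and the output of `subprocess.run(['nvcc', '--version'], capture_output=True, text=True).stdout`. Treat a mismatch as a hint to investigate rather than proof of the cause.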

**Error: Poor detection results**
- The dataset is too small (add more images)
- Too few epochs (increase to 200-300)
- Check dataset quality and annotations

### 7. Selective Scan Info
Selective Scan is a CUDA extension that must be compiled. The imported modules are:
- `selective_scan_cuda_core` (core SSM algorithm)
- `selective_scan_cuda_oflex` (flexible version)
- `selective_scan_cuda_ndstate` (N-dimensional state)
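
For intuition about what these kernels accelerate, the sketch below runs the state-space recurrence sequentially for a single channel: `h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t` and `y_t = C_t · h_t`. It follows the general Mamba formulation as an assumption and is not the repository's kernel. It omits details such as the skip connection and multi-directional scanning, and `selective_scan_reference` is a hypothetical name.

```python
import math


def selective_scan_reference(x, delta, A, B, C):
    """Sequential single-channel scan.

    x, delta: lists of length L (input and per-step step size).
    A: list of length N (diagonal state matrix).
    B, C: lists of L rows, each of length N (input-dependent projections).
    """
    state = [0.0] * len(A)
    outputs = []
    for t, x_t in enumerate(x):
        for n, a_n in enumerate(A):
            decay = math.exp(delta[t] * a_n)  # discretized state transition
            state[n] = decay * state[n] + delta[t] * B[t][n] * x_t
        outputs.append(sum(c * h for c, h in zip(C[t], state)))
    return outputs


# With A = 0 the state never decays, so the scan reduces to a running sum.
x = [1.0, 2.0, 3.0]
ones = [[1.0]] * len(x)
print(selective_scan_reference(x, delta=[1.0] * 3, A=[0.0], B=ones, C=ones))
# → [1.0, 3.0, 6.0]
```

The loop visits time steps one after another, which is slow on a GPU. The compiled CUDA modules exist to compute this kind of scan efficiently.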

### 8. Applications for the Final-Year Project
This person detection model is suited to:
- **Head detection** (after fine-tuning on a head dataset)
- Crowd counting
- Person tracking
- Social distancing monitoring
- People analytics

**Next steps:**
1. Prepare your head detection dataset
2. Fine-tune this model on the head dataset
3. Or train from scratch on the full head dataset

---

## Resources

- **GitHub**: https://github.com/HZAI-ZJNU/Mamba-YOLO
- **Paper**: Mamba-YOLO: SSMs-Based YOLO For Object Detection
- **COCO Dataset**: https://cocodataset.org/
- **Ultralytics Docs**: https://docs.ultralytics.com/