# 🧱 LEGO Detection System - Complete Training Pipeline

**Full training run on Colab with a T4 GPU**

This notebook runs the entire pipeline autonomously:
1. ✅ Download the LDraw libraries
2. ✅ Generate manifests with material classification
3. ✅ Render a synthetic 4K dataset with BlenderProc + Eevee
4. ✅ Train YOLOv8 for detection
5. ✅ Train ArcFace for classification
6. ✅ Build the FAISS index
7. ✅ Back up automatically to Google Drive

**Estimated time**: 20-24 hours on a T4 GPU

---

## 🔧 Initial Setup

In [None]:
# Check which GPU was assigned
!nvidia-smi

# Mount Google Drive for backups
from google.colab import drive
drive.mount('/content/drive')

print("\n✅ Drive mounted (confirm the GPU model, e.g. Tesla T4, in the nvidia-smi output above)")
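The cell above does not check which GPU Colab actually assigned. Before a run that lasts close to a day, an explicit check is worthwhile. Below is a minimal sketch that assumes `nvidia-smi` is on the PATH, as it is on Colab GPU runtimes. `parse_gpu_query`, `check_gpu` and `MIN_VRAM_MIB` are hypothetical names used only for illustration.

```python
import subprocess

# Hypothetical threshold: a T4 reports roughly 15 GB (about 15360 MiB) of memory.
MIN_VRAM_MIB = 15000

def parse_gpu_query(output: str) -> list[tuple[str, int]]:
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` output."""
    gpus = []
    for line in output.strip().splitlines():
        name, memory = [field.strip() for field in line.split(",")]
        gpus.append((name, int(memory.split()[0])))  # "15360 MiB" -> 15360
    return gpus

def check_gpu() -> tuple[str, int]:
    """Return (name, memory in MiB) of the first GPU and warn if it looks too small."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    name, memory_mib = parse_gpu_query(output)[0]
    if memory_mib < MIN_VRAM_MIB:
        print(f"⚠️ {name} has only {memory_mib} MiB; consider lowering the batch sizes")
    return name, memory_mib
```

Calling `check_gpu()` right after mounting Drive surfaces a smaller GPU before hours of rendering are spent.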

In [None]:
# Clone the repository (replace YOUR_USERNAME with the repository owner)
!git clone https://github.com/YOUR_USERNAME/Brickclinic.git
%cd Brickclinic

print("✅ Repository cloned")

In [None]:
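# Install the CV dependencies
!pip install -q -r requirements_cv.txt
!pip install -q blenderproc

# Download HDRIs for realistic lighting.
# `blenderproc download haven` takes an output folder; point it wherever the render script expects HDRIs.
!blenderproc download haven resources/haven

print("✅ Dependencies installed")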

## 📊 Dataset Configuration

In [None]:
# Training parameters
CONFIG = {
    "set_num": "75078-1",           # LEGO set to train on
    "num_pieces": 100,              # Number of pieces (100 = full set)
    "views_per_piece": 350,         # Views per piece (adjustable per piece type)

    # Rendering (informational: the cells below do not pass these values to the scripts)
    "use_eevee": True,              # Eevee = fast, Cycles = higher quality
    "resolution": (3840, 2160),     # 4K
    "angle_filter_deg": 30,         # ±30° from vertical

    # Training
    "yolo_epochs": 100,
    "yolo_batch": 32,               # The T4 (16 GB) can handle a large batch
    "arcface_epochs": 50,
    "arcface_batch": 64,

    # Paths
    "local_dir": "/content/lego_training",
    "drive_backup": "/content/drive/MyDrive/lego_models"
}

print(f"📦 Configuration: {CONFIG['num_pieces']} pieces from set {CONFIG['set_num']}")
print(f"🎨 Estimated renders: ~{CONFIG['num_pieces'] * CONFIG['views_per_piece']:,} 4K images")
print("⏱️  Estimated time: 20-24 hours")
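As a sanity check on these numbers: 100 pieces × 350 views gives 35,000 renders. Fitting them into the 18-20 hour budget of the rendering stage leaves roughly 1.85-2.06 seconds per 4K image. The small helpers below redo that arithmetic when `CONFIG` changes. They assume a uniform view count per piece, and the 18-20 hour budget is taken from the Stage 3 heading. `total_renders` and `seconds_per_render` are hypothetical names.

```python
def total_renders(num_pieces: int, views_per_piece: int) -> int:
    """Total rendered images, assuming the same number of views for every piece."""
    return num_pieces * views_per_piece

def seconds_per_render(num_renders: int, budget_hours: float) -> float:
    """Average time each render may take to fit within the budget."""
    return budget_hours * 3600 / num_renders

# Values from CONFIG: 100 pieces, 350 views per piece.
n = total_renders(100, 350)
for hours in (18, 20):
    print(f"{hours} h budget -> {seconds_per_render(n, hours):.2f} s per render")
```

If a single Eevee render takes noticeably longer than about 2 seconds on the T4, the 20-24 hour estimate will not hold.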

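## 🚀 Autonomous Pipeline Run

**IMPORTANT**: This cell runs the entire pipeline unattended.
Depending on your Colab plan, closing the browser can disconnect the runtime, and sessions have a maximum lifetime (up to 12 hours on the free tier). A 20-24 hour run may therefore be cut short; if it is, resume from the last completed stage using the manual steps below.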

In [None]:
# Run the master orchestrator
!python scripts/colab_orchestrator.py \
  --set-num {CONFIG['set_num']} \
  --num-pieces {CONFIG['num_pieces']}

print("\n" + "="*70)
print("🎉 PIPELINE FINISHED (check the log above for errors)")
print("="*70)
print(f"\n📁 Models saved to: {CONFIG['drive_backup']}")
print("\n💡 Next steps:")
print("   1. Download the models from Drive")
print("   2. Copy them to Brickclinic/models/")
print("   3. Restart the FastAPI API")
print("   4. Test detection with real images")

---

## 🔍 Manual Step-by-Step Run

If you prefer to run each stage manually for debugging:

### Stage 1: LDraw Download

In [None]:
# Download the LDraw parts library
!mkdir -p {CONFIG['local_dir']}
!curl -L -o {CONFIG['local_dir']}/ldraw.zip https://library.ldraw.org/library/updates/complete.zip
!unzip -q {CONFIG['local_dir']}/ldraw.zip -d {CONFIG['local_dir']}
!rm {CONFIG['local_dir']}/ldraw.zip

print("✅ LDraw downloaded (~60 MB) and extracted")
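Before moving on, it helps to confirm that the archive was extracted where Stage 3 expects it (`{local_dir}/ldraw`). This minimal sketch assumes the standard LDraw library layout: `parts/` and `p/` subfolders plus the `LDConfig.ldr` color file at the top level. `missing_ldraw_entries` is a hypothetical helper.

```python
from pathlib import Path

# Entries expected at the top level of a standard LDraw library folder.
EXPECTED_ENTRIES = ["LDConfig.ldr", "parts", "p"]

def missing_ldraw_entries(ldraw_dir: str, sample_part: str = "3001.dat") -> list[str]:
    """Return the expected library entries that are absent from ldraw_dir."""
    root = Path(ldraw_dir)
    missing = [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
    # 3001 is the classic 2x4 brick, so any complete library should contain it.
    if not (root / "parts" / sample_part).exists():
        missing.append(f"parts/{sample_part}")
    return missing

missing = missing_ldraw_entries("/content/lego_training/ldraw")
print("✅ LDraw layout OK" if not missing else f"⚠️ Missing: {missing}")
```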

### Stage 2: Manifest Generation

In [None]:
# Generate the manifest with material classification
!mkdir -p {CONFIG['local_dir']}/manifests
!python scripts/generate_piece_manifest.py \
  --set-num {CONFIG['set_num']} \
  --num-pieces {CONFIG['num_pieces']} \
  --output {CONFIG['local_dir']}/manifests/{CONFIG['set_num']}_manifest.json

# Show the piece-type distribution
import json
with open(f"{CONFIG['local_dir']}/manifests/{CONFIG['set_num']}_manifest.json") as f:
    manifest = json.load(f)

print("\n📊 Type distribution:")
for piece_type, count in manifest['type_distribution'].items():
    print(f"   {piece_type}: {count}")
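Because `views_per_piece` can be adjusted per piece type, the actual render count depends on the distribution printed above. The sketch below computes that sum. The type names and per-type view counts in the example are hypothetical placeholders; they are not values read from the manifest or from `render_material_aware.py`.

```python
def planned_renders(type_distribution: dict[str, int],
                    views_by_type: dict[str, int],
                    default_views: int = 350) -> int:
    """Sum pieces x views, using default_views for types without an override."""
    return sum(count * views_by_type.get(piece_type, default_views)
               for piece_type, count in type_distribution.items())

# Hypothetical example: transparent pieces get extra views, the rest use the default.
example_distribution = {"solid": 80, "transparent": 15, "printed": 5}
example_views = {"transparent": 500}
print(planned_renders(example_distribution, example_views))  # 37250
```

With the real manifest, `planned_renders(manifest['type_distribution'], {})` gives the estimate for a uniform 350 views per piece.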

### Stage 3: Synthetic Rendering (18-20 hours)

In [None]:
# Render the dataset with BlenderProc + Eevee
!python scripts/render_material_aware.py \
  --manifest {CONFIG['local_dir']}/manifests/{CONFIG['set_num']}_manifest.json \
  --ldraw-dir {CONFIG['local_dir']}/ldraw \
  --output-dir {CONFIG['local_dir']}/ai_data_v2

print("\n✅ Rendering complete")
!du -sh {CONFIG['local_dir']}/ai_data_v2/renders
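The render script is not shown here, but the `angle_filter_deg` setting (±30° from vertical) suggests that cameras are restricted to a cone around the vertical axis. The minimal sketch below shows one way to sample such viewpoints uniformly over that spherical cap, assuming a Z-up world as in Blender. `sample_camera_position` is hypothetical and is not necessarily what `render_material_aware.py` does.

```python
import math
import random

def sample_camera_position(radius: float, max_angle_deg: float,
                           rng: random.Random) -> tuple[float, float, float]:
    """Sample a point on a sphere of `radius`, within `max_angle_deg` of the +Z axis.

    Drawing cos(theta) uniformly in [cos(max_angle), 1] spreads samples evenly over
    the cap's area; drawing theta itself uniformly would cluster them near the pole.
    """
    cos_max = math.cos(math.radians(max_angle_deg))
    cos_theta = rng.uniform(cos_max, 1.0)
    sin_theta = math.sqrt(1.0 - cos_theta ** 2)
    phi = rng.uniform(0.0, 2.0 * math.pi)  # azimuth around the vertical axis
    return (radius * sin_theta * math.cos(phi),
            radius * sin_theta * math.sin(phi),
            radius * cos_theta)

rng = random.Random(0)  # fixed seed for reproducible viewpoints
positions = [sample_camera_position(2.0, 30.0, rng) for _ in range(350)]
```

Each position would then be turned into a camera pose that looks at the piece; in BlenderProc this can be done with, for example, `bproc.camera.rotation_from_forward_vec`.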

### Stage 4: YOLO Training (2-3 hours)

In [None]:
# Train the YOLOv8 detector
!python scripts/train_yolo.py \
  --data-dir {CONFIG['local_dir']}/ai_data_v2 \
  --epochs {CONFIG['yolo_epochs']} \
  --batch {CONFIG['yolo_batch']} \
  --device cuda

print("\n✅ YOLO trained")
!ls -lh models/yolov8_pieces.pt
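Ultralytics YOLO trains on labels in a plain-text format. Each object gets one line: the class index, then the box center, width and height, all normalized to [0, 1] by the image size. If you need to inspect or regenerate labels from pixel boxes in the 3840×2160 renders, a minimal conversion sketch follows. `to_yolo_label` is a hypothetical helper, and whether `train_yolo.py` stores labels this way is an assumption.

```python
def to_yolo_label(class_id: int, box: tuple[float, float, float, float],
                  img_w: int, img_h: int) -> str:
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 384x216 px box centered in a 4K frame.
print(to_yolo_label(0, (1728, 972, 2112, 1188), 3840, 2160))
# 0 0.500000 0.500000 0.100000 0.100000
```

Ultralytics also resizes training images to `imgsz` (640 px by default). A piece that spans 100 px in a 3840 px wide render therefore shrinks to about 17 px, so raising `imgsz` may be worth testing if small pieces are missed.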

### Stage 5: ArcFace Training (1-2 hours)

In [None]:
# Train the ArcFace embeddings
!python scripts/train_arcface.py \
  --data-dir {CONFIG['local_dir']}/ai_data_v2 \
  --epochs {CONFIG['arcface_epochs']} \
  --batch {CONFIG['arcface_batch']} \
  --device cuda

print("\n✅ ArcFace trained")
!ls -lh models/arcface_resnet50.pth
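ArcFace (additive angular margin) L2-normalizes both the embedding and each class weight vector, so their dot product is cos θ. It then adds a margin m to the angle of the correct class, scales all logits by s and applies softmax cross-entropy. This forces embeddings of the same piece to cluster tightly. Below is a pure-Python sketch of the loss for a single sample. The defaults s = 64 and m = 0.5 are the values used in the ArcFace paper, not settings confirmed for `train_arcface.py`. The sketch also omits the θ + m > π correction that common implementations add.

```python
import math

def _normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def arcface_loss(embedding: list[float], class_weights: list[list[float]],
                 target: int, s: float = 64.0, m: float = 0.5) -> float:
    """Additive angular margin (ArcFace) loss for a single embedding."""
    x = _normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        # Cosine of the angle between the embedding and the class center.
        cos_theta = sum(a * b for a, b in zip(x, _normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # keep acos in its domain
        if j == target:
            # Add the margin to the true-class angle, making the task harder.
            cos_theta = math.cos(math.acos(cos_theta) + m)
        logits.append(s * cos_theta)
    # Softmax cross-entropy, computed stably with the log-sum-exp trick.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[target]
```

With m = 0 this reduces to a normalized softmax loss; any positive margin raises the loss for the same embedding, which is what pushes classes apart during training.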

### Stage 6: Building the FAISS Index

In [None]:
# Build the vector index
!python scripts/build_faiss_index.py \
  --model models/arcface_resnet50.pth \
  --data-dir {CONFIG['local_dir']}/ai_data_v2 \
  --output {CONFIG['local_dir']}/ai_data_v2/embeddings/faiss.index

print("\n✅ FAISS index built")
!ls -lh {CONFIG['local_dir']}/ai_data_v2/embeddings/faiss.index
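ArcFace embeddings are commonly searched by cosine similarity. After L2-normalizing every vector, the inner product equals the cosine, and that is exactly what a FAISS `IndexFlatIP` over `faiss.normalize_L2`-ed vectors computes. Whether `build_faiss_index.py` uses that index type is an assumption. Below is a brute-force pure-Python sketch of the same search, handy for checking a handful of results by hand. `cosine_top_k` is a hypothetical helper.

```python
import heapq
import math

def _normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_top_k(query: list[float], database: list[list[float]],
                 k: int = 5) -> list[tuple[float, int]]:
    """Return (cosine similarity, row index) pairs for the k most similar rows."""
    q = _normalize(query)
    scores = ((sum(a * b for a, b in zip(q, _normalize(row))), i)
              for i, row in enumerate(database))
    return heapq.nlargest(k, scores)

database = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(cosine_top_k([1.0, 0.1], database, k=2))  # rows 0 and 2, most similar first
```

With FAISS itself, the equivalent on float32 arrays is `faiss.normalize_L2(x)`, `index = faiss.IndexFlatIP(d)`, `index.add(x)` and `D, I = index.search(q, k)`.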

### Stage 7: Backup to Google Drive

In [None]:
# Copy models and embeddings to Drive
from pathlib import Path

drive_dir = Path(CONFIG['drive_backup'])
drive_dir.mkdir(parents=True, exist_ok=True)

# Models
!cp models/yolov8_pieces.pt {CONFIG['drive_backup']}/
!cp models/arcface_resnet50.pth {CONFIG['drive_backup']}/

# Embeddings and metadata
!cp -r {CONFIG['local_dir']}/ai_data_v2/embeddings {CONFIG['drive_backup']}/
!cp -r {CONFIG['local_dir']}/manifests {CONFIG['drive_backup']}/
!cp data/lego_colors.json {CONFIG['drive_backup']}/

print(f"\n✅ Backup complete in: {CONFIG['drive_backup']}")
!du -sh {CONFIG['drive_backup']}
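The `!cp` calls above keep going when a source is missing, so the closing success message prints either way. The sketch below does the same backup in pure Python: it copies what exists and reports what is missing. The paths mirror the cell above, and `backup_to_drive` is a hypothetical helper.

```python
import shutil
from pathlib import Path

def backup_to_drive(sources: list[str], dest_dir: str) -> list[str]:
    """Copy files and directories into dest_dir; return the sources that were missing."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    missing = []
    for src in map(Path, sources):
        if src.is_dir():
            shutil.copytree(src, dest / src.name, dirs_exist_ok=True)
        elif src.is_file():
            shutil.copy2(src, dest / src.name)  # copy2 also preserves timestamps
        else:
            missing.append(str(src))
    return missing

# Only run when Google Drive is actually mounted.
if Path("/content/drive/MyDrive").exists():
    missing = backup_to_drive(
        ["models/yolov8_pieces.pt", "models/arcface_resnet50.pth",
         "/content/lego_training/ai_data_v2/embeddings",
         "/content/lego_training/manifests", "data/lego_colors.json"],
        "/content/drive/MyDrive/lego_models",
    )
    print("✅ Backup complete" if not missing else f"⚠️ Not found, skipped: {missing}")
```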

---

## 🧪 Model Validation

In [None]:
# Smoke-test YOLO on one image (a training render, so this is not a held-out evaluation)
from ultralytics import YOLO
import matplotlib.pyplot as plt

model = YOLO('models/yolov8_pieces.pt')

# Pick an image from the dataset (adjust the file name to one that exists)
test_image = f"{CONFIG['local_dir']}/ai_data_v2/renders/3001_solid_view_0050.png"
results = model.predict(test_image, conf=0.25)

# Visualize: plot() returns a BGR array, so reverse the channels for matplotlib
plt.figure(figsize=(12, 8))
plt.imshow(results[0].plot()[..., ::-1])
plt.axis('off')
plt.title(f"Detections: {len(results[0].boxes)}")
plt.show()

print(f"✅ YOLO model working: {len(results[0].boxes)} pieces detected")
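Because the test image above is a training render, a good result there only shows that the model loads and runs. A more meaningful check uses held-out renders or real photos, where predicted boxes are usually matched to ground truth by intersection over union (IoU). Below is a minimal IoU sketch for boxes in `(x_min, y_min, x_max, y_max)` pixel format, the format Ultralytics exposes as `results[0].boxes.xyxy`. A match threshold of 0.5 is a common convention, not a project setting.

```python
def iou(box_a: tuple[float, float, float, float],
        box_b: tuple[float, float, float, float]) -> float:
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

# Two 100x100 boxes offset by 50 px: IoU = 5000 / 15000.
print(round(iou((0, 0, 100, 100), (50, 0, 150, 100)), 3))  # 0.333
```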

In [None]:
# Test the FAISS index
import sys
sys.path.append('.')
from api.cv.vector_search import VectorSearchService

service = VectorSearchService(
    index_path=f"{CONFIG['local_dir']}/ai_data_v2/embeddings/faiss.index"
)

stats = service.get_stats()
print("\n✅ FAISS index working")
print(f"   Total embeddings: {stats['total_embeddings']:,}")
print(f"   Unique pieces: {stats['unique_pieces']}")
print(f"   Dimension: {stats['dimension']}")

---

## 📦 Downloading the Models

If you prefer to download directly from the notebook:

In [None]:
# Compress the models for download
!zip -r lego_models.zip \
  models/yolov8_pieces.pt \
  models/arcface_resnet50.pth \
  {CONFIG['local_dir']}/ai_data_v2/embeddings/faiss.index \
  data/lego_colors.json

from google.colab import files
files.download('lego_models.zip')

print("✅ Models compressed; download started")
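`zip -r` stores each file under the path it was given, so the FAISS index ends up nested under `content/lego_training/ai_data_v2/embeddings/` inside the archive. That is awkward to unpack into `Brickclinic/models/`. The sketch below uses Python's standard `zipfile` module and stores every file under its bare name instead. `zip_flat` is a hypothetical helper, and it assumes no two files share a name.

```python
import zipfile
from pathlib import Path

def zip_flat(files: list[str], archive: str) -> list[str]:
    """Write existing files into `archive` under their base names; return missing ones."""
    missing = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in files:
            path = Path(name)
            if path.is_file():
                zf.write(path, arcname=path.name)  # drop the directory part
            else:
                missing.append(name)
    return missing
```

On Colab, call `zip_flat([...], "lego_models.zip")` with the same four paths as the cell above, then `files.download("lego_models.zip")`.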

---

## 📚 Resources

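- **Documentation**: See `COLAB_PIPELINE.md` in the repo
- **Troubleshooting**: If Colab disconnects, rerun from the last completed stage. Files under `/content` survive a brief disconnect but are lost if the runtime is recycled; only the mounted Drive persists.
- **Monitoring**: Check the logs in `/content/lego_training/logs/`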
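Resuming from the last completed stage is easier if each stage leaves a marker file when it finishes successfully. Below is a minimal sketch of that idea. The stage names and the `run_stages` helper are hypothetical and not part of `colab_orchestrator.py`. Keep `marker_dir` next to the stage outputs (for example under `/content/lego_training`): markers and outputs then disappear together if the runtime is recycled, so a stage whose outputs are gone is never skipped.

```python
from pathlib import Path
from typing import Callable

def run_stages(stages: list[tuple[str, Callable[[], None]]], marker_dir: str) -> list[str]:
    """Run stages in order, skipping those whose `.done` marker already exists."""
    markers = Path(marker_dir)
    markers.mkdir(parents=True, exist_ok=True)
    executed = []
    for name, run in stages:
        marker = markers / f"{name}.done"
        if marker.exists():
            print(f"⏭️  {name}: already done, skipping")
            continue
        run()  # if this raises, no marker is written and the stage reruns next time
        marker.touch()
        executed.append(name)
    return executed
```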

**Support**: GitHub Issues on the repository