# 🚀 Atheria 4: Progressive Training (Long-Running)

Notebook optimized for **long training runs** (many hours) that make the most of the available GPU quota.

## ✨ Features

- 🔄 **Auto-save to Google Drive** - Configurable sync interval
- 📊 **Resource monitoring** - GPU %, RAM, session time
- ⚡ **Auto-recovery** - Resumes automatically from checkpoints
- ⏰ **Automatic time limit** - Emergency save before the session times out
- 📈 **Real-time visualization** - Progress, metrics, resources
- 💾 **Smart checkpointing** - Keeps only the best models plus the latest one
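
The trainer's actual retention logic lives in `QC_Trainer_v4` and is not shown in this notebook. As a rough sketch of what a "best N plus latest" policy could look like, the hypothetical helper below (`CheckpointRecord` and `select_checkpoints_to_keep` are illustrative names, not part of Atheria) picks which checkpoints survive:

```python
from typing import NamedTuple


class CheckpointRecord(NamedTuple):
    """Hypothetical record describing one saved checkpoint."""
    episode: int
    loss: float


def select_checkpoints_to_keep(records, max_best=5):
    """Return the max_best lowest-loss records plus the most recent one."""
    if not records:
        return []
    best = sorted(records, key=lambda r: r.loss)[:max_best]
    latest = max(records, key=lambda r: r.episode)
    # Index by episode so the latest checkpoint is not duplicated
    # when it is also one of the best.
    keep = {r.episode: r for r in best}
    keep[latest.episode] = latest
    return sorted(keep.values(), key=lambda r: r.episode)


history = [
    CheckpointRecord(episode=10, loss=0.90),
    CheckpointRecord(episode=20, loss=0.40),
    CheckpointRecord(episode=30, loss=0.55),
    CheckpointRecord(episode=40, loss=0.70),
]
kept = select_checkpoints_to_keep(history, max_best=2)
print([r.episode for r in kept])  # [20, 30, 40]
```

Everything not returned by the helper would be a candidate for deletion, which keeps disk usage bounded on long runs.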

## 📋 GPU Quotas

- **Colab Free**: ~12 hours/day (variable)
- **Colab Pro**: ~24 continuous hours
- **Kaggle**: 30 hours/week (T4/P100)

## 🎯 Recommended Workflow

1. Configure the experiment (Section 5)
2. Run all cells
3. Leave it running unattended
4. Checkpoints are saved to Drive automatically
5. If the session disconnects: run the notebook again; it resumes from the checkpoint in Drive

---
## 📦 Section 1: Setup and Environment Detection

In [None]:
# Detect the environment (Kaggle or Colab)
import os
import sys
from pathlib import Path

# IMPORTANT: check Kaggle FIRST (it ships google.colab, but the package does not work there)
IN_KAGGLE = os.path.exists("/kaggle/input") or os.path.exists("/kaggle/working")

# Only check for Colab when NOT on Kaggle
if not IN_KAGGLE:
    try:
        import google.colab
        # Confirm we are really on Colab (/content exists)
        IN_COLAB = os.path.exists("/content")
    except ImportError:
        IN_COLAB = False
else:
    IN_COLAB = False

ENV_NAME = "Kaggle" if IN_KAGGLE else "Colab" if IN_COLAB else "Local"
print(f"🌍 Environment detected: {ENV_NAME}")

# Install basic dependencies (psutil is used by the resource monitor in Section 4)
print("📦 Installing dependencies...")
%pip install -q snntorch scikit-learn matplotlib psutil

# On Colab: install pybind11 (optional, only needed for the native engine)
if IN_COLAB:
    %pip install -q pybind11

print("✅ Dependencies installed")
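
The detection order above matters: Kaggle must be ruled out before Colab is considered. As a sketch, the same decision can be written as a pure function so it can be exercised without either platform. `detect_environment` and its two parameters are hypothetical names introduced only for this illustration:

```python
def detect_environment(path_exists, colab_importable):
    """Classify the runtime as "Kaggle", "Colab" or "Local".

    path_exists: callable that takes a path string and returns a bool.
    colab_importable: True if `import google.colab` would succeed.
    """
    # Kaggle first: its images also contain the google.colab package.
    if path_exists("/kaggle/input") or path_exists("/kaggle/working"):
        return "Kaggle"
    if colab_importable and path_exists("/content"):
        return "Colab"
    return "Local"


# Simulated filesystems: each set lists the paths that "exist".
kaggle_fs = {"/kaggle/working", "/content"}
colab_fs = {"/content"}

print(detect_environment(kaggle_fs.__contains__, colab_importable=True))  # Kaggle
print(detect_environment(colab_fs.__contains__, colab_importable=True))   # Colab
print(detect_environment(set().__contains__, colab_importable=False))     # Local
```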

---
## 💾 Section 2: Google Drive - Mounting and Configuration

In [None]:
# Mount Google Drive (Colab only)
if IN_COLAB:
    from google.colab import drive
    print("📁 Mounting Google Drive...")
    drive.mount('/content/drive')
    DRIVE_ROOT = Path("/content/drive/MyDrive/Atheria")
    print(f"✅ Drive mounted, using: {DRIVE_ROOT}")
elif IN_KAGGLE:
    # On Kaggle, use /kaggle/working instead of Drive
    DRIVE_ROOT = Path("/kaggle/working/atheria_checkpoints")
    print(f"📁 Using local directory: {DRIVE_ROOT}")
else:
    DRIVE_ROOT = Path.home() / "atheria_checkpoints"
    print(f"💻 Using local directory: {DRIVE_ROOT}")

# Create the folder structure
DRIVE_CHECKPOINT_DIR = DRIVE_ROOT / "checkpoints"
DRIVE_LOGS_DIR = DRIVE_ROOT / "logs"
DRIVE_EXPORTS_DIR = DRIVE_ROOT / "exports"

for directory in [DRIVE_CHECKPOINT_DIR, DRIVE_LOGS_DIR, DRIVE_EXPORTS_DIR]:
    directory.mkdir(parents=True, exist_ok=True)

print("\n📂 Folder structure created:")
print(f"  - Checkpoints: {DRIVE_CHECKPOINT_DIR}")
print(f"  - Logs: {DRIVE_LOGS_DIR}")
print(f"  - Exports: {DRIVE_EXPORTS_DIR}")

---
## 🔧 Section 3: Clone the Atheria Project

In [None]:
# Configure the project path
if IN_KAGGLE:
    PROJECT_ROOT = Path("/kaggle/working/Atheria")
    if not PROJECT_ROOT.exists():
        print("⚠️ Project not found. Cloning from GitHub...")
        !git clone https://github.com/Jonakss/Atheria.git /kaggle/working/Atheria
elif IN_COLAB:
    PROJECT_ROOT = Path("/content/Atheria")
    if not PROJECT_ROOT.exists():
        print("📥 Cloning project from GitHub...")
        !git clone https://github.com/Jonakss/Atheria.git /content/Atheria
else:
    # Local: assume we are running from notebooks/
    PROJECT_ROOT = Path.cwd().parent if Path.cwd().name == 'notebooks' else Path.cwd()

# Add the project to the Python path
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

print(f"📁 Project configured at: {PROJECT_ROOT}")

# Verify the basic structure
src_path = PROJECT_ROOT / "src"
if src_path.exists():
    print("✅ Project structure verified")
else:
    print("❌ Error: 'src' folder not found. Check the installation.")

---
## 📊 Section 4: Resource Monitoring Utilities

In [None]:
import torch
import psutil
import time
from datetime import datetime, timedelta
from IPython.display import clear_output

class ResourceMonitor:
    """Monitor for GPU, RAM and elapsed time"""

    def __init__(self, max_training_hours=10):
        self.start_time = time.time()
        self.max_training_seconds = max_training_hours * 3600
        self.max_training_hours = max_training_hours

    def get_gpu_usage(self):
        """Return GPU utilization in %"""
        if not torch.cuda.is_available():
            return 0.0
        try:
            return torch.cuda.utilization()
        except Exception:
            # The utilization query can fail (for example when pynvml is missing)
            return 0.0

    def get_gpu_memory(self):
        """Return GPU memory allocated and reserved by PyTorch, in GB"""
        if not torch.cuda.is_available():
            return 0.0, 0.0
        allocated = torch.cuda.memory_allocated() / 1e9  # GB
        reserved = torch.cuda.memory_reserved() / 1e9
        return allocated, reserved

    def get_ram_usage(self):
        """Return RAM used and total, in GB"""
        mem = psutil.virtual_memory()
        return mem.used / 1e9, mem.total / 1e9

    def get_elapsed_time(self):
        """Return elapsed and remaining time, in seconds"""
        elapsed = time.time() - self.start_time
        remaining = max(0, self.max_training_seconds - elapsed)
        return elapsed, remaining

    def should_stop(self):
        """True once 90% of the time budget has been used"""
        elapsed, remaining = self.get_elapsed_time()
        return remaining < (self.max_training_seconds * 0.1)  # 10% remaining

    def get_status_str(self):
        """Return a string describing the resource status"""
        gpu_usage = self.get_gpu_usage()
        gpu_mem_used, gpu_mem_reserved = self.get_gpu_memory()
        ram_used, ram_total = self.get_ram_usage()
        elapsed, remaining = self.get_elapsed_time()

        elapsed_str = str(timedelta(seconds=int(elapsed)))
        remaining_str = str(timedelta(seconds=int(remaining)))

        status = (
            f"📊 RESOURCES:\n"
            f"  GPU Utilization: {gpu_usage:.1f}%\n"
            f"  GPU Memory: {gpu_mem_used:.2f}GB allocated / {gpu_mem_reserved:.2f}GB reserved\n"
            f"  RAM: {ram_used:.2f}GB / {ram_total:.2f}GB ({ram_used/ram_total*100:.1f}%)\n"
            f"  \n"
            f"⏰ TIME:\n"
            f"  Elapsed: {elapsed_str}\n"
            f"  Remaining: {remaining_str} (of {self.max_training_hours}h maximum)\n"
        )
        return status

print("✅ ResourceMonitor defined")
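
To make the stop threshold concrete: with a 10-hour budget, `should_stop` turns true once less than 1 hour remains. The sketch below reproduces only that time rule in a standalone class with an injectable clock, so it can be checked without waiting. `TimeBudget` and its `clock` parameter are hypothetical names for this illustration, not part of the notebook:

```python
import time


class TimeBudget:
    """Signals a stop when less than `reserve_fraction` of the budget is left."""

    def __init__(self, max_hours, reserve_fraction=0.1, clock=time.time):
        self.clock = clock
        self.start = clock()
        self.max_seconds = max_hours * 3600
        self.reserve_seconds = self.max_seconds * reserve_fraction

    def remaining(self):
        """Seconds left in the budget, never negative."""
        return max(0.0, self.max_seconds - (self.clock() - self.start))

    def should_stop(self):
        return self.remaining() < self.reserve_seconds


# A fake clock lets us jump forward in time instantly.
now = [0.0]
budget = TimeBudget(max_hours=10, clock=lambda: now[0])

now[0] = 8.5 * 3600   # 1.5 h left, above the 1 h reserve
early = budget.should_stop()
now[0] = 9.5 * 3600   # 0.5 h left, below the 1 h reserve
late = budget.should_stop()
print(early, late)  # False True
```

The reserve exists so that the emergency checkpoint and the copy to Drive have time to finish before the platform ends the session.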

---
## ⚙️ Section 5: Experiment Configuration

In [None]:
from types import SimpleNamespace

# ============================================================================
# 🎯 EXPERIMENT CONFIGURATION - EDIT HERE
# ============================================================================

EXPERIMENT_CONFIG = {
    # Identification
    "EXPERIMENT_NAME": "UNET_64ch_D8_Progressive",

    # Architecture
    # Options: "UNET", "SNN_UNET", "MLP", "DEEP_QCA", "UNET_CONVLSTM", "UNET_UNITARY"
    "MODEL_ARCHITECTURE": "UNET",

    # Model parameters
    "MODEL_PARAMS": {
        "d_state": 8,           # Quantum state dimension
        "hidden_channels": 64,  # Hidden channels (32, 64, 128)
    },

    # Training
    "GRID_SIZE_TRAINING": 64,     # Grid size
    "QCA_STEPS_TRAINING": 100,    # Simulation steps per episode
    "LR_RATE_M": 1e-4,            # Learning rate
    "GAMMA_DECAY": 0.01,          # Lindbladian decay

    # Progressive configuration
    "TOTAL_EPISODES": 1000,       # Total episodes to train
    "SAVE_EVERY_EPISODES": 10,    # Local checkpoint every N episodes
    "DRIVE_SYNC_EVERY": 50,       # Sync to Drive every N episodes
    "MAX_TRAINING_HOURS": 10,     # Maximum training time

    # Auto-recovery
    "AUTO_RESUME": True,          # Resume automatically from Drive

    # Smart checkpointing
    "MAX_CHECKPOINTS_TO_KEEP": 5, # Number of best models to keep
}

# Convert to SimpleNamespace
exp_cfg = SimpleNamespace(**EXPERIMENT_CONFIG)
exp_cfg.MODEL_PARAMS = SimpleNamespace(**EXPERIMENT_CONFIG["MODEL_PARAMS"])

# Detect the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
exp_cfg.DEVICE = device

# Experiment-specific directories
EXPERIMENT_CHECKPOINT_DIR = DRIVE_CHECKPOINT_DIR / exp_cfg.EXPERIMENT_NAME
EXPERIMENT_LOG_DIR = DRIVE_LOGS_DIR / exp_cfg.EXPERIMENT_NAME
EXPERIMENT_CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)
EXPERIMENT_LOG_DIR.mkdir(parents=True, exist_ok=True)

# Also create the local directories (inside the project)
LOCAL_CHECKPOINT_DIR = PROJECT_ROOT / "output" / "checkpoints" / exp_cfg.EXPERIMENT_NAME
LOCAL_CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)

print("=" * 70)
print("📊 EXPERIMENT CONFIGURATION")
print("=" * 70)
print(f"Name: {exp_cfg.EXPERIMENT_NAME}")
print(f"Architecture: {exp_cfg.MODEL_ARCHITECTURE}")
print(f"Device: {device}")
print(f"Grid: {exp_cfg.GRID_SIZE_TRAINING}x{exp_cfg.GRID_SIZE_TRAINING}")
print(f"Total Episodes: {exp_cfg.TOTAL_EPISODES}")
print(f"Learning Rate: {exp_cfg.LR_RATE_M}")
print(f"Maximum Time: {exp_cfg.MAX_TRAINING_HOURS}h")
print(f"\n📁 Drive Checkpoints: {EXPERIMENT_CHECKPOINT_DIR}")
print(f"📁 Local Checkpoints: {LOCAL_CHECKPOINT_DIR}")
print("=" * 70)
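
The three interval settings interact, so it can help to work out what a full run produces before launching it. The sketch below is a hypothetical helper (`checkpoint_schedule` is not part of Atheria) and assumes the behaviour of the training loop in Section 7: a local save every `SAVE_EVERY_EPISODES`, every fifth save flagged as a best candidate, and a Drive sync every `DRIVE_SYNC_EVERY`. It also assumes each local save refreshes `last_model.pth`, which is what the sync step copies.

```python
def checkpoint_schedule(total_episodes, save_every, drive_sync_every,
                        best_every_n_saves=5):
    """Count how often each periodic action fires over a full run."""
    if drive_sync_every % save_every != 0:
        # Otherwise a sync can fire between two saves and copy a
        # checkpoint that is already several episodes old.
        raise ValueError("drive_sync_every should be a multiple of save_every")
    return {
        "local_saves": total_episodes // save_every,
        "best_candidates": total_episodes // (save_every * best_every_n_saves),
        "drive_syncs": total_episodes // drive_sync_every,
        "episodes_between_drive_syncs": drive_sync_every,
    }


schedule = checkpoint_schedule(total_episodes=1000, save_every=10,
                               drive_sync_every=50)
print(schedule["local_saves"])      # 100
print(schedule["best_candidates"])  # 20
print(schedule["drive_syncs"])      # 20
```

With the defaults, an abrupt disconnect costs at most the work done since the last Drive sync, which is bounded by 50 episodes. Lowering `DRIVE_SYNC_EVERY` shrinks that window at the price of more copies to Drive.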

---
## 🔄 Section 6: Check/Load an Existing Checkpoint (Auto-Resume)

In [None]:
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

def find_latest_checkpoint(checkpoint_dir):
    """Find the latest checkpoint in a directory"""
    checkpoints = list(checkpoint_dir.glob("*.pth"))
    if not checkpoints:
        return None

    # Prefer last_model.pth when it exists
    last_model = checkpoint_dir / "last_model.pth"
    if last_model.exists():
        return str(last_model)

    # Otherwise fall back to the most recently modified file
    latest = max(checkpoints, key=lambda p: p.stat().st_mtime)
    return str(latest)

# Look for an existing checkpoint
checkpoint_path = None
resume_from_episode = 0

if exp_cfg.AUTO_RESUME:
    print("🔍 Looking for an existing checkpoint...")

    # Look in Drive first
    drive_checkpoint = find_latest_checkpoint(EXPERIMENT_CHECKPOINT_DIR)
    if drive_checkpoint:
        checkpoint_path = drive_checkpoint
        print(f"✅ Checkpoint found in Drive: {Path(checkpoint_path).name}")
    else:
        # Nothing in Drive, so look locally
        local_checkpoint = find_latest_checkpoint(LOCAL_CHECKPOINT_DIR)
        if local_checkpoint:
            checkpoint_path = local_checkpoint
            print(f"✅ Checkpoint found locally: {Path(checkpoint_path).name}")

    if checkpoint_path:
        # Load the checkpoint to find out which episode we are at
        try:
            ckpt_data = torch.load(checkpoint_path, map_location='cpu')
            resume_from_episode = ckpt_data.get('episode', 0)
            print(f"🔄 Resuming from episode {resume_from_episode}")
        except Exception as e:
            logger.warning(f"⚠️ Error reading checkpoint: {e}")
            checkpoint_path = None

if checkpoint_path is None:
    print("🆕 No previous checkpoint found. Starting from scratch.")

# Compute the remaining episodes
episodes_remaining = max(0, exp_cfg.TOTAL_EPISODES - resume_from_episode)
print(f"\n📈 Episodes remaining: {episodes_remaining}/{exp_cfg.TOTAL_EPISODES}")

---
## 🚀 Section 7: Progressive Training Loop

In [None]:
from src.trainers.qc_trainer_v4 import QC_Trainer_v4
from src.model_loader import instantiate_model, load_weights, load_checkpoint_data
import matplotlib.pyplot as plt
from IPython.display import clear_output, display
import json

# Initialize the resource monitor
monitor = ResourceMonitor(max_training_hours=exp_cfg.MAX_TRAINING_HOURS)

# Instantiate the model
model = instantiate_model(exp_cfg)

# Load weights if a checkpoint was found
if checkpoint_path:
    ckpt_data = load_checkpoint_data(checkpoint_path)
    if ckpt_data:
        load_weights(model, ckpt_data)

# Initialize the trainer
trainer = QC_Trainer_v4(
    experiment_name=exp_cfg.EXPERIMENT_NAME,
    model=model,
    device=device,
    lr=exp_cfg.LR_RATE_M,
    grid_size=exp_cfg.GRID_SIZE_TRAINING,
    qca_steps=exp_cfg.QCA_STEPS_TRAINING,
    gamma_decay=exp_cfg.GAMMA_DECAY,
    max_checkpoints_to_keep=exp_cfg.MAX_CHECKPOINTS_TO_KEEP
)

# Metric tracking
training_log = {
    "experiment_name": exp_cfg.EXPERIMENT_NAME,
    "config": EXPERIMENT_CONFIG,
    "episodes": [],
    "losses": [],
    "metrics": [],
    "checkpoints_saved": []
}

# Helper for saving to Drive
def sync_checkpoint_to_drive(local_path, episode_num):
    """Copy a local checkpoint to Drive"""
    try:
        import shutil
        filename = Path(local_path).name
        drive_path = EXPERIMENT_CHECKPOINT_DIR / filename
        shutil.copy2(local_path, drive_path)
        logger.info(f"💾 Checkpoint synced to Drive: {filename}")
        return True
    except Exception as e:
        logger.error(f"❌ Error syncing to Drive: {e}")
        return False

# PROGRESSIVE TRAINING
print("\n" + "=" * 70)
print("🎯 STARTING PROGRESSIVE TRAINING")
print("=" * 70)

# Define `episode` up front so the exception handlers below can use it
# even if the loop body never runs
episode = resume_from_episode

try:
    for episode in range(resume_from_episode, exp_cfg.TOTAL_EPISODES):

        # Check the time limit BEFORE starting the episode
        if monitor.should_stop():
            print("\n⏰ TIME LIMIT REACHED!")
            print("💾 Saving emergency checkpoint...")

            # Save the emergency checkpoint
            trainer.save_checkpoint(episode, is_best=False)

            # Sync to Drive
            last_checkpoint = LOCAL_CHECKPOINT_DIR / "last_model.pth"
            if last_checkpoint.exists():
                sync_checkpoint_to_drive(str(last_checkpoint), episode)

            print(f"✅ Training stopped at episode {episode}")
            print(f"📊 Progress: {episode}/{exp_cfg.TOTAL_EPISODES} episodes ({episode/exp_cfg.TOTAL_EPISODES*100:.1f}%)")
            break
        
        # Train one episode
        epoch_result = trainer.train_episode(episode)

        # Record metrics
        training_log["episodes"].append(episode)
        training_log["losses"].append(epoch_result.get("loss_total", 0))
        training_log["metrics"].append(epoch_result)

        # Save a checkpoint periodically
        if (episode + 1) % exp_cfg.SAVE_EVERY_EPISODES == 0:
            # Every 5th save (every 50 episodes with the defaults) is flagged as a best-model candidate
            is_best = (episode + 1) % (exp_cfg.SAVE_EVERY_EPISODES * 5) == 0
            trainer.save_checkpoint(episode, is_best=is_best)
            training_log["checkpoints_saved"].append(episode)

        # Sync to Drive periodically
        if (episode + 1) % exp_cfg.DRIVE_SYNC_EVERY == 0:
            print(f"\n💾 Syncing checkpoint to Drive (episode {episode})...")
            last_checkpoint = LOCAL_CHECKPOINT_DIR / "last_model.pth"
            if last_checkpoint.exists():
                sync_checkpoint_to_drive(str(last_checkpoint), episode)

            # Also sync the best model
            best_checkpoint = LOCAL_CHECKPOINT_DIR / "best_model.pth"
            if best_checkpoint.exists():
                sync_checkpoint_to_drive(str(best_checkpoint), episode)
        
        # Visualization every 10 episodes
        if (episode + 1) % 10 == 0:
            clear_output(wait=True)

            print("=" * 70)
            print(f"📈 PROGRESS: Episode {episode + 1}/{exp_cfg.TOTAL_EPISODES} ({(episode+1)/exp_cfg.TOTAL_EPISODES*100:.1f}%)")
            print("=" * 70)

            # Show resources
            print(monitor.get_status_str())

            # Show the latest metrics
            if len(training_log["losses"]) > 0:
                recent_losses = training_log["losses"][-10:]
                avg_loss = sum(recent_losses) / len(recent_losses)
                print(f"\n📊 METRICS (last 10 episodes):")
                print(f"  Average loss: {avg_loss:.4f}")
                print(f"  Current loss: {training_log['losses'][-1]:.4f}")

            # Loss plot
            if len(training_log["episodes"]) > 1:
                plt.figure(figsize=(10, 4))
                plt.plot(training_log["episodes"], training_log["losses"], 'b-', alpha=0.6)
                plt.xlabel('Episode')
                plt.ylabel('Loss')
                plt.title(f'{exp_cfg.EXPERIMENT_NAME} - Training Progress')
                plt.grid(True, alpha=0.3)
                plt.tight_layout()
                plt.show()
                # Release the figure so long runs do not accumulate open figures
                plt.close()
    
    else:
        # Runs only when the loop was not cut short by the time limit
        print("\n" + "=" * 70)
        print("✅ TRAINING COMPLETED SUCCESSFULLY")
        print("=" * 70)

except KeyboardInterrupt:
    print("\n⚠️ Training interrupted by user")
    print("💾 Saving checkpoint...")
    trainer.save_checkpoint(episode, is_best=False)

except Exception as e:
    print(f"\n❌ ERROR DURING TRAINING: {e}")
    logger.error(f"Error: {e}", exc_info=True)
    print("💾 Saving emergency checkpoint...")
    try:
        trainer.save_checkpoint(episode, is_best=False)
    except Exception as save_error:
        logger.error(f"Emergency checkpoint failed: {save_error}")
    raise

finally:
    # Save the training log to Drive
    log_file = EXPERIMENT_LOG_DIR / f"training_log_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
    with open(log_file, 'w') as f:
        # default=str keeps the dump from failing on values json cannot encode natively
        json.dump(training_log, f, indent=2, default=str)
    print(f"\n📄 Log saved to: {log_file}")
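
`sync_checkpoint_to_drive` overwrites the Drive copy in place, so a disconnect in the middle of a copy could leave a truncated `last_model.pth` as the only remote checkpoint. One common mitigation, sketched here as a swapped-in technique rather than the notebook's own method, is to copy to a temporary name and rename afterwards. `atomic_copy` is a hypothetical helper. `os.replace` is atomic on a local POSIX filesystem; whether the Drive mount honours that is an assumption that has not been verified here.

```python
import os
import shutil
import tempfile
from pathlib import Path


def atomic_copy(src, dst):
    """Copy src to dst without ever exposing a half-written dst."""
    src, dst = Path(src), Path(dst)
    tmp = dst.with_name(dst.name + ".tmp")
    shutil.copy2(src, tmp)   # slow part: an interruption only damages tmp
    os.replace(tmp, dst)     # swap the finished file into place
    return dst


# Demonstration in a throwaway directory standing in for local disk and Drive.
with tempfile.TemporaryDirectory() as workdir:
    local = Path(workdir) / "local" / "last_model.pth"
    remote = Path(workdir) / "drive" / "last_model.pth"
    local.parent.mkdir()
    remote.parent.mkdir()
    local.write_bytes(b"checkpoint-v2")
    remote.write_bytes(b"checkpoint-v1")

    atomic_copy(local, remote)
    copied = remote.read_bytes()
    leftovers = sorted(p.name for p in remote.parent.iterdir())

print(copied)     # b'checkpoint-v2'
print(leftovers)  # ['last_model.pth']
```

If the copy is interrupted, the previous `last_model.pth` stays intact and only a stray `.tmp` file is left behind, which the next successful sync overwrites.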

---
## 📊 Section 8: Final Results Visualization

In [None]:
import matplotlib.pyplot as plt
import numpy as np

if len(training_log["episodes"]) > 0:
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    # 1. Total loss
    axes[0, 0].plot(training_log["episodes"], training_log["losses"], 'b-', alpha=0.6, label='Loss')
    axes[0, 0].set_xlabel('Episode')
    axes[0, 0].set_ylabel('Total Loss')
    axes[0, 0].set_title('Loss Over Time')
    axes[0, 0].grid(True, alpha=0.3)
    axes[0, 0].legend()

    # 2. Individual metrics (if available)
    if len(training_log["metrics"]) > 0 and "survival_rate" in training_log["metrics"][0]:
        survival_rates = [m.get("survival_rate", 0) for m in training_log["metrics"]]
        axes[0, 1].plot(training_log["episodes"], survival_rates, 'g-', alpha=0.6, label='Survival Rate')
        axes[0, 1].set_xlabel('Episode')
        axes[0, 1].set_ylabel('Survival Rate')
        axes[0, 1].set_title('Survival Rate')
        axes[0, 1].grid(True, alpha=0.3)
        axes[0, 1].legend()

    # 3. Saved checkpoints
    axes[1, 0].scatter(training_log["checkpoints_saved"], 
                      [1] * len(training_log["checkpoints_saved"]), 
                      c='red', marker='|', s=100, label='Checkpoint')
    axes[1, 0].set_xlabel('Episode')
    axes[1, 0].set_yticks([])
    axes[1, 0].set_title('Saved Checkpoints')
    axes[1, 0].grid(True, alpha=0.3, axis='x')
    axes[1, 0].legend()

    # 4. Loss histogram
    axes[1, 1].hist(training_log["losses"], bins=30, alpha=0.7, color='purple', edgecolor='black')
    axes[1, 1].set_xlabel('Loss')
    axes[1, 1].set_ylabel('Frequency')
    axes[1, 1].set_title('Loss Distribution')
    axes[1, 1].grid(True, alpha=0.3, axis='y')

    plt.tight_layout()
    plt.savefig(EXPERIMENT_LOG_DIR / 'training_summary.png', dpi=150, bbox_inches='tight')
    plt.show()

    print(f"\n📊 Plots saved to: {EXPERIMENT_LOG_DIR / 'training_summary.png'}")
else:
    print("⚠️ No training data to visualize")
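
Per-episode losses are usually noisy, which can hide the trend in the loss plot. A trailing moving average is one simple way to smooth the curve before plotting. The helper below is an illustrative sketch (`moving_average` is not part of the notebook) and uses only the standard library:

```python
def moving_average(values, window=10):
    """Trailing moving average; early points average over what is available."""
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


losses = [4.0, 2.0, 3.0, 1.0, 5.0]
smoothed = moving_average(losses, window=2)
print(smoothed)  # [4.0, 3.0, 2.5, 2.0, 3.0]
```

The result has the same length as the input, so it can be plotted against `training_log["episodes"]` alongside the raw losses.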

---
## 📦 Section 9: Export and Wrap-Up

In [None]:
from src.model_loader import load_model
import shutil

print("📤 Exporting final model...\n")

# Find the best checkpoint
best_checkpoint = LOCAL_CHECKPOINT_DIR / "best_model.pth"
if not best_checkpoint.exists():
    best_checkpoint = find_latest_checkpoint(LOCAL_CHECKPOINT_DIR)

if best_checkpoint:
    print(f"📥 Loading model from: {Path(best_checkpoint).name}")

    # Load the model
    model = load_model(exp_cfg, str(best_checkpoint))
    model.eval()
    model.to(device)

    # Export to TorchScript
    example_input = torch.randn(1, 2 * exp_cfg.MODEL_PARAMS.d_state, 
                                exp_cfg.GRID_SIZE_TRAINING, 
                                exp_cfg.GRID_SIZE_TRAINING, device=device)

    torchscript_path = DRIVE_EXPORTS_DIR / f"{exp_cfg.EXPERIMENT_NAME}_model.pt"

    try:
        traced_model = torch.jit.trace(model, example_input, strict=False)
        traced_model.save(str(torchscript_path))
        print(f"✅ TorchScript model exported: {torchscript_path.name}")
    except Exception as e:
        print(f"⚠️ Error exporting TorchScript: {e}")

    # Copy the best checkpoint to Drive
    drive_best_checkpoint = EXPERIMENT_CHECKPOINT_DIR / "best_model_FINAL.pth"
    shutil.copy2(best_checkpoint, drive_best_checkpoint)
    print(f"✅ Best checkpoint copied: {drive_best_checkpoint.name}")
    
    # Generate the training report
    report_path = EXPERIMENT_LOG_DIR / f"{exp_cfg.EXPERIMENT_NAME}_REPORT.md"

    with open(report_path, 'w') as f:
        f.write(f"# Training Report: {exp_cfg.EXPERIMENT_NAME}\n\n")
        f.write(f"**Date:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n")
        f.write(f"## Configuration\n\n")
        f.write(f"```json\n{json.dumps(EXPERIMENT_CONFIG, indent=2)}\n```\n\n")
        f.write(f"## Results\n\n")
        f.write(f"- **Total episodes completed:** {len(training_log['episodes'])}\n")
        f.write(f"- **Checkpoints saved:** {len(training_log['checkpoints_saved'])}\n")

        if len(training_log["losses"]) > 0:
            f.write(f"- **Final loss:** {training_log['losses'][-1]:.4f}\n")
            f.write(f"- **Average loss:** {np.mean(training_log['losses']):.4f}\n")
            f.write(f"- **Minimum loss:** {min(training_log['losses']):.4f}\n")

        f.write(f"\n## Files\n\n")
        f.write(f"- Best model: `{drive_best_checkpoint.name}`\n")
        f.write(f"- TorchScript: `{torchscript_path.name}`\n")
        f.write(f"- Logs: `{EXPERIMENT_LOG_DIR.name}`\n")

    print(f"✅ Report generated: {report_path.name}")

else:
    print("⚠️ No checkpoint found to export")

print("\n" + "=" * 70)
print("🎉 EXPORT COMPLETED")
print("=" * 70)
print(f"\n📁 All files are in Drive:")
print(f"  - Checkpoints: {EXPERIMENT_CHECKPOINT_DIR}")
print(f"  - Logs: {EXPERIMENT_LOG_DIR}")
print(f"  - Exports: {DRIVE_EXPORTS_DIR}")
print("\n✅ You can close the notebook. Everything is saved in Drive.")

---
## 📝 Final Notes

### ✅ Verification Checklist

- ✓ Checkpoints saved to Drive every 50 episodes
- ✓ Training logs persisted
- ✓ Final model exported to TorchScript
- ✓ Training report generated
- ✓ Auto-recovery configured for the next session

### 🔄 To Continue Training

1. Run this notebook again
2. Keep `"AUTO_RESUME": True` in `EXPERIMENT_CONFIG`
3. Raise `TOTAL_EPISODES` if you want more episodes
4. The notebook automatically detects the latest checkpoint in Drive
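
The arithmetic behind step 3 is the same clamp used in Section 6: the run starts at the checkpoint's episode and trains until `TOTAL_EPISODES`, so a finished run does nothing more unless the total is raised. A small sketch, with `plan_resume` as a hypothetical helper name:

```python
def plan_resume(total_episodes, checkpoint_episode=None):
    """Return (start_episode, episodes_remaining) for the next session."""
    start = 0 if checkpoint_episode is None else checkpoint_episode
    return start, max(0, total_episodes - start)


print(plan_resume(1000))         # (0, 1000)   fresh run
print(plan_resume(1000, 400))    # (400, 600)  interrupted run
print(plan_resume(1000, 1000))   # (1000, 0)   finished: nothing left to train
print(plan_resume(1500, 1000))   # (1000, 500) total raised to extend the run
```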

### 📊 Best Practices

- **Colab Free**: Train in sessions of 6-8 hours
- **Colab Pro**: Sessions of 12-20 hours
- **Kaggle**: Make use of the 30 weekly hours
- Always verify that Drive is syncing correctly
- Review the plots every 50-100 episodes to catch problems early
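
One lightweight way to check the Drive point above before a long run is a write probe: write a small file into the checkpoint directory, read it back, and delete it. The sketch below is illustrative (`verify_writable` is a hypothetical helper) and only confirms that the mounted directory accepts writes; it cannot confirm that Drive has finished uploading to the cloud.

```python
import tempfile
import uuid
from pathlib import Path


def verify_writable(directory):
    """Write, read back and delete a small probe file; True on success."""
    probe = Path(directory) / f".write_probe_{uuid.uuid4().hex}"
    payload = b"atheria-probe"
    try:
        probe.write_bytes(payload)
        return probe.read_bytes() == payload
    except OSError:
        return False
    finally:
        # Best-effort cleanup; the probe may not exist if the write failed.
        try:
            probe.unlink()
        except OSError:
            pass


with tempfile.TemporaryDirectory() as workdir:
    existing_ok = verify_writable(workdir)
    missing_ok = verify_writable(Path(workdir) / "does_not_exist")

print(existing_ok, missing_ok)  # True False
```

In the notebook this could be called with `EXPERIMENT_CHECKPOINT_DIR` right after Section 5, aborting early if it returns `False`.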

---

**Happy training! 🚀**