# Module 3.1: PyTorch Basics with MLflow

## Theory: PyTorch vs TensorFlow

### What is PyTorch?
A deep learning framework originally developed by Facebook (now Meta) that has become extremely popular in both research and production.

### Key differences from TensorFlow:

| Aspect | PyTorch | TensorFlow/Keras |
|--------|---------|------------------|
| **Computational graphs** | Dynamic (define-by-run) | Eager by default since TF 2.x; static graphs (define-and-run) in TF 1.x or via `tf.function` |
| **Debugging** | Pythonic; standard Python debuggers work | Harder once code is compiled into a graph |
| **Community** | Strong in academic research | Strong in industry |
| **API** | Lower level, flexible | High level (Keras) |
| **Deployment** | TorchServe, ONNX | TF Serving, TFLite |

### Advantages of PyTorch:
1. **Dynamic graphs**: The graph is built at run time, as the code executes
2. **Pythonic**: Code reads naturally and is easy to debug
3. **Research**: Widely preferred in academic papers
4. **Flexibility**: Full control over the training loop
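
To make the dynamic-graph point concrete, here is a minimal sketch of a module whose forward pass uses ordinary Python control flow. The class name `DynamicDepthNet` and the `n_steps` argument are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn


class DynamicDepthNet(nn.Module):
    """Applies the same linear layer a variable number of times (hypothetical example)."""

    def __init__(self, features=4):
        super().__init__()
        self.layer = nn.Linear(features, features)

    def forward(self, x, n_steps):
        # Plain Python loop: the graph is rebuilt on every call,
        # so its depth can change from one call to the next.
        for _ in range(n_steps):
            x = torch.relu(self.layer(x))
        return x


torch.manual_seed(0)
net = DynamicDepthNet()
x = torch.randn(2, 4)

out_shallow = net(x, n_steps=1)  # graph with one layer application
out_deep = net(x, n_steps=3)     # graph with three layer applications
print(out_shallow.shape, out_deep.shape)

# Gradients flow through whichever path was actually executed.
out_deep.sum().backward()
print(net.layer.weight.grad.shape)
```

Because the loop is regular Python, a breakpoint or `print` inside `forward` behaves exactly as it would in any other function.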

### Main components:

#### 1. Tensors
```python
import torch
x = torch.tensor([[1, 2], [3, 4]])
```
- Similar to NumPy arrays
- GPU support (CUDA)
- Autograd for automatic differentiation
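
The properties above can be sketched as follows. This is a small illustration with made-up values, not part of the notebook's pipeline; it falls back to the CPU when no GPU is present.

```python
import numpy as np
import torch

# Create a tensor from a NumPy array; torch.from_numpy shares memory with the array.
arr = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
t = torch.from_numpy(arr)

# Element-wise and matrix operations mirror NumPy.
doubled = t * 2
product = t @ t  # matrix multiplication

# Move to the GPU when one is available; otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t_dev = t.to(device)

# Bring the result back to the CPU before converting to NumPy.
back = (t_dev + 1).cpu().numpy()

print(doubled)
print(product)
print(back)
```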

#### 2. nn.Module
```python
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)  # 10 input features, 1 output

    def forward(self, x):
        return self.linear(x)
```
- Base class for all models
- Define the architecture in `__init__`
- Define the forward pass in `forward()`
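
As a quick usage sketch, the model can be instantiated and called on a random batch. The batch size of 8 is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)


model = MyModel()
batch = torch.randn(8, 10)  # 8 samples, 10 features each

# Calling the module runs forward() (plus any registered hooks).
output = model(batch)
print(output.shape)

# Parameters are registered automatically: 10 weights + 1 bias.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```

Calling `model(batch)` rather than `model.forward(batch)` is the usual convention, since it also runs any hooks attached to the module.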

#### 3. Autograd
```python
x = torch.tensor([1.0], requires_grad=True)
y = x ** 2
y.backward()
print(x.grad)  # dy/dx = 2x = 2.0
```
- Automatic differentiation
- Gradient tracking
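
A slightly larger sketch shows autograd on a loss with two parameters and compares the result with hand-derived gradients. The data values and the starting point `w = 0.5`, `b = 0.0` are arbitrary assumptions for illustration.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y_true = torch.tensor([2.0, 4.0, 6.0])

w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

# Mean squared error of a linear model y = w * x + b.
y_pred = w * x + b
loss = ((y_pred - y_true) ** 2).mean()
loss.backward()  # populates w.grad and b.grad

# Analytic gradients of the same loss, for comparison:
#   dL/dw = mean(2 * residual * x),  dL/db = mean(2 * residual)
with torch.no_grad():
    residual = w * x + b - y_true
    grad_w = (2 * residual * x).mean()
    grad_b = (2 * residual).mean()

print(w.grad, grad_w)
print(b.grad, grad_b)
```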

#### 4. Training Loop
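In PyTorch, you write the training loop explicitly:
```python
for epoch in range(epochs):
    for inputs, targets in dataloader:
        optimizer.zero_grad()               # reset gradients from the previous step
        outputs = model(inputs)             # forward pass
        loss = criterion(outputs, targets)  # compute the loss
        loss.backward()                     # backpropagate
        optimizer.step()                    # update the parameters
```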
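
For a version that runs end to end, the sketch below applies the same loop to a tiny synthetic regression problem. The data (`y = 3x + 1` plus noise), the model, and the hyperparameters are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic data (illustrative assumption): y = 3x + 1 plus small Gaussian noise.
X = torch.linspace(-1, 1, 200).unsqueeze(1)
y = 3 * X + 1 + 0.05 * torch.randn_like(X)
dataloader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(50):
    running = 0.0
    for inputs, targets in dataloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        running += loss.item() * inputs.size(0)  # sum of per-sample losses
    epoch_losses.append(running / len(X))

print(f"first epoch loss: {epoch_losses[0]:.4f}")
print(f"last epoch loss:  {epoch_losses[-1]:.4f}")
print(f"learned w={model.weight.item():.2f}, b={model.bias.item():.2f}")
```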

## Objectives
- Build models with PyTorch
- Integrate with MLflow
- Write custom training loops
- Use GPU acceleration
- Log metrics and models

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import torchvision
import torchvision.transforms as transforms
import mlflow
import mlflow.pytorch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

print(f"PyTorch version: {torch.__version__}")
print(f"MLflow version: {mlflow.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

In [None]:
# Assumes an MLflow tracking server is already running at this address.
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("pytorch-basics")

## 1. Dataset: Fashion MNIST

In [None]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=True,
    download=True,
    transform=transform
)

test_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)

batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

print(f"Train samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Batch size: {batch_size}")
print(f"Classes: {classes}")
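
The `Normalize((0.5,), (0.5,))` step in the transform computes `(x - 0.5) / 0.5`, which maps pixel values from `[0, 1]` to `[-1, 1]`. The sketch below reproduces that arithmetic with plain `torch` on a random stand-in image, so it runs without downloading the dataset.

```python
import torch

torch.manual_seed(0)

# Stand-in for the output of ToTensor(): a 1x28x28 image with values in [0, 1].
image = torch.rand(1, 28, 28)

mean, std = 0.5, 0.5
normalized = (image - mean) / std  # same arithmetic as Normalize((0.5,), (0.5,))

print(image.min().item(), image.max().item())
print(normalized.min().item(), normalized.max().item())

# Undo the normalization, for example before plotting an image.
restored = normalized * std + mean
```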

## 2. Simple Model: Fully Connected Network

In [None]:
class SimpleNN(nn.Module):
    def __init__(self, input_size=784, hidden_size=256, num_classes=10):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        return x

model = SimpleNN()
print(model)

total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
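
The parameter total printed above can be checked by hand: each `nn.Linear(in, out)` layer holds `in * out` weights plus `out` biases. The sketch below repeats the arithmetic for the default sizes and cross-checks it against two standalone layers.

```python
import torch.nn as nn

input_size, hidden_size, num_classes = 784, 256, 10

# Weights plus biases for each linear layer.
fc1_params = input_size * hidden_size + hidden_size    # 200,960
fc2_params = hidden_size * num_classes + num_classes   # 2,570
expected_total = fc1_params + fc2_params                # 203,530

# Cross-check against PyTorch's own count (ReLU and Dropout have no parameters).
layers = nn.Sequential(
    nn.Linear(input_size, hidden_size),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(hidden_size, num_classes),
)
counted_total = sum(p.numel() for p in layers.parameters())

print(expected_total, counted_total)
```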

## 3. Training Loop with MLflow

In [None]:
def train_epoch(model, dataloader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    
    for images, labels in tqdm(dataloader, desc="Training"):
        images, labels = images.to(device), labels.to(device)
        
        optimizer.zero_grad()
        
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        loss.backward()
        
        optimizer.step()
        
        # Weight each batch loss by its size so the epoch average stays exact
        # even when the last batch is smaller.
        running_loss += loss.item() * labels.size(0)
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    
    epoch_loss = running_loss / total
    epoch_acc = 100 * correct / total
    
    return epoch_loss, epoch_acc

def evaluate(model, dataloader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0
    all_preds = []
    all_labels = []
    
    with torch.no_grad():
        for images, labels in tqdm(dataloader, desc="Evaluating"):
            images, labels = images.to(device), labels.to(device)
            
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            # Weight each batch loss by its size (see train_epoch).
            running_loss += loss.item() * labels.size(0)
            predicted = outputs.argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            
            all_preds.extend(predicted.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
    
    epoch_loss = running_loss / total
    epoch_acc = 100 * correct / total
    
    return epoch_loss, epoch_acc, all_preds, all_labels
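
The calls to `model.train()` and `model.eval()` in the two functions above matter because layers such as dropout behave differently in each mode. A minimal sketch on a standalone `nn.Dropout` layer, with an all-ones input chosen purely for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

dropout.train()
train_out = dropout(x)  # roughly half the entries zeroed, survivors scaled by 1 / (1 - p) = 2

dropout.eval()
eval_out = dropout(x)   # dropout is a no-op in eval mode

zero_fraction = (train_out == 0).float().mean().item()
print(f"fraction zeroed in train mode: {zero_fraction:.2f}")
print(f"eval output equals input: {torch.equal(eval_out, x)}")
```

Forgetting `model.eval()` before evaluation would leave dropout active and make the reported metrics noisy.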

In [None]:
with mlflow.start_run(run_name="simple_nn_pytorch") as run:
    
    model = SimpleNN().to(device)
    
    lr = 0.001
    epochs = 10
    
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    
    mlflow.log_param("model_type", "SimpleNN")
    mlflow.log_param("optimizer", "Adam")
    mlflow.log_param("learning_rate", lr)
    mlflow.log_param("epochs", epochs)
    mlflow.log_param("batch_size", batch_size)
    mlflow.log_param("total_parameters", total_params)
    mlflow.log_param("device", str(device))
    
    train_losses = []
    train_accs = []
    val_losses = []
    val_accs = []
    
    for epoch in range(epochs):
        print(f"\nEpoch {epoch+1}/{epochs}")
        
        train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
        # The test split doubles as the validation set in this notebook.
        val_loss, val_acc, _, _ = evaluate(model, test_loader, criterion, device)
        
        train_losses.append(train_loss)
        train_accs.append(train_acc)
        val_losses.append(val_loss)
        val_accs.append(val_acc)
        
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("train_accuracy", train_acc, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)
        mlflow.log_metric("val_accuracy", val_acc, step=epoch)
        
        print(f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%")
        print(f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%")
    
    final_loss, final_acc, predictions, true_labels = evaluate(model, test_loader, criterion, device)
    
    mlflow.log_metric("final_test_loss", final_loss)
    mlflow.log_metric("final_test_accuracy", final_acc)
    
    cm = confusion_matrix(true_labels, predictions)
    plt.figure(figsize=(12, 10))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=classes, yticklabels=classes)
    plt.title('Confusion Matrix - PyTorch SimpleNN')
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')
    plt.tight_layout()
    plt.savefig('pytorch_confusion_matrix.png')
    mlflow.log_artifact('pytorch_confusion_matrix.png')
    plt.close()
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    axes[0].plot(train_losses, label='Train Loss')
    axes[0].plot(val_losses, label='Val Loss')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    axes[0].set_title('Training and Validation Loss')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    axes[1].plot(train_accs, label='Train Accuracy')
    axes[1].plot(val_accs, label='Val Accuracy')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Accuracy (%)')
    axes[1].set_title('Training and Validation Accuracy')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('pytorch_training_history.png')
    mlflow.log_artifact('pytorch_training_history.png')
    plt.show()
    
    mlflow.pytorch.log_model(model, "pytorch_model")
    
    torch.save(model.state_dict(), 'model_weights.pth')
    mlflow.log_artifact('model_weights.pth')
    
    mlflow.set_tag("framework", "PyTorch")
    mlflow.set_tag("dataset", "FashionMNIST")
    
    print(f"\nFinal Test Accuracy: {final_acc:.2f}%")
    print(f"Run ID: {run.info.run_id}")
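
The run above stores the weights with `torch.save(model.state_dict(), ...)`. Reloading them requires building the same architecture first and then calling `load_state_dict`. The sketch below shows that round trip on a small stand-in network (the `make_model` helper is hypothetical) and uses an in-memory buffer instead of `model_weights.pth` so it runs anywhere.

```python
import io

import torch
import torch.nn as nn

torch.manual_seed(0)


def make_model():
    # Hypothetical stand-in architecture; in the notebook this would be SimpleNN().
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


trained = make_model()

# Save only the parameters (the state_dict), not the Python class.
buffer = io.BytesIO()  # in-memory stand-in for a .pth file
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Rebuild the architecture, then load the saved parameters into it.
restored = make_model()
restored.load_state_dict(torch.load(buffer, map_location="cpu"))
restored.eval()

x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x))
print(same)
```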

## 4. CNN with PyTorch

In [None]:
class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        
        self.conv_layers = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        
        self.fc_layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 3 * 3, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes)
        )
    
    def forward(self, x):
        x = self.conv_layers(x)
        x = self.fc_layers(x)
        return x

cnn_model = ConvNet().to(device)
print(cnn_model)
print(f"\nTotal parameters: {sum(p.numel() for p in cnn_model.parameters()):,}")
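
The `128 * 3 * 3` input size of the first fully connected layer follows from the spatial sizes: each 3x3 convolution with `padding=1` keeps height and width, and each 2x2 max pool halves them (rounding down), so 28 becomes 14, then 7, then 3. The sketch below traces those shapes with a dummy input; batch norm and ReLU are omitted because they do not change shapes.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 1, 28, 28)  # one grayscale 28x28 image
pool = nn.MaxPool2d(kernel_size=2, stride=2)

shapes = []
in_ch = 1
for out_ch in (32, 64, 128):
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
    x = pool(conv(x))  # conv keeps H and W, pooling halves them
    shapes.append(tuple(x.shape))
    in_ch = out_ch

flat_features = x.flatten(1).shape[1]

for s in shapes:
    print(s)
print(flat_features)  # 128 * 3 * 3 = 1152
```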

In [None]:
with mlflow.start_run(run_name="cnn_pytorch") as run:
    
    model = ConvNet().to(device)
    
    lr = 0.001
    epochs = 15
    
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
    
    mlflow.log_param("model_type", "ConvNet")
    mlflow.log_param("optimizer", "Adam")
    mlflow.log_param("learning_rate", lr)
    mlflow.log_param("epochs", epochs)
    mlflow.log_param("scheduler", "StepLR")
    
    best_acc = 0.0
    
    for epoch in range(epochs):
        print(f"\nEpoch {epoch+1}/{epochs}")
        
        train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
        val_loss, val_acc, _, _ = evaluate(model, test_loader, criterion, device)
        
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("train_accuracy", train_acc, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)
        mlflow.log_metric("val_accuracy", val_acc, step=epoch)
        # Log the learning rate used during this epoch, before the scheduler updates it.
        mlflow.log_metric("learning_rate", optimizer.param_groups[0]['lr'], step=epoch)
        
        if val_acc > best_acc:
            best_acc = val_acc
            torch.save(model.state_dict(), 'best_cnn_model.pth')
        
        print(f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%")
        print(f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%")
        print(f"Current LR: {optimizer.param_groups[0]['lr']:.6f}")
        
        # Step the scheduler last so the values logged above match this epoch.
        scheduler.step()
    
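    mlflow.log_metric("best_accuracy", best_acc)
    # Restore the best checkpoint so the logged model matches best_accuracy.
    model.load_state_dict(torch.load('best_cnn_model.pth', map_location=device))
    mlflow.pytorch.log_model(model, "cnn_model")
    mlflow.log_artifact('best_cnn_model.pth')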
    
    print(f"\nBest Validation Accuracy: {best_acc:.2f}%")
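
The run above configures `StepLR(step_size=5, gamma=0.5)` on an initial learning rate of 0.001, so the rate should halve every 5 epochs. The sketch below records the rate in effect during each of 15 epochs, using a single dummy parameter in place of the model.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# A single dummy parameter stands in for the model's parameters.
param = nn.Parameter(torch.zeros(1))
optimizer = optim.Adam([param], lr=0.001)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

lrs = []
for epoch in range(15):
    lrs.append(optimizer.param_groups[0]["lr"])  # rate in effect during this epoch
    optimizer.step()   # stands in for one epoch of training
    scheduler.step()   # update the rate for the next epoch

for epoch, lr in enumerate(lrs):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

Epochs 0 to 4 use 0.001, epochs 5 to 9 use 0.0005, and epochs 10 to 14 use 0.00025.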

## Module 3.1 Summary

### Key PyTorch concepts:

1. **nn.Module**: Base class for models
2. **Training loop**: Explicit control over training
3. **Autograd**: Automatic differentiation
4. **DataLoader**: Efficient batch handling
5. **Device**: CPU/GPU management

### MLflow with PyTorch:
- `mlflow.pytorch.log_model()`: Save models
- `mlflow.pytorch.load_model()`: Load models
- Manual logging of per-epoch metrics
- Save the state_dict for checkpoints

### Next notebook:
Transfer learning with pretrained models