# Static Letter Classifier on the GPU
### Transfer learning with ResNet‑18
*Generated 2025-07-05*


This notebook replaces the **HOG + SVM (CPU)** model with a **2D CNN** that runs on your **GPU**:

* **Data source:** the folders `data/statics/A/`, `B/`, … containing `.jpg` images.
* **Base model:** `torchvision.models.resnet18`, pretrained on ImageNet.
* **Technique:** freeze the convolutional layers and train only the **final layer** (`fc`) for a few epochs, which is expected to be enough to reach > 98 % accuracy.
* **Advantages:**  
  * GPU training → minutes.  
  * Greater robustness to lighting / background variations.  
  * A single portable `.pth` file.



In [1]:

import torch, torchvision
from torchvision import transforms, datasets, models
from torch import nn, optim
from torch.utils.data import DataLoader, random_split
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
import numpy as np
from pathlib import Path
%matplotlib inline

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Device:', device)


Device: cuda


In [5]:

DATA_DIR = Path('../data/statics')   # folder with subdirectories A/, B/, ...
assert DATA_DIR.exists(), f'Not found: {DATA_DIR.resolve()}'

# transforms: resize to 224x224 and normalize with the ImageNet mean/std
transform = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485,0.456,0.406],
                         std =[0.229,0.224,0.225])
])

full_ds = datasets.ImageFolder(root=str(DATA_DIR), transform=transform)
num_classes = len(full_ds.classes)
print('Classes:', full_ds.classes, '→', num_classes, 'classes')

# 85/15 train/validation split (seeded for reproducibility)
train_size = int(0.85 * len(full_ds))
val_size   = len(full_ds) - train_size
train_ds, val_ds = random_split(full_ds, [train_size, val_size],
                                generator=torch.Generator().manual_seed(42))

train_ld = DataLoader(train_ds, batch_size=32, shuffle=True,  num_workers=4, pin_memory=True)
val_ld   = DataLoader(val_ds,   batch_size=32, shuffle=False, num_workers=4, pin_memory=True)


Classes: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'V', 'W', 'Y'] → 21 classes
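`ToTensor` scales pixels to [0, 1], and `Normalize` then maps each channel to (x - mean) / std. The NumPy sketch below mimics those two steps for a single RGB pixel, using the same ImageNet constants as the cell above. It shows the value range the network actually sees: a black pixel gives the lower bound and a white pixel gives the upper bound. The helper `normalize` is illustrative.

```python
import numpy as np

# ImageNet statistics used in transforms.Normalize above (R, G, B)
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def normalize(pixel_uint8):
    """Mimic ToTensor (uint8 -> [0, 1]) followed by Normalize for one RGB pixel."""
    x = np.asarray(pixel_uint8, dtype=np.float64) / 255.0
    return (x - MEAN) / STD

black = normalize([0, 0, 0])
white = normalize([255, 255, 255])
print('black ->', black.round(4))
print('white ->', white.round(4))
```

Inputs therefore fall roughly in [-2.1, 2.7] per channel. Any inference code should reuse exactly this transform so that new images land in the same range.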


In [6]:

# Load ResNet18 pretrained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
# Freeze all pretrained layers (the new fc created below is trainable by default)
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer: one output per letter class
model.fc = nn.Linear(model.fc.in_features, num_classes)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.fc.parameters(), lr=1e-3)


Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to C:\Users\Jamin/.cache\torch\hub\checkpoints\resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:02<00:00, 17.4MB/s]
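Freezing with `requires_grad = False` has one caveat. BatchNorm layers still update their running mean and variance whenever the model is in `train()` mode, so the "frozen" backbone's statistics drift toward this dataset during training. The sketch below demonstrates this on a single `nn.BatchNorm2d`. It also shows one common option: switching BatchNorm layers back to eval mode after `model.train()`. The helper `freeze_batchnorm` is hypothetical and is not something this notebook does; whether it helps here is a judgment call.

```python
import torch
from torch import nn

def freeze_batchnorm(module: nn.Module) -> None:
    """Hypothetical helper: switch every BatchNorm2d layer to eval mode so its running stats stay fixed."""
    for m in module.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)
for p in bn.parameters():
    p.requires_grad = False              # "frozen" exactly like the backbone above

x = torch.randn(8, 3, 4, 4) + 5.0        # batch whose per-channel mean (~5) is far from running_mean (0)

bn.train()                               # what model.train() does at the start of each epoch
with torch.no_grad():
    bn(x)
after_train = bn.running_mean.clone()    # moved toward the batch mean despite requires_grad=False

bn.reset_running_stats()                 # back to running_mean=0, running_var=1
bn.train()
freeze_batchnorm(bn)                     # BatchNorm back in eval mode
with torch.no_grad():
    bn(x)
after_frozen = bn.running_mean.clone()   # unchanged: eval mode does not update running stats

print('train mode:', after_train)
print('bn frozen :', after_frozen)
```

In the training loop, this would mean calling `freeze_batchnorm(model)` right after `model.train()` in each epoch.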


In [7]:

EPOCHS = 5
history = {'train_loss':[], 'val_loss':[], 'train_acc':[], 'val_acc':[]}

for epoch in range(1, EPOCHS+1):
    model.train(); tloss = tcorrect = 0
    for Xb, yb in train_ld:
        Xb, yb = Xb.to(device), yb.to(device)
        optimizer.zero_grad()
        out = model(Xb)
        loss = criterion(out, yb)
        loss.backward()
        optimizer.step()
        tloss += loss.item()*len(yb)
        tcorrect += (out.argmax(1)==yb).sum().item()
    tloss /= len(train_ds)
    tacc  = tcorrect/len(train_ds)

    # validation
    model.eval(); vloss = vcorrect = 0
    with torch.no_grad():
        for Xb, yb in val_ld:
            Xb, yb = Xb.to(device), yb.to(device)
            out = model(Xb)
            vloss += criterion(out, yb).item()*len(yb)
            vcorrect += (out.argmax(1)==yb).sum().item()
    vloss /= len(val_ds)
    vacc  = vcorrect/len(val_ds)

    history['train_loss'].append(tloss); history['val_loss'].append(vloss)
    history['train_acc'].append(tacc);   history['val_acc'].append(vacc)
    print(f'Epoch {epoch}: '
          f'train_loss={tloss:.4f} val_loss={vloss:.4f} '
          f'train_acc={tacc:.3f} val_acc={vacc:.3f}')


Epoch 1: train_loss=0.5382 val_loss=0.2139 train_acc=0.861 val_acc=0.946


KeyboardInterrupt: 
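The loop multiplies each batch's mean loss by `len(yb)` and divides the sum by the dataset size. This gives the true per-sample mean even when the last batch is smaller than 32. Averaging the batch means directly would overweight that small last batch. The pure-Python illustration below uses made-up batch losses (hypothetical numbers, not results from this notebook).

```python
# Hypothetical per-batch mean losses and batch sizes: the last batch holds only 4 samples
batch_losses = [0.50, 0.40, 1.20]
batch_sizes = [32, 32, 4]

# What the notebook computes: sum(loss * batch_size) / number of samples
weighted = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)

# Naive alternative: average of batch means, which over-weights the small last batch
naive = sum(batch_losses) / len(batch_losses)

print(f'weighted={weighted:.4f} naive={naive:.4f}')
```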

In [None]:

plt.figure(figsize=(10,4))
plt.subplot(1,2,1)
plt.plot(history['train_loss'], label='Train'); plt.plot(history['val_loss'], label='Val')
plt.title('Loss'); plt.legend()
plt.subplot(1,2,2)
plt.plot(history['train_acc'], label='Train'); plt.plot(history['val_acc'], label='Val')
plt.title('Accuracy'); plt.legend()
plt.show()


In [None]:

# Final evaluation on the validation set
model.eval()
y_true, y_pred = [], []
with torch.no_grad():
    for Xb, yb in val_ld:
        out = model(Xb.to(device)).cpu()
        y_true.extend(yb.numpy())
        y_pred.extend(out.argmax(1).numpy())

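# labels=... keeps all classes even if one is missing from the validation split
labels = list(range(num_classes))
print(classification_report(y_true, y_pred, labels=labels, target_names=full_ds.classes, zero_division=0))
cm = confusion_matrix(y_true, y_pred, labels=labels)
disp = ConfusionMatrixDisplay(cm, display_labels=full_ds.classes)
fig, ax = plt.subplots(figsize=(8,8)); disp.plot(ax=ax, cmap='Blues'); plt.show()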
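A class may happen not to appear in the 15 % validation split. In that case, `confusion_matrix` without `labels` returns a smaller matrix than the list of `display_labels`, and `classification_report` raises a `ValueError` because `target_names` no longer matches. The sketch below uses three hypothetical classes, one of which never occurs, to show how passing `labels=` keeps the shape fixed.

```python
from sklearn.metrics import confusion_matrix

class_names = ['A', 'B', 'C']  # hypothetical: 3 classes instead of the notebook's 21
y_true = [0, 0, 1, 1]          # class 2 ('C') never appears in this split
y_pred = [0, 1, 1, 1]

cm_default = confusion_matrix(y_true, y_pred)  # only labels seen in y_true / y_pred
cm_fixed = confusion_matrix(y_true, y_pred, labels=list(range(len(class_names))))

print(cm_default.shape, cm_fixed.shape)
print(cm_fixed)
```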


In [None]:

MODEL_DIR = Path('modelos'); MODEL_DIR.mkdir(exist_ok=True)
torch.save(model.state_dict(), MODEL_DIR/'resnet18_static.pth')
print('Model saved to', MODEL_DIR/'resnet18_static.pth')



## Conclusions

* With only **5 epochs** on the GPU, accuracy is expected to exceed 98 % for the 21 static letters; the run above was interrupted after epoch 1, at 94.6 % validation accuracy.  
* The model takes ~45 MB on disk and can be loaded with:

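```python
import torch
from torch import nn
from torchvision import models

model = models.resnet18()
model.fc = nn.Linear(model.fc.in_features, 21)  # must match the number of training classes (21 here)
model.load_state_dict(torch.load('modelos/resnet18_static.pth', map_location='cpu'))
model.eval()
```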

* On the GPU, training should take roughly 1–2 min on an 8 GB card, and inference is nearly instantaneous.

This model can replace HOG+SVM in your *split‑and‑merge* pipeline and keeps all the logic in PyTorch.
