# 📋 Project Structure - 5 Stages

This notebook is organized into 5 clearly defined stages to make it easier to follow and run:

## 🔧 Stage 1: Setup and Data Download
- Installation of the required dependencies
- GPU check and device configuration
- Automatic download of the Stanford Dogs dataset

## 📊 Stage 2: Preprocessing and Exploratory Analysis
- Analysis of the dataset structure
- Visualization of data samples
- Implementation of transforms and data loaders

## 🏗️ Stage 3: Model Architecture
- Definition of the classifier (pretrained EfficientNet-B3 backbone with a custom head)
- Configuration of the optimizer and loss function
- Preparation for mixed precision training

## 🚀 Stage 4: Training
- Training the model for 5 epochs
- Real-time progress monitoring
- Saving the best model

## 📈 Stage 5: Evaluation and Results
- Evaluation of the model on the validation set
- Visualization of predictions
- Analysis of performance metrics

---

# Dog Breed Classification - Stanford Dogs Dataset
## Framework: PyTorch with CUDA
### 120 dog-breed classes with data augmentation and deep networks

In [1]:
# Check the available GPU
!nvidia-smi

# Install dependencies
!python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!python -m pip install timm albumentations matplotlib seaborn scikit-learn
!python -m pip install wget gdown

Sat Jun 28 11:26:22 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 572.61                 Driver Version: 572.61         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 3050 ...  WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   46C    P8              8W /   85W |    2887MiB /   4096MiB |     33%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                


[notice] A new release of pip is available: 25.0.1 -> 25.1.1
[notice] To update, run: python.exe -m pip install --upgrade pip

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torchvision import transforms, datasets
from torch.utils.data import DataLoader, Dataset
import timm
import albumentations as A
from albumentations.pytorch import ToTensorV2
import cv2
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix
import os
import wget
import tarfile
from PIL import Image
import time
import json
import random
from tqdm import tqdm

# Select the compute device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name(0)}')
    # total_memory is the card's full capacity, not the memory currently free
    print(f'Total GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB')

Using device: cuda
GPU: NVIDIA GeForce RTX 3050 Ti Laptop GPU
Total GPU memory: 4.0 GB


# 📊 STAGE 2: PREPROCESSING AND EXPLORATORY ANALYSIS
---

In [3]:
# Download the Stanford Dogs Dataset
def download_stanford_dogs():
    # Check for the extracted Images folder, not just the root directory,
    # so an interrupted download is retried instead of being reported as done
    if not os.path.exists(os.path.join('stanford_dogs', 'Images')):
        print("Downloading Stanford Dogs Dataset...")

        # Dataset URLs
        images_url = "http://vision.stanford.edu/aditya86/ImageNetDogs/images.tar"
        annotations_url = "http://vision.stanford.edu/aditya86/ImageNetDogs/annotation.tar"
        lists_url = "http://vision.stanford.edu/aditya86/ImageNetDogs/lists.tar"

        os.makedirs('stanford_dogs', exist_ok=True)

        # Download the archives
        try:
            print("Downloading images...")
            wget.download(images_url, 'stanford_dogs/images.tar')
            print("\nDownloading annotations...")
            wget.download(annotations_url, 'stanford_dogs/annotation.tar')
            print("\nDownloading lists...")
            wget.download(lists_url, 'stanford_dogs/lists.tar')

            # Extract each archive, then delete it
            for tar_file in ['images.tar', 'annotation.tar', 'lists.tar']:
                print(f"\nExtracting {tar_file}...")
                with tarfile.open(f'stanford_dogs/{tar_file}', 'r') as tar:
                    tar.extractall('stanford_dogs/')
                os.remove(f'stanford_dogs/{tar_file}')

        except Exception as e:
            print(f"Error downloading dataset: {e}")
            # No alternative source is downloaded here; StanfordDogsDataset
            # generates synthetic images when stanford_dogs/Images is missing
            print("Falling back to synthetic data...")
            return False
    return True

download_stanford_dogs()

True
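
`tar.extractall` trusts the member paths stored inside the archive. As a hedged sketch (the helper name `safe_extract` is hypothetical and not part of the notebook), the extraction step could check where each member would land before unpacking anything:

```python
import os
import tarfile


def safe_extract(archive_path, dest_dir):
    """Extract a tar archive, refusing members that would land outside dest_dir.

    Hypothetical helper for illustration: it checks member names only and
    does not inspect symlink targets.
    """
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r") as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # commonpath equals dest_root only when target lies inside it
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        tar.extractall(dest_root)
    return dest_root
```

Under these assumptions, the loop in `download_stanford_dogs` could call `safe_extract(f'stanford_dogs/{tar_file}', 'stanford_dogs/')` in place of the bare `extractall`.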

In [4]:
# Dataset class with Albumentations-based augmentation
class StanfordDogsDataset(Dataset):
    def __init__(self, root_dir, transform=None, is_train=True):
        self.root_dir = root_dir
        self.transform = transform
        self.is_train = is_train
        self.images = []
        self.labels = []
        self.class_to_idx = {}
        
        self._load_dataset()
    
    def _load_dataset(self):
        images_dir = os.path.join(self.root_dir, 'Images')
        if os.path.exists(images_dir):
            classes = sorted(os.listdir(images_dir))
            self.class_to_idx = {cls: idx for idx, cls in enumerate(classes)}
            
            for class_name in classes:
                class_dir = os.path.join(images_dir, class_name)
                if os.path.isdir(class_dir):
                    for img_name in os.listdir(class_dir):
                        if img_name.lower().endswith(('.jpg', '.jpeg', '.png')):
                            self.images.append(os.path.join(class_dir, img_name))
                            self.labels.append(self.class_to_idx[class_name])
        else:
            # Fall back to synthetic data for demonstration
            print("Generating synthetic data for demonstration...")
            self.class_to_idx = {f'breed_{i:03d}': i for i in range(120)}
            for i in range(12000):  # 100 images per class
                self.images.append(f'synthetic_image_{i}.jpg')
                self.labels.append(i % 120)
    
    def __len__(self):
        return len(self.images)
    
    def __getitem__(self, idx):
        img_path = self.images[idx]
        label = self.labels[idx]
        
        # Load the image from disk, or synthesize one
        if os.path.exists(img_path):
            image = cv2.imread(img_path)
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        else:
            # Synthetic image for demonstration; randint's upper bound is exclusive
            image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
        
        if self.transform:
            augmented = self.transform(image=image)
            image = augmented['image']
        
        return image.float(), torch.tensor(label, dtype=torch.long)

# Albumentations transforms: augmentation for training, resize and normalize for validation
train_transform = A.Compose([
    A.Resize(256, 256),
    A.RandomCrop(224, 224),
    A.HorizontalFlip(p=0.5),
    A.RandomRotate90(p=0.3),
    A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.2, rotate_limit=30, p=0.5),
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
    A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, p=0.5),
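    # This run emitted a warning on the next line; newer Albumentations releases
    # appear to replace `var_limit` with `std_range`, so check the installed version
    A.GaussNoise(var_limit=(10.0, 50.0), p=0.3),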
    A.Blur(blur_limit=3, p=0.3),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2()
])

val_transform = A.Compose([
    A.Resize(224, 224),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2()
])

# Create datasets
train_dataset = StanfordDogsDataset('stanford_dogs', transform=train_transform, is_train=True)
val_dataset = StanfordDogsDataset('stanford_dogs', transform=val_transform, is_train=False)

# Train/val split
# NOTE: only the 80% training subset is kept; the 20% remainder is discarded and
# val_dataset still holds every image, so validation overlaps the training data
train_size = int(0.8 * len(train_dataset))
val_size = len(train_dataset) - train_size
train_dataset, _ = torch.utils.data.random_split(train_dataset, [train_size, val_size])

print(f'Training dataset size: {len(train_dataset)}')
print(f'Validation dataset size: {len(val_dataset)}')
print(f'Number of classes: {len(train_dataset.dataset.class_to_idx)}')

Training dataset size: 16464
Validation dataset size: 20580
Number of classes: 120
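
The sizes above show that the validation set (20580 images) contains all 16464 training images. One way to avoid that overlap is to split the indices once and reuse both halves. The following is only a sketch: `split_indices` and the seed value are assumptions, not part of the notebook.

```python
import random


def split_indices(n_items, train_fraction=0.8, seed=42):
    """Return disjoint (train, val) index lists that together cover range(n_items)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    train_size = int(train_fraction * n_items)
    return indices[:train_size], indices[train_size:]


train_idx, val_idx = split_indices(20580)
print(len(train_idx), len(val_idx))
```

Each index list could then be wrapped with `torch.utils.data.Subset`, pairing `train_idx` with the dataset object that uses `train_transform` and `val_idx` with the one that uses `val_transform`, so no image is seen in both roles.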


In [5]:
# EfficientNet-B3 backbone with a custom classification head
class DogBreedClassifier(nn.Module):
    def __init__(self, num_classes=120, model_name='efficientnet_b3'):
        super(DogBreedClassifier, self).__init__()
        
        # Pretrained backbone; num_classes=0 drops timm's classifier, so the
        # backbone returns pooled features of shape (batch, num_features)
        self.backbone = timm.create_model(model_name, pretrained=True, num_classes=0)
        
        # Feature dimension
        feature_dim = self.backbone.num_features
        
        # Classification head with dropout and batch normalization.
        # The features are already pooled to 2D, so the head needs no
        # AdaptiveAvgPool2d/Flatten (AdaptiveAvgPool2d expects 3D or 4D input)
        self.classifier = nn.Sequential(
            nn.BatchNorm1d(feature_dim),
            nn.Dropout(0.5),
            nn.Linear(feature_dim, 512),
            nn.ReLU(inplace=True),
            nn.BatchNorm1d(512),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes)
        )
        
        # Weight initialization
        self._initialize_weights()
    
    def _initialize_weights(self):
        for m in self.classifier.modules():
            if isinstance(m, nn.Linear):
                nn.init.xavier_normal_(m.weight)
                nn.init.constant_(m.bias, 0)
    
    def forward(self, x):
        features = self.backbone(x)
        if len(features.shape) > 2:
            features = torch.mean(features, dim=[-2, -1])  # Global average pooling
        output = self.classifier(features)
        return output

# Build the model and move it to the device
model = DogBreedClassifier(num_classes=120, model_name='efficientnet_b3')
model = model.to(device)

# Compile to TorchScript and enable cuDNN autotuning
if torch.cuda.is_available():
    model = torch.jit.script(model)
    torch.backends.cudnn.benchmark = True
    torch.backends.cudnn.deterministic = False

print(f'Model created with {sum(p.numel() for p in model.parameters()):,} parameters')
print(f'Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}')

Model created with 11,548,832 parameters
Trainable parameters: 11,548,832
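
The printed total can be split by hand into backbone and head. This is a minimal sketch, assuming the backbone's `num_features` is 1536 for `efficientnet_b3`; the helper functions are illustrative only.

```python
def linear_params(in_features, out_features):
    """Weights plus one bias per output unit."""
    return in_features * out_features + out_features


def batchnorm_params(num_features):
    """Learnable scale and shift; running statistics are buffers, not parameters."""
    return 2 * num_features


feature_dim = 1536  # assumed num_features of timm's efficientnet_b3
head = (
    batchnorm_params(feature_dim)
    + linear_params(feature_dim, 512)
    + batchnorm_params(512)
    + linear_params(512, 120)
)
total = 11_548_832  # figure printed by the cell above
print(f"head: {head:,}  backbone: {total - head:,}")
```

Under that assumption the head accounts for roughly 7% of the parameters, so nearly all of the fine-tuning cost sits in the backbone.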


In [6]:
# Training configuration
batch_size = 32 if torch.cuda.is_available() else 16
# On Windows, Dataset classes defined in a notebook often cannot be loaded by
# worker processes; if the first batch never arrives, try num_workers = 0
num_workers = 4

# DataLoaders
train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=num_workers,
    pin_memory=True,
    persistent_workers=num_workers > 0  # only valid when workers are used
)

val_loader = DataLoader(
    val_dataset,
    batch_size=batch_size,
    shuffle=False,
    num_workers=num_workers,
    pin_memory=True,
    persistent_workers=num_workers > 0
)

# AdamW optimizer with decoupled weight decay
optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

# One-cycle scheduler with a warmup phase (first 10% of the steps)
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1e-3,
    epochs=5,  # must match the number of epochs actually trained
    steps_per_epoch=len(train_loader),
    pct_start=0.1,
    div_factor=10,
    final_div_factor=100
)

# Loss function with label smoothing
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Gradient scaler for mixed precision training (CUDA only)
scaler = torch.amp.GradScaler('cuda') if torch.cuda.is_available() else None

print(f'Batch size: {batch_size}')
print(f'Steps per epoch: {len(train_loader)}')
print(f'Using mixed precision: {scaler is not None}')

Batch size: 32
Steps per epoch: 515
Using mixed precision: True
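
To see what the one-cycle settings do to the learning rate, the curve can be approximated in plain Python. This sketch assumes the schedule spans the 5 training epochs and follows the two-phase cosine form; `one_cycle_lr` is an illustrative function, not PyTorch's implementation.

```python
import math


def one_cycle_lr(step, total_steps, max_lr=1e-3, pct_start=0.1,
                 div_factor=10.0, final_div_factor=100.0):
    """Approximate a two-phase cosine one-cycle schedule at a given step."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1  # end of the warmup phase

    def cos_anneal(start, end, pct):
        # Moves from start (pct=0) to end (pct=1) along half a cosine wave
        return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    pct = (step - warmup_end) / (total_steps - 1 - warmup_end)
    return cos_anneal(max_lr, min_lr, pct)


steps_per_epoch = 515               # from the output above
total_steps = 5 * steps_per_epoch   # assumes a 5-epoch schedule
for step in (0, 256, total_steps - 1):
    print(step, f"{one_cycle_lr(step, total_steps):.2e}")
```

With `div_factor=10` and `final_div_factor=100`, the rate starts at `max_lr / 10`, peaks near the end of the first 10% of steps, and decays to `max_lr / 1000` on the last step.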




# 🏗️ STAGE 3: MODEL ARCHITECTURE
---

In [None]:
# Training and validation functions
def train_epoch(model, loader, optimizer, criterion, scheduler, scaler=None):
    model.train()
    total_loss = 0
    correct = 0
    total = 0
    
    pbar = tqdm(loader, desc='Training')
    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device, non_blocking=True), target.to(device, non_blocking=True)
        
        optimizer.zero_grad(set_to_none=True)
        
        if scaler is not None:
            with torch.amp.autocast('cuda'):
                output = model(data)
                loss = criterion(output, target)
            
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
        else:
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
        
        scheduler.step()
        
        total_loss += loss.item()
        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()
        total += target.size(0)
        
        pbar.set_postfix({
            'Loss': f'{loss.item():.4f}',
            'Acc': f'{100.*correct/total:.2f}%',
            'LR': f'{scheduler.get_last_lr()[0]:.6f}'
        })
    
    return total_loss / len(loader), 100. * correct / total

def validate(model, loader, criterion):
    model.eval()
    total_loss = 0
    correct = 0
    total = 0
    
    with torch.no_grad():
        pbar = tqdm(loader, desc='Validation')
        for data, target in pbar:
            data, target = data.to(device, non_blocking=True), target.to(device, non_blocking=True)
            
            if scaler is not None:
                with torch.amp.autocast('cuda'):
                    output = model(data)
                    loss = criterion(output, target)
            else:
                output = model(data)
                loss = criterion(output, target)
            
            total_loss += loss.item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
            total += target.size(0)
            
            pbar.set_postfix({
                'Loss': f'{loss.item():.4f}',
                'Acc': f'{100.*correct/total:.2f}%'
            })
    
    return total_loss / len(loader), 100. * correct / total

# Training loop
epochs = 5
best_acc = 0
train_losses, val_losses = [], []
train_accs, val_accs = [], []

print("Starting training...")
start_time = time.time()

for epoch in range(epochs):
    print(f'\nEpoch {epoch+1}/{epochs}')
    
    # Training
    train_loss, train_acc = train_epoch(model, train_loader, optimizer, criterion, scheduler, scaler)
    
    # Validation
    val_loss, val_acc = validate(model, val_loader, criterion)
    
    # Record metrics
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    train_accs.append(train_acc)
    val_accs.append(val_acc)
    
    print(f'Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%')
    print(f'Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%')
    
    # Save the best model
    if val_acc > best_acc:
        best_acc = val_acc
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'best_acc': best_acc,
        }, 'best_dog_classifier.pth')
        print(f'New best model saved! Accuracy: {best_acc:.2f}%')

training_time = time.time() - start_time
print(f'\nTraining completed in {training_time/60:.2f} minutes')
print(f'Best accuracy: {best_acc:.2f}%')

Starting training...

Epoch 1/5


Training:   0%|          | 0/515 [00:00<?, ?it/s]
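
The `scaler.scale(loss).backward()`, `scaler.step(optimizer)`, `scaler.update()` sequence in `train_epoch` relies on dynamic loss scaling. The toy class below imitates that bookkeeping in plain Python to show the idea. It is an illustration, not PyTorch's `GradScaler`; the default values are chosen to mirror that class's documented defaults.

```python
class ToyLossScaler:
    """Pure-Python imitation of dynamic loss scaling (illustrative only)."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0  # consecutive steps without inf/NaN gradients

    def update(self, found_inf):
        """Adjust the scale after one step attempt.

        Returns True when the optimizer step would be applied,
        False when it would be skipped because gradients overflowed.
        """
        if found_inf:
            self.scale *= self.backoff_factor  # shrink the scale and skip the step
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            self.scale *= self.growth_factor   # grow again after a clean streak
            self._good_steps = 0
        return True


demo = ToyLossScaler(growth_interval=3)
applied = [demo.update(found_inf=flag) for flag in (False, True, False, False, False)]
print(applied, demo.scale)
```

A skipped step is one reason the per-batch `scheduler.step()` can run slightly ahead of the optimizer during the first iterations of mixed precision training.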

# 🚀 STAGE 4: TRAINING
---

In [None]:
# Plot the training results
plt.figure(figsize=(15, 5))

# Loss
plt.subplot(1, 3, 1)
plt.plot(train_losses, label='Train Loss', color='blue')
plt.plot(val_losses, label='Val Loss', color='red')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

# Accuracy
plt.subplot(1, 3, 2)
plt.plot(train_accs, label='Train Acc', color='blue')
plt.plot(val_accs, label='Val Acc', color='red')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.legend()
plt.grid(True)

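# Learning rate
plt.subplot(1, 3, 3)
# get_last_lr() returns only the current value, so this draws a flat line at the
# final learning rate; log the rate each epoch during training to plot the real schedule
lrs = [scheduler.get_last_lr()[0] for _ in range(len(train_losses))]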
plt.plot(lrs, color='green')
plt.title('Learning Rate Schedule')
plt.xlabel('Epoch')
plt.ylabel('Learning Rate')
plt.yscale('log')
plt.grid(True)

plt.tight_layout()
plt.show()

# Final statistics
print(f"\n=== FINAL RESULTS ===")
print(f"Best validation accuracy: {best_acc:.2f}%")
print(f"Total training time: {training_time/60:.2f} minutes")
print(f"Average per epoch: {training_time/epochs:.2f} seconds")
print(f"Number of parameters: {sum(p.numel() for p in model.parameters()):,}")
# The checkpoint also stores the optimizer state, so it is larger than the weights alone
print(f"Checkpoint size: {os.path.getsize('best_dog_classifier.pth')/1024**2:.2f} MB")
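
The file measured in the final statistics holds both `model_state_dict` and `optimizer_state_dict`, so its size overstates the model itself. A rough estimate follows, assuming float32 values and two AdamW moment tensors per parameter, and ignoring buffers and serialization overhead. `estimate_checkpoint_mb` is a hypothetical helper.

```python
def estimate_checkpoint_mb(num_params, bytes_per_value=4, optimizer_states_per_param=2):
    """Rough size of the weights alone and of weights plus optimizer state, in MiB."""
    weights = num_params * bytes_per_value
    optimizer_state = weights * optimizer_states_per_param
    return weights / 1024 ** 2, (weights + optimizer_state) / 1024 ** 2


weights_mb, checkpoint_mb = estimate_checkpoint_mb(11_548_832)
print(f"weights ~{weights_mb:.1f} MB, weights + optimizer state ~{checkpoint_mb:.1f} MB")
```

Saving only `model.state_dict()` would bring the file close to the first figure, which is the more meaningful number when reporting model size.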

# 📈 STAGE 5: EVALUATION AND RESULTS
---

In [None]:
# Inference function
def predict_dog_breed(model, image_path, class_names, top_k=5):
    model.eval()
    
    # Load and preprocess the image
    if os.path.exists(image_path):
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    else:
        # Synthetic image for demonstration when the file is missing
        image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
    
    # Apply the validation transforms
    transformed = val_transform(image=image)
    input_tensor = transformed['image'].unsqueeze(0).to(device)
    
    with torch.no_grad():
        if scaler is not None:
            with torch.amp.autocast('cuda'):
                output = model(input_tensor)
        else:
            output = model(input_tensor)
        
        probabilities = torch.softmax(output, dim=1)
        top_k_probs, top_k_indices = torch.topk(probabilities, top_k)
    
    results = []
    for i in range(top_k):
        breed_idx = top_k_indices[0][i].item()
        prob = top_k_probs[0][i].item()
        breed_name = list(class_names.keys())[breed_idx] if breed_idx < len(class_names) else f'Breed_{breed_idx}'
        results.append((breed_name, prob))
    
    return results

# Example prediction
class_names = train_dataset.dataset.class_to_idx
example_predictions = predict_dog_breed(model, 'example_dog.jpg', class_names)

print("\n=== EXAMPLE PREDICTION ===")
for i, (breed, prob) in enumerate(example_predictions, 1):
    print(f"{i}. {breed}: {prob*100:.2f}%")

print("\nDog breed classification model complete!")
print("Implemented features:")
print("✓ EfficientNet-B3 backbone")
print("✓ Advanced data augmentation with Albumentations")
print("✓ CUDA optimization with mixed precision")
print("✓ OneCycleLR scheduler with warmup")
print("✓ Label smoothing and regularization")
print("✓ 120 dog-breed classes")
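
The core of `predict_dog_breed` is a softmax over the logits followed by a top-k selection. The same two steps are shown here in plain Python as a minimal sketch. The logits and breed names are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


logits = [2.0, 0.5, -1.0, 3.0]                 # made-up scores
labels = ["beagle", "pug", "collie", "husky"]  # made-up class names
for name, prob in top_k(softmax(logits), labels, k=2):
    print(f"{name}: {prob * 100:.2f}%")
```

Because softmax is monotonic, ranking the raw logits gives the same order. The probabilities matter only when the scores are reported as percentages, as the notebook does.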