## $\color{#dda}{\text{0. Libs}}$

MNIST classification: a comparison of 10 different approaches
1. Classical baselines (scikit-learn): Logistic Regression, K-Nearest Neighbors, Random Forest, PCA + Linear SVM
2. MLP (Multi-Layer Perceptron): dense networks in Tiny, Medium, and Large sizes
3. CNN (Convolutional Neural Network): Tiny, Medium (LeNet-like with BatchNorm), and Deep

Goal: train each model, then evaluate it with AIC, BIC, and FIC

In [None]:
#!/usr/bin/env python3

In [None]:
import sys
import os

project_path = r'C:\Users\hecto\OneDrive\Escritorio\Personal\iroFactory\31.FLOPs-Information-Criterion'
if project_path not in sys.path:
    sys.path.insert(0, project_path)

# Fix OpenMP
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
import time

## $\color{#dda}{\text{1. General parameter configuration}}$

In [None]:
BATCH_SIZE = 128
EPOCHS = 10
LEARNING_RATE = 0.001
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

res="Resultados:"

print(f"\nConfiguración:")
print(f"  Device: {DEVICE}")
print(f"  Batch size: {BATCH_SIZE}")
print(f"  Epochs: {EPOCHS}")
print(f"  Learning rate: {LEARNING_RATE}")

MNIST: COMPARACIÓN DE 3 ENFOQUES DE CLASIFICACIÓN

Configuración:
  Device: cpu
  Batch size: 128
  Epochs: 10
  Learning rate: 0.001


## $\color{#dda}{\text{2. Data loading}}$

From OpenML

In [None]:
mnist = fetch_openml('mnist_784', version=1, parser='auto')

X = mnist.data.to_numpy().astype(np.float32)
y = mnist.target.to_numpy().astype(np.int64)

# Scale pixel values to [0, 1]
X = X / 255.0

spli = 60000
# Train/test split: the first 60,000 rows are train, the rest are test
X_train = X[:spli]
y_train = y[:spli]
X_test = X[spli:]
y_test = y[spli:]

# Standardize using the scalar mean/std of the training set only
mean = X_train.mean()
std = X_train.std()
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std

print(f"\nDatasets cargados:")
print(f"  Train: {X_train.shape[0]} imágenes")
print(f"  Test:  {X_test.shape[0]} imágenes")


CARGANDO DATOS MNIST
Descargando MNIST desde OpenML...

Datasets cargados:
  Train: 60000 imágenes
  Test:  10000 imágenes
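
The two normalization steps in the cell above compose into a single affine map. Because the second step uses one scalar mean and one scalar standard deviation taken from the training split, dividing by 255 first does not change the final values. A minimal sketch on synthetic data (the `raw` array and its size are made up for illustration and are not the real MNIST arrays):

```python
import numpy as np

# Synthetic stand-in for raw pixel data (values 0..255); not the real MNIST arrays.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(100, 784)).astype(np.float64)
train, test = raw[:80], raw[80:]

def standardize(train, test):
    """Standardize both splits with the scalar mean/std of the training split."""
    mean, std = train.mean(), train.std()
    return (train - mean) / std, (test - mean) / std

# Path A: scale to [0, 1] first, then standardize (what the cell above does).
a_train, a_test = standardize(train / 255.0, test / 255.0)
# Path B: standardize the raw values directly.
b_train, b_test = standardize(train, test)

same = np.allclose(a_train, b_train) and np.allclose(a_test, b_test)
print(same)
```

Note that the test split is transformed with the training statistics, so no information from the test set leaks into preprocessing.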


In [None]:
# Convert to PyTorch tensors
X_train_torch = torch.FloatTensor(X_train).view(-1, 1, 28, 28)
y_train_torch = torch.LongTensor(y_train)
X_test_torch = torch.FloatTensor(X_test).view(-1, 1, 28, 28)
y_test_torch = torch.LongTensor(y_test)

# DataLoaders
train_dataset = TensorDataset(X_train_torch, y_train_torch)
test_dataset = TensorDataset(X_test_torch, y_test_torch)
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

## $\color{#dda}{\text{3. Helper functions}}$

In [None]:
def count_parameters(model):
    """Count trainable parameters"""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def train_pytorch_model(model, train_loader, epochs, lr):
    """Train a PyTorch model with Adam and cross-entropy loss"""
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    
    model.train()
    for epoch in range(epochs):
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(DEVICE), target.to(DEVICE)
            
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
    
    return model

def evaluate_pytorch_model(model, test_loader):
    """Evaluate a PyTorch model and return its accuracy in percent"""
    model.eval()
    correct = 0
    total = 0
    
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(DEVICE), target.to(DEVICE)
            output = model(data)
            _, predicted = output.max(1)
            total += target.size(0)
            correct += predicted.eq(target).sum().item()
    
    return 100. * correct / total

## $\color{#dda}{\text{4. MNIST}}$

Solved 10 times, each time with a different method, to compare which one is best according to a FIC that we parameterize to economize on compute resources.

### $\color{#dda}{\text{4.1. Logistic Regression}}$

Characteristics: linear, no hidden layers, classic baseline

In [None]:
logreg_start = time.time()

logreg_model = LogisticRegression(max_iter=100, random_state=42, verbose=0)
logreg_model.fit(X_train, y_train)

logreg_train_time = time.time() - logreg_start
logreg_predictions = logreg_model.predict(X_test)
logreg_accuracy = 100. * np.mean(logreg_predictions == y_test)

logreg_params = logreg_model.coef_.size + logreg_model.intercept_.size

print(f"\n{res}")
print(f"  Test Accuracy: {logreg_accuracy:.2f}%")
print(f"  Parámetros: {logreg_params:,}")
print(f"  Training Time: {logreg_train_time:.1f}s")


MODELO 1: REGRESIÓN LOGÍSTICA
Características: Lineal, sin capas ocultas, baseline clásico

Resultados:
  Test Accuracy: 92.50%
  Parámetros: 7,850
  Training Time: 14.2s


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


### $\color{#dda}{\text{4.2. K-Nearest Neighbors}}$

Characteristics: non-parametric, distance-based

k=5

In [None]:
knn_start = time.time()

# Use a subset to speed things up
subset_size = 10000
knn_model = KNeighborsClassifier(n_neighbors=5, n_jobs=-1)
knn_model.fit(X_train[:subset_size], y_train[:subset_size])

knn_train_time = time.time() - knn_start
knn_predictions = knn_model.predict(X_test)
knn_accuracy = 100. * np.mean(knn_predictions == y_test)

knn_params = 0  # KNN has no parameters in the traditional sense

print(f"\n{res}")
print(f"  Test Accuracy: {knn_accuracy:.2f}%")
print(f"  Parámetros: {knn_params} (no paramétrico)")
print(f"  Training Time: {knn_train_time:.1f}s")
print(f"  Nota: Entrenado con subset de {subset_size} muestras")


MODELO 2: K-NEAREST NEIGHBORS (k=5)
Características: No paramétrico, basado en distancia

Resultados:
  Test Accuracy: 94.42%
  Parámetros: 0 (no paramétrico)
  Training Time: 0.1s
  Nota: Entrenado con subset de 10000 muestras


### $\color{#dda}{\text{4.3. Random Forest}}$

Characteristics: ensemble, robust, interpretable

100 trees

In [None]:
rf_start = time.time()

rf_model = RandomForestClassifier(n_estimators=100, max_depth=20, random_state=42, n_jobs=-1)
rf_model.fit(X_train, y_train)

rf_train_time = time.time() - rf_start
rf_predictions = rf_model.predict(X_test)
rf_accuracy = 100. * np.mean(rf_predictions == y_test)

# Parameter estimate: node count summed over all trees
rf_params = sum(tree.tree_.node_count for tree in rf_model.estimators_)

print(f"\n{res}")
print(f"  Test Accuracy: {rf_accuracy:.2f}%")
print(f"  Parámetros: {rf_params:,} (nodos totales)")
print(f"  Training Time: {rf_train_time:.1f}s")


MODELO 3: RANDOM FOREST (100 árboles)
Características: Ensemble, robusto, interpretable

Resultados:
  Test Accuracy: 96.80%
  Parámetros: 958,830 (nodos totales)
  Training Time: 28.1s


### $\color{#dda}{\text{4.4. PCA + SVM}}$

Characteristics: dimensionality reduction + linear classifier

PCA (150 comps) + LINEAR SVM

In [None]:
pca_svm_start = time.time()

# PCA
pca = PCA(n_components=150, random_state=42)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# SVM (on standardized PCA features)
scaler = StandardScaler()
X_train_pca_scaled = scaler.fit_transform(X_train_pca)
X_test_pca_scaled = scaler.transform(X_test_pca)

svm_model = LinearSVC(C=1.0, max_iter=1000, random_state=42)
svm_model.fit(X_train_pca_scaled, y_train)

pca_svm_train_time = time.time() - pca_svm_start
pca_svm_predictions = svm_model.predict(X_test_pca_scaled)
pca_svm_accuracy = 100. * np.mean(pca_svm_predictions == y_test)

pca_svm_params = svm_model.coef_.size + svm_model.intercept_.size

print(f"\n{res}")
print(f"  Test Accuracy: {pca_svm_accuracy:.2f}%")
print(f"  Parámetros: {pca_svm_params:,}")
print(f"  Training Time: {pca_svm_train_time:.1f}s")
print(f"  Varianza explicada: {pca.explained_variance_ratio_.sum()*100:.1f}%")


MODELO 4: PCA (150 componentes) + LINEAR SVM
Características: Reducción dimensionalidad + clasificador lineal

Resultados:
  Test Accuracy: 91.55%
  Parámetros: 1,510
  Training Time: 21.3s
  Varianza explicada: 94.8%
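
The parameter count reported above covers only the linear SVM. As a rough worked check (plain arithmetic, not code from the notebook), the SVM has one weight per PCA component and class plus one intercept per class, while the PCA projection matrix that feeds it is not included in `pca_svm_params`:

```python
n_features = 784      # 28 x 28 input pixels
n_components = 150    # PCA components used above
n_classes = 10

# Linear SVM (one-vs-rest): weight matrix plus one intercept per class.
svm_params = n_components * n_classes + n_classes

# Entries of the PCA projection matrix, which the count above leaves out.
pca_projection_entries = n_features * n_components

print(svm_params)              # 1510, matching the reported value
print(pca_projection_entries)  # 117600
```

Whether the projection should count toward the parameter total used by AIC and BIC is a modeling choice; the notebook counts only the classifier.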


### $\color{#dda}{\text{4.5. MLP Tiny (very, very small :3)}}$

Characteristics: minimal network, ultra efficient

MLP TINY (784 -> 64 -> 10)

In [None]:
class MLPTiny(nn.Module):
    def __init__(self):
        super(MLPTiny, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(784, 64)
        self.dropout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(64, 10)
        
    def forward(self, x):
        x = self.flatten(x)
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

mlp_tiny_model = MLPTiny().to(DEVICE)
mlp_tiny_params = count_parameters(mlp_tiny_model)

print(f"\nParámetros: {mlp_tiny_params:,}")
print("Entrenando...")

mlp_tiny_start = time.time()
train_pytorch_model(mlp_tiny_model, train_loader, EPOCHS, LEARNING_RATE)
mlp_tiny_train_time = time.time() - mlp_tiny_start

mlp_tiny_accuracy = evaluate_pytorch_model(mlp_tiny_model, test_loader)

print(f"\n{res}")
print(f"  Test Accuracy: {mlp_tiny_accuracy:.2f}%")
print(f"  Training Time: {mlp_tiny_train_time:.1f}s")


MODELO 5: MLP TINY (784 -> 64 -> 10)
Características: Red mínima, ultra eficiente

Parámetros: 50,890
Entrenando...

Resultados:
  Test Accuracy: 97.34%
  Training Time: 64.7s


### $\color{#dda}{\text{4.6. MLP Medium (balanced)}}$

Characteristics: balance between size and capacity

MLP MEDIUM (784 -> 256 -> 128 -> 10)

In [None]:
class MLPMedium(nn.Module):
    def __init__(self):
        super(MLPMedium, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(784, 256)
        self.dropout1 = nn.Dropout(0.2)
        self.fc2 = nn.Linear(256, 128)
        self.dropout2 = nn.Dropout(0.2)
        self.fc3 = nn.Linear(128, 10)
        
    def forward(self, x):
        x = self.flatten(x)
        x = torch.relu(self.fc1(x))
        x = self.dropout1(x)
        x = torch.relu(self.fc2(x))
        x = self.dropout2(x)
        x = self.fc3(x)
        return x

mlp_medium_model = MLPMedium().to(DEVICE)
mlp_medium_params = count_parameters(mlp_medium_model)

print(f"\nParámetros: {mlp_medium_params:,}")
print("Entrenando...")

mlp_medium_start = time.time()
train_pytorch_model(mlp_medium_model, train_loader, EPOCHS, LEARNING_RATE)
mlp_medium_train_time = time.time() - mlp_medium_start

mlp_medium_accuracy = evaluate_pytorch_model(mlp_medium_model, test_loader)

print(f"\n{res}")
print(f"  Test Accuracy: {mlp_medium_accuracy:.2f}%")
print(f"  Training Time: {mlp_medium_train_time:.1f}s")


MODELO 6: MLP MEDIUM (784 -> 256 -> 128 -> 10)
Características: Balance entre tamaño y capacidad

Parámetros: 235,146
Entrenando...

Resultados:
  Test Accuracy: 98.28%
  Training Time: 73.1s


### $\color{#dda}{\text{4.7. MLP Large (oversized)}}$

Characteristics: very large, possible overfitting

MLP LARGE (784 -> 512 -> 512 -> 256 -> 10)

In [None]:
class MLPLarge(nn.Module):
    def __init__(self):
        super(MLPLarge, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(784, 512)
        self.dropout1 = nn.Dropout(0.3)
        self.fc2 = nn.Linear(512, 512)
        self.dropout2 = nn.Dropout(0.3)
        self.fc3 = nn.Linear(512, 256)
        self.dropout3 = nn.Dropout(0.3)
        self.fc4 = nn.Linear(256, 10)
        
    def forward(self, x):
        x = self.flatten(x)
        x = torch.relu(self.fc1(x))
        x = self.dropout1(x)
        x = torch.relu(self.fc2(x))
        x = self.dropout2(x)
        x = torch.relu(self.fc3(x))
        x = self.dropout3(x)
        x = self.fc4(x)
        return x

mlp_large_model = MLPLarge().to(DEVICE)
mlp_large_params = count_parameters(mlp_large_model)

print(f"\nParámetros: {mlp_large_params:,}")
print("Entrenando...")

mlp_large_start = time.time()
train_pytorch_model(mlp_large_model, train_loader, EPOCHS, LEARNING_RATE)
mlp_large_train_time = time.time() - mlp_large_start

mlp_large_accuracy = evaluate_pytorch_model(mlp_large_model, test_loader)

print(f"\n{res}")
print(f"  Test Accuracy: {mlp_large_accuracy:.2f}%")
print(f"  Training Time: {mlp_large_train_time:.1f}s")



MODELO 7: MLP LARGE (784 -> 512 -> 512 -> 256 -> 10)
Características: Muy grande, posible overfitting

Parámetros: 798,474
Entrenando...

Resultados:
  Test Accuracy: 97.93%
  Training Time: 128.5s
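
The parameter counts printed for the three MLPs can be checked by hand: each fully connected layer contributes `n_in * n_out` weights plus `n_out` biases, and `Flatten` and `Dropout` add nothing. A small sketch (the helper name `dense_param_count` is introduced here for illustration only):

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

mlp_tiny = dense_param_count([784, 64, 10])
mlp_medium = dense_param_count([784, 256, 128, 10])
mlp_large = dense_param_count([784, 512, 512, 256, 10])

print(mlp_tiny, mlp_medium, mlp_large)  # 50890 235146 798474
```

In every case the first layer dominates, because it is the one connected to all 784 input pixels.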


### $\color{#dda}{\text{4.8. CNN Tiny (minimal convolutional)}}$

Characteristics: minimal convolutional network, very efficient

CNN TINY (ONLY ONE conv layer :0)

In [None]:
class CNNTiny(nn.Module):
    def __init__(self):
        super(CNNTiny, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(16 * 14 * 14, 10)
        
    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = self.pool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

cnn_tiny_model = CNNTiny().to(DEVICE)
cnn_tiny_params = count_parameters(cnn_tiny_model)

print(f"\nParámetros: {cnn_tiny_params:,}")
print("Entrenando...")

cnn_tiny_start = time.time()
train_pytorch_model(cnn_tiny_model, train_loader, EPOCHS, LEARNING_RATE)
cnn_tiny_train_time = time.time() - cnn_tiny_start

cnn_tiny_accuracy = evaluate_pytorch_model(cnn_tiny_model, test_loader)

print(f"\n{res}")
print(f"  Test Accuracy: {cnn_tiny_accuracy:.2f}%")
print(f"  Training Time: {cnn_tiny_train_time:.1f}s")


MODELO 8: CNN TINY (1 capa conv)
Características: Convolucional mínima, muy eficiente

Parámetros: 31,530
Entrenando...

Resultados:
  Test Accuracy: 97.98%
  Training Time: 155.8s


### $\color{#dda}{\text{4.9. CNN Medium (LeNet-like)}}$

Characteristics: modern LeNet-like, good balance

CNN MEDIUM (2 conv layers with BatchNorm)

In [None]:
class CNNMedium(nn.Module):
    def __init__(self):
        super(CNNMedium, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.dropout = nn.Dropout(0.3)
        self.fc2 = nn.Linear(128, 10)
        
    def forward(self, x):
        x = self.pool(torch.relu(self.bn1(self.conv1(x))))
        x = self.pool(torch.relu(self.bn2(self.conv2(x))))
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

cnn_medium_model = CNNMedium().to(DEVICE)
cnn_medium_params = count_parameters(cnn_medium_model)

print(f"\nParámetros: {cnn_medium_params:,}")
print("Entrenando...")

cnn_medium_start = time.time()
train_pytorch_model(cnn_medium_model, train_loader, EPOCHS, LEARNING_RATE)
cnn_medium_train_time = time.time() - cnn_medium_start

cnn_medium_accuracy = evaluate_pytorch_model(cnn_medium_model, test_loader)

print(f"\n{res}")
print(f"  Test Accuracy: {cnn_medium_accuracy:.2f}%")
print(f"  Training Time: {cnn_medium_train_time:.1f}s")


MODELO 9: CNN MEDIUM (2 capas conv con BatchNorm)
Características: LeNet-like moderno, buen balance

Parámetros: 421,834
Entrenando...

Resultados:
  Test Accuracy: 99.19%
  Training Time: 693.7s


### $\color{#dda}{\text{4.10. CNN Deep (very deep)}}$

Characteristics: maximum capacity, possibly overkill for MNIST

CNN DEEP (4 conv layers, very deep)

In [None]:
class CNNDeep(nn.Module):
    def __init__(self):
        super(CNNDeep, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.conv4 = nn.Conv2d(128, 128, 3, padding=1)
        self.bn4 = nn.BatchNorm2d(128)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(128 * 1 * 1, 256)
        self.dropout = nn.Dropout(0.4)
        self.fc2 = nn.Linear(256, 10)
        
    def forward(self, x):
        x = self.pool(torch.relu(self.bn1(self.conv1(x))))
        x = self.pool(torch.relu(self.bn2(self.conv2(x))))
        x = self.pool(torch.relu(self.bn3(self.conv3(x))))
        x = self.pool(torch.relu(self.bn4(self.conv4(x))))
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

cnn_deep_model = CNNDeep().to(DEVICE)
cnn_deep_params = count_parameters(cnn_deep_model)

print(f"\nParámetros: {cnn_deep_params:,}")
print("Entrenando...")

cnn_deep_start = time.time()
train_pytorch_model(cnn_deep_model, train_loader, EPOCHS, LEARNING_RATE)
cnn_deep_train_time = time.time() - cnn_deep_start

cnn_deep_accuracy = evaluate_pytorch_model(cnn_deep_model, test_loader)

print(f"\n{res}")
print(f"  Test Accuracy: {cnn_deep_accuracy:.2f}%")
print(f"  Training Time: {cnn_deep_train_time:.1f}s")


MODELO 10: CNN DEEP (4 capas conv, muy profunda)
Características: Máxima capacidad, posible overkill para MNIST

Parámetros: 276,554
Entrenando...

Resultados:
  Test Accuracy: 99.18%
  Training Time: 1066.6s
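
It may look odd that CNN Deep (276,554 parameters) is smaller than CNN Medium (421,834). The sketch below retraces the spatial sizes and the parameter count of CNN Deep by hand; the helper names are introduced here for illustration and are not part of the notebook:

```python
def conv_params(c_in, c_out, k=3):
    """k x k convolution: weights plus one bias per output channel."""
    return c_in * c_out * k * k + c_out

def bn_params(channels):
    """BatchNorm2d: one trainable scale and one shift per channel."""
    return 2 * channels

def dense_params(n_in, n_out):
    """Fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

# Spatial size: padding=1 keeps each 3x3 conv output at the input size,
# and each 2x2 max-pool halves it (rounding down).
size = 28
sizes = []
for _ in range(4):
    size //= 2
    sizes.append(size)
# The last entry is 1, hence fc1 takes 128 * 1 * 1 inputs.

channels = [1, 32, 64, 128, 128]
total = 0
for c_in, c_out in zip(channels[:-1], channels[1:]):
    total += conv_params(c_in, c_out) + bn_params(c_out)
total += dense_params(128 * size * size, 256) + dense_params(256, 10)

print(sizes, total)  # [14, 7, 3, 1] 276554
```

CNN Medium pools only twice, so its first dense layer sees a 64 x 7 x 7 feature map and alone holds 64 * 7 * 7 * 128 + 128 = 401,536 parameters. The four pooling stages of CNN Deep shrink the map to 1 x 1, which is why the deeper network ends up with fewer parameters.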


## $\color{#dda}{\text{5. AIC, BIC, and FIC for all the models}}$

Please read the comments in the code (especially those on the formula's hyperparameters) to better understand what is happening and why.

In [None]:
from flop_counter import FlopInformationCriterion, count_model_flops

# Adjustable FIC parameters

wonder_flop_priority = 3
# Goes hand in hand with alpha: it is the power of 10 that sets how strongly few FLOPs are prioritized.
# I am not sure whether it should be a variable of the model itself, but it works at least for this example, which calls for it.
# Here FLOPs get absolute priority and the parameter count is ignored (beta = 0).
# Even with beta at zero, alpha should be raised or lowered according to how much we care about the log-likelihood.
# With that in mind, an alpha of 1000 (10^3) means the priority given to FLOPs is much higher than the one given to the likelihood.
# This setting is extreme because I want to prove my point; the notebook illustrates that the FIC balances differently depending on the priorities we set.
# Lowering wonder_flop_priority to, say, 2 (10^2 = 100) means we may still care about FLOPs, but the likelihood matters quite a bit too.

FIC_VARIANT = 'custom'  # Options: 'deployment', 'research', 'balanced', 'custom'
FIC_ALPHA = 10 ** wonder_flop_priority  # Weight of the FLOPs term (higher = penalizes FLOPs more); 10^3 = 1000
FIC_BETA = 0.0         # Weight of the parameters term (higher = penalizes params more)
FIC_LAMBDA = 0.1        # Balance between terms (0 = FLOPs only, 1 = params only). This parameter depends heavily on the number of FLOPs expected

print(f"\nParámetros FIC configurados:")
print(f"  Variant:        {FIC_VARIANT}")
print(f"  Alpha (FLOPs):  {FIC_ALPHA}")
print(f"  Beta (Params):  {FIC_BETA}")
print(f"  Lambda (Balance): {FIC_LAMBDA}")

print("\n💡 Guía de ajuste:")
print("  • ALPHA alto → Penaliza fuertemente modelos con muchos FLOPs")
print("  • BETA alto → Penaliza fuertemente modelos con muchos parámetros")
print("  • LAMBDA cercano a 0 → Prioriza eficiencia en FLOPs")
print("  • LAMBDA cercano a 1 → Prioriza parsimonia en parámetros")
print("\nEjemplos de configuraciones:")
print("  Deployment (edge):   alpha=10.0, beta=1.0, lambda=0.2")
print("  Research:            alpha=1.0, beta=2.0, lambda=0.7")
print("  Balanced:            alpha=5.0, beta=0.5, lambda=0.3")

# Configurar FIC con parámetros personalizados
fic_calculator = FlopInformationCriterion(
    variant=FIC_VARIANT,
    alpha=FIC_ALPHA,
    beta=FIC_BETA,
    lambda_balance=FIC_LAMBDA
)


Parámetros FIC configurados:
  Variant:        custom
  Alpha (FLOPs):  1000
  Beta (Params):  0.0
  Lambda (Balance): 0.1

💡 Guía de ajuste:
  • ALPHA alto → Penaliza fuertemente modelos con muchos FLOPs
  • BETA alto → Penaliza fuertemente modelos con muchos parámetros
  • LAMBDA cercano a 0 → Prioriza eficiencia en FLOPs
  • LAMBDA cercano a 1 → Prioriza parsimonia en parámetros

Ejemplos de configuraciones:
  Deployment (edge):   alpha=10.0, beta=1.0, lambda=0.2
  Research:            alpha=1.0, beta=2.0, lambda=0.7
  Balanced:            alpha=5.0, beta=0.5, lambda=0.3


### $\color{#dda}{\text{5.1. Calculation functions}}$

In [163]:
def calculate_aic(log_likelihood, k):
    """AIC = 2k - 2ln(L). `log_likelihood` must already be the -2ln(L) term."""
    return log_likelihood + 2 * k

def calculate_bic(log_likelihood, k, n):
    """BIC = k*ln(n) - 2ln(L). `log_likelihood` must already be the -2ln(L) term."""
    return log_likelihood + k * np.log(n)

def calculate_log_likelihood_pytorch(model, data_loader, device):
    """Negative log-likelihood, -ln(L), of a PyTorch model (summed cross-entropy).

    Note: unlike the sklearn helper below, the result is not multiplied by 2.
    """
    model.eval()
    criterion = nn.CrossEntropyLoss(reduction='sum')
    total_loss = 0
    
    with torch.no_grad():
        for data, target in data_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            total_loss += criterion(output, target).item()
    
    return total_loss

def calculate_log_likelihood_sklearn(predictions, y_true, n_classes=10):
    """Approximate -2ln(L) for sklearn models from hard predictions"""
    correct = (predictions == y_true).astype(float)
    # Assumed probabilities: high if correct, low if incorrect
    probs = np.where(correct == 1, 0.99, 0.01/(n_classes-1))
    return -2 * np.sum(np.log(probs + 1e-10))
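
As a worked check of these formulas, the sketch below recomputes the criteria for Logistic Regression from the figures reported in section 4.1 (92.50% accuracy on 10,000 test images, 7,850 parameters). It uses only the standard library; `pseudo_deviance` is a name introduced here and mirrors the fixed-probability assumption of `calculate_log_likelihood_sklearn`, ignoring its negligible `1e-10` offset:

```python
import math

def pseudo_deviance(n_correct, n_total, n_classes=10):
    """-2 ln(L) under the fixed-probability assumption used above:
    p = 0.99 for a correct prediction, 0.01 / (n_classes - 1) otherwise."""
    n_wrong = n_total - n_correct
    log_l = (n_correct * math.log(0.99)
             + n_wrong * math.log(0.01 / (n_classes - 1)))
    return -2.0 * log_l

n_test = 10_000
n_correct = 9_250   # 92.50% test accuracy (Logistic Regression)
k = 7_850           # 784 * 10 weights + 10 intercepts

deviance = pseudo_deviance(n_correct, n_test)
aic = deviance + 2 * k
bic = deviance + k * math.log(n_test)

print(round(deviance), round(aic), round(bic))  # 10390 26090 82691
```

Because the probabilities are fixed constants, this pseudo-likelihood depends only on the accuracy. It does not reflect how confident the sklearn models actually are, so it is not directly comparable with the cross-entropy used for the PyTorch models.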

### $\color{#dda}{\text{5.2. SKLearn eval}}$

In [164]:
all_results = {}

sklearn_models = [
    ("Logistic Regression", logreg_predictions, logreg_params, logreg_accuracy, logreg_train_time),
    ("KNN (k=5)", knn_predictions, knn_params if knn_params > 0 else 100, knn_accuracy, knn_train_time),
    ("Random Forest", rf_predictions, rf_params, rf_accuracy, rf_train_time),
    ("PCA+SVM", pca_svm_predictions, pca_svm_params, pca_svm_accuracy, pca_svm_train_time),
]

for name, predictions, n_params, accuracy, train_time in sklearn_models:
    print(f"\n[sklearn] {name}...")
    
    # Compute the log-likelihood term (approximate -2ln(L))
    log_lik = calculate_log_likelihood_sklearn(predictions, y_test)
    
    # Compute AIC and BIC
    aic = calculate_aic(log_lik, n_params)
    bic = calculate_bic(log_lik, n_params, len(y_test))
    
    # Estimate FLOPs according to the model type
    if "Logistic" in name:
        flops = 784 * 10 * 2  # matmul input->output
    elif "KNN" in name:
        # Rough estimate: distances to every stored training sample
        # (subset_size equals len(y_test) here, so the value is unchanged)
        flops = 784 * subset_size * 5
    elif "Forest" in name:
        flops = rf_params * 10  # approximate traversals
    elif "SVM" in name:
        flops = 150 * 10 * 2 + 784 * 150  # PCA transform + SVM
    else:
        flops = 0
    
    # Compute FIC with the configured parameters
    fic_result = fic_calculator.calculate_fic(
        log_likelihood=log_lik,
        flops=flops,
        n_params=n_params,
        n_samples=len(y_test)
    )
    
    # Show the FIC breakdown
    print(f"  Log-likelihood: {log_lik:,.0f}")
    print(f"  Params: {n_params:,}")
    print(f"  FLOPs: {flops:,}")
    print(f"  AIC: {aic:,.0f}")
    print(f"  BIC: {bic:,.0f}")
    print(f"  FIC: {fic_result['fic']:,.0f}")
    print(f"    └─ FLOPs term: {fic_result.get('flops_term', 0):,.0f}")
    print(f"    └─ Params term: {fic_result.get('params_term', 0):,.0f}")
    
    all_results[name] = {
        'accuracy': accuracy,
        'params': n_params,
        'flops': flops,
        'train_time': train_time,
        'aic': aic,
        'bic': bic,
        'fic': fic_result['fic'],
        'log_likelihood': log_lik,
        'flops_term': fic_result.get('flops_term', 0),
        'params_term': fic_result.get('params_term', 0)
    }


[sklearn] Logistic Regression...
  Log-likelihood: 10,390
  Params: 7,850
  FLOPs: 15,680
  AIC: 26,090
  BIC: 82,691
  FIC: 11,370
    └─ FLOPs term: 0
    └─ Params term: 0

[sklearn] KNN (k=5)...
  Log-likelihood: 7,781
  Params: 100
  FLOPs: 39,200,000
  AIC: 7,981
  BIC: 8,702
  FIC: 44,810
    └─ FLOPs term: 0
    └─ Params term: 0

[sklearn] Random Forest...
  Log-likelihood: 4,548
  Params: 958,830
  FLOPs: 9,588,300
  AIC: 1,922,208
  BIC: 8,835,699
  FIC: 14,785
    └─ FLOPs term: 0
    └─ Params term: 0

[sklearn] PCA+SVM...
  Log-likelihood: 11,680
  Params: 1,510
  FLOPs: 120,600
  AIC: 14,700
  BIC: 25,588
  FIC: 12,959
    └─ FLOPs term: 0
    └─ Params term: 0
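
The FLOP figures printed for the sklearn models are hand-written estimates rather than measurements. The sketch below simply reproduces that arithmetic so each number can be traced back to its assumption; the dictionary `flop_estimates` is introduced here for illustration:

```python
n_features, n_classes = 784, 10

flop_estimates = {
    # One multiply and one add per weight of the 784 -> 10 linear map.
    "Logistic Regression": n_features * n_classes * 2,
    # 784 features x 10,000 samples x a constant factor of 5, as in the cell above.
    "KNN (k=5)": n_features * 10_000 * 5,
    # 10 operations per stored tree node; 958,830 is the reported node total.
    "Random Forest": 958_830 * 10,
    # 150 -> 10 linear SVM (multiply + add) plus the 784 x 150 PCA projection.
    "PCA+SVM": 150 * n_classes * 2 + n_features * 150,
}

for name, flops in flop_estimates.items():
    print(f"{name:<20} {flops:>12,}")
```

These are order-of-magnitude heuristics. The PCA projection is counted at one operation per weight while the linear layers are counted at two, and a forest visits only one root-to-leaf path per tree at inference time, so the node-count-based figure is likely an overestimate.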


### $\color{#dda}{\text{5.3. PyTorch eval}}$

In [165]:
pytorch_models = [
    ("MLP Tiny", mlp_tiny_model, mlp_tiny_params, mlp_tiny_accuracy, mlp_tiny_train_time),
    ("MLP Medium", mlp_medium_model, mlp_medium_params, mlp_medium_accuracy, mlp_medium_train_time),
    ("MLP Large", mlp_large_model, mlp_large_params, mlp_large_accuracy, mlp_large_train_time),
    ("CNN Tiny", cnn_tiny_model, cnn_tiny_params, cnn_tiny_accuracy, cnn_tiny_train_time),
    ("CNN Medium", cnn_medium_model, cnn_medium_params, cnn_medium_accuracy, cnn_medium_train_time),
    ("CNN Deep", cnn_deep_model, cnn_deep_params, cnn_deep_accuracy, cnn_deep_train_time),
]

for name, model, n_params, accuracy, train_time in pytorch_models:
    print(f"\n[PyTorch] {name}...")
    
    # Compute the log-likelihood term (summed cross-entropy, -ln(L))
    log_lik = calculate_log_likelihood_pytorch(model, test_loader, DEVICE)
    
    # Compute AIC and BIC
    aic = calculate_aic(log_lik, n_params)
    bic = calculate_bic(log_lik, n_params, len(y_test))
    
    # Count the actual FLOPs
    sample_input = X_test_torch[:1].to(DEVICE)
    try:
        flops_result = count_model_flops(
            model=model,
            input_data=sample_input,
            framework='torch',
            verbose=False
        )
        flops = flops_result['total_flops']
    except Exception as e:
        print(f"  ⚠️  Warning: could not count FLOPs: {e}")
        flops = 0
    
    # Compute FIC with the configured parameters
    fic_result = fic_calculator.calculate_fic(
        log_likelihood=log_lik,
        flops=flops,
        n_params=n_params,
        n_samples=len(y_test)
    )
    
    # Show the FIC breakdown
    print(f"  Log-likelihood: {log_lik:,.0f}")
    print(f"  Params: {n_params:,}")
    print(f"  FLOPs: {flops:,}")
    print(f"  AIC: {aic:,.0f}")
    print(f"  BIC: {bic:,.0f}")
    print(f"  FIC: {fic_result['fic']:,.0f}")
    print(f"    └─ FLOPs term: {fic_result.get('flops_term', 0):,.0f}")
    print(f"    └─ Params term: {fic_result.get('params_term', 0):,.0f}")
    
    all_results[name] = {
        'accuracy': accuracy,
        'params': n_params,
        'flops': flops,
        'train_time': train_time,
        'aic': aic,
        'bic': bic,
        'fic': fic_result['fic'],
        'log_likelihood': log_lik,
        'flops_term': fic_result.get('flops_term', 0),
        'params_term': fic_result.get('params_term', 0)
    }


[PyTorch] MLP Tiny...
  Log-likelihood: 925
  Params: 50,890
  FLOPs: 1,237,568
  AIC: 102,705
  BIC: 469,640
  FIC: 3,442
    └─ FLOPs term: 0
    └─ Params term: 0

[PyTorch] MLP Medium...
  Log-likelihood: 609
  Params: 235,146
  FLOPs: 1,393,536
  AIC: 470,901
  BIC: 2,166,384
  FIC: 3,278
    └─ FLOPs term: 0
    └─ Params term: 0

[PyTorch] MLP Large...
  Log-likelihood: 702
  Params: 798,474
  FLOPs: 2,410,240
  AIC: 1,597,650
  BIC: 7,354,920
  FIC: 4,341
    └─ FLOPs term: 0
    └─ Params term: 0

[PyTorch] CNN Tiny...
  Log-likelihood: 644
  Params: 31,530
  FLOPs: 19,894,784
  AIC: 63,704
  BIC: 291,046
  FIC: 20,229
    └─ FLOPs term: 0
    └─ Params term: 0

[PyTorch] CNN Medium...
  Log-likelihood: 306
  Params: 421,834
  FLOPs: 27,378,816
  AIC: 843,974
  BIC: 3,885,541
  FIC: 26,660
    └─ FLOPs term: 0
    └─ Params term: 0

[PyTorch] CNN Deep...
  Log-likelihood: 319
  Params: 276,554
  FLOPs: 17,720,576
  AIC: 553,427
  BIC: 2,547,476
  FIC: 17,937
    └─ FLOPs term

## $\color{#dda}{\text{6. Results}}$

In [166]:
# ============================================================================
# FINAL COMPARISON TABLE
# ============================================================================

print("\n" + "="*100)
print("TABLA COMPARATIVA COMPLETA: 10 ENFOQUES")
print("="*100)

print(f"\n{'Modelo':<25} {'Acc%':<8} {'Params':<12} {'FLOPs':<15} {'Time(s)':<10} {'AIC':<12} {'BIC':<12} {'FIC':<12}")
print("-" * 120)

for name in sorted(all_results.keys(), key=lambda x: all_results[x]['fic']):
    r = all_results[name]
    print(f"{name:<25} {r['accuracy']:<8.2f} {r['params']:<12,} {r['flops']:<15,} "
          f"{r['train_time']:<10.1f} {r['aic']:<12.0f} {r['bic']:<12.0f} {r['fic']:<12.0f}")


TABLA COMPARATIVA COMPLETA: 10 ENFOQUES

Modelo                    Acc%     Params       FLOPs           Time(s)    AIC          BIC          FIC         
------------------------------------------------------------------------------------------------------------------------
MLP Medium                98.28    235,146      1,393,536       73.1       470901       2166384      3278        
MLP Tiny                  97.34    50,890       1,237,568       64.7       102705       469640       3442        
MLP Large                 97.93    798,474      2,410,240       128.5      1597650      7354920      4341        
Logistic Regression       92.50    7,850        15,680          14.2       26090        82691        11370       
PCA+SVM                   91.55    1,510        120,600         21.3       14700        25588        12959       
Random Forest             96.80    958,830      9,588,300       28.1       1922208      8835699      14785       
CNN Deep                  99.18    276,


## $\color{#dda}{\text{7. Conclusions}}$

The `wonder_flop_priority` variable is the power of 10 used for alpha: it raises alpha enough to prioritize FLOPs without drowning out the likelihood. With these parameters FLOPs are prioritized without neglecting the model's efficiency and its log-likelihood. For this demonstration, beta is left at zero and lambda at 0.1 because we prioritize the balance between FLOPs and likelihood.

```python
wonder_flop_priority = 3

FIC_VARIANT = 'custom' 
FIC_ALPHA = 1000
FIC_BETA = 0.0
FIC_LAMBDA = 0.1 
```
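
The implementation of `FlopInformationCriterion` lives in `flop_counter` and is not shown in this notebook. The FIC values printed in sections 5.2 and 5.3 are, however, consistent with the form below. It is offered only as a reverse-engineered assumption, not as the library's definition: the function name `fic_sketch` and the `1e6` scale are hypothetical, and the parameter term is left out because it cannot be checked with beta = 0.

```python
import math

def fic_sketch(deviance, flops, alpha=1000, lam=0.1, flops_scale=1e6):
    """Hypothetical reconstruction, NOT the flop_counter implementation.

    Assumed form:
        deviance + alpha * (lam * ln(flops) + (1 - lam) * flops / flops_scale)
    The parameter term is omitted because beta = 0 in this notebook.
    """
    penalty = alpha * (lam * math.log(flops) + (1 - lam) * flops / flops_scale)
    return deviance + penalty

# (log-likelihood term, FLOPs, FIC) as printed in sections 5.2 and 5.3.
reported = {
    "Logistic Regression": (10_390, 15_680, 11_370),
    "KNN (k=5)": (7_781, 39_200_000, 44_810),
    "MLP Tiny": (925, 1_237_568, 3_442),
    "CNN Tiny": (644, 19_894_784, 20_229),
}

for name, (deviance, flops, fic_printed) in reported.items():
    estimate = fic_sketch(deviance, flops)
    print(f"{name:<20} sketch={estimate:>10,.0f}  printed={fic_printed:>10,}")
```

For these four models the sketch agrees with the printed FIC to within about one unit, which is the rounding of the printed log-likelihood. Under this assumed form the penalty is linear in alpha, so lowering `wonder_flop_priority` from 3 to 2 would shrink the FLOPs penalty tenfold. Note also that lambda here mixes a logarithmic and a linear FLOPs term; that reading should be treated with caution until it is checked against the `flop_counter` source.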