# ConvNeXt Model

In this notebook we explain how to build a *ConvNeXt* convolutional neural network starting from a *ResNet*. We follow the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545), which introduced the *ConvNeXt* networks.

The process consists of seven steps:

1. **Create a base ResNet model based on ResNet-50**
2. **Change the stage ratio of the ResNet model**
3. **Replace the *Stem* block with a *Patchify* block**
4. **Add *depthwise convolution* (*ResNeXt-ify*)**
5. **Replace the *Bottleneck* block with an *Inverted Bottleneck***
6. **Increase the kernel size**
7. **Changes to the *micro design***


To measure how the model's accuracy changes at each of the seven steps, we train the corresponding model for 100 epochs with the same training hyperparameters. We repeat each experiment three times and report the average of the results.

### Data preparation

We train the model on the CIFAR-10 dataset, which consists of 60000 color images in 10 mutually exclusive classes. The dataset can be loaded through the PyTorch tooling (`torchvision.datasets`) or downloaded from the official page: https://www.cs.toronto.edu/~kriz/cifar.html.

In [None]:
# Import the packages required by the notebook
import time
import gc
import copy
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision.ops import StochasticDepth
from torchvision import datasets
from torchvision import transforms
from torch.utils.data.sampler import SubsetRandomSampler

# Use the GPU when available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
def data_loader(data_dir,
                batch_size,
                random_seed=42,
                valid_size=0.1,
                shuffle=False,
                test=False):
    """
    Load the CIFAR-10 data.
    """

    # Define the transform that normalizes the data with PyTorch
    # The values were obtained in the notebook "data_extraction.ipynb"
    normalize = transforms.Normalize(
        mean=[0.4914, 0.4822, 0.4465],
        std=[0.2023, 0.1994, 0.2010],
    )

    # Define the transforms that preprocess the data
    transform_train = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize
    ])
    transform_test = transforms.Compose([
        transforms.ToTensor(),
        normalize
    ])

    # Load the test set
    if test:
        dataset = datasets.CIFAR10(
          root=data_dir, train=False,
          download=True, transform=transform_test,
        )

        data_loader = torch.utils.data.DataLoader(
            dataset, batch_size=batch_size, shuffle=shuffle
        )

        return data_loader

    # Load a copy of the training data
    train_dataset = datasets.CIFAR10(
        root=data_dir, train=True,
        download=True, transform=transform_train,
    )

    # Load a second copy of the training data, later restricted to the validation indices
    valid_dataset = datasets.CIFAR10(
        root=data_dir, train=True,
        download=True, transform=transform_train,
    )

    # Split the training and validation data by index
    num_train = len(train_dataset)
    indices = list(range(num_train))
    split = int(np.floor(valid_size * num_train))

    if shuffle:
        np.random.seed(random_seed)
        np.random.shuffle(indices)

    train_idx, valid_idx = indices[split:], indices[:split]
    train_sampler = SubsetRandomSampler(train_idx)
    valid_sampler = SubsetRandomSampler(valid_idx)

    # Finally, build the training and validation loaders
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, sampler=train_sampler)

    valid_loader = torch.utils.data.DataLoader(
        valid_dataset, batch_size=batch_size, sampler=valid_sampler)

    return (train_loader, valid_loader)


# Load CIFAR-10 with the function above; the files are stored under ./data
train_loader, valid_loader = data_loader(data_dir='./data',
                                         batch_size=64)

test_loader = data_loader(data_dir='./data',
                              batch_size=64,
                              test=True)
cifar10_classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
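
The index split performed inside `data_loader` can be checked with plain Python. This is a minimal sketch, assuming the standard CIFAR-10 layout of 50,000 training images; `split_indices` is a hypothetical helper that only mirrors the arithmetic of the function above, not part of the notebook.

```python
import math


def split_indices(num_train, valid_size=0.1):
    """Mirror the split in data_loader: the first `split` indices go to validation."""
    indices = list(range(num_train))
    split = int(math.floor(valid_size * num_train))
    train_idx, valid_idx = indices[split:], indices[:split]
    return train_idx, valid_idx


# Assumption: the CIFAR-10 training set holds 50,000 images.
train_idx, valid_idx = split_indices(50_000, valid_size=0.1)
print(len(train_idx), len(valid_idx))
```

Since the loaders above are created with the default `shuffle=False`, the indices are not permuted before the split, so the validation subset is simply the first 10% of the training set.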


### Training function

The following function records each model's performance throughout training. The training hyperparameters are the same as those used in the paper that introduced [*ConvNeXt*](https://arxiv.org/abs/2201.03545), except for the number of epochs: increasing it did not show an improvement in the network's performance.

In [None]:
def entrenamiento(model, epocas):

    model = model.to(device)

    # variables that store the results
    accuracy_training_epochs = []
    accuracy_validation_epochs = []
    loss_epoch = []
    test_accuracy = []
    best_model = None

    # training hyperparameters
    num_epochs = epocas

    optimizer = optim.AdamW(model.parameters(),
                            lr=0.004,
                            betas=(0.9, 0.999),
                            weight_decay=0.05
                            )

    criterion = nn.CrossEntropyLoss()

    lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epocas)


    # training
    print("Comenzando entrenamiento")
    for epoch in range(num_epochs):
        start_time = time.time()
        model.train()  # training mode: BatchNorm uses batch statistics
        for i, (images, labels) in enumerate(train_loader):

            # Move the tensors to the GPU when available
            images = images.to(device)
            labels = labels.to(device)

            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)

            # Backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Free memory
            del images, labels, outputs
            torch.cuda.empty_cache()
            gc.collect()

        loss_epoch.append(loss.item()) # Store the loss of the last batch of this epoch
        lr_scheduler.step() # Learning rate decay (cosine annealing)

        # Evaluation mode: BatchNorm uses its running statistics for the measurements below
        model.eval()

        # Accuracy on the validation set
        with torch.no_grad():
            correct = 0
            total = 0
            for images, labels in valid_loader:
                images = images.to(device)
                labels = labels.to(device)
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
                del images, labels, outputs
            val_accuracy = correct/total
            accuracy_validation_epochs.append(val_accuracy)

        # Accuracy on the whole training set
        with torch.no_grad():
            correct = 0
            total = 0
            for images, labels in train_loader:
                images = images.to(device)
                labels = labels.to(device)
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
                del images, labels, outputs
            train_accuracy = correct/total
            accuracy_training_epochs.append(train_accuracy)

        # Accuracy on the test set
        with torch.no_grad():
            correct = 0
            total = 0
            for images, labels in test_loader:
                images = images.to(device)
                labels = labels.to(device)
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
                del images, labels, outputs
            t_acc = correct/total
            test_accuracy.append(t_acc)

            # Keep a copy of the model when its test accuracy matches or beats every previous epoch
            if t_acc >= max(test_accuracy):
                best_model = copy.deepcopy(model)


        # Print this epoch's loss, validation accuracy and training accuracy.
        print(f"Epoch [{epoch+1}/{num_epochs}], Training accuracy: {round(train_accuracy,3)}, Validation accuracy: {round(val_accuracy,3)}, loss = {round(loss_epoch[-1],3)}")
        print(f"Time spent on epoch {epoch+1}: {round((time.time()-start_time)/60,2)}min")

    print("Entrenamiento finalizado")

    # Return the best model plus the training, validation and test accuracy and the loss for every epoch.
    return [best_model,
            accuracy_training_epochs,
            accuracy_validation_epochs,
            test_accuracy,
            loss_epoch]
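
The learning rate decay used by the training function follows a cosine curve. The sketch below evaluates the closed form documented for `CosineAnnealingLR`, assuming the values used above (`lr=0.004`, `T_max=100`) and the scheduler's default minimum of 0; `cosine_annealing_lr` is a hypothetical helper written for illustration, not a PyTorch function.

```python
import math


def cosine_annealing_lr(epoch, base_lr=0.004, t_max=100, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at epoch 0, eta_min at epoch t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


# Learning rate in effect after 0, 1, ..., 100 scheduler steps.
schedule = [cosine_annealing_lr(epoch) for epoch in range(101)]
for epoch in (0, 25, 50, 75, 99):
    print(f"epoch {epoch:3d}: lr = {schedule[epoch]:.6f}")
```

Because the scheduler is stepped once at the end of each epoch, the first epoch trains with the full learning rate and the last one with a value close to zero.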


## Base model: ResNet-50

The base model is ResNet-50, introduced in the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385).

Because of the size differences between CIFAR-10 and the dataset ResNet-50 was designed for, we made some modifications to the original architecture.

Details on how ResNet-50 works, the changes we introduced, and an explanation of the code can be found in the notebook "./ResNet-50.ipynb".

In [None]:
class utilConv(nn.Sequential):
    # groups=1 is the default for a PyTorch convolutional layer; it is exposed here so it can be changed later in the notebook.
    def __init__(self, in_features, out_features, kernel_size, stride = 1, norm = nn.BatchNorm2d, act = nn.ReLU, bias=True, groups=1):
        super().__init__(
            nn.Conv2d(in_features, out_features, kernel_size=kernel_size ,padding=kernel_size // 2, stride=stride, bias=bias,groups=groups),
            norm(out_features),
            act()
        )

class BottleNeckBlock(nn.Module):
    def __init__(self,in_features, out_features, reduction = 4, stride = 1):
        super().__init__()
        reduced_features = out_features // reduction
        self.block = nn.Sequential(
            # Channel reduction
            utilConv(in_features, reduced_features, kernel_size=1, stride=stride, bias=False), # the stride can be 2 to apply downsampling
            # The number of channels stays fixed
            utilConv(reduced_features, reduced_features, kernel_size=3, bias=False),
            # Channel expansion
            utilConv(reduced_features, out_features, kernel_size=1, bias=False, act=nn.Identity),
        )

        # self.shortcut projects the input to the right dimensions so it can be added to the block's output
        if in_features != out_features:
            self.shortcut =nn.Sequential(utilConv(in_features, out_features, kernel_size=1, stride=stride, bias=False))
        else:
            self.shortcut = nn.Identity()

        self.act = nn.ReLU()

    def forward(self, x):
        res = x
        x = self.block(x)
        res = self.shortcut(res)
        x += res
        x = self.act(x)
        return x

class Stage(nn.Sequential):
    def __init__(self, in_features, out_features, depth, stride = 2):  # in_features and out_features should differ when stride is 2: otherwise the main path is downsampled while the shortcut is an identity, and the two cannot be added
        super().__init__(

            BottleNeckBlock(in_features, out_features, stride=stride), # downsampling happens here
            *[BottleNeckBlock(out_features, out_features) for _ in range(depth - 1)]
        )


class Stem(nn.Sequential):
    def __init__(self, in_features, out_features):
        super().__init__(
            utilConv(in_features, out_features, kernel_size=3, stride=1),  # for ImageNet the kernel size is 7
#             nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )

class Encoder(nn.Module):
    def __init__(self, in_channels, stem_features, depths, widths):
        super().__init__()
        self.stem = Stem(in_channels, stem_features)

        in_out_widths = list(zip(widths, widths[1:]))


        self.stages = nn.ModuleList() # PyTorch list holding the stages

        self.stages.append(Stage(stem_features, widths[0], depths[0], stride=1)) # figure 1 of the paper suggests that the first block of stage 1 has stride 1

        for (in_features, out_features), depth in zip(in_out_widths, depths[1:]):
            # add each remaining stage
            self.stages.append(Stage(in_features, out_features, depth))


    def forward(self, x):
        x = self.stem(x)
        for stage in self.stages:

            x = stage(x)
        return x



class Decoder(nn.Module):
    def __init__(self, in_features, n_classes):
        super().__init__()
        self.avg = nn.AdaptiveAvgPool2d((1, 1))
        self.decoder = nn.Linear(in_features, n_classes)

    def forward(self, x):
        x = self.avg(x)
        x = x.view(x.size(0), -1)
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax internally
        x = self.decoder(x)
        return x
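
Because `nn.CrossEntropyLoss` applies log-softmax internally, it expects raw logits. The following is a minimal pure-Python sketch (the helpers `softmax` and `cross_entropy` are written here for illustration and are not PyTorch functions) of what happens when softmax outputs are passed to the loss instead: the loss can no longer approach zero and is bounded below by log(1 + (C - 1)/e) for C classes.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target):
    """Cross-entropy on raw scores: -log(softmax(scores)[target])."""
    return -math.log(softmax(scores)[target])


n_classes = 10
target = 3

# A confident, correct prediction expressed as raw logits.
logits = [0.0] * n_classes
logits[target] = 20.0
loss_on_logits = cross_entropy(logits, target)

# The same prediction passed through softmax before reaching the loss,
# so softmax is effectively applied twice.
loss_on_probs = cross_entropy(softmax(logits), target)

# Lower bound of the loss when its inputs are probabilities in [0, 1].
floor = math.log(1 + (n_classes - 1) / math.e)

print(f"loss on logits: {loss_on_logits:.6f}")
print(f"loss on probabilities: {loss_on_probs:.4f}")
print(f"floor for {n_classes} classes: {floor:.4f}")
```

For 10 classes the bound is about 1.461, the same value at which the per-batch losses in the training logs of this notebook bottom out.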

With these blocks we can define our base model:

In [None]:
class ResNet(nn.Module):

    def __init__(self, in_channels, n_classes, stem_features, depths, widths):
        super().__init__()
        self.encoder = Encoder(in_channels=in_channels, stem_features=stem_features, depths=depths, widths=widths)
        self.decoder = Decoder(widths[-1], n_classes)

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
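
To see how the feature maps shrink through the encoder, the sketch below traces the spatial size and channel count for a 32x32 CIFAR-10 image. It assumes the configuration used in this notebook (3x3 stem with stride 1 and padding 1, stride 1 in the first stage and stride 2 in the others, carried by a 1x1 convolution); `conv_out_size` and `encoder_shapes` are hypothetical helpers, not part of the model code.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Standard output-size formula for a convolution along one spatial axis."""
    return (size + 2 * padding - kernel_size) // stride + 1


def encoder_shapes(input_size=32, stem_features=64, widths=(64, 128, 256, 512)):
    """Return (name, channels, spatial size) after the stem and after each stage."""
    shapes = []
    size = conv_out_size(input_size, kernel_size=3, stride=1, padding=1)  # stem
    shapes.append(("stem", stem_features, size))
    for i, width in enumerate(widths):
        stride = 1 if i == 0 else 2
        # Only the first 1x1 convolution of each stage changes the resolution;
        # the 3x3 convolutions use padding 1 and keep it unchanged.
        size = conv_out_size(size, kernel_size=1, stride=stride, padding=0)
        shapes.append((f"stage{i + 1}", width, size))
    return shapes


for name, channels, size in encoder_shapes():
    print(f"{name}: {channels} channels, {size}x{size}")
```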

#### Experiments

In [None]:
# Repeat the experiment 3 times

model1 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,4,6,3], widths=[64, 128, 256,512]).to(device)
model2 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,4,6,3], widths=[64, 128, 256,512]).to(device)
model3 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,4,6,3], widths=[64, 128, 256,512]).to(device)

model1, training1, validation1, test1, loss1 = entrenamiento(model1, 100)
model2, training2, validation2, test2, loss2 = entrenamiento(model2, 100)
model3, training3, validation3, test3, loss3 = entrenamiento(model3, 100)

Comenzando entrenamiento
Epoch [1/100], Training accuracy: 0.3, Validation accuracy: 0.31, loss = 2.371
Time spent on epoch 1: 3.12min
Epoch [2/100], Training accuracy: 0.285, Validation accuracy: 0.287, loss = 2.23
Time spent on epoch 2: 2.95min
Epoch [3/100], Training accuracy: 0.301, Validation accuracy: 0.291, loss = 2.065
Time spent on epoch 3: 2.96min
Epoch [4/100], Training accuracy: 0.34, Validation accuracy: 0.349, loss = 2.193
Time spent on epoch 4: 2.97min
Epoch [5/100], Training accuracy: 0.367, Validation accuracy: 0.379, loss = 2.215
Time spent on epoch 5: 3.04min
Epoch [6/100], Training accuracy: 0.396, Validation accuracy: 0.403, loss = 2.124
Time spent on epoch 6: 3.0min
Epoch [7/100], Training accuracy: 0.408, Validation accuracy: 0.401, loss = 2.083
Time spent on epoch 7: 2.98min
Epoch [8/100], Training accuracy: 0.434, Validation accuracy: 0.443, loss = 2.195
Time spent on epoch 8: 2.98min
Epoch [9/100], Training accuracy: 0.446, Validation accuracy: 0.443, loss = 2

Epoch [73/100], Training accuracy: 0.817, Validation accuracy: 0.808, loss = 1.735
Time spent on epoch 73: 3.06min
Epoch [74/100], Training accuracy: 0.827, Validation accuracy: 0.806, loss = 1.8
Time spent on epoch 74: 3.07min
Epoch [75/100], Training accuracy: 0.831, Validation accuracy: 0.815, loss = 1.958
Time spent on epoch 75: 3.07min
Epoch [76/100], Training accuracy: 0.832, Validation accuracy: 0.818, loss = 1.462
Time spent on epoch 76: 3.07min
Epoch [77/100], Training accuracy: 0.837, Validation accuracy: 0.828, loss = 1.74
Time spent on epoch 77: 3.07min
Epoch [78/100], Training accuracy: 0.839, Validation accuracy: 0.814, loss = 1.555
Time spent on epoch 78: 3.06min
Epoch [79/100], Training accuracy: 0.841, Validation accuracy: 0.827, loss = 1.68
Time spent on epoch 79: 3.06min
Epoch [80/100], Training accuracy: 0.848, Validation accuracy: 0.831, loss = 1.687
Time spent on epoch 80: 3.07min
Epoch [81/100], Training accuracy: 0.849, Validation accuracy: 0.83, loss = 1.476

Epoch [45/100], Training accuracy: 0.732, Validation accuracy: 0.729, loss = 1.671
Time spent on epoch 45: 3.09min
Epoch [46/100], Training accuracy: 0.747, Validation accuracy: 0.749, loss = 1.928
Time spent on epoch 46: 3.08min
Epoch [47/100], Training accuracy: 0.745, Validation accuracy: 0.737, loss = 2.017
Time spent on epoch 47: 3.09min
Epoch [48/100], Training accuracy: 0.737, Validation accuracy: 0.736, loss = 1.899
Time spent on epoch 48: 3.09min
Epoch [49/100], Training accuracy: 0.744, Validation accuracy: 0.735, loss = 1.833
Time spent on epoch 49: 3.09min
Epoch [50/100], Training accuracy: 0.759, Validation accuracy: 0.753, loss = 1.854
Time spent on epoch 50: 3.09min
Epoch [51/100], Training accuracy: 0.752, Validation accuracy: 0.744, loss = 1.832
Time spent on epoch 51: 3.09min
Epoch [52/100], Training accuracy: 0.756, Validation accuracy: 0.742, loss = 2.029
Time spent on epoch 52: 3.08min
Epoch [53/100], Training accuracy: 0.773, Validation accuracy: 0.761, loss = 1.6

Epoch [17/100], Training accuracy: 0.649, Validation accuracy: 0.643, loss = 1.821
Time spent on epoch 17: 3.09min
Epoch [18/100], Training accuracy: 0.641, Validation accuracy: 0.635, loss = 2.178
Time spent on epoch 18: 3.1min
Epoch [19/100], Training accuracy: 0.639, Validation accuracy: 0.643, loss = 1.652
Time spent on epoch 19: 3.1min
Epoch [20/100], Training accuracy: 0.645, Validation accuracy: 0.643, loss = 1.74
Time spent on epoch 20: 3.09min
Epoch [21/100], Training accuracy: 0.647, Validation accuracy: 0.637, loss = 1.874
Time spent on epoch 21: 3.09min
Epoch [22/100], Training accuracy: 0.656, Validation accuracy: 0.646, loss = 1.967
Time spent on epoch 22: 3.09min
Epoch [23/100], Training accuracy: 0.664, Validation accuracy: 0.662, loss = 1.84
Time spent on epoch 23: 3.09min
Epoch [24/100], Training accuracy: 0.667, Validation accuracy: 0.657, loss = 1.715
Time spent on epoch 24: 3.09min
Epoch [25/100], Training accuracy: 0.677, Validation accuracy: 0.667, loss = 1.828

Epoch [89/100], Training accuracy: 0.892, Validation accuracy: 0.861, loss = 1.7
Time spent on epoch 89: 3.04min
Epoch [90/100], Training accuracy: 0.892, Validation accuracy: 0.863, loss = 1.708
Time spent on epoch 90: 3.05min
Epoch [91/100], Training accuracy: 0.894, Validation accuracy: 0.859, loss = 1.77
Time spent on epoch 91: 3.04min
Epoch [92/100], Training accuracy: 0.897, Validation accuracy: 0.863, loss = 1.463
Time spent on epoch 92: 3.03min
Epoch [93/100], Training accuracy: 0.897, Validation accuracy: 0.863, loss = 1.714
Time spent on epoch 93: 3.02min
Epoch [94/100], Training accuracy: 0.899, Validation accuracy: 0.868, loss = 1.581
Time spent on epoch 94: 3.03min
Epoch [95/100], Training accuracy: 0.9, Validation accuracy: 0.864, loss = 1.703
Time spent on epoch 95: 3.03min
Epoch [96/100], Training accuracy: 0.9, Validation accuracy: 0.866, loss = 1.627
Time spent on epoch 96: 3.03min
Epoch [97/100], Training accuracy: 0.9, Validation accuracy: 0.862, loss = 1.536

In [None]:
# Save the results
results_dict1 = {"loss": loss1,
    'Train':training1,
     'Validation': validation1,
     "Test":test1}
results_dict2 = {"loss": loss2,
    'Train':training2,
     'Validation': validation2,
     "Test":test2}
results_dict3 = {"loss": loss3,
    'Train':training3,
     'Validation': validation3,
     "Test":test3}

results1_base = pd.DataFrame(results_dict1)
results2_base = pd.DataFrame(results_dict2)
results3_base = pd.DataFrame(results_dict3)

results1_base.to_csv("./results/results_convnext_base_1.csv",index=False)
results2_base.to_csv("./results/results_convnext_base_2.csv",index=False)
results3_base.to_csv("./results/results_convnext_base_3.csv",index=False)

In [None]:
accuracy_base = (results1_base["Test"].max() + results2_base["Test"].max() + results3_base["Test"].max())/3
print(f"Accuracy del modelo base: {accuracy_base}")

Accuracy del modelo base: 0.8348333333333334


## Changing the stage ratio

The first change modifies the number of *Bottleneck* blocks in each *Stage*. The original proposal uses the ratio [3,3,9,3] instead of the [3,4,6,3] of ResNet-50; however, to avoid increasing the number of layers while keeping the [n,n,3n,n] ratio, we opted for [2,2,6,2]. Note that the experiment cells below instantiate the model with `depths=[3,3,9,3]`.

To implement it, we only need to change the *depths* parameter when creating the model.
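
The configurations mentioned above can be compared with a few lines of Python. This is only an illustrative sketch; `stage_ratio` is a hypothetical helper that expresses each stage depth relative to the first stage.

```python
def stage_ratio(depths):
    """Express each stage depth relative to the first stage."""
    return [d / depths[0] for d in depths]


configs = {
    "ResNet-50": [3, 4, 6, 3],
    "ConvNeXt proposal": [3, 3, 9, 3],
    "reduced variant": [2, 2, 6, 2],
}

for name, depths in configs.items():
    ratio = [round(r, 2) for r in stage_ratio(depths)]
    print(f"{name}: depths={depths}, ratio={ratio}, total blocks={sum(depths)}")
```

Both [3,3,9,3] and [2,2,6,2] share the [n,n,3n,n] ratio; they differ only in the total number of blocks (18 versus 12, compared with 16 for ResNet-50).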

#### Experiments

In [None]:
# Repeat the experiment 3 times
model1 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)
model2 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)
model3 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)

model1, training1, validation1, test1, loss1 = entrenamiento(model1, 100)
model2, training2, validation2, test2, loss2 = entrenamiento(model2, 100)
model3, training3, validation3, test3, loss3 = entrenamiento(model3, 100)

Comenzando entrenamiento
Epoch [1/100], Training accuracy: 0.308, Validation accuracy: 0.319, loss = 2.252
Time spent on epoch 1: 3.14min
Epoch [2/100], Training accuracy: 0.309, Validation accuracy: 0.311, loss = 2.089
Time spent on epoch 2: 3.17min
Epoch [3/100], Training accuracy: 0.365, Validation accuracy: 0.366, loss = 2.198
Time spent on epoch 3: 3.21min
Epoch [4/100], Training accuracy: 0.371, Validation accuracy: 0.379, loss = 2.198
Time spent on epoch 4: 3.25min
Epoch [5/100], Training accuracy: 0.397, Validation accuracy: 0.401, loss = 2.23
Time spent on epoch 5: 3.24min
Epoch [6/100], Training accuracy: 0.421, Validation accuracy: 0.429, loss = 2.067
Time spent on epoch 6: 3.26min
Epoch [7/100], Training accuracy: 0.468, Validation accuracy: 0.469, loss = 2.032
Time spent on epoch 7: 3.24min
Epoch [8/100], Training accuracy: 0.509, Validation accuracy: 0.514, loss = 2.288
Time spent on epoch 8: 3.27min
Epoch [9/100], Training accuracy: 0.521, Validation accuracy: 0.52, loss

Epoch [73/100], Training accuracy: 0.845, Validation accuracy: 0.827, loss = 1.717
Time spent on epoch 73: 3.25min
Epoch [74/100], Training accuracy: 0.846, Validation accuracy: 0.827, loss = 1.587
Time spent on epoch 74: 3.3min
Epoch [75/100], Training accuracy: 0.847, Validation accuracy: 0.83, loss = 1.659
Time spent on epoch 75: 3.25min
Epoch [76/100], Training accuracy: 0.857, Validation accuracy: 0.837, loss = 1.466
Time spent on epoch 76: 3.28min
Epoch [77/100], Training accuracy: 0.859, Validation accuracy: 0.832, loss = 1.732
Time spent on epoch 77: 3.29min
Epoch [78/100], Training accuracy: 0.858, Validation accuracy: 0.839, loss = 1.55
Time spent on epoch 78: 3.25min
Epoch [79/100], Training accuracy: 0.865, Validation accuracy: 0.839, loss = 1.841
Time spent on epoch 79: 3.24min
Epoch [80/100], Training accuracy: 0.868, Validation accuracy: 0.851, loss = 1.461
Time spent on epoch 80: 3.25min
Epoch [81/100], Training accuracy: 0.874, Validation accuracy: 0.854, loss = 1.609


Epoch [45/100], Training accuracy: 0.718, Validation accuracy: 0.708, loss = 1.73
Time spent on epoch 45: 3.46min
Epoch [46/100], Training accuracy: 0.715, Validation accuracy: 0.706, loss = 1.84
Time spent on epoch 46: 3.49min
Epoch [47/100], Training accuracy: 0.732, Validation accuracy: 0.721, loss = 1.783
Time spent on epoch 47: 3.47min
Epoch [48/100], Training accuracy: 0.726, Validation accuracy: 0.717, loss = 2.146
Time spent on epoch 48: 3.44min
Epoch [49/100], Training accuracy: 0.741, Validation accuracy: 0.734, loss = 1.971
Time spent on epoch 49: 3.5min
Epoch [50/100], Training accuracy: 0.731, Validation accuracy: 0.718, loss = 1.703
Time spent on epoch 50: 3.47min
Epoch [51/100], Training accuracy: 0.746, Validation accuracy: 0.736, loss = 1.62
Time spent on epoch 51: 3.44min
Epoch [52/100], Training accuracy: 0.742, Validation accuracy: 0.736, loss = 1.988
Time spent on epoch 52: 3.49min
Epoch [53/100], Training accuracy: 0.738, Validation accuracy: 0.733, loss = 1.636

Epoch [17/100], Training accuracy: 0.617, Validation accuracy: 0.614, loss = 1.999
Time spent on epoch 17: 3.41min
Epoch [18/100], Training accuracy: 0.634, Validation accuracy: 0.625, loss = 1.904
Time spent on epoch 18: 3.41min
Epoch [19/100], Training accuracy: 0.632, Validation accuracy: 0.632, loss = 1.985
Time spent on epoch 19: 3.42min
Epoch [20/100], Training accuracy: 0.643, Validation accuracy: 0.639, loss = 1.953
Time spent on epoch 20: 3.43min
Epoch [21/100], Training accuracy: 0.641, Validation accuracy: 0.631, loss = 1.712
Time spent on epoch 21: 3.43min
Epoch [22/100], Training accuracy: 0.651, Validation accuracy: 0.64, loss = 1.836
Time spent on epoch 22: 3.42min
Epoch [23/100], Training accuracy: 0.649, Validation accuracy: 0.64, loss = 1.716
Time spent on epoch 23: 3.4min
Epoch [24/100], Training accuracy: 0.626, Validation accuracy: 0.617, loss = 2.119
Time spent on epoch 24: 3.41min
Epoch [25/100], Training accuracy: 0.657, Validation accuracy: 0.638, loss = 2.068


Epoch [89/100], Training accuracy: 0.881, Validation accuracy: 0.851, loss = 1.584
Time spent on epoch 89: 3.3min
Epoch [90/100], Training accuracy: 0.884, Validation accuracy: 0.853, loss = 1.821
Time spent on epoch 90: 3.27min
Epoch [91/100], Training accuracy: 0.887, Validation accuracy: 0.86, loss = 1.679
Time spent on epoch 91: 3.27min
Epoch [92/100], Training accuracy: 0.888, Validation accuracy: 0.855, loss = 1.583
Time spent on epoch 92: 3.26min
Epoch [93/100], Training accuracy: 0.889, Validation accuracy: 0.855, loss = 1.716
Time spent on epoch 93: 3.29min
Epoch [94/100], Training accuracy: 0.89, Validation accuracy: 0.859, loss = 1.47
Time spent on epoch 94: 3.28min
Epoch [95/100], Training accuracy: 0.888, Validation accuracy: 0.857, loss = 1.585
Time spent on epoch 95: 3.3min
Epoch [96/100], Training accuracy: 0.891, Validation accuracy: 0.858, loss = 1.727
Time spent on epoch 96: 3.31min
Epoch [97/100], Training accuracy: 0.892, Validation accuracy: 0.856, loss = 1.586

In [None]:
# Save the results
results_dict1 = {"loss": loss1,
    'Train':training1,
     'Validation': validation1,
     "Test":test1}
results_dict2 = {"loss": loss2,
    'Train':training2,
     'Validation': validation2,
     "Test":test2}
results_dict3 = {"loss": loss3,
    'Train':training3,
     'Validation': validation3,
     "Test":test3}

results1_change_stage_cr = pd.DataFrame(results_dict1)
results2_change_stage_cr = pd.DataFrame(results_dict2)
results3_change_stage_cr = pd.DataFrame(results_dict3)

results1_change_stage_cr.to_csv("./results/results_convnext_change_stage_cr_1.csv",index=False)
results2_change_stage_cr.to_csv("./results/results_convnext_change_stage_cr_2.csv",index=False)
results3_change_stage_cr.to_csv("./results/results_convnext_change_stage_cr_3.csv",index=False)

In [None]:
accuracy_cr = (results1_change_stage_cr["Test"].max() + results2_change_stage_cr["Test"].max() + results3_change_stage_cr["Test"].max())/3
print(f"Accuracy del modelo al cambiar la proporción de los stages: {accuracy_cr}")

Accuracy del modelo al cambiar la proporción de los stages: 0.8343333333333334


In [None]:
torch.save(model1.state_dict(), "./model_change_stage_cr.pt")

## Replacing the Stem with Patchify

ResNet networks begin with a *Stem* block, which applies aggressive *downsampling* to the input images. *Patchify*, in contrast, splits the input images into patches by means of a convolutional layer whose stride equals its kernel size.

In the particular model we have been using, we avoided *downsampling* in the *Stem* block because our images are smaller than the ImageNet images this design was intended for. For the same reason we use a *patchify* of size 1, so that the image size is not reduced. The original design uses a *patchify* of size 4.

To implement this, we redefine the Stem block so that it consists of a convolutional layer followed by a Batch Normalization layer.

In [None]:
class Stem(nn.Sequential):
    def __init__(self, in_features, out_features, patch_size=1):
        super().__init__(
            nn.Conv2d(in_features, out_features, kernel_size=patch_size, stride=patch_size),
            nn.BatchNorm2d(out_features)
        )
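
A rough way to compare the stems is by the spatial size each one produces. The sketch below uses the standard convolution output-size formula; `conv_out_size` is a hypothetical helper, and the ResNet stem parameters (7x7 convolution with stride 2 and padding 3, followed by 3x3 max pooling with stride 2 and padding 1) are those of the standard ImageNet ResNet, not of the model in this notebook.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Standard output-size formula for a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Standard ResNet stem on a 224x224 ImageNet image.
after_conv = conv_out_size(224, kernel_size=7, stride=2, padding=3)
after_pool = conv_out_size(after_conv, kernel_size=3, stride=2, padding=1)

# Patchify with patch size 4 on the same image: kernel size equals stride.
patchify4_imagenet = conv_out_size(224, kernel_size=4, stride=4)

# Patchify on a 32x32 CIFAR-10 image with patch sizes 1 and 4.
patchify1_cifar = conv_out_size(32, kernel_size=1, stride=1)
patchify4_cifar = conv_out_size(32, kernel_size=4, stride=4)

print(after_conv, after_pool, patchify4_imagenet, patchify1_cifar, patchify4_cifar)
```

Both ImageNet stems reduce the resolution by a factor of 4 per side. On CIFAR-10 a patch size of 4 would leave only an 8x8 feature map before the first stage, while a patch size of 1 keeps the full 32x32 resolution.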

#### Experiments

In [None]:
# Repeat the experiment 3 times
model1 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)
model2 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)
model3 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)

model1, training1, validation1, test1, loss1 = entrenamiento(model1, 100)
model2, training2, validation2, test2, loss2 = entrenamiento(model2, 100)
model3, training3, validation3, test3, loss3 = entrenamiento(model3, 100)

Comenzando entrenamiento
Epoch [1/100], Training accuracy: 0.268, Validation accuracy: 0.258, loss = 2.081
Time spent on epoch 1: 3.2min
Epoch [2/100], Training accuracy: 0.343, Validation accuracy: 0.352, loss = 2.181
Time spent on epoch 2: 3.24min
Epoch [3/100], Training accuracy: 0.353, Validation accuracy: 0.357, loss = 2.19
Time spent on epoch 3: 3.28min
Epoch [4/100], Training accuracy: 0.394, Validation accuracy: 0.402, loss = 2.087
Time spent on epoch 4: 3.31min
Epoch [5/100], Training accuracy: 0.413, Validation accuracy: 0.412, loss = 2.119
Time spent on epoch 5: 3.74min
Epoch [6/100], Training accuracy: 0.444, Validation accuracy: 0.445, loss = 1.926
Time spent on epoch 6: 4.07min
Epoch [7/100], Training accuracy: 0.493, Validation accuracy: 0.484, loss = 2.304
Time spent on epoch 7: 4.18min
Epoch [8/100], Training accuracy: 0.481, Validation accuracy: 0.473, loss = 2.16
Time spent on epoch 8: 3.35min
Epoch [9/100], Training accuracy: 0.51, Validation accuracy: 0.509, loss =

Epoch [73/100], Training accuracy: 0.817, Validation accuracy: 0.798, loss = 1.718
Time spent on epoch 73: 3.63min
Epoch [74/100], Training accuracy: 0.815, Validation accuracy: 0.804, loss = 1.956
Time spent on epoch 74: 3.68min
Epoch [75/100], Training accuracy: 0.826, Validation accuracy: 0.812, loss = 1.709
Time spent on epoch 75: 3.71min
Epoch [76/100], Training accuracy: 0.828, Validation accuracy: 0.81, loss = 1.797
Time spent on epoch 76: 3.74min
Epoch [77/100], Training accuracy: 0.833, Validation accuracy: 0.823, loss = 1.831
Time spent on epoch 77: 3.76min
Epoch [78/100], Training accuracy: 0.837, Validation accuracy: 0.821, loss = 1.6
Time spent on epoch 78: 3.81min
Epoch [79/100], Training accuracy: 0.845, Validation accuracy: 0.825, loss = 1.736
Time spent on epoch 79: 3.76min
Epoch [80/100], Training accuracy: 0.844, Validation accuracy: 0.819, loss = 1.473
Time spent on epoch 80: 3.83min
Epoch [81/100], Training accuracy: 0.849, Validation accuracy: 0.829, loss = 1.589


Epoch [45/100], Training accuracy: 0.746, Validation accuracy: 0.745, loss = 1.463
Time spent on epoch 45: 3.34min
Epoch [46/100], Training accuracy: 0.742, Validation accuracy: 0.737, loss = 1.709
Time spent on epoch 46: 3.34min
Epoch [47/100], Training accuracy: 0.739, Validation accuracy: 0.742, loss = 2.202
Time spent on epoch 47: 3.33min
Epoch [48/100], Training accuracy: 0.748, Validation accuracy: 0.752, loss = 1.736
Time spent on epoch 48: 3.34min
Epoch [49/100], Training accuracy: 0.736, Validation accuracy: 0.727, loss = 1.509
Time spent on epoch 49: 3.34min
Epoch [50/100], Training accuracy: 0.759, Validation accuracy: 0.76, loss = 1.798
Time spent on epoch 50: 3.34min
Epoch [51/100], Training accuracy: 0.758, Validation accuracy: 0.75, loss = 2.074
Time spent on epoch 51: 3.34min
Epoch [52/100], Training accuracy: 0.764, Validation accuracy: 0.764, loss = 1.952
Time spent on epoch 52: 3.34min
Epoch [53/100], Training accuracy: 0.769, Validation accuracy: 0.758, loss = 2.149

Epoch [17/100], Training accuracy: 0.591, Validation accuracy: 0.595, loss = 2.004
Time spent on epoch 17: 3.32min
Epoch [18/100], Training accuracy: 0.586, Validation accuracy: 0.589, loss = 2.001
Time spent on epoch 18: 3.38min
Epoch [19/100], Training accuracy: 0.593, Validation accuracy: 0.583, loss = 1.862
Time spent on epoch 19: 3.33min
Epoch [20/100], Training accuracy: 0.602, Validation accuracy: 0.608, loss = 1.713
Time spent on epoch 20: 3.39min
Epoch [21/100], Training accuracy: 0.597, Validation accuracy: 0.606, loss = 2.066
Time spent on epoch 21: 3.33min
Epoch [22/100], Training accuracy: 0.601, Validation accuracy: 0.614, loss = 1.586
Time spent on epoch 22: 3.32min
Epoch [23/100], Training accuracy: 0.618, Validation accuracy: 0.624, loss = 1.969
Time spent on epoch 23: 3.31min
Epoch [24/100], Training accuracy: 0.605, Validation accuracy: 0.605, loss = 2.141
Time spent on epoch 24: 3.32min
Epoch [25/100], Training accuracy: 0.622, Validation accuracy: 0.625, loss = 1.6

Epoch [89/100], Training accuracy: 0.849, Validation accuracy: 0.82, loss = 1.462
Time spent on epoch 89: 4.76min
Epoch [90/100], Training accuracy: 0.848, Validation accuracy: 0.823, loss = 1.476
Time spent on epoch 90: 4.75min
Epoch [91/100], Training accuracy: 0.851, Validation accuracy: 0.822, loss = 1.542
Time spent on epoch 91: 4.76min
Epoch [92/100], Training accuracy: 0.854, Validation accuracy: 0.823, loss = 1.586
Time spent on epoch 92: 4.75min
Epoch [93/100], Training accuracy: 0.853, Validation accuracy: 0.828, loss = 1.797
Time spent on epoch 93: 4.75min
Epoch [94/100], Training accuracy: 0.857, Validation accuracy: 0.827, loss = 1.68
Time spent on epoch 94: 4.75min
Epoch [95/100], Training accuracy: 0.858, Validation accuracy: 0.819, loss = 1.79
Time spent on epoch 95: 4.76min
Epoch [96/100], Training accuracy: 0.858, Validation accuracy: 0.828, loss = 1.463
Time spent on epoch 96: 4.75min
Epoch [97/100], Training accuracy: 0.859, Validation accuracy: 0.828, loss = 1.472


In [None]:
torch.save(model2.state_dict(), "./model_patchify.pt")

In [None]:
# Save results
results_dict1 = {"loss": loss1,
    'Train':training1,
     'Validation': validation1,
     "Test":test1}
results_dict2 = {"loss": loss2,
    'Train':training2,
     'Validation': validation2,
     "Test":test2}
results_dict3 = {"loss": loss3,
    'Train':training3,
     'Validation': validation3,
     "Test":test3}

results1_patchify = pd.DataFrame(results_dict1)
results2_patchify = pd.DataFrame(results_dict2)
results3_patchify = pd.DataFrame(results_dict3)

results1_patchify.to_csv("./results/results_convnext_patchify_1.csv",index=False)
results2_patchify.to_csv("./results/results_convnext_patchify_2.csv",index=False)
results3_patchify.to_csv("./results/results_convnext_patchify_3.csv",index=False)

In [None]:
accuracy_patch = (results1_patchify["Test"].max() + results2_patchify["Test"].max() + results3_patchify["Test"].max())/3
print(f"Accuracy del modelo al aplicar Patchify: {accuracy_patch}")

Accuracy del modelo al aplicar Patchify: 0.7947333333333333


## Adding *depthwise convolution* (*ResNeXt-ify*)

Following the article [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431), we use a *grouped convolution* in the Bottleneck block's convolutional layer with kernel size 3. Specifically, we use the variant called *depthwise* convolution, in which the number of groups equals the number of input channels.

To implement this change, it suffices to pass the *groups* parameter when constructing the corresponding nn.Conv2d() layer.
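
To make the effect of *groups* concrete, the following sketch counts the weights of a convolution for different group settings. The helper `conv2d_weight_count` and the width of 64 channels are illustrative assumptions, not part of the notebook's code; the formula relies on the fact that each output channel only sees `in_channels // groups` input channels.

```python
def conv2d_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights (bias excluded) of a 2D convolution with a square kernel."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    # Each output channel is connected to in_channels // groups input channels
    return out_channels * (in_channels // groups) * kernel_size * kernel_size


channels = 64  # hypothetical width, chosen only for illustration

standard = conv2d_weight_count(channels, channels, 3)                    # groups=1
grouped = conv2d_weight_count(channels, channels, 3, groups=32)          # grouped convolution
depthwise = conv2d_weight_count(channels, channels, 3, groups=channels)  # one group per channel

print(standard, grouped, depthwise)  # 36864 1152 576
```

With `groups` equal to the number of channels, the 3x3 layer only mixes information spatially, and the 1x1 convolutions around it take care of mixing across channels.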

In [None]:
class BottleNeckBlock(nn.Module):
    def __init__(self, in_features, out_features, reduction=4, stride=1):
        super().__init__()
        reduced_features = out_features // reduction
        self.block = nn.Sequential(
            # Channel reduction
            utilConv(in_features, reduced_features, kernel_size=1, stride=stride, bias=False),
            # The number of channels stays fixed; this layer uses a depthwise (grouped) convolution
            utilConv(reduced_features, reduced_features, kernel_size=3, bias=False, groups=reduced_features),
            # Channel expansion
            utilConv(reduced_features, out_features, kernel_size=1, bias=False, act=nn.Identity),
        )

        # self.shortcut transforms the input to the right dimensions so that it can be added to the block's output
        if in_features != out_features:
            self.shortcut = nn.Sequential(utilConv(in_features, out_features, kernel_size=1, stride=stride, bias=False))
        else:
            self.shortcut = nn.Identity()

        self.act = nn.ReLU()

    def forward(self, x):
        res = x
        x = self.block(x)
        res = self.shortcut(res)
        x += res
        x = self.act(x)
        return x

#### Experiments

In [None]:
model1 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)
model2 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)
model3 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)

model1, training1, validation1, test1, loss1 = entrenamiento(model1, 100)
model2, training2, validation2, test2, loss2 = entrenamiento(model2, 100)
model3, training3, validation3, test3, loss3 = entrenamiento(model3, 100)

Comenzando entrenamiento
Epoch [1/100], Training accuracy: 0.231, Validation accuracy: 0.229, loss = 2.085
Time spent on epoch 1: 3.92min
Epoch [2/100], Training accuracy: 0.319, Validation accuracy: 0.326, loss = 2.299
Time spent on epoch 2: 3.96min
Epoch [3/100], Training accuracy: 0.373, Validation accuracy: 0.382, loss = 2.037
Time spent on epoch 3: 4.07min
Epoch [4/100], Training accuracy: 0.445, Validation accuracy: 0.451, loss = 2.172
Time spent on epoch 4: 4.03min
Epoch [5/100], Training accuracy: 0.507, Validation accuracy: 0.5, loss = 1.932
Time spent on epoch 5: 3.85min
Epoch [6/100], Training accuracy: 0.532, Validation accuracy: 0.519, loss = 1.953
Time spent on epoch 6: 3.77min
Epoch [7/100], Training accuracy: 0.548, Validation accuracy: 0.548, loss = 2.185
Time spent on epoch 7: 3.89min
Epoch [8/100], Training accuracy: 0.59, Validation accuracy: 0.584, loss = 1.984
Time spent on epoch 8: 3.89min
Epoch [9/100], Training accuracy: 0.577, Validation accuracy: 0.573, loss 

Epoch [1/100], Training accuracy: 0.297, Validation accuracy: 0.297, loss = 2.215
Time spent on epoch 1: 3.34min
Epoch [2/100], Training accuracy: 0.342, Validation accuracy: 0.348, loss = 2.001
Time spent on epoch 2: 3.33min
Epoch [3/100], Training accuracy: 0.39, Validation accuracy: 0.395, loss = 2.064
Time spent on epoch 3: 3.34min
Epoch [4/100], Training accuracy: 0.442, Validation accuracy: 0.452, loss = 2.018
Time spent on epoch 4: 3.33min
Epoch [5/100], Training accuracy: 0.477, Validation accuracy: 0.482, loss = 2.148
Time spent on epoch 5: 3.32min
Epoch [6/100], Training accuracy: 0.532, Validation accuracy: 0.534, loss = 2.219
Time spent on epoch 6: 3.34min
Epoch [7/100], Training accuracy: 0.546, Validation accuracy: 0.535, loss = 2.08
Time spent on epoch 7: 3.32min
Epoch [8/100], Training accuracy: 0.577, Validation accuracy: 0.573, loss = 1.812
Time spent on epoch 8: 3.36min
Epoch [9/100], Training accuracy: 0.581, Validation accuracy: 0.58, loss = 1.955
Time spent on epo

Epoch [73/100], Training accuracy: 0.832, Validation accuracy: 0.808, loss = 1.764
Time spent on epoch 73: 3.32min
Epoch [74/100], Training accuracy: 0.837, Validation accuracy: 0.817, loss = 1.463
Time spent on epoch 74: 3.32min
Epoch [75/100], Training accuracy: 0.84, Validation accuracy: 0.82, loss = 1.833
Time spent on epoch 75: 3.32min
Epoch [76/100], Training accuracy: 0.848, Validation accuracy: 0.827, loss = 1.711
Time spent on epoch 76: 3.32min
Epoch [77/100], Training accuracy: 0.849, Validation accuracy: 0.823, loss = 1.606
Time spent on epoch 77: 3.32min
Epoch [78/100], Training accuracy: 0.848, Validation accuracy: 0.831, loss = 1.461
Time spent on epoch 78: 3.33min
Epoch [79/100], Training accuracy: 0.855, Validation accuracy: 0.834, loss = 1.635
Time spent on epoch 79: 3.32min
Epoch [80/100], Training accuracy: 0.861, Validation accuracy: 0.838, loss = 1.572
Time spent on epoch 80: 3.32min
Epoch [81/100], Training accuracy: 0.863, Validation accuracy: 0.842, loss = 1.591

Epoch [45/100], Training accuracy: 0.71, Validation accuracy: 0.711, loss = 1.795
Time spent on epoch 45: 3.33min
Epoch [46/100], Training accuracy: 0.709, Validation accuracy: 0.703, loss = 1.707
Time spent on epoch 46: 3.33min
Epoch [47/100], Training accuracy: 0.733, Validation accuracy: 0.725, loss = 1.809
Time spent on epoch 47: 3.33min
Epoch [48/100], Training accuracy: 0.738, Validation accuracy: 0.73, loss = 1.751
Time spent on epoch 48: 3.32min
Epoch [49/100], Training accuracy: 0.736, Validation accuracy: 0.726, loss = 1.834
Time spent on epoch 49: 3.32min
Epoch [50/100], Training accuracy: 0.747, Validation accuracy: 0.742, loss = 2.061
Time spent on epoch 50: 3.33min
Epoch [51/100], Training accuracy: 0.746, Validation accuracy: 0.751, loss = 1.571
Time spent on epoch 51: 3.32min
Epoch [52/100], Training accuracy: 0.739, Validation accuracy: 0.729, loss = 1.896
Time spent on epoch 52: 3.32min
Epoch [53/100], Training accuracy: 0.758, Validation accuracy: 0.751, loss = 1.709

In [None]:
# Save results
results_dict1 = {"loss": loss1,
    'Train':training1,
     'Validation': validation1,
     "Test":test1}
results_dict2 = {"loss": loss2,
    'Train':training2,
     'Validation': validation2,
     "Test":test2}
results_dict3 = {"loss": loss3,
    'Train':training3,
     'Validation': validation3,
     "Test":test3}

results1_resnextify = pd.DataFrame(results_dict1)
results2_resnextify = pd.DataFrame(results_dict2)
results3_resnextify = pd.DataFrame(results_dict3)

results1_resnextify.to_csv("./results/results_convnext_resnextify_1.csv",index=False)
results2_resnextify.to_csv("./results/results_convnext_resnextify_2.csv",index=False)
results3_resnextify.to_csv("./results/results_convnext_resnextify_3.csv",index=False)

In [None]:
accuracy_resnext = (results1_resnextify["Test"].max() + results2_resnextify["Test"].max() + results3_resnextify["Test"].max())/3
print(f"Accuracy del modelo al aplicar  ResNeXt-ify: {accuracy_resnext}")

Accuracy del modelo al aplicar  ResNeXt-ify: 0.8309666666666665


In [None]:
torch.save(model1.state_dict(), "./model_resnextify.pt")

## Inverted-Bottleneck

The ResNet-50 model uses the Bottleneck block. The block owes its name to its shape: it first reduces the number of channels with a convolution of kernel size 1, then applies a convolution of kernel size 3 while keeping the number of channels fixed, and finally increases the number of channels back to the original value.

Vision Transformers follow the opposite scheme: in each Transformer block, the hidden dimension of the MLP is wider than its input. The convolutional counterpart first increases the number of channels, then applies the convolution with kernel size greater than 1, and finally reduces the number of channels back to the original value. This type of block is called an *Inverted-Bottleneck*.

Inspired by the design of Vision Transformers, we use inverted Bottleneck blocks. They can be implemented by modifying the Bottleneck block as follows:
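
As a rough illustration of the difference, the sketch below lists the channel counts seen along each kind of block. The function names are hypothetical helpers and the width of 256 is an arbitrary example; `reduction=4` and `expansion=4` follow the default values used in the block definitions of this notebook.

```python
def bottleneck_channels(features, reduction=4):
    """Channel counts along a classic bottleneck: wide -> narrow -> narrow -> wide."""
    reduced = features // reduction
    return [features, reduced, reduced, features]


def inverted_bottleneck_channels(features, expansion=4):
    """Channel counts along an inverted bottleneck: narrow -> wide -> wide -> narrow."""
    expanded = features * expansion
    return [features, expanded, expanded, features]


print(bottleneck_channels(256))           # [256, 64, 64, 256]
print(inverted_bottleneck_channels(256))  # [256, 1024, 1024, 256]
```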

In [None]:
class BottleNeckBlock(nn.Module):
    def __init__(self, in_features, out_features, expansion=4, stride=1):
        super().__init__()
        expanded_features = out_features * expansion
        self.block = nn.Sequential(
            # Channel expansion
            utilConv(in_features, expanded_features, kernel_size=1, stride=stride, bias=False),
            # The number of channels stays fixed (grouped convolution). Note: with groups=in_features
            # each group spans expanded_features // in_features channels; a strictly depthwise
            # convolution would use groups=expanded_features.
            utilConv(expanded_features, expanded_features, kernel_size=3, bias=False, groups=in_features),
            # Channel reduction
            utilConv(expanded_features, out_features, kernel_size=1, bias=False, act=nn.Identity)
        )

        # self.shortcut transforms the input to the right dimensions so that it can be added to the block's output
        if in_features != out_features:
            self.shortcut = nn.Sequential(utilConv(in_features, out_features, kernel_size=1, stride=stride, bias=False))
        else:
            self.shortcut = nn.Identity()

        self.act = nn.ReLU()

    def forward(self, x):
        res = x
        x = self.block(x)
        res = self.shortcut(res)
        x += res
        x = self.act(x)
        return x

#### Experiments

In [None]:
# Repeat the experiment 3 times
model1 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)
model2 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)
model3 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)

model1, training1, validation1, test1, loss1 = entrenamiento(model1, 100)
results_dict1 = {"loss": loss1,
    'Train':training1,
     'Validation': validation1,
     "Test":test1}
results1_inverted = pd.DataFrame(results_dict1)
results1_inverted.to_csv("./results/results_convnext_inverted_1.csv",index=False)

model2, training2, validation2, test2, loss2 = entrenamiento(model2, 100)
results_dict2 = {"loss": loss2,
    'Train':training2,
     'Validation': validation2,
     "Test":test2}
results2_inverted = pd.DataFrame(results_dict2)
results2_inverted.to_csv("./results/results_convnext_inverted_2.csv",index=False)

model3, training3, validation3, test3, loss3 = entrenamiento(model3, 100)
results_dict3 = {"loss": loss3,
    'Train':training3,
     'Validation': validation3,
     "Test":test3}
results3_inverted = pd.DataFrame(results_dict3)
results3_inverted.to_csv("./results/results_convnext_inverted_3.csv",index=False)

Comenzando entrenamiento
Epoch [1/100], Training accuracy: 0.286, Validation accuracy: 0.29, loss = 2.271
Time spent on epoch 1: 4.91min
Epoch [2/100], Training accuracy: 0.335, Validation accuracy: 0.334, loss = 2.055
Time spent on epoch 2: 4.91min
Epoch [3/100], Training accuracy: 0.326, Validation accuracy: 0.316, loss = 2.175
Time spent on epoch 3: 4.91min
Epoch [4/100], Training accuracy: 0.382, Validation accuracy: 0.389, loss = 2.265
Time spent on epoch 4: 4.9min
Epoch [5/100], Training accuracy: 0.431, Validation accuracy: 0.439, loss = 1.908
Time spent on epoch 5: 4.9min
Epoch [6/100], Training accuracy: 0.454, Validation accuracy: 0.464, loss = 1.957
Time spent on epoch 6: 4.9min
Epoch [7/100], Training accuracy: 0.513, Validation accuracy: 0.521, loss = 1.943
Time spent on epoch 7: 4.92min
Epoch [8/100], Training accuracy: 0.558, Validation accuracy: 0.574, loss = 1.854
Time spent on epoch 8: 4.92min
Epoch [9/100], Training accuracy: 0.582, Validation accuracy: 0.584, loss =

In [None]:
# Save results
results_dict1 = {"loss": loss1,
    'Train':training1,
     'Validation': validation1,
     "Test":test1}
results_dict2 = {"loss": loss2,
    'Train':training2,
     'Validation': validation2,
     "Test":test2}
results_dict3 = {"loss": loss3,
    'Train':training3,
     'Validation': validation3,
     "Test":test3}

results1_inverted = pd.DataFrame(results_dict1)
results2_inverted = pd.DataFrame(results_dict2)
results3_inverted = pd.DataFrame(results_dict3)

results1_inverted.to_csv("./results/results_convnext_inverted_1.csv",index=False)
results2_inverted.to_csv("./results/results_convnext_inverted_2.csv",index=False)
results3_inverted.to_csv("./results/results_convnext_inverted_3.csv",index=False)

In [None]:
accuracy_inverted = (results1_inverted["Test"].max() + results2_inverted["Test"].max() + results3_inverted["Test"].max())/3
print(f"Accuracy del modelo al invertir el cuello de botella: {accuracy_inverted}")

Accuracy del modelo al invertir el cuello de botella: 0.8484666666666666


In [None]:
torch.save(model1.state_dict(), "./model_inverted_bn.pt")

## Increasing the kernel size

Taking the [Swin Transformer](https://arxiv.org/abs/2103.14030) model as inspiration, we increase the kernel size of the Bottleneck block's depthwise convolution from 3x3 to 7x7. This requires two changes:

1. Move the *depthwise convolution* layer so that it is the first layer of the block
2. Increase the kernel size of the *depthwise convolution* layer to 7x7
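
One way to read these two steps (a sketch under assumptions, not the notebook's implementation): placing the depthwise layer first lets the larger kernel act on `in_features` channels instead of the expanded ones, and a padding of `kernel_size // 2` keeps the spatial size unchanged for odd kernels. The helper names, the 32x32 input and the width of 64 are illustrative, and whether `utilConv` pads this way is assumed here.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def depthwise_weight_count(channels, kernel_size):
    """Weights of a depthwise convolution: one kernel_size x kernel_size filter per channel."""
    return channels * kernel_size * kernel_size


# Padding of kernel_size // 2 preserves a 32x32 feature map for both kernel sizes
for kernel_size in (3, 7):
    print(kernel_size, conv_output_size(32, kernel_size, padding=kernel_size // 2))

channels, expansion = 64, 4  # hypothetical width and the default expansion
print(depthwise_weight_count(channels, 7))              # 3136: 7x7 applied before the expansion
print(depthwise_weight_count(channels * expansion, 7))  # 12544: 7x7 applied on expanded channels
```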

In [None]:
class BottleNeckBlock(nn.Module):
    def __init__(self, in_features, out_features, expansion=4, stride=1):
        super().__init__()
        expanded_features = out_features * expansion
        self.block = nn.Sequential(
            # The number of channels stays fixed (depthwise groups and a larger kernel)
            utilConv(in_features, in_features, kernel_size=7, stride=stride, bias=False, groups=in_features),
            # Channel expansion
            utilConv(in_features, expanded_features, kernel_size=1),
            # Channel reduction
            utilConv(expanded_features, out_features, kernel_size=1, bias=False, act=nn.Identity),
        )

        # self.shortcut transforms the input to the right dimensions so that it can be added to the block's output
        if in_features != out_features:
            self.shortcut = nn.Sequential(utilConv(in_features, out_features, kernel_size=1, stride=stride, bias=False))
        else:
            self.shortcut = nn.Identity()

        self.act = nn.ReLU()

    def forward(self, x):
        res = x
        x = self.block(x)
        res = self.shortcut(res)
        x += res
        x = self.act(x)
        return x

#### Experiments

In [None]:
# Repeat the experiment 3 times
model1 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)
model2 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)
model3 = ResNet(in_channels=3, n_classes = 10, stem_features=64, depths=[3,3,9,3], widths=[64, 128, 256,512]).to(device)

model1, training1, validation1, test1, loss1 = entrenamiento(model1, 100)
model2, training2, validation2, test2, loss2 = entrenamiento(model2, 100)
model3, training3, validation3, test3, loss3 = entrenamiento(model3, 100)

Comenzando entrenamiento
Epoch [1/100], Training accuracy: 0.259, Validation accuracy: 0.259, Test accuracy:0.282, loss = 2.03
Time spent on epoch 1: 5.54min
Epoch [2/100], Training accuracy: 0.323, Validation accuracy: 0.317, Test accuracy:0.3325, loss = 1.996
Time spent on epoch 2: 5.66min
Epoch [3/100], Training accuracy: 0.342, Validation accuracy: 0.338, Test accuracy:0.348, loss = 2.105
Time spent on epoch 3: 5.69min
Epoch [4/100], Training accuracy: 0.366, Validation accuracy: 0.363, Test accuracy:0.3644, loss = 2.323
Time spent on epoch 4: 5.68min
Epoch [5/100], Training accuracy: 0.389, Validation accuracy: 0.392, Test accuracy:0.3843, loss = 1.954
Time spent on epoch 5: 5.65min
Epoch [6/100], Training accuracy: 0.421, Validation accuracy: 0.426, Test accuracy:0.4168, loss = 2.157
Time spent on epoch 6: 5.67min
Epoch [7/100], Training accuracy: 0.412, Validation accuracy: 0.41, Test accuracy:0.4264, loss = 1.932
Time spent on epoch 7: 5.65min
Epoch [8/100], Training accuracy: 

In [None]:
# Save results
results_dict1 = {"loss": loss1,
    'Train':training1,
     'Validation': validation1,
     "Test":test1}
results_dict2 = {"loss": loss2,
    'Train':training2,
     'Validation': validation2,
     "Test":test2}
results_dict3 = {"loss": loss3,
    'Train':training3,
     'Validation': validation3,
     "Test":test3}

results1_kernel = pd.DataFrame(results_dict1)
results2_kernel = pd.DataFrame(results_dict2)
results3_kernel = pd.DataFrame(results_dict3)

results1_kernel.to_csv("./results/results_convnext_kernel_1.csv",index=False)
results2_kernel.to_csv("./results/results_convnext_kernel_2.csv",index=False)
results3_kernel.to_csv("./results/results_convnext_kernel_3.csv",index=False)

In [None]:
accuracy_kernel = (results1_kernel["Test"].max() + results2_kernel["Test"].max() + results3_kernel["Test"].max())/3
print(f"Accuracy del modelo al aumentar el tamaño de los kernels: {accuracy_kernel}")

Accuracy del modelo al aumentar el tamaño de los kernels: 0.8513000000000001


In [None]:
torch.save(model1.state_dict(), "./model_kernel_size.pt")

## Changes in the micro-design

Finally, we apply the following smaller-scale changes to the design of the architecture:

1. Replace the ReLU activation function with GELU
2. Reduce the number of activation functions
3. Reduce the number of normalization layers
4. Replace *batch normalization* with *layer normalization*
5. Separate the *downsampling* layers
6. Add *Stochastic Depth* (also known as *Drop Path*) and *Layer Scale*
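
To illustrate the first item, the sketch below compares ReLU with the exact form of GELU, `x * Phi(x)`, where `Phi` is the standard normal cumulative distribution function. It uses only Python's `math` module; the function names are illustrative and the snippet is not meant to replace `nn.ReLU` or `nn.GELU`.

```python
import math


def relu(x):
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF written via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}")
```

Unlike ReLU, GELU is smooth and lets small negative values through, while both behave almost identically for large positive inputs.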

Again, since our images are 32x32 rather than the 224x224 used in the article, we chose to omit the first downsampling layer.
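
The effect on the feature-map sizes can be sketched as follows, assuming that the stem leaves the 32x32 resolution unchanged (an assumption, since it depends on the `Stem` block defined earlier) and that every stage whose input and output widths differ starts with a downsampling convolution of kernel size 2 and stride 2. The helper names are hypothetical.

```python
def downsample_size(size, kernel_size=2, stride=2):
    """Spatial size after an unpadded downsampling convolution."""
    return (size - kernel_size) // stride + 1


def stage_resolutions(input_size, widths, stem_features):
    """Spatial size seen by the blocks of each stage."""
    sizes = []
    size = input_size          # assumed resolution at the output of the stem
    in_features = stem_features
    for out_features in widths:
        # Downsampling is only applied when the width changes between stages
        if in_features != out_features:
            size = downsample_size(size)
        sizes.append(size)
        in_features = out_features
    return sizes


print(stage_resolutions(32, [64, 128, 256, 512], stem_features=64))  # [32, 16, 8, 4]
```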

Let us implement these changes:

In [None]:
class LayerScaler(nn.Module):
    def __init__(self, init_value, dimensions):
        super().__init__()
        self.gamma = nn.Parameter(init_value * torch.ones((dimensions)),
                                    requires_grad=True)

    def forward(self, x):
        return self.gamma[None,...,None,None] * x

class BottleNeckBlock(nn.Module):
    def __init__(self, in_features, out_features, expansion = 4, drop_p = .0, layer_scaler_init_value = 1e-6):
        super().__init__()
        expanded_features = out_features * expansion
        self.block = nn.Sequential(
            # The number of channels stays fixed (depthwise groups and a larger kernel)
            nn.Conv2d(
                in_features, in_features, kernel_size=7, padding=3, bias=False, groups=in_features
            ),
            # nn.GroupNorm with num_groups=1 acts as a LayerNorm over (C, H, W)
            nn.GroupNorm(num_groups=1, num_channels=in_features),
            # Channel expansion
            nn.Conv2d(in_features, expanded_features, kernel_size=1),
            nn.GELU(),
            # Channel reduction
            nn.Conv2d(expanded_features, out_features, kernel_size=1),
        )
        self.layer_scaler = LayerScaler(layer_scaler_init_value, out_features)
        self.drop_path = StochasticDepth(drop_p, mode="batch")


    def forward(self, x):
        res = x
        x = self.block(x)
        x = self.layer_scaler(x)
        x = self.drop_path(x)
        x += res
        return x

class Stage(nn.Sequential):
    def __init__(self, in_features, out_features, depth, drop_p = .0):

        if in_features != out_features:
            super().__init__(
                # Add the downsampling layer before the blocks of the stage.
                nn.Sequential(
                    nn.GroupNorm(num_groups=1, num_channels=in_features),
                    nn.Conv2d(in_features, out_features, kernel_size=2, stride=2)
                ),
                # Add the blocks of the stage
                *[BottleNeckBlock(out_features, out_features, drop_p=drop_p) for _ in range(depth)]
            )
        else:
            super().__init__(
                # This guarantees that there is no downsampling before the first stage.
                # No Layer Normalization is added here, since no downsampling is applied.
                *[BottleNeckBlock(out_features, out_features, drop_p=drop_p) for _ in range(depth)]
            )
            )
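
The residual update performed by the block above can be sketched in plain Python as `res + drop_path(gamma * branch)`. The sketch is a simplification under stated assumptions: it works on a flat list of floats, uses a single `gamma` instead of one per channel, and models the `"batch"` mode of Stochastic Depth as one random draw that either zeroes the whole branch or rescales it by `1 / (1 - drop_p)`. All names are illustrative.

```python
import random


def residual_update(res, branch, gamma=1e-6, drop_p=0.0, training=True, rng=random):
    """Return res + drop_path(gamma * branch) for flat lists of floats."""
    scaled = [gamma * b for b in branch]        # Layer Scale
    if training and drop_p > 0.0:               # Stochastic Depth (training only)
        survival = 1.0 - drop_p
        keep = rng.random() < survival          # a single draw for the whole batch
        factor = 1.0 / survival if keep else 0.0
        scaled = [factor * s for s in scaled]
    return [r + s for r, s in zip(res, scaled)]


res = [1.0, -2.0, 0.5]
branch = [10.0, 10.0, 10.0]

# With gamma = 1e-6 the block starts out very close to the identity
print(residual_update(res, branch))

# In evaluation mode the branch is never dropped
print(residual_update(res, branch, gamma=1.0, drop_p=0.5, training=False))  # [11.0, 8.0, 10.5]
```

The small initial `gamma` means each block contributes almost nothing at the start of training, and the rescaling keeps the expected value of the branch unchanged when some updates are dropped.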

#### Building the ConvNeXt encoder and decoder

In [None]:
class ConvNextEncoder(nn.Module):
    def __init__(self, in_channels, stem_features, depths, widths, drop_p = .0):
        super().__init__()
        self.stem = Stem(in_channels, stem_features)

        in_out_widths = list(zip(widths, widths[1:]))
        # Probabilities used for Drop Path
        drop_probs = [x.item() for x in torch.linspace(0, drop_p, sum(depths))]

        self.stages = nn.ModuleList()
        self.stages.append(Stage(stem_features, widths[0], depths[0], drop_p=drop_probs[0])) # the first stage depends on "stem_features"

        for (in_features, out_features), depth, drop_p in zip(in_out_widths, depths[1:], drop_probs[1:]):
            self.stages.append(Stage(in_features, out_features, depth, drop_p=drop_p))


    def forward(self, x):
        x = self.stem(x)
        for stage in self.stages:
            x = stage(x)
        return x
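
The linearly increasing Drop Path probabilities can be reproduced without PyTorch. In the sketch below, `linear_drop_schedule` is a hypothetical stand-in for `torch.linspace(0, drop_p, sum(depths))`, and the value 0.1 is an arbitrary example (the default in this notebook is 0). It also shows two ways of consuming the schedule: one value per stage, which corresponds to the entries the encoder above reads, and one value per block, which is a possible alternative rather than what the notebook implements.

```python
def linear_drop_schedule(max_drop_p, num_blocks):
    """Evenly spaced values from 0 to max_drop_p (inclusive)."""
    if num_blocks == 1:
        return [0.0]
    step = max_drop_p / (num_blocks - 1)
    return [i * step for i in range(num_blocks)]


depths = [2, 2, 6, 2]
schedule = linear_drop_schedule(0.1, sum(depths))

# One value per stage: the first len(depths) entries of the schedule
per_stage = schedule[:len(depths)]

# Alternative: give every block its own value by slicing the schedule by depth
per_block = []
start = 0
for depth in depths:
    per_block.append(schedule[start:start + depth])
    start += depth

print([round(p, 4) for p in per_stage])
print([[round(p, 4) for p in stage] for stage in per_block])
```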

In [None]:
class ConvNextDecoder(nn.Sequential):
    def __init__(self, num_channels, num_classes = 10):
        super().__init__(
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten(1),
            nn.LayerNorm(num_channels),
            nn.Linear(num_channels, num_classes)
        )

#### Final model definition

In [None]:
class ConvNext(nn.Sequential):
    def __init__(self, in_channels, stem_features, depths, widths, drop_p = .0, num_classes = 10):
        super().__init__()
        self.encoder = ConvNextEncoder(in_channels, stem_features, depths, widths, drop_p)
        self.head = ConvNextDecoder(widths[-1], num_classes)

In [None]:
# Number of parameters of the model
model = ConvNext(3,64,[2,2,6,2],[64, 128, 256, 512])
print("Number of parameters: {:,}".format(sum(p.numel() for p in model.parameters())))

#### Experimentos

In [None]:
# Repeat the experiment 3 times
model1 = ConvNext(3,64,[2,2,6,2],[64, 128, 256, 512]).to(device)
model2 = ConvNext(3,64,[2,2,6,2],[64, 128, 256, 512]).to(device)
model3 = ConvNext(3,64,[2,2,6,2],[64, 128, 256, 512]).to(device)

model1, training1, validation1, test1, loss1 = entrenamiento(model1, 100)
model2, training2, validation2, test2, loss2 = entrenamiento(model2, 100)
model3, training3, validation3, test3, loss3 = entrenamiento(model3, 100)

Comenzando entrenamiento
Epoch [1/100], Training accuracy: 0.395, Validation accuracy: 0.396, loss = 1.779
Time spent on epoch 1: 4.17min
Epoch [2/100], Training accuracy: 0.475, Validation accuracy: 0.486, loss = 1.587
Time spent on epoch 2: 4.21min
Epoch [3/100], Training accuracy: 0.481, Validation accuracy: 0.478, loss = 0.908
Time spent on epoch 3: 4.21min
Epoch [4/100], Training accuracy: 0.566, Validation accuracy: 0.572, loss = 0.758
Time spent on epoch 4: 4.19min
Epoch [5/100], Training accuracy: 0.6, Validation accuracy: 0.608, loss = 1.585
Time spent on epoch 5: 4.21min
Epoch [6/100], Training accuracy: 0.642, Validation accuracy: 0.629, loss = 1.898
Time spent on epoch 6: 4.2min
Epoch [7/100], Training accuracy: 0.673, Validation accuracy: 0.67, loss = 0.529
Time spent on epoch 7: 4.22min
Epoch [8/100], Training accuracy: 0.699, Validation accuracy: 0.686, loss = 0.366
Time spent on epoch 8: 4.22min
Epoch [9/100], Training accuracy: 0.683, Validation accuracy: 0.68, loss = 

In [None]:
# Save results
results_dict1 = {"loss": loss1,
    'Train':training1,
     'Validation': validation1,
     "Test":test1}
results_dict2 = {"loss": loss2,
    'Train':training2,
     'Validation': validation2,
     "Test":test2}
results_dict3 = {"loss": loss3,
    'Train':training3,
     'Validation': validation3,
     "Test":test3}

results1_final = pd.DataFrame(results_dict1)
results2_final = pd.DataFrame(results_dict2)
results3_final = pd.DataFrame(results_dict3)

results1_final.to_csv("./results/results_convnext_final_1.csv",index=False)
results2_final.to_csv("./results/results_convnext_final_2.csv",index=False)
results3_final.to_csv("./results/results_convnext_final_3.csv",index=False)

In [None]:
accuracy_final = (results1_final["Test"].max() + results2_final["Test"].max() + results3_final["Test"].max())/3
print(f"Accuracy del modelo ConvNeXt: {accuracy_final}")

Accuracy del modelo ConvNeXt: 0.9079666666666667


## Results and Conclusions