<h1 align="center"><font color="yellow">PyTorch: Datasets</font></h1>

<font color="yellow">Data Scientist: Eddy Giusepe Chirinos Isidro, PhD</font>

<font color="orange">In this script we will study how PyTorch defines our datasets.</font>

In [None]:
%conda install requests matplotlib --yes

In [1]:
%load_ext watermark 
%watermark -v -p numpy,pandas,matplotlib,requests,torch

Python implementation: CPython
Python version       : 3.9.13
IPython version      : 8.13.2

numpy     : 1.24.3
pandas    : 2.0.1
matplotlib: 3.7.1
requests  : 2.31.0
torch     : 2.0.1



# Iterating over Tensors

In [3]:
import torch
from sklearn.datasets import fetch_openml
import numpy as np

# Download our dataset
mnist = fetch_openml('mnist_784', version=1)
X, Y = mnist["data"], mnist["target"]

# Normalize pixel values to [0, 1] and split into train (first 60,000 samples) and test sets
X_train, X_test = X[:60000] / 255., X[60000:] / 255.
y_train, y_test = Y[:60000].astype(int), Y[60000:].astype(int)


X_t = torch.from_numpy(X_train.values).float().cuda()
Y_t = torch.from_numpy(y_train.values).long().cuda()




In [5]:
D_in, H, D_out = 784, 100, 10

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
).to("cuda")

In [6]:
from sklearn.metrics import accuracy_score

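def softmax(x):
    # Subtract the row-wise max before exponentiating for numerical stability
    x = x - x.max(dim=-1, keepdim=True).values
    return torch.exp(x) / torch.exp(x).sum(dim=-1, keepdim=True)

def evaluate(x):
    model.eval()
    # Gradients are not needed for inference
    with torch.no_grad():
        y_pred = model(x)
        y_probas = softmax(y_pred)
    return torch.argmax(y_probas, dim=1)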

In [8]:
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.8)


epochs = 100
log_each = 10
l = []
model.train()
for e in range(1, epochs+1): 
    
    # forward
    y_pred = model(X_t)

    # loss
    loss = criterion(y_pred, Y_t)
    l.append(loss.item())
    
    # Reset the gradients to zero
    optimizer.zero_grad()

    # Backprop (all gradients are computed automatically)
    loss.backward()

    # Update the weights
    optimizer.step()
    
    # Report the mean of all losses recorded so far (not only the current loss)
    if not e % log_each:
        print(f"Epoch {e}/{epochs} Loss {np.mean(l):.5f}")
        
y_pred = evaluate(torch.from_numpy(X_test.values).float().cuda())

print("")
print("\033[93mAccuracy: \033[0m")
accuracy_score(y_test, y_pred.cpu().numpy())

Epoch 10/100 Loss 0.25565
Epoch 20/100 Loss 0.24702
Epoch 30/100 Loss 0.23930
Epoch 40/100 Loss 0.23329
Epoch 50/100 Loss 0.22821
Epoch 60/100 Loss 0.22371
Epoch 70/100 Loss 0.21960
Epoch 80/100 Loss 0.21579
Epoch 90/100 Loss 0.21223
Epoch 100/100 Loss 0.20887

Accuracy:


0.9493

# Iterating over Batches

<font color="orange">In the previous implementation we optimized our model with the `batch gradient descent` algorithm, which uses all of our data at every optimization step. However, an algorithm that can converge faster (and the only option if our dataset is too large to fit in memory) is `mini-batch gradient descent`, which updates the weights using only a small subset of the data at each step.</font>
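
The mini-batch idea can be sketched without any model: split the sample indices into chunks of `batch_size`, optionally shuffling them first. The helper name `iterate_minibatches` and its `seed` parameter are hypothetical, introduced only for illustration. Shuffling between epochs is a common practice and an assumption here; the training loop in this notebook walks through the data in a fixed order.

```python
import random

def iterate_minibatches(n_samples, batch_size, shuffle=True, seed=0):
    """Yield one list of sample indices per mini-batch (hypothetical helper)."""
    indices = list(range(n_samples))
    if shuffle:
        # Seeded shuffle so the example is reproducible
        random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        # The last batch is smaller when batch_size does not divide n_samples
        yield indices[start:start + batch_size]

batches = list(iterate_minibatches(n_samples=10, batch_size=4))
batch_sizes = [len(b) for b in batches]
print(len(batches), batch_sizes)
```

Indexing a tensor with one of these lists (for example `X_t[idx]`) selects the corresponding rows. Stepping through the indices this way also keeps a final, smaller batch when the dataset size is not a multiple of `batch_size`, whereas `len(X_t) // batch_size` would drop it.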

In [11]:
D_in, H, D_out = 784, 100, 10

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
).to("cuda")

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.8)

epochs = 100
batch_size = 100
log_each = 1
l = []
model.train()
batches = len(X_t) // batch_size
for e in range(1, epochs+1): 
    
    _l = []
    # Iterate over batches
    for b in range(batches):
        x_b = X_t[b*batch_size:(b+1)*batch_size]
        y_b = Y_t[b*batch_size:(b+1)*batch_size]
        
        # forward
        y_pred = model(x_b)

        # loss
        loss = criterion(y_pred, y_b)
        _l.append(loss.item())

        # Reset the gradients to zero
        optimizer.zero_grad()

        # Backprop (all gradients are computed automatically)
        loss.backward()

        # Update the weights
        optimizer.step()
    
    # Store this epoch's mean batch loss; the log shows the mean over all epochs so far
    l.append(np.mean(_l))
    if not e % log_each:
        print(f"Epoch {e}/{epochs} Loss {np.mean(l):.5f}")
        
y_pred = evaluate(torch.from_numpy(X_test.values).float().cuda())

print("")
print("\033[93mAccuracy: \033[0m")
accuracy_score(y_test, y_pred.cpu().numpy())

Epoch 1/100 Loss 0.29843
Epoch 2/100 Loss 0.20763
Epoch 3/100 Loss 0.16669
Epoch 4/100 Loss 0.14133
Epoch 5/100 Loss 0.12349
Epoch 6/100 Loss 0.10985
Epoch 7/100 Loss 0.09893
Epoch 8/100 Loss 0.08992
Epoch 9/100 Loss 0.08235
Epoch 10/100 Loss 0.07595
Epoch 11/100 Loss 0.07033
Epoch 12/100 Loss 0.06549
Epoch 13/100 Loss 0.06117
Epoch 14/100 Loss 0.05732
Epoch 15/100 Loss 0.05389
Epoch 16/100 Loss 0.05081
Epoch 17/100 Loss 0.04805
Epoch 18/100 Loss 0.04556
Epoch 19/100 Loss 0.04331
Epoch 20/100 Loss 0.04126
Epoch 21/100 Loss 0.03939
Epoch 22/100 Loss 0.03768
Epoch 23/100 Loss 0.03612
Epoch 24/100 Loss 0.03467
Epoch 25/100 Loss 0.03334
Epoch 26/100 Loss 0.03210
Epoch 27/100 Loss 0.03096
Epoch 28/100 Loss 0.02989
Epoch 29/100 Loss 0.02889
Epoch 30/100 Loss 0.02796
Epoch 31/100 Loss 0.02709
Epoch 32/100 Loss 0.02627
Epoch 33/100 Loss 0.02549
Epoch 34/100 Loss 0.02477
Epoch 35/100 Loss 0.02408
Epoch 36/100 Loss 0.02343
Epoch 37/100 Loss 0.02281
Epoch 38/100 Loss 0.02223
Epoch 39/100 Loss 0.0 [output truncated]

0.9798

While this implementation is correct and functional, depending on our data it can become quite complicated (for example, if we need to load many images, apply transformations to them, group them into batches, and so on). Moreover, it is common to reuse the data-loading logic not only to train the network but also to generate predictions. This motivates the use of the special classes that `PyTorch` offers for this purpose.
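
As a minimal sketch of what those classes look like, the manual slicing above could be replaced by `TensorDataset` and `DataLoader` from `torch.utils.data`. The tensors below are synthetic stand-ins on the CPU (an assumption, so the example runs without MNIST or a GPU), and the batch size of 100 simply mirrors the loop above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins for X_t and Y_t (assumption: 1,000 samples, 784 features, 10 classes)
X = torch.rand(1000, 784)
Y = torch.randint(0, 10, (1000,))

# TensorDataset pairs each row of X with the label at the same position in Y
dataset = TensorDataset(X, Y)

# DataLoader groups samples into batches and reshuffles them on every pass
loader = DataLoader(dataset, batch_size=100, shuffle=True)

batch_shapes = []
for x_b, y_b in loader:
    # x_b and y_b play the role of the manually sliced batches in the loop above
    batch_shapes.append((tuple(x_b.shape), tuple(y_b.shape)))

print(len(dataset), len(loader), batch_shapes[0])
```

The same `dataset` object could be wrapped in a second `DataLoader` with `shuffle=False` to generate predictions, which is the kind of reuse the paragraph above refers to.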