<h1 align="center"><font color="yellow">Pytorch: Neural Networks</font></h1>

<font color="yellow">Data Scientist: PhD. Eddy Giusepe Chirinos Isidro</font>

In [1]:
%load_ext watermark 
%watermark -v -p numpy,pandas,matplotlib,requests,torch

Python implementation: CPython
Python version       : 3.9.13
IPython version      : 8.13.2

numpy     : 1.24.3
pandas    : 2.0.1
matplotlib: 3.7.1
requests  : 2.31.0
torch     : 2.0.1



In [2]:
import torch

# <font color="red">Sequential Models</font>

The simplest way to define a neural network in `Pytorch` is with the `Sequential` class. This class lets us define a sequence of layers that are applied one after another: the output of each layer is the input of the next.
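A minimal sketch checking that idea: a `Sequential` model gives the same result as applying its layers one by one by hand. The small sizes (4, 3, 2) are arbitrary illustrative choices, not the ones used in the rest of the notebook.

```python
import torch

torch.manual_seed(0)

# Small illustrative sizes (not the ones used later in the notebook)
layers = [torch.nn.Linear(4, 3), torch.nn.ReLU(), torch.nn.Linear(3, 2)]
seq = torch.nn.Sequential(*layers)  # wraps the very same layer objects

x = torch.randn(5, 4)  # batch of 5 samples with 4 features each

# Apply the layers manually: each output becomes the next layer's input
h = x
for layer in layers:
    h = layer(h)

print(torch.allclose(seq(x), h))  # True
```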

In [3]:
D_in, H, D_out = 784, 100, 10

model = torch.nn.Sequential(torch.nn.Linear(D_in, H), # H --> Hidden
                            torch.nn.ReLU(),
                            torch.nn.Linear(H, D_out),
                           )


The model above is an `MLP` with $784$ inputs, $100$ neurons in the hidden layer and $10$ outputs. Let's see how to compute the model's outputs from some example inputs:

In [4]:
outputs = model(torch.randn(64, 784))
outputs.shape


torch.Size([64, 10])
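As a sanity check on those sizes: each `Linear(in, out)` layer holds an `in × out` weight matrix plus `out` biases, so this MLP has $784 \cdot 100 + 100 + 100 \cdot 10 + 10 = 79510$ trainable parameters. A short sketch that counts them with `numel()` (the model is rebuilt under the name `mlp` so the block runs on its own):

```python
import torch

D_in, H, D_out = 784, 100, 10
mlp = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),             # ReLU has no parameters
    torch.nn.Linear(H, D_out),
)

# Total number of scalar parameters (weights + biases)
n_params = sum(p.numel() for p in mlp.parameters())
print(n_params)  # 79510
```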

<font color="orange">Note that `Pytorch` models (in general) expect the first dimension of the input to be the `batch dimension`. Recall that to train on the `GPU` we first move the model to it:</font>

In [5]:
model.to("cuda")

Sequential(
  (0): Linear(in_features=784, out_features=100, bias=True)
  (1): ReLU()
  (2): Linear(in_features=100, out_features=10, bias=True)
)
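The notebook hard-codes `"cuda"`, which fails on a machine without a GPU. A small sketch of two related habits, assuming you may or may not have a GPU: choosing the device at runtime, and adding the batch dimension with `unsqueeze(0)` when you have a single sample. The input must live on the same device as the model. The name `demo_model` is illustrative.

```python
import torch

# Fall back to the CPU when no CUDA GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

demo_model = torch.nn.Sequential(
    torch.nn.Linear(784, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
).to(device)

sample = torch.randn(784)               # one sample, no batch dimension
batch = sample.unsqueeze(0).to(device)  # shape (1, 784): a batch of one
out = demo_model(batch)
print(out.shape)  # torch.Size([1, 10])
```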

Let's take MNIST as an example:

In [6]:
from sklearn.datasets import fetch_openml

# Download the data

mnist = fetch_openml('mnist_784', version=1)
X, Y = mnist["data"], mnist["target"]

X.shape, Y.shape



((70000, 784), (70000,))

In [7]:
import numpy as np

# Normalization (pixels scaled from [0, 255] to [0, 1]) and split: first 60000 samples for training, last 10000 for test

X_train, X_test, y_train, y_test = X[:60000] / 255., X[60000:] / 255., Y[:60000].astype(int), Y[60000:].astype(int)

In [8]:
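# Softmax and cross-entropy loss, written by hand:

def softmax(x):
    # Subtract the row-wise max for numerical stability (the result is unchanged)
    e = torch.exp(x - x.max(dim=-1, keepdim=True).values)
    return e / e.sum(dim=-1, keepdim=True)


def cross_entropy(output, target):
    # Logit (raw score) of the correct class for each sample in the batch
    correct_logits = output[torch.arange(len(output)), target]
    # -log(softmax(output)[target]) = -z_target + log(sum_j exp(z_j));
    # logsumexp computes the second term without overflow
    loss = -correct_logits + torch.logsumexp(output, dim=-1)
    return loss.mean()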

In [9]:
# Convert the data to tensors and copy them to the GPU
X_t = torch.from_numpy(X_train.values).float().cuda()
Y_t = torch.from_numpy(y_train.values).long().cuda()

# Training loop
epochs = 1000
lr = 0.8
log_each = 10
l = []
for e in range(1, epochs+1): 
    
    # forward
    y_pred = model(X_t)

    # loss
    loss = cross_entropy(y_pred, Y_t)
    l.append(loss.item())
    
    # Zero the gradients
    model.zero_grad()

    # Backprop (all gradients are computed automatically)
    loss.backward()

    # Weight update (manual gradient descent)
    with torch.no_grad():
        for param in model.parameters():
            param -= lr * param.grad
    
    # Log the mean loss over all epochs so far
    if not e % log_each:
        print(f"Epoch {e}/{epochs} Loss {np.mean(l):.5f}")


Epoch 10/1000 Loss 1.76898
Epoch 20/1000 Loss 1.43745
Epoch 30/1000 Loss 1.24596
Epoch 40/1000 Loss 1.07874
Epoch 50/1000 Loss 0.95532
Epoch 60/1000 Loss 0.85228
Epoch 70/1000 Loss 0.77425
Epoch 80/1000 Loss 0.71332
Epoch 90/1000 Loss 0.66424
Epoch 100/1000 Loss 0.62398
Epoch 110/1000 Loss 0.59095
Epoch 120/1000 Loss 0.56215
Epoch 130/1000 Loss 0.53644
Epoch 140/1000 Loss 0.51377
Epoch 150/1000 Loss 0.49363
Epoch 160/1000 Loss 0.47560
Epoch 170/1000 Loss 0.45933
Epoch 180/1000 Loss 0.44456
Epoch 190/1000 Loss 0.43107
Epoch 200/1000 Loss 0.41869
Epoch 210/1000 Loss 0.40728
Epoch 220/1000 Loss 0.39672
Epoch 230/1000 Loss 0.38690
Epoch 240/1000 Loss 0.37774
Epoch 250/1000 Loss 0.36918
Epoch 260/1000 Loss 0.36114
Epoch 270/1000 Loss 0.35359
Epoch 280/1000 Loss 0.34646
Epoch 290/1000 Loss 0.33973
Epoch 300/1000 Loss 0.33335
Epoch 310/1000 Loss 0.32730
Epoch 320/1000 Loss 0.32154
Epoch 330/1000 Loss 0.31606
Epoch 340/1000 Loss 0.31084
Epoch 350/1000 Loss 0.30584
Epoch 360/1000 Loss 0.30106

In [10]:
from sklearn.metrics import accuracy_score

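def evaluate(x):
    # Uses the global `model`
    model.eval()            # evaluation mode (affects layers such as Dropout/BatchNorm)
    with torch.no_grad():   # no gradients are needed for inference
        y_pred = model(x)
        y_probas = softmax(y_pred)
    return torch.argmax(y_probas, dim=1)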

y_pred = evaluate(torch.from_numpy(X_test.values).float().cuda())
accuracy_score(y_test, y_pred.cpu().numpy())


0.9743
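For reference, the accuracy can also be computed directly with tensors, without scikit-learn: it is the fraction of predictions equal to the labels. A minimal sketch with made-up predictions and labels:

```python
import torch

# Hypothetical predicted classes and true labels for 5 samples
preds = torch.tensor([3, 1, 4, 1, 5])
labels = torch.tensor([3, 1, 4, 0, 5])

# Fraction of correct predictions (4 out of 5 here)
accuracy = (preds == labels).float().mean().item()
print(round(accuracy, 4))  # 0.8
```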

# <font color="red">Optimizers and Loss Functions</font>

Let's now use the functions that PyTorch provides for this. See the [PyTorch documentation](https://pytorch.org/docs/stable/index.html) for more details.
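`torch.nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and applies log-softmax internally, which is why the model has no softmax layer at the end. The sketch below checks, on random data, that it matches the hand-written formula $-z_y + \log\sum_j e^{z_j}$ averaged over the batch, the same quantity the notebook's `cross_entropy` function computes:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(8, 10)          # batch of 8 samples, 10 classes
target = torch.randint(0, 10, (8,))  # integer class labels

ce = torch.nn.CrossEntropyLoss()     # default reduction="mean"
builtin = ce(logits, target)

# Hand-written version: -z_target + log(sum_j exp(z_j)), averaged over the batch
manual = (-logits[torch.arange(8), target] + torch.logsumexp(logits, dim=-1)).mean()

print(torch.allclose(builtin, manual))  # True
```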

In [11]:
criterion = torch.nn.CrossEntropyLoss()


In [12]:
# Optimizers are available in `torch.optim`

optimizer = torch.optim.SGD(model.parameters(), lr=0.8)

Our training loop becomes more compact:

In [13]:
model = torch.nn.Sequential(torch.nn.Linear(D_in, H),
                            torch.nn.ReLU(),
                            torch.nn.Linear(H, D_out),
                           ).to("cuda")


criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.8)

epochs = 100
log_each = 10
l = []
model.train()
for e in range(1, epochs+1): 
    
    # forward
    y_pred = model(X_t)

    # loss
    loss = criterion(y_pred, Y_t)
    l.append(loss.item())
    
    # Zero the gradients
    optimizer.zero_grad()

    # Backprop (all gradients are computed automatically)
    loss.backward()

    # Weight update
    optimizer.step()
    
    if not e % log_each:
        print(f"Epoch {e}/{epochs} Loss {np.mean(l):.5f}")
        
y_pred = evaluate(torch.from_numpy(X_test.values).float().cuda())

print("")
print("\033[93mThe accuracy is:\033[0m")
accuracy_score(y_test, y_pred.cpu().numpy())

Epoch 10/100 Loss 1.85469
Epoch 20/100 Loss 1.47284
Epoch 30/100 Loss 1.20941
Epoch 40/100 Loss 1.03613
Epoch 50/100 Loss 0.90591
Epoch 60/100 Loss 0.81915
Epoch 70/100 Loss 0.74571
Epoch 80/100 Loss 0.68822
Epoch 90/100 Loss 0.64184
Epoch 100/100 Loss 0.60346

The accuracy is:


0.931
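The compact loop relies on `optimizer.step()`. For plain `torch.optim.SGD` (momentum and weight decay are both off by default), this step performs the same update as the earlier manual loop, `param -= lr * param.grad`. A sketch checking this on two identical copies of a small stand-in model; the sizes and names are illustrative:

```python
import copy
import torch

torch.manual_seed(0)
model_a = torch.nn.Linear(4, 2)
model_b = copy.deepcopy(model_a)  # identical starting weights
x, y = torch.randn(16, 4), torch.randint(0, 2, (16,))
loss_fn = torch.nn.CrossEntropyLoss()
lr_demo = 0.8

# (a) Update with torch.optim.SGD
opt = torch.optim.SGD(model_a.parameters(), lr=lr_demo)
opt.zero_grad()
loss_fn(model_a(x), y).backward()
opt.step()

# (b) Manual update, as in the first training loop
model_b.zero_grad()
loss_fn(model_b(x), y).backward()
with torch.no_grad():
    for param in model_b.parameters():
        param -= lr_demo * param.grad

same = all(torch.allclose(pa, pb) for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)  # True
```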

# <font color="pink">Custom Models</font>

<font color="orange">In many cases, defining a `neural network` as a sequence of layers is enough; in others it becomes a limitation. `One example` is residual networks, in which we not only use the output of a layer to feed the next one but also add the layer's own input to it. This kind of architecture cannot be defined with the `Sequential` class, so we need a CUSTOM model. For that, `Pytorch` offers the following syntax:</font>

In [14]:
# Create a class that inherits from `torch.nn.Module`

class Model(torch.nn.Module):
    
    # Constructor
    def __init__(self, D_in, H, D_out):
        
        # Call the parent-class constructor
        super(Model, self).__init__()
        
        # Define our layers
        self.fc1 = torch.nn.Linear(D_in, H)
        self.relu = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(H, D_out)
        
    # Logic that computes the network's outputs
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x
    

In [15]:
model = Model(784, 100, 10)

outputs = model(torch.randn(64, 784))
outputs.shape

torch.Size([64, 10])

<font color="orange">Now we can train our neural network the same way as before:</font>

In [16]:
model.to("cuda")

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

epochs = 500
log_each = 10
l = []
model.train()
for e in range(1, epochs+1): 
    
    # Forward pass
    y_pred = model(X_t)

    # loss
    loss = criterion(y_pred, Y_t)
    l.append(loss.item())
    
    # Zero the gradients
    optimizer.zero_grad()

    # Backprop (all gradients are computed automatically)
    loss.backward()

    # Weight update
    optimizer.step()
    
    if not e % log_each:
        print(f"Epoch {e}/{epochs} Loss {np.mean(l):.5f}")
        
y_pred = evaluate(torch.from_numpy(X_test.values).float().cuda())
print("")

print("\033[93mThe accuracy is:\033[0m")
accuracy_score(y_test, y_pred.cpu().numpy())


Epoch 10/500 Loss 1.78496
Epoch 20/500 Loss 1.36229
Epoch 30/500 Loss 1.13097
Epoch 40/500 Loss 0.97517
Epoch 50/500 Loss 0.87966
Epoch 60/500 Loss 0.79961
Epoch 70/500 Loss 0.73613
Epoch 80/500 Loss 0.68640
Epoch 90/500 Loss 0.64643
Epoch 100/500 Loss 0.61382
Epoch 110/500 Loss 0.58643
Epoch 120/500 Loss 0.56236
Epoch 130/500 Loss 0.54115
Epoch 140/500 Loss 0.52239
Epoch 150/500 Loss 0.50566
Epoch 160/500 Loss 0.49062
Epoch 170/500 Loss 0.47698
Epoch 180/500 Loss 0.46454
Epoch 190/500 Loss 0.45311
Epoch 200/500 Loss 0.44257
Epoch 210/500 Loss 0.43279
Epoch 220/500 Loss 0.42369
Epoch 230/500 Loss 0.41517
Epoch 240/500 Loss 0.40719
Epoch 250/500 Loss 0.39967
Epoch 260/500 Loss 0.39258
Epoch 270/500 Loss 0.38588
Epoch 280/500 Loss 0.37951
Epoch 290/500 Loss 0.37347
Epoch 300/500 Loss 0.36771
Epoch 310/500 Loss 0.36222
Epoch 320/500 Loss 0.35697
Epoch 330/500 Loss 0.35195
Epoch 340/500 Loss 0.34713
Epoch 350/500 Loss 0.34251
Epoch 360/500 Loss 0.33807
Epoch 370/500 Loss 0.33379

0.9551
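The training loop has now been written out several times with only the model, learning rate and number of epochs changing. One way to factor it out is a small helper; `fit` below is a hypothetical name, and the synthetic data stands in for `X_t`/`Y_t` so the sketch runs on the CPU without MNIST. Unlike the notebook's loops, which print `np.mean(l)` (the running mean over all epochs so far), this sketch logs the current epoch's loss.

```python
import torch


def fit(net, X, Y, epochs=100, lr=0.5, log_each=10):
    """Hypothetical helper: full-batch training with SGD and cross-entropy."""
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    losses = []
    net.train()
    for e in range(1, epochs + 1):
        loss = criterion(net(X), Y)  # forward pass + loss
        optimizer.zero_grad()        # zero the gradients
        loss.backward()              # backprop
        optimizer.step()             # weight update
        losses.append(loss.item())
        if not e % log_each:
            print(f"Epoch {e}/{epochs} Loss {loss.item():.5f}")
    return losses


# Usage on small synthetic data (stand-in for X_t, Y_t)
torch.manual_seed(0)
X_demo = torch.randn(256, 20)
Y_demo = (X_demo[:, 0] > 0).long()  # a simple, learnable 2-class target
demo_model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
losses = fit(demo_model, X_demo, Y_demo, epochs=50, lr=0.5)
```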

<font color="orange">Next, let's look at another example: an `MLP` with a residual connection, something we cannot build with a plain `Sequential` model:</font>

In [17]:
class Model(torch.nn.Module):
    
    def __init__(self, D_in, H, D_out):        
        super(Model, self).__init__()
        self.fc1 = torch.nn.Linear(D_in, H)
        self.relu = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(H, D_out)
        
    def forward(self, x):
        x1 = self.fc1(x)
        x = self.relu(x1)
        # Residual connection: add the ReLU's input (x1) to its output before fc2
        x = self.fc2(x + x1)
        return x
    

In [25]:
model = Model(784, 100, 10).to("cuda")
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.2)

epochs = 300
log_each = 10
l = []
model.train()
for e in range(1, epochs+1): 
    
    # forward
    y_pred = model(X_t)

    # loss
    loss = criterion(y_pred, Y_t)
    l.append(loss.item())
    
    # Zero the gradients
    optimizer.zero_grad()

    # Backprop (all gradients are computed automatically)
    loss.backward()

    # Weight update
    optimizer.step()
    
    if not e % log_each:
        print(f"Epoch {e}/{epochs} Loss {np.mean(l):.5f}")
        
y_pred = evaluate(torch.from_numpy(X_test.values).float().cuda())
accuracy_score(y_test, y_pred.cpu().numpy())


Epoch 10/300 Loss 1.63359
Epoch 20/300 Loss 1.18767
Epoch 30/300 Loss 1.01025
Epoch 40/300 Loss 0.87232
Epoch 50/300 Loss 0.78144
Epoch 60/300 Loss 0.72021
Epoch 70/300 Loss 0.67277
Epoch 80/300 Loss 0.63354
Epoch 90/300 Loss 0.60169
Epoch 100/300 Loss 0.57529
Epoch 110/300 Loss 0.55302
Epoch 120/300 Loss 0.53394
Epoch 130/300 Loss 0.51742
Epoch 140/300 Loss 0.50306
Epoch 150/300 Loss 0.49061
Epoch 160/300 Loss 0.47957
Epoch 170/300 Loss 0.46930
Epoch 180/300 Loss 0.45980
Epoch 190/300 Loss 0.45109
Epoch 200/300 Loss 0.44309
Epoch 210/300 Loss 0.43570
Epoch 220/300 Loss 0.42886
Epoch 230/300 Loss 0.42250
Epoch 240/300 Loss 0.41657
Epoch 250/300 Loss 0.41102
Epoch 260/300 Loss 0.40581
Epoch 270/300 Loss 0.40091
Epoch 280/300 Loss 0.39628
Epoch 290/300 Loss 0.39192
Epoch 300/300 Loss 0.38780


0.9263

<font color="orange">This gives us a lot of flexibility in how we define our networks.</font>
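The residual `Model` above adds the pre-activation `x1` to the ReLU output. A more common way to write a residual block is $y = x + f(x)$, where $f$ keeps the feature size unchanged so the sum is well defined. A minimal sketch; the class name `ResidualBlock` and all sizes are illustrative choices, not from the original notebook. Once the skip connection lives inside its own module, the block can itself be placed inside a `Sequential`.

```python
import torch


class ResidualBlock(torch.nn.Module):
    """Hypothetical example: computes x + f(x) with f = Linear -> ReLU -> Linear."""

    def __init__(self, dim):
        super().__init__()
        self.f = torch.nn.Sequential(
            torch.nn.Linear(dim, dim),
            torch.nn.ReLU(),
            torch.nn.Linear(dim, dim),
        )

    def forward(self, x):
        # Skip connection: add the block's input to its output
        return x + self.f(x)


block = ResidualBlock(16)
x = torch.randn(4, 16)
y = block(x)
print(y.shape)  # torch.Size([4, 16])

# Residual blocks composed inside a Sequential
net = torch.nn.Sequential(
    torch.nn.Linear(784, 16), ResidualBlock(16), ResidualBlock(16), torch.nn.Linear(16, 10)
)
print(net(torch.randn(64, 784)).shape)  # torch.Size([64, 10])
```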

# <font color="red">Accessing the Layers of a Network</font>

In [26]:
model

Model(
  (fc1): Linear(in_features=784, out_features=100, bias=True)
  (relu): ReLU()
  (fc2): Linear(in_features=100, out_features=10, bias=True)
)

In [27]:
model.fc1

Linear(in_features=784, out_features=100, bias=True)

<font color="orange">We can also access the tensors that hold the parameters directly, through the corresponding attributes (`weight` and `bias`):</font>

In [28]:
model.fc1.weight

Parameter containing:
tensor([[ 0.0159, -0.0229,  0.0113,  ..., -0.0183,  0.0072, -0.0007],
        [-0.0189, -0.0216, -0.0135,  ..., -0.0063, -0.0003,  0.0232],
        [-0.0170, -0.0250, -0.0053,  ..., -0.0132,  0.0067,  0.0086],
        ...,
        [-0.0180, -0.0107,  0.0088,  ...,  0.0055,  0.0215,  0.0318],
        [-0.0067,  0.0273,  0.0289,  ...,  0.0233,  0.0005,  0.0336],
        [-0.0323,  0.0089, -0.0010,  ...,  0.0157, -0.0080, -0.0112]],
       device='cuda:0', requires_grad=True)

In [29]:
model.fc1.bias

Parameter containing:
tensor([ 0.1082, -0.0217,  0.1455,  0.0188, -0.0056,  0.0581,  0.1001,  0.1073,
         0.0963, -0.0345,  0.0243,  0.0543,  0.0204, -0.0627,  0.0835,  0.0312,
         0.0204,  0.0597, -0.0005,  0.0392, -0.0237,  0.0480,  0.0710,  0.0781,
         0.0917, -0.0373,  0.0061,  0.0374,  0.0402,  0.0301,  0.0962,  0.0024,
         0.0268,  0.0414,  0.1452,  0.0637, -0.0013,  0.0554, -0.0441,  0.0666,
         0.0002, -0.0438, -0.0196,  0.0400,  0.0073,  0.0926,  0.0123,  0.0045,
         0.0761, -0.0249,  0.1841,  0.0791,  0.0968, -0.0323, -0.0499, -0.0494,
        -0.0205,  0.0655,  0.0882,  0.0645, -0.0514, -0.0200,  0.0351,  0.0148,
        -0.0408,  0.0727,  0.0690,  0.0370,  0.0026,  0.0065, -0.0556, -0.0213,
         0.0795, -0.0045,  0.0572,  0.0283,  0.0649, -0.0406, -0.0110,  0.0244,
        -0.0532,  0.0339, -0.1059, -0.0242, -0.0016,  0.0421, -0.0104,  0.0575,
         0.0158,  0.0851,  0.0120, -0.0594, -0.0497,  0.0238, -0.0286,  0.0692,
         0.0486, -
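To list every parameter at once, `named_parameters()` yields `(name, Parameter)` pairs whose names follow the attribute names. The sketch below uses `TwoLayerNet`, a hypothetical copy of the notebook's first `Model` repeated so the block runs on its own. It also shows that `Linear` stores its weight with shape `(out_features, in_features)` and computes `x @ W.T + b`.

```python
import torch


class TwoLayerNet(torch.nn.Module):
    # Same layers as the notebook's first Model, repeated so this block runs on its own
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.fc1 = torch.nn.Linear(D_in, H)
        self.relu = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))


net = TwoLayerNet(784, 100, 10)

# (name, Parameter) pairs; names follow the attribute names
for name, param in net.named_parameters():
    print(name, tuple(param.shape))
# fc1.weight (100, 784)
# fc1.bias (100,)
# fc2.weight (10, 100)
# fc2.bias (10,)

# Linear stores weight as (out_features, in_features) and computes x @ W.T + b
x = torch.randn(3, 784)
manual = x @ net.fc1.weight.T + net.fc1.bias
print(torch.allclose(net.fc1(x), manual, atol=1e-6))
```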

A layer can be overwritten (replaced) as follows:

In [30]:
model.fc2 = torch.nn.Linear(100, 1)

model

Model(
  (fc1): Linear(in_features=784, out_features=100, bias=True)
  (relu): ReLU()
  (fc2): Linear(in_features=100, out_features=1, bias=True)
)
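One thing to keep in mind when replacing a layer: an optimizer created before the replacement still holds references to the old layer's parameters, so the new layer would never be updated. A minimal sketch on a small stand-in model (`demo_model` and `optimizer_tracks` are illustrative names) that shows the check and the fix, which is to create the optimizer again:

```python
import torch

demo_model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU(), torch.nn.Linear(4, 3))
opt = torch.optim.SGD(demo_model.parameters(), lr=0.1)

# Replace the last layer (e.g. to go from 3 outputs to 1)
demo_model[2] = torch.nn.Linear(4, 1)


def optimizer_tracks(optimizer, param):
    # True if `param` is one of the tensors the optimizer updates
    return any(p is param for group in optimizer.param_groups for p in group["params"])


print(optimizer_tracks(opt, demo_model[2].weight))  # False: the optimizer is stale

opt = torch.optim.SGD(demo_model.parameters(), lr=0.1)  # rebuild after the change
print(optimizer_tracks(opt, demo_model[2].weight))  # True
```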

<font color="orange">More tricks:</font>

In [31]:
# Get a list of the layers of a network

list(model.children())

[Linear(in_features=784, out_features=100, bias=True),
 ReLU(),
 Linear(in_features=100, out_features=1, bias=True)]
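`children()` yields only the direct submodules. `named_children()` also gives their attribute names, and `modules()` recurses into nested containers (including the model itself). A small sketch on a hypothetical nested model:

```python
import torch

# Hypothetical model with a nested Sequential inside
nested = torch.nn.Sequential(
    torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU()),
    torch.nn.Linear(8, 2),
)

print([name for name, _ in nested.named_children()])  # ['0', '1']
print(len(list(nested.children())))  # 2 direct children
print(len(list(nested.modules())))   # 5: outer Sequential, inner Sequential, Linear, ReLU, Linear
```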

In [32]:
# Build a new network from the list (excluding the last two layers).
# Note: only the layers are kept, not the custom forward() logic of the original model

new_model = torch.nn.Sequential(*list(model.children())[:-2])
new_model 

Sequential(
  (0): Linear(in_features=784, out_features=100, bias=True)
)

In [33]:
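# Build a ModuleList from the list (excluding the last layer).
# A ModuleList only stores the layers: unlike Sequential it has no forward(), so it cannot be called directly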

new_model = torch.nn.ModuleList(list(model.children())[:-1])
new_model

ModuleList(
  (0): Linear(in_features=784, out_features=100, bias=True)
  (1): ReLU()
)
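Because a `ModuleList` only registers the layers (so their parameters appear in `.parameters()` and move with `.to()`), the usual pattern is to iterate over it inside a custom `forward`. A minimal sketch; the class name `StackedMLP` and the sizes are illustrative:

```python
import torch


class StackedMLP(torch.nn.Module):
    """Hypothetical example: Linear layers stored in a ModuleList, ReLU in between."""

    def __init__(self, sizes):
        super().__init__()
        self.layers = torch.nn.ModuleList(
            torch.nn.Linear(a, b) for a, b in zip(sizes[:-1], sizes[1:])
        )

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i < len(self.layers) - 1:  # no ReLU after the last layer
                x = torch.relu(x)
        return x


net = StackedMLP([784, 100, 50, 10])
out = net(torch.randn(64, 784))
print(out.shape)                                # torch.Size([64, 10])
print(sum(p.numel() for p in net.parameters()))  # 84060
```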