# Problemas

Como vimos acima, há muitos passos na criação e definição de uma nova rede neural.
A grande parte desses ajustes dependem diretamente do problemas.

Abaixo, listamos alguns problemas. Todos os problemas e datasets usados vem do [Center for Machine Learning and Intelligent Systems](http://archive.ics.uci.edu/ml/datasets.php).


**Seu objetivo é determinar e implementar um modelo para cada problema.**

Isso inclui:

1. definir uma arquitetura.
Por enquanto usando somente camadas [Lineares](https://pytorch.org/docs/stable/nn.html#linear), porém podemos variar as ativações, como [Sigmoid](https://pytorch.org/docs/stable/nn.html#sigmoid), [Tanh](https://pytorch.org/docs/stable/nn.html#tanh), [ReLU](https://pytorch.org/docs/stable/nn.html#relu), [LeakyReLU](https://pytorch.org/docs/stable/generated/torch.nn.LeakyReLU.html), [ELU](https://pytorch.org/docs/stable/generated/torch.nn.ELU.html), [SeLU](https://pytorch.org/docs/stable/generated/torch.nn.SELU.html), [PReLU](https://pytorch.org/docs/stable/generated/torch.nn.PReLU.html), [RReLU](https://pytorch.org/docs/stable/generated/torch.nn.RReLU.html)
2. definir uma função de custo. Algums opções que vimos previamente incluem[L1](https://pytorch.org/docs/stable/nn.html#l1loss), [L2/MSE](https://pytorch.org/docs/stable/nn.html#mseloss), [Huber/SmoothL1](https://pytorch.org/docs/stable/nn.html#smoothl1loss), [*Cross-Entropy*](https://pytorch.org/docs/stable/nn.html#crossentropyloss), [Hinge](https://pytorch.org/docs/stable/nn.html#hingeembeddingloss)), e
3. definir um algoritmo de otimização ([SGD](https://pytorch.org/docs/stable/optim.html#torch.optim.SGD), [RMSProp](https://pytorch.org/docs/stable/optim.html#torch.optim.RMSprop), [Adam](https://pytorch.org/docs/stable/optim.html#torch.optim.Adam))

A leitura do dado assim como a função de treinamento já estão implementados para você.

# Preâmbulo

In [2]:
# imports basicos
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

import torch
import torch.nn.functional as F
import torchvision

from torchvision import datasets, transforms
from torch import optim, nn

import os
import sys
import time
import numpy as np

In [4]:
import matplotlib.pyplot as plt
plt.ion()

<contextlib.ExitStack at 0x29ea31490>

In [14]:
# Test if GPU is avaliable, if not, use cpu instead
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
n = torch.cuda.device_count()
devices_ids = list(range(n))
device

device(type='cpu')

In [6]:
# funções básicas

def load_array(features, labels, batch_size, is_train=True):
    """Construct a Torch data loader"""
    if type(features) != torch.tensor:
        features = torch.tensor(features)
    if type(labels) != torch.tensor:
        labels = torch.tensor(labels)
    dataset = torch.utils.data.TensorDataset(features, labels)
    return torch.utils.data.DataLoader(dataset, batch_size, shuffle=is_train)

def _get_batch(batch):
    """Return features and labels on ctx."""
    features, labels = batch
    if labels.type() != features.type():
        labels = labels.type(features.type())
    return (torch.nn.DataParallel(features, device_ids=devices_ids),
            torch.nn.DataParallel(labels, device_ids=devices_ids), features.shape[0])

# Função usada para calcular acurácia
def evaluate_accuracy(data_iter, net, loss):
    """Evaluate accuracy of a model on the given data set."""

    acc_sum, n, l = 0, 0, 0
    net.eval()
    with torch.no_grad():
      for X, y in data_iter:
          X, y = X.to(device), y.to(device)
          y_hat = net(X)
          l += loss(y_hat, y.long())
          acc_sum += (y_hat.argmax(axis=1) == y).sum().item()
          n += y.size(0)

    return acc_sum / n, l.item() / len(data_iter)
  
    
# Função usada no treinamento e validação da rede
def train_validate(net, train_iter, test_iter, batch_size, trainer, loss,
                   num_epochs, type='regression'):
    print('training on', device)
    for epoch in range(num_epochs):
        net.train()
        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            trainer.zero_grad()
            y_hat = net(X)
            if type == 'regression':
              l = loss(y_hat, y.float())
            else:
              l = loss(y_hat, y.long())
            l.backward()
            trainer.step()
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(axis=1) == y).sum().item()
            n += y.size(0)
        test_acc, test_loss = evaluate_accuracy(test_iter, net, loss)
        if type == 'regression':
          print('epoch %d, train loss %.4f, test loss %.4f, time %.1f sec'
                % (epoch + 1, train_l_sum / len(train_iter), test_loss, time.time() - start))
        else:
          print('epoch %d, train loss %.4f, train acc %.3f, test loss %.4f, '
              'test acc %.3f, time %.1f sec'
              % (epoch + 1, train_l_sum / len(train_iter), train_acc_sum / n, test_loss, 
                 test_acc, time.time() - start))
          
        
# funcao usada para teste
def test(net, test_iter):
    print('testing on', device)
    net.eval()
    first = True
    for X in test_iter:
        X = X.to(device)
        y_hat = net(X)
        if first is True:
            pred_logits = y_hat
            pred_labels = y_hat.argmax(axis=1)
            first = False
        else:
            pred_logits = torch.concat(pred_logits, y_hat, dim=0)
            pred_labels = torch.concat(pred_labels, y_hat.argmax(axis=1), dim=0)

    return pred_logits.numpy(), pred_labels.numpy()

# Função para inicializar pesos da rede
def weights_init(m):
    if type(m) == nn.Linear:
        m.weight.data.normal_(0.0, 0.01) # valores iniciais são uma normal
        m.bias.data.fill_(0)

# Problema 1

Neste problema, você receberá 14 *features* coletadas de pacientes e tentará predizer se eles tem algum sinal de doença cardíaca. Mais sobre esse dataset aqui: https://archive.ics.uci.edu/ml/datasets/Heart+Disease

In [5]:
#!wget https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data
data = np.genfromtxt('processed.cleveland.data', delimiter=',', dtype=np.float32)
data = np.nan_to_num(data)

print(data.shape, data[0, :])
X, y = data[:, :-1], data[:, -1]
print(X.shape, X[0, :])
print(y.shape, y[0])
train_features, test_features, train_labels, test_labels = train_test_split(X, y, test_size=0.20, random_state=42)
  
batch_size = 100
train_iter = load_array(train_features, train_labels, batch_size)
test_iter = load_array(test_features, test_labels, batch_size, False)

(303, 14) [ 63.    1.    1.  145.  233.    1.    2.  150.    0.    2.3   3.    0.
   6.    0. ]
(303, 13) [ 63.    1.    1.  145.  233.    1.    2.  150.    0.    2.3   3.    0.
   6. ]
(303,) 0.0


In [6]:
batch_size, epochs, lr, weight_decay = 10, 25, 0.0001, 0.1

net = nn.Sequential(nn.Linear(13, 512), 
                    nn.ReLU(),
                    nn.Dropout(0.25),
                    nn.Linear(512, 256),
                    nn.ReLU(),
                    nn.Dropout(0.5),
                    nn.Linear(256, 128),
                    nn.ReLU(),
                    nn.Linear(128, 64),
                    nn.ReLU(),
                    nn.Linear(64, 5))


net.apply(weights_init)
net.to(device)

loss = nn.CrossEntropyLoss().to(device)
trainer = torch.optim.SGD(net.parameters(), lr=lr, weight_decay=weight_decay)

train_validate(net, train_iter, test_iter, batch_size, trainer, loss, epochs, 'classification')



training on cpu
epoch 1, train loss 1.6095, train acc 0.136, test loss 1.6095, test acc 0.066, time 0.0 sec
epoch 2, train loss 1.6094, train acc 0.202, test loss 1.6095, test acc 0.066, time 0.0 sec
epoch 3, train loss 1.6094, train acc 0.190, test loss 1.6094, test acc 0.066, time 0.0 sec
epoch 4, train loss 1.6094, train acc 0.285, test loss 1.6094, test acc 0.230, time 0.0 sec
epoch 5, train loss 1.6093, train acc 0.322, test loss 1.6093, test acc 0.426, time 0.0 sec
epoch 6, train loss 1.6093, train acc 0.393, test loss 1.6093, test acc 0.459, time 0.0 sec
epoch 7, train loss 1.6092, train acc 0.401, test loss 1.6093, test acc 0.459, time 0.0 sec
epoch 8, train loss 1.6092, train acc 0.467, test loss 1.6092, test acc 0.475, time 0.0 sec
epoch 9, train loss 1.6092, train acc 0.479, test loss 1.6092, test acc 0.475, time 0.0 sec
epoch 10, train loss 1.6090, train acc 0.521, test loss 1.6092, test acc 0.475, time 0.0 sec
epoch 11, train loss 1.6090, train acc 0.537, test loss 1.6091,

# Problema 2

Neste problema, você receberá 90 *features* extraídas de diversas músicas (datadas de 1922 até 2011) e deve predizer o ano de cada música. Mais sobre esse dataset aqui: https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD

In [7]:
# download do dataset
#!wget http://archive.ics.uci.edu/ml/machine-learning-databases/00203/YearPredictionMSD.txt.zip
#!unzip YearPredictionMSD.txt.zip
data = np.genfromtxt('YearPredictionMSD.txt', delimiter=',', dtype=np.float32)

print(data[0, :])
X, y = data[:, 1:], data[:, 0]
train_features, test_features, train_labels, test_labels = train_test_split(X, y, test_size=0.33, random_state=42)
  
batch_size = 100
train_iter = load_array(train_features, train_labels, batch_size)
test_iter = load_array(test_features, test_labels, batch_size, False)

[ 2.0010000e+03  4.9943569e+01  2.1471140e+01  7.3077499e+01
  8.7486095e+00 -1.7406281e+01 -1.3099050e+01 -2.5012020e+01
 -1.2232570e+01  7.8308902e+00 -2.4678299e+00  3.3213601e+00
 -2.3152101e+00  1.0205560e+01  6.1110913e+02  9.5108960e+02
  6.9811426e+02  4.0898486e+02  3.8370911e+02  3.2651511e+02
  2.3811327e+02  2.5142413e+02  1.8717351e+02  1.0042652e+02
  1.7919498e+02 -8.4155798e+00 -3.1787039e+02  9.5862663e+01
  4.8102589e+01 -9.5663033e+01 -1.8062149e+01  1.9698400e+00
  3.4424381e+01  1.1726700e+01  1.3679000e+00  7.7944398e+00
 -3.6994001e-01 -1.3367851e+02 -8.3261650e+01 -3.7297649e+01
  7.3046669e+01 -3.7366840e+01 -3.1385300e+00 -2.4215309e+01
 -1.3230660e+01  1.5938090e+01 -1.8604780e+01  8.2154793e+01
  2.4057980e+02 -1.0294070e+01  3.1584311e+01 -2.5381870e+01
 -3.9077201e+00  1.3292580e+01  4.1550598e+01 -7.2627201e+00
 -2.1008631e+01  1.0550848e+02  6.4298561e+01  2.6084810e+01
 -4.4591099e+01 -8.3065701e+00  7.9370599e+00 -1.0736600e+01
 -9.5447662e+01 -8.20330

In [8]:
epochs, lr, weitght_decay = 400, 0.0001, 0.0000001

net = nn.Sequential(nn.Linear(90, 128),
                    nn.ReLU(),
                    nn.Linear(128, 256),
                    nn.ReLU(),
                    nn.Linear(256, 1))

net.apply(weights_init)
net.to(device)

loss = nn.SmoothL1Loss().to(device)

trainer = torch.optim.Adam(net.parameters(), lr=lr, weight_decay=weight_decay)

train_validate(net, train_iter, test_iter, batch_size, trainer, loss, epochs)

training on cpu


  return F.smooth_l1_loss(input, target, reduction=self.reduction, beta=self.beta)
  return F.smooth_l1_loss(input, target, reduction=self.reduction, beta=self.beta)
  return F.smooth_l1_loss(input, target, reduction=self.reduction, beta=self.beta)


epoch 1, train loss 381.5955, test loss 169.7589, time 4.9 sec
epoch 2, train loss 137.0310, test loss 115.7180, time 4.8 sec
epoch 3, train loss 105.7628, test loss 99.6134, time 4.8 sec
epoch 4, train loss 98.0318, test loss 96.3751, time 4.8 sec
epoch 5, train loss 95.6785, test loss 94.9139, time 4.7 sec
epoch 6, train loss 94.1506, test loss 107.5833, time 4.7 sec
epoch 7, train loss 92.9504, test loss 91.6857, time 4.7 sec
epoch 8, train loss 91.9238, test loss 91.2163, time 4.8 sec
epoch 9, train loss 91.3622, test loss 89.1317, time 4.9 sec
epoch 10, train loss 90.6620, test loss 89.2406, time 5.0 sec
epoch 11, train loss 90.0917, test loss 88.6142, time 4.9 sec
epoch 12, train loss 89.6248, test loss 87.7521, time 4.8 sec
epoch 13, train loss 89.0940, test loss 86.7760, time 4.8 sec
epoch 14, train loss 88.7468, test loss 90.2669, time 4.9 sec
epoch 15, train loss 88.3741, test loss 89.2062, time 4.9 sec
epoch 16, train loss 87.9793, test loss 91.6440, time 4.8 sec
epoch 17, t

In [16]:
# mostra o resultado predito para as 5 primeiras instâncias de teste
y = net(torch.Tensor(test_features[0:5, :]).to(device))
print(y, test_labels[0:5])

tensor([[2119.0874],
        [2008.2554],
        [2067.5142],
        [1765.3359],
        [1925.9906]], grad_fn=<AddmmBackward0>) [2008. 2001. 2006. 2008. 1998.]


O resultado está ok, considerando que computamos tais valores localmente. Seriam necessárias mais épocas para conseguirmos uma acurácia maior.

# Problema 3

Neste problema, você receberá várias *features* (como altura média, inclinação, etc) descrevendo uma região e o modelo deve predizer qual o tipo da região (floresta, montanha, etc). Mais informações sobre esse dataset aqui: https://archive.ics.uci.edu/ml/datasets/covertype

In [7]:
#!wget http://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.data.gz
#!gzip covtype.data.gz
data = np.genfromtxt('covtype.data', delimiter=',', dtype=np.float32)

print(data.shape, data[0, :])
X, y = data[:, :-1], data[:, -1]
print(X.shape, X[0, :])
print(y.shape, y[0])
train_features, test_features, train_labels, test_labels = train_test_split(X, y, test_size=0.33, random_state=42)
train_labels = train_labels - 1
test_labels = test_labels - 1
  
batch_size = 100
train_iter = load_array(train_features, train_labels, batch_size)
test_iter = load_array(test_features, test_labels, batch_size, False)

(581012, 55) [2.596e+03 5.100e+01 3.000e+00 2.580e+02 0.000e+00 5.100e+02 2.210e+02
 2.320e+02 1.480e+02 6.279e+03 1.000e+00 0.000e+00 0.000e+00 0.000e+00
 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
 1.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 5.000e+00]
(581012, 54) [2.596e+03 5.100e+01 3.000e+00 2.580e+02 0.000e+00 5.100e+02 2.210e+02
 2.320e+02 1.480e+02 6.279e+03 1.000e+00 0.000e+00 0.000e+00 0.000e+00
 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00

In [15]:
epochs, lr, weight_decay = 20, 0.001, 0.0001

net = nn.Sequential(nn.Linear(54, 128),
                    nn.ReLU(),
                    nn.Dropout(0.25),
                    nn.Linear(128, 256),
                    nn.ReLU(),
                    nn.Dropout(0.5),
                    nn.Linear(256, 7))

net.apply(weights_init)
net.to(device)

loss = nn.CrossEntropyLoss().to(device)

trainer = torch.optim.Adam(net.parameters(), lr=lr, weight_decay=weight_decay)

train_validate(net, train_iter, test_iter, batch_size, trainer, loss, epochs, 'classification')

training on cpu
epoch 1, train loss 0.8852, train acc 0.622, test loss 0.8169, test acc 0.670, time 7.5 sec
epoch 2, train loss 0.8860, train acc 0.638, test loss 0.8178, test acc 0.681, time 7.5 sec
epoch 3, train loss 0.8727, train acc 0.647, test loss 0.8728, test acc 0.627, time 7.4 sec
epoch 4, train loss 0.8611, train acc 0.653, test loss 0.8355, test acc 0.671, time 7.4 sec
epoch 5, train loss 0.8580, train acc 0.653, test loss 0.7985, test acc 0.674, time 7.4 sec
epoch 6, train loss 0.8514, train acc 0.655, test loss 0.8335, test acc 0.671, time 7.4 sec
epoch 7, train loss 0.8470, train acc 0.658, test loss 0.8024, test acc 0.679, time 7.4 sec
epoch 8, train loss 0.8524, train acc 0.653, test loss 0.8032, test acc 0.672, time 7.5 sec
epoch 9, train loss 0.8501, train acc 0.654, test loss 0.8207, test acc 0.644, time 7.4 sec
epoch 10, train loss 0.8481, train acc 0.655, test loss 0.8476, test acc 0.626, time 7.5 sec
epoch 11, train loss 0.8481, train acc 0.654, test loss 0.8748,

# Problema 4

Neste problema, você receberá 11 *features* extraídas de tipos de vinhos, e terá que predizer um *score* para cada vinho. Mais sobre esse dataset aqui: https://archive.ics.uci.edu/ml/datasets/Wine+Quality

In [51]:
# download do dataset
#!wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
#!wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv
data_red = np.genfromtxt('winequality-red.csv', delimiter=';', dtype=np.float32, skip_header=1)
data_white = np.genfromtxt('winequality-white.csv', delimiter=';', dtype=np.float32, skip_header=1)
data = np.concatenate((data_red, data_white), axis=0)
data = np.nan_to_num(data)

print(data[0, :])
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
train_features, test_features, train_labels, test_labels = train_test_split(X, y, test_size=0.33, random_state=42)

batch_size = 100
train_iter = load_array(train_features, train_labels, batch_size)
test_iter = load_array(test_features, test_labels, batch_size, False)

[ 7.4     0.7     0.      1.9     0.076  11.     34.      0.9978  3.51
  0.56    9.4     5.    ]
(6497, 11) (6497,)


In [58]:
epochs, lr, weight_decay = 100, 0.0001, 0.0001

net = nn.Sequential(nn.Linear(11, 128),
                    nn.ReLU(),
                    nn.Dropout(0.25),
                    nn.Linear(128, 256),
                    nn.ReLU(),
                    nn.Dropout(0.5),
                    nn.Linear(256, 128),
                    nn.ReLU(),
                    nn.Linear(128, 1))

net.apply(weights_init)
net.to(device)

loss = nn.SmoothL1Loss().to(device)

trainer = torch.optim.Adam(net.parameters(), lr=lr, weight_decay=weight_decay)

train_validate(net, train_iter, test_iter, batch_size, trainer, loss, epochs)


training on cpu
epoch 1, train loss 5.2175, test loss 4.8701, time 0.1 sec
epoch 2, train loss 2.9656, test loss 1.5894, time 0.1 sec
epoch 3, train loss 1.4268, test loss 1.2943, time 0.1 sec
epoch 4, train loss 1.2213, test loss 1.0835, time 0.1 sec
epoch 5, train loss 0.9777, test loss 0.7426, time 0.1 sec
epoch 6, train loss 0.6518, test loss 0.4079, time 0.1 sec
epoch 7, train loss 0.4948, test loss 0.3782, time 0.1 sec
epoch 8, train loss 0.4817, test loss 0.3728, time 0.1 sec
epoch 9, train loss 0.4752, test loss 0.3721, time 0.1 sec
epoch 10, train loss 0.4776, test loss 0.3704, time 0.1 sec
epoch 11, train loss 0.4829, test loss 0.3720, time 0.1 sec
epoch 12, train loss 0.4752, test loss 0.3698, time 0.1 sec
epoch 13, train loss 0.4732, test loss 0.3679, time 0.1 sec
epoch 14, train loss 0.4714, test loss 0.3699, time 0.1 sec
epoch 15, train loss 0.4675, test loss 0.3722, time 0.1 sec
epoch 16, train loss 0.4751, test loss 0.3658, time 0.1 sec
epoch 17, train loss 0.4724, test

In [59]:
# mostra o resultado predito para as 5 primeiras instâncias de teste
y = net(torch.Tensor(test_features[0:5, :]).to(device))
print(y, test_labels[0:5])

tensor([[5.9847],
        [5.7880],
        [5.9104],
        [5.6218],
        [5.7857]], grad_fn=<AddmmBackward0>) [8. 5. 7. 6. 6.]
