# Problemas

Como vimos acima, há muitos passos na criação e definição de uma nova rede neural.
A grande parte desses ajustes dependem diretamente do problemas.

Abaixo, listamos alguns problemas. Todos os problemas e datasets usados vem do [Center for Machine Learning and Intelligent Systems](http://archive.ics.uci.edu/ml/datasets.php).


**Seu objetivo é determinar e implementar um modelo para cada problema.**

Isso inclui:

1. definir uma arquitetura.
Por enquanto usando somente camadas [Lineares](https://pytorch.org/docs/stable/nn.html#linear), porém podemos variar as ativações, como [Sigmoid](https://pytorch.org/docs/stable/nn.html#sigmoid), [Tanh](https://pytorch.org/docs/stable/nn.html#tanh), [ReLU](https://pytorch.org/docs/stable/nn.html#relu), [LeakyReLU](https://pytorch.org/docs/stable/generated/torch.nn.LeakyReLU.html), [ELU](https://pytorch.org/docs/stable/generated/torch.nn.ELU.html), [SeLU](https://pytorch.org/docs/stable/generated/torch.nn.SELU.html), [PReLU](https://pytorch.org/docs/stable/generated/torch.nn.PReLU.html), [RReLU](https://pytorch.org/docs/stable/generated/torch.nn.RReLU.html)
2. definir uma função de custo. Algums opções que vimos previamente incluem[L1](https://pytorch.org/docs/stable/nn.html#l1loss), [L2/MSE](https://pytorch.org/docs/stable/nn.html#mseloss), [Huber/SmoothL1](https://pytorch.org/docs/stable/nn.html#smoothl1loss), [*Cross-Entropy*](https://pytorch.org/docs/stable/nn.html#crossentropyloss), [Hinge](https://pytorch.org/docs/stable/nn.html#hingeembeddingloss)), e
3. definir um algoritmo de otimização ([SGD](https://pytorch.org/docs/stable/optim.html#torch.optim.SGD), [RMSProp](https://pytorch.org/docs/stable/optim.html#torch.optim.RMSprop), [Adam](https://pytorch.org/docs/stable/optim.html#torch.optim.Adam))

A leitura do dado assim como a função de treinamento já estão implementados para você.

# Preâmbulo

In [1]:
# imports basicos
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

import torch
import torch.nn.functional as F
import torchvision

from torchvision import datasets, transforms
from torch import optim, nn

import os
import sys
import time
import numpy as np

In [2]:
import matplotlib.pyplot as plt
plt.ion()

In [3]:
# Test if GPU is avaliable, if not, use cpu instead
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
n = torch.cuda.device_count()
devices_ids = list(range(n))
device

device(type='cuda')

In [4]:
# funções básicas

def load_array(features, labels, batch_size, is_train=True):
    """Construct a Torch data loader"""
    if type(features) != torch.tensor:
        features = torch.tensor(features)
    if type(labels) != torch.tensor:
        labels = torch.tensor(labels)
    dataset = torch.utils.data.TensorDataset(features, labels)
    return torch.utils.data.DataLoader(dataset, batch_size, shuffle=is_train)

def _get_batch(batch):
    """Return features and labels on ctx."""
    features, labels = batch
    if labels.type() != features.type():
        labels = labels.type(features.type())
    return (torch.nn.DataParallel(features, device_ids=devices_ids),
            torch.nn.DataParallel(labels, device_ids=devices_ids), features.shape[0])

# Função usada para calcular acurácia
def evaluate_accuracy(data_iter, net, loss):
    """Evaluate accuracy of a model on the given data set."""

    acc_sum, n, l = 0, 0, 0
    
    with torch.no_grad():
      for X, y in data_iter:
          X, y = X.to(device), y.to(device)
          y_hat = net(X)
          l += loss(y_hat, y.long())
          acc_sum += (y_hat.argmax(axis=1) == y).sum().item()
          n += y.size(0)

    return acc_sum / n, l.item() / len(data_iter)
  
    
# Função usada no treinamento e validação da rede
def train_validate(net, train_iter, test_iter, batch_size, trainer, loss,
                   num_epochs, type='regression'):
    print('training on', device)
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            trainer.zero_grad()
            y_hat = net(X)
            if type == 'regression':
              l = loss(y_hat, y.float())
            else:
              l = loss(y_hat, y.long())
            l.backward()
            trainer.step()
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(axis=1) == y).sum().item()
            n += y.size(0)
        test_acc, test_loss = evaluate_accuracy(test_iter, net, loss)
        if type == 'regression':
          print('epoch %d, train loss %.4f, test loss %.4f, time %.1f sec'
                % (epoch + 1, train_l_sum / len(train_iter), test_loss, time.time() - start))
        else:
          print('epoch %d, train loss %.4f, train acc %.3f, test loss %.4f, '
              'test acc %.3f, time %.1f sec'
              % (epoch + 1, train_l_sum / len(train_iter), train_acc_sum / n, test_loss, 
                 test_acc, time.time() - start))
          
        
# funcao usada para teste
def test(net, test_iter):
    print('testing on', device)
    first = True
    for X in test_iter:
        X = X.to(device)
        y_hat = net(X)
        if first is True:
            pred_logits = y_hat
            pred_labels = y_hat.argmax(axis=1)
            first = False
        else:
            pred_logits = torch.concat(pred_logits, y_hat, dim=0)
            pred_labels = torch.concat(pred_labels, y_hat.argmax(axis=1), dim=0)

    return pred_logits.numpy(), pred_labels.numpy()

# Função para inicializar pesos da rede
def weights_init(m):
    if type(m) == nn.Linear:
        m.weight.data.normal_(0.0, 0.01) # valores iniciais são uma normal
        m.bias.data.fill_(0)

# Problema 1

Neste problema, você receberá 14 *features* coletadas de pacientes e tentará predizer se eles tem algum sinal de doença cardíaca. Mais sobre esse dataset aqui: https://archive.ics.uci.edu/ml/datasets/Heart+Disease

In [11]:
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data
data = np.genfromtxt('processed.cleveland.data', delimiter=',', dtype=np.float32)
data = np.nan_to_num(data)

print(data.shape, data[0, :])
X, y = data[:, :-1], data[:, -1]
print(X.shape, X[0, :])
print(y.shape, y[0])
train_features, test_features, train_labels, test_labels = train_test_split(X, y, test_size=0.20, random_state=42)
  
batch_size = 100
train_iter = load_array(train_features, train_labels, batch_size)
test_iter = load_array(test_features, test_labels, batch_size, False)

--2021-02-02 16:58:27--  https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18461 (18K) [application/x-httpd-php]
Saving to: ‘processed.cleveland.data.3’


2021-02-02 16:58:28 (264 KB/s) - ‘processed.cleveland.data.3’ saved [18461/18461]

(303, 14) [ 63.    1.    1.  145.  233.    1.    2.  150.    0.    2.3   3.    0.
   6.    0. ]
(303, 13) [ 63.    1.    1.  145.  233.    1.    2.  150.    0.    2.3   3.    0.
   6. ]
(303,) 0.0


In [12]:
# parâmetros: número de epochs, learning rate (ou taxa de aprendizado), e lambda do weight decay
num_epochs, lr, wd_lambda = 10, 0.001, 0.0001

# rede simples somente com perceptrons e camadas densamente conectadas
net = nn.Sequential(
        nn.Linear(13, 128),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(128, 256),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(128, 5)
)
net.to(device)

# função de custo (ou loss)
loss = nn.CrossEntropyLoss().to(device)

# optimizer
trainer = optim.SGD(net.parameters(), lr=lr, weight_decay=wd_lambda)

# treinamento e validação
train_validate(net, train_iter, test_iter, batch_size, trainer, loss, num_epochs, type='classification')

training on cuda
epoch 1, train loss 7.5748, train acc 0.273, test loss 6.3751, test acc 0.393, time 0.0 sec
epoch 2, train loss 5.6528, train acc 0.372, test loss 6.9824, test acc 0.295, time 0.0 sec
epoch 3, train loss 6.1780, train acc 0.306, test loss 4.2531, test acc 0.410, time 0.0 sec
epoch 4, train loss 4.8677, train acc 0.335, test loss 6.2629, test acc 0.246, time 0.0 sec
epoch 5, train loss 4.1185, train acc 0.364, test loss 6.0365, test acc 0.311, time 0.0 sec
epoch 6, train loss 4.3485, train acc 0.368, test loss 5.3161, test acc 0.328, time 0.0 sec
epoch 7, train loss 4.5289, train acc 0.360, test loss 4.0035, test acc 0.377, time 0.0 sec
epoch 8, train loss 3.9843, train acc 0.368, test loss 4.5059, test acc 0.279, time 0.0 sec
epoch 9, train loss 3.2715, train acc 0.434, test loss 4.6388, test acc 0.279, time 0.0 sec
epoch 10, train loss 3.7248, train acc 0.347, test loss 2.9318, test acc 0.328, time 0.0 sec


# Problema 2

Neste problema, você receberá 90 *features* extraídas de diversas músicas (datadas de 1922 até 2011) e deve predizer o ano de cada música. Mais sobre esse dataset aqui: https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD

In [73]:
# download do dataset
!wget http://archive.ics.uci.edu/ml/machine-learning-databases/00203/YearPredictionMSD.txt.zip
!unzip YearPredictionMSD.txt.zip
data = np.genfromtxt('YearPredictionMSD.txt', delimiter=',', dtype=np.float32)

print(data[0, :])
X, y = data[:, 1:], data[:, 0]
train_features, test_features, train_labels, test_labels = train_test_split(X, y, test_size=0.33, random_state=42)
  
batch_size = 100
train_iter = load_array(train_features, train_labels, batch_size)
test_iter = load_array(test_features, test_labels, batch_size, False)

--2021-02-02 16:39:50--  http://archive.ics.uci.edu/ml/machine-learning-databases/00203/YearPredictionMSD.txt.zip
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 211011981 (201M) [application/x-httpd-php]
Saving to: ‘YearPredictionMSD.txt.zip.1’


2021-02-02 16:40:07 (12.4 MB/s) - ‘YearPredictionMSD.txt.zip.1’ saved [211011981/211011981]

Archive:  YearPredictionMSD.txt.zip
replace YearPredictionMSD.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: N
[ 2.0010000e+03  4.9943569e+01  2.1471140e+01  7.3077499e+01
  8.7486095e+00 -1.7406281e+01 -1.3099050e+01 -2.5012020e+01
 -1.2232570e+01  7.8308902e+00 -2.4678299e+00  3.3213601e+00
 -2.3152101e+00  1.0205560e+01  6.1110913e+02  9.5108960e+02
  6.9811426e+02  4.0898486e+02  3.8370911e+02  3.2651511e+02
  2.3811327e+02  2.5142413e+02  1.8717351e+02  1.0042652e+02
  1.7919498e+02 -8.4

In [74]:
# parâmetros: número de epochs, learning rate (ou taxa de aprendizado), e lambda do weight decay
num_epochs, lr, wd_lambda = 100, 0.00001, 0.0000001

# arquitetura
net = nn.Sequential(
        nn.Linear(90, 128),
        nn.ReLU(),
        nn.Linear(128, 256),
        nn.ReLU(),
        nn.Linear(256, 1)
)
net.to(device)

# função de custo (ou loss)
loss = nn.SmoothL1Loss().to(device)

# optimizer
trainer = optim.Adam(net.parameters(), lr=lr, weight_decay=wd_lambda)

# treinamento e validação
train_validate(net, train_iter, test_iter, batch_size, trainer, loss, num_epochs)

training on cuda


  return F.smooth_l1_loss(input, target, reduction=self.reduction, beta=self.beta)
  return F.smooth_l1_loss(input, target, reduction=self.reduction, beta=self.beta)
  return F.smooth_l1_loss(input, target, reduction=self.reduction, beta=self.beta)


epoch 1, train loss 828.3568, test loss 531.5590, time 10.9 sec
epoch 2, train loss 484.3003, test loss 456.6558, time 11.0 sec
epoch 3, train loss 433.7129, test loss 413.9627, time 11.0 sec
epoch 4, train loss 392.5443, test loss 372.9806, time 11.0 sec
epoch 5, train loss 351.7587, test loss 332.7040, time 10.8 sec
epoch 6, train loss 313.0188, test loss 295.6150, time 11.0 sec
epoch 7, train loss 276.8993, test loss 260.2495, time 11.5 sec
epoch 8, train loss 243.1305, test loss 228.1428, time 10.6 sec
epoch 9, train loss 212.8374, test loss 200.0085, time 11.0 sec
epoch 10, train loss 188.9946, test loss 179.5744, time 11.0 sec
epoch 11, train loss 172.7330, test loss 167.4416, time 10.8 sec
epoch 12, train loss 162.0664, test loss 157.8766, time 11.1 sec
epoch 13, train loss 154.2655, test loss 150.9136, time 11.1 sec
epoch 14, train loss 147.8607, test loss 145.0188, time 10.8 sec
epoch 15, train loss 142.3203, test loss 140.6690, time 11.0 sec
epoch 16, train loss 137.3547, tes

KeyboardInterrupt: ignored

In [75]:
# mostra o resultado predito para as 5 primeiras instâncias de teste
y = net(torch.Tensor(test_features[0:5, :]).to(device))
print(y, test_labels[0:5])

tensor([[2123.6570],
        [2055.9392],
        [2100.2898],
        [1703.6139],
        [1944.5286]], device='cuda:0', grad_fn=<AddmmBackward>) [2008. 2001. 2006. 2008. 1998.]


# Problema 3

Neste problema, você receberá várias *features* (como altura média, inclinação, etc) descrevendo uma região e o modelo deve predizer qual o tipo da região (floresta, montanha, etc). Mais informações sobre esse dataset aqui: https://archive.ics.uci.edu/ml/datasets/covertype

In [53]:
!wget http://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.data.gz
!gzip covtype.data.gz
data = np.genfromtxt('covtype.data', delimiter=',', dtype=np.float32)

print(data.shape, data[0, :])
X, y = data[:, :-1], data[:, -1]
print(X.shape, X[0, :])
print(y.shape, y[0])
train_features, test_features, train_labels, test_labels = train_test_split(X, y, test_size=0.33, random_state=42)
train_labels = train_labels - 1
test_labels = test_labels - 1
  
batch_size = 100
train_iter = load_array(train_features, train_labels, batch_size)
test_iter = load_array(test_features, test_labels, batch_size, False)

--2021-02-02 16:23:01--  http://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.data.gz
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11240707 (11M) [application/x-httpd-php]
Saving to: ‘covtype.data.gz.3’


2021-02-02 16:23:03 (14.0 MB/s) - ‘covtype.data.gz.3’ saved [11240707/11240707]

gzip: covtype.data.gz already has .gz suffix -- unchanged
(581012, 55) [2.596e+03 5.100e+01 3.000e+00 2.580e+02 0.000e+00 5.100e+02 2.210e+02
 2.320e+02 1.480e+02 6.279e+03 1.000e+00 0.000e+00 0.000e+00 0.000e+00
 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
 1.000e+00 0.000e+00 0.000e+

In [66]:
# parâmetros: número de epochs, learning rate (ou taxa de aprendizado), e lambda do weight decay
num_epochs, lr, wd_lambda = 20, 0.001, 0.0001

# rede simples somente com perceptrons e camadas densamente conectadas
net = nn.Sequential(
        nn.Linear(54, 256),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(128, 64),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(64, 7)
)
net.to(device)

# função de custo (ou loss)
loss = nn.CrossEntropyLoss().to(device)

# optimizer
trainer = optim.SGD(net.parameters(), lr=lr, weight_decay=wd_lambda)

# treinamento e validação
train_validate(net, train_iter, test_iter, batch_size, trainer, loss, num_epochs, type='classification')

training on cuda
epoch 1, train loss 1.3487, train acc 0.468, test loss 1.1978, test acc 0.483, time 11.7 sec
epoch 2, train loss 1.1846, train acc 0.482, test loss 1.1695, test acc 0.488, time 12.3 sec
epoch 3, train loss 1.1593, train acc 0.486, test loss 1.1472, test acc 0.488, time 11.8 sec
epoch 4, train loss 1.1352, train acc 0.491, test loss 1.1230, test acc 0.496, time 11.9 sec
epoch 5, train loss 1.1049, train acc 0.504, test loss 1.0931, test acc 0.510, time 11.6 sec
epoch 6, train loss 1.0626, train acc 0.528, test loss 1.0507, test acc 0.528, time 12.0 sec
epoch 7, train loss 1.0321, train acc 0.547, test loss 1.0091, test acc 0.558, time 11.8 sec
epoch 8, train loss 1.0102, train acc 0.560, test loss 0.9897, test acc 0.573, time 12.3 sec
epoch 9, train loss 0.9923, train acc 0.570, test loss 0.9691, test acc 0.583, time 12.0 sec
epoch 10, train loss 0.9781, train acc 0.578, test loss 0.9557, test acc 0.586, time 11.7 sec
epoch 11, train loss 0.9630, train acc 0.586, test l

# Problema 4

Neste problema, você receberá 11 *features* extraídas de tipos de vinhos, e terá que predizer um *score* para cada vinho. Mais sobre esse dataset aqui: https://archive.ics.uci.edu/ml/datasets/Wine+Quality

In [13]:
# download do dataset
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv
data_red = np.genfromtxt('winequality-red.csv', delimiter=';', dtype=np.float32, skip_header=1)
data_white = np.genfromtxt('winequality-white.csv', delimiter=';', dtype=np.float32, skip_header=1)
data = np.concatenate((data_red, data_white), axis=0)
data = np.nan_to_num(data)

print(data[0, :])
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
train_features, test_features, train_labels, test_labels = train_test_split(X, y, test_size=0.33, random_state=42)

batch_size = 100
train_iter = load_array(train_features, train_labels, batch_size)
test_iter = load_array(test_features, test_labels, batch_size, False)

--2021-02-02 17:13:31--  https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84199 (82K) [application/x-httpd-php]
Saving to: ‘winequality-red.csv’


2021-02-02 17:13:31 (595 KB/s) - ‘winequality-red.csv’ saved [84199/84199]

--2021-02-02 17:13:31--  https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 264426 (258K) [application/x-httpd-php]
Saving to: ‘winequality-white.csv’


2021-02-02 17:13:32 (946 KB/s) - ‘winequality-white.csv’ saved [264426/264426]

[ 7.4     0.7     0.      1.9 

In [14]:
# parâmetros: número de epochs, learning rate (ou taxa de aprendizado), e lambda do weight decay
num_epochs, lr, wd_lambda = 100, 0.00001, 0.0000001

# arquitetura
net = nn.Sequential(
        nn.Linear(11, 128),
        nn.ReLU(),
        nn.Linear(128, 256),
        nn.ReLU(),
        nn.Linear(256, 1)
)
net.to(device)

# função de custo (ou loss)
loss = nn.SmoothL1Loss().to(device)

# optimizer
trainer = optim.Adam(net.parameters(), lr=lr, weight_decay=wd_lambda)

# treinamento e validação
train_validate(net, train_iter, test_iter, batch_size, trainer, loss, num_epochs)

training on cuda
epoch 1, train loss 1.5276, test loss 1.5283, time 0.1 sec


  return F.smooth_l1_loss(input, target, reduction=self.reduction, beta=self.beta)
  return F.smooth_l1_loss(input, target, reduction=self.reduction, beta=self.beta)
  return F.smooth_l1_loss(input, target, reduction=self.reduction, beta=self.beta)


epoch 2, train loss 1.4710, test loss 1.4665, time 0.1 sec
epoch 3, train loss 1.4104, test loss 1.4061, time 0.1 sec
epoch 4, train loss 1.3554, test loss 1.3453, time 0.2 sec
epoch 5, train loss 1.2932, test loss 1.2806, time 0.2 sec
epoch 6, train loss 1.2262, test loss 1.2100, time 0.2 sec
epoch 7, train loss 1.1495, test loss 1.1290, time 0.2 sec
epoch 8, train loss 1.0649, test loss 1.0464, time 0.2 sec
epoch 9, train loss 0.9923, test loss 0.9778, time 0.2 sec
epoch 10, train loss 0.9324, test loss 0.9216, time 0.1 sec
epoch 11, train loss 0.8785, test loss 0.8643, time 0.1 sec
epoch 12, train loss 0.8231, test loss 0.8090, time 0.1 sec
epoch 13, train loss 0.7736, test loss 0.7612, time 0.1 sec
epoch 14, train loss 0.7327, test loss 0.7251, time 0.1 sec
epoch 15, train loss 0.6908, test loss 0.6822, time 0.1 sec
epoch 16, train loss 0.6549, test loss 0.6442, time 0.2 sec
epoch 17, train loss 0.6203, test loss 0.6124, time 0.2 sec
epoch 18, train loss 0.5936, test loss 0.5825, t

In [15]:
# mostra o resultado predito para as 5 primeiras instâncias de teste
y = net(torch.Tensor(test_features[0:5, :]).to(device))
print(y, test_labels[0:5])

tensor([[6.2811],
        [5.7228],
        [6.1114],
        [5.6124],
        [5.8584]], device='cuda:0', grad_fn=<AddmmBackward>) [8. 5. 7. 6. 6.]
