<a href="https://colab.research.google.com/github/caocscar/workshops/blob/master/pytorch/Workshop_Regression_Sequential.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Regression Problem**

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader
import numpy as np
import pandas as pd

print('Torch version', torch.__version__)
print('Pandas version', pd.__version__)
print('Numpy version', np.__version__)

Torch version 1.3.1
Pandas version 0.25.3
Numpy version 1.17.4


The following cell should print `cuda:0`. If it prints `cpu` instead, go to *Edit* -> *Notebook settings* and change the hardware accelerator from `None` to `GPU`. You only have to do this once per notebook.

In [2]:
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
device

'cuda:0'

Read in the training and validation datasets.

In [0]:
df_train = pd.read_csv('https://raw.githubusercontent.com/greght/Workshop-Keras-DNN/master/ChallengeProblems/dataRegression_train.csv', header=None)
df_val = pd.read_csv('https://raw.githubusercontent.com/greght/Workshop-Keras-DNN/master/ChallengeProblems/dataRegression_test.csv', header=None)

Construct our x, y variables for the training and validation datasets. The first two columns are the features and the third column is the target.

In [0]:
x_train = df_train.iloc[:,0:2]
y_train = df_train.iloc[:,2]
x_val = df_val.iloc[:,0:2]
y_val = df_val.iloc[:,2]
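
To make the slicing concrete, here is a minimal sketch on a made-up three-column frame (the values and the name `df_demo` are illustrative, not from the workshop data). `iloc[:, 0:2]` keeps the first two columns as a DataFrame, while `iloc[:, 2]` returns the third column as a Series.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: two feature columns and one target column.
df_demo = pd.DataFrame([[0.1, 0.2, 1.0],
                        [0.3, 0.4, 2.0],
                        [0.5, 0.6, 3.0]])

x_demo = df_demo.iloc[:, 0:2]   # columns 0 and 1 -> DataFrame of features
y_demo = df_demo.iloc[:, 2]     # column 2 -> Series of targets

print(x_demo.shape)  # (3, 2)
print(y_demo.shape)  # (3,)
```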

Preprocess our data to go from a `pandas` DataFrame to a `numpy` array to a `torch` tensor.

In [0]:
# Inputs and targets are plain data, so they do not need requires_grad;
# only the model parameters are updated by the optimizer.
x_train_tensor = torch.tensor(x_train.to_numpy(), device=device, dtype=torch.float)
y_train_tensor = torch.tensor(y_train.to_numpy(), device=device, dtype=torch.float)
x_val_tensor = torch.tensor(x_val.to_numpy(), device=device, dtype=torch.float)
y_val_tensor = torch.tensor(y_val.to_numpy(), device=device, dtype=torch.float)
y_train_tensor = y_train_tensor.view(-1,1)
y_val_tensor = y_val_tensor.view(-1,1)
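
The `view(-1,1)` calls matter because the model outputs a column of shape `(N, 1)`, while a Series converts to a 1-D tensor of shape `(N,)`. The sketch below, using made-up values, shows how subtracting the two without reshaping silently broadcasts to an `(N, N)` matrix.

```python
import torch

pred = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1), like a model output
target = torch.tensor([1.0, 2.0, 3.0])       # shape (3,), like a converted Series

# Without reshaping, broadcasting pairs every prediction with every target.
print((pred - target).shape)                 # torch.Size([3, 3])

# After view(-1, 1) the shapes match and the difference is element-wise.
target_col = target.view(-1, 1)
print((pred - target_col).shape)             # torch.Size([3, 1])
```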

Set up our model using the `nn.Sequential` container, then move it to the selected device (the GPU, if available) with `.to(device)`.

In [6]:
model = nn.Sequential(
    nn.Linear(x_train_tensor.shape[1],5),
    nn.ReLU(),
    nn.Linear(5,5),
    nn.ReLU(),
    nn.Linear(5,1),
).to(device)
print(model)

Sequential(
  (0): Linear(in_features=2, out_features=5, bias=True)
  (1): ReLU()
  (2): Linear(in_features=5, out_features=5, bias=True)
  (3): ReLU()
  (4): Linear(in_features=5, out_features=1, bias=True)
)
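
As a rough check of what `nn.Sequential` does, the sketch below builds the same architecture on the CPU (the name `demo_model` and the random input are illustrative) and confirms that calling the container matches applying each layer in order.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random weights for the demo

demo_model = nn.Sequential(
    nn.Linear(2, 5),
    nn.ReLU(),
    nn.Linear(5, 5),
    nn.ReLU(),
    nn.Linear(5, 1),
)

x = torch.randn(4, 2)          # a batch of 4 samples with 2 features each

# Apply the layers one at a time, feeding each output into the next layer.
h = x
for layer in demo_model:
    h = layer(h)

print(torch.allclose(h, demo_model(x)))  # True
print(demo_model(x).shape)               # torch.Size([4, 1])
```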


`model.parameters()` yields the **weight** matrix and **bias** vector (alternating) for each of the 3 `Linear` layers, six tensors in total. The `ReLU` layers have no parameters.



In [7]:
params = list(model.parameters())
print(f'There are {len(params)} parameters')
for param in params:
    print(param)

There are 6 parameters
Parameter containing:
tensor([[-0.1215, -0.3991],
        [ 0.2788,  0.4872],
        [-0.6555, -0.1202],
        [ 0.4907,  0.5229],
        [-0.4818,  0.5152]], device='cuda:0', requires_grad=True)
Parameter containing:
tensor([-0.6059, -0.3664, -0.3488,  0.2570, -0.4900], device='cuda:0',
       requires_grad=True)
Parameter containing:
tensor([[-0.0455, -0.1971, -0.3817,  0.2707,  0.1970],
        [-0.3687, -0.3841, -0.2834, -0.1866, -0.3692],
        [-0.0723, -0.2126,  0.3957, -0.3226, -0.0748],
        [-0.3706, -0.0668, -0.2493,  0.0368, -0.0211],
        [ 0.2951, -0.0035, -0.2387,  0.2200, -0.0834]], device='cuda:0',
       requires_grad=True)
Parameter containing:
tensor([-0.3126,  0.2648, -0.0284, -0.2545,  0.3206], device='cuda:0',
       requires_grad=True)
Parameter containing:
tensor([[ 0.0796,  0.0458, -0.3610,  0.0517,  0.4346]], device='cuda:0',
       requires_grad=True)
Parameter containing:
tensor([-0.3093], device='cuda:0', requires_grad=True)
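
The sizes of these tensors follow directly from the layer shapes: a `Linear(in, out)` layer holds `in * out` weights plus `out` biases. A small sketch, using a CPU copy of the architecture named `demo_model` for illustration, tallies them with `named_parameters()`.

```python
import torch.nn as nn

demo_model = nn.Sequential(
    nn.Linear(2, 5), nn.ReLU(),
    nn.Linear(5, 5), nn.ReLU(),
    nn.Linear(5, 1),
)

# Names are "<layer index>.weight" / "<layer index>.bias"; ReLU has no parameters.
for name, param in demo_model.named_parameters():
    print(name, tuple(param.shape), param.numel())

total = sum(p.numel() for p in demo_model.parameters())
print(total)  # (2*5 + 5) + (5*5 + 5) + (5*1 + 1) = 51
```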

We define our *loss function*, *learning rate*, and our *optimizer*.

In [0]:
loss_fn = nn.MSELoss(reduction='mean') #default
learning_rate = 0.1
optimizer = optim.Adagrad(model.parameters(), lr=learning_rate)
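
To see what these choices mean numerically, here is a minimal sketch on toy values (not the workshop data). It checks that `MSELoss(reduction='mean')` equals the mean of squared differences, and takes one `Adagrad` step on a single weight `w` with loss `w**2`. With the default settings (zero initial accumulator, `eps=1e-10`), the first step moves the weight by almost exactly the learning rate, in the direction opposite to the gradient.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Mean squared error by hand versus nn.MSELoss.
yhat = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([1.0, 1.0, 1.0])
manual_mse = ((yhat - y) ** 2).mean()          # (0 + 1 + 4) / 3
builtin_mse = nn.MSELoss(reduction='mean')(yhat, y)
print(manual_mse.item(), builtin_mse.item())

# One Adagrad step on a single toy weight.
w = torch.tensor([1.0], requires_grad=True)
opt = optim.Adagrad([w], lr=0.1)
loss = (w ** 2).sum()                          # gradient is 2*w = 2
opt.zero_grad()
loss.backward()
opt.step()                                     # w <- w - lr * g / (sqrt(g**2) + eps)
print(w.item())                                # approximately 0.9
```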

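Here is our training loop. Each epoch takes one gradient step on the full training set and then evaluates the loss on the validation set.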

In [9]:
epochs = 100
for epoch in range(epochs):
    # training
    output = model(x_train_tensor)
    loss = loss_fn(output, y_train_tensor)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # validation (no gradients needed)
    with torch.no_grad():
        yhat = model(x_val_tensor)
        validation_loss = loss_fn(yhat, y_val_tensor)
    # print intermediate results
    if epoch % 10 == 9:
        print(epoch, loss.item(), validation_loss.item())

9 7.365058898925781 12.146252632141113
19 6.494891166687012 10.409762382507324
29 6.211064338684082 9.872771263122559
39 5.948689937591553 9.375182151794434
49 5.702565670013428 8.90356159210205
59 5.4718708992004395 8.4547758102417
69 5.2584710121154785 8.030385971069336
79 5.065769195556641 7.634967803955078
89 4.897581100463867 7.27454137802124
99 4.756939888000488 6.954959869384766
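
Validation needs no gradients, and wrapping it in `torch.no_grad()` skips building the autograd graph. The sketch below uses a tiny stand-in model and random data (all names here are illustrative) to show the difference.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
demo_model = nn.Linear(2, 1)          # stand-in for the workshop model
loss_fn = nn.MSELoss()
x_val = torch.randn(8, 2)             # made-up validation inputs
y_val = torch.randn(8, 1)             # made-up validation targets

# Ordinary forward pass: the loss is attached to the autograd graph.
tracked_loss = loss_fn(demo_model(x_val), y_val)
print(tracked_loss.requires_grad)     # True

# Inside no_grad the same computation records no graph.
with torch.no_grad():
    untracked_loss = loss_fn(demo_model(x_val), y_val)
print(untracked_loss.requires_grad)   # False
```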


We can generalize some of the code inside the `for` loop. We'll define a `fit_model` function that builds and returns `train` and `validate` functions bound to a given model, loss function, and optimizer.

In [0]:
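def fit_model(model, loss_fn, optimizer):
    def train(x,y):
        # forward pass, backpropagation, and one optimizer step
        yhat = model(x)
        loss = loss_fn(yhat,y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    def validate(x,y):
        # forward pass only; no gradients are needed for validation
        with torch.no_grad():
            yhat = model(x)
            loss = loss_fn(yhat,y)
        return loss.item()

    return train, validate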

We pass our model, loss function, and optimizer to `fit_model`, which returns our `train` and `validate` functions.

In [0]:
train, validate = fit_model(model, loss_fn, optimizer)
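
As a quick check that the returned closures behave as expected, the sketch below repeats `fit_model` so it runs on its own and fits a toy line `y = 3x + 1`. The data, the single `nn.Linear` layer, and the use of plain `SGD` instead of `Adagrad` are assumptions made for this illustration only.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def fit_model(model, loss_fn, optimizer):
    def train(x, y):
        yhat = model(x)
        loss = loss_fn(yhat, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    def validate(x, y):
        with torch.no_grad():
            return loss_fn(model(x), y).item()

    return train, validate

# Toy problem: recover y = 3x + 1 with a single linear layer.
x = torch.linspace(-1, 1, 20).view(-1, 1)
y = 3 * x + 1
toy_model = nn.Linear(1, 1)
with torch.no_grad():                 # start from zeros so the run is deterministic
    toy_model.weight.zero_()
    toy_model.bias.zero_()

toy_train, toy_validate = fit_model(
    toy_model, nn.MSELoss(), optim.SGD(toy_model.parameters(), lr=0.1))

before = toy_validate(x, y)           # loss before any update
for _ in range(200):
    toy_train(x, y)                   # each call performs one optimizer step
after = toy_validate(x, y)
print(before > after)                 # True
```

Because `train` and `validate` close over the model and optimizer, the loop body only has to pass in data.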

## Mini-batches

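From the documentation: "`torch.nn` only supports mini-batches. The entire `torch.nn` package only supports inputs that are a mini-batch of samples, and not a single sample." We wrap the training tensors in a `TensorDataset` and let a `DataLoader` serve them in shuffled mini-batches of 10 samples.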

In [0]:
train_data = TensorDataset(x_train_tensor, y_train_tensor)
train_loader = DataLoader(dataset=train_data, batch_size=10, shuffle=True)
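
A small sketch of how the `DataLoader` slices a dataset, using 25 made-up samples (an illustrative size, not the workshop's): with `batch_size=10` it yields batches of 10, 10, and 5 rows, and each batch is an `(x, y)` pair.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

x_demo = torch.arange(50, dtype=torch.float).view(25, 2)   # 25 samples, 2 features
y_demo = torch.arange(25, dtype=torch.float).view(-1, 1)   # 25 targets

demo_data = TensorDataset(x_demo, y_demo)
demo_loader = DataLoader(dataset=demo_data, batch_size=10, shuffle=True)

print(len(demo_data))     # 25 samples
print(len(demo_loader))   # 3 batches per epoch

batch_sizes = [xb.shape[0] for xb, yb in demo_loader]
print(batch_sizes)        # [10, 10, 5]
```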

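Here is our training loop with mini-batch processing. Each mini-batch is moved to `device` before training. Since our tensors were already created on `device`, these calls are no-ops here, but they are required when the dataset is kept on the CPU.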

In [13]:
epochs = 100
for epoch in range(epochs):
    # training
    losses = []
    for i, (xbatch, ybatch) in enumerate(train_loader):
        xbatch = xbatch.to(device)
        ybatch = ybatch.to(device)
        loss = train(xbatch, ybatch)
        losses.append(loss)
    training_loss = np.mean(losses)
    # validation
    validation_loss = validate(x_val_tensor, y_val_tensor)
    if epoch%10 == 9:
        print(epoch, training_loss, validation_loss)

9 4.431602434678511 5.717512130737305
19 4.4578500660983 5.79561185836792
29 4.42373438314958 5.699330806732178
39 4.4469006061553955 5.70211124420166
49 4.433960112658414 5.681665420532227
59 4.442956664345481 5.705132961273193
69 4.463378797877919 5.76559591293335
79 4.430946686051109 5.709566116333008
89 4.428162076256492 5.758986949920654
99 4.425388639623469 5.7122626304626465
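
One detail about the `.to(device)` calls in the loop: when a tensor already lives on the requested device with the requested dtype, `Tensor.to` returns the same tensor object rather than a copy. The sketch below shows this on the CPU so it runs without a GPU; the tensor `t` is illustrative.

```python
import torch

t = torch.ones(3, 2)             # created on the CPU
same = t.to('cpu')               # already on the CPU, so no copy is made
print(same is t)                 # True

converted = t.to(torch.float64)  # a dtype change does create a new tensor
print(converted is t)            # False
```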


We can view the current state of our model using the `state_dict` method, which returns an ordered dictionary mapping each parameter name to its tensor.

In [14]:
model.state_dict()

OrderedDict([('0.weight', tensor([[-0.1215, -0.3991],
                      [ 0.1468,  0.3617],
                      [-0.6555, -0.1202],
                      [ 2.2297, -0.1139],
                      [-0.4818,  0.5152]], device='cuda:0')),
             ('0.bias',
              tensor([-0.6059, -0.4847, -0.3488,  0.3747, -0.4900], device='cuda:0')),
             ('2.weight',
              tensor([[-0.0455, -0.0129, -0.3817,  1.2829,  0.1970],
                      [-0.3687, -0.2262, -0.2834,  0.7972, -0.3692],
                      [-0.0723, -0.2126,  0.3957, -0.3226, -0.0748],
                      [-0.3706, -0.0668, -0.2493,  0.0368, -0.0211],
                      [ 0.2951,  0.1044, -0.2387,  1.0469, -0.0834]], device='cuda:0')),
             ('2.bias',
              tensor([-0.1693,  0.5226, -0.0284, -0.2545,  0.5484], device='cuda:0')),
             ('4.weight',
              tensor([[ 1.0519,  0.7333, -0.3610,  0.0517,  1.0764]], device='cuda:0')),
             ('4.bias', tensor
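
A typical use of `state_dict` is to save a model's parameters and restore them later. The sketch below is a minimal round trip through an in-memory buffer (`torch.save` also accepts a file path); `model_a` and `model_b` are illustrative CPU models, not the trained workshop model.

```python
import io
import torch
import torch.nn as nn

def make_model():
    # Same layer sizes as the workshop model, built on the CPU.
    return nn.Sequential(
        nn.Linear(2, 5), nn.ReLU(),
        nn.Linear(5, 5), nn.ReLU(),
        nn.Linear(5, 1),
    )

model_a = make_model()
model_b = make_model()                     # different random initial weights

buffer = io.BytesIO()
torch.save(model_a.state_dict(), buffer)   # serialize the parameters only
buffer.seek(0)                             # rewind before reading
model_b.load_state_dict(torch.load(buffer))

x = torch.randn(4, 2)
print(torch.allclose(model_a(x), model_b(x)))  # True
```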