<a href="https://colab.research.google.com/github/caocscar/workshops/blob/master/pytorch/Workshop_Regression_Sequential.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Regression Problem**

In [29]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np
import pandas as pd
from torch.utils.data import TensorDataset, DataLoader

print('Torch version', torch.__version__)
print('Pandas version', pd.__version__)
print('Numpy version', np.__version__)

Torch version 1.3.1
Pandas version 0.25.3
Numpy version 1.17.4


The following cell should print `cuda:0`. If it does not, go to *Edit* -> *Notebook settings* and change the hardware accelerator from `None` to `GPU`. You only have to do this once per notebook.

In [30]:
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
device

'cuda:0'

Read in the training and test datasets.

In [0]:
df_train = pd.read_csv('https://raw.githubusercontent.com/greght/Workshop-Keras-DNN/master/ChallengeProblems/dataRegression_train.csv', header=None)
df_test = pd.read_csv('https://raw.githubusercontent.com/greght/Workshop-Keras-DNN/master/ChallengeProblems/dataRegression_test.csv', header=None)

Construct our x and y variables for the training and validation datasets. The first two columns are the features and the third column is the target.

In [0]:
x_train = df_train.iloc[:,0:2]
y_train = df_train.iloc[:,2]
x_test = df_test.iloc[:,0:2]
y_test = df_test.iloc[:,2]
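As a quick illustration of the `iloc` slicing above, here is a minimal sketch on a made-up two-row DataFrame (the values and the names `toy_df`, `features`, and `target` are assumptions for illustration, not taken from the workshop data). Slicing with `0:2` keeps a two-column DataFrame, while selecting the single column `2` returns a one-dimensional Series.

```python
import pandas as pd

# Hypothetical data laid out like the CSV files: two feature columns, one target column
toy_df = pd.DataFrame([[0.1, 0.2, 1.0],
                       [0.3, 0.4, 2.0]])

features = toy_df.iloc[:, 0:2]  # DataFrame with columns 0 and 1
target = toy_df.iloc[:, 2]      # Series holding column 2

print(features.shape, target.shape)
```

The one-dimensional target is why the targets are reshaped later before they are compared with the model output.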

Preprocess our data to go from a `pandas` DataFrame to a `numpy` array to a `torch` tensor.

In [0]:
# Inputs and targets are data, not learnable parameters, so they do not need requires_grad
x_train_tensor = torch.tensor(x_train.to_numpy(), device=device, dtype=torch.float)
y_train_tensor = torch.tensor(y_train.to_numpy(), device=device, dtype=torch.float)
x_test_tensor = torch.tensor(x_test.to_numpy(), device=device, dtype=torch.float)
y_test_tensor = torch.tensor(y_test.to_numpy(), device=device, dtype=torch.float)
# Reshape the targets from (N,) to (N, 1) so they match the shape of the model output
y_train_tensor = y_train_tensor.view(-1,1)
y_test_tensor = y_test_tensor.view(-1,1)
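The `view(-1,1)` reshape matters because the model outputs a column of shape `(N, 1)`. If the targets were left with shape `(N,)`, the subtraction inside a squared-error loss would broadcast to an `(N, N)` matrix and compare every prediction with every target. The sketch below shows this on made-up numbers; the tensors `pred`, `target_flat`, and `target_col` are illustrative assumptions and are not part of the workshop data.

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1), like the model output
target_flat = torch.tensor([1.0, 2.0, 4.0])  # shape (3,), like y before the reshape
target_col = target_flat.view(-1, 1)         # shape (3, 1)

# Matching shapes: element-wise squared error averaged over 3 values
loss_col = F.mse_loss(pred, target_col)

# Mismatched shapes: the difference broadcasts to (3, 3)
broadcast_diff = pred - target_flat
loss_broadcast = (broadcast_diff ** 2).mean()

print(broadcast_diff.shape, loss_col.item(), loss_broadcast.item())
```

The two losses differ, so the reshape is needed to get the intended per-sample error.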

Set up our model using the `nn.Sequential` container, which chains the layers in the order given. We then have to transfer it to the GPU.

In [34]:
model = nn.Sequential(
    nn.Linear(x_train_tensor.shape[1],5),
    nn.ReLU(),
    nn.Linear(5,5),
    nn.ReLU(),
    nn.Linear(5,1),
).to(device)
print(model)

Sequential(
  (0): Linear(in_features=2, out_features=5, bias=True)
  (1): ReLU()
  (2): Linear(in_features=5, out_features=5, bias=True)
  (3): ReLU()
  (4): Linear(in_features=5, out_features=1, bias=True)
)


`model.parameters()` contains the **weights** and **biases** (alternating) for each of the three linear layers.



In [35]:
params = list(model.parameters())
print(f'There are {len(params)} parameters')
for param in params:
    print(param)

There are 6 parameters
Parameter containing:
tensor([[ 0.2993, -0.7038],
        [-0.2556, -0.2137],
        [ 0.4313, -0.2261],
        [ 0.6978, -0.4342],
        [ 0.2998,  0.4698]], device='cuda:0', requires_grad=True)
Parameter containing:
tensor([ 0.3835, -0.0449,  0.3708, -0.5495,  0.0828], device='cuda:0',
       requires_grad=True)
Parameter containing:
tensor([[-0.3304, -0.0695, -0.1616,  0.1920, -0.0792],
        [-0.0019, -0.0333,  0.1520,  0.2004,  0.3427],
        [ 0.4436,  0.4199,  0.1999,  0.1085,  0.0091],
        [ 0.2067,  0.0822,  0.2345, -0.1940,  0.3547],
        [ 0.3026, -0.1192, -0.0611, -0.0837,  0.4313]], device='cuda:0',
       requires_grad=True)
Parameter containing:
tensor([ 0.0255, -0.0129,  0.2609, -0.3787,  0.0815], device='cuda:0',
       requires_grad=True)
Parameter containing:
tensor([[-0.2820,  0.1917,  0.1158,  0.2552, -0.3178]], device='cuda:0',
       requires_grad=True)
Parameter containing:
tensor([0.1014], device='cuda:0', requires_grad=True)
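To see how the six parameter tensors add up, the following sketch rebuilds the same 2-5-5-1 architecture on the CPU and counts the individual values in each tensor. The name `model_sketch` is an assumption used here so that the workshop's `model` is left untouched.

```python
import torch.nn as nn

# Same layer sizes as the model above, kept on the CPU for illustration
model_sketch = nn.Sequential(
    nn.Linear(2, 5),
    nn.ReLU(),
    nn.Linear(5, 5),
    nn.ReLU(),
    nn.Linear(5, 1),
)

# named_parameters() yields (name, tensor) pairs; ReLU layers have no parameters
for name, param in model_sketch.named_parameters():
    print(name, tuple(param.shape), param.numel())

total = sum(p.numel() for p in model_sketch.parameters())
print('Total trainable values:', total)
```

Each `nn.Linear(in, out)` contributes `in * out` weights plus `out` biases, giving 15 + 30 + 6 = 51 values in total.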

We define our *loss function*, *learning rate*, and our *optimizer*.

In [0]:
loss_fn = nn.MSELoss(reduction='mean') #default
learning_rate = 0.1
optimizer = optim.Adagrad(model.parameters(), lr=learning_rate)
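As a sanity check on what `reduction='mean'` does, this minimal sketch compares `nn.MSELoss` against the squared error computed by hand. The tensors `yhat_demo` and `y_demo` are made-up values for illustration only.

```python
import torch
import torch.nn as nn

yhat_demo = torch.tensor([[2.0], [0.0], [1.0]])  # hypothetical predictions
y_demo = torch.tensor([[1.0], [2.0], [1.0]])     # hypothetical targets

mean_loss = nn.MSELoss(reduction='mean')(yhat_demo, y_demo)
sum_loss = nn.MSELoss(reduction='sum')(yhat_demo, y_demo)

# Manual version: squared differences, then average over all elements
manual = ((yhat_demo - y_demo) ** 2).mean()

print(mean_loss.item(), sum_loss.item(), manual.item())
```

With `reduction='mean'` the loss is the sum of squared errors divided by the number of elements, so its scale does not depend on how many samples are passed in.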

Here is our training loop.

In [37]:
epochs = 100
for epoch in range(epochs):
    # training
    output = model(x_train_tensor)
    loss = loss_fn(output, y_train_tensor)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # testing
    yhat = model(x_test_tensor)
    validation_loss = loss_fn(yhat, y_test_tensor)
    # print intermediate results
    if epoch % 10 == 9:
        print(epoch, loss.item(), validation_loss.item())

9 5.108928680419922 7.624905586242676
19 4.459510326385498 6.413210391998291
29 4.118618488311768 5.67863130569458
39 3.918602228164673 5.210153102874756
49 3.789799213409424 4.893531322479248
59 3.6861917972564697 4.655496597290039
69 3.5980217456817627 4.458011150360107
79 3.5049703121185303 4.278006553649902
89 3.4194657802581787 4.097975254058838
99 3.344447374343872 3.9381158351898193
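The validation pass in the loop above only reads the loss, so it does not need gradient tracking. A common pattern is to wrap evaluation in `torch.no_grad()`, which skips building the autograd graph. The sketch below uses a small stand-in layer (`net` and `x_demo` are assumptions for illustration) to show that the outputs are the same while only the tracked one carries gradient information.

```python
import torch
import torch.nn as nn

net = nn.Linear(2, 1)        # stand-in for the workshop model
x_demo = torch.randn(4, 2)   # made-up inputs

# Normal forward pass: the output is attached to the autograd graph
tracked = net(x_demo)

# Evaluation-style forward pass: no graph is recorded
with torch.no_grad():
    untracked = net(x_demo)

print(tracked.requires_grad, untracked.requires_grad)
```

Skipping the graph typically saves memory and time during evaluation, and it does not change the computed values.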


We can generalize some of the code inside the `for` loop. We'll define a `fit_model` function that builds and returns two helper functions, `train` and `validate`, which share the same model, loss function, and optimizer.





In [0]:
def fit_model(model, loss_fn, optimizer):
    def train(x,y):
        yhat = model(x)
        loss = loss_fn(yhat,y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()
    
    def validate(x,y):
        # evaluation only reads the loss, so gradient tracking is not needed
        with torch.no_grad():
            yhat = model(x)
            loss = loss_fn(yhat,y)
        return loss.item()
    
    return train, validate

We pass our model, loss function, and optimizer to `fit_model`, which returns our `train` and `validate` functions.

In [0]:
train, validate = fit_model(model, loss_fn, optimizer)
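To check that the two returned functions behave differently, the sketch below applies the same pattern to a tiny linear model on made-up data. Everything prefixed with `toy_` is an assumption for illustration, and plain SGD is swapped in for Adagrad to keep the example small. Calling the validate function should leave the parameters unchanged, while one call to the train function should update them.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def fit_model(model, loss_fn, optimizer):
    def train(x, y):
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    def validate(x, y):
        with torch.no_grad():
            return loss_fn(model(x), y).item()

    return train, validate

torch.manual_seed(0)
toy_model = nn.Linear(1, 1)
toy_x = torch.tensor([[0.0], [1.0], [2.0]])
toy_y = torch.tensor([[1.0], [3.0], [5.0]])  # y = 2x + 1
toy_train, toy_validate = fit_model(
    toy_model, nn.MSELoss(), optim.SGD(toy_model.parameters(), lr=0.1))

# Snapshot the parameters, then compare after each call
before = [p.detach().clone() for p in toy_model.parameters()]
toy_validate(toy_x, toy_y)
unchanged = all(torch.equal(b, p) for b, p in zip(before, toy_model.parameters()))
toy_train(toy_x, toy_y)
changed = any(not torch.equal(b, p) for b, p in zip(before, toy_model.parameters()))
print(unchanged, changed)
```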

## Mini-batches

From the documentation: `torch.nn` only supports mini-batches. The entire `torch.nn` package only supports inputs that are a mini-batch of samples, and not a single sample.

In [0]:
train_data = TensorDataset(x_train_tensor, y_train_tensor)
train_loader = DataLoader(dataset=train_data, batch_size=10, shuffle=True)
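To illustrate how `DataLoader` splits a dataset, here is a minimal sketch with 25 made-up samples and the same `batch_size=10`. The names `x_demo`, `y_demo`, and `demo_loader` are assumptions for illustration. When the dataset size is not a multiple of the batch size, the last batch is smaller unless `drop_last=True` is passed.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

x_demo = torch.arange(25, dtype=torch.float).view(-1, 1)  # 25 samples, 1 feature
y_demo = 2 * x_demo

demo_loader = DataLoader(TensorDataset(x_demo, y_demo), batch_size=10, shuffle=True)

# Sample order changes with shuffle=True, but the batch sizes do not
batch_sizes = [len(xb) for xb, yb in demo_loader]
print(len(demo_loader), batch_sizes)
```

With `shuffle=True` the samples are drawn in a new random order every epoch, so each epoch sees different mini-batches.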

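Here is our training loop with mini-batch processing. The `.to(device)` calls move each mini-batch onto the GPU. Our tensors were already created on `device`, so the calls are no-ops here, but they are required when the dataset lives on the CPU.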

In [41]:
epochs = 100
for epoch in range(epochs):
    # training
    losses = []
    for xbatch, ybatch in train_loader:
        xbatch = xbatch.to(device)
        ybatch = ybatch.to(device)
        loss = train(xbatch, ybatch)
        losses.append(loss)
    training_loss = np.mean(losses)
    # validation
    validation_loss = validate(x_test_tensor, y_test_tensor)
    if epoch%10 == 9:
        print(epoch, training_loss, validation_loss)

9 3.277724266052246 3.5477635860443115
19 3.102418693629178 3.2486870288848877
29 2.9172640171918003 2.9883697032928467
39 2.7863523851741445 2.6826000213623047
49 2.543387673117898 2.3177974224090576
59 2.4614998427304355 2.1546740531921387
69 2.403803164308721 1.9900814294815063
79 2.33422927422957 2.0553269386291504
89 2.2794611291451887 1.8800286054611206
99 2.237360813400962 1.7929385900497437
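One detail about the training loss printed above: `np.mean(losses)` gives every mini-batch the same weight, so if the last batch is smaller than the others, the result differs slightly from the true per-sample average. The sketch below shows the difference with made-up batch losses and sizes (all numbers are assumptions for illustration).

```python
import numpy as np

batch_losses = [2.0, 2.0, 5.0]  # hypothetical mean loss of each mini-batch
batch_sizes = [10, 10, 5]       # the last batch is smaller

unweighted = np.mean(batch_losses)                        # every batch counts equally
weighted = np.average(batch_losses, weights=batch_sizes)  # every sample counts equally

print(unweighted, weighted)
```

The training loss is also accumulated while the weights are still being updated within the epoch, whereas the validation loss is computed once with the end-of-epoch weights. This may be part of why the validation loss can come out lower than the training loss.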


We can view the current state of our model using the `state_dict` method.

In [42]:
model.state_dict()

OrderedDict([('0.weight', tensor([[-0.0169, -1.9129],
                      [-0.2556, -0.2137],
                      [ 0.4382,  0.3790],
                      [ 1.9701,  0.1989],
                      [ 0.1641,  0.9642]], device='cuda:0')),
             ('0.bias',
              tensor([ 0.6931, -0.0449,  0.7120, -1.1062,  0.2659], device='cuda:0')),
             ('2.weight',
              tensor([[-0.3304, -0.0695, -0.1616,  0.1920, -0.0792],
                      [ 1.6074, -0.0333,  0.3871,  1.9815,  0.7275],
                      [ 2.1149,  0.4199,  0.4397,  1.8962,  0.4107],
                      [ 1.8056,  0.0822,  0.4862,  1.5626,  0.7789],
                      [-0.4971, -0.1192,  0.5471, -1.6378,  0.7071]], device='cuda:0')),
             ('2.bias',
              tensor([ 0.0255,  0.3121,  0.5950, -0.0270,  0.1713], device='cuda:0')),
             ('4.weight',
              tensor([[-0.2820,  0.9930,  0.8562,  1.0883, -1.0211]], device='cuda:0')),
             ('4.bias', tensor