# Thinking in tensors in PyTorch

Deep learning for neuroscientists - hands-on training by [Piotr Migdał](https://p.migdal.pl) (2019). Version 0.2.


## Notebook 4: PyTorch optimization

We use linear regression as an example.

* [Linear regression](http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm)
* [Ordinary Least Squares Regression-Explained Visually](http://setosa.io/ev/ordinary-least-squares-regression/)
* [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient)

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt

import torch
from torch import nn
from torch.nn import Parameter

## Optimization

Let's start with a problem suited for linear regression.

In [None]:
x = torch.tensor([[1.0, 4.4], 
                  [1.0, 6.1], 
                  [1.0, 6.6], 
                  [1.0, 7.2], 
                  [1.0, 9.1], 
                  [1.0, 11.0]])
y = torch.tensor([1.4, 3.1, 3.6, 4.2, 6.1, 8.0])

noise = torch.randn(y.size())
y.add_(0.1 * noise)  # let's add noise to make it more complicated :)
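
Before optimizing anything, it can help to know what answer to expect. The sketch below solves the same least-squares problem in closed form with `torch.linalg.lstsq`. It assumes noise-free targets (the name `y_clean` is introduced here only for illustration), so the result should land very close to an intercept of -3 and a slope of 1.

```python
import torch

# Same design matrix as above: a column of ones (intercept) and one feature.
x = torch.tensor([[1.0, 4.4],
                  [1.0, 6.1],
                  [1.0, 6.6],
                  [1.0, 7.2],
                  [1.0, 9.1],
                  [1.0, 11.0]])

# Noise-free targets (an assumption of this sketch): y = -3 * 1 + 1 * feature.
y_clean = torch.tensor([1.4, 3.1, 3.6, 4.2, 6.1, 8.0])

# torch.linalg.lstsq expects a 2-D right-hand side, hence unsqueeze(-1).
weights_closed_form = torch.linalg.lstsq(x, y_clean.unsqueeze(-1)).solution

print(weights_closed_form)  # approximately [[-3.], [1.]]
```

With the noisy `y` used in this notebook, the fitted values will differ slightly from these.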

In [None]:
weights_ideal = torch.tensor([[-3.], [1.]])
weights = torch.randn((2, 1), requires_grad=True)

In [None]:
# after training, the weights should end up close to these values
weights_ideal

In [None]:
x.mm(weights_ideal).t()

In [None]:
# we start with random weights
weights

## Calculating functions each time

In [None]:
weights = torch.randn((2, 1), requires_grad=True)
loss1 = []

for i in range(1000):

    net_output = x.mm(weights).t()
    loss = (net_output - y).pow(2).mean()
    loss.backward()
    
    with torch.no_grad():  # the update itself must not be tracked by autograd
        weights -= 0.01 * weights.grad
    # gradients accumulate across backward() calls, so zero them after each update
    weights.grad.zero_()
    
    loss1.append(loss.item())
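
To see what `loss.backward()` computes here, the gradient of the mean squared error can also be written by hand as (2 / N) · Xᵀ(Xw − y). The following is a minimal check, with noise-free targets and an arbitrary seed chosen only for this sketch, that autograd and the hand-written formula agree.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the sketch reproducible

x = torch.tensor([[1.0, 4.4], [1.0, 6.1], [1.0, 6.6],
                  [1.0, 7.2], [1.0, 9.1], [1.0, 11.0]])
y = torch.tensor([1.4, 3.1, 3.6, 4.2, 6.1, 8.0])
weights = torch.randn((2, 1), requires_grad=True)

# Autograd: same forward pass and loss as in the loop above.
net_output = x.mm(weights).t()
loss = (net_output - y).pow(2).mean()
loss.backward()

# By hand: d/dw mean((Xw - y)^2) = (2 / N) * X^T (Xw - y)
residual = x.mm(weights.detach()) - y.unsqueeze(-1)  # shape (6, 1)
manual_grad = 2.0 / x.shape[0] * x.t().mm(residual)  # shape (2, 1)

print(weights.grad)
print(manual_grad)
```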

In [None]:
# new, modified weights
weights

In [None]:
def show_loss(losses, logy=False):
    print("Final loss: {:.3f}".format(losses[-1]))
    if logy:
        plt.semilogy(range(len(losses)), losses)
    else:
        plt.plot(range(len(losses)), losses)
    plt.xlabel("Step")
    plt.ylabel("Loss")

In [None]:
show_loss(loss1, logy=True)

## Module

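It is convenient to define a network as a class that subclasses `nn.Module`.
Wrap trainable tensors in `Parameter`. A `Parameter` assigned as a module attribute:

* has `requires_grad=True` by default,
* is registered as a trainable parameter, so it appears in `model.parameters()` and can be passed to an optimizer.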
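
The difference between a `Parameter` and a plain tensor attribute can be checked directly. The `Demo` class below is a hypothetical module written only for this illustration. It holds one of each and lists what the module actually registers.

```python
import torch
from torch import nn
from torch.nn import Parameter


class Demo(nn.Module):  # hypothetical module, for illustration only

    def __init__(self):
        super().__init__()
        self.weights = Parameter(torch.zeros(2, 1))  # registered and trainable
        self.plain = torch.zeros(2, 1)               # plain attribute, not registered


demo = Demo()
registered_names = [name for name, _ in demo.named_parameters()]

print(registered_names)              # ['weights']
print(demo.weights.requires_grad)    # True
print(demo.plain.requires_grad)      # False
```

Only `weights` would be seen by an optimizer built from `demo.parameters()`.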

In [None]:
class Model(nn.Module):
    
    def __init__(self):
        super(Model, self).__init__()
        self.weights = Parameter(torch.zeros(2, 1))
    
    def forward(self, x):
        output = x.mm(self.weights)
        return output.view(-1)

In [None]:
model = Model()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

In [None]:
list(model.parameters())

In [None]:
loss2 = []
for i in range(1000):
    
    outputs = model(x)
    loss = criterion(outputs, y)
    loss2.append(loss.item())
    
    optimizer.zero_grad()
    loss.backward()        
    optimizer.step()
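
For plain SGD without momentum or weight decay, `optimizer.step()` performs the same update as the manual rule from the previous section, `w -= lr * grad`. The sketch below compares one step of each. The helper `mse` and the names `w_manual` and `w_optim` are introduced here for illustration, and the targets are noise-free.

```python
import torch

x = torch.tensor([[1.0, 4.4], [1.0, 6.1], [1.0, 6.6],
                  [1.0, 7.2], [1.0, 9.1], [1.0, 11.0]])
y = torch.tensor([1.4, 3.1, 3.6, 4.2, 6.1, 8.0])
lr = 0.01

# Two copies of the same starting weights.
w_manual = torch.zeros(2, 1, requires_grad=True)
w_optim = torch.zeros(2, 1, requires_grad=True)
optimizer = torch.optim.SGD([w_optim], lr=lr)


def mse(w):
    # Mean squared error of the linear model x @ w against y.
    return (x.mm(w).view(-1) - y).pow(2).mean()


# One manual gradient-descent step.
mse(w_manual).backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad

# One step through the optimizer.
optimizer.zero_grad()
mse(w_optim).backward()
optimizer.step()

steps_match = torch.allclose(w_manual, w_optim)
print(steps_match)
```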

In [None]:
show_loss(loss2, logy=True)

In [None]:
list(model.parameters())

## Module - version with nn.Linear

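Some layers can be used directly from the `nn` library, which is slightly simpler than writing every expression by hand. Here `nn.Linear(2, 1, bias=False)` replaces the hand-written weight matrix; the bias is disabled because the column of ones in `x` already acts as the intercept.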
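
One detail worth knowing is the weight layout: `nn.Linear(in_features, out_features)` stores its weight with shape `(out_features, in_features)` and computes `x @ weight.T`. A small check, assuming the same `x` as above, that this matches the explicit `x.mm(weights)` used earlier:

```python
import torch
from torch import nn

x = torch.tensor([[1.0, 4.4], [1.0, 6.1], [1.0, 6.6],
                  [1.0, 7.2], [1.0, 9.1], [1.0, 11.0]])

fc = nn.Linear(2, 1, bias=False)
print(fc.weight.shape)  # torch.Size([1, 2])

# Transpose to the (2, 1) layout used by the hand-written models.
weights = fc.weight.detach().t()
same_output = torch.allclose(fc(x), x.mm(weights))
print(same_output)
```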


In [None]:
class Model(torch.nn.Module):
    
    def __init__(self):
        super(Model, self).__init__()
        self.fc = nn.Linear(2, 1, bias=False)
    
    def forward(self, x):
        output = self.fc(x)
        return output.view(-1)

In [None]:
model = Model()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

In [None]:
model

In [None]:
list(model.parameters())

In [None]:
loss3 = []
for i in range(1000):
    outputs = model(x)
    loss = criterion(outputs, y)
    loss3.append(loss.item())
    
    optimizer.zero_grad()
    loss.backward()        
    optimizer.step()

In [None]:
show_loss(loss3, logy=True)

In [None]:
outputs

## Sequential

For some simple models, we can just compose layers using `nn.Sequential`. Sometimes it's convenient, but often only for sub-parts of the network.
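
As a sketch of the "sub-part" case, a module can hold an `nn.Sequential` block next to other layers. The `TwoStage` class, its layer sizes, and the random input are hypothetical, chosen only to illustrate the pattern. They are not part of the regression problem in this notebook.

```python
import torch
from torch import nn


class TwoStage(nn.Module):  # hypothetical model, for illustration only

    def __init__(self):
        super().__init__()
        # A sub-part of the network composed with nn.Sequential.
        self.features = nn.Sequential(
            nn.Linear(2, 4),
            nn.ReLU(),
        )
        self.head = nn.Linear(4, 1)

    def forward(self, x):
        hidden = self.features(x)
        return self.head(hidden).view(-1)


model = TwoStage()
batch = torch.randn(6, 2)  # 6 rows with 2 features each
predictions = model(batch)

print(predictions.shape)   # torch.Size([6])
print(model.features[0])   # layers inside nn.Sequential can be indexed
```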

In [None]:
model = nn.Sequential(
    nn.Linear(2, 1)  # note: bias defaults to True here, unlike the bias=False model above
)
# or even: model = nn.Linear(2, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

In [None]:
model

In [None]:
loss4 = []
for i in range(1000):

    outputs = model(x)
    loss = criterion(outputs, y.unsqueeze(-1))
    # the model outputs shape (6, 1) while y has shape (6,), so we add a trailing
    # dimension to y; squeeze/unsqueeze are common in PyTorch for aligning shapes
    loss4.append(loss.item())
    
    optimizer.zero_grad()
    loss.backward()        
    optimizer.step()
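
The reason for `y.unsqueeze(-1)` is broadcasting: combining a `(6, 1)` tensor with a `(6,)` tensor silently produces a `(6, 6)` result, so the loss would compare every prediction with every target. A minimal illustration, using a zero tensor as a stand-in for the model output:

```python
import torch

outputs = torch.zeros(6, 1)  # stand-in with the shape nn.Linear(2, 1) gives for 6 rows
y = torch.tensor([1.4, 3.1, 3.6, 4.2, 6.1, 8.0])  # shape (6,)

mismatched = outputs - y             # (6, 1) against (6,) broadcasts to (6, 6)
aligned = outputs - y.unsqueeze(-1)  # (6, 1) against (6, 1) stays (6, 1)

print(mismatched.shape)  # torch.Size([6, 6])
print(aligned.shape)     # torch.Size([6, 1])
```

An alternative is to flatten the model output instead, as the `view(-1)` calls in the earlier `Model` classes do.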

In [None]:
show_loss(loss4, logy=True)