In [None]:
from __future__ import print_function
%matplotlib inline

## Tensors

Central to all neural networks in PyTorch is the tensor class. Tensors are similar to NumPy’s ndarrays, with the following additions:
* Tensors can also be used on a GPU to accelerate computing.
* Tensors can be set to automatically track all operations on them and compute gradients for backprop.
* Tensors can be converted to NumPy ndarrays and vice versa.

In [None]:
import torch

# construct tensors of a given shape
torch.zeros(2,3)                  # default dtype float32
torch.ones(2,3)                   # default dtype float32
torch.empty(2,3)                  # default dtype float32, un-initialized
torch.rand(2,3)                   # default dtype float32, uniform on [0, 1)
torch.Tensor(2,3)                 # default dtype float32, un-initialized

# construct tensors and initialize them to defined values
torch.tensor([[2,3,4],[1,2,3]])   # dtype inferred from the data: int64
torch.Tensor([[2,3,4],[1,2,3]])   # always float32; does not accept dtype or requires_grad arguments
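
The difference between the two constructors is easiest to see by printing the resulting dtypes. The following is a minimal sketch, assuming PyTorch's default floating-point type has not been changed from `float32`; the variable names `a`, `b`, and `c` are only illustrative.

```python
import torch

# torch.tensor infers the dtype from the data it is given.
a = torch.tensor([[2, 3, 4], [1, 2, 3]])

# torch.Tensor always produces the default floating-point type.
b = torch.Tensor([[2, 3, 4], [1, 2, 3]])

# An explicit dtype overrides the inference in torch.tensor.
c = torch.tensor([[2, 3, 4], [1, 2, 3]], dtype=torch.float32)

print(a.dtype, b.dtype, c.dtype)
print(a.shape)
```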

### Tensor `device` attribute
  
Tensors can be moved from CPU to GPU devices or vice versa with the `.to()` method, which takes the target `torch.device`. The read-only `device` attribute reports where a tensor currently resides.
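
A minimal sketch of moving a tensor between devices with `.to()`. It falls back to the CPU when no CUDA device is visible, so it should run on either kind of machine; the names `x_dev` and `x_cpu` are illustrative.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise stay on the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.ones(2, 3)        # created on the CPU by default
x_dev = x.to(device)        # a tensor on the target device
x_cpu = x_dev.to('cpu')     # moved back to the CPU

print(x.device, x_dev.device, x_cpu.device)
```

Note that `.to()` returns a tensor rather than changing `x` in place, so the result has to be assigned to a name.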

### Tensor `requires_grad` attribute

The `requires_grad` attribute enables automatic differentiation. When a tensor's `requires_grad` attribute is set to `True`, PyTorch tracks all operations on it. Afterward, when `.backward()` is called on a result computed from the tensor, the gradients are computed automatically and accumulated in the tensor's `grad` attribute.

Let's start by setting the `requires_grad` attribute to `False` and observe the value of `*.grad_fn` and `*.grad`.

In [None]:
# gradient, requires_grad = False example

x = torch.ones(1, requires_grad=False)
print("Tensor created with default value where requires_grad=False")
print(x, x.dtype, x.requires_grad, x.device)

# perform math operation
out = x.pow(2)
print("* grad_fn wrt x:", x.grad_fn)
print("* grad wrt x:", x.grad)
print("* grad_fn wrt out:", out.grad_fn)
print("* grad wrt out:", out.grad)

# This would raise an error: x has requires_grad=False, so out has no grad_fn for backprop
#out.backward()  

Now, set `requires_grad` to `True`. Math operations on the tensor will be tracked and gradients will be calculated once the `.backward` function is called.

In [None]:
# gradient, requires_grad = True
# example with multidimensional tensor

x = torch.ones(3, 1, requires_grad=True)
print("Tensor created with requires_grad=True")
print(x, x.dtype, x.requires_grad, x.device)

out = x.pow(2).sum()
print("Output value:", out, "size:", out.size())
print("* grad_fn wrt x:", x.grad_fn)
print("* grad wrt x:", x.grad)
print("* grad_fn wrt out:", out.grad_fn)
print("* grad wrt out:", out.grad)

print("\nCall .backward function for gradient calculations:")
out.backward()
print("* grad_fn wrt x:", x.grad_fn)
print("* grad wrt x:", x.grad)
print("* grad_fn wrt out:", out.grad_fn)
print("* grad wrt out:", out.grad)
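
Gradients are accumulated in `.grad` rather than overwritten, which is why the later training loops reset them on every iteration. The sketch below illustrates this with hand-picked values; the names `first` and `second` are illustrative.

```python
import torch

x = torch.tensor([[1.0], [2.0], [3.0]], requires_grad=True)

# First backward pass: d/dx sum(x^2) = 2x
x.pow(2).sum().backward()
first = x.grad.clone()
print("after one backward call:", first)

# A second pass adds to the stored gradient instead of replacing it.
x.pow(2).sum().backward()
second = x.grad.clone()
print("after two backward calls:", second)

# Clear the gradient in place before the next pass.
x.grad.zero_()
print("after zero_():", x.grad)
```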

### NumPy ndarray and Tensor

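A tensor can be converted to a NumPy ndarray and vice versa. For CPU tensors, the conversions used below share memory with the source instead of copying it.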

In [None]:
# numpy ndarray to tensor and vice versa

import numpy as np
np_v = np.ones(5)

# create tensor from numpy
tensor_v = torch.from_numpy(np_v)
print("np_v:", np_v)
print("tensor_v:", tensor_v)

# convert tensor to numpy
x = tensor_v.numpy()
print("x:", x)
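
`torch.from_numpy` does not copy the data: the tensor and the array refer to the same memory. A short sketch of what that means in practice, with `torch.tensor` shown as the copying alternative (the name `copy_v` is illustrative):

```python
import numpy as np
import torch

np_v = np.ones(5)
tensor_v = torch.from_numpy(np_v)   # shares memory with np_v

# An in-place change to the array is visible through the tensor.
np_v[0] = 10.0
print(tensor_v)

# torch.tensor copies the data, so the copy ignores later changes.
copy_v = torch.tensor(np_v)
np_v[1] = 20.0
print(tensor_v)
print(copy_v)
```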

## Supervised Learning Training Examples

The majority of practical machine learning uses supervised learning. Supervised learning is used to learn the relationships (mapping function) from existing input and output data.  The goal is to be able to use the learned relationships to predict the output data from new input data.

### Model Training Flow

Training a model involves looping through several fundamental steps:

* Define model.
* Prepare input and target (label) data in a format that can be consumed by the model.
* Run the data through the computations defined by the model.
* Get the prediction (output).
* Compute loss by comparing prediction to target.
* Minimize loss by using an optimization algorithm to adjust the learned variables (weights, biases, ...).
![](img/trainingFlow.png)

### Simple Linear Regression

Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables:

* One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
* The other variable, denoted y, is regarded as the response, outcome, or dependent variable.

Linear regression uses a linear equation of the form:
$
   Y = WX + b
$

Where:
* **X**: input data
* **Y**: predicted data
* **W**: weight to be learned during training
* **b**: bias to be learned

Let's set the `device` variable to determine whether our example will be run on CPU or GPU.

In [None]:
device = torch.device('cpu')
#device = torch.device('cuda') # Uncomment this to run on GPU

For simplicity, in our linear regression example below, we will ignore the bias term.  So the relationship between our data (X, Y) is just Y = WX. We will try to determine (learn) the value of W.

We will create our data by letting X be a tensor of random values and Y be twice X, so the value of W we expect to learn is 2.

To begin with, all training math operations will be performed manually. These include: gradient and loss calculations, weight adjustment, etc.
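
The manual gradient follows from the loss used in this example, the sum of squared errors. As a sketch of the derivation, write the prediction as $\hat{Y} = XW$ (the code computes `x.mm(w)`, with one sample per row of X):

$
L = \sum_i (\hat{Y}_i - Y_i)^2, \qquad
\frac{\partial L}{\partial \hat{Y}} = 2(\hat{Y} - Y), \qquad
\frac{\partial L}{\partial W} = X^{T} \frac{\partial L}{\partial \hat{Y}}
$

Gradient descent then updates $W \leftarrow W - \eta \, \partial L / \partial W$, where $\eta$ is the learning rate. In the code, these two partial derivatives are named `grad_y_pred` and `grad_w`.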

In [None]:
# Linear Regression
import torch

# N is batch size; D_in is input dimension;
# D_out is output dimension.
N, D_in, D_out = 64, 1, 1

# Prepare sample data
x = torch.randn(N, D_in, device=device)
y = 2*x

# Randomly initialize weights
w = torch.randn(D_in, D_out, device=device)
print(w, w.size())

learning_rate = 1e-4
for t in range(1000):    
    # Forward pass: compute predicted y
    y_pred = x.mm(w)
    
    # Compute and print loss; loss is a scalar, and is stored in a PyTorch Tensor
    # of shape (); we can get its value as a Python number with loss.item()
    loss=(y_pred - y).pow(2).sum()
    if (t%100 ==0):
        print(t, loss.item())

    # Backprop to compute the gradient of the loss with respect to w
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w = x.t().mm(grad_y_pred)
  
    # Update weights using gradient descent
    w -= learning_rate * grad_w
      
print(w)        
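
One way to gain confidence in a hand-written gradient is to compare it with the one autograd produces for the same loss. The following is a small sketch of such a check, not part of the training loop; the fixed seed and the name `grad_w_manual` are illustrative choices.

```python
import torch

torch.manual_seed(0)
x = torch.randn(64, 1)
y = 2 * x
w = torch.randn(1, 1, requires_grad=True)

# Gradient computed by autograd for the sum-of-squares loss.
loss = (x.mm(w) - y).pow(2).sum()
loss.backward()

# Gradient computed with the manual formula from the loop above.
with torch.no_grad():
    grad_y_pred = 2.0 * (x.mm(w) - y)
    grad_w_manual = x.t().mm(grad_y_pred)

print("autograd:", w.grad)
print("manual:  ", grad_w_manual)
```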

Although gradients can be calculated manually, the operations are tedious and error-prone, especially with complex neural networks. In the following example, we will replace the loss calculations with a PyTorch pre-defined loss function. We will also use autograd for gradient computations.

In [None]:
# Linear regression using autograd, pre-defined loss function
import torch

# N is batch size; D_in is input dimension;
# D_out is output dimension.
N, D_in, D_out = 64, 1, 1

# Prepare data
x = torch.randn(N, D_in, device=device)
y = 2*x

# Randomly initialize weights, enable autograd
w = torch.randn(D_in, D_out, device=device, requires_grad=True)
print(w, w.size())

# Use PyTorch pre-defined loss function
loss_fn = torch.nn.MSELoss(reduction='sum')  # for PyTorch 0.4.1 and later
#loss_fn = torch.nn.MSELoss(size_average=False) # for PyTorch 0.4.0

learning_rate = 1e-4
for t in range(1000):
    
    # Forward pass: compute predicted y
    y_pred = x.mm(w)

    # Compute and print loss; loss is a scalar, and is stored in a PyTorch Tensor
    # of shape (); we can get its value as a Python number with loss.item().
    loss = loss_fn(y_pred, y)  
    if (t%100 ==0):
        print(t, loss.item())

    # Call backward to compute the gradient of the loss with respect to w
    loss.backward()
    
    with torch.no_grad():
        # Update weights using gradient descent
        w -= learning_rate * w.grad
    w.grad.zero_()

print(w)
x = torch.randn(1, D_in, device=device)
print(x)
print(x.mm(w))

PyTorch includes a number of optimization algorithms for adjusting trainable parameters. In the next example, we will use one of those optimizers and call its `.step()` function to adjust the weights.

In [None]:
# Linear regression using PyTorch optimizer functions
import torch

# N is batch size; D_in is input dimension;
# D_out is output dimension.
N, D_in, D_out = 64, 1, 1

# Prepare data
x = torch.randn(N, D_in, device=device)
y = 2*x

# Randomly initialize weights, enable autograd
w = torch.randn(D_in, D_out, device=device, requires_grad=True)
print(w, w.size())

# Use PyTorch pre-defined loss function
loss_fn = torch.nn.MSELoss(reduction='sum')  # for PyTorch 0.4.1 and later
#loss_fn = torch.nn.MSELoss(size_average=False) # for PyTorch 0.4.0

learning_rate = 1e-4
# Use PyTorch pre-defined optimizer
optimizer = torch.optim.SGD([w], lr=learning_rate)

for t in range(1000):
    
    # Forward pass: compute predicted y
    y_pred = x.mm(w)

    # Compute and print loss; loss is a scalar, and is stored in a PyTorch Tensor
    # of shape (); we can get its value as a Python number with loss.item().
    loss = loss_fn(y_pred, y)  
    if (t%100 ==0):
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
print(w)
x = torch.randn(1, D_in, device=device)
print(x)
print(x.mm(w))
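
For plain SGD without momentum or weight decay, `optimizer.step()` performs the same update as the manual `w -= learning_rate * w.grad` used earlier. The sketch below checks this on a single parameter with hand-picked values; the names `opt` and `expected` are illustrative.

```python
import torch

w = torch.tensor([[1.0]], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

# loss = (3w)^2, so d(loss)/dw = 18w
loss = (3.0 * w).pow(2).sum()
loss.backward()

# What the manual update rule would produce.
expected = w.detach() - 0.1 * w.grad

# Let the optimizer apply the update.
opt.step()

print("optimizer:", w.detach())
print("manual:   ", expected)
```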

PyTorch's `nn` package defines a set of Modules. A Module can be thought of as a neural network layer that produces output from input and may hold trainable weights. In the following example, we will use the `torch.nn.Linear` module.

In [None]:
# Linear regression using the torch.nn.Linear module
import torch

# N is batch size; D_in is input dimension;
# D_out is output dimension.
N, D_in, D_out = 64, 1, 1

# Prepare data
x = torch.randn(N, D_in, device=device)
y = 2*x

# Use PyTorch pre-defined loss function
loss_fn = torch.nn.MSELoss(reduction='sum')  # for PyTorch 0.4.1 and later
#loss_fn = torch.nn.MSELoss(size_average=False) # for PyTorch 0.4.0

# Linear model
model=torch.nn.Linear(D_in, D_out, bias=False).to(device)

learning_rate = 1e-4
# Use PyTorch pre-defined optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for t in range(1000):
    
    # Forward pass: compute predicted y
    y_pred = model(x)

    # Compute and print loss; loss is a scalar, and is stored in a PyTorch Tensor
    # of shape (); we can get its value as a Python number with loss.item().
    loss = loss_fn(y_pred, y)  
    if (t%100 ==0):
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
for p in model.parameters():
    print("parameter:",p.data.size(), p.data)
       
x = torch.randn(2, D_in, device=device)
print('Input', x)
print('Predict', model(x))
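
`torch.nn.Linear` stores its weight with shape `(out_features, in_features)` and multiplies the input by the transpose, which is why the learned parameter plays the same role as `w` in the earlier `x.mm(w)` examples. A brief sketch with small, arbitrary sizes (the names `layer` and `manual` are illustrative):

```python
import torch

layer = torch.nn.Linear(3, 2, bias=False)   # 3 inputs, 2 outputs
print(layer.weight.shape)                    # stored as (out_features, in_features)

x = torch.randn(4, 3)

# The same computation written out with an explicit matrix product.
manual = x.mm(layer.weight.t())

print(layer(x))
print(manual)
```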

### Multilayer Network

A multilayer network stacks several single-layer networks. The input is fed to the first layer, and each subsequent layer uses the output of the previous layer as its input, which makes the whole a feed-forward network.

In the following example, we replace the simple regression model with a fully-connected ReLU network from https://github.com/jcjohnson/pytorch-examples.

In [None]:
# Multilayer model
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

model = torch.nn.Sequential(
          torch.nn.Linear(D_in, H),
          torch.nn.ReLU(),
          torch.nn.Linear(H, D_out),
        ).to(device)

# Use PyTorch pre-defined loss function
loss_fn = torch.nn.MSELoss(reduction='sum')  # for PyTorch 0.4.1 and later
#loss_fn = torch.nn.MSELoss(size_average=False) # for PyTorch 0.4.0
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for t in range(500):
    # Forward pass: compute predicted y
    y_pred = model(x) 

    # Compute and print loss; loss is a scalar, and is stored in a PyTorch Tensor
    # of shape (); we can get its value as a Python number with loss.item().
    loss = loss_fn(y_pred, y)  
    if (t%100 == 0):
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
for p in model.parameters():
    print("parameter:",p.data.size(), p.data)
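
Printing every parameter value of a network this size produces a lot of output. A more compact way to inspect what the optimizer is training is to list parameter names and shapes, as in the sketch below. It rebuilds the same `Sequential` architecture on the CPU; `n_params` is an illustrative name.

```python
import torch

D_in, H, D_out = 1000, 100, 10
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# Each Linear layer contributes a weight and a bias; ReLU has no parameters.
for name, p in model.named_parameters():
    print(name, tuple(p.shape))

# Total number of trainable values.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```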

The following example (from https://pytorch.org/tutorials/beginner/examples_nn/two_layer_net_module.html) shows how to build a customized model for multilayer networks that can't be built with a sequence of existing Modules.

In [None]:
# Custom nn modules
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary (differentiable) operations on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

device = torch.device('cpu')
# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Construct our model by instantiating the class defined above.
model = TwoLayerNet(D_in, H, D_out)
model.to(device)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
loss_fn = torch.nn.MSELoss(reduction='sum')  # for PyTorch 0.4.1 and later
#loss_fn = torch.nn.MSELoss(size_average=False) # for PyTorch 0.4.0
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(1000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = loss_fn(y_pred, y)
    if (t%100 == 0):
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
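
The `forward` method above applies the ReLU activation by calling `.clamp(min=0)` instead of using a `torch.nn.ReLU` module. The two give the same result, as this small sketch with hand-picked values suggests (the tensor `h` is illustrative):

```python
import torch

h = torch.tensor([[-1.5, 0.0, 2.0],
                  [3.0, -0.5, 0.25]])

# Negative entries are replaced by zero; the rest pass through unchanged.
print(h.clamp(min=0))
print(torch.relu(h))
```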

## References

* https://github.com/jcjohnson/pytorch-examples
* https://machinelearningmastery.com/linear-regression-for-machine-learning/
* https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/
* https://pytorch.org/tutorials/beginner/examples_nn/two_layer_net_module.htm