# Learning PyTorch with Examples

https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

In [6]:
import numpy as np
import math
import torch
import torchvision
import matplotlib.pyplot as plt

## Tensors

In [14]:
dtype =  torch.float
device  = torch.device("cpu")

x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)


learning_rate = 1e-6
for t in range(2000):
    
    #Prediction in the form of a cubic function
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    
    #Loss
    loss =  (y_pred - y).pow(2).sum().item()
    
    #print for every thousand iterations
    if t % 100 == 99:
        print("Iteration:", t, "\nloss:", loss)
        
    #Backprop to compute gradients with respect to loss, manually calculated
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()
    
    #Update weights with gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a.item()} + {b.item()}x + {c.item()} x^2 + {d.item()} x^3')
    

Iteration: 99 
loss: 782.5994873046875
Iteration: 199 
loss: 547.6012573242188
Iteration: 299 
loss: 384.28106689453125
Iteration: 399 
loss: 270.67474365234375
Iteration: 499 
loss: 191.58172607421875
Iteration: 599 
loss: 136.4716033935547
Iteration: 699 
loss: 98.04182434082031
Iteration: 799 
loss: 71.22300720214844
Iteration: 899 
loss: 52.49333190917969
Iteration: 999 
loss: 39.40369415283203
Iteration: 1099 
loss: 30.249523162841797
Iteration: 1199 
loss: 23.843446731567383
Iteration: 1299 
loss: 19.357749938964844
Iteration: 1399 
loss: 16.214862823486328
Iteration: 1499 
loss: 14.011587142944336
Iteration: 1599 
loss: 12.466182708740234
Iteration: 1699 
loss: 11.381658554077148
Iteration: 1799 
loss: 10.620211601257324
Iteration: 1899 
loss: 10.085333824157715
Iteration: 1999 
loss: 9.709456443786621
Result: y = -0.030294794589281082 + 0.848427414894104x + 0.005226354114711285 x^2 + -0.09214787185192108 x^3


## Observations

This is a simple algorithm that utilizes gradient descent to find an approximation to $sin(x)$ with a learned cubic polynomial $a + bx + cx^2 + dx^3$. Notably the mechanics of torch can be utilized to use CPU or GPU acceleration. The data for each of the Torch objects can also be specified.

This examples appears to be very similar to numpy except with GPU acceleration and better control of typing.

# Autograd

In [17]:
dtype =  torch.float
device  = torch.device("cpu")

x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

#Creates size zero tensors, scalars
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)


learning_rate = 1e-6
for t in range(2000):
    
    #Prediction in the form of a cubic function, tensor
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    
    #Loss, tensor
    loss =  (y_pred - y).pow(2).sum()
    
    #print for every thousand iterations
    if t % 100 == 99:
        print("Iteration:", t, "\nloss:", loss.item())
        
    #Backprop
    loss.backward()
    
    #Update weights with gradient descent, weights are updated manually
    #tracking of the gradient doesn't need to be found here
    with torch.no_grad():
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
        c -= learning_rate * grad_c
        d -= learning_rate * grad_d
        
        #Set the gradients to zero after updating the weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()}x + {c.item()} x^2 + {d.item()} x^3')
    

Iteration: 99 
loss: 13373.6513671875
Iteration: 199 
loss: 13377.8515625
Iteration: 299 
loss: 13382.1064453125
Iteration: 399 
loss: 13386.4169921875
Iteration: 499 
loss: 13390.7822265625
Iteration: 599 
loss: 13395.203125
Iteration: 699 
loss: 13399.6787109375
Iteration: 799 
loss: 13404.2109375
Iteration: 899 
loss: 13408.794921875
Iteration: 999 
loss: 13413.4375
Iteration: 1099 
loss: 13418.1357421875
Iteration: 1199 
loss: 13422.8876953125
Iteration: 1299 
loss: 13427.6953125
Iteration: 1399 
loss: 13432.5576171875
Iteration: 1499 
loss: 13437.4755859375
Iteration: 1599 
loss: 13442.44921875
Iteration: 1699 
loss: 13447.4765625
Iteration: 1799 
loss: 13452.560546875
Iteration: 1899 
loss: 13457.697265625
Iteration: 1999 
loss: 13462.8720703125
Result: y = 0.9407525062561035 + 0.027000483125448227x + 0.2496086061000824 x^2 + -0.11617597937583923 x^3


## Observations

Notice that notable changes include "requires_grad=True" a parameter add to the torch objects to  indicate if the tensor, object of torch, needs to have its gradient calculated. This parameter is captured as false by default.

The backpropogation of the loss is simply found using the following:

loss.backward()

This is effective because it uses auto-differentiation after calculating the gradients for a, b, c, and d.

A following example where a class for another function to differentiate can be made with "torch.autograd.Function" as a subclass of the created function and implementing forward and backward methods for auto-differentiation

## nn Module

In [28]:
# -*- coding: utf-8 -*-
import torch
import math


# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network. Let's prepare the
# tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# In the above code, x.unsqueeze(-1) has shape (2000, 1), and p has shape
# (3,), for this case, broadcasting semantics will apply to obtain a tensor
# of shape (2000, 3) 

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. The Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
# The Flatten layer flatens the output of the linear layer to a 1D tensor,
# to match the shape of `y`.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for t in range(2000):

    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(xx)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

# You can access the first layer of `model` like accessing the first item of a list
linear_layer = model[0]

# For linear layer, its parameters are stored as `weight` and `bias`.
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

99 265.669921875
199 178.87326049804688
299 121.41695404052734
399 83.38004302978516
499 58.19725799560547
599 41.523109436035156
699 30.481714248657227
799 23.169546127319336
899 18.326589584350586
999 15.118606567382812
1099 12.993467330932617
1199 11.585432052612305
1299 10.652438163757324
1399 10.03408145904541
1499 9.624232292175293
1599 9.352520942687988
1699 9.172361373901367
1799 9.052884101867676
1899 8.97361946105957
1999 8.921043395996094
Result: y = 0.0026803219225257635 + 0.8471440672874451 x + -0.0004624009889084846 x^2 + -0.09196533262729645 x^3


## Observations

p = torch.tensor([1,2,3])

This represents the powers for the tensor xx, which is just a transformation of the data stored in x.

<p style="text-align: center;">xx = x.unsqueeze(-1).pow(p)</p>

The unsqueeze method is used to change the dimensions of the tensor that is being past. Without the unsqueeze method the data will resemble size [2000] rather than size [2000, 1]. This is a necessary transformation because of the application of p.

<p style="text-align: center;">model = torch.nn.Sequential(<br>
    torch.nn.Linear(3, 1),<br>
    torch.nn.Flatten(0, 1)<br>
)</p>

The previous code describes the construction of a model with 3 inputs parameters(x^1, x^2, x^3) and one output. "torch.nn.Sequential" is a container that applies operations in the order that it is past to the constructor. The Linear layer describes a single hidden layer that is takes in the input parameters and outputs a tensor for the approximator. The "torch.nn.Flatten" command is utilized to create a value that matches the shape of the true solution, y.

<p style="text-align: center;">loss_fn = torch.nn.MSELoss(reduction='sum')<br>
    loss = loss_fn(y_pred, y)</p>

The previous section of code describes the computation of the loss function using the predicted data and the true solution, y.

This section of code represents changes iteratively to weights utilizing the gradient:
<p style="text-align: center;">with torch.no_grad():<br>
        for param in model.parameters():<br>
            param -= learning_rate * param.grad</p>