# Learning PyTorch with Examples

https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

```python
import numpy as np
import math
import torch
import torchvision
import matplotlib.pyplot as plt
```

## Tensors

```python
dtype = torch.float
device = torch.device("cpu")

x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize the four coefficients of the cubic
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(2000):

    # Prediction in the form of a cubic function
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Loss: sum of squared errors
    loss = (y_pred - y).pow(2).sum().item()

    # Print every 100 iterations
    if t % 100 == 99:
        print("Iteration:", t, "\nloss:", loss)

    # Backprop: compute the gradients of the loss with respect to a, b, c, d by hand
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights with gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a.item()} + {b.item()}x + {c.item()} x^2 + {d.item()} x^3')
```

```
Iteration: 99
loss: 997.5477905273438
Iteration: 199
loss: 685.270263671875
Iteration: 299
loss: 472.1541748046875
Iteration: 399
loss: 326.544189453125
Iteration: 499
loss: 226.942626953125
Iteration: 599
loss: 158.73399353027344
Iteration: 699
loss: 111.97013854980469
Iteration: 799
loss: 79.87225341796875
Iteration: 899
loss: 57.8158073425293
Iteration: 999
loss: 42.64226531982422
Iteration: 1099
loss: 32.19221496582031
Iteration: 1199
loss: 24.98740005493164
Iteration: 1299
loss: 20.014644622802734
Iteration: 1399
loss: 16.57879066467285
Iteration: 1499
loss: 14.202404022216797
Iteration: 1599
loss: 12.557086944580078
Iteration: 1699
loss: 11.416814804077148
Iteration: 1799
loss: 10.625794410705566
Iteration: 1899
loss: 10.07652759552002
Iteration: 1999
loss: 9.69480037689209
Result: y = 0.02768823131918907 + 0.8432229161262512x + -0.004776681307703257 x^2 + -0.09140757471323013 x^3
```


## Observations

This is a simple algorithm that uses gradient descent to approximate $\sin(x)$ with a learned cubic polynomial $a + bx + cx^2 + dx^3$. The `device` argument lets the same Torch code run on either the CPU or a GPU, and the `dtype` argument specifies the data type of each tensor.
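
A minimal sketch of choosing the device at runtime, assuming a CUDA GPU may or may not be present: `torch.cuda.is_available()` decides, and the code falls back to the CPU otherwise. The variable names mirror the cell above.

```python
import math
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float  # 32-bit floating point

# Tensors are created directly on the chosen device with the chosen dtype.
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

print(x.device, x.dtype)  # e.g. "cpu torch.float32" on a machine without a GPU
```

Nothing else in the training loop has to change: every tensor derived from `x` inherits its device and dtype.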

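This example is very similar to what the equivalent NumPy code would look like, except that Torch tensors can use GPU acceleration and the device and data type are set explicitly when each tensor is created. As in NumPy, the gradients are still derived and coded by hand.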
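
To make the NumPy comparison concrete, here is a sketch of the same cubic fit written with NumPy arrays only. The seed and the `losses` list are additions for reproducibility; the loop body otherwise matches the Torch cell line for line, but runs only on the CPU.

```python
import math
import numpy as np

# Same data: 2000 points of sin(x) on [-pi, pi].
x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

# Random initial coefficients (seeded so the run is reproducible).
rng = np.random.default_rng(0)
a, b, c, d = rng.standard_normal(4)

learning_rate = 1e-6
losses = []
for t in range(2000):
    # Forward pass: cubic prediction.
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Sum-of-squares loss.
    loss = np.square(y_pred - y).sum()
    losses.append(loss)
    if t % 100 == 99:
        print(t, loss)

    # Manually derived gradients, identical to the Torch version.
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Gradient descent step.
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f"Result: y = {a} + {b} x + {c} x^2 + {d} x^3")
```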

## Autograd

## Observations

The notable change is the `requires_grad=True` argument added when creating the tensors a, b, c, and d. It tells autograd to record the operations performed on a tensor so that its gradient can be computed. This parameter defaults to `False`.
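
A small sketch of what the flag changes, using an arbitrary function $3a^2$ chosen only for illustration:

```python
import torch

# requires_grad defaults to False: autograd does not track this tensor.
plain = torch.randn(())
print(plain.requires_grad)  # False

# With requires_grad=True, operations on the tensor are recorded,
# and their results carry a grad_fn that points back into the graph.
a = torch.randn((), requires_grad=True)
out = 3 * a ** 2
print(a.requires_grad, out.grad_fn is not None)  # True True

# After backward(), the gradient d(out)/da = 6a is stored in a.grad.
out.backward()
print(torch.allclose(a.grad, 6 * a.detach()))  # True
```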

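The backpropagation of the loss then reduces to a single call:

<p style="text-align: center;">loss.backward()</p>

This works because autograd records the operations of the forward pass as a computational graph. `loss.backward()` traverses that graph in reverse (reverse-mode automatic differentiation) and stores the gradient of the loss with respect to each of a, b, c, and d in `a.grad`, `b.grad`, `c.grad`, and `d.grad`, replacing the hand-derived gradients of the previous example. For this to work, `loss` must stay a tensor rather than being converted to a Python number with `.item()`.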
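
A sketch of the cubic fit rewritten with autograd, assuming the same data, learning rate, and iteration count as the Tensors cell. The `losses` list is added only so the run can be checked.

```python
import math
import torch

dtype = torch.float
device = torch.device("cpu")

x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# requires_grad=True asks autograd to track these coefficients.
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
losses = []
for t in range(2000):
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Keep the loss as a tensor (no .item()) so backward() can be called on it.
    loss = (y_pred - y).pow(2).sum()
    losses.append(loss.item())
    if t % 100 == 99:
        print(t, loss.item())

    # Autograd fills a.grad, b.grad, c.grad and d.grad.
    loss.backward()

    # The update itself must not be recorded in the graph.
    with torch.no_grad():
        for param in (a, b, c, d):
            param -= learning_rate * param.grad
            # Gradients accumulate across backward() calls, so reset them.
            param.grad = None

print(f"Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3")
```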

A following example shows that a new differentiable function can be defined by subclassing `torch.autograd.Function` and implementing its static `forward` and `backward` methods; autograd then calls the custom `backward` whenever it differentiates through that function.
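
A sketch of such a subclass, using the degree-3 Legendre polynomial $P_3(x) = \tfrac{1}{2}(5x^3 - 3x)$ as the custom function (chosen here for illustration), whose derivative is $P_3'(x) = \tfrac{3}{2}(5x^2 - 1)$. `torch.autograd.gradcheck` compares the hand-written `backward` against finite differences.

```python
import torch

class LegendrePolynomial3(torch.autograd.Function):
    """P3(x) = 0.5 * (5x^3 - 3x), with a hand-written derivative."""

    @staticmethod
    def forward(ctx, input):
        # Save the input; backward needs it to evaluate the derivative.
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        # dP3/dx = 1.5 * (5x^2 - 1), chained with the incoming gradient.
        return grad_output * 1.5 * (5 * input ** 2 - 1)

# Custom Functions are invoked through .apply, not by instantiating the class.
P3 = LegendrePolynomial3.apply

x = torch.linspace(-1, 1, 5, dtype=torch.double, requires_grad=True)
P3(x).sum().backward()
print(x.grad)  # 1.5 * (5x^2 - 1) evaluated at each point

# gradcheck needs double-precision inputs that require gradients.
sample = torch.randn(10, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(P3, (sample,)))  # True
```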

## nn Module

```python
# -*- coding: utf-8 -*-
import torch
import math


# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network. Let's prepare the
# tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# In the above code, x.unsqueeze(-1) has shape (2000, 1), and p has shape
# (3,); broadcasting semantics then apply to produce a tensor of shape
# (2000, 3).

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. The Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
# The Flatten layer flattens the output of the linear layer to a 1D tensor,
# to match the shape of `y`.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

# The nn package also contains definitions of popular loss functions; in this
# case we use the squared-error loss MSELoss. With reduction='sum' it returns
# the sum, not the mean, of the squared errors.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for t in range(2000):

    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(xx)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

# You can access the first layer of `model` like accessing the first item of a list
linear_layer = model[0]

# For linear layer, its parameters are stored as `weight` and `bias`.
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
```
```
99 1333.317138671875
199 891.6704711914062
299 597.5769653320312
399 401.6537170410156
499 271.07073974609375
599 183.9946746826172
699 125.90037536621094
799 87.12100219726562
899 61.22007369995117
999 43.91058349609375
1099 32.33544158935547
1199 24.589935302734375
1299 19.40355682373047
1399 15.92824935913086
1499 13.59780216217041
1599 12.033838272094727
1699 10.983478546142578
1799 10.277424812316895
1899 9.80244255065918
1999 9.48260498046875
Result: y = 0.015420329757034779 + 0.8360465168952942 x + -0.0026602635625749826 x^2 + -0.09038680046796799 x^3
```


## Observations

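<p style="text-align: center;">p = torch.tensor([1, 2, 3])</p>

`p` holds the exponents 1, 2, and 3 used to build the tensor xx, which is just a transformation of the data stored in x: each row of xx holds $x, x^2, x^3$ for one sample.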

<p style="text-align: center;">xx = x.unsqueeze(-1).pow(p)</p>

The `unsqueeze(-1)` method inserts a new dimension of size 1 at the end of the tensor it is called on. Without it, x has size [2000] rather than [2000, 1]. The extra dimension is needed to raise x to the powers in p: under broadcasting rules, a size of [2000, 1] combined with [3] gives [2000, 3], whereas [2000] and [3] are incompatible and raise an error.
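
A short sketch of the shapes involved; the `broadcast_failed` flag is added only to record the outcome of the failing case.

```python
import math
import torch

x = torch.linspace(-math.pi, math.pi, 2000)
p = torch.tensor([1, 2, 3])

print(x.shape)                # torch.Size([2000])
print(x.unsqueeze(-1).shape)  # torch.Size([2000, 1])

# (2000, 1) broadcast against (3,) gives (2000, 3); column j holds x ** p[j].
xx = x.unsqueeze(-1).pow(p)
print(xx.shape)               # torch.Size([2000, 3])

# Without unsqueeze, shapes (2000,) and (3,) cannot be broadcast together.
broadcast_failed = False
try:
    x.pow(p)
except RuntimeError as err:
    broadcast_failed = True
    print("broadcast error:", err)
```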

<p style="text-align: center;">model = torch.nn.Sequential(<br>
    torch.nn.Linear(3, 1),<br>
    torch.nn.Flatten(0, 1)<br>
)</p>

The code above builds a model with three input features ($x, x^2, x^3$) and one output. `torch.nn.Sequential` is a container that applies its modules in the order they are passed to the constructor. The `Linear(3, 1)` layer is the model's only layer (there is no hidden layer): it computes a weighted sum of the three features plus a bias, giving an output of shape (2000, 1). `torch.nn.Flatten(0, 1)` then merges dimensions 0 and 1 into one, producing shape (2000,) to match the true solution, y.
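
A sketch that checks these shapes, using random numbers as a stand-in for the (2000, 3) feature tensor xx:

```python
import torch

# Same architecture as in the cell above.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1),
)

linear = model[0]
print(linear.weight.shape, linear.bias.shape)  # torch.Size([1, 3]) torch.Size([1])

xx = torch.randn(2000, 3)  # stand-in for the real feature tensor
hidden = linear(xx)
print(hidden.shape)        # torch.Size([2000, 1])
print(model(xx).shape)     # torch.Size([2000]), the same shape as y

# The Linear layer computes xx @ W^T + b.
print(torch.allclose(hidden, xx @ linear.weight.T + linear.bias, atol=1e-5))  # True
```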

<p style="text-align: center;">loss_fn = torch.nn.MSELoss(reduction='sum')<br>
    loss = loss_fn(y_pred, y)</p>

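This code computes the loss from the predicted values and the true solution, y. With `reduction='sum'`, `MSELoss` returns the sum of the squared errors rather than their mean, so it is the same quantity as the loss computed by hand in the Tensors example.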
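
A tiny worked example of the two reductions, with three made-up values:

```python
import torch

loss_fn = torch.nn.MSELoss(reduction='sum')

y_pred = torch.tensor([0.5, 1.0, -2.0])
y = torch.tensor([0.0, 1.0, -1.0])

# reduction='sum' adds the squared errors: 0.25 + 0.0 + 1.0 = 1.25
print(loss_fn(y_pred, y).item())  # 1.25

# Identical to the hand-written loss from the Tensors example.
print((y_pred - y).pow(2).sum().item())  # 1.25

# The default reduction='mean' divides by the number of elements (here 1.25 / 3).
mean_loss = torch.nn.MSELoss()(y_pred, y).item()
print(mean_loss)
```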

At each iteration the weights are updated by gradient descent. The update is wrapped in `torch.no_grad()` so that autograd does not record it as part of the computational graph:
<p style="text-align: center;">with torch.no_grad():<br>
        for param in model.parameters():<br>
            param -= learning_rate * param.grad</p>
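
PyTorch's `torch.optim` package wraps this kind of update. The sketch below uses made-up random data and an arbitrary learning rate (both assumptions, not the notebook's values) to check that one step of `torch.optim.SGD` with its default settings (no momentum, no weight decay) matches the manual update.

```python
import torch

torch.manual_seed(0)
xx = torch.randn(100, 3)  # hypothetical features
y = torch.randn(100)      # hypothetical targets

def make_model():
    torch.manual_seed(1)  # identical initial weights for both copies
    return torch.nn.Sequential(torch.nn.Linear(3, 1), torch.nn.Flatten(0, 1))

loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-3

# 1) Manual update, as in the notebook.
manual = make_model()
manual.zero_grad()
loss_fn(manual(xx), y).backward()
with torch.no_grad():
    for param in manual.parameters():
        param -= learning_rate * param.grad

# 2) The same step with torch.optim.SGD.
optimized = make_model()
optimizer = torch.optim.SGD(optimized.parameters(), lr=learning_rate)
optimizer.zero_grad()
loss_fn(optimized(xx), y).backward()
optimizer.step()

# Both models end up with the same parameters.
for p_manual, p_opt in zip(manual.parameters(), optimized.parameters()):
    print(torch.allclose(p_manual, p_opt))  # True
```

The optimizer version also removes the need to remember `torch.no_grad()` in the training loop, since `optimizer.step()` performs the update without recording it.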