
# Learning PyTorch with Examples

This tutorial introduces the fundamental concepts of **PyTorch** through self-contained examples.

At its core, PyTorch provides two main features:

* An *n-dimensional* Tensor, similar to a NumPy array but able to run on GPUs
* Automatic differentiation for building and training neural networks

## Tensors

NumPy provides an n-dimensional array object and many functions for manipulating these arrays.

In [None]:
# Fit y = sin(x) with a third-order polynomial using only NumPy
import numpy as np
import math

# Create input and output data
x = np.linspace(-math.pi, math.pi, 20000)

y = np.sin(x)

print(x)
print(y)

[-3.14159265 -3.14127848 -3.1409643  ...  3.1409643   3.14127848
  3.14159265]
[-1.22464680e-16 -3.14174969e-04 -6.28349907e-04 ...  6.28349907e-04
  3.14174969e-04  1.22464680e-16]


In [None]:
# Randomly initialize weights
a = np.random.randn()
b = np.random.randn()
c = np.random.randn()
d = np.random.randn()

print(a)
print(b)
print(c)
print(d)

-0.8146623655005207
-1.6856753979805996
-0.6123274389539223
1.767770404568969


In [None]:
# The loss sums over all 20000 points, so the gradients grow with the number
# of points; the step size must be correspondingly small (here 1e-7) for
# gradient descent to stay stable.
learning_rate = 1e-7
for t in range(2000):
  # Forward pass: compute predicted y
  # y = a + b x + c x^2 + d x^3
  y_pred = a + b*x + c*x**2 + d*x**3

  # Compute and print loss
  loss = np.square(y_pred - y).sum()
  if t % 100 == 99:
    print(t, loss)

  # Backprop to compute gradients of a, b, c, d with respect to loss
  grad_y_pred = 2.0 * (y_pred - y)
  grad_a = grad_y_pred.sum()
  grad_b = (grad_y_pred * x).sum()
  grad_c = (grad_y_pred * x ** 2).sum()
  grad_d = (grad_y_pred * x ** 3).sum()

  # Update weights
  a -= learning_rate * grad_a
  b -= learning_rate * grad_b
  c -= learning_rate * grad_c
  d -= learning_rate * grad_d

print(f"Result: y = {a} + {b} x + {c} x^2 + {d} x^3")
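The four hand-written gradients in the loop above are the components of a single matrix expression. If `X` stacks the features (1, x, x^2, x^3) as columns and `w = (a, b, c, d)`, the loss is ||Xw - y||^2 and its gradient is 2 X^T (Xw - y). A minimal sketch of that vectorized form follows, assuming 2000 points and a fixed random seed (both illustrative choices). It also computes the largest eigenvalue `lam_max` of the Hessian 2 X^T X, because plain gradient descent on this quadratic loss is only stable for learning rates below 2 / `lam_max`, and it uses `np.linalg.lstsq` (not part of the original notebook) purely as a closed-form reference for the fitted coefficients.

```python
import math
import numpy as np

# Same target as above; 2000 points is an illustrative choice.
x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

# Design matrix with columns (1, x, x^2, x^3), so that y_pred = X @ w with w = (a, b, c, d).
X = np.stack([np.ones_like(x), x, x**2, x**3], axis=1)

# Largest eigenvalue of the Hessian 2 X^T X of the summed squared error.
# Gradient descent on this quadratic loss is stable only for learning_rate < 2 / lam_max.
lam_max = np.linalg.eigvalsh(2.0 * X.T @ X).max()
learning_rate = 1e-6
assert learning_rate < 2.0 / lam_max

rng = np.random.default_rng(0)
w = rng.standard_normal(4)
for t in range(2000):
    residual = X @ w - y          # y_pred - y
    grad = 2.0 * X.T @ residual   # (grad_a, grad_b, grad_c, grad_d) in one product
    w -= learning_rate * grad

# Closed-form least-squares solution, used only as a reference point.
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)
print("gradient descent:", w)
print("least squares:   ", w_star)
```

Because `lam_max` grows linearly with the number of points when the loss is a sum, using ten times as many points roughly divides the largest stable learning rate by ten.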


## PyTorch: Tensors

NumPy is a great framework, but it cannot utilize GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater.

Here we introduce the most fundamental PyTorch concept: the **Tensor**.

A PyTorch Tensor is conceptually identical to a NumPy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors.

To run a Tensor on a GPU, you only need to specify the correct device.
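The close relationship between Tensors and NumPy arrays can be seen directly: on the CPU, `torch.from_numpy` wraps an existing array without copying it. The sketch below (names such as `arr`, `dev`, `u` and `v` are illustrative) shows that memory sharing and a common pattern for choosing a device, falling back to the CPU when `torch.cuda.is_available()` returns False.

```python
import numpy as np
import torch

# torch.from_numpy wraps a CPU NumPy array without copying, so both share memory.
arr = np.linspace(0.0, 1.0, 5)
t = torch.from_numpy(arr)
arr[0] = 42.0
print(t[0].item())  # 42.0: the Tensor sees the change made through the array

# Choose a device: CUDA if this machine has it, otherwise the CPU.
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors can be created directly on a device or moved there with .to().
u = torch.linspace(-1.0, 1.0, 5, device=dev)
v = t.to(device=dev, dtype=torch.float32)

# Operands must live on the same device; .cpu() copies a result back to host memory.
total = (u + v).cpu()
print(dev, total)
```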

In [None]:
import torch
import math

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # uncomment to run on a CUDA GPU

# Create input and output data
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# randomly initialize weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

# The loss is summed over all 2000 points, so the step size must be small.
learning_rate = 1e-6

for t in range(2000):
  # forward pass: compute predicted y
  y_pred = a + b*x + c*x**2 + d*x**3

  # compute and print loss
  loss = (y_pred - y).pow(2).sum().item()
  if t % 100 == 99:
    print(t, loss)

  # Backprop to compute gradients of a, b, c, d with respect to loss
  grad_y_pred = 2.0 * (y_pred - y)
  grad_a = grad_y_pred.sum()
  grad_b = (grad_y_pred * x).sum()
  grad_c = (grad_y_pred * x ** 2).sum()
  grad_d = (grad_y_pred * x ** 3).sum()

  # update weights using gradient descent
  a -= learning_rate * grad_a
  b -= learning_rate * grad_b
  c -= learning_rate * grad_c
  d -= learning_rate * grad_d

print(f"Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3")


## Autograd

Manually implementing the backward pass is not a big deal for a small model like this one, but it can quickly get very hairy for large, complex networks.

Thankfully, we can use **automatic differentiation** to automate the computation of backward passes. The **autograd** package in PyTorch provides exactly this functionality.
When using autograd, the forward pass of your network defines a **computational graph**; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then lets you compute gradients easily.

Now we no longer need to manually implement the backward pass through the network.
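A tiny example of the computational graph described above, using only core autograd: the forward pass records how `out` was produced from the leaf tensor `s` (an illustrative name), and `backward()` walks that graph in reverse to fill in `s.grad`.

```python
import torch

# A leaf tensor that autograd should track.
s = torch.tensor(2.0, requires_grad=True)

# The forward pass records a graph: the result remembers the operation that produced it.
out = s ** 3 + 2 * s
print(out.grad_fn)  # the graph node that produced out (an addition here)

# backward() traverses the graph in reverse and stores d(out)/ds in s.grad.
out.backward()
print(s.grad)  # d/ds (s^3 + 2s) = 3s^2 + 2 = 14 at s = 2
```

Leaf tensors created by the user, such as `s`, have no `grad_fn`; only results of tracked operations do.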

In [None]:
import torch
import math

dtype = torch.float
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device")
torch.set_default_device(device)

Using cpu device


In [None]:
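# Create input and output data on the default device
x = torch.linspace(-math.pi, math.pi, 2000, dtype=dtype)
y = torch.sin(x)

# Create random weights; requires_grad=True asks autograd to track them
a = torch.randn((), dtype=dtype, requires_grad=True)
b = torch.randn((), dtype=dtype, requires_grad=True)
c = torch.randn((), dtype=dtype, requires_grad=True)
d = torch.randn((), dtype=dtype, requires_grad=True)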

In [None]:
learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y using operations on Tensors.
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss using operations on Tensors.
    # Now loss is a scalar (0-dimensional) Tensor of shape ()
    # loss.item() gets the scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call a.grad, b.grad, c.grad and d.grad will be Tensors holding
    # the gradient of the loss with respect to a, b, c, d respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f"Result: y = {a.item()} + {b.item()}x + {c.item()}x^2 + {d.item()} x^3")

99 2763.3330078125
199 1852.7413330078125
299 1244.0357666015625
399 836.8740844726562
499 564.3440551757812
599 381.8023376464844
699 259.44683837890625
799 177.37184143066406
899 122.27376556396484
999 85.25555419921875
1099 60.36354446411133
1199 43.610862731933594
1299 32.32604217529297
1399 24.71729850769043
1499 19.582271575927734
1599 16.113296508789062
1699 13.767494201660156
1799 12.179561614990234
1899 11.103515625
1999 10.373579978942871
Result: y = 0.027593832463026047 + 0.8279563188552856x + -0.004760395735502243x^2 + -0.08923603594303131 x^3


## PyTorch: Defining new autograd functions

Under the hood, each primitive autograd operator is really two functions that operate on Tensors.
The **forward** function computes output Tensors from input Tensors.
The **backward** function receives the gradient of the output Tensors with respect to some scalar value, and computes the gradient of the input Tensors with respect to that same scalar value.

In PyTorch we can define our own autograd operator by defining a subclass of `torch.autograd.Function` and implementing the `forward` and `backward` functions.
We can then use our new autograd operator by calling its `apply` method like a function, passing Tensors containing input data.
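Hand-written `backward` methods are easy to get wrong, so one common habit (not something this notebook does) is to verify them with `torch.autograd.gradcheck`, which compares the analytic gradient against finite differences. The sketch below uses a hypothetical `CubeFunction` rather than the Legendre example so that it stays self-contained; `gradcheck` expects double-precision inputs with `requires_grad=True`.

```python
import torch


class CubeFunction(torch.autograd.Function):
    """Hypothetical example: f(x) = x^3 with a hand-written backward."""

    @staticmethod
    def forward(ctx, inp):
        ctx.save_for_backward(inp)  # keep the input for the backward pass
        return inp ** 3

    @staticmethod
    def backward(ctx, grad_output):
        (inp,) = ctx.saved_tensors
        # Chain rule: dL/dx = dL/dy * dy/dx = grad_output * 3x^2
        return grad_output * 3 * inp ** 2


cube = CubeFunction.apply  # custom Functions are called through .apply

# gradcheck compares backward() against numerical finite differences and
# raises an error if they disagree; it returns True on success.
inp = torch.randn(8, dtype=torch.double, requires_grad=True)
ok = torch.autograd.gradcheck(cube, (inp,))
print(ok)  # True
```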

In [None]:
import torch
import math

class LegendPolynomial3(torch.autograd.Function):
  """
  We can implement our own custom autograd Functions by subclassing
  torch.autograd.Function and implementing the forward and backward passes
  which operate on Tensors.
  """

  @staticmethod
  def forward(ctx, input):
    """
    In the forward pass we receive a Tensor containing the input and return
    a Tensor containing the output. ctx is a context object that can be used
    to stash information for backward computation. You can cache tensors for
    use in the backward pass using the ``ctx.save_for_backward`` method. Other
    objects can be stored directly as attributes on the ctx object, such as
    ``ctx.my_object = my_object``. Check out `Extending torch.autograd <https://docs.pytorch.org/docs/stable/notes/extending.html#extending-torch-autograd>`_
    for further details.
    """
    ctx.save_for_backward(input)
    return 0.5 * (5 * input ** 3 - 3 * input)

  @staticmethod
  def backward(ctx, grad_output):
    """
    In the backward pass we receive a Tensor containing the gradient of the loss
    with respect to the output, and we need to compute the gradient of the loss
    with respect to the input.
    """

    input, = ctx.saved_tensors
    return grad_output * 1.5 * (5 * input ** 2 -1)
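The `backward` method above returns `grad_output * 1.5 * (5 * input ** 2 - 1)`. As a worked check, the derivative follows directly from the definition of the Legendre polynomial used in `forward`, and the chain rule supplies the `grad_output` factor:

$$
P_3(x) = \tfrac{1}{2}\left(5x^3 - 3x\right)
\quad\Longrightarrow\quad
P_3'(x) = \tfrac{1}{2}\left(15x^2 - 3\right) = \tfrac{3}{2}\left(5x^2 - 1\right)
$$

$$
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial P_3}\,P_3'(x) = \texttt{grad\_output}\cdot 1.5\left(5x^2 - 1\right)
$$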

In [None]:
dtype = torch.float
device = torch.device("cpu")

# Create Tensors to hold input and outputs.
# By default, requires_grad=False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Create Tensors for the weights. For this example the weights need to be
# initialized not too far from the correct result to ensure convergence.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
a = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
b = torch.full((), -1.0, device=device, dtype=dtype, requires_grad=True)
c = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
d = torch.full((), 0.3, device=device, dtype=dtype, requires_grad=True)

# To apply our Function, we use the Function.apply method; we alias it as 'P3'.
P3 = LegendPolynomial3.apply

# The loss is summed over all 2000 points, so a small step size is needed.
learning_rate = 5e-6
for t in range(2000):
  # Forward pass: compute predicted y using our custom autograd operation P3
  y_pred = a + b * P3(c + d * x)

  # Compute and print loss (sum of squared errors)
  loss = (y_pred - y).pow(2).sum()
  if t % 100 == 99:
    print(t, loss.item())

  # Use autograd to compute the backward pass.
  loss.backward()

  # Update weights using gradient descent
  with torch.no_grad():
    a -= learning_rate * a.grad
    b -= learning_rate * b.grad
    c -= learning_rate * c.grad
    d -= learning_rate * d.grad

  # Manually zero the gradients after updating weights.
  a.grad = None
  b.grad = None
  c.grad = None
  d.grad = None

print(f'Result: y = {a.item()} + {b.item()} * P3({c.item()} + {d.item()} x)')


## PyTorch: `nn`

Computational graphs and autograd are a very powerful paradigm for defining complex operators and automatically taking derivatives; however, for large neural networks raw autograd can be a bit too low-level.

When building neural networks we frequently think of arranging the computation into **layers**, some of which have **learnable parameters** which will be optimized during learning.

In PyTorch, the `nn` package serves this purpose: it provides higher-level abstractions over raw computational graphs that are useful for building neural networks.

The `nn` package defines a set of **Modules**, which are roughly equivalent to neural network layers.
A Module receives input Tensors and computes output Tensors, but may also hold internal state such as Tensors containing learnable parameters.
The `nn` package also defines a set of useful loss functions that are commonly used when training neural networks.
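To make the idea of a Module holding learnable parameters concrete, the sketch below inspects a `torch.nn.Linear(3, 1)` like the one used in the model that follows, and checks that calling it matches `inputs @ weight.T + bias`. The seed and tensor names are illustrative.

```python
import torch

torch.manual_seed(0)

# A Module stores its learnable parameters as Tensors with requires_grad=True.
layer = torch.nn.Linear(3, 1)
for name, param in layer.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)
# weight (1, 3) True
# bias (1,) True

# Calling the module runs its forward(); for Linear that is inputs @ weight.T + bias.
inputs = torch.randn(4, 3)
out = layer(inputs)
manual = inputs @ layer.weight.T + layer.bias
print(torch.allclose(out, manual, atol=1e-6))
```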



In [None]:
import torch
import math

# build tensors to hold input and outputs
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

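# For this example, the output y is a linear function of (x, x^2, x^3), so we
# can treat it as a linear-layer neural network. Prepare the tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)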


In [None]:
print(p)
print(xx)

tensor([1, 2, 3])
tensor([[ -3.1416,   9.8696, -31.0063],
        [ -3.1384,   9.8499, -30.9133],
        [ -3.1353,   9.8301, -30.8205],
        ...,
        [  3.1353,   9.8301,  30.8205],
        [  3.1384,   9.8499,  30.9133],
        [  3.1416,   9.8696,  31.0063]])


In [None]:
print(p.shape)
print(xx.shape)

torch.Size([3])
torch.Size([2000, 3])
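The shape `(2000, 3)` comes from broadcasting: `x.unsqueeze(-1)` has shape `(2000, 1)` and `p` has shape `(3,)`, so `pow` raises every point to every power. A short check (with illustrative names `pts`, `powers` and `features`) that this matches building the columns explicitly with `torch.stack`:

```python
import math
import torch

pts = torch.linspace(-math.pi, math.pi, 2000)
powers = torch.tensor([1, 2, 3])

# (2000, 1) broadcast against (3,) gives (2000, 3); column j holds pts ** powers[j].
features = pts.unsqueeze(-1).pow(powers)

# The same matrix built explicitly, column by column.
explicit = torch.stack([pts, pts ** 2, pts ** 3], dim=1)

print(features.shape)  # torch.Size([2000, 3])
print(torch.allclose(features, explicit))
```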


In [None]:
# Use the nn package to define our model as a sequence of layers.
# nn.Sequential is a Module that contains other Modules and applies them in
# sequence to produce its output. The Linear Module computes output from input
# using a linear function and holds internal Tensors for its weights and bias.
# The Flatten layer flattens the output of the linear layer to a 1D tensor
# to match the shape of `y`.
model = torch.nn.Sequential(
    torch.nn.Linear(3,1),
    torch.nn.Flatten(0,1)
)

In [None]:
print(model)

Sequential(
  (0): Linear(in_features=3, out_features=1, bias=True)
  (1): Flatten(start_dim=0, end_dim=1)
)


In [None]:
loss_fn = torch.nn.MSELoss(reduction="sum")
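With `reduction="sum"` the loss adds up the squared errors rather than averaging them (`"mean"` is the default), so both the loss and its gradients grow with the number of samples; this is one reason the learning rates in this notebook are so small. A quick comparison on random data (the names are illustrative):

```python
import torch

torch.manual_seed(0)
pred = torch.randn(5)
target = torch.randn(5)

# reduction="sum" adds the squared errors; the default reduction="mean" averages them.
sum_loss = torch.nn.MSELoss(reduction="sum")(pred, target)
mean_loss = torch.nn.MSELoss()(pred, target)

print(torch.allclose(sum_loss, ((pred - target) ** 2).sum()))    # True
print(torch.allclose(mean_loss, ((pred - target) ** 2).mean()))  # True
```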

In [None]:
# The loss is summed over all 2000 points, so a small step size is needed.
learning_rate = 1e-6
for t in range(2000):
  # Forward pass: compute predicted y by passing xx to the model
  y_pred = model(xx)

  # Compute the loss
  loss = loss_fn(y_pred, y)

  # Zero the gradients before running the backward pass
  model.zero_grad()

  # Backward pass: compute the gradient of the loss with respect to all learnable
  # parameters of the model. Each Module stores its parameters in Tensors with
  # requires_grad=True, so this call computes gradients for all of them.
  loss.backward()

  # Update the weights using gradient descent. Each parameter is a Tensor, so
  # we can access its gradient as before (the `optim` package automates this).
  with torch.no_grad():
    for param in model.parameters():
      param -= learning_rate * param.grad

# You can access the first layer of `model` like accessing the first item of a list
linear_layer = model[0]

print(f"Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3")


## PyTorch: optim

Up to this point we have updated the weights of our models by manually mutating the Tensors holding learnable parameters inside `torch.no_grad()`. This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks using more sophisticated optimizers such as `Adagrad`, `RMSprop`, and `Adam`.

The `optim` package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used optimization algorithms.
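As a minimal illustration of what `optimizer.step()` does, here is a single step of plain `torch.optim.SGD` on a toy loss (the tensor `w` and the loss are hypothetical). It should match the manual update `w - lr * w.grad` used in the earlier sections:

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()   # dloss/dw = 2w = [2, -4]
opt.zero_grad()
loss.backward()
opt.step()              # plain SGD: w <- w - lr * w.grad

print(w.detach())  # approximately [0.8, -1.6]
```

Swapping in another algorithm, for example `torch.optim.Adam([w], lr=0.1)`, changes only the constructor line; the `zero_grad` / `backward` / `step` loop stays the same.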

In [None]:
import torch
import math


# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Prepare the input tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use RMSprop; the optim package contains many other
# optimization algorithms. The first argument to the RMSprop constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

for t in range(2000):
  y_pred = model(xx)

  loss = loss_fn(y_pred, y)
  if t % 100 == 99:
    print(t, loss.item())

  optimizer.zero_grad()

  loss.backward() # backpropagation

  # Calling the step function on an optimizer makes an update to its parameters
  optimizer.step()

linear_layer = model[0]

print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

99 2615.024169921875
199 915.942138671875
299 706.7383422851562
399 627.2535400390625
499 538.1360473632812
599 439.22088623046875
699 340.0784912109375
799 250.116455078125
899 174.33290100097656
999 114.22319030761719
1099 69.51390838623047
1199 39.00996398925781
1299 20.828262329101562
1399 12.626038551330566
1499 9.353293418884277
1599 9.099508285522461
1699 8.966208457946777
1799 8.89753532409668
1899 8.89608097076416
1999 8.930171012878418
Result: y = -0.0006351800402626395 + 0.8562504053115845 x + -0.0006351760821416974 x^2 + -0.09384316205978394 x^3


## PyTorch: Custom `nn` Modules

Sometimes you will want to specify models that are more complex than a sequence of existing Modules.

For these cases you can define your own Modules by subclassing `nn.Module` and defining a `forward` method, which receives input Tensors and produces output Tensors using other modules or other autograd operations on Tensors.
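Assigning a `torch.nn.Parameter` as an attribute inside `__init__` is what registers it with the module, which is how `model.parameters()` later finds it for the optimizer. A minimal sketch with a hypothetical `Line` module fitting exactly linear data (the names `line`, `xs` and `ys` are illustrative):

```python
import torch


class Line(torch.nn.Module):
    """Hypothetical minimal module: y = slope * x + offset."""

    def __init__(self):
        super().__init__()
        # Assigning nn.Parameter attributes registers them with the module.
        self.slope = torch.nn.Parameter(torch.tensor(0.0))
        self.offset = torch.nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        return self.slope * x + self.offset


line = Line()
print([name for name, _ in line.named_parameters()])  # ['slope', 'offset']

# Because the parameters are registered, line.parameters() can be handed to an optimizer.
xs = torch.linspace(-1.0, 1.0, 50)
ys = 3.0 * xs + 0.5
opt = torch.optim.SGD(line.parameters(), lr=0.1)
for _ in range(500):
    loss = ((line(xs) - ys) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(line.slope.item(), line.offset.item())  # close to 3.0 and 0.5
```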

In [None]:
import torch
import math

class Polynomial3(torch.nn.Module):
  def __init__(self):
    """
    In this constructor we instantiate four parameters and assign them as member parameters.
    """

    super().__init__()
    self.a = torch.nn.Parameter(torch.randn(()))
    self.b = torch.nn.Parameter(torch.randn(()))
    self.c = torch.nn.Parameter(torch.randn(()))
    self.d = torch.nn.Parameter(torch.randn(()))

  def forward(self, x):
    """
    In the forward function we accept a Tensor of input data and we must return
    a Tensor of output data. We can use Modules defined in the constructor as
    well as arbitrary operators on Tensors.
    """

    return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

  def string(self):
    """
    Just like any class in Python, you can also define custom methods on PyTorch modules
    """
    return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3'


In [None]:
# Build Tensors to hold input and outputs
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

In [None]:
# Construct our model by instantiating the class defined above
model = Polynomial3()

In [None]:
criterion = torch.nn.MSELoss(reduction='sum')
# The loss is summed over all 2000 points, so a small learning rate is needed
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)

In [None]:
for t in range(2000):
  # Forward pass: compute predicted y by passing x to the model
  y_pred = model(x)

  # Compute and print loss
  loss = criterion(y_pred, y)
  if t % 100 == 99:
    print(t, loss.item())

  # Zero gradients, perform a backward pass, and update the weights
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()

print(f'Result: {model.string()}')


## PyTorch: Control Flow + Weight Sharing

As an example of dynamic graphs and weight sharing, we implement a very strange model: a third-to-fifth order polynomial that on each forward pass chooses a random number between 3 and 5 and uses that many orders, reusing the same weights multiple times to compute the fourth and fifth order.

For this model we can use normal Python flow control to implement the loop, and we can implement weight sharing by simply reusing the same parameter multiple times when defining the forward pass.
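Weight sharing works because autograd accumulates gradients: when the same parameter appears in several places of the graph, its gradient is the sum of the contributions. A minimal check with an illustrative parameter `e` reused for the x^4 and x^5 terms, which are added by an ordinary Python loop in the same spirit as `DynamicNet`:

```python
import torch

e = torch.tensor(0.5, requires_grad=True)
xv = torch.tensor(2.0)

# Ordinary Python control flow decides the graph: one term per loop iteration.
out = torch.zeros(())
for exp in range(4, 6):          # orders 4 and 5, both using the same e
    out = out + e * xv ** exp

# Autograd sums the contributions from every place e was used.
out.backward()
print(e.grad)  # 48 = x^4 + x^5 at x = 2
```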

In [None]:
import random
import torch
import math

class DynamicNet(torch.nn.Module):
  def __init__(self):
    """
    In the constructor we instantiate five parameters and assign them as members.
    """
    super().__init__()
    self.a = torch.nn.Parameter(torch.randn(()))
    self.b = torch.nn.Parameter(torch.randn(()))
    self.c = torch.nn.Parameter(torch.randn(()))
    self.d = torch.nn.Parameter(torch.randn(()))
    self.e = torch.nn.Parameter(torch.randn(()))

  def forward(self, x):
    """
    For the forward pass of the model, we randomly choose either 3, 4, or 5 as the
    polynomial order and reuse the e parameter to compute the contribution of orders 4 and 5.

    Since each forward pass builds a dynamic computation graph, we can use normal
    Python control-flow operators like loops or conditional statements when
    defining the forward pass of the model.

    Here we also see that it is perfectly safe to reuse the same parameter many
    times when defining a computational graph.
    """

    y = self.a + self.b * x + self.c * x **2 + self.d * x ** 3
    for exp in range(4, random.randint(4,6)):
      y = y + self.e * x ** exp
    return y

  def string(self):
    """
    Just like any class in Python, you can also define custom methods on PyTorch modules
    """
    return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3 + {self.e.item()} x^4 ? + {self.e.item()} x^5 ?'

# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Construct our model by instantiating the class defined above
model = DynamicNet()

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-8, momentum=0.9)
for t in range(30000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 2000 == 1999:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')

1999 1971.2418212890625
3999 872.4619140625
5999 457.234619140625
7999 224.2855987548828
9999 111.95813751220703
11999 58.53325653076172
13999 32.564537048339844
15999 20.350635528564453
17999 14.531449317932129
19999 11.725642204284668
21999 10.215021133422852
23999 9.4893798828125
25999 9.14384937286377
27999 9.002721786499023
29999 8.912214279174805
Result: y = -0.00845723133534193 + 0.8542687296867371 x + 0.0009974547429010272 x^2 + -0.09323780238628387 x^3 + 9.15713535505347e-05 x^4 ? + 9.15713535505347e-05 x^5 ?
