In [None]:
import torch
import numpy as np
import random

### Tensors

Tensors are similar to NumPy’s ndarrays, except that tensors can also
be placed on a GPU to accelerate computation.
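
As a minimal sketch of the GPU point above, the snippet below moves a tensor to a GPU when one is visible to PyTorch and falls back to the CPU otherwise. The variable names are illustrative only.

```python
import torch

# Pick a GPU if one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 3)       # tensors are created on the CPU by default
b = a.to(device)           # move to the chosen device (no-op if already there)
c = (b + b).to("cpu")      # compute on that device, then bring the result back

print(device, c)
```

The same code runs unchanged on machines with and without a GPU, which is why the device is chosen once and passed around.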



Construct a 5x3 matrix, uninitialized:



In [None]:
x = torch.empty(5, 3)
print(x)

Construct a randomly initialized matrix:



In [None]:
x = torch.rand(5, 3)
print(x)

Construct a matrix filled with zeros and of dtype long:



In [None]:
x = torch.zeros(5, 3, dtype=torch.long)
print(x)

Construct a tensor directly from data:



In [None]:
x = torch.tensor([5.5, 3])
print(x)

Or create a tensor based on an existing tensor. These methods
reuse properties of the input tensor, e.g. dtype, unless
new values are provided by the user:



In [None]:
x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)                                      # result has the same size

Get its size:



In [None]:
print(x.size())
print(x.shape)

Addition: syntax 1

In [None]:
y = torch.rand(5, 3)
print(x + y)

Addition: syntax 2



In [None]:
print(torch.add(x, y))

Addition: providing an output tensor as argument



In [None]:
result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)

Addition: in-place



In [None]:
# adds x to y
y.add_(x)
print(y)

Tensors support standard NumPy-style indexing:

In [None]:
print(x[:, 1])

Resizing: If you want to resize/reshape a tensor, you can use the tensor method ``view``:



In [None]:
import torch
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 4)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())

If you have a one-element tensor, use ``.item()`` to get the value as a
Python number:



In [None]:
x = torch.randn(1)
print(x)
print(x.item())


### NumPy Bridge


Converting a Torch Tensor to a NumPy array and vice versa is a breeze.

The Torch Tensor and NumPy array will share their underlying memory
locations (as long as the Torch Tensor is on the CPU), and changing one will change the other.

#### Converting a Torch Tensor to a NumPy Array

In [None]:
a = torch.ones(5)
print(a)

In [None]:
b = a.numpy()
print(b)

See how the NumPy array changed in value.



In [None]:
a.add_(1)
print(a)
print(b)

#### Converting a NumPy Array to a Torch Tensor

See how changing the NumPy array changed the Torch Tensor automatically.



In [None]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

# Autograd

When you perform operations on tensors, PyTorch can keep track of the computation graph so that it can backpropagate through it. To tell PyTorch to record operations performed on a tensor, set its `requires_grad` flag, for example by calling the tensor method `requires_grad_()`.

If at least one input to an operation requires gradient, its output also requires gradient. Conversely, the output does not require gradient only if none of the inputs do. Backward computation is never performed in subgraphs where no tensor requires gradients.

Autograd does not allow in-place operations on a leaf tensor that requires gradient. That is why `x.zero_()` raises an error if `x` is such a tensor.

For a tensor `x`, the underlying data is stored in a tensor that is accessible via `x.data`. If you perform an operation on `x.data`, PyTorch does not add the operation to the computation graph.
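
The rules above can be checked directly. The following is a small sketch with illustrative variable names: it shows how `requires_grad` propagates to outputs, that an in-place operation on a leaf tensor requiring gradient is rejected, and that the same operation is accepted (and not recorded) inside `torch.no_grad()`.

```python
import torch

x = torch.ones(3, requires_grad=True)   # leaf tensor tracked by autograd
c = torch.ones(3)                       # plain tensor, not tracked

# An output requires grad as soon as one of its inputs does.
tracked = (x * c).requires_grad
untracked = (c * c).requires_grad

# In-place modification of a leaf that requires grad raises a RuntimeError.
try:
    x.zero_()
    inplace_failed = False
except RuntimeError:
    inplace_failed = True

# Inside torch.no_grad() the same in-place op is allowed and not recorded.
with torch.no_grad():
    x.zero_()

print(tracked, untracked, inplace_failed, x)
```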

In [None]:
A = torch.randint(10, (1,2), dtype=torch.float)
print("A : ", A)

In [None]:
print("A.requires_grad :", A.requires_grad)

In [None]:
A.requires_grad_(True)
print("A.requires_grad :", A.requires_grad)

A.requires_grad_(False)
print("A.requires_grad :", A.requires_grad)

In [None]:
X = torch.Tensor([1, 2, 3]).requires_grad_(True)
Y = torch.Tensor([5, 6, 7]).requires_grad_(True)

f = torch.sin(torch.dot(X,Y))
print("f =", f)

How do we get the partial derivatives of $f$ w.r.t. $x$ and $y$?

- $f$ can be written as a composite function $f = h \circ g$

  $h(z) = \sin(z)$ with derivative $\dfrac{d h}{d z}(z) = \cos(z)$

  $g(x, y) =  \langle x , y \rangle$ with partial derivatives $\dfrac{\partial g}{\partial x}(x, y) = y$ and $\dfrac{\partial g}{\partial y}(x, y) = x$
  


- Using the chain rule, we can easily get the partial derivatives of $f$ w.r.t. $x$ and $y$:

  $\dfrac{\partial f}{\partial x}(x, y) = \cos\big( \langle x , y \rangle \big) \cdot y$

  and

  $\dfrac{\partial f}{\partial y}(x, y) = \cos\big( \langle x , y \rangle \big) \cdot x$
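
As a quick sanity check of the chain-rule result, the sketch below compares the closed-form expressions against what autograd returns. It uses `torch.autograd.grad`, which returns the gradients directly instead of storing them in `.grad`; the variable names are illustrative.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([5.0, 6.0, 7.0], requires_grad=True)

f = torch.sin(torch.dot(x, y))

# Gradients computed by autograd, returned as a tuple (df/dx, df/dy).
auto_dx, auto_dy = torch.autograd.grad(f, (x, y))

# Gradients from the chain-rule formulas; no graph tracking is needed here.
with torch.no_grad():
    inner = torch.dot(x, y)
    manual_dx = torch.cos(inner) * y
    manual_dy = torch.cos(inner) * x

print(torch.allclose(auto_dx, manual_dx), torch.allclose(auto_dy, manual_dy))
```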


In [None]:
dfdx = torch.cos(torch.dot(X,Y)) * Y
print("df / dx = ", dfdx)

In [None]:
dfdy = torch.cos(torch.dot(X,Y)) * X
print("df / dy = ", dfdy)

In [None]:
# Gradient is populated by the backward function

print("X.grad :", X.grad)
print("Y.grad :", Y.grad)

f.backward()

print("\n-- Backward --\n")
print("X.grad :", X.grad)
print("Y.grad :", Y.grad)

# Using Autograd for Optimization

We will minimize the function $f$ "by hand" using the gradient descent algorithm.

As a reminder, the update step of the algorithm is:
$$x_{t+1} = x_{t} - \lambda \nabla_x f (x_t)$$

Note:
- The gradient information $\nabla_x f (x)$ is stored in `x.grad`. Once we have run the `backward` function, we can use it to do our update step.
- We need to do `x.data = ...` in the update step since we want to change x in place but don't want autograd to track this change.
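
As an alternative to assigning to `x.data`, the same untracked update can be written inside a `torch.no_grad()` block. This is only a sketch of a single update step; the step size and starting point are hypothetical values chosen for illustration.

```python
import torch

lr = 0.1                                  # hypothetical step size
x = torch.tensor([3.0], requires_grad=True)

loss = (x ** 2).sum()
loss.backward()                           # fills x.grad with 2 * x = 6

with torch.no_grad():                     # operations here are not recorded
    x -= lr * x.grad                      # in-place update: 3 - 0.1 * 6 = 2.4
x.grad.zero_()                            # reset so the next backward() starts from zero

print(x, x.grad)
```

Either style keeps `x` a leaf tensor with `requires_grad=True`, so the next `backward()` call still populates `x.grad`.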

In [None]:
# Define a function that we want to minimize
def f(x):
    return x ** 2

Find the minimum by taking the derivative and setting it equal to 0:

$\dfrac{d f}{d x} = 2x$

$2x = 0$

The minimum is at $x = 0$.
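
Because the derivative is $2x$, the gradient descent update for this function simplifies to $x_{t+1} = x_t - \lambda \cdot 2 x_t = (1 - 2\lambda)\, x_t$, so $x_t = (1 - 2\lambda)^t x_0$. The plain-Python sketch below compares that closed form with the step-by-step update; the function names and the example values are assumptions made for illustration.

```python
import math

def gd_closed_form(x0, lr, t):
    """Closed-form iterate for f(x) = x**2: x_t = (1 - 2*lr)**t * x0."""
    return (1 - 2 * lr) ** t * x0

def gd_iterative(x0, lr, t):
    """Apply the update x <- x - lr * f'(x) with f'(x) = 2*x, t times."""
    x = x0
    for _ in range(t):
        x = x - lr * 2 * x
    return x

closed = gd_closed_form(100.0, 0.3, 15)
stepped = gd_iterative(100.0, 0.3, 15)
print(closed, stepped, math.isclose(closed, stepped, rel_tol=1e-9))
```

The closed form also shows when the iteration shrinks toward the minimum: the factor $|1 - 2\lambda|$ must be below 1, i.e. $0 < \lambda < 1$.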

In [None]:
x0 = 100
lr = .3
iterations = 15

# create a tensor w/ requires_grad set to True
x = torch.Tensor([x0]).requires_grad_()

for i in range(iterations):
    # get current prediction
    y = f(x)
    print('Before update: ',y.data)
    
    # make sure the gradients are zeroed out before you calculate them (the default is that they'll accumulate)
    if x.grad is not None:
        x.grad.zero_()
    
    # calculate gradients
    y.backward()
    
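    # update x through .data so autograd does not track the step
    x.data = x.data - lr * x.grad
    
    # re-evaluate f at the updated x (y still holds the pre-update value)
    print('After update: ', f(x).data)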

print('Minimum x value: ', x)

# How would you do linear regression using autograd?

In [None]:
# create a dataset with x from 1 to 99 (as floats, to match w and b)
x = torch.tensor(np.arange(1, 100, 1), dtype=torch.float)

# and y values w/ a slope of 20 and a y_intercept of 5, plus integer noise in [-2, 3] per point
gt_w = 20
gt_b = 5
y = gt_w * x + gt_b + torch.randint(-2, 4, x.shape)

In [None]:
#initialising weight and bias term
w = torch.tensor(0.,requires_grad=True)
b = torch.tensor(0.,requires_grad=True)

In [None]:
# TODO: Use pytorch and automatic differentiation to find the optimal w and b
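
One possible approach, given here only as a hedged sketch and not as the intended solution: minimize the mean squared error with the same hand-written gradient descent loop used earlier. The rescaling of `x`, the step size, and the iteration count are assumptions chosen so that plain gradient descent converges on this data.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

# Synthetic data in the spirit of the cells above: slope 20, intercept 5, small noise.
x = torch.arange(1, 100, dtype=torch.float)
y = 20 * x + 5 + torch.randint(-2, 4, x.shape)

# Assumption of this sketch: rescale x to (0, 1] to keep gradient descent stable.
x_scaled = x / x.max()

w = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
lr = 0.5  # hypothetical step size

for _ in range(2000):
    loss = ((w * x_scaled + b - y) ** 2).mean()   # mean squared error
    loss.backward()                               # fills w.grad and b.grad
    with torch.no_grad():                         # untracked parameter update
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()                                # gradients accumulate by default
    b.grad.zero_()

# Undo the rescaling to express the slope in the original units of x.
slope = w.item() / x.max().item()
intercept = b.item()
print(slope, intercept)
```

The recovered intercept absorbs the mean of the noise, so it is expected to land near the true intercept rather than exactly on it.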
