# Lecture09

In this assignment you will work with the basic building blocks of PyTorch, including tensors and autograd. To start, this will be very low-level PyTorch; in future lectures we will get into the higher-level functionality of PyTorch, which makes it really easy to build and train neural networks.

In [None]:
import torch
print(torch.__version__)
import numpy as np
import matplotlib.pyplot as plt

!pip install torchviz
from torchviz import make_dot

## Initializing tensors

### Tensor from Data

In [None]:
scalar = torch.tensor(5.)
vector = torch.tensor([5,4,3,2,1])
matrix = torch.tensor([[1,2,3],[4,5,6],[7,8,9]])

print(scalar)
print(vector)
print(matrix)
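As a small sketch of what `torch.tensor` infers from the data, the cell below inspects the `dtype` and `shape` of the same three tensors. It assumes PyTorch's default floating-point dtype has not been changed. The integer dtype matters later: only floating-point (and complex) tensors can have `requires_grad=True`.

```python
import torch

scalar = torch.tensor(5.)                  # float literal -> floating-point tensor
vector = torch.tensor([5, 4, 3, 2, 1])     # integer literals -> integer tensor
matrix = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# dtype is inferred from the Python data; shape follows the nesting of the lists
print(scalar.dtype, scalar.shape)   # 0-dimensional tensor (empty shape)
print(vector.dtype, vector.shape)   # 1-dimensional, 5 elements
print(matrix.dtype, matrix.shape)   # 2-dimensional, 3 rows by 3 columns
```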

### Tensor from NumPy

If you have data in a NumPy array, you can convert it to a PyTorch tensor using `torch.from_numpy`.

In [None]:
data_np = np.array([[2,5],[6,4]])
x_data = torch.from_numpy(data_np)

print(f'data_np:\n {data_np}')
print(f'x_data:\n {x_data}')
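One detail worth knowing: a tensor created with `torch.from_numpy` shares memory with the source array, whereas `torch.tensor` makes a copy. The sketch below illustrates the difference; `x_shared` and `x_copy` are names introduced only for this example.

```python
import numpy as np
import torch

data_np = np.array([[2, 5], [6, 4]])

x_shared = torch.from_numpy(data_np)   # shares memory with data_np
x_copy = torch.tensor(data_np)         # independent copy of the data

data_np[0, 0] = 100                    # modify the NumPy array in place

print(x_shared[0, 0])   # reflects the change made to data_np
print(x_copy[0, 0])     # still holds the original value
```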

### Random Tensor

The weights of a neural network are initialized with random values. PyTorch has `torch.rand` and `torch.randn` functions for generating random tensors with a specified size. For example, `torch.rand(2,3)` will create a tensor with size $2\times3$ (2 rows, 3 columns). Below, create two tensors, each with size $1000\times500$, using `torch.rand` for one and `torch.randn` for the other. After the tensors are initialized, plot histograms to show the distribution of each tensor.

In [None]:
rand_tensor = XXX
randn_tensor = XXX

fig, ax = plt.subplots(1,2, figsize = (10,4))
_ = ax[0].hist(torch.flatten(rand_tensor),bins=25,edgecolor='k')
_ = ax[1].hist(torch.flatten(randn_tensor),bins=25,edgecolor='k')
ax[0].set_title("torch.rand")
ax[1].set_title("torch.randn")


After viewing the histograms, what do you notice about the two distributions? What is the difference between `torch.rand` and `torch.randn`?

XXX

## Autograd

PyTorch provides automatic differentiation (autograd) functionality, enabling gradients to be tracked on tensor operations. 

Given the function

$$f = (x+y)\times z$$

with $x=5$, $y=-2$, $z=4$, use autograd to compute 

$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \\ \frac{\partial f}{\partial z}  \end{bmatrix}$$

First, create and initialize tensors for $x$, $y$, $z$. Since the goal is to compute $\nabla f$, make sure to use `requires_grad=True`.

In [None]:
x = XXX
y = XXX
z = XXX


Note that you can pass `requires_grad=True` when creating a tensor **or** set it after the tensor is created by calling the tensor's `requires_grad_(True)` method. PyTorch uses a trailing underscore in a method name to mark in-place operations.

In [None]:
x.requires_grad_(True)
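To illustrate the trailing-underscore convention, here is a minimal sketch using two hypothetical tensors, `w` and `v`, that are not part of the assignment. It contrasts an out-of-place call, which returns a new tensor, with its in-place counterpart, which modifies the tensor itself.

```python
import torch

w = torch.tensor(3.0)            # hypothetical tensor; gradients are not tracked by default
before = w.requires_grad         # False
w.requires_grad_(True)           # in-place: flips the flag on w itself
after = w.requires_grad          # True
print(before, after)

v = torch.tensor([1.0, -2.0])    # hypothetical tensor for the abs example
v_abs = v.abs()                  # out-of-place: returns a new tensor, v is unchanged
print(v, v_abs)
v.abs_()                         # in-place: v itself now holds the absolute values
print(v)
```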

Write the expression to compute $f$ using operations on the above tensors, i.e., the forward pass.


In [None]:
f = XXX
print(f)

All tensors with `requires_grad=True` have a `grad` attribute that stores the gradient with respect to that tensor. Below, try printing the `grad` value for each tensor.

In [None]:
print(x.grad)
print(y.grad)
print(z.grad)

Currently, no values are stored in the `grad` attributes. The function $f$ has been evaluated (i.e., the forward pass), but the backward pass, which calculates the gradients, has not been executed. The backward pass is executed by calling `backward()` on the tensor whose gradient you want to compute. Here we want $\nabla f$, so we call `f.backward()`.

In [None]:
f.backward()

print(x.grad)
print(y.grad)
print(z.grad)
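A quick way to build confidence in autograd is to compare its output against a derivative worked out by hand. The sketch below does this for a separate, hypothetical function $g = ab + a$ with $a=3$, $b=4$ (not the function from this exercise), for which $\partial g/\partial a = b + 1 = 5$ and $\partial g/\partial b = a = 3$.

```python
import torch

# Hypothetical inputs for illustration only
a = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(4.0, requires_grad=True)

g = a * b + a                    # forward pass
grad_before = a.grad             # None: no backward pass has run yet

g.backward()                     # backward pass fills in a.grad and b.grad

print(grad_before)
print(a.grad)                    # by hand: dg/da = b + 1
print(b.grad)                    # by hand: dg/db = a
```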

Gradients are **accumulated** with each call of `backward()`. To see this, run the forward and backward pass again and print the gradient values:

In [None]:
f = (x+y)*z
f.backward()

print(x.grad)
print(y.grad)
print(z.grad)

Typically, this is not the desired behavior when training neural networks. We want to use only the gradients of the current step; we do not want to accumulate gradients over all steps. Therefore, it is necessary to zero the gradients after each pass. Forgetting this is a common bug.

In [None]:
x.grad.zero_()
y.grad.zero_()
z.grad.zero_()

print(x.grad)
print(y.grad)
print(z.grad)
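The effect of forgetting to zero the gradients is easiest to see over several passes. The sketch below uses a hypothetical tensor `w` and the function $w^2$, whose derivative at $w=2$ is $2w = 4$, and records the stored gradient with and without zeroing.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)   # hypothetical parameter

# Without zeroing: each backward() adds 4 to the stored gradient
grads_no_zero = []
for _ in range(3):
    loss = w ** 2
    loss.backward()
    grads_no_zero.append(w.grad.item())

w.grad.zero_()                               # reset before the second experiment

# With zeroing after each pass: the stored gradient is always the current one
grads_zeroed = []
for _ in range(3):
    loss = w ** 2
    loss.backward()
    grads_zeroed.append(w.grad.item())
    w.grad.zero_()

print(grads_no_zero)
print(grads_zeroed)
```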


In [None]:
params = dict(x = x, y = y, z = z)
make_dot(f, params = params)

## Practice problems

For each practice problem:

- Create and initialize input tensors
- Evaluate function (forward pass)
- Perform backward pass
- Report gradients
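As a template for the four steps above, here is a minimal sketch applied to a hypothetical function $h(u,v) = u^2 v$ with $u=2$, $v=5$, which is not one of the practice problems. By hand, $\partial h/\partial u = 2uv = 20$ and $\partial h/\partial v = u^2 = 4$.

```python
import torch

# 1. Create and initialize input tensors (floating-point, with gradient tracking)
u = torch.tensor(2.0, requires_grad=True)
v = torch.tensor(5.0, requires_grad=True)

# 2. Evaluate the function (forward pass)
h = u ** 2 * v

# 3. Perform the backward pass
h.backward()

# 4. Report gradients
print(f'h     = {h.item()}')
print(f'dh/du = {u.grad.item()}')
print(f'dh/dv = {v.grad.item()}')
```

The same pattern should carry over to the problems below; functions such as `torch.sqrt`, `torch.exp`, and `torch.relu` are available for the operations they need.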


### Problem 1

Given the function

$$f(x,y) = \sqrt{x^2 + y^2}$$

Compute 
$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix}$$

Assume $x=4$, $y=3$


### Problem 2

Given the function

$$f(a,x) = e^{-(ax)}$$

Compute 
$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial a} \\ \frac{\partial f}{\partial x} \end{bmatrix}$$

Assume $a=2$, $x=-0.5$

### Problem 3

Given the function

$$f(x,y,z) = \frac{xy}{z}$$

Compute 
$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \\ \frac{\partial f}{\partial z} \end{bmatrix}$$

Assume $x=6$, $y=3$, $z=9$

### Problem 4

Given the function

$$f(a,b,w_0, w_1, x_0, x_1) = a \mathrm{ReLU}(w_0x_0) + \mathrm{ReLU}(w_1x_1) - b$$

Compute
$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial a} \\ \frac{\partial f}{\partial b} \\ \frac{\partial f}{\partial w_0} \\ \frac{\partial f}{\partial w_1} \\ \frac{\partial f}{\partial x_0} \\ \frac{\partial f}{\partial x_1}\end{bmatrix}$$

Assume $a=2$, $b=6$, $w_0=3$,  $w_1=5$, $x_0=4$, $x_1=-2$

### Problem 5

Given the function

$$f = ax^2 + bxy$$

compute 

$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial a} \\ \frac{\partial f}{\partial b} \\ \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix}$$

Assume $a=2$, $b=4$, $x=3$, $y=5$


Save a PDF of the document to upload to ICON and then add/commit/push to your git repository.