# PyTorch primitives

PyTorch uses _tensors_, which are analogous to NumPy arrays. Many NumPy functions have direct PyTorch counterparts, though not all of them are drop-in replacements.

A major difference is that every tensor has an associated `device`. It is `cpu` by default, but it can also be a co-processor such as a GPU (named `cuda` in PyTorch).

In [None]:
import numpy as np
import torch

In [None]:
x = torch.linspace(-1, 1, 10)
print("Tensor:", x)
print("Tensor device:", x.device)
print("In numpy: ", x.numpy())
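As a minimal sketch of how the `device` attribute is typically used, the snippet below picks a GPU when PyTorch can see one and otherwise stays on the CPU. The variable names `device`, `x_cpu` and `x_dev` are illustrative and not part of the notebook.

```python
import torch

# Pick a CUDA GPU if PyTorch can see one, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x_cpu = torch.linspace(-1, 1, 10)   # created on the CPU by default
x_dev = x_cpu.to(device)            # moved to the chosen device

print("Default device:", x_cpu.device)
print("Chosen device: ", x_dev.device)

# A tensor has to be on the CPU before it can be converted to numpy
print("Back in numpy: ", x_dev.cpu().numpy())
```

On a machine without a GPU both tensors report `cpu`, so the same code runs unchanged in either setting.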

Another major difference from NumPy arrays is gradient tracking. **Note**: this is a different concept from the numerical calculation of a gradient.

In [None]:
x = torch.linspace(1, 10, 10, requires_grad=True)
print("Value of x.grad:", x.grad)

In [None]:
# Note the difference with np.gradient
np_gradient = np.gradient(x.detach().numpy())
print("Result of np.gradient:", np_gradient) # this computes numerical gradient

Tensor gradients can be calculated after an operation _that produces a scalar_.

In [None]:
non_scalar = x**2 # is not a scalar

In [None]:
scalar = (x**2).sum() # is a scalar
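To see why the scalar matters, here is a small sketch that calls `.backward()` on both results. It assumes the same `x` as above and simply catches the error PyTorch raises for the non-scalar case.

```python
import torch

x = torch.linspace(1, 10, 10, requires_grad=True)

non_scalar = x**2          # shape (10,), one value per element of x
try:
    non_scalar.backward()  # there is no single output to differentiate
except RuntimeError as err:
    print("backward() on a non-scalar failed:", err)

scalar = (x**2).sum()      # shape (), a single number
scalar.backward()          # works: fills x.grad
print("x.grad:", x.grad)
```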

Gradients are computed using the `.backward()` call on a _scalar_. Here we have,

\begin{equation}
    L = \sum_{i=1}^{10} x_i^2
\end{equation}

The gradient is calculated in the usual sense of partial derivatives:
\begin{equation}
    \frac{\partial L}{\partial x_k} = \frac{\partial}{\partial x_k}\left(\sum_{i=1}^{10} x_i^2\right) = 2 x_k
\end{equation}

In [None]:
scalar.backward()

In [None]:
print("Value of x.grad now: ", x.grad)

**A few points to note**

- The gradient is that of the _scalar_ $L$ with respect to each element of the array $\mathbf{x}$, so `x.grad` has the same shape as `x`.
- It is _exact_ as opposed to being a numerical estimate.
- We get it only at one point. If we change the value of $\mathbf{x}$, we need to compute the scalar, $L$, again. Only then do we get the gradient.
- This is automatic differentiation: we know the value $\mathbf{x}$ and the derivative $\partial L/\partial x_k\vert_{\mathbf{x}}$ at a single point.
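The points above can be checked with a short sketch. It compares the autograd result with a central finite-difference estimate and then re-evaluates the gradient at a second point. The helper `loss_fn`, the step size `h` and the use of `float64` are choices made for this illustration only.

```python
import torch

def loss_fn(x):
    # Same scalar as above: L = sum_i x_i^2
    return (x**2).sum()

x = torch.linspace(1, 10, 10, dtype=torch.float64, requires_grad=True)
loss_fn(x).backward()
autograd_grad = x.grad.clone()

# Central finite differences: (L(x + h e_k) - L(x - h e_k)) / (2h)
h = 1e-4
x0 = x.detach()
numerical_grad = torch.zeros_like(x0)
for k in range(x0.numel()):
    step = torch.zeros_like(x0)
    step[k] = h
    numerical_grad[k] = (loss_fn(x0 + step) - loss_fn(x0 - step)) / (2 * h)

print("autograd :", autograd_grad)
print("numerical:", numerical_grad)

# Gradient at a new point: rebuild L from the new x, then call backward() again
x_new = (2 * x0).requires_grad_(True)
loss_fn(x_new).backward()
print("gradient at 2x:", x_new.grad)
```

The finite-difference estimate needs two evaluations of $L$ per component, while `backward()` fills in all components from a single evaluation.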

# Basics of Neural Networks

- Neural networks are functions with parameters that can be tuned.
- In their simplest form, they alternate linear transformations with non-linear activations.

For example, $f: \mathbb{R}^m \rightarrow \mathbb{R}^n$ with $\mathbf{y} = f(\mathbf{x})$ is a neural net when it has the form

\begin{equation}
y_j = \sum_{k=1}^{p} A_{jk}\;\sigma\left(\sum_{i=1}^{m} W_{ik}\;x_i\right) + B_j,
\qquad \text{where } \sigma(z) = \frac{1}{1 + \exp(-z)}
\end{equation}

There is a linear transformation of $\mathbf{x}$ using the matrix $\mathbf{W}$, whose result is passed through a squashing function $\sigma$, followed by another linear transformation using the matrix $\mathbf{A}$ and the vector $\mathbf{B}$.

- $\mathbf{W}$ has $(m \times p)$ tunable parameters
- $\sigma$ has none
- $\mathbf{A}$ has $(n \times p)$ tunable parameters
- $\mathbf{B}$ has $n$ tunable parameters
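As a rough illustration of the formula and the parameter count, the sketch below builds the same two-step map from plain tensors. The small sizes and the random values of `W`, `A` and `B` are assumptions made only for this example.

```python
import torch

m, n, p = 3, 2, 4      # input, output and hidden sizes (illustrative values)

W = torch.randn(m, p)  # first linear transformation, m x p
A = torch.randn(n, p)  # second linear transformation, n x p
B = torch.randn(n)     # vector added to the output

x = torch.linspace(-1, 1, m)

hidden = torch.sigmoid(x @ W)   # linear transformation with W, then sigma; shape (p,)
y = A @ hidden + B              # linear transformation with A and B; shape (n,)

n_params = W.numel() + A.numel() + B.numel()
print("Output shape:", y.shape)
print("Tunable parameters:", n_params)   # m*p + n*p + n
```

With these sizes the count is $3 \times 4 + 2 \times 4 + 2 = 22$.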

### An example with $m = n = 20; p = 10$

In [None]:
m = n = 20
p = 10

In [None]:
poly = lambda x: x**3 - x  # function to approximate

x_vals = torch.linspace(-1, 1, m)
y_vals = poly(x_vals)

In [None]:
import matplotlib.pyplot as plt

In [None]:
# Define the neural network
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Linear(m, p, bias=False)
        self.a_b = torch.nn.Linear(p, n, bias=True) 

    def forward(self, r):
        r = torch.sigmoid(self.w(r))
        r = self.a_b(r)
        return r

In [None]:
net = Net()

In [None]:
total_trainable_params = 0
print("Trainable Parameters:")
for name, param in net.named_parameters():
    if param.requires_grad:
        # report each parameter tensor and add its size to the total
        print(f"  {name}: {param.numel()}")
        total_trainable_params += param.numel()

print(f"Total Trainable Parameters: {total_trainable_params}")

Plot the NN output before training

In [None]:
plt.plot(x_vals, y_vals, label='True function', linestyle='dashed')
plt.plot(x_vals, net(x_vals).detach(), label='NN before training')
plt.legend()

In [None]:
# auxiliary function to create a live plot
from IPython.display import clear_output
from time import sleep

def live_plot(x_vals, y_vals, y_pred, epoch, loss_val):
    """Auxiliary function to plot the current NN prediction against the true function"""
    clear_output(wait=True)
    sleep(1)
    fig, ax = plt.subplots(1, 1, figsize=(6, 6))
    ax.plot(x_vals, y_vals, label='True function', linestyle='dashed')
    ax.plot(x_vals, y_pred, label='NN prediction')

    ax.legend()
    ax.set_title(f'Epoch {epoch} ; Loss = {loss_val:.3e}')

In [None]:
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)  # Using Adam optimizer

for epoch in range(100):
    optimizer.zero_grad()

    # pass input through the NN
    y_pred = net(x_vals)

    # compute the scalar loss
    loss = criterion(y_pred, y_vals)
    # then evaluate gradients
    loss.backward()
    
    # adjust parameters using gradients
    optimizer.step()

    if epoch % 10 == 0:
        # re-evaluate the network after the update, without tracking gradients
        with torch.no_grad():
            y_pred_train = net(x_vals)
        # the loss shown in the title is the value computed before this update
        live_plot(x_vals, y_vals, y_pred_train, epoch, loss.item())
        plt.show()
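To make the three steps of the loop (reset gradients, `backward()`, update) more concrete, here is a minimal sketch that performs the update by hand. It uses plain gradient descent rather than Adam, and the one-parameter model `w`, the data and the learning rate `lr` are hypothetical choices for this illustration.

```python
import torch

# Hypothetical one-parameter model: fit y = w * x, where the true w is 3
x = torch.linspace(-1, 1, 20)
y = 3.0 * x
w = torch.zeros(1, requires_grad=True)
lr = 0.1  # illustrative learning rate

for epoch in range(200):
    loss = ((w * x - y) ** 2).mean()   # scalar mean-squared-error loss
    loss.backward()                    # fills w.grad with dloss/dw
    with torch.no_grad():
        w -= lr * w.grad               # plain gradient-descent update
    w.grad.zero_()                     # reset, since gradients accumulate

print("Fitted w:", w.item())
```

An optimizer such as `torch.optim.Adam` wraps the update and the reset in `optimizer.step()` and `optimizer.zero_grad()`, and uses a more elaborate update rule than the one shown here.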