# Import & Verify `PyTorch` is working

In [46]:
import torch
from torch import nn
print(torch.__version__)

2.5.1


# Tensors

Let's start with something you should already know: `numpy`. It is a scientific computing library that allows for easy manipulation of vectors & matrices. `PyTorch` has a similar data structure called a `Tensor`.
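To make the `numpy` analogy concrete, here is a minimal sketch of moving data between the two libraries. The variable names (`x_np`, `x_t`, `x_copy`) are made up for illustration. It relies on `torch.from_numpy` sharing memory with the source array, while `torch.tensor` makes a copy.

```python
import numpy as np
import torch

# A NumPy array and the tensor that wraps the same memory
x_np = np.array([[1.0, 2.0], [3.0, 4.0]])
x_t = torch.from_numpy(x_np)   # no copy: shares memory with x_np

# Going back the other way (also shares memory)
y_np = x_t.numpy()

# An in-place change to the tensor is visible in the original array
x_t[0, 0] = 10.0
print(x_np[0, 0])  # 10.0

# torch.tensor(...) copies the data, so the array is left untouched
x_copy = torch.tensor(x_np)
x_copy[0, 0] = -1.0
print(x_np[0, 0])  # still 10.0
```

Sharing memory avoids a copy, but it also means edits on one side silently show up on the other, so copy explicitly when that is not what you want.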

In [39]:
# Create some vectors
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

# Create a matrix
M = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# Create a tensor with specific dimensions
batch_size = 10
dim = 3
T = torch.randn(size=(batch_size, dim, dim))
S = torch.rand(size=(batch_size, dim, dim))
# `randn` generates RANDom values drawn from a standard Normal distribution;
# `rand` generates random values drawn uniformly from [0, 1).
print(T.shape)


torch.Size([10, 3, 3])


Q1) What is the default datatype for `Tensors` in `PyTorch`? Note that it depends on how you initialize the tensor.
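One way to explore Q1 is to build tensors in a few different ways and inspect their `.dtype` attribute. The snippet below is only a sketch for experimenting; the variable names are hypothetical, and the outputs are deliberately left for you to observe.

```python
import torch

# Tensors initialized in different ways
from_ints = torch.tensor([1, 2, 3])           # Python integers
from_floats = torch.tensor([1.0, 2.0, 3.0])   # Python floats
from_randn = torch.randn(2, 2)                # random factory function

for name, t in [("from_ints", from_ints),
                ("from_floats", from_floats),
                ("from_randn", from_randn)]:
    print(name, t.dtype)

# The datatype can also be requested explicitly at creation time
explicit = torch.tensor([1, 2, 3], dtype=torch.float32)
print("explicit", explicit.dtype)
```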

In [36]:
# Addition / multiplication work element-wise, just like numpy
print(a + b)
print(a * b)

# The `@` operator does matrix multiplication (a dot product for two vectors)
print(a @ b)
print(M @ a)

tensor([5, 7, 9])
tensor([ 4, 10, 18])
tensor(32)
tensor([14, 32, 50])


In [38]:
T.shape

torch.Size([10, 3, 3])

In [44]:
# There's also Einstein summation
# Example: a @ b
print(torch.einsum('i,i->', a, b))

# Example: M @ a
print(torch.einsum('ij,j->i', M, a))

# Example: Tr(M)
print(torch.einsum('ii->', M))

# We can also do more complicated things, like batch matrix multiplication
# So instead of running
# > for i in range(batch_size):
# >    C[i] = T[i] @ S[i]
# We can do it all at once with einsum
C = torch.einsum('bij,bjk->bik', T, S)
print(C.shape)

tensor(32)
tensor([14, 32, 50])
tensor(15)
torch.Size([10, 3, 3])
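As a sanity check on the batched `einsum` above, the sketch below compares it against the explicit loop and against two built-in alternatives, `torch.bmm` and the broadcasting `@` operator. The tensors are regenerated here (with an arbitrary seed) so the snippet runs on its own.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
batch_size, dim = 10, 3
T = torch.randn(batch_size, dim, dim)
S = torch.rand(batch_size, dim, dim)

# Four ways to compute the same batch of matrix products
C_einsum = torch.einsum('bij,bjk->bik', T, S)
C_loop = torch.stack([T[i] @ S[i] for i in range(batch_size)])
C_bmm = torch.bmm(T, S)
C_matmul = T @ S  # `@` treats the leading dimension as a batch dimension

# Compare with a small tolerance to absorb floating-point rounding
for other in (C_loop, C_bmm, C_matmul):
    print(torch.allclose(C_einsum, other, atol=1e-5))
```

All three comparisons should print `True`. Which form to use is mostly a matter of readability: `einsum` spells out the index pattern, while `@` is the shortest.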


# Data 

# Model Architecture

In class, we discussed a simple multilayer perceptron. Recall it was mathematically defined as
$$
\begin{align}
z^{(\ell)} & = W^{(\ell)} \sigma(z^{(\ell-1)}) + b^{(\ell)}, \quad \ell = 1, \dots, L, \ \text{with } z^{(0)} = W^{(0)} x + b^{(0)}\\
f_\theta(x) & = z^{(L)}
\end{align}
$$
where the model parameters are $\theta = \{W^{(\ell)}, b^{(\ell)}\}_{\ell=0}^L$ and $\sigma(\cdot)$ is a non-linear activation function.
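To connect the equations to tensors, here is a minimal sketch of the forward pass for $L = 1$ (one hidden layer) written with raw weight matrices. The layer sizes and the random initialization are assumptions chosen only for illustration, and ReLU stands in for $\sigma$.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
input_size, hidden_size, output_size = 4, 8, 2  # hypothetical sizes

# Parameters theta = {W0, b0, W1, b1}, randomly initialized for illustration
W0 = torch.randn(hidden_size, input_size)
b0 = torch.randn(hidden_size)
W1 = torch.randn(output_size, hidden_size)
b1 = torch.randn(output_size)

x = torch.randn(input_size)  # a single input vector

z0 = W0 @ x + b0              # z^(0) = W^(0) x + b^(0)
z1 = W1 @ torch.relu(z0) + b1 # z^(1) = W^(1) sigma(z^(0)) + b^(1)
f_x = z1                      # f_theta(x) = z^(L) with L = 1

print(f_x.shape)  # torch.Size([2])
```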

In [48]:
class MLP(nn.Module):
    """A multilayer perceptron with a single hidden layer."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)   # W^(0), b^(0)
        self.activation = nn.ReLU()                     # sigma
        self.fc2 = nn.Linear(hidden_size, output_size)  # W^(1), b^(1)

    def forward(self, x):
        out = self.fc1(x)           # z^(0)
        out = self.activation(out)  # sigma(z^(0))
        out = self.fc2(out)         # z^(1) = f_theta(x)
        return out
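A quick way to check the class is to instantiate it and push a batch of random inputs through it. The sketch below repeats the class definition so it runs on its own; the sizes (4, 8, 2) and the batch of 10 are arbitrary choices for illustration.

```python
import torch
from torch import nn

class MLP(nn.Module):  # repeated from above so this snippet is self-contained
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.activation = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        return self.fc2(self.activation(self.fc1(x)))

model = MLP(input_size=4, hidden_size=8, output_size=2)

# nn.Linear acts on the last dimension, so the first one can be a batch
x = torch.randn(10, 4)
with torch.no_grad():  # no gradients needed for a shape check
    y = model(x)
print(y.shape)  # torch.Size([10, 2])

# Count parameters: (4*8 + 8) for fc1 plus (8*2 + 2) for fc2
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 58
```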

# Training

# Inference / Validation