## PyTorch exercises

### Tensors

1. Make a tensor of size (2, 17)
2. Make a torch.FloatTensor of size (3, 1)
3. Make a torch.LongTensor of size (5, 2, 1)
  - fill the entire tensor with 7s
4. Make a torch.ByteTensor of size (5,)
  - fill the middle 3 indices with ones such that it records [0, 1, 1, 1, 0]
5. Perform a matrix multiplication of two tensors of size (2, 4) and (4, 2). Then do it in-place.
6. Do element-wise multiplication of two randomly filled $(n_1,n_2,n_3)$ tensors. Then store the result in an Numpy array.

### Forward-prop/backward-prop
1. Create a Tensor that `requires_grad` of size (5, 5).
2. Sum the values in the Tensor.
3. Multiply the tensor by 2 and assign the result to a new python variable (i.e. `x = result`)
4. Sum the variable's elements and assign to a new python variable
5. Print the gradients of all the variables
6. Now perform a backward pass on the last variable (NOTE: for each new python variable that you define, call `.retain_grad()`)
7. Print all gradients again

### Deep-forward NNs
1. Look at Lab 3. In Exercise 12 there, you had to build an $L$-layer neural network with the following structure: *[LINEAR -> RELU]$\times$(L-1) -> LINEAR -> SIGMOID*. Reimplement the manual code in PyTorch.
2. Compare test accuracy using different optimizers: SGD, Adam, Momentum.

In [20]:
import torch
from torch import nn
from torch.nn import functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

In [22]:
# 1. Tensor of size (2, 17)
A = torch.rand(2, 17)
print("A.shape:", A.shape)

# 2. FloatTensor of size (3, 1)
B = torch.FloatTensor(3, 1).uniform_(0, 1)
print("B.dtype, B.shape:", B.dtype, B.shape)

# 3. LongTensor of size (5, 2, 1)
C = torch.LongTensor(5, 2, 1).fill_(7)
print("C.dtype, C.shape:", C.dtype, C.shape)

print("=== 4. ByteTensor ===")
# 4. ByteTensor of size (5,), filled [0,1,1,1,0]
byte_tensor = torch.zeros(5, dtype=torch.uint8)
byte_tensor[1:4] = 1
print("ByteTensor:", byte_tensor)

print("\n=== 5. Matrix Multiplication ===")
# 5. Matrix multiplication (2,4) @ (4,2)
mat1 = torch.randn(2, 4)
mat2 = torch.randn(4, 2)
# Normal matmul
res = torch.mm(mat1, mat2)
print("Result (normal):\n", res)

# In-place using out= pre-allocated tensor
res_inplace = torch.empty(2, 2)
torch.mm(mat1, mat2, out=res_inplace)
print("Result (in-place with out=):\n", res_inplace)

print("\n=== 6. Element-wise Mul & NumPy Conversion ===")
# 6. Element-wise multiplication of (n1,n2,n3) tensors
n1, n2, n3 = 3, 4, 5
t1 = torch.rand(n1, n2, n3)
t2 = torch.rand(n1, n2, n3)
t3 = t1 * t2
np_array = t3.numpy()
print("t3 shape:", t3.shape, "| NumPy array shape:", np_array.shape)
print("NumPy array dtype:", np_array.dtype)

print("\n=== Forward/Backward Propagation ===")
# 1. Create a Tensor that requires grad
x0 = torch.randn(5, 5, requires_grad=True)

# 2. Sum the values
s1 = x0.sum()
s1.retain_grad()

# 3. Multiply the tensor by 2
x = x0 * 2
x.retain_grad()

# 4. Sum the new variable
s2 = x.sum()
s2.retain_grad()

# 5. Print gradients before backward
print("Gradients before backward:")
print("x0.grad:", x0.grad)
print("s1.grad:", s1.grad)
print("x.grad:", x.grad)
print("s2.grad:", s2.grad)

# 6. Backward on last variable
s2.backward()

# 7. Print gradients after backward
print("\nGradients after backward:")
print("x0.grad:", x0.grad)
print("s1.grad:", s1.grad)
print("x.grad:", x.grad)
print("s2.grad:", s2.grad)

A.shape: torch.Size([2, 17])
B.dtype, B.shape: torch.float32 torch.Size([3, 1])
C.dtype, C.shape: torch.int64 torch.Size([5, 2, 1])
=== 4. ByteTensor ===
ByteTensor: tensor([0, 1, 1, 1, 0], dtype=torch.uint8)

=== 5. Matrix Multiplication ===
Result (normal):
 tensor([[-1.5341, -2.1713],
        [-0.7821, -0.4532]])
Result (in-place with out=):
 tensor([[-1.5341, -2.1713],
        [-0.7821, -0.4532]])

=== 6. Element-wise Mul & NumPy Conversion ===
t3 shape: torch.Size([3, 4, 5]) | NumPy array shape: (3, 4, 5)
NumPy array dtype: float32

=== Forward/Backward Propagation ===
Gradients before backward:
x0.grad: None
s1.grad: None
x.grad: None
s2.grad: None

Gradients after backward:
x0.grad: tensor([[2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.]])
s1.grad: None
x.grad: tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        

In [24]:
class LLayerNet(nn.Module):
    def __init__(self, layer_dims):
        """
        layer_dims: list of sizes, e.g. [784, 128, 64, 10]
        """
        super().__init__()
        layers = []
        for in_dim, out_dim in zip(layer_dims[:-1], layer_dims[1:]):
            layers.append(nn.Linear(in_dim, out_dim))
            layers.append(nn.Sigmoid())
        layers.pop()  # remove last Sigmoid to allow logits for CrossEntropyLoss
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # flatten if needed
        x = x.view(x.size(0), -1)
        return self.net(x)

In [26]:
transform = transforms.ToTensor()
train_ds = datasets.MNIST(root="mnist_data", train=True,  download=True, transform=transform)
test_ds  = datasets.MNIST(root="mnist_data", train=False, download=True, transform=transform)
train_loader = DataLoader(train_ds, batch_size=128, shuffle=True)
test_loader  = DataLoader(test_ds, batch_size=256, shuffle=False)

In [28]:
def train_one_epoch(model, loader, loss_fn, optimizer, device):
    model.train()
    total_loss = 0
    for X, y in loader:
        X, y = X.to(device), y.to(device)
        optimizer.zero_grad()
        logits = model(X)
        loss = loss_fn(logits, y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * X.size(0)
    return total_loss / len(loader.dataset)

def evaluate(model, loader, device):
    model.eval()
    correct = 0
    with torch.no_grad():
        for X, y in loader:
            X, y = X.to(device), y.to(device)
            preds = model(X).argmax(dim=1)
            correct += (preds == y).sum().item()
    return correct / len(loader.dataset)


In [30]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dims = [28*28, 128, 64, 10]
results = {}

for opt_name, opt_fn in [
    ("SGD", lambda p: torch.optim.SGD(p, lr=0.1)),
    ("Adam", lambda p: torch.optim.Adam(p, lr=0.001)),
    ("Momentum", lambda p: torch.optim.SGD(p, lr=0.1, momentum=0.9))
]:
    model = LLayerNet(dims).to(device)
    optimizer = opt_fn(model.parameters())
    loss_fn = nn.CrossEntropyLoss()

    # train for 3 epochs (demo)
    for epoch in range(3):
        train_loss = train_one_epoch(model, train_loader, loss_fn, optimizer, device)
    test_acc = evaluate(model, test_loader, device)
    results[opt_name] = test_acc
    print(f"{opt_name:10s} test accuracy: {test_acc*100:.2f}%")

SGD        test accuracy: 80.64%
Adam       test accuracy: 94.84%
Momentum   test accuracy: 94.87%
