# PyTorch notebook: The nitty-gritty

I hope to one day make this into a lovely notebook that gets into the nitty-gritty of PyTorch.

In [3]:
import torch
import torch.nn as nn

### Layers

**Linear Layers**

* `nn.Linear(in_features, out_features)` performs the transformation $Y = X W^{T} + b$, where:
    - $X$ is (batch_size, in_features)
    - $W^{T}$ is (in_features, out_features)
    - $b$ is (out_features,) and is broadcast across the batch
    - $Y$ is (batch_size, out_features); out_features > 1 when predicting more than one output per example
* Calling the layer on an input, `linear(X)`, returns $Y$.
* The stored weight `linear.weight` is $W$ itself, so its shape is (out_features, in_features). This is why the formula transposes it.

In [1]:
batch_size = 3
in_features = 5
out_features = 3

linear = nn.Linear(in_features, out_features)

X = torch.rand(batch_size, in_features)

Y = linear(X)

print("X...")
print(X)
print(X.shape)
print("")

print("Y...")
print(Y)
print(Y.shape)
print("")

print("Linear Weight...")
print(linear.weight)
print(linear.weight.shape)

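As a quick check of the formula above, the sketch below recomputes the layer's output by hand from `linear.weight` and `linear.bias`. The sizes (batch 4, 5 inputs, 2 outputs) and the seed are arbitrary illustration choices, picked so that every dimension is distinct, and `Y_manual` is just an illustrative name.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # illustrative seed, only for reproducibility
batch_size, in_features, out_features = 4, 5, 2  # distinct sizes make shapes easy to read

linear = nn.Linear(in_features, out_features)
X = torch.rand(batch_size, in_features)

Y = linear(X)
# Y = X W^T + b; bias has shape (out_features,) and broadcasts over the batch
Y_manual = X @ linear.weight.T + linear.bias

print(linear.weight.shape)  # torch.Size([2, 5])  -> (out_features, in_features)
print(linear.bias.shape)    # torch.Size([2])
print(Y.shape)              # torch.Size([4, 2])  -> (batch_size, out_features)
print(torch.allclose(Y, Y_manual, atol=1e-6))
```

Because the weight is stored as (out_features, in_features), the manual version needs `.T`; forgetting it gives a shape-mismatch error rather than a silently wrong result whenever in_features differs from out_features.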

**RNN layer**


In [None]:
input_size = 3
seq_length = 5
batch_size = 1

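# Single-layer RNN (tanh by default); hidden_size happens to equal input_size here
rnn = nn.RNN(input_size=input_size, hidden_size=3)

# Default layout (batch_first=False): (seq_length, batch_size, input_size)
X = torch.rand(seq_length, batch_size, input_size)
print(f"X shape: {X.shape}")

# out: (seq_length, batch_size, hidden_size), the hidden state at every time step
# hh:  (num_layers, batch_size, hidden_size), the hidden state at the last time step
out, hh = rnn(X)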
print(f"Output shape: {out.shape}")
print(f"Hidden shape: {hh.shape}")
print("hh holds the hidden state for the final time step ONLY")
print(hh)
print("")
print("out holds the hidden state for all time steps")
print(out)


X shape: torch.Size([5, 1, 3])
Output shape: torch.Size([5, 1, 3])
Hidden shape: torch.Size([1, 1, 3])
hh holds the hidden state for the final time step ONLY
tensor([[[-0.1781,  0.9003,  0.6967]]], grad_fn=<StackBackward0>)

out holds the hidden state for all time steps
tensor([[[ 0.4643,  0.8609,  0.6027]],

        [[-0.0283,  0.9431,  0.6206]],

        [[-0.1864,  0.9239,  0.6808]],

        [[-0.0794,  0.9508,  0.7809]],

        [[-0.1781,  0.9003,  0.6967]]], grad_fn=<StackBackward0>)
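To make the difference between `out` and `hh` concrete, the sketch below unrolls a single-layer `nn.RNN` by hand with its tanh update, $h_t = \tanh(x_t W_{ih}^T + b_{ih} + h_{t-1} W_{hh}^T + b_{hh})$, starting from the default all-zeros initial state. The sizes (batch 2, hidden 4) and the seed are arbitrary choices for illustration, and `manual_out` is a hypothetical name.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # illustrative seed
seq_length, batch_size, input_size, hidden_size = 5, 2, 3, 4

rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size)  # 1 layer, tanh
X = torch.rand(seq_length, batch_size, input_size)
out, hh = rnn(X)

# Unroll the recurrence by hand, starting from a zero hidden state
h = torch.zeros(batch_size, hidden_size)
steps = []
for t in range(seq_length):
    h = torch.tanh(
        X[t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
        + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
    )
    steps.append(h)
manual_out = torch.stack(steps)  # (seq_length, batch_size, hidden_size)

print(torch.allclose(out, manual_out, atol=1e-6))  # out = every time step
print(torch.allclose(out[-1], hh[0]))              # hh = last time step only
```

For a single-layer, unidirectional RNN, `hh[0]` is simply the last row of `out`, as in the printed tensors above. With `batch_first=True`, `X` and `out` become (batch_size, seq_length, ...) but `hh` keeps its (num_layers, batch_size, hidden_size) layout.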


### Loss Functions

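* `loss_cat = nn.CrossEntropyLoss()` creates the cross-entropy loss module; compute the loss by calling it, `loss_cat(y_hat, y)`. `y_hat` should be raw logits (e.g. the output of a linear layer), not probabilities, since the loss applies `log_softmax` internally. `y` holds the ground-truth labels.
* $y_{pred}$ should be (batch_size, num_classes). If there are extra dimensions (e.g. one prediction per sequence position), the class dimension must be dimension 1: (batch_size, num_classes, d_1, ...). An output shaped (batch_size, seq_length, num_classes) therefore needs `.permute(0, 2, 1)` before it is passed in.
* $y_{true}$ is either class indices, shape (batch_size,) or (batch_size, d_1, ...), dtype `torch.long`, values in [0, num_classes), or class probabilities with the same shape as $y_{pred}$.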

In [7]:
# Example of target with class indices
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()
# Example of target with class probabilities
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).softmax(dim=1)
output = loss(input, target)
output.backward()

In [5]:
input

tensor([[-0.5966,  0.2707,  0.1667,  1.3254,  0.0822],
        [-0.4303, -0.3144, -0.1222,  0.1305, -0.5695],
        [ 0.6989, -0.0115, -1.4339,  1.8006, -0.5688]], requires_grad=True)

In [6]:
target

tensor([[0.1327, 0.0646, 0.4527, 0.2125, 0.1376],
        [0.0821, 0.2198, 0.3922, 0.1196, 0.1864],
        [0.2808, 0.3997, 0.1361, 0.0798, 0.1036]])
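The sketch below checks what `nn.CrossEntropyLoss` computes with its default `reduction="mean"`: for class indices, the batch mean of $-\log \mathrm{softmax}$ at the true class; for class probabilities, the batch mean of $-\sum_c p_c \log \mathrm{softmax}_c$. It also shows the sequence case, where the class dimension is moved to position 1. All tensors are random illustration data, and names such as `manual` and `flat_loss` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # illustrative seed
loss_fn = nn.CrossEntropyLoss()  # default reduction="mean", no class weights

# Class-index targets: logits (batch_size, num_classes), target (batch_size,)
logits = torch.randn(3, 5)
target = torch.tensor([1, 0, 4])
loss = loss_fn(logits, target)

# Same value by hand: mean over the batch of -log_softmax at the true class
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(3), target].mean()
print(torch.allclose(loss, manual, atol=1e-6))

# Probability targets: mean over the batch of -(p * log_softmax) summed over classes
probs = torch.randn(3, 5).softmax(dim=1)
loss_soft = loss_fn(logits, probs)
manual_soft = -(probs * log_probs).sum(dim=1).mean()
print(torch.allclose(loss_soft, manual_soft, atol=1e-6))

# Sequence outputs: (batch, seq, classes) must become (batch, classes, seq)
seq_logits = torch.randn(2, 7, 5)         # (batch_size, seq_length, num_classes)
seq_target = torch.randint(0, 5, (2, 7))  # (batch_size, seq_length)
seq_loss = loss_fn(seq_logits.permute(0, 2, 1), seq_target)

# Equivalent: flatten batch and sequence into a single batch dimension
flat_loss = loss_fn(seq_logits.reshape(-1, 5), seq_target.reshape(-1))
print(torch.allclose(seq_loss, flat_loss, atol=1e-6))
```

Passing probabilities (e.g. the output of `softmax`) as `y_hat` would still run, but the loss would apply a second softmax on top, so the logits-in convention matters.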

### A few extra notes

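* Be careful with in-place overwrites such as `tensor_a[:] = tensor_b[:]`. Slice assignment modifies `tensor_a` in place, which can break gradient computation: if `tensor_a` is a leaf tensor with `requires_grad=True`, PyTorch raises an error immediately, and if autograd saved `tensor_a` for the backward pass, `backward()` raises an error because the saved value has changed. Use an out-of-place operation (or `tensor_b.clone()`) when gradients must flow through the result, and wrap deliberate overwrites of parameters in `torch.no_grad()`.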
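A minimal sketch of the failure modes described above, assuming default autograd settings. It uses `exp` because its backward pass reuses its own output; the variable names are hypothetical and the errors are caught so the script runs to completion.

```python
import torch

# 1) In-place write into a leaf tensor that requires grad: rejected immediately
a = torch.zeros(3, requires_grad=True)
b = torch.ones(3)
leaf_error = None
try:
    a[:] = b[:]
except RuntimeError as e:
    leaf_error = str(e)
print("leaf write failed:", leaf_error is not None)

# 2) Overwriting a tensor that autograd saved for backward: fails at backward()
x = torch.rand(3, requires_grad=True)
y = x.exp()            # exp's backward needs its output y
loss = y.sum()
y[:] = torch.zeros(3)  # in-place write bumps y's version counter
backward_error = None
try:
    loss.backward()
except RuntimeError as e:
    backward_error = str(e)
print("backward failed:", backward_error is not None)

# 3) Deliberate overwrite (e.g. re-initialising a parameter): do it outside autograd
with torch.no_grad():
    a[:] = b[:]
print(a.requires_grad, torch.equal(a.detach(), b))
```

Case 3 is the usual pattern for manually setting weights: the tensor keeps `requires_grad=True`, and the overwrite itself is simply not recorded in the graph.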