The purpose of this notebook is to understand the basics of neural networks.

We will start with a single neuron and then move on to a simple neural network.





In [189]:
import torch
import pandas as pd

torch.manual_seed(0)


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(1, 1)
        # Convert model parameters to lower precision (float16)
        self.half()  # Converts all parameters to half precision (float16)

    def forward(self, x):
        # Ensure input is also in lower precision
        x = x.half()
        # Pass through the linear layer (single neuron)
        return self.linear(x)

# Create an instance of the model
model = Model()
print(f"{model=}")

model=Model(
  (linear): Linear(in_features=1, out_features=1, bias=True)
)


In [190]:
# Let's examine the model parameters
print("Model parameters:")
for name, param in model.named_parameters():
    print(f"{name}: {param.data}")
print(f"{model(torch.tensor([1.0]))=}")


Model parameters:
linear.weight: tensor([[-0.0075]], dtype=torch.float16)
linear.bias: tensor([0.5366], dtype=torch.float16)
model(torch.tensor([1.0]))=tensor([0.5293], dtype=torch.float16, grad_fn=<ViewBackward0>)


The output for the input 1.0 is a tensor holding a single value, determined entirely by the randomly initialized weight and bias.
The output is 0.5293, which is input * weight + bias: 1.0 * -0.0075 + 0.5366 = 0.5291. This differs slightly from 0.5293 because the printed parameters are rounded to four decimal places and the result is itself rounded to the nearest float16 value.
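The rounding effect can be checked without PyTorch. The sketch below is only an illustration: it assumes the parameters are exactly the four-decimal values from the printout (the real stored weight may differ in later digits), and `to_float16` is a hypothetical helper built on the standard library's `struct` half-precision format.

```python
import struct


def to_float16(value: float) -> float:
    """Round a Python float to the nearest float16 value (struct format 'e')."""
    return struct.unpack("e", struct.pack("e", value))[0]


# Assumption: parameters taken from the rounded printout, not the real tensors.
weight = to_float16(-0.0075)
bias = to_float16(0.5366)

exact = 1.0 * weight + bias   # sum computed in double precision
stored = to_float16(exact)    # nearest value a float16 tensor can hold

print(f"{exact:.6f}")   # 0.529121
print(f"{stored:.6f}")  # 0.529297
```

Near 0.53, adjacent float16 values are about 0.0005 apart, which is why the stored result lands on 0.5293 rather than 0.5291.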

By default, autograd tracks the operations performed on the model's parameters.
`grad_fn=<ViewBackward0>` tells us that the last operation that produced this output was a view operation. Without the view, the output would have been [[0.5293]] rather than [0.5293]. The `Backward` part means that this node's backward function will be called during backpropagation. The `0` suffix does not indicate call order: it is part of the node name that PyTorch generates and distinguishes variants of the backward function for the same operation.
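To see the tracking in isolation, here is a small sketch that uses a stand-alone `torch.nn.Linear` layer in default float32 rather than the notebook's `Model` (the names `layer`, `tracked`, `untracked`, and `batched` are illustrative). The exact `grad_fn` name may vary between PyTorch versions, so the code only checks whether one is present.

```python
import torch

layer = torch.nn.Linear(1, 1)  # stand-in for the notebook's single neuron
x = torch.tensor([1.0])

# With tracking on, the output remembers the operation that produced it.
tracked = layer(x)
print(tracked.grad_fn)

# Inside no_grad nothing is recorded, so there is no backward node.
with torch.no_grad():
    untracked = layer(x)
print(untracked.grad_fn)

# A 2-D input keeps the batch dimension in the output shape.
batched = layer(x.unsqueeze(0))
print(tracked.shape, batched.shape)
```

The last line shows the shape difference the text refers to: a 1-D input gives a 1-D output, while a batch of one sample gives a 1x1 output.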







In [191]:
# Let's create a simple training example
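def train_step(x: torch.Tensor, y: torch.Tensor) -> dict:
    criterion = torch.nn.MSELoss()
    # Plain SGD keeps no state between steps, so re-creating it on every call is harmless here.
    # An optimizer with momentum (or Adam) would have to be created once, outside this function.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Forward pass
    output = model(x)
    loss = criterion(output, y)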

    # Backward pass
    loss.backward()  # This computes gradients for all model parameters
    optimizer.step()  # This uses the gradients to update the weights and biases
    weight_grad = model.linear.weight.grad.item()
    bias_grad = model.linear.bias.grad.item()
    # Reset gradients for the next step
    optimizer.zero_grad()
    
    # Return values for tabulation
    return {
        'prediction': output.item(),
        'loss': loss.item(),
        'weight': model.linear.weight.data.item(),
        'bias': model.linear.bias.data.item(),
        'weight_grad': weight_grad,
        'bias_grad': bias_grad
    }
first_result = train_step(x=torch.tensor([1.0], dtype=torch.float16), y=torch.tensor([1.0], dtype=torch.float16))
first_result

{'prediction': 0.529296875,
 'loss': 0.2215576171875,
 'weight': 0.08660888671875,
 'bias': 0.630859375,
 'weight_grad': -0.94140625,
 'bias_grad': -0.94140625}

Here, the loss is ~0.22, which is the MSE: (1 - 0.5293)^2 = 0.2216.  
The gradients are computed by `loss.backward()`; the optimizer then uses them to update the weight and bias.  
For MSELoss, the gradient with respect to the output is ∂Loss/∂output = 2 * (output - target) / n, which is 2 * (0.5293 - 1) / 1 = -0.9414. Because the input is 1.0, the chain rule gives the same value for the weight (∂output/∂weight = input = 1.0) and for the bias (∂output/∂bias = 1), matching `weight_grad` and `bias_grad` above. These gradients are then used to update the weight and bias:  
new weight = old_weight - learning_rate * gradient_weight = -0.0075 - 0.1 * -0.9414 = 0.0866  
new bias = old_bias - learning_rate * gradient_bias = 0.5366 - 0.1 * -0.9414 = 0.6307, which is stored as 0.6309 in float16
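The same step can be reproduced by hand in plain Python. This is a sketch under assumptions, not what PyTorch runs internally: the starting parameters are the rounded printout values, the learning rate is the `lr=0.1` passed to SGD in `train_step`, and `to_float16` is a hypothetical helper that rounds through the standard library's half-precision format after each operation.

```python
import struct


def to_float16(value: float) -> float:
    """Round a Python float to the nearest float16 value (struct format 'e')."""
    return struct.unpack("e", struct.pack("e", value))[0]


# Assumption: starting values come from the rounded printout.
weight, bias = to_float16(-0.0075), to_float16(0.5366)
x, target, lr = 1.0, 1.0, 0.1

# Forward pass and MSE loss for a single sample (n = 1)
prediction = to_float16(x * weight + bias)
loss = to_float16((prediction - target) ** 2)

# Chain rule: d(loss)/d(output) times d(output)/d(parameter)
grad_output = 2.0 * (prediction - target)
grad_weight = to_float16(grad_output * x)    # d(output)/d(weight) = x
grad_bias = to_float16(grad_output * 1.0)    # d(output)/d(bias) = 1

# SGD update: parameter minus learning rate times gradient
new_weight = to_float16(weight - lr * grad_weight)
new_bias = to_float16(bias - lr * grad_bias)

print(prediction, loss, grad_weight, grad_bias)
print(new_weight, new_bias)
```

The prediction, loss, gradients, and new bias should agree with the dictionary returned by `train_step`. The new weight should agree to about four decimal places, since -0.0075 is only a rounded view of the stored weight.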


In [192]:

# Collect one row per training step, starting with the first result as iteration 0
rows = [{'iteration': 0, **first_result}]

for i in range(1, 20):
    result = train_step(x=torch.tensor([1.0], dtype=torch.float16), y=torch.tensor([1.0], dtype=torch.float16))
    rows.append({'iteration': i, **result})

# Build the DataFrame once instead of concatenating inside the loop
results_df = pd.DataFrame(rows)
results_df

| iteration | prediction | loss | weight | bias | weight_grad | bias_grad |
|---|---|---|---|---|---|---|
| 0 | 0.529297 | 0.2215576 | 0.086609 | 0.630859 | -0.941406 | -0.941406 |
| 1 | 0.717285 | 0.07995605 | 0.143066 | 0.6875 | -0.56543 | -0.56543 |
| 2 | 0.830566 | 0.02870178 | 0.177002 | 0.721191 | -0.338867 | -0.338867 |
| 3 | 0.898438 | 0.01031494 | 0.197266 | 0.741699 | -0.203125 | -0.203125 |
| 4 | 0.938965 | 0.003725052 | 0.209473 | 0.753906 | -0.12207 | -0.12207 |
| 5 | 0.963379 | 0.001340866 | 0.216797 | 0.76123 | -0.073242 | -0.073242 |
| 6 | 0.978027 | 0.0004827976 | 0.221191 | 0.765625 | -0.043945 | -0.043945 |
| 7 | 0.986816 | 0.0001738071 | 0.223877 | 0.768066 | -0.026367 | -0.026367 |
| 8 | 0.992188 | 6.103516e-05 | 0.225464 | 0.769531 | -0.015625 | -0.015625 |
| 9 | 0.995117 | 2.384186e-05 | 0.22644 | 0.770508 | -0.009766 | -0.009766 |


The gradients are proportional to (output - target), so once the prediction equals the target exactly, the loss and both gradients are zero. From that point on the weight and bias are no longer updated.

Side note: If torch.float32 is used, reaching a loss of exactly zero should take more steps, because float32 resolves much smaller differences between prediction and target, so the error has to shrink further before it rounds to zero.
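A rough way to compare the two precisions without PyTorch is to simulate the same update rule in plain Python and count the steps until the gradient is exactly zero. This is a sketch under assumptions: `steps_until_zero_grad` and `round_to` are hypothetical helpers, the starting parameters are the rounded printout values, and each result is rounded through the standard library's `struct` formats (`'e'` for float16, `'f'` for float32), which only approximates PyTorch's internal arithmetic.

```python
import struct


def round_to(fmt: str, value: float) -> float:
    """Round a Python float to the precision of a struct format ('e' or 'f')."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def steps_until_zero_grad(fmt: str, max_steps: int = 500):
    """Count SGD steps until the gradient is exactly zero, or None if never."""
    # Assumption: starting values come from the rounded printout.
    weight, bias = round_to(fmt, -0.0075), round_to(fmt, 0.5366)
    x, target, lr = 1.0, 1.0, 0.1
    for step in range(max_steps):
        prediction = round_to(fmt, x * weight + bias)
        # d(loss)/d(output); shared by weight and bias because x == 1
        grad = round_to(fmt, 2.0 * (prediction - target))
        if grad == 0.0:
            return step
        weight = round_to(fmt, weight - lr * grad * x)
        bias = round_to(fmt, bias - lr * grad)
    return None


steps16 = steps_until_zero_grad("e")  # float16
steps32 = steps_until_zero_grad("f")  # float32
print(f"float16: {steps16} steps, float32: {steps32} steps")
```

In this setup each step shrinks the error by a factor of roughly 0.6, so the step count is governed by how small the error must get before the prediction rounds to exactly 1.0, which is a much smaller threshold in float32 than in float16.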