The purpose of this notebook is to understand the basics of neural networks.

We will start with a single neuron and then move on to a simple neural network.





In [178]:
import torch
import pandas as pd

torch.manual_seed(0)


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        # Pass through the linear layer (single neuron)
        return self.linear(x)


# Create an instance of the model
model = Model()
print(f"{model=}")

model=Model(
  (linear): Linear(in_features=1, out_features=1, bias=True)
)


In [179]:
# Let's examine the model parameters
print("Model parameters:")
for name, param in model.named_parameters():
    print(f"{name}: {param.data}")
print(f"{model(torch.tensor([1.0]))=}")


Model parameters:
linear.weight: tensor([[-0.0075]])
linear.bias: tensor([0.5364])
model(torch.tensor([1.0]))=tensor([0.5290], grad_fn=<ViewBackward0>)


For an input of 1.0, the model returns a tensor with a single value, determined by the randomly initialized weight and bias.
The output is 0.5290, which is input * weight + bias: 1.0 * -0.0075 + 0.5364 = 0.5289. This differs slightly from 0.5290 because the printed parameters are rounded from the full-precision float values used in the computation.
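
As a quick check of the arithmetic above, the forward pass of a single neuron can be reproduced by hand from the layer's own parameters. This is a minimal sketch that builds a fresh `torch.nn.Linear(1, 1)` (named `layer` here for illustration) rather than reusing `model`; the exact parameter values depend on the seed and the PyTorch version.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(1, 1)  # a single neuron: one weight, one bias

x = torch.tensor([1.0])

# Read the full-precision parameters instead of the rounded printed ones
w = layer.weight.item()
b = layer.bias.item()

manual = x.item() * w + b   # input * weight + bias, computed by hand
auto = layer(x).item()      # the same computation done by the layer

print(f"{w=:.6f} {b=:.6f}")
print(f"{manual=:.6f} {auto=:.6f}")
```

Using the unrounded parameters, the hand-computed value and the layer's output agree to within floating-point precision.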

By default, autograd tracks the operations performed on tensors that require gradients, which includes the model parameters.
`grad_fn=<ViewBackward0>` tells us that the last operation to create this output was a view operation. Without the view operation the output would have been [[0.5290]] rather than [0.5290]. `Backward` indicates that this is the function autograd will call for that operation during backpropagation. The `0` suffix is part of the name PyTorch generates for the backward function (it distinguishes variants of the same operation); it does not indicate the order in which backward calls run.
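
To see the effect of input shape and of autograd tracking, the sketch below passes a 1-D and a 2-D input through a fresh layer and prints the name of each output's `grad_fn`. The variable names are illustrative, and the exact `grad_fn` names may differ between PyTorch versions, so none are asserted here.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(1, 1)

out_1d = layer(torch.tensor([1.0]))    # 1-D input -> output of shape (1,)
out_2d = layer(torch.tensor([[1.0]]))  # 2-D input -> output of shape (1, 1)

# grad_fn records the last operation that produced each tensor
print(out_1d.shape, type(out_1d.grad_fn).__name__)
print(out_2d.shape, type(out_2d.grad_fn).__name__)

# Inside no_grad, autograd does not track operations, so there is no grad_fn
with torch.no_grad():
    out_untracked = layer(torch.tensor([1.0]))
print(out_untracked.grad_fn, out_untracked.requires_grad)
```

The last line prints `None False`, since nothing was recorded for the untracked output.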







In [180]:
# Let's create a simple training example
criterion = torch.nn.MSELoss()
# Plain SGD keeps no state between steps, so one optimizer can be reused
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def train_step(x: torch.Tensor, y: torch.Tensor) -> dict[str, float]:
    # Forward pass
    output = model(x)
    loss = criterion(output, y)

    # Backward pass
    loss.backward()  # This computes gradients for all model parameters
    weight_grad = model.linear.weight.grad.item()
    bias_grad = model.linear.bias.grad.item()
    optimizer.step()  # This uses the gradients to update the weights and biases
    # Reset gradients for the next step
    optimizer.zero_grad()

    # Return values for tabulation (weight and bias are the updated values)
    return {
        'prediction': output.item(),
        'loss': loss.item(),
        'weight': model.linear.weight.data.item(),
        'bias': model.linear.bias.data.item(),
        'weight_grad': weight_grad,
        'bias_grad': bias_grad
    }


first_result = train_step(x=torch.tensor([1.0]), y=torch.tensor([1.0]))
first_result

{'prediction': 0.5289567708969116,
 'loss': 0.2218817174434662,
 'weight': 0.08672183007001877,
 'bias': 0.6306522488594055,
 'weight_grad': -0.9420864582061768,
 'bias_grad': -0.9420864582061768}

Here, the loss is ~0.22, which is the MSE: (1 - 0.5289)^2 = 0.2219.  
The gradients are computed by `loss.backward()`; the optimizer then uses them to update the weight and bias.  
For MSELoss, the gradient is ∂Loss/∂output = 2 * (output - target) / n, which is 2 * (0.5289 - 1) / 1 = -0.9422 (the exact value is -0.9421, because 0.5289 is rounded). By the chain rule, ∂Loss/∂weight = ∂Loss/∂output * input and ∂Loss/∂bias = ∂Loss/∂output * 1. Since the input is 1.0, both equal -0.9421, matching `weight_grad` and `bias_grad` above. These are then used to update the weight and bias with the learning rate of 0.1:  
new weight = old_weight - learning_rate * gradient_weight = -0.0075 - 0.1 * -0.9421 = 0.0867  
new bias = old_bias - learning_rate * gradient_bias = 0.5364 - 0.1 * -0.9421 = 0.6306
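
The update above can also be reproduced without PyTorch. The following sketch uses the rounded weight and bias printed earlier, together with the learning rate passed to SGD in the code (0.1), so the results match the notebook output only to about three or four decimal places.

```python
# Rounded values copied from the printed model parameters
x, y = 1.0, 1.0
weight, bias = -0.0075, 0.5364
learning_rate = 0.1  # the lr passed to torch.optim.SGD

# Forward pass and MSE loss for a single sample (n = 1)
prediction = x * weight + bias
loss = (prediction - y) ** 2

# Backward pass via the chain rule
d_loss_d_output = 2 * (prediction - y)
weight_grad = d_loss_d_output * x    # d(output)/d(weight) = x
bias_grad = d_loss_d_output * 1.0    # d(output)/d(bias) = 1

# SGD update
new_weight = weight - learning_rate * weight_grad
new_bias = bias - learning_rate * bias_grad

print(f"{prediction=:.4f} {loss=:.4f}")
print(f"{weight_grad=:.4f} {bias_grad=:.4f}")
print(f"{new_weight=:.4f} {new_bias=:.4f}")
```

The hand-computed `new_weight` and `new_bias` agree with the `weight` and `bias` entries returned by `train_step` up to the rounding of the starting values.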


In [181]:

# Collect one row per training step, starting with the first result as iteration 0
rows = [{'iteration': 0, **first_result}]

for i in range(1, 50):
    result = train_step(x=torch.tensor([1.0]), y=torch.tensor([1.0]))
    # Record the current iteration alongside the step's results
    rows.append({'iteration': i, **result})

# Build the DataFrame once instead of concatenating inside the loop
results_df = pd.DataFrame(rows)
results_df

   iteration  prediction          loss    weight      bias   weight_grad     bias_grad
0          0    0.528957     0.2218817  0.086722  0.630652    -0.9420865    -0.9420865
1          1    0.717374    0.07987741  0.143247  0.687177    -0.5652518    -0.5652518
2          2    0.830424    0.02875588  0.177162  0.721093    -0.3391511    -0.3391511
3          3    0.898255    0.01035212  0.197511  0.741442    -0.2034907    -0.2034907
4          4    0.938953    0.00372676  0.209721  0.753651    -0.1220944    -0.1220944
5          5    0.963372   0.001341637  0.217046  0.760977   -0.07325673   -0.07325673
6          6    0.978023  0.0004829889  0.221442  0.765372   -0.04395401   -0.04395401
7          7    0.986814  0.0001738763  0.224079  0.768009   -0.02637243   -0.02637243
8          8    0.992088  6.259471e-05  0.225661  0.769592   -0.01582336   -0.01582336
9          9    0.995253  2.253432e-05  0.226611  0.770541  -0.009494066  -0.009494066
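
The rows above can be approximated by repeating the single-neuron update by hand. This is a minimal sketch in plain Python. `manual_sgd` is a hypothetical helper written for illustration, and it starts from the rounded initial weight and bias, so its values track the table only to a few decimal places.

```python
def manual_sgd(weight, bias, x, y, learning_rate, steps):
    """Repeat the single-neuron SGD update and record each step."""
    history = []
    for iteration in range(steps):
        # Forward pass and MSE loss for one sample
        prediction = x * weight + bias
        loss = (prediction - y) ** 2

        # Gradients via the chain rule
        d_loss_d_output = 2 * (prediction - y)
        weight_grad = d_loss_d_output * x
        bias_grad = d_loss_d_output

        # SGD update; the recorded weight and bias are the updated values
        weight -= learning_rate * weight_grad
        bias -= learning_rate * bias_grad

        history.append({
            'iteration': iteration,
            'prediction': prediction,
            'loss': loss,
            'weight': weight,
            'bias': bias,
            'weight_grad': weight_grad,
            'bias_grad': bias_grad,
        })
    return history


history = manual_sgd(weight=-0.0075, bias=0.5364, x=1.0, y=1.0,
                     learning_rate=0.1, steps=10)
for row in history:
    print(f"{row['iteration']:>2}  {row['prediction']:.6f}  {row['loss']:.6e}  "
          f"{row['weight']:.6f}  {row['bias']:.6f}  {row['weight_grad']:.6f}")
```

Like `train_step`, the sketch records the prediction and loss from before the update and the weight and bias from after it.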
