# Basic Neural Network


## Example 1

Consider a simple neural network with one weight $ w $ and no bias. Let the input $ x $ and the actual output $ y $ be given. The prediction $ \hat{y} $ is:

$$ \hat{y} = w \cdot x $$

For a single data point, the mean squared error (MSE) loss reduces to the squared error:

$$ \text{MSE} = (y - \hat{y})^2 $$

### Step-by-Step Calculation

1. **Prediction**: Compute the predicted output $ \hat{y} $:

   $$ \hat{y} = w \cdot x $$

2. **Loss Calculation**: Compute the MSE loss:

   $$ \text{MSE} = (y - w \cdot x)^2 $$

   Let's take a numerical example to illustrate this process.

   **Numerical Example:**
   - Suppose $ x = 2 $, $ y = 5 $, and the current weight $ w = 1 $.

   **Prediction**:

   $$ \hat{y} = 1 \cdot 2 = 2 $$

   **Loss Calculation**:

   $$ \text{MSE} = (5 - 2)^2 = 9 $$

3. **Gradient Calculation**: Compute the partial derivative of the loss with respect to the weight $ w $:

   $$ \frac{\partial \text{MSE}}{\partial w} = \frac{\partial}{\partial w} (y - w \cdot x)^2 $$

   Using the chain rule:

   $$ \frac{\partial \text{MSE}}{\partial w} = 2 (y - w \cdot x) \cdot (-x) $$

   Substitute the values from our example:

   $$ \frac{\partial \text{MSE}}{\partial w} = 2 (5 - 1 \cdot 2) \cdot (-2) $$

   $$ \frac{\partial \text{MSE}}{\partial w} = 2 (5 - 2) \cdot (-2) $$

   $$ \frac{\partial \text{MSE}}{\partial w} = 2 \cdot 3 \cdot (-2) $$

   $$ \frac{\partial \text{MSE}}{\partial w} = -12 $$

4. **Update the Weight**: Apply one step of gradient descent to the weight $ w $:
   - Choose a learning rate $ \eta $, typically a small positive number.
   - Update rule: $ w_{\text{new}} = w_{\text{old}} - \eta \cdot \frac{\partial \text{MSE}}{\partial w} $

   **For example:**
   - Suppose the learning rate $ \eta = 0.1 $.

   **Weight Update**:
   
   $$ w_{\text{new}} = 1 - 0.1 \cdot (-12) $$

   $$ w_{\text{new}} = 1 + 1.2 $$

   $$ w_{\text{new}} = 2.2 $$

So, after one iteration of gradient descent with MSE loss, the updated weight $ w $ is $ 2.2 $.
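The four steps above can be reproduced in a few lines of plain Python. This is a minimal sketch using the example's values; the function names `predict`, `squared_error`, and `grad_w` are illustrative and do not come from any library.

```python
def predict(w, x):
    """Forward pass: y_hat = w * x."""
    return w * x


def squared_error(y, y_hat):
    """Single-sample MSE: (y - y_hat)^2."""
    return (y - y_hat) ** 2


def grad_w(w, x, y):
    """Derivative of (y - w*x)^2 with respect to w, via the chain rule."""
    return 2 * (y - w * x) * (-x)


# Values from the worked example
x, y, w, eta = 2.0, 5.0, 1.0, 0.1

y_hat = predict(w, x)           # Step 1: prediction
loss = squared_error(y, y_hat)  # Step 2: loss
grad = grad_w(w, x, y)          # Step 3: gradient
w_new = w - eta * grad          # Step 4: gradient descent update

print(f"y_hat={y_hat:.1f}, loss={loss:.1f}, grad={grad:.1f}, w_new={w_new:.1f}")
```

Because the gradient is negative, subtracting it increases $ w $, which moves the prediction $ w \cdot x $ toward the target $ y $.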

### Iterative Process

During training, the same four steps repeat for each training sample:
- Compute $ \hat{y} $.
- Calculate $ \text{MSE} $.
- Compute $ \frac{\partial \text{MSE}}{\partial w} $.
- Update $ w $.

One full pass over all training samples is called an epoch. Over many epochs, the network learns to minimize the MSE loss across all training samples, improving its ability to predict $ y $ from $ x $.
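The loop can be sketched in plain Python as follows. The three-sample dataset is hypothetical (generated from $ y = 2.5 \cdot x $ for illustration), and the learning rate of 0.05 is an assumed value, chosen small enough that the update for the largest input does not overshoot.

```python
# Hypothetical training set drawn from y = 2.5 * x (an assumption for illustration)
samples = [(1.0, 2.5), (2.0, 5.0), (3.0, 7.5)]

w = 1.0     # initial weight, as in the worked example
eta = 0.05  # learning rate (assumed value)
epochs = 20

for epoch in range(epochs):
    total_loss = 0.0
    for x, y in samples:
        y_hat = w * x                   # compute the prediction
        total_loss += (y - y_hat) ** 2  # accumulate the squared error
        grad = 2 * (y - y_hat) * (-x)   # gradient of the loss w.r.t. w
        w -= eta * grad                 # update w
    mean_loss = total_loss / len(samples)

print(f"w after {epochs} epochs: {w:.4f}, mean loss in last epoch: {mean_loss:.6f}")
```

Here the weight is updated after every sample, which matches the per-sample description above; averaging the gradients over all samples before updating would be an equally valid variant.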

### Conclusion

The gradient of the loss function (in this case, MSE) with respect to the weights of the network tells us how to adjust the weights to reduce the error between predicted and actual outputs during training. This iterative process is fundamental in training neural networks using gradient-based optimization algorithms like gradient descent.
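One way to gain confidence in a hand-derived gradient is to compare it with a finite-difference estimate. The sketch below does this for the example values; the helper names `loss`, `analytic_grad`, and `numeric_grad`, and the step size `h`, are assumptions made for illustration.

```python
def loss(w, x=2.0, y=5.0):
    """Single-sample squared error for the worked example."""
    return (y - w * x) ** 2


def analytic_grad(w, x=2.0, y=5.0):
    """Gradient from the chain rule: 2 * (y - w*x) * (-x)."""
    return 2 * (y - w * x) * (-x)


def numeric_grad(f, w, h=1e-5):
    """Central finite-difference estimate of df/dw."""
    return (f(w + h) - f(w - h)) / (2 * h)


w = 1.0
g_analytic = analytic_grad(w)
g_numeric = numeric_grad(loss, w)
print(f"analytic: {g_analytic:.4f}, numeric: {g_numeric:.4f}")

# A small step against the gradient should lower the loss
w_stepped = w - 0.01 * g_analytic
print(f"loss before: {loss(w):.4f}, loss after: {loss(w_stepped):.4f}")
```

The two estimates should agree up to floating-point error, and the loss after the small step should be lower than before.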

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Define the input and output data (tensor form)
x_data = torch.tensor([2.0])  # Input
y_data = torch.tensor([5.0])  # Actual output

# Define the model: a single weight and no bias, so y_pred = w * x
class LinearRegression(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.tensor([1.0]))  # Weight parameter, initialized to 1

    def forward(self, x):
        return self.w * x

# Instantiate the model
model = LinearRegression()

# Define the Mean Squared Error (MSE) loss function
criterion = nn.MSELoss()

# Define the optimizer (plain gradient descent with learning rate 0.1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Training loop (with a single sample, each epoch is one gradient step)
epochs = 5  # Number of training epochs

for epoch in range(epochs):
    # Forward pass
    y_pred = model(x_data)

    # Compute loss
    loss = criterion(y_pred, y_data)

    # Backward pass and optimize
    optimizer.zero_grad()  # Clear gradients from the previous step
    loss.backward()  # Compute gradients
    optimizer.step()  # Update weights

    # Print progress: loss and prediction are from before the update,
    # the weight is the value after the update
    print(
        f'Epoch [{epoch + 1}/{epochs}], Loss: {loss.item():.4f}, '
        f'Updated weight: {model.w.item():.4f}, '
        f'Y: {y_data.item():.4f}, Y_pred: {y_pred.item():.4f}'
    )
    # Print the gradient that was used for this update
    print(f'Gradients: {model.w.grad.item():.4f}')

# After training, you can access the updated weight
print(f'Final weight: {model.w.item():.4f}')
```


```text
Epoch [1/5], Loss: 9.0000, Updated weight: 2.2000, Y: 5.0000, Y_pred: 2.0000
Gradients: -12.0000
Epoch [2/5], Loss: 0.3600, Updated weight: 2.4400, Y: 5.0000, Y_pred: 4.4000
Gradients: -2.4000
Epoch [3/5], Loss: 0.0144, Updated weight: 2.4880, Y: 5.0000, Y_pred: 4.8800
Gradients: -0.4800
Epoch [4/5], Loss: 0.0006, Updated weight: 2.4976, Y: 5.0000, Y_pred: 4.9760
Gradients: -0.0960
Epoch [5/5], Loss: 0.0000, Updated weight: 2.4995, Y: 5.0000, Y_pred: 4.9952
Gradients: -0.0192
Final weight: 2.4995
```
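
The first epoch reproduces the hand calculation (loss 9, gradient -12, updated weight 2.2), and the loss then falls by a factor of 25 at every epoch (9, 0.36, 0.0144). This follows from the update rule itself. The short derivation below is a sketch in which $ w^* = y / x $ denotes the weight that makes the loss zero; this symbol is introduced here for convenience and does not appear in the text above.

$$ w_{\text{new}} = w_{\text{old}} - \eta \cdot 2 (y - w_{\text{old}} \cdot x) \cdot (-x) = w_{\text{old}} + 2 \eta x^2 (w^* - w_{\text{old}}) $$

$$ w_{\text{new}} - w^* = (1 - 2 \eta x^2) (w_{\text{old}} - w^*) $$

With $ \eta = 0.1 $ and $ x = 2 $, the factor is $ 1 - 2 \cdot 0.1 \cdot 4 = 0.2 $, so the distance to $ w^* = 2.5 $ shrinks fivefold per step (1.5, 0.3, 0.06, and so on). Since $ \text{MSE} = x^2 (w - w^*)^2 $, the loss shrinks by $ 0.2^2 = 0.04 $ per step. The same expression shows that for this example the updates diverge once $ |1 - 2 \eta x^2| > 1 $, that is, for $ \eta > 1 / x^2 = 0.25 $.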
