# PyTorch Mechanics and Model Customization

---

This notebook serves as the introduction to **Chapter 13: Going Deeper – The Mechanics of PyTorch**. It focuses on exploring the core technical features and customization capabilities that are essential for building and debugging complex deep neural networks in PyTorch.

This is a **mechanics-focused notebook** that prepares the necessary building blocks without running a complete training iteration.

## Concepts Demonstrated in This Notebook:

1.  **Autograd and Gradient Tracking:**
    * The notebook creates tensors with **`requires_grad=True`** (e.g., `a`, `b`) and also enables tracking on an existing tensor in place with `requires_grad_()`. This flag tells PyTorch's **Autograd engine** to record every operation performed on these tensors.
    * This tracking builds a **dynamic computation graph** that allows gradients to be computed automatically during the backward pass.

2.  **Modular Model Construction (`nn.Sequential`):**
    * A deep network structure is defined using **`nn.Sequential`**, which provides a convenient, ordered container for stacking layers.
    * The model uses multiple **`nn.Linear`** layers and the **`nn.ReLU()`** activation function, showcasing the modular approach to network design.

3.  **Advanced Model Customization:**
    * **Weight Initialization:** The code applies **Xavier (Glorot) initialization** (`nn.init.xavier_normal_`) to a standalone tensor, to the weights of a custom module, and to the weights of the first layer of the `nn.Sequential` model. Xavier initialization scales the initial weights by the layer's fan-in and fan-out so that activation and gradient variances stay roughly constant from layer to layer, which improves training stability in deep networks.
    * **Custom Loss Term (L1 Regularization):** The notebook computes an **L1 penalty** (`l1_penalty`) from the weights of the second linear layer (`model[2].weight`). This demonstrates how to access model parameters directly and build **custom loss components** that can be added to the primary loss during training, which is the basis of regularization and custom objective design.

4.  **Training Components Setup:**
    * A loss function (`nn.BCELoss`, binary cross-entropy loss) and an optimizer (`optim.SGD`) are instantiated but not yet used. Note that `nn.BCELoss` expects probabilities in the range [0, 1], so the model would need a sigmoid output before this loss could be applied. The full parameter-update loop that combines these components is a major focus of the full chapter text.

In [9]:
import torch
from torch import nn, optim

In [5]:
a = torch.tensor(3.14, requires_grad=True)

In [8]:
b = torch.tensor([1., 2., 3.], requires_grad=True)

In [9]:
print(a, b)

tensor(3.1400, requires_grad=True) tensor([1., 2., 3.], requires_grad=True)


In [10]:
w = torch.tensor([1., 5.])

In [17]:
print(w.requires_grad)

False


In [18]:
w.requires_grad_()  # trailing underscore: modifies w in place
print(w.requires_grad)

True
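
The tracking that `requires_grad` switches on can be inspected directly. The sketch below is illustrative and not part of the original cells; the names `c` and `d` are made up. It shows that a tensor created by the user is a leaf with no `grad_fn`, that a result of tracked operations remembers the operation that produced it, and that `torch.no_grad()` suspends tracking.

```python
import torch

a = torch.tensor(3.14, requires_grad=True)  # leaf tensor created by the user
c = a * 2 + 1                               # result of tracked operations

print(a.is_leaf, a.grad_fn)  # leaf tensors carry no grad_fn
print(c.is_leaf, c.grad_fn)  # c stores the op that produced it

with torch.no_grad():
    d = a * 2                # computed without recording a graph

print(d.requires_grad)
```

Only leaf tensors that require gradients get their `.grad` attribute populated by `backward()`, which is why model parameters are always leaves.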


In [21]:
w = torch.empty(2, 3)

In [24]:
nn.init.xavier_normal_(w)

tensor([[ 0.4092, -0.5024, -0.1142],
        [-0.0578,  0.2419, -0.4782]])
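
For a 2-D weight tensor, `nn.init.xavier_normal_` (with its default gain of 1) draws values from a normal distribution with mean 0 and standard deviation `sqrt(2 / (fan_in + fan_out))`. A 2×3 tensor is too small to see this, so the sketch below checks it empirically on a larger tensor. The shape `(256, 512)` and the names `w_big`, `expected_std`, and `empirical_std` are assumptions chosen for illustration.

```python
import math

import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical larger tensor: rows are fan_out, columns are fan_in
w_big = torch.empty(256, 512)
nn.init.xavier_normal_(w_big)

fan_out, fan_in = w_big.shape
expected_std = math.sqrt(2.0 / (fan_in + fan_out))
empirical_std = w_big.std().item()

print(f"expected std:  {expected_std:.4f}")
print(f"empirical std: {empirical_std:.4f}")
```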

In [25]:
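class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Wrap the tensors in nn.Parameter so the module registers them
        # and they show up in self.parameters() for the optimizer.
        self.w1 = nn.Parameter(torch.empty(3, 4))
        nn.init.xavier_normal_(self.w1)
        self.w2 = nn.Parameter(torch.empty(4, 7))
        nn.init.xavier_normal_(self.w2)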
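
An `nn.Module` only reports tensors that are wrapped in `nn.Parameter`; a plain tensor stored as an attribute is invisible to `parameters()` and therefore to any optimizer built from it. The sketch below contrasts the two cases. The class names `PlainTensors` and `RegisteredParams` are hypothetical and exist only for this comparison.

```python
import torch
from torch import nn


class PlainTensors(nn.Module):
    """Stores a raw tensor: not registered as a parameter."""

    def __init__(self):
        super().__init__()
        self.w1 = torch.empty(3, 4)
        nn.init.xavier_normal_(self.w1)


class RegisteredParams(nn.Module):
    """Wraps the tensor in nn.Parameter: registered and trainable."""

    def __init__(self):
        super().__init__()
        self.w1 = nn.Parameter(torch.empty(3, 4))
        nn.init.xavier_normal_(self.w1)


n_plain = len(list(PlainTensors().parameters()))
n_registered = len(list(RegisteredParams().parameters()))
print(n_plain, n_registered)
```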

In [7]:
# Scalar parameters of the model z = w*x + b
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.5, requires_grad=True)
# One training example; inputs and targets do not need gradients
x = torch.tensor([1.4])
y = torch.tensor([2.1])
z = torch.add(torch.mul(w, x), b)
# Squared-error loss
loss = (y - z).pow(2).sum()

In [8]:
loss.backward()
print(f'dL/dW: {w.grad}\ndL/db: {b.grad}')

dL/dW: -0.5599997639656067
dL/db: -0.39999985694885254
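
One detail worth keeping in mind when `backward()` is called more than once: PyTorch adds each new gradient to whatever is already stored in `.grad` rather than replacing it. The sketch below reuses the numbers from the cell above; the helper `squared_error` and the names `first` and `second` are hypothetical and not part of the original notebook.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.5, requires_grad=True)
x = torch.tensor([1.4])
y = torch.tensor([2.1])


def squared_error():
    # Hypothetical helper: builds a fresh graph on every call
    return (y - (w * x + b)).pow(2).sum()


squared_error().backward()
first = w.grad.clone()   # gradient after one backward pass

squared_error().backward()
second = w.grad.clone()  # the second gradient was added to the first

# Reset the stored gradients before the next pass
w.grad.zero_()
b.grad.zero_()

print(first.item(), second.item(), w.grad.item())
```

This accumulation is the reason training loops clear gradients (for example with `optimizer.zero_grad()`) before each backward pass.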


In [12]:
# Analytic dL/dw = 2x(wx + b - y); should match w.grad
(2 * x) * (torch.add(torch.mul(w, x), b) - y)

tensor([-0.5600], grad_fn=<MulBackward0>)

In [14]:
# Analytic dL/db = 2(wx + b - y); should match b.grad
2 * (torch.add(torch.mul(w, x), b) - y)

tensor([-0.4000], grad_fn=<MulBackward0>)
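
The two cells above recompute the gradients by hand. As a short derivation, using the same symbols as the code (`w`, `b`, `x`, `y`) and applying the chain rule to the squared-error loss:

$$
L = (y - z)^2, \qquad z = wx + b
$$

$$
\frac{\partial L}{\partial w} = 2x\,(wx + b - y) = 2 \cdot 1.4 \cdot (1.9 - 2.1) = -0.56
$$

$$
\frac{\partial L}{\partial b} = 2\,(wx + b - y) = 2 \cdot (1.9 - 2.1) = -0.40
$$

Both values agree with `w.grad` and `b.grad` up to float32 rounding.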

In [5]:
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 32),
    nn.ReLU()
)
model

Sequential(
  (0): Linear(in_features=4, out_features=16, bias=True)
  (1): ReLU()
  (2): Linear(in_features=16, out_features=32, bias=True)
  (3): ReLU()
)
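
To see what this container does with data, the sketch below pushes a made-up batch through an identical stack and inspects the result. The batch size of 8 and the names `x_batch`, `out`, and `n_params` are assumptions for illustration.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 32),
    nn.ReLU(),
)

x_batch = torch.randn(8, 4)  # made-up batch: 8 samples, 4 features each
out = model(x_batch)         # layers are applied in the order listed

# Count trainable values: (4*16 + 16) + (16*32 + 32)
n_params = sum(p.numel() for p in model.parameters())

print(out.shape)
print(n_params)
print(model[0])              # layers can be retrieved by index
```

Because the last layer is a ReLU, every output is non-negative but not bounded above by 1.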

In [7]:
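# Re-initialize the first linear layer's weights in place
nn.init.xavier_normal_(model[0].weight)
# L1 penalty on the second linear layer's weights, scaled by l1_param
l1_param = 0.01
l1_penalty = l1_param * model[2].weight.abs().sum()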
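
Because `l1_penalty` is built from tracked parameters, autograd can differentiate it like any other loss term. The following is a minimal sketch, assuming the same layer sizes as the model above: the gradient of `l1_param * |W|.sum()` with respect to each weight is `l1_param * sign(W)`, a constant-magnitude pull toward zero. The names `grad` and `expected` are introduced only for this check.

```python
import torch
from torch import nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 32),
    nn.ReLU(),
)

l1_param = 0.01
l1_penalty = l1_param * model[2].weight.abs().sum()
l1_penalty.backward()

grad = model[2].weight.grad
expected = l1_param * torch.sign(model[2].weight.detach())

print(torch.allclose(grad, expected))
print(model[0].weight.grad)  # untouched: this layer is not part of the penalty
```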

In [11]:
# Named loss_fn so it does not shadow the `loss` tensor computed earlier
loss_fn = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)
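
The notebook stops before these components are used. As a rough sketch of how they could fit together in a single update step, the code below makes several assumptions that are not in the original: a one-unit `nn.Sigmoid` head is appended so that the outputs are valid inputs for `nn.BCELoss`, the batch `x_batch`/`y_batch` is random, and the loss object is called `loss_fn`. It is one possible arrangement, not the chapter's training loop.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Assumption: the notebook's stack plus a sigmoid head, so outputs
# are probabilities in [0, 1] as nn.BCELoss requires.
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
    nn.Sigmoid(),
)
nn.init.xavier_normal_(model[0].weight)

loss_fn = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)
l1_param = 0.01

# Made-up data: 8 samples with 4 features and binary targets
x_batch = torch.randn(8, 4)
y_batch = torch.randint(0, 2, (8, 1)).float()

before = model[2].weight.detach().clone()

optimizer.zero_grad()                           # clear old gradients
pred = model(x_batch)                           # forward pass
l1_penalty = l1_param * model[2].weight.abs().sum()
total_loss = loss_fn(pred, y_batch) + l1_penalty
total_loss.backward()                           # compute gradients
optimizer.step()                                # apply the SGD update

weights_changed = not torch.equal(before, model[2].weight.detach())
print(total_loss.item(), weights_changed)
```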