# **Parameter Management**

- **Once an architecture and hyperparameters are set**, the next step is the **training loop**:
  - The goal is to **find parameter values** that **minimize the loss function**.
  - After training, parameters are **needed for future predictions**.

- **Why extract parameters?**
  - To **reuse them** in another context.
  - To **save the model** for execution in other software.
  - To **examine parameters** for gaining scientific understanding.
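
The "save the model" use case can be sketched as follows. This is a minimal illustration, not a prescribed workflow: it assumes a small stand-in `nn.Sequential` model with fixed layer sizes (the names `model` and `clone` are hypothetical) and writes to an in-memory buffer rather than a file.

```python
import io

import torch
from torch import nn

# A small stand-in model; the layer sizes are arbitrary.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Serialize only the parameters (the state dict) to an in-memory buffer.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Rebuild the same architecture and load the stored parameters into it.
clone = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
clone.load_state_dict(torch.load(buffer))

# Both models should now produce the same predictions.
X = torch.rand(2, 4)
same_output = torch.allclose(model(X), clone(X))
print(same_output)
```

Saving the state dict rather than the whole model object keeps the file independent of the Python class definition, at the cost of having to rebuild the architecture before loading.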

- **Most of the time**, deep learning frameworks handle parameter declaration and manipulation.
  - However, in **non-standard architectures**, manual parameter management is sometimes required.

- **Topics covered in this section**:
  - **Accessing parameters** for debugging, diagnostics, and visualizations.
  - **Sharing parameters** across different model components.


In [1]:
import torch
from torch import nn

- **We start by focusing on an MLP with one hidden layer.**


In [2]:
net = nn.Sequential(nn.LazyLinear(8),
                    nn.ReLU(),
                    nn.LazyLinear(1))

X = torch.rand(size=(2, 4))
net(X).shape

torch.Size([2, 1])

## **Parameter Access**

- We begin by exploring **how to access parameters** from models.
- This applies to **models we have already encountered**.


- When a model is defined using the **`Sequential` class**:
  - Any **layer** can be accessed by **indexing into the model**, just like a list.
  - Each layer’s **parameters** are stored in its **attributes**.


- We can inspect the parameters of the second fully connected layer as follows.


In [3]:
net[2].state_dict()

OrderedDict([('weight',
              tensor([[-0.1649,  0.0605,  0.1694, -0.2524,  0.3526, -0.3414, -0.2322,  0.0822]])),
             ('bias', tensor([0.0709]))])

- A **fully connected layer** contains **two parameters**:
  - **Weights**.
  - **Biases**.
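
To make the roles of the two parameters concrete, here is a small sketch that reproduces a fully connected layer's output by hand. It assumes a standalone `nn.Linear(8, 1)` layer and a made-up batch `H` of hidden activations; neither comes from the model above.

```python
import torch
from torch import nn

layer = nn.Linear(8, 1)      # 8 inputs, 1 output
H = torch.rand(2, 8)         # a batch of 2 hypothetical hidden activations

# A linear layer computes H @ W^T + b, with W stored as (out_features, in_features).
manual = H @ layer.weight.T + layer.bias
matches = torch.allclose(manual, layer(H))
print(layer.weight.shape, layer.bias.shape, matches)
```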

### **Targeted Parameters**

- Each parameter is represented as an **instance of the `Parameter` class** (`torch.nn.parameter.Parameter`).
- To manipulate parameters, we must **access their underlying numerical values**.

- **Ways to access parameter values**:
  - Some methods are **simpler**.
  - Others are **more general**.

- **Example**:
  - The following code extracts the **bias** from the second fully connected layer (`net[2]`).
  - It first **retrieves a parameter class instance**.
  - Then, it **further accesses the parameter's value**.


In [4]:
type(net[2].bias), net[2].bias.data

(torch.nn.parameter.Parameter, tensor([0.0709]))

- **Parameters are complex objects**, containing:
  - **Values** (numerical weights or biases).
  - **Gradients** (used for optimization).
  - **Additional metadata**.

- **Explicitly requesting parameter values**:
  - Since parameters store more than just values, we must **explicitly request** the numerical value.
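
Besides `.data`, there are a few other common ways to request the values explicitly. The sketch below assumes a standalone `nn.Linear(8, 1)` layer (so the bias has a single element) and compares the parameter object with a detached tensor and a plain Python number.

```python
import torch
from torch import nn

layer = nn.Linear(8, 1)

bias_param = layer.bias             # nn.Parameter, tracked by autograd
bias_values = layer.bias.detach()   # plain tensor sharing the same storage
bias_scalar = layer.bias.item()     # Python float (single-element tensors only)

print(type(bias_param).__name__, bias_values.requires_grad, type(bias_scalar).__name__)
```

Because the detached tensor shares storage with the parameter, in-place changes to it also change the parameter.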

- **Accessing gradients**:
  - Each parameter also provides access to its **gradient**.
  - If **backpropagation has not been invoked** yet, the gradient is still `None`, its **initial state**.


In [5]:
net[2].weight.grad is None

True
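
For contrast, the following sketch shows the gradient being populated once a backward pass has run. It assumes a small non-lazy stand-in network and uses the sum of the outputs as a placeholder for a real loss.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(2, 4)

grad_before = net[2].weight.grad   # None: no backward pass yet
net(X).sum().backward()            # scalar stand-in for a loss
grad_after = net[2].weight.grad    # now a tensor shaped like the weight

print(grad_before is None, grad_after.shape == net[2].weight.shape)
```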

### **All Parameters at Once**

- **Accessing parameters one by one** can be **tedious**, especially when:
  - The model is **large**.
  - The architecture is **nested with multiple sub-modules**.
  
- **Challenges with complex models**:
  - Requires **recursively traversing** the entire module tree.
  - Extracting parameters manually can be **cumbersome**.

- **Solution**:
  - We can **access all parameters at once**.
  - The following example demonstrates how to **retrieve parameters from all layers** efficiently.


In [6]:
[(name, param.shape) for name, param in net.named_parameters()]

[('0.weight', torch.Size([8, 4])),
 ('0.bias', torch.Size([8])),
 ('2.weight', torch.Size([1, 8])),
 ('2.bias', torch.Size([1]))]
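
Iterating over all parameters also makes simple diagnostics easy. As a sketch, assuming a non-lazy stand-in network with the same shapes as the one above, we can count the scalar parameters and look up a tensor by its dotted name.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Total number of scalar parameters: 4*8 + 8 + 8*1 + 1 = 49.
total = sum(p.numel() for p in net.parameters())

# state_dict() uses the same dotted names, so lookup by name works.
last_bias = net.state_dict()['2.bias']
print(total, last_bias.shape)
```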

## **Tied Parameters**

- **Sharing parameters across multiple layers**:
  - Often, we want to **reuse the same parameters** in different layers.
  - This helps **reduce memory usage** and **maintain consistency** across layers.

- **How to achieve this?**
  1. **Allocate a fully connected layer**.
  2. **Use its parameters** to set those of **another layer**.

- **Key implementation detail**:
  - Because `nn.LazyLinear` infers its input dimension on the first forward pass, we must **run forward propagation** with `net(X)` before accessing the parameters.
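
The need for a forward pass can be checked directly. The sketch below assumes a single standalone `nn.LazyLinear(8)` layer and tests whether its weight is still an `UninitializedParameter` before and after the first call.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

layer = nn.LazyLinear(8)
lazy_before = isinstance(layer.weight, UninitializedParameter)

layer(torch.rand(2, 4))   # first forward pass infers in_features=4
lazy_after = isinstance(layer.weight, UninitializedParameter)

print(lazy_before, lazy_after, layer.weight.shape)
```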


In [7]:
# We need to give the shared layer a name so that we can refer to its
# parameters
shared = nn.LazyLinear(8)
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.LazyLinear(1))

net(X)
# Check whether the parameters are the same
print(net[2].weight.data[0] == net[4].weight.data[0])
net[2].weight.data[0, 0] = 100
# Make sure that they are actually the same object rather than just having the
# same value
print(net[2].weight.data[0] == net[4].weight.data[0])

tensor([True, True, True, True, True, True, True, True])
tensor([True, True, True, True, True, True, True, True])


- **Parameter tying in layers**:
  - In this example, the **parameters of the second and third fully connected layers** (`net[2]` and `net[4]`) **are tied**.
  - They are **not just equal** but are actually **represented by the same exact tensor**.

- **Implication of tied parameters**:
  - **Modifying the parameter through one layer** is **immediately visible** through the other, because only one tensor exists.
  - This ensures **synchronization** between the layers using shared parameters.


- **What happens to gradients when parameters are tied?**
  - Since model parameters **store gradients**, tying parameters affects backpropagation.

- **Gradient behavior in tied parameters**:
  - During **backpropagation**, the gradients from the **second hidden layer** and the **third hidden layer** are **added together** before the shared parameters are updated.
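
The summing of gradients can be seen in a small sketch. The first part assumes a non-lazy variant of the tied network above; the second part uses a made-up scalar example (`w`, `x`, `y` are illustrative names) in which one weight is used twice.

```python
import torch
from torch import nn

# Module-level view: the same layer object appears twice in the network.
shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))
same_object = net[2].weight is net[4].weight
# parameters() yields each tensor once: 3 distinct layers -> 3 weights + 3 biases.
n_tensors = len(list(net.parameters()))

# Scalar view: w is used twice, so its gradient is the sum of both uses.
w = torch.tensor(3.0, requires_grad=True)
x = torch.tensor(2.0)
y = w * (w * x)   # y = w^2 * x
y.backward()
# Each use contributes w * x = 6, and autograd adds them: 6 + 6 = 12.
print(same_object, n_tensors, w.grad.item())
```

Since `parameters()` reports the shared tensor only once, an optimizer built from it applies a single update using the accumulated gradient.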


## **Summary**

- We have several ways of **accessing model parameters**.
- We have several ways of **tying model parameters**.
