- nn.Module
- nn.Module is the base class for neural network models in PyTorch. Custom layers and models typically inherit from nn.Module, which provides the methods and attributes used to build, train, and evaluate a model.

___

- Here's a brief breakdown of what nn.Module provides:

1. Layer Initialization: When you subclass nn.Module, you define the network's layers (such as nn.Linear or nn.Conv2d) inside the `__init__` method, after calling `super().__init__()`.

2. Forward Pass: The forward method defines how input data flows through the layers of the network. It runs when you call the model on data, as in model(x).

3. Parameter Management: nn.Module automatically tracks the model's parameters (e.g., weights and biases) and exposes them through parameters(), so an optimizer from torch.optim (such as torch.optim.SGD) can update them during training.

4. Device Management: Calling .to(device) on an nn.Module moves all of its parameters and buffers to the given device (CPU or GPU).
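
The four points above can be sketched in one small model. This is a minimal illustration, not a recommended architecture: the class name `TinyNet`, the layer sizes, and the batch shape are all assumptions made for the example.

```python
import torch
from torch import nn


class TinyNet(nn.Module):  # hypothetical example model
    def __init__(self):
        super().__init__()
        # 1. Layers are defined in __init__
        self.hidden = nn.Linear(in_features=2, out_features=4)
        self.output = nn.Linear(in_features=4, out_features=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 2. forward describes how data flows through the layers
        return self.output(torch.relu(self.hidden(x)))


model = TinyNet()

# 3. Parameters are tracked automatically: one weight and one bias per layer
param_names = [name for name, _ in model.named_parameters()]
print(param_names)

# 4. .to(device) moves every parameter; fall back to the CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

batch = torch.randn(8, 2, device=device)  # 8 samples with 2 features each
print(model(batch).shape)  # torch.Size([8, 1])
```

Calling `model(batch)` rather than `model.forward(batch)` is the usual convention, since the call also runs any hooks registered on the module.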

### PyTorch workflow

- get data (turn it into tensors)
- build a model or pick a pretrained one
- fit the model to the training data
- evaluate the model
- improve
- save and reload
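
The last step, saving and reloading, could look roughly like the sketch below. It assumes the common approach of saving the model's `state_dict()`; the `nn.Linear` stand-in model and the file name `model_state.pth` are hypothetical choices for the example.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(in_features=1, out_features=1)  # stand-in for a trained model

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_state.pth")  # hypothetical file name

    # Save only the learned parameters, not the whole Python object
    torch.save(model.state_dict(), path)

    # Reload: create a fresh instance of the same class, then load the saved values
    reloaded = nn.Linear(in_features=1, out_features=1)
    reloaded.load_state_dict(torch.load(path, weights_only=True))

print(torch.equal(model.weight, reloaded.weight))  # True
```

Saving the state dict keeps the file independent of how the class is defined in code, at the cost of having to rebuild the model object before loading.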
 

___

```python
import torch
from torch import nn  # nn contains all of PyTorch's building blocks for neural networks
```

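```python
# Known parameters of the line the model should learn
weight = 0.7
bias = 0.3

# Inputs from 0 up to (not including) 1 in steps of 0.02, as a column vector
start = 0
end = 1
step = 0.02
X = torch.arange(start, end, step).unsqueeze(dim=1)
y = weight * X + bias

X[:10], y[:10]
```
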
___

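- training set: the data the model learns from
- validation set: used to tune the model during development
- test set: used for the final evaluation

- the model never sees the validation or test data during training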

___

```python
# Use the first 80% of the samples for training and the remaining 20% for testing
train_split = int(0.8 * len(X))
x_train, y_train = X[:train_split], y[:train_split]
x_test, y_test = X[train_split:], y[train_split:]
```

___

## Plotting data points

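```python
import matplotlib.pyplot as plt


def plot_predictions(train_data=x_train, train_labels=y_train,
                     test_data=x_test, test_labels=y_test,
                     predictions=None):
    """Plot training data, test data, and (optionally) predictions."""
    plt.figure(figsize=(10, 7))
    plt.scatter(train_data, train_labels, c="b", s=4, label="training_data")
    plt.scatter(test_data, test_labels, c="g", s=4, label="test_data")
    if predictions is not None:
        # Predictions are plotted against the test inputs
        plt.scatter(test_data, predictions, c="r", s=4, label="predictions")
    plt.legend(prop={"size": 14})
```
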
___

```python
class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Start from random values; training moves them toward the true weight and bias
        self.weights = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))
        self.bias = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Linear regression formula: y = weights * x + bias
        return self.weights * x + self.bias
```

___

- In PyTorch, setting requires_grad=True on a tensor tells autograd to record the operations performed on it, so that gradients can be computed with respect to it. This is what lets an optimizer update the tensor's values during training.

- torch.nn.Parameter is a subclass of torch.Tensor used to define the learnable parameters of a model. When a Parameter is assigned as an attribute of an nn.Module, it is automatically registered with that module, so it is returned by model.parameters() and can be passed to an optimizer.

- Key points about torch.nn.Parameter:
  - Automatic inclusion in model parameters: a Parameter assigned to a module attribute is treated as a value to optimize during training and is returned by model.parameters(). A plain tensor assigned the same way is not.
  - Gradient tracking: a Parameter has requires_grad=True by default, so it tracks operations for automatic differentiation.
  - How to use it: built-in layers such as nn.Linear and nn.Conv2d create Parameters internally for their weights and biases. In a custom layer you create them explicitly, as in the LinearRegressionModel above.
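
The difference between a Parameter and a plain tensor can be seen in a short sketch. The `Scale` layer and its attribute names are hypothetical, chosen only to show which attribute gets registered.

```python
import torch
from torch import nn


class Scale(nn.Module):  # hypothetical example layer
    def __init__(self):
        super().__init__()
        # Registered: an nn.Parameter assigned as a module attribute
        self.factor = nn.Parameter(torch.ones(1))
        # Not registered: a plain tensor, even though it has requires_grad=True
        self.offset = torch.zeros(1, requires_grad=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.factor * x + self.offset


layer = Scale()
registered = [name for name, _ in layer.named_parameters()]
print(registered)                  # ['factor']
print(layer.factor.requires_grad)  # True
```

An optimizer built from `layer.parameters()` would therefore update `factor` but never `offset`.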

___

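```python
model = LinearRegressionModel()

# parameters() returns a generator, so iterate over it to see the values
for param in model.parameters():
    print(param)

# state_dict() maps each parameter name to its tensor
model.state_dict()
```
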
___

- torch.inference_mode() is a context manager, available since PyTorch 1.9, that disables gradient tracking during inference. It is similar to torch.no_grad() but goes further.
- Purpose: inside the context no computation graph is built, which saves memory and time. Compared with torch.no_grad(), inference_mode() also skips some autograd bookkeeping (view tracking and version counters), so it can be somewhat faster. The trade-off is that tensors created inside it cannot later be used in computations recorded by autograd.
- Usage: it is particularly useful when making predictions, for example in an evaluation loop or in deployment, where gradients are not needed.
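
A small comparison of the two context managers, as a sketch: the `nn.Linear` layer and the input shape are arbitrary assumptions, and the point is only to show which flags the resulting tensors carry.

```python
import torch
from torch import nn

layer = nn.Linear(in_features=1, out_features=1)
x = torch.ones(3, 1)

out_tracked = layer(x)  # normal forward pass: gradients are tracked

with torch.no_grad():
    out_no_grad = layer(x)

with torch.inference_mode():
    out_inference = layer(x)

print(out_tracked.requires_grad)     # True
print(out_no_grad.requires_grad)     # False
print(out_inference.requires_grad)   # False

# Only tensors created under inference_mode are marked as inference tensors
print(out_no_grad.is_inference())    # False
print(out_inference.is_inference())  # True
```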

```python
loss_fn = nn.L1Loss()  # mean absolute error
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.01)

epochs = 200
torch.manual_seed(42)
for epoch in range(epochs):
    model.train()  # put the model in training mode
    y_pred = model(x_train)  # forward pass
    loss = loss_fn(y_pred, y_train)
    optimizer.zero_grad()  # clear the gradients from the previous step
    loss.backward()  # backpropagation: gradients of the loss w.r.t. the parameters
    optimizer.step()  # update the parameters (gradient descent)

    model.eval()  # put the model in evaluation mode
    with torch.inference_mode():
        test_pred = model(x_test)
        test_loss = loss_fn(test_pred, y_test)
    if epoch % 10 == 0:
        print(f"epoch: {epoch} | loss: {loss} | test loss: {test_loss}")
```

___

- optimizer.zero_grad() clears the gradients of all parameters managed by the optimizer. Call it before the backward pass of each training step.
- PyTorch accumulates gradients by default: each backward pass adds the new gradients to each parameter's `.grad` attribute, which the optimizer then uses to update the model's weights. If you don't clear the gradients before each new backward pass, they accumulate across iterations and lead to incorrect updates.

### Simple Math Example: Relevance of Clearing Gradients

Let's consider a simple scenario to understand the relevance of clearing gradients.

#### Setup:
We have:
- $w$ = weight parameter
- $x$ = input value
- $y$ = target value (ground truth)

The loss is the squared error for a single sample, halved so that the derivative has no factor of 2:
$$
\text{Loss} = \frac{1}{2} \cdot (y - w \cdot x)^2
$$

#### Step-by-Step Example:

1. **First Iteration:**
    - Let $w = 2$, $x = 3$, and $y = 6$.
    - Compute the model's output: $\hat{y} = w \cdot x = 2 \cdot 3 = 6$.
    - Compute the loss: 
    $$
    \text{Loss} = \frac{1}{2} \cdot (6 - 6)^2 = 0
    $$
    - Compute the gradient of the loss with respect to $w$:
    $$
    \frac{\partial \text{Loss}}{\partial w} = x \cdot (w \cdot x - y) = 3 \cdot (6 - 6) = 0
    $$

2. **Second Iteration (without clearing gradients):**
    - Now, change the input to $x = 2$, keeping $y = 6$ and $w = 2$.
    - Compute the model's output: $\hat{y} = w \cdot x = 2 \cdot 2 = 4$.
    - Compute the loss: 
    $$
    \text{Loss} = \frac{1}{2} \cdot (6 - 4)^2 = 2
    $$
    - Compute the gradient:
    $$
    \frac{\partial \text{Loss}}{\partial w} = x \cdot (w \cdot x - y) = 2 \cdot (4 - 6) = -4
    $$

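    If gradients are **not cleared**, the new gradient is added to the one already stored:
    $$
    \text{Accumulated Gradient} = 0 + (-4) = -4
    $$

    The optimizer would use this accumulated gradient for the weight update.

3. **Second Iteration (with clearing gradients):**
    - If we **clear the gradients** using `optimizer.zero_grad()`, the gradient from the first iteration (which was 0) is discarded, and the optimizer only uses the gradient from the second iteration:
    $$
    \text{Gradient for Update} = -4
    $$
    - The two results coincide here only because the first gradient happened to be 0. If the first iteration had produced a gradient of, say, $-3$, the accumulated gradient would be $-3 + (-4) = -7$ instead of the correct $-4$.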

#### Conclusion:

- **Without clearing gradients**, the optimizer accumulates gradients, causing incorrect updates.
- **With clearing gradients**, each iteration starts with fresh gradients, leading to correct and intended weight updates.

This illustrates why clearing gradients is crucial in neural network training for proper optimization.
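
The same numbers can be checked with autograd. The sketch below is a minimal illustration, assuming a scalar tensor `w` in place of a full model and a hypothetical helper `loss_for`. It also repeats the second backward pass once more, so that the accumulation becomes visible in the stored value.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # weight parameter
y = torch.tensor(6.0)                      # target value


def loss_for(x: float) -> torch.Tensor:
    """Halved squared error for one sample (hypothetical helper)."""
    return 0.5 * (y - w * x) ** 2


# First iteration: x = 3 gives a gradient of 0
loss_for(3.0).backward()
grad_first = w.grad.item()

# Second iteration without clearing: 0 + (-4) = -4
loss_for(2.0).backward()
grad_accumulated = w.grad.item()

# Same sample again, still without clearing: -4 + (-4) = -8
loss_for(2.0).backward()
grad_doubled = w.grad.item()

# Clear the stored gradient first: the result is the true gradient, -4
w.grad.zero_()
loss_for(2.0).backward()
grad_fresh = w.grad.item()

print(grad_accumulated, grad_doubled, grad_fresh)  # -4.0 -8.0 -4.0
```

In a training loop, `optimizer.zero_grad()` plays the role of `w.grad.zero_()` for every parameter the optimizer manages.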


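- model.eval()
- switches the model to evaluation mode, which changes the behavior of layers that act differently during training and evaluation, such as dropout and batch normalization. It does not disable gradient tracking; use torch.inference_mode() or torch.no_grad() for that.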
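
The effect of the two modes is easiest to see on a single dropout layer. This is a sketch under assumed values: the dropout probability of 0.5 and the input of 1000 ones are arbitrary choices for the example.

```python
import torch
from torch import nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()           # training mode: elements are randomly zeroed
train_out = dropout(x)    # surviving elements are scaled by 1 / (1 - p) = 2

dropout.eval()            # evaluation mode: dropout passes the input through unchanged
eval_out = dropout(x)

print(dropout.training)          # False
print(torch.equal(eval_out, x))  # True
print((train_out == 0).any())    # some elements were dropped in training mode
```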

___

#### Disabling gradient tracking when making predictions

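```python
# Option 1: torch.no_grad() disables gradient tracking
with torch.no_grad():
    y_pred = model(x_test)

# Option 2: torch.inference_mode() does the same with less overhead
with torch.inference_mode():
    y_preds = model(x_test)

y_preds
```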

