# PyTorch NN module

The `torch.nn` module is PyTorch's core library for building neural networks. It hides much of the complexity of creating and training models by providing pre-built layers, loss functions, activation functions, and more.


## Simple Neural Network

In [1]:
import torch
import torch.nn as nn
# A custom model must inherit from nn.Module
class MyNN(nn.Module):
  # num_features is the number of input features
  def __init__(self,num_features):
    super().__init__()
    # nn.Linear(in_features, out_features)
    self.linear = nn.Linear(num_features,1)
    # Activation functions such as nn.ReLU or nn.Sigmoid can be used here
    self.sigmoid = nn.Sigmoid()
  def forward(self,features):
    output = self.linear(features)
    output = self.sigmoid(output)
    return output

In [2]:
# Random dataset
features = torch.rand(10,4)
# create an instance of our MyNN class
model = MyNN(features.shape[1])
# forward pass
model(features)

tensor([[0.4414],
        [0.4077],
        [0.4266],
        [0.3584],
        [0.4137],
        [0.3667],
        [0.3628],
        [0.4265],
        [0.3568],
        [0.4095]], grad_fn=<SigmoidBackward0>)

In [3]:
# we can see the weights and bias used by our nn
print(f"Weights: {model.linear.weight}")
print("---"*10)
print(f"Bias: {model.linear.bias}")

Weights: Parameter containing:
tensor([[ 0.2411, -0.0843,  0.1891, -0.3246]], requires_grad=True)
------------------------------
Bias: Parameter containing:
tensor([-0.4688], requires_grad=True)
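
A more general way to inspect parameters, one that does not depend on knowing the attribute name `linear`, is to iterate over `model.named_parameters()`. The sketch below rebuilds the same one-layer model so that it runs on its own; the `shapes` dictionary is only an illustrative name.

```python
import torch
import torch.nn as nn

class MyNN(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.linear = nn.Linear(num_features, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, features):
        return self.sigmoid(self.linear(features))

model = MyNN(4)

# named_parameters() yields (name, parameter) pairs for every learnable tensor
shapes = {name: tuple(param.shape) for name, param in model.named_parameters()}
for name, shape in shapes.items():
    print(f"{name}: {shape}")
# linear.weight: (1, 4)
# linear.bias: (1,)
```

The weight has shape `(1, 4)` because `nn.Linear` stores it as `(out_features, in_features)`.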


In [4]:
# We can visualize our nn using torchinfo
%pip install torchinfo

Collecting torchinfo
  Downloading torchinfo-1.8.0-py3-none-any.whl.metadata (21 kB)
Downloading torchinfo-1.8.0-py3-none-any.whl (23 kB)
Installing collected packages: torchinfo
Successfully installed torchinfo-1.8.0


In [5]:
from torchinfo import summary
# summary takes the model instance and input_size, the shape of the input (here features = torch.rand(10,4))
summary(model,input_size=(10,4))

Layer (type:depth-idx)                   Output Shape              Param #
MyNN                                     [10, 1]                   --
├─Linear: 1-1                            [10, 1]                   5
├─Sigmoid: 1-2                           [10, 1]                   --
Total params: 5
Trainable params: 5
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 0.00
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00

## Neural Network with Hidden Layers

A **linear transformation** multiplies the input features by a weight matrix and adds a bias. In neural networks, fully connected (dense) layers perform this operation; in PyTorch they are implemented by `nn.Linear`.

A linear transformation in a neural network follows this equation:

$$Y = XW + b$$

Where:

- X = Input features (a vector or matrix of values)
- W = Weights (learnable parameters that adjust how inputs affect outputs)
- b = Bias (a learnable parameter that shifts the output)
- Y = Output after transformation

This operation maps the input linearly to the output.
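
The equation can be checked against `nn.Linear` directly. The following is a minimal sketch with arbitrarily chosen sizes (10 samples, 4 input features, 3 output features). One detail worth knowing: `nn.Linear` stores its weight with shape `(out_features, in_features)`, so the `W` of the equation corresponds to `layer.weight.T`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only to make the random numbers repeatable

X = torch.rand(10, 4)        # 10 samples, 4 features
layer = nn.Linear(4, 3)      # 4 input features -> 3 output features

# nn.Linear keeps weight as (out_features, in_features),
# so W in Y = XW + b is the transpose of layer.weight
W = layer.weight.T           # shape (4, 3)
b = layer.bias               # shape (3,)

Y_manual = X @ W + b         # Y = XW + b, written out by hand
Y_layer = layer(X)           # the same computation done by the layer

print(Y_manual.shape)        # torch.Size([10, 3])
print(torch.allclose(Y_manual, Y_layer, atol=1e-6))  # expected: True
```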

In [6]:
import torch
import torch.nn as nn
# Inherit from nn.Module to define a custom neural network
class MyNN1(nn.Module):
  # num_features is the number of input features
  def __init__(self,num_features):
    super().__init__()
    # Input layer with 'num_features' connected to 3 neurons
    self.linear1 = nn.Linear(num_features,3)
    self.relu = nn.ReLU()
    # Hidden layer (3 neurons) connected to 1 output neuron
    self.linear2 = nn.Linear(3,1)
    self.sigmoid = nn.Sigmoid()
  # Here in forward pass
  # features(4) -> linear1 -> relu -> linear2 -> sigmoid
  def forward(self,features):
    output = self.linear1(features) # Linear transformation
    output = self.relu(output)  # Apply ReLU activation
    output = self.linear2(output) # Another linear transformation
    output = self.sigmoid(output) # Sigmoid activation
    return output

In [9]:
# Random dataset
features = torch.rand(10,4)
# create an instance of our MyNN1 class
model = MyNN1(features.shape[1])
# forward pass
model(features)

tensor([[0.4748],
        [0.4951],
        [0.4620],
        [0.4874],
        [0.4594],
        [0.4872],
        [0.4799],
        [0.4530],
        [0.4815],
        [0.4799]], grad_fn=<SigmoidBackward0>)

In [10]:
from torchinfo import summary
summary(model=model,input_size=(10,4))

Layer (type:depth-idx)                   Output Shape              Param #
MyNN1                                    [10, 1]                   --
├─Linear: 1-1                            [10, 3]                   15
├─ReLU: 1-2                              [10, 3]                   --
├─Linear: 1-3                            [10, 1]                   4
├─Sigmoid: 1-4                           [10, 1]                   --
Total params: 19
Trainable params: 19
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 0.00
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
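
The parameter counts in the summary follow from the layer sizes: a linear layer has `in_features * out_features` weights plus `out_features` biases, while ReLU and Sigmoid have none. A small sketch to verify the 15 + 4 = 19 figure (the helper `count_params` is a hypothetical name, not part of PyTorch):

```python
import torch.nn as nn

linear1 = nn.Linear(4, 3)   # 4*3 weights + 3 biases = 15
linear2 = nn.Linear(3, 1)   # 3*1 weights + 1 bias   = 4

def count_params(module):
    """Hypothetical helper: number of trainable values in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

n1 = count_params(linear1)
n2 = count_params(linear2)
n_relu = count_params(nn.ReLU())   # activations have no parameters

print(n1, n2, n_relu)   # 15 4 0
print(n1 + n2)          # 19
```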

## Sequential Container
Instead of calling each layer explicitly in the `forward` method, like this:

      output = self.linear1(features) # Linear transformation
      output = self.relu(output)  # Apply ReLU activation
      output = self.linear2(output) # Another linear transformation
      output = self.sigmoid(output) # Sigmoid activation
      return output

we can use `torch.nn.Sequential`, a container that chains layers and applies them in the order in which they are passed. The `forward` method then needs only a single call to the container.


In [12]:
import torch
import torch.nn as nn
class MyNN2(nn.Module):
  def __init__(self,num_features):
    super().__init__()
    self.network = nn.Sequential(
    nn.Linear(num_features,3),
    nn.ReLU(),
    nn.Linear(3,1),
    nn.Sigmoid()
    )
  def forward(self,features):
    output = self.network(features)
    return output
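
For a purely sequential architecture like this one, `nn.Sequential` can also be used on its own, without a wrapper class. The sketch below assumes the same 4 -> 3 -> 1 layout as `MyNN2` with 4 input features.

```python
import torch
import torch.nn as nn

# The same architecture, built without a custom class
model = nn.Sequential(
    nn.Linear(4, 3),
    nn.ReLU(),
    nn.Linear(3, 1),
    nn.Sigmoid(),
)

features = torch.rand(10, 4)
output = model(features)      # layers are applied in the order listed
print(output.shape)           # torch.Size([10, 1])

# Individual layers can be accessed by position
first_layer = model[0]
print(first_layer)            # Linear(in_features=4, out_features=3, bias=True)
```

A custom `nn.Module` subclass is still the better fit once the forward pass needs branching, multiple inputs, or skip connections.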

#  PyTorch Loss Functions & Optimizers

In PyTorch, **loss functions** and **optimizers** are essential for training neural networks. This document provides an overview of the most commonly used ones.

---

##  Loss Functions in PyTorch

Loss functions measure how well the model's predictions match the actual values. PyTorch provides many built-in loss functions under `torch.nn`.

### Common Loss Functions
| Loss Function | Description | Suitable for |
|--------------|-------|--------------|
| `nn.MSELoss()` | Mean Squared Error | Regression |
| `nn.L1Loss()` | Mean Absolute Error (MAE) | Regression |
| `nn.CrossEntropyLoss()` | Combines `LogSoftmax` + `NLLLoss`; expects raw logits | Multi-class classification |
| `nn.BCELoss()` | Binary Cross-Entropy; expects probabilities in [0, 1] | Binary classification |
| `nn.BCEWithLogitsLoss()` | Combines `Sigmoid` + `BCELoss`; more numerically stable than `BCELoss`, expects raw logits | Binary classification |
| `nn.NLLLoss()` | Negative Log Likelihood Loss | Classification (log-prob outputs) |
| `nn.HuberLoss()` | Quadratic for small errors, linear for large ones (less sensitive to outliers than MSE) | Regression |
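
Two rows of the table describe losses that fold an activation into the loss itself. A minimal sketch to check this numerically on random data (all tensor names and sizes are illustrative): `nn.BCEWithLogitsLoss` on raw scores should match `nn.BCELoss` on sigmoid outputs, and `nn.CrossEntropyLoss` on raw scores should match `nn.NLLLoss` on log-softmax outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Binary case: raw scores (logits) and 0/1 targets stored as floats
logits = torch.randn(8)
targets = torch.randint(0, 2, (8,)).float()

loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)
loss_bce = nn.BCELoss()(torch.sigmoid(logits), targets)

# Multi-class case: 5 samples, 3 classes, integer class indices as targets
class_logits = torch.randn(5, 3)
class_targets = torch.randint(0, 3, (5,))

loss_ce = nn.CrossEntropyLoss()(class_logits, class_targets)
loss_nll = nn.NLLLoss()(F.log_softmax(class_logits, dim=1), class_targets)

print(torch.allclose(loss_with_logits, loss_bce, atol=1e-5))  # expected: True
print(torch.allclose(loss_ce, loss_nll, atol=1e-5))           # expected: True
```

In practice this means a model trained with `nn.BCEWithLogitsLoss` or `nn.CrossEntropyLoss` should output raw scores, without a final `Sigmoid` or `Softmax` layer.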

---

##  Optimizers in PyTorch

Optimizers adjust model weights to minimize the loss function. PyTorch provides optimizers in `torch.optim`.

### Common Optimizers
| Optimizer | Description | Notes |
|-----------|-------------|-------|
| `optim.SGD` | Stochastic Gradient Descent | Simple; usually needs a tuned learning rate (optionally with momentum) |
| `optim.Adam` | Adaptive Moment Estimation | Works well in many cases, adaptive per-parameter learning rates |
| `optim.AdamW` | Adam with decoupled weight decay | Applies weight decay separately from the gradient update; often preferred when weight decay is used |
| `optim.RMSprop` | Root Mean Square Propagation | Often used for RNNs |
| `optim.Adagrad` | Adaptive Gradient Algorithm | Adapts learning rates per parameter |
| `optim.Adadelta` | Extension of Adagrad | Less sensitive to the initial learning rate |
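
To make "adjusting weights to minimize the loss" concrete, here is a minimal sketch of a single plain SGD step (no momentum, no weight decay) on a toy parameter `w`, compared with the update rule `w - lr * grad` computed by hand. The loss `sum(w**2)` is chosen only because its gradient, `2 * w`, is easy to verify.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()     # gradient is 2 * w = [2.0, -4.0]
loss.backward()

expected = w.detach() - 0.1 * w.grad   # w - lr * grad, computed by hand
optimizer.step()                       # the optimizer applies the same rule

print(w.detach())                            # approximately [0.8, -1.6]
print(torch.allclose(w.detach(), expected))  # expected: True
```

The adaptive optimizers in the table follow the same `zero_grad()` / `backward()` / `step()` pattern; only the update rule inside `step()` differs.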

---

## Example

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Example model
model = nn.Sequential(
    nn.Linear(10, 5),
    nn.ReLU(),
    nn.Linear(5, 1)
)

# Define loss function
criterion = nn.MSELoss()  # Using Mean Squared Error for regression

# Define optimizer
optimizer = optim.Adam(model.parameters(), lr=0.01)  # Adam optimizer with LR = 0.01

# Example forward pass
x = torch.randn(10)  # Random input
output = model(x)  # Forward pass

# Compute loss
target = torch.tensor([1.0])  # Example target
loss = criterion(output, target)  

# Backpropagation
optimizer.zero_grad()  # Clear previous gradients
loss.backward()  # Compute gradients
optimizer.step()  # Update model parameters
```

# Dataset and DataLoaders

In PyTorch, **`Dataset`** and **`DataLoader`** are used to efficiently load and process data during training.

## Custom Dataset using `torch.utils.data.Dataset`

PyTorch provides `torch.utils.data.Dataset` as a base class for custom datasets. A subclass implements `__len__` (the number of samples) and `__getitem__` (the sample at a given index).


In [28]:
X = torch.rand(100,6)
y = torch.randint(0,2,(100,))
print(f"{X}")
print("="*100)
print(f"{y}")

tensor([[0.5776, 0.9000, 0.1609, 0.9306, 0.4972, 0.4618],
        [0.7566, 0.0880, 0.1550, 0.8078, 0.1599, 0.9444],
        [0.7484, 0.5140, 0.2987, 0.6938, 0.6796, 0.0094],
        [0.4809, 0.3469, 0.9955, 0.3154, 0.9456, 0.5202],
        [0.6395, 0.3494, 0.4725, 0.3510, 0.5624, 0.8934],
        [0.3550, 0.5643, 0.2131, 0.1157, 0.8093, 0.3874],
        [0.9285, 0.0211, 0.7063, 0.1076, 0.2905, 0.5838],
        [0.2188, 0.3940, 0.7828, 0.0511, 0.7277, 0.7516],
        [0.4336, 0.8794, 0.7858, 0.6426, 0.8207, 0.3101],
        [0.0871, 0.6215, 0.3880, 0.6541, 0.0267, 0.5145],
        [0.3413, 0.2524, 0.7783, 0.1326, 0.7127, 0.0904],
        [0.2215, 0.5700, 0.9435, 0.6462, 0.7096, 0.0707],
        [0.4819, 0.2569, 0.9140, 0.5903, 0.9070, 0.9486],
        [0.7732, 0.6609, 0.6805, 0.2378, 0.9784, 0.7439],
        [0.2996, 0.0130, 0.8797, 0.0993, 0.0937, 0.7945],
        [0.2765, 0.9892, 0.9798, 0.6344, 0.7299, 0.9269],
        [0.0853, 0.3762, 0.3205, 0.6500, 0.6059, 0.7222],
        [0.628

In [29]:
import torch
from torch.utils.data import Dataset,DataLoader
class CustomDataset(Dataset):
  def __init__(self,features,labels):
    self.features = features
    self.labels = labels
  def __len__(self):
    return len(self.features)
  def __getitem__(self,index):
    return self.features[index],self.labels[index]

In [31]:
X.shape

torch.Size([100, 6])

In [32]:
y.shape

torch.Size([100])

### `CustomDataset(X, y)`

Wraps `X` and `y` in a dataset object.

### `DataLoader(dataset, batch_size=10, shuffle=True)`

- Splits the data into batches of 10 samples.
- Reshuffles the data at the start of every epoch, so the batches differ between epochs.
- Returns an iterable, which makes it easy to loop over batches during training.

In [34]:
dataset = CustomDataset(X,y)
len(dataset)

100

In [35]:
dataset[2]

(tensor([0.7484, 0.5140, 0.2987, 0.6938, 0.6796, 0.0094]), tensor(1))
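
When the data already lives in tensors, as it does here, PyTorch's built-in `torch.utils.data.TensorDataset` behaves like the custom class above. The sketch below compares the two on freshly generated random data (so the values differ from the outputs shown in this section); a custom `Dataset` remains useful when samples need to be loaded from disk or transformed on access.

```python
import torch
from torch.utils.data import Dataset, TensorDataset

class CustomDataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        return self.features[index], self.labels[index]

X = torch.rand(100, 6)
y = torch.randint(0, 2, (100,))

custom = CustomDataset(X, y)
builtin = TensorDataset(X, y)   # indexes every tensor along dimension 0

print(len(custom), len(builtin))   # 100 100

features_c, label_c = custom[2]
features_b, label_b = builtin[2]
same_sample = torch.equal(features_c, features_b) and torch.equal(label_c, label_b)
print(same_sample)                 # expected: True
```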

In [36]:
dataloader = DataLoader(dataset, batch_size=10, shuffle=True)

In [37]:
for batch_features, batch_labels in dataloader:
  print(batch_features)
  print(batch_labels)
  print("="*50)

tensor([[0.4035, 0.3907, 0.2688, 0.4353, 0.3443, 0.1205],
        [0.4819, 0.2569, 0.9140, 0.5903, 0.9070, 0.9486],
        [0.9876, 0.3681, 0.8653, 0.2028, 0.4190, 0.3275],
        [0.2205, 0.1346, 0.4050, 0.1876, 0.1370, 0.7164],
        [0.6395, 0.3494, 0.4725, 0.3510, 0.5624, 0.8934],
        [0.2912, 0.0195, 0.0927, 0.5971, 0.7170, 0.4106],
        [0.9285, 0.0211, 0.7063, 0.1076, 0.2905, 0.5838],
        [0.4591, 0.4444, 0.3081, 0.6307, 0.4244, 0.4820],
        [0.4911, 0.9111, 0.1223, 0.2204, 0.4577, 0.6061],
        [0.2491, 0.9403, 0.7755, 0.8972, 0.0646, 0.0385]])
tensor([1, 0, 1, 1, 0, 0, 0, 1, 0, 0])
tensor([[0.2987, 0.5454, 0.4139, 0.1221, 0.5482, 0.0716],
        [0.1718, 0.5224, 0.6840, 0.3126, 0.5118, 0.3419],
        [0.7473, 0.7798, 0.9900, 0.9377, 0.1537, 0.8897],
        [0.8719, 0.3343, 0.4702, 0.2665, 0.4823, 0.5450],
        [0.0853, 0.3762, 0.3205, 0.6500, 0.6059, 0.7222],
        [0.8815, 0.9248, 0.5555, 0.8305, 0.9029, 0.1655],
        [0.7277, 0.2814, 0.5271,

Now we can feed these batches to the model one at a time during training, so the whole dataset never has to go through the model in a single pass. This helps avoid **running out of memory** on large datasets. For **each batch**, we compute the loss and its gradients and let the optimizer update the model parameters.
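
A minimal sketch of such a mini-batch training loop is given below. The model architecture, the choice of `nn.BCEWithLogitsLoss` and `optim.SGD`, the learning rate, and the number of epochs are all assumptions made for illustration. Because the labels are random, the loss is not expected to fall in any meaningful way; the point is the structure of the loop.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

torch.manual_seed(0)

class CustomDataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        return self.features[index], self.labels[index]

X = torch.rand(100, 6)
y = torch.randint(0, 2, (100,))
dataloader = DataLoader(CustomDataset(X, y), batch_size=10, shuffle=True)

# Assumed model: 6 features -> 3 hidden units -> 1 raw score (logit)
model = nn.Sequential(nn.Linear(6, 3), nn.ReLU(), nn.Linear(3, 1))
criterion = nn.BCEWithLogitsLoss()       # applies the sigmoid internally
optimizer = optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(3):
    running_loss = 0.0
    for batch_features, batch_labels in dataloader:
        logits = model(batch_features).squeeze(1)        # (10, 1) -> (10,)
        loss = criterion(logits, batch_labels.float())   # targets must be float

        optimizer.zero_grad()   # clear gradients from the previous batch
        loss.backward()         # compute gradients for this batch
        optimizer.step()        # update parameters once per batch

        running_loss += loss.item()
    epoch_losses.append(running_loss / len(dataloader))
    print(f"epoch {epoch + 1}: mean batch loss = {epoch_losses[-1]:.4f}")
```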

## Important Parameters

| Parameter | Purpose | When to Use? |
|-----------|---------|--------------|
| `num_workers` | Loads data in parallel using multiple worker subprocesses | When data loading or preprocessing is the bottleneck |
| `pin_memory` | Speeds up CPU-to-GPU transfers by using pinned (page-locked) memory | When training on GPU |
| `drop_last` | Drops the last batch if it’s incomplete | If batch consistency is needed (e.g., BatchNorm) |
| `collate_fn` | Customizes how samples are merged into a batch | When handling variable-sized data |
| `sampler` | Controls the order in which samples are drawn (cannot be combined with `shuffle=True`) | When needing custom sampling (e.g., imbalanced data) |
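
Two of these parameters are easy to demonstrate on toy data. The sketch below first shows the effect of `drop_last` on the number of batches, then uses a custom `collate_fn` to pad variable-length samples. The function name `pad_collate` and the sample lengths are hypothetical, chosen only for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, TensorDataset

# drop_last: 100 samples in batches of 32 leave an incomplete batch of 4
dataset = TensorDataset(torch.rand(100, 6))
n_keep = len(DataLoader(dataset, batch_size=32))                  # 32+32+32+4
n_drop = len(DataLoader(dataset, batch_size=32, drop_last=True))  # 32+32+32
print(n_keep, n_drop)   # 4 3

# collate_fn: samples of different lengths cannot be stacked directly,
# so each batch is padded with zeros to its longest sample
sequences = [torch.ones(n) for n in (2, 5, 3, 4)]

def pad_collate(batch):
    """Hypothetical collate function: zero-pad samples to equal length."""
    return pad_sequence(batch, batch_first=True)

padded_loader = DataLoader(sequences, batch_size=2, collate_fn=pad_collate)
shapes = [tuple(batch.shape) for batch in padded_loader]
print(shapes)           # [(2, 5), (2, 4)]
```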

