#### Loss Functions

| Model Type        | Task                       | Loss Function          |
|-------------------|----------------------------|------------------------|
| ANN (FCN)         | Multi-class Classification  | CrossEntropyLoss       |
| CNN               | Binary Classification       | BCEWithLogitsLoss      |
| RNN               | Regression                  | MSELoss                |
| Any Model         | Robust Regression           | SmoothL1Loss           |
| Any Model         | Distribution Learning       | KLDivLoss              |
| Any Model         | Margin-based Classification | HingeEmbeddingLoss     |
| Any Model         | Similarity Learning         | CosineEmbeddingLoss    |
| Any Model         | Metric Learning             | TripletMarginLoss      |


In [None]:
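import torch.nn as nn

# Each assignment below is an alternative; keep the one that matches the task.
criterion = nn.CrossEntropyLoss()  # Multi-class classification; takes raw logits and class indices

criterion = nn.BCEWithLogitsLoss()  # Binary classification; takes raw logits (sigmoid is applied internally)

criterion = nn.MSELoss()  # Regression

criterion = nn.SmoothL1Loss()  # Robust regression; less sensitive to outliers than MSE

criterion = nn.KLDivLoss(reduction='batchmean')  # Distribution learning; input must be log-probabilities

criterion = nn.HingeEmbeddingLoss()  # Margin-based; targets are 1 or -1

criterion = nn.CosineEmbeddingLoss()  # Similarity learning on pairs; targets are 1 or -1

criterion = nn.TripletMarginLoss(margin=1.0)  # Metric learning on (anchor, positive, negative) triplets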
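
The criteria above differ mainly in what input and target shapes they expect. The following is a minimal sketch with random tensors; the batch size, class count, embedding size, and all variable names are assumptions chosen for illustration, not values from this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, num_classes = 4, 3  # hypothetical sizes

# CrossEntropyLoss: raw logits of shape (N, C), integer class indices of shape (N,)
logits = torch.randn(batch_size, num_classes)
class_targets = torch.tensor([0, 2, 1, 2])
ce_loss = nn.CrossEntropyLoss()(logits, class_targets)

# BCEWithLogitsLoss: raw logits and float targets of the same shape
binary_logits = torch.randn(batch_size, 1)
binary_targets = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
bce_loss = nn.BCEWithLogitsLoss()(binary_logits, binary_targets)

# KLDivLoss: input is log-probabilities, target is probabilities (default log_target=False)
log_probs = torch.log_softmax(torch.randn(batch_size, num_classes), dim=1)
target_probs = torch.softmax(torch.randn(batch_size, num_classes), dim=1)
kl_loss = nn.KLDivLoss(reduction='batchmean')(log_probs, target_probs)

# TripletMarginLoss: three embedding batches of shape (N, D)
anchor, positive, negative = (torch.randn(batch_size, 8) for _ in range(3))
triplet_loss = nn.TripletMarginLoss(margin=1.0)(anchor, positive, negative)

print(ce_loss.item(), bce_loss.item(), kl_loss.item(), triplet_loss.item())
```

With the default reduction, each call returns a scalar tensor, which is what `loss.backward()` expects.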


#### Activation Functions


| Activation Function | ANN | CNN | RNN |
|--------------------|----|----|----|
| **ReLU** (`nn.ReLU()`) | ✅ | ✅ | ❌ |
| **Leaky ReLU** (`nn.LeakyReLU()`) | ✅ | ✅ | ❌ |
| **Sigmoid** (`nn.Sigmoid()`) | ✅ | ✅ | ❌ |
| **Tanh** (`nn.Tanh()`) | ❌ | ❌ | ✅ |
| **Softmax** (`nn.Softmax()`) | ✅ | ❌ | ❌ |
| **ELU** (`nn.ELU()`) | ✅ | ✅ | ❌ |
| **SELU** (`nn.SELU()`) | ✅ | ❌ | ✅ |
| **GELU** (`nn.GELU()`) | ✅ | ✅ | ✅ |
| **Swish** (`nn.SiLU()`) | ✅ | ✅ | ❌ |


In [None]:
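# These lines belong inside an nn.Module's __init__ method.

# ReLU
self.relu = nn.ReLU()

# Sigmoid
self.sigmoid = nn.Sigmoid()

# Leaky ReLU (slope of 0.1 for negative inputs)
self.leaky_relu = nn.LeakyReLU(negative_slope=0.1)

# Tanh
self.tanh = nn.Tanh()

# GELU
self.gelu = nn.GELU()

# ELU
self.elu = nn.ELU()

# SELU
self.selu = nn.SELU()

# Swish (called SiLU in PyTorch)
self.silu = nn.SiLU()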
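
To see how these modules behave, it can help to apply each one to the same small tensor. This is a minimal sketch; the input values and the `activations` dictionary are assumptions made for illustration.

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])  # hypothetical sample input

activations = {
    "relu": nn.ReLU(),
    "leaky_relu": nn.LeakyReLU(negative_slope=0.1),
    "sigmoid": nn.Sigmoid(),
    "tanh": nn.Tanh(),
    "elu": nn.ELU(),
    "selu": nn.SELU(),
    "gelu": nn.GELU(),
    "silu": nn.SiLU(),
}

# Activation modules are element-wise, so the output keeps the input's shape
outputs = {name: fn(x) for name, fn in activations.items()}
for name, y in outputs.items():
    print(f"{name:>10}: {y.tolist()}")

# Softmax is not element-wise: it needs an explicit dimension to normalize over
probs = nn.Softmax(dim=1)(torch.randn(2, 3))
print(probs.sum(dim=1))  # each row sums to 1
```

Passing `dim` to `nn.Softmax` explicitly avoids relying on the implicit dimension choice, which PyTorch warns about.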

#### Regularization



| Technique | ANN | CNN | RNN | When to Use | Why to Use |
|-----------|----|----|----|-------------|------------|
| **Batch Normalization** (`nn.BatchNorm1d`, `nn.BatchNorm2d`) | ✅ | ✅ | ❌ | When training deep networks, especially CNNs and ANNs | Speeds up training and stabilizes learning by normalizing inputs per mini-batch |
| **Layer Normalization** (`nn.LayerNorm`) | ✅ | ❌ | ✅ | When using RNNs and transformers | Normalizes across features instead of batches, making it useful for varying batch sizes |
| **Instance Normalization** (`nn.InstanceNorm1d`, `nn.InstanceNorm2d`) | ❌ | ✅ | ❌ | When working with style transfer and image generation | Normalizes per sample, effective for tasks where batch statistics vary |
| **Group Normalization** (`nn.GroupNorm`) | ❌ | ✅ | ❌ | When BatchNorm is ineffective due to small batch sizes | Normalizes across grouped channels instead of full batches |
| **Dropout** (`nn.Dropout`) | ✅ | ❌ | ✅ | When overfitting occurs in fully connected layers and RNNs | Randomly disables neurons to prevent co-adaptation and improve generalization |
| **Dropout2d** (`nn.Dropout2d`) | ❌ | ✅ | ❌ | When overfitting occurs in CNNs | Drops entire feature maps instead of individual neurons, improving feature independence |
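
The difference between the normalization layers in the table comes down to which axis the statistics are computed over. The sketch below checks this numerically; the tensor shapes and variable names are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(16, 8) * 3 + 5  # (batch, features), deliberately not zero-mean

bn = nn.BatchNorm1d(8)  # in training mode (the default) it uses batch statistics
ln = nn.LayerNorm(8)

bn_out = bn(x)
ln_out = ln(x)

# BatchNorm normalizes each feature across the batch dimension
print(bn_out.mean(dim=0))  # close to 0 for every feature

# LayerNorm normalizes each sample across its features
print(ln_out.mean(dim=1))  # close to 0 for every sample

# GroupNorm normalizes each sample within groups of channels
images = torch.randn(2, 64, 4, 4)  # (batch, channels, height, width)
gn = nn.GroupNorm(num_groups=8, num_channels=64)
gn_out = gn(images)
print(gn_out.shape)
```

Because LayerNorm and GroupNorm compute statistics per sample, their output does not depend on the batch size, unlike BatchNorm in training mode.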


In [None]:
#####################################################
################### Normalization ###################
#####################################################

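self.bn1 = nn.BatchNorm1d(128)  # Batch normalization over 128 features (after a linear layer)

self.bn2 = nn.BatchNorm2d(32)  # Batch normalization over 32 channels (after a conv layer)

self.in2 = nn.InstanceNorm2d(64)  # Instance normalization over 64 channels

self.ln = nn.LayerNorm(hidden_size)  # Layer normalization over the last dimension (size hidden_size)

self.gn = nn.GroupNorm(num_groups=8, num_channels=64)  # 64 channels split into 8 groups

self.inorm = nn.InstanceNorm2d(32)  # Instance normalization over 32 channels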

#####################################################
###################### Dropout ######################
#####################################################
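self.dropout = nn.Dropout(0.3)  # Zeroes individual elements with probability 0.3

self.dropout2d = nn.Dropout2d(0.3)  # Zeroes entire feature maps (4D conv inputs)

self.dropout3d = nn.Dropout3d(0.3)  # Zeroes entire channels of volumetric (5D) inputs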
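
Dropout layers behave differently in training and evaluation mode, which is easy to overlook. This is a minimal sketch of that behavior; the tensor sizes and the `p=0.5` used for `Dropout2d` are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.3)
x = torch.ones(1000)

# Training mode: about 30% of elements become 0, the rest are scaled by 1 / (1 - 0.3)
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout is a no-op
dropout.eval()
eval_out = dropout(x)

print((train_out == 0).float().mean().item())  # fraction of dropped elements
print(torch.equal(eval_out, x))

# Dropout2d drops whole channels, so each feature map is either all zeros or fully kept
dropout2d = nn.Dropout2d(p=0.5)
dropout2d.train()
feature_maps = torch.ones(4, 16, 8, 8)  # (batch, channels, height, width)
dropped = dropout2d(feature_maps)
per_channel = dropped.flatten(2)  # (batch, channels, height * width)
print(torch.equal(per_channel.min(dim=2).values, per_channel.max(dim=2).values))
```

Calling `model.eval()` before inference switches every dropout layer in the model to the no-op behavior at once.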

#### Weight Initialization

#### Embedding

#### Utilities and Tools

In [None]:
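# Device Management

# Move to the GPU (general form; also accepts 'cpu' or a torch.device)
tensor = tensor.to('cuda')
model = model.to('cuda')

# Shorthand for .to('cuda')
tensor = tensor.cuda()
model = model.cuda()

# Move back to the CPU
tensor = tensor.cpu()
model = model.cpu()

# Model Saving and Loading

# Save only the learned parameters, not the whole model object
torch.save(model.state_dict(), 'model.pth')

# Load the parameters from disk and copy them into a model with the same architecture
state_dict = torch.load('model.pth')

model.load_state_dict(state_dict)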

# Gradient Calculation
loss.backward()

with torch.no_grad():
    predictions = model(inputs)

gradient = tensor.grad

grads = torch.autograd.grad(loss, model.parameters())
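
The utilities above can be combined into one runnable script. The sketch below is an illustration under stated assumptions: the `nn.Linear(4, 2)` model, the random data, and the temporary file path are all hypothetical, and the device falls back to the CPU when CUDA is unavailable.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Pick the GPU when available so the same script also runs on CPU-only machines
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 2).to(device)  # hypothetical toy model
inputs = torch.randn(8, 4, device=device)
targets = torch.randn(8, 2, device=device)

# loss.backward() accumulates gradients into each parameter's .grad attribute
loss = nn.MSELoss()(model(inputs), targets)
loss.backward()
weight_grad = model.weight.grad

# torch.autograd.grad returns the gradients as a tuple and leaves .grad untouched
loss = nn.MSELoss()(model(inputs), targets)
grads = torch.autograd.grad(loss, list(model.parameters()))

# Inside no_grad, no computation graph is built, so outputs do not require grad
with torch.no_grad():
    predictions = model(inputs)

# Save the state dict, then load it into a fresh model with the same architecture
path = os.path.join(tempfile.mkdtemp(), 'model.pth')
torch.save(model.state_dict(), path)
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location='cpu'))
restored = restored.to(device)

print(predictions.requires_grad, torch.allclose(grads[0], weight_grad))
```

Passing `map_location='cpu'` to `torch.load` lets a checkpoint saved on a GPU be loaded on a machine without one.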
