## **1. List of Available Pretrained Models**

#### **1.1 Classification Models**

Use:

```python
from torchvision import models

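# List all classification model builders (requires torchvision >= 0.14).
# dir(models) also works, but it mixes in weight enums and submodules.
print(models.list_models(module=models))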
```

**Common pretrained classification models**:

| Model Family | Model Names                                                  |
| ------------ | ------------------------------------------------------------ |
| VGG          | `vgg11`, `vgg13`, `vgg16`, `vgg19`, and their `_bn` variants |
| ResNet       | `resnet18`, `resnet34`, `resnet50`, `resnet101`, `resnet152` |
| DenseNet     | `densenet121`, `densenet161`, `densenet169`, `densenet201`   |
| MobileNet    | `mobilenet_v2`, `mobilenet_v3_large`, `mobilenet_v3_small`   |
| EfficientNet | `efficientnet_b0` to `efficientnet_b7`                       |
| ViT          | `vit_b_16`, `vit_b_32`, `vit_l_16`, etc.                     |
| ConvNeXt     | `convnext_tiny`, `convnext_base`, etc.                       |
| RegNet       | `regnet_y_400mf`, `regnet_y_1_6gf`, etc.                     |
| SqueezeNet   | `squeezenet1_0`, `squeezenet1_1`                             |

---

#### **1.2 Segmentation Models**

Available under `torchvision.models.segmentation`:

```python
from torchvision.models import segmentation
print(dir(segmentation))
```

**Popular segmentation models**:

* `fcn_resnet50`
* `fcn_resnet101`
* `deeplabv3_resnet50`
* `deeplabv3_resnet101`
* `lraspp_mobilenet_v3_large`
---


## **2. Network Structure and Modular Components**
Models are often structured in modular components referred to as **Backbone**, **Neck**, and **Head**. These components organize how features are extracted, refined, and used for predictions.

Here's a breakdown of what each part typically does, along with related components:

---

####  2.1. **Backbone** – Feature Extractor

**What it is**:
The **backbone** is the main feature extractor. It takes the raw input (e.g., an image) and outputs high-level features.

**Examples**:

* **ResNet**, **VGG**, **EfficientNet**, **ViT** (Vision Transformer), **ConvNeXt**
* Trained on datasets like ImageNet for classification

**Output**:

* A feature map with reduced spatial resolution but rich semantic content (e.g., shape `[B, C, H/32, W/32]`)

**Usage**:

* Used across tasks: classification, detection, segmentation, etc.

---

####  2.2. **Neck** – Feature Refinement / Aggregation

**What it is**:
The **neck** connects the backbone to the head. It processes and refines feature maps—often enhancing multi-scale features or fusing spatial information.

**Common types**:

* **FPN (Feature Pyramid Network)**: Combines features at different resolutions
* **BiFPN (EfficientDet)**: Bidirectional FPN
* **PANet**: For better path aggregation
* **Transformer Encoders**: As necks in hybrid models

**Why use it**:

* Helps the model detect objects of different sizes
* Improves information flow between layers

---

####  2.3. **Head** – Task-Specific Prediction

**What it is**:
The **head** converts features into outputs (e.g., class labels, bounding boxes, masks).

**Examples**:

* **Classification head**: `Linear → Softmax`
* **Detection head** (e.g., YOLO): Predicts classes, bounding boxes, objectness score
* **Segmentation head**: Upsamples and predicts pixel-wise labels
* **Pose estimation head**: Keypoints or coordinates

**Output**:

* Final predictions shaped for the task (e.g., `[B, num_classes]` for classification)
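
A classification head can be sketched in a few lines. The 512-channel feature map and `num_classes = 10` below are assumed values for the example.

```python
import torch
from torch import nn

num_classes = 10  # example value

# Pool the feature map to one vector per image, then map it to class scores.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(512, num_classes),
)

features = torch.randn(4, 512, 7, 7)  # stand-in for backbone output
logits = head(features)               # [B, num_classes]
probs = logits.softmax(dim=1)         # probabilities; each row sums to 1

print(logits.shape, probs.sum(dim=1))
```

During training the softmax is usually left out of the head, because `nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally.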

---

```
Input Image
    ↓
[Preprocessor]
    ↓
Backbone → (Feature Maps)
    ↓
Neck     → (Enhanced Features)
    ↓
Head     → (Predictions: class/box/mask/etc.)
    ↓
[Post-processing]
    ↓
Final Output
```

---



## **3. Determining the required input size**


#### 3.1. **Use TorchVision Documentation or Model Summary**

The [official PyTorch documentation](https://pytorch.org/vision/stable/models.html) lists **default input sizes** for each pretrained model.

---


####  3.2 **Inspect the Model Internals**

For most models, like `resnet18`, you can inspect how many times the input is halved due to pooling/stride:

```python
from torchvision import models
from torchinfo import summary  # pip install torchinfo

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
summary(model, input_size=(1, 3, 224, 224))
```

---

####  3.3 **Render the Model Diagram (Visualization)**

```python
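import torch
import torchviz  # pip install torchviz (also needs the Graphviz binaries)
from torchvision import models

resnet18 = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
x = torch.randn(1, 3, 128, 128)  # dummy input

# Build the autograd graph of one forward pass, labelling nodes with parameter names
resnet18_graph = torchviz.make_dot(resnet18(x), params=dict(resnet18.named_parameters()))
resnet18_graph.format = 'svg'
resnet18_graph.render('images/resnet18_graph')  # writes the DOT source and the .svg file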
```

## **4. Input Image Size Different From The Pretrained Model Input**

If your input image size is **different** from what the pretrained model expects, you have **two main options**, depending on your task:


#### **4.1 Resize your input image to match the model**


```python
from torchvision import transforms

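transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Match model's expected input
    transforms.ToTensor(),
    # ImageNet statistics expected by torchvision's pretrained classifiers
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])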
```

---
#### **4.2 Adapt the model to your image size**

**Advanced — use only if resizing hurts performance or semantics.**

- Use **adaptive pooling** in place of fixed `AvgPool2d` or `Linear` assumptions (e.g., in custom CNNs):

```python
nn.AdaptiveAvgPool2d((1, 1))  # Allows any input size
```
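
To make the effect concrete, here is a small custom CNN (the class `TinyCNN` is hypothetical, written only for this example) that accepts several input resolutions because the adaptive pooling always produces a `1x1` map before the `Linear` layer.

```python
import torch
from torch import nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d((1, 1))  # output is always [B, 32, 1, 1]
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.pool(self.features(x))
        return self.classifier(torch.flatten(x, 1))

net = TinyCNN().eval()
output_shapes = []
with torch.no_grad():
    for size in (64, 128, 200):
        out = net(torch.randn(1, 3, size, size))
        output_shapes.append(tuple(out.shape))
        print(size, tuple(out.shape))
```

torchvision's ResNets already end in `AdaptiveAvgPool2d`, which is why they run on inputs other than 224x224 without modification.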

- **Replace classifier layers** if needed:

If the forward pass fails because the flattened feature size no longer matches `in_features` of the first `Linear` layer, recompute that size and rebuild the classifier:

```python
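import torch
from torch import nn

# Assumes `model` exposes a convolutional `features` submodule that feeds
# straight into `classifier` (e.g., a custom CNN). torchvision ResNets have
# no `features` or `backbone` attribute; there you replace `model.fc` instead.
your_H, your_W, num_classes = 512, 512, 5  # example values

# Forward a dummy input to find the flattened feature size
with torch.no_grad():
    features = model.features(torch.randn(1, 3, your_H, your_W))
flattened_size = features.flatten(1).shape[1]

# Replace the classifier accordingly
model.classifier = nn.Sequential(
    nn.Linear(flattened_size, 256),
    nn.ReLU(),
    nn.Linear(256, num_classes),
)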
```

When to use:

* You're training **from scratch** or fine-tuning a model deeply.
* Your data has **very different resolution** (e.g. medical images 512x512).
* You want to **preserve spatial details** for segmentation/localization.

---


## **5. Fine-tuning of a Pretrained Network Classifier**
#### 5.1 Inspecting the Network's Parameters
First, inspect the layers and parameters of your network, for instance:

```python
model_vgg19_bn = models.vgg19_bn(weights=models.VGG19_BN_Weights.IMAGENET1K_V1)
print(model_vgg19_bn)
```

This prints the whole model: the convolutional feature extractor (`features`) followed by the fully connected `classifier`:

```bash
(features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    .
    .
    .
    
    (50): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (51): ReLU(inplace=True)
    (52): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)

(classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)

```

or 


```python
for param in model_vgg19_bn.features.parameters():
    print(param.shape)
```

This prints the shape of each parameter tensor in `features` (the convolutional layers):

```bash
torch.Size([64, 3, 3, 3])
torch.Size([64])
.
.
.
torch.Size([512, 512, 3, 3])
torch.Size([512])
torch.Size([512])
torch.Size([512])
```


**ResNet18** has convolutional layers followed by a single fully connected layer (`fc`) that maps 512 input features to 1000 output classes:

```python
resnet18 = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
print("resnet18 input size: ", resnet18.fc.in_features)
print("resnet18 output size: ", resnet18.fc.out_features)
```

```bash
resnet18 input size:  512
resnet18 output size:  1000
```
---

#### 5.2 Freeze **all layers**

```python
for p in model.parameters():
    p.requires_grad = False
```

This makes everything frozen (conv + bn + fc). Usually you then unfreeze the head.

#### 5.3 Freeze only the convolutional backbone (leave `fc` trainable)

```python
for name, p in model.named_parameters():
    # Train only the classifier head; freeze every conv/bn layer
    p.requires_grad = name.startswith("fc.")
```

#### 5.4 Unfreeze some block (e.g. `layer4`)

```python
for p in model.layer4.parameters():
    p.requires_grad = True
```

ResNet is organized like this:

```
model.conv1 -> model.bn1 -> model.relu -> model.maxpool -> model.layer1 -> model.layer2 -> model.layer3 -> model.layer4 -> model.avgpool -> model.fc
```

so you can target any block.

---

#### 5.5 Safely replace the head

ResNet18’s final FC (`model.fc`) outputs **1000 classes** (ImageNet).
You replace it with your own classifier:

```python
num_features = model.fc.in_features   # 512 for resnet18
num_classes = 5                       # example

model.fc = nn.Linear(num_features, num_classes)
```

This is the cleanest and most common way. The rest of the model stays intact.

---


#### 5.6 Replace the Final Classifier


```python
num_classes = 3  # your problem
in_features = resnet18.fc.in_features  # 512 for resnet18

resnet18.fc = nn.Linear(in_features, num_classes)  # new classifier layer
```
---
                        
#### 5.7 Optimizer setup (important!)

If you froze parameters, make sure your optimizer only updates trainable ones:

```python
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3
)
```

---


## **6. When to Learn Feature Maps**

---

**Transfer Learning Modes**

| Mode                               | Freeze Feature Layers? | Fine-Tune Feature Layers?                     | Train Classifier? | When to Use                                                                                      |
| ---------------------------------- | ---------------------- | --------------------------------------------- | ----------------- | ------------------------------------------------------------------------------------------------ |
| **1. Feature Extraction (Frozen)** | ✅ Yes                  | ❌ No                                          | ✅ Yes             | When dataset is **small** and **similar** to ImageNet                                            |
| **2. Fine-Tuning Last Block**      | ⚠️ Early layers only   | ✅ Last layers only (e.g., `layer4` in ResNet) | ✅ Yes             | When dataset is **moderate in size** and **domain-shifted**                                      |
| **3. Full Fine-Tuning**            | ❌ No                   | ✅ All conv layers                             | ✅ Yes             | When dataset is **large** or **significantly different** from ImageNet (e.g. medical, satellite) |
| **4. Training from Scratch**       | ❌ N/A                  | ✅ All layers randomly initialized             | ✅ Yes             | When you have a **huge custom dataset** and **no pretraining** is applicable                     |

---



## **7. `torch.nn.Identity`**
`torch.nn.Identity` is a simple module in PyTorch that **does nothing to its input** — it just returns it unchanged. It's often used as a **placeholder** when you want to **remove or skip a layer** in a model (e.g., when doing ablation studies, or when modifying pretrained models).

---


```python
import torch
import torch.nn as nn

identity = nn.Identity()
x = torch.randn(2, 3)
output = identity(x)
```

Here, `output` is **exactly the same** tensor as `x`.


When do we need `nn.Identity`?

####  7.1 **Ablation studies / removing layers**

If you're testing the effect of removing a layer:

```python
self.dropout = nn.Dropout(p=0.5) if use_dropout else nn.Identity()
```

####  7.2 **Replace classifier head**

If you load a pretrained model and want to keep everything except the final classification layer:

```python
model.fc = nn.Identity()  # For example in ResNet
```

####  7.3 **Skip connections or conditional architectures**

If you want to optionally add a layer, but still keep the same forward pass logic:

```python
self.extra = nn.BatchNorm1d(256) if use_bn else nn.Identity()
```
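
Put together, the pattern keeps `forward` free of `if` statements. The class `MLPBlock` and its flags below are hypothetical names used only for illustration.

```python
import torch
from torch import nn

class MLPBlock(nn.Module):
    def __init__(self, in_dim, out_dim, use_bn=True, use_dropout=True):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        # Optional layers fall back to nn.Identity, so forward() never branches.
        self.norm = nn.BatchNorm1d(out_dim) if use_bn else nn.Identity()
        self.act = nn.ReLU()
        self.dropout = nn.Dropout(p=0.5) if use_dropout else nn.Identity()

    def forward(self, x):
        return self.dropout(self.act(self.norm(self.linear(x))))

x = torch.randn(8, 128)
full = MLPBlock(128, 256).eval()
plain = MLPBlock(128, 256, use_bn=False, use_dropout=False).eval()

with torch.no_grad():
    print(full(x).shape, plain(x).shape)
```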


---

