## ResNet 


ResNet-18 is a **lightweight convolutional neural network** and the shallowest standard member of the ResNet family, which also includes **ResNet-34, ResNet-50, and ResNet-152**.

---

## **1. Core Strengths**

* **Feature extraction from images**
  ResNet-18 learns rich low- to mid-level visual features (edges, textures, shapes) that generalize well to many computer vision tasks.
* **Transfer learning**
  Pretrained ImageNet weights are readily available, so it is often used as a backbone for other tasks, with only the last few layers fine-tuned.
* **Efficiency**
  Fewer layers and parameters mean **faster inference** and **lower memory usage**, making it well suited to edge devices and embedded systems.

---

## **2. Tasks ResNet-18 is Well-Suited For**

| Category                             | Examples                                            | Why ResNet-18 Works Well                                                                           |
| ------------------------------------ | --------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| **Image Classification**             | CIFAR-10, ImageNet subset, medical images           | Pretrained weights + residual connections allow robust classification even with moderate datasets. |
| **Object Detection (as a backbone)** | Faster R-CNN, YOLO variants (lightweight configs)   | Good trade-off between speed and accuracy; lighter than ResNet-50.                                 |
| **Semantic Segmentation**            | U-Net / DeepLabV3 with ResNet-18 encoder            | Captures spatial features well without massive computation.                                        |
| **Face Recognition & Verification**  | FaceNet/ArcFace style embeddings                    | Can be trained to produce discriminative embeddings.                                               |
| **Medical Imaging**                  | X-ray, MRI, histopathology classification           | Handles 2D image modalities well; pretrained models transfer nicely.                               |
| **Embedded & Mobile Vision**         | On drones, robots, Raspberry Pi                     | Low latency and small memory footprint.                                                            |
| **Representation Learning**          | Self-supervised tasks (SimCLR, MoCo with ResNet-18) | Simpler backbone speeds up experimentation.                                                        |
| **Anomaly / Defect Detection**       | Industrial inspection                               | Good for feature extraction, combined with anomaly detection models.                               |

---

## **3. When NOT to Choose ResNet-18**

* If you need **state-of-the-art accuracy on very complex datasets** (e.g., full ImageNet competition, COCO detection) → deeper ResNets or vision transformers might do better.
* If the input has **extreme fine-grained details** (e.g., high-res satellite imagery), a deeper backbone often performs better.

---


```python
import torch
import torchvision
import torchviz

# For ResNet-18, ResNet18_Weights.DEFAULT is equivalent to ResNet18_Weights.IMAGENET1K_V1.
resnet18 = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.DEFAULT, progress=True
)
```

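## How to find the classifier's input and output size

The final fully connected layer `fc` exposes its dimensions. These are the number of features entering the classifier and the number of classes it predicts, not the image size the network accepts.

```python
# Note: these are the sizes of the final fc layer, not of the input image.
print("resnet18 input size: ", resnet18.fc.in_features)
print("resnet18 output size: ", resnet18.fc.out_features)
```

```
resnet18 input size:  512
resnet18 output size:  1000
```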


```python
# ResNet-18 ends with an adaptive average-pooling layer, so the spatial input size
# is flexible as long as the feature maps do not shrink to nothing before pooling.

x = torch.randn(1, 3, 128, 128)

resnet18_graph = torchviz.make_dot(resnet18(x), dict(resnet18.named_parameters()))
resnet18_graph.format = 'svg'
resnet18_graph.save('images/resnet18_graph')  # writes the DOT source
resnet18_graph.render()                       # renders images/resnet18_graph.svg
```

```
'images/resnet18_graph.svg'
```

![](images/resnet18_graph.svg)

## Finetune the model on a new dataset with 10 labels


Let's say we want to fine-tune the model on a new dataset with `10` labels. In ResNet, the classifier is the last linear layer, `model.fc`. We can simply replace it with a new linear layer (unfrozen by default) that acts as our classifier.


```python
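# Freeze every pretrained parameter
for param in resnet18.parameters():
    param.requires_grad = False

# Replace the classifier; parameters of a new layer have requires_grad=True by default
resnet18.fc = torch.nn.Linear(resnet18.fc.in_features, 10)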

```

Now all parameters in the model, except those of `model.fc`, are frozen. The only parameters for which gradients are computed are the `weight` and `bias` of `model.fc`.

```python
optimizer = torch.optim.SGD(resnet18.fc.parameters(), lr=1e-2, momentum=0.9)
```

## Grayscale (Single-Channel) Image Input

The pretrained ResNet-18 expects 3-channel RGB at `conv1`. With a single-channel (monochrome) input you have a few good options:

### 1) Duplicate the channel (fastest, no weight surgery)

Preprocess your 1-channel image to 3 channels by repeating it. `transforms.Grayscale` controls this through `num_output_channels`:
- `num_output_channels=1`: the returned image has a single channel.
- `num_output_channels=3`: the returned image has 3 channels with r == g == b.

```python
# during transforms
from torchvision import transforms

tfm = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # 1→3 by duplication
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485,0.456,0.406), std=(0.229,0.224,0.225)),  # ImageNet stats
])
```

Pros: zero code change to the model; you keep pretrained weights intact.
Cons: a tiny bit redundant, but works very well in practice.
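
If the data is already loaded as tensors rather than PIL images, the same duplication can be done at the tensor level. A minimal sketch, where the batch shape is a made-up example:

```python
import torch

gray = torch.rand(8, 1, 224, 224)      # hypothetical batch of 1-channel images
rgb_like = gray.expand(-1, 3, -1, -1)  # view with 3 identical channels, no copy
# gray.repeat(1, 3, 1, 1) gives the same values but allocates new memory

print(rgb_like.shape)
```

`expand` returns a view, so avoid in-place operations on the result; use `repeat` if the tensor will be modified in place later.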

### 2) Replace `conv1` with 1 input channel and **port** pretrained weights

Average the RGB kernels into a single channel:

```python
import torch
import torchvision.models as models
import torch.nn as nn

m = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)  # pass the enum, not a string
w = m.conv1.weight  # [64, 3, 7, 7]

m.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

with torch.no_grad():
    m.conv1.weight[:] = w.mean(dim=1, keepdim=True)  # [64,1,7,7]
```

(You can also use a weighted sum like `0.2989 R + 0.5870 G + 0.1140 B` instead of `.mean`.)

Pros: no wasted computation; uses pretrained filters sensibly.
Cons: requires a little model surgery, but it is the cleanest option if you're truly single-channel end-to-end.
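
One detail worth checking is the scale of the ported kernel. The sketch below uses a randomly initialized convolution as a stand-in for the pretrained `conv1` (an assumption made to avoid a download) and compares the single-channel layer against the duplicated-channel behavior of Option 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in for the pretrained conv1; same shape as in ResNet-18
rgb_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
gray_conv = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

x = torch.randn(2, 1, 64, 64)
with torch.no_grad():
    reference = rgb_conv(x.repeat(1, 3, 1, 1))  # Option 1: duplicated channels

    gray_conv.weight.copy_(rgb_conv.weight.sum(dim=1, keepdim=True))
    out_sum = gray_conv(x)

    gray_conv.weight.copy_(rgb_conv.weight.mean(dim=1, keepdim=True))
    out_mean = gray_conv(x)

print(torch.allclose(out_sum, reference, atol=1e-4))       # sum matches Option 1
print(torch.allclose(out_mean * 3, reference, atol=1e-4))  # mean is 3x smaller
```

Summing the RGB kernels reproduces Option 1 exactly, while averaging gives activations one third as large. When the following BatchNorm layer runs in training mode it renormalizes the activations, so the difference often matters little during fine-tuning; it is more relevant when BN statistics are kept frozen.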

### 3) Add a learnable 1→3 adapter in front

Keep the pretrained model intact and learn a shallow mapping:

```python
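import torch.nn as nn
import torchvision.models as models

class GrayToRGB(nn.Module):
    """Learnable 1x1 convolution that maps 1 channel to 3."""
    def __init__(self):
        super().__init__()
        self.map = nn.Conv2d(1, 3, kernel_size=1, bias=False)
        # A weight of 1.0 copies the input into each output channel, so training
        # starts from plain channel duplication (1/3 would scale the input down).
        nn.init.constant_(self.map.weight, 1.0)

    def forward(self, x):
        return self.map(x)

adapter = GrayToRGB()
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
full = nn.Sequential(adapter, model)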
```

Pros: lets the network learn the best 1→3 projection.
Cons: a few extra params; slightly more moving parts.

---

### Normalization notes

* If you **duplicate to 3-ch**, you can keep ImageNet mean/std as above, or compute dataset-specific stats and use those for all 3 channels (same numbers repeated).
* If you **switch to 1-ch conv1**, use single mean/std (e.g., `(mean,)` and `(std,)`) matching your grayscale dataset.
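
Dataset-specific statistics can be accumulated in one pass over the data. The following is a minimal sketch; `channel_stats` is a helper name introduced here, and the random batches stand in for a real data loader that yields `[N, 1, H, W]` tensors.

```python
import torch

def channel_stats(batches):
    """Mean and (population) std over all pixels of an iterable of [N, 1, H, W] tensors."""
    count, total, total_sq = 0, 0.0, 0.0
    for x in batches:
        x = x.double()  # accumulate in float64 to limit rounding error
        count += x.numel()
        total += x.sum().item()
        total_sq += (x ** 2).sum().item()
    mean = total / count
    std = (total_sq / count - mean ** 2) ** 0.5
    return mean, std

batches = [torch.rand(4, 1, 32, 32) for _ in range(3)]  # placeholder data
mean, std = channel_stats(batches)
print(mean, std)
```

The result can then be passed as `transforms.Normalize((mean,), (std,))` for a 1-channel `conv1`, or repeated three times for duplicated channels.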

### Which should you pick?

* **Quick wins / transfer learning**: Option **1** (duplicate) is perfectly fine and very common.
* **Purist, minimal compute**: Option **2** (port weights) is elegant and avoids redundant first-layer computation.
* **Data shift concerns** (e.g., MRI/CT with unusual intensity): Option **3** gives flexibility; also consider dataset-specific normalization and fine-tuning early layers.

### How to train the Option 3 network (learnable 1→3 adapter in front)

With option 3 (a learnable 1→3 “adapter” in front of a pretrained ResNet-18), you’ve got three common training strategies. Pick one based on data size and how different your grayscale data is from ImageNet.

### A. Freeze backbone first, train adapter + head (safe start)

For the first 1–3 epochs:

* Freeze **all** pretrained ResNet-18 parameters.
* Train only the `GrayToRGB` adapter and the new final `fc` layer.

Then unfreeze the backbone (optionally with a lower LR) and fine-tune.

```python
import torch
import torch.nn as nn
import torchvision.models as models

n_classes = 10  # placeholder: set to the number of classes in your dataset

class GrayToRGB(nn.Module):
    def __init__(self):
        super().__init__()
        self.map = nn.Conv2d(1, 3, kernel_size=1, bias=False)
        nn.init.constant_(self.map.weight, 1.0)  # start as plain channel duplication

    def forward(self, x):
        return self.map(x)

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Linear(backbone.fc.in_features, n_classes)  # replace head

model = nn.Sequential(GrayToRGB(), backbone)

# --- Phase 1: freeze the pretrained backbone, keep the new head trainable ---
for p in backbone.parameters():
    p.requires_grad = False
for p in backbone.fc.parameters():
    p.requires_grad = True  # the loop above also froze the new head

# optimize only adapter + fc
params = list(model[0].parameters()) + list(backbone.fc.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3, weight_decay=1e-4)
```

After a few epochs:

```python
# --- Phase 2: unfreeze backbone with smaller LR ---
for p in backbone.parameters():
    p.requires_grad = True

# Note: conv1 and bn1 (the stem) are in no parameter group below, so the optimizer
# never updates them; add a group for them if the stem should be fine-tuned too.
optimizer = torch.optim.Adam([
    {"params": model[0].parameters(), "lr": 1e-3},          # adapter
    {"params": backbone.layer1.parameters(), "lr": 5e-4},
    {"params": backbone.layer2.parameters(), "lr": 2.5e-4},
    {"params": backbone.layer3.parameters(), "lr": 1.25e-4},
    {"params": backbone.layer4.parameters(), "lr": 1.25e-4},
    {"params": backbone.fc.parameters(),     "lr": 1e-3},
], weight_decay=1e-4)
```

### B. Train everything, but with **discriminative learning rates** (faster)

Good when you have a moderate dataset and want quick convergence without a freezing phase.

```python
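# Note: conv1 and bn1 (the stem) are not listed, so they are not updated;
# add a group for them to train truly everything.
optimizer = torch.optim.AdamW([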
    {"params": model[0].parameters(),           "lr": 1e-3},  # adapter highest
    {"params": backbone.layer1.parameters(),    "lr": 5e-4},
    {"params": backbone.layer2.parameters(),    "lr": 3e-4},
    {"params": backbone.layer3.parameters(),    "lr": 2e-4},
    {"params": backbone.layer4.parameters(),    "lr": 2e-4},
    {"params": backbone.fc.parameters(),        "lr": 1e-3},  # head high
], weight_decay=1e-4)
```

### C. Unfreeze progressively (“gradual unfreezing”)

Start with only adapter+fc, then unfreeze layers one block at a time every few epochs (layer4 → layer3 → …). This is handy with small datasets.

---

### Do we freeze the adapter conv?

* **Usually not.** Let it learn a smart projection beyond simple channel copy.
* If the dataset is tiny and training is unstable, you *can* freeze it for the first few hundred steps.

### BatchNorm tips

* If batch size is small (≤16), consider putting the backbone’s BN layers in **eval** mode during early training:

  ```python
  def set_bn_eval(m):
      if isinstance(m, nn.BatchNorm2d):
          m.eval()
  backbone.apply(set_bn_eval)
  ```

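  (Params can still be trainable; this just freezes running stats. Calling `model.train()` switches BN layers back to training mode, so re-apply `set_bn_eval` after every such call.)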

### Weight decay hygiene (optional but nice)

Avoid weight decay on BN and bias:

```python
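decay, no_decay = [], []
for n, p in model.named_parameters():
    if not p.requires_grad:
        continue
    # Biases and BatchNorm weights are 1-D. Checking the shape also catches the
    # BatchNorm layers inside `downsample`, whose names do not contain "bn".
    if p.ndim <= 1 or n.endswith('bias'):
        no_decay.append(p)
    else:
        decay.append(p)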
optimizer = torch.optim.AdamW([
    {"params": decay, "weight_decay": 1e-4, "lr": 3e-4},
    {"params": no_decay, "weight_decay": 0.0, "lr": 3e-4},
])
```

### Schedulers (keep it simple)

* **Cosine annealing** (optionally preceded by a warmup) and **OneCycleLR** both work well:

```python
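# T_max is the number of scheduler steps over which the LR decays to its minimum
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)
# or (stepped once per batch; `loader` is your DataLoader and `E` the number of epochs):
# scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=1e-3, steps_per_epoch=len(loader), epochs=E)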
```
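
`CosineAnnealingLR` on its own has no warmup phase. One way to add it, sketched here under assumed values, is to chain a `LinearLR` warmup and the cosine schedule with `SequentialLR`. The epoch counts, `start_factor`, and the single dummy parameter are placeholders.

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]  # placeholder parameters
optimizer = torch.optim.AdamW(params, lr=1e-3)

WARMUP_EPOCHS, TOTAL_EPOCHS = 5, 50            # hypothetical values
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=WARMUP_EPOCHS
)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=TOTAL_EPOCHS - WARMUP_EPOCHS
)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[WARMUP_EPOCHS]
)

lrs = []
for epoch in range(TOTAL_EPOCHS):
    lrs.append(optimizer.param_groups[0]["lr"])  # LR used during this epoch
    optimizer.step()                             # one epoch of training would go here
    scheduler.step()
```

The learning rate climbs linearly during warmup, peaks at the base rate, and then follows the cosine curve down. With per-group learning rates, each group is scaled relative to its own base rate.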

### Quick guidance

* **Small dataset / big domain shift (e.g., MRI):** A or C.
* **Moderate dataset / some domain shift:** B with discriminative LRs.
* **Plenty of data:** Train all layers with a single learning rate (standard fine-tuning).

