<a href="https://colab.research.google.com/github/Regina-Arthur/Coding-Practice-Projects/blob/main/Pytorch_tutorial/Chatgpt_challenge_Resnetstyle.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import torch
import torch.nn as nn


## Instructions from ChatGPT
Nice — here’s a clean, step-by-step architecture you can build from scratch (no code, no pseudocode). I’ll pick a practical, modern, easy-to-train design: a compact ResNet-style CNN (ResNet-18 flavor) for image tasks. I give exact layer choices, channel counts, and how the spatial size changes so you can implement and verify shapes easily.

If you want a different goal (object detection, segmentation, smaller/larger model) tell me and I’ll give an adjusted plan.

Model: Compact ResNet-style CNN (good for classification)

Input assumption: RGB image, 3 x 224 x 224.
(If you use 448 x 448 or some other size, the same block layout works — only the spatial sizes after pooling change.)

1) Stem (initial feature extractor)

A convolution: kernel 7×7, filters 64, stride 2, padding 3.

Purpose: quickly expand channels and reduce spatial size a bit.

Batch Normalization on the 64 channels.

ReLU activation.

Max Pooling: 3×3 window, stride 2, padding 1.

After the stem, a 224×224 input is reduced to 56×56.
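
The 56×56 figure can be checked by hand with the usual output-size formula for a convolution or pooling layer with kernel size k, stride s and padding p (assuming no dilation). A short worked check using the stem's own parameters:

```latex
H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2p - k}{s} \right\rfloor + 1

\text{Stem conv } (k=7,\ s=2,\ p=3):\quad
\left\lfloor \frac{224 + 6 - 7}{2} \right\rfloor + 1 = 111 + 1 = 112

\text{Max pool } (k=3,\ s=2,\ p=1):\quad
\left\lfloor \frac{112 + 2 - 3}{2} \right\rfloor + 1 = 55 + 1 = 56
```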

2) Residual stage 1 (output channels = 64)

Repeat 2 residual blocks (standard 2-conv blocks):

Each residual block contains:

Conv 3×3, channels 64, stride 1, padding 1 → BN → ReLU

Conv 3×3, channels 64, stride 1, padding 1 → BN

Add (identity) → ReLU after addition

No downsampling in this stage (identity skip is direct).

After stage 1: spatial 56×56, channels 64.

3) Residual stage 2 (output channels = 128)

Repeat 2 residual blocks:

First block in stage uses downsampling:

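The block’s first conv uses stride 2 to halve the spatial size and outputs 128 channels; its second conv keeps stride 1.

The identity/skip cannot be added directly, because it still has 64 channels at 56×56, so it is projected with a 1×1 conv with stride 2 to match both the channel count and the spatial size.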

Subsequent blocks in the stage use stride 1.

After stage 2: spatial 28×28, channels 128.
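
A minimal PyTorch sketch of the downsampling block described in this stage, for illustration only. The class name `DownsampleBlock` and the choice of `bias=False` on the convs (common when BatchNorm follows) are assumptions, not part of the plan above.

```python
import torch
import torch.nn as nn


class DownsampleBlock(nn.Module):
    """Hypothetical name: first block of a stage, halving H and W and changing channels."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Main path: the first 3x3 conv carries the stride of 2.
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=2, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()
        # Skip path: 1x1 conv with stride 2, then BN, so both tensors
        # have the same shape when they are added.
        self.projection = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=2, bias=False),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.projection(x))  # add, then ReLU


block = DownsampleBlock(64, 128)
x = torch.randn(2, 64, 56, 56)   # stage-1 output for a batch of 2
y = block(x)
print(y.shape)                   # torch.Size([2, 128, 28, 28])
```

The same block should serve stages 3 and 4 with `(128, 256)` and `(256, 512)` as arguments.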

4) Residual stage 3 (output channels = 256)

Repeat 2 residual blocks:

First block downsamples (stride 2) and projects skip to match 256 channels.

Remaining blocks: stride 1.

After stage 3: spatial 14×14, channels 256.

5) Residual stage 4 (output channels = 512)

Repeat 2 residual blocks:

First block downsamples (stride 2) and projects skip to match 512 channels.

Remaining blocks: stride 1.

After stage 4: spatial 7×7, channels 512.

(These per-stage block counts follow the ResNet-18 pattern: [2,2,2,2]. You can increase the number of blocks per stage for a deeper model.)
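
As a quick sanity check on where the "18" comes from, the weighted layers can be counted from the block pattern. This sketch follows the usual naming convention, in which the 1×1 projection convs on the skip paths are not counted; the helper name `weighted_layer_count` is made up for illustration, and [3, 4, 6, 3] is the ResNet-34 pattern.

```python
def weighted_layer_count(blocks_per_stage, convs_per_block):
    """Stem conv + convs inside the residual blocks + final FC layer."""
    stem_conv = 1
    final_fc = 1
    return stem_conv + convs_per_block * sum(blocks_per_stage) + final_fc


resnet18 = weighted_layer_count([2, 2, 2, 2], convs_per_block=2)
resnet34 = weighted_layer_count([3, 4, 6, 3], convs_per_block=2)
print(resnet18, resnet34)  # 18 34
```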

6) Classification head

Global average pooling across the 7×7 spatial dims → produces 512 values per sample (one per channel).

Optional dropout (e.g., 0.5) before the FC if you want more regularization.

Fully connected (linear) layer to num_classes outputs.

Final activation: for multi-class classification this is conceptually a softmax, but in practice you pass the raw logits to a cross-entropy loss, which applies the softmax internally.
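
One possible way to express this head in PyTorch, as a sketch rather than a definitive implementation. `num_classes = 10`, the batch size of 4, and the random features and labels are placeholders chosen for illustration.

```python
import torch
import torch.nn as nn

num_classes = 10  # assumed value, for illustration only

head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),      # 512 x 7 x 7 -> 512 x 1 x 1 (global average pool)
    nn.Flatten(),                 # -> 512
    nn.Dropout(p=0.5),            # the optional dropout
    nn.Linear(512, num_classes),  # raw logits, no softmax here
)

features = torch.randn(4, 512, 7, 7)          # stand-in for the stage-4 output
labels = torch.randint(0, num_classes, (4,))  # stand-in class indices

logits = head(features)
loss = nn.CrossEntropyLoss()(logits, labels)  # expects raw logits
print(logits.shape)                           # torch.Size([4, 10])
```

Because `nn.AdaptiveAvgPool2d(1)` averages over whatever spatial size it receives, the same head should also accept the larger feature map produced by a 448×448 input.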

7) Useful internals / standard practices (where to put BN / activations)

Put BatchNorm after each convolution, before ReLU.

Use ReLU for nonlinearity throughout (or LeakyReLU if you prefer).

For shortcut projections (when channels/stride change) use a 1×1 conv + BN on the skip path.

Weight initialization: He/Kaiming for ReLU (standard frameworks set sensible defaults).
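
If you want to set the initialization explicitly instead of relying on framework defaults, a sketch along these lines is one option. The helper name `init_weights` and the `mode="fan_out"` choice are assumptions made for this example; the three-layer `stem` only exists to have something to initialize.

```python
import torch.nn as nn


def init_weights(module):
    """Kaiming/He init for convs; BatchNorm starts as an identity scale and shift."""
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.BatchNorm2d):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)


# Conv -> BN -> ReLU, in the order recommended above.
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64),
    nn.ReLU(),
)
stem.apply(init_weights)  # apply() visits every submodule recursively
```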

Shapes at each major milestone (for 224×224 input)

Input: 3 × 224 × 224

After stem conv (stride 2): 64 × 112 × 112

After stem maxpool (stride 2): 64 × 56 × 56

After stage1: 64 × 56 × 56

After stage2: 128 × 28 × 28

After stage3: 256 × 14 × 14

After stage4: 512 × 7 × 7

After global avg pool: 512

After FC: num_classes

(If the input is 448×448, every spatial size doubles, ending at 14×14 after stage 4. Global average pooling still reduces this to 512 values; a fixed-size pooling window would need adjusting.)
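
The milestone shapes can be reproduced with plain integer arithmetic, which is handy for checking other input sizes. A small sketch; the helper names `out_size` and `trace_shapes` are made up for this example, and each downsampling stage is modelled by its stride-2 3×3 conv with padding 1.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size):
    """Return (name, channels, spatial size) for each milestone."""
    shapes = [("input", 3, input_size)]
    size = out_size(input_size, kernel=7, stride=2, padding=3)
    shapes.append(("stem conv", 64, size))
    size = out_size(size, kernel=3, stride=2, padding=1)
    shapes.append(("stem maxpool", 64, size))
    shapes.append(("stage1", 64, size))  # no downsampling in stage 1
    for name, channels in [("stage2", 128), ("stage3", 256), ("stage4", 512)]:
        size = out_size(size, kernel=3, stride=2, padding=1)
        shapes.append((name, channels, size))
    return shapes


for input_size in (224, 448):
    print(f"input {input_size}:")
    for name, channels, size in trace_shapes(input_size):
        print(f"  {name:<12} {channels} x {size} x {size}")
```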

Variants & options (pick one)

Smaller dataset / faster model: use fewer channels (32→64→128→256) and fewer blocks.

Deeper model: change block counts to ResNet-34 or ResNet-50 (ResNet-50 uses bottleneck blocks: 1×1 reduce → 3×3 → 1×1 expand).

If your task is detection (like YOLO): replace the classification head with a detection head that outputs an S×S×(B*5 + C) grid (S×S cells, B boxes per cell with 5 values each, C class scores). The backbone (stages above) can be reused as the feature extractor.

If you want stronger regularization: add dropout, stronger data augmentation, weight decay, label smoothing.

Why this architecture?

Residual connections let you train deeper networks reliably and are easy to implement.

Doubling channels at each downsample balances representation capacity vs spatial resolution.

Global average pooling replaces big fully connected layers, reduces params and overfitting.

This architecture is a solid baseline: small-ish, trains well, and you can iterate (deeper/shallower) once you validate on data.
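
To put a rough number on the global-average-pooling point, compare the size of the final linear layer with and without pooling. This is back-of-the-envelope arithmetic; `num_classes = 1000` is an assumed value and `linear_params` is a helper invented for this example.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


num_classes = 1000  # assumed, for illustration

flattened = linear_params(512 * 7 * 7, num_classes)  # FC on the flattened 512 x 7 x 7 map
pooled = linear_params(512, num_classes)             # FC after global average pooling

print(flattened)  # 25089000
print(pooled)     # 513000
```

Under these assumptions the pooled head has roughly 49 times fewer parameters in its final layer.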

If you want a different specific target (tiny CNN for mobile, a full YOLO-style detection backbone, or a U-Net segmentation encoder/decoder) tell me which task and dataset size and I’ll give a tailored architecture plan the same way.

In [None]:
class ResidualBlock(nn.Module):
  # Basic 2-conv residual block with an identity skip (stage 1: 64 -> 64, stride 1).
  # nn.Sequential cannot express the skip connection, so the block gets its own forward().
  def __init__(self, channels = 64):
    super().__init__()
    self.conv1 = nn.Conv2d(in_channels = channels, out_channels = channels, kernel_size = 3, stride = 1, padding = 1)
    self.bn1 = nn.BatchNorm2d(num_features = channels)
    self.conv2 = nn.Conv2d(in_channels = channels, out_channels = channels, kernel_size = 3, stride = 1, padding = 1)
    self.bn2 = nn.BatchNorm2d(num_features = channels)
    self.relu = nn.ReLU()

  def forward(self, x):
    out = self.relu(self.bn1(self.conv1(x)))
    out = self.bn2(self.conv2(out))
    return self.relu(out + x)  # skip connection: add the input back, then ReLU


class model(nn.Module):
  def __init__(self):
    super().__init__()  # the parentheses are required, otherwise nn.Module is never initialised
    # Stem: 3 x 224 x 224 -> 64 x 112 x 112 -> 64 x 56 x 56
    self.layer1 = nn.Conv2d(in_channels = 3, out_channels = 64, kernel_size = 7, stride = 2, padding = 3)
    self.batchnorm = nn.BatchNorm2d(num_features = 64)
    self.relu = nn.ReLU()
    self.pooling = nn.MaxPool2d(kernel_size = 3, stride = 2, padding = 1)
    # Residual stage 1: two separate blocks (assigning self.resblock twice would overwrite the first)
    self.stage1 = nn.Sequential(ResidualBlock(64), ResidualBlock(64))
    # Stages 2-4 and the classification head are not implemented yet.

  def forward(self, x):
    x = self.pooling(self.relu(self.batchnorm(self.layer1(x))))
    return self.stage1(x)  # 64 x 56 x 56 for a 224 x 224 input