# 🔹 Technique 1: Data Augmentation (Primary way to reduce overfitting)
### What problem it solves

Overfitting happens when:

- The CNN memorizes training images
- The model performs well on training data but poorly on validation data

#### Data augmentation increases data diversity without collecting new data.

### Core idea

#### Artificially modify training images so the model sees different versions of the same image.

This forces the CNN to learn general features, not exact pixel patterns. 
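##### Apply augmentation ONLY to training data

Validation and test images must stay unmodified so that they measure real performance. Keras random preprocessing layers such as `RandomFlip` are active only in training mode and pass images through unchanged during evaluation and prediction.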
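As a minimal, framework-free sketch of the core idea, the snippet below flips a tiny "image" (a nested list) left to right while keeping its label unchanged. The helpers `horizontal_flip` and `augment` are hypothetical names used only for illustration; they are not part of Keras.

```python
import random

def horizontal_flip(image):
    """Mirror a 2-D image (a list of rows) left to right."""
    return [list(reversed(row)) for row in image]

def augment(image, label, rng, flip_prob=0.5):
    """Return the image, flipped with probability flip_prob, and the same label."""
    if rng.random() < flip_prob:
        image = horizontal_flip(image)
    return image, label

image = [[1, 2, 3],
         [4, 5, 6]]

# The pixel layout changes, the label does not.
print(horizontal_flip(image))
print(augment(image, "cat", random.Random(0)))
```

The model sees a different pixel arrangement on different epochs but always the same label, which is what pushes it toward features that survive the transformation.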



## Method 1: Using Keras preprocessing layers

In [None]:
import tensorflow as tf

data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),    # Flips image left ↔ right
    tf.keras.layers.RandomRotation(0.1),         # Rotates by a random angle within ±10% of a full turn (±36°)
    tf.keras.layers.RandomZoom(0.1),             # Randomly zooms in/out by up to 10%
    tf.keras.layers.RandomTranslation(0.1, 0.1)  # Shifts image vertically and horizontally by up to 10%
])


How to use it in a CNN model

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),

    data_augmentation,          # 👈 augmentation applied here

    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(),

    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(),

    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax')
])


### Data augmentation reduces overfitting by artificially increasing training data diversity, forcing the CNN to learn invariant and generalizable features instead of memorizing exact patterns.

# 🔹 Technique 2: Early Stopping
#### What problem it solves

Overfitting often happens when:

Training loss keeps decreasing

Validation loss starts increasing

This means the model has started memorizing training data.

#### Early Stopping stops training at the right time.

#### Core idea - Stop training when validation performance stops improving.

Early stopping is implemented as a callback passed to `model.fit()`, not as a layer inside the model.
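The patience rule can be sketched in plain Python. This is a simplified illustration of the logic, not the Keras implementation: it assumes a minimum improvement of zero, and `simulate_early_stopping` is a hypothetical helper name.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Training stops once `patience` consecutive epochs fail to beat the best
    validation loss seen so far. Epochs are 0-indexed.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stop.
    return len(val_losses) - 1, best_epoch

losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.71, 0.9]
stop_epoch, best_epoch = simulate_early_stopping(losses, patience=3)
print(stop_epoch, best_epoch)
```

With `restore_best_weights=True`, the weights kept after stopping are those from `best_epoch`, not from the epoch where training halted.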

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor='val_loss',       # Metric to observe
    patience=5,               # Number of epochs to wait after no improvement
    restore_best_weights=True # Restores weights from the epoch with lowest validation loss
)


How to use it in training

In [None]:
model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=50,
    callbacks=[early_stop]
)


# 🔹 Technique 3: Dropout Regularization
#### What problem it solves

Overfitting happens when:

Neurons rely too much on specific other neurons

The network memorizes patterns instead of generalizing

Dropout breaks these dependencies.
### Core idea - Randomly deactivate a fraction of neurons during training.

This forces the network to:

Learn redundant representations

Become more robust

Generalize better
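A minimal sketch of what "randomly deactivate" means in practice, written in plain Python. It uses the inverted-dropout convention, where surviving activations are scaled by `1 / (1 - rate)` during training so the expected activation stays the same. The function name `dropout` and the list-based activations are illustrative assumptions.

```python
import random

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - rate)
    return [a * keep_scale if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(42)
out = dropout([1.0] * 1000, rate=0.5, rng=rng)

# Roughly half the units are zeroed, the rest are doubled,
# so the average activation stays close to 1.0.
print(out.count(0.0), sum(out) / len(out))
```

At inference time no units are dropped and no scaling is applied, which is why the layer only affects training.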

Where Dropout is applied

- After Conv layers

- After Dense layers

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.25),   # Drops 25% of activations during training

    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.25),

    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.5),

    tf.keras.layers.Dense(10, activation='softmax')
])


# 🔹 Technique 4: L2 Regularization (Weight Decay)
#### What problem it solves

Overfitting occurs when:

Model learns very large weights

Decision boundary becomes overly complex

Small noise causes large output changes

#### L2 regularization penalizes large weights.
### Core idea - Add a penalty proportional to the square of the weights to the loss function

$$\text{Total Loss} = \text{Data Loss} + \lambda \sum_i w_i^2$$
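The penalty term is simple enough to compute by hand. The sketch below evaluates it for a small, made-up weight vector; `l2_penalty` and `total_loss` are hypothetical helper names, and the data loss of 0.5 is an arbitrary example value.

```python
def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def total_loss(data_loss, weights, lam):
    """Data loss plus the L2 penalty on the weights."""
    return data_loss + l2_penalty(weights, lam)

weights = [1.0, -2.0, 3.0]   # squares sum to 1 + 4 + 9 = 14
lam = 1e-4

print(l2_penalty(weights, lam))        # about 0.0014
print(total_loss(0.5, weights, lam))   # about 0.5014
```

Because the penalty grows with the square of each weight, a single large weight costs far more than several small ones, which is what pushes the optimizer toward small, evenly spread weights.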

#### Where L2 regularization is applied

- ✅ On Conv2D layers
- ✅ On Dense layers

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(
        32, (3,3),
        activation='relu', 
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)  # Regularization strength (lambda) = 0.0001, a commonly used value
    ),
    tf.keras.layers.MaxPooling2D(),

    tf.keras.layers.Conv2D(
        64, (3,3),
        activation='relu',
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)
    ),
    tf.keras.layers.MaxPooling2D(),

    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(
        128,
        activation='relu',
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)
    ),

    tf.keras.layers.Dense(10, activation='softmax')
])


#### Why L2 reduces overfitting

Shrinks large weights

Encourages smoother decision boundaries

Makes model less sensitive to noise

#### Statistically: Reduces variance without increasing bias too much

| L2                 | L1                      |
| ------------------ | ----------------------- |
| Shrinks weights    | Makes some weights zero |
| Smooth solution    | Sparse solution         |
| Preferred for CNNs | Rare in CNNs            |


### L2 regularization reduces overfitting by penalizing large weights, encouraging smoother and more generalizable models.

# 🔹 Technique 5: Reduce Model Capacity (Simpler Architecture)
#### What problem it solves

Overfitting occurs when:

Model is too powerful

Dataset is small

Model memorizes training samples

#### A simpler model generalizes better.

## Methods:
1) Reduce the number of neurons (units) in dense layers
2) Reduce the number of kernels (filters) in Conv layers
3) Reduce the kernel size, e.g. (5,5) -> (3,3)
4) Replace the Flatten layer with a GlobalAveragePooling layer (at the transition from conv layers to dense layers)
5) Reduce the number of layers (dense/conv)
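To see how much methods 3 and 4 can save, the following sketch counts parameters by hand. The shapes are assumed for illustration only (a 3-channel input, 32 filters, and a final 7×7×64 feature map feeding a 10-class output), and `conv_params` and `dense_params` are hypothetical helpers rather than Keras functions.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a standard 2-D convolution."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (n_inputs + 1) * n_units

# Method 3: shrinking the kernel from 5x5 to 3x3 (3 input channels, 32 filters).
print(conv_params(5, 5, 3, 32))   # 2432
print(conv_params(3, 3, 3, 32))   # 896

# Method 4: Flatten vs GlobalAveragePooling before a 10-unit output layer,
# assuming the last conv block produces a 7x7x64 feature map.
print(dense_params(7 * 7 * 64, 10))   # Flatten: 31370
print(dense_params(64, 10))           # GlobalAveragePooling: 650
```

In this assumed setup, global average pooling cuts the output layer to about 2% of its Flatten-based size, because the dense layer sees one value per channel instead of one per spatial position.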


### Why this reduces overfitting

Fewer parameters → less memorization

Forces model to learn essential features

Reduces variance

When this technique is MOST useful

Small datasets

Training from scratch

Limited hardware

### Reducing model capacity lowers overfitting by limiting the number of parameters, preventing the model from memorizing training data.

# 🔹 Technique 6: Batch Normalization
#### What problem it solves

During training:

Activations shift as weights update

Training becomes unstable

Model may overfit and converge poorly

#### Batch Normalization stabilizes learning and adds mild regularization.

#### Core idea - Normalize activations within a mini-batch to have zero mean and unit variance.
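The normalization step itself can be sketched for a single feature across a mini-batch. This is a simplified illustration: it omits the learned scale and shift parameters and the moving averages that a real batch normalization layer also maintains, and the `eps` value here is just an assumed small constant for numerical stability.

```python
import math

def batch_normalize(values, eps=1e-3):
    """Normalize one feature across a mini-batch to roughly zero mean, unit variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(variance + eps) for v in values]

batch = [1.0, 2.0, 3.0, 4.0]
normalized = batch_normalize(batch)

mean_after = sum(normalized) / len(normalized)
var_after = sum(v * v for v in normalized) / len(normalized)
print(normalized)
print(mean_after, var_after)   # mean near 0, variance just under 1
```

Since the mean and variance come from whichever samples happen to be in the batch, each sample is normalized slightly differently from one step to the next; this is the source of the noise that gives batch normalization its mild regularizing effect.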

In [None]:
# basic syntax
tf.keras.layers.BatchNormalization()

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.MaxPooling2D(),

    tf.keras.layers.Conv2D(64, (3,3), padding='same'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.MaxPooling2D(),

    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax')
])


### Why Batch Normalization reduces overfitting

Adds noise via batch statistics

Reduces sensitivity to initialization

Acts as a weak regularizer

Improves generalization

| BatchNorm              | Dropout                  |
| ---------------------- | ------------------------ |
| Normalizes activations | Randomly removes neurons |
| Stabilizes training    | Prevents co-adaptation   |
| Mild regularization    | Strong regularization    |


# 🔹 Technique 7: Learning Rate Reduction (ReduceLROnPlateau)
#### What problem it solves

Overfitting and poor generalization occur when:

Learning rate is too high

Model keeps making large weight updates

Validation loss stops improving but training continues

#### Reducing learning rate allows finer, more generalizable learning.

#### Core idea : Automatically reduce the learning rate when validation performance plateaus.

Applied as a callback
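The plateau rule can be simulated without a model. The sketch below is a simplified version of the logic, assuming no cooldown period and a minimum improvement of zero; `simulate_reduce_lr` is a hypothetical helper, not the Keras implementation.

```python
def simulate_reduce_lr(val_losses, initial_lr, factor, patience, min_lr):
    """Return the learning rate in effect after each epoch.

    When `patience` consecutive epochs fail to improve the best validation
    loss, the learning rate is multiplied by `factor`, never going below `min_lr`.
    """
    lr = initial_lr
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0   # start counting again at the new learning rate
        history.append(lr)
    return history

losses = [1.0, 0.9, 0.9, 0.9, 0.9]
print(simulate_reduce_lr(losses, initial_lr=1e-3, factor=0.3, patience=3, min_lr=1e-6))
```

Here the loss stalls at 0.9 for three epochs in a row, so the learning rate drops from 1e-3 to about 3e-4 after the fifth epoch.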

In [None]:
from tensorflow.keras.callbacks import ReduceLROnPlateau

lr_reduce = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.3,             # Multiplies the current LR by this factor, e.g. 1e-4 → 3e-5
    patience=3,             # Epochs with no improvement before the LR is reduced
    min_lr=1e-6             # Lower bound for the LR; prevents it from becoming uselessly small
)


How to use it during training

In [None]:
model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=50,
    callbacks=[lr_reduce]
)


### Learning rate reduction mitigates overfitting by slowing weight updates when validation loss plateaus, enabling more stable and generalizable convergence.

# 🔹 Technique 8: Transfer Learning & Freezing Layers
#### What problem it solves

Overfitting happens when:

Dataset is small

CNN has too many parameters

Model learns noise instead of general patterns

Transfer learning reduces overfitting by reusing pretrained knowledge.

### 1️⃣ Freezing layers
##### What it means

Prevents weight updates in pretrained layers

Reduces number of trainable parameters

##### Why it helps

Prevents overfitting on small datasets

Keeps learned generic features intact
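To make the reduction in trainable parameters concrete, here is a rough count of what stays trainable when the pretrained base is frozen and only a small head is trained. It assumes a base whose pooled output has 2048 features (as ResNet50's does), a 256-unit hidden layer, and a hypothetical `num_classes = 10`; `dense_params` is an illustrative helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (n_inputs + 1) * n_units

feature_dim = 2048   # channels left after global average pooling (assumed base: ResNet50)
hidden_units = 256
num_classes = 10     # hypothetical number of target classes

head_params = dense_params(feature_dim, hidden_units) + dense_params(hidden_units, num_classes)
print(head_params)   # 527114
```

With the base frozen, only about half a million head parameters are updated, compared with the tens of millions of weights in the full network, so there is far less capacity available for memorizing a small dataset.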

In [None]:
base_model = tf.keras.applications.ResNet50(
    weights='imagenet',
    include_top=False,             # drop the ImageNet classification head (the final pooling and dense layers)
    input_shape=(224,224,3)
)

base_model.trainable = False       # freeze every layer of the loaded model so its weights are not updated


In [None]:
model = tf.keras.Sequential([     # adding custom head on the base model
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(num_classes, activation='softmax')   # num_classes = number of target classes, defined beforehand
])


In [None]:
# compile
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',   # expects one-hot labels; use 'sparse_categorical_crossentropy' for integer labels
    metrics=['accuracy']
)

model.fit(train_ds, validation_data=val_ds, epochs=10)


### 2️⃣ Fine-tuning
##### What it means

Unfreeze top layers of base model

Train with very small learning rate

##### Why it helps

Adapts pretrained features to your dataset

Still avoids overfitting

In [None]:
base_model.trainable = True          # unfreezes all the layers

for layer in base_model.layers[:-30]:     # re-freezes every layer except the last 30, so only the top 30 stay trainable
    layer.trainable = False


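### Opposite slice, for comparison

In [None]:
for layer in base_model.layers[-30:]:     # freezes the last 30 layers; all earlier layers stay trainable
    layer.trainable = False

Note that this does the opposite of the fine-tuning described above: it freezes the top layers and leaves the earlier ones trainable.

Suppose the base model has 250 layers, numbered 1 to 250:

- `[-30:]` selects the last 30 layers, i.e. layers 221 to 250
- `[:-30]` selects everything except the last 30, i.e. layers 1 to 220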
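Negative slices are easy to get backwards, so it can help to check them on a plain list first. The sketch below uses a stand-in list of 250 hypothetical layer names instead of a real model.

```python
# Stand-in for base_model.layers: 250 names, layer_1 ... layer_250.
layers = [f"layer_{i}" for i in range(1, 251)]

all_but_last_30 = layers[:-30]   # everything except the final 30
last_30 = layers[-30:]           # only the final 30

print(len(all_but_last_30), all_but_last_30[0], all_but_last_30[-1])
print(len(last_30), last_30[0], last_30[-1])
```

The first slice runs from `layer_1` to `layer_220` and the second from `layer_221` to `layer_250`, so freezing `layers[:-30]` leaves exactly the top 30 layers trainable.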

In [None]:
# Recompile with low LR
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-5),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)


### Transfer learning mitigates overfitting by leveraging pretrained features and limiting the number of trainable parameters through layer freezing.

# 🔹 Technique 9: Early Stopping
#### What problem it solves

Overfitting happens when:

Training loss keeps decreasing

Validation loss starts increasing

👉 The model is memorizing training data.

### Early Stopping stops training at the optimal point.

In [None]:
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',       # loss is monitored, we can also monitor val_accuracy 
    patience=5,
    restore_best_weights=True
)


In [None]:
model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=50,
    callbacks=[early_stop]
)


# 🔹 Technique 10: Reduce Learning Rate on Plateau
#### What problem it solves

During training:

Loss stops improving (plateau)

High learning rate prevents finer weight updates

Model starts oscillating or overfitting noise

###  Lowering LR at the right time allows better generalization.

### Core idea : Automatically reduce the learning rate when validation performance stops improving.

Unlike fixed LR schedules, this is adaptive.

#### Where it is used

- ✅ Implemented as a callback
- ✅ Passed into `model.fit()`

In [None]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)


In [None]:
lr_reduce = tf.keras.callbacks.ReduceLROnPlateau(      # define the callback
    monitor='val_loss',
    factor=0.3,
    patience=3,
    min_lr=1e-6,
    verbose=1
)


In [None]:
model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=50,
    callbacks=[lr_reduce]  
)


In [None]:
# Techniques 9 and 10 together
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=7,
        restore_best_weights=True
    ),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.3,
        patience=3,
        min_lr=1e-6
    )
]
model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=50,
    callbacks=callbacks  
)


# Master Checklist

| #      | Technique                        | Where Applied      | What It Does                               | Why It Works              | Key Code / Keyword             |
| ------ | -------------------------------- | ------------------ | ------------------------------------------ | ------------------------- | ------------------------------ |
| **1**  | Data Augmentation                | Input data         | Creates modified versions of training data | Increases data diversity  | `RandomFlip`, `RandomRotation` |
| **2**  | Dropout                          | Hidden layers      | Randomly disables neurons                  | Prevents co-adaptation    | `Dropout(0.3–0.5)`             |
| **3**  | L2 Regularization (Weight Decay) | Kernel weights     | Penalizes large weights                    | Encourages simpler models | `kernel_regularizer=l2(1e-4)`  |
| **4**  | L1 Regularization                | Kernel weights     | Shrinks some weights to zero               | Feature selection         | `l1(1e-4)`                     |
| **5**  | Batch Normalization              | After conv / dense | Normalizes activations                     | Stabilizes learning       | `BatchNormalization()`         |
| **6**  | Model Capacity Reduction         | Architecture       | Reduces parameters                         | Limits memorization       | Fewer layers / filters         |
| **7**  | Learning Rate Control            | Optimizer          | Slows weight updates                       | Avoids fitting noise      | `Adam(learning_rate=1e-4)`     |
| **8**  | Transfer Learning                | Pretrained CNN     | Reuses learned features                    | Fewer trainable params    | `base_model.trainable=False`   |
| **9**  | Early Stopping                   | Training loop      | Stops training early                       | Prevents memorization     | `EarlyStopping()`              |
| **10** | Reduce LR on Plateau             | Callback           | Lowers LR on stagnation                    | Refines convergence       | `ReduceLROnPlateau()`          |
