# 🧠 What is Batch Normalization?

Batch Normalization (BatchNorm) is a deep learning technique that normalizes the inputs to a layer using the mean and variance computed over each mini-batch. It usually makes training faster and more stable, and it often improves final performance.

## ❓ Why Use Batch Normalization?

Deep neural networks can face issues like:

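- Internal Covariate Shift: The distribution of each layer's inputs keeps changing during training as the weights of earlier layers are updated.
- Vanishing/Exploding Gradients: Gradients can shrink or blow up as they propagate back through many layers, which makes deep networks hard to train.
- Sensitivity to Weight Initialization and Learning Rate: Poor choices can stall training or make it diverge.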

## ✅ Batch Normalization helps by:

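- Normalizing layer inputs per feature over each mini-batch (mean ≈ 0, std ≈ 1 before the learned scale and shift)
- Smoothing and accelerating training
- Allowing higher learning rates
- Acting as a mild regularizer, because each sample's normalization depends on the other samples in its batch (an effect loosely similar to Dropout's noise)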

## 🔢 How Batch Normalization Works (Step-by-Step)

Given a mini-batch of n values x = [x₁, x₂, ..., xₙ] for one feature (each feature is normalized independently):

1. Compute the batch mean:  
   μ = (1/n) · Σᵢ xᵢ

2. Compute the batch variance:  
   σ² = (1/n) · Σᵢ (xᵢ − μ)²

3. Normalize each value:  
   x̂ᵢ = (xᵢ − μ) / √(σ² + ε)

4. Scale and shift:  
   yᵢ = γ · x̂ᵢ + β

Where:
- γ (gamma) is a trainable scaling parameter
- β (beta) is a trainable shifting parameter
- ε (epsilon) is a small constant to avoid division by zero
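The four steps above can be sketched in NumPy for a 2-D input of shape `(batch_size, num_features)`, computing one mean and one variance per feature (column). This is a minimal illustration of the training-mode forward pass only, not a library implementation: the function name `batch_norm_forward` is hypothetical, and `eps=1e-5` is an assumed default (it matches PyTorch's; Keras defaults to `1e-3`).

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode BatchNorm for x of shape (batch_size, num_features).

    Statistics are computed per feature (column) over the batch (rows).
    """
    mu = x.mean(axis=0)                    # step 1: batch mean, one per feature
    var = x.var(axis=0)                    # step 2: batch variance (divides by n)
    x_hat = (x - mu) / np.sqrt(var + eps)  # step 3: normalize
    y = gamma * x_hat + beta               # step 4: scale and shift
    return y, mu, var


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # 32 samples, 4 features
gamma = np.ones(4)   # gamma = 1, beta = 0 leaves the normalized values unchanged
beta = np.zeros(4)

y, mu, var = batch_norm_forward(x, gamma, beta)
print(y.mean(axis=0))  # close to 0 for every feature
print(y.std(axis=0))   # close to 1 for every feature
```

Other values of `gamma` and `beta` move each feature's mean to `beta` and its standard deviation to roughly `|gamma|`, which is how the network can partly or fully undo the normalization when that helps.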

## 🧪 Keras Example with Batch Normalization

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization

model = Sequential()
model.add(Dense(3, activation='relu', input_dim=2))
model.add(BatchNormalization())
model.add(Dense(2, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(1, activation='sigmoid'))
model.summary()
```

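## Explanation:
- A BatchNormalization layer follows each hidden Dense layer. Because `activation='relu'` is built into `Dense`, BatchNorm here normalizes the post-ReLU outputs (BatchNorm after activation).
- For each unit, it normalizes the outputs over the batch and then applies the trainable scale (γ) and shift (β).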

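## 🧪 PyTorch Example

This version places `BatchNorm1d` before each `ReLU` (the Linear → BatchNorm → Activation order). To mirror the Keras model above exactly, swap each `BatchNorm1d`/`ReLU` pair.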

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2, 3),
    nn.BatchNorm1d(3),
    nn.ReLU(),
    nn.Linear(3, 2),
    nn.BatchNorm1d(2),
    nn.ReLU(),
    nn.Linear(2, 1),
    nn.Sigmoid()
)
```

## 🧱 Typical Block Order in a Network

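**Standard Architecture:**  
`Linear/Conv Layer → BatchNorm → Activation (ReLU, etc.)`

This is the order used in the original BatchNorm paper and in the PyTorch example above. Placing BatchNorm after the activation, as in the Keras example, is also common in practice.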

## ✅ Benefits of Batch Normalization

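- **Stabilizes learning** by keeping each layer's inputs on a consistent scale
- **Reduces internal covariate shift** (changes in layer input distributions); this was the original motivation, although later research suggests much of the benefit comes from a smoother optimization landscape
- **Accelerates training** and enables use of higher learning rates
- **Provides a mild regularization effect**, since batch statistics add noise to each sample's activations
- **Reduces dependency** on Dropout layers, which can sometimes be reduced or removed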

## ⚠️ Important Notes

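### Training vs Inference Behavior:
- **During training:** Normalizes with the statistics (mean and variance) of the current batch and updates running (moving) averages of them
- **During inference (evaluation/prediction):** Uses the running averages collected during training, so a sample's output does not depend on the rest of the batch

In Keras this switch happens automatically (`fit` runs in training mode, `evaluate` and `predict` in inference mode); in PyTorch you select it with `model.train()` and `model.eval()`.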
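To make the training/inference difference concrete, here is a minimal NumPy sketch; the class `SimpleBatchNorm` is hypothetical. It follows the Keras-style moving-average update `running = momentum * running + (1 - momentum) * batch_stat` (Keras's `BatchNormalization` defaults to `momentum=0.99` and `epsilon=0.001`). PyTorch's `momentum=0.1` follows the opposite convention and weights the new batch: `running = (1 - momentum) * running + momentum * batch_stat`. The usage assumes `momentum=0.9` so that 200 simulated steps are enough for the running statistics to settle.

```python
import numpy as np


class SimpleBatchNorm:
    """Minimal BatchNorm for (batch_size, num_features) inputs with running statistics.

    Running estimates use the Keras-style update:
        running = momentum * running + (1 - momentum) * batch_stat
    """

    def __init__(self, num_features, momentum=0.99, eps=1e-3):
        self.gamma = np.ones(num_features)   # trainable scale (not trained in this sketch)
        self.beta = np.zeros(num_features)   # trainable shift (not trained in this sketch)
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            # Training: normalize with current-batch statistics and update the running averages
            mean, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * mean
            self.running_var = m * self.running_var + (1 - m) * var
        else:
            # Inference: use the frozen running averages; no batch statistics are needed
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta


rng = np.random.default_rng(0)
bn = SimpleBatchNorm(num_features=2, momentum=0.9)
for _ in range(200):
    batch = rng.normal(loc=[10.0, -4.0], scale=[2.0, 0.5], size=(32, 2))
    bn(batch, training=True)

print(bn.running_mean)  # should be close to [10, -4]
print(bn.running_var)   # should be close to [4, 0.25]

# At inference time even a single sample can be normalized
print(bn(np.array([[10.0, -4.0]]), training=False))  # both values near 0
```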

### Practical Considerations:
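- **Batch size matters:** The batch mean and variance are only estimates, and with very small batches they become noisy, which can hurt training; a common rule of thumb is batches of at least 16-32 samples
- **Placement matters:** Typically inserted between the linear/conv layer and its activation, though placing it after the activation (as in the Keras example above) is also used
- **Memory overhead:** Each BatchNorm layer stores γ and β (trainable) plus a running mean and running variance (non-trainable), i.e. four values per feature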
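Why batch size matters can be seen numerically: the batch mean is an estimate of the true mean, and its spread shrinks roughly as 1/√batch_size. This NumPy sketch (the helper name `batch_mean_spread` is hypothetical) samples many batches from a standard normal feature and measures how much the batch mean fluctuates from batch to batch; that fluctuation is subtracted from every activation during normalization.

```python
import numpy as np


def batch_mean_spread(batch_size, num_batches=10_000, seed=0):
    """Standard deviation of the batch mean across many batches drawn from N(0, 1)."""
    rng = np.random.default_rng(seed)
    batches = rng.normal(size=(num_batches, batch_size))
    return batches.mean(axis=1).std()


for batch_size in (4, 16, 64):
    # Expected to be close to 1 / sqrt(batch_size): about 0.5, 0.25 and 0.125
    print(batch_size, round(float(batch_mean_spread(batch_size)), 3))
```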

## 📚 In Summary

Batch Normalization is a fundamental technique that:
- **Improves training stability** in deep networks
- **Reduces sensitivity** to initialization and hyperparameters
- **Enables faster convergence** through normalized layer inputs
- **Is widely adopted** in CNNs and MLPs (recurrent networks more often use Layer Normalization instead)
- **Is commonly placed** before activation functions, although placing it after them also works

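## 🧪 Experiment: Training With and Without Batch Normalization

The cells below train a small network on a two-class dataset, first without and then with BatchNormalization layers, and compare their validation accuracy.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```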

```python
# Colab-style path: upload concertriccir2.csv to /content first, or change the path
df = pd.read_csv('/content/concertriccir2.csv')
```


```python
df.head()
```

```python
plt.scatter(df['X'], df['Y'], c=df['class'])
```

```python
# Features: the first two columns; label: the last column
X = df.iloc[:, 0:2].values
y = df.iloc[:, -1].values
```
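If `concertriccir2.csv` is not available, a comparable toy problem can be generated directly. This is an assumption: judging by the file name and the scatter plot above, the data appears to be two concentric rings of points, but its exact contents are unknown, so `make_two_rings` below is a hypothetical NumPy stand-in that produces `X` (shape `(n_samples, 2)`) and binary labels `y` for the training cells that follow.

```python
import numpy as np


def make_two_rings(n_samples=300, noise=0.1, inner_radius=0.5, seed=0):
    """Hypothetical stand-in dataset: two noisy concentric rings.

    Returns X of shape (n_samples, 2) and integer labels y (0 = outer ring, 1 = inner ring).
    """
    rng = np.random.default_rng(seed)
    n_outer = n_samples // 2
    n_inner = n_samples - n_outer
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)
    radii = np.concatenate([np.ones(n_outer), np.full(n_inner, inner_radius)])
    X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
    X += rng.normal(scale=noise, size=X.shape)  # jitter points off the exact circles
    y = np.concatenate([np.zeros(n_outer, dtype=int), np.ones(n_inner, dtype=int)])
    # Shuffle rows so a split taken from the end of the data contains both classes
    order = rng.permutation(n_samples)
    return X[order], y[order]


X, y = make_two_rings()
```

Shuffling matters here because Keras's `validation_split` takes the last fraction of the rows, before any shuffling.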

```python
# Import the model and layers from the same package (tensorflow.keras) to avoid mixing Keras installs
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization
```

```python
# Baseline: same layer widths as the BatchNorm model below, but without BatchNorm
model = Sequential()

model.add(Dense(3, activation='relu', input_dim=2))
model.add(Dense(2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.summary()
```

```python
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```

```python
# validation_split=0.2 holds out the last 20% of rows (selected before shuffling)
history1 = model.fit(X, y, epochs=200, validation_split=0.2)
```

```python
# BatchNorm model: a BatchNormalization layer after each hidden Dense layer (i.e. after its ReLU)
model = Sequential()

model.add(Dense(3, activation='relu', input_dim=2))
model.add(BatchNormalization())
model.add(Dense(2, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(1, activation='sigmoid'))

model.summary()
```
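The parameter counts that `model.summary()` reports for this BatchNorm model can be checked by hand. A Dense layer has `n_in * n_out` weights plus `n_out` biases, and each BatchNormalization layer stores four values per feature: trainable γ and β, plus a non-trainable moving mean and moving variance. The helper names and layer labels below are hypothetical; the arithmetic is the point.

```python
def dense_params(n_in, n_out):
    # Weight matrix (n_in x n_out) plus one bias per output unit
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    # gamma and beta are trainable; moving mean and moving variance are not
    trainable = 2 * n_features
    non_trainable = 2 * n_features
    return trainable, non_trainable


# (label, trainable, non_trainable) for each layer of the BatchNorm model
layers = [
    ("dense (2 -> 3)", dense_params(2, 3), 0),
    ("batch_normalization (3)", *batchnorm_params(3)),
    ("dense (3 -> 2)", dense_params(3, 2), 0),
    ("batch_normalization (2)", *batchnorm_params(2)),
    ("dense (2 -> 1)", dense_params(2, 1), 0),
]

trainable = sum(t for _, t, _ in layers)
non_trainable = sum(n for _, _, n in layers)
for name, t, n in layers:
    print(f"{name:<26} {t + n:>3} params")
print("Total:", trainable + non_trainable)  # 40
print("Trainable:", trainable)              # 30
print("Non-trainable:", non_trainable)      # 10
```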

```python
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```

```python
history2 = model.fit(X, y, epochs=200, validation_split=0.2)
```

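```python
# Compare validation accuracy per epoch for the two runs
plt.plot(history1.history['val_accuracy'], color='black', label='without BatchNorm')
plt.plot(history2.history['val_accuracy'], color='green', label='with BatchNorm')
plt.xlabel('epoch')
plt.ylabel('validation accuracy')
plt.legend()
plt.show()
```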