# Kaiming Normal Initialization (He Initialization)

**The Core Problem: Bad Initialization**

When we train a neural network, each layer transforms inputs with weights:

$$
y = Wx + b
$$

If the weights are chosen poorly:

- Activations can explode (grow too large),
- or vanish (shrink toward 0),
- or stay identical across neurons (if all weights start equal, the symmetry between neurons is never broken).

Any of these makes training slow or impossible.

So, we need to initialize weights carefully to keep signals flowing properly during forward and backward passes.
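
The symmetry problem (all weights starting equal) can be checked directly. The following is a minimal, dependency-free sketch, assuming a single bias-free dense layer; the names `dense`, `constant_w`, and `random_w` are illustrative and do not come from the text above.

```python
import random

def dense(weights, x):
    """Pre-activations of one dense layer without bias: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

rng = random.Random(0)
fan_in, fan_out = 8, 4
x = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]

# Every weight equal: all rows of W are the same, so every neuron computes the same value.
constant_w = [[0.5] * fan_in for _ in range(fan_out)]
# Random weights: rows differ, so each neuron computes something different.
random_w = [[rng.gauss(0.0, 0.5) for _ in range(fan_in)] for _ in range(fan_out)]

same = dense(constant_w, x)
varied = dense(random_w, x)

print("distinct outputs with constant weights:", len(set(same)))
print("distinct outputs with random weights:  ", len(set(varied)))
```

Neurons that produce identical outputs also receive identical gradients whenever the next layer treats them alike, so gradient descent alone cannot make them diverge. Random initialization avoids this from the start.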

Kaiming Normal initialization, developed by **Kaiming He et al. (2015)** for **ReLU networks**, is designed so that:

- The **output variance** of each layer roughly matches the **input variance**
- This keeps activations “alive” (not too big, not too small)

#### Formula

$$
W_{ij} \sim \mathcal{N}\left(0, \frac{2}{n_{\text{in}}}\right)
$$

Where:
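- $n_{\text{in}}$: number of input units (inputs to a neuron), also called the fan-in
- The second argument, $2 / n_{\text{in}}$, is the **variance**, so the standard deviation is $\sqrt{2 / n_{\text{in}}}$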
- The factor **2** compensates for the fact that **ReLU** sets about half of the activations to zero.
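
The factor of 2 can be motivated with a short variance argument. What follows is a sketch in the spirit of He et al., under assumptions not stated above: zero bias, zero-mean weights that are i.i.d. and independent of the inputs, and previous-layer pre-activations $y'_j$ that are symmetric about zero. The symbols $y'_j$, $x_j$, and $W_j$ (the weights of a single neuron) are notation introduced here for illustration.

$$
\begin{aligned}
y &= \sum_{j=1}^{n_{\text{in}}} W_j x_j, \qquad x_j = \max(0, y'_j) \\
\operatorname{Var}(y) &= n_{\text{in}} \operatorname{Var}(W)\, \mathbb{E}[x^2] \\
\mathbb{E}[x^2] &= \tfrac{1}{2}\, \mathbb{E}[(y')^2] = \tfrac{1}{2} \operatorname{Var}(y') \\
\operatorname{Var}(y) &= \frac{n_{\text{in}}}{2} \operatorname{Var}(W) \operatorname{Var}(y')
\end{aligned}
$$

Requiring $\operatorname{Var}(y) = \operatorname{Var}(y')$ gives $\operatorname{Var}(W) = 2 / n_{\text{in}}$, which is the variance used in the formula.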


| Initialization          | Effect                                        |
| ----------------------- | --------------------------------------------- |
| **Zeros**               | All neurons behave identically → no learning  |
| **Random small values** | Tends to shrink signals → vanishing gradients |
| **Kaiming Normal**      | Keeps activations well-scaled for ReLU layers |


Think of a deep network as a long pipe carrying information forward (activations) and backward (gradients).
If initialization is poor:

- The signal fades out (weights too small)
- Or blows up (weights too large)

Kaiming Normal keeps the flow balanced, especially for ReLU activations where half the neurons output 0.
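
The pipe picture can be tried out numerically. The sketch below pushes one random input through a stack of bias-free ReLU layers and reports the mean square of the final activations for three weight scales. It uses only the Python standard library and covers the forward pass only; the width of 128, depth of 20, the two "bad" standard deviations, and the helper names are all choices made for this illustration.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def mean_square(v):
    return sum(x * x for x in v) / len(v)

def forward_scale(std, width=128, depth=20, seed=0):
    """Mean square of the activations after `depth` ReLU layers
    whose weights are drawn i.i.d. from N(0, std^2)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        # One dense layer without bias: each output neuron draws its own row of weights.
        pre = [sum(rng.gauss(0.0, std) * xi for xi in x) for _ in range(width)]
        x = relu(pre)
    return mean_square(x)

width = 128
scales = {
    "too small (std=0.01)": 0.01,
    "too large (std=1.0)": 1.0,
    "Kaiming (std=sqrt(2/fan_in))": math.sqrt(2.0 / width),
}
results = {name: forward_scale(std, width=width) for name, std in scales.items()}

for name, value in results.items():
    print(f"{name:30s} mean square after 20 layers: {value:.3e}")
```

With these settings the too-small scale should collapse toward zero, the too-large scale should grow by many orders of magnitude, and the Kaiming scale should stay within a few orders of magnitude of 1. The exact numbers depend on the seed and on the width, since narrower layers fluctuate more.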

| Concept                          | Meaning                               |
| -------------------------------- | ------------------------------------- |
| **Goal**                         | Prevent vanishing/exploding gradients |
| **Best for**                     | ReLU and LeakyReLU                    |
| **Weight distribution**          | Normal(mean 0, variance 2 / fan_in)   |
| **Bias**                         | Safe to initialize to 0               |
| **Alternative for Tanh/Sigmoid** | Xavier (Glorot) initialization        |
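
To make the summary concrete, the standard deviations behind these choices can be computed with a couple of small helpers. This is a sketch, not a library API: the function names are hypothetical, the LeakyReLU gain $\sqrt{2 / (1 + a^2)}$ for negative slope $a$ and the Xavier Normal variance $2 / (\text{fan\_in} + \text{fan\_out})$ are the commonly used forms and are not taken from the text above, and the layer sizes are arbitrary.

```python
import math

def kaiming_normal_std(fan_in, negative_slope=0.0):
    """Std for Kaiming Normal. negative_slope=0.0 corresponds to plain ReLU;
    a nonzero value is the slope of a LeakyReLU on negative inputs."""
    gain = math.sqrt(2.0 / (1.0 + negative_slope ** 2))
    return gain / math.sqrt(fan_in)

def xavier_normal_std(fan_in, fan_out):
    """Std for Xavier (Glorot) Normal with gain 1, often paired with tanh or sigmoid."""
    return math.sqrt(2.0 / (fan_in + fan_out))

fan_in, fan_out = 512, 256  # arbitrary example layer sizes
print("Kaiming, ReLU:           ", kaiming_normal_std(fan_in))
print("Kaiming, LeakyReLU(0.01):", kaiming_normal_std(fan_in, negative_slope=0.01))
print("Xavier Normal:           ", xavier_normal_std(fan_in, fan_out))
```

In practice a framework routine would normally be used instead of hand-written helpers; in PyTorch, for example, `torch.nn.init.kaiming_normal_` and `torch.nn.init.xavier_normal_` cover these two cases.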
