# Layer Normalization

#### Step 1: Think of some numbers as data
Let’s say you have 2 data points.
Each has 5 numbers.

```python
# Two data points, each with 5 numbers (features).
# In a neural network, each row could be, for example, the feature vector of one word.
data = [
    [0.5, -1.2, 3.3, 0.8, -0.5],   # Data point 1
    [2.0, -0.7, 1.2, -3.5, 0.6],   # Data point 2
]
print("Data:", data)
```

```
Data: [[0.5, -1.2, 3.3, 0.8, -0.5], [2.0, -0.7, 1.2, -3.5, 0.6]]
```


#### Step 2: Calculate the mean of each data point

```python
# Mean = (sum of numbers) / (how many numbers)
mean1 = sum(data[0]) / len(data[0])
mean2 = sum(data[1]) / len(data[1])

print("Mean of data point 1:", mean1)
print("Mean of data point 2:", mean2)
```

```
Mean of data point 1: 0.58
Mean of data point 2: -0.08
```
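Working the same means out by hand, as a check on the arithmetic (each data point has 5 numbers, so each sum is divided by 5):

$$\mu_1 = \frac{0.5 + (-1.2) + 3.3 + 0.8 + (-0.5)}{5} = \frac{2.9}{5} = 0.58$$

$$\mu_2 = \frac{2.0 + (-0.7) + 1.2 + (-3.5) + 0.6}{5} = \frac{-0.4}{5} = -0.08$$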


#### Step 3: Calculate how spread out the numbers are (variance)

```python
# Variance = average of (each number - mean) squared
var1 = sum((x - mean1) ** 2 for x in data[0]) / len(data[0])
var2 = sum((x - mean2) ** 2 for x in data[1]) / len(data[1])

print("Variance of data point 1:", var1)
print("Variance of data point 2:", var2)
```

```
Variance of data point 1: 2.3575999999999997
Variance of data point 2: 3.7016
```
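The same variances worked out by hand: subtract the mean from each number, square the result, and average over all 5 numbers. The division is by n = 5 (the population variance), which is what layer normalization uses, not by n − 1. The printed 2.3575999999999997 is simply 2.3576 after floating-point rounding.

$$\sigma_1^2 = \frac{(-0.08)^2 + (-1.78)^2 + 2.72^2 + 0.22^2 + (-1.08)^2}{5} = \frac{0.0064 + 3.1684 + 7.3984 + 0.0484 + 1.1664}{5} = \frac{11.788}{5} = 2.3576$$

$$\sigma_2^2 = \frac{2.08^2 + (-0.62)^2 + 1.28^2 + (-3.42)^2 + 0.68^2}{5} = \frac{4.3264 + 0.3844 + 1.6384 + 11.6964 + 0.4624}{5} = \frac{18.508}{5} = 3.7016$$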


#### Step 4: Apply Layer Normalization (balance the numbers)

```python
import math

# Formula: (number - mean) / sqrt(variance + epsilon)
# epsilon = 1e-5 is a tiny constant that prevents division by zero
# if a data point's variance is 0.
normalized1 = [(x - mean1) / math.sqrt(var1 + 1e-5) for x in data[0]]
normalized2 = [(x - mean2) / math.sqrt(var2 + 1e-5) for x in data[1]]

print("Normalized data point 1:", normalized1)
print("Normalized data point 2:", normalized2)
```

```
Normalized data point 1: [-0.05210195320817495, -1.1592684588818931, 1.771466409077949, 0.14328037132248123, -0.7033763683103622]
Normalized data point 2: [1.081105321539628, -0.3222525477666199, 0.6652955824859249, -1.7775866344545805, 0.3534382781956476]
```
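Steps 2 to 4 can be wrapped in one reusable function and applied to each data point in turn. This is a minimal plain-Python sketch; the name `layer_norm` and its `eps` parameter are illustrative choices, not a library API. For one data point it computes $\hat{x}_i = (x_i - \mu)/\sqrt{\sigma^2 + \epsilon}$, where $\mu$ and $\sigma^2$ come from that data point's own 5 numbers, never from the other data point.

```python
import math

def layer_norm(values, eps=1e-5):
    """Normalize one data point to (roughly) zero mean and unit variance.

    eps is the same small constant used in Step 4; it keeps the
    denominator non-zero when all values are identical.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n  # divide by n, as in Step 3
    denom = math.sqrt(var + eps)
    return [(x - mean) / denom for x in values]

data = [
    [0.5, -1.2, 3.3, 0.8, -0.5],
    [2.0, -0.7, 1.2, -3.5, 0.6],
]

# Each data point (row) is normalized independently of the others.
normalized = [layer_norm(row) for row in data]
for row in normalized:
    print(row)

# A constant row has zero variance: eps avoids a ZeroDivisionError
# and every value maps to 0.0.
print(layer_norm([4.0, 4.0, 4.0]))  # → [0.0, 0.0, 0.0]
```

Because the function performs the same operations in the same order as Steps 2 to 4, the two printed rows match the normalized values above.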


#### Step 5: Check whether the above steps worked

```python
# Check new mean and variance
mean1_after = sum(normalized1) / len(normalized1)
var1_after = sum((x - mean1_after) ** 2 for x in normalized1) / len(normalized1)

mean2_after = sum(normalized2) / len(normalized2)
var2_after = sum((x - mean2_after) ** 2 for x in normalized2) / len(normalized2)

print("After normalization - Mean 1:", mean1_after, ", Variance 1:", var1_after)
print("After normalization - Mean 2:", mean2_after, ", Variance 2:", var2_after)
```

```
After normalization - Mean 1: 1.3877787807814458e-18 , Variance 1: 0.9999957584163625
After normalization - Mean 2: 2.2204460492503132e-17 , Variance 2: 0.9999972984728265
```


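#### Meaning of the output
- Mean 1: 1.3877787807814458e-18
  **This is 0.0000000000000000013877..., which is zero up to floating-point rounding.**

- Variance 1: 0.9999957
  **This is very close to 1. It sits slightly below 1 because Step 4 adds 1e-5 to the variance before taking the square root, so the new variance is var1 / (var1 + 1e-5) ≈ 0.9999958 rather than exactly 1.**

- Mean 2 and Variance 2 behave the same way: the mean is effectively 0 and the variance is just under 1.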
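The small gap between the variance and 1 can be checked directly. Since every deviation is divided by $\sqrt{\sigma^2 + \epsilon}$, the variance after normalization should be $\sigma^2 / (\sigma^2 + \epsilon)$. This sketch, assuming the same data and the same `eps = 1e-5` as Step 4, compares that prediction with the variance measured the way Step 5 measures it.

```python
import math

data = [
    [0.5, -1.2, 3.3, 0.8, -0.5],
    [2.0, -0.7, 1.2, -3.5, 0.6],
]
eps = 1e-5  # the same small constant used in Step 4

results = []
for row in data:
    n = len(row)
    mean = sum(row) / n
    var = sum((x - mean) ** 2 for x in row) / n

    # Each deviation is divided by sqrt(var + eps), so the new variance
    # should be var / (var + eps): just under 1 rather than exactly 1.
    predicted = var / (var + eps)

    # Normalize and measure the variance the same way Step 5 does.
    normalized = [(x - mean) / math.sqrt(var + eps) for x in row]
    new_mean = sum(normalized) / n
    measured = sum((x - new_mean) ** 2 for x in normalized) / n

    results.append((predicted, measured))
    print(f"predicted: {predicted:.10f}  measured: {measured:.10f}")
```

For data point 1 both values print as 0.9999957584. Setting `eps = 0.0` would push the variance to 1 up to rounding, but a data point whose numbers are all equal would then cause a division by zero, which is why a small positive epsilon is kept.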

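#### No matter how messy the numbers were, after normalization each data point has:

- **Mean of (almost exactly) 0**

- **Variance of (almost exactly) 1**

The one exception is a data point whose numbers are all identical: its variance is 0, so every normalized value is 0 (the 1e-5 term prevents a division by zero).

The data is clean, balanced, and on the same scale, just like giving everyone on a Zoom call the same volume level.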
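In neural networks, layer normalization usually does not stop at zero mean and unit variance: PyTorch's `torch.nn.LayerNorm`, for example, also multiplies each feature by a learnable scale γ and adds a learnable shift β (with its default `eps` likewise 1e-5). The plain-Python sketch below shows that extra step; the function name `layer_norm_affine` and the γ, β values are hypothetical, chosen only for illustration rather than learned.

```python
import math

def layer_norm_affine(values, gamma, beta, eps=1e-5):
    """Hypothetical helper: normalize, then apply per-feature scale and shift."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n
    denom = math.sqrt(var + eps)
    # Standardize as in Step 4, then rescale by gamma and shift by beta.
    return [g * (x - mean) / denom + b for x, g, b in zip(values, gamma, beta)]

row = [0.5, -1.2, 3.3, 0.8, -0.5]

# gamma = 1 and beta = 0 leave the plain normalized values unchanged.
identity = layer_norm_affine(row, gamma=[1.0] * 5, beta=[0.0] * 5)

# Hypothetical values; in a real model these would be learned during training.
gamma = [2.0] * 5
beta = [0.5] * 5
scaled = layer_norm_affine(row, gamma, beta)

print(identity)
print(scaled)
```

With γ = 2 and β = 0.5 the output has a mean of about 0.5 and a variance of about 4, so the model can undo or adjust the normalization where that helps. PyTorch starts γ at ones and β at zeros, which matches the `identity` case.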