# Implement a custom layer that performs layer normalization
1. The `build()` method should define two trainable weights $\alpha$ and $\beta$, both of
shape `input_shape[-1:]` and data type `tf.float32`. $\alpha$ should be initialized
with `1s`, and $\beta$ with `0s`.
2. The `call()` method should compute the mean $\mu$ and standard deviation $\sigma$
of each instance's features. For this, you can use
`tf.nn.moments(inputs, axes=-1, keepdims=True)`, which returns the mean $\mu$ and
the variance $\sigma^2$ for every instance (compute the square root of the variance
to get the standard deviation). Then the method should compute and return
$$\alpha \otimes \frac{X - \mu}{\sigma + \varepsilon} + \beta$$
where $\otimes$ represents itemwise multiplication (`*`) and $\varepsilon$ is a smoothing term (a small constant to avoid division by zero, e.g., 0.001).
3. Ensure that your custom layer produces the same (or very nearly the same)
output as the `tf.keras.layers.LayerNormalization` layer.
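
Before moving to TensorFlow, the formula in step 2 can be sketched in plain NumPy. This is only an illustrative reference, not the layer itself: the function name `layer_norm_reference` and the tiny input array are made up for the example, and `np.std` (population standard deviation) stands in for the square root of the variance returned by `tf.nn.moments`.

```python
import numpy as np

def layer_norm_reference(X, alpha, beta, epsilon=1e-3):
    """Hypothetical helper: alpha * (X - mu) / (sigma + epsilon) + beta."""
    # Statistics are taken over the last axis, i.e. per instance.
    mu = X.mean(axis=-1, keepdims=True)
    sigma = X.std(axis=-1, keepdims=True)  # population std (ddof=0)
    return alpha * (X - mu) / (sigma + epsilon) + beta

# Two instances with four features each, on very different scales.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])
alpha = np.ones(X.shape[-1])   # scale, initialized with 1s
beta = np.zeros(X.shape[-1])   # offset, initialized with 0s

out = layer_norm_reference(X, alpha, beta)
print(out.mean(axis=-1))  # per-instance means, close to 0
print(out.std(axis=-1))   # per-instance stds, slightly below 1 because of epsilon
```

With $\alpha = 1$ and $\beta = 0$, both rows end up with nearly the same normalized values even though their raw scales differ by a factor of ten.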

In [17]:
import numpy as np
import tensorflow as tf
from tensorflow.types.experimental import TensorLike

In [18]:
class MyLayerNormalization(tf.keras.layers.Layer):
    def __init__(self, epsilon: float = 1e-3, **kwargs) -> None:
        super().__init__(**kwargs)
        self.epsilon = epsilon

    def build(self, input_shape: tuple[int, ...]) -> None:
        # Scale parameter, one value per feature, initialized to 1.
        self.alpha = self.add_weight(
            name='alpha',
            shape=input_shape[-1:],
            initializer='ones',
            trainable=True,
            dtype='float32',
        )

        # Offset parameter, one value per feature, initialized to 0.
        self.beta = self.add_weight(
            name='beta',
            shape=input_shape[-1:],
            initializer='zeros',
            trainable=True,
            dtype='float32',
        )

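    def call(self, X: TensorLike) -> TensorLike:
        # Per-instance mean and variance over the last (feature) axis.
        mean, variance = tf.nn.moments(X, axes=-1, keepdims=True)
        # Note: epsilon is added to the variance inside the square root,
        # rather than to sigma as in the exercise formula.
        return self.alpha * (X - mean) / tf.sqrt(variance + self.epsilon) + self.beta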

    def get_config(self) -> dict:
        base_config = super().get_config()
        return {
            **base_config,
            'epsilon': self.epsilon
        }
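
Note that `call()` above computes $\sqrt{\sigma^2 + \varepsilon}$, whereas the exercise formula uses $\sigma + \varepsilon$. To my knowledge the former is also how `tf.keras.layers.LayerNormalization` applies its epsilon, which would explain the close match with it. The NumPy sketch below (hypothetical helper names, toy input) gives a rough sense of how much the two variants differ:

```python
import numpy as np

def normalize_eps_outside(X, epsilon=1e-3):
    """Exercise formula: (X - mu) / (sigma + epsilon)."""
    mu = X.mean(axis=-1, keepdims=True)
    sigma = X.std(axis=-1, keepdims=True)
    return (X - mu) / (sigma + epsilon)

def normalize_eps_inside(X, epsilon=1e-3):
    """Variant used in call(): (X - mu) / sqrt(variance + epsilon)."""
    mu = X.mean(axis=-1, keepdims=True)
    variance = X.var(axis=-1, keepdims=True)
    return (X - mu) / np.sqrt(variance + epsilon)

# Toy input, chosen only for illustration.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])

# Largest elementwise difference between the two variants.
max_gap = np.abs(normalize_eps_outside(X) - normalize_eps_inside(X)).max()
print(max_gap)  # small but nonzero
```

The gap is small for inputs like these, but it is not zero, so a layer that followed the exercise formula literally would not necessarily match the Keras layer as closely.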

In [19]:
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x = tf.cast(x_train[:128], tf.float32)
x = tf.reshape(x, [128, -1])  # shape: (128, 784)

In [20]:
custom_layer = MyLayerNormalization()
keras_layer = tf.keras.layers.LayerNormalization()

In [21]:
tf.reduce_mean(tf.keras.losses.MeanAbsoluteError()(
    keras_layer(x),
    custom_layer(x)
))

<tf.Tensor: shape=(), dtype=float32, numpy=3.012937099811097e-08>

In [24]:
tf.keras.utils.set_random_seed(42)
random_alpha = np.random.rand(x.shape[-1])
random_beta = np.random.rand(x.shape[-1])

custom_layer.set_weights([random_alpha, random_beta])
keras_layer.set_weights([random_alpha, random_beta])

tf.reduce_mean(tf.keras.losses.MeanAbsoluteError()(
    keras_layer(x), 
    custom_layer(x)
))

<tf.Tensor: shape=(), dtype=float32, numpy=1.8992215800039958e-08>

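In both scenarios (default weights and shared random weights), the mean absolute error between the two layers is on the order of $10^{-8}$, which is negligible.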
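
To put these numbers in context, they can be compared with float32 machine epsilon, the spacing between 1.0 and the next representable float32 value. The snippet below is a small sanity check under that framing; the two error values are copied from the outputs above, and the variable names are made up for the example.

```python
import numpy as np

# Spacing between 1.0 and the next float32 value (about 1.19e-07).
eps32 = np.finfo(np.float32).eps

# Mean absolute errors reported in the two comparisons above.
reported_errors = [3.012937099811097e-08, 1.8992215800039958e-08]

# True if every reported error is below float32 machine epsilon.
below_eps = all(err < eps32 for err in reported_errors)
print(eps32, below_eps)
```

Since the normalized outputs are of order 1, errors below this threshold are consistent with float32 rounding differences rather than a difference in the computation itself.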