In [1]:
#batch_normalization
"""
Normalization puts the data on a common scale to avoid instability while training the neural network, including the exploding gradient problem. The data is standardized using statistics such as the mean and the variance.
Batch normalization is applied per layer: the data flowing from one layer into the next is normalized one batch at a time.

"""
#method
import tensorflow as tf  # required so that the name `tf` is defined

# Constructor shown with its default values (momentum defaults to 0.99)
tf.keras.layers.BatchNormalization(axis=-1, momentum=0.99,
                                   epsilon=0.001, center=True, scale=True, beta_initializer="zeros",
                                   gamma_initializer="ones", moving_mean_initializer="zeros",
                                   moving_variance_initializer="ones", beta_regularizer=None,
                                   gamma_regularizer=None, beta_constraint=None, gamma_constraint=None,
                                   renorm=False, renorm_clipping=None, renorm_momentum=0.99, fused=None,
                                   trainable=True, virtual_batch_size=None, adjustment=None, name=None)


#Note
#During training stage
"""
This applies when using model.fit(), or when calling the layer or model with training=True.
The layer normalizes its output (per channel) using the mean and standard deviation of the current batch before passing it as input to the subsequent layer.

Normalization formula:

(batch - mean(batch)) / sqrt(var(batch) + epsilon) * gamma + beta

-> epsilon is a small constant that keeps the denominator away from zero (set by the epsilon argument)
-> gamma is a learned scaling factor (can be turned off by using scale=False)
-> beta is a learned offset factor (can be disabled by using center=False)
"""
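The training-time formula can be sketched in plain NumPy. This is only an illustration for a 2D batch of shape (batch_size, features), not the Keras implementation; the function name `batch_norm_train` and the sample values are made up for this example.

```python
import numpy as np

def batch_norm_train(batch, gamma, beta, epsilon=1e-3):
    """Normalize a (batch_size, features) array with its own batch statistics."""
    mean = batch.mean(axis=0)   # per-feature mean over the batch
    var = batch.var(axis=0)     # per-feature (biased) variance over the batch
    normalized = (batch - mean) / np.sqrt(var + epsilon)
    return normalized * gamma + beta

batch = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
out = batch_norm_train(batch, gamma=np.ones(2), beta=np.zeros(2))
print(out.mean(axis=0))  # close to 0 for every feature
print(out.std(axis=0))   # close to 1 for every feature
```

With gamma = 1 and beta = 0 the two features end up on the same scale even though their raw values differ by a factor of ten.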

#during inference stage
"""
This applies when using model.predict() or model.evaluate(), or when calling the layer or model with training=False (the default).

The layer normalizes its output using moving averages of the mean and standard deviation of the batches it saw during training.

Normalization formula:

(batch - self.moving_mean) / sqrt(self.moving_var + epsilon) * gamma + beta

-> self.moving_mean and self.moving_var are non-trainable variables, i.e., the optimizer does not update them. Instead, they are updated each time the layer is called in training mode:

-> moving_mean = moving_mean * momentum + mean(batch) * (1-momentum)

-> moving_var = moving_var * momentum + var(batch) * (1-momentum)

"""
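A minimal NumPy sketch of the moving-average updates and the inference-time formula, assuming a 2D batch of shape (batch_size, features). The helper names and the synthetic data are hypothetical; the real layer keeps this state internally.

```python
import numpy as np

def update_moving_stats(moving_mean, moving_var, batch, momentum=0.99):
    """One training-time update of the moving statistics."""
    new_mean = moving_mean * momentum + batch.mean(axis=0) * (1 - momentum)
    new_var = moving_var * momentum + batch.var(axis=0) * (1 - momentum)
    return new_mean, new_var

def batch_norm_inference(batch, moving_mean, moving_var, gamma, beta, epsilon=1e-3):
    """Normalize with the stored moving statistics instead of batch statistics."""
    return (batch - moving_mean) / np.sqrt(moving_var + epsilon) * gamma + beta

# Start from the initial values used by the layer: moving_mean = 0, moving_var = 1.
moving_mean, moving_var = np.zeros(2), np.ones(2)
rng = np.random.default_rng(0)
for _ in range(2000):
    train_batch = rng.normal(loc=[5.0, -3.0], scale=[2.0, 0.5], size=(32, 2))
    moving_mean, moving_var = update_moving_stats(moving_mean, moving_var, train_batch)

print(moving_mean)  # settles near the data mean, roughly [5, -3]
print(moving_var)   # settles near the per-batch variance, roughly [4, 0.25]

# At inference time a single example can be normalized without any batch statistics.
test_batch = np.array([[5.0, -3.0]])
result = batch_norm_inference(test_batch, moving_mean, moving_var, np.ones(2), np.zeros(2))
print(result)  # both entries near 0
```

A momentum close to 1 makes the moving statistics change slowly, which is why many batches are needed before they leave their initial values behind.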

#arguments
"""
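1. axis - [integer] the axis that should be normalized (typically the features axis)

2. momentum - [float] momentum for the moving averages of the mean and variance

3. epsilon - [float] small value added to the variance to avoid dividing by zero

4. center - [boolean] if True, beta is added as an offset to the normalized tensor; if False, beta is ignored

5. scale - [boolean] if True, the normalized tensor is multiplied by gamma; if False, gamma is not used

6. renorm - [boolean] whether to use batch renormalization. This adds some extra variables during training.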

7. renorm_clipping - A dictionary that may map keys 'rmax', 'rmin', 'dmax' to scalar Tensors used to clip the renorm correction. The correction (r, d) is used as corrected_value = normalized_value * r + d, with r clipped to [rmin, rmax], and d to [-dmax, dmax]. Missing rmax, rmin, dmax are set to inf, 0, inf, respectively.

8. renorm_momentum - Momentum used to update the moving means and standard deviations with renorm. Unlike momentum, this affects training and should be neither too small (which would add noise) nor too large (which would give stale estimates). Note that momentum is still applied to get the means and variances for inference.

9. fused - [boolean or None] if True, use a faster, fused implementation, or raise a ValueError if the fused implementation cannot be used; if None, use the fused implementation when possible; if False, do not use it.

10. trainable - [boolean] if True, the layer's variables are marked as trainable

11. virtual_batch_size - [int or None (default)] if None, normalization is performed across the whole batch; if set, ghost batch normalization is used, which creates virtual sub-batches that are each normalized separately.

12. adjustment - A function taking the Tensor containing the (dynamic) shape of the input tensor and returning a pair (scale, bias) to apply to the normalized values (before gamma and beta), only during training. For example, if axis==-1, adjustment = lambda shape: ( tf.random.uniform(shape[-1:], 0.93, 1.07), tf.random.uniform(shape[-1:], -0.1, 0.1)) will scale the normalized value by up to 7% up or down, then shift the result by up to 0.1 (with independent scaling and bias for each feature but shared across all examples), and finally apply gamma and/or beta. If None, no adjustment is applied. Cannot be specified if virtual_batch_size is specified.

"""
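The renorm_clipping description can be made concrete with a small sketch. The clipping and the formula corrected_value = normalized_value * r + d come from the text above; the way r and d are computed here (ratio of batch to moving standard deviation, and scaled difference of the means) follows the usual batch renormalization formulation and is an assumption about what the layer does internally. The function name and the numbers are hypothetical.

```python
import numpy as np

def renorm_correction(batch_mean, batch_std, moving_mean, moving_std,
                      rmin=0.0, rmax=np.inf, dmax=np.inf):
    """Compute the clipped renorm correction (r, d)."""
    r = np.clip(batch_std / moving_std, rmin, rmax)
    d = np.clip((batch_mean - moving_mean) / moving_std, -dmax, dmax)
    return r, d

r, d = renorm_correction(batch_mean=np.array([2.0]), batch_std=np.array([4.0]),
                         moving_mean=np.array([0.0]), moving_std=np.array([1.0]),
                         rmin=1 / 3, rmax=3.0, dmax=1.0)
print(r, d)  # [3.] [1.]

normalized_value = np.array([0.5])
corrected_value = normalized_value * r + d
print(corrected_value)  # [2.5]
```

Here the unclipped values would be r = 4 and d = 2, so both are cut back to the limits given in the clipping dictionary. The defaults (0, inf, inf) leave the correction unclipped.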

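To illustrate the axis argument: one mean and one variance are kept per entry of the chosen axis, and the statistics are reduced over every other axis. The NumPy sketch below assumes image-like data of shape (batch, height, width, channels); the helper `batch_stats` is a hypothetical name for this example.

```python
import numpy as np

def batch_stats(x, axis=-1):
    """Mean and variance per entry of `axis`, reduced over all other axes."""
    axis = axis % x.ndim  # turn a negative axis into its positive index
    reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
    return x.mean(axis=reduce_axes), x.var(axis=reduce_axes)

images = np.random.default_rng(0).normal(size=(8, 4, 4, 3))  # (batch, height, width, channels)
mean, var = batch_stats(images, axis=-1)
print(mean.shape, var.shape)  # (3,) (3,)
```

With axis=-1 there is one statistic per channel, which is why gamma, beta, moving_mean and moving_var all have the length of that axis.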
#call arguments 

"""
1. inputs - input tensor (of any rank)

2. training - [boolean] indicating whether the layer should behave in training mode or in inference mode.

"""

#notes

"""
1. Input shape: arbitrary. Use the keyword argument input_shape (tuple of integers, does not include the samples axis) when using this layer as the first layer in a model.

2. Output shape: same shape as the input.

3. About setting layer.trainable = False on a BatchNormalization layer:

-> The meaning of setting layer.trainable = False is to freeze the layer, i.e. its internal state will not change during training: its trainable weights will not be updated during fit() or train_on_batch(), and its state updates will not be run.

Usually, this does not necessarily mean that the layer is run in inference mode (which is normally controlled by the training argument that can be passed when calling a layer). "Frozen state" and "inference mode" are two separate concepts.

However, in the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will be subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch).

This behavior has been introduced in TensorFlow 2.0, in order to enable layer.trainable = False to produce the most commonly expected behavior in the convnet fine-tuning use case.

Note that:

- This behavior only occurs as of TensorFlow 2.0. In 1.*, setting layer.trainable = False would freeze the layer but would not switch it to inference mode.

- Setting trainable on a model containing other layers will recursively set the trainable value of all inner layers.

- If the value of the trainable attribute is changed after calling compile() on a model, the new value doesn't take effect for this model until compile() is called again.

"""
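The interaction between trainable and the training argument can be mimicked with a toy class. This is a hypothetical NumPy stand-in written only to illustrate the rule described above; it is not how Keras implements the layer, and gamma and beta are left out for brevity.

```python
import numpy as np

class ToyBatchNorm:
    """Toy stand-in for the frozen-layer rule; not the Keras implementation."""

    def __init__(self, num_features, momentum=0.99, epsilon=1e-3):
        self.momentum, self.epsilon = momentum, epsilon
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)
        self.trainable = True

    def __call__(self, batch, training=False):
        # A frozen layer always runs in inference mode, whatever `training` says.
        if training and self.trainable:
            mean, var = batch.mean(axis=0), batch.var(axis=0)
            self.moving_mean = self.moving_mean * self.momentum + mean * (1 - self.momentum)
            self.moving_var = self.moving_var * self.momentum + var * (1 - self.momentum)
        else:
            mean, var = self.moving_mean, self.moving_var
        return (batch - mean) / np.sqrt(var + self.epsilon)

layer = ToyBatchNorm(num_features=1)
batch = np.array([[10.0], [20.0]])

layer.trainable = False
before = layer.moving_mean.copy()
frozen_out = layer(batch, training=True)  # moving statistics are used, state is not updated
print(np.array_equal(before, layer.moving_mean))  # True

layer.trainable = True
live_out = layer(batch, training=True)    # batch statistics are used, state is updated
print(layer.moving_mean)  # [0.15]
```

The frozen call ignores training=True entirely, which matches the fine-tuning use case: the statistics learned on the original data are kept as they are.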
