# Fighting the unstable gradients problem
Many of the tricks we used for deep feedforward networks also apply to RNNs: **good parameter initialization**, **faster optimizers**, **dropout**, and so on.
However, nonsaturating activation functions such as ReLU are not used very often in RNNs, because they can make training more unstable. The same weights are applied at every time step, so a small increase in the outputs at one step can compound across steps until the outputs explode, and ReLU does nothing to cap that growth (the **exploding gradients problem**).
So instead of ReLU, we usually use the **hyperbolic tangent activation function (tanh)**. Like the sigmoid function it saturates, but it outputs values between -1 and 1 rather than between 0 and 1, so its outputs are bounded and centered around zero, which helps keep them from exploding.
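
To make the difference concrete, here is a minimal NumPy sketch that applies the same weight matrix over 20 time steps, once with ReLU and once with tanh. The matrix `W`, the initial state `h0`, and the helper `run_recurrence` are hypothetical values and names chosen for illustration, and the sketch only looks at the forward outputs, not at the gradients themselves.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def run_recurrence(activation, W, h0, n_steps):
    """Apply h <- activation(W @ h) n_steps times and return the final state."""
    h = h0
    for _ in range(n_steps):
        h = activation(W @ h)
    return h


# Hypothetical 4-unit recurrent weights: every entry is 0.5, so a positive
# state with equal entries is doubled at each step when nothing caps it.
W = np.full((4, 4), 0.5)
h0 = np.full(4, 0.1)

h_relu = run_recurrence(relu, W, h0, n_steps=20)
h_tanh = run_recurrence(np.tanh, W, h0, n_steps=20)

print("ReLU state norm:", np.linalg.norm(h_relu))
print("tanh state norm:", np.linalg.norm(h_tanh))
```

With ReLU the state keeps doubling, while with tanh each of the 4 entries stays within [-1, 1], so the norm can never exceed 2.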

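Moreover, Batch Normalization does not work as well with RNNs as it does with deep feedforward networks. Inside a memory cell, the same BN layer with the same parameters would be applied at every time step, regardless of the actual scale and offset of the inputs and hidden state at that step. It can be used between recurrent layers, where it normalizes the layer inputs rather than the hidden states, but this is not very common and it also slows down training.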

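Another form of normalization, **Layer Normalization**, often works better with RNNs. It is similar to Batch Normalization, but instead of normalizing across the batch dimension, it normalizes across the features dimension. This means the statistics can be computed on the fly at each time step, independently for each instance, and the layer behaves the same way during training and testing. In an RNN it is typically applied right after the linear combination of the inputs and the hidden state.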
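
The difference between the two comes down to which axis the mean and variance are computed over. The NumPy sketch below illustrates this on a tiny batch; the `normalize` helper and the `eps` value are assumptions made for illustration, and the learned scale and offset parameters that real normalization layers also have are left out.

```python
import numpy as np


def normalize(x, axis, eps=1e-3):
    """Standardize x along the given axis (no learned scale or offset)."""
    mean = x.mean(axis=axis, keepdims=True)
    var = x.var(axis=axis, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


# Toy batch: 2 instances (rows) with 3 features (columns).
x = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 60.0]])

# Layer normalization: statistics per instance, across its features.
layer_normed = normalize(x, axis=-1)

# Batch normalization (training-time view): statistics per feature,
# across the instances in the batch.
batch_normed = normalize(x, axis=0)

# Layer normalization of an instance does not depend on the rest of the batch.
first_alone = normalize(x[:1], axis=-1)

print(layer_normed)
print(batch_normed)
```

Because each row is normalized using only its own features, the result for the first instance is the same whether it is processed alone or as part of the batch.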

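```python
import tensorflow as tf


class LNSimpleRNNCell(tf.keras.layers.Layer):
    """Simple RNN cell that applies layer normalization before the activation."""

    def __init__(self, units, activation="tanh", **kwargs):
        super().__init__(**kwargs)
        # Sizes of the hidden state and of the output, read by the wrapping RNN layer.
        self.state_size = units
        self.output_size = units
        # No activation here: this cell only computes the linear combination
        # of the inputs and the previous hidden state.
        self.simple_rnn_cell = tf.keras.layers.SimpleRNNCell(units, activation=None)
        self.layer_norm = tf.keras.layers.LayerNormalization()
        self.activation = tf.keras.activations.get(activation)

    def call(self, inputs, states):
        # In a SimpleRNNCell the output equals the new hidden state,
        # so new_states can be ignored.
        outputs, new_states = self.simple_rnn_cell(inputs, states)
        # Normalize across the features, then apply the activation.
        norm_outputs = self.activation(self.layer_norm(outputs))
        # Return the output and the new hidden state (identical for this cell).
        return norm_outputs, [norm_outputs]
```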
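
To see what this cell computes at each time step without any Keras machinery, here is a rough NumPy sketch of the same idea: linear combination, then layer normalization, then tanh. The names `layer_norm`, `ln_rnn_step`, `W_x`, `W_h`, and `b` are hypothetical, the weights are random rather than trained, and the learned scale and offset of `LayerNormalization` are omitted, so this is an illustration rather than a reimplementation of the Keras layers.

```python
import numpy as np


def layer_norm(z, eps=1e-3):
    """Standardize each instance across its features (no learned scale/offset)."""
    mean = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mean) / np.sqrt(var + eps)


def ln_rnn_step(x_t, h_prev, W_x, W_h, b):
    """One time step: linear combination -> layer norm -> tanh."""
    z = x_t @ W_x + h_prev @ W_h + b
    h_t = np.tanh(layer_norm(z))
    # As in the cell above, the output and the new hidden state are the same.
    return h_t, h_t


rng = np.random.default_rng(0)
batch_size, n_inputs, units = 2, 3, 4
W_x = rng.normal(size=(n_inputs, units))
W_h = rng.normal(size=(units, units))
b = np.zeros(units)

# Run the step over a random sequence of 5 time steps.
h = np.zeros((batch_size, units))
sequence = rng.normal(size=(5, batch_size, n_inputs))
for x_t in sequence:
    output, h = ln_rnn_step(x_t, h, W_x, W_h, b)

print(output.shape)  # (2, 4)
```

In Keras, a custom cell like `LNSimpleRNNCell` is typically wrapped in a `tf.keras.layers.RNN` layer, which takes care of looping over the time steps in the same way as the `for` loop here.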