# **RNN**
A Recurrent Neural Network (RNN) is a type of neural network designed for sequential data (e.g., time series, text, audio). Unlike traditional neural networks (which treat each input independently), RNNs are designed to handle dependencies between elements in a sequence by maintaining a hidden state.

### **Key Idea**
* RNNs have a loop that allows information to persist over time.
* The hidden state at time step t depends on the input at time t and on the hidden state from the previous time step t-1, so the output at t also reflects earlier inputs.

### **Mathematical Formulation**
At each time step t, an RNN updates its hidden state h_t using:
$$ h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t + b_h) $$
Where: <br>
* $h_t$ = hidden state at time step t
* $x_t$ = input at time step t
* $W_{hh}, W_{xh}$ = weight matrices
* $b_h$ = bias term
* $\tanh$ = activation function (tanh is the usual choice; ReLU is sometimes used instead)

The output is computed as:
$$ y_t = W_{hy}h_t + b_y $$

Where: <br>
* $y_t$ = output at time step t
* $W_{hy}$ = weight matrix for output
* $b_y$ = output bias
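
The two formulas above can be sketched as a single time step in NumPy. This is a minimal illustration, not a reference implementation: the function name `rnn_step`, the layer sizes, and the random weight initialization are all assumptions made for the example. The recurrence reads the previous hidden state (`h_prev`) and returns the new one.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One RNN time step: returns (h_t, y_t)."""
    # h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)
    # y_t = W_hy h_t + b_y
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

# Illustrative sizes (assumptions): 3 input features, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 3, 4, 2
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

h_prev = np.zeros(hidden_size)       # previous hidden state (zero at the start)
x_t = rng.normal(size=input_size)    # one input vector
h_t, y_t = rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y)
print(h_t.shape, y_t.shape)
```

Because of the tanh, every entry of `h_t` stays strictly between -1 and 1, whatever the input scale.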

### **How Information Is Passed Through Time**
1. An initial hidden state $h_0$ is created (often initialized to zero).
2. The input at each time step is combined with the hidden state from the previous step.
3. The new hidden state is passed to the next time step.
4. The output is generated based on the current hidden state.
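
The four steps above can be sketched as a loop over the sequence. This is a rough NumPy illustration under assumed names and sizes (`rnn_forward`, 3 input features, 4 hidden units, 2 outputs, random weights); the comments map each line back to the numbered steps.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Run an RNN over a sequence xs of shape (T, input_size)."""
    h = np.zeros(W_hh.shape[0])                    # step 1: h_0 initialized to zero
    hs, ys = [], []
    for x_t in xs:
        # step 2: combine the current input with the previous hidden state
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)
        # step 4: the output is computed from the current hidden state
        ys.append(W_hy @ h + b_y)
        # step 3: h is kept and reused in the next iteration
        hs.append(h)
    return np.stack(hs), np.stack(ys)

# Illustrative parameters (assumptions, not trained weights).
rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 3, 4, 2
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

# The same weights handle sequences of different lengths.
short_seq = rng.normal(size=(5, input_size))
long_seq = rng.normal(size=(12, input_size))
hs_short, ys_short = rnn_forward(short_seq, W_xh, W_hh, W_hy, b_h, b_y)
hs_long, ys_long = rnn_forward(long_seq, W_xh, W_hh, W_hy, b_h, b_y)
print(hs_short.shape, ys_short.shape)
print(hs_long.shape, ys_long.shape)
```

The weights are shared across all time steps, which is why one set of parameters works for both the 5-step and the 12-step sequence.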

### **Strengths of RNNs**
* Good for processing sequences of variable length
* Captures short-term dependencies
* Works well for simple sequential tasks

### **Problems with RNNs**
1. Vanishing Gradient Problem:
    * Gradients shrink toward zero as they are propagated back through many time steps, so the weights barely update for inputs that occurred early in the sequence.
2. Exploding Gradient Problem:
    * Gradients grow too large and cause instability in training.
3. Short-Term Memory Issue:
    * RNNs struggle to remember long-term dependencies.
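
The first two problems can be illustrated numerically. During backpropagation through time, the gradient is multiplied by the recurrent weight matrix once per time step. The sketch below is a deliberately simplified model of that: it drops the tanh derivative (an assumption; since that derivative is at most 1, it would only shrink the gradient further) and uses scaled identity matrices as hypothetical recurrent weights.

```python
import numpy as np

def backprop_norms(W_hh, steps):
    """Norm of a gradient vector after each of `steps` backward time steps.

    Simplification (assumption): the tanh derivative is ignored, so each
    backward step is just a multiplication by W_hh transposed.
    """
    grad = np.ones(W_hh.shape[0])
    norms = []
    for _ in range(steps):
        grad = W_hh.T @ grad
        norms.append(np.linalg.norm(grad))
    return norms

hidden_size = 4
small = 0.5 * np.eye(hidden_size)   # all eigenvalues 0.5: gradients shrink
large = 1.5 * np.eye(hidden_size)   # all eigenvalues 1.5: gradients grow

vanishing = backprop_norms(small, 30)
exploding = backprop_norms(large, 30)
print(f"after 30 steps: {vanishing[-1]:.2e} (small weights) vs {exploding[-1]:.2e} (large weights)")
```

In both cases the change is exponential in the number of time steps, which is why the problem gets worse as sequences get longer.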


# **LSTM**
LSTM (Long Short-Term Memory) networks were introduced to mitigate the vanishing gradient problem and improve the ability to model long-term dependencies.

### **Key Idea**
* LSTM introduces a more complex memory unit called a cell state.
* LSTM regulates the flow of information using three key gates:
    * Forget Gate - Decides what information to forget
    * Input Gate - Decides what new information to store
    * Output Gate - Decides what to output based on the cell state


### **Mathematical Formulation**
At each time step t, LSTM computes:
1. Forget Gate: <br> Decides what information from the previous cell state to forget: $$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
2. Input Gate: <br> Decides what new information to store in the cell state: $$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$ $$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$
3. Cell State Update: <br> Updates the cell state: $$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
4. Output Gate: <br> Decides what to output: $$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$ $$h_t = o_t \odot \tanh(C_t)$$

Where: <br>
* $f_t, i_t, o_t$ = forget, input, and output gate activations
* $\sigma$ = sigmoid activation function (range: 0 to 1)
* $\tilde{C}_t$ = candidate cell state update
* $C_t$ = cell state at time step t
* $[h_{t-1}, x_t]$ = concatenation of the previous hidden state and the current input
* $\odot$ = element-wise multiplication
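
The gate computations above can be sketched as one LSTM time step in NumPy. This is a minimal illustration under assumed names and sizes (`lstm_step`, 3 input features, 4 hidden units, random weights), not how any particular library implements it. The candidate cell state is named `C_tilde` in the code to keep it distinct from the updated cell state `C_t`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM time step: returns (h_t, C_t)."""
    z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)           # forget gate
    i_t = sigmoid(W_i @ z + b_i)           # input gate
    C_tilde = np.tanh(W_c @ z + b_c)       # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde     # cell state update (element-wise)
    o_t = sigmoid(W_o @ z + b_o)           # output gate
    h_t = o_t * np.tanh(C_t)               # new hidden state
    return h_t, C_t

# Illustrative sizes and random weights (assumptions).
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
concat_size = hidden_size + input_size
W_f, W_i, W_c, W_o = (
    rng.normal(scale=0.1, size=(hidden_size, concat_size)) for _ in range(4)
)
b_f = b_i = b_c = b_o = np.zeros(hidden_size)

h_prev = np.zeros(hidden_size)
C_prev = rng.normal(size=hidden_size)
x_t = rng.normal(size=input_size)
h_t, C_t = lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o)
print(h_t.shape, C_t.shape)
```

When the forget gate is close to 1 and the input gate is close to 0, `C_t` is almost a copy of `C_prev`; this is how the cell state can carry information across many time steps.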

### **Why LSTM Works Better Than RNN**
* Better at capturing long-term dependencies
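* Mitigates the vanishing gradient problem: the cell state is updated additively, so gradients can flow across many time steps while the forget gate stays close to 1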
* Selectively remembers useful information


### **When to Use What**
* Use RNN - If the sequence is short or dependencies are simple.
* Use LSTM - If the sequence is long and requires learning long-term dependencies.