<h1 align=center> Recurrent Neural Networks (RNNs) In Depth </h1>

- Recurrent Neural Networks (RNNs) are a class of artificial neural networks designed to recognize patterns in sequential data such as time series, speech, text, and video.
- Their connections form directed cycles, which lets them maintain a hidden state that carries information about previous inputs in a sequence. This makes RNNs well suited to tasks such as time series prediction and natural language processing.

![rnn1.png](../Images/dl/rnn1.png)

### **Architecture**

- **Input Layer**: Takes the input data at each time step
- **Hidden Layer**: Processes the input and maintains a hidden state that gets updated recursively
- **Output Layer**: Produces the output for each time step based on the hidden state

![RNN.png](../Images/dl/RNN.png)

$$
\begin{aligned}
h_{\langle t \rangle} &= g(W_{h} h_{\langle t-1 \rangle} + W_{x} X_{\langle t \rangle} + b_h) \\
\hat{y}_{\langle t \rangle} &= g(W_{y} h_{\langle t \rangle} + b_y)
\end{aligned}
$$

where:

- $h_{\langle t \rangle}$: the hidden state at time *t*
- $X_{\langle t \rangle}$: the input at time *t*
- $\hat{y}_{\langle t \rangle}$: the output at time *t*
- $W_h$, $W_x$, $W_y$: weight matrices
- $b_h$, $b_y$: bias terms
- $g$: the activation function (commonly tanh or ReLU in the hidden layer, and softmax or sigmoid for the output)
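The two equations above can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: it assumes tanh for the hidden layer and softmax for the output, and the function name `rnn_forward`, the layer sizes, and the random weights are all made up for the example.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def rnn_forward(X, h0, W_h, W_x, W_y, b_h, b_y):
    """Run a vanilla RNN over a sequence X of shape (T, input_dim)."""
    h = h0
    hidden_states, outputs = [], []
    for x_t in X:
        # Hidden state update: h_t = tanh(W_h h_{t-1} + W_x x_t + b_h)
        h = np.tanh(W_h @ h + W_x @ x_t + b_h)
        # Output: y_hat_t = softmax(W_y h_t + b_y)
        y_hat = softmax(W_y @ h + b_y)
        hidden_states.append(h)
        outputs.append(y_hat)
    return np.stack(hidden_states), np.stack(outputs)

# Illustrative sizes and random weights (assumptions, not from the text).
rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 5, 3, 4, 2
X = rng.normal(size=(T, input_dim))
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_y = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

H, Y_hat = rnn_forward(X, np.zeros(hidden_dim), W_h, W_x, W_y, b_h, b_y)
print(H.shape, Y_hat.shape)  # (5, 4) (5, 2)
```

Note that the same weights are reused at every time step; only the hidden state `h` changes as the loop moves through the sequence.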

### **Types of RNN Architectures**

- Let $T_x$ be the length of the input sequence and $T_y$ the length of the output sequence. In some problems $T_x = T_y$; in others the two lengths differ, so different architectures are needed

**1. One-to-One RNN:**

![rnn2.png](attachment:rnn2.png)

- The diagram above shows the structure of a vanilla (non-recurrent) neural network. It is used for general machine learning problems that have a single input and a single output
- *Example: image classification*

**2. One-to-Many RNN:**

![rnn3.png](attachment:rnn3.png)

- A one-to-many RNN takes a single input and produces a sequence of outputs, as in the diagram above
- *Example: image captioning, where a single image is the input and the output is a sentence (a sequence of words)*

**3. Many-to-One RNN:**

![rnn4.png](attachment:rnn4.png)

- This RNN produces a single output from a sequence of inputs
- *Example: sentiment analysis, in which a text is classified as expressing positive or negative sentiment*

**4. Many-to-Many RNN:**

![rnn5.png](attachment:rnn5.png)

- This RNN receives a sequence of inputs and produces a sequence of outputs; the two sequences may have different lengths
- *Example: machine translation, in which the RNN reads an English sentence and produces its French translation*
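The recurrent architectures above differ mainly in where inputs are fed in and where outputs are read out. The sketch below illustrates this with one shared tanh cell in NumPy. The helper names (`step`, `readout`, `many_to_one`, and so on), the sizes, and the weights are hypothetical, biases are omitted for brevity, and the one-to-many variant simply feeds zeros after the first step, which is a simplification (real captioning models typically feed the previous output back in).

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 3, 4, 2  # illustrative sizes
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_y = rng.normal(scale=0.1, size=(output_dim, hidden_dim))

def step(h, x):
    # One recurrent update of the hidden state (biases omitted).
    return np.tanh(W_h @ h + W_x @ x)

def readout(h):
    # Linear output layer applied to a hidden state.
    return W_y @ h

def many_to_one(X):
    # Read the whole sequence, emit one output from the final hidden state.
    h = np.zeros(hidden_dim)
    for x in X:
        h = step(h, x)
    return readout(h)

def many_to_many(X):
    # Emit one output per input step (the T_x == T_y case).
    h = np.zeros(hidden_dim)
    ys = []
    for x in X:
        h = step(h, x)
        ys.append(readout(h))
    return np.stack(ys)

def one_to_many(x, T_y):
    # Feed the single input at the first step, then zeros (a simplification).
    h = np.zeros(hidden_dim)
    ys = []
    for t in range(T_y):
        h = step(h, x if t == 0 else np.zeros(input_dim))
        ys.append(readout(h))
    return np.stack(ys)

X = rng.normal(size=(6, input_dim))   # a sequence with T_x = 6
print(many_to_one(X).shape)           # (2,)
print(many_to_many(X).shape)          # (6, 2)
print(one_to_many(X[0], 4).shape)     # (4, 2)
```

When $T_x$ and $T_y$ differ, as in translation, a common design is an encoder that reads the input in many-to-one fashion followed by a decoder that generates the output sequence; that variant is not shown here.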

### **Key Concepts**

1. **Sequential Data**: RNNs are designed to handle sequences of data, where the order of the data points matters. Examples include time series forecasting, natural language processing, and video analysis.
2. **Recurrent Connections**: Unlike traditional neural networks, RNNs have connections that form directed cycles. This means the network can retain information from previous inputs in the sequence, enabling it to maintain a 'memory' of past inputs.
3. **Hidden State**: RNNs maintain a hidden state vector that gets updated at each time step as new data points in the sequence are processed. This hidden state acts as a memory of previous inputs.

### **Types of RNNs**

1. **Vanilla RNN**: The simplest form, with a single hidden state and simple recurrent connections.
2. **Long Short-Term Memory (LSTM)**: A type of RNN designed to address the vanishing gradient problem, which makes it hard for vanilla RNNs to learn long-term dependencies. LSTMs use a set of gates (input, forget, and output gates) to control the flow of information.
3. **Gated Recurrent Unit (GRU)**: A simpler variant of LSTM that combines the forget and input gates into a single update gate.
4. **Bidirectional RNNs**: These RNNs process data in both forward and backward directions, making them suitable for tasks where context from both past and future is needed.
- **Attention Mechanisms**: Used to improve the performance of RNNs by allowing the model to focus on specific parts of the input sequence. Attention is an add-on technique rather than a separate type of RNN.

    **Note**: **Transformer models**, while not RNNs, have largely replaced RNNs in many NLP tasks due to their superior performance and parallelizability.
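To make the LSTM gates mentioned above more concrete, here is a minimal NumPy sketch of a single LSTM step using the standard formulation. The function name `lstm_step`, the choice to pack all four weight blocks into one matrix `W`, and the sizes are assumptions for this example; libraries lay out their parameters differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step.

    W has shape (4 * hidden, hidden + input) and b has shape (4 * hidden,).
    The four row blocks hold the forget, input, output, and candidate weights.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:hidden])                # forget gate: what to keep from c_prev
    i = sigmoid(z[hidden:2 * hidden])       # input gate: how much new content to write
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate: how much of the cell to expose
    c_tilde = np.tanh(z[3 * hidden:])       # candidate cell content
    c = f * c_prev + i * c_tilde            # new cell state
    h = o * np.tanh(c)                      # new hidden state
    return h, c

# Illustrative sizes and random parameters (assumptions).
rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 3, 4, 5
W = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim + input_dim))
b = np.zeros(4 * hidden_dim)

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(T, input_dim)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The cell state `c` is updated additively (`f * c_prev + i * c_tilde`), which is what helps gradients survive over many time steps compared with the repeated squashing in a vanilla RNN.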
    

### Pros

- Effective for sequence modeling and time-dependent data
- Can capture temporal dynamics and long-term dependencies (especially LSTMs and GRUs)

### Cons

- Computationally intensive and challenging to train, largely because of the vanishing gradient problem
- Require a lot of data for effective training
- Can be slower than feedforward networks, because time steps must be processed one after another and cannot be fully parallelized
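The vanishing gradient problem listed above can be illustrated numerically. The sketch below is a toy setup, not a training run: it assumes a tanh RNN with no inputs and a recurrent matrix rescaled to spectral norm 0.9, then tracks the norm of the Jacobian of the hidden state at step *t* with respect to the initial hidden state. All names and sizes are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, T = 8, 50  # illustrative sizes

# Recurrent weights rescaled so their largest singular value is 0.9 (an assumption).
W_h = rng.normal(size=(hidden_dim, hidden_dim))
W_h *= 0.9 / np.linalg.norm(W_h, 2)

h = rng.normal(size=hidden_dim)   # initial hidden state h_0
J = np.eye(hidden_dim)            # running Jacobian d h_t / d h_0
norms = []
for t in range(T):
    h = np.tanh(W_h @ h)
    # Jacobian of one step: d h_t / d h_{t-1} = diag(1 - h_t^2) @ W_h
    J = (np.diag(1.0 - h ** 2) @ W_h) @ J
    norms.append(np.linalg.norm(J, 2))

print(f"step 1: {norms[0]:.3e}, step {T}: {norms[-1]:.3e}")
```

Because the derivative of tanh is at most 1, each step can shrink the Jacobian by a factor of 0.9 or more in this setup, so its norm after *T* steps is bounded by $0.9^T$. Gradients flowing back through many time steps therefore become very small, which is why early inputs have little influence on learning in vanilla RNNs.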