# 🔄 **Recurrent Neural Network (RNN) Architecture – A Deep Dive!** 🔄  

RNNs are a special type of neural network designed to process **sequential data**, such as time series, speech, and text. Unlike feedforward ANNs, RNNs maintain a **memory** (the hidden state) that lets them take past inputs into account while processing the current one.



## 🏗️ **Basic RNN Architecture**  

RNNs are different from standard ANNs because they have a **feedback loop** that allows information to persist over time.

### 🔹 **Structure of a Simple RNN**  
The architecture consists of:  
1. **Input Layer**: Takes the input sequence.  
2. **Hidden Layer (Recurrent Neurons)**: Maintains a memory of previous states and updates at each time step.  
3. **Output Layer**: Produces the final prediction.

💡 **Key difference from a feedforward ANN**: The hidden layer is connected to itself! Its state from time step $ t-1 $ is fed back as an input at time step $ t $, so information flows forward from previous time steps.

### 📌 **Mathematical Representation**  
At each time step **t**, the RNN updates its hidden state using:

$$
h_t = f(W_x x_t + W_h h_{t-1} + b)
$$

Where:  
- $ h_t $ = hidden state at time step $ t $  
- $ x_t $ = input at time step $ t $  
- $ h_{t-1} $ = previous hidden state  
- $ W_x $, $ W_h $ = weight matrices  
- $ b $ = bias  
- $ f $ = activation function (commonly **tanh** or **ReLU**)  

The output is computed as:

$$
y_t = g(W_y h_t + b_y)
$$

Where:  
- $ y_t $ = output at time step $ t $  
- $ W_y $ = weight matrix for output  
- $ b_y $ = bias for output  
- $ g $ = activation function (softmax for classification, linear for regression)  
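With more than one input feature or hidden unit, $ x_t $, $ h_t $ and $ y_t $ become vectors, and the weight matrices need compatible shapes: $ W_x $ is (hidden × input), $ W_h $ is (hidden × hidden), and $ W_y $ is (output × hidden). The sketch below performs one time step under assumed sizes (3 input features, 4 hidden units, 2 output classes) with small random weights, $ f = \tanh $ and $ g = $ softmax. `rnn_step` and `softmax` are illustrative helper names, not a library API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): input D=3, hidden H=4, output O=2
D, H, O = 3, 4, 2
W_x = rng.normal(scale=0.1, size=(H, D))  # input-to-hidden weights, shape (H, D)
W_h = rng.normal(scale=0.1, size=(H, H))  # hidden-to-hidden weights, shape (H, H)
W_y = rng.normal(scale=0.1, size=(O, H))  # hidden-to-output weights, shape (O, H)
b = np.zeros(H)                           # hidden bias b, shape (H,)
b_y = np.zeros(O)                         # output bias b_y, shape (O,)

def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, h_prev):
    """One time step: h_t = tanh(W_x x_t + W_h h_{t-1} + b), y_t = softmax(W_y h_t + b_y)."""
    h_t = np.tanh(W_x @ x_t + W_h @ h_prev + b)
    y_t = softmax(W_y @ h_t + b_y)
    return h_t, y_t

# Start from h_0 = 0 and feed one random input vector
h_t, y_t = rnn_step(rng.normal(size=D), np.zeros(H))
print(h_t.shape, y_t.shape, y_t.sum())
```

The softmax output is a probability vector over the 2 classes, so its entries sum to 1; for regression, `softmax` would simply be dropped to give a linear output.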



## 🔄 **Unrolling the RNN (Time Step Representation)**  

A simple RNN processes a sequence of inputs **one time step at a time**.  
For example, if we have a sequence **X = [x₁, x₂, x₃]**, the RNN unfolds like this:

```
x₁ → [h₁] → y₁
      ↓
x₂ → [h₂] → y₂
      ↓
x₃ → [h₃] → y₃
```
  
Here:  
- The hidden state **h** carries information from previous time steps.
- Each output $ y_t $ is computed based on the current hidden state.
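Unrolling is only a change of viewpoint: running one step function in a loop gives exactly the same hidden state as writing out the nested composition $ h_3 = f(x_3, f(x_2, f(x_1, h_0))) $. The sketch below checks this with the scalar values used in the one-neuron manual example of this document ($ W_x = 0.8 $, $ W_h = 0.5 $, bias $ 0.1 $, $ h_0 = 0 $); the function name `step` is illustrative.

```python
import math

# Scalar parameters (same values as the one-neuron manual example)
W_x, W_h, b_h = 0.8, 0.5, 0.1

def step(x_t, h_prev):
    """One recurrent step: h_t = tanh(W_x * x_t + W_h * h_{t-1} + b_h)."""
    return math.tanh(W_x * x_t + W_h * h_prev + b_h)

xs = [0.5, 0.2, 0.1]
h0 = 0.0

# Rolled form: a loop that reuses the same step (and the same weights) at every time step
h = h0
for x_t in xs:
    h = step(x_t, h)

# Unrolled form: the same computation written out as a nested composition
h3_unrolled = step(xs[2], step(xs[1], step(xs[0], h0)))

print(h, h3_unrolled)
```

Both forms perform the same floating-point operations in the same order, so the two printed values are identical.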



## 🚧 **Challenges in Basic RNNs**  
RNNs are powerful, but they face some problems:

### ❌ **Vanishing Gradient Problem**  
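- When an RNN is trained with backpropagation through time (BPTT), the gradient that reaches early time steps is a product of one factor per step, involving the recurrent weights $ W_h $ and the derivative of the activation. When these factors are smaller than 1, the gradient shrinks toward **zero** exponentially as the number of time steps grows.  
- This makes it **hard to learn long-term dependencies** (i.e., remembering things from many time steps ago).

### ❌ **Exploding Gradient Problem**  
- When the same per-step factors are larger than 1, gradients can instead grow **exponentially**, producing huge weight updates that make training unstable.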

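**LSTMs (Long Short-Term Memory)** and **GRUs (Gated Recurrent Units)** use gating mechanisms to mitigate the vanishing gradient problem, while exploding gradients are usually controlled with **gradient clipping**.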
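To see where vanishing and exploding gradients come from, the sketch below tracks $ \partial h_T / \partial h_0 $ through a scalar tanh RNN. By the chain rule, each step contributes a factor $ w_h (1 - h_t^2) $, and the full gradient is their product. The main setting is deliberately simplified (an assumption for illustration): with zero input and zero bias the hidden state stays at 0, every $ \tanh' $ equals 1, and the gradient is exactly $ w_h^{T} $. `grad_h_T_wrt_h_0` is a hypothetical helper name.

```python
import numpy as np

def grad_h_T_wrt_h_0(w_x, w_h, b_h, xs, h0=0.0):
    """d h_T / d h_0 for a scalar tanh RNN (hypothetical helper for illustration).

    With h_t = tanh(w_x * x_t + w_h * h_{t-1} + b_h), the chain rule gives
    d h_t / d h_{t-1} = w_h * (1 - h_t**2); the full gradient multiplies these factors.
    """
    h, grad = h0, 1.0
    for x_t in xs:
        h = np.tanh(w_x * x_t + w_h * h + b_h)
        grad *= w_h * (1.0 - h ** 2)
    return grad

T = 50
zeros = np.zeros(T)  # zero input and zero bias keep h_t = 0, so every tanh'(.) equals 1

vanishing = grad_h_T_wrt_h_0(0.8, 0.5, 0.0, zeros)  # factor 0.5 per step -> 0.5**50
exploding = grad_h_T_wrt_h_0(0.8, 1.5, 0.0, zeros)  # factor 1.5 per step -> 1.5**50
print(f"w_h = 0.5: {vanishing:.3e}")
print(f"w_h = 1.5: {exploding:.3e}")

# With nonzero inputs, tanh saturation (1 - h_t**2 < 1) shrinks each factor further
realistic = grad_h_T_wrt_h_0(0.8, 0.5, 0.1, np.full(T, 0.5))
print(f"w_h = 0.5, x_t = 0.5: {realistic:.3e}")
```

After only 50 steps the first gradient is around $ 10^{-15} $ and the second around $ 10^{8} $; the last case shows that with realistic inputs the factor $ 1 - h_t^2 $ is below 1, so the gradient shrinks even faster than $ w_h^{T} $.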



## 🔥 **Variants of RNNs**
RNNs can be arranged in different input/output configurations:

1. **One-to-One**
   - A single input produces a single output, as in image classification. With no sequence involved, this is effectively a standard feedforward network.

2. **One-to-Many**
   - Example: Generating music 🎵 from a single note.

3. **Many-to-One**
   - Example: Sentiment analysis (classifying an entire sentence as "positive" or "negative").

4. **Many-to-Many**
   - Example: Machine translation (e.g., English → French).
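Many-to-one and many-to-many with one output per input step share the same recurrence; they differ only in which outputs are kept. The sketch below assumes arbitrary sizes and random weights, and `run_rnn` is a hypothetical helper. It covers only the case where the output length equals the input length; one-to-many generation and translation, where the lengths differ, need additional machinery not shown here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes (assumptions): input D=3, hidden H=5, output O=2
D, H, O = 3, 5, 2
W_x = rng.normal(scale=0.1, size=(H, D))
W_h = rng.normal(scale=0.1, size=(H, H))
W_y = rng.normal(scale=0.1, size=(O, H))
b_h, b_y = np.zeros(H), np.zeros(O)

def run_rnn(xs):
    """Run the recurrence over a (T, D) sequence and return the (T, O) outputs."""
    h = np.zeros(H)  # h_0
    outputs = []
    for x_t in xs:
        h = np.tanh(W_x @ x_t + W_h @ h + b_h)
        outputs.append(W_y @ h + b_y)
    return np.stack(outputs)

sequence = rng.normal(size=(7, D))  # a sequence of T=7 input vectors
all_outputs = run_rnn(sequence)

many_to_many = all_outputs      # one output per input step
many_to_one = all_outputs[-1]   # only the final output, e.g. one label for the whole sentence
print(many_to_many.shape, many_to_one.shape)
```

For many-to-one, the final output is the natural choice because $ h_T $ is the only hidden state that has seen the entire sequence.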



## 🏆 **Key Takeaways**  
✅ RNNs are great for **sequential data** processing.  
✅ They have **memory** (a hidden state), unlike feedforward ANNs.  
✅ They suffer from **vanishing/exploding gradients**; **LSTMs and GRUs** mitigate vanishing gradients, and gradient clipping keeps exploding gradients in check.  
✅ Used in **speech recognition, time-series forecasting, chatbots, and NLP tasks**.

---

# 🔄 **Forward Propagation in Recurrent Neural Networks (RNNs) – A Complete Breakdown!** 🔄  

Forward propagation in an RNN differs from forward propagation in a standard feedforward Artificial Neural Network (ANN): the RNN processes **sequential data** one step at a time while maintaining a **hidden state** that carries information from previous time steps.



## 🏗 **Basic Structure of RNN Forward Propagation**
Unlike traditional feedforward networks, which process each input independently, an RNN processes inputs **sequentially**, maintaining a memory of past computations.

For each time step $ t $, the RNN performs the following computations:

1️⃣ **Compute the new hidden state $ h_t $ using the current input $ x_t $ and the previous hidden state $ h_{t-1} $.**  
2️⃣ **Compute the output $ y_t $ using the hidden state $ h_t $.**  
3️⃣ **Pass the hidden state to the next time step.**  



## 🔢 **Mathematical Formulation**
At each time step $ t $, forward propagation in an RNN follows these steps:

### 1️⃣ **Hidden State Update**
The hidden state $ h_t $ is calculated using the previous hidden state $ h_{t-1} $ and the current input $ x_t $:

$$
h_t = f(W_x x_t + W_h h_{t-1} + b_h)
$$

Where:
- $ h_t $ = hidden state at time step $ t $  
- $ x_t $ = input at time step $ t $  
- $ h_{t-1} $ = hidden state from the previous time step  
- $ W_x $ = weight matrix for input  
- $ W_h $ = weight matrix for previous hidden state  
- $ b_h $ = bias term  
- $ f $ = activation function (commonly **tanh** or **ReLU**)  

### 2️⃣ **Output Calculation**
The output $ y_t $ at time step $ t $ is computed as:

$$
y_t = g(W_y h_t + b_y)
$$

Where:
- $ y_t $ = output at time step $ t $  
- $ W_y $ = weight matrix for output  
- $ b_y $ = bias for output  
- $ g $ = activation function (e.g., **softmax** for classification tasks)  



## 📜 **Step-by-Step Forward Propagation Example**
Let's assume we have an RNN processing three time steps with inputs $ x_1, x_2, x_3 $.

### 🔄 **Unrolling the RNN**
Instead of viewing an RNN as a single network, we **unroll it** across time steps:

```
x₁ → [h₁] → y₁
      ↓
x₂ → [h₂] → y₂
      ↓
x₃ → [h₃] → y₃
```

### 🔢 **Step 1: Compute the first hidden state $ h_1 $**
$$
h_1 = f(W_x x_1 + W_h h_0 + b_h)
$$
- $ h_0 $ is typically initialized as a vector of zeros.

### 🔢 **Step 2: Compute the second hidden state $ h_2 $**
$$
h_2 = f(W_x x_2 + W_h h_1 + b_h)
$$
- The hidden state $ h_1 $ from the previous time step is used.

### 🔢 **Step 3: Compute the third hidden state $ h_3 $**
$$
h_3 = f(W_x x_3 + W_h h_2 + b_h)
$$

### 🔢 **Step 4: Compute outputs $ y_1, y_2, y_3 $**
$$
y_t = g(W_y h_t + b_y)
$$
- The output is calculated at each time step based on the hidden state.
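Because $ y_t $ depends only on $ h_t $, Step 4 can also be done after the recurrence has finished: stack the hidden states and apply the output layer to all of them in a single matrix product. Only the hidden-state updates (Steps 1 to 3) are inherently sequential. The sketch below reuses the one-neuron parameters from the manual example in this document and assumes a linear output ($ g $ is the identity).

```python
import numpy as np

# One-neuron parameters (input, hidden, and output sizes all equal 1)
W_x, W_h, W_y = np.array([[0.8]]), np.array([[0.5]]), np.array([[1.0]])
b_h, b_y = np.array([0.1]), np.array([0.2])
xs = np.array([[0.5], [0.2], [0.1]])  # shape (T, input_size) = (3, 1)

# Steps 1-3: the recurrence must run one time step after another
h = np.zeros(1)  # h_0
hidden_states = []
for x_t in xs:
    h = np.tanh(W_x @ x_t + W_h @ h + b_h)
    hidden_states.append(h)
H_all = np.stack(hidden_states)  # shape (T, hidden_size) = (3, 1)

# Step 4: every y_t = W_y h_t + b_y at once (g = identity, i.e. a linear output)
Y = H_all @ W_y.T + b_y  # shape (T, output_size) = (3, 1)
print(Y.ravel())
```

On a GPU this split matters: the output layer over all time steps becomes one large matrix multiplication instead of $ T $ small ones.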



## 🔥 **Key Observations**
✔ **Recurrent Connections**: The hidden state at each time step depends on the previous state.  
✔ **Shared Weights**: The same weight matrices $ W_x, W_h, W_y $ are used at every time step, so the number of parameters does not grow with the sequence length.  
✔ **Memory Effect**: The network retains past information, making it suitable for **sequential tasks** like speech recognition, language modeling, and time-series forecasting.  
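Weight sharing means the parameter count depends only on the layer sizes, never on the sequence length. A minimal sketch, assuming one hidden layer with biases $ b_h $ and $ b_y $ as in the equations above; `rnn_param_count` and the example sizes are illustrative.

```python
def rnn_param_count(input_size, hidden_size, output_size):
    """Number of trainable parameters in a single-layer RNN with an output layer."""
    w_x = hidden_size * input_size    # W_x: input-to-hidden weights
    w_h = hidden_size * hidden_size   # W_h: hidden-to-hidden (recurrent) weights
    b_h = hidden_size                 # b_h: hidden bias
    w_y = output_size * hidden_size   # W_y: hidden-to-output weights
    b_y = output_size                 # b_y: output bias
    return w_x + w_h + b_h + w_y + b_y

# The sequence length never appears: 10 steps or 10,000 steps use the same parameters
print(rnn_param_count(input_size=50, hidden_size=128, output_size=10))  # 24202
print(rnn_param_count(1, 1, 1))  # one-neuron case: W_x, W_h, b_h, W_y, b_y -> 5
```

The recurrent matrix $ W_h $ dominates as the hidden size grows, since it scales with the square of the hidden size.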



## 💻 **Python Code Example**
Here’s how forward propagation in an RNN can be implemented using NumPy:

```python
import numpy as np

# Input sequence: one scalar feature at each of three time steps, shape (T, input_size, 1)
x = np.array([0.5, 0.2, 0.1]).reshape(3, 1, 1)

# Weight matrices and biases (1x1 because input, hidden, and output sizes are all 1)
W_x = np.array([[0.8]])  # input-to-hidden weight
W_h = np.array([[0.5]])  # hidden-to-hidden (recurrent) weight
W_y = np.array([[1.0]])  # hidden-to-output weight
b_h = np.array([[0.1]])  # bias for hidden state
b_y = np.array([[0.2]])  # bias for output

# Initial hidden state h_0: a float column vector of zeros
h = np.zeros((1, 1))

# Forward propagation: the same weights are reused at every time step
for t, x_t in enumerate(x, start=1):
    h = np.tanh(W_x @ x_t + W_h @ h + b_h)  # h_t = tanh(W_x x_t + W_h h_{t-1} + b_h)
    y = W_y @ h + b_y                        # y_t = W_y h_t + b_y (linear output, g = identity)
    print(f"Time step {t}: hidden state = {h.item():.4f}, output = {y.item():.4f}")
```



## 🚀 **Final Thoughts**
✅ **RNN forward propagation** processes inputs **one at a time** while maintaining memory.  
✅ **Key equations** involve computing the **hidden state** and **output** at each time step.  
✅ **Challenges**: Standard RNNs struggle with long sequences due to the **vanishing gradient problem**.  
✅ **Solution**: Use **LSTMs or GRUs** to improve long-term memory handling.  

---

# 🧮 **Manual Calculation of RNN Forward Propagation – Step-by-Step Example** 🔄  

Let's take a simple example of an **RNN with one neuron** to manually compute forward propagation for **three time steps**.



## **📝 Given Parameters**
We define a simple RNN where:

- **Input size = 1 (one feature per time step)**
- **Hidden state size = 1 (one neuron in hidden layer)**
- **Output size = 1 (one neuron in output layer)**
- **Sequence length = 3 (processing 3 time steps: $ x_1, x_2, x_3 $)**

#### 🎯 **Initial Values**
| Parameter | Value |
|-----------|-------|
| $ x_1, x_2, x_3 $ | $ 0.5, 0.2, 0.1 $ (input at each time step) |
| $ W_x $ | $ 0.8 $ (weight for input) |
| $ W_h $ | $ 0.5 $ (weight for hidden state) |
| $ W_y $ | $ 1.0 $ (weight for output) |
| $ b_h $ | $ 0.1 $ (bias for hidden state) |
| $ b_y $ | $ 0.2 $ (bias for output) |
| $ h_0 $ | $ 0 $ (initial hidden state) |

💡 **Activation function**: We use the **tanh** function:
$$
\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}
$$
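The definition can be checked numerically against the standard-library implementation, which also confirms that the values stay between $-1$ and $1$. A small sketch; `tanh_from_definition` is an illustrative name and the test points are arbitrary.

```python
import math

def tanh_from_definition(z):
    """Evaluate tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z)) directly from the formula."""
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

for z in (-3.0, -0.5, 0.0, 0.5, 3.0):
    value = tanh_from_definition(z)
    # The formula should agree with math.tanh and stay strictly inside (-1, 1)
    assert abs(value - math.tanh(z)) < 1e-12 and -1.0 < value < 1.0
    print(f"tanh({z:+.1f}) = {value:+.4f}")
```

In real code `math.tanh` or `np.tanh` is preferable: evaluating the formula directly overflows for large $|z|$ (for example, `math.exp(1000)` raises `OverflowError`).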



## **📝 Forward Propagation Steps**
At each time step, we compute:

1️⃣ **Hidden state update**  
$$
h_t = \tanh(W_x x_t + W_h h_{t-1} + b_h)
$$

2️⃣ **Output calculation** (linear output, so the output activation $ g $ is the identity)  
$$
y_t = W_y h_t + b_y
$$



## **📊 Step-by-Step Computation**
### **⏳ Time Step 1 ($ t = 1 $)**
#### 🔹 Compute hidden state $ h_1 $:

$$
h_1 = \tanh(0.8 \times 0.5 + 0.5 \times 0 + 0.1)
$$

$$
h_1 = \tanh(0.4 + 0 + 0.1) = \tanh(0.5)
$$

Using $ \tanh(0.5) \approx 0.4621 $:

$$
h_1 \approx 0.4621
$$

#### 🔹 Compute output $ y_1 $:

$$
y_1 = 1.0 \times 0.4621 + 0.2
$$

$$
y_1 \approx 0.6621
$$



### **⏳ Time Step 2 ($ t = 2 $)**
#### 🔹 Compute hidden state $ h_2 $:

$$
h_2 = \tanh(0.8 \times 0.2 + 0.5 \times 0.4621 + 0.1)
$$

$$
h_2 = \tanh(0.16 + 0.2311 + 0.1) = \tanh(0.4911)
$$

Using $ \tanh(0.4911) \approx 0.4551 $:

$$
h_2 \approx 0.4551
$$

#### 🔹 Compute output $ y_2 $:

$$
y_2 = 1.0 \times 0.4551 + 0.2
$$

$$
y_2 \approx 0.6551
$$



### **⏳ Time Step 3 ($ t = 3 $)**
#### 🔹 Compute hidden state $ h_3 $:

$$
h_3 = \tanh(0.8 \times 0.1 + 0.5 \times 0.4551 + 0.1)
$$

$$
h_3 = \tanh(0.08 + 0.2276 + 0.1) = \tanh(0.4076)
$$

Using $ \tanh(0.4076) \approx 0.3864 $:

$$
h_3 \approx 0.3864
$$

#### 🔹 Compute output $ y_3 $:

$$
y_3 = 1.0 \times 0.3864 + 0.2
$$

$$
y_3 \approx 0.5864
$$



## **📌 Final Results**
| Time Step | $ x_t $ | $ h_t $ (Hidden State) | $ y_t $ (Output) |
|-----------|----------|----------------|----------------|
| $ t = 1 $ | $ 0.5 $ | $ 0.4621 $ | $ 0.6621 $ |
| $ t = 2 $ | $ 0.2 $ | $ 0.4551 $ | $ 0.6551 $ |
| $ t = 3 $ | $ 0.1 $ | $ 0.3864 $ | $ 0.5864 $ |

🎯 **Observation**:  
- The hidden state **carries information** from previous time steps, updating with each new input.
- The outputs are computed at each time step, making the RNN suitable for **sequential data** processing.



## **🔍 Summary**
✔ We **manually computed** RNN forward propagation step by step.  
✔ The **hidden state** maintains memory across time steps.  
✔ The **output at each step** depends on both the current input and previous hidden state.  
✔ The **tanh activation** keeps hidden-state values strictly between $-1$ and $1$.  
✔ **Weights are shared** across all time steps, making the RNN efficient.  

---