### **Forward Propagation Through Time (FPTT) in RNNs**

| **Aspect**                      | **Details**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| ------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Definition**                  | Forward Propagation Through Time is the process by which an RNN computes the output for a sequence by **unrolling** the network across time steps and propagating information step-by-step.                                                                                                                                                                                                                                                                                                                                                                                                              |
| **Concept**                     | - At each time step $t$, the network takes the input $x_t$ and the previous hidden state $h_{t-1}$.<br>- It calculates the current hidden state $h_t$ and generates an output $y_t$.<br>- This process is repeated for all time steps in the sequence.                                                                                                                                                                                                                                                                                                                                                   |
| **Equations**                   | 1. **Hidden state update:**  $h_t = f(W_h h_{t-1} + W_x x_t + b_h)$ <br>2. **Output calculation:** $y_t = g(W_y h_t + b_y)$ <br>Where:<br>- $f$ is usually **tanh** or **ReLU**.<br>- $g$ is usually **softmax** or **sigmoid** depending on the task.                                                                                                                                                                                                                                                                                                                                                   |
| **How It Works**                | - The RNN is **unrolled** across $T$ time steps.<br>- The same parameters $(W_x, W_h, W_y, b_h, b_y)$ are used at each step.<br>- The hidden state $h_t$ carries information from all previous time steps. |
| **Illustration (Steps)**        | **Step 1:** Input $x_1$ + Initial $h_0$ (typically zeros) → Compute $h_1$ → Output $y_1$<br>**Step 2:** Input $x_2$ + Previous $h_1$ → Compute $h_2$ → Output $y_2$<br>**Step 3:** Input $x_3$ + Previous $h_2$ → Compute $h_3$ → Output $y_3$<br>… and so on up to $T$. |
| **Key Characteristics**         | - Information flows **forward through time**, maintaining temporal dependencies.<br>- Past information is encoded in the hidden state.<br>- Weights remain shared across all time steps.                                                                                                                                                                                                                                                                                                                                                                                                                 |
| **Example Use Case**            | Predicting the next word in a sentence by sequentially processing each word and using previous context.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| **Code Example (Forward Pass)** | A NumPy forward pass (input size 3, hidden size 4, output size 2, 5 time steps) is given in the code block that follows this table. |
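| **Limitations**                 | - Over long sequences, information from early time steps tends to fade from the fixed-size hidden state.<br>- During training, gradients can vanish or explode when they are backpropagated through many time steps. |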

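The vanishing-gradient limitation can be made concrete with a small numerical sketch. Everything specific here is an assumption for illustration and not part of the original notes: $W_h$ is chosen as an orthogonal matrix scaled to spectral norm 0.5, inputs and biases are dropped, and the sequence length is 30. The loop accumulates the Jacobian $\partial h_t / \partial h_0$ and records its norm, which shrinks at least as fast as $0.5^t$. Exploding gradients are the opposite case, where the per-step Jacobians have norm larger than 1.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, T = 4, 30

# Assumption for illustration: an orthogonal matrix scaled to spectral norm 0.5.
Q, _ = np.linalg.qr(rng.standard_normal((hidden_size, hidden_size)))
W_h = 0.5 * Q

h = rng.standard_normal((hidden_size, 1))  # arbitrary starting state h_0
J = np.eye(hidden_size)                    # accumulates d h_t / d h_0
norms = []

for t in range(T):
    # Forward step with inputs and bias omitted for brevity
    h = np.tanh(W_h @ h)
    # One-step Jacobian is diag(1 - h_t^2) @ W_h; chain it onto the product
    J = (np.diagflat(1.0 - h**2) @ W_h) @ J
    norms.append(np.linalg.norm(J, 2))     # spectral norm of d h_t / d h_0

print(f"t=1: {norms[0]:.3e}, t=10: {norms[9]:.3e}, t={T}: {norms[-1]:.3e}")
```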
---
The **Forward Propagation Through Time (FPTT)** forward pass, written as a runnable NumPy example:

```python
import numpy as np

# Initialize parameters
W_x = np.random.randn(4, 3)   # Input-to-hidden weights (hidden_size x input_size)
W_h = np.random.randn(4, 4)   # Hidden-to-hidden weights (hidden_size x hidden_size)
W_y = np.random.randn(2, 4)   # Hidden-to-output weights (output_size x hidden_size)
b_h = np.zeros((4, 1))        # Hidden bias
b_y = np.zeros((2, 1))        # Output bias

# Input sequence (3 features, 5 time steps)
X = [np.random.randn(3, 1) for _ in range(5)]

# Initial hidden state
h_prev = np.zeros((4, 1))

# Forward propagation through time
for t, x_t in enumerate(X, 1):
    # Compute current hidden state
    h_prev = np.tanh(np.dot(W_x, x_t) + np.dot(W_h, h_prev) + b_h)
    
    # Compute current output (raw scores; apply g, e.g. softmax or sigmoid, if the task needs it)
    y_t = np.dot(W_y, h_prev) + b_y
    
    print(f"Time Step {t}: Output = {y_t.ravel()}")
```

This code:

* Initializes weights and biases.
* Processes the input sequence across 5 time steps.
* Computes hidden states and outputs sequentially.
* Prints output at each time step, showing how information flows **forward through time**.

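The equations list $g$ as softmax or sigmoid, while the loop above prints raw scores. The following is a minimal sketch of the same forward pass wrapped in a function with a softmax output. The names `softmax` and `rnn_forward`, the fixed seed, and the choice of softmax for $g$ are assumptions made for illustration, not part of the original example.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a column vector."""
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def rnn_forward(X, W_x, W_h, W_y, b_h, b_y, h0):
    """Run the forward pass over a list of column-vector inputs.

    Returns the hidden states h_1..h_T and the outputs y_1..y_T.
    """
    h = h0
    hidden_states, outputs = [], []
    for x_t in X:
        h = np.tanh(W_x @ x_t + W_h @ h + b_h)   # h_t = f(W_h h_{t-1} + W_x x_t + b_h)
        y = softmax(W_y @ h + b_y)               # y_t = g(W_y h_t + b_y), with g = softmax
        hidden_states.append(h)
        outputs.append(y)
    return hidden_states, outputs

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
input_size, hidden_size, output_size, T = 3, 4, 2, 5

W_x = rng.standard_normal((hidden_size, input_size))
W_h = rng.standard_normal((hidden_size, hidden_size))
W_y = rng.standard_normal((output_size, hidden_size))
b_h = np.zeros((hidden_size, 1))
b_y = np.zeros((output_size, 1))
X = [rng.standard_normal((input_size, 1)) for _ in range(T)]

hidden_states, outputs = rnn_forward(X, W_x, W_h, W_y, b_h, b_y,
                                     np.zeros((hidden_size, 1)))
for t, y in enumerate(outputs, 1):
    print(f"Time Step {t}: probabilities = {y.ravel()}")
```

Returning every hidden state, not only the last one, keeps the values that backpropagation would later need.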

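The claim that past information is encoded in the hidden state can be checked directly. In the sketch below, two sequences end with the same input but begin differently, and the final outputs differ because $h_1$ differs. The helper name `last_output`, the seed, and the specific input vectors are hypothetical choices for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.standard_normal((4, 3))   # input-to-hidden
W_h = rng.standard_normal((4, 4))   # hidden-to-hidden
W_y = rng.standard_normal((2, 4))   # hidden-to-output
b_h = np.zeros((4, 1))
b_y = np.zeros((2, 1))

def last_output(X):
    """Return the raw output at the final time step of sequence X."""
    h = np.zeros((4, 1))
    for x_t in X:
        h = np.tanh(W_x @ x_t + W_h @ h + b_h)
    return W_y @ h + b_y

shared_last = np.zeros((3, 1))              # identical final input for both sequences
seq_a = [np.ones((3, 1)), shared_last]      # history A
seq_b = [-np.ones((3, 1)), shared_last]     # history B

y_a = last_output(seq_a)
y_b = last_output(seq_b)
print("Output after history A:", y_a.ravel())
print("Output after history B:", y_b.ravel())
```

Because the final input is all zeros, any difference between the two outputs comes only from the hidden state carried over from the first step.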