<a href="https://colab.research.google.com/github/Neil-Cardoz/Deep-Learning/blob/main/DLL_LAB_RNN_ARCHITECTURE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fundamentals of RNNs

A Recurrent Neural Network (RNN) is designed to handle sequential data by using a hidden state that carries information across time steps.

## Objectives

1. Study the fundamentals of Recurrent Neural Networks (RNNs).  
2. Understand how hidden states store information across time steps.  
3. Implement a simple RNN from scratch using NumPy.  
4. Learn forward propagation through sequences.  
5. Learn backpropagation through time (BPTT) for training.  
6. Understand how gradients update weights and biases in RNNs.

# Recurrent Neural Networks (RNNs)

**Recurrent Neural Networks (RNNs)** are a class of neural networks designed to work with **sequential data**, such as time-series, text, or speech. Unlike feedforward networks, RNNs **maintain a hidden state** that acts as memory, allowing the network to capture dependencies across time steps.

## Key Concepts

* **Hidden State (`h_t`)**
  The hidden state stores information from previous inputs and is updated at each time step using the formula:

  ```
  h_t = tanh(Wxh * x_t + Whh * h_(t-1) + bh)
  ```

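  * `x_t` = input vector at time step t
  * `h_(t-1)` = hidden state from the previous time step
  * `Wxh` = input-to-hidden weights, `Whh` = hidden-to-hidden (recurrent) weights; `*` denotes a matrix-vector product
  * `bh` = hidden bias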

* **Output (`y_t`)**
  Computed from the hidden state using the hidden-to-output weights `Why` and the output bias `by`:

  ```
  y_t = Why * h_t + by
  ```

* **Backpropagation Through Time (BPTT)**
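  The network is unrolled over the sequence and the loss gradient is propagated backward through every time step. Because the same weights are reused at each step, their gradients are summed across steps before the weights and biases are updated. This allows the network to learn temporal patterns.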

* **One-Hot Encoding**
  Inputs are often represented as one-hot vectors when dealing with discrete sequences (e.g., characters or words).
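
The concepts above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's implementation: the three-symbol vocabulary, the layer sizes, and the `one_hot` helper are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

vocab = ["a", "b", "c"]  # hypothetical 3-symbol vocabulary
input_size, hidden_size, output_size = len(vocab), 4, 2  # assumed sizes


def one_hot(index, size):
    """Return a column vector with a 1 at `index` and 0 elsewhere."""
    v = np.zeros((size, 1))
    v[index] = 1.0
    return v


# Weight shapes follow the formulas: Wxh maps input -> hidden, and so on.
Wxh = rng.standard_normal((hidden_size, input_size)) * 0.01
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.01
Why = rng.standard_normal((output_size, hidden_size)) * 0.01
bh = np.zeros((hidden_size, 1))
by = np.zeros((output_size, 1))

h = np.zeros((hidden_size, 1))  # h_0: no information seen yet
for symbol in "abc":
    x_t = one_hot(vocab.index(symbol), input_size)
    h = np.tanh(Wxh @ x_t + Whh @ h + bh)  # hidden-state update
    y_t = Why @ h + by                     # output at this step

print(h.shape, y_t.shape)
```

The same `h` is overwritten at every step, which is what lets information from earlier symbols influence later outputs.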

## Advantages

* Can model temporal dependencies
* Suitable for sequential tasks like text generation, speech recognition, and time-series forecasting

## Limitations

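* Difficulty learning long-term dependencies due to **vanishing gradients**: the gradient shrinks each time it is propagated back through a time step
* Can be slow to train on long sequences, because time steps must be processed one after another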
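
The vanishing-gradient effect can be observed directly by propagating a gradient backward through a tanh recurrence and recording its norm at each step. The sketch below is illustrative only: the sizes, the weight scale of 0.05, the random inputs, and the all-ones starting gradient are assumptions, not values from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, steps = 3, 8, 20  # assumed sizes

# Small random weights (assumed scale), so the recurrent matrix shrinks vectors.
Wxh = rng.standard_normal((hidden_size, input_size)) * 0.05
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.05

# Forward pass over random inputs, keeping every hidden state.
hs = [np.zeros((hidden_size, 1))]
for _ in range(steps):
    x = rng.standard_normal((input_size, 1))
    hs.append(np.tanh(Wxh @ x + Whh @ hs[-1]))

# Backward pass: start from a stand-in gradient at the last hidden state.
d_h = np.ones((hidden_size, 1))
norms = []
for t in reversed(range(steps)):
    d_h_raw = (1 - hs[t + 1] ** 2) * d_h  # gradient through tanh
    d_h = Whh.T @ d_h_raw                 # gradient passed to the previous step
    norms.append(float(np.linalg.norm(d_h)))

print(f"norm after 1 step back: {norms[0]:.3e}")
print(f"norm after {steps} steps back: {norms[-1]:.3e}")
```

Each backward step multiplies the gradient by the tanh derivative (at most 1) and by `Whh.T`, so with small recurrent weights the signal reaching early time steps becomes negligible.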



## Key Concepts in the Code

* **Weights & Biases**

  * `Wxh` – Input → Hidden
  * `Whh` – Hidden → Hidden (recurrent)
  * `Why` – Hidden → Output
  * `bh`, `by` – Biases

* **Forward Pass**
  Computes the hidden state for each time step:
  `h_t = tanh(Wxh * x_t + Whh * h_(t-1) + bh)`
  The output is then calculated once, from the hidden state after the last time step `T`:
  `output = Why * h_T + by`

* **Backward Pass (BPTT)**
  Gradients are propagated backward through time to update all weights and biases.

* **Input Example**
  Uses one-hot encoded vectors to represent a sequence.
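
One way to gain confidence in a hand-written backward pass is to compare its gradients with finite-difference estimates. The sketch below does this for a small RNN of the kind described above. The squared-error loss, the layer sizes, the target vector, and the helper names (`forward`, `loss_fn`, `bptt`) are assumptions made for this example; the notebook itself does not define a loss.

```python
import numpy as np


def forward(params, inputs):
    """Run the recurrence and return the final output and all hidden states."""
    Wxh, Whh, Why, bh, by = params
    hs = [np.zeros((Whh.shape[0], 1))]
    for x in inputs:
        hs.append(np.tanh(Wxh @ x + Whh @ hs[-1] + bh))
    return Why @ hs[-1] + by, hs


def loss_fn(params, inputs, target):
    """Assumed loss for this sketch: half the squared error of the final output."""
    y, _ = forward(params, inputs)
    return 0.5 * float(np.sum((y - target) ** 2))


def bptt(params, inputs, target):
    """Analytic gradients of loss_fn via backpropagation through time."""
    Wxh, Whh, Why, bh, by = params
    y, hs = forward(params, inputs)
    d_y = y - target
    d_Why = d_y @ hs[-1].T  # output layer sees only the last hidden state
    d_by = d_y.copy()
    d_Wxh, d_Whh, d_bh = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(bh)
    d_h = Why.T @ d_y
    for t in reversed(range(len(inputs))):
        d_raw = (1 - hs[t + 1] ** 2) * d_h  # gradient through tanh
        d_bh += d_raw
        d_Wxh += d_raw @ inputs[t].T
        d_Whh += d_raw @ hs[t].T
        d_h = Whh.T @ d_raw                 # pass gradient to the previous step
    return [d_Wxh, d_Whh, d_Why, d_bh, d_by]


rng = np.random.default_rng(0)
I, H, O = 3, 4, 2  # assumed input, hidden, and output sizes
shapes = [(H, I), (H, H), (O, H), (H, 1), (O, 1)]
params = [rng.standard_normal(s) * 0.5 for s in shapes]
inputs = [np.eye(I)[:, [k]] for k in range(I)]  # one-hot column vectors
target = np.array([[0.0], [1.0]])

grads = bptt(params, inputs, target)
eps = 1e-5
max_err = 0.0
for p, g in zip(params, grads):
    for idx in np.ndindex(p.shape):
        old = p[idx]
        p[idx] = old + eps
        loss_plus = loss_fn(params, inputs, target)
        p[idx] = old - eps
        loss_minus = loss_fn(params, inputs, target)
        p[idx] = old  # restore the parameter
        numeric = (loss_plus - loss_minus) / (2 * eps)
        max_err = max(max_err, abs(numeric - g[idx]))

print(f"largest difference between analytic and numeric gradient: {max_err:.2e}")
```

A very small difference suggests the backward pass matches the forward pass; a large one usually points to a wrong shape or a missing term in the recurrence.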

## What This Demonstrates

* Memory through hidden states
* Step-by-step sequence processing
* Manual forward and backward propagation
* Gradient-based learning




```python
import numpy as np


class RNN:
  def __init__(self, input_size, output_size, hidden_size=64):
    self.input_size = input_size
    self.output_size = output_size
    self.hidden_size = hidden_size

    # Initialize weights with small random values
    self.Wxh = np.random.randn(hidden_size, input_size) * 0.01   # input -> hidden
    self.Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden
    self.Why = np.random.randn(output_size, hidden_size) * 0.01  # hidden -> output

    # Initialize biases
    self.bh = np.zeros((hidden_size, 1))
    self.by = np.zeros((output_size, 1))

  def forward(self, inputs):
    h_prev = np.zeros((self.hidden_size, 1))
    self.last_inputs = inputs
    self.last_hs = {0: h_prev}

    # Forward pass: update the hidden state once per time step
    for i, x in enumerate(inputs):
      x = np.reshape(x, (self.input_size, 1))
      h = np.tanh(np.dot(self.Wxh, x) + np.dot(self.Whh, h_prev) + self.bh)
      self.last_hs[i+1] = h
      h_prev = h

    # Compute the output from the final hidden state
    output = np.dot(self.Why, h_prev) + self.by
    return output, h_prev

  def backward(self, d_y, learning_rate=0.01):
    n = len(self.last_inputs)

    # The output depends only on the final hidden state,
    # so the output-layer gradients are computed once.
    d_Why = np.dot(d_y, self.last_hs[n].T)
    d_by = np.array(d_y, dtype=float)

    d_Whh = np.zeros_like(self.Whh)
    d_Wxh = np.zeros_like(self.Wxh)
    d_bh = np.zeros_like(self.bh)
    d_h = np.dot(self.Why.T, d_y)  # Initial gradient from the output layer

    # Backpropagate through time, summing gradients over all steps
    for t in reversed(range(n)):
      d_h_raw = (1 - self.last_hs[t+1] ** 2) * d_h  # Gradient through tanh

      d_bh += d_h_raw
      d_Wxh += np.dot(d_h_raw, self.last_inputs[t].reshape(-1, 1).T)
      d_Whh += np.dot(d_h_raw, self.last_hs[t].T)
      d_h = np.dot(self.Whh.T, d_h_raw)  # Gradient for the previous hidden state

    # Update weights and biases
    self.Why -= learning_rate * d_Why
    self.Wxh -= learning_rate * d_Wxh
    self.Whh -= learning_rate * d_Whh
    self.bh -= learning_rate * d_bh
    self.by -= learning_rate * d_by


# Example sequence (one-hot encoded input)
inputs = [np.array([1,0,0]), np.array([0,1,0]), np.array([0,0,1])]
rnn = RNN(input_size=3, hidden_size=3, output_size=2)

# Forward pass
output, hidden_state = rnn.forward(inputs)
print(output)

# Backward pass, using an example gradient of the loss with respect to the output
d_y = np.array([[0],[1]])
rnn.backward(d_y)
```

```
[[ 7.23036265e-05]
 [-2.94302126e-05]]
```


# Conclusion

* The RNN successfully processed the input sequence and produced an **output vector**:

```
[[ 7.23036265e-05]
 [-2.94302126e-05]]
```

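* This output is the untrained network's response to the given sequence. The values are close to zero because the weights are initialized at a scale of 0.01, and the exact numbers vary between runs because no random seed is set.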
* With **training and multiple iterations**, the RNN can learn patterns in sequential data and produce meaningful predictions.
* The experiment demonstrates the **forward and backward propagation** of an RNN and how hidden states carry information across time steps.
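
The point about training over multiple iterations can be sketched as a small loop that repeats the forward and backward passes. Everything specific here is an assumption for illustration: the squared-error loss, the two toy sequences with their targets, the weight scale of 0.1, the learning rate, and the number of epochs. The notebook itself runs only a single backward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
I, H, O = 3, 4, 2  # assumed input, hidden, and output sizes

# Parameters kept in a dict so they can be updated in place.
p = {
    "Wxh": rng.standard_normal((H, I)) * 0.1,
    "Whh": rng.standard_normal((H, H)) * 0.1,
    "Why": rng.standard_normal((O, H)) * 0.1,
    "bh": np.zeros((H, 1)),
    "by": np.zeros((O, 1)),
}


def forward(p, inputs):
    """Return the final output and the list of hidden states."""
    hs = [np.zeros((H, 1))]
    for x in inputs:
        hs.append(np.tanh(p["Wxh"] @ x + p["Whh"] @ hs[-1] + p["bh"]))
    return p["Why"] @ hs[-1] + p["by"], hs


def train_step(p, inputs, target, lr=0.1):
    """One forward pass, one BPTT pass, one gradient-descent update."""
    y, hs = forward(p, inputs)
    d_y = y - target  # gradient of the assumed squared-error loss
    grads = {k: np.zeros_like(v) for k, v in p.items()}
    grads["Why"] = d_y @ hs[-1].T
    grads["by"] = d_y
    d_h = p["Why"].T @ d_y
    for t in reversed(range(len(inputs))):
        d_raw = (1 - hs[t + 1] ** 2) * d_h
        grads["bh"] += d_raw
        grads["Wxh"] += d_raw @ inputs[t].T
        grads["Whh"] += d_raw @ hs[t].T
        d_h = p["Whh"].T @ d_raw
    for k in p:
        p[k] -= lr * grads[k]
    return 0.5 * float(np.sum(d_y ** 2))


# Two hypothetical sequences with the same symbols in opposite order.
eye = np.eye(I)
seq_forward = [eye[:, [k]] for k in (0, 1, 2)]
seq_reverse = [eye[:, [k]] for k in (2, 1, 0)]
data = [
    (seq_forward, np.array([[0.0], [1.0]])),
    (seq_reverse, np.array([[1.0], [0.0]])),
]

losses = []
for epoch in range(500):
    losses.append(sum(train_step(p, x, t) for x, t in data))

print(f"first epoch loss: {losses[0]:.4f}, last epoch loss: {losses[-1]:.4f}")
```

Because both sequences contain the same symbols, the only way to tell them apart is through the order in which the hidden state was updated. How well the two are separated depends on the assumed hyperparameters.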

**Key takeaway:** Even a simple RNN built from scratch can model temporal dependencies and serve as a foundation for understanding more advanced sequence models like LSTMs or GRUs.
