# Implement a Simple RNN with Backpropagation Through Time (BPTT)

## Task

Your task is to implement a simple Recurrent Neural Network (RNN) and backpropagation through time (BPTT) to learn from sequential data. The RNN will process input sequences, update hidden states, and perform backpropagation to adjust weights based on the error gradient.

Write a class `SimpleRNN` with the following methods:

- `__init__(self, input_size, hidden_size, output_size)`: Initializes the RNN with random weights and zero biases.
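- `forward(self, x)`: Processes a sequence of inputs, stores the inputs, hidden states, and outputs for the backward pass, and returns the outputs as an array of shape `(T, output_size, 1)`, where `T` is the sequence length.
- `backward(self, x, y, learning_rate)`: Performs backpropagation through time (BPTT) using the values stored by the most recent `forward` call, then updates the weights and biases in place.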

In this task, the RNN will be trained on sequence prediction, where the network will learn to predict the next item in a sequence. You should use MSE as the loss function.

Example

```python
import numpy as np

# Define sequence and labels
input_sequence = np.array([[1.0], [2.0], [3.0], [4.0]])
expected_output = np.array([[2.0], [3.0], [4.0], [5.0]])

# Initialize RNN
rnn = SimpleRNN(input_size=1, hidden_size=5, output_size=1)

# Forward pass
output = rnn.forward(input_sequence)

# Backward pass
rnn.backward(input_sequence, expected_output, learning_rate=0.01)

print(output)

# The output should show the RNN predictions for each step of the input sequence.
```
  
## Understanding Recurrent Neural Networks (RNNs) and Backpropagation Through Time (BPTT)

Recurrent Neural Networks (RNNs) are designed to handle sequential data by maintaining a hidden state that captures information from previous inputs. They are particularly useful for tasks where context or sequential order is important, such as language modeling, time series forecasting, and sequence prediction.

## RNN Architecture

An RNN processes inputs one at a time while maintaining a hidden state that gets updated at each time step. The core equations governing the forward pass of an RNN are:

**Hidden State Update:**
 
$$h_t = \tanh(W_{xh}x_t + W_{hh}h_{t-1} + b_h)$$
 
**Output Computation:**
 
$$y_t = W_{hy}h_t + b_y$$
 
Where:

- $x_t$ is the input at time step $t$.
- $h_t$ is the hidden state at time step $t$.
- $W_{xh}$ is the weight matrix for input to hidden state.
- $W_{hh}$ is the weight matrix for hidden state to hidden state.
- $W_{hy}$ is the weight matrix for hidden state to output.
- $b_h$ and $b_y$ are the bias terms for the hidden state and output, respectively.
- $\tanh$ is the hyperbolic tangent activation function applied element-wise.
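
As a minimal sketch of the two equations above, the following NumPy snippet runs a single time step. The function name `rnn_step`, the random generator seed, and the layer sizes are illustrative assumptions, not part of the required `SimpleRNN` interface.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One forward step: returns the new hidden state and the output."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # hidden state update
    y_t = W_hy @ h_t + b_y                           # linear output layer
    return h_t, y_t

# Illustrative sizes (assumed): 2 input features, 3 hidden units, 1 output.
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((3, 2)) * 0.01
W_hh = rng.standard_normal((3, 3)) * 0.01
W_hy = rng.standard_normal((1, 3)) * 0.01
b_h = np.zeros((3, 1))
b_y = np.zeros((1, 1))

x_t = np.array([[1.0], [2.0]])  # column vector, shape (2, 1)
h_prev = np.zeros((3, 1))       # h_0
h_t, y_t = rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y)
print(h_t.shape, y_t.shape)     # (3, 1) (1, 1)
```

Keeping every vector as a column of shape `(n, 1)` makes each matrix product line up with the equations without any transposes.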

## Forward Pass Implementation

In the forward pass, we iterate over each element in the input sequence, updating the hidden state and computing the output:

1. Initialize the hidden state $h_0$ to zeros.

2. For each time step $t$:
   - Compute $h_t = \tanh(W_{xh}x_t + W_{hh}h_{t-1} + b_h)$.
   - Compute $y_t = W_{hy}h_t + b_y$.
   - Store $x_t$, $h_t$, and $y_t$ for use in backpropagation.

## Loss Function

The loss function measures the discrepancy between the predicted outputs and the actual target values. For sequence prediction tasks, we often use the Mean Squared Error (MSE) loss:

$$\text{Loss} = \frac{1}{T} \sum_{t=1}^{T} (\hat{y}_t - y_t)^2$$ 
 
Where $T$ is the length of the sequence, $\hat{y}_t$ is the predicted output (the network output written as $y_t$ in the forward-pass equations above), and $y_t$ is the actual target at time step $t$.
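
The formula can be sketched in a few lines of NumPy. The helper name `mse_loss` and the prediction values are hypothetical, and scalar outputs per time step are assumed.

```python
import numpy as np

def mse_loss(predictions, targets):
    """Mean of the squared errors over the T time steps (scalar outputs assumed)."""
    predictions = np.asarray(predictions, dtype=float).reshape(-1)
    targets = np.asarray(targets, dtype=float).reshape(-1)
    return np.mean((predictions - targets) ** 2)

y_hat = [2.5, 3.0, 3.5, 5.0]  # hypothetical predictions
y = [2.0, 3.0, 4.0, 5.0]      # targets from the task example
print(mse_loss(y_hat, y))     # 0.125
```

Two of the four predictions are off by 0.5, so the loss is $(0.25 + 0 + 0.25 + 0) / 4 = 0.125$.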

## Backpropagation Through Time (BPTT)

BPTT is the process of training RNNs by unrolling them through time and applying backpropagation to compute gradients for each time step. The key steps in BPTT are:

1. Compute the gradient of the loss with respect to each predicted output. The constant factor $2/T$ from differentiating the MSE is dropped, since it only rescales the learning rate:

$$\frac{d\text{L}}{d\hat{y}_t} = \hat{y}_t - y_t$$

2. Accumulate the gradients for the output layer weights and biases:
 
$$dW_{hy} += \frac{d\text{L}}{d\hat{y}_t} \cdot h_t^T$$
$$db_y += \frac{d\text{L}}{d\hat{y}_t}$$
 
3. Backpropagate the gradients through the hidden layer, combining the gradient from the output at step $t$ with the gradient arriving from step $t+1$ through $W_{hh}$:
 
$$dh_t = W_{hy}^T \cdot \frac{d\text{L}}{d\hat{y}_t} + W_{hh}^T \cdot dh_{raw,t+1}$$
$$dh_{raw,t} = dh_t \circ (1 - h_t^2)$$
 
Where $\circ$ denotes element-wise multiplication, and $(1 - h_t^2)$ is the derivative of the $\tanh$ activation function expressed in terms of its output $h_t$.

4. Accumulate the gradients for the hidden layer weights and biases:
 
$$dW_{xh} += dh_{raw,t} \cdot x_t^T$$
$$dW_{hh} += dh_{raw,t} \cdot h_{t-1}^T$$
$$db_h += dh_{raw,t}$$

We repeat steps 1-4 for each time step in reverse order (from $T$ to 1), accumulating the gradients. The term $W_{hh}^T \cdot dh_{raw,t+1}$ is the gradient flowing back from the next time step; it is set to zeros at the last time step $t = T$, where there is no next step.
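
One way to gain confidence in a BPTT implementation is to compare its gradients against central finite differences. The sketch below does this for a small random network. All function names, sizes, and the seed are assumptions made for illustration. The loss is taken as half the sum of squared errors, so that its derivative with respect to each prediction is exactly $\hat{y}_t - y_t$; the variable `dh_next` holds the gradient passed back through $W_{hh}$ from the following time step.

```python
import numpy as np

def forward(params, xs):
    W_xh, W_hh, W_hy, b_h, b_y = params
    hs = [np.zeros((W_hh.shape[0], 1))]  # hs[0] is h_0
    ys_hat = []
    for x_t in xs:
        hs.append(np.tanh(W_xh @ x_t + W_hh @ hs[-1] + b_h))
        ys_hat.append(W_hy @ hs[-1] + b_y)
    return hs, ys_hat

def loss(params, xs, ys):
    # Half the sum of squared errors (assumed scaling for this check).
    _, ys_hat = forward(params, xs)
    return 0.5 * sum(float(np.sum((yh - y) ** 2)) for yh, y in zip(ys_hat, ys))

def bptt(params, xs, ys):
    W_xh, W_hh, W_hy, b_h, b_y = params
    hs, ys_hat = forward(params, xs)
    grads = [np.zeros_like(p) for p in params]
    dh_next = np.zeros_like(hs[0])            # nothing flows in after the last step
    for t in reversed(range(len(xs))):
        dy = ys_hat[t] - ys[t]                # step 1
        grads[2] += dy @ hs[t + 1].T          # step 2: dW_hy (hs[t + 1] is h_t)
        grads[4] += dy                        # step 2: db_y
        dh = W_hy.T @ dy + dh_next            # step 3
        dh_raw = dh * (1.0 - hs[t + 1] ** 2)  # step 3: through tanh
        grads[0] += dh_raw @ xs[t].T          # step 4: dW_xh
        grads[1] += dh_raw @ hs[t].T          # step 4: dW_hh (hs[t] is h_{t-1})
        grads[3] += dh_raw                    # step 4: db_h
        dh_next = W_hh.T @ dh_raw             # passed to the previous time step
    return grads

def numerical_grad(params, xs, ys, index, eps=1e-5):
    # Central differences, perturbing one entry of params[index] at a time.
    p = params[index]
    g = np.zeros_like(p)
    for i in np.ndindex(p.shape):
        old = p[i]
        p[i] = old + eps
        plus = loss(params, xs, ys)
        p[i] = old - eps
        minus = loss(params, xs, ys)
        p[i] = old
        g[i] = (plus - minus) / (2 * eps)
    return g

rng = np.random.default_rng(0)
n_in, n_h, n_out, T = 2, 3, 1, 4
shapes = [(n_h, n_in), (n_h, n_h), (n_out, n_h), (n_h, 1), (n_out, 1)]
params = [rng.standard_normal(s) * 0.5 for s in shapes]
xs = [rng.standard_normal((n_in, 1)) for _ in range(T)]
ys = [rng.standard_normal((n_out, 1)) for _ in range(T)]

analytic = bptt(params, xs, ys)
max_err = max(np.max(np.abs(a - numerical_grad(params, xs, ys, k)))
              for k, a in enumerate(analytic))
print("largest gradient mismatch:", max_err)
```

A mismatch many orders of magnitude smaller than the gradients themselves suggests the recursion is implemented consistently; a large mismatch usually points to an indexing error between $h_t$ and $h_{t-1}$.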

## Updating Weights

After computing the gradients, we update the weights and biases using gradient descent:

$$W_{xh} -= \text{learning\_rate} \times dW_{xh}$$

$$W_{hh} -= \text{learning\_rate} \times dW_{hh}$$

$$W_{hy} -= \text{learning\_rate} \times dW_{hy}$$

$$b_h -= \text{learning\_rate} \times db_h$$

$$b_y -= \text{learning\_rate} \times db_y$$
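
The update rule is the same for every parameter, so it can be sketched as one loop. The dictionary layout, the helper name `sgd_update`, and the numbers below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def sgd_update(params, grads, learning_rate):
    """In-place gradient descent step: p <- p - learning_rate * dp."""
    for name in params:
        params[name] -= learning_rate * grads[name]

# Hypothetical parameters and accumulated gradients.
params = {"W_hy": np.array([[0.5, -0.2]]), "b_y": np.array([[0.1]])}
grads = {"W_hy": np.array([[1.0, 2.0]]), "b_y": np.array([[-1.0]])}

sgd_update(params, grads, learning_rate=0.1)
print(params["W_hy"])  # approximately [[0.4, -0.4]]
print(params["b_y"])   # approximately [[0.2]]
```

A negative gradient, as for `b_y` here, increases the parameter, because the step always moves against the gradient.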
 
## Implementing the RNN

To implement the RNN with BPTT, follow these steps:

1. **Initialization**: Initialize the weight matrices $W_{xh}$, $W_{hh}$, and $W_{hy}$ with small random values and biases $b_h$ and $b_y$ with zeros.
2. **Forward Pass**: Implement the forward method to process the input sequence, updating the hidden states and computing the outputs at each time step. Store the inputs, hidden states, and outputs for use in backpropagation.
3. **Backward Pass**: Implement the backward method to perform BPTT. Compute the gradients at each time step in reverse order, accumulate them, and update the weights and biases.
4. **Training Loop**: Train the RNN over multiple epochs by repeatedly performing forward and backward passes and updating the weights.

## Tips for Implementation

- **Gradient Clipping**: To prevent exploding gradients, consider applying gradient clipping, which scales down gradients if they exceed a certain threshold.
- **Learning Rate**: Choose an appropriate learning rate. If the learning rate is too high, the training may become unstable.
- **Debugging**: Check the dimensions of all matrices and vectors to ensure they align correctly during matrix multiplication.
- **Testing**: Start with small sequences and hidden sizes to test your implementation before scaling up.
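
The gradient clipping tip can be sketched as follows, using clipping by global norm, which is one common variant. The helper name `clip_by_global_norm` and the threshold are assumptions. The reference solution in this document does not clip, so adding this step to `backward` would change its numeric outputs.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by one common factor if their joint L2 norm exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

grads = [np.array([[3.0]]), np.array([[4.0]])]  # joint norm is 5
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
print(norm_before)  # 5.0
print(clipped)      # values close to 0.6 and 0.8
```

Scaling every gradient by the same factor preserves the update direction and only shortens the step.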

## Example Calculation

Suppose we have an input sequence $x = [x_1, x_2]$ and target sequence $y = [y_1, y_2]$. Here's how you would compute the forward and backward passes:

1. Forward Pass:

   - At $t=1$:
     - Compute $h_1 = \tanh(W_{xh}x_1 + W_{hh}h_0 + b_h)$, with $h_0 = 0$.
     - Compute $\hat{y}_1 = W_{hy}h_1 + b_y$.
   - At $t=2$:
     - Compute $h_2 = \tanh(W_{xh}x_2 + W_{hh}h_1 + b_h)$.
     - Compute $\hat{y}_2 = W_{hy}h_2 + b_y$.
     
2. Compute Loss: $\text{Loss} = \frac{1}{2} \sum_{t=1}^{2} (\hat{y}_t - y_t)^2$.
3. Backward Pass: Starting from $t=2$ to $t=1$:
   
   - At $t=2$:
     - Compute $d\text{L}/d\hat{y}_2 = \hat{y}_2 - y_2$.
     - Compute $dh_2$ and the contributions to $dW_{hy}$, $db_y$, $dW_{xh}$, $dW_{hh}$, and $db_h$. No gradient arrives from a later step.
   - At $t=1$:
     - Compute $d\text{L}/d\hat{y}_1 = \hat{y}_1 - y_1$.
     - Compute $dh_1$, including the gradient flowing back from $t=2$ through $W_{hh}$, and add the contributions to $dW_{hy}$, $db_y$, $dW_{xh}$, $dW_{hh}$, and $db_h$.
     
4. Update Weights: Adjust the weights and biases using the accumulated gradients.
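
To make the two-step calculation concrete, here is a sketch with scalar weights, so every quantity is a single number. The parameter values and the data are hypothetical, chosen only for readability.

```python
import math

# Hypothetical scalar parameters and data.
w_xh, w_hh, w_hy, b_h, b_y = 0.5, 0.1, 1.0, 0.0, 0.0
x1, x2 = 1.0, 2.0
y1, y2 = 2.0, 3.0

# Forward pass
h0 = 0.0
h1 = math.tanh(w_xh * x1 + w_hh * h0 + b_h)
yhat1 = w_hy * h1 + b_y
h2 = math.tanh(w_xh * x2 + w_hh * h1 + b_h)
yhat2 = w_hy * h2 + b_y

# Backward pass at t = 2 (nothing flows in from a later step)
dy2 = yhat2 - y2
dh2_raw = (w_hy * dy2) * (1 - h2 ** 2)

# Backward pass at t = 1 (adds the gradient flowing back through w_hh)
dy1 = yhat1 - y1
dh1_raw = (w_hy * dy1 + w_hh * dh2_raw) * (1 - h1 ** 2)

# Accumulated gradients over both steps
dw_hy = dy2 * h2 + dy1 * h1
db_y = dy2 + dy1
dw_xh = dh2_raw * x2 + dh1_raw * x1
dw_hh = dh2_raw * h1 + dh1_raw * h0
db_h = dh2_raw + dh1_raw

print(f"h1={h1:.4f} yhat1={yhat1:.4f} h2={h2:.4f} yhat2={yhat2:.4f}")
print(f"dw_hy={dw_hy:.4f} dw_xh={dw_xh:.4f} dw_hh={dw_hh:.4f}")
```

Because $h_0 = 0$, the step at $t=1$ contributes nothing to `dw_hh`; only the step at $t=2$, which multiplies by $h_1$, does.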

## Conclusion

By understanding the forward and backward passes in RNNs and how to compute and update gradients through BPTT, you can implement an RNN from scratch. This foundational knowledge is crucial for working with more advanced neural network architectures and understanding deep learning frameworks.

In [1]:
import numpy as np

class SimpleRNN:
    def __init__(self, input_size, hidden_size, output_size):
        self.hidden_size = hidden_size
        self.W_xh = np.random.randn(hidden_size, input_size) * 0.01
        self.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01
        self.W_hy = np.random.randn(output_size, hidden_size) * 0.01
        self.b_h = np.zeros((hidden_size, 1))
        self.b_y = np.zeros((output_size, 1))

    def forward(self, x):
        h = np.zeros((self.hidden_size, 1))  # Initialize hidden state
        outputs = []
        self.last_inputs = []
        self.last_hiddens = [h]
        
        for t in range(len(x)):
            self.last_inputs.append(x[t].reshape(-1, 1))
            h = np.tanh(np.dot(self.W_xh, self.last_inputs[t]) + np.dot(self.W_hh, h) + self.b_h)
            y = np.dot(self.W_hy, h) + self.b_y
            outputs.append(y)
            self.last_hiddens.append(h)
        
        self.last_outputs = outputs
        return np.array(outputs)

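    def backward(self, x, y, learning_rate):
        # Relies on the inputs, hidden states, and outputs cached by the most recent forward() call
        dW_xh = np.zeros_like(self.W_xh)
        dW_hh = np.zeros_like(self.W_hh)
        dW_hy = np.zeros_like(self.W_hy)
        db_h = np.zeros_like(self.b_h)
        db_y = np.zeros_like(self.b_y)

        # Gradient arriving from the following time step; zero at the last step
        dh_next = np.zeros((self.hidden_size, 1))

        for t in reversed(range(len(x))):
            # last_hiddens[0] holds h_0, so last_hiddens[t+1] is the hidden state
            # produced at step t and last_hiddens[t] is the one that fed into it
            dy = self.last_outputs[t] - y[t].reshape(-1, 1)  # (Predicted - Actual)
            dW_hy += np.dot(dy, self.last_hiddens[t+1].T)
            db_y += dy

            dh = np.dot(self.W_hy.T, dy) + dh_next
            dh_raw = (1 - self.last_hiddens[t+1] ** 2) * dh  # Derivative of tanh

            dW_xh += np.dot(dh_raw, self.last_inputs[t].T)
            dW_hh += np.dot(dh_raw, self.last_hiddens[t].T)
            db_h += dh_raw

            # Pass the gradient back through the recurrent weights to the previous step
            dh_next = np.dot(self.W_hh.T, dh_raw)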

        # Update weights and biases
        self.W_xh -= learning_rate * dW_xh
        self.W_hh -= learning_rate * dW_hh
        self.W_hy -= learning_rate * dW_hy
        self.b_h -= learning_rate * db_h
        self.b_y -= learning_rate * db_y


In [2]:
import numpy as np
np.random.seed(42)
input_sequence = np.array([[1.0], [2.0], [3.0], [4.0]])
expected_output = np.array([[2.0], [3.0], [4.0], [5.0]])
rnn = SimpleRNN(input_size=1, hidden_size=5, output_size=1)
# Train the RNN over multiple epochs
for epoch in range(100):
    output = rnn.forward(input_sequence)
    rnn.backward(input_sequence, expected_output, learning_rate=0.01)
print('Test Case 1: Accepted') if np.allclose(output, np.array([[[2.24143915]],[[3.18450265]],[[4.04305928]],[[4.57419398]]])) else print('Test Case 1: Failed')
print('Input:')
print('import numpy as np\nnp.random.seed(42)\ninput_sequence = np.array([[1.0], [2.0], [3.0], [4.0]])\nexpected_output = np.array([[2.0], [3.0], [4.0], [5.0]])\nrnn = SimpleRNN(input_size=1, hidden_size=5, output_size=1)\n# Train the RNN over multiple epochs\nfor epoch in range(100):\n    output = rnn.forward(input_sequence)\n    rnn.backward(input_sequence, expected_output, learning_rate=0.01)\nprint(output)')
print()
print('Output:')
print(output)
print()
print('Expected:')
print('[[[2.24143915]],[[3.18450265]],[[4.04305928]],[[4.57419398]]]')
print()
print()
print()



import numpy as np
np.random.seed(42)
input_sequence = np.array([[1.0,2.0], [7.0,2.0], [1.0,3.0], [12.0,4.0]])
expected_output = np.array([[2.0], [3.0], [4.0], [5.0]])
rnn = SimpleRNN(input_size=2, hidden_size=3, output_size=1)
# Train the RNN over multiple epochs
for epoch in range(100):
    output = rnn.forward(input_sequence)
    rnn.backward(input_sequence, expected_output, learning_rate=0.01)
print('Test Case 2: Accepted') if np.allclose(output, np.array([[[2.42201379]],[[3.44167595]],[[3.6129965 ]],[[4.50660152]]])) else print('Test Case 2: Failed')
print('Input:')
print('import numpy as np\nnp.random.seed(42)\ninput_sequence = np.array([[1.0,2.0], [7.0,2.0], [1.0,3.0], [12.0,4.0]])\nexpected_output = np.array([[2.0], [3.0], [4.0], [5.0]])\nrnn = SimpleRNN(input_size=2, hidden_size=3, output_size=1)\n# Train the RNN over multiple epochs\nfor epoch in range(100):\n    output = rnn.forward(input_sequence)\n    rnn.backward(input_sequence, expected_output, learning_rate=0.01)\nprint(output)')
print()
print('Output:')
print(output)
print()
print('Expected:')
print('[[[2.42201379]],[[3.44167595]],[[3.6129965 ]],[[4.50660152]]]')
print()
print()
print()



import numpy as np
np.random.seed(42)
input_sequence = np.array([[1.0,2.0], [7.0,2.0], [1.0,3.0], [12.0,4.0]])
expected_output = np.array([[2.0,1.0], [3.0,7.0], [4.0,8.0], [5.0,10.0]])
rnn = SimpleRNN(input_size=2, hidden_size=10, output_size=2)
# Train the RNN over multiple epochs
for epoch in range(50):
    output = rnn.forward(input_sequence)
    rnn.backward(input_sequence, expected_output, learning_rate=0.01)
print('Test Case 3: Accepted') if np.allclose(output, np.array([[[3.28424506],[5.93532247]],[[3.60393582],[6.82013468]],[[3.52586543],[6.58278163]],[[3.61336207],[6.84916339]]])) else print('Test Case 3: Failed')
print('Input:')
print('import numpy as np\nnp.random.seed(42)\ninput_sequence = np.array([[1.0,2.0], [7.0,2.0], [1.0,3.0], [12.0,4.0]])\nexpected_output = np.array([[2.0,1.0], [3.0,7.0], [4.0,8.0], [5.0,10.0]])\nrnn = SimpleRNN(input_size=2, hidden_size=10, output_size=2)\n# Train the RNN over multiple epochs\nfor epoch in range(50):\n    output = rnn.forward(input_sequence)\n    rnn.backward(input_sequence, expected_output, learning_rate=0.01)\nprint(output)')
print()
print('Output:')
print(output)
print()
print('Expected:')
print('[[[3.28424506],[5.93532247]],[[3.60393582],[6.82013468]],[[3.52586543],[6.58278163]],[[3.61336207],[6.84916339]]]')

Test Case 1: Accepted
Input:
import numpy as np
np.random.seed(42)
input_sequence = np.array([[1.0], [2.0], [3.0], [4.0]])
expected_output = np.array([[2.0], [3.0], [4.0], [5.0]])
rnn = SimpleRNN(input_size=1, hidden_size=5, output_size=1)
# Train the RNN over multiple epochs
for epoch in range(100):
    output = rnn.forward(input_sequence)
    rnn.backward(input_sequence, expected_output, learning_rate=0.01)
print(output)

Output:
[[[2.24143915]]

 [[3.18450265]]

 [[4.04305928]]

 [[4.57419398]]]

Expected:
[[[2.24143915]],[[3.18450265]],[[4.04305928]],[[4.57419398]]]



Test Case 2: Accepted
Input:
import numpy as np
np.random.seed(42)
input_sequence = np.array([[1.0,2.0], [7.0,2.0], [1.0,3.0], [12.0,4.0]])
expected_output = np.array([[2.0], [3.0], [4.0], [5.0]])
rnn = SimpleRNN(input_size=2, hidden_size=3, output_size=1)
# Train the RNN over multiple epochs
for epoch in range(100):
    output = rnn.forward(input_sequence)
    rnn.backward(input_sequence, expected_output, learning_r