<a href="https://colab.research.google.com/github/Cliffochi/aviva_data_science_course/blob/main/Recurrent_Neural_Network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

To implement a basic Recurrent Neural Network (RNN) from scratch, we'll build a `ScratchSimpleRNNClassifier` class whose core component is a `SimpleRNN` layer.

We'll start with forward propagation only, using nothing beyond NumPy, and validate the output against the provided test example.

---

#### 1. Create the `SimpleRNN` class

This class handles forward propagation. It takes:

* input `x` of shape `(batch_size, n_sequences, n_features)`
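* initial state `h` of shape `(batch_size, n_nodes)`, set to zeros inside `forward`
* weights `W_x` of shape `(n_features, n_nodes)` and `W_h` of shape `(n_nodes, n_nodes)`, and bias `B` of shape `(n_nodes,)`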

It computes each time step using:

$$
a_t = x_t \cdot W_x + h_{t-1} \cdot W_h + B
$$

$$
h_t = \tanh(a_t)
$$

#### 2. Create the `ScratchSimpleRNNClassifier` class

This wraps the RNN layer and adds a simple output layer with sigmoid or softmax (for classification).
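
As a rough illustration of the wrapper described above, here is a forward-only sketch. The class name `ClassifierSketch`, the output parameters `W_out` and `b_out`, the `seed` argument, and the choice of softmax are assumptions made for this example and are not taken from the original notebook.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max before exponentiating for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class ClassifierSketch:
    """Hypothetical forward-only wrapper: RNN recurrence, then dense + softmax."""

    def __init__(self, n_features, n_nodes, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W_x = rng.standard_normal((n_features, n_nodes)) * 0.01
        self.W_h = rng.standard_normal((n_nodes, n_nodes)) * 0.01
        self.B = np.zeros(n_nodes)
        # Output-layer parameters (names are assumptions for this sketch).
        self.W_out = rng.standard_normal((n_nodes, n_classes)) * 0.01
        self.b_out = np.zeros(n_classes)

    def predict_proba(self, x):
        batch_size, n_sequences, _ = x.shape
        h = np.zeros((batch_size, self.W_h.shape[0]))
        for t in range(n_sequences):
            # Same recurrence as the equations above: h_t = tanh(a_t).
            h = np.tanh(x[:, t, :] @ self.W_x + h @ self.W_h + self.B)
        # Classify from the final hidden state only.
        return softmax(h @ self.W_out + self.b_out)

    def predict(self, x):
        return np.argmax(self.predict_proba(x), axis=1)

x_demo = np.random.default_rng(1).standard_normal((5, 3, 2))
clf = ClassifierSketch(n_features=2, n_nodes=4, n_classes=3)
proba = clf.predict_proba(x_demo)
print(proba.shape, proba.sum(axis=1))
```

Each row of `proba` is a probability distribution over the classes. For binary classification, a single sigmoid output unit would be an equally reasonable choice.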

---

In [1]:
# Forward Propagation Only

import numpy as np

class SimpleRNN:
    def __init__(self, n_features, n_nodes):
        self.n_features = n_features
        self.n_nodes = n_nodes
        # Weight initialization
        self.W_x = np.random.randn(n_features, n_nodes) * 0.01
        self.W_h = np.random.randn(n_nodes, n_nodes) * 0.01
        self.B = np.zeros(n_nodes)

    def forward(self, x):
        """
        x: shape (batch_size, n_sequences, n_features)
        returns:
        h: hidden state at final time step, shape (batch_size, n_nodes)
        """
        batch_size, n_sequences, _ = x.shape
        h = np.zeros((batch_size, self.n_nodes))
        self.hs = []  # Save intermediate h for backpropagation if needed
        for t in range(n_sequences):
            x_t = x[:, t, :]  # shape: (batch_size, n_features)
            a_t = np.dot(x_t, self.W_x) + np.dot(h, self.W_h) + self.B
            h = np.tanh(a_t)
            self.hs.append(h)
        return h  # Return final h_t

---
#### Test Case from the Sprint

In [2]:
# Provided small-scale inputs
x = np.array([[[1, 2], [2, 3], [3, 4]]]) / 100  # (1, 3, 2)
w_x = np.array([[1, 3, 5, 7], [3, 5, 7, 8]]) / 100  # (2, 4)
w_h = np.array([[1, 3, 5, 7],
                [2, 4, 6, 8],
                [3, 5, 7, 8],
                [4, 6, 8, 10]]) / 100  # (4, 4)
b = np.array([1, 1, 1, 1])  # (4,)

# Override weights for testing
rnn = SimpleRNN(n_features=2, n_nodes=4)
rnn.W_x = w_x
rnn.W_h = w_h
rnn.B = b

# Forward pass
h = rnn.forward(x)
print("h =", h)

h = [[0.79494228 0.81839002 0.83939649 0.85584174]]
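
Rather than comparing the printed numbers by eye, the result can be checked programmatically. The sketch below re-implements the same recurrence as a standalone function (`rnn_forward` is a helper name introduced here, not part of the notebook) and compares it with the values printed above; the tolerance of `1e-6` is an assumption based on the eight printed digits.

```python
import numpy as np

def rnn_forward(x, W_x, W_h, B):
    """Run the tanh recurrence and return the final hidden state."""
    h = np.zeros((x.shape[0], W_h.shape[0]))
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h + B)
    return h

x = np.array([[[1, 2], [2, 3], [3, 4]]]) / 100
w_x = np.array([[1, 3, 5, 7], [3, 5, 7, 8]]) / 100
w_h = np.array([[1, 3, 5, 7],
                [2, 4, 6, 8],
                [3, 5, 7, 8],
                [4, 6, 8, 10]]) / 100
b = np.array([1.0, 1.0, 1.0, 1.0])

# Values copied from the output printed above.
expected = np.array([[0.79494228, 0.81839002, 0.83939649, 0.85584174]])

h_check = rnn_forward(x, w_x, w_h, b)
matches = np.allclose(h_check, expected, atol=1e-6)
print("matches expected:", matches)
```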


#### Implementing backward propagation for the `SimpleRNN` class, step by step

---

#### Overview of RNN Backward Propagation

Given:

* Forward equations:

  $$
  a_t = x_t \cdot W_x + h_{t-1} \cdot W_h + B
  $$

  $$
  h_t = \tanh(a_t)
  $$

* Backward derivatives:

  $$
  \frac{\partial L}{\partial a_t} = \frac{\partial L}{\partial h_t} \odot (1 - \tanh^2(a_t))
  $$

  $$
  \frac{\partial L}{\partial W_x} += x_t^T \cdot \frac{\partial L}{\partial a_t}
  $$

  $$
  \frac{\partial L}{\partial W_h} += h_{t-1}^T \cdot \frac{\partial L}{\partial a_t}
  $$

  $$
  \frac{\partial L}{\partial B} += \frac{\partial L}{\partial a_t}
  $$

  $$
  \frac{\partial L}{\partial h_{t-1}} = \frac{\partial L}{\partial a_t} \cdot W_h^T
  $$
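
One way to gain confidence in these derivatives is a finite-difference gradient check. The sketch below is a standalone illustration, not the notebook's class: the helper names (`forward`, `analytic_grads`, `numeric_grad`), the random test data, and the toy loss (the sum of the final hidden state, so that the gradient with respect to the last hidden state is all ones) are assumptions chosen for the example.

```python
import numpy as np

def forward(x, W_x, W_h, B):
    """Return all hidden states [h_0, ..., h_T]."""
    hs = [np.zeros((x.shape[0], W_h.shape[0]))]
    for t in range(x.shape[1]):
        hs.append(np.tanh(x[:, t, :] @ W_x + hs[-1] @ W_h + B))
    return hs

def analytic_grads(x, W_x, W_h, B, dh_last):
    """Backpropagation through time using the derivatives listed above."""
    hs = forward(x, W_x, W_h, B)
    dW_x, dW_h, dB = np.zeros_like(W_x), np.zeros_like(W_h), np.zeros_like(B)
    dh = dh_last
    for t in reversed(range(x.shape[1])):
        da = dh * (1.0 - hs[t + 1] ** 2)  # tanh'(a_t) = 1 - h_t^2
        dW_x += x[:, t, :].T @ da
        dW_h += hs[t].T @ da              # hs[t] is h_{t-1}
        dB += da.sum(axis=0)              # sum over the batch
        dh = da @ W_h.T
    return dW_x, dW_h, dB

def numeric_grad(loss_fn, param, eps=1e-6):
    """Central-difference gradient; perturbs param in place and restores it."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        up = loss_fn()
        param[idx] = old - eps
        down = loss_fn()
        param[idx] = old
        grad[idx] = (up - down) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 2))
W_x = rng.standard_normal((2, 4)) * 0.5
W_h = rng.standard_normal((4, 4)) * 0.5
B = np.zeros(4)

def loss():
    # Toy loss: sum of the final hidden state.
    return forward(x, W_x, W_h, B)[-1].sum()

dW_x, dW_h, dB = analytic_grads(x, W_x, W_h, B, np.ones((2, 4)))
max_err = max(
    np.abs(dW_x - numeric_grad(loss, W_x)).max(),
    np.abs(dW_h - numeric_grad(loss, W_h)).max(),
    np.abs(dB - numeric_grad(loss, B)).max(),
)
print("max abs difference:", max_err)
```

A small maximum difference suggests the analytic formulas and the numerical estimate agree. The check is only as strong as the toy loss and the random inputs used.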

---

#### Updated `SimpleRNN` Class with Backward Pass

In [8]:
import numpy as np

class SimpleRNN:
    def __init__(self, n_features, n_nodes):
        self.n_features = n_features
        self.n_nodes = n_nodes
        self.W_x = np.random.randn(n_features, n_nodes) * 0.01
        self.W_h = np.random.randn(n_nodes, n_nodes) * 0.01
        self.B = np.zeros(n_nodes, dtype=np.float64) # Initialize with float dtype

    def forward(self, x):
        self.x = x
        batch_size, n_sequences, _ = x.shape
        self.h_list = []
        self.a_list = []
        h = np.zeros((batch_size, self.n_nodes))
        self.h_list.append(h.copy())  # h_0

        for t in range(n_sequences):
            x_t = x[:, t, :]
            a_t = np.dot(x_t, self.W_x) + np.dot(h, self.W_h) + self.B
            h = np.tanh(a_t)
            self.a_list.append(a_t)
            self.h_list.append(h)
        return h  # return final state

    def backward(self, dh_last, learning_rate=0.01):
        """
        dh_last: gradient of the loss w.r.t. h at the final time step (batch_size, n_nodes)
        """
        batch_size, n_sequences, _ = self.x.shape

        # Initialize gradients
        dW_x = np.zeros_like(self.W_x)
        dW_h = np.zeros_like(self.W_h)
        dB = np.zeros_like(self.B)
        dh_next = dh_last.copy()

        for t in reversed(range(n_sequences)):
            a_t = self.a_list[t]
            h_prev = self.h_list[t]
            x_t = self.x[:, t, :]

            # Derivative through tanh activation
            da_t = dh_next * (1 - np.tanh(a_t) ** 2)  # (batch_size, n_nodes)

            # Gradient accumulation
            dW_x += np.dot(x_t.T, da_t)  # (n_features, n_nodes)
            dW_h += np.dot(h_prev.T, da_t)  # (n_nodes, n_nodes)
            dB += np.sum(da_t, axis=0)

            # Update for next step
            dh_next = np.dot(da_t, self.W_h.T)  # to pass to h_{t-1}

        # Gradient descent update
        self.W_x -= learning_rate * dW_x
        self.W_h -= learning_rate * dW_h
        self.B -= learning_rate * dB

---

#### Example: Using `forward()` and `backward()`

In [None]:
# Inputs
x = np.array([[[1, 2], [2, 3], [3, 4]]]) / 100
rnn = SimpleRNN(n_features=2, n_nodes=4)

# Overriding weights and bias for testability
rnn.W_x = np.array([[1, 3, 5, 7], [3, 5, 7, 8]]) / 100
rnn.W_h = np.array([[1, 3, 5, 7],
                    [2, 4, 6, 8],
                    [3, 5, 7, 8],
                    [4, 6, 8, 10]]) / 100
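# Use a float bias: backward() updates B in place, which fails on an integer array
rnn.B = np.array([1, 1, 1, 1], dtype=np.float64)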

# Forward pass
h = rnn.forward(x)

# Assume dummy gradient from a loss (normally from output layer)
dh_last = np.ones_like(h)

# Backward pass
rnn.backward(dh_last, learning_rate=0.1)
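
The example above produces no visible output, so a quick sanity check may help. Passing an all-ones gradient corresponds to treating the sum of the final hidden state as a toy loss, which means a single gradient descent step should lower that sum. The sketch below is a standalone functional version (`forward` and `sgd_step` are names introduced for this example, and it returns updated copies instead of updating in place).

```python
import numpy as np

def forward(x, W_x, W_h, B):
    """Return all hidden states [h_0, ..., h_T]."""
    hs = [np.zeros((x.shape[0], W_h.shape[0]))]
    for t in range(x.shape[1]):
        hs.append(np.tanh(x[:, t, :] @ W_x + hs[-1] @ W_h + B))
    return hs

def sgd_step(x, W_x, W_h, B, dh_last, lr):
    """One backward pass plus gradient descent update; returns new parameters."""
    hs = forward(x, W_x, W_h, B)
    dW_x, dW_h, dB = np.zeros_like(W_x), np.zeros_like(W_h), np.zeros_like(B)
    dh = dh_last
    for t in reversed(range(x.shape[1])):
        da = dh * (1.0 - hs[t + 1] ** 2)
        dW_x += x[:, t, :].T @ da
        dW_h += hs[t].T @ da
        dB += da.sum(axis=0)
        dh = da @ W_h.T
    return W_x - lr * dW_x, W_h - lr * dW_h, B - lr * dB

x = np.array([[[1, 2], [2, 3], [3, 4]]]) / 100
W_x = np.array([[1, 3, 5, 7], [3, 5, 7, 8]]) / 100
W_h = np.array([[1, 3, 5, 7],
                [2, 4, 6, 8],
                [3, 5, 7, 8],
                [4, 6, 8, 10]]) / 100
B = np.ones(4)  # float bias

loss_before = forward(x, W_x, W_h, B)[-1].sum()
W_x, W_h, B = sgd_step(x, W_x, W_h, B, np.ones((1, 4)), lr=0.1)
loss_after = forward(x, W_x, W_h, B)[-1].sum()
print("toy loss before:", loss_before, "after:", loss_after)
```

In real training the incoming gradient would come from the output layer and its loss, not from a constant array.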


---

#### Next Steps

* Add a fully connected output layer with a sigmoid or softmax activation and a cross-entropy loss.
* Wrap the layers in a full `ScratchSimpleRNNClassifier` and train it on toy sequence classification datasets.
* Train on real datasets.