# Introduction to Simple Recurrent Neural Network (RNN) for Regression

* This notebook demonstrates the implementation of a basic Recurrent Neural Network (RNN) using Python and NumPy for regression tasks. The RNN model is designed to predict outputs based on input sequences and corresponding labels. While this example focuses on regression, the configuration can be easily adapted for classification tasks and other purposes.

In [26]:
import numpy as np

## Data Generation

- `sequence_length = 10`: Sets the length of input sequences.
- `X_train = np.random.rand(100, sequence_length)`: Generates 100 random input sequences of length `sequence_length`.
- `y_train = np.sum(X_train, axis=1).reshape(-1, 1)`: Generates labels by summing each input sequence and reshaping to a column vector.


In [27]:
# Define the sequence length
sequence_length = 10

# Generate random input sequences
X_train = np.random.rand(100, sequence_length)

# Generate labels (example: sum of the input sequence)
y_train = np.sum(X_train, axis=1).reshape(-1, 1)

## Model Initialization

- `input_size`, `hidden_size`, `output_size`: Define the sizes of input, hidden, and output layers.
- Weight and bias initialization:
  - `Wxh`, `Whh`, `Why`: Weight matrices from input to hidden, hidden to hidden, and hidden to output layers respectively.
  - `bh`, `by`: Bias vectors for the hidden and output layers.


In [28]:
input_size = sequence_length
hidden_size = 64
output_size = 1
# Initialize weights and biases
Wxh = np.random.randn(input_size, hidden_size) * 0.01  # Input to hidden
Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # Hidden to hidden
Why = np.random.randn(hidden_size, output_size) * 0.01  # Hidden to output
bh = np.zeros((1, hidden_size))  # Hidden bias
by = np.zeros((1, output_size))  # Output bias
print(Wxh.shape,Whh.shape,Why.shape,bh.shape,by.shape)

(10, 64) (64, 64) (64, 1) (1, 64) (1, 1)


3. Activation Functions
* tanh(x): Defines the hyperbolic tangent activation function.
* dtanh(x): Computes the derivative of the hyperbolic tangent function.

In [29]:
# Activation function (tanh)
def tanh(x):
    return np.tanh(x)
# Derivative of tanh
def dtanh(x):
    return 1 - np.square(np.tanh(x))


4. Forward Pass Function (forward_pass)
* Inputs: inputs (input sequences), targets (labels), h_prev (previous hidden state).
* Variables Initialization: xs, hs, ys, ps dictionaries to store input vectors, hidden states, pre-activation outputs, and final predictions.
* Forward Pass Loop: Iterates through each time step of the input sequences to compute hidden states and predictions.
* Loss Calculation: Computes the mean squared error loss between predictions (ps) and targets (targets).
* Returns: Loss, predictions (ps), final hidden state (hs), and input vectors (xs).

In [30]:
def forward_pass(inputs, targets, h_prev):
    # Initialize variables to store inputs, hidden states, and outputs
    xs, hs, ys, ps = {}, {}, {}, {}
    hs[-1] = np.copy(h_prev)

    loss = 0

    # Forward pass
    for t in range(len(inputs)):
        xs[t] = inputs[t].reshape(1, -1)
        hs[t] = tanh(np.dot(xs[t], Wxh) + np.dot(hs[t-1], Whh) + bh)
        ys[t] = np.dot(hs[t], Why) + by
        ps[t] = ys[t]  # Output without activation for simplicity

        # Compute loss (example: mean squared error)
        loss += np.mean(np.square(ps[t] - targets[t]))

    return loss / len(inputs), ps, hs, xs  # Return xs along with other outputs


# Training Loop Details

1. **Initialization**:
   - `learning_rate = 0.01`: Sets the learning rate for gradient descent optimization.
   - `num_epochs = 1000`: Defines the number of training epochs.

2. **Training Loop**:
   - Iterates through each epoch (`for epoch in range(num_epochs)`) to train the model.
   - Initializes the hidden state (`h_prev`) before each epoch.

3. **Forward Pass**:
   - Calls the `forward_pass` function to compute loss, predictions (`ps`), hidden states (`hs`), and input vectors (`xs`).

4. **Backpropagation Through Time (BPTT)**:
   - Computes gradients (`dWxh`, `dWhh`, `dWhy`, `dbh`, `dby`) for weight matrices and bias vectors using backpropagation through time (BPTT).
   - Accumulates gradients across time steps (`for t in reversed(range(sequence_length))`) and updates the gradients for each weight and bias.

5. **Weight and Bias Updates**:
   - Updates weights and biases (`Wxh`, `Whh`, `Why`, `bh`, `by`) using the computed gradients and learning rate.

6. **Monitoring Progress**:
   - Prints the loss every 100 epochs to monitor training progress (`if epoch % 100 == 0`).

This training loop implements the training process for a recurrent neural network (RNN) using backpropagation through time (BPTT) and stochastic gradient descent (SGD) optimization. It iteratively updates the model's weights and biases to minimize the loss over the training data.


In [31]:
learning_rate = 0.01
num_epochs = 100

# Training loop
for epoch in range(num_epochs):
    # Initialize hidden state
    h_prev = np.zeros((1, hidden_size))

    # Forward pass
    loss, ps, hs, xs = forward_pass(X_train, y_train, h_prev)  # Retrieve xs

    # Backpropagation through time (BPTT)
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dh_next = np.zeros_like(h_prev)

    # Backward pass through time
    for t in reversed(range(sequence_length)):
        # Compute gradients for output layer
        dy = 2 * (ps[t] - y_train[t])  # Example: derivative of MSE loss
        dWhy += np.dot(hs[t].T, dy)
        dby += dy

        # Backpropagate gradients to previous time step
        dh = np.dot(dy, Why.T) + dh_next
        dhraw = dtanh(hs[t]) * dh
        dbh += dhraw
        dWxh += np.dot(xs[t].T, dhraw)
        dWhh += np.dot(hs[t-1].T, dhraw)
        dh_next = np.dot(dhraw, Whh.T)

    # Update weights and biases using gradients and learning rate
    Wxh -= learning_rate * dWxh
    Whh -= learning_rate * dWhh
    Why -= learning_rate * dWhy
    bh -= learning_rate * dbh
    by -= learning_rate * dby

    # Print loss every 100 epochs
    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {loss}')


Epoch 0, Loss: 25.234006131569632
Epoch 10, Loss: 0.5730656976535442
Epoch 20, Loss: 0.5423560123253064
Epoch 30, Loss: 0.5102393562194583
Epoch 40, Loss: 0.47868200127174476
Epoch 50, Loss: 0.45161525097558886
Epoch 60, Loss: 0.4305648025442177
Epoch 70, Loss: 0.414982118508304
Epoch 80, Loss: 0.40356544889374446
Epoch 90, Loss: 0.3952657903344646



### Conclusion
* This RecurrentNeural network architecture consists of an input layer, a hidden layer with tanh activation, and an output layer for regression. It uses backpropagation through time (BPTT) with mean squared error loss for training.

* In the previous example, we utilized RNNs for regression tasks. There are numerous real-life scenarios where RNNs are applied for regression purposes.
* for example :

  1. **Energy Consumption Prediction:**
    - **Scenario**: Use historical energy usage data and weather conditions (temperature, humidity) to predict future energy consumption.

  2. **Sales Forecasting for Retail Stores:**
    - **Scenario**: Utilize historical sales data, promotional activities, and seasonal factors to forecast future sales in retail stores.
* Feel free to experiment with different architectures, activation functions, and hyperparameters to further explore neural network training.
