<a href="https://colab.research.google.com/github/Festuskipkoech/Festus_data-science/blob/main/RNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Core Concepts of Recurrent Neural Networks (RNNs)

## 1. **Introduction to RNNs**
- RNNs are a class of neural networks designed to process sequential data.
- They are ideal for tasks where context or temporal dependencies are important, such as time series prediction, language modeling, and speech recognition.

## 2. **Key Features of RNNs**
- **Sequence Handling**:
  - RNNs handle variable-length input sequences by processing the data one time step at a time.
- **Recurrent Connections**:
  - The hidden state from the previous time step is fed back in at the current step, allowing the network to maintain a memory of past information.
- **Shared Weights**:
  - The same weights are applied at every time step, so the number of parameters does not depend on the sequence length.

## 3. **Architecture of RNNs**
### a. **Basic RNN Cell**
- **Input**: The current input vector (\(x_t\)) and the hidden state from the previous time step (\(h_{t-1}\)). The hidden state is the component that lets the network capture temporal dependencies and keep a context, or "memory", of previous inputs.
- **Hidden State Update**:
  \[
  h_t = \tanh(W_h \cdot h_{t-1} + W_x \cdot x_t + b)
  \]
  - \(W_h\): Weights for the hidden state.
  - \(W_x\): Weights for the input.
  - \(b\): Bias term.
- **Output**:
  \[
  y_t = W_y \cdot h_t
  \]
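
The two equations above can be traced in a few lines of plain Python. This is only an illustrative sketch: the helper names (`matvec`, `rnn_forward`), the zero initial hidden state, and the toy weight values are assumptions, not part of the Keras model used later in this notebook.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_forward(xs, W_h, W_x, b, W_y):
    """Run a basic RNN over a sequence xs of input vectors.

    h_t = tanh(W_h h_{t-1} + W_x x_t + b),  y_t = W_y h_t
    """
    h = [0.0] * len(b)  # assumed initial hidden state h_0 = 0
    ys = []
    for x_t in xs:  # the same weights are reused at every time step
        pre = [rh + rx + bi
               for rh, rx, bi in zip(matvec(W_h, h), matvec(W_x, x_t), b)]
        h = [math.tanh(p) for p in pre]
        ys.append(matvec(W_y, h))
    return ys, h

# Toy weights (hypothetical): 2 hidden units, 1 input feature, 1 output
W_h = [[0.5, -0.1], [0.2, 0.3]]
W_x = [[1.0], [-1.0]]
b = [0.0, 0.0]
W_y = [[1.0, 1.0]]

# The same function and weights handle sequences of different lengths
ys_short, _ = rnn_forward([[1.0], [0.5]], W_h, W_x, b, W_y)
ys_long, _ = rnn_forward([[1.0], [0.5], [-0.2], [0.0]], W_h, W_x, b, W_y)
print(len(ys_short), len(ys_long))  # prints: 2 4
```

Because the weights are shared, the first two outputs of the longer sequence are identical to the outputs of the shorter one.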

### b. **Limitations of Basic RNNs**
- **Vanishing Gradient Problem**:
  - During training on long sequences, gradients can shrink toward zero as they are propagated back through many time steps, making it difficult to learn long-term dependencies.
- **Exploding Gradient Problem**:
  - Gradients can grow excessively large, causing instability during training.
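
Both problems can be seen in a deliberately simplified setting. The sketch below assumes a scalar linear recurrence \(h_t = w \cdot h_{t-1}\) (no tanh, no input), for which the gradient of \(h_T\) with respect to \(h_0\) is exactly \(w^T\). The function name and the values of `w` are hypothetical.

```python
def gradient_factor(w, steps):
    """d h_T / d h_0 for the linear recurrence h_t = w * h_{t-1}."""
    factor = 1.0
    for _ in range(steps):
        factor *= w  # the chain rule contributes one factor of w per time step
    return factor

for w in (0.9, 1.1):
    print(w, [gradient_factor(w, T) for T in (10, 50, 100)])
```

With \(|w| < 1\) the factor decays geometrically, and with \(|w| > 1\) it grows geometrically. In a real RNN the factor is a product of matrices involving \(W_h\) and the tanh derivative, but the same repeated multiplication is what drives both problems.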

## 4. **Improved RNN Variants**
### a. **Long Short-Term Memory (LSTM)**
- Addresses the vanishing gradient problem by introducing memory cells and gates:
  - **Forget Gate**: Decides which information to discard from the cell state.
  - **Input Gate**: Determines what new information to store.
  - **Output Gate**: Controls how much of the cell state is exposed as the hidden state \(h_t\).
- Equation for cell state update:
  \[
  C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t
  \]
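
To make the gates concrete, here is a minimal single-unit (scalar) LSTM step. It assumes the standard formulation, in which \(f_t\), \(i_t\), and the output gate \(o_t\) are sigmoid gates, \(\tilde{C}_t\) is a tanh candidate, and the hidden state is \(h_t = o_t \cdot \tanh(C_t)\). The layout of the parameter dictionary and all numeric values are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a single-unit LSTM. p maps gate name -> (w_x, w_h, bias)."""
    def pre(gate):
        w_x, w_h, bias = p[gate]
        return w_x * x_t + w_h * h_prev + bias

    f_t = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i_t = sigmoid(pre("input"))         # how much of the candidate to write
    o_t = sigmoid(pre("output"))        # how much of the cell state to expose
    c_tilde = math.tanh(pre("cand"))    # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde  # cell state update from the text
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t

# Hypothetical parameters: forget gate almost fully open, input gate almost closed
params = {
    "forget": (0.0, 0.0, 20.0),
    "input": (0.0, 0.0, -20.0),
    "output": (0.0, 0.0, 0.0),
    "cand": (1.0, 0.0, 0.0),
}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.7, p=params)
print(round(c_t, 6))  # the old cell state (0.7) passes through nearly unchanged
```

When the forget gate stays close to 1 and the input gate close to 0, the cell state is carried forward almost untouched, which is how the LSTM preserves information over many steps.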

### b. **Gated Recurrent Unit (GRU)**
- A simpler alternative to LSTMs with fewer parameters.
- Combines the forget and input gates into a single **update gate**.
- Equation for hidden state update:
  \[
  h_t = z_t \cdot h_{t-1} + (1 - z_t) \cdot \tilde{h}_t
  \]
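
A single-unit (scalar) sketch of one GRU step, using the update rule above. It assumes the usual formulation with an additional reset gate \(r_t\) that scales \(h_{t-1}\) inside the candidate \(\tilde{h}_t\). Note that some references swap the roles of \(z_t\) and \(1 - z_t\). The parameter layout and values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x_t, h_prev, p):
    """One step of a single-unit GRU. p maps name -> (w_x, w_h, bias)."""
    w_x, w_h, bias = p["update"]
    z_t = sigmoid(w_x * x_t + w_h * h_prev + bias)  # update gate
    w_x, w_h, bias = p["reset"]
    r_t = sigmoid(w_x * x_t + w_h * h_prev + bias)  # reset gate
    w_x, w_h, bias = p["cand"]
    h_tilde = math.tanh(w_x * x_t + w_h * (r_t * h_prev) + bias)  # candidate
    return z_t * h_prev + (1.0 - z_t) * h_tilde  # update rule from the text

base = {"reset": (0.0, 0.0, 0.0), "cand": (1.0, 0.0, 0.0)}
keep = gru_step(1.0, 0.3, {**base, "update": (0.0, 0.0, 20.0)})      # z_t close to 1
replace = gru_step(1.0, 0.3, {**base, "update": (0.0, 0.0, -20.0)})  # z_t close to 0
print(keep, replace)
```

With \(z_t\) near 1 the previous hidden state is kept, and with \(z_t\) near 0 it is replaced by the candidate.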

## 5. **Training RNNs**
### a. **Loss Function**
- The loss function measures the difference between the predicted and actual labels.
- Commonly used loss functions depend on the task:
  - Mean Squared Error (MSE) for regression.
  - Cross-Entropy Loss for classification.
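
Both losses are short enough to write out by hand. The sketch below uses plain Python. The function names and the clipping constant `eps` are assumptions for illustration, and the cross-entropy shown is the binary form, which is the one that applies to a single sigmoid output.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error for regression targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Average binary cross-entropy for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

print(mse([1.0, 2.0], [1.0, 4.0]))  # prints: 2.0
print(binary_cross_entropy([1, 0], [0.9, 0.2]))   # confident and correct: low loss
print(binary_cross_entropy([1, 0], [0.1, 0.8]))   # confident and wrong: high loss
```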

### b. **Backpropagation Through Time (BPTT)**
- Extends standard backpropagation to handle sequences by unrolling the RNN through time.
- Computes the gradient contribution of every time step and sums them, since the weights are shared across steps, before updating the weights.
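
A minimal sketch of BPTT for a scalar RNN, checked against a finite-difference gradient. The setup is an assumption chosen to keep the code short: a single hidden unit, a zero initial state, and a squared-error loss on the final hidden state only. All names and values are hypothetical.

```python
import math

def forward(xs, w_h, w_x, b):
    """Unroll h_t = tanh(w_h * h_{t-1} + w_x * x_t + b); returns [h_0, ..., h_T]."""
    hs = [0.0]
    for x_t in xs:
        hs.append(math.tanh(w_h * hs[-1] + w_x * x_t + b))
    return hs

def sequence_loss(xs, target, w_h, w_x, b):
    """Squared error on the final hidden state."""
    return 0.5 * (forward(xs, w_h, w_x, b)[-1] - target) ** 2

def bptt(xs, target, w_h, w_x, b):
    """Gradients of sequence_loss with respect to (w_h, w_x, b)."""
    hs = forward(xs, w_h, w_x, b)
    grad_w_h = grad_w_x = grad_b = 0.0
    dh = hs[-1] - target                  # dL/dh_T
    for t in range(len(xs), 0, -1):       # walk backward through time
        dpre = dh * (1.0 - hs[t] ** 2)    # through tanh: d tanh(a)/da = 1 - tanh(a)^2
        grad_w_h += dpre * hs[t - 1]      # shared weights: contributions are summed
        grad_w_x += dpre * xs[t - 1]
        grad_b += dpre
        dh = dpre * w_h                   # pass the gradient on to h_{t-1}
    return grad_w_h, grad_w_x, grad_b

xs, target = [0.5, -0.3, 0.8], 0.2
w_h, w_x, b = 0.7, 1.2, 0.1
grads = bptt(xs, target, w_h, w_x, b)

# Finite-difference check for w_h
eps = 1e-6
numeric_w_h = (sequence_loss(xs, target, w_h + eps, w_x, b)
               - sequence_loss(xs, target, w_h - eps, w_x, b)) / (2 * eps)
print(grads[0], numeric_w_h)
```

The two printed values should agree to several decimal places, which is a quick way to confirm that the backward pass matches the forward computation.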

## 6. **Applications of RNNs**
- **Natural Language Processing (NLP)**:
  - Sentiment analysis, text generation, machine translation.
- **Speech Recognition**:
  - Transcribing spoken words into text.
- **Time Series Forecasting**:
  - Predicting stock prices, weather, or other temporal data.
- **Video Analysis**:
  - Action recognition in videos.

## 7. **Practical Tips**
- Use LSTMs or GRUs for tasks requiring long-term dependencies.
- Regularize with techniques like dropout to prevent overfitting.
- Experiment with bidirectional RNNs for tasks requiring both past and future context (e.g., text translation).
- Monitor gradient norms during training. Gradient clipping limits exploding gradients; vanishing gradients are better addressed with gated architectures such as LSTMs or GRUs.
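
Gradient clipping by norm can be sketched as follows. This is a plain-Python illustration over a flat list of gradient values. The function name and the threshold are hypothetical, and real frameworks apply the same idea to tensors.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm; also return the original norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads), norm  # small gradients are left unchanged
    scale = max_norm / norm
    return [g * scale for g in grads], norm

clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm, clipped)
```

Clipping keeps the direction of the gradient and only shrinks its length. Keras optimizers expose this through arguments such as `clipnorm` and `clipvalue`.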

## 8. **Advanced Concepts**
- **Bidirectional RNNs**:
  - Process the sequence in both forward and backward directions to capture complete context.
- **Attention Mechanisms**:
  - Focus on relevant parts of the sequence, enhancing performance for tasks like machine translation.
- **Sequence-to-Sequence Models**:
  - Used in applications like chatbot development and language translation.
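
As an illustration of the attention idea, the sketch below scores each hidden state against a query vector with a dot product, turns the scores into weights with a softmax, and returns the weighted sum as a context vector. Dot-product scoring is only one common choice and is assumed here. The function names and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(query, states):
    """Return (attention weights, context vector) for one query over a list of states."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[d] for w, state in zip(weights, states))
               for d in range(dim)]
    return weights, context

states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical encoder hidden states
weights, context = dot_product_attention([2.0, 2.0], states)
print(weights)
print(context)
```

The third state has the largest dot product with the query, so it receives the largest weight and dominates the context vector.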


In [2]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras import layers, models

# Load the dataset, keeping only the 1,000 most frequent words,
# and pad or truncate every review to 500 tokens
max_features = 1000
maxlen = 500
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

# Build the model
model = models.Sequential()
model.add(layers.Embedding(max_features, 32))
model.add(layers.SimpleRNN(32))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
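
The parameter counts for this architecture can be worked out by hand. The sketch below assumes the standard layer definitions (an embedding table, a SimpleRNN with input weights, recurrent weights, and a bias, and a dense layer with a bias). The helper names are hypothetical, and the totals are what a built model would be expected to report.

```python
def embedding_params(vocab_size, embed_dim):
    return vocab_size * embed_dim

def simple_rnn_params(input_dim, units):
    # input weights + recurrent weights + bias
    return units * input_dim + units * units + units

def dense_params(input_dim, units):
    return input_dim * units + units

counts = {
    "Embedding(1000, 32)": embedding_params(1000, 32),
    "SimpleRNN(32)": simple_rnn_params(32, 32),
    "Dense(1)": dense_params(32, 1),
}
for name, n in counts.items():
    print(name, n)
print("total", sum(counts.values()))  # prints: total 34113
```

None of these counts depends on `maxlen`, because the recurrent weights are shared across all 500 time steps.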


In [3]:
# compile and train the model
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.2)

Epoch 1/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 31s 188ms/step - accuracy: 0.5922 - loss: 0.6635 - val_accuracy: 0.7796 - val_loss: 0.4805
Epoch 2/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 43s 205ms/step - accuracy: 0.7671 - loss: 0.4982 - val_accuracy: 0.6454 - val_loss: 0.6761
Epoch 3/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 30s 191ms/step - accuracy: 0.7996 - loss: 0.4469 - val_accuracy: 0.7510 - val_loss: 0.5046
Epoch 4/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 42s 198ms/step - accuracy: 0.8205 - loss: 0.4127 - val_accuracy: 0.8178 - val_loss: 0.4344
Epoch 5/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 39s 185ms/step - accuracy: 0.8373 - loss: 0.3793 - val_accuracy: 0.8264 - val_loss: 0.4133
Epoch 6/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 41s 187ms/step - accuracy: 0.8190 - loss: 0.4028 - val_accuracy: 0.7982 - val_loss: 0.4481
Epoch 7/10

<keras.src.callbacks.history.History at 0x7c593d23dae0>

In [4]:
# evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy', test_acc)

782/782 ━━━━━━━━━━━━━━━━━━━━ 24s 31ms/step - accuracy: 0.7877 - loss: 0.4868
Test accuracy 0.7874799966812134


In [5]:
# Load the dataset again, this time keeping the 10,000 most frequent words,
# and pad or truncate every review to 500 tokens
max_features = 10000
maxlen = 500
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

# Build the model
model = models.Sequential()
model.add(layers.Embedding(max_features, 32))
model.add(layers.LSTM(32, dropout=0.2, recurrent_dropout=0.2))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()

# compile and train the model
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['Accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.2)

# evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print("Test Accuracy", test_acc)


Epoch 1/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 65s 398ms/step - Accuracy: 0.5606 - loss: 0.6824 - val_Accuracy: 0.7542 - val_loss: 0.5412
Epoch 2/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 61s 386ms/step - Accuracy: 0.7723 - loss: 0.5022 - val_Accuracy: 0.8366 - val_loss: 0.3837
Epoch 3/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 64s 407ms/step - Accuracy: 0.8387 - loss: 0.3874 - val_Accuracy: 0.8434 - val_loss: 0.3660
Epoch 4/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 80s 393ms/step - Accuracy: 0.8582 - loss: 0.3492 - val_Accuracy: 0.8370 - val_loss: 0.3757
Epoch 5/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 61s 390ms/step - Accuracy: 0.8701 - loss: 0.3200 - val_Accuracy: 0.8530 - val_loss: 0.3573
Epoch 6/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 63s 400ms/step - Accuracy: 0.8852 - loss: 0.2901 - val_Accuracy: 0.8426 - val_loss: 0.3687
Epoch 7/10