
#Intro To RNN

**Basic RNN Terms:**


**Recurrent Neural Networks (RNNs)**: A class of neural networks designed to recognize patterns in sequential data, such as text, genomes, handwriting, or spoken words. They are called "recurrent" because they apply the same operation to every element of a sequence, and each output depends on the previous computations through a hidden state that is fed back into the network (a loop).
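
To make the "loop" concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass. It assumes the common update rule `h_t = tanh(x_t @ W_x + h_{t-1} @ W_h + b)`; the function name `rnn_forward` and the random, untrained weights are illustrative only and are not taken from Keras.

```python
import numpy as np

def rnn_forward(x_seq, W_x, W_h, b):
    """Run a vanilla RNN over one sequence and return every hidden state.

    x_seq: (timesteps, n_features), W_x: (n_features, n_units),
    W_h: (n_units, n_units), b: (n_units,)
    """
    h = np.zeros(W_h.shape[0])           # initial hidden state
    states = []
    for x_t in x_seq:                    # the same weights are reused at every step
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)              # (timesteps, n_units)

rng = np.random.default_rng(0)
x_seq = rng.random((5, 1))               # 5 time steps, 1 feature
W_x = rng.normal(scale=0.5, size=(1, 4)) # illustrative, untrained weights
W_h = rng.normal(scale=0.5, size=(4, 4))
b = np.zeros(4)

states = rnn_forward(x_seq, W_x, W_h, b)
print(states.shape)                      # (5, 4)
```

Each row of `states` depends on the current input and on the previous row, which is the recurrence the definition describes.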

**Sequence to Sequence**
- **Sequence-to-Sequence (Seq2Seq)**: An architecture used mainly for translation tasks or any task that maps an input sequence to an output sequence. It comprises two main parts: an encoder (reads the input and compresses the information into a context vector) and a decoder (reads the context vector to produce the output).

**Sequence to Vector**
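- A model that reads an input sequence and produces a single fixed-size output. The usual example is the encoder part of a Seq2Seq model, where the input sequence is transformed into a fixed-size context vector. The sum-prediction models later in this notebook are also sequence-to-vector: they map a sequence to one number.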
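
The encoder/decoder data flow can be sketched in a few lines of NumPy. This is only a shape-level illustration under simplifying assumptions: the weights are random and untrained, the decoder receives no inputs besides the context vector, and the names `encode` and `decode` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, n_units, n_out_steps = 1, 4, 3

# Hypothetical, untrained weights, used only to show the data flow
enc_Wx = rng.normal(scale=0.5, size=(n_features, n_units))
enc_Wh = rng.normal(scale=0.5, size=(n_units, n_units))
dec_Wh = rng.normal(scale=0.5, size=(n_units, n_units))
dec_Wy = rng.normal(scale=0.5, size=(n_units, 1))

def encode(x_seq):
    """Sequence to vector: keep only the final hidden state."""
    h = np.zeros(n_units)
    for x_t in x_seq:
        h = np.tanh(x_t @ enc_Wx + h @ enc_Wh)
    return h                               # fixed-size context vector

def decode(context, steps):
    """Vector to sequence: unroll the decoder starting from the context."""
    h, outputs = context, []
    for _ in range(steps):
        h = np.tanh(h @ dec_Wh)
        outputs.append(h @ dec_Wy)
    return np.stack(outputs)               # (steps, 1)

context_short = encode(rng.random((5, n_features)))
context_long = encode(rng.random((30, n_features)))
print(context_short.shape, context_long.shape)    # (4,) (4,)
print(decode(context_short, n_out_steps).shape)   # (3, 1)
```

The context vector has the same size whether the input has 5 or 30 steps, which is what "fixed-size" means here.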

**Vanishing Gradients**
- As RNNs are trained with backpropagation through time, the gradient that reaches earlier time steps (or earlier layers in a deep network) can become extremely small, so the weights barely learn from distant inputs. This is caused by repeatedly multiplying factors smaller than one during backpropagation, which shrinks the gradient toward zero.
- The opposite problem, exploding gradients, occurs when the repeated factors are larger than one: gradients grow very large, weight updates become too aggressive, and training becomes unstable.
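
The effect of repeated multiplication can be shown numerically. The sketch below makes a simplifying assumption: the recurrence is linear, and the recurrent matrix is an orthogonal matrix times a `scale` factor, so the gradient norm after `k` steps is exactly `scale**k`. With a tanh activation the derivative is at most 1, so the gradient would shrink at least as fast. The function name `backprop_norms` is illustrative.

```python
import numpy as np

def backprop_norms(scale, steps=30, n_units=10, seed=0):
    """Gradient norm after each backward step through a linear recurrence."""
    rng = np.random.default_rng(seed)
    # Orthogonal matrix times a scalar, so every singular value equals `scale`
    Q, _ = np.linalg.qr(rng.normal(size=(n_units, n_units)))
    W_h = scale * Q
    grad = np.ones(n_units) / np.sqrt(n_units)   # unit-norm gradient at the last step
    norms = []
    for _ in range(steps):
        grad = W_h.T @ grad                      # one step back in time
        norms.append(np.linalg.norm(grad))
    return norms

for scale in (0.5, 1.0, 1.5):
    print(f"scale={scale}: gradient norm after 30 steps = {backprop_norms(scale)[-1]:.3e}")
```

With `scale=0.5` the norm after 30 steps is `0.5**30` (about 1e-9, vanishing); with `scale=1.5` it is `1.5**30` (about 2e5, exploding).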

**Various Activation Functions**
- **ReLU (Rectified Linear Unit)**: Outputs the input for positive values and zero for negative values. It’s simple and fast, but units can "die" during training: once a unit outputs zero for every input, its gradient is zero and it stops learning.
- **Leaky ReLU**: A variant of ReLU that has a small positive slope even for negative values, preventing units from "dying".
- **ELU (Exponential Linear Unit)**: Like ReLU for positive inputs, but for negative inputs it follows a smooth exponential curve that saturates at a negative value. It is often reported to converge faster than ReLU, though the benefit depends on the task.
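
The three functions above are short enough to write out in NumPy. The default slopes (`alpha=0.01` for Leaky ReLU, `alpha=1.0` for ELU) are common choices, assumed here for illustration.

```python
import numpy as np

def relu(x):
    # max(0, x): passes positives through, zeroes out negatives
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small slope `alpha` for negative inputs keeps the gradient non-zero
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # exponential curve for negative inputs, saturating at -alpha
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, -0.5, 0.0, 1.5])
print("relu:      ", relu(x))
print("leaky_relu:", leaky_relu(x))
print("elu:       ", elu(x))
```

For the negative inputs, ReLU returns exactly zero, Leaky ReLU returns a small negative number, and ELU returns a value between -1 and 0.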

**Batch Normalization**
- A method to speed up training and reduce sensitivity to initialization. During training it normalizes each feature of a layer's input to have a mean of zero and a variance of one, using the mean and variance of the current batch, and then applies a learned scale and shift.
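
A minimal sketch of the training-time computation, assuming a 2-D batch of shape `(batch_size, n_features)`. The `gamma` and `beta` arguments stand for the learnable scale and shift that standard batch normalization applies after normalizing; `eps` is a small constant that avoids division by zero. Inference-time running averages are left out.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature using the statistics of the current batch."""
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta              # learnable scale and shift

rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(32, 4))
out = batch_norm(batch)

print("mean per feature:    ", out.mean(axis=0).round(6))
print("variance per feature:", out.var(axis=0).round(3))
```

Whatever the scale of the incoming batch, the output features have a mean close to 0 and a variance close to 1.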

**Weights Initialization**
- Proper initialization ensures that activations and gradients don’t vanish or explode during initial epochs. Popular techniques include Xavier/Glorot and He initialization.
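
Both schemes pick the weight scale from the layer's fan-in and fan-out. The sketch below uses the standard formulas (Glorot uniform: limit `sqrt(6 / (fan_in + fan_out))`; He normal: standard deviation `sqrt(2 / fan_in)`). It draws from a plain normal distribution for He, which is an assumption of this sketch; library implementations may differ in details such as truncation.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    # Uniform in [-limit, limit], giving variance 2 / (fan_in + fan_out)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out, rng):
    # Zero-mean normal with variance 2 / fan_in, commonly paired with ReLU
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W_glorot = glorot_uniform(256, 128, rng)
W_he = he_normal(256, 128, rng)

print("Glorot variance:", W_glorot.var(), "target:", 2.0 / (256 + 128))
print("He variance:    ", W_he.var(), "target:", 2.0 / 256)
```

The sampled variances land close to their targets, which is what keeps activations and gradients at a stable scale from layer to layer at the start of training.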

**Gradient Clipping**
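- A technique used to combat the exploding gradients problem. You choose a threshold, and any gradient component whose magnitude exceeds it is set to the threshold (clipping by value). A common variant rescales the whole gradient vector when its norm exceeds the threshold (clipping by norm), which preserves the gradient's direction.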
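
Here is a small NumPy sketch of clipping by value, as described above, next to clipping by norm, a common alternative that rescales the whole gradient instead of individual components. The function names are illustrative.

```python
import numpy as np

def clip_by_value(grad, threshold):
    # Each component is limited to the range [-threshold, threshold]
    return np.clip(grad, -threshold, threshold)

def clip_by_norm(grad, max_norm):
    # The whole vector is rescaled if its L2 norm exceeds max_norm
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

grad = np.array([3.0, -4.0, 0.0])          # L2 norm is 5
print(clip_by_value(grad, 1.0))            # components become 1, -1, 0
print(clip_by_norm(grad, 1.0))             # components become 0.6, -0.8, 0
```

Clipping by value can change the direction of the gradient, while clipping by norm only shortens it. In Keras, optimizers accept `clipvalue` and `clipnorm` arguments for these two behaviours.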

This summary provides a basic understanding of RNNs and related concepts. However, the field is vast and evolving, and further reading will give you a deeper understanding of each topic.

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

In [12]:
# Generating random data: 10000 sequences of length 5
X = np.random.rand(10000, 5, 1)  # 10000 samples, each of length 5 and having 1 feature
y = np.sum(X, axis=1)  # sum over the 5 time steps of each sample; y has shape (10000, 1)

# Simple RNN model
model = Sequential()

# Add an RNN layer with 10 units
model.add(SimpleRNN(10, input_shape=(5, 1)))

# Output layer to predict the sum (regression problem)
model.add(Dense(1, activation='linear'))

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Train the model
model.fit(X, y, epochs=10, batch_size=32)



Epoch 1/10
...
Epoch 10/10

In [None]:
# Testing the model
X_new = np.array([[[0.1], [0.2], [0.3], [0.4], [0.5]]])  # Should be close to 1.5
print(model.predict(X_new))

[[1.5042027]]

In [13]:
np.sum(X_new)

1.5

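###Explanation

The cells below inspect the training data to confirm that each target in `y` is the sum of the five values in the matching sequence in `X`.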

In [2]:
X = np.random.rand(10000, 5, 1)  # 10000 samples, each of length 5 and having 1 feature
y = np.sum(X, axis=1)

In [8]:
y.shape

(10000, 1)

In [9]:
X[0]

array([[0.82023843],
       [0.57754691],
       [0.0155293 ],
       [0.27080635],
       [0.92298352]])

In [10]:
y[0]

array([2.60710452])

In [11]:
np.sum(X[0])

2.6071045154346812

##The Problem With Simple RNNs

The difficulty in capturing long-term dependencies in vanilla RNNs is often not evident in short sequences like the one used in the example. However, when you deal with longer sequences, the problem becomes more pronounced.

To illustrate this, let's modify the simple example:

- Instead of summing the entire sequence, we will create a task where the target is influenced by the first and the last elements of the sequence.

- We'll increase the sequence length significantly.

Here's an illustrative example:

**Task**: Given a sequence, the target is the sum of the first and the last element.

In [14]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

In [15]:

# Generating random data: 10000 sequences of length 30
sequence_length = 30
X = np.random.rand(10000, sequence_length, 1)  # 10000 samples
y = X[:, 0] + X[:, -1]

# Simple RNN model
model = Sequential()

# Add an RNN layer with 10 units
model.add(SimpleRNN(10, input_shape=(sequence_length, 1)))

# Output layer to predict the sum (regression problem)
model.add(Dense(1, activation='linear'))

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Train the model
model.fit(X, y, epochs=50, batch_size=32)


Epoch 1/50
...
Epoch 50/50


<keras.src.callbacks.History at 0x7a0fa840a200>

In [16]:
# Testing the model
X_new = np.random.rand(1, sequence_length, 1)
true_sum = X_new[0, 0] + X_new[0, -1]
predicted_sum = model.predict(X_new)

print(f"True sum: {true_sum[0]}")
print(f"Predicted sum: {predicted_sum[0][0]}")

True sum: 1.1793697607097735
Predicted sum: 1.1744807958602905
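
A single prediction says little about whether the network really remembers the first element of a 30-step sequence. One way to put the model's MSE in context is to compare it with simple baselines that ignore the first element. The sketch below uses only NumPy and freshly generated test data; the two baselines are illustrative choices and were not part of the original experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
sequence_length = 30
X_test = rng.random((100_000, sequence_length, 1))
y_test = X_test[:, 0] + X_test[:, -1]

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

# Baseline 1: ignore the input and always predict the mean target (0.5 + 0.5)
mse_constant = mse(np.full_like(y_test, 1.0), y_test)

# Baseline 2: use the last element only and guess 0.5 for the first one
mse_last_only = mse(X_test[:, -1] + 0.5, y_test)

print(f"constant baseline MSE:  {mse_constant:.4f}")   # about 1/6
print(f"last-only baseline MSE: {mse_last_only:.4f}")  # about 1/12
```

Because each element is uniform on [0, 1) with variance 1/12, a model that has forgotten the first element cannot do better than about 1/12 (roughly 0.083). If the trained model's MSE on held-out data, for example from `model.evaluate(X_test, y_test)`, is well below that, it is using information from the start of the sequence; if it sits near 1/12, it is not.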
