In [1]:
import tensorflow as tf
import numpy as np

In [2]:
doc='''Recurrent neural networks (RNNs) are deep learning models, typically used to solve problems with sequential input data such as time series. What are they, and how do we use them in time series forecasting?

RNNs are a type of neural network that retains a memory of what it has already processed and thus can learn from previous iterations during its training.

Probably you have done what most of us do when we hear any technical term for the first time. You have tried to understand what recurrent neural networks are by clicking on the top-listed non-ad Google search result. Then you will have found that Wikipedia’s article exhibits a high level of abstraction. It is of limited usefulness when we try to understand what RNNs are and what they are for: "A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable length sequences of inputs …. Recurrent neural networks are theoretically Turing complete and can run arbitrary programs to process arbitrary sequences of inputs." Say what?

Michael Phi provided an excellent, non-mathematical guide on RNNs in a previous Towards Data Science article of his: "Illustrated Guide to Recurrent Neural Networks | by Michael Phi | Towards Data Science". So did Will Koehrsen, in "Recurrent Neural Networks by Example in Python | by Will Koehrsen | Towards Data Science."

Let me summarize the basics we should understand about RNNs, in non-mathematical terms (and then I’d refer you to the additional explanations and illustrations in the two articles Michael and Will wrote in 2018).

A neural network – of which recurrent neural networks are one type, among other types such as convolutional networks – is composed of three elementary components: the input layer, the hidden layers, and the output layer. Each layer consists of so-called nodes (aka neurons).

I’ve read the following analogy for the three main types of neural networks, which are said to mimic human brain functions in specific ways. The following comparisons oversimplify, so best take them with a grain of salt.

the temporal lobe of our brain => artificial neural networks => mainly for classification and regression problems => one of the functions of the temporal lobe is long-term memory
the occipital lobe => convolutional neural networks => mainly for computer vision problems (though temporal convolutional networks, TCNs, can be applied to time series)
the frontal lobe => recurrent neural networks RNN => mainly for time series analysis, sequences, and lists – for instance, in language processing, which deals with sequences of characters, words, and sentences ordered by a grammar; or time series, which consist of temporal sequences of observations => one of the frontal lobe’s functions is short-term memory
Feed-forward neural networks (FFNNs) – such as the grandfather among neural networks, the original single-layer perceptron, developed in 1958— came before recurrent neural networks. In FFNNs, the information flows in only one direction: from the input layer, through the hidden layers, to the output layer, but never backwards in feedback loops. FFNN are often used in pattern recognition. The FFNN multiplies a matrix of weight factors with the inputs and generates the outputs from these weighted inputs. Feed-forward neural networks don’t retain a memory of the inputs they have processed. They suffer from anterograde amnesia, the inability to form new memories (similar to the protagonist in Christopher Nolan’s movie Memento – Wikipedia [this seemed a rare opportunity to mention anterograde amnesia and Memento in a data science article]).

A recurrent neural network, by contrast, retains a memory of what it has processed in its recent previous steps (we’ll come back to the "recent" qualifier in a minute). It makes recurrent connections by going through temporal feedback loops: the output of a preceding step is used as an input for the current process step. Unlike amnesiac FFNNs, this memory enables RNNs to process sequences of inputs without loosing track. The loops make it a recurrent network.'''
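The corpus above contrasts feed-forward networks, which map each input on its own, with RNNs, which feed the previous step's output back in as an extra input. Below is a minimal NumPy sketch of that difference. It assumes a plain tanh (Elman-style) recurrence, and the function names, weights, and sizes are made up for illustration; they do not come from the article:

```python
import numpy as np

def feedforward_step(x, W, b):
    # Feed-forward layer: the output depends only on the current input x
    return np.tanh(x @ W + b)

def rnn_forward(xs, W_x, W_h, b):
    """Run a vanilla (Elman-style) RNN over xs, shape [T, input_dim].

    Returns the hidden state after every step, shape [T, hidden_dim].
    """
    h = np.zeros(W_h.shape[0])  # the memory starts empty
    states = []
    for x_t in xs:
        # The previous hidden state h is fed back in: this is the recurrent loop
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

# Small hand-picked weights (illustrative values only)
W_x = np.array([[0.5, -0.3], [0.2, 0.8]])   # input -> hidden
W_h = np.array([[0.4, 0.1], [-0.2, 0.3]])   # hidden -> hidden (the "memory" weights)
b = np.zeros(2)

v = np.array([1.0, 0.0])
same_input_twice = np.stack([v, v])

ff_out = feedforward_step(same_input_twice, W_x, b)
rnn_out = rnn_forward(same_input_twice, W_x, W_h, b)

print(np.allclose(ff_out[0], ff_out[1]))    # True: no memory, same answer
print(np.allclose(rnn_out[0], rnn_out[1]))  # False: the second step remembers the first
```

When the same input arrives twice, the feed-forward layer returns the same output both times. The RNN returns two different hidden states, because its second step also sees the state left by the first. Setting `W_h` to zero removes the loop and turns the RNN back into a feed-forward layer applied step by step.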

## Data preprocessing

## Coding an LSTM from scratch

In [3]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical


In [4]:
# Tokenize words: each distinct word gets an integer id, starting at 1 (most frequent first)
tokenizer = Tokenizer()
tokenizer.fit_on_texts([doc])
vocab_size = len(tokenizer.word_index) + 1  # +1 because id 0 is reserved (padding) and is not a word
sequences = tokenizer.texts_to_sequences([doc])[0]

# Create input-output pairs for training with a sliding window
X_train, Y_train = [], []
seq_length = 3  # Number of context words used to predict the next word

for i in range(len(sequences) - seq_length):
    X_train.append(sequences[i:i+seq_length])  # seq_length consecutive word ids
    Y_train.append(sequences[i+seq_length])    # the word id that follows them

X_train = np.array(X_train)
Y_train = to_categorical(Y_train, num_classes=vocab_size)  # Convert targets to one-hot vectors
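The loop above is a sliding window. Each run of `seq_length` ids becomes an input, and the id right after it becomes the target, so a text of n tokens yields n - `seq_length` pairs. The 679 pairs reported below therefore imply 682 tokens. Below is a small standalone sketch on hypothetical ids, plus a loop-free variant. The variant is an alternative, not the notebook's method, and assumes NumPy 1.20+ for `sliding_window_view`:

```python
import numpy as np

def make_pairs(token_ids, seq_length):
    """Sliding window: each run of seq_length ids is an input, the id after it is the target."""
    X, Y = [], []
    for i in range(len(token_ids) - seq_length):
        X.append(token_ids[i:i + seq_length])
        Y.append(token_ids[i + seq_length])
    return np.array(X), np.array(Y)

def make_pairs_vectorized(token_ids, seq_length):
    """Same result without a Python loop: take windows of seq_length + 1 ids, split off the last."""
    windows = np.lib.stride_tricks.sliding_window_view(np.asarray(token_ids), seq_length + 1)
    return windows[:, :-1], windows[:, -1]

tokens = [9, 3, 5, 13, 10, 7]  # hypothetical token ids
X, Y = make_pairs(tokens, seq_length=3)
print(X.tolist())  # [[9, 3, 5], [3, 5, 13], [5, 13, 10]]
print(Y.tolist())  # [13, 10, 7]
```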


In [17]:
X_train

array([[  9,   3,   5],
       [  3,   5,  13],
       [  5,  13,  10],
       ...,
       [ 60, 298,  18],
       [298,  18,   4],
       [ 18,   4,   9]])

In [18]:
Y_train

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [5]:
X_train.shape,Y_train.shape,vocab_size

((679, 3), (679, 299), 299)
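These shapes follow from the preprocessing. There are 679 windows of 3 word ids, and each target is a one-hot row of width `vocab_size` = 299: 298 distinct words plus the reserved id 0. For integer labels, `to_categorical` amounts to picking rows of an identity matrix. The NumPy sketch below uses a made-up vocabulary size and labels:

```python
import numpy as np

vocab_size = 5                   # toy vocabulary: ids 1..4 are words, 0 is reserved
labels = np.array([1, 4, 2])     # hypothetical next-word ids
one_hot = np.eye(vocab_size, dtype="float32")[labels]

print(one_hot.shape)              # (3, 5)
print(one_hot.argmax(axis=1))     # [1 4 2]  (argmax recovers the ids)

# Size of the real Y_train as float32: rows * vocab_size * 4 bytes
print(679 * 299 * 4)              # 812084 bytes, about 0.8 MB
```

The one-hot matrix is small at this scale. With a larger vocabulary, you could keep integer targets and use `tf.keras.losses.sparse_categorical_crossentropy` to avoid building the matrix at all.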

In [7]:

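# Define a single LSTM cell: one time step of the recurrence plus a softmax read-out
class LSTMCell(tf.keras.layers.Layer):
    def __init__(self, hidden_size, vocab_size, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size
        # One dense layer computes all four gate pre-activations at once from [x_t, h_prev]
        self.dense_gate = tf.keras.layers.Dense(hidden_size * 4, activation=None, use_bias=True)
        self.dense_output = tf.keras.layers.Dense(vocab_size, activation="softmax")  # Softmax over the vocabulary for next-word prediction

    def call(self, x, h_prev, c_prev):
        # Concatenate along the feature axis: [batch, emb_dim] + [batch, hidden] -> [batch, emb_dim + hidden]
        concat_input = tf.concat([x, h_prev], axis=1)
        gates = self.dense_gate(concat_input)

        # Split into forget, input, and output gates plus the candidate memory g
        f_gate, i_gate, o_gate, g_gate = tf.split(gates, num_or_size_splits=4, axis=-1)

        f_gate = tf.sigmoid(f_gate)  # how much of the old cell state to keep
        i_gate = tf.sigmoid(i_gate)  # how much of the candidate to write
        o_gate = tf.sigmoid(o_gate)  # how much of the cell state to expose
        g_gate = tf.tanh(g_gate)     # candidate values in [-1, 1]

        c_next = f_gate * c_prev + i_gate * g_gate  # update the cell state (long-term memory)
        h_next = o_gate * tf.tanh(c_next)           # new hidden state (short-term output)
        y = self.dense_output(h_next)               # next-word probabilities

        return h_next, c_next, y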
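Below is the same cell arithmetic written in plain NumPy, so the equations can be checked by hand. `W` and `b` are hypothetical stand-ins for the kernel and bias of `dense_gate`. The gate order f, i, o, g matches the `tf.split` above, but that order is this notebook's own convention:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step, using the same gate order as LSTMCell (f, i, o, g).

    x: [batch, input_dim]; h_prev, c_prev: [batch, hidden]
    W: [input_dim + hidden, 4 * hidden]; b: [4 * hidden]
    """
    z = np.concatenate([x, h_prev], axis=1) @ W + b   # all gate pre-activations at once
    f, i, o, g = np.split(z, 4, axis=-1)
    f, i, o, g = sigmoid(f), sigmoid(i), sigmoid(o), np.tanh(g)
    c_next = f * c_prev + i * g      # keep part of the old memory, write part of the candidate
    h_next = o * np.tanh(c_next)     # expose a gated view of the memory
    return h_next, c_next

# Sanity check with all-zero weights: every gate is sigmoid(0) = 0.5 and g = tanh(0) = 0,
# so the cell simply halves its memory: c_next = 0.5 * c_prev.
batch, input_dim, hidden = 2, 8, 10   # embedding_dim = 8, hidden_size = 10 as in train()
x = np.ones((batch, input_dim))
h_prev = np.zeros((batch, hidden))
c_prev = np.full((batch, hidden), 2.0)
W = np.zeros((input_dim + hidden, 4 * hidden))
b = np.zeros(4 * hidden)

h_next, c_next = lstm_step(x, h_prev, c_prev, W, b)
print(c_next[0, 0])  # 1.0
```

The zero-weight check isolates the role of the forget gate. With f = 0.5 and nothing written, the cell state decays geometrically from one step to the next.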




In [22]:
# Training function for a custom LSTMCell model (full batch: every training pair in every step)
def train(X, Y, hidden_size=10, epochs=500, embedding_dim=8, learning_rate=0.01):
    seq_len = X.shape[1]      # context words per example (3 here), taken from the data
    num_classes = Y.shape[1]  # vocabulary size, taken from the one-hot targets

    # Initialize the Adam optimizer
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

    # Create an instance of the custom LSTM cell defined above
    lstm_cell = LSTMCell(hidden_size, num_classes)

    # Word embedding layer that converts token ids to dense vectors.
    # No mask_zero: this manual loop would ignore the mask, and id 0 never occurs in X.
    embedding_layer = tf.keras.layers.Embedding(input_dim=num_classes, output_dim=embedding_dim)

    # Training loop
    for epoch in range(epochs):
        with tf.GradientTape() as tape:
            # Start every epoch from zero hidden and cell states: shape [num_examples, hidden_size]
            h_prev = tf.zeros((X.shape[0], hidden_size), dtype=tf.float32)
            c_prev = tf.zeros((X.shape[0], hidden_size), dtype=tf.float32)

            # Embed input word ids into dense vectors: [num_examples, seq_len, embedding_dim]
            embedded_X = embedding_layer(X)

            # Unroll the cell over the time steps (the context words)
            for t in range(seq_len):
                x_t = embedded_X[:, t, :]  # embedding of the t-th word in every window
                h_prev, c_prev, y_t = lstm_cell(x_t, h_prev, c_prev)

            # Categorical cross-entropy between the final-step prediction and the true next word
            loss = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(Y, y_t))

        # The layers build their weights on the first forward pass, so collect the variables here
        variables = lstm_cell.trainable_variables + embedding_layer.trainable_variables
        gradients = tape.gradient(loss, variables)

        # Apply gradients to update the weights
        optimizer.apply_gradients(zip(gradients, variables))

        # Print the loss every 10 epochs
        if epoch % 10 == 0:
            print(f"Epoch {epoch}, Loss: {loss.numpy()}")

    # Return the trained LSTM cell and embedding layer
    return lstm_cell, embedding_layer


In [21]:
model,emb_layer=train(X_train, Y_train)

Epoch 0, Loss: 5.70045280456543
Epoch 10, Loss: 5.559117794036865
Epoch 20, Loss: 5.1533942222595215
Epoch 30, Loss: 4.954916477203369
Epoch 40, Loss: 4.762920379638672
Epoch 50, Loss: 4.494144439697266
Epoch 60, Loss: 4.206137180328369
Epoch 70, Loss: 3.912172555923462
Epoch 80, Loss: 3.6127102375030518
Epoch 90, Loss: 3.299605131149292
Epoch 100, Loss: 2.9815399646759033
Epoch 110, Loss: 2.6761324405670166
Epoch 120, Loss: 2.393909215927124
Epoch 130, Loss: 2.1423416137695312
Epoch 140, Loss: 1.9190465211868286
Epoch 150, Loss: 1.7267868518829346
Epoch 160, Loss: 1.5622824430465698
Epoch 170, Loss: 1.419525384902954
Epoch 180, Loss: 1.2973625659942627
Epoch 190, Loss: 1.1918106079101562
Epoch 200, Loss: 1.1000354290008545
Epoch 210, Loss: 1.0176762342453003
Epoch 220, Loss: 0.9413213729858398
Epoch 230, Loss: 0.8683884143829346
Epoch 240, Loss: 0.8026047348976135
Epoch 250, Loss: 0.7456746101379395
Epoch 260, Loss: 0.6946704387664795
Epoch 270, Loss: 0.6503537893295288
Epoch 280, Los
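The first logged loss, about 5.700, is roughly what an untrained model should produce. A softmax that spreads probability evenly over all 299 ids gives a categorical cross-entropy of ln(299), about 5.7004. The simplified NumPy version of the loss below assumes each row of `y_pred` already sums to 1, and the target ids are made up:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean over the batch of -sum(y_true * log(y_pred)); clipping avoids log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

vocab_size = 299
y_true = np.eye(vocab_size)[[5, 42, 7]]               # three hypothetical targets
uniform = np.full((3, vocab_size), 1.0 / vocab_size)  # an "untrained", uniform prediction
print(round(categorical_crossentropy(y_true, uniform), 4))  # 5.7004
```

There is no held-out data in this notebook. The falling loss therefore measures how well the model fits this one short text, not how well it generalizes.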


In [11]:
def predict_next_word(model, tokenizer, input_text, hidden_size, embedding_layer=None):
    """
    Predicts the next word given an input sequence.

    Parameters:
        model: The trained LSTMCell model.
        tokenizer: The tokenizer used for text preprocessing.
        input_text: A string containing the input sequence.
        hidden_size: The size of the LSTM hidden state (must match training).
        embedding_layer: The trained embedding layer; defaults to the global
            `emb_layer` returned by `train`.

    Returns:
        Predicted next word as a string, or None if the input is too short.
    """
    if embedding_layer is None:
        embedding_layer = emb_layer  # fall back to the notebook's trained embedding

    # Convert input text to word ids (words not seen during training are dropped)
    sequence = tokenizer.texts_to_sequences([input_text])[0]

    # Ensure sequence length matches training input length
    seq_length = 3  # Same as in training
    if len(sequence) < seq_length:
        print("Input sequence is too short!")
        return None
    sequence = sequence[-seq_length:]  # Keep only the last seq_length (3) words

    # Convert to a NumPy array of shape [1, seq_length]
    X_input = np.array([sequence])

    # Initialize hidden and cell states for a batch of one
    h_prev = tf.zeros((1, hidden_size), dtype=tf.float32)
    c_prev = tf.zeros((1, hidden_size), dtype=tf.float32)

    # Get word embeddings
    embedded_X = embedding_layer(X_input)

    # Pass through the LSTM cell one word at a time
    for t in range(seq_length):
        x_t = embedded_X[:, t, :]
        h_prev, c_prev, y_t = model(x_t, h_prev, c_prev)

    # Get the index of the most probable next word
    predicted_index = int(tf.argmax(y_t, axis=1).numpy()[0])

    # Convert index to word (index 0 is reserved and has no word)
    predicted_word = tokenizer.index_word.get(predicted_index, "<UNK>")

    return predicted_word
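`predict_next_word` returns a single word. To produce several words, one option is to call it repeatedly and append each prediction to the input (greedy decoding). The `generate` helper below is a hypothetical sketch of that loop. It accepts any function that maps the current text to one word, so it can be tested without TensorFlow:

```python
from typing import Callable, Optional

def generate(next_word: Callable[[str], Optional[str]], seed: str, n_words: int) -> str:
    """Greedy text generation: repeatedly append the predicted next word.

    next_word maps the current text to one word, or None to stop early.
    """
    text = seed.strip()
    for _ in range(n_words):
        word = next_word(text)
        if word is None:  # e.g. the input was too short to predict from
            break
        text = f"{text} {word}"
    return text
```

With the notebook's objects, a call might look like `generate(lambda t: predict_next_word(model, tokenizer, t, hidden_size=10), "we should understand about RNNs", 5)`. The model only sees the last 3 words and was trained on one short text, so greedy output may fall into repeating phrases.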



In [14]:
# Example usage
input_text = " we should understand about RNNs"
predicted_word = predict_next_word(model, tokenizer, input_text, hidden_size=10)
print(f"Predicted next word: {predicted_word}")


Predicted next word: in

