In [18]:
import models.sd_gru as sd_gru
import models.min_gru as min_gru


In [19]:
import tensorflow as tf
import numpy as np
import time
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical

In [20]:
doc='''Recurrent neural networks (RNNs) are deep learning models, typically used to solve problems with sequential input data such as time series. What are they, and how do we use them in time series forecasting?

RNNs are a type of neural network that retains a memory of what it has already processed and thus can learn from previous iterations during its training.

Probably you have done what most of us do when we hear any technical term for the first time. You have tried to understand what recurrent neural networks are by clicking on the top-listed non-ad Google search result. Then you will have found that Wikipedia’s article exhibits a high level of abstraction. It is of limited usefulness when we try to understand what RNNs are and what they are for: "A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable length sequences of inputs …. Recurrent neural networks are theoretically Turing complete and can run arbitrary programs to process arbitrary sequences of inputs." Say what?

Michael Phi provided an excellent, non-mathematical guide on RNNs in a previous Towards Data Science article of his: "Illustrated Guide to Recurrent Neural Networks | by Michael Phi | Towards Data Science". So did Will Koehrsen, in "Recurrent Neural Networks by Example in Python | by Will Koehrsen | Towards Data Science."

Let me summarize the basics we should understand about RNNs, in non-mathematical terms (and then I’d refer you to the additional explanations and illustrations in the two articles Michael and Will wrote in 2018).

A neural network – of which recurrent neural networks are one type, among other types such as convolutional networks – is composed of three elementary components: the input layer, the hidden layers, and the output layer. Each layer consists of so-called nodes (aka neurons).

I’ve read the following analogy for the three main types of neural networks, which are said to mimic human brain functions in specific ways. The following comparisons oversimplify, so best take them with a grain of salt.

the temporal lobe of our brain => artificial neural networks => mainly for classification and regression problems => one of the functions of the temporal lobe is long-term memory
the occipital lobe => convolutional neural networks => mainly for computer vision problems (though temporal convolutional networks, TCNs, can be applied to time series)
the frontal lobe => recurrent neural networks RNN => mainly for time series analysis, sequences, and lists – for instance, in language processing, which deals with sequences of characters, words, and sentences ordered by a grammar; or time series, which consist of temporal sequences of observations => one of the frontal lobe’s functions is short-term memory
Feed-forward neural networks (FFNNs) – such as the grandfather among neural networks, the original single-layer perceptron, developed in 1958 – came before recurrent neural networks. In FFNNs, the information flows in only one direction: from the input layer, through the hidden layers, to the output layer, but never backwards in feedback loops. FFNNs are often used in pattern recognition. The FFNN multiplies a matrix of weight factors with the inputs and generates the outputs from these weighted inputs. Feed-forward neural networks don’t retain a memory of the inputs they have processed. They suffer from anterograde amnesia, the inability to form new memories (similar to the protagonist in Christopher Nolan’s movie Memento – Wikipedia [this seemed a rare opportunity to mention anterograde amnesia and Memento in a data science article]).

A recurrent neural network, by contrast, retains a memory of what it has processed in its recent previous steps (we’ll come back to the "recent" qualifier in a minute). It makes recurrent connections by going through temporal feedback loops: the output of a preceding step is used as an input for the current process step. Unlike amnesiac FFNNs, this memory enables RNNs to process sequences of inputs without losing track. The loops make it a recurrent network.'''
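The corpus describes the feedback loop in words: the state produced at one step is fed into the next. A minimal NumPy sketch of that loop for a textbook GRU cell (Cho et al., 2014) follows. The helpers `gru_step` and `gru_forward`, the random weights and the sizes are all hypothetical and for illustration only; this is not the code inside `models.sd_gru`, whose internals are not shown in this notebook.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One step of a textbook GRU cell (hypothetical helper, not sd_gru's code)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(x_t @ Wz + h_prev @ Uz + bz)             # update gate
    r = sigmoid(x_t @ Wr + h_prev @ Ur + br)             # reset gate
    h_cand = np.tanh(x_t @ Wh + (r * h_prev) @ Uh + bh)  # candidate state
    return (1.0 - z) * h_prev + z * h_cand               # blend old state and candidate

def gru_forward(xs, hidden_size, seed=0):
    """Run the cell over a (time, features) array with random (untrained) weights."""
    rng = np.random.default_rng(seed)
    input_size = xs.shape[1]
    params = []
    for _ in range(3):  # one (W, U, b) triple per gate: z, r, candidate
        params += [rng.normal(0, 0.1, (input_size, hidden_size)),
                   rng.normal(0, 0.1, (hidden_size, hidden_size)),
                   np.zeros(hidden_size)]
    h = np.zeros(hidden_size)
    states = []
    for x_t in xs:  # the feedback loop: h from step t-1 is an input at step t
        h = gru_step(x_t, h, params)
        states.append(h)
    return np.stack(states)

xs = np.random.default_rng(1).normal(size=(5, 4))  # 5 time steps, 4 features
hs = gru_forward(xs, hidden_size=3)
print(hs.shape)  # (5, 3)
```

Because the last state depends on every earlier input through `h`, perturbing the first time step changes the final state. Some libraries (Keras among them) swap the roles of `z` and `1 - z` in the last line; the two forms are equivalent up to relabelling the gate.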

In [21]:
# Tokenize words
tokenizer = Tokenizer()
tokenizer.fit_on_texts([doc])
vocab_size = len(tokenizer.word_index) + 1  # +1 because Tokenizer ids start at 1 (index 0 is reserved)
sequences = tokenizer.texts_to_sequences([doc])[0]

# Create input-output pairs for training
X_train, Y_train = [], []
seq_length = 3  # Number of context words used to predict the next word

for i in range(len(sequences) - seq_length):
    X_train.append(sequences[i:i+seq_length])
    Y_train.append(sequences[i+seq_length])

X_train = np.array(X_train)
Y_train = to_categorical(Y_train, num_classes=vocab_size)  # Convert output to one-hot
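The cell above builds training pairs with a sliding window: each run of `seq_length` token ids is an input, and the id immediately after it is the target. The sketch below repeats the same logic on a toy id list so the shapes are easy to check. `make_windows` is a hypothetical helper, and `np.eye(vocab_size)[y]` stands in for `to_categorical` (same one-hot layout, though `to_categorical` returns `float32` by default while `np.eye` returns `float64`).

```python
import numpy as np

def make_windows(token_ids, seq_length, vocab_size):
    """Slide a window of seq_length ids; the id right after each window is the target."""
    X, y = [], []
    for i in range(len(token_ids) - seq_length):
        X.append(token_ids[i:i + seq_length])
        y.append(token_ids[i + seq_length])
    X = np.array(X)
    Y = np.eye(vocab_size)[y]  # one row per target, a 1.0 at the target id
    return X, Y

X, Y = make_windows([1, 2, 3, 4, 5], seq_length=3, vocab_size=6)
print(X)                 # [[1 2 3]
                         #  [2 3 4]]
print(Y.argmax(axis=1))  # [4 5]
```

A corpus of `N` tokens therefore yields `N - seq_length` training pairs, and `Y` has one column per vocabulary entry, including the reserved index 0.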


In [22]:
# model = sd_gru.train_gru_model(X_train, Y_train, vocab_size,epochs=500)
# word = sd_gru.predict_next_word_gru(model, tokenizer, "recurrent neural networks")
# print("Predicted:", word)
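How `sd_gru.predict_next_word_gru` turns a prompt into a word is not shown here. A common approach, assumed rather than confirmed, is greedy decoding: encode the last `seq_length` words with the tokenizer's vocabulary, run the model, take the `argmax` of the output distribution and map that id back to a word. The helpers below are hypothetical, the vocabulary is a toy stand-in for `tokenizer.word_index` / `tokenizer.index_word`, and `fake_probs` replaces an actual `model.predict(...)` call.

```python
import numpy as np

def encode_context(text, word_index, seq_length):
    """Lower-case, split on whitespace, keep the last seq_length known ids."""
    ids = [word_index[w] for w in text.lower().split() if w in word_index]
    return ids[-seq_length:]

def decode_next_word(probs, index_word):
    """Greedy decoding: pick the highest-probability id and map it back to a word."""
    next_id = int(np.argmax(probs))
    return index_word.get(next_id, "<unk>")  # id 0 is reserved and has no word

word_index = {"recurrent": 1, "neural": 2, "networks": 3, "are": 4}
index_word = {i: w for w, i in word_index.items()}

context = encode_context("Recurrent neural networks", word_index, seq_length=3)
fake_probs = np.array([0.0, 0.1, 0.1, 0.2, 0.6])  # stand-in for model.predict(...)[0]
print(context)                                   # [1, 2, 3]
print(decode_next_word(fake_probs, index_word))  # are
```

If a prompt has fewer than `seq_length` known words, a real implementation would also need to pad the context (for example with `pad_sequences`) before calling the model.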

In [23]:
# min_gru_model, emb_layer = min_gru.train_min_gru_parallel(X_train, Y_train, vocab_size)
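The name `train_min_gru_parallel` suggests the `min_gru` module follows the minGRU of Feng et al. (2024), "Were RNNs All We Needed?"; that is an assumption, since the module's code is not shown. In minGRU the gate and candidate depend only on `x_t`, so `h_t = (1 - z_t) * h_{t-1} + z_t * h_cand_t` is a linear recurrence `h_t = a_t * h_{t-1} + b_t` and can be evaluated for all time steps at once. The NumPy sketch below (hypothetical helpers, random weights, no biases) checks that a step-by-step loop and a closed-form cumulative product/sum give the same states.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def min_gru_coeffs(xs, Wz, Wh):
    z = sigmoid(xs @ Wz)        # gate depends only on x_t, not on h_{t-1}
    h_cand = xs @ Wh            # candidate state, also computed from x_t alone
    return 1.0 - z, z * h_cand  # so h_t = a_t * h_{t-1} + b_t

def min_gru_sequential(xs, Wz, Wh, h0):
    a, b = min_gru_coeffs(xs, Wz, Wh)
    h, out = h0, []
    for a_t, b_t in zip(a, b):  # one step at a time, like a classic RNN
        h = a_t * h + b_t
        out.append(h)
    return np.stack(out)

def min_gru_parallel(xs, Wz, Wh, h0):
    a, b = min_gru_coeffs(xs, Wz, Wh)
    A = np.cumprod(a, axis=0)  # A_t = a_1 * ... * a_t
    # h_t = A_t * (h0 + sum_{k<=t} b_k / A_k); dividing by A_k is only safe for short sequences
    return A * (h0 + np.cumsum(b / A, axis=0))

rng = np.random.default_rng(0)
xs = rng.normal(size=(6, 4))  # 6 time steps, 4 features
Wz, Wh = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
h0 = np.zeros(3)
print(np.allclose(min_gru_sequential(xs, Wz, Wh, h0),
                  min_gru_parallel(xs, Wz, Wh, h0)))  # True
```

The cumulative-product form is chosen here only because it is short; for long sequences `A_t` underflows, which is why the minGRU paper evaluates the same recurrence with a log-space parallel scan instead.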

In [24]:
# input_text = "recurrent neural networks"
# predicted = min_gru.predict_min_gru_parallel(min_gru_model, tokenizer, input_text, emb_layer, hidden_size=10)
# print(f"Input: '{input_text}' → Predicted next word: '{predicted}'")

In [26]:

# === Training Standard GRU ===
# sd_gru.train_gru_model() does not accept a `verbose` argument (passing it raised a TypeError), so it is omitted.
start_time = time.time()
std_model = sd_gru.train_gru_model(X_train, Y_train, vocab_size, epochs=500)
std_time = time.time() - start_time
print(f"[Standard GRU] Training time: {std_time:.2f} seconds")

# === Training MinGRU ===
# Assumes train_min_gru_parallel() accepts `verbose` and `epochs`; drop them if it raises the same TypeError.
start_time = time.time()
min_model, embed_layer = min_gru.train_min_gru_parallel(X_train, Y_train, vocab_size, verbose=False, epochs=500)
min_time = time.time() - start_time
print(f"[MinGRU] Training time: {min_time:.2f} seconds")
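The `TypeError` came from passing a keyword argument that one training function does not declare. When comparing functions whose signatures differ, one option is to filter keyword arguments against `inspect.signature` before the call and to time with `time.perf_counter`, a monotonic clock intended for measuring intervals. The helpers `call_with_supported_kwargs` and `timed` are hypothetical, and `fake_train` is a stand-in for the real training functions.

```python
import inspect
import time

def call_with_supported_kwargs(fn, *args, **kwargs):
    """Drop keyword args that fn does not declare, unless fn accepts **kwargs."""
    params = inspect.signature(fn).parameters.values()
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return fn(*args, **kwargs)
    names = {p.name for p in params}
    return fn(*args, **{k: v for k, v in kwargs.items() if k in names})

def timed(fn, *args, **kwargs):
    """Return (result, elapsed seconds) measured with the monotonic perf_counter clock."""
    start = time.perf_counter()
    result = call_with_supported_kwargs(fn, *args, **kwargs)
    return result, time.perf_counter() - start

def fake_train(X, Y, vocab_size, epochs=10):  # stand-in without a `verbose` flag
    return sum(range(epochs))

model, seconds = timed(fake_train, [1], [2], 3, epochs=5, verbose=False)
print(model)  # 10
```

With the notebook's objects this would read, for example, `std_model, std_time = timed(sd_gru.train_gru_model, X_train, Y_train, vocab_size, verbose=False, epochs=500)`. The trade-off is that silently dropped arguments can hide typos such as `epoch=500`, so fixing the call directly, as in the cell above, is usually preferable when the signature is known.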