$\textbf{Recurrent Neural Networks}$



$\textit{Elman Network}$

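Rather than feeding lagged input variables into the network, the Elman network carries memory in its hidden neurons: the previous hidden state $h_{t-1}$ is fed back in alongside the current input $x_t$. In this respect it is loosely analogous to a moving average (MA) process in traditional econometrics, where the current value depends on unobserved past terms rather than on observed lags.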

Here is the setup:

$\begin{align*}
\text{Hidden State (}h_t\text{):} \quad & h_t = \sigma(W_h \cdot h_{t-1} + W_x \cdot x_t + b_h)
\end{align*}$
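
The recurrence above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the layer sizes, the random initialization, and the helper name `elman_step` are assumptions made for the example and do not come from any particular library.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def elman_step(h_prev, x_t, W_h, W_x, b_h):
    """One Elman update: h_t = sigma(W_h h_{t-1} + W_x x_t + b_h)."""
    return sigmoid(W_h @ h_prev + W_x @ x_t + b_h)

rng = np.random.default_rng(0)
n_hidden, n_features, n_timesteps = 3, 4, 5  # illustrative sizes (assumed)

# Randomly initialized parameters; a trained network would learn these.
W_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
W_x = rng.normal(scale=0.5, size=(n_hidden, n_features))
b_h = np.zeros(n_hidden)

x_seq = rng.random((n_timesteps, n_features))
h = np.zeros(n_hidden)  # initial hidden state h_0
states = []
for x_t in x_seq:
    h = elman_step(h, x_t, W_h, W_x, b_h)  # the hidden state carries the memory
    states.append(h)
states = np.stack(states)

print(states.shape)  # (5, 3)
```

Each row of `states` is the hidden state after one more observation, so information from earlier inputs reaches later steps only through $h_{t-1}$.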

$\textit{Long Short-Term Memory Network}$

The Long Short-Term Memory (LSTM) network is a type of recurrent neural network (RNN) that is often used for time series forecasting. It is designed to capture long-term dependencies and patterns in sequential data. The following equations describe a single LSTM time step:

$\begin{align*}
\text{Input Gate (}i_t\text{):} \quad & i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\text{Forget Gate (}f_t\text{):} \quad & f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
\text{Output Gate (}o_t\text{):} \quad & o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
\text{Cell State (}C_t\text{):} \quad & C_t = f_t \odot C_{t-1} + i_t \odot \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \\
\text{Hidden State (}h_t\text{):} \quad & h_t = o_t \odot \tanh(C_t)
\end{align*}$




Where:

- $i_t$ is the input gate's output, controlling how much new information should be added to the cell state.
- $f_t$ is the forget gate's output, controlling how much of the previous cell state should be forgotten.
- $o_t$ is the output gate's output, determining the output based on the cell state.
- $C_t$ is the updated cell state.
- $h_t$ is the hidden state, which is also the output of the LSTM at time step t.
- $x_t$ is the input at time step t.
- $W_i, W_f, W_o,  W_c$ are weight matrices for the input gate, forget gate, output gate, and cell state, respectively.
- $b_i, b_f, b_o, b_c$  are bias vectors for the input gate, forget gate, output gate, and cell state, respectively.
- $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden state and the current input.
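
The gate equations above can be sketched as a single NumPy function. This is a minimal sketch under assumptions: the helper name `lstm_step`, the layer sizes, and the random weights are illustrative only, and library implementations such as Keras organize their weights differently.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, C_prev, x_t, params):
    """One LSTM time step following the equations above."""
    W_i, W_f, W_o, W_c, b_i, b_f, b_o, b_c = params
    hx = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    i_t = sigmoid(W_i @ hx + b_i)        # input gate
    f_t = sigmoid(W_f @ hx + b_f)        # forget gate
    o_t = sigmoid(W_o @ hx + b_o)        # output gate
    C_t = f_t * C_prev + i_t * np.tanh(W_c @ hx + b_c)  # element-wise products
    h_t = o_t * np.tanh(C_t)
    return h_t, C_t

rng = np.random.default_rng(0)
n_hidden, n_features = 3, 4  # illustrative sizes (assumed)

# Four weight matrices acting on the concatenated vector, four bias vectors.
weights = [rng.normal(scale=0.5, size=(n_hidden, n_hidden + n_features))
           for _ in range(4)]
biases = [np.zeros(n_hidden) for _ in range(4)]
params = (*weights, *biases)

h = np.zeros(n_hidden)
C = np.zeros(n_hidden)
for x_t in rng.random((5, n_features)):  # a sequence of 5 inputs
    h, C = lstm_step(h, C, x_t, params)

print(h.shape, C.shape)  # (3,) (3,)
```

The cell state `C` is updated additively and passes through no squashing function between steps, which is what lets it retain information over many time steps.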

In these equations, $\sigma$ is the sigmoid activation function and $\tanh$ is the hyperbolic tangent activation function; the products between the gates and the cell-state terms are element-wise.

To use this LSTM network for time series forecasting, you would typically provide historical time series data as the input $x_t$ and train the network to predict future time steps. During training, the network adjusts its parameters $(W_i, W_f, W_o, W_c, b_i, b_f, b_o, b_c)$ to minimize the prediction error. Once trained, the network produces forecasts from a sequence of historical data.
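
One common way to prepare historical data for this kind of training is to slice the series into overlapping windows, each paired with the value that follows it. The sketch below assumes a univariate series and one-step-ahead targets; the helper name `make_windows` is hypothetical and introduced only for illustration.

```python
import numpy as np

def make_windows(series, n_timesteps):
    """Split a 1-D series into (samples, timesteps, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_timesteps
    if n_samples <= 0:
        raise ValueError("series must be longer than n_timesteps")
    # Window i covers series[i : i + n_timesteps]; its target is the next value.
    X = np.stack([series[i:i + n_timesteps] for i in range(n_samples)])
    y = series[n_timesteps:]
    return X[..., np.newaxis], y  # add a trailing feature axis of size 1

series = np.arange(10.0)  # toy series 0, 1, ..., 9
X, y = make_windows(series, n_timesteps=3)

print(X.shape, y.shape)    # (7, 3, 1) (7,)
print(X[0].ravel(), y[0])  # [0. 1. 2.] 3.0
```

The resulting `X` has the `(samples, timesteps, features)` layout that recurrent layers in Keras expect as input.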


$\textit{Gated Recurrent Unit}$

The GRU has the following setup:

$\begin{align*}
\text{Update Gate (}z_t\text{):} \quad & z_t = \sigma(W_z \cdot [h_{t-1}, x_t]) \\
\text{Reset Gate (}r_t\text{):} \quad & r_t = \sigma(W_r \cdot [h_{t-1}, x_t]) \\
\text{Candidate Hidden State (}h_t'\text{):} \quad & h_t' = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t]) \\
\text{Hidden State (}h_t\text{):} \quad & h_t = (1 - z_t) \odot h_{t-1} + z_t \odot h_t'
\end{align*}$

 

Where:

- $z_t$ is the update gate.
- $r_t$ is the reset gate.
- $h_t'$ is the candidate hidden state.
- $\sigma$ is the sigmoid activation function.
- $\odot$ represents element-wise multiplication.

The GRU is another type of RNN that combines elements of the LSTM and the Elman RNN. It has two gates, an update gate and a reset gate, and a single hidden state with no separate cell state.
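
The GRU update can be sketched in NumPy in the same way. As in the equations above, there are no bias terms; the helper name `gru_step`, the sizes, and the random weights are assumptions for illustration, and library implementations may use a different gate convention.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h_prev, x_t, W_z, W_r, W_h):
    """One GRU time step following the equations above (no bias terms)."""
    hx = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    z_t = sigmoid(W_z @ hx)              # update gate
    r_t = sigmoid(W_r @ hx)              # reset gate
    # The candidate state sees the previous state only through the reset gate.
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))
    # Blend the old state and the candidate, weighted by the update gate.
    return (1.0 - z_t) * h_prev + z_t * h_cand

rng = np.random.default_rng(0)
n_hidden, n_features = 3, 4  # illustrative sizes (assumed)
W_z, W_r, W_h = (rng.normal(scale=0.5, size=(n_hidden, n_hidden + n_features))
                 for _ in range(3))

h = np.zeros(n_hidden)
for x_t in rng.random((5, n_features)):  # a sequence of 5 inputs
    h = gru_step(h, x_t, W_z, W_r, W_h)

print(h.shape)  # (3,)
```

Because the new state is a gate-weighted average of the previous state and the candidate, a value of $z_t$ near zero leaves the memory almost unchanged.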
















$\textit{Elman Network}$

In [10]:
import numpy as np
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

# Generate random data for training
n_samples = 1000
n_timesteps = 11
n_features = 4

X = np.random.rand(n_samples, n_timesteps, n_features)
# One target per sample, matching the single-unit Dense output below
y = np.random.rand(n_samples, 1)

# Create an Elman recurrent network (SimpleRNN is the fully connected recurrent layer in Keras)
model = Sequential()
model.add(SimpleRNN(units=4, activation='tanh', input_shape=(n_timesteps, n_features), return_sequences=False))
model.add(Dense(units=1))  # Output layer

model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X, y, epochs=100, batch_size=32)

# Generate new data for prediction (you can replace this with your own test data)
X_test = np.random.rand(10, n_timesteps, n_features)

# Make predictions
predictions = model.predict(X_test)

print("Predictions:", predictions)



In [7]:
print(predictions.shape)

(10, 1)


$\textit{LSTM Network}$

In [8]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Generate random data
n_samples = 1000
n_timesteps = 1
n_features = 4

X = np.random.rand(n_samples, n_timesteps, n_features)
# One target per sample, matching the single-unit Dense output below
y = np.random.rand(n_samples, 1)


model = Sequential()
model.add(LSTM(4, activation='tanh', input_shape=(n_timesteps, n_features)))
model.add(Dense(1))  # Output layer

model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X, y, epochs=100, batch_size=1)

# Generate new data for prediction (10 samples, each with 1 time step and 4 features)
X_test = np.random.rand(10, n_timesteps, n_features)

# Make predictions
predictions = model.predict(X_test)

print("Predicted Values for the test data:")
print(predictions)


Predicted Values for the test data:
[[0.3741139 ]
 [0.42571205]
 [0.50168145]
 [0.5089371 ]
 [0.5051675 ]
 [0.48736456]
 [0.48349017]
 [0.47100073]
 [0.4787231 ]
 [0.4876212 ]]


$\textit{Gated Recurrent Unit}$

In [9]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense

# Generate random data
n_samples = 1000
n_timesteps = 1
n_features = 4

X = np.random.rand(n_samples, n_timesteps, n_features)
# One target per sample, matching the single-unit Dense output below
y = np.random.rand(n_samples, 1)


model = Sequential()
model.add(GRU(4, activation='tanh', input_shape=(n_timesteps, n_features)))
model.add(Dense(1))  # Output layer

model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X, y, epochs=100, batch_size=1)

# Generate new data for prediction (10 samples, each with 1 time step and 4 features)
X_test = np.random.rand(10, n_timesteps, n_features)

# Make predictions
predictions = model.predict(X_test)

print("Predicted Values for the test data:")
print(predictions)


Predicted Values for the test data:
[[0.5184412 ]
 [0.4816203 ]
 [0.48103875]
 [0.51708287]
 [0.50191474]
 [0.46081322]
 [0.4661432 ]
 [0.55040187]
 [0.45851198]
 [0.4704014 ]]
