# Caution! Running the Script on a GPU Is Recommended
## An A100 reduces the runtime by 90% compared to an Intel Xeon

Context:

We will go through a basic Long Short-Term Memory (LSTM) model for time series and investigate how different sets of parameters affect its performance.

The steps are: loading the data, preprocessing the data, creating the LSTM model, training the model, and finally testing the model.

Task 1 - 0 points) Import the necessary libraries, which have been provided for you below

In [56]:
import matplotlib.pyplot as plt
import statsmodels.tsa.seasonal as smt
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import random
import datetime as dt
from sklearn import linear_model
from sklearn.metrics import mean_absolute_error
import plotly

# import the relevant Keras modules
from keras.models import Sequential
from keras.layers import Activation, Dense
from keras.layers import LSTM
from keras.layers import Dropout
from keras.layers import Input
from tensorflow.keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Embedding

import yfinance as yf
import plotly.graph_objects as go
from keras.callbacks import EarlyStopping
from tqdm import tqdm
from sklearn.metrics import mean_squared_error

Task 2 - 10 points) Load the data

Retrieve the closing price of Google (ticker GOOG) from Yahoo Finance from 2008-01-02 to 2024-12-30. The cells below work with the adjusted close (the Adj Close column).

Refer to the instructions in homework 2, if necessary, for up-to-date instructions on how to retrieve such data for free under the new Yahoo Finance API permissions.

In [2]:
ticker = 'GOOG'
start = '2008-01-02'
end = '2024-12-30'

data = yf.download(ticker, start=start, end=end, auto_adjust=False)

adj_close = data[['Adj Close']]

[*********************100%***********************]  1 of 1 completed


In [3]:
adj_close.isna().sum().sum()

np.int64(0)

Task 3 - 10 points) Write a function to visualise the data to make sure it has been successfully imported

In [4]:
def time_series_visualizer(data: pd.DataFrame):
    assert "Adj Close" in data.columns

    fig = go.Figure()
    fig.add_trace(go.Scatter(x=data.index, y=data['Adj Close'],
                             mode='lines',
                             name='Adjusted Close'))

    fig.update_layout(title='GOOG Adjusted Close',
                      xaxis_title='Date',
                      yaxis_title='Price (USD)',
                      template='plotly_white',
                      height=600,
                      width=1000)

    fig.show()

In [5]:
data.columns = data.columns.droplevel(1)
data.head()

Date,Adj Close,Close,High,Low,Open,Volume
2008-01-02,16.985332,17.065783,17.369146,16.87998,17.257067,172921733
2008-01-03,16.988804,17.069269,17.107128,16.849842,17.067528,130587647
2008-01-04,16.286524,16.363665,16.960428,16.313852,16.928797,215195594
2008-01-07,16.094406,16.170637,16.495173,15.874249,16.287451,257096061
2008-01-08,15.658861,15.733029,16.437389,15.716092,16.264038,214364490


In [6]:
time_series_visualizer(data)

Context) Now we will create windows of default length 20 with the data imported and begin the process of constructing the model.
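
To make the windowing concrete before the tasks below, here is a minimal sketch on a toy price list. The names `make_windows`, `prices`, and `max_horizon` are illustrative assumptions and not part of the assignment. The h-day return follows the definition used in this notebook, (P[t+h] - P[t]) / P[t], where t is the last day of the window.

```python
def make_windows(prices, window_size, max_horizon):
    """Return (windows, returns), where returns[h] holds the h-day-ahead
    return measured from the last price of each window."""
    windows = []
    returns = {h: [] for h in range(1, max_horizon + 1)}
    # Stop early enough that the furthest horizon still has a price.
    for i in range(len(prices) - window_size - max_horizon + 1):
        windows.append(prices[i:i + window_size])
        last = prices[i + window_size - 1]
        for h in range(1, max_horizon + 1):
            future = prices[i + window_size - 1 + h]
            returns[h].append((future - last) / last)
    return windows, returns


# Toy data (hypothetical values, not GOOG prices)
prices = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]
windows, returns = make_windows(prices, window_size=3, max_horizon=2)
print(windows)
print(returns[1])
```

The loop bound in this sketch uses every window that still has a price at the furthest horizon. The loops in Tasks 4.3 and 4.4 stop one window earlier, which is harmless.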

Task 4.1 - 5 points) Define the parameter for the length of your window

In [7]:
window_size = 20

Task 4.2 - 5 points) Create a data point which splits training and testing set

In [8]:
split = 0.8
split_date = data.index[int(split * len(data.index))]
print(split_date)

2021-08-04 00:00:00


Task 4.3 - 5 points) Create a window for training

In [9]:
# The return prediction will be done between 1 and 10 days ahead
train_returns_h = {h: [] for h in range(1, 11)}

train_data = data.loc[:split_date, 'Adj Close']
train_windows = []

for i in range(len(train_data) - window_size - 10):
    window = train_data[i:i+window_size].values
    train_windows.append(window)

    for h in range(1, 11):
        # .iloc makes the positional lookup explicit and avoids the pandas deprecation warning
        current_day = train_data.iloc[i + window_size - 1]
        future_day = train_data.iloc[i + window_size - 1 + h]
        daily_return = (future_day - current_day) / current_day
        train_returns_h[h].append(daily_return)



Task 4.4 - 5 points) Create a window for testing

In [10]:
# Same thing happens for the testing set
test_returns_h = {h: [] for h in range(1, 11)}

test_data = data.loc[split_date:, 'Adj Close']
test_windows = []

for i in range(len(test_data) - window_size - 10):
    window = test_data[i:i+window_size].values
    test_windows.append(window)

    for h in range(1, 11):
        # .iloc makes the positional lookup explicit and avoids the pandas deprecation warning
        current_day = test_data.iloc[i + window_size - 1]
        future_day = test_data.iloc[i + window_size - 1 + h]
        daily_return = (future_day - current_day) / current_day
        test_returns_h[h].append(daily_return)



Task 5 - 10 points) Create a function in which you execute the building of your LSTM model

Consider these inputs:

Activation function +
Loss function +
Dropout rate +
Optimizer +
nn layers/architecture

In [26]:
def build_lstm(input_shape, activation='tanh', loss_function='mse', dropout_rate=0.2, optimizer='adam', lstm_units=(50, 50)):
    model = Sequential()
    model.add(Input(shape=input_shape))

    for i, units in enumerate(lstm_units):
        return_seq = i != len(lstm_units) - 1
        model.add(LSTM(units=units,
                       activation=activation,
                       return_sequences=return_seq))
        model.add(Dropout(dropout_rate))

    model.add(Dense(1))

    model.compile(loss=loss_function, optimizer=optimizer)
    return model

In [12]:
input_shape = (window_size, 1)
test = build_lstm(input_shape,
                   activation='tanh',
                   loss_function='mse',
                   dropout_rate=0.3,
                   optimizer='adam',
                   lstm_units=[50, 25])

test.summary()





Task 6 - 10 points) Considering epochs as a parameter to adjust, use epochs to set the stopping condition of the model by monitoring the loss at each step of the iteration.

In [13]:
def early_stopping(monitor = 'loss', patience = 5, restore_best_weights = True, min_delta=1e-5):
    return EarlyStopping(monitor=monitor,
                         patience=patience,
                         restore_best_weights=restore_best_weights,
                         min_delta=min_delta)
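
The patience rule behind this callback can be sketched in plain Python. This is a simplified illustration of the idea, not Keras's actual implementation; the function name `stopping_epoch` and the loss values are hypothetical. An epoch counts as an improvement only if the loss drops by more than `min_delta` below the best value seen so far, and training stops once `patience` consecutive epochs fail to improve.

```python
def stopping_epoch(losses, patience=5, min_delta=1e-5):
    """Return the 0-based epoch at which training would stop,
    or None if the patience budget is never exhausted."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            # Meaningful improvement: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical loss curve that plateaus after the third epoch
losses = [0.030, 0.010, 0.005, 0.005, 0.005, 0.005, 0.005, 0.005, 0.004]
print(stopping_epoch(losses, patience=3))
```

Note that the callback above monitors the training loss. Monitoring a validation loss on held-out data is a common alternative when the goal is to stop before overfitting.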

In [14]:
X_train = np.array(train_windows)
y_train = np.array(train_returns_h[1])
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))

# Build
input_shape = (window_size, 1)
model = build_lstm(input_shape,
                   activation='tanh',
                   loss_function='mse',
                   dropout_rate=0.3,
                   optimizer='adam',
                   lstm_units=[50, 25])

# Define early stopping
callback = early_stopping(patience=5, min_delta=1e-5)

# Fit
history = model.fit(X_train, y_train,
                    epochs=50,
                    batch_size=32,
                    callbacks=[callback],
                    verbose=1)

Epoch 1/50
[1m106/106[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 6ms/step - loss: 0.0337
Epoch 2/50
[1m106/106[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 6ms/step - loss: 0.0023
Epoch 3/50
[1m106/106[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 6ms/step - loss: 0.0013
Epoch 4/50
[1m106/106[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 6ms/step - loss: 8.6429e-04
Epoch 5/50
[1m106/106[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 6ms/step - loss: 7.1023e-04
Epoch 6/50
[1m106/106[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - loss: 5.8432e-04
Epoch 7/50
[1m106/106[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - loss: 5.2027e-04
Epoch 8/50
[1m106/106[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - loss: 4.1830e-04
Epoch 9/50
[1m106/106[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - loss: 4.1608e-04
Epoch 10/50
[1m106/106[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m 

In [15]:
fig = go.Figure()
fig.add_trace(go.Scatter(y=history.history['loss'], mode='lines', name='Training Loss'))

fig.update_layout(title='Training Loss Over Epochs',
                  xaxis_title='Epoch',
                  yaxis_title='Loss (MSE)',
                  template='plotly_white',
                  height=500, width=900)

fig.show()

Task 7 - 10 points) Plot the performance of the model, in terms of accuracy, when predicting one time point ahead compared to multiple time points ahead.

In [27]:
loss_histories = {}
test_mse = {}

for h in tqdm(range(1, 11)):
    X_train = np.array(train_windows)
    y_train = np.array(train_returns_h[h])
    X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))

    X_test = np.array(test_windows)
    y_test = np.array(test_returns_h[h])
    X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))

    input_shape = (window_size, 1)
    model = build_lstm(input_shape=input_shape,
                       activation='tanh',
                       loss_function='mse',
                       dropout_rate=0.3,
                       optimizer='adam',
                       lstm_units=[50, 25])

    callback = early_stopping(patience=5)
    history = model.fit(X_train, y_train,
                        epochs=50,
                        batch_size=32,
                        callbacks=[callback],
                        verbose=0)

    loss_histories[h] = history.history['loss']

    y_pred = model.predict(X_test, verbose=0)
    test_mse[h] = mean_squared_error(y_test, y_pred)

100%|██████████| 10/10 [02:59<00:00, 17.91s/it]
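
When reading the test MSE per horizon, a reference point can help, because multi-day returns typically vary more than one-day returns and so the MSE may grow with h regardless of model quality. The sketch below computes the MSE of a naive baseline that always predicts a zero return. The helper names `horizon_returns` and `zero_baseline_mse` and the toy prices are assumptions for illustration, not part of the notebook.

```python
def horizon_returns(prices, h):
    """h-day-ahead simple returns: (P[t+h] - P[t]) / P[t]."""
    return [(prices[t + h] - prices[t]) / prices[t]
            for t in range(len(prices) - h)]


def zero_baseline_mse(returns):
    """MSE of always predicting 0, i.e. the mean squared return."""
    return sum(r * r for r in returns) / len(returns)


# Toy data (hypothetical values, not GOOG prices)
prices = [100.0, 101.0, 99.0, 102.0, 104.0, 103.0, 106.0]
for h in (1, 2, 3):
    print(h, zero_baseline_mse(horizon_returns(prices, h)))
```

Applied to the real series, comparing `test_mse[h]` with the baseline computed from `test_returns_h[h]` would show whether the model beats the trivial forecast at each horizon.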


In [28]:
fig = go.Figure()
for h, loss in loss_histories.items():
    fig.add_trace(go.Scatter(
        y=loss,
        mode='lines',
        name=f'h = {h}'
    ))

fig.update_layout(
    title='Training Loss vs Lookahead Horizon (h)',
    xaxis_title='Epoch',
    yaxis_title='Loss (MSE)',
    template='plotly_white',
    height=600,
    width=1000
)

fig.show()

In [30]:
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=list(test_mse.keys()),
    y=list(test_mse.values()),
    mode='markers',
    name='Test MSE',
    marker=dict(size=8)
))

fig.update_layout(
    title='Test MSE vs Lookahead Horizon (h)',
    xaxis_title='Lookahead Horizon (h)',
    yaxis_title='Mean Squared Error (MSE)',
    template='plotly_white',
    height=500,
    width=800
)

fig.show()

Task 8 - 10 points) Test the work attempted with different sets of parameters.
Specifically, modify each of the following parameters in turn and report your findings on its impact on the model.

1) Window length
2) LSTM parameter: activation function
3) LSTM parameter: loss function
4) LSTM parameter: dropout rate
5) LSTM parameter: optimizer

In [44]:
h = 1

X_train = np.array(train_windows)
y_train = np.array(train_returns_h[h])
X_test = np.array(test_windows)
y_test = np.array(test_returns_h[h])
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))

In [45]:
window_lengths = [10, 15, 20, 25, 30]
activations = ['tanh', 'relu', 'sigmoid']
loss_functions = ['mse', 'mae', 'huber']
dropout_rates = [0.1, 0.2, 0.3, 0.5]
optimizers = ['adam', 'sgd', 'rmsprop']

In [46]:
def evaluate_variant(param_name, param_values):
    results = {}

    for val in tqdm(param_values):
        if param_name == 'window_length':
            wl = val

            train_w = [train_data.iloc[i:i+wl].values for i in range(len(train_data) - wl - 10)]
            test_w = [test_data.iloc[i:i+wl].values for i in range(len(test_data) - wl - 10)]

            y_train_var = np.array([
                (train_data.iloc[i + wl] - train_data.iloc[i + wl - 1]) / train_data.iloc[i + wl - 1]
                for i in range(len(train_data) - wl - 10)
            ])
            y_test_var = np.array([
                (test_data.iloc[i + wl] - test_data.iloc[i + wl - 1]) / test_data.iloc[i + wl - 1]
                for i in range(len(test_data) - wl - 10)
            ])

            X_train_var = np.array(train_w).reshape((-1, wl, 1))
            X_test_var = np.array(test_w).reshape((-1, wl, 1))
            input_shape = (wl, 1)

        else:
            X_train_var = X_train
            y_train_var = y_train
            X_test_var = X_test
            y_test_var = y_test
            input_shape = (X_train.shape[1], 1)

        model = build_lstm(input_shape=input_shape,
                           activation=val if param_name == 'activation' else 'tanh',
                           loss_function=val if param_name == 'loss' else 'mse',
                           dropout_rate=val if param_name == 'dropout' else 0.3,
                           optimizer=val if param_name == 'optimizer' else 'adam',
                           lstm_units=[50, 25])

        callback = early_stopping(patience=5)
        model.fit(X_train_var, y_train_var,
                  epochs=50,
                  batch_size=32,
                  callbacks=[callback],
                  verbose=0)

        y_pred = model.predict(X_test_var, verbose=0)
        mse = mean_squared_error(y_test_var, y_pred)
        results[val] = mse

    return results
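
The function above varies one parameter at a time while holding the others at their defaults (tanh, mse, 0.3, adam). That pattern can be written more explicitly as a small config generator. This is a sketch under assumptions: the names `DEFAULTS` and `one_at_a_time` are hypothetical and do not appear elsewhere in the notebook.

```python
DEFAULTS = {
    "activation": "tanh",
    "loss": "mse",
    "dropout": 0.3,
    "optimizer": "adam",
}


def one_at_a_time(param_name, values, defaults=DEFAULTS):
    """Yield one config per value, overriding a single key of the defaults."""
    if param_name not in defaults:
        raise KeyError(f"unknown parameter: {param_name}")
    for value in values:
        # Build a fresh dict so the defaults are never mutated.
        yield {**defaults, param_name: value}


for config in one_at_a_time("dropout", [0.1, 0.5]):
    print(config)
```

A one-at-a-time sweep does not capture interactions between parameters. Since each configuration is trained only once, small differences in test MSE may also reflect random initialisation rather than the parameter itself.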

In [47]:
results_task8 = {
    'window_length': evaluate_variant('window_length', window_lengths),
    'activation': evaluate_variant('activation', activations),
    'loss': evaluate_variant('loss', loss_functions),
    'dropout': evaluate_variant('dropout', dropout_rates),
    'optimizer': evaluate_variant('optimizer', optimizers),
}

100%|██████████| 5/5 [01:34<00:00, 18.85s/it]
100%|██████████| 3/3 [00:57<00:00, 19.20s/it]
100%|██████████| 3/3 [00:50<00:00, 16.69s/it]
100%|██████████| 4/4 [01:11<00:00, 17.97s/it]
100%|██████████| 3/3 [00:50<00:00, 16.77s/it]


In [48]:
for param, result in results_task8.items():
    fig = go.Figure()
    fig.add_trace(go.Bar(
        x=list(result.keys()),
        y=list(result.values()),
        text=[f"{v:.4f}" for v in result.values()],
        textposition='auto'
    ))

    fig.update_layout(
        title=f'Test MSE vs {param.replace("_", " ").title()}',
        xaxis_title=param.replace("_", " ").title(),
        yaxis_title='Test MSE',
        template='plotly_white',
        height=500,
        width=800
    )

    fig.show()

Task 9 - 20 points):

In this short NLP task, you will use an LSTM model to perform next-word prediction on a small text sample.

Instructions:
1. Select a short paragraph of English text (around 100–200 words).
2. Tokenize the text into words, and create input-output pairs where each input consists of 3 consecutive words and the output is the 4th word.
3. Convert words to integer indices (you may use Keras’ Tokenizer).
4. Train a simple LSTM model to predict the next word given the previous 3.
5. Test your model by providing a custom 3-word input and printing the predicted next word.

You do not need to optimize the model. Just focus on building a viable version of the workflow.
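
Steps 2 and 3 can be illustrated on a toy sentence without any library. This is a minimal sketch; the function name `build_pairs` and the sentence are assumptions for illustration. Indices are assigned in order of first appearance here, whereas Keras' Tokenizer orders them by word frequency, so the actual numbers will differ.

```python
def build_pairs(text, context=3):
    """Map words to integer indices and build (context words -> next word) pairs."""
    words = text.lower().split()
    # Assign indices starting at 1 so that 0 stays free (e.g. for padding).
    vocab = {}
    for w in words:
        vocab.setdefault(w, len(vocab) + 1)
    ids = [vocab[w] for w in words]
    pairs = [(ids[i:i + context], ids[i + context])
             for i in range(len(ids) - context)]
    return vocab, pairs


vocab, pairs = build_pairs("the cat sat on the mat")
for inputs, target in pairs:
    print(inputs, "->", target)
```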

In [51]:
# Abstract of the paper "Attention is All You Need"
text = """
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder.
The best performing models also connect the encoder and decoder through an attention mechanism.
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles.
On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs.
"""

In [53]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
word_index = tokenizer.word_index
total_words = len(word_index) + 1

token_list = tokenizer.texts_to_sequences([text])[0]

In [54]:
# 3 input words to 1 output word
input_sequences = []
for i in range(len(token_list) - 3):
    input_seq = token_list[i:i+3]
    output_word = token_list[i+3]
    input_sequences.append((input_seq, output_word))

X = np.array([x[0] for x in input_sequences])
y = np.array([x[1] for x in input_sequences])

In [57]:
model = Sequential()
model.add(Input(shape=(3,)))  # each sample is a sequence of 3 word indices
model.add(Embedding(input_dim=total_words, output_dim=10))  # input_length is deprecated, so it is omitted
model.add(LSTM(64))
model.add(Dense(total_words, activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])



In [58]:
model.fit(X, y, epochs=200, verbose=0)

<keras.src.callbacks.history.History at 0x791033e08b50>

In [59]:
def predict_next_word(input_text):
    tokens = tokenizer.texts_to_sequences([input_text])[0]
    if len(tokens) != 3:
        raise ValueError("Input must contain exactly 3 known words from the training text.")
    tokens = np.array(tokens).reshape(1, 3)
    predicted_index = int(np.argmax(model.predict(tokens, verbose=0), axis=-1)[0])
    # index_word is the inverse mapping of word_index; index 0 is reserved and has no word
    return tokenizer.index_word.get(predicted_index, "<unknown>")

In [62]:
examples = [
    "we propose a",
    "the transformer based",
    "attention mechanisms dispensing",
    "on eight gpus",
    "a new simple",
    "models also connect",
    "encoder and decoder",
    "superior in quality",
    "include an encoder",
    "more parallelizable and",
    "requires significantly less", # "requires" is not in the training text (it has "requiring"), so the Tokenizer drops it and only 2 tokens remain
    "translation task improving",
    "state of the",
    "convolutional neural networks",
    "the best performing",
    "on the wmt",
    "english to french",
    "translation tasks show",
    "neural networks that",
    "networks that include"
]

for phrase in examples:
    tokens = tokenizer.texts_to_sequences([phrase])[0]
    if len(tokens) != 3:
        print(f"{phrase} — invalid input length")
        continue
    tokens = np.array(tokens).reshape(1, 3)
    predicted_index = np.argmax(model.predict(tokens, verbose=0), axis=-1)[0]
    predicted_word = next((word for word, index in tokenizer.word_index.items() if index == predicted_index), "<unknown>")
    print(f"{phrase} \"{predicted_word}\"")

we propose a "new"
the transformer based "solely"
attention mechanisms dispensing "with"
on eight gpus "translation"
a new simple "network"
models also connect "the"
encoder and decoder "through"
superior in quality "while"
include an encoder "and"
more parallelizable and "requiring"
requires significantly less — invalid input length
translation task improving "over"
state of the "art"
convolutional neural networks "that"
the best performing "models"
on the wmt "2014"
english to french "translation"
translation tasks show "these"
neural networks that "include"
networks that include "an"
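
The "invalid input length" line above comes from out-of-vocabulary handling: a Keras Tokenizer created without an `oov_token` skips words it did not see in `fit_on_texts`, and "requires" never occurs in the abstract (only "requiring" does). The sketch below mimics that behaviour with a plain dictionary; `to_sequence` and the small `vocab` are hypothetical stand-ins, not the notebook's tokenizer.

```python
def to_sequence(phrase, vocab):
    """Map words to indices, silently skipping any word missing from vocab."""
    return [vocab[w] for w in phrase.lower().split() if w in vocab]


# Hypothetical vocabulary containing only the words relevant to this example
vocab = {"requiring": 1, "significantly": 2, "less": 3}

print(to_sequence("requiring significantly less", vocab))  # three known words
print(to_sequence("requires significantly less", vocab))   # "requires" is dropped
```

Replacing "requires" with "requiring" in the example list would therefore give a valid 3-token input.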
