##### LSTM Model Prototypes:
Univariate Time Series
 - Baseline
 - Simple LSTM Model ( One Input )
Forecast Multivariate Time Series
 - Multivariate Single Step Model
 - Multivariate Multiple Step Model


# Time series forecasting

Package imports: mount Google Drive and load TensorFlow, Matplotlib, NumPy, os, and pandas.


In [0]:
from __future__ import absolute_import, division, print_function, unicode_literals
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd

mpl.rcParams['figure.figsize'] = (8, 6)
mpl.rcParams['axes.grid'] = False

from google.colab import drive

drive.mount('/content/gdrive', force_remount=True)
root_path = 'gdrive/My Drive/Colab Notebooks/CS230_PROJECT' 

## The Price Stock Dataset
We use the Sharadar Core US Bundle, specifically the Sharadar Equity Prices, Daily Metrics, and Core US Fundamentals tables. For each stock, these provide daily open/close prices, enterprise value over earnings (`evebit`), price to book value (`pb`), price to earnings (`pe`), volume, and trailing-12-month financial and fundamental data.

In [0]:
#df = pd.read_csv(csv_path)
df = pd.read_csv('gdrive/My Drive/Colab Notebooks/CS230_PROJECT/symbol_Basic.csv')

Let's take a glance at the data and size.

In [0]:
df.head()

In [0]:
df.shape

As you can see above, we have 1467 days of data. This is our Basic dataset.

The function below returns the windows of time that the model trains on. The parameter `history_size` is the size of the past window of information. The parameter `target_size` is how far into the future the model needs to learn to predict; the value at that offset is the label.

In [0]:
def univariate_data(dataset, start_index, end_index, history_size, target_size):
  data = []
  labels = []

  start_index = start_index + history_size
  if end_index is None:
    end_index = len(dataset) - target_size

  for i in range(start_index, end_index):
    indices = range(i-history_size, i)
    # Reshape data from (history_size,) to (history_size, 1)
    data.append(np.reshape(dataset[indices], (history_size, 1)))
    labels.append(dataset[i+target_size])
  return np.array(data), np.array(labels)
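
To make the windowing concrete, here is a minimal sketch on a toy series. The helper `make_windows` is a hypothetical, simplified restatement of the logic above (no `start_index`/`end_index`), not part of the notebook.

```python
import numpy as np

def make_windows(series, history_size, target_size):
    """Hypothetical, simplified restatement of the windowing logic."""
    data, labels = [], []
    # i is the first index *after* each history window.
    for i in range(history_size, len(series) - target_size):
        data.append(series[i - history_size:i].reshape(history_size, 1))
        labels.append(series[i + target_size])
    return np.array(data), np.array(labels)

series = np.arange(10.0)  # toy series: 0.0, 1.0, ..., 9.0
x, y = make_windows(series, history_size=3, target_size=1)

print(x.shape, y.shape)     # (6, 3, 1) (6,)
print(x[0].ravel(), y[0])   # [0. 1. 2.] 4.0
```

Note that the label sits at index `i + target_size`. With `target_size=1`, the window `[0, 1, 2]` is labelled with the value at index 4, skipping index 3; `target_size=0` would label it with the value immediately after the window.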

For this test, we use the first 90% of our dataset as the training set.

In [0]:
# np.int was removed in NumPy 1.24; the built-in int does the same job here.
TRAIN_SPLIT = int(df.shape[0]*0.9)
print(TRAIN_SPLIT)

Setting seed to ensure reproducibility.

In [0]:
tf.random.set_seed(1)

## Part 1: Forecast a univariate time series
First, you will train a model using only a single feature (the stock's close price) and use it to predict that value in the future.

Let's first extract only the Close Price from the dataset.

In [0]:

uni_data = df['close']
uni_data.index = df['date']
uni_data.tail()


Let's observe how this data looks across time.

In [0]:
uni_data.plot(subplots=True)

In [0]:
uni_data = uni_data.values

It is important to normalize features before training a neural network. A common way to do so is by subtracting the mean and dividing by the standard deviation of each feature.

Note: The mean and standard deviation should only be computed using the training data.

In [0]:
uni_train_mean = uni_data[:TRAIN_SPLIT].mean()
uni_train_std = uni_data[:TRAIN_SPLIT].std()

Let's normalize the data.

In [0]:
uni_data = (uni_data-uni_train_mean)/uni_train_std
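
The effect of using training-only statistics can be checked on a synthetic series. This is a sketch under assumptions: the random-walk `prices` array is made up and only stands in for the close prices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic random walk standing in for close prices (an assumption).
prices = 100.0 + np.cumsum(rng.normal(size=200))
train_split = int(len(prices) * 0.9)  # chronological 90/10 split

# Statistics come from the training portion only.
train_mean = prices[:train_split].mean()
train_std = prices[:train_split].std()
normalized = (prices - train_mean) / train_std

# The training part has mean ~0 and std ~1; the validation part usually does not.
print(normalized[:train_split].mean(), normalized[:train_split].std())
print(normalized[train_split:].mean(), normalized[train_split:].std())
```

The validation portion is scaled with the same two numbers, so nothing about the held-out period leaks into preprocessing.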

We now create the data for the univariate model. For part 1, the model is given the last 60 close prices and needs to learn to predict a future close price. With `univariate_future_target = 1`, a window covering indices `i-60` to `i-1` is labelled with the value at index `i+1`.

In [0]:
univariate_past_history = 60
univariate_future_target = 1

x_train_uni, y_train_uni = univariate_data(uni_data, 0, TRAIN_SPLIT,
                                           univariate_past_history,
                                           univariate_future_target)
x_val_uni, y_val_uni = univariate_data(uni_data, TRAIN_SPLIT, None,
                                       univariate_past_history,
                                       univariate_future_target)

This is what the `univariate_data` function returns.

In [0]:
print ('Single window of past history')
print (x_train_uni.shape)
print ('\n Target Price to predict')
print (y_train_uni.shape)
print ('x to eval')
print (x_val_uni.shape)
print ('\n price to eval')
print (y_val_uni.shape)

Now that the data has been created, let's take a look at a single example. The information given to the network is given in blue, and it must predict the value at the red cross.

In [0]:
def create_time_steps(length):
  time_steps = []
  for i in range(-length, 0, 1):
    time_steps.append(i)
  return time_steps

In [0]:
def show_plot(plot_data, delta, title):
  labels = ['History', 'True Future', 'Model Prediction']
  marker = ['.-', 'rx', 'go']
  time_steps = create_time_steps(plot_data[0].shape[0])
  if delta:
    future = delta
  else:
    future = 0

  plt.title(title)
  for i, x in enumerate(plot_data):
    if i:
      plt.plot(future, plot_data[i], marker[i], markersize=10,
               label=labels[i])
    else:
      plt.plot(time_steps, plot_data[i].flatten(), marker[i], label=labels[i])
  plt.legend()
  plt.xlim([time_steps[0], (future+5)*2])
  plt.xlabel('Time-Step')
  return plt

In [0]:
print(y_train_uni[1000])

In [0]:
show_plot([x_train_uni[1000], y_train_uni[1000]], 0, 'Sample Example')

### Vanilla Baseline
Before proceeding to train a model, let's first set a simple baseline. Given an input point, the baseline method looks at all the history and predicts the next point to be the average of the last 60 observations.

In [0]:
def baseline(history):
  return np.mean(history)

In [0]:
print(x_train_uni[100])

In [0]:
show_plot([x_train_uni[100], y_train_uni[100], baseline(x_train_uni[100])], 0,
           'Baseline Prediction Example')

Let's see if you can beat this baseline using a recurrent neural network.
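
To make "beating the baseline" measurable, one option is to compute the baseline's mean absolute error over a set of windows. The sketch below uses made-up toy windows; `baseline_mae` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def baseline(history):
    # Same rule as above: predict the mean of the history window.
    return np.mean(history)

def baseline_mae(x_windows, y_true):
    """Hypothetical helper: MAE of the mean-of-history baseline."""
    preds = np.array([baseline(w) for w in x_windows])
    return float(np.mean(np.abs(preds - y_true)))

# Toy windows of shape (n_windows, history_size, 1), for illustration only.
x_demo = np.array([[[1.0], [2.0], [3.0]],
                   [[2.0], [3.0], [4.0]]])
y_demo = np.array([4.0, 5.0])

print(baseline_mae(x_demo, y_demo))  # 2.0
```

Applied to the notebook's arrays, `baseline_mae(x_val_uni, y_val_uni)` would give a value in normalized units, which should be comparable to the `mae` loss that Keras reports during training.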


Let's now use `tf.data` to shuffle, batch, and cache the dataset.

In [0]:
BATCH_SIZE = 128
#BUFFER_SIZE = 10000
BUFFER_SIZE = 256

train_univariate = tf.data.Dataset.from_tensor_slices((x_train_uni, y_train_uni))
train_univariate = train_univariate.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()

val_univariate = tf.data.Dataset.from_tensor_slices((x_val_uni, y_val_uni))
val_univariate = val_univariate.batch(BATCH_SIZE).repeat()

The LSTM layer needs to know the shape of each input sample, `(time steps, features)`, which is taken here from `x_train_uni.shape[-2:]`.

In [0]:
simple_lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(16, input_shape=x_train_uni.shape[-2:]),
    tf.keras.layers.Dense(1)
])

simple_lstm_model.compile(optimizer='adam', loss='mae')

In [0]:
x_train_uni.shape[-2:]

Let's make a sample prediction, to check the output of the model. 

In [0]:
for x, y in val_univariate.take(1):
    print(simple_lstm_model.predict(x).shape)

Let's train the model now. Each epoch will run for 256 steps, instead of the complete training data as normally done.

In [0]:
EVALUATION_INTERVAL = 256
EPOCHS = 30

simple_lstm_model.fit(train_univariate, epochs=EPOCHS,
                      steps_per_epoch=EVALUATION_INTERVAL,
                      validation_data=val_univariate, validation_steps=50)
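
As a rough check of what 256 steps per epoch means here, the arithmetic below estimates how many passes over the training windows one epoch covers. It assumes the 1467 rows quoted earlier and the constants defined in this notebook.

```python
import math

n_rows = 1467           # rows quoted in the text above (assumed exact)
history_size = 60       # univariate_past_history
batch_size = 128        # BATCH_SIZE
steps_per_epoch = 256   # EVALUATION_INTERVAL

train_split = int(n_rows * 0.9)
n_windows = train_split - history_size               # training windows
batches_per_pass = math.ceil(n_windows / batch_size)  # batch() runs before repeat()
passes_per_epoch = steps_per_epoch / batches_per_pass

print(train_split, n_windows, batches_per_pass, passes_per_epoch)
# 1320 1260 10 25.6
```

Under these assumptions each "epoch" revisits the training windows about 25 times, so 30 epochs correspond to several hundred passes over a small dataset. That is worth keeping in mind when reading the validation loss for signs of overfitting.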

In [0]:
print(simple_lstm_model.predict(x)[0])
print(y[0])

#### Predict using the simple LSTM model
Now that you have trained your simple LSTM, let's try and make a few predictions.

In [0]:
for x, y in val_univariate.take(3):
  plot = show_plot([x[0].numpy(), y[0].numpy(),
                    simple_lstm_model.predict(x)[0]], 0, 'Simple LSTM model')
  plot.show()

This looks better than the baseline. Now that you have seen the basics, let's move on to part two, where you will work with a multivariate time series.

## Part 2: Forecast a multivariate time series

Our final dataset has 117 features. For simplicity, this section uses only 13: the daily open, high, low, and close prices, volume, dividends, ev, evebit, evebitda, marketcap, pb, pe, and ps.

To use more features, add their names to this list.

In [0]:
features_considered = ['open', 'high', 'low', 'close', 'volume','dividends', 'ev', 'evebit', 'evebitda', 'marketcap', 'pb', 'pe', 'ps']

In [0]:
features = df[features_considered]
#features.index = df['Date Time']
features.index = df['date']
features.head()

Let's have a look at how each of these features varies across time.

In [0]:
features.plot(subplots=True)

As mentioned, the first step will be to normalize the dataset using the mean and standard deviation of the training data.

In [0]:
dataset = features.values
# Compute statistics on the training rows only, so no validation data leaks in.
data_mean = dataset[:TRAIN_SPLIT].mean(axis=0)
print(data_mean)
data_std = dataset[:TRAIN_SPLIT].std(axis=0)
print(dataset[0:5])
print(data_std)

In [0]:
print(dataset[1,:])
print(dataset[:,3])
dataset = (dataset-data_mean)/data_std
print(dataset[1,:])
print(dataset[:,3])

### Single step model
In a single step setup, the model learns to predict a single point in the future based on some history provided.

The function below performs the same windowing task as `univariate_data` above; however, here it samples the past observations based on the step size given.

In [0]:
def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size, step, single_step=False):
  data = []
  labels = []

  start_index = start_index + history_size
  if end_index is None:
    end_index = len(dataset) - target_size

  for i in range(start_index, end_index):
    indices = range(i-history_size, i, step)
    data.append(dataset[indices])
    if single_step:
      labels.append(target[i+target_size])
    else:
      labels.append(target[i:i+target_size])

  return np.array(data), np.array(labels)
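
The difference between `single_step=True` and `single_step=False`, and the effect of `step`, is easiest to see on a toy array. The helper `window_demo` below is a hypothetical, simplified restatement of the function above (no `start_index`/`end_index`).

```python
import numpy as np

def window_demo(dataset, target, history_size, target_size, step, single_step):
    """Hypothetical, simplified restatement of the multivariate windowing."""
    data, labels = [], []
    for i in range(history_size, len(dataset) - target_size):
        # Keep every `step`-th row of the history window.
        data.append(dataset[i - history_size:i:step])
        if single_step:
            labels.append(target[i + target_size])    # one future value
        else:
            labels.append(target[i:i + target_size])  # a run of future values
    return np.array(data), np.array(labels)

toy = np.arange(24.0).reshape(12, 2)  # 12 time steps, 2 features
target = toy[:, 1]                    # second column: 1, 3, 5, ...

x1, y1 = window_demo(toy, target, 4, 2, 2, single_step=True)
x2, y2 = window_demo(toy, target, 4, 2, 2, single_step=False)

print(x1.shape, y1.shape)  # (6, 2, 2) (6,)
print(x2.shape, y2.shape)  # (6, 2, 2) (6, 2)
print(y1[0], y2[0])        # 13.0 [ 9. 11.]
```

With `step=2`, a history of 4 rows is thinned to 2 rows per window. The single step label is the target at `i + target_size`, while the multi-step label is the slice `target[i:i + target_size]`.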

In [0]:

past_history = 90
future_target = 1
STEP = 1

x_train_single, y_train_single = multivariate_data(dataset, dataset[:, 3], 0,
                                                   TRAIN_SPLIT, past_history,
                                                   future_target, STEP,
                                                   single_step=True)
x_val_single, y_val_single = multivariate_data(dataset, dataset[:, 3],
                                               TRAIN_SPLIT, None, past_history,
                                               future_target, STEP,
                                               single_step=True)

Let's look at a single data-point.


In [0]:
print ('Single window of past history : {}'.format(x_train_single[1].shape))
print ('Single window of past history y : {}'.format(y_train_single[1].shape))
print(y_train_single[0])
print ('Single Validation window: {}'.format(x_val_single[0].shape))
print ('Single Validation window y : {}'.format(y_val_single[0].shape))
print(y_val_single[0])


In [0]:
train_data_single = tf.data.Dataset.from_tensor_slices((x_train_single, y_train_single))
train_data_single = train_data_single.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()

val_data_single = tf.data.Dataset.from_tensor_slices((x_val_single, y_val_single))
val_data_single = val_data_single.batch(BATCH_SIZE).repeat()

In [0]:
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(50,return_sequences=True,input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.LSTM(50,dropout=0.2))
single_step_model.add(tf.keras.layers.Dense(1))

single_step_model.compile(optimizer=tf.keras.optimizers.RMSprop(), loss='mae')

Let's check out a sample prediction.

In [0]:
for x, y in val_data_single.take(1):
  print(single_step_model.predict(x).shape)

In [0]:
single_step_history = single_step_model.fit(train_data_single, epochs=EPOCHS,
                                            steps_per_epoch=EVALUATION_INTERVAL,
                                            validation_data=val_data_single,
                                            validation_steps=50)

In [0]:
def plot_train_history(history, title):
  loss = history.history['loss']
  val_loss = history.history['val_loss']
  
  epochs = range(len(loss))

  plt.figure()

  plt.plot(epochs, loss, 'b', label='Training loss')
  plt.plot(epochs, val_loss, 'r', label='Validation loss')
  plt.title(title)
  plt.legend()

  plt.show()

In [0]:
plot_train_history(single_step_history,
                   'Single Step Training and validation loss')

In [0]:
print(x[0][:, 3])
print(y[0])
print(single_step_model.predict(x)[0])

#### Predict a single step future
Now that the model is trained, let's make a few sample predictions.

In [0]:
for x, y in val_data_single.take(3):
  # delta is the label's offset on the time axis: future_target / STEP = 1 here.
  plot = show_plot([x[0][:, 3].numpy(), y[0].numpy(),
                    single_step_model.predict(x)[0]], 1,
                   'Single Step Prediction')
  plot.show()

In [0]:
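# Plot the close-price history (column 3) with the label and prediction
# for the same sample (index 0) of the last batch.
plot = show_plot([x[0][:, 3].numpy(), y[0].numpy(),
                    single_step_model.predict(x)[0]], 1,
                   'Single Step Prediction')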
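
The predictions above are in normalized units. A minimal sketch of mapping them back to price units follows; the mean and standard deviation used are made-up numbers, and `denormalize` is a hypothetical helper.

```python
import numpy as np

def denormalize(values, mean, std):
    """Hypothetical helper: invert (x - mean) / std."""
    return np.asarray(values) * std + mean

# Made-up statistics and prices, for illustration only.
close_mean, close_std = 50.0, 5.0
prices = np.array([45.0, 50.0, 60.0])

z = (prices - close_mean) / close_std
restored = denormalize(z, close_mean, close_std)

print(z)         # [-1.  0.  2.]
print(restored)  # [45. 50. 60.]
```

In this notebook the close price is column 3 of `dataset`, so `data_mean[3]` and `data_std[3]` would be the matching statistics for a single step prediction.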

### Multi-Step model
In a multi-step prediction model, given a past history, the model needs to learn to predict a range of future values. Thus, unlike a single step model, where only a single future point is predicted, a multi-step model predicts a sequence of future values.

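For the multi-step model, the training data again consists of observations over the past 90 days (`past_history = 90`). The output is 10 predictions. Note that the code below uses column 1 of `dataset` (`high`) as the target, whereas the single step model used column 3 (`close`). For this task, the dataset needs to be prepared accordingly, so the first step is to create it again with a different target window.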

In [0]:
future_target = 10
x_train_multi, y_train_multi = multivariate_data(dataset, dataset[:, 1], 0,
                                                 TRAIN_SPLIT, past_history,
                                                 future_target, STEP)
x_val_multi, y_val_multi = multivariate_data(dataset, dataset[:, 1],
                                             TRAIN_SPLIT, None, past_history,
                                             future_target, STEP)

Let's wrap the windows in `tf.data` datasets.

In [0]:
train_data_multi = tf.data.Dataset.from_tensor_slices((x_train_multi, y_train_multi))
train_data_multi = train_data_multi.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()

val_data_multi = tf.data.Dataset.from_tensor_slices((x_val_multi, y_val_multi))
val_data_multi = val_data_multi.batch(BATCH_SIZE).repeat()

Plotting a sample data-point.

In [0]:
def multi_step_plot(history, true_future, prediction):
  plt.figure(figsize=(12, 6))
  num_in = create_time_steps(len(history))
  num_out = len(true_future)

  plt.plot(num_in, np.array(history[:, 1]), label='History')
  plt.plot(np.arange(num_out)/STEP, np.array(true_future), 'bo',
           label='True Future')
  if prediction.any():
    plt.plot(np.arange(num_out)/STEP, np.array(prediction), 'ro',
             label='Predicted Future')
  plt.legend(loc='upper left')
  plt.show()

In this plot and subsequent similar plots, the history and the future data have one point per daily row (`STEP = 1`).

In [0]:
for x, y in train_data_multi.take(1):
  multi_step_plot(x[0], y[0], np.array([0]))

The model stacks two LSTM layers followed by a dense layer with 10 outputs, one per predicted step.

In [0]:
multi_step_model = tf.keras.models.Sequential()
multi_step_model.add(tf.keras.layers.LSTM(16,
                                          return_sequences=True,
                                          input_shape=x_train_multi.shape[-2:]))
multi_step_model.add(tf.keras.layers.LSTM(8, activation='relu'))
#multi_step_model.add(tf.keras.layers.Dense(72))
multi_step_model.add(tf.keras.layers.Dense(10))
multi_step_model.compile(optimizer=tf.keras.optimizers.RMSprop(clipvalue=1.0), loss='mae')

Let's see how the model predicts before it trains.

In [0]:
for x, y in val_data_multi.take(1):
  print (multi_step_model.predict(x).shape)

In [0]:
multi_step_history = multi_step_model.fit(train_data_multi, epochs=EPOCHS,
                                          steps_per_epoch=EVALUATION_INTERVAL,
                                          validation_data=val_data_multi,
                                          validation_steps=50)

In [0]:
plot_train_history(multi_step_history, 'Multi-Step Training and validation loss')

#### Predict a multi-step future
Let's now have a look at how well your network has learnt to predict the future.

In [0]:
for x, y in val_data_multi.take(3):
  # Use the prediction for the same sample (index 0) that is being plotted.
  multi_step_plot(x[0], y[0], multi_step_model.predict(x)[0])
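
Beyond eyeballing plots, a multi-step forecast can be summarized by its error at each step of the horizon. The sketch below uses made-up arrays; `mae_per_horizon` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def mae_per_horizon(y_true, y_pred):
    """Hypothetical helper: MAE per forecast step for (n_samples, horizon) arrays."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)), axis=0)

# Made-up targets and predictions: 2 samples, horizon of 3 steps.
y_true_demo = np.array([[1.0, 2.0, 3.0],
                        [2.0, 3.0, 4.0]])
y_pred_demo = np.array([[1.0, 2.5, 5.0],
                        [2.0, 3.5, 3.0]])

print(mae_per_horizon(y_true_demo, y_pred_demo))  # [0.  0.5 1.5]
```

With the notebook's arrays, `mae_per_horizon(y_val_multi, multi_step_model.predict(x_val_multi))` would return 10 values in normalized units, which typically shows whether the error grows for steps further into the future.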