# Time Series Forecasting

A time series is a sequence of data points ordered in time, each of which contains one or more features. The crucial characteristic of a time series is its temporal dependency: the order of the points matters, so we want models that exploit how earlier values relate to later ones.
Example tasks for time series include:
- predicting future stock prices based on current stock prices (univariate time series prediction)
- predicting future weather based on current temperature, humidity and other weather variables (multivariate time series prediction)
- classifying the emotion of a piece of music (classifying a whole time series)
- classifying changes in emotion over multiple scenes (classifying each step of a time series).

For all of these tasks, recurrent neural networks (RNNs) can be very useful, since an RNN tries to capture the temporal component of the data by processing the timesteps one by one. LSTMs, which include both a "short-term" and a "long-term" memory state, and the simpler GRUs (Gated Recurrent Units) are especially widely used.
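
To make "processing the timesteps one by one" concrete, here is a minimal NumPy sketch of a plain tanh recurrent layer. It is only an illustration and not the GRU or LSTM used later in this notebook; the function name `simple_rnn_forward`, the weight names, and the random toy data are assumptions made for this example.

```python
import numpy as np

def simple_rnn_forward(x, W_x, W_h, b):
    """Run a plain tanh RNN over one sequence (illustrative helper).

    x   : (timesteps, features) input sequence
    W_x : (features, units) input-to-hidden weights
    W_h : (units, units) hidden-to-hidden weights
    b   : (units,) bias
    Returns the hidden state after every timestep, shape (timesteps, units).
    """
    h = np.zeros(W_h.shape[0])            # initial hidden state
    states = []
    for x_t in x:                         # one timestep at a time
        # the new state depends on the current input AND the previous state
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))               # 5 timesteps, 3 features
W_x = rng.normal(size=(3, 4))
W_h = rng.normal(size=(4, 4))
b = np.zeros(4)

states = simple_rnn_forward(x, W_x, W_h, b)
print(states.shape)                       # (5, 4)
```

The hidden state `h` is the only thing carried from one step to the next, which is why gated variants such as GRU and LSTM add mechanisms to control what is kept in it.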

In this notebook, you will be working on a modified version of the Air Pollution Dataset, which contains weather data and air pollution measurements taken in Beijing, China. Your task is to forecast the pollution value of the next measurement.

In [None]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
from datetime import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.python.keras.layers import Input, GRU, Dense,Flatten,Dropout,Conv1D, GlobalAveragePooling1D, LSTM
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.callbacks import EarlyStopping
from sklearn.metrics import r2_score,mean_squared_error
from sklearn.model_selection import train_test_split
from pmdarima import auto_arima
import seaborn as sns
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('svg')
sns.set_style("darkgrid")

In [None]:
def plot_model_history(history, ax=None, metric='loss', ep_start=1, ep_stop=None, monitor='val_loss', mode='min', plttitle=None):
    if ax is None:
        fig,ax = plt.subplots()
    if ep_stop is None:
        ep_stop = len(history.epoch)
    if plttitle is None:
        plttitle = metric[0].swapcase() + metric[1:] + ' During Training'
    ax.plot(np.arange(ep_start,ep_stop+1, dtype='int'),history.history[metric][ep_start-1:ep_stop])
    ax.plot(np.arange(ep_start,ep_stop+1, dtype='int'),history.history['val_' + metric][ep_start-1:ep_stop])
    ax.set(title=plttitle)
    ax.set(ylabel=metric[0].swapcase() + metric[1:])
    ax.set(xlabel='Epoch')
    ax.legend(['train', 'val'], loc='upper right')

In [None]:
# custom R2-score metrics for keras backend
from tensorflow.python.keras import backend as K

def r2_keras(y_true, y_pred):
    SS_res =  K.sum(K.square(y_true - y_pred)) 
    SS_tot = K.sum(K.square(y_true - K.mean(y_true))) 
    return ( 1 - SS_res/(SS_tot + K.epsilon()) )
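
The metric above follows the usual definition of the coefficient of determination, 1 - SS_res / SS_tot. As a sanity check, the same formula can be sketched in plain NumPy; the name `r2_numpy` and the default `eps=1e-7` (standing in for `K.epsilon()`) are assumptions for this illustration.

```python
import numpy as np

def r2_numpy(y_true, y_pred, eps=1e-7):
    """Coefficient of determination, mirroring the Keras metric above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / (ss_tot + eps)

y_true = [1.0, 2.0, 3.0, 4.0]
r2_good = r2_numpy(y_true, [1.5, 2.0, 3.0, 3.5])     # close to 0.9
r2_mean = r2_numpy(y_true, [2.5, 2.5, 2.5, 2.5])     # close to 0.0
r2_perfect = r2_numpy(y_true, y_true)                # 1.0
print(r2_good, r2_mean, r2_perfect)
```

Keras evaluates a metric function like this batch by batch and averages the results, so the value reported during training may differ somewhat from `r2_score` computed on the full data set.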

In [None]:
data = pd.read_csv('pollution.csv', header=0, index_col=0)
data = data.dropna()
values = data.values
print('datapoints:', len(data))
data

The plot below shows the five years of data for each variable.

In [None]:
groups = [0, 1, 2, 3, 4, 5, 6]
i = 1
# plot each column
plt.figure(figsize=(9,10))
for group in groups:
    plt.subplot(len(groups), 1, i)
    plt.plot(values[:, group])
    plt.title(data.columns[group], y=0.5, loc='right')
    i += 1
plt.tight_layout()

In [None]:
# Convert the index to datetime and resample to daily means
data.index = pd.to_datetime(data.index)
data = data.resample('D').mean()
data.head()

For time series prediction, there is always the question of how to split the data into training and test sets. Because the order of the observations matters, we do not shuffle: we define `TimeSeriesTrainTestSplit`, which uses the last `test_size` fraction of the data as test data.

In [None]:
def TimeSeriesTrainTestSplit(X, test_size):
    
    test_index = int(len(X)*(1-test_size))

    X_train = X.iloc[:test_index]
    X_test = X.iloc[test_index:]
    return X_train, X_test

Our task is a sequence prediction problem. First, we need to transform the time series into a supervised learning problem: given a sequence of values, the data set is restructured into input/output pairs. For this purpose we use a sliding window algorithm. We use the 90 previous days (time steps) as input variables and predict the next entry of the pollution data as output.

![Moving Window Algorithm](sliding_window.png)
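
The windowing idea can be checked on a toy array before applying it to the real data. The sketch below uses a window of 3 instead of 90 and assumes, as in this notebook, that the target is the last column; the name `sliding_window` is a placeholder for this example.

```python
import numpy as np

def sliding_window(values, n_lags):
    """Build (window, next target) pairs from a 2D array (illustrative)."""
    x, y = [], []
    for i in range(n_lags, len(values)):
        x.append(values[i - n_lags:i, :])   # the n_lags rows before step i
        y.append(values[i, -1:])            # last column at step i is the target
    return np.array(x), np.array(y)

# 6 time steps, 2 columns: rows are [0, 1], [2, 3], ..., [10, 11]
toy = np.arange(12, dtype=float).reshape(6, 2)
x, y = sliding_window(toy, n_lags=3)

print(x.shape, y.shape)   # (3, 3, 2) (3, 1)
print(y.ravel())          # targets 7, 9 and 11
```

Note that the first `n_lags` rows never appear as targets, so a series of length `n` yields `n - n_lags` samples.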

In [None]:
def get_x_y(data, timestamp):
    """
    Split data into x (features) and y (target)
    """
    x, y = [], []
    for i in range(timestamp, data.shape[0]):
        x.append(data[i-timestamp:i,:])
        y.append(data[i,-1:])
    x = np.array(x)
    y = np.array(y)
    
    return x, y

In [None]:
# specify the number of lag days
timestamp =  90 # 1Q
n_features = len(data.columns)

In [None]:
# split into train and test sets
df, test = TimeSeriesTrainTestSplit(data, 0.3)
train, val = TimeSeriesTrainTestSplit(df, 0.2)
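
Because the second split is applied to what remains after the first, the resulting fractions are not 0.3 and 0.2 of the full data. A small sketch on a toy frame of 100 rows shows the effective proportions; `chronological_split` is a stand-in name that mirrors the function defined above.

```python
import pandas as pd

def chronological_split(frame, test_size):
    """Keep the first rows for training, the last `test_size` share for testing."""
    cut = int(len(frame) * (1 - test_size))
    return frame.iloc[:cut], frame.iloc[cut:]

toy = pd.DataFrame({"value": range(100)})
rest, test = chronological_split(toy, 0.3)    # last 30% become the test set
train, val = chronological_split(rest, 0.2)   # last 20% of the remainder

print(len(train), len(val), len(test))        # 56 14 30
```

The three parts stay in chronological order: every training row precedes every validation row, which in turn precedes every test row.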

In [None]:
X_train, y_train = get_x_y(train.values, timestamp)
X_val, y_val = get_x_y(val.values, timestamp)
X_test, y_test = get_x_y(test.values, timestamp)

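Standardize the training data. The scalers are fit on the training split only, so no statistics from the validation or test data enter the preprocessing.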

In [None]:
from sklearn.preprocessing import StandardScaler

scx = StandardScaler()
scy = StandardScaler()

X_train_sc = scx.fit_transform(X_train.reshape(-1, X_train.shape[-1])).reshape(X_train.shape)
y_train_sc = scy.fit_transform(y_train)
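
The reshape in the cell above flattens samples and timesteps so that each feature gets a single mean and standard deviation. The following NumPy-only sketch shows the same idea on random toy data and how the training statistics would be reused for another split; the array names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X_tr = rng.normal(loc=5.0, scale=2.0, size=(20, 4, 3))  # samples x timesteps x features
X_te = rng.normal(loc=5.0, scale=2.0, size=(8, 4, 3))

# One mean and one standard deviation per feature, from the training data only
flat = X_tr.reshape(-1, X_tr.shape[-1])
mu = flat.mean(axis=0)
sigma = flat.std(axis=0)          # population std (ddof=0), as StandardScaler uses

# Broadcasting applies the per-feature statistics along the last axis
X_tr_sc = (X_tr - mu) / sigma
X_te_sc = (X_te - mu) / sigma     # the test data reuses the training statistics

print(X_tr_sc.reshape(-1, 3).mean(axis=0).round(6))
print(X_tr_sc.reshape(-1, 3).std(axis=0).round(6))
```

The scaled test data will generally not have exactly zero mean and unit variance, which is expected because its statistics were never used.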

Now, for the training and validation data, we have a tensor of shape _samples_ _x_ _timesteps_ _x_ _features_.

In [None]:
n_steps = X_train.shape[1]
n_feats = X_train.shape[2]

In [None]:
def build_model_gru(n_steps,n_feats,n_fore=1):
    model = Sequential()
    model.add(GRU(256, return_sequences=True, input_shape=(n_steps,n_feats),name="gru1"))
    model.add(GRU(128, name="gru2"))
    model.add(Dense(128,activation="relu",name="hidden1"))
    model.add(Dense(64,activation="relu",name="hidden2"))
    model.add(Dense(16,activation="relu",name="hidden3"))
    model.add(Dense(n_fore,activation="linear",name="output"))
    model.compile(loss='mse', optimizer='adam',metrics=[r2_keras])
    return model

def build_model_mlp(n_steps,n_feats,n_fore=1):
    model = Sequential()
    model.add(Flatten(input_shape=(n_steps,n_feats)))
    model.add(Dense(64,activation="relu",name="hidden1"))
    model.add(Dense(32,activation="relu",name="hidden2"))
    model.add(Dense(16,activation="relu",name="hidden3"))
    model.add(Dense(n_fore,activation="linear",name="output"))
    model.compile(loss='mean_squared_error', optimizer='adam',metrics=[r2_keras])
    return model

def build_model_cnn(n_steps,n_feats,n_fore=1):
    model = Sequential()
    model.add(Conv1D(filters=128, kernel_size=7, activation='relu',input_shape=(n_steps,n_feats)))
    model.add(Conv1D(filters=256, kernel_size=3, activation='relu'))
    model.add(Flatten())
    model.add(Dropout(0.20))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(n_fore, activation='linear'))
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=[r2_keras])
    return model

In [None]:
from tensorflow.python.keras.callbacks import Callback
import time

class TimeHistory(Callback):
    """Record the wall-clock duration of every training epoch."""
    def on_train_begin(self, logs=None):
        self.times = []
    def on_epoch_begin(self, epoch, logs=None):
        self.epoch_time_start = time.time()
    def on_epoch_end(self, epoch, logs=None):
        self.times.append(time.time() - self.epoch_time_start)

In [None]:
cnn = build_model_cnn(n_steps,n_feats)
gru = build_model_gru(n_steps,n_feats)
mlp = build_model_mlp(n_steps,n_feats)
cb = TimeHistory()
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10, restore_best_weights=True)

In [None]:
%%time
history_cnn = cnn.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10, callbacks=[es, cb], verbose=0)

In [None]:
%%time
history_mlp = mlp.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10, callbacks=[es, cb], verbose=0)

In [None]:
%%time
history_gru = gru.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10, callbacks=[es, cb], verbose=0)

In [None]:
# on_train_begin resets the list, so this holds the epoch times of the last fit (GRU) only
print(cb.times)

In [None]:
fig, ax = plt.subplots(2, 2, figsize=(10,6))
plot_model_history(history_cnn, ax = ax[0,0], plttitle='CNN')
plot_model_history(history_mlp, ax = ax[0,1], plttitle='MLP')
plot_model_history(history_gru, ax = ax[1,0], plttitle='GRU')
#ax[0,0].set_ylim([300,2000])
#ax[0,1].set_ylim([300,2000])
#ax[1,0].set_ylim([300,2000])
ax.flat[-1].set_visible(False)
plt.tight_layout()

In [None]:
models = [cnn,mlp,gru]
modelnames = ['CNN', 'MLP', 'GRU']
for i,nme in enumerate(modelnames):
    print(nme)
    print(models[i].evaluate(X_train,y_train))
    print(models[i].evaluate(X_test,y_test))
    print('------')

In [None]:
def dummy_prediction(X):
    """Persistence baseline: predict the last observed value of the last column."""
    y_pred = X[:,-1,-1]
    return y_pred

Results check:

In [None]:
y_pred_train = np.zeros((y_train.shape[0],4))
y_pred_test = np.zeros((y_test.shape[0],4))
modelnames = ['CNN', 'MLP', 'GRU', 'Dummy']
for i,nme in enumerate(modelnames):
    # The models were fit on the unscaled targets, so the predictions are
    # already in the original units and must not be inverse-transformed.
    if i==3:
        y_pred_train[:,i] = dummy_prediction(X_train).ravel()
        y_pred_test[:,i] = dummy_prediction(X_test).ravel()
    else:
        y_pred_train[:,i] = models[i].predict(X_train).ravel()
        y_pred_test[:,i] = models[i].predict(X_test).ravel()
        
fig, axs = plt.subplots(2, 2, figsize=(10,10))
for i,ax in enumerate(axs.flat):
    textstr = 'RMSE training fit: %.03f\n R2 training fit: %.03f\n RMSE prediction: %.03f\n R2 prediction: %.03f' % (np.sqrt(mean_squared_error(y_train,y_pred_train[:,i])),
                                                                                                                    r2_score(y_train,y_pred_train[:,i]),
                                                                                                                    np.sqrt(mean_squared_error(y_test,y_pred_test[:,i])),
                                                                                                                    r2_score(y_test,y_pred_test[:,i]))
    minlim = y_test.min()
    maxlim = y_test.max()
    sns.scatterplot(x=y_test.ravel(),y=y_pred_test[:,i],ax=ax)
    ax.set_xlabel('observed pollution')
    ax.set_ylabel('predicted pollution')
    ax.set_xlim(minlim-10, maxlim+10)
    ax.set_ylim(minlim-10, maxlim+10)
    ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=10,
        verticalalignment='top', bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.3))
    ax.set_title(modelnames[i])
fig.suptitle('Test set predictions')
fig.tight_layout()
fig.subplots_adjust(top=0.95)

___

## ARIMAX

Autoregressive Integrated Moving Average with Explanatory Variable (ARIMAX) is an extended version of ARIMA that includes independent predictor variables.

In [None]:
data['Date']=data.index

### Feature Engineering

Almost every time series problem will have some external features or some internal feature engineering to help the model.

Let's add some basic features, such as lagged values of the available numeric features, which are widely used for time series problems. Since we need to predict the pollution for a given day, we cannot use the feature values of the same day, because they will be unavailable at actual inference time. Instead, we use statistics such as the mean and standard deviation of their lagged values.

We will use three sets of lagged values: one looking back 7 days, one looking back a month (30 days), and one looking back 90 days as a proxy for the last quarter.
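
The key detail is the shift by one step after the rolling statistic, which ensures that the feature for a given day only uses earlier days. A toy sketch with a window of 3 makes this visible; it uses `min_periods=1` and a made-up series purely for illustration.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Rolling mean over the current and the two previous values
rolled = s.rolling(window=3, min_periods=1).mean()

# Shift by one step so that row t only sees values up to t-1
lagged = rolled.shift(1)

print(pd.DataFrame({"value": s, "rolling_mean": rolled, "lagged_mean": lagged}))
```

The first row of `lagged_mean` is NaN because no earlier value exists, which is why missing values have to be filled afterwards.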

In [None]:
data.reset_index(drop=True, inplace=True)

lag_features = ['dew', 'temp', 'press', 'wnd_spd', 'snow', 'rain']

window1 = 7 #weekly
window2 = 30 #monthly
window3 = 90 #Q

df_rolled_7d = data[lag_features].rolling(window=window1, min_periods=0)
df_rolled_30d = data[lag_features].rolling(window=window2, min_periods=0)
df_rolled_90d = data[lag_features].rolling(window=window3, min_periods=0)

df_mean_7d = df_rolled_7d.mean().shift(1).reset_index().astype(np.float32)
df_mean_30d = df_rolled_30d.mean().shift(1).reset_index().astype(np.float32)
df_mean_90d = df_rolled_90d.mean().shift(1).reset_index().astype(np.float32)

df_std_7d = df_rolled_7d.std().shift(1).reset_index().astype(np.float32)
df_std_30d = df_rolled_30d.std().shift(1).reset_index().astype(np.float32)
df_std_90d = df_rolled_90d.std().shift(1).reset_index().astype(np.float32)

for feature in lag_features:
    data[f"{feature}_mean_lag{window1}"] = df_mean_7d[feature]
    data[f"{feature}_mean_lag{window2}"] = df_mean_30d[feature]
    data[f"{feature}_mean_lag{window3}"] = df_mean_90d[feature]
    
    data[f"{feature}_std_lag{window1}"] = df_std_7d[feature]
    data[f"{feature}_std_lag{window2}"] = df_std_30d[feature]
    data[f"{feature}_std_lag{window3}"] = df_std_90d[feature]

# Fill the NaNs created by shifting with the column means (numeric columns only)
data.fillna(data.mean(numeric_only=True), inplace=True)

data.set_index("Date", drop=False, inplace=True)
data.head()

It is often useful to add datetime features such as hour, day, or month, as applicable, to give the model information about the time component of the data. Time series models do not strictly require this information, but we will try it here.

In [None]:
data["month"] = data.index.month
data["week"] = data.index.isocalendar().week.astype(int)  # plain int instead of nullable UInt32
data["day"] = data.index.day
data["day_of_week"] = data.index.dayofweek

We split the data into training and validation sets along with the features, taking the last year (2014 onward) as validation data.

In [None]:
df_train = data[data.Date < "2014"]
df_valid = data[data.Date >= "2014"]

In [None]:
# Drop the 'Date'
data.drop(['Date'], axis='columns', inplace=True)

The additional features supplied to time series problems are called exogenous regressors.

In [None]:
exogenous_features = data.columns[data.columns != 'pollution']

ARIMA (Autoregressive Integrated Moving Average) models explain a given time series based on its own past values, that is, its own lags and the lagged forecast errors, so that the resulting equation can be used to forecast future values.

ARIMA models require three input parameters: p for the AR(p) part, d for the I(d) part, and q for the MA(q) part. Thankfully, these parameters can be chosen automatically by a procedure called Auto ARIMA.
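
To see how the three parts fit together, consider a hand-worked ARIMA(1,1,1) forecast. The sketch below differences a toy series once (d = 1), applies a one-step ARMA(1,1) equation to the differences, and adds the result back to the last observation. The coefficients `c`, `phi`, `theta` and the last forecast error are made-up values for illustration; in practice they are estimated from the data.

```python
import numpy as np

def arma11_one_step(y_last, eps_last, c, phi, theta):
    """One-step ARMA(1,1) forecast: c + phi * y_{t-1} + theta * eps_{t-1}."""
    return c + phi * y_last + theta * eps_last

y = np.array([10.0, 12.0, 15.0, 19.0, 24.0])

# I(d) part with d = 1: model the first differences instead of the raw series
dy = np.diff(y)                       # [2., 3., 4., 5.]

# AR(1) and MA(1) parts applied to the differenced series
# (hypothetical coefficients and last forecast error)
dy_hat = arma11_one_step(y_last=dy[-1], eps_last=0.5, c=0.2, phi=0.6, theta=0.3)

# Undo the differencing to get a forecast on the original scale
y_hat = y[-1] + dy_hat

print(dy_hat, y_hat)                  # approximately 3.35 and 27.35
```

Auto ARIMA searches over candidate orders (p, d, q), fits each candidate, and selects one according to an information criterion, so none of these numbers need to be chosen by hand.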

When exogenous regressors are used with ARIMA it is commonly called ARIMAX.

Read more about [ARIMA](https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average)

In [None]:
model = auto_arima(df_train['pollution'].values, exogenous=df_train[exogenous_features].values,
                   trace=True, error_action='trace', suppress_warnings=True)


In [None]:
model.fit(df_train['pollution'].values, exogenous=df_train[exogenous_features].values)

In [None]:
model.plot_diagnostics(figsize=(10, 8))

In [None]:
forecast = model.predict(n_periods=len(df_valid), exogenous=df_valid[exogenous_features].values)

In [None]:
plt.figure(figsize=(10,4))

ax = sns.lineplot(x=df_train['Date'], y=df_train['pollution'], label='y_Train', color='b')
ax = sns.lineplot(x=df_valid['Date'], y=df_valid['pollution'], label='y_Valid', color='g')

ax = sns.lineplot(x=df_valid['Date'], y=forecast, label='y_pred', color='r')

ax.set_xlabel("Date")
ax.set_ylabel("Pollution")

plt.legend(loc="best");

In [None]:
plt.figure(figsize=(10,4))
# Capture the axes of this figure; otherwise `ax` still refers to the previous plot
ax = sns.lineplot(x=df_valid['Date'], y=df_valid['pollution'], label='y_Valid', color='g')
sns.lineplot(x=df_valid['Date'], y=forecast, label="y_pred", color='r')

ax.set_xlabel("Date")
ax.set_ylabel("Pollution")

plt.legend(loc="best");