# Time Series Forecasting with Convolutional Neural Networks 
The core of this notebook is adapted from https://github.com/JEddy92/TimeSeries_Seq2Seq/blob/master/notebooks/TS_Seq2Seq_Conv_Full_Exog.ipynb.

My main additions are:

**0. Utilities**

**1. Loading Data**

**4. Saving Model with weights**

## 0. Utilities

In [0]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
sns.set()

In [0]:
from google.colab import files
import os
import os.path
from os import path

Please check the **actual file name** after uploading, because it can differ from the name you expect.
The helper below works, but it is not a perfect solution.

In [0]:
def upload_1_file():
  uploaded = files.upload()

  for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
    return fn

  return ""


In [0]:
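def download_1_file(http_path):
  import os
  import shutil
  import tempfile
  import requests

  response = requests.get(http_path, stream=True)
  response.raise_for_status()  # fail early on HTTP errors instead of saving an error page
  response.raw.decode_content = True  # undo gzip/deflate content encoding while streaming

  # mkstemp returns an open file descriptor and a path; wrap the descriptor so it gets closed
  fd, fname = tempfile.mkstemp()
  with os.fdopen(fd, 'wb') as fout:
      shutil.copyfileobj(response.raw, fout)

  return fname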

In [0]:
def download_or_upload_1_file(http_or_file_path):
  # an existing local file is used as-is
  if (path.exists(http_or_file_path) and (path.isfile(http_or_file_path))):
    return http_or_file_path

  # an empty string means "ask the user to upload a file"
  if (http_or_file_path==""):
    return upload_1_file()

  # anything else is treated as a URL to download
  print(http_or_file_path)
  return download_1_file(http_or_file_path)

## 1. Loading Data



For convenience, I uploaded the cleaned data to the following web paths.

In [0]:
encoder_input_web_or_local="https://MrYingLee.Github.io/Seq2Seq/encoder_input.npy"
decoder_target_web_or_local="https://MrYingLee.Github.io/Seq2Seq/decoder_target.npy"

### Encoder File

In [7]:
file1=download_or_upload_1_file(encoder_input_web_or_local)
encoder_input_data=np.load(file1)

https://MrYingLee.Github.io/Seq2Seq/encoder_input.npy


In [8]:
encoder_input_data.shape

(13, 489, 11)

### Decoder File

Please check the **actual file name** after uploading.

In [9]:
file2=download_or_upload_1_file(decoder_target_web_or_local)
decoder_target_data=np.load(file2)

https://MrYingLee.Github.io/Seq2Seq/decoder_target.npy


In [10]:
decoder_target_data.shape

(13, 60, 1)
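
Both arrays use the `(series, time steps, features)` layout that the model in the next section consumes: 13 series, 489 input steps with 11 features each, and 60 target steps with a single value. As a minimal sketch of a sanity check, the snippet below uses zero-filled stand-in arrays with the shapes printed above rather than the real data; the helper name `check_seq2seq_shapes` is hypothetical.

```python
import numpy as np

def check_seq2seq_shapes(encoder_input, decoder_target):
    """Return (n_series, n_history_steps, n_features, n_target_steps) after basic checks."""
    if encoder_input.ndim != 3 or decoder_target.ndim != 3:
        raise ValueError("both arrays must be 3-D: (series, time steps, features)")
    if encoder_input.shape[0] != decoder_target.shape[0]:
        raise ValueError("encoder and decoder arrays must contain the same number of series")
    n_series, n_history_steps, n_features = encoder_input.shape
    n_target_steps = decoder_target.shape[1]
    return n_series, n_history_steps, n_features, n_target_steps

# Stand-in arrays with the same shapes as the files loaded above (not the real data).
dummy_encoder = np.zeros((13, 489, 11))
dummy_decoder = np.zeros((13, 60, 1))
shape_info = check_seq2seq_shapes(dummy_encoder, dummy_decoder)
print(shape_info)  # (13, 489, 11, 60)
```

Running the same helper on `encoder_input_data` and `decoder_target_data` should catch a wrong or mismatched upload before training starts.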

## 2. Building the Model - Architecture

This convolutional architecture is a full-fledged version of the [WaveNet model](https://deepmind.com/blog/wavenet-generative-model-raw-audio/), which was designed as a generative model for audio (in particular, for text-to-speech applications). The WaveNet model can be abstracted beyond audio to apply to any time series forecasting problem, providing a nice structure for capturing long-term dependencies without an excessive number of learned weights. Exogenous features can be integrated into WaveNet simply by extending the third dimension (the feature dimension) of the tensors that we feed to the model.
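
To make "extending the feature dimension" concrete, here is a small sketch with made-up sizes. The one-hot day-of-week feature is purely a hypothetical example of an exogenous input, not necessarily one of the 11 features in the files above.

```python
import numpy as np

n_series, n_steps = 4, 10  # illustrative sizes, not the dataset's

# Raw series: one value per time step, so the feature dimension has size 1.
raw_series = np.random.rand(n_series, n_steps, 1)

# Hypothetical exogenous feature: one-hot day of week (7 columns), shared by every series.
day_of_week = np.arange(n_steps) % 7
day_one_hot = np.eye(7)[day_of_week]                                # (n_steps, 7)
day_one_hot = np.broadcast_to(day_one_hot, (n_series, n_steps, 7))  # (n_series, n_steps, 7)

# Stack along the last (feature) axis: 1 raw value + 7 exogenous columns.
model_input = np.concatenate([raw_series, day_one_hot], axis=-1)
print(model_input.shape)  # (4, 10, 8)
```

The series and time axes are untouched; only the last axis grows, which is why the model only needs to know the final feature count.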

The core of the WaveNet model can be described as a **stack of residual blocks** that utilize **dilated causal convolutions**, visualized by the two diagrams from the WaveNet paper below. I've gone into detailed discussion of these model components in the two previous notebooks of this series ([part 1](https://github.com/JEddy92/TimeSeries_Seq2Seq/blob/master/notebooks/TS_Seq2Seq_Conv_Intro.ipynb), [part 2](https://github.com/JEddy92/TimeSeries_Seq2Seq/blob/master/notebooks/TS_Seq2Seq_Conv_Full.ipynb)), so I'd recommend checking those out if you want to build familiarity.

![dilatedconv](https://github.com/JEddy92/TimeSeries_Seq2Seq/blob/master/notebooks/images/WaveNet_dilatedconv.png?raw=1)  

![blocks](https://github.com/JEddy92/TimeSeries_Seq2Seq/blob/master/notebooks/images/WaveNet_residblock.png?raw=1)        
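
As a rough illustration of what a single dilated causal convolution computes, the sketch below implements the width-2 case for one channel in plain NumPy. The function name and the two scalar weights are illustrative assumptions; a real layer learns one such kernel per filter and input channel.

```python
import numpy as np

def dilated_causal_conv(x, w_past, w_now, dilation):
    """Width-2 causal convolution: y[t] = w_past * x[t - dilation] + w_now * x[t].

    Positions before the start of the series are treated as zeros, which is
    what left ("causal") padding amounts to.
    """
    x = np.asarray(x, dtype=float)
    padded = np.concatenate([np.zeros(dilation), x])
    return w_past * padded[:len(x)] + w_now * x

series = np.arange(1.0, 9.0)  # [1, 2, ..., 8]
out = dilated_causal_conv(series, w_past=1.0, w_now=1.0, dilation=2)
print(out)  # [ 1.  2.  4.  6.  8. 10. 12. 14.]
```

Each output depends only on the current value and one value `dilation` steps in the past, so changing a future input never alters an earlier output.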


### **Our Architecture**

With all of our components now laid out, here's what we'll use:

* 16 dilated causal convolutional blocks
    * Preprocessing and postprocessing (time distributed) fully connected layers (convolutions with filter width 1): 32 output units
    * 32 filters of width 2 per block
    * Exponentially increasing dilation rate with a reset (1, 2, 4, 8, ..., 128, 1, 2, ..., 128) 
    * Gated activations
    * Residual and skip connections
* 2 (time distributed) fully connected layers to map sum of skip outputs to final output 
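
One way to see what this dilation schedule buys us is to compute the receptive field, meaning how many past time steps can influence a single output step. The calculation below is a sketch that counts only the width-2 dilated convolutions, since the width-1 layers do not widen the window.

```python
filter_width = 2
dilation_rates = [2**i for i in range(8)] * 2  # 1, 2, ..., 128, 1, 2, ..., 128

# Each causal layer extends the visible history by (filter_width - 1) * dilation steps.
receptive_field = 1 + sum((filter_width - 1) * d for d in dilation_rates)

print(len(dilation_rates))  # 16
print(receptive_field)      # 511
```

With 16 blocks the last output step can see 511 time steps, which is enough to cover the 489-step input series loaded in section 1.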

Note that the only change in architecture from the previous notebook (without exogenous features) is an increase in units from 16 to 32 for the pre- and postprocessing layers. This increase lets us better handle the larger number of input features (previously we used only one feature).

As in the previous notebook, we'll extract the last 60 steps from the output sequence as our predicted output for training. We'll also use teacher forcing again during training. The separate function for iterative inference (section 5 of the original notebook linked at the top) is not reproduced in this notebook.
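
Extracting the last 60 steps is plain negative slicing along the time axis. Here is a small NumPy sketch with an illustrative batch size and sequence length; the model definition below applies the same indexing inside a `Lambda` layer.

```python
import numpy as np

batch, n_steps, pred_steps = 2, 100, 60  # illustrative batch and history sizes
full_output = np.arange(batch * n_steps, dtype=float).reshape(batch, n_steps, 1)

# Keep only the final pred_steps time steps of every series in the batch.
pred_seq = full_output[:, -pred_steps:, :]
print(pred_seq.shape)     # (2, 60, 1)
print(pred_seq[0, 0, 0])  # 40.0
```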

In [0]:
def create_conv_model(input_last_length):
  import tensorflow as tf
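  # tf.compat.v1.logging also works on TensorFlow 2.x, where tf.logging was removed
  tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.FATAL) # suppress unhelpful tf warnings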

  from keras.models import Model
  from keras.layers import Input, Conv1D, Dense, Activation, Dropout, Lambda, Multiply, Add, Concatenate

  # convolutional operation parameters
  n_filters = 32 # 32 
  filter_width = 2
  dilation_rates = [2**i for i in range(8)] * 2 

  # define an input history series and pass it through a stack of dilated causal convolution blocks. 
  # Note the feature input dimension corresponds to the raw series and all exogenous features  
  history_seq = Input(shape=(None, input_last_length))
  x = history_seq

  skips = []
  for dilation_rate in dilation_rates:
      
      # preprocessing - equivalent to time-distributed dense
      x = Conv1D(32, 1, padding='same', activation='relu')(x) 
      
      # filter convolution
      x_f = Conv1D(filters=n_filters,
                  kernel_size=filter_width, 
                  padding='causal',
                  dilation_rate=dilation_rate)(x)
      
      # gating convolution
      x_g = Conv1D(filters=n_filters,
                  kernel_size=filter_width, 
                  padding='causal',
                  dilation_rate=dilation_rate)(x)
      
      # multiply filter and gating branches
      z = Multiply()([Activation('tanh')(x_f),
                      Activation('sigmoid')(x_g)])
      
      # postprocessing - equivalent to time-distributed dense
      z = Conv1D(32, 1, padding='same', activation='relu')(z)
      
      # residual connection
      x = Add()([x, z])    
      
      # collect skip connections
      skips.append(z)

  # add all skip connection outputs 
  out = Activation('relu')(Add()(skips))

  # final time-distributed dense layers 
  out = Conv1D(128, 1, padding='same')(out)
  out = Activation('relu')(out)
  out = Dropout(.2)(out)
  out = Conv1D(1, 1, padding='same')(out)

  # extract the last 60 time steps as the training target
  def slice_last_steps(x, seq_length):  # named to avoid shadowing the built-in `slice`
      return x[:,-seq_length:,:]

  pred_seq_train = Lambda(slice_last_steps, arguments={'seq_length':60})(out)

  model = Model(history_seq, pred_seq_train)
  return model

In [12]:
from keras.optimizers import Adam
conv_model=create_conv_model(encoder_input_data.shape[-1])

Using TensorFlow backend.


In [13]:
conv_model.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, None, 11)     0                                            
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               (None, None, 32)     384         input_1[0][0]                    
__________________________________________________________________________________________________
conv1d_2 (Conv1D)               (None, None, 32)     2080        conv1d_1[0][0]                   
__________________________________________________________________________________________________
conv1d_3 (Conv1D)               (None, None, 32)     2080        conv1d_1[0][0]                   
____________________________________________________________________________________________
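
The parameter counts in the summary rows shown above can be reproduced by hand: a `Conv1D` layer has `kernel_size * in_channels * filters` weights plus one bias per filter. The helper name below is illustrative.

```python
def conv1d_param_count(kernel_size, in_channels, filters):
    """Weights (kernel_size * in_channels * filters) plus one bias per filter."""
    return (kernel_size * in_channels + 1) * filters

# conv1d_1: width-1 preprocessing layer on the 11 input features
print(conv1d_param_count(kernel_size=1, in_channels=11, filters=32))  # 384

# conv1d_2 / conv1d_3: width-2 filter and gate convolutions on 32 channels
print(conv1d_param_count(kernel_size=2, in_channels=32, filters=32))  # 2080
```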

With our training architecture defined, we're ready to train the model! In this notebook the cleaned arrays loaded in section 1 are fed to the model directly, and we train using mean absolute error loss.
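
For reference, mean absolute error is the average of the absolute differences between targets and predictions. The plain-Python sketch below uses made-up numbers and shows only the definition, not how Keras implements it internally.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred| over all paired values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

mae = mean_absolute_error([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 1.0, 4.0])
print(mae)  # 0.75
```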

For this expansion of the full-fledged model, once again we end up more than doubling the total number of trainable parameters and incur the cost of slower training time. These additional parameters are due to the increase in filters for the pre/postprocessing layers. Training a model at this scale will take quite a while if you're not running fancy hardware - I'd recommend using a GPU. When constructing this notebook, I used an AWS EC2 instance with a GPU (p2.xlarge) and the Amazon Deep Learning AMI, and training took about an hour. 

This time around, we'll go ahead and use all of the series in the dataset for training, and train for 15 epochs to give this more complex model more time to try to reach its full potential. 

This is only a starting point, and I would encourage you to play around with this pipeline to see if you can get even better results! You could try selecting/engineering different exogenous features, adjusting the model architecture/hyperparameters, tuning the learning rate and number of epochs, etc.

## 3. Training Model

In [14]:
batch_size = 2**10 
epochs = 15

conv_model.compile(Adam(), loss='mean_absolute_error')
conv_history = conv_model.fit(encoder_input_data, decoder_target_data,
                    batch_size=batch_size,
                    epochs=epochs)  

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
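
With the 13 series loaded in section 1 and a batch size of 1024, every series fits into a single batch, so each epoch amounts to one gradient update. A quick arithmetic check (the variable names are illustrative):

```python
import math

n_series = 13       # first dimension of encoder_input_data above
batch_size = 2**10  # 1024, as in the training cell

steps_per_epoch = math.ceil(n_series / batch_size)
print(steps_per_epoch)  # 1
```

If you want more updates per epoch on a dataset this small, a smaller batch size is one option to experiment with.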


## 4. Saving Model with weights

In [0]:
from google.colab import files  # Colab-only helper for browser downloads

model_file='conv_model.h5'
conv_model.save(model_file)  # HDF5 file holding the architecture, weights and optimizer state

files.download(model_file)