## DeepAR Time Series Prediction

## Overview

DeepAR

* Supervised algorithm based on recurrent neural networks
* Time series
    * a series of values of a quantity obtained at successive times, often with equal intervals between them.
    * values in a time series are often correlated with values measured at a previous time (temperature in a day, seasonal temperature values, etc.)
    
Typical Use - Forecasting

* Product demand
* Passenger traffic
* Weather trends



Time series forecasting with non-time series algorithms

* Do not natively handle date time features
* Do not learn from the time-dependent nature of the data - completely feature driven, ignore temporal order


Time series can be made up of:

* Noise - random fluctuation that cannot be explained or predicted by an algorithm
* Seasonal data - a pattern that repeats itself at a specific frequency
* Trends - show increase or decrease of a target value over time
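
The three components can be illustrated by building a synthetic series from them. This is a minimal sketch with made-up amplitudes and a made-up 24-hour period; none of the numbers come from the course data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable

hours = np.arange(24 * 14)                                # two weeks of hourly steps
trend = 0.05 * hours                                      # slow increase over time
seasonal = 10.0 * np.sin(2 * np.pi * hours / 24)          # pattern repeating every 24 hours
noise = rng.normal(loc=0.0, scale=2.0, size=hours.size)   # random, unpredictable fluctuation

# An additive series: each observed value is the sum of the three components
series = trend + seasonal + noise
```

Real data rarely separates this cleanly, but the additive picture helps when deciding which patterns a forecasting model should be able to capture.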

DeepAR Stationary vs Non-Stationary

* Classical time series forecasting algorithms expect time series to be stationary (mean, standard deviation are constant over time)
    * Requires data transformation to make it stationary
* DeepAR does not expect data to be stationary
    * Can experiment to see if making the data stationary improves model
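
One common transformation for making a trending series stationary is differencing. The sketch below is an illustration only; the helper name `difference` is hypothetical, and DeepAR does not require this step.

```python
import numpy as np

def difference(values, lag=1):
    """Return values[t] - values[t - lag]; with lag=1 this removes a linear trend."""
    values = np.asarray(values, dtype=float)
    return values[lag:] - values[:-lag]

trending = 3.0 * np.arange(10) + 5.0   # mean rises over time, so not stationary
differenced = difference(trending)     # every step is now the same constant change
```

If you try this with DeepAR, remember that forecasts made on differenced data have to be summed back up to recover values on the original scale.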
    
DeepAR supports features to indicate external factors that influence a target value (for example, Black Friday)

DeepAR vs Classical Models

* Classical forecasting methods (ARIMA or ETS) fit a single model to each time series
* DeepAR can fit many similar time series in a single model
* DeepAR outperforms ARIMA/ETS on datasets that contain hundreds of related time series



## Details

Training/Test - Important Differences

Normal:

* Data is randomly divided into training and test sets
* Training file contains only training data
* Test file contains only test data
* Inference: Model predicts target for new data

[DeepAR](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html):

* Data is ordered; time order must be honored, series needs to be split based on time
* Decide how far into the future the model should predict (the `prediction_length` hyperparameter)
* Training set: use the entire time series minus the points at the end of the series that make up the prediction length
* Test set: the full time series, including the last `prediction_length` points, which are held out from training and used for evaluation
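
The time-based split can be sketched as follows. This follows the convention in the AWS DeepAR documentation, where the test series is the full series and the training series drops its last `prediction_length` points. The function name `split_series` and the sample values are hypothetical.

```python
def split_series(target, prediction_length):
    """Return (train_target, test_target) for a single time series.

    The training copy drops the final prediction_length points; the test copy
    keeps the whole series so the held-out tail can be scored.
    """
    if prediction_length <= 0 or prediction_length >= len(target):
        raise ValueError("prediction_length must be between 1 and len(target) - 1")
    train_target = list(target[:-prediction_length])
    test_target = list(target)
    return train_target, test_target

hourly_counts = [12, 15, 9, 20, 31, 28, 17, 11]   # made-up values
train, test = split_series(hourly_counts, prediction_length=3)
```

Unlike a random split, every training point here comes before every held-out point, so the model is never evaluated on the past.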

In DeepAR you train on the aggregated time series data. In the energy example, every customer's time series is used for training. At prediction time, you send an individual customer's time series to that one model to get the forecast for the next set of points.

Supports JSON Lines and Parquet formats for input, and JSON format for inference

Input fields:

* start - start timestamp for the series YYYY-MM-DD HH:MM:SS
* target - array of floating point or integer values; missing values can be NaN in JSON Lines, nan in Parquet
* dynamic_feat - optional input features, floating point or integer values, given as an array of arrays. Each inner array holds the values of one feature and must be the same length as target. Missing values are not supported.
* cat - optional category. Category is used for identifying/encoding a time series. Each categorical feature is represented as a 0-based integer.
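
A single training record with these fields might be assembled as in the sketch below. The helper name `make_record` and the sample values are hypothetical. Missing targets are written as the string `"NaN"`, since standard JSON has no NaN literal.

```python
import json
import math

def make_record(start, target, cat=None, dynamic_feat=None):
    """Build one DeepAR-style training record as a single JSON Lines string."""
    record = {
        "start": start,
        # Encode missing target values as the string "NaN"
        "target": ["NaN" if math.isnan(v) else v for v in target],
    }
    if cat is not None:
        record["cat"] = list(cat)
    if dynamic_feat is not None:
        for feature in dynamic_feat:
            # Each feature array must line up with the target, point for point
            if len(feature) != len(target):
                raise ValueError("each dynamic_feat array must match the target length")
        record["dynamic_feat"] = [list(feature) for feature in dynamic_feat]
    return json.dumps(record)

line = make_record(
    "2012-01-01 00:00:00",
    [5.0, float("nan"), 7.0],
    cat=[0],
    dynamic_feat=[[1.0, 1.0, 0.0]],
)
```

Each series becomes one line of the input file, so a file with many series is just many such lines.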

Demo: Kaggle Bike Rentals as a timeseries forecasting problem




### Working with Missing Values

See [this notebook](https://github.com/ChandraLingam/AmazonSageMakerCourse/blob/master/DeepAR/MissingValue/Timeseries.ipynb)

* Load files with the timestamp as the index of the data frame
* Use `df.resample` to fill in missing timestamps
* Use forward fill, backward fill, interpolation, or any of the other pandas fill methods to fill in missing data.
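
The steps above can be sketched with a tiny made-up frame. The column name `count` and the values are assumptions for illustration, not the notebook's data.

```python
import pandas as pd

# Hourly readings where the 02:00 and 03:00 rows are missing entirely
df = pd.DataFrame(
    {"count": [10.0, 12.0, 20.0]},
    index=pd.to_datetime(["2012-01-01 00:00", "2012-01-01 01:00", "2012-01-01 04:00"]),
)

# Resample to an hourly grid; the inserted timestamps hold NaN
hourly = df.resample("h").mean()

forward_filled = hourly.ffill()                      # repeat the last known value
back_filled = hourly.bfill()                         # use the next known value
interpolated = hourly.interpolate(method="linear")   # straight line between neighbors
```

Which fill is appropriate depends on the quantity: interpolation suits smooth measurements, while leaving the target as NaN is also an option because DeepAR accepts missing target values.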


### Bike Rental Demo

Notebooks [here](https://github.com/ChandraLingam/AmazonSageMakerCourse/tree/master/DeepAR/BikeRental)

Challenges:

* Gaps in time series - training data consists of first 19 days of each month, two years of data.
* Has missing features as well, e.g. missing timestamps

For demo, first start with rental counts only. Then add categories to the mix, then add dynamic features. 3 models total.

First example - [data preparation](https://github.com/ChandraLingam/AmazonSageMakerCourse/blob/master/DeepAR/BikeRental/deepar_biketrain_data_preparation.ipynb)

Notes:

* Grab the data from the Kaggle site.
* Create training (first 19 days of each month) and test JSON files (the rest).
* Analyze data for missing values, decide prediction length.
* First model - target values are count, registered, casual
* Use the SageMaker example encoding helpers to convert the data to JSON Lines format
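
As a rough idea of what the conversion produces, the sketch below writes one JSON line per target column. It is not the SageMaker example helper itself; the function name `to_json_lines` and the sample numbers are hypothetical, and it assumes the frame has no missing values.

```python
import json
import pandas as pd

df = pd.DataFrame(
    {"count": [16, 40, 32], "registered": [13, 32, 27], "casual": [3, 8, 5]},
    index=pd.to_datetime(
        ["2011-01-01 00:00:00", "2011-01-01 01:00:00", "2011-01-01 02:00:00"]
    ),
)

def to_json_lines(frame, target_columns):
    """Serialize each target column as its own time series, one JSON object per line."""
    start = frame.index[0].strftime("%Y-%m-%d %H:%M:%S")
    lines = []
    for column in target_columns:
        record = {"start": start, "target": frame[column].tolist()}
        lines.append(json.dumps(record))
    return "\n".join(lines)

json_lines = to_json_lines(df, ["count", "registered", "casual"])
```

The three target columns become three separate series in the same file, which is how a single DeepAR model gets to learn from all of them.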

Notes - Training and Fitting DeepAR Model

* Based on [this](https://github.com/ChandraLingam/AmazonSageMakerCourse/blob/master/DeepAR/BikeRental/deepar_biketrain_cloud_training.ipynb) notebook
* No categories
* DeepAR [hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar_hyperparameters.html)
    * context length - number of points the model gets to see before making a prediction - set to roughly the same length as the prediction length
    * lagged inputs - used for seasonality; for example, with a 1 year pattern, the model can include data from the previous year to account for seasonality.
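
A hyperparameter set along these lines might look like the sketch below. The key names are DeepAR hyperparameters from the AWS documentation; the values are illustrative assumptions (an hourly series and a hypothetical 12-day horizon), not the notebook's actual settings.

```python
# Hypothetical horizon: 12 days of hourly points
prediction_length = 12 * 24

hyperparameters = {
    "time_freq": "H",                            # hourly data
    "prediction_length": str(prediction_length),
    "context_length": str(prediction_length),    # roughly the same as the prediction length
    "epochs": "100",                             # illustrative value only
}
```

A dictionary like this is what gets handed to the training job configuration; tune the values for your own data rather than copying them.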
    
 
Invoking the Endpoint

* [This notebook](https://github.com/ChandraLingam/AmazonSageMakerCourse/blob/master/DeepAR/BikeRental/deepar_biketrain_cloud_prediction.ipynb)
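
The JSON request sent to a DeepAR endpoint has an `instances` list and a `configuration` object. The sketch below only builds the request body; the helper name `build_request` and the sample values are hypothetical, and actually sending it (for example through the SageMaker runtime client) is left out.

```python
import json

def build_request(start, target, quantiles=("0.1", "0.5", "0.9"), num_samples=50):
    """Build a DeepAR inference request body for a single time series."""
    instance = {"start": start, "target": list(target)}
    configuration = {
        "num_samples": num_samples,             # sample paths drawn per forecast
        "output_types": ["mean", "quantiles"],  # what to include in the response
        "quantiles": list(quantiles),
    }
    return json.dumps({"instances": [instance], "configuration": configuration})

body = build_request("2012-12-20 00:00:00", [120, 95, 60, 41])
```

The response contains one prediction per instance, each covering the `prediction_length` points that follow the end of the submitted target.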

DeepAR with Categories

* Send additional context using categories; the model can forecast for a category based on data seen during training.
* Use numeric values for categories
* Use the same prep notebook with `with_categories` set to `True`, and set it the same way in the training and prediction notebooks.
* Now can forecast for each category.
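
Turning series labels into the 0-based integers that `cat` expects can be sketched as below. The helper name `encode_categories` is hypothetical, and the labels are the three bike rental targets used as an example.

```python
def encode_categories(labels):
    """Map each distinct label to a 0-based integer, sorted so the mapping is repeatable."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}

series_labels = ["count", "registered", "casual"]
category_ids = encode_categories(series_labels)

# A record for the "casual" series would then carry this value in its "cat" field
casual_cat = [category_ids["casual"]]

# Number of distinct values per categorical feature (one feature here)
cardinality = [len(category_ids)]
```

Keep the mapping around after training: the same integers have to be sent at prediction time, and a label that was never seen in training has no valid id.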

DeepAR with Dynamic Features

* [This notebook](https://github.com/ChandraLingam/AmazonSageMakerCourse/blob/master/DeepAR/BikeRental/deepar_biketrain_data_preparation_dynamic_feat.ipynb)
* No missing features or NaN, only finite numbers are allowed.
* Replace missing or NaN feature values with values available in the test file
* Can use forward fill for features like temperature, humidity, weather, etc.
* Forward fill might not work for some features that remain the same throughout the day, such as working day, holiday, and season. For these look for a valid value in the same day and use that.
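
The two filling strategies can be sketched as follows. The column names `temp` and `workingday` mirror the bike rental features, but the frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

index = pd.to_datetime(
    ["2011-01-01 00:00", "2011-01-01 12:00", "2011-01-02 00:00", "2011-01-02 12:00"]
)
df = pd.DataFrame(
    {
        "temp": [9.0, np.nan, 11.0, np.nan],
        "workingday": [np.nan, 0.0, 1.0, np.nan],
    },
    index=index,
)

# Slowly varying measurement: carry the last observed value forward
df["temp"] = df["temp"].ffill()

# Per-day constant: fill only from valid values within the same calendar day
per_day = df.groupby(df.index.normalize())["workingday"]
df["workingday"] = per_day.transform(lambda day: day.ffill().bfill())
```

A plain forward fill on `workingday` would have left the first row empty and could carry the previous day's value across midnight, which is why the fill is done within each day.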

Train a Model with Dynamic Features

* [This notebook](https://github.com/ChandraLingam/AmazonSageMakerCourse/blob/master/DeepAR/BikeRental/deepar_biketrain_dynamic_feat_cloud_training.ipynb)
* Need a more powerful instance.
* Prediction script [here](https://github.com/ChandraLingam/AmazonSageMakerCourse/blob/master/DeepAR/BikeRental/deepar_biketrain_dynamic_feat_cloud_prediction.ipynb)
* Good for what-if scenarios and modeling external events like Black Friday.



### Other Samples

Two samples provided by AWS: synthetic data and electricity forecasting

## Summary

* DeepAR is a time series forecasting algorithm
* Supervised algorithm based on recurrent neural networks
* Typically used for forecasting - product demand, passenger traffic, weather trends, etc.
* Classical forecasting methods (ARIMA or ETS) fit a single model to each time series
    * DeepAR can fit many similar time series in a single model
    * DeepAR outperforms ARIMA/ETS on datasets that contain hundreds of related time series
* Train model with one or more time series
* Use model to extrapolate time series into the future
* Use model to generate forecasts for a new time series that is similar to the ones it is trained on
* The `prediction_length` hyperparameter determines how many points into the future are forecast
* Supports external events via dynamic features
* Biggest challenge is data preparation

## Training a Model for Different Products

Consider the bike rental time series data for different bike models. Each bike type has its own time series, and you want to train one model that includes all the time series.

(from the course notes)

> You can take couple of approaches.
> 1. Train a global model that learns from all the timeseries data that you provide - you can then use this to forecast future demand. Here, DeepAR learns to differentiate timeseries on its own
>
> 2. Train a DeepAR model with Categories - Here, you identify each timeseries with a category (bike type, in your example).
> DeepAR then learns to differentiate timeseries based on category that you provided.
>
> At the time of forecast, you need to include the category for which to predict. This approach requires you to identify "all" categories that you will encounter and train the model.
>
> If new categories show up, you need to retrain the model
>
> 3. Finally, you can include dynamic features - to highlight things like holidays, sale promotion and other factors that have external influence.
>
> As you can see, you would need to experiment these options for your dataset and see which one generalizes well