# Project: Web Traffic Forecasting
Ruoxin Jiang and Bingyan Hu
## Overview
___
### Task
The goal of our project is to forecast web traffic time series for online pages. Forecasting time series is challenging because a model has to combine seasonality, trend and other effects (such as holidays) in a principled way, and historical data alone is insufficient to capture the uncertainty of future events.

We present a hierarchical time series forecasting model built with Edward and demonstrate three rounds of Box's loop below.

### Data Source
We obtain real time series data from [a recent Kaggle competition](https://www.kaggle.com/c/web-traffic-time-series-forecasting). Each time series represents daily page views of a particular Wikipedia article from **07/01/2015** to **09/10/2017**. 

The model is trained on data before **07/10/2017**, and we forecast the number of page visits over the last 60 days, from **07/10/2017** to **09/10/2017**.
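
A minimal pandas sketch of this train/test split, assuming the series is stored in the long `ds`/`views` format described in Round 1. The helper name `split_by_date` and the synthetic placeholder values are illustrative only, not taken from the project code.

```python
import numpy as np
import pandas as pd

def split_by_date(df, cutoff="2017-07-10"):
    """Split a long-format series into train (ds < cutoff) and test (ds >= cutoff)."""
    cutoff = pd.Timestamp(cutoff)
    train = df[df["ds"] < cutoff].reset_index(drop=True)
    test = df[df["ds"] >= cutoff].reset_index(drop=True)
    return train, test

# Synthetic daily series covering the competition window (placeholder values, not real data).
dates = pd.date_range("2015-07-01", "2017-09-10", freq="D")
demo = pd.DataFrame({"ds": dates, "views": np.arange(len(dates))})

train, test = split_by_date(demo)
print(len(train), len(test))  # 740 63
```

Counting both boundary dates, the holdout window from 2017-07-10 through 2017-09-10 contains 63 daily observations.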

## Round 1: 
___
### 1.Data
We randomly pick one Wikipedia article and arrange its daily data in long format, with columns `ds` (date) and `views` (page views).
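
A minimal sketch of picking an article and reshaping it to long format. It assumes the Kaggle wide layout (one row per page, a `Page` column, and one column per date); the tiny inline table with hypothetical page names `Article_A`/`Article_B` stands in for the real CSV, and `to_long` is an illustrative helper.

```python
import pandas as pd

# Stand-in for the Kaggle wide table: one row per page, one column per date.
wide = pd.DataFrame({
    "Page": ["Article_A", "Article_B"],
    "2015-07-01": [18.0, 11.0],
    "2015-07-02": [11.0, None],   # missing counts show up as NaN
    "2015-07-03": [5.0, 14.0],
})

def to_long(wide, page=None, seed=0):
    """Return (page, DataFrame with columns ds and views) for one page.

    If page is None, a page is drawn at random (seeded for reproducibility).
    """
    if page is None:
        page = wide["Page"].sample(1, random_state=seed).iloc[0]
    row = wide.loc[wide["Page"] == page].drop(columns="Page")
    long = row.melt(var_name="ds", value_name="views")   # date columns -> rows
    long["ds"] = pd.to_datetime(long["ds"])
    return page, long.sort_values("ds").reset_index(drop=True)

page, df = to_long(wide, page="Article_B")
print(page)
print(df)
```

Missing days are left as `NaN` here; how to treat them is a separate modeling decision.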


### 1.Model: Prophet-style regression model
We build a regression model similar to [Facebook Prophet](https://peerj.com/preprints/3190/). Like a generalized additive model, it combines trend, seasonality and holiday components, with potentially non-linear smoothers applied to the time regressor $t \in \mathbb{Z}^{T}$.

$$y(t) = g(t) + s(t) + h(t) + \epsilon_{t}  $$

- **Trend** <br/> $$g(t) = (k + \mathbf{a}(t)^{T} \boldsymbol{\delta})t + (m + \mathbf{a}(t)^{T} \boldsymbol{\gamma})$$
    - $k$ is the growth rate (slope)
    - $m$ is the offset (intercept)
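    - $S$ changepoints are explicitly defined to allow the trend to change at times $s_{j}$, $j \in \{1, 2, \dots, S\}$
        - $\mathbf{a}(t) \in \{0,1\}^{S}$ are changepoint indicators, with $a_{j}(t) = 1$ if $t \geq s_{j}$ and $0$ otherwise
        - $\delta_{j} \sim \text{Laplace}(0,\tau)$ is the change of rate at time $s_{j}$
        - $\gamma_{j}$ is set to $-s_{j}\delta_{j}$ to make the function continuous: the contribution of changepoint $j$ becomes $\delta_{j}t + \gamma_{j} = \delta_{j}(t - s_{j})$, which is $0$ at $t = s_{j}$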


- **Seasonality** <br/>
We approximate periodic seasonality with a Fourier series:
$$s(t) = \sum_{n=1}^{N}\left(a_{n}\cos\left(\frac{2\pi nt}{P}\right) + b_{n}\sin\left(\frac{2\pi nt}{P}\right)\right) = X(t)\boldsymbol{\beta}$$
    - $\boldsymbol{\beta} = [a_{1}, b_{1}, \dots, a_{N}, b_{N}]^{T}$ and $\boldsymbol{\beta} \sim \text{Normal}(0,\sigma^{2})$
    - $P$ is the period in days and $N$ the order of the series, so each seasonality contributes $2N$ columns to $X(t)$
    - yearly: $P = 365.25$, $N = 10$
    - weekly: $P = 7$, $N = 3$
    

- **Holiday/Events** <br/>
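Assuming holidays are independent, we assign each holiday $i$ its own parameter $\kappa_{i}$, the change in the forecast on that holiday:
$$h(t) = Z(t) \boldsymbol{\kappa}$$
    - $Z(t) = [\boldsymbol{1}(t\in D_{1}), \dots, \boldsymbol{1}(t\in D_{L})]$, where $D_{i}$ is the set of past and future dates of holiday $i$ and $L$ is the number of holidays
    - $\boldsymbol{\kappa} \sim \text{Normal}(0, \nu^{2})$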
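
A small NumPy sketch of the piecewise-linear trend $g(t)$ that checks numerically that $\gamma_{j} = -s_{j}\delta_{j}$ removes the jumps at the changepoints. The changepoint times and parameter values are arbitrary illustrations, and `trend` is a hypothetical helper, not part of the project code.

```python
import numpy as np

def trend(t, k, m, s, delta):
    """g(t) = (k + a(t)^T delta) t + (m + a(t)^T gamma), with gamma_j = -s_j * delta_j."""
    t = np.asarray(t, dtype=float)
    a = (t[:, None] >= s[None, :]).astype(float)  # a_j(t) = 1 once t reaches s_j
    gamma = -s * delta                             # continuity adjustment
    return (k + a @ delta) * t + (m + a @ gamma)

# Illustrative parameters: two changepoints, slope 1 -> 1.5 -> 0.5.
k, m = 1.0, 2.0
s = np.array([10.0, 20.0])
delta = np.array([0.5, -1.0])

t = np.arange(30, dtype=float)
g = trend(t, k, m, s, delta)

# The value just before each changepoint matches the value at it (no jump).
for s_j in s:
    left, right = trend([s_j - 1e-9, s_j], k, m, s, delta)
    print(s_j, left, right)
```

With `gamma` set to zero instead, `g` would jump by $\delta_{j}s_{j}$ at each changepoint (by 5 at $t = 10$ with these values).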


Before modeling, we transform the raw data and extract features in the format the model expects. The inputs are:

- X
    - **t:** date (time) index
    - **X:** seasonality matrix, i.e. the Fourier terms evaluated at the data dates
    - **A:** changepoint indicator matrix, given the data dates and changepoints
    - **t_change:** changepoint times $s_{j}$
    - **sigmas:** fixed scales on the seasonality priors
- y
    - **y_scaled:** `maxdiff(log(views))`
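
A minimal sketch of building these inputs for the training dates with NumPy and pandas. It uses the Fourier orders listed above (yearly $P = 365.25, N = 10$; weekly $P = 7, N = 3$), giving $K = 2(10 + 3) = 26$ seasonal columns. The helper names, the rescaling of $t$ to $[0, 1]$, the $S = 25$ changepoints spread over the first 80% of the history, the prior scale of 10, and the holiday subset are assumptions chosen to mirror Prophet's defaults, not the project's actual settings.

```python
import numpy as np
import pandas as pd

def fourier_features(t_days, period, order):
    """Columns [cos(2*pi*n*t/P), sin(2*pi*n*t/P)] for n = 1..order (matches beta = [a_1, b_1, ...])."""
    cols = []
    for n in range(1, order + 1):
        x = 2.0 * np.pi * n * t_days / period
        cols += [np.cos(x), np.sin(x)]
    return np.column_stack(cols)

def changepoint_matrix(t, t_change):
    """A[i, j] = 1 if t_i >= s_j, i.e. changepoint j is active at time i."""
    return (t[:, None] >= t_change[None, :]).astype(np.float32)

def holiday_matrix(ds, holidays, lower_window=0, upper_window=0):
    """Z[i, l] = 1 if date i lies within [lower_window, upper_window] days of a date of holiday l."""
    ds = pd.DatetimeIndex(ds)
    Z = np.zeros((len(ds), len(holidays)), dtype=np.float32)
    for l, dates_l in enumerate(holidays.values()):
        window = [d + pd.Timedelta(days=off)
                  for d in pd.to_datetime(dates_l)
                  for off in range(lower_window, upper_window + 1)]
        Z[:, l] = ds.isin(window)
    return Z

# Training dates: 07/01/2015 up to, but excluding, 07/10/2017.
dates = pd.date_range("2015-07-01", "2017-07-09", freq="D")
t_days = (dates - dates[0]).days.to_numpy().astype(float)  # days since first observation
t = t_days / t_days.max()                                   # trend time rescaled to [0, 1]

# Seasonal features: yearly + weekly Fourier terms.
X = np.hstack([fourier_features(t_days, 365.25, 10),
               fourier_features(t_days, 7.0, 3)]).astype(np.float32)
K = X.shape[1]
sigmas = np.full(K, 10.0, dtype=np.float32)  # assumed prior scale

# Changepoints: S evenly spaced times over the first 80% of the (rescaled) history.
S = 25
t_change = np.linspace(0.0, 0.8, S + 1)[1:].astype(np.float32)
A = changepoint_matrix(t, t_change)

# One holiday group (a subset of the dates used in this notebook), including the day before.
holidays = {"US public holiday": ["2015-11-26", "2015-12-25", "2016-11-24", "2016-12-26"]}
Z = holiday_matrix(dates, holidays, lower_window=-1)
```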

```python
# Model (written for Edward 1.x, which runs on TensorFlow 1.x)
import pandas as pd
import tensorflow as tf
import edward as ed
from edward.models import Laplace, Normal

S = 25            # number of changepoints (assumed value; must match the columns of A)
K = 2 * (10 + 3)  # seasonal features: a cos and a sin per order, yearly N=10 + weekly N=3

t = tf.placeholder(tf.float32, shape=(None,), name="t")             # time index
A = tf.placeholder(tf.float32, shape=(None, S), name="A")           # changepoint indicators
t_change = tf.placeholder(tf.float32, shape=(S,), name="t_change")  # changepoint times s_j
X = tf.placeholder(tf.float32, shape=(None, K), name="X")           # seasonal (Fourier) features
sigmas = tf.placeholder(tf.float32, shape=(K,), name="sigmas")      # scale on seasonality prior
# tau = tf.placeholder(tf.float32, shape=(), name="tau")            # alternative: fixed changepoint scale
tau = Normal(loc=tf.ones(1) * 0.05, scale=1.0 * tf.ones(1))         # hyperprior on changepoint scale

k = Normal(loc=tf.zeros(1), scale=5.0 * tf.ones(1))          # initial slope
m = Normal(loc=tf.zeros(1), scale=5.0 * tf.ones(1))          # initial intercept
sigma_obs = Normal(loc=tf.zeros(1), scale=0.5 * tf.ones(1))  # noise; |sigma_obs| is half-normal

# Scales must be positive, but a Normal can be negative, so use |tau| and |sigma_obs|.
delta = Laplace(loc=tf.zeros(S), scale=tf.abs(tau) * tf.ones(S))  # changepoint rate adjustments
gamma = tf.multiply(-t_change, delta, name="gamma")                # keeps the trend continuous

beta = Normal(loc=tf.zeros(K), scale=sigmas * tf.ones(K))          # seasonal coefficients

trend_loc = (k + ed.dot(A, delta)) * t + (m + ed.dot(A, gamma))
seas_loc = ed.dot(X, beta)
y = Normal(loc=trend_loc + seas_loc, scale=tf.abs(sigma_obs))

# Extract features
# US holidays plus Valentine's Day, in Prophet's `holidays` DataFrame format.
holiday_en_us = ['2015-01-01', '2015-01-19', '2015-05-25', '2015-07-03', '2015-09-07', '2015-11-26', '2015-11-27', '2015-12-25',
                 '2016-01-01', '2016-01-18', '2016-05-30', '2016-07-04', '2016-09-05', '2016-11-11', '2016-11-24', '2016-12-26',
                 '2017-01-01', '2017-01-02', '2017-01-16', '2017-05-29', '2017-07-04', '2017-09-04', '2017-11-10', '2017-11-23',
                 '2017-12-25',
                 '2015-02-14', '2016-02-14', '2017-02-14']
holidays = pd.DataFrame({
    'holiday': 'US public holiday',
    'ds': pd.to_datetime(holiday_en_us),
    'lower_window': -1,   # also include the day before each date
    'upper_window': 0,
    'prior_scale': 10.0,
})
holidays = None  # the holiday component h(t) is not used in Round 1
```

### 1.Inference: HMC
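Given the training data, we use `ed.HMC` (Hamiltonian Monte Carlo) to draw posterior samples of the latent variables ($k$, $m$, $\boldsymbol{\delta}$, $\boldsymbol{\beta}$, and the scale parameters).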
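
To make the inference step concrete, here is a self-contained NumPy sketch of Hamiltonian Monte Carlo (leapfrog integration followed by a Metropolis accept/reject step) on a reduced, trend-only model: $y_{i} \sim \text{Normal}(k t_{i} + m, \sigma^{2})$ with the $\text{Normal}(0, 5^{2})$ priors on $k$ and $m$ from the model above and $\sigma$ fixed at 0.5. It illustrates the algorithm that `ed.HMC` runs; it is not Edward's implementation. The step size, its random jitter, the number of leapfrog steps, and the synthetic data are illustrative choices.

```python
import numpy as np

def log_post_and_grad(theta, t, y, sigma=0.5, prior_scale=5.0):
    """Log posterior (up to a constant) of theta = (k, m) and its gradient.

    Model: y_i ~ Normal(k * t_i + m, sigma^2), with k, m ~ Normal(0, prior_scale^2).
    """
    k, m = theta
    resid = y - (k * t + m)
    logp = -0.5 * np.sum(resid ** 2) / sigma ** 2 - 0.5 * np.sum(theta ** 2) / prior_scale ** 2
    grad = np.array([np.sum(resid * t), np.sum(resid)]) / sigma ** 2 - theta / prior_scale ** 2
    return logp, grad

def hmc(t, y, n_samples=2000, step_size=0.01, n_steps=15, seed=0):
    """Plain HMC with a unit mass matrix; the step size is jittered by +/-20% to avoid resonance."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    logp, grad = log_post_and_grad(theta, t, y)
    samples, n_accept = np.empty((n_samples, 2)), 0
    for i in range(n_samples):
        eps = step_size * rng.uniform(0.8, 1.2)
        p0 = rng.standard_normal(2)                  # fresh momentum
        th, p = theta.copy(), p0 + 0.5 * eps * grad  # initial half step for momentum
        for step in range(n_steps):
            th = th + eps * p                        # full step for position
            lp, g = log_post_and_grad(th, t, y)
            if step < n_steps - 1:
                p = p + eps * g                      # full step for momentum
        p = p + 0.5 * eps * g                        # final half step for momentum
        # Metropolis correction: accept with prob. exp(H_old - H_new), H = -log p + |p|^2 / 2.
        log_accept = (lp - 0.5 * p @ p) - (logp - 0.5 * p0 @ p0)
        if np.log(rng.uniform()) < log_accept:
            theta, logp, grad = th, lp, g
            n_accept += 1
        samples[i] = theta
    return samples, n_accept / n_samples

# Synthetic data with true slope k = 2 and intercept m = 1 (illustrative only).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
y = 2.0 * t + 1.0 + 0.5 * rng.standard_normal(t.size)

samples, acc = hmc(t, y)
print("posterior mean (k, m):", samples[500:].mean(axis=0), "acceptance:", acc)
```

Because this reduced model is conjugate, the sampled mean can be checked against the closed-form Gaussian posterior. That kind of check is not available for the full model with Laplace changepoint priors.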


### 1.Criticism
- Visualization
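- Pointwise evaluation
    - MAPE (mean absolute percentage error)
    - SMAPE (symmetric mean absolute percentage error)
    - MSE (mean squared error)
- PPC (posterior predictive checks)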
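
A NumPy sketch of the pointwise metrics and a simple posterior predictive check. The SMAPE variant follows the Kaggle competition convention, where a term counts as 0 when actual and forecast are both 0. The PPC uses the maximum daily views as an illustrative test statistic, and `y_rep` is assumed to hold one simulated series per posterior draw. The function names are our own.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error in %; undefined if y_true contains zeros."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs(y_pred - y_true) / np.abs(y_true))

def smape(y_true, y_pred):
    """Symmetric MAPE in %; a term is 0 when actual and forecast are both 0."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    denom = np.abs(y_true) + np.abs(y_pred)
    diff = np.abs(y_pred - y_true)
    safe = np.where(denom == 0, 1.0, denom)          # avoid division by zero
    terms = np.where(denom == 0, 0.0, 2.0 * diff / safe)
    return 100.0 * np.mean(terms)

def mse(y_true, y_pred):
    """Mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_pred - y_true) ** 2)

def ppc_pvalue(y_obs, y_rep, stat=np.max):
    """Posterior predictive p-value: share of replicated series with stat >= stat(y_obs).

    y_rep has shape (n_draws, n_days), one simulated series per posterior draw.
    """
    t_obs = stat(y_obs)
    t_rep = np.array([stat(series) for series in y_rep])
    return np.mean(t_rep >= t_obs)

y_true = np.array([100.0, 200.0, 50.0])
y_pred = np.array([110.0, 180.0, 50.0])
print(mape(y_true, y_pred), smape(y_true, y_pred), mse(y_true, y_pred))
```

A p-value near 0 or 1 indicates that the chosen statistic of the observed series is rarely reproduced by the model.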


### 1.Next

## Round 2: 
___
### 2.Data
We scraped data from

### 2.Model


## Round 3: 
___
### 3.Data
We use the data from the previous round.

### 3.Model
- Local Features
- Global Features
### 3.Inference
### 3.Criticism
### 3.Next

## Conclusion and Lessons Learned