<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Time Series: Forecasting Models

# Installing StatsModels

The default `statsmodels` installation you get with Anaconda doesn't include some of the time series modeling components. Run the following cell to see if you need to install these components.

In [None]:
from statsmodels.tsa.api import ExponentialSmoothing

If you get an error, try running this command:

In [None]:
!pip install -U statsmodels

Now restart your kernel and try again.

If it still doesn't work, just sit tight for this lesson. Check out [this page](http://www.statsmodels.org/stable/install.html) later if you want to use the statsmodels extensions for time series forecasting.

# Time Series Models

**Time series analysis** comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. (The analyses we've been doing in previous sections qualify as time series analysis.)

**Time series forecasting** is the use of a model to predict future values based on previously observed values.

- Time series forecasting models predict a future value in a time series. Like other predictive models, we will use prior history to predict the future. Unlike previous models, however, we will use the _outcome_ variables from earlier in time as the _inputs_ for prediction.
- As with the modeling you're used to, we will have to evaluate different models on _test data_ to ensure that we have chosen the best one.
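To make the idea of using earlier outcomes as inputs concrete, here is a minimal sketch with pandas. The series values are synthetic, and the choice of two lags (`lag_1`, `lag_2`) is an arbitrary illustration, not something the lesson prescribes.

```python
import pandas as pd

# Hypothetical series of observed outcomes (synthetic values)
y = pd.Series([10.0, 12.0, 13.0, 15.0, 14.0, 16.0], name="y")

# Earlier values of the outcome become the inputs:
# lag_1 is last period's value, lag_2 is the value two periods back.
framed = pd.DataFrame({"lag_1": y.shift(1), "lag_2": y.shift(2), "target": y})

# The first two rows lack enough history, so they are dropped
framed = framed.dropna()
print(framed)
```

Each remaining row pairs a target with the values that preceded it, which is the input/output shape a standard regression model expects.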

# Properties of Time Series Forecasting Models

## Training and Testing Sets

Because these data are ordered, we **cannot choose training and testing examples at random.** As we are attempting to predict _a sequence of future values_, we must train on the earlier observations in our data and then test on the observations at the end of the period.
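A minimal sketch of a chronological split, assuming a pandas Series with a `DatetimeIndex`. The weekly data are synthetic and the 80/20 cut-off is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly series (synthetic data, for illustration only)
index = pd.date_range("2010-01-01", periods=100, freq="W")
series = pd.Series(np.arange(100, dtype=float), index=index)

# Split by position in time, NOT at random: the first 80% of weeks
# are used for training and the remaining 20% for testing.
cutoff = int(len(series) * 0.8)
train_part = series.iloc[:cutoff]
test_part = series.iloc[cutoff:]

# Every training date precedes every test date
print(train_part.index.max() < test_part.index.min())  # True
```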

## Moving Averages and Autocorrelation

In previous sections, we learned about a few statistics for analyzing time series. A **moving average** is the average of *k* surrounding data points in time.

We also used autocorrelation to measure how the data relate to their own prior values.

**Autocorrelation** is how correlated a variable is with itself: specifically, how strongly values from earlier in time are related to values from later in time. For a lag of $k$ steps, each $X_t$ is compared with $X_{t-k}$. Note the need for a mean value $\mu$ (and a variance $\sigma^2$):

$$R(k) = \frac{\operatorname{E}[(X_{t} - \mu)(X_{t-k} - \mu)]}{\sigma^2}$$
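The formula can be estimated directly from data by replacing the expectation with an average over all overlapping pairs $(X_t, X_{t-k})$. This is a minimal sketch using one mean and one variance for the whole series, which is only meaningful if those are constant (the stationarity assumption discussed next). Library functions such as `pandas.Series.autocorr` use slightly different normalizations, so their numbers will not match exactly. The helper name `autocorrelation` is hypothetical.

```python
import numpy as np

def autocorrelation(x, k):
    """Sample estimate of R(k) = E[(X_t - mu)(X_{t-k} - mu)] / sigma^2."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()      # a single mean for the whole series (assumes stationarity)
    sigma2 = x.var()   # a single variance for the whole series
    # Pair each X_t (t >= k) with X_{t-k} and average the product of deviations
    return float(np.mean((x[k:] - mu) * (x[: len(x) - k] - mu)) / sigma2)

rng = np.random.default_rng(0)
noise = rng.normal(size=500)          # white noise: little autocorrelation
trend = np.arange(500, dtype=float)   # strong trend: high autocorrelation

print(autocorrelation(noise, 1), autocorrelation(trend, 1))
```

At lag 0 the estimate is exactly 1, since each value is compared with itself.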

## Stationarity

The criteria for classifying a series as stationary indicate that:

* The mean of the series should not be a function of time, but rather should be a constant. In the image below, the left-hand graph satisfies this condition, whereas the graph in red has a time-dependent mean.

![](../assets/images/Mean_nonstationary.png)

* The variance of the series should not be a function of time. This property is known as homoscedasticity. The following graph depicts what is and what is not a stationary series. (Notice the varying spread of distribution in the right-hand graph.)

![](../assets/images/Var_nonstationary.png)

* The covariance of the `i`th term and the `(i + m)`th term should not be a function of time. In the following graph, notice that the oscillations of the red series become more tightly spaced as time increases. Hence, the covariance is not constant with time for the "red series."

![](../assets/images/Cov_nonstationary.png)

**Many time series models work on the assumption that the time series is stationary,** but real-world data often violate this assumption. For example, typical stock market performance is not stationary. In this plot of Dow Jones performance since 1986, the mean is clearly increasing over time:

![](../assets/images/dow-jones.png)

## Checking for Stationarity

In [None]:
import warnings

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

In [None]:
drones = pd.read_csv('../assets/data/gt_drone_racing.csv', header=1)
drones.columns = ['week', 'drone_racing_interest']
drones.head()

In [None]:
# Change the `week` column to a `datetime` object and make it the index of the DataFrame.


In [None]:
# Plot the data with the rolling mean and std


The standard deviation experiences one jump, and the mean is increasing with time, so this time series appears not to be stationary.
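As a self-contained illustration of the rolling-statistics check, the sketch below uses a synthetic trending series in place of the drone racing data; the 12-week window is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Synthetic weekly "interest" series with an upward trend (hypothetical data)
rng = np.random.default_rng(42)
weeks = pd.date_range("2015-01-04", periods=156, freq="W")
interest = pd.Series(np.linspace(10, 60, 156) + rng.normal(0, 3, 156), index=weeks)

# Rolling statistics over a 12-week window (window size is an arbitrary choice)
rolling_mean = interest.rolling(window=12).mean()
rolling_std = interest.rolling(window=12).std()

# For a stationary series both would hover around constants;
# here the rolling mean climbs steadily, a sign of non-stationarity.
print(rolling_mean.iloc[11], rolling_mean.iloc[-1])

# Optional plot:
# ax = interest.plot(label="data")
# rolling_mean.plot(ax=ax, label="rolling mean")
# rolling_std.plot(ax=ax, label="rolling std")
```

The first 11 rolling values are `NaN`, because a full window is not yet available.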

## Making a Time Series Stationary

There are methods to transform a non-stationary time series into a stationary time series. One can then model the derived stationary time series, and perhaps then invert the transformation to generate a model of the original time series.

We will look specifically at removing a trend from a time series through **detrending** and **differencing**.

### Detrending

**Detrending** removes major trends from our data. The simplest way is to fit a line to the series, then create a new series from the differences between the true series and the line.

Below are U.S. housing prices over time that demonstrate an upward trend. This makes the time series non-stationary, as the mean home price is increasing. The line fit through it represents the trend.

The bottom figure is the "detrended" data, where each data point is transformed by subtracting the value of the trend line at that point. These data now have a fixed mean and may be easier to model. This is similar to mean-scaling our features in earlier models with `StandardScaler`.

![](../assets/images/detrend.gif)
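The detrending procedure described above (fit a line, subtract it) fits in a few lines of NumPy. This sketch uses synthetic data, and compares the result with `scipy.signal.detrend`, which also removes a least-squares line when `type="linear"`.

```python
import numpy as np
from scipy.signal import detrend

# Hypothetical series: linear trend plus noise (synthetic data)
rng = np.random.default_rng(1)
t = np.arange(120, dtype=float)
y = 3.0 + 0.8 * t + rng.normal(0, 2, size=t.size)

# 1. Fit a straight line (degree-1 polynomial) to the series
slope, intercept = np.polyfit(t, y, deg=1)
trendline = intercept + slope * t

# 2. Detrend by subtracting the trend line from the data
detrended = y - trendline

# scipy's linear detrend performs the same least-squares line removal
detrended_scipy = detrend(y, type="linear")
```

The detrended series has (numerically) zero mean, because least-squares residuals from a fit with an intercept always average to zero.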

#### Example: Detrending

In [None]:
# Fit a trendline to the data.


In [None]:
# Use the model to create the trendline as a series


In [None]:
# Plot the trendline with the data


Detrend the time series. In its simplest form, we literally subtract the trendline from the time series.

In [None]:
# Create the detrended series


In [None]:
# Plot the detrended series


In [None]:
# Alternative approach: use the `detrend` function from `scipy`


### Differencing

A related method is **differencing**. Instead of predicting the (non-stationary) series, we can predict the difference between two consecutive values. **ARIMA** models incorporate this approach.

Recall that we used Pandas' `.diff()` method to find the difference in an earlier section.
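A minimal sketch of first-order differencing on a synthetic trending series, also showing how the transformation can be inverted (relevant when forecasts of differences must be turned back into forecasts of the original series).

```python
import numpy as np
import pandas as pd

# Hypothetical trending series (synthetic data)
rng = np.random.default_rng(7)
y = pd.Series(5.0 + 2.0 * np.arange(50) + rng.normal(0, 1, 50))

# First-order differencing: d_t = y_t - y_{t-1}; the first value becomes NaN
diffed = y.diff()

# The trend (slope 2) shows up as the mean of the differences
print(diffed.mean())

# Differencing is invertible: cumulative sums of the differences,
# anchored at the first observation, recover the original series
restored = diffed.fillna(0).cumsum() + y.iloc[0]
```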

#### Example: Differencing

In [None]:
# Create the differenced series


In [None]:
# Plot the differenced series


# Time Series Forecasting Methods

## Method 1: Carry forward last observation

Let's look at Walmart's weekly sales data covering 2010 through 2012. The data set is separated by store and department, but we'll focus on analyzing one store for simplicity.

In [None]:
# Load the data with "Date" as a DatetimeIndex


#### Filter the DataFrame to Store 1 sales and aggregate over departments to compute the total weekly sales for the store.

In [None]:
# Pull out Store 1


In [None]:
# Sum sales across departments


In [None]:
# Plot store 1 sales


One very simple forecasting method is to "carry forward" the last observation:

$$\hat y_{t+1} = y_t$$
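A minimal sketch of a one-shot carry-forward forecast and an RMSE score. The helpers `carry_forward` and `rmse` are hypothetical names for illustration, not the lesson's own functions, and the sales figures are made up.

```python
import numpy as np
import pandas as pd

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def carry_forward(train, test_index):
    """Forecast every test period with the last training observation."""
    return pd.Series(train.iloc[-1], index=test_index)

# Hypothetical weekly sales (synthetic values, for illustration only)
sales = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 111.0])
train_part, test_part = sales.iloc[:5], sales.iloc[5:]

pred = carry_forward(train_part, test_part.index)
print(pred.tolist())  # [107.0, 107.0, 107.0]
print(rmse(test_part, pred))
```

Every test week receives the same forecast, so the error grows as the test values drift away from the last training value.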

We'll use the first two years (2010–2011) as the "training" data and the last year (2012) as a "testing" set.

In [None]:
# Split Store 1 Sales into train and test


In [None]:
# Plot train and test


#### Let's see how well the carry-forward method does when forecasting sales.

In [None]:
# Create series of carry-forward predictions, using a function for reusability


In [None]:
# Plot carry-forward predictions with train and test, again using a function


In [None]:
# Use RMSE to check the accuracy of our model on the test data set, again using a function


The carry-forward method is best suited for stable data sets without trend or seasonality, and its accuracy can depend greatly on where you happened to cut off the time series for training.

In addition to making predictions over the entire test set in one shot, we can also consider shorter-term predictions made on a rolling basis, where each period is forecast using all data available up to the period before it.
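For the carry-forward method, a rolling forecast is just the series shifted by one step. This sketch uses the same made-up sales figures as a standalone example; `test_start` is an arbitrary cut-off.

```python
import pandas as pd

# Hypothetical weekly sales (synthetic values, for illustration only)
sales = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 111.0])
test_start = 5  # first test position (arbitrary cut-off)

# Rolling carry-forward: each week's forecast is the previous week's
# actual value, so shifting the series by one step gives the forecasts.
rolling_pred = sales.shift(1).iloc[test_start:]
print(rolling_pred.tolist())  # [107.0, 106.0, 110.0]

# RMSE of the rolling forecasts on the test weeks
rolling_rmse = ((sales.iloc[test_start:] - rolling_pred) ** 2).mean() ** 0.5
```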

In [None]:
# Create series of rolling carry-forward predictions, again using a function


## Method 2: Simple Average

The simple average method forecasts the weekly sales of the next time period to be the average of the sales over all previous time periods:

$$\hat y_{t+1} = \dfrac{1}{t} \sum_{i=1}^{t} y_i$$

This method is less sensitive than the carry-forward approach to where the training data is cut off. Like that approach, it does not capture seasonality, and it loses accuracy over time when there is a trend.
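A minimal sketch of the simple average forecast in both one-shot and rolling form, on made-up sales figures. For the rolling version, `expanding().mean()` averages all observations so far, and `shift(1)` keeps the week being forecast out of its own average.

```python
import pandas as pd

# Hypothetical weekly sales (synthetic values, for illustration only)
sales = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 111.0])
test_start = 5  # first test position (arbitrary cut-off)

# One-shot: forecast every test week with the mean of the training data
static_pred = pd.Series(sales.iloc[:test_start].mean(), index=sales.index[test_start:])

# Rolling: forecast week t+1 with the mean of all observations up to week t
rolling_pred = sales.expanding().mean().shift(1).iloc[test_start:]

print(static_pred.tolist(), rolling_pred.tolist())
```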

In [None]:
# Make predictions, plot them, get RMSE


In [None]:
# Make rolling predictions, plot them, get RMSE


This model improved the score a bit. The simple average method works best when the average at each time period remains constant.

## Method 3: Moving Average

When sales (or values) have increased or decreased sharply at some point in the past, simply averaging all of the previous data (as the simple average method does) isn't appropriate. An improvement over the simple average in this case is to average only the sales from the last few time periods, on the assumption that only recent values matter. This is called the **moving average** technique, and it uses a sliding window of time periods to calculate the average.

Using a simple moving average model, we forecast the next value(s) in a time series based on the average of a fixed finite number (`p`) of the previous values. Thus, for all `t ≥ p`:

$$\hat y_{t+1} = \dfrac{1}{p} (y_{t} + y_{t-1} + y_{t-2} + \dots + y_{t-p+1})$$
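A minimal sketch of the moving average forecast with pandas, on made-up sales figures. The window `p = 3` is an arbitrary choice; `rolling(window=p).mean()` averages the last `p` values, and `shift(1)` ensures each forecast uses only earlier weeks.

```python
import pandas as pd

# Hypothetical weekly sales (synthetic values, for illustration only)
sales = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 111.0])
test_start = 5  # first test position (arbitrary cut-off)
p = 3           # number of past observations to average (arbitrary here)

# One-shot: average the last p training observations
static_pred = pd.Series(sales.iloc[test_start - p:test_start].mean(),
                        index=sales.index[test_start:])

# Rolling: average of the p observations before each test week
rolling_pred = sales.rolling(window=p).mean().shift(1).iloc[test_start:]

print(static_pred.tolist(), rolling_pred.tolist())
```

Setting `p = 1` reduces this to the carry-forward method, and letting `p` grow to the full history gives the simple average.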

In [None]:
# Make predictions, plot them, get RMSE


In [None]:
# Make rolling predictions, plot them, get RMSE


This approach didn't perform very well in this case, because the last eight observations were from the busy holiday period.

## Method 4: Simple Exponential Smoothing 

The simple average and the moving average lie on opposite sides of the spectrum: one weights every past observation equally, and the other uses only the last `p` observations. **Simple exponential smoothing** lies between these two extremes: it takes into account all of the data while weighting the data points differently. It calculates the forecast using weighted averages where the weights decrease exponentially as observations come from further in the past (the closest data points in time are weighted most heavily).

$$\hat y_{t+1} = \alpha y_t + \alpha (1-\alpha)y_{t-1} + \alpha (1-\alpha)^2 y_{t-2} + \dots$$

The one-step-ahead forecast for time $t+1$ is a weighted average of all of the observations in the series ($y_1, \dots, y_t$). The rate at which the weights decrease is controlled by the parameter $\alpha$ (which is between 0 and 1).
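The weighted sum above can be computed with the equivalent recursion $\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}$, where the level $\ell_t$ is the forecast for $t+1$. This minimal sketch assumes the initial level equals the first observation (one common convention; statsmodels' `SimpleExpSmoothing` can instead estimate both $\alpha$ and the initial level, so its numbers may differ). The helper name `ses_forecasts` is hypothetical.

```python
import numpy as np
import pandas as pd

def ses_forecasts(y, alpha):
    """Simple exponential smoothing; level[t] is the forecast for t+1.

    Assumes the initial level equals the first observation.
    """
    y = np.asarray(y, dtype=float)
    level = np.empty_like(y)
    level[0] = y[0]
    for t in range(1, len(y)):
        # New level = alpha * newest observation + (1 - alpha) * previous level
        level[t] = alpha * y[t] + (1 - alpha) * level[t - 1]
    return level

# Hypothetical weekly sales (synthetic values, for illustration only)
sales = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 111.0])
alpha = 0.3
levels = ses_forecasts(sales, alpha)

# pandas' ewm with adjust=False implements the same recursion
ewm_levels = sales.ewm(alpha=alpha, adjust=False).mean().to_numpy()
print(np.allclose(levels, ewm_levels))  # True
```

A larger `alpha` makes the forecast react faster to recent changes; a smaller one smooths more heavily.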

In [None]:
# Make predictions, plot them, get RMSE


In [None]:
# Make rolling predictions, plot them, get RMSE


## Method 5: Holt's Linear Trend

The methods we've looked at so far don't do well when our data have high variations. If our data contain a trend, none of the previous methods would be able to take that into account. A method that can is **Holt's linear trend** method. 

Recall that a time series data set can be decomposed into its trend, seasonality, and residual components. If the data set contains a trend, then Holt's linear trend method can be applied. 

In [None]:
# Get decomposition


From these graphs, we can see that the data set follows an increasing trend, and we can use Holt's method to forecast the future sales.

Holt's method uses exponential smoothing to estimate both the average value of the series (called "level") and the trend. 

When the trend is linear, we add the estimated level and the estimated trend for forecasting. When the trend is exponential, we multiply them instead.
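A minimal sketch of the additive (linear-trend) version, showing how level and trend are each smoothed and then added for forecasting. The smoothing parameters and the initialization (level $= y_0$, trend $= y_1 - y_0$) are assumptions for illustration; statsmodels also provides a `Holt` class in `statsmodels.tsa.api` that estimates the parameters from data. The helper name `holt_linear` is hypothetical.

```python
import numpy as np

def holt_linear(y, alpha, beta, horizon):
    """Holt's linear (additive) trend method.

    Level and trend are each updated by exponential smoothing; the
    h-step forecast adds h times the last trend to the last level.
    Initialization (level = y[0], trend = y[1] - y[0]) is an assumption.
    """
    y = np.asarray(y, dtype=float)
    level, trend = y[0], y[1] - y[0]
    for t in range(1, len(y)):
        prev_level = level
        # Level: blend the new observation with the previous level + trend
        level = alpha * y[t] + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Additive forecast: level + h * trend for h = 1..horizon
    return level + trend * np.arange(1, horizon + 1)

# A perfectly linear series y_t = 5 + 2t should be extrapolated exactly
y = 5 + 2 * np.arange(20)
forecast = holt_linear(y, alpha=0.5, beta=0.3, horizon=3)
print(forecast)
```

For an exponential trend, the forecast would multiply the level by the trend raised to the power $h$ instead of adding $h$ times the trend.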

In [None]:
# Make predictions, plot them, get RMSE


In [None]:
# Make rolling predictions, plot them, get RMSE


## Method 6: Holt-Winters Method

The **Holt-Winters** method applies exponential smoothing to seasonal components as well as level and trend components.

Just like in Holt's linear trend method, we can use either additive or multiplicative forecasting equations. When the seasonal variations are roughly constant throughout the series, we will use the additive method. When the seasonal variations change depending on the level of the series, we will use the multiplicative method. 

In [None]:
# Make predictions, plot them, get RMSE


In [None]:
# Make rolling predictions, plot them, get RMSE


## Method 7: ARIMA

A very popular forecasting method is **ARIMA**, which stands for **autoregressive integrated moving average.** It applies **differencing** of degree $d$ and then combines two components:

- An **autoregressive** component with order $p$ that uses the previous $p$ observations to predict the next observation.
- A **moving average** component with order $q$ that accounts for possible "random shocks" by using the errors of the previous $q$ forecasts.

**Seasonal ARIMA** also takes seasonality into account, as the Holt-Winters method does.

#### ARIMA Model

In [None]:
# Make predictions, plot them, get RMSE


In [None]:
# Make rolling predictions, plot them, get RMSE


# **Bonus:** Facebook Prophet

Prophet is a library from Facebook that promises to make fitting time series models easy. Let's give it a try.

In [None]:
!pip install prophet

In [None]:
# Prophet 1.0 and later is published as `prophet`; older releases used `fbprophet`
from prophet import Prophet

In [None]:
m_train = pd.DataFrame(train).reset_index()
m_train.columns = ['ds', 'y']

In [None]:
m = Prophet(yearly_seasonality=True)
m.fit(m_train)

In [None]:
# Future dates are daily by default, hence 7 days per weekly test period
future = m.make_future_dataframe(periods=len(test) * 7)

In [None]:
forecast = m.predict(future)
m.plot(forecast)
m.plot_components(forecast);

In [None]:
yhat_proph = forecast.set_index('ds').loc['2012', 'yhat']
plot_predictions(yhat_proph)
calculate_rmse(yhat_proph[::7])

Maybe not the best that a skilled time series forecaster could do, but not bad at all out of the box.

# **Bonus:** Adapting standard machine learning models for time series data

Time series Kaggle competitions have often been won using `xgboost`, which is a tree-based method not designed for time series at all.

Here is a quick example using a different tree-based method (a random forest) to illustrate this approach.

In [None]:
# One row per week: total sales across departments, and a holiday flag
store1_data = store1.groupby(store1.index).agg({'Weekly_Sales': 'sum', 'IsHoliday': 'max'})

In [None]:
store1_data.loc[:, 'month'] = store1_data.index.month
store1_data.loc[:, 'day'] = store1_data.index.day
store1_data.loc[:, 'year'] = store1_data.index.year

In [None]:
store1_data = pd.get_dummies(store1_data, columns=['month', 'day']).drop(['month_2', 'day_1'], axis=1)

In [None]:
train = store1_data.loc['2010':'2011', :]
test = store1_data.loc['2012', :]
X_train = train.iloc[:, 1:]
X_test = test.iloc[:, 1:]
y_train = train.loc[:, 'Weekly_Sales']
y_test = test.loc[:, 'Weekly_Sales']

In [None]:
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor()
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

In [None]:
from sklearn import metrics
np.sqrt(metrics.mean_squared_error(y_test, y_pred))  # test-set RMSE

In [None]:
ax = y_train.plot()
y_test.plot(ax=ax)
pd.Series(y_pred, index=test.index).plot(ax=ax);

For instance, you could also use descriptive statistics of sales from the corresponding month in the previous year as features. For rolling predictions, you could use descriptive statistics computed over preceding windows of different sizes.
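A minimal sketch of such features on synthetic weekly sales: the value from the same week one year earlier (52 weekly steps back) and statistics over the preceding 4 weeks. The data, the window size, and the 26-week test period are assumptions for illustration; `shift(1)` before `rolling` keeps each week's own sales out of its features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical weekly sales with a yearly cycle (synthetic data)
rng = np.random.default_rng(11)
weeks = pd.date_range("2010-02-05", periods=156, freq="W-FRI")
sales = pd.Series(
    1000 + 200 * np.sin(2 * np.pi * np.arange(156) / 52) + rng.normal(0, 20, 156),
    index=weeks,
)

features = pd.DataFrame({
    "lag_52": sales.shift(52),                            # same week last year
    "rolling_mean_4": sales.shift(1).rolling(4).mean(),   # mean of prior 4 weeks
    "rolling_std_4": sales.shift(1).rolling(4).std(),     # spread of prior 4 weeks
})
data = features.assign(sales=sales).dropna()  # drop weeks without full history

# Chronological split: the last 26 weeks are held out for testing
train_df, test_df = data.iloc[:-26], data.iloc[-26:]

rf_lags = RandomForestRegressor(n_estimators=100, random_state=0)
rf_lags.fit(train_df.drop(columns="sales"), train_df["sales"])
preds = rf_lags.predict(test_df.drop(columns="sales"))
```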

See the forums of past Kaggle time series competitions (such as [this one](https://www.kaggle.com/c/rossmann-store-sales)) for examples and discussion.