# Time series forecasting with Prophet

## Motivation

Prophet is an open-source Python library for time series forecasting, developed at Facebook. It fits an additive model whose components capture trend, seasonality, and holiday effects, which makes it a general-purpose tool for series such as sales, web traffic, or stock prices.

Not all for

## Basics

Prophet models a time series as the sum of three main components and an error term:
$$y_t = g(t) + s(t) + h(t) + \epsilon_t$$

where:
1. $g(t)$ is the trend component, which models non-periodic changes in the value of the series.
2. $s(t)$ is the seasonal component, which models periodic changes such as daily, weekly, or yearly cycles.
3. $h(t)$ is the holiday component, which models the effect of holidays and other events that may fall on irregular dates.

$\epsilon_t$ is the error term, which captures whatever the three components do not explain; it is assumed to be normally distributed.
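
To make the additive structure concrete, here is a minimal sketch that builds a synthetic daily series from the four terms. The slope, amplitude, holiday position, and noise scale are made-up values chosen for illustration only; they do not come from the dataset used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(365 * 2)                       # daily time index, two years

g = 100.0 + 0.05 * t                         # trend: slow linear growth (illustrative)
s = 5.0 * np.sin(2 * np.pi * t / 7)          # seasonality: a weekly cycle (illustrative)
h = np.where(t % 365 == 358, 20.0, 0.0)      # holiday: one spike per year (illustrative)
eps = rng.normal(0.0, 1.0, size=t.size)      # normally distributed noise

y = g + s + h + eps                          # y_t = g(t) + s(t) + h(t) + eps_t
```

Fitting a model to `y` amounts to recovering `g`, `s`, and `h` from the sum alone.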

### Trend $g(t)$

Two types of trend components are available in Prophet:
1. Piecewise linear trend (the default), whose growth rate is constant between changepoints and is allowed to change at each changepoint.
2. Saturating (logistic) growth trend, which levels off as the series approaches a carrying capacity and is selected with `growth='logistic'`.
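
Prophet's default trend is piecewise linear: a base growth rate $k$ and offset $m$, plus a rate adjustment $\delta_j$ that switches on at each changepoint $s_j$, with the offset corrected by $\gamma_j = -s_j \delta_j$ so that the line stays continuous. The sketch below follows that formulation as we understand it from the Prophet paper; the function name and the example values are our own and are not part of the library.

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Evaluate g(t) = (k + a(t)·delta) * t + (m + a(t)·gamma).

    a(t) is a 0/1 vector marking which changepoints have already passed,
    and gamma_j = -s_j * delta_j keeps the trend continuous at each s_j.
    """
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    d = np.asarray(deltas, dtype=float)
    a = (t[:, None] >= s[None, :]).astype(float)   # shape (len(t), len(s))
    gamma = -s * d
    return (k + a @ d) * t + (m + a @ gamma)

# Hypothetical example: slope 1.0 until t = 5, then slope 0.5.
t = np.arange(10)
g = piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[5], deltas=[-0.5])
```

In the real model the changepoint locations are placed automatically and the $\delta_j$ are estimated during fitting rather than supplied by hand.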

### Seasonal $s(t)$

The seasonal component is modeled using a Fourier series, that is, a sum of sine and cosine terms whose frequencies are integer multiples of the base frequency $1/P$.

$$s(t) = \sum_{n=1}^N \left(a_n \cos \left(\frac{2 \pi n t}{P}\right) + b_n \sin \left(\frac{2 \pi n t}{P}\right) \right)$$

where,
1. $N$ is the number of terms in the Fourier series. Increasing $N$ lets the seasonal component follow faster-changing patterns, at the cost of a higher risk of overfitting.
2. $P$ is the period of the seasonal component, in the same units as $t$ (for yearly seasonality with $t$ in days, $P = 365.25$).
3. $a_n$ and $b_n$ are the coefficients of the Fourier series, which are estimated when the model is fitted.
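
Because $s(t)$ is linear in $a_n$ and $b_n$, it can be written as a feature matrix times a coefficient vector. The following sketch builds that matrix with NumPy; `fourier_features` is a helper name chosen here for illustration, not a Prophet function, and the coefficient vector is a placeholder.

```python
import numpy as np

def fourier_features(t, period, order):
    """Return a (len(t), 2 * order) matrix of cos/sin terms for s(t).

    Columns 0..order-1 hold cos(2*pi*n*t/P) and columns order..2*order-1
    hold sin(2*pi*n*t/P), for n = 1..order.
    """
    t = np.asarray(t, dtype=float)
    n = np.arange(1, order + 1)
    angles = 2.0 * np.pi * np.outer(t, n) / period
    return np.hstack([np.cos(angles), np.sin(angles)])

t = np.arange(730)                                  # two years of daily steps
X = fourier_features(t, period=365.25, order=10)    # yearly seasonality, N = 10

beta = np.zeros(X.shape[1])                         # placeholder for [a_1..a_N, b_1..b_N]
beta[0] = 1.0                                       # keep only the first cosine term
s = X @ beta                                        # seasonal component s(t)
```

A larger `order` adds higher-frequency columns, which is what lets the fitted seasonality follow sharper patterns.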

### Holiday $h(t)$

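The last component is the holiday component. If we pass a list of holidays to the model, then for each holiday $i$ we let $D_i$ be the set of past and future dates of that holiday. The holidays enter the model as a vector of indicator functions: for each time $t$ in the data set, entry $i$ is 1 if $t$ falls on holiday $i$ and 0 otherwise, so these vectors are very sparse. Each holiday $i$ is assigned a coefficient $\kappa_i$, its estimated effect on the forecast, and the holiday component is the indicator vector multiplied by the coefficient vector $\kappa$:

$$Z(t) = [\mathbf{1}(t \in D_1), \dots, \mathbf{1}(t \in D_L)], \qquad h(t) = Z(t)\,\kappa$$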
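
The indicator vectors can be sketched with pandas as one 0/1 column per holiday. In this sketch the holiday component is taken to be the indicator matrix times a vector of per-holiday effects, called `kappa` here. The dates and the `kappa` values are hypothetical, chosen only to show the shape of the computation; in Prophet the effects are estimated during fitting.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2020-12-20", "2021-01-05", freq="D")

# D_i: the set of dates on which holiday i occurs (illustrative dates).
holidays = {
    "christmas": pd.to_datetime(["2020-12-25"]),
    "new_year": pd.to_datetime(["2021-01-01"]),
}

# One sparse 0/1 indicator column per holiday, one row per time step.
Z = pd.DataFrame(
    {name: dates.isin(days).astype(int) for name, days in holidays.items()},
    index=dates,
)

kappa = np.array([3.0, -1.5])      # hypothetical per-holiday effects
h = Z.to_numpy() @ kappa           # holiday component at every time step
```

Every row of `Z` is zero except on a holiday, so `h` is zero on ordinary days and equals the corresponding effect on each holiday.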



In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import psycopg2 as pg
import datetime
from fbprophet import Prophet
from fbprophet.diagnostics import cross_validation, performance_metrics
from fbprophet.plot import plot_cross_validation_metric

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

from statsmodels.tsa.seasonal import seasonal_decompose
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
plt.style.use("ggplot")

import utils.settings_utils as settings
import utils.DatasetAccess as db_access
import utils.preprocess as preprocess
import utils.prophet_experiment as exp

Importing plotly failed. Interactive plots will not work.


In [2]:
conn = pg.connect(database=settings.get_database(), user=settings.get_user(), password=settings.get_pasword())

def get_preprocessed_dataset(company, interval):
    df_d = db_access.get_data_for_datasetid(company, conn, interval)
    df_d = preprocess.rename_dataset_columns(df_d)
    return df_d

test_dataset = get_preprocessed_dataset(15521, 'H')

test_dataset.head()



| | ds | open | high | low | y | volume |
|---|---|---|---|---|---|---|
| 0 | 2020-03-04 10:00:00 | 108.2 | 108.45 | 107.85 | 108.1 | 158609.0 |
| 1 | 2020-03-04 11:00:00 | 108.2 | 108.6 | 108.0 | 108.25 | 144596.0 |
| 2 | 2020-03-04 12:00:00 | 108.25 | 108.95 | 108.0 | 108.55 | 214087.0 |
| 3 | 2020-03-04 13:00:00 | 108.55 | 108.55 | 107.85 | 107.95 | 298778.0 |
| 4 | 2020-03-04 14:00:00 | 107.95 | 108.1 | 106.8 | 106.95 | 370810.0 |


In [3]:
# plot close price as a function of time
# df_d.plot(x="date", y="close", figsize=(12,6))

Plotting the closing price against time shows the raw series, which the model decomposes into trend, seasonal, and holiday components.

### Training the model

In [4]:
def fit_model_with_dataset(df):
    model = Prophet()
    # short_df = df.head(n)
    # train_df = short_df.rename(columns={"date":"ds", "close":"y"})

    model.fit(df)
    return model

In [5]:
# cross_validation_model = Prophet()
# train_df = df_d.rename(columns={"date":"ds", "close":"y"})
# cross_validation_model.fit(train_df)

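def perform_cross_validation_on_horizon(model, horizon, period=None):
    # Run Prophet's cross-validation on an already fitted model.
    # When no period is given, Prophet uses its default spacing between cutoffs.
    if period is None:
        return cross_validation(model, horizon=horizon)
    return cross_validation(model, horizon=horizon, period=period)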
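
`cross_validation` refits the model at a series of cutoff dates and forecasts `horizon` ahead of each one. The sketch below is a simplified approximation of how those cutoffs are spaced, assuming the documented defaults (initial training window of three horizons, cutoffs every half horizon). `rolling_cutoffs` is a name invented here, not a Prophet function, and it omits the library's handling of gaps in the data.

```python
import pandas as pd

def rolling_cutoffs(ds, horizon, period=None, initial=None):
    """Approximate cutoff dates for rolling-origin cross-validation.

    Walks backwards from (last timestamp - horizon) in steps of `period`,
    keeping every cutoff that leaves at least `initial` of training data.
    """
    horizon = pd.Timedelta(horizon)
    period = 0.5 * horizon if period is None else pd.Timedelta(period)
    initial = 3 * horizon if initial is None else pd.Timedelta(initial)

    ds = pd.to_datetime(pd.Series(ds))
    cutoff = ds.max() - horizon
    cutoffs = []
    while cutoff >= ds.min() + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return cutoffs[::-1]               # oldest cutoff first

# Ten days of hourly timestamps, with period equal to horizon as in this notebook.
ds = pd.date_range("2020-03-04 10:00", periods=240, freq=pd.Timedelta(hours=1))
cutoffs = rolling_cutoffs(ds, horizon="24 hours", period="24 hours")
```

Setting `period` equal to `horizon` gives non-overlapping forecast windows, whereas the default half-horizon spacing makes consecutive windows overlap.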


In [6]:
def get_peformance_metrics(df):
    return performance_metrics(df)
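
The frame returned by `cross_validation` contains, among other columns, `ds`, `cutoff`, `y`, and `yhat`, and `performance_metrics` summarizes the forecast error as a function of how far ahead of the cutoff each prediction lies. As a rough sketch of the idea, the mean squared error per horizon can be computed by hand as follows. This leaves out the rolling-window averaging that `performance_metrics` applies, and the toy numbers are invented for illustration.

```python
import pandas as pd

def mse_by_horizon(df_cv):
    """Mean squared error grouped by forecast horizon (ds - cutoff)."""
    out = df_cv.assign(
        horizon=df_cv["ds"] - df_cv["cutoff"],
        squared_error=(df_cv["y"] - df_cv["yhat"]) ** 2,
    )
    return (
        out.groupby("horizon")["squared_error"]
        .mean()
        .rename("mse")
        .reset_index()
    )

# Toy cross-validation output: two cutoffs, forecasts one and two hours ahead.
c1 = pd.Timestamp("2020-03-10 00:00")
c2 = pd.Timestamp("2020-03-11 00:00")
one_hour = pd.Timedelta(hours=1)
toy_cv = pd.DataFrame({
    "cutoff": [c1, c1, c2, c2],
    "ds": [c1 + one_hour, c1 + 2 * one_hour, c2 + one_hour, c2 + 2 * one_hour],
    "y": [10.0, 11.0, 12.0, 13.0],
    "yhat": [11.0, 11.0, 10.0, 16.0],
})
toy_metrics = mse_by_horizon(toy_cv)
```

Errors that grow with the horizon in such a table indicate that the forecast degrades the further ahead it looks.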

In [7]:
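def populate_df(metric_name, df, metrics, df_columns, name):
    # Summarize one metric column and append the summary to df as a new row.
    std = preprocess.std_on_column(metrics, metric_name)
    avg = preprocess.avg_on_column(metrics, metric_name)
    # Named min_val / max_val to avoid shadowing the built-in min() and max().
    min_val = preprocess.min_on_column(metrics, metric_name)
    max_val = preprocess.max_on_column(metrics, metric_name)

    new_row = [[name, avg, std, min_val, max_val]]

    new_df = pd.DataFrame(new_row, columns=df_columns)
    # ignore_index gives each appended row its own index instead of repeating 0.
    df = pd.concat([df, new_df], axis=0, ignore_index=True)

    return df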
    

In [8]:
setups = exp.get_experiments()
df_columns = ['name', 'mse_avg', 'mse_std', 'mse_min', 'mse_max']
results = pd.DataFrame(columns=df_columns)#, 'mae_avg', 'mae_std', 'mae_min', 'mae_max'])

for s in setups:
    s.print_setup()
    ds = get_preprocessed_dataset(15521, s.time_unit)
    model = fit_model_with_dataset(ds)
    cross_val = perform_cross_validation_on_horizon(model, s.horizon, s.horizon)
    metrics = get_peformance_metrics(cross_val)
    results = populate_df('mse', results, metrics, df_columns, s.name)

results



About to execute for '15 minutes'




In [None]:


# ds.columns

In [None]:
# df_cv.columns

In [None]:
# plot_cross_validation_metric(df_cv, metric='mse', rolling_window=0.2)


In [None]:
pd.plotting.register_matplotlib_converters()

# # plot the forecast for the next 1000 hours

# future = model.make_future_dataframe(1000, freq="H", include_history=True)
# forecast = model.predict(future)
# forecast.tail(1000).plot(x='ds', y=['trend', 'trend_lower', 'trend_upper', 'yhat'])