In [1]:
%matplotlib inline


Simple Forecast
===============

You can create and evaluate a forecast with just a few lines of code.

Provide your timeseries as a pandas dataframe with a timestamp column and a value column.

For example, to forecast daily sessions data, your dataframe could look like this:

```{python}
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "sessions": [10231.0, 12309.0, 12104.0]
})
```

The time column can be any format recognized by `pandas.to_datetime`.
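
If you would rather normalize the time column yourself before forecasting, the conversion can be sketched as follows. The explicit `format` string is an assumption about the hour-suffixed sample values above, not something the library requires:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "sessions": [10231.0, 12309.0, 12104.0],
})

# Parse the strings explicitly as year-month-day-hour (assumed format).
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d-%H")

# Infer the spacing between consecutive timestamps (needs at least 3 points).
inferred_freq = pd.infer_freq(pd.DatetimeIndex(df["date"]))

print(df.dtypes)
print(inferred_freq)
```

Checking the inferred frequency up front is a cheap way to confirm that the data really is daily before you declare it as such.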

In this example, we'll load a dataset representing ``log(daily page views)``
on the Wikipedia page for Peyton Manning.
It contains values from 2007-12-10 to 2016-01-20. For more information about the dataset, see the
`Prophet quick start <https://facebook.github.io/prophet/docs/quick_start.html>`_.


In [2]:
from collections import defaultdict
import warnings

warnings.filterwarnings("ignore")

import pandas as pd
import plotly

from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True) 

from greykite.common.data_loader import DataLoader
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum
from greykite.framework.utils.result_summary import summarize_grid_search_results

# Loads dataset into pandas DataFrame
dl = DataLoader()
df = dl.load_peyton_manning()

# specify dataset information
metadata = MetadataParam(
    time_col="ts",  # name of the time column ("date" in example above)
    value_col="y",  # name of the value column ("sessions" in example above)
    freq="D"  # "H" for hourly, "D" for daily, "W" for weekly, etc.
              # Any format accepted by `pandas.date_range`
)
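
The `freq` value describes the spacing you expect between rows. One way to sanity-check a dataset against it before forecasting is sketched below. The toy dataframe and its gap are made up for illustration, and the check uses plain pandas rather than any Greykite API:

```python
import pandas as pd

# Hypothetical daily series with one missing day (2020-01-10).
toy_df = pd.DataFrame({
    "ts": ["2020-01-08", "2020-01-09", "2020-01-11"],
    "y": [10231.0, 12309.0, 12104.0],
})

timestamps = pd.DatetimeIndex(pd.to_datetime(toy_df["ts"]))

# Every timestamp a gap-free daily series would contain over the same span.
expected = pd.date_range(start=timestamps.min(), end=timestamps.max(), freq="D")

# Timestamps that the declared frequency implies but the data lacks.
missing = expected.difference(timestamps)
print(missing)
```

An empty `missing` index means the data is complete at the declared frequency.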

Create a forecast
-----------------
You can pick the ``PROPHET`` or ``SILVERKITE``
forecasting model template (see :doc:`/pages/stepbystep/0100_choose_model`).

In this example, we use ``SILVERKITE``.
You may also use ``PROPHET`` to see how a third-party library
is leveraged in the same framework.



In [3]:
forecaster = Forecaster()  # Creates forecasts and stores the result
result = forecaster.run_forecast_config(  # result is also stored as `forecaster.forecast_result`.
    df=df,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=365,  # forecasts 365 steps ahead
        coverage=0.95,         # 95% prediction intervals
        metadata_param=metadata
    )
)

Fitting 3 folds for each of 1 candidates, totalling 3 fits


Check results
-------------
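The output of ``run_forecast_config`` is a result object whose attributes
(``timeseries``, ``grid_search``, ``backtest``, ``forecast``, ``model``) hold
the future forecast, historical forecast performance, the original timeseries,
and the trained model.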



## Timeseries

Let's plot the original timeseries.
``run_forecast_config`` returns this as ``timeseries``.

(The interactive plot is generated by ``plotly``: **click to zoom!**)



In [4]:
ts = result.timeseries
fig = ts.plot()
iplot(fig)

## Cross-validation

By default, ``run_forecast_config`` provides historical evaluation,
so you can see how the forecast performs on past data.
This is stored in ``grid_search`` (cross-validation splits)
and ``backtest`` (holdout test set).

Let's check the cross-validation results.
By default, all metrics in `~greykite.common.evaluation.ElementwiseEvaluationMetricEnum`
are computed on each CV train/test split.
The configuration of CV evaluation metrics can be found at
`Evaluation Metric <../../pages/stepbystep/0400_configuration.html#evaluation-metric>`_.
Below, we show the Mean Absolute Percentage Error (MAPE)
across splits (see `~greykite.framework.utils.result_summary.summarize_grid_search_results`
to control what to show and for details on the output columns).



In [5]:
grid_search = result.grid_search
cv_results = summarize_grid_search_results(
    grid_search=grid_search,
    decimals=2,
    # The below saves space in the printed output. Remove to show all available metrics and columns.
    cv_report_metrics=None,
    column_order=["rank", "mean_test", "split_test", "mean_train", "split_train", "mean_fit_time", "mean_score_time", "params"])
# Transposes to save space in the printed output
cv_results["params"] = cv_results["params"].astype(str)
cv_results.set_index("params", drop=True, inplace=True)
cv_results.transpose()

params,[]
rank_test_MAPE,1
mean_test_MAPE,7.31
split_test_MAPE,"(5.02, 8.53, 8.39)"
mean_train_MAPE,4.2
split_train_MAPE,"(3.82, 4.25, 4.54)"
mean_fit_time,13.92
mean_score_time,1.02
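
To make the table concrete: MAPE is the mean of `|actual - forecast| / |actual|`, expressed as a percentage, and the `mean_*` rows average the per-split values. A minimal sketch in plain Python follows. The `mape` helper and its toy inputs are illustrative, not the library's implementation; the split values are copied from the output above:

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Toy example: each forecast is off by 10% of its actual value.
toy_mape = mape(actual=[10.0, 20.0], forecast=[11.0, 18.0])

# Per-split MAPE values copied from the table above.
split_test = (5.02, 8.53, 8.39)
split_train = (3.82, 4.25, 4.54)

# Averaging the splits reproduces the mean_* rows (up to rounding).
mean_test = round(sum(split_test) / len(split_test), 2)
mean_train = round(sum(split_train) / len(split_train), 2)

print(toy_mape, mean_test, mean_train)
```

The gap between the train and test averages gives a rough sense of how much worse the model does on data it has not seen.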


## Backtest

Let's plot the historical forecast on the holdout test set.
You can zoom in to see how it performed in any given period.



In [6]:
backtest = result.backtest
fig = backtest.plot()
iplot(fig)

You can also check historical evaluation metrics (on the historical training/test set).



In [7]:
backtest_eval = defaultdict(list)
for metric, value in backtest.train_evaluation.items():
    backtest_eval[metric].append(value)
    backtest_eval[metric].append(backtest.test_evaluation[metric])
metrics = pd.DataFrame(backtest_eval, index=["train", "test"]).T
metrics

,train,test
CORR,0.754233,0.756897
R2,0.556228,-0.695154
MSE,0.317248,0.865076
RMSE,0.563247,0.930095
MAE,0.401251,0.856716
MedAE,0.300722,0.840022
MAPE,4.75745,11.3071
MedAPE,3.775,11.2497
sMAPE,2.38715,5.318
Q80,0.200944,0.187063
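
A negative test `R2` next to a high `CORR` may look contradictory. It can happen when the forecast follows the shape of the actuals but is offset from them, so that it does worse than simply predicting the mean of the actuals. A small sketch with made-up numbers (the helper functions are illustrative, not the library's implementation):

```python
from math import sqrt


def corr(actual, forecast):
    """Pearson correlation between actuals and forecasts."""
    n = len(actual)
    mean_a, mean_f = sum(actual) / n, sum(forecast) / n
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, forecast))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    return cov / sqrt(var_a * var_f)


def r2(actual, forecast):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up values: the forecast tracks every move but sits 0.8 too high.
actual = [8.0, 8.2, 7.9, 8.4, 8.1]
forecast = [a + 0.8 for a in actual]

toy_corr = corr(actual, forecast)
toy_r2 = r2(actual, forecast)
print(toy_corr, toy_r2)
```

Here the correlation is essentially perfect while `R2` is strongly negative, because the constant offset dominates the squared error.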


## Forecast

The ``forecast`` attribute contains the forecasted result.
Just as for ``backtest``, you can plot the result or
see the evaluation metrics.

Let's plot the forecast (trained on all data):



In [8]:
forecast = result.forecast
fig = forecast.plot()
iplot(fig)

The forecasted values are available in ``forecast.df``.



In [9]:
forecast.df.head().round(2)

,ts,actual,forecast,forecast_lower,forecast_upper
0,2007-12-10,9.59,8.71,7.17,10.24
1,2007-12-11,8.52,8.57,7.46,9.68
2,2007-12-12,8.18,8.45,7.49,9.41
3,2007-12-13,8.07,8.38,7.44,9.33
4,2007-12-14,7.89,8.36,7.32,9.4
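
Since `forecast.df` is a regular pandas dataframe, you can compute your own diagnostics from it. As a sketch, the following rebuilds the five rows printed above by hand and checks how often the actual value falls inside the prediction interval. With the real result object you would use `forecast.df` directly instead of the hand-built `rows`:

```python
import pandas as pd

# The five rows shown above, copied by hand for illustration.
rows = pd.DataFrame({
    "ts": pd.to_datetime([
        "2007-12-10", "2007-12-11", "2007-12-12", "2007-12-13", "2007-12-14",
    ]),
    "actual": [9.59, 8.52, 8.18, 8.07, 7.89],
    "forecast": [8.71, 8.57, 8.45, 8.38, 8.36],
    "forecast_lower": [7.17, 7.46, 7.49, 7.44, 7.32],
    "forecast_upper": [10.24, 9.68, 9.41, 9.33, 9.40],
})

# True where the actual lies within [forecast_lower, forecast_upper].
inside = rows["actual"].between(rows["forecast_lower"], rows["forecast_upper"])

# Fraction of rows covered by the interval.
empirical_coverage = inside.mean()

# Width of each prediction interval.
interval_width = rows["forecast_upper"] - rows["forecast_lower"]

print(empirical_coverage)
print(interval_width.round(2).tolist())
```

Five rows are far too few to judge a 95% interval; the same two lines applied to the full dataframe give a more meaningful number.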


## Model Diagnostics

The component plot shows how your dataset's trend,
seasonality, and event / holiday patterns are handled in the model:



In [10]:
fig = forecast.plot_components()
iplot(fig)   # fig.show() if you are using "PROPHET" template

The model summary lets you inspect individual model terms.
Check parameter estimates and their significance for insights
on how the model works and what can be further improved.



In [11]:
summary = result.model[-1].summary()  # -1 retrieves the estimator from the pipeline
print(summary)


Number of observations: 2964,   Number of features: 122
Method: Ridge regression
Number of nonzero features: 122
Regularization parameter: 148.5

Residuals:
         Min           1Q       Median           3Q          Max
      -2.342      -0.3604     -0.06554         0.26        3.759

             Pred_col    Estimate  Std. Err Pr(>)_boot sig. code                   95%CI
            Intercept       8.049   0.01942     <2e-16       ***          (8.012, 8.085)
  events_C...New Year    0.007008   0.01981      0.700               (-0.02951, 0.05043)
  events_C...w Year-1    0.002321   0.01548      0.862               (-0.03077, 0.02919)
  events_C...w Year-2    0.002074   0.01636      0.916               (-0.02995, 0.03259)
  events_C...w Year+1   -0.002421   0.01544      0.882               (-0.03323, 0.02741)
  events_C...w Year+2     0.01539   0.01934      0.428                (-0.0196, 0.05197)
 events_Christmas Day    -0.02248   0.01074      0.026         *   (-0.04417, -0.004721)
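
One way to read the table: a term whose 95% confidence interval excludes zero is one the model can distinguish from no effect. A sketch that applies this rule to the rows printed above (values copied by hand; the truncated names are kept exactly as printed):

```python
# (term name, estimate, (ci_lower, ci_upper)) copied from the summary above.
terms = [
    ("Intercept", 8.049, (8.012, 8.085)),
    ("events_C...New Year", 0.007008, (-0.02951, 0.05043)),
    ("events_C...w Year-1", 0.002321, (-0.03077, 0.02919)),
    ("events_C...w Year-2", 0.002074, (-0.02995, 0.03259)),
    ("events_C...w Year+1", -0.002421, (-0.03323, 0.02741)),
    ("events_C...w Year+2", 0.01539, (-0.0196, 0.05197)),
    ("events_Christmas Day", -0.02248, (-0.04417, -0.004721)),
]

# Keep the terms whose interval lies entirely above or entirely below zero.
excludes_zero = [
    name for name, _estimate, (lower, upper) in terms
    if lower > 0 or upper < 0
]
print(excludes_zero)
```

The two terms selected are the same ones that carry a significance code in the printed summary.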

## Apply the model

The trained model is available as a fitted `sklearn.pipeline.Pipeline`.



In [12]:
model = result.model
model

Pipeline(steps=[('input',
                 PandasFeatureUnion(transformer_list=[('date',
                                                       Pipeline(steps=[('select_date',
                                                                        ColumnSelector(column_names=['ts']))])),
                                                      ('response',
                                                       Pipeline(steps=[('select_val',
                                                                        ColumnSelector(column_names=['y'])),
                                                                       ('outlier',
                                                                        ZscoreOutlierTransformer()),
                                                                       ('null',
                                                                        NullTransformer(impute_algorithm='interpolate',
                                                                 

You can take this model and forecast on any date range
by passing a new dataframe to predict on. The
`~greykite.framework.input.univariate_time_series.UnivariateTimeSeries.make_future_dataframe`
convenience function can be used to create this dataframe.
Here, we predict the next 4 periods after the model's train end date.

<div class="alert alert-info"><h4>Note</h4><p>The dataframe passed to .predict() must have the same columns
  as the ``df`` passed to ``run_forecast_config`` above, including
  any regressors needed for prediction. The ``value_col`` column
  should be included with values set to `np.nan`.</p></div>



In [13]:
future_df = result.timeseries.make_future_dataframe(
    periods=4,
    include_history=False)
future_df

,ts,y
2016-01-21,2016-01-21,
2016-01-22,2016-01-22,
2016-01-23,2016-01-23,
2016-01-24,2016-01-24,
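
If you need more control than `make_future_dataframe` gives you, a dataframe with the same columns can be assembled by hand. A sketch, assuming the training data ends on 2016-01-20 as stated earlier and that only the `ts` and `y` columns are needed (this dataset has no regressors). It reproduces the columns shown above but not the date index:

```python
import numpy as np
import pandas as pd

# Last date in the training data (from the dataset description).
train_end = pd.Timestamp("2016-01-20")

# Four daily timestamps starting the day after the training data ends.
future_dates = pd.date_range(
    start=train_end + pd.Timedelta(days=1),
    periods=4,
    freq="D",
)

# The value column is present but empty, since these values are unknown.
manual_future_df = pd.DataFrame({
    "ts": future_dates,
    "y": np.nan,
})
print(manual_future_df)
```

Building the dataframe yourself is mainly useful when you need to attach regressor columns or predict on a non-contiguous set of dates.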


Call `.predict()` to compute predictions:



In [14]:
model.predict(future_df)

,ts,forecast,forecast_lower,forecast_upper,y_quantile_summary
0,2016-01-21,8.971131,8.023176,9.919087,"(8.023175764425014, 9.919086953794318)"
1,2016-01-22,8.971261,7.930734,10.011789,"(7.930733838005737, 10.011788866680934)"
2,2016-01-23,8.610398,7.565959,9.654837,"(7.565959127781374, 9.654837376620923)"
3,2016-01-24,9.087944,7.794432,10.381456,"(7.794431622702435, 10.3814559235619)"
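
Because this dataset is `log(daily page views)`, the predictions are on the log scale. To report them as page views, you can invert the transform. A sketch using the values printed above, assuming the natural logarithm was used (the dataset description only says `log`):

```python
import math

# (ts, forecast, forecast_lower, forecast_upper) copied from the output above.
predictions = [
    ("2016-01-21", 8.971131, 8.023176, 9.919087),
    ("2016-01-22", 8.971261, 7.930734, 10.011789),
    ("2016-01-23", 8.610398, 7.565959, 9.654837),
    ("2016-01-24", 9.087944, 7.794432, 10.381456),
]

# Undo the (assumed natural) log to get values in page views.
page_views = [
    (ts, math.exp(point), math.exp(lower), math.exp(upper))
    for ts, point, lower, upper in predictions
]

for ts, point, lower, upper in page_views:
    print(f"{ts}: {point:,.0f} views (interval {lower:,.0f} to {upper:,.0f})")
```

Note that an interval that is roughly symmetric on the log scale becomes asymmetric after exponentiation, with more room above the point forecast than below it.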


What's next?
------------
If you're satisfied with the forecast performance, you're done!

For a complete example of how to tune this forecast, see
[Tune your first forecast model](https://linkedin.github.io/greykite/docs/0.1.0/html/gallery/tutorials/0100_forecast_tutorial.html).

Besides the component plot, we offer additional tools to
help you improve your forecast and understand the result.

See the following guides:

* [Changepoint Detection](https://linkedin.github.io/greykite/docs/0.1.0/html/gallery/quickstart/0200_changepoint_detection.html)

* [Seasonality](https://linkedin.github.io/greykite/docs/0.1.0/html/gallery/quickstart/0300_seasonality.html)

* [Model Summary](https://linkedin.github.io/greykite/docs/0.1.0/html/gallery/quickstart/0400_model_summary.html)

* [Grid Search](https://linkedin.github.io/greykite/docs/0.1.0/html/gallery/quickstart/0500_grid_search.html)

For example, for this dataset, you could add changepoints to
handle the change in trend around 2014 and avoid the overprediction
issue seen in the backtest plot.

Or you might want to try a different model template.
Model templates bundle an algorithm with recommended
hyperparameters. The template that works best for you depends on
the data characteristics and forecast requirements
(e.g. short / long forecast horizon). We recommend trying
a few and tuning the ones that look promising.
All model templates are available through the same forecasting
and tuning interface shown here.

For details about the model templates and how to set model
components, see the following guides:

* [Templates](https://linkedin.github.io/greykite/docs/0.1.0/html/gallery/tutorials/0200_templates.html)

* [Forecasting Process](https://linkedin.github.io/greykite/docs/0.1.0/html/pages/stepbystep/0000_stepbystep.html)

