**Model Selection Considerations**

| Data Assumptions | Assumptions Test |
|------|------|
|   Univariate Timeseries | N/A|

- Works poorly on data with high variance.
- Consider applying a Box-Cox transformation to stabilize the variance before fitting.
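
A minimal sketch of the Box-Cox suggestion above, assuming strictly positive values and a hand-picked parameter `lam` (both are assumptions, not taken from this notebook). The helpers `boxcox_transform` and `boxcox_inverse` are hypothetical names; `scipy.stats.boxcox` can estimate the parameter from the data instead. The idea is to fit the model on the transformed column and map forecasts back with the inverse.

```python
import numpy as np
import pandas as pd

def boxcox_transform(y, lam):
    """Box-Cox transform for strictly positive values (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

def boxcox_inverse(z, lam):
    """Map transformed values (or forecasts) back to the original scale."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

# Illustrative data with large swings; the values are made up for this sketch
df_demo = pd.DataFrame({
    'ds': pd.date_range('2020-01-01', periods=5, freq='D'),
    'y': [10.0, 120.0, 15.0, 300.0, 20.0],
})

lam = 0.5  # assumed value; normally estimated from the data
df_demo['y_bc'] = boxcox_transform(df_demo['y'], lam)

# After forecasting on y_bc, the same inverse would be applied to yhat
restored = boxcox_inverse(df_demo['y_bc'], lam)
```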

**Use Cases**
- Suitable for most types of time series data. However, it is easiest to implement with daily data; advanced tuning is required for sub-daily data.

**Dependencies**

In [5]:
import pandas as pd
from fbprophet import Prophet

**Load Data**

In [212]:
df = pd.read_csv('data/example_wp_log_peyton_manning.csv')


**Data Preparation: Required Inputs and Format**
- Dataframe with timestamps in a column ```df['ds']``` and the values to forecast in a column ```df['y']```
    - Prophet expects the ds column in pandas datetime format; convert it with ```pd.to_datetime(df['ds'])``` if needed
    - Timestamps in ```YYYY-MM-DD HH:MM:SS``` format (```YYYY-MM-DD``` for dates).
- Period specification (ex. ```periods=365```)
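
A small sketch of the preparation step described above. The raw column names `timestamp` and `views` are hypothetical; only the target names `ds` and `y` are required by Prophet.

```python
import pandas as pd

# Raw data with arbitrary column names (hypothetical names for illustration)
raw = pd.DataFrame({
    'timestamp': ['2016-01-18', '2016-01-19', '2016-01-20'],
    'views': [10.333775, 9.125871, 8.891374],
})

# Rename to the column names Prophet expects and parse the timestamps
df_prophet = raw.rename(columns={'timestamp': 'ds', 'views': 'y'})
df_prophet['ds'] = pd.to_datetime(df_prophet['ds'])
```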

In [213]:
df.tail()

Unnamed: 0,ds,y
2900,2016-01-16,7.817223
2901,2016-01-17,9.273878
2902,2016-01-18,10.333775
2903,2016-01-19,9.125871
2904,2016-01-20,8.891374


In [215]:
#Instantiate the object 
m = Prophet()
#Fit the model
m.fit(df)
#Generate a dataframe with dates extending into the future for the number of days you wish to forecast onto
future = m.make_future_dataframe(periods=365)
future.tail()
#Generate the forecast/Make the prediction
forecast = m.predict(future)

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


**Optional Inputs**
- Carrying Capacity/Saturation Maximum 
    - Create a new cap column, e.g. `df['cap']=8.5`. It must be specified for every row in the dataframe, but it does not have to be constant. For example, if the market size is growing, then cap can be an increasing sequence.
- Saturation Minimum
    - The inverse of carrying capacity, e.g. `df['floor']=1.5`. A floor can only be used together with a cap and logistic growth.
- Growth Type
    - `m = Prophet(growth='logistic')`
- Changepoints
    - By default changepoints are only inferred for the first 80% of the time series in order to have plenty of runway for projecting the trend forward and to avoid overfitting fluctuations at the end of the time series.
    - `m = Prophet(changepoint_range=0.9)`
    - If the trend changes are being overfit (too much flexibility) or underfit (not enough flexibility), you can adjust the strength of the sparse prior with `m = Prophet(changepoint_prior_scale=0.5)`. Increasing it from the default of 0.05 makes the trend more flexible (and typically widens the uncertainty intervals of the forecast). Decreasing it makes the trend less flexible.
    - Rather than using automatic changepoint detection you can manually specify the locations of potential changepoints with the `changepoints` argument. `m = Prophet(changepoints=['2014-01-01'])`
- Seasonality, Holiday Effects and Special Events
    - Holidays
        -  If you have holidays or other recurring events that you’d like to model, you must create a dataframe for them. It has two columns (`holiday` and `ds`) and a row for each occurrence of the holiday.
        - You can also include columns lower_window and upper_window which extend the holiday out to `[lower_window, upper_window]` days around the date.
        - `m = Prophet(holidays=holidays)` followed by `forecast = m.fit(df).predict(future)`
    - Built-in Country Holidays
    
        `m = Prophet(holidays=holidays) 
        m.add_country_holidays(country_name='US')
        m.fit(df)` 
        
    - Fourier Order for Seasonality
        - Seasonalities are estimated using a partial Fourier sum. The number of terms in the partial sum (the order) is a parameter that determines how quickly the seasonality can change. The default Fourier order is 10 for yearly seasonality and 3 for weekly seasonality; 5 is recommended for a custom monthly seasonality. Increase the order when the seasonality needs to fit higher-frequency changes and can generally be less smooth. Decrease it for the opposite effect.
    - Specifying Custom Seasonalities
        - Prophet will by default fit weekly and yearly seasonalities if the time series is more than two cycles long. It will also fit daily seasonality for a sub-daily time series.
        - The inputs to the `add_seasonality` function are a name, the period of the seasonality in days, and the Fourier order for the seasonality.

        `m = Prophet(weekly_seasonality=False)
        m.add_seasonality(name='monthly', period=30.5, fourier_order=5)
        forecast = m.fit(df).predict(future)
        fig = m.plot_components(forecast)`

        - On-season and off-season seasonalities can be specified as well, but this is more complicated.
    - Prior scale for holidays and seasonality

        `m = Prophet(holidays=holidays, holidays_prior_scale=0.05).fit(df)`
    - Seasonality Type: Multiplicative vs Additive

        `m = Prophet(seasonality_mode='multiplicative')`
- Removing outliers
`df.loc[(df['ds'] > '2010-01-01') & (df['ds'] < '2011-01-01'), 'y'] = None
model = Prophet().fit(df)
fig = model.plot(model.predict(future))`
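
A sketch of the carrying capacity and saturation minimum inputs from the list above. The bound values, the growth rate of the cap, and the helper names `add_bounds` and `fit_logistic` are assumptions for illustration. The point is that `cap` and `floor` must be present in both the history and the future dataframe. `fit_logistic` is only defined here, not called, and a real fit would need far more history than five rows.

```python
import numpy as np
import pandas as pd

start = pd.Timestamp('2016-01-16')

def add_bounds(frame):
    """Attach an increasing cap and a constant floor to every row (assumed values)."""
    out = frame.copy()
    days = (out['ds'] - start).dt.days
    out['cap'] = 12.0 + 0.1 * days   # cap grows over time, e.g. a growing market
    out['floor'] = 1.5               # constant saturation minimum
    return out

history = add_bounds(pd.DataFrame({
    'ds': pd.date_range(start, periods=5, freq='D'),
    'y': [7.817223, 9.273878, 10.333775, 9.125871, 8.891374],
}))

# The future dataframe needs the same columns for every row
future_bounds = add_bounds(pd.DataFrame({
    'ds': pd.date_range(start, periods=10, freq='D'),
}))

def fit_logistic(history, future_bounds):
    """Fit a logistic-growth model; requires fbprophet to be installed."""
    from fbprophet import Prophet
    m = Prophet(growth='logistic')
    m.fit(history)
    return m.predict(future_bounds)
```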


**Input Defaults**
Please take a look at the [code](https://github.com/facebook/prophet/blob/8c48f5b0429d127f6243110ff13f61a322cbf227/python/fbprophet/forecaster.py) for a complete list of parameters and defaults. 
- Growth Type: 
    - `m = Prophet(growth='linear')`
- Changepoints
    - `m = Prophet(changepoint_range=0.8)`
    - `m = Prophet(changepoint_prior_scale=0.05)`
- Fourier Order for Seasonalities
    - Fourier order of 3 for weekly seasonality and 10 for yearly seasonality. 
    `from fbprophet.plot import plot_yearly
    m = Prophet(yearly_seasonality=10).fit(df)
    a = plot_yearly(m)`
- Prior scale for holidays
    - `m = Prophet(holidays_prior_scale=10)`
- Seasonality type
     `m = Prophet(seasonality_mode='additive')`
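
To make the Fourier order defaults above more concrete, here is a sketch of the features a partial Fourier sum is built from: one sine and one cosine column per term. This illustrates the idea only and is not Prophet's internal code; `fourier_features` is a hypothetical helper.

```python
import numpy as np

def fourier_features(t_days, period, order):
    """Return a (len(t), 2 * order) matrix of sine/cosine seasonal features."""
    t = np.asarray(t_days, dtype=float)
    columns = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

t = np.arange(14)  # two weeks of daily timestamps, measured in days

weekly = fourier_features(t, period=7, order=3)        # 6 columns
yearly = fourier_features(t, period=365.25, order=10)  # 20 columns
```

A higher order adds higher-frequency columns, which lets the fitted seasonality change faster at the cost of a greater risk of overfitting.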


**Minimal Output**
- Plain forecast with no parameters specified, so all Prophet defaults apply.

In [22]:
#Viewing part of the forecast dataframe generated by Prophet
forecast[['ds','yhat']].tail()

Unnamed: 0,ds,yhat
3265,2017-01-15,8.206497
3266,2017-01-16,8.531523
3267,2017-01-17,8.31893
3268,2017-01-18,8.151543
3269,2017-01-19,8.163477


**BASIC TUNING: Example including Input Parameter Defaults**
- Forecast with default parameters explicitly specified. Holidays are added as well, so values may differ slightly from the forecast above.

In [68]:
m = Prophet(growth='linear',  
            yearly_seasonality= 'auto', 
            weekly_seasonality = 'auto', 
            daily_seasonality = 'auto',
            holidays = holidays,
            seasonality_mode= 'additive')
m.add_country_holidays(country_name='US')
m.fit(df)
future = m.make_future_dataframe(periods=365)
future.tail()
forecast = m.predict(future)
forecast[['ds','yhat']].tail()

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


Unnamed: 0,ds,yhat
3265,2017-01-15,8.206497
3266,2017-01-16,8.531523
3267,2017-01-17,8.31893
3268,2017-01-18,8.151543
3269,2017-01-19,8.163477
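
The `holidays` dataframe passed to `Prophet` above is not defined in any cell shown here. A sketch of how it could be built, using the playoff and Super Bowl dates that appear as comments in the advanced tuning cell of this notebook:

```python
import pandas as pd

# One row per occurrence; lower_window/upper_window extend the effect around each date
playoffs = pd.DataFrame({
    'holiday': 'playoff',
    'ds': pd.to_datetime(['2008-01-13', '2009-01-03', '2010-01-16',
                          '2010-01-24', '2010-02-07', '2011-01-08',
                          '2013-01-12', '2014-01-12', '2014-01-19',
                          '2014-02-02', '2015-01-11', '2016-01-17',
                          '2016-01-24', '2016-02-07']),
    'lower_window': 0,
    'upper_window': 1,
})
superbowls = pd.DataFrame({
    'holiday': 'superbowl',
    'ds': pd.to_datetime(['2010-02-07', '2014-02-02', '2016-02-07']),
    'lower_window': 0,
    'upper_window': 1,
})

# Stack both event types into the single dataframe Prophet expects
holidays = pd.concat((playoffs, superbowls), ignore_index=True)
```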


**ADVANCED TUNING: Example including Input Parameter Defaults**
- I explicitly included some parameters that Prophet's 'auto' setting eliminated. My suggestion is to have all of these advanced tuning parameters be set to Null by default (to exclude them) and allow the user to input what they might need. For the user-defined holidays, I imagine that the user would just create one list of timestamps. We could add the code in the background.
- Please note that some of these parameters shouldn't be used in conjunction. It's up to the user to understand how Prophet works if they want to implement advanced tuning.

In [72]:
# To add your own holidays, define the dataframe before creating the model:
# playoffs = pd.DataFrame({
#     'holiday': 'playoff',
#     'ds': pd.to_datetime(['2008-01-13', '2009-01-03', '2010-01-16',
#                           '2010-01-24', '2010-02-07', '2011-01-08',
#                           '2013-01-12', '2014-01-12', '2014-01-19',
#                           '2014-02-02', '2015-01-11', '2016-01-17',
#                           '2016-01-24', '2016-02-07']),
#     'lower_window': 0,
#     'upper_window': 1,
# })
# superbowls = pd.DataFrame({
#     'holiday': 'superbowl',
#     'ds': pd.to_datetime(['2010-02-07', '2014-02-02', '2016-02-07']),
#     'lower_window': 0,
#     'upper_window': 1,
# })
# holidays = pd.concat((playoffs, superbowls))
m = Prophet(growth='linear',  # 'logistic' is the other option here
            changepoint_prior_scale=0.5,
            changepoint_range=0.8,
            yearly_seasonality=10,
            weekly_seasonality=3,
            daily_seasonality=5,
            holidays=holidays,
            holidays_prior_scale=10,
            seasonality_mode='additive',  # 'multiplicative' is the other option here
            seasonality_prior_scale=10,
            interval_width=0.8)
m.add_country_holidays(country_name='US')
# To specify a user-defined seasonality, disable the built-in seasonality above and call:
# m.add_seasonality(name='monthly', period=30.5, fourier_order=5)
# Sub-daily periods are in units of days and can be fractional, so 4/24 would fit a 4-hour cycle, for example
m.fit(df)
future = m.make_future_dataframe(periods=365, freq = 'D')
future.tail()
forecast = m.predict(future)
forecast[['ds','yhat']].tail()

Unnamed: 0,ds,yhat
3265,2017-01-15,7.937575
3266,2017-01-16,8.774294
3267,2017-01-17,8.087653
3268,2017-01-18,7.917217
3269,2017-01-19,7.93568


**Additional Outputs**
- Confidence intervals

In [23]:
forecast[['ds','yhat','yhat_lower','yhat_upper']].tail()

Unnamed: 0,ds,yhat,yhat_lower,yhat_upper
3265,2017-01-15,8.206497,7.554402,8.962996
3266,2017-01-16,8.531523,7.775467,9.265652
3267,2017-01-17,8.31893,7.600338,9.03551
3268,2017-01-18,8.151543,7.463419,8.898323
3269,2017-01-19,8.163477,7.473671,8.901803
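
As a quick sanity check on the interval columns, the sketch below recomputes the interval width from the rows shown above and confirms that each point forecast lies inside its interval. Prophet's documentation calls these uncertainty intervals; their nominal level is set by `interval_width` (0.8 in the advanced tuning cell).

```python
import pandas as pd

# Rows copied from the output above
tail = pd.DataFrame({
    'ds': pd.to_datetime(['2017-01-15', '2017-01-16', '2017-01-17',
                          '2017-01-18', '2017-01-19']),
    'yhat': [8.206497, 8.531523, 8.31893, 8.151543, 8.163477],
    'yhat_lower': [7.554402, 7.775467, 7.600338, 7.463419, 7.473671],
    'yhat_upper': [8.962996, 9.265652, 9.03551, 8.898323, 8.901803],
})

# Width of the interval and whether the point forecast falls inside it
tail['width'] = tail['yhat_upper'] - tail['yhat_lower']
tail['inside'] = tail['yhat'].between(tail['yhat_lower'], tail['yhat_upper'])
```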


**Complete Outputs**
- My opinion is that just `yhat`, `yhat_lower`, and `yhat_upper` should be returned for now.

In [25]:
forecast.tail()

Unnamed: 0,ds,trend,yhat_lower,yhat_upper,trend_lower,trend_upper,additive_terms,additive_terms_lower,additive_terms_upper,weekly,weekly_lower,weekly_upper,yearly,yearly_lower,yearly_upper,multiplicative_terms,multiplicative_terms_lower,multiplicative_terms_upper,yhat
3265,2017-01-15,7.18856,7.554402,8.962996,6.860857,7.568169,1.017937,1.017937,1.017937,0.048276,0.048276,0.048276,0.969662,0.969662,0.969662,0.0,0.0,0.0,8.206497
3266,2017-01-16,7.187532,7.775467,9.265652,6.85889,7.568882,1.343991,1.343991,1.343991,0.352295,0.352295,0.352295,0.991696,0.991696,0.991696,0.0,0.0,0.0,8.531523
3267,2017-01-17,7.186504,7.600338,9.03551,6.856923,7.570956,1.132426,1.132426,1.132426,0.119639,0.119639,0.119639,1.012787,1.012787,1.012787,0.0,0.0,0.0,8.31893
3268,2017-01-18,7.185477,7.463419,8.898323,6.854966,7.572431,0.966066,0.966066,0.966066,-0.066664,-0.066664,-0.066664,1.03273,1.03273,1.03273,0.0,0.0,0.0,8.151543
3269,2017-01-19,7.184449,7.473671,8.901803,6.853072,7.572589,0.979028,0.979028,0.979028,-0.072254,-0.072254,-0.072254,1.051282,1.051282,1.051282,0.0,0.0,0.0,8.163477
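
The component columns in the full output relate to `yhat` in a simple way: the forecast is the trend scaled by the multiplicative terms plus the additive terms. The sketch below checks this against the first three rows shown above; in this run the additive terms are just the weekly and yearly seasonalities. The tolerance accounts for rounding in the printed values.

```python
import numpy as np
import pandas as pd

# First three rows of the output above, reduced to the relevant columns
components = pd.DataFrame({
    'trend': [7.18856, 7.187532, 7.186504],
    'additive_terms': [1.017937, 1.343991, 1.132426],
    'multiplicative_terms': [0.0, 0.0, 0.0],
    'weekly': [0.048276, 0.352295, 0.119639],
    'yearly': [0.969662, 0.991696, 1.012787],
    'yhat': [8.206497, 8.531523, 8.31893],
})

# yhat = trend * (1 + multiplicative_terms) + additive_terms
rebuilt = (components['trend'] * (1 + components['multiplicative_terms'])
           + components['additive_terms'])

# With additive seasonality only, additive_terms = weekly + yearly
seasonal_sum = components['weekly'] + components['yearly']
```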


**Additional Tuning Options and Considerations**

I haven't added anything about [gaps in data](https://facebook.github.io/prophet/docs/non-daily_data.html#data-with-regular-gaps), because I'm not sure what the UX should/could look like. 
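
For context, the linked Prophet documentation handles regular gaps by only predicting for time windows that exist in the history. A minimal sketch of that filtering step, assuming (hypothetically) hourly data that is only ever observed between midnight and 6 a.m.:

```python
import pandas as pd

# Hourly future timestamps covering two full days
future_hourly = pd.DataFrame({
    'ds': pd.date_range('2017-01-19', periods=48, freq=pd.Timedelta(hours=1)),
})

# Keep only the hours for which historical observations exist
observed_window = future_hourly[future_hourly['ds'].dt.hour < 6].reset_index(drop=True)
```

The filtered dataframe would then be passed to `m.predict` in place of the full one.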

[Additional Regressors](https://towardsdatascience.com/forecast-model-tuning-with-additional-regressors-in-prophet-ffcbf1777dda): For [example](https://github.com/abaranovskis-redsamurai/automation-repo/blob/master/forecast/bikesharing_forecast_prophet_regressor2.ipynb), temperature and weather condition can be added as regressors when forecasting bike rentals. Future values of regressors have to be known in order to incorporate them with Prophet. You could perhaps use Prophet to forecast the regressors and then add those predictions to your main forecast. *Question:* Could that approach cause overfitting?
- If we were to include the option to include multiple regressors I imagine that the UX would be something like:
    - Specify your forecast period. 
    - Specify your `y` or main forecast with a Flux Query.
    - Specify input parameters for main forecast. 
    - Specify each regressor with a unique Flux query. The simplest Prophet model is used to forecast each regressor:

    `m = Prophet()
    m.fit(df)
    future = m.make_future_dataframe(periods=365)
    future.tail()
    forecast = m.predict(future)
    forecast[['ds','yhat']].tail()`

    - The regressor is added with:
    `future['temp'] = future['ds'].apply(weather_temp)`
    - Any added seasonalities or extra regressors will by default use whatever seasonality_mode is set to, but this can be overridden by specifying `mode='additive'` or `mode='multiplicative'` as an argument when adding the seasonality or regressor.
    
    `m = Prophet(seasonality_mode='multiplicative')
    m.add_seasonality('quarterly', period=91.25, fourier_order=8, mode='additive')
    m.add_regressor('regressor', mode='additive')`
 - In other words, the functions that add the regressor forecasts are created automatically. This code would be autogenerated (please see [example](https://github.com/abaranovskis-redsamurai/automation-repo/blob/master/forecast/bikesharing_forecast_prophet_regressor2.ipynb) for context): 
 

In [30]:
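# d_df (historical data) and future_temp_df (future regressor values) come from
# the linked example and are assumed to be indexed by date.
def weather_temp(ds):
    date = (pd.to_datetime(ds)).date()
    # Fall back to the future values when no historical row exists from this date onward
    if d_df[date:].empty:
        return future_temp_df[date:]['future_temp'].values[0]
    else:
        return (d_df[date:]['temp']).values[0]

def weather_condition(ds):
    date = (pd.to_datetime(ds)).date()
    # Same lookup as weather_temp, but for the weather condition column
    if d_df[date:].empty:
        return future_temp_df[date:]['future_weathersit'].values[0]
    else:
        return (d_df[date:]['weathersit']).values[0]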

m = Prophet()
m.add_regressor('temp')
m.add_regressor('weathersit')
m.fit(d_df)

future = m.make_future_dataframe(periods=10)
future['temp'] = future['ds'].apply(weather_temp)
future['weathersit'] = future['ds'].apply(weather_condition)

**Example of Autogenerating Regression Code**

Predicting the Temperature using Humidity as a regressor 

In [156]:
df= pd.read_csv('data/weather_day.csv')
df.head()

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


In [157]:
#data cleanup to get data in the correct form
weather = df[['dteday', 'temp', 'hum']].copy()
weather['dteday'] = pd.to_datetime(weather['dteday'])
weather.head()

Unnamed: 0,dteday,temp,hum
0,2011-01-01,0.344167,0.805833
1,2011-01-02,0.363478,0.696087
2,2011-01-03,0.196364,0.437273
3,2011-01-04,0.2,0.590435
4,2011-01-05,0.226957,0.436957


Assuming the user can write two queries: one to specify the data they wish to forecast and a second for the regressor. The raw data is split into these two time series below.

In [158]:
query_one = weather[['dteday', 'temp']].copy().rename(columns={"dteday": "ds", "temp": "y"})
query_one.tail()

Unnamed: 0,ds,y
726,2012-12-27,0.254167
727,2012-12-28,0.253333
728,2012-12-29,0.253333
729,2012-12-30,0.255833
730,2012-12-31,0.215833


In [159]:
query_two_regressor = weather[['dteday', 'hum']].copy().rename(columns={"dteday": "ds", "hum": "y"})
query_two_regressor.tail()

Unnamed: 0,ds,y
726,2012-12-27,0.652917
727,2012-12-28,0.59
728,2012-12-29,0.752917
729,2012-12-30,0.483333
730,2012-12-31,0.5775


In [243]:
#Use Prophet to forecast the regressor values
m = Prophet()
m.fit(query_two_regressor)
future_regressor = m.make_future_dataframe(periods=10)
future_regressor.tail()
forecast_regressor = m.predict(future_regressor)
forecast_regressor[['ds','yhat']].tail()


INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


Unnamed: 0,ds,yhat
736,2013-01-06,0.51472
737,2013-01-07,0.523615
738,2013-01-08,0.527004
739,2013-01-09,0.531515
740,2013-01-10,0.497589


In [246]:
#merge together the data you wish to forecast (query_one) and the forecasted regressor values
together = pd.merge(query_one, forecast_regressor[['ds','yhat']], on=['ds'], how="right")
together = together.rename(columns={"yhat": "hum"})
together.tail(20)


Unnamed: 0,ds,y,hum
721,2012-12-22,0.265833,0.616434
722,2012-12-23,0.245833,0.615394
723,2012-12-24,0.231304,0.615913
724,2012-12-25,0.291304,0.609751
725,2012-12-26,0.243333,0.603693
726,2012-12-27,0.254167,0.558362
727,2012-12-28,0.253333,0.553422
728,2012-12-29,0.253333,0.550687
729,2012-12-30,0.255833,0.550564
730,2012-12-31,0.215833,0.552885
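
One possible variation on the merge above, which this notebook does not do: keep the observed humidity for historical rows and use the forecasted values only where no observation exists. The sketch below uses two historical rows from the data above; the two future `yhat` values are made up for illustration.

```python
import pandas as pd

# Observed target and regressor for the last two historical days
history = pd.DataFrame({
    'ds': pd.to_datetime(['2012-12-30', '2012-12-31']),
    'y': [0.255833, 0.215833],
    'hum': [0.483333, 0.5775],
})

# Regressor forecast covering history plus two future days (future values are hypothetical)
regressor_forecast = pd.DataFrame({
    'ds': pd.to_datetime(['2012-12-30', '2012-12-31', '2013-01-01', '2013-01-02']),
    'yhat': [0.550564, 0.552885, 0.54, 0.53],
})

# Right join keeps every forecast date; observed hum is missing for future rows
combined = pd.merge(history, regressor_forecast, on='ds', how='right')

# Prefer observed humidity, fall back to the forecast where it is missing
combined['hum'] = combined['hum'].fillna(combined['yhat'])
combined = combined.drop(columns='yhat')

# Rows with a known target are for fitting; the rest are for prediction
train = combined[combined['y'].notna()]
future_rows = combined[combined['y'].isna()]
```

Whether this is preferable depends on the open question raised in this notebook about forecasting regressors with Prophet in the first place.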


In [249]:
#Make forecast and include the regressor
m = Prophet()
m.add_regressor('hum')
m.fit(together)
forecast = m.predict(together)
forecast[['ds','yhat']].tail()


INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


Unnamed: 0,ds,yhat
736,2013-01-06,0.234212
737,2013-01-07,0.238787
738,2013-01-08,0.242442
739,2013-01-09,0.238008
740,2013-01-10,0.233313


**Stopped working on this idea because I'm not sure if it's really kosher to use Prophet to forecast the regressor values and then use them to help create the forecast.** I'm not sure where/who I should ask about this.

**Method Evaluation**