# Import necessary libraries

In [None]:
import os
import pandas as pd
import numpy as np
from fbprophet import Prophet
from fbprophet.diagnostics import cross_validation
import matplotlib.pyplot as plt
import ml_metrics as metrics
%matplotlib inline
 
plt.rcParams['figure.figsize']=(20,10)
plt.style.use('ggplot')

# Read in the data

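Read the data in from the PV CSV file in the data folder. Rather than parsing timestamps from the file, we generate a 15-minute datetime range covering 2015, set it as the index, and drop the 'Month', 'Day' and 'Time' columns. The result is also written to `data/example.csv`.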

In [None]:
# Data should be in the data folder, and we test if the notebook's working directory is the notebook directory or the project directory
if os.path.isdir(os.getcwd() + '/data'):
    path_suffix = ''        # We're in the project directory
else:
    path_suffix = '../'     # We're in the notebook directory

pv_df = pd.read_csv(path_suffix + "data/pv.csv")
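# One timestamp per row; "15min" replaces the older "15T" alias, which newer pandas versions deprecate
pv_df['datetime'] = pd.date_range(start="2015-01-01", end="2015-12-31 23:45:00", freq="15min")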
# TODO: Maybe we actually will want to compute the datetime from the Time column ...
#df["Seconds_In_2015"] = df.Time * 4 * 15 * 60
#df['datetime'] = pd.to_datetime(df.Seconds_In_2015, origin=datetime.datetime(year=2015, month=1, day=1), unit="s")
pv_df = pv_df.set_index('datetime').drop(['Month', 'Day', 'Time'], axis=1)
pv_df.to_csv(path_suffix + 'data/example.csv')
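
Because the timestamps are generated rather than read from the file, it can be worth checking that the range has exactly one entry per row. The following is a minimal sketch, assuming the file holds one full (non-leap) year of 15-minute readings; `timestamps` and `expected_rows` are names introduced here for illustration only.

```python
import pandas as pd

# One timestamp every 15 minutes across 2015 (not a leap year).
timestamps = pd.date_range(start="2015-01-01", end="2015-12-31 23:45:00", freq="15min")

# 4 readings per hour * 24 hours * 365 days
expected_rows = 4 * 24 * 365

# If the CSV has a different number of rows, assigning the range to a column would fail.
print(len(timestamps), expected_rows)
```

Comparing `len(timestamps)` with `len(pv_df)` before the assignment would surface a mismatch early.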

# Prepare for Prophet

For Prophet to work, the columns must be named 'ds' and 'y', so let's create a new dataframe and keep our old one handy (you'll see why later). fbprophet also wants to see 'ds' as a regular column rather than as the index, so we reset the datetime index to a plain integer index and then rename the columns.

In [None]:
df = pv_df.reset_index()
df = df.rename(columns={'datetime':'ds', 'EJJ PV (MW)':'y'})
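
To make the reshaping step concrete, here is a small sketch on a toy dataframe. The three values and the names `toy` and `prophet_df` are made up for illustration; only the column names come from the notebook.

```python
import pandas as pd

# Toy stand-in for pv_df: a datetime index and a single value column.
toy = pd.DataFrame(
    {"EJJ PV (MW)": [0.0, 1.5, 3.2]},
    index=pd.date_range("2015-01-01 12:00", periods=3, freq="15min", name="datetime"),
)

# reset_index() moves the datetime index into a regular column,
# then rename() gives the columns the names Prophet expects.
prophet_df = toy.reset_index().rename(columns={"datetime": "ds", "EJJ PV (MW)": "y"})

print(list(prophet_df.columns))
```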

# Running Prophet

Now, let's set prophet up to begin modeling our data.

Note: Our data covers a single year at 15-minute resolution, so we set `yearly_seasonality=True` and `daily_seasonality=True` explicitly instead of relying on Prophet's automatic detection. With less than two years of history, Prophet would typically disable yearly seasonality on its own and print a message such as `Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.`

In [None]:
model = Prophet(yearly_seasonality=True, daily_seasonality=True)
model.fit(df);

Forecasting is fairly useless unless you can look into the future, so we need to build a dataframe of future dates. For this example, I'll forecast 3000 periods of 15 minutes each, which is a little over 31 days. Note the `freq` argument, which makes each period match the 15-minute spacing of our data, and `include_history=False`, which keeps only the future timestamps.

This can be done with the following code:


In [None]:
future = model.make_future_dataframe(periods=3000, freq = '15Min', include_history=False)
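
The relationship between the number of periods and the forecast horizon is simple arithmetic. The sketch below shows it in both directions; the 7-day horizon is a hypothetical example, not something used in this notebook.

```python
import pandas as pd

step = pd.Timedelta(minutes=15)

# How far ahead do 3000 periods reach?
span = 3000 * step

# Conversely, how many 15-minute periods cover a hypothetical 7-day horizon?
periods_for_week = int(pd.Timedelta(days=7) / step)

print(span, periods_for_week)
```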

To forecast this future data, we need to run it through Prophet's model.

In [None]:
forecast = model.predict(future)

# Plotting Prophet results

Prophet has a plotting mechanism called `plot`. This plot functionality draws the original data (black dots), the model (blue line) and the uncertainty interval of the forecast (shaded blue area).

In [None]:
model.plot(forecast)

In [None]:
model.plot_components(forecast)

Personally, I'm not a fan of this visualization, so I like to break the data up and build a chart myself. The next section describes how I build my own visualization for Prophet modeling.

# Visualizing Prophet models

In order to build a useful dataframe to visualize our model versus our original data, we need to combine the output of the Prophet model with our original data set, then we'll build a new chart manually using pandas and matplotlib.

First, let's give both dataframes the same index, ```ds```.

In [None]:
df.set_index('ds', inplace=True)
forecast.set_index('ds', inplace=True)

Now, we'll combine the original data and our forecast model data.

In [None]:
# pv_df is the original dataframe (datetime index) that we kept from earlier
viz_df = pv_df.join(forecast[['yhat', 'yhat_lower', 'yhat_upper']], how='outer')

If we look at the ```head()```, we see the data has been joined correctly. We fitted the model on the raw PV values rather than on a log-transformed copy, so `yhat` is already on the same scale as the original data and no rescaling (for example with numpy's ```exp``` function) is needed. Next, we pick the date at which the forecast should connect to the original data.

In [None]:
pv_df.index = pd.to_datetime(pv_df.index) # Make sure our index is a datetime object
connect_date = pv_df.index[-2] # Select the 2nd to last date

Using the `connect_date` we can now grab only the model data that occurs after that date (you'll see why in a minute). To do this, we'll mask the forecast data.

In [None]:
mask = (forecast.index > connect_date)
predict_df = forecast.loc[mask]
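
The masking step works the same way on any dataframe with a datetime index. Here is a minimal sketch with made-up values; `toy_forecast`, `cutoff` and `after_cutoff` are illustrative names.

```python
import pandas as pd

# Six 15-minute timestamps straddling the end of 2015.
idx = pd.date_range("2015-12-31 23:00", periods=6, freq="15min")
toy_forecast = pd.DataFrame({"yhat": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)

# Hypothetical cut-off: keep only rows strictly after this timestamp.
cutoff = pd.Timestamp("2015-12-31 23:30")
mask = toy_forecast.index > cutoff
after_cutoff = toy_forecast.loc[mask]

print(after_cutoff)
```

Because the comparison is strict (`>`), the row at the cut-off timestamp itself is excluded.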

Now, let's build a dataframe to use in our new visualization. We'll follow the same steps we did before.

In [None]:
viz_df = pv_df.join(predict_df[['yhat', 'yhat_lower', 'yhat_upper']], how='outer')
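
An outer join keeps every timestamp from both dataframes and fills the gaps with NaN. The sketch below illustrates this on toy data; the values and the names `history`, `future_part` and `joined` are assumptions made for the example.

```python
import pandas as pd

# Measured values for two timestamps ...
history = pd.DataFrame(
    {"y": [10.0, 11.0]},
    index=pd.date_range("2015-12-31 23:30", periods=2, freq="15min"),
)
# ... and forecast values that overlap the last one and extend one step further.
future_part = pd.DataFrame(
    {"yhat": [11.5, 12.0]},
    index=pd.date_range("2015-12-31 23:45", periods=2, freq="15min"),
)

# how="outer" keeps the union of both indexes; missing cells become NaN.
joined = history.join(future_part, how="outer")

print(joined)
```

matplotlib skips NaN values when drawing lines, which is why the actual and forecast series can share one dataframe in the plot.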

## Time to plot

Now, let's plot everything to get the 'final' visualization of our PV data and forecast with its uncertainty interval.

In [None]:
fig, ax1 = plt.subplots()
# Pass labels directly to plot() so the legend does not depend on automatic labelling
ax1.plot(viz_df['EJJ PV (MW)'], label='Actual PV output')
ax1.plot(viz_df.yhat, color='black', linestyle=':', label='Forecasted PV output')
ax1.fill_between(viz_df.index, viz_df['yhat_upper'], viz_df['yhat_lower'], alpha=0.5, color='darkgray')
ax1.set_title('PV Output vs PV Forecast (Black)')
ax1.set_ylabel('PV output (MW)')
ax1.set_xlabel('Date')

ax1.legend() # draw the legend from the labels above

This visualization is much better (in my opinion) than the default fbprophet plot. It is much easier to quickly understand and describe what's happening. The solid line is the actual PV data and the black dotted line is the forecast. The gray shaded area is the uncertainty interval of the forecast.

In [None]:
df_cv = cross_validation(model, horizon = '60 days', initial = '60 days', period = '60 days')

In [None]:
df_my_cv = cross_validation(model, horizon = '1 days', initial = '365 days', period = '30 days')

In [None]:
#model.plot(df_my_cv);
model.plot(df_cv);
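
The dataframe returned by `cross_validation` contains, among other columns, the actual value `y`, the prediction `yhat` and the `cutoff` at which each forecast was made. As a rough sketch of how such output could be summarised, the example below computes the mean absolute error per cutoff on a toy dataframe. The numbers are invented and `toy_cv` only mimics the relevant columns; it is not real cross-validation output.

```python
import pandas as pd

# Toy stand-in for the columns of a cross-validation result.
toy_cv = pd.DataFrame({
    "cutoff": pd.to_datetime(["2015-03-01", "2015-03-01", "2015-04-30", "2015-04-30"]),
    "y": [2.0, 4.0, 1.0, 3.0],
    "yhat": [2.5, 3.0, 1.0, 5.0],
})

# Absolute error of each individual prediction.
abs_err = (toy_cv["y"] - toy_cv["yhat"]).abs()

# Mean absolute error for each cutoff, and across all predictions.
mae_per_cutoff = abs_err.groupby(toy_cv["cutoff"]).mean()
overall_mae = abs_err.mean()

print(mae_per_cutoff)
print(overall_mae)
```

The same two lines could be applied to `df_cv` or `df_my_cv` to compare the two cross-validation settings.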