<a href="https://colab.research.google.com/github/adamdenault/colab-notebooks/blob/master/Time_Series_Traffic_Forecasting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This Notebook was created by [Britney Muller](http://twitter.com/BritneyMuller) using Facebook's open source [Prophet time-series prediction model](https://facebook.github.io/prophet/):

#Hold Shift + Return to run the below cell and upload your timeseries.csv data. 
You can use this to predict any numerical values that occur over time (sales, traffic, number of cookies you eat a day, Twitter activity by large babies, etc.)

##Headers must look like the following.
## Columns:

###Ds = month/day/year (ascending)

### Y = timeseries data (make sure to remove decimals & commas)
![alt text](https://i.imgur.com/QebUg9F.png)
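
If your export still contains thousands separators or decimals, a sketch like the following can normalise it right after loading. The sample rows are made up for illustration; only the `Ds` and `Y` column names come from the format described above.

```python
import io
import pandas as pd

# Made-up sample in the expected layout: 'Ds' as month/day/year, 'Y' as the metric.
raw_csv = io.StringIO(
    'Ds,Y\n'
    '1/3/2019,"1,250.4"\n'
    '1/1/2019,"1,100"\n'
    '1/2/2019,980.6\n'
)

# Read 'Y' as text so the separators can be removed explicitly.
sample = pd.read_csv(raw_csv, dtype={'Y': str})

# Strip thousands separators, then round to whole numbers.
sample['Y'] = (
    sample['Y'].str.replace(',', '', regex=False).astype(float).round().astype(int)
)

# Parse the dates and sort them in ascending order.
sample['Ds'] = pd.to_datetime(sample['Ds'], format='%m/%d/%Y')
sample = sample.sort_values('Ds').reset_index(drop=True)

print(sample)
```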



#Run the following cell & upload your time series data 

Use 3+ years' worth of time series data for optimal predictions.
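
A quick way to check how much history a series covers is to measure the span between its first and last date. The helper name `history_in_years` and the sample dates below are hypothetical.

```python
import pandas as pd

def history_in_years(dates: pd.Series) -> float:
    """Return the span between the earliest and latest date, in years."""
    dates = pd.to_datetime(dates)
    return (dates.max() - dates.min()).days / 365.25

# Hypothetical daily series covering 2016-01-01 through 2019-06-30.
sample_dates = pd.Series(pd.date_range('2016-01-01', '2019-06-30', freq='D'))
years = history_in_years(sample_dates)

print(f'{years:.1f} years of history')
if years < 3:
    print('Less than 3 years of data; consider collecting more history.')
```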

In [None]:
# Import necessary libraries and data. [Shift + Return to run cell]
import os
import yaml
import datetime
from datetime import date
import numpy as np
import pandas as pd
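try:
    # The package has been published as "prophet" since version 1.0.
    from prophet import Prophet
    from prophet.plot import plot_plotly
except ImportError:
    # Fall back to the older "fbprophet" package name.
    from fbprophet import Prophet
    from fbprophet.plot import plot_plotly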
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
figure(num=None, figsize=(8, 6), dpi=80)

#import plotly.offline as py
#py.init_notebook_mode()

# Did some preprocessing of the CSV.
# Specifically I:
#   - truncated a bunch of notes at the top of the file
#   - removed some whitespace at the end of file

from google.colab import files
uploaded = files.upload()

#Upload your Time Series data:

#Import data into a Pandas dataframe:

Change the below .csv name to match your upload!

In [None]:
import io
df = pd.read_csv(io.BytesIO(uploaded['PP-traffic.csv'])) #<--Change .csv name to your uploaded .csv name.
# Dataset is now stored in a Pandas Dataframe
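
`files.upload()` returns a dictionary that maps each uploaded filename to its raw bytes, so the name can also be read from that dictionary instead of being typed in. The sketch below uses a stand-in dictionary (`uploaded_example`, with made-up contents) so that it runs outside Colab.

```python
import io
import pandas as pd

# Stand-in for the dict returned by files.upload(): {filename: file bytes}.
uploaded_example = {'PP-traffic.csv': b'Ds,Y\n1/1/2019,1100\n1/2/2019,981\n'}

# Take the first (and here only) uploaded file, whatever it is called.
filename = next(iter(uploaded_example))
df_example = pd.read_csv(io.BytesIO(uploaded_example[filename]))

print(filename)
print(df_example.head())
```

In the notebook itself, the same two lines would read from `uploaded` rather than the stand-in.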


#Validate the dataframe's head (top 5 rows)

In [None]:
df.head()

# Inspect the dataframe's column types (not necessary, but good to know)

In [None]:
print(df.dtypes)

#Drop extra columns to clean up your dataframe

In [None]:
#data cleanup
df['ds'] = df['Ds']
df['y'] = df['Y']
#drop extra columns
df = df[['ds', 'y']]
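
The same cleanup can also be written as a single rename followed by a column selection, which avoids creating duplicate columns first. A sketch on a made-up frame; the extra `Notes` column is hypothetical.

```python
import pandas as pd

# Made-up frame with the expected 'Ds' and 'Y' columns plus one extra column.
raw = pd.DataFrame({
    'Ds': ['1/1/2019', '1/2/2019'],
    'Y': [1100, 981],
    'Notes': ['', 'holiday'],
})

# Rename to the column names Prophet expects and keep only those two.
clean = raw.rename(columns={'Ds': 'ds', 'Y': 'y'})[['ds', 'y']]
print(clean.columns.tolist())
```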

#Reevaluate your dataframe's head:

In [None]:
df.head()

##Convert to a date time

In [None]:
#df['ds'] = df['ds'].astype('datetime64[ns]')
df['ds'] = pd.to_datetime(df['ds'])
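
Because the dates are expected as month/day/year, passing an explicit `format` removes any ambiguity between day-first and month-first parsing, and `errors='coerce'` turns unparseable entries into `NaT` so they can be inspected. A sketch on made-up values:

```python
import pandas as pd

dates_raw = pd.Series(['07/01/2019', '07/02/2019', 'not a date'])

# Parse strictly as month/day/year; invalid entries become NaT instead of raising.
parsed = pd.to_datetime(dates_raw, format='%m/%d/%Y', errors='coerce')

# List the entries that failed to parse so they can be fixed in the source file.
bad_rows = dates_raw[parsed.isna()]
print(parsed)
print(bad_rows)
```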

#Make model & fit it to your data

In [None]:
m = Prophet()
m.fit(df)

#Make a future data frame

In [None]:
future = m.make_future_dataframe(periods=30)
future.tail()
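
Conceptually, `make_future_dataframe(periods=30)` returns the historical dates followed by 30 new daily dates. The following is a rough pandas-only illustration of that idea, not Prophet's actual implementation; the helper name `extend_dates` is hypothetical.

```python
import pandas as pd

def extend_dates(history: pd.Series, periods: int, freq: str = 'D') -> pd.DataFrame:
    """Return the historical dates plus `periods` new dates after the last one."""
    last_date = history.max()
    # Generate periods + 1 dates starting at the last date, then drop the first one.
    new_dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)[1:]
    all_dates = pd.concat([history, pd.Series(new_dates)], ignore_index=True)
    return pd.DataFrame({'ds': all_dates})

# Made-up history: every day in June 2019.
history = pd.Series(pd.date_range('2019-06-01', '2019-06-30', freq='D'))
future_example = extend_dates(history, periods=30)
print(future_example.tail())
```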

#Predict future data over a period of time 

After running the below cell, scroll all the way over to the right. The 'yhat' values are the predictions for each of the following days.

Modify the number of rows displayed by changing forecast.tail(*X*). To change the number of days predicted, change `periods` in the make_future_dataframe call above.

In [None]:
forecast = m.predict(future)
#forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
forecast.tail(30)
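
To look only at the predicted days and the most relevant columns, the forecast can be filtered to dates after the last observed date. The frame below is a made-up stand-in with the `ds`, `yhat`, `yhat_lower` and `yhat_upper` columns found in Prophet's forecast; the values are placeholders.

```python
import pandas as pd

# Made-up stand-in for Prophet's forecast output (values are placeholders).
forecast_example = pd.DataFrame({
    'ds': pd.date_range('2019-06-28', periods=6, freq='D'),
    'yhat': [100.0, 102.0, 101.0, 105.0, 107.0, 110.0],
    'yhat_lower': [90.0, 92.0, 91.0, 95.0, 97.0, 100.0],
    'yhat_upper': [110.0, 112.0, 111.0, 115.0, 117.0, 120.0],
})
last_observed = pd.Timestamp('2019-06-30')  # assumed last date in the history

# Keep only rows after the history ends, and only the prediction columns.
future_only = forecast_example.loc[
    forecast_example['ds'] > last_observed,
    ['ds', 'yhat', 'yhat_lower', 'yhat_upper'],
]
print(future_only)
```

In the notebook, `df['ds'].max()` would play the role of `last_observed`.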

#Plot the model's prediction

In [None]:
fig1 = m.plot(forecast)

#Explore the Overall Trend and Weekly & Yearly Seasonality

In [None]:
fig2 = m.plot_components(forecast)

#[Optional] Pick a forecast date in the past to compare your actual data against the model's prediction:

In [None]:
forecast_date = '07-01-2019'


#Remove data that occurred on or after the forecast date

In [None]:
mask = (df['ds'] < forecast_date)
df2 = df.loc[mask]
df2.head()
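
The same cutoff can also produce the held-out portion, which is what the forecast is later compared against. A sketch on made-up data, assuming the forecast date is read as month-day-year; the names `train` and `holdout` are illustrative.

```python
import pandas as pd

# Made-up daily series around the cutoff.
data = pd.DataFrame({
    'ds': pd.date_range('2019-06-28', periods=6, freq='D'),
    'y': [100, 102, 101, 105, 107, 110],
})
cutoff = pd.to_datetime('07-01-2019', format='%m-%d-%Y')

train = data.loc[data['ds'] < cutoff]     # used to fit the model
holdout = data.loc[data['ds'] >= cutoff]  # kept aside for evaluation

print(len(train), len(holdout))
```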

#Make & Fit Model + Forecast data points

In [None]:

# Make model and fit it
m2 = Prophet()
m2.fit(df2)

# Make a future data frame
future = m2.make_future_dataframe(periods=90)
future.tail()

# Predict the GA data over the future period
forecast = m2.predict(future)
forecast.tail()
#forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

#Plot the model's prediction

In [None]:
# plot the forecast
fig2 = m2.plot(forecast)

In [None]:
# Merge actuals with forecast
forecast_plot = forecast[['ds', 'yhat']]
df_inner = pd.merge(forecast_plot, df, on='ds', how='inner')
df_inner.tail()

In [None]:

# Keep the forecast date itself: the model was fit only on data before it.
mask = (df_inner['ds'] >= forecast_date)
df2_plot = df_inner.loc[mask]
df2_plot.tail()

#Evaluate actual results vs forecast to see how you did against the model's prediction

Prediction values are the light blue dashed line.

Actual values are the solid black line.

In [None]:
# Plot actuals vs forecast
plt.figure(figsize=(16, 9))
plt.title(label='Forecast vs. Actual Performance \n' + 'forecast date = ' + forecast_date)
plt.plot('ds', 'y', data=df2_plot, color='black')
plt.plot('ds', 'yhat', data=df2_plot, color ='skyblue', linestyle='dashed')
plt.show()
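
Alongside the plot, a couple of summary error metrics make the comparison easier to quantify. The sketch below computes the mean absolute error (MAE) and mean absolute percentage error (MAPE) on a made-up frame with the same `y` and `yhat` columns as `df2_plot`. MAE and MAPE are common choices here, not something the notebook itself prescribes.

```python
import pandas as pd

# Made-up stand-in for df2_plot: actuals ('y') next to predictions ('yhat').
comparison = pd.DataFrame({
    'ds': pd.date_range('2019-07-02', periods=4, freq='D'),
    'y': [100.0, 200.0, 400.0, 500.0],
    'yhat': [110.0, 180.0, 400.0, 450.0],
})

errors = comparison['y'] - comparison['yhat']

# MAE: average size of the error, in the same units as y.
mae = errors.abs().mean()

# MAPE: average error relative to the actual value (undefined where y == 0).
mape = (errors.abs() / comparison['y'].abs()).mean() * 100

print(f'MAE:  {mae:.1f}')
print(f'MAPE: {mape:.1f}%')
```

MAPE is only meaningful when the actual values are never zero; for series with zeros, MAE alone is the safer summary.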