<a href="https://colab.research.google.com/github/bernardofn/Time-series-Colab-Notebooks/blob/master/Time_series_Prophet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <font color=Darkred>Time series</font>

---


## <font color=Darkblue>Forecasting </font>

The data comes from the **[WMFLabs](http://tools.wmflabs.org/pageviews/)** pageviews tool.
<br>
The forecasts are made with Facebook **[Prophet](https://facebook.github.io/prophet/)**.

<br><br>
### <font color=Darkred>**1. Install Prophet and other required libraries/packages**</font>


In [0]:
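from IPython.display import clear_output

# Install Prophet and its Stan backend.
# A failed `!pip install` does not raise a Python exception, so no try/except is needed.
!pip install pystan
!pip install fbprophet

# Clear the verbose pip output from the cell.
clear_output()
print('All Loaded')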

In [0]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
from fbprophet import Prophet

<br><br>
### <font color=Darkred>**2. Import the dataset**</font>

In [0]:
from google.colab import files
uploaded = files.upload()

In [0]:
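# Save the uploaded file on the Virtual Machine's local disk.
# files.upload() returns a dict that maps each file name to its contents as bytes,
# so the file must be opened in binary mode ('wb').
first_name = next(iter(uploaded))  # dict keys cannot be indexed with [0] in Python 3

with open("pageviews-gh.csv", 'wb') as f:
    f.write(uploaded[first_name])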

In [0]:
# Once your file is on the Virtual Machine, you can check if the file is there.
!ls

<br><br>
### <font color=Darkred>**3. Declare the dataset as a data frame and visualise it**</font>

In [0]:
ts_data = pd.read_csv("pageviews-gh.csv")
ts_data["Date"]= pd.to_datetime(ts_data["Date"])  # changing format of Date

ts_data.head(5)

In [0]:
ts_data.shape # It contains 1257 rows and 2 columns.

In [0]:
ts_data.dtypes

<br><br>
### <font color=Darkred>**4. Take a look at the descriptive stats**</font>

* What is the minimum, average and maximum number of visits per day?


In [0]:
ts_data.describe().round(3)

In [0]:
# Create a histogram to observe the distribution of visits (per day)
ts_data['Growth hacking'].hist(color='red', alpha=0.5, bins=20)

# Add labels
plt.title('Histogram')
plt.xlabel('Visits per day')
plt.ylabel('Frequency')

In [0]:
# Create a line chart to observe the evolution of visits (per day)

ts_data.set_index('Date').plot(color='red', alpha=0.5);

<br><br>
### <font color=Darkred>**5. Declare the variable to predict ($y$) and the date ($ds$)**</font>

In [0]:
ts_data.columns = ["ds", "y"]
ts_data['cap'] = 1200
ts_data['floor'] = 0
ts_data.head(5)

<br><br>
### <font color=Darkred>**6. Making a prediction**</font>

* Create the first model ($m_1$) and fit it to our dataframe. Because the model uses logistic growth, the dataframe must contain a ```cap``` column:

In [0]:
m1 = Prophet(growth='logistic')
m1.fit(ts_data);
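
To give some intuition for what ```cap``` and ```floor``` do under logistic growth, here is a minimal sketch of a saturating curve. It is a simplification, not Prophet's implementation: the function name ```logistic_trend``` and the parameters ```k``` (growth rate) and ```m``` (midpoint) are assumptions made for illustration, whereas Prophet estimates its own parameters and lets the growth rate change over time.

```python
import math

def logistic_trend(t, cap, floor=0.0, k=1.0, m=0.0):
    """Saturating growth curve bounded below by floor and above by cap.

    k and m are illustrative parameters chosen by hand here.
    """
    return floor + (cap - floor) / (1.0 + math.exp(-k * (t - m)))

# With the cap and floor used in this notebook, the curve stays between 0 and 1200.
for t in (-10, 0, 10):
    print(t, round(logistic_trend(t, cap=1200, floor=0), 2))
```

The curve approaches the floor for small ```t``` and the cap for large ```t```, which is why the trend of a logistic model cannot grow without bound.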

<br><br>
* To tell **Prophet** how far into the future to predict, use ```make_future_dataframe```. With logistic growth, the future dataframe also needs the ```cap``` (and ```floor```) columns.

In [0]:
# In this example, we will predict out 1 year (365 days).
future365dd = m1.make_future_dataframe(periods=365)
future365dd['cap'] = 1200
future365dd['floor'] = 0

# Then make the forecast
forecast12mm = m1.predict(future365dd)
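
As a rough picture of what ```make_future_dataframe(periods=365)``` produces with its default daily frequency, the sketch below extends a list of dates in plain Python. The helper ```extend_daily_dates``` and the sample dates are hypothetical; the real method returns a pandas dataframe with a ```ds``` column that, by default, also includes the historical dates.

```python
from datetime import date, timedelta

def extend_daily_dates(history, periods):
    """Return the historical dates followed by `periods` further daily dates."""
    last = max(history)
    future = [last + timedelta(days=i) for i in range(1, periods + 1)]
    return sorted(history) + future

# Made-up history of 10 consecutive days, for illustration only.
history = [date(2020, 1, 1) + timedelta(days=i) for i in range(10)]
extended = extend_daily_dates(history, periods=365)
print(len(extended), extended[-1])
```

Because the history is kept, the forecast covers both the observed period and the 365 days that follow it.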

<br><br>
* ```forecast12mm``` is a pandas dataframe. The predicted value is called ```yhat```, and its uncertainty interval is bounded by ```yhat_lower``` and ```yhat_upper```.

In [0]:
# To see the last 5 predicted values:
forecast12mm[['ds', 'yhat', 'yhat_lower', 'yhat_upper','cap','floor']].tail()
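
One way to sanity-check the interval between ```yhat_lower``` and ```yhat_upper``` is to count how often observed values fall inside it (Prophet's default interval width is 80%). The sketch below uses made-up numbers and a hypothetical helper, ```interval_coverage```; it accepts any sequences, so pandas columns would work as well once the observed values are aligned with the forecast by date.

```python
def interval_coverage(actual, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    inside = [lo <= y <= hi for y, lo, hi in zip(actual, lower, upper)]
    return sum(inside) / len(inside)

# Made-up numbers for illustration only.
actual = [100, 120, 90, 300]
lower = [80, 100, 95, 150]
upper = [130, 140, 125, 250]
print(interval_coverage(actual, lower, upper))
```

A coverage far below the nominal interval width suggests the model is overconfident on this data.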

In [0]:
# Plot the forecast together with the observed data
m1.plot(forecast12mm);

<br><br>
* Another useful feature is the ability to plot the individual components of the forecast, such as the trend and the seasonalities:

In [0]:
m1.plot_components(forecast12mm);

<br><br>
### <font color=Darkred>**7. Diagnostics**</font>

* Prophet includes functionality for time series cross validation to measure forecast error using historical data. 
* This is done by selecting cutoff points in the history, and for each of them fitting the model using data only up to that cutoff point. 
* We can then compare the forecasted values to the actual values. 
<br>
For more info about Diagnostics, check [here](https://facebook.github.io/prophet/docs/diagnostics.html).
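
The cutoff selection described above can be sketched in plain Python. This is a simplified approximation, assuming the defaults given in the Prophet documentation (an initial training period of three times the horizon, and one cutoff every half horizon); the helper ```make_cutoffs``` and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta

def make_cutoffs(first_day, last_day, horizon_days):
    """Simplified cutoff selection, working backwards from the end of the history."""
    initial = timedelta(days=3 * horizon_days)  # minimum training span (assumed default)
    period = timedelta(days=horizon_days / 2)   # spacing between cutoffs (assumed default)
    cutoff = last_day - timedelta(days=horizon_days)
    cutoffs = []
    while cutoff >= first_day + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)

# Made-up history spanning 450 days, with a 90-day horizon.
first = datetime(2020, 1, 1)
last = first + timedelta(days=450)
for c in make_cutoffs(first, last, horizon_days=90):
    print(c.date())
```

Each cutoff leaves at least the initial training span before it and a full horizon of observed data after it, so every forecast can be compared with actual values.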

In [0]:
# Cross-validation
from fbprophet.diagnostics import cross_validation
df_cv = cross_validation(m1, horizon = '90 days')
df_cv.head()

In [0]:
# Performance metrics
from fbprophet.diagnostics import performance_metrics
df_p = performance_metrics(df_cv)
df_p.head()
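
For reference, two of the error measures reported by ```performance_metrics```, RMSE and MAPE, can be computed by hand as in the sketch below. The numbers are made up, and the functions are plain, un-windowed versions; Prophet additionally aggregates the errors over a rolling window of the horizon, which this sketch omits.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    squared = [(y - yhat) ** 2 for y, yhat in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, predicted):
    """Mean absolute percentage error; undefined when an actual value is 0."""
    ratios = [abs((y - yhat) / y) for y, yhat in zip(actual, predicted)]
    return sum(ratios) / len(ratios)

# Made-up values for illustration only.
actual = [100, 200, 400]
predicted = [110, 180, 400]
print(round(rmse(actual, predicted), 3), round(mape(actual, predicted), 3))
```

RMSE is expressed in visits per day, while MAPE is relative to the observed values, so the two can rank models differently on days with few visits.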

In [0]:
# Visualizing the performance metrics
from fbprophet.plot import plot_cross_validation_metric
fig = plot_cross_validation_metric(df_cv, metric='rmse')

<br><br>
In summary, this notebook provides a framework for loading Wikipedia pageview data and processing it with Prophet to make forecasts.