<a href="https://colab.research.google.com/github/Dong2Yo/Data-Science-Capstone-Project/blob/master/wk4_prophet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Time Series Forecasting with Prophet

The Prophet library makes time series forecasting a lot easier. Prophet does for time series almost what scikit-learn did for machine learning. We say "almost" because time series analysis is usually more complicated than general machine learning, but Prophet still makes the job much easier.


## **Table of Contents**

<ol>
    <li><a href="https://#Time-series-Analysis">Time Series Analysis</a></li>
    <li><a href="https://#Intoduction-to-Prophet">Introduction to Prophet</a></li>
    <li><a href="https://#benifits">Benefits of Prophet</a></li>
    <li><a href="https://#Setup">Setup</a></li>
    <ol><li><a href="https://#Installing-Required-Libraries">Installing Required Libraries</a></li>
    <li><a href="https://#Importing-Required-Libraries">Importing Required Libraries</a></li></ol>
    <li><a href="https://#case-study">Case Study</a></li>
    <ol>
        <li><a href="https://#Case-Study-1">Time series analysis of expenses (2007-2016)</a></li>
    </ol>
    <li><a href="https://#Conclusion">Conclusion</a></li>
</ol>
        


### Time Series Analysis:

 - Time series data is a collection of observations or measurements gathered over regular or irregular intervals of time.
 - Time series data has a natural temporal ordering.
 - Example: sales data for a specific product at different times of the year.
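
As a minimal sketch of these properties, the snippet below builds a small daily series with pandas and checks its ordering and spacing. The dates and sales figures are made up for illustration and do not come from the case study dataset.

```python
import pandas as pd

# Hypothetical daily sales figures; dates and values are illustrative only.
dates = pd.date_range(start="2023-01-01", periods=6, freq="D")
sales = pd.Series([120, 135, 128, 150, 160, 155], index=dates, name="sales")

# Natural temporal ordering: the index is sorted in time.
print(sales.index.is_monotonic_increasing)

# Regular intervals: every gap between consecutive observations is one day.
gaps = sales.index.to_series().diff().dropna()
print(gaps.unique())

# Dropping one observation turns this into an irregularly spaced series.
irregular = sales.drop(dates[2])
irregular_gaps = irregular.index.to_series().diff().dropna()
print(irregular_gaps.unique())
```

The same checks can be run on any date column to see whether a series is regularly spaced before modeling it.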
 


<center><img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-GPXX032NEN/images/time_series_img.jpg" width="700" height="400"></center>


### Introduction to Prophet
 - Prophet is an open-source library created by Facebook, primarily for time series forecasting.
 - It ships with default hyperparameters that often give a reasonable first forecast without manual tuning.
 - It exposes seasonal patterns, such as weekly and yearly cycles, as separate components that can be inspected.
 - It can fit time series data with non-linear trends as well as holiday effects.
 - It has both R and Python APIs.
 


<center><img src="https://cdn.analyticsvidhya.com/wp-content/uploads/2018/05/tumblr_inline_omh3tnv5zk1r1x9ql_500.png" width="700" height="400"></center> 


### Benefits of using Prophet
1. It is automatic and quick, which saves time compared with manual time series analysis and decomposition.
2. It often produces reasonable forecasts with little manual effort.
3. It can deal with outliers and missing values.
4. It can model the effects of seasonality and holidays.
5. It produces a tunable model.


## Structure of Prophet


<center><img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-GPXX032NEN/images/prophet_structure-ImResizer.jpg" width="700" height="250"></center>


Prophet is particularly good at modeling time series that have multiple seasonalities, and it avoids some of the drawbacks other algorithms face in that setting. At its core is the sum of three functions of time:
1) growth g(t)
2) seasonality s(t)
3) holidays h(t)

plus an error term e_t:


<center><img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-GPXX032NEN/images/formula_prophet.png" width="500" height="350"></center>


Because the model above is additive, it can absorb the absence of a seasonal effect simply by having s(t) = 0; the other terms of the equation are unaffected and still predict future values of y(t). Unlike fixed linear regression models such as Fama, Prophet is a modular, non-linear regression model that separates a single historical series into components and recombines them.
As a result, the feature engineering step, in which explanatory features or external drivers of the forecast are constructed, is not needed.
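
The additive structure can be sketched with synthetic data. The example below is only an illustration using NumPy: the component shapes, coefficients, and the chosen holiday date are assumptions for the demo and are not how Prophet parameterizes g(t), s(t), or h(t) internally.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(730)  # two years of daily time steps

# Illustrative components (all coefficients are assumptions for this demo).
g = 10.0 + 0.01 * t                        # growth: a simple linear trend
s = 1.5 * np.sin(2 * np.pi * t / 365.25)   # seasonality: one yearly cycle
h = np.where(t % 365 == 358, 3.0, 0.0)     # holidays: a spike on one day per year
e = rng.normal(0.0, 0.2, size=t.size)      # error term: random noise

# The additive model: y(t) = g(t) + s(t) + h(t) + e_t
y = g + s + h + e

# A series without a seasonal effect is the same model with s(t) = 0.
y_no_season = g + np.zeros_like(s) + h + e

# The two series differ by exactly the seasonal component.
print(np.allclose(y - y_no_season, s))
```

Because the components are simply summed, each one can be removed or inspected on its own, which is the property the component plots at the end of this notebook rely on.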


## Setup



For this notebook, we will be using the following libraries:

*   [`pandas`](https://pandas.pydata.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for managing the data.
*   [`numpy`](https://numpy.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for mathematical operations.
*   [`sklearn`](https://scikit-learn.org/stable/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for machine learning and machine-learning-pipeline related functions.
*   [`matplotlib`](https://matplotlib.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for additional plotting tools.
*   [`Plotly`](https://plotly.com/python/) for interactive plots.


### Installing Required Libraries

In [None]:
!pip install pandas --upgrade
!pip install numpy
!pip install seaborn --upgrade
!pip install matplotlib --upgrade
!pip install scikit-learn --upgrade

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


### Restart the kernel after running the commands below


In [None]:
!pip install prophet

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
!pip install cmdstanpy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


## Importing Required Libraries

*We recommend you import all required libraries in one place (here):*


In [None]:
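import pandas as pd
import cmdstanpy

# Download and build CmdStan, the backend Prophet uses for fitting (this may take several minutes).
cmdstanpy.install_cmdstan(compiler=True)

from prophet import Prophet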


Installing CmdStan version: 2.31.0
Install directory: /root/.cmdstan
Downloading CmdStan version 2.31.0
Download successful, file: /tmp/tmpe59ugyhu
Extracting distribution


DEBUG:cmdstanpy:cmd: make build -j1
cwd: None


Unpacked download as cmdstan-2.31.0
Building version cmdstan-2.31.0, may take several minutes, depending on your system.


DEBUG:cmdstanpy:cmd: make examples/bernoulli/bernoulli
cwd: None


Test model compilation
Installed cmdstan-2.31.0


# CASE STUDY - Time series analysis of expenses (2007-2016)


### Read CSV file


<code>pd.read_csv</code> reads a comma-separated values (CSV) file into a DataFrame.

<code>Parameters</code>

<code>filepath_or_buffer</code>: str, path object, or file-like object. Any valid string path is acceptable; here we pass a URL.



In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/Dong2Yo/Dataset/main/prophet_example_data.csv')


### Print the dataset information


```DataFrame_Name.head(n)``` returns the first n rows of the object based on position. If n is omitted, it defaults to 5.

 It is useful for quickly checking that your object contains the right type of data.

 For negative values of n, this function returns all rows except the last |n| rows, equivalent to df[:n].
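
The behaviour for different values of n can be checked on a small frame. The `demo` DataFrame below is made up for illustration and is unrelated to the case study data.

```python
import pandas as pd

# A hypothetical ten-row frame used only to demonstrate head().
demo = pd.DataFrame({"value": range(10)})

first_five = demo.head()          # no argument: the first 5 rows
first_three = demo.head(3)        # positive n: the first 3 rows
all_but_last_two = demo.head(-2)  # negative n: everything except the last 2 rows

print(len(first_five), len(first_three), len(all_but_last_two))

# head(-2) selects the same rows as the slice demo[:-2].
print(all_but_last_two.equals(demo[:-2]))
```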


In [None]:
df.head()

Unnamed: 0,ds,y
0,2007-12-10,9.590761
1,2007-12-11,8.51959
2,2007-12-12,8.183677
3,2007-12-13,8.072467
4,2007-12-14,7.893572



```DataFrame_Name.shape``` gives us the ```dimension``` of the dataset as (rows, columns). Here we have 2905 rows (observations) and 2 columns.


In [None]:
df.shape

(2905, 2)

### Check All column Datatypes


The ```dtypes``` attribute returns a Series with the data type of each column. 

The result’s index is the original DataFrame’s columns. Columns with mixed types are stored with the object dtype.



In [None]:
df.dtypes

ds     object
y     float64
dtype: object

## Format Dates column into datetime type
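
The `dtypes` output above shows that `ds` is stored as text rather than as a datetime type. The fit further down runs on the unconverted column, but an explicit conversion makes the type unambiguous. A minimal sketch with `pd.to_datetime`, using a small stand-in frame (`df_demo`, a hypothetical name) built from the first rows shown earlier:

```python
import pandas as pd

# Stand-in for the first rows of the dataset; 'ds' starts out as strings.
df_demo = pd.DataFrame({
    "ds": ["2007-12-10", "2007-12-11", "2007-12-12"],
    "y": [9.590761, 8.519590, 8.183677],
})

# Convert the date strings to a datetime column.
df_demo["ds"] = pd.to_datetime(df_demo["ds"])

print(df_demo.dtypes)
```

On the real data the same conversion would be `df['ds'] = pd.to_datetime(df['ds'])`.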


### Prepare for Prophet  


Prophet requires the input DataFrame to have two columns named 'ds' and 'y'.

This dataset already uses those names. With other data, rename the datetime column to ds and the target column to y.

ds - datestamp column

y - target column

The DataFrame.columns attribute returns the column labels of the given DataFrame.
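
For data whose columns are named differently, the renaming step might look like the sketch below. The `raw` frame and its column names `date` and `expense` are hypothetical and serve only to illustrate the pattern.

```python
import pandas as pd

# Hypothetical raw data with column names that Prophet would not recognize.
raw = pd.DataFrame({
    "date": ["2016-01-01", "2016-01-02"],
    "expense": [8.2, 8.5],
})

# rename() returns a new frame; the original is left unchanged.
prophet_df = raw.rename(columns={"date": "ds", "expense": "y"})
prophet_df["ds"] = pd.to_datetime(prophet_df["ds"])

print(list(prophet_df.columns))
```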


### Print 5 rows of the Dataframe 


In [None]:
df.head()

Unnamed: 0,ds,y
0,2007-12-10,9.590761
1,2007-12-11,8.51959
2,2007-12-12,8.183677
3,2007-12-13,8.072467
4,2007-12-14,7.893572


### Initialize the Model


We create the model by ```instantiating a new Prophet object```. Any settings for the forecasting procedure are passed into the constructor. 

We then call its fit method and pass in the historical DataFrame. 

Fitting should take 1-7 seconds.


In [None]:
m = Prophet()

### Fit the model to dataframe (df)


The ```fit()``` method takes a DataFrame of time series data. The DataFrame must have a specific format. 

It must have a column named ```ds``` that contains the dates or date-times. 

It must have a column named ```y``` that contains the numeric observations.
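
Before fitting, the expected format can be checked with plain pandas. The helper below, `check_prophet_frame`, is a hypothetical function written for this notebook and is not part of Prophet; it only sketches the checks one might want.

```python
import pandas as pd

def check_prophet_frame(frame):
    """Hypothetical helper: list the problems found in a Prophet input frame."""
    problems = []
    for col in ("ds", "y"):
        if col not in frame.columns:
            problems.append(f"missing column: {col}")
    # Values that cannot be parsed as dates become NaT when coerced.
    if "ds" in frame.columns and pd.to_datetime(frame["ds"], errors="coerce").isna().any():
        problems.append("ds contains values that are not valid dates")
    if "y" in frame.columns and not pd.api.types.is_numeric_dtype(frame["y"]):
        problems.append("y is not numeric")
    return problems

# Illustrative frames: one well-formed, one with a wrong column name and text in y.
good = pd.DataFrame({"ds": ["2007-12-10", "2007-12-11"], "y": [9.590761, 8.519590]})
bad = pd.DataFrame({"date": ["2007-12-10"], "y": ["high"]})

print(check_prophet_frame(good))
print(check_prophet_frame(bad))
```

An empty list means the frame passes these basic checks; it does not guarantee that the fit itself will succeed.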


In [None]:
m.fit(df)

INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmpa1pc8sc2/7abte4om.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmpa1pc8sc2/lhcde7l0.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.8/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=67941', 'data', 'file=/tmp/tmpa1pc8sc2/7abte4om.json', 'init=/tmp/tmpa1pc8sc2/lhcde7l0.json', 'output', 'file=/tmp/tmpa1pc8sc2/prophet_modelcimoj1co/prophet_model-20230130183007.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
18:30:07 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
18:30:08 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


<prophet.forecaster.Prophet at 0x7f7635e01850>

### Make Future Dataframe


The ```make_future_dataframe``` method creates a DataFrame with future dates for forecasting.


#### Arguments:
<table height="40%",width="60%" style="font-size:16px;">
    
<tr><td>periods</td><td> - </td> <td>Int number of periods to forecast forward.</td></tr>

<tr><td>freq</td><td> - </td> <td>Any valid frequency for <code>pd.date_range</code>, such as 'D' (day, the default), 'W' (week) or 'MS' (month start).</td></tr>

<tr><td>include_history</td><td> - </td> <td>Boolean to include the historical dates in the data frame for predictions.</td></tr>
   
</table>
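
To make the three arguments concrete, here is a rough pandas-only sketch of what such a function does. `sketch_future_dates` is a hypothetical helper written for illustration, not Prophet's implementation, and the three history dates are made up.

```python
import pandas as pd

def sketch_future_dates(history, periods, freq="D", include_history=True):
    """Hypothetical helper: extend the 'ds' column by `periods` steps of `freq`."""
    last_date = history["ds"].max()
    # Generate one extra stamp, then keep only dates strictly after the last one.
    candidates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    future = pd.Series(candidates[candidates > last_date][:periods])
    if include_history:
        all_dates = pd.concat([history["ds"], future], ignore_index=True)
    else:
        all_dates = future
    return pd.DataFrame({"ds": all_dates})

# Illustrative history of three consecutive days.
history = pd.DataFrame({"ds": pd.to_datetime(["2016-01-18", "2016-01-19", "2016-01-20"])})

future_demo = sketch_future_dates(history, periods=365)
future_only = sketch_future_dates(history, periods=365, include_history=False)

print(len(future_demo), len(future_only))
print(future_demo.tail())
```

With `include_history=True` the result contains the historical dates followed by the new ones, so a single `predict` call can return both the in-sample fit and the forecast.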


In [None]:
future = m.make_future_dataframe(periods=365)
future.tail()

Unnamed: 0,ds
3265,2017-01-15
3266,2017-01-16
3267,2017-01-17
3268,2017-01-18
3269,2017-01-19


### Prediction of the model


To predict, we use the predict() method and pass in the future dataframe as shown below.


In [None]:
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()


Unnamed: 0,ds,yhat,yhat_lower,yhat_upper
3265,2017-01-15,8.205874,7.478255,8.968232
3266,2017-01-16,8.5309,7.786427,9.279148
3267,2017-01-17,8.318295,7.640217,9.081765
3268,2017-01-18,8.150892,7.43832,8.914494
3269,2017-01-19,8.162855,7.490148,8.891958


The forecast DataFrame contains many columns in addition to ```ds``` and ```yhat```.

The most important column is ```yhat```, which holds the forecast value for each date.

The ```yhat_lower``` and ```yhat_upper``` columns are the lower and upper bounds of the uncertainty interval.

There are three sources of uncertainty in the forecast:
 1) uncertainty in the trend. 
 2) uncertainty in the seasonality estimates.
 3) additional observation noise.
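
The width of the uncertainty interval can be inspected directly from the forecast columns. The sketch below re-enters the five forecast rows shown above into a small frame (`forecast_tail`, a name chosen for this example) so that it runs without a fitted model. Prophet's documented default is an 80% interval, controlled by the `interval_width` constructor argument.

```python
import pandas as pd

# The last five forecast rows from the output above, typed in by hand.
forecast_tail = pd.DataFrame({
    "ds": pd.to_datetime(["2017-01-15", "2017-01-16", "2017-01-17",
                          "2017-01-18", "2017-01-19"]),
    "yhat": [8.205874, 8.530900, 8.318295, 8.150892, 8.162855],
    "yhat_lower": [7.478255, 7.786427, 7.640217, 7.438320, 7.490148],
    "yhat_upper": [8.968232, 9.279148, 9.081765, 8.914494, 8.891958],
})

# Width of the uncertainty interval for each forecast date.
forecast_tail["interval_width"] = forecast_tail["yhat_upper"] - forecast_tail["yhat_lower"]

# The point forecast should lie inside its own interval.
inside = forecast_tail["yhat"].between(forecast_tail["yhat_lower"], forecast_tail["yhat_upper"])

print(forecast_tail[["ds", "interval_width"]])
print(inside.all())
```

On the full forecast, the same subtraction applied to `forecast` shows how the interval changes as the horizon moves further from the last observed date.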


### Plot The Prediction


In [None]:
from prophet.plot import plot_plotly, plot_components_plotly

plot_plotly(m, forecast)

In [None]:
plot_components_plotly(m, forecast)
