<a href="https://colab.research.google.com/github/Dasika-Vaishnavi/Wave2Web_forecast/blob/main/Wave2web_diagnosis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import warnings

# Silence library warnings to keep the notebook output readable
warnings.simplefilter('ignore')

In [2]:
!pip install pandas
!pip install prophet

Collecting prophet
  Downloading prophet-1.0.1.tar.gz (65 kB)
Collecting cmdstanpy==0.9.68
  Downloading cmdstanpy-0.9.68-py3-none-any.whl (49 kB)
Collecting ujson
  Downloading ujson-4.0.2-cp37-cp37m-manylinux1_x86_64.whl (179 kB)
Building wheels for collected packages: prophet
  Building wheel for prophet (setup.py) ... done
  Created wheel for prophet: filename=prophet-1.0.1-py3-none-any.whl size=6638876 sha256=38fadcbe362cc7aec45ab3b0c4bf08ae2a9b905903f8a6cf88d382f854aecd41
  Stored in directory: /root/.cache/pip/wheels/4e/a0/1a/02c9ec9e3e9de6bdbb3d769d11992a6926889d71567d6b9b67
Successfully built prophet
Installing collected packages: ujson, cmdstanpy, prophet
  Attempting uninstall: cmdstanpy
    Found existing installation: cmdstanpy 0.9.5
    Uninstalling cmdstanpy-0.9.5:
      Successfully

# 0. Install and import dependencies

In [3]:
import pandas as pd
from prophet import Prophet

# 1. Read and process data

In [4]:
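df = pd.read_csv('/content/picchi_peaks.csv')
# Slice FLOW_DATE as text: last 4 characters = year, the 2 before them = month, the rest = day
df['Year'] = df['FLOW_DATE'].apply(lambda x: str(x)[-4:])
df['Month'] = df['FLOW_DATE'].apply(lambda x: str(x)[-6:-4])
df['Day'] = df['FLOW_DATE'].apply(lambda x: str(x)[:-6])
# Prophet expects the timestamp column to be named 'ds'.
# Note: 'Month' is not part of this string, so only 'Day' and 'Year' determine 'ds'.
df['ds'] = pd.DatetimeIndex(df['Day']+'-'+df['Year'])
# Drop the helper columns, leaving the value column 'y' and the timestamp 'ds'
df.drop(['FLOW_DATE', 'Year', 'Month', 'Day'], axis=1, inplace=True)
df.head()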

Unnamed: 0,y,ds
0,46.42,2011-01-01
1,46.54,2011-01-01
2,46.64,2011-01-01
3,46.69,2011-01-01
4,46.7,2011-01-01


# 2. Train the model

In [5]:
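# Use a 95% uncertainty interval and switch daily seasonality on explicitly
m = Prophet(interval_width=0.95, daily_seasonality=True)
# fit() returns the fitted Prophet object, so `model` and `m` refer to the same model
model = m.fit(df)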

# 3. Forecasting the data

In [None]:
# Extend the historical dates by 100 daily steps, then predict over history and future
future = m.make_future_dataframe(periods=100, freq='D')
forecast = m.predict(future)
forecast.head()

Unnamed: 0,ds,trend,yhat_lower,yhat_upper,trend_lower,trend_upper,additive_terms,additive_terms_lower,additive_terms_upper,daily,daily_lower,daily_upper,weekly,weekly_lower,weekly_upper,yearly,yearly_lower,yearly_upper,multiplicative_terms,multiplicative_terms_lower,multiplicative_terms_upper,yhat
0,2010-09-03,46.058854,45.633372,70.003411,46.058854,46.058854,11.620017,11.620017,11.620017,1.036155,1.036155,1.036155,0.193177,0.193177,0.193177,10.390685,10.390685,10.390685,0.0,0.0,0.0,57.678871
1,2010-12-03,42.335766,37.847051,62.413921,42.335766,42.335766,7.113038,7.113038,7.113038,1.036155,1.036155,1.036155,0.193177,0.193177,0.193177,5.883706,5.883706,5.883706,0.0,0.0,0.0,49.448804
2,2011-01-01,41.149287,32.468721,58.232895,41.149287,41.149287,4.769734,4.769734,4.769734,1.036155,1.036155,1.036155,0.076082,0.076082,0.076082,3.657498,3.657498,3.657498,0.0,0.0,0.0,45.919021
3,2011-01-02,41.108374,33.979227,57.165635,41.108374,41.108374,4.555793,4.555793,4.555793,1.036155,1.036155,1.036155,0.305051,0.305051,0.305051,3.214588,3.214588,3.214588,0.0,0.0,0.0,45.664167
4,2011-01-03,41.067461,32.602509,57.454095,41.067461,41.067461,4.011792,4.011792,4.011792,1.036155,1.036155,1.036155,0.351497,0.351497,0.351497,2.624141,2.624141,2.624141,0.0,0.0,0.0,45.079253


# 4. Diagnostics

## 4.1. Cross validation
### Prophet includes functionality for time series cross validation to measure forecast error using historical data. This is done by selecting cutoff points in the history, and for each of them fitting the model using data only up to that cutoff point. We can then compare the forecasted values to the actual values.
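The cutoff idea described above can be illustrated without Prophet at all. The following is a minimal sketch using only the standard library; `split_at_cutoff` and the `history` list are hypothetical names and made-up dates for illustration, not part of Prophet or of this notebook's data. It assumes that observations up to and including the cutoff are used for fitting, and that observations after the cutoff, up to one horizon later, are held out for comparison.

```python
from datetime import date, timedelta

def split_at_cutoff(dates, cutoff, horizon):
    """Hypothetical helper: split dates into a training part (<= cutoff)
    and a held-out part (after the cutoff, up to cutoff + horizon)."""
    train = [d for d in dates if d <= cutoff]
    test = [d for d in dates if cutoff < d <= cutoff + horizon]
    return train, test

# Made-up daily history for illustration only (not the notebook's data)
history = [date(2011, 1, 1) + timedelta(days=i) for i in range(1500)]

cutoff = date(2013, 1, 7)
train, test = split_at_cutoff(history, cutoff, timedelta(days=365))

print(len(train), train[-1])           # size and last date of the training window
print(len(test), test[0], test[-1])    # size and date range of the held-out window
```

Repeating this split for each cutoff, fitting on `train` and scoring the forecast on `test`, is the rolling evaluation that the `cross_validation` call below automates.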

In [6]:
from prophet.diagnostics import cross_validation
df_cv = cross_validation(m, initial='730 days', period='180 days', horizon='365 days')

INFO:prophet:Making 15 forecasts with cutoffs between 2013-01-07 00:00:00 and 2019-12-02 00:00:00






### Here we do cross-validation to assess prediction performance on a horizon of 365 days, starting with 730 days of training data in the first cutoff and then moving the cutoff forward by 180 days for each subsequent forecast. On this time series, this corresponds to 15 total forecasts, with cutoffs between 2013-01-07 and 2019-12-02.
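The number of forecasts can be checked with simple date arithmetic. This sketch takes the first and last cutoffs reported in the log output above (2013-01-07 and 2019-12-02) and the 180-day period passed to `cross_validation`, and counts the evenly spaced cutoffs between them. It only verifies the count; it is not a reproduction of how Prophet chooses the cutoffs internally.

```python
from datetime import date, timedelta

# First and last cutoffs as reported in the cross_validation log output
first_cutoff = date(2013, 1, 7)
last_cutoff = date(2019, 12, 2)
period = timedelta(days=180)

# Step forward from the first cutoff in 180-day increments
cutoffs = []
current = first_cutoff
while current <= last_cutoff:
    cutoffs.append(current)
    current += period

print(len(cutoffs), cutoffs[0], cutoffs[-1])
```

The two reported cutoffs are exactly 14 periods of 180 days apart, which gives 15 cutoffs and matches the "Making 15 forecasts" message.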

In [7]:
df_cv.head()

Unnamed: 0,ds,yhat,yhat_lower,yhat_upper,y,cutoff
0,2013-02-01,9.149717,3.570928,14.711747,11.42,2013-01-07
1,2013-02-01,9.149717,3.428742,14.838433,11.43,2013-01-07
2,2013-02-01,9.149717,3.305308,14.489686,12.14,2013-01-07
3,2013-02-01,9.149717,3.906284,14.788559,11.41,2013-01-07
4,2013-06-01,-13.97245,-20.118753,-8.423479,5.11,2013-01-07


### Custom cutoffs can also be supplied as a list of dates to the cutoffs keyword in the cross_validation function in Python and R. For example, three cutoffs six months apart would need to be passed to the cutoffs argument in a date format like:

In [8]:
cutoffs = pd.to_datetime(['2013-02-15', '2013-08-15', '2014-02-15'])
df_cv2 = cross_validation(m, cutoffs=cutoffs, horizon='365 days')
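When more than a handful of cutoffs are needed, typing the dates by hand becomes error-prone. As a hedged alternative, the same three dates can be generated with `pd.date_range` and a calendar-month offset; the start date and the six-month spacing are taken from the example above, and the resulting `DatetimeIndex` should be usable in place of the hand-written list.

```python
import pandas as pd

# Three cutoffs, six calendar months apart, starting at the first date used above
cutoffs = pd.date_range(start='2013-02-15', periods=3, freq=pd.DateOffset(months=6))

print(list(cutoffs.strftime('%Y-%m-%d')))
```

Changing `periods` extends the list without touching the spacing.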





### The performance_metrics utility computes summary statistics of the prediction performance (yhat, yhat_lower, and yhat_upper compared to y) as a function of the distance from the cutoff, that is, how far into the future each prediction was made.

In [9]:
from prophet.diagnostics import performance_metrics
df_p = performance_metrics(df_cv)
df_p.head()

INFO:prophet:Skipping MAPE because y close to 0


Unnamed: 0,horizon,mse,rmse,mae,mdape,smape,coverage
0,41 days,109.53101,10.465706,8.069232,0.236358,0.317942,0.690342
1,42 days,110.612125,10.51723,8.218065,0.262047,0.324341,0.694915
2,43 days,111.977796,10.581956,8.345423,0.280285,0.330152,0.690954
3,44 days,112.155012,10.590326,8.357869,0.280285,0.330893,0.690759
4,46 days,112.092784,10.587388,8.374499,0.288554,0.360431,0.706799
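To make "as a function of the distance from the cutoff" concrete, the sketch below recomputes a few of these statistics by hand for the five rows shown by `df_cv.head()` earlier. It is a simplified illustration: it groups rows by exact horizon (`ds` minus `cutoff`) and uses only those five rows, whereas `performance_metrics` aggregates over all cutoffs and, by default, over a rolling window of horizons, so the numbers are not expected to match the table above.

```python
from collections import defaultdict
from datetime import date
from math import sqrt

# (ds, yhat, yhat_lower, yhat_upper, y, cutoff), copied from the df_cv.head() output
rows = [
    (date(2013, 2, 1), 9.149717, 3.570928, 14.711747, 11.42, date(2013, 1, 7)),
    (date(2013, 2, 1), 9.149717, 3.428742, 14.838433, 11.43, date(2013, 1, 7)),
    (date(2013, 2, 1), 9.149717, 3.305308, 14.489686, 12.14, date(2013, 1, 7)),
    (date(2013, 2, 1), 9.149717, 3.906284, 14.788559, 11.41, date(2013, 1, 7)),
    (date(2013, 6, 1), -13.97245, -20.118753, -8.423479, 5.11, date(2013, 1, 7)),
]

# Group the forecast error and the interval hit/miss flag by horizon in days
by_horizon = defaultdict(list)
for ds, yhat, lower, upper, y, cutoff in rows:
    horizon_days = (ds - cutoff).days
    by_horizon[horizon_days].append((y - yhat, lower <= y <= upper))

metrics = {}
for horizon_days, items in sorted(by_horizon.items()):
    errors = [err for err, _ in items]
    metrics[horizon_days] = {
        'mae': sum(abs(e) for e in errors) / len(errors),
        'rmse': sqrt(sum(e * e for e in errors) / len(errors)),
        # Share of actual values that fall inside [yhat_lower, yhat_upper]
        'coverage': sum(hit for _, hit in items) / len(items),
    }
    print(horizon_days, metrics[horizon_days])
```

In these five rows the actual values at the shorter horizon all fall inside the uncertainty interval, while the single value at the longer horizon does not.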
