<a href="https://colab.research.google.com/github/SGhuman123/Data-Science-Portfolio/blob/main/Udemy_Master_Time_Series_Analysis/Using_Facebook_Prophet_to_forecast_demand_for_shelter_in_New_York/Prophet_Capstone_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Challenge: Demand for Shelter

Predicting the future is always difficult. In this case study, we use Prophet to forecast the demand for shelter in New York City. Along the way, we cover cross-validation and parameter tuning for time series.

1. **Prepare Dataframe**
  * Facebook Prophet has a few quirks. The date column must be called ds and the time series must be called y. The dates should be in the format yyyy-mm-dd. Finally, don't forget to prepare the events as shown in the practice tutorial.

2. **Training and test set**
  * In time series, the training and test sets follow a different structure: because observations only make sense in their temporal order, the split must be chronological rather than random. Additionally, the test set should cover the same horizon as the real-life forecast.

3. **Prophet Model and Accuracy assessment**
  * Build the Facebook Prophet model,
while adding the regressors. Next,
build the future data frame to
perform the forecast. In the end,
assess the accuracy of the model.

4. **Visualization**
  * Facebook Prophet has very cool built-in
visualization functions. Use them! As a
visual learner myself, I like to see
pretty graphs to know what the model
tells me.

5. **Parameter Tuning**
  * Do the Parameter Tuning while
performing cross-validation. Tune
the parameters we tuned in the
practice tutorial. Good luck!

# Libraries and data

In [None]:
!pip freeze


In [None]:
# Mount Drive to access files
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Path to the folder
%cd /content/drive/MyDrive/Udemy Time Series Forecasting/Modern Time Series Forecasting Techniques /CAPSTONE PROJECT_ Prophet

/content/drive/MyDrive/Udemy Time Series Forecasting/Modern Time Series Forecasting Techniques /CAPSTONE PROJECT_ Prophet


In [None]:
# Import libraries
import pandas as pd
from prophet import Prophet
from prophet.plot import plot_plotly, plot_components_plotly
from prophet.diagnostics import cross_validation, performance_metrics

1. **Prepare Dataframe**
  * Facebook Prophet has a few quirks. The date column must be called ds and the time series must be called y. The dates should be in the format yyyy-mm-dd. Finally, don't forget to prepare the events as shown in the practice tutorial.

In [None]:
# Load the CSV file
df = pd.read_csv('DHS_weekly.csv')
# Rename the columns, date to 'ds' and target variable to 'y'
df.rename(columns={'Date': 'ds', 'Total Individuals in Shelter': 'y'}, inplace=True)
# Convert the 'ds' column to datetime
df['ds'] = pd.to_datetime(df['ds'])
df.tail()

Unnamed: 0,ds,y,Easter,Thanksgiving,Christmas,Temperature
361,2020-12-06,375444,0,0,0,10.072857
362,2020-12-13,375820,0,0,0,8.208571
363,2020-12-20,375615,0,0,0,3.535714
364,2020-12-27,374203,0,0,1,7.51
365,2021-01-03,212514,0,0,0,6.625
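
Before fitting, it can help to confirm that the prepared frame really has the shape described above. The following is a minimal sketch, assuming a tiny made-up frame; `check_prophet_frame` is a hypothetical helper of mine, not part of Prophet, and the duplicate and sort checks are my own conventions rather than documented Prophet requirements.

```python
import pandas as pd

# Hypothetical miniature of the weekly data; the values are made up.
raw = pd.DataFrame({
    "Date": ["2020-12-13", "2020-12-20", "2020-12-27"],
    "Total Individuals in Shelter": [100, 110, 105],
})

# Same preparation steps as in the cell above.
prepared = raw.rename(columns={"Date": "ds", "Total Individuals in Shelter": "y"})
prepared["ds"] = pd.to_datetime(prepared["ds"], format="%Y-%m-%d")


def check_prophet_frame(frame):
    """Return a list of problems found; an empty list means the frame looks usable."""
    problems = []
    for column in ("ds", "y"):
        if column not in frame.columns:
            problems.append(f"missing column {column!r}")
    if problems:
        return problems
    if not pd.api.types.is_datetime64_any_dtype(frame["ds"]):
        problems.append("'ds' is not a datetime column")
    if not pd.api.types.is_numeric_dtype(frame["y"]):
        problems.append("'y' is not numeric")
    if frame["ds"].duplicated().any():
        problems.append("'ds' contains duplicate dates")
    if not frame["ds"].is_monotonic_increasing:
        problems.append("'ds' is not sorted")
    return problems


issues = check_prophet_frame(prepared)
print(issues)
```

Calling the helper on the raw frame instead reports the two missing column names, which is the most common mistake at this step.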


In [None]:
# Prepare holiday dataframe for Easter
holidays = pd.DataFrame({
    'holiday': 'Easter',  # Name of the holiday
    'ds': df['ds'][df['Easter'] == 1],  # Dates of the holiday
    'lower_window': 0,  # Number of days before the holiday to include in the effect
    'upper_window': 1,  # Number of days after the holiday to include in the effect
})

# Loop through other holidays (Thanksgiving and Christmas) to append them to the holidays dataframe
for holiday in ['Thanksgiving', 'Christmas']:
    temp = pd.DataFrame({
        'holiday': holiday,  # Name of the holiday
        'ds': df['ds'][df[holiday] == 1],  # Dates of the holiday
        'lower_window': 0,  # Number of days before the holiday to include in the effect
        'upper_window': 1,  # Number of days after the holiday to include in the effect
    })
    holidays = pd.concat([holidays, temp])  # Concatenate the new holiday dataframe to the existing holidays dataframe

holidays

Unnamed: 0,holiday,ds,lower_window,upper_window
15,Easter,2014-04-20,0,1
65,Easter,2015-04-05,0,1
116,Easter,2016-03-27,0,1
171,Easter,2017-04-16,0,1
221,Easter,2018-04-01,0,1
276,Easter,2019-04-21,0,1
327,Easter,2020-04-12,0,1
47,Thanksgiving,2014-11-30,0,1
99,Thanksgiving,2015-11-29,0,1
151,Thanksgiving,2016-11-27,0,1
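
To see what lower_window and upper_window mean in practice, here is a small sketch using only the standard library. As I understand Prophet's holiday windows, the effect covers every day from the holiday date plus lower_window to the holiday date plus upper_window. `holiday_window` is a hypothetical helper written for this illustration.

```python
from datetime import date, timedelta


def holiday_window(day, lower_window, upper_window):
    """Dates from day + lower_window to day + upper_window, inclusive."""
    return [day + timedelta(days=offset)
            for offset in range(lower_window, upper_window + 1)]


easter_2020 = date(2020, 4, 12)  # taken from the holidays table above
covered = holiday_window(easter_2020, lower_window=0, upper_window=1)

# The observations in this dataset are weekly and fall on Sundays, so only
# window dates that coincide with an observation can influence the fit.
weekly_dates = {easter_2020 + timedelta(weeks=k) for k in range(-2, 3)}
observed = [d for d in covered if d in weekly_dates]

print(covered)
print(observed)
```

With weekly data, the extra day after the holiday lands on a Monday that has no observation, so under these assumptions upper_window=1 behaves like upper_window=0. A window of 7 days would be needed to reach the following week.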


# Prophet Model

2. **Training and test set**
  * In time series, the training and test sets follow a different structure: because observations only make sense in their temporal order, the split must be chronological rather than random. Additionally, the test set should cover the same horizon as the real-life forecast.

In [None]:
# Assuming the test set is intended to be the last 13 weeks (91 days) of the dataset
max_date = df['ds'].max()  # Get the maximum date in the dataset
split_date = max_date - pd.Timedelta(weeks=13)  # Set the split date 13 weeks (91 days) before the last date

# Split the data into training and testing sets based on the split date
train_df = df[df['ds'] <= split_date]  # Training set includes data up to and including the split date
test_df = df[df['ds'] > split_date]  # Testing set includes data after the split date
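
The split logic can be checked on plain dates without loading the data. This is a sketch under assumptions: the last date is the one shown in the data preview, and the 52-week length of the series is an arbitrary choice for the illustration.

```python
from datetime import date, timedelta

# Weekly dates ending on the last date shown in the data preview.
last_date = date(2021, 1, 3)
dates = [last_date - timedelta(weeks=k) for k in range(51, -1, -1)]

# Same rule as above: everything after the split date is held out.
split_date = last_date - timedelta(weeks=13)
train_dates = [d for d in dates if d <= split_date]
test_dates = [d for d in dates if d > split_date]

print(len(train_dates), len(test_dates))
print(max(train_dates), min(test_dates))
```

The test set ends up with exactly 13 weekly points, and every training date precedes every test date, which is what a chronological split requires.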


3. **Prophet Model and Accuracy assessment**
  * Build the Facebook Prophet model,
while adding the regressors. Next,
build the future data frame to
perform the forecast. In the end,
assess the accuracy of the model.


In [None]:
# Initialize the Prophet model
model = Prophet(holidays=holidays) # Add the holidays
model.add_regressor('Temperature')  # Adding temperature as a regressor
# Fit the model on the training data
model.fit(train_df)

INFO:prophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


<prophet.forecaster.Prophet at 0x7c97736a3190>

In [None]:
# Create a dataframe for predictions
future_df = model.make_future_dataframe(periods=13, freq='W')  # Generate future dates for 13 weeks

# Include the regressors in the future dataframe
future_df = future_df.merge(df[['ds', 'Temperature']], on='ds', how='left')

# Predict over the future dataframe
forecast = model.predict(future_df)
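
Because the merge above is a left join, any future date without a temperature reading ends up with a missing value, and as far as I know Prophet refuses to predict when a regressor column contains missing values. The sketch below shows a quick check on made-up frames; `history` and `future` are hypothetical stand-ins for the notebook's dataframes.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's frames; the values are made up.
history = pd.DataFrame({
    "ds": pd.date_range("2020-11-01", periods=4, freq="W"),
    "Temperature": [12.0, 10.5, 9.0, 8.2],
})
future = pd.DataFrame({"ds": pd.date_range("2020-11-01", periods=6, freq="W")})

# Same kind of left join as in the cell above.
merged = future.merge(history[["ds", "Temperature"]], on="ds", how="left")

# Dates for which the regressor is unknown.
missing = merged.loc[merged["Temperature"].isna(), "ds"]
print(len(missing))
```

In this notebook the check passes because the test period is part of the original data. For a genuine forecast beyond the last observed date, the temperature itself would have to be forecast or otherwise supplied.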

In [None]:
# Evaluate predictions

# Aligning predicted 'yhat' with the actual 'y' in the test set
test_df = test_df.set_index('ds')  # Set the index of test_df to 'ds' (date)
forecast.set_index('ds', inplace=True)  # Set the index of forecast to 'ds' (date)
forecast = forecast.join(test_df['y'])  # Join the actual 'y' values from the test set to the forecast dataframe

# Calculate mean absolute error
forecast['error'] = (forecast['y'] - forecast['yhat']).abs()  # Calculate the absolute error between actual and predicted values
mae = forecast['error'].mean()  # Calculate the mean of the absolute errors
print(f"Mean Absolute Error: {mae}")  # Print the Mean Absolute Error


Mean Absolute Error: 29485.304917293375
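
The mean absolute error is expressed in the units of the series, so it is hard to judge on its own. A small sketch of three common accuracy measures follows, written in plain Python on made-up numbers that are not taken from the shelter data.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the units of the series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, which penalises large misses more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be nonzero."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up numbers for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
mape_value = mape(actual, predicted)
print(mae_value, rmse_value, mape_value)
```

Reporting a relative measure such as MAPE next to the MAE makes it easier to compare the error with the level of the series.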


4. **Visualization**
  * Facebook Prophet has very cool built-in
visualization functions. Use them! As a
visual learner myself, I like to see
pretty graphs to know what the model
tells me.

In [None]:
# Visualizing the forecast
from prophet.plot import plot_plotly
plot_plotly(model, forecast.reset_index())

In [None]:
from prophet.plot import plot_plotly, plot_components_plotly

# Plot the components of the forecast
fig_components = plot_components_plotly(model, forecast.reset_index())
fig_components.show()





# Parameter Tuning

5. **Parameter Tuning**
  * Do the Parameter Tuning while
performing cross-validation. Tune
the parameters we tuned in the
practice tutorial. Good luck!

In [None]:
from prophet.diagnostics import cross_validation, performance_metrics
import itertools
import numpy as np

In [None]:
# Define all combinations of parameters for grid search
param_grid = {
    'changepoint_prior_scale': [0.01, 0.1, 0.5],
    'seasonality_prior_scale': [0.1, 1.0, 10.0],
    'holidays_prior_scale': [0.1, 1.0, 10.0],
    'seasonality_mode': ['additive', 'multiplicative']
}

# Generate all combinations of parameters
all_params = [dict(zip(param_grid.keys(), v)) for v in itertools.product(*param_grid.values())]
rmses = []  # Store the RMSEs for each params here

In [None]:
# Iterate over all parameter combinations
for params in all_params:
    # Initialize and fit the Prophet model with the given parameters and holidays.
    # Note: unlike the model fitted earlier, this one does not add the Temperature regressor.
    m = Prophet(holidays=holidays, **params).fit(train_df)

    # Perform cross-validation
    df_cv = cross_validation(
        m,
        initial='1500 days',  # Initial training period
        period='42 days',     # Period between cutoff dates
        horizon='91 days',    # Forecast horizon
        parallel="processes"  # Use parallel processing
    )

    # Calculate performance metrics
    df_p = performance_metrics(df_cv, rolling_window=1)

    # With rolling_window=1 the metrics collapse to a single row, so store its RMSE
    rmses.append(df_p['rmse'].values[0])
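
To get a feel for what initial, period and horizon do, the sketch below reproduces the rolling-origin idea with plain dates. It is a simplified approximation of how I understand Prophet to place its cutoffs (stepping backwards from the end of the training data), not its actual implementation. The training span is an assumption: the first date is inferred from the row indices in the holidays table and the last date from the 13-week split. `rolling_cutoffs` is a hypothetical helper.

```python
from datetime import date, timedelta


def rolling_cutoffs(first_date, last_date, initial, period, horizon):
    """Cutoff dates, oldest first, stepping back from last_date - horizon."""
    cutoffs = []
    cutoff = last_date - horizon
    # Keep a cutoff only if at least `initial` worth of history precedes it.
    while cutoff >= first_date + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return list(reversed(cutoffs))


# Assumed training span (see the note above).
first_date = date(2014, 1, 5)
last_date = date(2020, 10, 4)

cutoffs = rolling_cutoffs(
    first_date,
    last_date,
    initial=timedelta(days=1500),
    period=timedelta(days=42),
    horizon=timedelta(days=91),
)
print(len(cutoffs), cutoffs[0], cutoffs[-1])
```

Each cutoff means one extra model fit, and the grid above contains 54 parameter combinations, which explains why this cell takes a while to run.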



In [None]:
# Find the best parameters

# Get the row of the minimum RMSE value in the rmses list
best_params = all_params[np.argmin(rmses)]
print('Best Parameters:', best_params)

Best Parameters: {'changepoint_prior_scale': 0.5, 'seasonality_prior_scale': 10.0, 'holidays_prior_scale': 10.0, 'seasonality_mode': 'multiplicative'}
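
Taking only the minimum hides how close the runners-up are. A minimal sketch of ranking all combinations follows. The scores here are placeholders generated so the example runs on its own. In the notebook, the list rmses collected above would take their place.

```python
import itertools

param_grid = {
    "changepoint_prior_scale": [0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.1, 1.0, 10.0],
    "holidays_prior_scale": [0.1, 1.0, 10.0],
    "seasonality_mode": ["additive", "multiplicative"],
}
all_params = [
    dict(zip(param_grid.keys(), values))
    for values in itertools.product(*param_grid.values())
]

# Placeholder scores, NOT real results: one distinct number per combination.
placeholder_rmses = [float((7 * i) % 54) for i in range(len(all_params))]

# Sort (score, index) pairs so that dictionaries never need to be compared.
ranked = sorted(zip(placeholder_rmses, range(len(all_params))))
top_three = [(score, all_params[index]) for score, index in ranked[:3]]

for score, params in top_three:
    print(score, params)
```

If the best few combinations score almost the same, the simpler or more conservative setting may be the safer choice. Note also that the best values reported above sit at the upper edge of the grid for all three prior scales, so a wider grid might be worth trying.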
