# Application of fbProphet for estimation of Sales of the TPS Jan 2022 Data 📈

![](https://miro.medium.com/max/1400/1*BVIwEoE5oEmHJU8XbV_mKA.png)

This notebook is a continuation of my [EDA of TPS 2022 + fbProphet Baseline](https://www.kaggle.com/dextermojo/eda-of-tps-2022-fbprophet-baseline) notebook.

In this notebook I showcase Facebook Prophet for multivariate time series modelling: one model is fitted per (country, store, product) combination, with calendar features as extra regressors and, optionally, country-specific holidays.



<a id='toc'/><br/>
# Table of Contents

1. [Installation of Prophet](#install)
2. [Basic Setup](#bs)
3. [Basic Feature Engineering](#basic-feature)
4. [Recording Specific holidays](#holidays)
5. [Application of Prophet (without Holidays) on the data](#prophet1)
6. [Application of Prophet (with Holidays) on the data](#prophet2)
7. [References](#ref)

<a id='install'/><br/>
## Installation of Prophet

In [None]:
!pip install fbprophet

<a id='bs'/><br/>
## Basic Setup

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.offline as pyo
import holidays
import warnings
import os
import sys

from fbprophet import Prophet
from collections import defaultdict

warnings.filterwarnings('ignore')
pyo.init_notebook_mode()

In [None]:
df_train = pd.read_csv('../input/tabular-playground-series-jan-2022/train.csv', index_col='date', parse_dates=True, infer_datetime_format=True)
df_test = pd.read_csv('../input/tabular-playground-series-jan-2022/test.csv', index_col='date', parse_dates=True, infer_datetime_format=True)

print(f'Shape of the train dataset : {df_train.shape}')
print(f'Shape of the test dataset : {df_test.shape}')

<a id='basic-feature'/><br/>
## Basic Feature Engineering

In [None]:
# Training Data Feature engineering
df_train['weekday'] = df_train.index.weekday
df_train['days'] = df_train.index.day
df_train['months'] = df_train.index.month


# Test Data Feature engineering
df_test['weekday'] = df_test.index.weekday
df_test['days'] = df_test.index.day
df_test['months'] = df_test.index.month
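
To make the encoding concrete, here is a minimal sketch of the same three calendar features computed with the standard library only (no pandas). The helper name `calendar_features` is hypothetical; it mirrors `index.weekday`, `index.day` and `index.month`, where Monday is 0 and Sunday is 6.

```python
from datetime import date

def calendar_features(d):
    """Return the three calendar regressors used in this notebook for one date."""
    return {
        'weekday': d.weekday(),  # Monday = 0 ... Sunday = 6, same convention as pandas
        'days': d.day,           # day of the month, 1-31
        'months': d.month,       # month of the year, 1-12
    }

# 1 January 2019 fell on a Tuesday, so its weekday value is 1
features = calendar_features(date(2019, 1, 1))
print(features)
```

Note that these are plain integers, so a model that uses them as linear regressors treats, for example, Sunday (6) as "larger" than Monday (0).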

In [None]:
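def get_holidays(obj, yrs, **kwargs):
    """
    Function to get the holidays of a specific country/region.
    
    Parameters:
    -----------
    obj : callable
        A country class from the `holidays` package (e.g. holidays.Finland).
        
    yrs : array-like
        Represents the years.
        
    **kwargs :
        lower_window (default 0) and upper_window (default 1): number of days
        before/after each holiday that Prophet should treat as affected.
        
    Returns:
    -------
    df_holidays : pandas.DataFrame
        Holiday list of the country/region with the columns Prophet expects:
        holiday, ds, lower_window, upper_window.
        
    Usage:
    -----
    >>> import holidays
    >>> finland_holidays = get_holidays(holidays.Finland, range(2015, 2020))
    """
    
    # Group the holiday dates by holiday name across all requested years.
    # The date objects are kept as-is: round-tripping through 'dd-mm-YYYY'
    # strings lets pd.to_datetime misread e.g. 01-05-2015 as 5 January.
    temp_data = defaultdict(list)
    for yr in yrs:
        for dt, name in obj(years=yr).items():
            temp_data[name].append(dt)
            
    # Build one frame per holiday, then concatenate them in one go
    # (DataFrame.append was removed in pandas 2.0)
    frames = [
        pd.DataFrame({
            'holiday' : name,
            'ds' : pd.to_datetime(dts),
            'lower_window' : kwargs.get('lower_window', 0),
            'upper_window' : kwargs.get('upper_window', 1)
        })
        for name, dts in temp_data.items()
    ]
    
    if not frames:
        # No years requested: return an empty frame with the expected columns
        return pd.DataFrame(columns=['holiday', 'ds', 'lower_window', 'upper_window'])
        
    return pd.concat(frames, ignore_index=True)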
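
The `lower_window` and `upper_window` columns tell Prophet how many days around each holiday date should also be treated as part of the holiday. As a rough, library-free illustration (the helper `affected_days` is hypothetical and not part of Prophet), the defaults used above, `lower_window=0` and `upper_window=1`, cover the holiday itself and the following day:

```python
from datetime import date, timedelta

def affected_days(holiday_date, lower_window=0, upper_window=1):
    """List the dates covered by a holiday window.

    lower_window is zero or negative (days before), upper_window is zero or
    positive (days after), following Prophet's convention.
    """
    return [holiday_date + timedelta(days=offset)
            for offset in range(lower_window, upper_window + 1)]

christmas = date(2018, 12, 25)
default_window = affected_days(christmas)                                  # holiday + next day
wider_window = affected_days(christmas, lower_window=-1, upper_window=1)   # also the day before
print(default_window)
print(wider_window)
```

Passing `lower_window=-1` to `get_holidays` would therefore also mark the eve of every holiday, which may or may not suit every holiday in the list.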

<a id="holidays"/><br/>
## Recording Specific holidays

<a id='fin-holiday'/><br/>
### Recording Finland Holidays

In [None]:
# The range end is exclusive: go one year past 2018 so the forecast horizon has holiday dates too
finland_holidays = get_holidays(holidays.Finland, range(2015, 2020))

<a id='nor-holiday'/><br/>
### Recording Norway Holidays

In [None]:
# The range end is exclusive: go one year past 2018 so the forecast horizon has holiday dates too
norway_holidays = get_holidays(holidays.Norway, range(2015, 2020))

<a id="swe-holiday"/><br/>
### Recording Sweden Holidays

In [None]:
# The range end is exclusive: go one year past 2018 so the forecast horizon has holiday dates too
sweden_holidays = get_holidays(holidays.Sweden, range(2015, 2020))

<a id="prophet1"/><br/>
## Application of Prophet (without Holidays) on the data

In [None]:
def create_models(data, holidays):
    
    """
    Function to create Prophet models 
    
    Parameters:
    -----------
    data : pandas.DataFrame
        Represents the dataframe with which we are working.
    
    holidays : dict
        Represents a dictionary of dataframes containing specific holidays (if required)
        
    Returns:
    --------
    models : array-like
        Contains a list of Prophet models, one per combination.
        
    combinations : array-like
        Contains a list of (country, store, product, holidays) tuples, where
        holidays is that country's holiday dataframe or None.
        
        
    Usage:
    ------
    >>> from fbprophet import Prophet
    >>> models, combinations = create_models(data, {}) # in case of no specific holidays
    >>> ...
    >>> models, combinations = create_models(data, {'USA':holidays_USA}) # in case of specific holidays
    >>> ...
    """
    # One entry per (country, store, product), plus that country's holidays (None if not given)
    combinations = list()
    
    for country in data['country'].unique():
        for store in data['store'].unique():
            for prod in data['product'].unique():
                combinations.append((country, store, prod, holidays.get(country, None)))
                
    # One independent Prophet model per combination
    models = [Prophet(holidays=com[-1]) for com in combinations]
    
    return models, combinations
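
The nested loops in `create_models` enumerate a Cartesian product of countries, stores and products. A minimal sketch of the same enumeration with `itertools.product` follows; the country names are the ones used in this notebook, while the store names, product names and the holiday placeholder string are made up for illustration.

```python
from itertools import product

countries = ['Finland', 'Norway', 'Sweden']   # countries used in this notebook
stores = ['store_a', 'store_b']               # placeholder store names
products = ['prod_x', 'prod_y', 'prod_z']     # placeholder product names

# Placeholder holiday tables; in the notebook these would be DataFrames
holidays_by_country = {'Finland': 'finland_holidays'}

# Same order as the nested loops: country outermost, product innermost
combinations = [
    (country, store, prod, holidays_by_country.get(country))
    for country, store, prod in product(countries, stores, products)
]

print(len(combinations))
print(combinations[0])
```

Countries without an entry in the dictionary get `None`, which is how the "without holidays" run passes an empty dictionary and still builds every model.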

In [None]:
def model_fitting_and_forecasting(data, combinations, models, regressors, **kwargs):
    
    """
    Function to fit models and then forecast on the data
    
    Parameters:
    -----------
    data : pandas.DataFrame
        Represents the training data or a specific dataset.
        
    combinations : array-like
        Represents the specific combinations for our models
        
    models : array-like
        Represents a list of Prophet models (for our use case)
        
    regressors : array-like 
        Represents the extra variables that the model will take into consideration while training/forecasting
        
    **kwargs :
        periods (default 365) and freq (default 'D') of the forecast horizon
        
    Returns:
    -------
    preds : array-like
        List of forecast dataframes (columns ds and yhat), one per combination
        
    Usage:
    -----
    >>> preds = model_fitting_and_forecasting(df_train, combinations, models, regressors)
    
    """
    
    
    # add the regressors to the model
    for m in models:
        for reg in regressors:
            m.add_regressor(reg)
            
    # Train and forecast the model
    preds = list()
    
    for i, com in enumerate(combinations):
        _df = data.loc[(data['country'] == com[0]) & (data['store'] == com[1]) & (data['product'] == com[2])][['num_sold', *regressors]].reset_index()
        _df = _df.rename(columns={'date':'ds', 'num_sold':'y'})
        models[i].fit(_df)
        
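        # Extend the frame by `periods` steps beyond the training history
        future = models[i].make_future_dataframe(periods=kwargs.get('periods', 365), freq=kwargs.get('freq', 'D'))
        # Every extra regressor must also be present in the future frame.
        # Note: only the three calendar regressors are rebuilt here.
        future['weekday'] = future['ds'].dt.weekday
        future['days'] = future['ds'].dt.day
        future['months'] = future['ds'].dt.month
        
        # Keep only the forecast horizon (the last `periods` rows)
        forecast = models[i].predict(future)[['ds', 'yhat']].tail(kwargs.get('periods', 365))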
        preds.append(forecast)
        del forecast
        del future
        del _df
        
    return preds
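
Prophet needs every extra regressor to be present in the future dataframe as well, which is why the calendar columns are recomputed on `future` before calling `predict`. Below is a library-free sketch of that step, assuming (hypothetically) that the training data ends on 31 December 2018 and that the horizon is the default 365 days.

```python
from datetime import date, timedelta

last_train_day = date(2018, 12, 31)  # assumed end of the training data
periods = 365                        # same default horizon as above

# Dates that would follow the training history in the future frame
future_days = [last_train_day + timedelta(days=i) for i in range(1, periods + 1)]

# Regressor values must exist for every future date before predicting
future_rows = [
    {'ds': d, 'weekday': d.weekday(), 'days': d.day, 'months': d.month}
    for d in future_days
]

print(future_rows[0])
print(future_rows[-1])
```

Under that assumption the horizon spans exactly the calendar year 2019, so taking the last `periods` rows of the prediction isolates the forecast year.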
        
        
    

In [None]:
def create_submission_file(test, preds, combinations, filename):
    """
    Function to create the submission file for the competition
    
    Parameters:
    -----------        
    test : pandas.DataFrame
        Represents the test data.
        
    preds : array-like
        Represents the forecast made for the test data.
        
    combinations : array-like
        Represents the combination data.
        
    filename : str
        Represents the submission file name.
        
        
    Returns:
    -------
    None
    
    Usage:
    -----
    >>> ....
    >>> create_submission_file(df_test, preds, combinations, 'sample_submission.csv')
    >>> ....
    """
    
    for i in range(len(combinations)):
        preds[i]['country'] = combinations[i][0]
        preds[i]['store'] = combinations[i][1]
        preds[i]['product'] = combinations[i][2]
        
        
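    submission = test.copy().reset_index()
    # Forecasts are assigned by position: for each combination the test rows must be
    # in date order and their count must equal the number of forecast rows.
    for i, com in enumerate(combinations):
        submission.loc[(submission['country'] == com[0]) & (submission['store'] == com[1]) & (submission['product'] == com[2]), 'num_sold'] = preds[i]['yhat'].values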
        
    submission = submission[['row_id', 'num_sold']]
    submission.to_csv(filename, index=False)
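
The assignment in `create_submission_file` is positional, so it works only if the test rows of each combination are in date order and match the forecast rows one to one. A more defensive alternative, sketched here with plain dictionaries, is to look each prediction up by its `(date, country, store, product)` key. All names and numbers below are made up for illustration and are not taken from the competition data.

```python
from datetime import date

# Forecasts keyed by (date, country, store, product); values are made up
forecast_lookup = {
    (date(2019, 1, 1), 'Finland', 'store_a', 'prod_x'): 120.4,
    (date(2019, 1, 2), 'Finland', 'store_a', 'prod_x'): 98.7,
}

# Test rows in arbitrary order, as (row_id, date, country, store, product)
test_rows = [
    (1, date(2019, 1, 2), 'Finland', 'store_a', 'prod_x'),
    (0, date(2019, 1, 1), 'Finland', 'store_a', 'prod_x'),
]

# Match each row to its forecast by key, so row order no longer matters
submission = [
    {'row_id': row_id, 'num_sold': forecast_lookup[(ds, country, store, prod)]}
    for row_id, ds, country, store, prod in test_rows
]
print(submission)
```

With pandas the same idea would be a merge on those four columns; a missing key then shows up as an error or a missing value instead of silently shifting predictions.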
    

### Model Creation

In [None]:
models1, combinations1 = create_models(df_train, {})

### Model Fitting and Forecasting

In [None]:
preds = model_fitting_and_forecasting(df_train, combinations1, models1, ['weekday', 'days', 'months'])

### Create the Submission File

In [None]:
create_submission_file(df_test, preds, combinations1, 'submission_without_holidays.csv')

<a id="prophet2"/><br/>
## Application of Prophet (with Holidays) on the data

### Model Creation

In [None]:
models2, combinations2 = create_models(df_train, {'Finland':finland_holidays, 
                                                 'Norway' : norway_holidays,
                                                 'Sweden' : sweden_holidays})

### Model Fitting and Forecasting

In [None]:
preds = model_fitting_and_forecasting(df_train, combinations2, models2, ['weekday', 'days', 'months'])

### Create the Submission File

In [None]:
create_submission_file(df_test, preds, combinations2, 'submission_with_holidays.csv')

<a id="ref"/><br/>
## References

1. https://www.kaggle.com/dextermojo/eda-of-tps-2022-fbprophet-baseline
2. https://www.youtube.com/watch?v=XZhPO043lqU&list=PL3N9eeOlCrP5cK0QRQxeJd6GrQvhAtpBK&index=11
3. https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html
4. https://www.geeksforgeeks.org/python-holidays-library/