# Hotel Cancel Culture - Time Series Modeling and Forecasting

---

**Forecasting Cancellations**

> * **Goal:** Forecast cancellations for the given hotel data
> * **Why:** Predictions only work on preexisting reservations
>     * *How can we forecast occupancy without depending on preexisting reservations?*
> * **How:** Using probabilities generated from prior classification modeling to forecast future cancellations

---

[Return to workflow](#return)

# Imports

In [None]:
## Jupyter extension that reloads imported modules before each cell runs
%load_ext autoreload
%autoreload 2

In [None]:
## Data Handling
import pandas as pd
import numpy as np

## Visualizations
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

## Time Series Modeling
import statsmodels
import statsmodels.tsa.api as tsa
from statsmodels.tsa.seasonal import seasonal_decompose

import pmdarima as pmd
from pmdarima.arima import ndiffs
from pmdarima.arima import nsdiffs

## Custom-made Functions
from bmc_functions import eda
from bmc_functions import time_series_modeling as tsm

In [None]:
## Settings
%matplotlib inline
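## 'seaborn-talk' was renamed to 'seaborn-v0_8-talk' in Matplotlib 3.6; use whichever name is available
talk_style = 'seaborn-v0_8-talk' if 'seaborn-v0_8-talk' in plt.style.available else 'seaborn-talk'
plt.style.use(talk_style)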
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: f'{x:,.2f}')
pd.set_option('display.max_rows', 100)

## Reminder on how to use ihelp function

In [None]:
# import fsds as fs

In [None]:
# fs.ihelp(tsm.ts_split, file_location=True)

# Reading Data

In [None]:
## Reading data
source = './data/data_probs.pickle'
data = pd.read_pickle(source, compression = "gzip")
data

In [None]:
## Setting Datetime index
data = data.set_index(data['arrival_date'])
data

In [None]:
# ts_sum = data['is_canceled'].resample('D').sum()
# display(ts_sum)
# ts_sum.plot();

In [None]:
## Resampling for average daily cancellations
ts_avg = data['is_canceled'].resample('D').mean()
display(ts_avg)
ts_avg.plot();
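
Assuming `is_canceled` is a 0/1 flag, its daily mean is the share of that day's arrivals that were canceled rather than a count. A minimal sketch on made-up bookings (the `toy` frame and its dates are assumptions for illustration, not the hotel data) shows the difference, and also that days with no arrivals come back as `NaN`:

```python
import pandas as pd

# Hypothetical toy bookings: one row per reservation, indexed by arrival date
toy = pd.DataFrame(
    {"is_canceled": [1, 0, 0, 1, 1, 0]},
    index=pd.to_datetime(
        ["2016-01-01", "2016-01-01", "2016-01-01",
         "2016-01-02", "2016-01-02", "2016-01-04"]
    ),
)

# Daily mean of a 0/1 flag = daily cancellation rate
toy_rate = toy["is_canceled"].resample("D").mean()

# Daily sum of the same flag = daily cancellation count
toy_count = toy["is_canceled"].resample("D").sum()

# 2016-01-03 has no arrivals, so its rate is NaN while its count is 0
print(pd.DataFrame({"rate": toy_rate, "count": toy_count}))
```

If such gaps exist in the real series, they would likely need to be filled or dropped before modeling.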

In [None]:
## Keeping only 2016 through 2017
ts_avg = ts_avg.loc['2016':'2017']
ts_avg

In [None]:
## Creating train/test split
split_dict = tsm.ts_split(ts_avg, show_vis=True)

In [None]:
split_dict['train']

In [None]:
## Decomposing a four-month window of the training data to inspect seasonality
decomp = tsa.seasonal_decompose(split_dict['train'].loc['03-2016':'06-2016'])
# decomp.seasonal.plot(figsize = (15,4));
decomp.plot();
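
One way to sanity-check a weekly period read off the decomposition is to see which lag has the strongest autocorrelation. The sketch below uses a synthetic series with a built-in 7-day pattern (`weekly_pattern` and `toy_ts` are assumptions, not the hotel data); on the real series the peak would be noisier.

```python
import pandas as pd

# Hypothetical cancellation rates that repeat every 7 days
weekly_pattern = [0.30, 0.28, 0.27, 0.29, 0.35, 0.45, 0.40]
index = pd.date_range("2016-03-01", periods=12 * 7, freq="D")
toy_ts = pd.Series(weekly_pattern * 12, index=index)

# Autocorrelation at lags 1..10; a weekly cycle should peak at lag 7
acf_by_lag = {lag: toy_ts.autocorr(lag=lag) for lag in range(1, 11)}
best_lag = max(acf_by_lag, key=acf_by_lag.get)

print(f"Strongest autocorrelation at lag {best_lag}")
```

The lag with the strongest autocorrelation is a reasonable candidate for the seasonal period `m` passed to the differencing tests and the modeling workflow.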

In [None]:
## Pre-determining differencing values

n_d = ndiffs(split_dict['train'])
n_D = nsdiffs(split_dict['train'], m=7)

display(n_d, n_D)

In [None]:
## Running the full modeling workflow with a weekly seasonal period (m=7)
results = tsm.ts_modeling_workflow(ts_avg, m=7, show_vis=True)

In [None]:
results.keys()

In [None]:
results['model_visuals']['train']['vis']

| --- **Return to workflow** --- | <a name="return"></a>

---

- Using the classification probabilities for time series modeling proved too complicated for now
    - Instead, resampled the data to get the average daily cancellations

- Set "m" to 7 based on the seasonal decomposition (this also fits the nature of the data: patterns emerge over the course of a week and repeat).


- Noted several issues with the workflow function
    - `show_vis` behaves inconsistently
    - results are returned in a deeply nested dictionary
        - simplify?
    - no progress-tracking print statements
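
For the nested-results issue, one possible simplification is to flatten the dictionary into dotted keys so everything is reachable from a single level. This is only a sketch: `flatten_dict` is a hypothetical helper, and `toy_results` merely mimics the `results['model_visuals']['train']['vis']` access pattern used above, since the actual structure returned by `tsm.ts_modeling_workflow` is not shown here.

```python
def flatten_dict(nested, parent_key="", sep="."):
    """Flatten a nested dict into one level with joined keys (hypothetical helper)."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-dictionaries
            flat.update(flatten_dict(value, full_key, sep))
        else:
            # Leaves (and empty dicts) are stored as-is
            flat[full_key] = value
    return flat


# Stand-in for the workflow output; the real keys and values may differ
toy_results = {
    "model_visuals": {
        "train": {"vis": "train figure"},
        "test": {"vis": "test figure"},
    },
}

flat_results = flatten_dict(toy_results)
print(sorted(flat_results))
```

With this layout, `flat_results['model_visuals.train.vis']` replaces three chained lookups, and `flat_results.keys()` lists every available output at once.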

---