# Time Series Modeling - EDA Notebook

---

> **Goal:** To prepare data for time series modeling and forecasting in the next notebook.
>
>
> **Purpose:** To explore, clean, and organize the hotel booking data.
>
>
> **Process:**
>
>    * Inspecting data integrity and statistics
>    * Splitting data by hotel type ("City" vs. "Resort")
>    * Filling any missing values
>    * Evaluating and confirming Granger causality
>    * Performing a cointegration test
>    * Saving processed data for the modeling notebook
>
>
> **Modeling Notebook:**
>
>    * Performing train/test split
>    * Evaluating and confirming stationarity
>    * Training the model
>    * Checking serial correlation of errors via the Durbin-Watson statistic
>    * Generating forecasts from the model fit on the training set
>    * Comparing the forecasts against the test data
>    * Evaluating performance metrics
>    * Providing final recommendations

---

# To-Do List

---

**Copy:**
- [ ] Imports
- [ ] Personal module
- [ ] Data
- [ ] Starter code from P4P

**Links:**
- [ ] Statsmodels documentation
    - [ ] [Statsmodels VAR documentation](https://www.statsmodels.org/dev/vector_ar.html)
- [ ] H2: vector autoregression
    - [ ] [Machine Learning Plus](https://www.machinelearningplus.com/time-series/vector-autoregression-examples-python/)
    - [ ] [Towards Data Science - MV TSF](https://towardsdatascience.com/multivariate-time-series-forecasting-653372b3db36)
    - [ ] [Machine Learning Mastery - MV TSF](https://machinelearningmastery.com/how-to-develop-machine-learning-models-for-multivariate-multi-step-air-pollution-time-series-forecasting/)
    - [ ] [Analytics Vidhya - MV TSF](https://www.analyticsvidhya.com/blog/2018/09/multivariate-time-series-guide-forecasting-modeling-python-codes/)
    - [ ] [Analytics Vidhya - Non-Stationarity](https://www.analyticsvidhya.com/blog/2018/09/non-stationary-time-series-python/)
- [ ] [Phase 4 Project - Time Series](https://github.com/BenJMcCarty/BMC_Phase_4_Project)

---

# Import Packages

In [None]:
## Data Handling
import pandas as pd
import numpy as np

## Visualizations
import matplotlib.pyplot as plt
import seaborn as sns

## Time Series Modeling
import statsmodels
import statsmodels.tsa.api as tsa
from statsmodels.tsa.seasonal import seasonal_decompose

import pmdarima as pmd
from pmdarima.arima import ndiffs
from pmdarima.arima import nsdiffs

## Custom-made Functions
from bmc_functions import eda
from bmc_functions import time_series_modeling as tsm

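## Settings
# 'seaborn-talk' was renamed 'seaborn-v0_8-talk' in Matplotlib 3.6 (old name removed in 3.8)
plt.style.use('seaborn-v0_8-talk')
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: f'{x:,.2f}')
# Use the full key: bare 'max_rows' is ambiguous in newer pandas versions
pd.set_option('display.max_rows', 100)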
# %matplotlib inline
# %load_ext autoreload
# %autoreload 2

# Read Data

In [None]:
## Reading data
source = './data/hotel_bookings.csv'
data = pd.read_csv(source)
data

# EDA

## Reviewing Statistics
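A per-column report of types, missing values, and cardinality complements `data.info()` and `data.describe()` when reviewing the raw file. This is a sketch: `column_report` is a helper introduced here, and the toy frame only imitates a few columns of `hotel_bookings.csv` with made-up values.

```python
import numpy as np
import pandas as pd


def column_report(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, missing count and percent, number of unique values."""
    n_missing = df.isna().sum()
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": n_missing,
        "pct_missing": n_missing / len(df) * 100,
        "n_unique": df.nunique(dropna=True),
    }).sort_values("n_missing", ascending=False)


# Toy frame standing in for `data` (values are made up)
toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "City Hotel"],
    "adr": [95.0, 120.5, np.nan, 80.0],
    "children": [0.0, np.nan, 1.0, 0.0],
    "country": ["PRT", "GBR", None, "PRT"],
})
report = column_report(toy)
print(report)
```

On the real data, `column_report(data)` would point directly at the columns the imputation step needs to handle.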

## Splitting "City" and "Resort" 
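A minimal sketch of the split, assuming the `hotel` column holds the labels `"City Hotel"` and `"Resort Hotel"` as in the public Hotel Booking Demand dataset (worth confirming with `data['hotel'].unique()`). `split_by_hotel` is a hypothetical helper; the names `subgroup_city` and `subgroup_resort` match the commented-out cells under Setting Datetime Index.

```python
import pandas as pd


def split_by_hotel(df: pd.DataFrame, col: str = "hotel") -> dict:
    """Return one independent DataFrame per hotel type, keyed by its label."""
    return {name: grp.reset_index(drop=True) for name, grp in df.groupby(col)}


# Toy stand-in for `data`; the real call would be split_by_hotel(data)
bookings = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel"],
    "adr": [90.0, 130.0, 75.0],
})
parts = split_by_hotel(bookings)
subgroup_city = parts["City Hotel"]
subgroup_resort = parts["Resort Hotel"]
print(subgroup_city)
```

Keeping each hotel in its own frame lets the two properties get separate daily series, and later separate models, instead of one pooled series.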

## Imputing Missing Values
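The fill rules below are assumptions, not findings. In the public version of this dataset the columns with gaps appear to be `children`, `country`, `agent`, and `company`, but the counts from the statistics review should decide the final rules. `impute` and `FILL_VALUES` are hypothetical names, and treating a missing `agent` or `company` as `0` ("none") is one common convention rather than the only option.

```python
import numpy as np
import pandas as pd

# Assumed fill rules; confirm them against the missing-value counts first
FILL_VALUES = {
    "children": 0,         # missing child count read as "no children"
    "country": "Unknown",  # explicit category instead of NaN
    "agent": 0,            # 0 as a placeholder ID meaning "no agent"
    "company": 0,          # 0 as a placeholder ID meaning "no company"
}


def impute(df: pd.DataFrame, fill_values: dict = FILL_VALUES) -> pd.DataFrame:
    """Fill only the listed columns that exist in df; other NaNs stay visible."""
    present = {col: val for col, val in fill_values.items() if col in df.columns}
    return df.fillna(value=present)  # returns a new frame; df is not modified


toy = pd.DataFrame({
    "children": [0.0, np.nan, 2.0],
    "country": ["PRT", None, "ESP"],
    "agent": [9.0, np.nan, np.nan],
    "adr": [100.0, np.nan, 80.0],
})
clean = impute(toy)
print(clean)
```

Leaving unlisted columns such as `adr` untouched keeps any unexpected gaps visible instead of silently filling them.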

# Setting Datetime Index
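The commented-out cells below call `set_index('arrival_date')`, but the raw file does not appear to ship a single `arrival_date` column. A sketch of building one, assuming the separate columns `arrival_date_year`, `arrival_date_month` (full month name such as `"July"`), and `arrival_date_day_of_month` from the public dataset; `add_arrival_date` is a hypothetical helper.

```python
import pandas as pd


def add_arrival_date(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a combined `arrival_date` datetime column."""
    out = df.copy()
    # "July" -> 7: parse the month name, then keep only its month number
    month_num = pd.to_datetime(out["arrival_date_month"], format="%B").dt.month
    out["arrival_date"] = pd.to_datetime(pd.DataFrame({
        "year": out["arrival_date_year"],
        "month": month_num,
        "day": out["arrival_date_day_of_month"],
    }))
    return out


toy = pd.DataFrame({
    "arrival_date_year": [2015, 2016],
    "arrival_date_month": ["July", "February"],
    "arrival_date_day_of_month": [1, 29],
})
dated = add_arrival_date(toy)
print(dated["arrival_date"])
```

Assembling from year/month/day columns lets pandas reject impossible dates (such as February 30) instead of silently producing them.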


In [None]:
# city_ts = subgroup_city.set_index('arrival_date')
# city_ts

In [None]:
# resort_ts = subgroup_resort.set_index('arrival_date')
# resort_ts
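After indexing by date, the rows are still individual bookings: many per day, and possibly none on some days. Forecasting models such as VAR expect one observation per period at a fixed frequency, so a resampling step along these lines may be needed. This sketch assumes an `arrival_date` datetime column and an `adr` column; `to_daily_series`, the output names `bookings` and `mean_adr`, and time-based interpolation for days without arrivals are illustrative choices, not settled decisions. Whether to drop canceled bookings first (for example via an `is_canceled` column, if present) is also left open.

```python
import pandas as pd


def to_daily_series(df: pd.DataFrame, date_col: str = "arrival_date") -> pd.DataFrame:
    """Collapse booking rows into one row per calendar day with a fixed 'D' frequency."""
    ts = df.set_index(date_col).sort_index()
    by_day = ts["adr"].resample("D")
    daily = pd.DataFrame({
        "bookings": by_day.size(),  # rows per day; empty days become 0
        "mean_adr": by_day.mean(),  # NaN on days with no arrivals
    })
    # Assumed gap-filling rule: interpolate ADR linearly in time across empty days
    daily["mean_adr"] = daily["mean_adr"].interpolate(method="time")
    return daily.asfreq("D")  # make the daily frequency explicit on the index


toy = pd.DataFrame({
    "arrival_date": pd.to_datetime(["2015-07-01", "2015-07-01", "2015-07-03"]),
    "adr": [100.0, 80.0, 120.0],
})
city_daily = to_daily_series(toy)
print(city_daily)
```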