# ❌ **Cancel Culture** ❌ - **EDA Notebook**

---

**Author:** Ben McCarty

**Capstone Project** - Classification, Time Series Modeling

**Contact:** bmccarty505@gmail.com

---

---

**Who?**
>* 🏢 **Revenue Management (RM) teams** for hotel groups (corporate, franchise)
>
>
>* 🏨 On-site GMs, Sales, and Ops teams

---

**Why?**
>* 💰 **Revenue Management:** 
>  * Revenue optimization: Right price, right time, right customer
>    * Dynamic pricing
>    * Distribution channels
>    * Pricing per room type
>
>
>* 🤝 **Sales:**
>  * Group sales (pickup/wash)
>  * BT (performance/company for both GPP and LNR rates)
>
>
>* 🛌 **Rooms Ops:**
>  * Forecasting occupancy, arrivals, departures, stay-overs, same-day booking demand, and probability of guest relocation in the case of oversell.
>  * Determining staff schedules and periods of high demand
>
>
>* 🍰 ☕ **Food and Beverage:**
>  * Ordering food/supplies overall
>  * Scheduling staff
>  * Determining busy times (breakfast, lunch, dinner)
>    * Staffing, specific food/supplies

---

**What?**
>* 🧾 Dataset comprised of... 
>  * 32 different features
>    * Detailed explanation of features (and sub-categories, when appropriate) available in Readme
>  * Nearly 120,000 reservation records
>  * Source cited in Readme

---

**How?**
>* Which models/methods?
>  * 🔢 Classifiers 🌳
>    * XGBoost, RFC, ABC, etc.
>  * ⏳ Time Series Analysis 📈
>    * PMD auto-arima
>    * Statsmodels vector autoregression
>
>
>* Data prep and feature engineering

---

---

> **Goal:** To prepare data for classification modeling in next notebook.
>
>
> **Purpose:** to explore, clean, and organize.
>
>
> **Process:**
>
>    * Inspecting data integrity and statistics
>    * Splitting data by hotel type ("City" vs. "Resort")
>    * Filling any missing values
>    * Saving processed data for modeling notebook
>
>
> **Modeling Notebook:**
>
>    * Performing train/test split
>    * Training the model
>    * Evaluating performance metrics
>    * Providing final recommendations

---

# ✅ **To-Do List**

---

**Copy:**
- [ ] Imports
- [ ] Personal module
- [ ] Data
- [ ] Starter code from P4P

**Links:**
- [ ] 

---

# 📦 **Import Packages**

In [None]:
## Data Handling
import pandas as pd
import numpy as np
from scipy import stats

## Visualizations
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Modeling - SKLearn
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, OneHotEncoder, FunctionTransformer
from sklearn.model_selection import train_test_split, cross_validate, cross_val_score
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.dummy import DummyClassifier
from sklearn import set_config
set_config(display='diagram')


## Custom-made Functions
from bmc_functions import eda
from bmc_functions import classification as clf

## Settings
# plt.style.use('seaborn-talk')
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: f'{x:,.2f}')
pd.set_option('display.max_rows', 100)
%matplotlib inline

In [None]:
%load_ext autoreload
%autoreload 2

# 📥 **Read Data**

In [None]:
## Reading data
source = './data/hotel_bookings.csv'
data = pd.read_csv(source)
data

In [None]:
## Inspecting percentage of city vs. resort hotels
data['hotel'].value_counts(1)

# 🪓 **Splitting "City" and "Resort"**

In [None]:
## Creating subgroup for city hotels
## (drop() returns a new dataframe, which avoids a SettingWithCopyWarning)
subgroup_city = data[data['hotel'] == 'City Hotel'].drop(columns='hotel')
subgroup_city

In [None]:
## Creating subgroup for resort hotels
## (drop() returns a new dataframe, which avoids a SettingWithCopyWarning)
subgroup_resort = data[data['hotel'] == 'Resort Hotel'].drop(columns='hotel')
subgroup_resort

### Testing Hierarchical Indexing

---

> Instead of splitting the data into two different dataframes, I may be able to create a new index for the same dataframe by splitting the "`hotel`" feature and using the two values as the first level of the row index, then the normal index values as the second level.
>
>
> This would add a layer of complexity to the data processing steps, but would reduce memory consumption and the number of dataframes.
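The idea described above can be sketched on a toy dataframe. This is only an illustration under assumptions: the `toy` frame and its values are made up, and the approach shown (appending `hotel` to the existing index and swapping levels) is one of several ways to build such an index.

```python
import pandas as pd

# Toy stand-in for the bookings data (values are illustrative only)
toy = pd.DataFrame({
    'hotel': ['City Hotel', 'Resort Hotel', 'City Hotel', 'Resort Hotel'],
    'lead_time': [10, 45, 3, 120],
    'is_canceled': [0, 1, 0, 0],
})

# Append 'hotel' to the existing row index, then move it to the outer level
multi = toy.set_index('hotel', append=True).swaplevel(0, 1).sort_index()

# Selecting on the outer level returns one hotel's rows,
# indexed by their original row labels
city = multi.loc['City Hotel']
resort = multi.loc['Resort Hotel']
```

Each `.loc` selection behaves like one of the sub-grouped dataframes, but every later processing step has to account for the extra index level.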

---

In [None]:
# data_mi = data
# data_mi

In [None]:
# ## Creating new multi-index from hotel types and original index values
# data_mi.reset_index(inplace=True)
# multi = data_mi.set_index(['hotel'])
# multi

In [None]:
# ## Testing indexing  - City Hotel
# multi.loc['City Hotel']

In [None]:
# ## Testing indexing  - Resort Hotel
# multi.loc['Resort Hotel']

In [None]:
# eda.report_df(multi.loc['City Hotel']).sort_values('null_sum', ascending=False)

---

**Hierarchical Indexing Results**

> While a multi-indexed dataframe can represent the dimensionality of the data, it is not the best fit for this dataset. I will continue to use the sub-grouped dataframes for my analysis and modeling.

---

# 📊 **Reviewing Statistics**

---

**`Report_df()`: City**

---

In [None]:
## Sorting report by number of missing values
eda.report_df(subgroup_city).sort_values('null_sum', ascending=False)

---

**`Report_df()`: Resort**

---

In [None]:
## Sorting report by number of missing values
eda.report_df(subgroup_resort).sort_values('null_sum', ascending=False)

---

**Reviewing Reports - Missing Values**

> Based on the post-split results, I see that both dataframes are missing values for `company`, `agent`, and `country`. Additionally, the `subgroup_city` dataframe is missing four values for `children`.
>
> **Special note:** As noted in the data's documentation (located in "details.md"), any missing values are intentional representations of features that were not applicable to a reservation.

---

**`Company` and `Agent` Features**

> *Missing in `subgroup_city`:*
> * `company`: 95%
> * `agent`: 10%
>
> *Missing in `subgroup_resort`:*
> * `company`: 92%
> * `agent`: 20%

> **Due to the large number of missing values for `company`, I will drop that column from both dataframes.** Since the missing values for `agent` are valid, I will keep the column and fill them with the value "N/A" to represent the lack of a value. I will fill the values in the next section.
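A minimal sketch of the planned `agent` fill, assuming the IDs are read as floats because of the missing entries. The `toy` frame and its ID values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in: agent IDs become floats when the column contains NaN
toy = pd.DataFrame({'agent': [9.0, np.nan, 240.0, np.nan]})

# Convert each ID to a string label; missing entries become "N/A"
toy['agent'] = toy['agent'].map(
    lambda value: 'N/A' if pd.isna(value) else str(int(value))
)
```

Doing the conversion and the fill in one step avoids turning missing values into the literal string `'nan'`, which is what a plain `.astype(str)` would produce.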

**`Country` and `Children` Features**

> The remaining two features with missing values are `country` and `children`. **As there are a small number of missing values in both dataframes' features, I will keep both features. I will use the** `SimpleImputer` **transformer in my preprocessing pipeline to impute values and use** `GridSearchCV` **to determine the best imputation strategy.**
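The imputation plan above could look roughly like the following sketch. Everything here is an assumption made for illustration: the toy data, the choice of `LogisticRegression` as the estimator, the step names, and the strategies in the grid are not taken from the modeling notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in data with a few missing values in each column
rng = np.random.default_rng(0)
n = 60
X = pd.DataFrame({
    'children': rng.integers(0, 3, n).astype(float),
    'country': pd.Series(rng.choice(['PRT', 'GBR', 'FRA'], n), dtype=object),
})
y = np.tile([0, 1], n // 2)
X.loc[[0, 5], 'children'] = np.nan
X.loc[[1, 7], 'country'] = np.nan

# Numeric and categorical columns get separate imputers
numeric = Pipeline([('impute', SimpleImputer(strategy='median'))])
categorical = Pipeline([
    ('impute', SimpleImputer(strategy='most_frequent')),
    ('encode', OneHotEncoder(handle_unknown='ignore')),
])
preprocess = ColumnTransformer([
    ('num', numeric, ['children']),
    ('cat', categorical, ['country']),
])
model = Pipeline([
    ('preprocess', preprocess),
    ('clf', LogisticRegression(max_iter=1000)),
])

# Let the grid search choose the numeric imputation strategy
param_grid = {
    'preprocess__num__impute__strategy': ['mean', 'median', 'most_frequent'],
}
search = GridSearchCV(model, param_grid, cv=3)
search.fit(X, y)
best_strategy = search.best_params_['preprocess__num__impute__strategy']
```

Keeping the imputer inside the pipeline means it is refit on each training fold, so the imputed values never draw on validation data.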

---

##### Dropping "Company" Column

In [None]:
# Dropping "company" column (95% missing values)
subgroup_city.drop(columns = ['company'], inplace=True)
subgroup_city

In [None]:
# Dropping "company" column (92% missing values)
subgroup_resort.drop(columns = ['company'], inplace=True)
subgroup_resort

In [None]:
## Confirming 'company' removal from both
'company' not in subgroup_city and 'company' not in subgroup_resort

# 🔬 **Inspecting Feature Data Types**

---

**City**

---

In [None]:
## Inspecting datatypes for "subgroup_city"
subgroup_city.dtypes.sort_values()

---

**Resort**

---

In [None]:
## Inspecting datatypes for "subgroup_resort"
subgroup_resort.dtypes.sort_values()

In [None]:
## Confirming all datatypes match between dataframes (returns a single True/False)
subgroup_city.dtypes.equals(subgroup_resort.dtypes)

---

**Review - Datatypes**

> After reviewing the datatypes, I noticed **one feature needs to be changed to the string datatype: `agent`**. This feature represents unique identifiers for booking agents and needs to be treated as categorical data.
>
> As both dataframes' datatypes are the same, I do not need to make any other adjustments specific to either dataframe.

---

## Converting to Strings

In [None]:
## Converting subgroup_city "country" to string
subgroup_city.loc[:,'country'] = subgroup_city.loc[:,'country'].astype(str)
subgroup_city.loc[:,'country']

In [None]:
## Converting subgroup_resort "country" to string
subgroup_resort.loc[:,'country'] = subgroup_resort.loc[:,'country']\
                                                                .astype(str)
subgroup_resort.loc[:,'country']

# 🔎 **EDA - Features**

---

> Now that I have reviewed my missing values and confirmed my datatypes, I will inspect the details of each of my features.

---

> DataFrame styling code used in `explore_feature()` function adapted from this [source](https://stackoverflow.com/questions/59769161/python-color-pandas-dataframe-based-on-multiindex#:~:text=2-,You,-can%20use%20Styler).

---

## **Reservation_Status**

---

**City**

---

In [None]:
eda.explore_feature_test(subgroup_city,'reservation_status', 
                    plot_type='histogram',
                    target_feature='reservation_status',
                    plot_label ='Status',
                    plot_title= 'Reservation Status - City');

In [None]:
## Reviewing details for city - reservation_status
eda.explore_feature(subgroup_city,'reservation_status', 
                    plot_type='histogram',
                    target_feature='reservation_status',
                    plot_label ='Status',
                    plot_title= 'Reservation Status - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - reservation_status
eda.explore_feature(subgroup_resort,'reservation_status', 
                    plot_type='histogram',
                    target_feature='reservation_status',
                    plot_label ='Status',
                    plot_title= 'Reservation Status - Resort');

### Review - `Reservation_Status`

---

**Feature Review**

> `Reservation_status` will be my target feature for my classification modeling. **To prepare it for modeling, I will need to replace the `No-Show` status with `Canceled` values.**

**Actions**

>For the purposes of my analysis, **I will treat `No-Show` reservations as `Canceled` reservations** due to their limited number preventing me from effectively using it as a third class.

**City vs. Resort**

> The most notable difference between the city and resort hotels is the number of cancellations: *The city hotel shows a much larger proportion of canceled reservations vs. the resort hotel.*
> * This may be due to a variety of factors, such as resort guests booking when they are more certain of their plans, or the resort hotel charging a cancellation fee.
>
> No-Show reservations are low for both hotels, supporting my decision to merge no-shows with cancellations.

---

### Converting `No-Show` to `Canceled`

In [None]:
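## Changing no-show values to "canceled"
## (assigning the result back; inplace=True on a .loc selection may act on a copy)
subgroup_city['reservation_status'] = subgroup_city['reservation_status']\
                                        .replace('No-Show', 'Canceled')
subgroup_resort['reservation_status'] = subgroup_resort['reservation_status']\
                                        .replace('No-Show', 'Canceled')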

In [None]:
## Confirming the change ("in" on a Series checks the index, so test the values)
'No-Show' not in subgroup_city['reservation_status'].values and \
                'No-Show' not in subgroup_resort['reservation_status'].values

In [None]:
## Inspecting the updated target classes
subgroup_city['reservation_status'].value_counts(1, dropna=False)

In [None]:
subgroup_resort['reservation_status'].value_counts(1, dropna=False)

### Review - `Reservation_Status`

---

> I successfully converted all `No-Show` values to `Canceled`, **resulting in a binary classification of whether a reservation will actualize (`Check-Out`) or not (`Canceled`).**

---

## **Is_Canceled**

---

**City**

---

In [None]:
## Reviewing details for city - 'is_canceled'
eda.explore_feature(subgroup_city,'is_canceled', 
                    plot_type='histogram',
                    target_feature='reservation_status',
                    plot_label ='Cancellation Status',
                    plot_title= 'Reservation Status - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'is_canceled'
eda.explore_feature(subgroup_resort,'is_canceled', 
                    plot_type='histogram',
                    target_feature='reservation_status',
                    plot_label ='Cancellation Status',
                    plot_title= 'Reservation Status - Resort');

### Review - `Is_Canceled`

---

**Feature Review**

> After the "no-show" conversion, `is_canceled` is a binarization of the `reservation_status` feature. Reservations are flagged as cancellations if they were either canceled or marked as a "no-show" reservation.

**Actions**

> This feature is a better target feature as the values are already binarized and match the `reservation_status` feature for all of the reservations.
>
> I will use this feature in place of the `reservation_status` feature as my target feature.
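The claim that `is_canceled` matches `reservation_status` on every reservation can be checked directly. The sketch below uses a made-up `toy` frame; on the real data the same comparison would be run against `subgroup_city` and `subgroup_resort`.

```python
import pandas as pd

# Toy stand-in with the three original status values
toy = pd.DataFrame({
    'reservation_status': ['Check-Out', 'Canceled', 'No-Show', 'Check-Out'],
    'is_canceled': [0, 1, 1, 0],
})

# A reservation counts as canceled when its status is anything but 'Check-Out'
derived = (toy['reservation_status'] != 'Check-Out').astype(int)

# True only if the existing flag agrees with the derived flag on every row
flags_match = bool((derived == toy['is_canceled']).all())
```

Because the comparison treats every status other than `Check-Out` as a cancellation, it gives the same answer before and after the `No-Show` conversion.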

**City vs. Resort**

> The breakdown between hotels is the same as `reservation_status` and confirms that the resort hotel experiences fewer cancellations vs. the city hotel.

---

## **Lead_Time**

---

**City**

---

In [None]:
subgroup_city.head().style.set_properties(**{
    'font-size': "107.5%",
})

In [None]:
## Reviewing details for city - 'lead_time'
eda.explore_feature(subgroup_city,'lead_time',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Days',
                    plot_title= 'Lead Time (Days) - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'lead_time'
eda.explore_feature(subgroup_resort,'lead_time',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    target_feature='reservation_status',
                    plot_label ='Number of Days',
                    plot_title= 'Lead Time (Days) - Resort');

### Review - `Lead_Time`

---

**Feature Review**

> `Lead_Time` indicates how far in advance reservations are booked in days. *This information is particularly useful in hospitality for Revenue Management (RM) and Operations (Ops).*
>
>  * RM needs to know **when to expect bookings** and **when to monitor rates and availability** closely to make any necessary changes to optimize revenue.
>
>
>  * Ops uses this information to **forecast how many reservations will book in a short-term booking window** (I usually focused on 0-3 days prior to arrival).
>
> * **This forecast is critical to determine staffing and supplies in particular** - when building our schedules, we consider the current number of booked reservations and the forecasted bookings to determine how many staff members to schedule, whether we have enough supplies, etc.
>  * *Being the only staff member at the Front Desk during a rush of arrivals due to a snow storm is NOT fun!*

**Actions**

> I noticed there are a significant number of outliers for both properties. **I will remove the outliers based on their z-scores prior to modeling.**
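A minimal sketch of z-score filtering, assuming a cutoff of |z| < 3. The notebook does not state a threshold, so that value and the toy data are assumptions.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Toy stand-in: 200 ordinary lead times plus one extreme value
toy = pd.DataFrame({'lead_time': list(range(200)) + [2000]})

# Absolute z-score of each value relative to the column mean and std
z_scores = np.abs(stats.zscore(toy['lead_time']))

# Keep only rows within the assumed threshold of 3 standard deviations
threshold = 3
filtered = toy[z_scores < threshold]
```

Since lead times are right-skewed, a z-score cutoff mostly trims the long upper tail. The filter should be fit on training data only to avoid leaking information from the test set.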

**City vs. Resort**

> The histograms and box plots for both hotels match up closely, but it is clear that **the city hotel has a larger range of lead times for cancellations vs. the resort hotel.**

---

## Arrival_Date_Year

In [None]:
## Reviewing details for city - 'arrival_date_year'
eda.explore_feature(subgroup_city,'arrival_date_year',
#                     bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Year',
                    plot_title= 'Arrival Date (Year) - City');

## **`Arrival_Date` as Datetime**

---

**City**

---

In [None]:
## Converting from month, day of month, and year to a single datetime column
subgroup_city['arrival_date'] = subgroup_city['arrival_date_month'] +' '+ \
                                subgroup_city['arrival_date_day_of_month']\
                                .astype(str) +', '+ \
                                subgroup_city['arrival_date_year'].astype(str)
subgroup_city['arrival_date'] = pd.to_datetime(subgroup_city['arrival_date'])
subgroup_city['arrival_date']

---

**Resort**

---

In [None]:
## Converting from month, day of month, and year to a single datetime column
subgroup_resort['arrival_date'] = subgroup_resort['arrival_date_month'] +' '+ \
                                subgroup_resort['arrival_date_day_of_month']\
                                .astype(str) +', '+ \
                                subgroup_resort['arrival_date_year'].astype(str)
subgroup_resort['arrival_date'] = pd.to_datetime(subgroup_resort['arrival_date'])
subgroup_resort['arrival_date']

### Review - `Arrival_Date`

---

**Feature Review**

> I created this new feature to merge the arrival year/month/day-of-month features into one usable feature. 
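The merge can also be written with an explicit format string, which avoids per-row format inference. This is a sketch on a made-up `toy` frame; the `%B` directive assumes English month names, as in the dataset's `arrival_date_month` values.

```python
import pandas as pd

# Toy stand-in for the three arrival-date component columns
toy = pd.DataFrame({
    'arrival_date_year': [2015, 2016],
    'arrival_date_month': ['July', 'February'],
    'arrival_date_day_of_month': [1, 29],
})

# Build text such as "July 1, 2015", then parse it with a fixed format
date_text = (toy['arrival_date_month'] + ' '
             + toy['arrival_date_day_of_month'].astype(str) + ', '
             + toy['arrival_date_year'].astype(str))
toy['arrival_date'] = pd.to_datetime(date_text, format='%B %d, %Y')
```

Once the column is a datetime, attributes such as `.dt.dayofweek` or `.dt.month` are available for later feature engineering.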

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## stays_in_weekend_nights

---

**City**

---

In [None]:
## Reviewing details for city - 'stays_in_weekend_nights'
eda.explore_feature(subgroup_city,'stays_in_weekend_nights',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Weekend Nights',
                    plot_title= 'Weekend Nights - City');

---

**Resort**

---

### Review - `PLACEHOLDER`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## stays_in_week_nights

---

**City**

---

In [None]:
## Reviewing details for city - 'stays_in_week_nights'
eda.explore_feature(subgroup_city,'stays_in_week_nights',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Week Nights',
                    plot_title= 'Week Nights - City');

---

**Resort**

---

### Review - `PLACEHOLDER`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## Adults

---

**City**

---

In [None]:
## Reviewing details for city - 'adults'
eda.explore_feature(subgroup_city,'adults',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Adults',
                    plot_title= 'Adults - City');

---

**Resort**

---

### Review - `PLACEHOLDER`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## Children

---

**City**

---

In [None]:
## Reviewing details for city - 'children'
eda.explore_feature(subgroup_city,'children',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Children',
                    plot_title= 'Children - City');

---

**Resort**

---

### Review - `PLACEHOLDER`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## babies

---

**City**

---

In [None]:
## Reviewing details for city - 'babies'
eda.explore_feature(subgroup_city,'babies',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Babies',
                    plot_title= 'Babies - City');

---

**Resort**

---

### Review - `PLACEHOLDER`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## meal

---

**City**

---

In [None]:
## Reviewing details for city - 'meal'
eda.explore_feature(subgroup_city,'meal',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Meal Type',
                    plot_title= 'Meal - City');

---

**Resort**

---

### Review - `PLACEHOLDER`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## country

---

**City**

---

In [None]:
## Reviewing details for city - 'country'
eda.explore_feature(subgroup_city,'country',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Country',
                    plot_title= 'Country - City');

---

**Resort**

---

### Review - `PLACEHOLDER`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## market_segment

---

**City**

---

In [None]:
## Reviewing details for city - 'market_segment'
eda.explore_feature(subgroup_city,'market_segment',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Market Segment',
                    plot_title= 'Market Segment - City');

---

**Resort**

---

### Review - `PLACEHOLDER`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## distribution_channel

---

**City**

---

In [None]:
## Reviewing details for city - 'distribution_channel'
eda.explore_feature(subgroup_city,'distribution_channel',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Distribution Channel',
                    plot_title= 'Distribution Channel - City');

---

**Resort**

---

### Review - `PLACEHOLDER`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## is_repeated_guest

---

**City**

---

In [None]:
## Reviewing details for city - 'is_repeated_guest'
eda.explore_feature(subgroup_city,'is_repeated_guest',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Repeated Guest',
                    plot_title= 'Repeated Guest - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'is_repeated_guest'
eda.explore_feature(subgroup_resort,'is_repeated_guest',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Repeated Guest',
                    plot_title= 'Repeated Guest - Resort');

### Review - `PLACEHOLDER`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## previous_cancellations

---

**City**

---

In [None]:
## Reviewing details for city - 'previous_cancellations'
eda.explore_feature(subgroup_city,'previous_cancellations',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Previous Cancellations',
                    plot_title= 'Previous Cancellations - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'previous_cancellations'
eda.explore_feature(subgroup_resort,'previous_cancellations',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Previous Cancellations',
                    plot_title= 'Previous Cancellations - Resort');

### Review - `PLACEHOLDER`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## previous_bookings_not_canceled

---

**City**

---

In [None]:
## Reviewing details for city - 'previous_bookings_not_canceled'
eda.explore_feature(subgroup_city,'previous_bookings_not_canceled',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Previous Bookings',
                    plot_title= 'Previous Bookings Not Canceled - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'previous_bookings_not_canceled'
eda.explore_feature(subgroup_resort,'previous_bookings_not_canceled',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Previous Bookings',
                    plot_title= 'Previous Bookings Not Canceled - Resort');

### Review - `PLACEHOLDER`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

##  reserved_room_type

---

**City**

---

In [None]:
## Reviewing details for city - 'reserved_room_type'
eda.explore_feature(subgroup_city,'reserved_room_type',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Room Type',
                    plot_title= 'Reserved Room Type - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'reserved_room_type'
eda.explore_feature(subgroup_resort,'reserved_room_type',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Room Type',
                    plot_title= 'Reserved Room Type - Resort');

### Review - `PLACEHOLDER`

---

> TEXT 
>
> TEXT

---

## assigned_room_type

---

**City**

---

In [None]:
## Reviewing details for city - 'assigned_room_type'
eda.explore_feature(subgroup_city,'assigned_room_type',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Room Type',
                    plot_title= 'Assigned Room Type - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'assigned_room_type'
eda.explore_feature(subgroup_resort,'assigned_room_type',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Room Type',
                    plot_title= 'Assigned Room Type - Resort');

### Review - `PLACEHOLDER`

---

> TEXT 
>
> TEXT

---

## booking_changes

---

**City**

---

In [None]:
## Reviewing details for city - 'booking_changes'
eda.explore_feature(subgroup_city,'booking_changes',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Changes',
                    plot_title= 'Booking Changes - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'booking_changes'
eda.explore_feature(subgroup_resort,'booking_changes',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Changes',
                    plot_title= 'Booking Changes - Resort');

### Review - `PLACEHOLDER`

---

> TEXT 
>
> TEXT

---

## deposit_type

---

**City**

---

In [None]:
## Reviewing details for city - 'deposit_type'
eda.explore_feature(subgroup_city,'deposit_type',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Deposit Type',
                    plot_title= 'Deposit Type - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'deposit_type'
eda.explore_feature(subgroup_resort,'deposit_type',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Deposit Type',
                    plot_title= 'Deposit Type - Resort');

### Review - `PLACEHOLDER`

---

> TEXT 
>
> TEXT

---

## agent

---

**City**

---

In [None]:
## Reviewing details for city - 'agent'
eda.explore_feature(subgroup_city,'agent',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Agent',
                    plot_title= 'Agent - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'agent'
eda.explore_feature(subgroup_resort,'agent',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Agent',
                    plot_title= 'Agent - Resort');

### Review - `PLACEHOLDER`

---

> TEXT 
>
> TEXT

---

## days_in_waiting_list

---

**City**

---

In [None]:
## Reviewing details for city - 'days_in_waiting_list'
eda.explore_feature(subgroup_city,'days_in_waiting_list',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Days',
                    plot_title= 'Days in Waiting List - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'days_in_waiting_list'
eda.explore_feature(subgroup_resort,'days_in_waiting_list',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Days',
                    plot_title= 'Days in Waiting List - Resort');

### Review - `PLACEHOLDER`

---

> TEXT 
>
> TEXT

---

## customer_type

---

**City**

---

In [None]:
## Reviewing details for city - 'customer_type'
eda.explore_feature(subgroup_city,'customer_type',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Customer Type',
                    plot_title= 'Customer Type - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'customer_type'
## Categorical feature: 'bins' and the box marginal may not apply here
eda.explore_feature(subgroup_resort,'customer_type',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Customer Type',
                    plot_title= 'Customer Type - Resort');
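
For a categorical feature such as `customer_type`, a row-normalized cross-tabulation against `reservation_status` is one possible complement to the histograms. The sketch below is only an illustration: the frame `toy` and its category values are made up for the example, not taken from the hotel data, and it does not use the `eda` helper.

```python
import pandas as pd

# Hypothetical stand-in for subgroup_city / subgroup_resort (made-up rows)
toy = pd.DataFrame({
    'customer_type': ['Transient', 'Transient', 'Transient',
                      'Transient', 'Contract', 'Contract'],
    'reservation_status': ['Canceled', 'Check-Out', 'Canceled',
                           'Check-Out', 'Check-Out', 'Check-Out'],
})

# Share of each reservation status within each customer type (rows sum to 1)
shares = pd.crosstab(toy['customer_type'],
                     toy['reservation_status'],
                     normalize='index')
print(shares)
```

Swapping `toy` for `subgroup_city` or `subgroup_resort` should give the same kind of table for the real data, assuming both columns are present in those frames.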

### Review - `customer_type`

---

> TEXT 
>
> TEXT

---

## adr

---

**City**

---

In [None]:
## Reviewing details for city - 'adr'
eda.explore_feature(subgroup_city,'adr',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Average Daily Rate',
                    plot_title= 'Average Daily Rate (ADR) - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'adr'
eda.explore_feature(subgroup_resort,'adr',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Average Daily Rate',
                    plot_title= 'Average Daily Rate (ADR) - Resort');

### Review - `adr`

---

> TEXT 
>
> TEXT

---

## required_car_parking_spaces

---

**City**

---

In [None]:
## Reviewing details for city - 'required_car_parking_spaces'
eda.explore_feature(subgroup_city,'required_car_parking_spaces',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Spaces',
                    plot_title= 'Required Car Parking Spaces - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'required_car_parking_spaces'
eda.explore_feature(subgroup_resort,'required_car_parking_spaces',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Spaces',
                    plot_title= 'Required Car Parking Spaces - Resort');

### Review - `required_car_parking_spaces`

---

> TEXT 
>
> TEXT

---

## total_of_special_requests

---

**City**

---

In [None]:
## Reviewing details for city - 'total_of_special_requests'
eda.explore_feature(subgroup_city,'total_of_special_requests',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Requests',
                    plot_title= 'Total Special Requests - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'total_of_special_requests'
eda.explore_feature(subgroup_resort,'total_of_special_requests',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Number of Requests',
                    plot_title= 'Total Special Requests - Resort');

### Review - `total_of_special_requests`

---

> TEXT 
>
> TEXT

---

## reservation_status_date

---

**City**

---

In [None]:
## Reviewing details for city - 'reservation_status_date'
eda.explore_feature(subgroup_city,'reservation_status_date',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Date',
                    plot_title= 'Reservation Status Date - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'reservation_status_date'
eda.explore_feature(subgroup_resort,'reservation_status_date',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='reservation_status',
                    plot_label ='Date',
                    plot_title= 'Reservation Status Date - Resort');

### Review - `reservation_status_date`

---

> TEXT 
>
> TEXT

---

# 📅 **Setting Datetime Index**

In [None]:
city_ts = subgroup_city.set_index('arrival_date')
city_ts

In [None]:
resort_ts = subgroup_resort.set_index('arrival_date')
resort_ts
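
Before any time series modeling, it may help to confirm that the new index is a sorted `DatetimeIndex`. The sketch below is a minimal illustration on a made-up frame (`toy`, not the hotel data). It assumes `arrival_date` might still be stored as strings; if the column is already `datetime64`, the `pd.to_datetime` call leaves it unchanged.

```python
import pandas as pd

# Hypothetical stand-in for subgroup_city / subgroup_resort (made-up rows)
toy = pd.DataFrame({
    'arrival_date': ['2016-07-03', '2016-07-01', '2016-07-01', '2016-07-02'],
    'lead_time': [10, 45, 3, 120],
})

# Assumption: arrival_date may be stored as strings, so parse it first
toy['arrival_date'] = pd.to_datetime(toy['arrival_date'])

# Set the datetime index and sort chronologically
toy_ts = toy.set_index('arrival_date').sort_index()

# Sanity checks on the index
print(type(toy_ts.index).__name__)
print(toy_ts.index.is_monotonic_increasing)

# Number of reservations arriving per calendar day
daily_counts = toy_ts.resample('D').size()
print(daily_counts)
```

Time-based operations such as `resample` require a datetime-like index, which is why the parse step comes before `set_index` in this sketch.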