# ❌ **Cancel Culture** ❌ - **EDA Notebook**

---

**Author:** Ben McCarty

**Capstone Project** - Classification, Time Series Modeling

**Contact:** bmccarty505@gmail.com

---

---

**Who?**
>* 🏢 **Revenue Management (RM) teams** for hotel groups (corporate, franchise)
>
>
>* 🏨 On-site GMs, Sales, and Ops teams

---

**Why?**
>* 💰 **Revenue Management:** 
>  * Revenue optimization: Right price, right time, right customer
>    * Dynamic pricing
>    * Distribution channels
>    * Pricing per room type
>
>
>* 🤝 **Sales:**
>  * Group sales (pickup/wash)
>  * BT (performance/company for both GPP and LNR rates)
>
>
>* 🛌 **Rooms Ops:**
>  * Forecasting occupancy, arrivals, departures, stay-overs, same-day booking demand, and probability of guest relocation in the case of oversell.
>  * Determining staff schedules and periods of high demand
>
>
>* 🍰 ☕ **Food and Beverage:**
>  * Ordering food/supplies overall
>  * Scheduling staff
>  * Determining busy times (breakfast, lunch, dinner)
>    * Staffing, specific food/supplies

---

**What?**
>* 🧾 Dataset consisting of...
>  * 32 different features
>    * Detailed explanation of features (and sub-categories, when appropriate) available in Readme
>  * Nearly 120,000 reservation records
>  * Source cited in Readme

---

**How?**
>* Which models/methods?
>  * 🔢 Classifiers 🌳
>    * XGBoost, RFC, ABC, etc.
>  * ⏳ Time Series Analysis 📈
>    * pmdarima `auto_arima`
>    * Statsmodels vector autoregression
>
>
>* Data prep and feature engineering

---

---

> **Goal:** To prepare the data for classification modeling in the next notebook.
>
>
> **Purpose:** To explore, clean, and organize the data.
>
>
> **Process:**
>
>    * Inspecting data integrity and statistics
>    * Splitting data by hotel type ("City" vs. "Resort")
>    * Filling any missing values
>    * Saving processed data for the modeling notebook
>
>
> **Modeling Notebook:**
>
>    * Performing the train/test split
>    * Training the model
>    * Evaluating performance metrics
>    * Providing final recommendations

---

# ✅ **To-Do List**

---

**Copy:**
- [ ] Imports
- [ ] Personal module
- [ ] Data
- [ ] Starter code from P4P

**Links:**
- [ ] 

---

# 📦 **Import Packages**

In [1]:
## Data Handling
import pandas as pd
import numpy as np
from scipy import stats

## Visualizations
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

## Custom-made Functions
from bmc_functions import eda
from bmc_functions import classification as clf

In [2]:
## Settings
# plt.style.use('seaborn-talk')
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: f'{x:,.2f}')
pd.set_option('display.max_rows', 100)
%matplotlib inline

In [3]:
%load_ext autoreload
%autoreload 2

# 📥 **Read Data**

In [4]:
## Reading data
source = './data/hotel_bookings.csv'
data = pd.read_csv(source)
data

Unnamed: 0,hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,children,babies,meal,country,market_segment,distribution_channel,is_repeated_guest,previous_cancellations,previous_bookings_not_canceled,reserved_room_type,assigned_room_type,booking_changes,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
0,Resort Hotel,0,342,2015,July,27,1,0,0,2,0.00,0,BB,PRT,Direct,Direct,0,0,0,C,C,3,No Deposit,,,0,Transient,0.00,0,0,Check-Out,7/1/2015
1,Resort Hotel,0,737,2015,July,27,1,0,0,2,0.00,0,BB,PRT,Direct,Direct,0,0,0,C,C,4,No Deposit,,,0,Transient,0.00,0,0,Check-Out,7/1/2015
2,Resort Hotel,0,7,2015,July,27,1,0,1,1,0.00,0,BB,GBR,Direct,Direct,0,0,0,A,C,0,No Deposit,,,0,Transient,75.00,0,0,Check-Out,7/2/2015
3,Resort Hotel,0,13,2015,July,27,1,0,1,1,0.00,0,BB,GBR,Corporate,Corporate,0,0,0,A,A,0,No Deposit,304.00,,0,Transient,75.00,0,0,Check-Out,7/2/2015
4,Resort Hotel,0,14,2015,July,27,1,0,2,2,0.00,0,BB,GBR,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,240.00,,0,Transient,98.00,0,1,Check-Out,7/3/2015
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
119385,City Hotel,0,23,2017,August,35,30,2,5,2,0.00,0,BB,BEL,Offline TA/TO,TA/TO,0,0,0,A,A,0,No Deposit,394.00,,0,Transient,96.14,0,0,Check-Out,9/6/2017
119386,City Hotel,0,102,2017,August,35,31,2,5,3,0.00,0,BB,FRA,Online TA,TA/TO,0,0,0,E,E,0,No Deposit,9.00,,0,Transient,225.43,0,2,Check-Out,9/7/2017
119387,City Hotel,0,34,2017,August,35,31,2,5,2,0.00,0,BB,DEU,Online TA,TA/TO,0,0,0,D,D,0,No Deposit,9.00,,0,Transient,157.71,0,4,Check-Out,9/7/2017
119388,City Hotel,0,109,2017,August,35,31,2,5,2,0.00,0,BB,GBR,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,89.00,,0,Transient,104.40,0,0,Check-Out,9/7/2017


In [5]:
## Inspecting percentage of city vs. resort hotels
data['hotel'].value_counts(normalize=True)

City Hotel     0.66
Resort Hotel   0.34
Name: hotel, dtype: float64

# 🎯 **Identifying Target Feature** 🎯

---

> For my classification analysis, **I will use the `is_canceled` feature as my target feature.** This feature indicates whether a reservation was canceled (0 = not canceled, 1 = canceled).
>
> There is another feature, `reservation_status`, that also looks valuable. I will compare that feature against `is_canceled` to investigate any differences between the two.

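A cross-tabulation is a quick way to compare the two candidate targets: if `is_canceled` is a binarized version of `reservation_status`, each status should fall under exactly one `is_canceled` value. This is a minimal sketch on a tiny hypothetical sample (`toy` is illustrative, not taken from the dataset); with the real data, `data` would take the place of `toy`.

```python
import pandas as pd

# Hypothetical reservations standing in for rows of hotel_bookings.csv
toy = pd.DataFrame({
    'is_canceled':        [0, 1, 1, 0, 1],
    'reservation_status': ['Check-Out', 'Canceled', 'No-Show', 'Check-Out', 'Canceled'],
})

# Rows = status, columns = target value; each status should appear in a single column
comparison = pd.crosstab(toy['reservation_status'], toy['is_canceled'])
print(comparison)
```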
---

# 🪓 **Splitting "City" and "Resort"**

In [6]:
## Creating subgroup for city hotels
## .copy() gives an independent DataFrame, avoiding SettingWithCopyWarning on later edits
subgroup_city = data[data['hotel'] == 'City Hotel'].drop(columns='hotel').copy()
subgroup_city

Unnamed: 0,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,children,babies,meal,country,market_segment,distribution_channel,is_repeated_guest,previous_cancellations,previous_bookings_not_canceled,reserved_room_type,assigned_room_type,booking_changes,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
40060,0,6,2015,July,27,1,0,2,1,0.00,0,HB,PRT,Offline TA/TO,TA/TO,0,0,0,A,A,0,No Deposit,6.00,,0,Transient,0.00,0,0,Check-Out,7/3/2015
40061,1,88,2015,July,27,1,0,4,2,0.00,0,BB,PRT,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,9.00,,0,Transient,76.50,0,1,Canceled,7/1/2015
40062,1,65,2015,July,27,1,0,4,1,0.00,0,BB,PRT,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,9.00,,0,Transient,68.00,0,1,Canceled,4/30/2015
40063,1,92,2015,July,27,1,2,4,2,0.00,0,BB,PRT,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,9.00,,0,Transient,76.50,0,2,Canceled,6/23/2015
40064,1,100,2015,July,27,2,0,2,2,0.00,0,BB,PRT,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,9.00,,0,Transient,76.50,0,1,Canceled,4/2/2015
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
119385,0,23,2017,August,35,30,2,5,2,0.00,0,BB,BEL,Offline TA/TO,TA/TO,0,0,0,A,A,0,No Deposit,394.00,,0,Transient,96.14,0,0,Check-Out,9/6/2017
119386,0,102,2017,August,35,31,2,5,3,0.00,0,BB,FRA,Online TA,TA/TO,0,0,0,E,E,0,No Deposit,9.00,,0,Transient,225.43,0,2,Check-Out,9/7/2017
119387,0,34,2017,August,35,31,2,5,2,0.00,0,BB,DEU,Online TA,TA/TO,0,0,0,D,D,0,No Deposit,9.00,,0,Transient,157.71,0,4,Check-Out,9/7/2017
119388,0,109,2017,August,35,31,2,5,2,0.00,0,BB,GBR,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,89.00,,0,Transient,104.40,0,0,Check-Out,9/7/2017


In [7]:
## Creating subgroup for resort hotels
## .copy() gives an independent DataFrame, avoiding SettingWithCopyWarning on later edits
subgroup_resort = data[data['hotel'] == 'Resort Hotel'].drop(columns='hotel').copy()
subgroup_resort

Unnamed: 0,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,children,babies,meal,country,market_segment,distribution_channel,is_repeated_guest,previous_cancellations,previous_bookings_not_canceled,reserved_room_type,assigned_room_type,booking_changes,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
0,0,342,2015,July,27,1,0,0,2,0.00,0,BB,PRT,Direct,Direct,0,0,0,C,C,3,No Deposit,,,0,Transient,0.00,0,0,Check-Out,7/1/2015
1,0,737,2015,July,27,1,0,0,2,0.00,0,BB,PRT,Direct,Direct,0,0,0,C,C,4,No Deposit,,,0,Transient,0.00,0,0,Check-Out,7/1/2015
2,0,7,2015,July,27,1,0,1,1,0.00,0,BB,GBR,Direct,Direct,0,0,0,A,C,0,No Deposit,,,0,Transient,75.00,0,0,Check-Out,7/2/2015
3,0,13,2015,July,27,1,0,1,1,0.00,0,BB,GBR,Corporate,Corporate,0,0,0,A,A,0,No Deposit,304.00,,0,Transient,75.00,0,0,Check-Out,7/2/2015
4,0,14,2015,July,27,1,0,2,2,0.00,0,BB,GBR,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,240.00,,0,Transient,98.00,0,1,Check-Out,7/3/2015
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
40055,0,212,2017,August,35,31,2,8,2,1.00,0,BB,GBR,Offline TA/TO,TA/TO,0,0,0,A,A,1,No Deposit,143.00,,0,Transient,89.75,0,0,Check-Out,9/10/2017
40056,0,169,2017,August,35,30,2,9,2,0.00,0,BB,IRL,Direct,Direct,0,0,0,E,E,0,No Deposit,250.00,,0,Transient-Party,202.27,0,1,Check-Out,9/10/2017
40057,0,204,2017,August,35,29,4,10,2,0.00,0,BB,IRL,Direct,Direct,0,0,0,E,E,0,No Deposit,250.00,,0,Transient,153.57,0,3,Check-Out,9/12/2017
40058,0,211,2017,August,35,31,4,10,2,0.00,0,HB,GBR,Offline TA/TO,TA/TO,0,0,0,D,D,0,No Deposit,40.00,,0,Contract,112.80,0,1,Check-Out,9/14/2017

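The same split can also be written with `groupby`, which produces one subgroup per hotel type in a single pass. This is a minimal sketch on a hypothetical miniature table; the explicit `.copy()` gives each subgroup its own data so that later in-place edits do not raise `SettingWithCopyWarning`. The `subgroups` dictionary is an illustrative name, not part of the notebook.

```python
import pandas as pd

# Hypothetical miniature version of the bookings table
data = pd.DataFrame({
    'hotel': ['Resort Hotel', 'City Hotel', 'City Hotel', 'Resort Hotel'],
    'is_canceled': [0, 1, 0, 0],
    'lead_time': [342, 88, 6, 14],
})

# One independent DataFrame per hotel type, without the now-redundant 'hotel' column
subgroups = {
    hotel: group.drop(columns='hotel').copy()
    for hotel, group in data.groupby('hotel')
}

subgroup_city = subgroups['City Hotel']
subgroup_resort = subgroups['Resort Hotel']
print(subgroup_city.shape, subgroup_resort.shape)
```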

# 📊 **Reviewing Statistics**

---

**`report_df()`: City**

---

In [9]:
## Sorting report by number of missing values
eda.report_df(subgroup_city).sort_values('null_sum', ascending=False)

Unnamed: 0,null_sum,null_pct,datatypes,num_unique,count,mean,std,min,25%,50%,75%,max
company,75641,0.95,float64,207,3689.0,145.27,119.77,8.0,40.0,91.0,219.0,497.0
agent,8131,0.1,float64,223,71199.0,28.14,56.43,1.0,9.0,9.0,17.0,509.0
country,24,0.0,object,166,,,,,,,,
children,4,0.0,float64,4,79326.0,0.09,0.37,0.0,0.0,0.0,0.0,3.0
adr,0,0.0,float64,5405,79330.0,105.3,43.6,0.0,79.2,99.9,126.0,5400.0
previous_cancellations,0,0.0,int64,10,79330.0,0.08,0.42,0.0,0.0,0.0,0.0,21.0
market_segment,0,0.0,object,8,,,,,,,,
meal,0,0.0,object,4,,,,,,,,
previous_bookings_not_canceled,0,0.0,int64,73,79330.0,0.13,1.69,0.0,0.0,0.0,0.0,72.0
required_car_parking_spaces,0,0.0,int64,4,79330.0,0.02,0.15,0.0,0.0,0.0,0.0,3.0


---

**`report_df()`: Resort**

---

In [10]:
## Sorting report by number of missing values
eda.report_df(subgroup_resort).sort_values('null_sum', ascending=False)

Unnamed: 0,null_sum,null_pct,datatypes,num_unique,count,mean,std,min,25%,50%,75%,max
company,36952,0.92,float64,235,3108.0,241.49,125.93,6.0,154.0,223.0,330.0,543.0
agent,8209,0.2,float64,185,31851.0,217.57,88.26,1.0,240.0,240.0,242.0,535.0
country,464,0.01,object,125,,,,,,,,
adr,0,0.0,float64,5880,40060.0,94.95,61.44,-6.38,50.0,75.0,125.0,508.0
previous_cancellations,0,0.0,int64,11,40060.0,0.1,1.34,0.0,0.0,0.0,0.0,26.0
lead_time,0,0.0,int64,412,40060.0,92.68,97.29,0.0,10.0,57.0,155.0,737.0
market_segment,0,0.0,object,6,,,,,,,,
meal,0,0.0,object,5,,,,,,,,
previous_bookings_not_canceled,0,0.0,int64,31,40060.0,0.15,1.0,0.0,0.0,0.0,0.0,30.0
required_car_parking_spaces,0,0.0,int64,5,40060.0,0.14,0.35,0.0,0.0,0.0,0.0,8.0


---

**Reviewing Reports - Missing Values**

> Based on the post-split results, both dataframes are missing values for `company`, `agent`, and `country`. Additionally, the `subgroup_city` dataframe is missing four values for `children`.
>
> **Special note:** As noted in the data's documentation (located in *"details.md"*), missing values intentionally represent features that were not applicable to a reservation.

---

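The missing-value columns in the reports above can be reproduced with a few pandas calls. This is a hypothetical stand-in for `eda.report_df()`, whose actual implementation lives in the author's `bmc_functions` module and is not shown here; `missing_report` and the toy data are illustrative only.

```python
import numpy as np
import pandas as pd

def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-column null counts, null shares, dtypes, unique counts."""
    report = pd.DataFrame({
        'null_sum': df.isna().sum(),     # number of missing values
        'null_pct': df.isna().mean(),    # share of missing values (0-1)
        'datatypes': df.dtypes,
        'num_unique': df.nunique(),      # excludes NaN by default
    })
    return report.sort_values('null_sum', ascending=False)

# Hypothetical sample with gaps similar to the real columns
toy = pd.DataFrame({
    'company': [np.nan, np.nan, np.nan, 40.0],
    'agent': [9.0, np.nan, 240.0, 9.0],
    'country': ['PRT', 'GBR', None, 'PRT'],
})
report = missing_report(toy)
print(report)
```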
**`Company` and `Agent` Features**

> *Missing in `subgroup_city`:*
>
> * `company`: 95%
> * `agent`: 10%
>
> *Missing in `subgroup_resort`:*
>
> * `company`: 92%
> * `agent`: 20%
>
> Because the vast majority of `company` values are missing, **I will drop `company` from both dataframes.**
>
> Since the missing values for `agent` are valid (they indicate that no agent was involved), **I will keep `agent` and fill the missing values with a placeholder value that represents "no agent."** I will fill the missing values in the next section.

**`Country` and `Children` Features**
**`Country` and `Children` Features**

> The remaining two features with missing values are `country` and `children`.
>
> **Because both features have only a handful of missing values, I will keep both and fill the gaps with each feature's most frequent value (mode).** With so few values affected, the choice of fill method should have a negligible impact on the final results.

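The cleaning plan above (drop `company`, fill `agent` with a placeholder, fill `country` and `children` with the mode) can be sketched as a single function. This is a minimal illustration, assuming the `999.0` placeholder used later in this notebook; `fill_missing` and the toy data are hypothetical names, and the collision check mirrors the notebook's own test that `999.0` is not already an agent ID.

```python
import numpy as np
import pandas as pd

PLACEHOLDER_AGENT = 999.0  # placeholder used in this notebook for "no agent"

def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop `company`, then fill `agent`, `country`, `children`."""
    out = df.drop(columns=['company'])

    # The placeholder must not collide with a real agent ID
    if (out['agent'] == PLACEHOLDER_AGENT).any():
        raise ValueError('Placeholder collides with an existing agent ID')

    # A missing agent is a valid "not applicable" value, so mark it explicitly
    out['agent'] = out['agent'].fillna(PLACEHOLDER_AGENT)

    # Only a handful of gaps remain, so use each column's most frequent value
    for col in ['country', 'children']:
        out[col] = out[col].fillna(out[col].mode()[0])
    return out

# Tiny hypothetical sample with gaps in every affected column
toy = pd.DataFrame({
    'company': [np.nan, np.nan, 40.0, np.nan],
    'agent': [9.0, np.nan, 240.0, 9.0],
    'country': ['PRT', None, 'PRT', 'GBR'],
    'children': [0.0, 0.0, np.nan, 1.0],
})
filled = fill_missing(toy)
print(filled)
```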
---

## Dropping `Company` Column

In [11]:
# Dropping "company" column (95% missing values)
subgroup_city = subgroup_city.drop(columns=['company'])
subgroup_city

Unnamed: 0,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,children,babies,meal,country,market_segment,distribution_channel,is_repeated_guest,previous_cancellations,previous_bookings_not_canceled,reserved_room_type,assigned_room_type,booking_changes,deposit_type,agent,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
40060,0,6,2015,July,27,1,0,2,1,0.00,0,HB,PRT,Offline TA/TO,TA/TO,0,0,0,A,A,0,No Deposit,6.00,0,Transient,0.00,0,0,Check-Out,7/3/2015
40061,1,88,2015,July,27,1,0,4,2,0.00,0,BB,PRT,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,9.00,0,Transient,76.50,0,1,Canceled,7/1/2015
40062,1,65,2015,July,27,1,0,4,1,0.00,0,BB,PRT,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,9.00,0,Transient,68.00,0,1,Canceled,4/30/2015
40063,1,92,2015,July,27,1,2,4,2,0.00,0,BB,PRT,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,9.00,0,Transient,76.50,0,2,Canceled,6/23/2015
40064,1,100,2015,July,27,2,0,2,2,0.00,0,BB,PRT,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,9.00,0,Transient,76.50,0,1,Canceled,4/2/2015
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
119385,0,23,2017,August,35,30,2,5,2,0.00,0,BB,BEL,Offline TA/TO,TA/TO,0,0,0,A,A,0,No Deposit,394.00,0,Transient,96.14,0,0,Check-Out,9/6/2017
119386,0,102,2017,August,35,31,2,5,3,0.00,0,BB,FRA,Online TA,TA/TO,0,0,0,E,E,0,No Deposit,9.00,0,Transient,225.43,0,2,Check-Out,9/7/2017
119387,0,34,2017,August,35,31,2,5,2,0.00,0,BB,DEU,Online TA,TA/TO,0,0,0,D,D,0,No Deposit,9.00,0,Transient,157.71,0,4,Check-Out,9/7/2017
119388,0,109,2017,August,35,31,2,5,2,0.00,0,BB,GBR,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,89.00,0,Transient,104.40,0,0,Check-Out,9/7/2017


In [12]:
# Dropping "company" column (92% missing values)
subgroup_resort = subgroup_resort.drop(columns=['company'])
subgroup_resort

Unnamed: 0,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,children,babies,meal,country,market_segment,distribution_channel,is_repeated_guest,previous_cancellations,previous_bookings_not_canceled,reserved_room_type,assigned_room_type,booking_changes,deposit_type,agent,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
0,0,342,2015,July,27,1,0,0,2,0.00,0,BB,PRT,Direct,Direct,0,0,0,C,C,3,No Deposit,,0,Transient,0.00,0,0,Check-Out,7/1/2015
1,0,737,2015,July,27,1,0,0,2,0.00,0,BB,PRT,Direct,Direct,0,0,0,C,C,4,No Deposit,,0,Transient,0.00,0,0,Check-Out,7/1/2015
2,0,7,2015,July,27,1,0,1,1,0.00,0,BB,GBR,Direct,Direct,0,0,0,A,C,0,No Deposit,,0,Transient,75.00,0,0,Check-Out,7/2/2015
3,0,13,2015,July,27,1,0,1,1,0.00,0,BB,GBR,Corporate,Corporate,0,0,0,A,A,0,No Deposit,304.00,0,Transient,75.00,0,0,Check-Out,7/2/2015
4,0,14,2015,July,27,1,0,2,2,0.00,0,BB,GBR,Online TA,TA/TO,0,0,0,A,A,0,No Deposit,240.00,0,Transient,98.00,0,1,Check-Out,7/3/2015
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
40055,0,212,2017,August,35,31,2,8,2,1.00,0,BB,GBR,Offline TA/TO,TA/TO,0,0,0,A,A,1,No Deposit,143.00,0,Transient,89.75,0,0,Check-Out,9/10/2017
40056,0,169,2017,August,35,30,2,9,2,0.00,0,BB,IRL,Direct,Direct,0,0,0,E,E,0,No Deposit,250.00,0,Transient-Party,202.27,0,1,Check-Out,9/10/2017
40057,0,204,2017,August,35,29,4,10,2,0.00,0,BB,IRL,Direct,Direct,0,0,0,E,E,0,No Deposit,250.00,0,Transient,153.57,0,3,Check-Out,9/12/2017
40058,0,211,2017,August,35,31,4,10,2,0.00,0,HB,GBR,Offline TA/TO,TA/TO,0,0,0,D,D,0,No Deposit,40.00,0,Contract,112.80,0,1,Check-Out,9/14/2017


In [13]:
## Confirming 'company' removal from both
'company' not in subgroup_city and 'company' not in subgroup_resort

True

## Filling missing values in `agent`

In [14]:
## Identifying unique values for both sub-groups
unique_values = set(subgroup_city['agent'].unique()) | set(subgroup_resort['agent'].unique())

In [15]:
## Confirming uniform datatype
unique_dtype = set()
for item in unique_values:
    unique_dtype.add(type(item))
    
unique_dtype

{numpy.float64}

In [16]:
## Testing placeholder value to fill missing values
999.0 in unique_values

False

In [17]:
## Filling missing values and confirming no remaining values
for df in [subgroup_city, subgroup_resort]:
    # Assigning the column back updates each DataFrame; chained inplace fillna may not
    df['agent'] = df['agent'].fillna(999.0)
    print(df['agent'].isna().sum())

0
0

## Filling Remaining Missing Values

In [18]:
## Inspecting remaining missing values
display(subgroup_city.isna().sum()[subgroup_city.isna().sum() >0])
display(subgroup_resort.isna().sum()[subgroup_resort.isna().sum() >0])

children     4
country     24
dtype: int64

country    464
dtype: int64

In [19]:
## Determining most frequent value for subgroup_city
city_child = subgroup_city['children'].mode()[0]
city_country = subgroup_city['country'].mode()[0]

print(f'Most frequent value (children): {city_child}.')
print(f'Most frequent value (country): {city_country}.')

Most frequent value (children): 0.0.
Most frequent value (country): PRT.


In [20]:
## Replacing missing values for 'children'
subgroup_city['children'] = subgroup_city['children'].fillna(city_child)

In [21]:
## Replacing missing values for 'country'
subgroup_city['country'] = subgroup_city['country'].fillna(city_country)

In [22]:
## Confirming filled missing values
subgroup_city.isna().sum()

is_canceled                       0
lead_time                         0
arrival_date_year                 0
arrival_date_month                0
arrival_date_week_number          0
arrival_date_day_of_month         0
stays_in_weekend_nights           0
stays_in_week_nights              0
adults                            0
children                          0
babies                            0
meal                              0
country                           0
market_segment                    0
distribution_channel              0
is_repeated_guest                 0
previous_cancellations            0
previous_bookings_not_canceled    0
reserved_room_type                0
assigned_room_type                0
booking_changes                   0
deposit_type                      0
agent                             0
days_in_waiting_list              0
customer_type                     0
adr                               0
required_car_parking_spaces       0
total_of_special_requests   

In [23]:
## Determining most frequent value for subgroup_resort
resort_country = subgroup_resort['country'].mode()[0]

In [24]:
## Filling missing value for resort - 'country'
subgroup_resort['country'] = subgroup_resort['country'].fillna(resort_country)

In [25]:
## Confirming no missing values
subgroup_resort.isna().sum()

is_canceled                       0
lead_time                         0
arrival_date_year                 0
arrival_date_month                0
arrival_date_week_number          0
arrival_date_day_of_month         0
stays_in_weekend_nights           0
stays_in_week_nights              0
adults                            0
children                          0
babies                            0
meal                              0
country                           0
market_segment                    0
distribution_channel              0
is_repeated_guest                 0
previous_cancellations            0
previous_bookings_not_canceled    0
reserved_room_type                0
assigned_room_type                0
booking_changes                   0
deposit_type                      0
agent                             0
days_in_waiting_list              0
customer_type                     0
adr                               0
required_car_parking_spaces       0
total_of_special_requests   

# 🔬 **Inspecting Feature Data Types**

---

**City**

---

In [26]:
## Inspecting datatypes for "subgroup_city"
subgroup_city.dtypes.sort_values()

is_canceled                         int64
previous_bookings_not_canceled      int64
previous_cancellations              int64
is_repeated_guest                   int64
days_in_waiting_list                int64
required_car_parking_spaces         int64
adults                              int64
babies                              int64
stays_in_weekend_nights             int64
arrival_date_day_of_month           int64
arrival_date_week_number            int64
total_of_special_requests           int64
arrival_date_year                   int64
lead_time                           int64
stays_in_week_nights                int64
booking_changes                     int64
children                          float64
adr                               float64
agent                             float64
deposit_type                       object
customer_type                      object
distribution_channel               object
reserved_room_type                 object
reservation_status                

---

**Resort**

---

In [27]:
## Inspecting datatypes for "subgroup_resort"
subgroup_resort.dtypes.sort_values()

is_canceled                         int64
previous_bookings_not_canceled      int64
previous_cancellations              int64
is_repeated_guest                   int64
days_in_waiting_list                int64
required_car_parking_spaces         int64
adults                              int64
babies                              int64
stays_in_weekend_nights             int64
arrival_date_day_of_month           int64
arrival_date_week_number            int64
total_of_special_requests           int64
arrival_date_year                   int64
lead_time                           int64
stays_in_week_nights                int64
booking_changes                     int64
children                          float64
adr                               float64
agent                             float64
deposit_type                       object
customer_type                      object
distribution_channel               object
reserved_room_type                 object
reservation_status                

In [28]:
## Confirming all datatypes match between dataframes
subgroup_city.dtypes.sort_values() == subgroup_resort.dtypes.sort_values()

is_canceled                       True
previous_bookings_not_canceled    True
previous_cancellations            True
is_repeated_guest                 True
days_in_waiting_list              True
required_car_parking_spaces       True
adults                            True
babies                            True
stays_in_weekend_nights           True
arrival_date_day_of_month         True
arrival_date_week_number          True
total_of_special_requests         True
arrival_date_year                 True
lead_time                         True
stays_in_week_nights              True
booking_changes                   True
children                          True
adr                               True
agent                             True
deposit_type                      True
customer_type                     True
distribution_channel              True
reserved_room_type                True
reservation_status                True
market_segment                    True
country                  

---

**Review - Datatypes**

> After reviewing the datatypes, I noticed that **one feature needs to be changed to the string datatype: `agent`**. This feature contains unique identifiers for booking agents and needs to be treated as categorical data.
>
> As both dataframes' datatypes are the same, I do not need to make any other adjustments specific to either dataframe.

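Two details from this review can be sketched briefly. `Series.equals` returns a single boolean for the dtype comparison and, unlike `==`, does not require identically labeled Series. For the conversion, casting the float IDs to `int` first avoids labels such as `"9.0"`. This is a minimal sketch on hypothetical frames (`city`, `resort`), assuming every `agent` value is already filled, since `astype(int)` fails on NaN.

```python
import pandas as pd

# Hypothetical frames standing in for subgroup_city / subgroup_resort
city = pd.DataFrame({'agent': [9.0, 999.0], 'adr': [76.5, 68.0]})
resort = pd.DataFrame({'agent': [240.0, 304.0], 'adr': [98.0, 75.0]})

# One boolean instead of a per-column comparison
print(city.dtypes.equals(resort.dtypes))  # True

for df in (city, resort):
    # float -> int drops the trailing ".0" before the IDs become string labels
    df['agent'] = df['agent'].astype(int).astype(str)

print(city['agent'].tolist())  # ['9', '999']
```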
---

## Converting to Strings

In [29]:
## Converting "agent" to string for both sub-groups
for df in [subgroup_city, subgroup_resort]:
    # float -> int drops the trailing ".0" before converting the IDs to strings
    df['agent'] = df['agent'].astype(int).astype(str)
    print(f'Datatype: {df["agent"].dtype}')

Datatype: object
Datatype: object

# 🔎 **EDA - Features**

---

**In-Depth EDA per Feature**

> Now that I have reviewed the missing values and confirmed the datatypes, I will inspect the details of each feature.

---
**Note:**

> DataFrame styling code used in `explore_feature()` function adapted from this [source](https://stackoverflow.com/questions/59769161/python-color-pandas-dataframe-based-on-multiindex#:~:text=2-,You,-can%20use%20Styler).

---

## **Reservation_Status**

---

**City**

---

In [30]:
## Confirming 'reservation_status' is stored as an object (string) column
subgroup_city['reservation_status'].dtype == 'O'

True

In [215]:
## Reviewing details for city - reservation_status
eda.explore_feature(subgroup_city,'reservation_status', 
                    plot_type='histogram',
                    target_feature='is_canceled',
                    plot_label ='Status',
                    plot_title= 'Reservation Status - City');


| --------------------------- Feature Details ------------------------------- |



Unnamed: 0,Unnamed: 1,Unnamed: 2,reservation_status
Statistics,Check-Out,count,33102
Statistics,Check-Out,unique,2
Statistics,Check-Out,top,Canceled
Statistics,Check-Out,freq,32186
Statistics,Canceled,count,46228
Statistics,Canceled,unique,1
Statistics,Canceled,top,Check-Out
Statistics,Canceled,freq,46228
Value Counts,Check-Out,Check-Out,1.00
Value Counts,Canceled,Canceled,0.97




| --------------------------- Visualizing Results --------------------------- |


---

**Resort**

---

In [None]:
## Reviewing details for resort - reservation_status
eda.explore_feature(subgroup_resort,'reservation_status', 
                    plot_type='histogram',
                    target_feature='is_canceled',
                    plot_label ='Status',
                    plot_title= 'Reservation Status - Resort');

### Review - `Reservation_Status`

---

**Feature Review**

> `Reservation_status` closely mirrors the values of my target feature, with slight differences due to `No-Show` values. **To prepare it for modeling, I will combine the `No-Show` and `Canceled` statuses.**

**Actions**

>For the purposes of my analysis, **I will treat `No-Show` reservations as `Canceled` reservations** because there are too few of them to use effectively as a third class.

**City vs. Resort**

> The most notable difference between the city and resort hotels is the number of cancellations: *the city hotel shows a much larger proportion of canceled reservations than the resort hotel.*
> * This may be due to a variety of factors; for example, resort guests may book only once their plans are more certain, or the resort hotel may charge a cancellation fee.
>
> No-Show reservations are rare for both hotels, supporting my decision to merge no-shows with cancellations.

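Two pandas details matter when merging the statuses. First, `replace(..., inplace=True)` on a `.loc` slice may not update the original DataFrame, so assigning the result back is safer. Second, the `in` operator on a Series checks the *index labels*, not the values, so a membership test needs `.values` or `isin`. A minimal sketch on a hypothetical `status` Series:

```python
import pandas as pd

# Hypothetical status column
status = pd.Series(['Check-Out', 'No-Show', 'Canceled', 'No-Show'])

# Pitfall: `in` on a Series checks the index labels (0..3), not the values
print('No-Show' in status)         # False, even though the value is present
print('No-Show' in status.values)  # True: checks the values themselves

# Merge no-shows into cancellations by assigning the result back
status = status.replace('No-Show', 'Canceled')
print(status.isin(['No-Show']).any())   # False
print(status.value_counts().to_dict())  # {'Canceled': 3, 'Check-Out': 1}
```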
---

### Converting `No-Show` to `Canceled`

In [None]:
## Changing no-show values to "canceled" (assigning back instead of chained inplace replace)
subgroup_city['reservation_status'] = subgroup_city['reservation_status'].replace('No-Show', 'Canceled')
subgroup_resort['reservation_status'] = subgroup_resort['reservation_status'].replace('No-Show', 'Canceled')

In [None]:
## Confirming the change
'No-Show' not in subgroup_city['reservation_status'] and \
                        'No-Show' not in subgroup_city['reservation_status']

In [None]:
## Inspecting the updated target classes
subgroup_city['reservation_status'].value_counts(normalize=True, dropna=False)

In [None]:
## Inspecting the updated classes for subgroup_resort
subgroup_resort['reservation_status'].value_counts(normalize=True, dropna=False)

### Review - `Reservation_Status`

---

> I successfully converted all `No-Show` values to `Canceled`, **resulting in a binary classification of whether a reservation will actualize (`Check-Out`) or not (`Canceled`).**

---

## **Is_Canceled**

---

**City**

---

In [None]:
## Reviewing details for city - 'is_canceled'
eda.explore_feature(subgroup_city,'is_canceled', 
                    target_feature='is_canceled',
                    normalize=False,
                    plot_type='histogram',
                    plot_label ='Cancellation Status',
                    plot_title= 'Reservation Status - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'is_canceled'
eda.explore_feature(subgroup_resort,'is_canceled', 
                    target_feature='is_canceled',
                    plot_type='histogram',
                    normalize=False,
                    plot_label ='Cancellation Status',
                    plot_title= 'Reservation Status - Resort');

### Review - `Is_Canceled`

---

**Feature Review**

> After the no-show conversion, `is_canceled` is effectively a binarization of `reservation_status`: a reservation is marked as canceled (1) if it was either canceled or recorded as a no-show.

**Actions**

> `is_canceled` is a better target feature because its values are already binary and match the `reservation_status` feature for every reservation.
>
> **I will use `is_canceled` in place of the `reservation_status` feature as my target feature.**

**City vs. Resort**

> The breakdown between hotels is the same as `reservation_status` and confirms that the resort hotel experiences fewer cancellations vs. the city hotel.

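Both claims in this review can be checked directly: rebuilding the target from `reservation_status` should reproduce `is_canceled` exactly, and the mean of a 0/1 column gives the cancellation rate per hotel. This is a minimal sketch on a hypothetical `toy` frame; on the real data the same lines would run on `data` after the no-show conversion.

```python
import pandas as pd

# Hypothetical post-conversion sample (no 'No-Show' values remain)
toy = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'City Hotel', 'Resort Hotel', 'Resort Hotel'],
    'reservation_status': ['Canceled', 'Check-Out', 'Canceled', 'Check-Out', 'Canceled'],
    'is_canceled': [1, 0, 1, 0, 1],
})

# Anything other than Check-Out counts as a cancellation; int64 matches the target's dtype
derived = (toy['reservation_status'] != 'Check-Out').astype('int64')
print(derived.equals(toy['is_canceled']))  # True: the two features agree everywhere

# Cancellation rate per hotel: the mean of a 0/1 column is the share of 1s
rates = toy.groupby('hotel')['is_canceled'].mean()
print(rates)
```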
---

## **Lead_Time**

---

**City**

---

In [None]:
## Reviewing details for city - 'lead_time'
eda.explore_feature(subgroup_city,'lead_time',
                    bins = 5, plot_type='histogram',
                    marginal_x = 'box', width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Number of Days',
                    plot_title= 'Lead Time (Days) - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'lead_time'
eda.explore_feature(subgroup_resort,'lead_time',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    target_feature='is_canceled',
                    plot_label ='Number of Days',
                    plot_title= 'Lead Time (Days) - Resort');

### Review - `Lead_Time`

---

**Feature Review**

> `Lead_Time` indicates how far in advance reservations are booked in days. *This information is particularly useful in hospitality for Revenue Management (RM) and Operations (Ops).*
>
>  * RM needs to know **when to expect bookings** and **when to monitor rates and availability** closely to make any necessary changes to optimize revenue.
>
>
>  * Ops uses this information to **forecast how many reservations will book in a short-term booking window** (I usually focused on 0-3 days prior to arrival).
>
>  * **This forecast is critical for staffing and supplies:** when building our schedules, we consider the current number of booked reservations plus the forecasted bookings to decide how many staff members to schedule and whether we have enough supplies.
>    * *Being the only staff member at the Front Desk during a rush of arrivals due to a snow storm is NOT fun!*

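The short-term booking window can be made concrete by bucketing `lead_time` and computing the cancellation rate per bucket. The 0-3 day bucket mirrors the window described above; the other bin edges and the toy data are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical lead times (days) and cancellation flags.
df = pd.DataFrame({
    "lead_time": [0, 2, 3, 10, 45, 120, 300],
    "is_canceled": [0, 0, 1, 0, 1, 1, 1],
})

# Right-closed bins: (-1, 3] captures 0-3 days; the remaining edges are arbitrary.
bins = [-1, 3, 30, 90, float("inf")]
labels = ["0-3", "4-30", "31-90", "91+"]
df["booking_window"] = pd.cut(df["lead_time"], bins=bins, labels=labels)

# Reservation count and cancellation rate (mean of the 0/1 flag) per window.
summary = df.groupby("booking_window", observed=False)["is_canceled"].agg(["count", "mean"])
print(summary)
```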
**Actions**

> I noticed a significant number of outliers at both properties. **I will remove outliers based on their z-scores prior to modeling.**

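A minimal sketch of z-score-based outlier removal, assuming a cutoff of |z| > 3 (a common convention; the notebook does not state its threshold) and a hypothetical helper `drop_zscore_outliers`:

```python
import pandas as pd


def drop_zscore_outliers(df, column, threshold=3.0):
    """Return df without rows whose |z-score| in `column` exceeds `threshold`.

    Hypothetical helper; the threshold of 3 is an assumed convention.
    """
    values = df[column]
    # Population standard deviation (ddof=0) for the z-score denominator.
    z = (values - values.mean()) / values.std(ddof=0)
    return df[z.abs() <= threshold]


# Hypothetical example: 20 typical lead times plus one extreme value.
df = pd.DataFrame({"lead_time": [10] * 20 + [700]})
trimmed = drop_zscore_outliers(df, "lead_time")
print(len(df), len(trimmed))
```

Because the mean and standard deviation are themselves inflated by outliers, applying this separately to each hotel subgroup (as the notebook already splits the data) is probably safer than a single pooled cutoff.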
**City vs. Resort**

> The histograms and box plots for both hotels match up closely, but it is clear that **the city hotel has a larger range of lead times for cancellations vs. the resort hotel.**

---

## Arrival_Date_Year

---

**City**

---

In [None]:
## Reviewing details for city - 'arrival_date_year'
eda.explore_feature(subgroup_city,'arrival_date_year',
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Year',
                    plot_title= 'Arrival Date (Year) - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'arrival_date_year'
eda.explore_feature(subgroup_resort,'arrival_date_year',
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Year',
                    plot_title= 'Arrival Date (Year) - Resort');

## **`Arrival_Date` as Datetime**

---

**City**

---

In [None]:
## Converting from month, day of month, and year to a single datetime column
## (e.g., "July 1, 2015"); the explicit format assumes full English month names
subgroup_city['arrival_date'] = (subgroup_city['arrival_date_month'] + ' '
                                 + subgroup_city['arrival_date_day_of_month'].astype(str)
                                 + ', '
                                 + subgroup_city['arrival_date_year'].astype(str))
subgroup_city['arrival_date'] = pd.to_datetime(subgroup_city['arrival_date'],
                                               format='%B %d, %Y')
subgroup_city['arrival_date']

---

**Resort**

---

In [None]:
## Converting from month, day of month, and year to a single datetime column
## (e.g., "July 1, 2015"); the explicit format assumes full English month names
subgroup_resort['arrival_date'] = (subgroup_resort['arrival_date_month'] + ' '
                                   + subgroup_resort['arrival_date_day_of_month'].astype(str)
                                   + ', '
                                   + subgroup_resort['arrival_date_year'].astype(str))
subgroup_resort['arrival_date'] = pd.to_datetime(subgroup_resort['arrival_date'],
                                                 format='%B %d, %Y')
subgroup_resort['arrival_date']

### Review - `Arrival_Date`

---

**Feature Review**

> I created this feature to merge the arrival year, month, and day-of-month features into a single datetime feature.

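As an alternative to building and re-parsing a date string, `pd.to_datetime` can assemble dates from `year`/`month`/`day` columns. A sketch on hypothetical rows, assuming `arrival_date_month` holds full English month names such as `July`:

```python
import pandas as pd

# Hypothetical rows using the dataset's column names.
df = pd.DataFrame({
    "arrival_date_year": [2015, 2016],
    "arrival_date_month": ["July", "February"],
    "arrival_date_day_of_month": [1, 29],
})

# Map full month names to month numbers by parsing them with the %B directive.
month_num = pd.to_datetime(df["arrival_date_month"], format="%B").dt.month

# pd.to_datetime assembles a datetime from columns named year/month/day.
df["arrival_date"] = pd.to_datetime(pd.DataFrame({
    "year": df["arrival_date_year"],
    "month": month_num,
    "day": df["arrival_date_day_of_month"],
}))
print(df["arrival_date"])
```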
**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## stays_in_weekend_nights

---

**City**

---

In [None]:
## Reviewing details for city - 'stays_in_weekend_nights'
eda.explore_feature(subgroup_city,'stays_in_weekend_nights',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Number of Weekend Nights',
                    plot_title= 'Stays in Weekend Nights - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'stays_in_weekend_nights'
eda.explore_feature(subgroup_resort,'stays_in_weekend_nights',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Number of Weekend Nights',
                    plot_title= 'Stays in Weekend Nights - Resort');

### Review - `Stays_in_Weekend_Nights`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## stays_in_week_nights

---

**City**

---

In [None]:
## Reviewing details for city - 'stays_in_week_nights'
eda.explore_feature(subgroup_city,'stays_in_week_nights',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Number of Week Nights',
                    plot_title= 'Stays in Week Nights - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'stays_in_week_nights'
eda.explore_feature(subgroup_resort,'stays_in_week_nights',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Number of Week Nights',
                    plot_title= 'Stays in Week Nights - Resort');

### Review - `Stays_in_Week_Nights`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## Adults

---

**City**

---

In [None]:
## Reviewing details for city - 'adults'
eda.explore_feature(subgroup_city,'adults',
                    bins = 3,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Number of Adults',
                    plot_title= 'Adults - City');

---

**Resort**

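---

In [None]:
## Reviewing details for resort - 'adults'
eda.explore_feature(subgroup_resort,'adults',
                    bins = 3,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Number of Adults',
                    plot_title= 'Adults - Resort');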

### Review - `Adults`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## Children

---

**City**

---

In [None]:
## Reviewing details for city - 'children'
eda.explore_feature(subgroup_city,'children',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Number of Children',
                    plot_title= 'Children - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'children'
eda.explore_feature(subgroup_resort,'children',
                    bins = 3,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Number of Children',
                    plot_title= 'Children - Resort');

### Review - `Children`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## babies

---

**City**

---

In [None]:
## Reviewing details for city - 'babies'
eda.explore_feature(subgroup_city,'babies',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Number of Babies',
                    plot_title= 'Babies - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'babies'
eda.explore_feature(subgroup_resort,'babies',
                    bins = 3,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Number of Babies',
                    plot_title= 'Babies - Resort');

### Review - `Babies`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## meal

---

**City**

---

In [None]:
## Share of each meal type among non-canceled (is_canceled == 0) city reservations
subgroup_city.loc[subgroup_city['is_canceled'] == 0, 'meal'].value_counts(dropna=False, normalize=True, sort=False).sort_index()

In [None]:
## Share of each meal type among canceled (is_canceled == 1) city reservations
subgroup_city.loc[subgroup_city['is_canceled'] == 1, 'meal'].value_counts(dropna=False, normalize=True, sort=False).sort_index()

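The two `value_counts` cells above can also be viewed side by side with `pd.crosstab`; with `normalize='columns'`, each `is_canceled` column sums to 1. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical meal codes and cancellation flags for illustration only.
df = pd.DataFrame({
    "meal": ["BB", "BB", "HB", "SC", "BB", "HB"],
    "is_canceled": [0, 0, 0, 1, 1, 1],
})

# Rows: meal type; columns: is_canceled (0/1); each column normalized to sum to 1.
shares = pd.crosstab(df["meal"], df["is_canceled"], normalize="columns")
print(shares)
```

Note that `crosstab` excludes rows with a missing `meal` value, whereas `value_counts(dropna=False)` keeps them as their own category.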
In [None]:
## Reviewing details for city - 'meal'
eda.explore_feature(subgroup_city,'meal',
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Meal Type',
                    plot_title= 'Meal - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'meal'
test = eda.explore_feature(subgroup_resort,'meal',
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Meal Type',
                    plot_title= 'Meal - Resort')

### Review - `Meal`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## country

---

**City**

---

In [None]:
## Reviewing details for city - 'country'
eda.explore_feature(subgroup_city,'country',
                    plot_type='histogram',
                    marginal_x = 'box',
                    normalize=False,
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Country',
                    plot_title= 'Country - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'country'
eda.explore_feature(subgroup_resort,'country',
                    plot_type='histogram',
                    normalize=False,
                    marginal_x='box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Country',
                    plot_title= 'Country - Resort');

### Review - `Country`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## market_segment

---

**City**

---

In [None]:
## Reviewing details for city - 'market_segment'
eda.explore_feature(subgroup_city,'market_segment',
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Market Segment',
                    plot_title= 'Market Segment - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'market_segment'
eda.explore_feature(subgroup_resort,'market_segment',
                    normalize=False,
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Market Segment',
                    plot_title= 'Market Segment - Resort');

### Review - `Market_Segment`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## distribution_channel

---

**City**

---

In [None]:
## Reviewing details for city - 'distribution_channel'
eda.explore_feature(subgroup_city,'distribution_channel',
                    plot_type='histogram',
                    normalize=False,
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Distribution Channel',
                    plot_title= 'Distribution Channel - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'distribution_channel'
eda.explore_feature(subgroup_resort,'distribution_channel',
                    plot_type='histogram',
                    normalize=False,
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Distribution Channel',
                    plot_title= 'Distribution Channel - Resort');

### Review - `Distribution_Channel`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## is_repeated_guest

---

**City**

---

In [None]:
## Reviewing details for city - 'is_repeated_guest'
eda.explore_feature(subgroup_city,'is_repeated_guest',
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Repeat Guest',
                    plot_title= 'Repeat Guest - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'is_repeated_guest'
eda.explore_feature(subgroup_resort,'is_repeated_guest',
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Repeat Guest',
                    plot_title= 'Repeat Guest - Resort');

### Review - `Is_Repeated_Guest`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## previous_cancellations

---

**City**

---

In [None]:
## Reviewing details for city - 'previous_cancellations'
eda.explore_feature(subgroup_city,'previous_cancellations',
                    bins = 5,
                    normalize=False,
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Number of Cancellations',
                    plot_title= 'Previous Cancellations - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'previous_cancellations'
eda.explore_feature(subgroup_resort,'previous_cancellations',
                    bins = 4,
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Previous Cancellations',
                    plot_title= 'Previous Cancellations - Resort');

### Review - `Previous_Cancellations`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## previous_bookings_not_canceled

---

**City**

---

In [None]:
## Reviewing details for city - 'previous_bookings_not_canceled'
eda.explore_feature(subgroup_city,'previous_bookings_not_canceled',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Number of Bookings Not Canceled',
                    plot_title= 'Previous Bookings Not Canceled - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'previous_bookings_not_canceled'
eda.explore_feature(subgroup_resort,'previous_bookings_not_canceled',
                    bins = 5,
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Previous Bookings Not Canceled',
                    plot_title= 'Previous Bookings Not Canceled - Resort');

### Review - `Previous_Bookings_Not_Canceled`

---

**Feature Review**

> PLACEHOLDER

**Actions**

> PLACEHOLDER

**City vs. Resort**

> PLACEHOLDER

---

## reserved_room_type

---

**City**

---

In [None]:
## Reviewing details for city - 'reserved_room_type'
eda.explore_feature(subgroup_city,'reserved_room_type',
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Reserved Room Type',
                    plot_title= 'Reserved Room Type - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'reserved_room_type'
eda.explore_feature(subgroup_resort,'reserved_room_type',
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Reserved Room Type',
                    plot_title= 'Reserved Room Type - Resort');

### Review - `Reserved_Room_Type`

---

> TEXT 
>
> TEXT

---

## assigned_room_type

---

**City**

---

In [None]:
## Reviewing details for city - 'assigned_room_type'
eda.explore_feature(subgroup_city,'assigned_room_type',
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Assigned Room Type',
                    plot_title= 'Assigned Room Type - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'assigned_room_type'
eda.explore_feature(subgroup_resort,'assigned_room_type',
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Assigned Room Type',
                    plot_title= 'Assigned Room Type - Resort');

### Review - `Assigned_Room_Type`

---

> TEXT 
>
> TEXT

---

## booking_changes

---

**City**

---

In [None]:
## Reviewing details for city - 'booking_changes'
eda.explore_feature(subgroup_city,'booking_changes',
                    bins = 5,
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Booking Changes',
                    plot_title= 'Booking Changes - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'booking_changes'
eda.explore_feature(subgroup_resort,'booking_changes',
                    bins = 5,
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Booking Changes',
                    plot_title= 'Booking Changes - Resort');

### Review - `Booking_Changes`

---

> TEXT 
>
> TEXT

---

## deposit_type

---

**City**

---

In [None]:
## Reviewing details for city - 'deposit_type'
eda.explore_feature(subgroup_city,'deposit_type',
                    plot_type='histogram',
                    normalize=False,
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Deposit Type',
                    plot_title= 'Deposit Type - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'deposit_type'
eda.explore_feature(subgroup_resort,'deposit_type',
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Deposit Type',
                    plot_title= 'Deposit Type - Resort');

### Review - `Deposit_Type`

---

> TEXT 
>
> TEXT

---

## agent

---

**City**

---

In [None]:
## Reviewing details for city - 'agent'
eda.explore_feature(subgroup_city,'agent',
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Booking Agent',
                    plot_title= 'Agent - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'agent'
eda.explore_feature(subgroup_resort,'agent',
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Booking Agent',
                    plot_title= 'Agent - Resort');

### Review - `Agent`

---

> TEXT 
>
> TEXT

---

## days_in_waiting_list

---

**City**

---

In [None]:
## Reviewing details for city - 'days_in_waiting_list'
eda.explore_feature(subgroup_city,'days_in_waiting_list',
                    bins = 5,
                    normalize=False,
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Days in Waiting List',
                    plot_title= 'Days in Waiting List - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'days_in_waiting_list'
eda.explore_feature(subgroup_resort,'days_in_waiting_list',
                    bins = 5,
                    normalize=False,
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Days in Waiting List',
                    plot_title= 'Days in Waiting List - Resort');

### Review - `Days_in_Waiting_List`

---

> TEXT 
>
> TEXT

---

## customer_type

---

**City**

---

In [None]:
## Reviewing details for city - 'customer_type'
eda.explore_feature(subgroup_city,'customer_type',
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Customer Type',
                    plot_title= 'Customer Type - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'customer_type'
eda.explore_feature(subgroup_resort,'customer_type',
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Customer Type',
                    plot_title= 'Customer Type - Resort');

### Review - `Customer_Type`

---

> TEXT 
>
> TEXT

---

## adr

---

**City**

---

In [None]:
## Reviewing details for city - 'adr'
eda.explore_feature(subgroup_city,'adr',
                    bins = 5,
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='ADR (€)',
                    plot_title= 'ADR (€) - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'adr'
eda.explore_feature(subgroup_resort,'adr',
                    bins = 5,
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='ADR (€)',
                    plot_title= 'ADR (€) - Resort');

### Review - `ADR`

---

> TEXT 
>
> TEXT

---

## required_car_parking_spaces

---

**City**

---

In [None]:
## Reviewing details for city - 'required_car_parking_spaces'
eda.explore_feature(subgroup_city,'required_car_parking_spaces',
                    bins = 5,
                    normalize=False,
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Required Car Parking Spaces',
                    plot_title= 'Required Car Parking Spaces - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'required_car_parking_spaces'
eda.explore_feature(subgroup_resort,'required_car_parking_spaces',
                    bins = 5,
                    plot_type='histogram',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Required Car Parking Spaces',
                    plot_title= 'Required Car Parking Spaces - Resort');

### Review - `Required_Car_Parking_Spaces`

---

> TEXT 
>
> TEXT

---

## total_of_special_requests

---

**City**

---

In [None]:
## Reviewing details for city - 'total_of_special_requests'
eda.explore_feature(subgroup_city,'total_of_special_requests',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Total of Special Requests',
                    plot_title= 'Total of Special Requests - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'total_of_special_requests'
eda.explore_feature(subgroup_resort,'total_of_special_requests',
                    bins = 5,
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Total of Special Requests',
                    plot_title= 'Total of Special Requests - Resort');

### Review - `Total_of_Special_Requests`

---

> TEXT 
>
> TEXT

---

## reservation_status_date

---

**City**

---

In [None]:
## Reviewing details for city - 'reservation_status_date'
eda.explore_feature(subgroup_city,'reservation_status_date',
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Reservation Status Date',
                    plot_title= 'Reservation Status Date - City');

---

**Resort**

---

In [None]:
## Reviewing details for resort - 'reservation_status_date'
eda.explore_feature(subgroup_resort,'reservation_status_date',
                    plot_type='histogram',
                    marginal_x = 'box',
                    width= 800, height=600,
                    target_feature='is_canceled',
                    plot_label ='Reservation Status Date',
                    plot_title= 'Reservation Status Date - Resort');

### Review - `Reservation_Status_Date`

---

> TEXT 
>
> TEXT

---

# Post-EDA 

---

> I have now reviewed every feature, confirmed that there are no missing values, and confirmed that all datatypes are correct.
---
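The missing-value and datatype checks summarized in the Post-EDA note can be scripted as a quick per-column audit. A sketch with a hypothetical helper `audit_frame` and a toy frame standing in for `subgroup_city`/`subgroup_resort`:

```python
import pandas as pd


def audit_frame(df):
    """Hypothetical helper: per-column dtype and missing-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
    })


# Toy frame for illustration only.
df = pd.DataFrame({
    "lead_time": [10, 200, 3],
    "is_canceled": [0, 1, 0],
    "arrival_date": pd.to_datetime(["2015-07-01", "2016-02-29", "2017-08-31"]),
})

report = audit_frame(df)
print(report)

# Fail loudly if any column still has missing values.
assert report["n_missing"].sum() == 0, "unexpected missing values"
```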