# ❌ **Cancel Culture** ❌ - **EDA Notebook**

**Who?**
>* 🏢 **Revenue Management (RM) teams** for hotel groups (corporate, franchise)
>
>
>* 🏨 On-site GMs, Sales, and Ops teams

**Why?**
>* 💰 **Revenue Management:** 
>  * Revenue optimization: Right price, right time, right customer
>    * Dynamic pricing
>    * Distribution channels
>    * Pricing per room type
>
>
>* 🤝 **Sales:**
>  * Group sales (pickup/wash)
>  * BT (performance/company for both GPP and LNR rates)
>
>
>* 🛌 **Rooms Ops:**
>  * Forecasting occupancy, arrivals, departures, stay-overs, same-day booking demand, and probability of guest relocation in the case of oversell.
>  * Determining staff schedules and periods of high demand
>
>
>* 🍰 ☕ **Food and Beverage:**
>  * Ordering food/supplies overall
>  * Scheduling staff
>  * Determining busy times (breakfast, lunch, dinner)
>    * Staffing, specific food/supplies

**What?**
>* 🧾 The dataset comprises:
>  * 32 different features
>    * Detailed explanation of features (and sub-categories, when appropriate) available in Readme
>  * Nearly 120,000 reservation records
>  * Source cited in Readme

❌ **How?**
>* Which models/methods? 
>* Data prep and feature engineering

---

> **Goal:** To prepare data for time series modeling and forecasting in the next notebook.
>
>
> **Purpose:** To explore, clean, and organize the data.
>
>
> **Process:**
>
>    * Inspecting data integrity and statistics
>    * Splitting data by hotel type ("City" vs. "Resort")
>    * Filling any missing values
>    * 
>    * Saving processed data for the modeling notebook
>
>
> **Modeling Notebook:**
>
>    * Performing train/test split
>    * 
>    * Training the model
>    * 
>    * Evaluating performance metrics
>    * Providing final recommendations

---

# ✅ **To-Do List**

---

**Copy:**
- [ ] Imports
- [ ] Personal module
- [ ] Data
- [ ] Starter code from P4P

**Links:**
- [ ] 

---

# 📦 **Import Packages**

In [None]:
## Data Handling
import pandas as pd
import numpy as np
from scipy import stats

## Visualizations
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

## Modeling - SKLearn
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, OneHotEncoder, FunctionTransformer
from sklearn.model_selection import train_test_split, cross_validate, cross_val_score
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.dummy import DummyClassifier
from sklearn import set_config
set_config(display='diagram')


## Custom-made Functions
from bmc_functions import eda
from bmc_functions import classification as clf

## Settings
## Note: matplotlib 3.6+ renamed this style to 'seaborn-v0_8-talk'
plt.style.use('seaborn-talk')
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: f'{x:,.2f}')
pd.set_option('display.max_rows', 100)
%matplotlib inline

In [None]:
%load_ext autoreload
%autoreload 2

# 📥 **Read Data**

In [None]:
## Reading data
source = './data/hotel_bookings.csv'
data = pd.read_csv(source)
data

# 🔎 **EDA** 🔍

In [None]:
## Inspecting percentage of city vs. resort hotels
data['hotel'].value_counts(1)
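
The positional `1` in `value_counts(1)` fills the `normalize` parameter, so the cell above returns each hotel type's share of rows rather than raw counts. A minimal sketch with the keyword spelled out, using a hypothetical toy Series in place of `data['hotel']`:

```python
import pandas as pd

# Hypothetical toy column standing in for data['hotel']
hotel = pd.Series(['City Hotel', 'City Hotel', 'City Hotel', 'Resort Hotel'])

# normalize=True returns each class's share of rows instead of raw counts
shares = hotel.value_counts(normalize=True)
print(shares)
```

Spelling out `normalize=True` is slightly longer but makes the intent clearer to readers who don't know the parameter order.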

## Splitting "City" and "Resort" 

In [None]:
## Creating subgroup for city hotels (.copy() avoids SettingWithCopyWarning on later edits)
subgroup_city = data[data['hotel'] == 'City Hotel'].copy()
subgroup_city

In [None]:
## Creating subgroup for resort hotels (.copy() avoids SettingWithCopyWarning on later edits)
subgroup_resort = data[data['hotel'] == 'Resort Hotel'].copy()
subgroup_resort
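
Because every later step is repeated for "City" and "Resort", one option is to keep both subgroups in a dictionary and loop over them. The sketch below assumes a hypothetical toy frame (`toy`) and a hypothetical `subgroups` name; it is one possible layout, not the notebook's actual structure.

```python
import pandas as pd

# Hypothetical toy frame standing in for the bookings data
toy = pd.DataFrame({
    'hotel': ['City Hotel', 'Resort Hotel', 'City Hotel'],
    'lead_time': [10, 45, 3],
})

# One independent copy per hotel type, keyed by the value in 'hotel'
subgroups = {name: group.copy() for name, group in toy.groupby('hotel')}

# Each copy can be modified in place without touching `toy`
for name, group in subgroups.items():
    group.drop(columns=['hotel'], inplace=True)

print({name: group.shape for name, group in subgroups.items()})
```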

## Reviewing Statistics

### City

In [None]:
eda.report_df(subgroup_city).sort_values('null_sum', ascending=False)

In [None]:
# Dropping "company" column (95% missing values) and "hotel" column (only 1 value)
subgroup_city = subgroup_city.drop(columns=['company', 'hotel'])
subgroup_city

In [None]:
subgroup_city.columns

### Resort

In [None]:
eda.report_df(subgroup_resort).sort_values('null_sum', ascending=False)

In [None]:
# Dropping "company" column (95% missing values) and "hotel" column (only 1 value)
subgroup_resort = subgroup_resort.drop(columns=['company', 'hotel'])
subgroup_resort

In [None]:
subgroup_resort.columns
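
`eda.report_df` comes from a personal module, so its internals are not shown here. As a rough stand-in, the sketch below builds a comparable null report with plain pandas and then fills the gaps. The `null_pct` column, the toy values, and the fill rules are all assumptions for illustration, not the notebook's final choices.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the column names mirror the bookings data
toy = pd.DataFrame({
    'children': [0.0, np.nan, 2.0, 0.0],
    'country': ['PRT', None, 'GBR', 'PRT'],
    'agent': [9.0, np.nan, np.nan, 240.0],
})

# Per-column null counts and shares, most incomplete columns first
null_report = pd.DataFrame({
    'null_sum': toy.isna().sum(),
    'null_pct': toy.isna().mean(),
}).sort_values('null_sum', ascending=False)
print(null_report)

# Illustrative fill rules (assumptions): 0 for numeric IDs/counts,
# an explicit 'Unknown' label for the categorical column
filled = toy.fillna({'children': 0, 'country': 'Unknown', 'agent': 0})
print(filled.isna().sum())
```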

# **EDA - Features**

## reservation_status

### City

In [None]:
subgroup_city['reservation_status'].value_counts(1, dropna=False)

### Resort

In [None]:
subgroup_resort['reservation_status'].value_counts(1, dropna=False)

### ❌ Binarizing - New Feature

#### City

In [None]:
## Changing no-show values to "canceled"
subgroup_city['reservation_status'] = (
    subgroup_city['reservation_status'].replace('No-Show', 'Canceled'))

In [None]:
## Confirming the change (`in` on a Series checks the index, so compare values instead)
(subgroup_city['reservation_status'] == 'No-Show').any()

In [None]:
## Inspecting the updated target classes
subgroup_city['reservation_status'].value_counts(1, dropna=False)

In [None]:
cond = [subgroup_city['reservation_status'] == 'Check-Out',
       subgroup_city['reservation_status'] == 'Canceled']

choice = [0, 1]

subgroup_city['res_status_binary'] = np.select(cond, choice, 0)
subgroup_city['res_status_binary']

In [None]:
subgroup_city['res_status_binary'].value_counts(1)

#### Resort

In [None]:
## Changing no-show values to "canceled"
subgroup_resort['reservation_status'] = (
    subgroup_resort['reservation_status'].replace('No-Show', 'Canceled'))

In [None]:
## Confirming the change (`in` on a Series checks the index, so compare values instead)
(subgroup_resort['reservation_status'] == 'No-Show').any()

In [None]:
## Inspecting the updated target classes
subgroup_resort['reservation_status'].value_counts(1, dropna=False)

In [None]:
cond = [subgroup_resort['reservation_status'] == 'Check-Out',
       subgroup_resort['reservation_status'] == 'Canceled']

choice = [0, 1]

subgroup_resort['res_status_binary'] = np.select(cond, choice, 0)
subgroup_resort['res_status_binary']

In [None]:
subgroup_resort['res_status_binary'].value_counts(1)
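
The two-step binarizing above (relabel no-shows, then map to 0/1) can also be written as a single membership test. The sketch below uses a hypothetical toy frame and assumes that `is_canceled` already counts no-shows as cancellations; the final comparison is there to make that assumption checkable on the real data.

```python
import pandas as pd

# Hypothetical toy frame with the three statuses found in the raw data
toy = pd.DataFrame({
    'reservation_status': ['Check-Out', 'Canceled', 'No-Show', 'Check-Out'],
    'is_canceled': [0, 1, 1, 0],
})

# 1 for any booking that did not end in a stay, 0 for a completed stay
toy['res_status_binary'] = (
    toy['reservation_status'].isin(['Canceled', 'No-Show']).astype(int)
)

# Sanity check: does the new feature agree with the existing flag?
matches = (toy['res_status_binary'] == toy['is_canceled']).all()
print(matches)
```

If the check holds on the full data, `res_status_binary` duplicates `is_canceled` and only one of the two needs to be kept.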

## is_canceled

### City

In [1]:
subgroup_city['is_canceled'].value_counts(1, dropna=False)

In [2]:
## Visualizing results
fig, ax = plt.subplots()
## Sorting by index so 0 (not canceled) is always plotted before 1 (canceled)
subgroup_city['is_canceled'].value_counts(1, dropna=False).sort_index().plot(
    kind='barh', ax=ax)
ax.set_yticklabels(['Not Canceled', 'Canceled'])
ax.set_xlabel('Proportion')
ax.set_ylabel('Status')
ax.set_title('Reservation Statuses');

### Resort

In [None]:
subgroup_resort['is_canceled'].value_counts(1, dropna=False)

In [None]:
## Visualizing results
fig, ax = plt.subplots()
## Sorting by index so 0 (not canceled) is always plotted before 1 (canceled)
subgroup_resort['is_canceled'].value_counts(1, dropna=False).sort_index().plot(
    kind='barh', ax=ax)
ax.set_yticklabels(['Not Canceled', 'Canceled'])
ax.set_xlabel('Proportion')
ax.set_ylabel('Status')
ax.set_title('Reservation Statuses');

## lead_time - Fix legend labels!

### City

In [None]:
subgroup_city['lead_time'].describe()

In [None]:
# fig, ax = plt.subplots(figsize=(5,10))
# subgroup_city['lead_time'].plot(kind='box', ax=ax)

# ax.set_title('Lead Time');

In [None]:
fig = px.box(subgroup_city, y='lead_time', title='Overview of Lead Time',
             width=600, color='reservation_status',
             labels={'lead_time': 'Lead Time (Days)',
                     'reservation_status': 'Reservation Status'},
             category_orders={'reservation_status': ['Check-Out', 'Canceled']})

fig.show()

### Resort

In [None]:
subgroup_resort['lead_time'].describe()

In [None]:
# fig, ax = plt.subplots(figsize=(5,10))
# subgroup_resort['lead_time'].plot(kind='box', ax=ax)

# ax.set_title('Lead Time');

In [None]:
fig = px.box(subgroup_resort, y='lead_time', title='Overview of Lead Time',
             width=600, color='reservation_status',
             labels={'lead_time': 'Lead Time (Days)',
                     'reservation_status': 'Reservation Status'},
             category_orders={'reservation_status': ['Check-Out', 'Canceled']})

fig.show()
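
The box plots compare lead times by reservation status visually. A numeric companion can be produced with a grouped summary, sketched below on a hypothetical toy frame (the values are invented for illustration only).

```python
import pandas as pd

# Hypothetical toy frame; values are illustrative only
toy = pd.DataFrame({
    'reservation_status': ['Check-Out', 'Canceled', 'Check-Out', 'Canceled'],
    'lead_time': [5, 120, 30, 200],
})

# Count, median, mean, and max lead time for each reservation status
lead_time_summary = (
    toy.groupby('reservation_status')['lead_time']
       .agg(['count', 'median', 'mean', 'max'])
)
print(lead_time_summary)
```

The median is included alongside the mean because lead times tend to be right-skewed, which is the same reason a box plot is a reasonable first view.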

## Arrival Date as Full Datetime

### City

In [None]:
## Converting from month, day of month, and year to a single datetime column
## An explicit format (e.g. "July 1, 2015") avoids relying on format inference
subgroup_city['arrival_date'] = pd.to_datetime(
    subgroup_city['arrival_date_month'] + ' '
    + subgroup_city['arrival_date_day_of_month'].astype(str) + ', '
    + subgroup_city['arrival_date_year'].astype(str),
    format='%B %d, %Y')
subgroup_city['arrival_date']

In [None]:
fig = px.box(subgroup_city, y='lead_time', title='Lead Times',
             width=600, color='reservation_status',
             labels={'lead_time': 'Lead Time (Days)',
                     'reservation_status': 'Reservation Status'},
             category_orders={'reservation_status': ['Check-Out', 'Canceled']})

fig.show()

In [None]:
fig = px.histogram(subgroup_city,'lead_time', marginal = 'box',
                   color='reservation_status',
                   labels={'lead_time': 'Lead Time (Days)'}, 
                   title="Lead Times", nbins=40)
fig.update_layout(bargap=0.2)
fig.show()

In [None]:
fig = px.scatter(subgroup_city, x="reservation_status", y="lead_time",
                 marginal_y="box",marginal_x="histogram",
                 color='reservation_status')
fig.show()

In [None]:
# ## Try again with select features as dimensions
# fig = px.scatter_matrix(subgroup_city,
# #     dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"],
#     color="reservation_status")
# fig.show()

### Resort

In [None]:
## Converting from month, day of month, and year to a single datetime column
## An explicit format (e.g. "July 1, 2015") avoids relying on format inference
subgroup_resort['arrival_date'] = pd.to_datetime(
    subgroup_resort['arrival_date_month'] + ' '
    + subgroup_resort['arrival_date_day_of_month'].astype(str) + ', '
    + subgroup_resort['arrival_date_year'].astype(str),
    format='%B %d, %Y')
subgroup_resort['arrival_date']

In [None]:
fig = px.box(subgroup_resort, y='lead_time', title='Lead Times',
             width=600, color='reservation_status',
             labels={'lead_time': 'Lead Time (Days)',
                     'reservation_status': 'Reservation Status'},
             category_orders={'reservation_status': ['Check-Out', 'Canceled']})

fig.show()

In [None]:
fig = px.histogram(subgroup_resort,'lead_time', marginal = 'box',
                   color='reservation_status',
                   labels={'lead_time': 'Lead Time (Days)'}, 
                   title="Lead Times", nbins=40)
fig.update_layout(bargap=0.2)
fig.show()

In [None]:
fig = px.scatter(subgroup_resort, x="reservation_status", y="lead_time",
                 marginal_y="box",marginal_x="histogram",
                 color='reservation_status')
fig.show()

In [None]:
# ## Try again with select features as dimensions
# fig = px.scatter_matrix(subgroup_resort,
# #     dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"],
#     color="reservation_status")
# fig.show()

## stays_in_weekend_nights

In [None]:
subgroup_city['stays_in_weekend_nights'].value_counts(1)

In [None]:
fig = px.histogram(subgroup_city,'stays_in_weekend_nights', marginal = 'box',
                   labels={'stays_in_weekend_nights': 'Number of Weekend Nights'}, 
                   title="Weekend Stays", color='reservation_status', nbins=15)
fig.update_layout(bargap=0.2)
fig.show()

## stays_in_week_nights

In [None]:
subgroup_city['stays_in_week_nights'].value_counts(1)

In [None]:
subgroup_city['stays_in_week_nights'].value_counts(1)[:6]

In [None]:
fig = px.histogram(subgroup_city,'stays_in_week_nights', marginal = 'box',
                   labels={'stays_in_week_nights': 'Number of Week Nights'}, 
                   title="Weekday Stays", color='reservation_status', nbins=40)
fig.update_layout(bargap=0.2)
fig.show()

## adults

In [None]:
subgroup_city['adults'].value_counts(1, dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'adults',
                   labels={'adults': 'Number of Adults'},
                   title="Number of Adults", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## children

In [None]:
subgroup_city['children'].value_counts(1, dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'children', marginal = 'box',
                   labels={'children': 'Number of Children'}, 
                   title="Number of Children", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## babies

In [None]:
subgroup_city['babies'].value_counts(dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'babies', marginal = 'box',
                   labels={'babies': 'Number of Babies'}, 
                   title="Number of Babies", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## meal

In [None]:
subgroup_city['meal'].value_counts(1, dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'meal',labels={'meal': 'Types of Meals'}, 
                   title="Dining with Us?", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## country

In [None]:
subgroup_city['country'].value_counts(1, dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'country',
                   labels={'country': 'Country of Origin'}, 
                   title="Where's Home?", color='reservation_status', nbins=15)
fig.update_layout(bargap=0.2)
fig.show()

## market_segment

In [None]:
subgroup_city['market_segment'].value_counts(1, dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'market_segment',
                   labels={'market_segment': 'Market Segment'}, 
                   title="Segmentation", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## distribution_channel

In [None]:
subgroup_city['distribution_channel'].value_counts(1, dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'distribution_channel',
                   labels={'distribution_channel': 'Channel'}, 
                   title="Distribution Channels", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## is_repeated_guest

In [None]:
subgroup_city['is_repeated_guest'].value_counts(1, dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'is_repeated_guest',
                   labels={'is_repeated_guest': 'Repeat Status'}, 
                   title="Welcome Back!", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## previous_cancellations

In [None]:
subgroup_city['previous_cancellations'].describe()

In [None]:
fig = px.histogram(subgroup_city,'previous_cancellations',
                   labels={'previous_cancellations': 'Number of Cancellations'}, 
                   title="Previous Cancellations", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## previous_bookings_not_canceled

In [None]:
subgroup_city['previous_bookings_not_canceled'].describe()

In [None]:
fig = px.histogram(subgroup_city,'previous_bookings_not_canceled', marginal = 'box',
                   labels={'previous_bookings_not_canceled': 'Number of Prior Stays'}, 
                   title="Prior Stays", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## reserved_room_type

In [None]:
subgroup_city['reserved_room_type'].value_counts(1, dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'reserved_room_type',
                   labels={'reserved_room_type': 'Room Type'}, 
                   title="Reserved Room Types", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## assigned_room_type

In [None]:
subgroup_city['assigned_room_type'].value_counts(1, dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'assigned_room_type',
                   labels={'assigned_room_type': 'Assigned Room Type'}, 
                   title="Assigned Room Types", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## booking_changes

In [None]:
subgroup_city['booking_changes'].value_counts(1)

In [None]:
fig = px.histogram(subgroup_city,'booking_changes', marginal = 'box',
                   labels={'booking_changes': 'Number of Changes'}, 
                   title="Booking Changes", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## deposit_type

In [None]:
subgroup_city['deposit_type'].value_counts(1, dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'deposit_type',
                   labels={'deposit_type': 'Type'}, 
                   title="Deposit Types", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()
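
The stacked histograms for categorical features show booking counts per status, which makes cancellation rates hard to compare when one category dominates. A grouped rate table is one way to complement them. The sketch below uses a hypothetical toy frame with invented values and relies on `is_canceled` being a 0/1 flag, so its mean is the cancellation rate.

```python
import pandas as pd

# Hypothetical toy frame; category labels and values are illustrative only
toy = pd.DataFrame({
    'deposit_type': ['No Deposit', 'Non Refund', 'No Deposit',
                     'Non Refund', 'No Deposit'],
    'is_canceled': [0, 1, 0, 1, 1],
})

# Booking volume and cancellation rate per category, highest rate first
cancel_by_deposit = (
    toy.groupby('deposit_type')['is_canceled']
       .agg(bookings='count', cancel_rate='mean')
       .sort_values('cancel_rate', ascending=False)
)
print(cancel_by_deposit)
```

The same pattern should carry over to the other categorical features (for example `market_segment` or `customer_type`) by swapping the grouping column.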

## agent

In [None]:
subgroup_city['agent'].value_counts(1, dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'agent', marginal = 'box',
                   labels={'agent': 'Booking Agent ID Number'}, 
                   title="Bookings per Agent", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## days_in_waiting_list

In [None]:
subgroup_city['days_in_waiting_list'].describe()

In [None]:
fig = px.histogram(subgroup_city,'days_in_waiting_list', marginal = 'box',
                   labels={'days_in_waiting_list': 'Number of Days'}, 
                   title="Days on Waiting List", color='reservation_status', nbins=15)
fig.update_layout(bargap=0.2)
fig.show()

## customer_type

In [None]:
subgroup_city['customer_type'].value_counts(1, dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'customer_type',
                   labels={'customer_type': 'Reservation Type'}, 
                   title="Reservation Types", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## adr

In [None]:
subgroup_city['adr'].describe()

In [None]:
fig = px.histogram(subgroup_city,'adr', marginal = 'box',
                   labels={'adr': 'Rate'}, title="Average Daily Rate (ADR)",
                   color='reservation_status', nbins=15)
fig.update_layout(bargap=0.2)
fig.show()

## required_car_parking_spaces

In [None]:
subgroup_city['required_car_parking_spaces'].value_counts(1, dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'required_car_parking_spaces',
                   labels={'required_car_parking_spaces': 'Number of Cars'}, 
                   title="Number of Cars", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## total_of_special_requests

In [None]:
subgroup_city['total_of_special_requests'].value_counts(1, dropna=False)

In [None]:
fig = px.histogram(subgroup_city,'total_of_special_requests',
                   labels={'total_of_special_requests': 'Number of Requests'}, 
                   title="Number of Special Requests", color='reservation_status')
fig.update_layout(bargap=0.2)
fig.show()

## reservation_status_date

In [None]:
subgroup_city['reservation_status_date']

# 📅 **Setting Datetime Index**

In [None]:
city_ts = subgroup_city.set_index('arrival_date')
city_ts

In [None]:
resort_ts = subgroup_resort.set_index('arrival_date')
resort_ts
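
With a datetime index in place, the remaining preparation for time series work is usually to sort the index, aggregate to a regular frequency, and save the result. The sketch below walks through those steps on a hypothetical toy frame. The daily frequency, the `bookings`/`cancellations` column names, and the in-memory buffer (used here instead of a real file path) are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical toy frame, deliberately out of date order
toy = pd.DataFrame({
    'arrival_date': pd.to_datetime(
        ['2015-07-03', '2015-07-01', '2015-07-01', '2015-07-02']),
    'is_canceled': [1, 0, 1, 0],
})

# Datetime index, sorted so resampling and slicing behave predictably
toy_ts = toy.set_index('arrival_date').sort_index()

# Daily totals: number of bookings and number of cancellations
daily = toy_ts['is_canceled'].resample('D').agg(['count', 'sum'])
daily.columns = ['bookings', 'cancellations']
print(daily)

# Round trip through CSV; in the notebook, a file path would replace the buffer
buffer = io.StringIO()
daily.to_csv(buffer)
restored = pd.read_csv(io.StringIO(buffer.getvalue()),
                       index_col='arrival_date', parse_dates=True)
```

Passing `parse_dates=True` on reload restores the datetime index, so the modeling notebook can start from the saved file without repeating the conversion.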