# Effect of asking for a deposit on booking cancellation

### Objective
The primary goal of this project is to estimate the causal effect of requiring a deposit on hotel booking cancellations. Understanding how a deposit requirement changes the likelihood of cancellation can give hoteliers actionable guidance for their booking and cancellation policies. The analysis targets both the overall (average) effect and how that effect varies across customer segments and booking conditions.

### Background
Deposit policies are an important part of hotel reservation systems: they reduce the risk of cancellations and help stabilize revenue. They can, however, cut both ways. A deposit may lower the rate of last-minute cancellations, but the upfront cost may also deter potential customers from booking at all. The balance between these outcomes depends on factors such as customer behavior, market segment, and the competitive landscape of the hotel industry. Given this complexity, a careful understanding of how deposit requirements affect cancellations is essential for designing policies that serve both customer satisfaction and hotel revenue.

### Strategy for estimating the effect of asking for a deposit

#### Assumption 
- I use meta-learner models, which requires a very strong assumption: there are no unobserved confounders, i.e. no unmeasured factors that affect both the treatment (whether a deposit is required) and the outcome (whether the booking is canceled).
- This is unlikely to hold exactly in practice. However, for the purposes of this project, I proceed under this assumption.
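
Formally, with $Y(1)$ and $Y(0)$ the potential cancellation outcomes with and without a deposit requirement, $T$ the treatment indicator, and $X$ the observed booking features (notation chosen here for illustration), the no-unobserved-confounding assumption reads as the first line below. Together with overlap (every kind of booking has a nonzero chance of landing in either group), it lets the conditional average treatment effect (CATE) be written as a difference of two observable conditional cancellation rates, $\mu_1(x)$ and $\mu_0(x)$, which are the quantities meta-learners estimate:

$$
\{Y(0), Y(1)\} \perp\!\!\!\perp T \mid X
$$

$$
\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x] = \underbrace{\mathbb{E}[Y \mid T = 1, X = x]}_{\mu_1(x)} - \underbrace{\mathbb{E}[Y \mid T = 0, X = x]}_{\mu_0(x)}
$$

If an unobserved factor drives both $T$ and $Y$, the second equality fails and the estimated $\tau(x)$ is biased.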
  

#### Choice Between S-Learner and T-Learner
- I chose the T-Learner. Why?

Reservations requiring a deposit (treatment group) are far less common than those that do not (control group), and this imbalance drives the choice of the T-Learner. The S-Learner fits a single model in which the treatment is just one feature among many; with few treated rows, and especially with regularized or tree-based base learners, that model can underuse the treatment feature and shrink the estimated effect toward zero. The T-Learner instead fits a separate outcome model for each group, so the treatment cannot be ignored, and the estimated effect for a booking is the difference between the two models' predicted cancellation probabilities. The trade-off is that the treatment-group model is trained on far fewer rows, which makes its predictions noisier. Because it produces booking-level effect estimates, the T-Learner supports the segment-level analysis this project needs for targeted policy recommendations.
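The T-Learner logic can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the model used in this notebook: the base learner is a hypothetical per-cell mean of the outcome (the cancellation rate within each combination of `features`), chosen only so the example runs with pandas alone. In a real analysis each group would get a proper classifier, since with many covariates most cells would be empty. The function name `t_learner_cate` and its parameters are illustrative.

```python
import pandas as pd


def t_learner_cate(df, features, treatment, outcome):
    """Sketch of a T-Learner with per-cell outcome means as the base learner.

    The cell-mean "model" is an illustrative stand-in; any classifier that
    predicts P(outcome = 1 | features) could be fitted per group instead.
    Returns one CATE estimate per row of df (mu1(x) - mu0(x)).
    """
    control = df[df[treatment] == 0]
    treated = df[df[treatment] == 1]

    # Model for the control group: mean outcome per covariate cell.
    mu0 = control.groupby(features)[outcome].mean().rename("mu0")
    # Separate model for the treated group, fitted only on treated rows.
    mu1 = treated.groupby(features)[outcome].mean().rename("mu1")

    # Predict both potential outcomes for every row, then take the difference.
    # Cells observed in only one group get NaN (no overlap, no estimate).
    preds = df[features].join(mu0, on=features).join(mu1, on=features)
    return preds["mu1"] - preds["mu0"]
```

As a usage example, assuming a 0/1 treatment column named `deposit` has been added (a hypothetical name), `cate = t_learner_cate(df, ["customer_type"], "deposit", "is_canceled")` would give one estimate per booking, and `cate.groupby(df["customer_type"]).mean()` would summarize how the effect varies by customer type. A `NaN` estimate is the overlap assumption failing in miniature.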

- Why not the X-Learner or R-Learner?
    - They were beyond the scope of this project.

## Data Exploration and Preprocessing

- **For basic exploration, refer to the Classification notebook.**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df = pd.read_csv('hotel_booking.csv')

In [3]:
df.head()

| | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | ... | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date | name | email | phone-number | credit_card |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Resort Hotel | 0 | 342 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01 | Ernest Barnes | Ernest.Barnes31@outlook.com | 669-792-1661 | `************4322` |
| 1 | Resort Hotel | 0 | 737 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01 | Andrea Baker | Andrea_Baker94@aol.com | 858-637-6955 | `************9157` |
| 2 | Resort Hotel | 0 | 7 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02 | Rebecca Parker | Rebecca_Parker@comcast.net | 652-885-2745 | `************3734` |
| 3 | Resort Hotel | 0 | 13 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02 | Laura Murray | Laura_M@gmail.com | 364-656-8427 | `************5677` |
| 4 | Resort Hotel | 0 | 14 | 2015 | July | 27 | 1 | 0 | 2 | 2 | ... | Transient | 98.0 | 0 | 1 | Check-Out | 2015-07-03 | Linda Hines | LHines@verizon.com | 713-226-5883 | `************5498` |


In [11]:
df.deposit_type.value_counts()

deposit_type
No Deposit    104641
Non Refund     14587
Refundable       162
Name: count, dtype: int64

I drop `Refundable` because the treatment has to be binary for this analysis: `No Deposit` is coded 0 and `Non Refund` is coded 1, so the estimate is the effect of asking for a deposit. Refundable deposits account for only 162 bookings, so little data is lost.

Merging `Refundable` into `Non Refund` is not appropriate either, because customers may behave differently when their deposit can be refunded than when it cannot.

In [ ]:
# drop refundable
df = df[df.deposit_type != 'Refundable']
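The cell above removes `Refundable`, but the 0/1 coding described earlier still has to be stored as a column before any learner can be fitted. A minimal sketch, assuming a new column named `deposit` (a hypothetical name) and the exact category labels shown in the `value_counts()` output; the guard makes any leftover category fail loudly instead of silently becoming 0:

```python
import pandas as pd


def add_deposit_treatment(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a binary treatment column 'deposit' (hypothetical name).

    1 = a non-refundable deposit was required, 0 = no deposit.
    Assumes 'Refundable' rows have already been dropped.
    """
    # Guard against leftover categories (e.g. 'Refundable') being coded as 0.
    unexpected = set(df["deposit_type"]) - {"No Deposit", "Non Refund"}
    if unexpected:
        raise ValueError(f"unexpected deposit_type values: {sorted(map(str, unexpected))}")

    out = df.copy()
    out["deposit"] = (out["deposit_type"] == "Non Refund").astype(int)
    return out
```

With the counts shown above, the treated share would be 14,587 / (14,587 + 104,641), roughly 12%, which is the imbalance behind the choice of meta-learner.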