
## Inn Hotels Bookings
#### Context
A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons include changes of plans and scheduling conflicts. Canceling is often made easier by the option to do so free of charge or at a low cost, which benefits guests but is less desirable, and potentially revenue-diminishing, for hotels. Such losses are particularly high for last-minute cancellations.

New technologies involving online booking channels have dramatically changed customers' booking possibilities and behavior. This adds further complexity to how hotels manage cancellations, which can no longer be explained by traditional booking and guest characteristics alone.

Cancellations impact hotels in several ways:
1. **Loss of resources** (revenue) when the hotel cannot resell the room.
2. **Additional distribution costs**, including increased commissions or paying for publicity to help sell these rooms.
3. **Last-minute price reductions** to resell a room, lowering the profit margin.
4. **Increased human resource needs** to manage guest arrangements.

#### Objective
The rising number of cancellations calls for a machine learning-based solution that predicts which bookings are likely to be canceled. INN Hotels Group, a hotel chain in Portugal, is facing a high rate of booking cancellations and has asked your firm for a data-driven solution. As a data scientist, your task is to analyze the provided data and identify the factors that most influence booking cancellations. You will then build a predictive model that anticipates cancellations in advance, helping the hotel formulate profitable cancellation and refund policies.

---
### Data Dictionary

The data contains the different attributes of customers' booking details. Below is a detailed description of the fields in the dataset.

- **Booking_ID**: The unique identifier of each booking.
- **no_of_adults**: Number of adults.
- **no_of_children**: Number of children.
- **no_of_weekend_nights**: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel.
- **no_of_week_nights**: Number of weeknights (Monday to Friday) the guest stayed or booked to stay at the hotel.
- **type_of_meal_plan**: Type of meal plan booked by the customer:
  - Not Selected – No meal plan selected.
  - Meal Plan 1 – Breakfast.
  - Meal Plan 2 – Half board (breakfast and one other meal).
  - Meal Plan 3 – Full board (breakfast, lunch, and dinner).
- **required_car_parking_space**: Does the customer require a car parking space? (0 - No, 1 - Yes).
- **room_type_reserved**: Type of room reserved by the customer. The values are encoded by INN Hotels Group.
- **lead_time**: Number of days between the date of booking and the arrival date.
- **arrival_year**: Year of arrival date.
- **arrival_month**: Month of arrival date.
- **arrival_date**: Day of the month of arrival.
- **market_segment_type**: Market segment designation.
- **repeated_guest**: Is the customer a repeated guest? (0 - No, 1 - Yes).
- **no_of_previous_cancellations**: Number of previous bookings canceled by the customer before the current booking.
- **no_of_previous_bookings_not_canceled**: Number of previous bookings not canceled by the customer before the current booking.
- **avg_price_per_room**: Average price per day of the reservation, in euros; room prices are dynamic.
- **no_of_special_requests**: Total number of special requests made by the customer (e.g., high floor, room view, etc.).
- **booking_status**: Flag indicating whether the booking was canceled (`Canceled`) or not (`Not_Canceled`).
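
The data dictionary can double as a quick schema check once the file is loaded. The sketch below is illustrative only: `check_columns` and `encode_status` are hypothetical helper names, not part of the original notebook, and the two-row frame is a stand-in for the real data. It assumes the target labels are `Canceled` and `Not_Canceled`, as they appear in the `df.head()` output.

```python
import pandas as pd

# Column names copied from the data dictionary
EXPECTED_COLUMNS = [
    "Booking_ID", "no_of_adults", "no_of_children",
    "no_of_weekend_nights", "no_of_week_nights", "type_of_meal_plan",
    "required_car_parking_space", "room_type_reserved", "lead_time",
    "arrival_year", "arrival_month", "arrival_date",
    "market_segment_type", "repeated_guest",
    "no_of_previous_cancellations", "no_of_previous_bookings_not_canceled",
    "avg_price_per_room", "no_of_special_requests", "booking_status",
]


def check_columns(frame: pd.DataFrame) -> list:
    """Return the data-dictionary columns missing from `frame` (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in frame.columns]


def encode_status(status: pd.Series) -> pd.Series:
    """Map booking_status labels to 1 (canceled) and 0 (not canceled)."""
    return status.map({"Canceled": 1, "Not_Canceled": 0})


# Tiny stand-in frame; the real notebook would pass the loaded DataFrame instead
sample = pd.DataFrame(
    {
        "Booking_ID": ["INN00001", "INN00003"],
        "booking_status": ["Not_Canceled", "Canceled"],
    }
)

missing = check_columns(sample)                    # columns absent from the stand-in
encoded = encode_status(sample["booking_status"])  # binary target
```

An empty list from `check_columns` would indicate that every documented field is present; any label outside the assumed two maps to `NaN`, which makes unexpected values easy to spot.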


In [1]:
# Libraries to help with reading and manipulating data
import numpy as np
import pandas as pd

# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Library to help with statistical analysis
import scipy.stats as stats

## **Loading the dataset**

In [2]:
# mount the drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# root path for the sample data
path = '/content/drive/MyDrive/Python Course'

In [4]:
# load the data into a pandas DataFrame
inn_hotel_df = pd.read_csv(f'{path}/INNHotelsGroup.csv')

In [5]:
# copying data to another variable to avoid any changes to original data
df = inn_hotel_df.copy()
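
Why `.copy()` rather than a plain assignment: assigning a DataFrame to a second name only creates another reference to the same object, whereas `.copy()` (deep by default) creates an independent one. A minimal sketch with a made-up three-row frame, where the variable names are illustrative:

```python
import pandas as pd

original = pd.DataFrame({"lead_time": [224, 5, 1]})

alias = original            # second name for the same object, not a copy
working = original.copy()   # independent copy (deep=True is the default)

working.loc[0, "lead_time"] = -1   # leaves `original` untouched
alias.loc[1, "lead_time"] = -1     # also changes `original`
```

After these two assignments only the edit made through `alias` shows up in `original`, which is why the notebook works on `df` and keeps `inn_hotel_df` as the untouched reference.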

In [6]:
colors = sns.color_palette('Set2')  # Get Set2 color palette for future use
sns.set(style="darkgrid") # Set grid style

## **Data Overview**

- Observe the first few rows of the dataset to check that it has been loaded properly.
- Get the number of rows and columns in the dataset.
- Check the data types of the columns to ensure that the data is stored in the preferred format and that the values of each column are as expected.
- Check the statistical summary of the dataset to get an overview of the numerical columns.
- Check for missing values.
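
The checks listed above map onto a handful of standard pandas calls. The sketch below runs them on a small stand-in frame built from four columns of the first five bookings; in the notebook the same calls would be applied to `df`. The variable names are illustrative.

```python
import pandas as pd

# Stand-in frame: four columns from the first five bookings
sample = pd.DataFrame(
    {
        "Booking_ID": ["INN00001", "INN00002", "INN00003", "INN00004", "INN00005"],
        "lead_time": [224, 5, 1, 211, 48],
        "avg_price_per_room": [65.0, 106.68, 60.0, 100.0, 94.5],
        "booking_status": [
            "Not_Canceled", "Not_Canceled", "Canceled", "Canceled", "Canceled",
        ],
    }
)

first_rows = sample.head()          # first few rows
n_rows, n_cols = sample.shape       # number of rows and columns
dtypes = sample.dtypes              # data type of each column
summary = sample.describe().T       # statistical summary of the numerical columns
missing = sample.isnull().sum()     # missing values per column

print(f"{n_rows} rows, {n_cols} columns")
print(dtypes)
print(summary)
print(missing)
```

Transposing the `describe()` result puts one row per numerical column, which tends to be easier to read when the dataset has many columns.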

In [7]:
# looking at head (5 observations)
df.head()

|  | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.0 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.0 | 0 | Canceled |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.0 | 0 | Canceled |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.5 | 0 | Canceled |
