<a href="https://colab.research.google.com/github/KushangShah/EDA_Project-Hotel_Bookings/blob/main/EDAProject_HotelBookings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Hotel Booking



##### **Project Type**    - EDA
##### **Contribution**    - Kushang Shah (Individual)

# **Project Summary -**

## **Exploratory Data Analysis (EDA) Summary: Understanding Hotel Bookings**

Exploratory Data Analysis (EDA) is a crucial step in the data analysis process, allowing us to understand the structure and patterns within our data. In this summary, we delve into an EDA conducted on hotel booking data, aiming to extract meaningful insights and trends.

### **Dataset Overview:**
The dataset comprises information about hotel bookings, including various attributes such as booking dates, customer demographics, booking channels, and reservation details. It encompasses both hotel types: resorts and city hotels.

### **Data Exploration:**
- **Data Cleaning**: The data was first cleaned to handle missing values, outliers, and inconsistencies, so that the analysis rests on a reliable dataset.
- **Descriptive Statistics**: Basic statistics such as mean, median, standard deviation, and quartiles were calculated for numerical features like booking lead time, nights stayed, and number of adults/children. This provided a snapshot of the central tendencies and spread of the data.
- **Distribution Analysis**: Histograms and density plots were used to visualize the distribution of key variables, revealing their skewness, multimodality, and outliers. For instance, booking lead time exhibited a right-skewed distribution: most bookings are made shortly before arrival, with a long tail of bookings made far in advance.
- **Temporal Trends**: Booking volumes were compared across months and years to explore temporal patterns. This analysis uncovered seasonality effects, with peak booking periods occurring during certain months, possibly influenced by holidays or tourism seasons.
- **Segmentation Analysis**: Customers were segmented by demographics (e.g., country of origin, number of adults, children, and babies) and booking characteristics (e.g., duration of stay, room type). This segmentation shed light on distinct booking behaviors among different customer groups, enabling targeted marketing strategies.
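
As a minimal sketch of the descriptive-statistics and skewness checks described above, the snippet below uses a small, made-up series of lead times (the values are illustrative assumptions, not taken from the dataset). A mean well above the median together with a positive skew is what "right-skewed" means in practice.

```python
import pandas as pd

# Hypothetical lead times in days (illustrative values, not from the dataset).
lead_time = pd.Series([0, 1, 2, 3, 5, 8, 14, 30, 90, 365], name="lead_time")

summary = lead_time.describe()  # count, mean, std, min, quartiles, max
skewness = lead_time.skew()     # a positive value indicates a right-skewed distribution

print(summary)
print(f"median={lead_time.median()}, mean={lead_time.mean():.1f}, skew={skewness:.2f}")
```

On the real data the same calls would be applied to `hb_df["lead_time"]`.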


### **Insights and Trends:**

- **Seasonal Variations**: The analysis revealed fluctuations in booking volumes across seasons, with summer and holiday seasons experiencing higher demand compared to off-peak periods. This insight can inform revenue management strategies and resource allocation.
- **Booking Channels**: Examination of booking channels (e.g., online travel agencies, direct bookings) unveiled the preferred platforms through which customers make reservations. Understanding channel preferences can guide marketing efforts and partnership decisions.
- **Cancellation Patterns**: Analysis of cancellation rates and reasons for cancellations provided insights into customer behavior and booking volatility. Factors influencing cancellations, such as flexibility in cancellation policies, can be optimized to minimize revenue loss.
- **Booking Lead Time**: Exploration of booking lead time distribution highlighted booking patterns, with implications for inventory management and pricing strategies. Shorter lead times may necessitate dynamic pricing mechanisms to capitalize on last-minute bookings.

## **Conclusion**:
Through comprehensive exploratory data analysis, valuable insights have been gleaned regarding hotel booking trends, customer behavior, and operational dynamics. These insights can inform strategic decision-making processes, ranging from revenue management to customer experience enhancement. Continued analysis and refinement of these findings will facilitate data-driven optimization of hotel operations and service delivery.

# **GitHub Link -**

#### **GitHub Link:** [EDA Project - Hotel Booking](https://github.com/KushangShah/EDA_Project-Hotel_Bookings/tree/main)



# **Problem Statement**


The primary objective is to gain comprehensive insights into the underlying **patterns**, **trends**, and **dynamics of the booking process.**

1. **Booking Patterns**: What typical booking patterns do we observe in terms of timing, duration, and seasonality? Do discernible trends or fluctuations exist in booking volumes over different time periods?
2. **Booking Dynamics**: What are the temporal trends in booking volumes and cancellation rates? Are there seasonal variations, and if so, how do they impact hotel occupancy and revenue?
3. **Customer Segmentation**: How can customers be segmented based on demographics, booking behaviors, and preferences? What are the characteristics of different customer segments, and how can tailored marketing strategies be developed to cater to their needs?
4. **Operational Efficiency**: What factors contribute to booking lead time, and how can inventory management and pricing strategies be optimized accordingly? Are there patterns in room type preferences, booking channels, and deposit types that influence operational efficiency?
5. **Revenue Management**: How do pricing dynamics, such as ADR and booking changes, impact revenue generation? What are the implications of special requests, car parking requirements, and meal preferences on revenue maximization?




#### **Define Your Business Objective?**

**Business Objective:**

The primary business objective of conducting exploratory data analysis (EDA) on hotel bookings is to leverage data-driven insights to optimize revenue generation, enhance operational efficiency, and improve customer satisfaction within the hospitality industry. By delving into the dataset and extracting meaningful patterns and trends, the ultimate goal is to inform strategic decision-making processes and drive tangible outcomes for the hotel management.

1. **Revenue Optimization:**
   - Identify factors influencing revenue generation, such as pricing dynamics, booking patterns, and customer preferences.
   - Utilize insights to implement dynamic pricing mechanisms, targeted promotions, and revenue management strategies.

2. **Operational Efficiency:**
   - Enhance resource allocation, inventory management, and staff scheduling based on demand patterns and booking trends.
   - Optimize room allocation, booking channels, and distribution strategies to improve operational efficiency.
   - Streamline processes to minimize booking lead time, reduce cancellations, and optimize room utilization.

3. **Customer Satisfaction:**
   - Understand customer preferences, behaviors, and satisfaction drivers to deliver personalized experiences.
   - Segment customers based on demographics, booking behaviors, and preferences to tailor marketing efforts and services.
   - Anticipate and fulfill customer needs, preferences, and special requests to enhance overall satisfaction and loyalty.

4. **Risk Management and Decision Support:**
   - Identify potential risks, such as overbooking, cancellations, and revenue volatility, and develop mitigation strategies.
   - Provide decision support for strategic initiatives, investment opportunities, and expansion plans based on data-driven insights.
   - Monitor key performance indicators (KPIs) and metrics to track progress, evaluate performance, and adapt strategies accordingly.

Overall, the business objective of the EDA on hotel bookings is to leverage data analytics to drive strategic decision-making, optimize operations, and create value for both the hotel management and customers. By harnessing the power of data, the aim is to achieve sustainable growth, competitive advantage, and excellence in service delivery within the hospitality sector.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
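
One way to read the "UBM" rule is as a progression in how many variables feed each chart. The sketch below builds the summary tables that would typically sit behind such charts; the sample frame is hypothetical (column names mirror the dataset, values are made up), and the choice of aggregations is an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample; column names mirror the dataset, values are made up.
df = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "arrival_date_year": [2016, 2017, 2016, 2017],
    "adr": [100.0, 120.0, 80.0, 60.0],
})

# U: distribution of a single variable (e.g. for a count plot).
univariate = df["hotel"].value_counts()

# B: one numerical variable summarised by one categorical variable (e.g. for a bar plot).
bivariate = df.groupby("hotel")["adr"].mean()

# M: three variables at once, hotel x year -> mean adr (e.g. for a heatmap).
multivariate = df.pivot_table(
    index="hotel", columns="arrival_date_year", values="adr", aggfunc="mean"
)

print(univariate, bivariate, multivariate, sep="\n\n")
```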





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount("/content/drive")

In [None]:
hb_df = pd.read_csv("/content/drive/MyDrive/CSV files/Hotel Bookings.csv")
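
Because the guidelines reward exception handling and a notebook that runs end to end, the load step could be wrapped so that a missing file fails with a clear message. This is only a sketch: `load_bookings` is a hypothetical helper name, not part of the notebook.

```python
from pathlib import Path

import pandas as pd


def load_bookings(csv_path):
    """Read the bookings CSV, failing with a clear message if the file is missing.

    `load_bookings` is a hypothetical helper introduced for illustration.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Dataset not found at {path}; check the Drive mount and the file name."
        )
    return pd.read_csv(path)
```

It would be called with the same path used above, for example `hb_df = load_bookings("/content/drive/MyDrive/CSV files/Hotel Bookings.csv")`.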

### Dataset First View

In [None]:
# Dataset First Look
hb_df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
hb_df.shape

### Dataset Information

In [None]:
# Dataset Info
hb_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count (number of rows identical to an earlier row)
hb_df.duplicated().sum()

In [None]:
# Drop duplicated rows in place, then count the remaining unique rows
hb_df.drop_duplicates(inplace=True)
unique_rows = hb_df.shape[0]
unique_rows
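
The relationship between the duplicate count and the unique-row count can be checked on a toy frame. This is a minimal sketch with made-up rows: the number of duplicated rows plus the number of rows left after `drop_duplicates` always equals the original row count.

```python
import pandas as pd

# Hypothetical three-row frame in which the second row repeats the first.
df = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel"],
    "adults": [2, 2, 1],
})

n_duplicates = int(df.duplicated().sum())  # rows identical to an earlier row
deduped = df.drop_duplicates()             # returns a new frame; df itself is unchanged

print(n_duplicates, len(deduped), len(df))
```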

#### Missing Values/Null Values

In [None]:
# Count null values per column and show the six columns with the most
hb_df.isna().sum().sort_values(ascending=False).head(6)

In [None]:
# Handle the null values; assign the result back instead of calling fillna in place on a column
hb_df["company"] = hb_df["company"].fillna(0)  # 0 where no company ID is given
hb_df["agent"] = hb_df["agent"].fillna(0)  # 0 where no agent ID is given
hb_df["country"] = hb_df["country"].fillna("others")  # "others" where the country is not given
hb_df["children"] = hb_df["children"].fillna(0)  # 0 where the number of children is not given

In [None]:
# Verify that no missing values remain.
hb_df.isna().sum().sort_values()
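
An alternative to filling each column separately is to pass a single mapping of column name to fill value. The sketch below shows this on a hypothetical two-row frame; the fill values match those used above, while the sample data is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one null in each of the four affected columns.
df = pd.DataFrame({
    "company": [np.nan, 40.0],
    "agent": [9.0, np.nan],
    "country": ["PRT", None],
    "children": [np.nan, 1.0],
})

# One fill value per column, applied in a single call.
fill_values = {"company": 0, "agent": 0, "country": "others", "children": 0}
filled = df.fillna(value=fill_values)

print(filled.isna().sum())
```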

### What did you know about your dataset?

The hotel booking dataset contains 119390 rows × 32 columns.
Of these, 87396 rows are unique and 31994 are duplicates.

The following columns contained null values (column name, null count):
```
company               82137
agent                 12193
country                 452
children                  4
```



The hotel booking dataset contains 32 columns:
1. hotel: Type of hotel (Resort Hotel or City Hotel).
2. is_canceled: Binary indicator if the booking was canceled (1) or not (0).
3. lead_time: Number of days between the booking date and the arrival date.
4. arrival_date_year: Year of arrival date.
5. arrival_date_month: Month of arrival date.
6. arrival_date_week_number: Week number of arrival date.
7. arrival_date_day_of_month: Day of arrival date.
8. stays_in_weekend_nights: Number of weekend nights (Saturday or Sunday) the guest stayed.
9. stays_in_week_nights: Number of week nights (Monday to Friday) the guest stayed.
10. adults: Number of adults.
11. children: Number of children.
12. babies: Number of babies.
13. meal: Type of meal booked (e.g., BB for Bed & Breakfast).
14. country: Country of origin of the guest.
15. market_segment: Market segment designation (e.g., Online Travel Agents, Offline Travel Agents).
16. distribution_channel: Booking distribution channel (e.g., Direct, Corporate).
17. is_repeated_guest: Binary indicator if the guest is a repeated guest (1) or not (0).
18. previous_cancellations: Number of previous cancellations by the guest.
19. previous_bookings_not_canceled: Number of previous bookings not canceled by the guest.
20. reserved_room_type: Type of room reserved.
21. assigned_room_type: Type of room assigned to the guest.
22. booking_changes: Number of changes made to the booking.
23. deposit_type: Type of deposit made (e.g., No Deposit, Non Refund, Refundable).
24. agent: ID of the travel agency that made the booking.
25. company: ID of the company/entity that made the booking or is responsible for payment.
26. days_in_waiting_list: Number of days the booking was in the waiting list before it was confirmed to the guest.
27. customer_type: Type of booking (e.g., Contract, Group, Transient).
28. adr: Average Daily Rate, the average rental income per paid occupied room in a given time period.
29. required_car_parking_spaces: Number of car parking spaces requested by the guest.
30. total_of_special_requests: Number of special requests made by the guest (e.g., twin bed, high floor).
31. reservation_status: Reservation last status (e.g., Check-Out, Canceled).
32. reservation_status_date: Date at which the last status was set.
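
Several of the columns listed above are easier to analyze once combined. The sketch below derives a single arrival date and a total length of stay from a hypothetical two-row frame. It assumes `arrival_date_month` holds full English month names; the new column names `arrival_date` and `total_nights` are introduced here for illustration only.

```python
import pandas as pd

# Hypothetical rows; column names follow the list above, values are made up.
df = pd.DataFrame({
    "arrival_date_year": [2016, 2017],
    "arrival_date_month": ["July", "January"],  # assumed: full English month names
    "arrival_date_day_of_month": [1, 15],
    "stays_in_weekend_nights": [2, 0],
    "stays_in_week_nights": [3, 1],
})

# Build a "YYYY-Month-D" string and parse it into a proper datetime column.
date_text = (
    df["arrival_date_year"].astype(str)
    + "-"
    + df["arrival_date_month"]
    + "-"
    + df["arrival_date_day_of_month"].astype(str)
)
df["arrival_date"] = pd.to_datetime(date_text, format="%Y-%B-%d")

# Total length of stay is the sum of weekend and week nights.
df["total_nights"] = df["stays_in_weekend_nights"] + df["stays_in_week_nights"]

print(df[["arrival_date", "total_nights"]])
```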




## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
hb_df.columns

In [None]:
# Dataset Describe
hb_df.describe()

### Variables Description

- **is_canceled:**
  - 27.49% of bookings were canceled.
- **lead_time:**
  - The average lead time is approximately 79.89 days, with a standard deviation of around 86.05 days.
- **arrival_date_year:**
  - Bookings span from 2015 to 2017.
- **arrival_date_week_number and arrival_date_day_of_month:**
  - These columns give the week number and day of the month of the arrival date, respectively.
- **stays_in_weekend_nights and stays_in_week_nights:**
  - On average, guests stay for approximately 1 weekend night and 2.63 week nights.
- **adults, children, and babies:**
  - Average numbers of adults, children, and babies per booking are provided.
- **previous_cancellations and previous_bookings_not_canceled:**
  - These columns indicate the number of previous cancellations and bookings not canceled by the guest.
- **booking_changes:**
  - On average, there are around 0.27 booking changes per booking.
- **agent and company:**
  - These seem to be identifiers for the travel agency and company, respectively, involved in the booking.
- **days_in_waiting_list:**
  - On average, bookings spent approximately 11 days in the waiting list before confirmation.
- **adr (Average Daily Rate):**
  - The average daily rate is around 106.34 units.
- **required_car_parking_spaces and total_of_special_requests:**
  - These columns provide average counts for requested car parking spaces and special requests per booking.
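
Figures like the cancellation percentage and the lead-time mean and standard deviation come straight from column-wise aggregates. As a minimal sketch on made-up values: because `is_canceled` is a 0/1 indicator, its mean is the share of canceled bookings.

```python
import pandas as pd

# Hypothetical four bookings (values are illustrative, not from the dataset).
df = pd.DataFrame({
    "is_canceled": [1, 0, 0, 0],
    "lead_time": [120, 10, 30, 0],
})

cancel_rate_pct = df["is_canceled"].mean() * 100  # mean of a 0/1 column = cancellation share
lead_mean = df["lead_time"].mean()
lead_std = df["lead_time"].std()                  # sample standard deviation (ddof=1)

print(f"canceled: {cancel_rate_pct:.2f}%, lead time: {lead_mean:.2f} ± {lead_std:.2f} days")
```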

### Check Unique Values for each variable.
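
One possible way to approach this step, sketched on a hypothetical frame rather than the real dataset: `nunique` counts distinct values per column, and `unique` lists them for a single column.

```python
import pandas as pd

# Hypothetical sample; column names mirror the dataset, values are made up.
df = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel"],
    "meal": ["BB", "HB", "BB"],
    "adults": [2, 2, 1],
})

unique_counts = df.nunique().sort_values()  # number of distinct values per column
meal_values = sorted(df["meal"].unique())   # the distinct values of one column

print(unique_counts)
print(meal_values)
```

On the notebook's frame the equivalent call would be `hb_df.nunique()`.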

In [None]:
# Check Unique Values for each variable.

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap
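
A correlation heatmap is a colored rendering of a pairwise correlation matrix. The sketch below computes only that matrix, on a hypothetical frame with made-up values; passing the result to a plotting function such as `sns.heatmap(corr, annot=True)` would be the usual next step.

```python
import pandas as pd

# Hypothetical sample; column names mirror the dataset, values are made up.
df = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "lead_time": [10, 20, 30, 40],
    "adr": [50.0, 60.0, 70.0, 80.0],
    "is_canceled": [0, 0, 1, 1],
})

# Pearson correlation between numeric columns; text columns such as "hotel" are skipped.
corr = df.corr(numeric_only=True)

print(corr.round(2))
```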

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***