# **Project Name**    - **Hotel Booking Analysis of Booking.com**


##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

**Project Summary: Booking.com Data Analysis**
This project focuses on analyzing a comprehensive dataset from Booking.com, one of the leading global platforms in the travel and hospitality industry. The dataset includes key information about hotel bookings, such as lead times, arrival dates, room types, meal plans, booking channels, and customer preferences. It aims to uncover valuable insights that can enhance customer experience, optimize booking efficiency, and drive revenue growth.

**Key Objectives:**
**Booking Efficiency:** Analyze the impact of booking windows (lead time), reservation statuses, and booking channels on hotel performance to optimize the booking process and maximize occupancy.

**Customer Demographics:**  Understand customer profiles, such as family sizes, meal preferences, and special requests, to tailor services and offerings that better meet guest needs.

**Revenue Optimization:** Identify trends in Average Daily Rate (ADR) and room types to implement dynamic pricing strategies and maximize profitability.

**Operational Streamlining:** Leverage data to streamline processes, such as managing room assignments, handling cancellations, and adjusting inventory in response to demand fluctuations.

**Strategic Benefits:**
**Predicting Guest Behavior:** By analyzing booking lead times and patterns, the company can predict customer needs and behavior, allowing for proactive service improvements.

**Market Segmentation:** The dataset allows for a detailed segmentation of the market, helping Booking.com offer personalized recommendations and targeted marketing campaigns.

**Enhanced Customer Satisfaction:** By better understanding customer preferences, the platform can fine-tune services and enhance the guest experience, driving loyalty.

Overall, this data analysis empowers Booking.com to remain competitive in the hospitality sector by leveraging data-driven strategies to optimize its services and operations.













# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


In a highly competitive travel and hospitality industry, platforms like Booking.com must continually refine their booking processes to meet evolving customer expectations and maximize revenue. With a large volume of data available, including lead times, room types, meal plans, special requests, and booking channels, there is a need to efficiently analyze this data to uncover actionable insights.

The key challenges addressed in this project are:

**Booking Process Optimization:** How can Booking.com improve the efficiency of the booking process by analyzing factors such as lead time, booking channels, and customer demographics?

**Revenue Growth:** How can trends in Average Daily Rate (ADR) and customer booking behavior be used to implement dynamic pricing strategies to maximize profitability?

**Customer Experience Enhancement:** How can insights from customer preferences (room types, meal plans, special requests) be leveraged to offer personalized services that enhance guest satisfaction and drive loyalty?

By solving these challenges through data analysis, Booking.com can streamline operations, improve occupancy rates, and deliver superior service, ensuring its continued success in the competitive hospitality market.

#### **Define Your Business Objective?**

**Optimizing Booking Efficiency and Enhancing Customer Experience**

The primary business objective of this project is to enhance operational efficiency and customer satisfaction for Booking.com through data-driven decision-making. By analyzing various booking factors—such as lead time, reservation types, customer demographics, and booking channels—the goal is to:

**Increase Revenue:** Implement dynamic pricing strategies based on insights from room type demand and Average Daily Rate (ADR) to maximize profitability.

**Optimize Occupancy Rates:** Utilize booking window and customer behavior data to adjust room inventory and pricing in real-time, ensuring maximum room utilization.

**Improve Customer Experience:** Tailor services and marketing campaigns to better meet guest preferences (e.g., room types, meal plans, special requests), driving customer loyalty and satisfaction.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import datetime as dt
import math  # standard-library math; `from numpy import math` fails on recent NumPy releases
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')


### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
df = pd.read_csv('/content/drive/MyDrive/Hotel Bookings.csv')

### Dataset First View

In [None]:
# Dataset First Look
df

In [None]:
df.head()

In [None]:
df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

In [None]:
# here we have 119390 rows and 32 columns in our data set.

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# here we have 4 null values in children , 488 null values in country column ,
# 16340 null values in agent column , 112593 null values in company column

In [None]:
sns.heatmap(df.isnull(), cbar=False)
plt.show()

### What did you know about your dataset?

The dataset comes from the online travel agency and accommodation booking platform Booking.com. We have to analyse the hotel bookings made through the platform by customers from various countries and find the insights behind them.

In the competitive landscape of travel and accommodation, Booking.com continuously refines its booking processes to enhance customer satisfaction and optimize revenue. By analyzing this data, the aim is to improve booking efficiency, predict guest needs, and tailor offerings to better meet market demands.

The dataset has 119390 rows and 32 columns. There are 16340 null values in the agent column, 4 null values in the children column, 488 null values in the country column, and 112593 null values in the company column. There are 31994 duplicate rows in the dataset.
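
The null and duplicate counts above can be gathered into one small table, which makes it easier to see which columns need attention first. The following is a minimal sketch on a toy frame, not the notebook's data: `toy` and `quality_summary` are hypothetical names introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the bookings DataFrame (values are illustrative only).
toy = pd.DataFrame({
    "children": [0.0, 1.0, np.nan, 0.0, 0.0],
    "country": ["PRT", None, "GBR", "PRT", "PRT"],
    "agent": [9.0, np.nan, np.nan, 9.0, 9.0],
})


def quality_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return null counts and null percentages for columns that contain nulls."""
    nulls = frame.isnull().sum()
    summary = pd.DataFrame({
        "null_count": nulls,
        "null_pct": (nulls / len(frame) * 100).round(2),
    })
    # Keep only columns with at least one null, worst first.
    return summary[summary["null_count"] > 0].sort_values("null_count", ascending=False)


summary = quality_summary(toy)
duplicate_rows = int(toy.duplicated().sum())  # rows identical to an earlier row

print(summary)
print("duplicate rows:", duplicate_rows)
```

Calling `quality_summary(df)` on the real frame would give the same kind of table; the percentage column helps show that a column such as company is almost entirely empty.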


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

* **hotel                    :** Type of hotel booked (City Hotel or Resort Hotel)
* **is_canceled              :** Whether the booking was canceled (1) or not (0)
* **lead_time                :** Number of days between the booking date and the arrival date
* **arrival_date_year        :** Year for which the booking is made
* **arrival_date_month       :** Month for which the booking is made
* **arrival_date_week_number :** Week number of the year for which the booking is made
* **arrival_date_day_of_month:** Day of the month for which the booking is made
* **stays_in_weekend_nights  :** Total weekend nights booked
* **stays_in_week_nights     :** Total weekday nights booked
* **adults                   :** Total adults under the booking
* **children                 :** Total children under the booking
* **babies                   :** Total babies under the booking
* **meal                     :** Type of meal booked: BB (Bed and Breakfast), HB (Half Board), FB (Full Board), SC (Self Catering), or Undefined
* **country                  :** Country of origin of the lead guest
* **market_segment           :** Segment of the booking: online travel agency (Online TA), offline travel agent or tour operator, Corporate, Direct, Aviation, etc.
* **distribution_channel     :** Channel through which the booking was made (e.g., TA/TO for travel agents and tour operators, Direct, Corporate)
* **is_repeated_guest        :** Whether the guest has stayed before
* **previous_cancellations   :** Total previous bookings that have been canceled
* **previous_bookings_not_canceled  :** Total previous bookings that have not been canceled
* **reserved_room_type       :** Category of room booked (A,B,C,D,E,F,G,H,L,P)
* **assigned_room_type       :** Category of room allotted
* **booking_changes          :** Count of changes made to the booking
* **deposit_type             :** Refundable, Non-refundable, or No deposit booking
* **agent                    :** Agent through which the booking was made
* **company                  :** Company through which the booking was made
* **days_in_waiting_list     :** Number of days the booking spent on the waiting list
* **customer_type            :** Transient, Contract, Transient-Party, or Group
* **adr                      :** Average Daily Rate for the booking
* **required_car_parking_spaces:** Number of car parking spaces required
* **total_of_special_requests:** Number of special requests made by the guest
* **reservation_status       :** Check-Out, Canceled, or No-Show
* **reservation_status_date  :** Date on which the reservation status was last set

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
# first, checking the unique values in all categorical columns
cat_col = df.select_dtypes(include = ['object'])
for x in cat_col :
  print(df[x].unique())

In [None]:
# unique values in numerical columns
num_col = df.select_dtypes(include= ['int64' , 'float64'])
for i in num_col:
  print(df[i].unique())

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# handling the missing values
df.isnull().sum()

In [None]:
#finding the data type of the columns having missing values
null_col = ['agent' , 'children', 'company' , 'country']
for i in null_col:
  print(df[i].dtypes)

In [None]:
# filling the missing values present in the data with data type float with mean or median after checking the skewness
df['agent'].skew()

If the skewness is greater than 1, the distribution is strongly positively (right) skewed.
The mean is pulled toward the long tail in that case, so the median is the better measure of central tendency for filling the missing values.
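
The rule described above (use the median when the data is strongly skewed, otherwise the mean) can be written as a small helper. This is only a sketch under assumptions: `impute_by_skew` and the threshold of 1.0 are illustrative choices, not something the dataset prescribes.

```python
import pandas as pd


def impute_by_skew(series: pd.Series, threshold: float = 1.0) -> pd.Series:
    """Fill NaNs with the median when |skew| exceeds `threshold`, else with the mean."""
    use_median = abs(series.skew()) > threshold
    fill_value = series.median() if use_median else series.mean()
    return series.fillna(fill_value)


# Toy right-skewed column with one missing value (illustrative numbers only).
skewed = pd.Series([1, 1, 1, 2, 2, 3, 50, None], dtype="float64")
filled = impute_by_skew(skewed)

print("skewness:", round(skewed.skew(), 2))
print(filled.tolist())
```

Because the single large value drags the mean upward, the missing entry is filled with the median here rather than the mean.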

In [None]:
# same skewness can be shown with the histogram also
sns.histplot(df , x = 'agent' , bins = 10 , kde = True)
plt.show()

In [None]:
# finding median for agent
df['agent'].median()

In [None]:
# filling missing values with the median (assigning back avoids the chained inplace fillna, which newer pandas warns about)
df['agent'] = df['agent'].fillna(df['agent'].median())

In [None]:
# similarly checking for children and company

In [None]:
print(df['children'].skew())
print(df['company'].skew())

In [None]:
fig , ax = plt.subplots(1,2 , figsize = (15,8))
ax = ax.flatten()
sns.histplot(df , x = 'children' , bins = 5, ax = ax[0] ,kde = True)
sns.histplot(df , x = 'company' , bins = 5, ax = ax[1] ,kde = True)
plt.tight_layout()
plt.show()

Children is right-skewed, so we will go with the median. For company, both the histogram and the skewness measure show a right skew, but not an extreme one. Either the mean or the median could be used here, but the median is still the safer option.

In [None]:
# finding median for children
df['children'].median()

In [None]:
# filling missing values with the median
df['children'] = df['children'].fillna(df['children'].median())

In [None]:
# finding median for company
df['company'].median()

In [None]:
# filling missing values with the median
df['company'] = df['company'].fillna(df['company'].median())

Now taking up the **country** column, whose data type is object.


In [None]:
# Missing values in an object (categorical) column are commonly filled with the mode of the data
# finding the mode of the data
df['country'].mode()

In [None]:
# replacing the missing values with the mode ('PRT' in this dataset)
df['country'] = df['country'].fillna(df['country'].mode()[0])

**Checking and changing Data Types**

In [None]:
# reservation_status_date is given in string format we will convert it in date and time format
df['res_status_date'] = pd.to_datetime(df['reservation_status_date'], format='%Y-%m-%d')


In [None]:
# dropping the reservation_status_date column (drop with inplace=True returns None, so the result is not assigned)
df.drop(columns='reservation_status_date', inplace=True)

In [None]:

df['date_of_arr'] = df['arrival_date_day_of_month'].astype(str) + ' ' + df['arrival_date_month'] + ' ' + df['arrival_date_year'].astype(str)

df['date_of_arrival'] = pd.to_datetime(df['date_of_arr'], format='%d %B %Y')


In [None]:
# dropping the helper string column date_of_arr (drop with inplace=True returns None, so the result is not assigned)
df.drop(columns='date_of_arr', inplace=True)

## Manipulations


In [None]:
#finding the monthly booking made in each month
count = df['arrival_date_month'].value_counts()
count.sort_values()

In [None]:
# finding the total ADR per arrival month, summed over the three years
df.groupby(['arrival_date_month'])['adr'].sum().sort_values(ascending = False)

In [None]:
# finding the revenue generated by each type of hotel
df.groupby(['hotel'])['adr'].sum().sort_values()

In [None]:
df.describe(include= ['object'])

In [None]:
# average number of days between the booking date and the check-in date
df['lead_time'].astype('float64').mean()

In [None]:
# finding sum of adr and average of lead time wrt deposit type and reservation status
df.groupby(['deposit_type' , 'reservation_status']).agg({'adr' : 'sum' , 'lead_time' : 'mean' })

In [None]:
# reserved_room_type assigned_room_type
df.groupby(['assigned_room_type']).agg({'adr' : 'sum' })

In [None]:
#average of week nights
df['stays_in_week_nights'].astype('float64').mean()

In [None]:
#average of weekend nights
df['stays_in_weekend_nights'].astype('float64').mean()

In [None]:
df.groupby(['hotel' , 'customer_type'])['adr'].sum()

In [None]:
# general brief of categorical columns
df.describe(include = 'object')

In [None]:
# customer preference for booking
df.groupby(['distribution_channel'])['adr'].sum().sort_values()

In [None]:
df['assigned_room_type'].value_counts().sort_values(ascending = False)

In [None]:
#finding country with maximum revenue
df.groupby(['country' , 'reserved_room_type'])['adr'].sum().sort_values(ascending = False)


### What all manipulations have you done and insights you found?

Travelling is a favourite activity for many people, and booking a place to stay is a major part of it. Booking.com is a market leader in hotel and resort bookings, so I took up this project to evaluate its business model and to find ways to improve it.

**Booking as per month**

August saw the most reservations and generated the highest revenue, while January saw the lowest. Season and weather are likely a big reason for this. Lowering rates further in the off season and raising them in the peak season could balance demand and improve off-season occupancy.
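
A monthly comparison like the one above is easier to read when months appear in calendar order rather than alphabetical or frequency order. The sketch below shows one way to do that on a toy frame. The column names match the dataset, but the values are illustrative, and the month names assume the default English locale.

```python
import calendar

import pandas as pd

# Toy bookings (illustrative values only).
toy = pd.DataFrame({
    "arrival_date_month": ["August", "January", "August", "March", "January", "August"],
    "adr": [200.0, 60.0, 180.0, 90.0, 80.0, 220.0],
})

# Ordered categorical so that grouped output follows the calendar, not the alphabet.
month_order = list(calendar.month_name)[1:]  # 'January' ... 'December'
toy["arrival_date_month"] = pd.Categorical(
    toy["arrival_date_month"], categories=month_order, ordered=True
)

# Number of bookings and mean ADR per arrival month (only months present in the data).
monthly = (
    toy.groupby("arrival_date_month", observed=True)["adr"]
    .agg(bookings="count", mean_adr="mean")
)
print(monthly)
```

Reporting the mean ADR next to the booking count separates "more bookings" from "higher rates", which a plain sum of `adr` mixes together.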

**Category wise Analysis**

City hotels have been in higher demand, most likely due to their proximity to markets and better connectivity, and account for more than 65% of the revenue. Resort hotels would benefit from good road access and nearby places to explore, which would help them generate more revenue. BB has also been the most popular meal plan in both categories, so extending breakfast timings by an hour would likely be welcomed by guests.

**Cancellations**

According to my analysis, many customers prefer to book without any deposit and to book well before their actual date of arrival. In the data above, cancellations are very high where the lead time is more than 100 days. If the platform did not allow bookings with a lead time of more than 90 days without a deposit, revenue and occupancy would likely increase. It would also keep those rooms visible to customers booking at the last moment, at the best available rates.
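
The link between lead time and cancellations can be checked directly by bucketing lead time and computing the cancellation rate per bucket. The following is a minimal sketch on toy data. The bucket edges (30, 90, 365 days) are assumptions chosen to match the thresholds discussed above, not values taken from the dataset.

```python
import pandas as pd

# Toy bookings (illustrative values only).
toy = pd.DataFrame({
    "lead_time":   [5, 20, 45, 95, 120, 200, 310, 400],
    "is_canceled": [0,  0,  1,  0,   1,   1,   1,   0],
})

# Right-inclusive buckets: (-1, 30], (30, 90], (90, 365], (365, inf)
bins = [-1, 30, 90, 365, float("inf")]
labels = ["0-30", "31-90", "91-365", "365+"]
toy["lead_bucket"] = pd.cut(toy["lead_time"], bins=bins, labels=labels)

# Mean of a 0/1 flag is the share of canceled bookings in each bucket.
cancel_rate = toy.groupby("lead_bucket", observed=False)["is_canceled"].mean()
print(cancel_rate)
```

On the real data, comparing the rate in the longer-lead buckets against the shorter ones would show whether a deposit rule at 90 days targets the right bookings.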

**Weekdays vs Weekends**

According to the data, a booking includes on average 2.5 weekday nights, while the average for weekend nights drops to 0.92. This difference may be partly due to the difference in fares between weekdays and weekends. We could raise weekday rates to the point where guests give more importance to staying over the weekend than to the room fare.
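
Before attributing the whole gap to pricing, it may help to account for the fact that a week contains more weekday nights than weekend nights. The short calculation below normalizes the two averages reported above by the number of nights available in each group. It assumes 5 weekday nights and 2 weekend nights per week.

```python
# Averages reported in this notebook (nights per booking).
avg_week_nights = 2.5
avg_weekend_nights = 0.92

# Assumption: weekday nights cover 5 nights of the week and weekend nights cover 2.
WEEKDAY_NIGHTS_PER_WEEK = 5
WEEKEND_NIGHTS_PER_WEEK = 2

# Nights booked per available night of each kind.
week_per_available = avg_week_nights / WEEKDAY_NIGHTS_PER_WEEK
weekend_per_available = avg_weekend_nights / WEEKEND_NIGHTS_PER_WEEK

print(f"weekday nights booked per available night: {week_per_available:.2f}")
print(f"weekend nights booked per available night: {weekend_per_available:.2f}")
```

Under that assumption the two ratios come out at 0.50 and 0.46, so much of the raw gap could simply reflect the larger number of weekday nights.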

**Booking window preference**

People tend to rely more on travel agents and tour operators because they offer a complete package and itinerary, leaving customers free to enjoy their days. If the platform offered well-curated holiday packages and tours at better rates and with better service, this booking preference could change.

**Category of rooms booked**

Category A and D rooms have been in demand throughout the year, and at times customers who reserved these categories were assigned rooms of another category. If more rooms were added to these categories and some rooms in categories H to L were reduced, it could make a significant difference to revenue and customer satisfaction.
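
How often guests end up in a different room category than the one they reserved can be measured by comparing the two columns row by row. This is a sketch on toy data. The column names match the dataset, while `room_changed`, `mismatch_rate` and `reassignment` are hypothetical names used for illustration.

```python
import pandas as pd

# Toy bookings (illustrative values only).
toy = pd.DataFrame({
    "reserved_room_type": ["A", "A", "D", "A", "D", "E"],
    "assigned_room_type": ["A", "D", "D", "A", "E", "E"],
})

# True where the assigned category differs from the reserved one.
toy["room_changed"] = toy["reserved_room_type"] != toy["assigned_room_type"]

# Share of bookings per reserved category that were moved to another category.
mismatch_rate = toy.groupby("reserved_room_type")["room_changed"].mean()

# Full reserved-vs-assigned table, useful for seeing where guests are moved to.
reassignment = pd.crosstab(toy["reserved_room_type"], toy["assigned_room_type"])

print(mismatch_rate)
print(reassignment)
```

A high mismatch rate for a popular category would support adding inventory there, while the crosstab shows which categories absorb the overflow.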

**Guest Demographics**

PRT is the country whose residents made the most bookings and generated the highest revenue. Their preferred room category is A, and they never booked category P.


# ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Count plot by  Hotel category

In [None]:
# count plot of bookings per hotel type: hotel on the x axis, bars drawn in magenta
sns.countplot(data=df, x='hotel', color='magenta')
plt.title('Count plot by Hotel category')  # gives the title of the chart
plt.show()

##### 1. Why did you pick the specific chart?

A count plot shows the frequency of each category in the data. It makes it easy to compare categories through the heights of the bars.

##### 2. What is/are the insight(s) found from the chart?

Here we can clearly see that the city hotel has been booked about 80,000 times, whereas the resort hotel has been booked about 40,000 times, which is roughly half as many bookings as the city hotel.
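
Comparisons like the one above are easy to misstate, so it can help to compute the shares and the ratio explicitly instead of reading them off the bars. A minimal sketch on a toy column (the 4-to-2 split is illustrative, not the real counts):

```python
import pandas as pd

# Toy hotel column (illustrative split only).
hotel = pd.Series(["City Hotel"] * 4 + ["Resort Hotel"] * 2, name="hotel")

counts = hotel.value_counts()                                    # absolute bookings
share_pct = hotel.value_counts(normalize=True).mul(100).round(1)  # share of all bookings
ratio = counts["City Hotel"] / counts["Resort Hotel"]             # city bookings per resort booking

print(counts)
print(share_pct)
print("city-to-resort ratio:", ratio)
```

With twice as many city bookings, the resort hotel has 50% fewer bookings than the city hotel, and the city hotel has 100% more than the resort hotel. The two statements describe the same gap from different baselines.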

##### 3. Will the gained insights help creating a positive business impact?
Based on this chart, the business can focus more on resort hotels and try to narrow the gap between city and resort hotel bookings.


#### Chart - 2 Line chart wrt to date and ADR

In [None]:
#extracting month from the date column for line chart
df['month'] = df['res_status_date'].dt.month

In [None]:
# Chart - 2 visualization code
#Line chart wrt to date and adr
sns.lineplot(df , x = 'month' , y = 'adr'  , color = 'purple' )
plt.title('Line chart wrt to date and ADR')
plt.show()

##### 1. Why did you pick the specific chart?

This chart shows the relationship between ADR and month. The graph makes it easy to see how the average rate changes from month to month.


##### 2. What is/are the insight(s) found from the chart?

There is a clear spike in the middle of the year, during July, August, and September, and a large drop at the end and start of the year.


##### 3. Will the gained insights help creating a positive business impact?
The dip during winter is due to seasonal changes, as people tend to travel more in summer and early autumn. These insights can help if, when the graph is low, the business offers attractive discounts and packages that are not available during the peak season to attract customers.

#### Chart - 3 Box plot for lead time

In [None]:
# Chart - 3 visualization code
#box plot for lead time to find the outliers
plt.figure(figsize = (35,12) )
sns.boxplot(data=df, x='lead_time', color='turquoise', linewidth=2)  # passing only x already draws a horizontal box; the extra vert argument is not needed
plt.title('Box plot for lead time')
plt.show()

##### 1. Why did you pick the specific chart?

A box plot provides a concise summary of the data distribution, showcasing the median, quartiles, and potential outliers. This makes it easy to see the center of the data and its variability.

##### 2. What is/are the insight(s) found from the chart?

* In this chart the minimum value is 0, while the maximum value is around 380, leaving aside the outliers.
* **IQR** for the graph is **Q3** - **Q1**, i.e. around 160 - 10 = **150**.
* **Median** of the data is around 70.

The distribution is also **positively skewed**.
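
The quartiles, IQR and outlier limits read off the box plot can also be computed numerically. The sketch below uses a toy series (not the notebook's `lead_time` column) and the common 1.5 x IQR whisker rule, which is the convention box plots typically use for flagging outliers.

```python
import pandas as pd

# Toy lead times in days (illustrative values only).
lead_time = pd.Series([0, 3, 10, 40, 70, 100, 160, 200, 380, 700])

q1, median, q3 = lead_time.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Whisker limits under the 1.5 * IQR rule; points beyond them count as outliers.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = lead_time[(lead_time < lower_limit) | (lead_time > upper_limit)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"limits: [{lower_limit}, {upper_limit}]")
print("outliers:", outliers.tolist())
print("skewness:", round(lead_time.skew(), 2))
```

Running the same steps on `df['lead_time']` would replace the approximate values estimated from the chart with exact ones.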


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

From these insights we can see that the median lead time is about 70 days. Some people make bookings more than 365 days (1 year) before the date of arrival, which makes it hard to predict the rates prevailing at that time, and the business can lose revenue as a result.
Bookings with a lead time of more than 90 days should be non-refundable: if the room is booked and there is a cancellation or no-show, the hotel loses the revenue as well as many potential customers, because the room was shown as occupied on the platform.


#### Chart - 4 Scatter plot showing Total of Special Requests vs. ADR

In [None]:
# Chart - 4 visualization code
# Scatter plot showing Total of Special Requests vs. ADR:
plt.figure(figsize = (8 , 8))
sns.scatterplot(df ,x =  'total_of_special_requests' , y = 'adr')
plt.title('Total of Special Requests vs. ADR')
plt.show()

##### 1. Why did you pick the specific chart?

To check if guests who make more special requests are willing to pay more (higher ADR).

##### 2. What is/are the insight(s) found from the chart?

As seen from the graph, guests making one or more special requests tend to pay a higher ADR.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Based on these insights, we can price according to the guest's special requests, which would help increase the hotel's revenue while providing better service to guests.

#### Chart - 5 Average daily revenue by Market Segment

In [None]:
# Chart - 5 visualization code
# Average daily revenue by Market Segment
plt.figure(figsize= (12 , 5))
sns.barplot(df , x = 'market_segment' , y = 'adr' , color = 'green')
plt.title('Average daily revenue by Market Segment')
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts offer clear visual distinctions between categories. The length of each bar corresponds to the value it represents, which allows viewers to quickly understand relative sizes and compare them directly .

##### 2. What is/are the insight(s) found from the chart?

The chart identifies which market segments yield higher average daily rates, so pricing strategies can be adjusted based on market segment performance.
It also helps in allocating marketing resources effectively by focusing on high-ADR segments.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights derived from analyzing market segments and ADR can drive strategic actions that will enhance revenue, improve customer satisfaction, and promote growth. By leveraging data effectively, hotels can create a more favorable business environment that leads to sustained success.

#### Chart - 6 Pie chart for meal breakdown

In [None]:
# Chart - 6 visualization code
# pie chart for meal breakdown
meal_counts = df['meal'].value_counts()
plt.figure(figsize=(8, 6))
plt.pie(meal_counts, labels=meal_counts.index, autopct='%1.1f%%')
plt.title('Distribution of Meal Plans')
plt.axis('equal')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart expresses a part-to-whole relationship in our data. It makes it easy to explain a percentage comparison through the area covered by each colored slice of the circle, and it is frequently used wherever shares of a whole need to be compared. So I used a pie chart, which helped me get the percentage comparison of the meal categories.



##### 2. What is/are the insight(s) found from the chart?

From the above pie chart we can clearly see that most guests prefer to have breakfast included in their booking.

By contrast, very few guests opt for FB, even though FB generates higher revenue.

We can curate promotional offers for plans other than BB to earn more revenue and give guests a unique dining experience.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights will help get more guests to opt for FB or HB if proper and attractive promotional offers are launched. Guests will have more options to choose from rather than just sticking to BB.


#### Chart - 7 Pie chart for deposit type

In [None]:
# Chart - 7 visualization code
# pie chart for deposit type
deposit_tp = df['deposit_type'].value_counts()
plt.figure(figsize=(8, 6))
plt.pie(deposit_tp, labels=deposit_tp.index, autopct='%0.1f%%')
plt.title('Types of payment at the time of booking')
plt.axis('equal')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart expresses a part-to-whole relationship in our data. It makes it easy to explain a percentage comparison through the area covered by each colored slice of the circle, and it is frequently used wherever shares of a whole need to be compared. So I used a pie chart, which helped me get the percentage comparison of the deposit types.

##### 2. What is/are the insight(s) found from the chart?

As we can see from the chart, 87.6% of guests prefer to book with no deposit, which suggests that they are not certain about their plans and are not willing to take a risk.

If any deposit type is underrepresented, hotels might consider promotional strategies to encourage that option. For example, offering discounts or incentives for Non-Refundable bookings could increase their share in the pie chart, benefiting the hotel's revenue.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights will create a positive impact: if we promote deposit types other than no deposit, overall revenue and occupancy will both increase.

**Negative**

The first insight has a negative effect on the business: when guests choose the no-deposit option, they might cancel their plans at the last minute or switch to a last-minute deal at another hotel, which is a loss for the hotel.
Also, when the deposit type is no deposit, the hotel cannot properly plan meals and services prior to check-in.

#### Chart - 8 Violin Plot

In [None]:
# Chart - 8 visualization code
#Violin Plot
sns.violinplot(data=df, x='hotel', y='lead_time')
plt.title('Lead Time Distribution by Hotel Type')
plt.show()

##### 1. Why did you pick the specific chart?

To show the distribution of lead times for different hotel types, highlighting both the distribution shape and density.

##### 2. What is/are the insight(s) found from the chart?

* We might observe differences in how far in advance guests book their stays.
* This insight can help hotel tailor their marketing strategies based on customer booking behaviors.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Insights from the chart not only help in understanding booking behaviors but also help hotels make data-driven decisions that enhance revenue, improve customer satisfaction, and streamline operations. By implementing strategies based on these insights, hotels can create a competitive advantage in the market.



#### Chart - 9 Cancellations by Market Segment

In [None]:
#  Cancellations by Market Segment visualization chart
sns.countplot(df , x='market_segment', hue='is_canceled')
plt.title('Cancellations by Market Segment')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

A count chart is useful for visualizing the frequency or count of categorical data. It helps to quickly understand the distribution of a category or the occurrence of events in a dataset.

##### 2. What is/are the insight(s) found from the chart?

It helps identify which market segments (e.g., Direct, Corporate, Travel Agencies) have the most cancellations, which can help target retention strategies.
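
A count plot shows absolute numbers, so a large segment can look worst simply because it has the most bookings. Computing the cancellation rate per segment puts segments of different sizes on the same scale. This is a minimal sketch on toy data. The segment names follow the dataset, and the values are illustrative.

```python
import pandas as pd

# Toy bookings (illustrative values only).
toy = pd.DataFrame({
    "market_segment": ["Online TA", "Online TA", "Groups", "Groups",
                       "Direct", "Direct", "Online TA", "Groups"],
    "is_canceled":    [1, 0, 1, 1, 0, 0, 0, 0],
})

# Bookings and share of canceled bookings per segment, highest rate first.
segment_stats = (
    toy.groupby("market_segment")["is_canceled"]
    .agg(bookings="size", cancel_rate="mean")
    .sort_values("cancel_rate", ascending=False)
)
print(segment_stats)
```

Keeping the booking count next to the rate helps avoid over-reading a high rate that comes from only a handful of bookings.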

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

By monitoring this chart and its insight, we can plan for fewer cancellations and higher occupancy, leading to overall growth of the business.


#### Chart - 10 Average Daily Rate (ADR) by Room Type

In [None]:
# Average Daily Rate (ADR) by Room Type visualization chart
sns.barplot(x='reserved_room_type', y='adr', data=df)
plt.title('ADR by Reserved Room Type')
plt.show()

##### 1. Why did you pick the specific chart?


Bar charts show the frequency counts of values for the different levels of a categorical or nominal variable. Sometimes, bar charts show other statistics, such as percentages.


##### 2. What is/are the insight(s) found from the chart?

Visualizing the ADR for different room types can help identify which room categories generate the highest revenue.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Hotels can improve revenue management, enhance guest experiences, and implement targeted marketing, all of which contribute to business growth.

#### Chart - 11 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize = (20,6))
sns.heatmap(num_col.corr() , annot = True)
plt.title('Correlation Heatmap')
plt.show()


##### 1. Why did you pick the specific chart?

A correlation matrix is a table showing correlation coefficients between variables. Each cell in the table shows the correlation between two variables. A correlation matrix is used to summarize data, as an input into a more advanced analysis, and as a diagnostic for advanced analyses. The range of correlation is [-1,1].

Thus, to see the correlation between all the variables along with the correlation coefficients, I used a correlation heatmap.

##### 2. What is/are the insight(s) found from the chart?

*  As lead_time and is_canceled show a positive correlation, it suggests that bookings made far in advance have a higher likelihood of being canceled. This should encourage hotels to adopt different cancellation policies based on how far in advance a booking is made.

* As total_of_special_requests correlates positively with adr, guests making more special requests may also be spending more on their bookings. Hotels could capitalize on this by offering premium services or additional upselling options to guests making special requests.
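
The two relationships noted above can also be read as numbers instead of heatmap colors. The sketch below computes a correlation matrix on a toy frame and ranks the columns by their correlation with `is_canceled`. The values are illustrative only, and `numeric_only=True` assumes a reasonably recent pandas version.

```python
import pandas as pd

# Toy bookings (illustrative values only); `hotel` is deliberately non-numeric.
toy = pd.DataFrame({
    "lead_time": [5, 30, 60, 120, 240, 365],
    "is_canceled": [0, 0, 0, 1, 1, 1],
    "adr": [80.0, 95.0, 90.0, 110.0, 100.0, 120.0],
    "total_of_special_requests": [0, 1, 0, 2, 1, 3],
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel",
              "City Hotel", "Resort Hotel", "City Hotel"],
})

# Pearson correlation over numeric columns only; text columns are skipped.
corr = toy.corr(numeric_only=True)

# Columns ranked by how strongly they move with the cancellation flag.
with_cancel = corr["is_canceled"].drop("is_canceled").sort_values(ascending=False)

print(corr.round(2))
print(with_cancel.round(2))
```

Correlation only measures linear association, so a positive value here suggests a pattern worth investigating rather than proof that long lead times cause cancellations.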

#### Chart - 12 - Pair Plot

In [None]:
# Pair Plot visualization code
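g = sns.pairplot(df[['lead_time', 'adr', 'total_of_special_requests']], diag_kind='kde')
# pairplot returns a grid of subplots; set the title on the whole figure rather than on the last subplot
g.fig.suptitle('Pair Plot of Lead Time, ADR, and Special Requests', y=1.02)
plt.show()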

##### 1. Why did you pick the specific chart?

To visualize relationships and distributions among multiple numerical variables in the dataset.

##### 2. What is/are the insight(s) found from the chart?

A pair plot is used to find the set of features that best explains a relationship between two variables or that forms the most clearly separated clusters. It also helps in sketching simple classification models by drawing straight lines that separate groups in the dataset.

Thus, I used a pair plot to analyse the patterns in the data and the relationships between the features. It conveys similar information to the correlation heatmap, but as a graphical representation of the underlying data points.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.

# **Conclusion**


**Booking Lead Time:** The majority of bookings are made several weeks in advance, especially for peak seasons, suggesting that early booking discounts or promotions could attract more customers.

**Market Segment Analysis:** Different booking channels (e.g., Travel Agents (TA/TO), Corporate bookings, etc.) show varying booking patterns, highlighting the need to tailor marketing strategies to specific segments.

**Cancellations and Rebooking:** A high cancellation rate indicates a potential area for improvement, such as by introducing more flexible booking options or cancellation insurance to mitigate revenue loss.

**Impact of Previous Cancellations:** Repeat guests with past cancellations often show different booking behavior, implying that tailored retention strategies (e.g., loyalty programs) can enhance guest loyalty.

**Strategic Implications:**


**Revenue Optimization:** Dynamic pricing based on room type, lead time, and market segment can drive revenue growth.

**Operational Efficiency:** Insights from EDA can help streamline operations by adjusting inventory, pricing, and promotions according to customer demand and booking patterns.

**Improved Customer Experience:** Tailoring services and marketing to customer preferences (e.g., family bookings, meal plans, room type upgrades) can improve guest satisfaction and loyalty.


### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***