<a href="https://colab.research.google.com/github/Naveen-01A/Hotel-Booking-Analysis/blob/main/Hotel_Booking_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Hotel Booking Analysis**    -



##### **Project Type**    - EDA/Regression
##### **Contribution**    - Individual
##### **Name**            - Naveen Akula


# **Project Summary -**

Exploratory Data Analysis (EDA) on Hotel Booking Dataset

The objective of this project was to conduct Exploratory Data Analysis (EDA) on a hotel booking dataset to uncover insights into customer behavior and booking patterns. The dataset encompassed diverse information about hotel bookings, including booking dates, customer demographics, and reservation details. Our analysis comprised two main stages: data cleaning and preprocessing, followed by data visualization using various graphical techniques.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**This report focuses on the analysis of hotel booking cancellations and other factors that do not directly impact the business and annual revenue generation of both the City Hotel and Resort Hotel. In recent years, both hotels have seen significant increases in their cancellation rates, leading to challenges such as reduced revenue and underutilized hotel rooms. Therefore, the top priority for both hotels is to reduce their cancellation rates, which will enhance their efficiency in revenue generation. Through this analysis, we aim to identify factors contributing to cancellation rates and propose strategies to mitigate them, ultimately improving the overall performance of both hotels.**

#### **Define Your Business Objective?**

In the past few years, both the City Hotel and Resort Hotel have experienced significant increases in their cancellation rates. As a result, both hotels are currently facing a range of challenges, such as reduced revenue and underutilized hotel rooms. Therefore, the top priority for both hotels is to reduce their cancellation rates, which will enhance their efficiency in generating revenue. This report focuses on the analysis of hotel booking cancellations and other factors that do not directly impact their business and annual revenue generation.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')



### Dataset Loading

In [None]:
# Load Dataset

In [None]:
df = pd.read_csv('/content/Hotel Bookings (1).csv')
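
The guidelines above ask for exception handling, so the load step could be guarded. The following is a minimal sketch, assuming a hypothetical helper named `load_bookings` and an illustrative list of required columns; neither is part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_bookings(csv_path, required_columns=("hotel", "is_canceled")):
    """Read the bookings CSV and fail early with a readable message.

    Hypothetical helper: the name and the default column list are
    illustrative assumptions, not part of the original notebook.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at {path}")
    frame = pd.read_csv(path)
    # Check that the columns the analysis depends on are present.
    missing = [col for col in required_columns if col not in frame.columns]
    if missing:
        raise ValueError(f"Dataset is missing expected columns: {missing}")
    return frame
```

It could be called as `df = load_bookings('/content/Hotel Bookings (1).csv')` in place of the bare `pd.read_csv` call.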

### Dataset First View

In [None]:
# Dataset First Look

df.head()

# head() shows the first n rows of the dataframe; when no argument is given it shows the first 5.

In [None]:
df.tail()

# It shows the last 5 rows of the dataset


### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

df.shape

# shape attribute shows the number of rows and columns in the dataset

### Dataset Information

In [None]:
# Dataset Info

df.info()

# info method gives information about the type of each column and the number of non-null values

In [None]:
df.describe(include = "all")

# The describe method describes the values present in the columns of the dataframe.

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

df.duplicated().sum()

In [None]:
df.duplicated().value_counts()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

missing_values_df = df.isna().sum().sort_values(ascending= False)[:10]
missing_values_df = missing_values_df.reset_index().rename(columns = {"index" : "column", 0 : "null values"})
missing_values_df


In [None]:
# Visualizing the missing values

sns.barplot(data = missing_values_df, x = "column", y = "null values")
plt.xticks(rotation = 90)
plt.show()

In [None]:
df["company"] = df["company"].fillna(0)

# Fills 0 in place of null in the company column (assigning the result back avoids chained in-place modification)

In [None]:
df["adults"] = df["adults"].fillna(0)

In [None]:
df["children"] = df["children"].fillna(0)

In [None]:
df["agent"] = df["agent"].fillna(0)

In [None]:
df["country"] = df["country"].fillna(0)

In [None]:
df.isna().sum()
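
The column-by-column fills can also be written as a single call by passing a dictionary to `DataFrame.fillna`. This is a sketch on a made-up toy frame; the `"Unknown"` placeholder for `country` is an assumption made here so that the column stays all-strings, whereas the notebook itself fills 0.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the bookings data (values are made up).
toy = pd.DataFrame({
    "company": [np.nan, 40.0, np.nan],
    "agent": [9.0, np.nan, 240.0],
    "children": [0.0, np.nan, 2.0],
    "country": ["PRT", None, "GBR"],
})

# One fill value per column; "Unknown" for country is an assumption
# of this sketch (the notebook fills 0 in every column).
fill_values = {"company": 0, "agent": 0, "children": 0, "country": "Unknown"}
filled = toy.fillna(fill_values)

# Total number of nulls left after filling.
remaining_nulls = int(filled.isna().sum().sum())
print(remaining_nulls)
```

Because `fillna` returns a new frame here, the original `toy` frame is left untouched, which makes the step easy to re-run.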

In [None]:
duplicate = df.duplicated().value_counts()
duplicate

In [None]:
color = ['green', 'orange']
plt.bar(x = duplicate.index, height = duplicate, color = color)

# bar plot function of matplotlib

plt.xticks([0, 1])
plt.xlabel("Duplicate")
plt.ylabel("Count")
# Barplot shows that more than 30000 records are duplicate

In [None]:
df.drop_duplicates(keep = False, inplace = True)

# Dropping duplicates from the dataset; keep = False removes every copy of a duplicated row, including the first occurrence

In [None]:
df.duplicated().value_counts()

# Checking duplicates True / False values count

In [None]:
df.shape

# Showing the Rows / Columns of the present dataframe

### What did you know about your dataset?

At first the dataset had 119390 rows. After replacing null values with 0 and removing duplicated rows, it has 79225 rows and the same 32 columns as before.
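
The row count after de-duplication depends on the `keep` argument of `drop_duplicates`. A small sketch on made-up data shows the difference between `keep="first"` and the `keep=False` used above:

```python
import pandas as pd

# Toy frame: the row ("City Hotel", 1) appears three times.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "City Hotel", "Resort Hotel"],
    "adults": [1, 1, 1, 2],
})

kept_first = toy.drop_duplicates(keep="first")  # keeps one copy of each repeated row
kept_none = toy.drop_duplicates(keep=False)     # drops every copy of a repeated row

print(len(kept_first), len(kept_none))
```

With `keep=False` a booking that was recorded more than once disappears entirely, so whether that is the desired behaviour is worth deciding explicitly.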

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

df.columns

# Attribute which shows the column names of the dataset

## Data Description:

1. hotel : Hotel(Resort Hotel or City Hotel)

2. is_canceled : Value indicating if the booking was canceled (1) or not (0)

3. lead_time : Number of days that elapsed between the entering date of the booking into the PMS and the arrival date

4. arrival_date_year : Year of arrival date

5. arrival_date_month : Month of arrival date

6. arrival_date_week_number : Week number of year for arrival date

7. arrival_date_day_of_month : Day of arrival date

8. stays_in_weekend_nights : Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel

9. stays_in_week_nights : Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel

10. adults : Number of adults

11. children : Number of children

12. babies : Number of babies

13. meal : Type of meal booked. Categories are presented in standard hospitality meal packages:

14. country : Country of origin.

15. market_segment : Market segment designation. In categories, the term “TA” means “Travel Agents” and “TO” means “Tour Operators”

16. distribution_channel : Booking distribution channel. The term “TA” means “Travel Agents” and “TO” means “Tour Operators”

17. is_repeated_guest : Value indicating if the booking name was from a repeated guest (1) or not (0)

18. previous_cancellations : Number of previous bookings that were cancelled by the customer prior to the current booking

19. previous_bookings_not_canceled : Number of previous bookings not cancelled by the customer prior to the current booking

20. reserved_room_type : Code of room type reserved. Code is presented instead of designation for anonymity reasons.

21. assigned_room_type : Code for the type of room assigned to the booking.

22. booking_changes : Number of changes/amendments made to the booking from the moment the booking was entered on the PMS until the moment of check-in or cancellation

23. deposit_type : Indication on if the customer made a deposit to guarantee the booking.

24. agent : ID of the travel agency that made the booking

25. company : ID of the company/entity that made the booking or responsible for paying the booking.

26. days_in_waiting_list : Number of days the booking was in the waiting list before it was confirmed to the customer

27. customer_type : Type of booking, assuming one of four categories

28. adr : Average Daily Rate as defined by dividing the sum of all lodging transactions by the total number of staying nights

29. required_car_parking_spaces : Number of car parking spaces required by the customer

30. total_of_special_requests : Number of special requests made by the customer (e.g. twin bed or high floor)

31. reservation_status : Reservation last status, assuming one of three categories

* Canceled - booking was canceled by the customer
* Check-Out - customer has checked in but already departed
* No-Show - customer did not check-in and did inform the hotel of the reason why

32. reservation_status_date : Date at which the last status was set. This variable can be used in conjunction with the ReservationStatus to understand when was the booking canceled or when did the customer checked-out of the hotel



In [None]:
# Dataset Describe

df.describe()


### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

df['hotel'].unique()

In [None]:
df.columns

In [None]:
df['is_canceled'].unique()

In [None]:
df['lead_time'].unique()

In [None]:
df['arrival_date_year'].unique()

In [None]:
df['arrival_date_month'].unique()

In [None]:
df['arrival_date_week_number'].unique()

In [None]:
df['arrival_date_day_of_month'].unique()

In [None]:
df['stays_in_weekend_nights'].unique()

In [None]:
df['stays_in_week_nights'].unique()

In [None]:
df['adults'].unique()

In [None]:
df['children'].unique()

In [None]:
df['babies'].unique()

In [None]:
df['meal'].unique()

In [None]:
df['country'].unique()

In [None]:
df['market_segment'].unique()

In [None]:
df['distribution_channel'].unique()

In [None]:
df.columns

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

df['reservation_status_date'] = pd.to_datetime(df['reservation_status_date'], format = '%Y-%m-%d')

# to_datetime function converts the values of a column from strings into datetime objects


In [None]:
df.info()

# info function shows the columns in a dataset, their datatype and non null values count in that column


In [None]:
df['total_people'] = df['adults'] + df['children'] + df['babies']

# Adding adults column, children column and babies column values to create new column called total_people


In [None]:
df['total_stay'] = df['stays_in_weekend_nights'] + df['stays_in_week_nights']

# Adding two columns values to create a new column called total_stay


In [None]:

df.head(3)

# It shows the first 3 rows of the dataset


### What all manipulations have you done and insights you found?

1. Created a new column called "total_people" by adding all the adults, children and babies.
2. The "reservation_status_date" is converted to datetime.
3. Created a new column called "total_stay" by adding both "stays_in_week_nights" and "stays_in_weekend_nights".
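
The three manipulations listed above can be sketched as one chained step with `DataFrame.assign`. The toy values below are made up; the column names are the ones used in this notebook.

```python
import pandas as pd

# Toy rows with the columns used by the wrangling step (values are made up).
toy = pd.DataFrame({
    "adults": [2, 1],
    "children": [1.0, 0.0],
    "babies": [0, 1],
    "stays_in_weekend_nights": [2, 0],
    "stays_in_week_nights": [3, 1],
    "reservation_status_date": ["2015-07-01", "2016-02-29"],
})

prepared = toy.assign(
    # Parse the status date strings into datetime values.
    reservation_status_date=lambda d: pd.to_datetime(
        d["reservation_status_date"], format="%Y-%m-%d"
    ),
    # Guests per booking: adults + children + babies.
    total_people=lambda d: d["adults"] + d["children"] + d["babies"],
    # Nights per booking: weekend nights + week nights.
    total_stay=lambda d: d["stays_in_weekend_nights"] + d["stays_in_week_nights"],
)
```

`assign` returns a new frame, so the raw columns stay available if a step needs to be revisited.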


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### 1. Pie chart of type of hotel - "City hotel or Resort hotel"

In [None]:
df['hotel'].value_counts()

# Using the value_counts function which shows the count of the values present in the column


In [None]:

# Pie function of matplotlib.pyplot to create pie chart

# Count the number of bookings for each hotel type
hotel_counts = df['hotel'].value_counts()

# Determine the explode values based on the number of categories
explode = [0.05] * len(hotel_counts)

plt.figure(figsize=(10, 8))
# Reuse the counts computed above instead of recomputing them
hotel_counts.plot.pie(explode=explode, autopct='%1.1f%%', shadow=True, figsize=(10, 8), fontsize=20)

plt.title('Percentage of City and Resort hotels')

# Setting title of the created chart

plt.show()


##### 1. Why did you pick the specific chart?

The pie chart is used to show the percentage of bookings for the City and Resort hotels.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** About 60 % of the bookings are for the City hotel and around 40 % are for the Resort hotel.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It helps us understand how bookings are distributed between the two hotel types and which type of hotel would be preferable to build next.

#### 2. Number of bookings that are cancelled

In [None]:
df.columns

In [None]:
cancelled = df['is_canceled'].value_counts().reset_index().rename(columns = {'is_canceled' : 'cancelled'})


# Checking the value counts of the is_canceled column, resetting the resulting series into a dataframe and renaming its value column (named 'is_canceled' after reset_index in pandas 2.x)


In [None]:
cancelled

In [None]:
plt.bar(cancelled.index, cancelled['count'], color = ['green', 'red'])

# Bar function of matplotlib which is used to create bar graph

plt.title('Count of cancelled bookings')
plt.xticks(cancelled.index)
plt.xlabel('Cancelled')
plt.ylabel("count")
# Labelling the x and y axes

plt.show()


##### 1. Why did you pick the specific chart?

A bar plot is used to show the number of bookings that got cancelled, since bar plots suit categorical data.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** More than 20000 of the hotel bookings in the dataset got cancelled.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Analyzing cancellation data can help in developing strategies to minimize cancellations, improve customer satisfaction, and optimize business performance.

#### 3. Percentage of bookings that get cancelled

In [None]:
is_cancelled = df['is_canceled'].value_counts()

# Using the value_counts function which shows the count of the values present in the column

is_cancelled


In [None]:
plt.figure(figsize=(10, 8))
df['is_canceled'].value_counts().plot.pie(explode=[0.05, 0.05], autopct='%1.1f%%', shadow=True, figsize=(10, 8), fontsize=20)

plt.title('Cancellation and non Cancellation')

# Setting title of the created chart

plt.show()

##### 1. Why did you pick the specific chart?

The pie chart shows the percentage of bookings that get cancelled.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** About 26.2 % of bookings get cancelled.
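
The percentage read off the pie chart can also be computed directly. Because `is_canceled` is coded 0/1, its mean is the share of cancelled bookings. The sketch below uses made-up rows, so its numbers are not the dataset's:

```python
import pandas as pd

# Toy bookings (made up): one of four is cancelled.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "is_canceled": [1, 0, 0, 0],
})

# Mean of a 0/1 column is the fraction of ones, here the cancellation share.
overall_rate = toy["is_canceled"].mean() * 100

# The same idea split by hotel type.
rate_by_hotel = toy.groupby("hotel")["is_canceled"].mean() * 100
```

Splitting the rate by `hotel` gives a per-hotel view that a single pie chart cannot show.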

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It helps to understand the share of cancellations that hotel management can expect when operating a hotel business.

#### 4. Count of booking changes that are made

In [None]:
booking_changes = df['booking_changes'].value_counts().reset_index().rename(columns={'booking_changes': 'Number of booking changes'})[:10]

# Checking the value counts, resetting index, renaming columns and slicing in one line

booking_changes

In [None]:
# Define a custom color palette
custom_palette = ["#FF5733", "#33FF57", "#3366FF"]

# Set the custom palette
sns.set_palette(custom_palette)

# Create a figure and set its size
plt.figure(figsize=(10, 6))

# Use seaborn's barplot function to create a bar graph
sns.barplot(data=booking_changes, x='Number of booking changes', y='count')

# Set the title of the plot
plt.title('Count of number of booking changes')

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

The bar chart shows the distribution of the discrete numerical column booking_changes.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** Most of the bookings have no changes; about 10000 have exactly 1 change, and fewer still have 2 changes.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It shows the usual number of booking changes that guests make. Most bookings need no changes, and when changes do occur the hotel staff can usually expect only 1 or 2.

#### 5. Agent who has made the most bookings

In [None]:
agent_bookings = df.groupby(['agent'])['agent'].agg(['count']).reset_index().rename(columns = {'count' : 'Most bookings'}).sort_values(by = 'Most bookings', ascending = False)[:10]

# Grouping by agent and counting the rows per agent, resetting index, renaming the column, sorting values in descending order and keeping the first 10 rows in one line
agent_bookings

In [None]:
sns.barplot(data = agent_bookings, x = 'agent', y = 'Most bookings', palette = 'inferno').set_title('Most bookings made by agent')

# Barplot function of seaborn to create a bar graph

plt.show()

##### 1. Why did you pick the specific chart?

The bar chart compares the number of bookings made by each agent.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** Agent 9 made the most bookings, followed by agent 240.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Depending on the insights from this chart, high-performing agents may be rewarded with commission, and the least-performing agents may be unsubscribed.
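
Since missing `agent` values were filled with 0 earlier, agent 0 stands for "no agent recorded" rather than a real agency. A minimal sketch on made-up data of how that placeholder could be excluded before ranking agents:

```python
import pandas as pd

# Toy agent IDs (made up); 0.0 is the placeholder written by fillna.
toy = pd.DataFrame({"agent": [9.0, 9.0, 9.0, 0.0, 0.0, 240.0]})

# Drop the placeholder so only real agent IDs are counted.
real_agents = toy.loc[toy["agent"] != 0, "agent"]

# value_counts sorts by count in descending order; keep at most 10 agents.
top_agents = real_agents.value_counts().head(10)
```

Whether bookings without an agent should appear in the ranking is a judgment call; the point of the sketch is only that the choice can be made explicit.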

#### 6. Arrival distribution according to month

In [None]:
arrival_distribution = df['arrival_date_month'].value_counts().reset_index().rename(columns = {'arrival_date_month' : 'month'})

# Doing value counts, resetting index and renaming columns

arrival_distribution


In [None]:
sns.barplot(data = arrival_distribution, x = 'month', y = 'count', palette = 'deep')
#Barplot function of seaborn used to create a bar graph
plt.xticks(rotation = 90)
# Rotating the values of the x axis by 90 degree for better readability

plt.title('Bookings according to month')
#Setting title of the chart

plt.show()

##### 1. Why did you pick the specific chart?

The barplot is used to plot the distribution of bookings according to month.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** Most guests arrive in August, followed by July and May. This reveals the seasonal trend in hotel bookings: the summer season is preferred.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can help the hotel management understand the yearly flow and the seasonality of hotel bookings.
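
`value_counts` orders months by frequency, not by calendar position, which makes seasonal patterns harder to read. A sketch, using made-up data and an explicitly written month list, of reordering the counts chronologically:

```python
import pandas as pd

# Toy arrivals (made up).
toy = pd.DataFrame({
    "arrival_date_month": ["August", "January", "August", "May", "January", "August"],
})

# Calendar order written out explicitly so it does not depend on the locale.
month_order = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Reindex the counts into calendar order; months with no arrivals get 0.
monthly = (
    toy["arrival_date_month"]
    .value_counts()
    .reindex(month_order, fill_value=0)
)
```

For the seaborn charts, the same list can be passed as `order=month_order` to `sns.barplot` or `sns.countplot` so the bars appear from January to December.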

#### 7. Room types that are mostly reserved

In [None]:
plt.figure(figsize = (8, 6))
#setting size of the figure

sns.countplot(data = df, x = 'reserved_room_type', hue = 'hotel', palette = 'dark')
# Count plot to show counts of categorical variable reserved_room_type, hue parameter is used to distribute the bar depending on city or resort hotel

plt.title('Reserved room types based on city or resort hotels')
# Setting the title of the chart

plt.show()

##### 1. Why did you pick the specific chart?

This count plot shows the distribution of bookings by reserved room type.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** A and D are the room types most reserved in both Resort and City hotels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The plot helps determine which room types customers favour, which in turn helps the hotels provide the best service.

#### 8. Room types that are mostly assigned

In [None]:
plt.figure(figsize = (8, 6))

sns.countplot(data = df, x = 'assigned_room_type', hue = 'hotel', palette = sns.color_palette('Set1'))

plt.title('Assigned room types based on city and resort hotels')

plt.show()

##### 1. Why did you pick the specific chart?

This chart is used to observe the distribution of bookings based on room type that is assigned.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** Room types A, D and E are the most assigned among city and resort hotels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights can help determine which room types to build, and how many of each, when building a city or resort hotel, and which room types to keep available when running one.

#### 9. Distribution of hotel bookings arrival according to years

In [None]:
plt.figure(figsize = (5, 5))
# Setting the figure size

sns.countplot(data = df, x = 'arrival_date_year', hue = 'hotel', palette = 'inferno')
# Countplot function of seaborn to see the count of each arrival year's occurrence

plt.title('Distribution of arrival date according to year of arrival')
# Setting the title of the chart

plt.show()

##### 1. Why did you pick the specific chart?

The above chart is used to observe the bookings trends across years.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** Most of the guests arrived in the year 2016.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from this chart can help in understanding the yearly trends of bookings, and what are the factors that affect the trend.

#### 10. Booking distribution according to month

In [None]:
sns.countplot(data = df, x = 'arrival_date_month', hue = 'hotel', palette = 'deep')
# Countplot function of seaborn to create a countplot chart

plt.xticks(rotation = 90)
# Rotating values of x axis for better readability

plt.title('Bookings according to month')
# setting the title of the chart

plt.show()

##### 1. Why did you pick the specific chart?

The count plot shows the booking distribution across months for the city and resort hotels.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** Nearly half the year, from October to February, is the off season for the hotel business in this dataset.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from this chart can help to understand the monthly flow of bookings throughout the year.

#### 11. Percentage of guests that rebook the same hotel

In [None]:
repeated = df['is_repeated_guest'].value_counts()
# Value counts of is_repeated_guests column

repeated

In [None]:
plt.figure(figsize=(10, 8))

df['is_repeated_guest'].value_counts().plot.pie(explode=[0.05, 0.05], autopct='%1.1f%%', shadow=True, figsize=(10, 8), fontsize=20)

plt.title('% of guests that are repeated')

# Setting title of the created chart

plt.show()

##### 1. Why did you pick the specific chart?

The pie chart shows the percentage of guests who rebooked the same hotel.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** Only 4.2 % of guests are repeat guests of the same hotel; the rest are newcomers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights will help shape sales tactics that encourage guests to rebook the same hotel or resort. They can also help the business owner create referral systems to attract customers.

#### 12. Distribution of booking made by distribution channels

In [None]:
distro_ch = df['distribution_channel'].value_counts().reset_index()
# Doing value counts and resetting the index (the count column is renamed in the next cell)

distro_ch

In [None]:
distro_ch = distro_ch.rename(columns = {'count' : 'bookings_made'})

distro_ch

In [None]:
plt.figure(figsize = (5, 5))
# Setting the size of the figure

sns.barplot(data = distro_ch, x = 'distribution_channel', y = 'bookings_made', palette = 'deep')
# Barplot of seaborn to create a bar graph

plt.title('Booking made by distribution channels')
# Setting the title of the chart

plt.show()

##### 1. Why did you pick the specific chart?

It shows the share of bookings made by the distribution channel

##### 2. What is/are the insight(s) found from the chart?

**Observation -** TA/TO makes the most bookings, followed by the Direct and Corporate distribution channels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights can guide the business owner in making informed decisions to optimize their distribution strategy, improve overall bookings, and drive business growth.

In [None]:
print('5th percentile:', df['lead_time'].quantile(0.05))
# Showing the 5th percentile

print('25th percentile (first quartile):', df['lead_time'].quantile(0.25))
# Showing the 25th percentile

print('75th percentile (third quartile):', df['lead_time'].quantile(0.75))
# Showing the 75th percentile

print('95th percentile:', df['lead_time'].quantile(0.95))
# Showing the 95th percentile

#### 13. Distribution of lead time, which is the number of days elapsed between entering the booking in the PMS and the arrival date.

In [None]:
sns.boxplot(data = df, y = 'lead_time', x = 'hotel')
# Boxplot function of seaborn to create a boxplot chart

##### 1. Why did you pick the specific chart?

The boxplot is used to understand the distribution of lead time for each hotel.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** We can observe that most guests arrive within around 100 days of booking the hotel, with a few outliers at around 500 and 700 days.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can help the business owner understand the trends in the lead_time variable (a discrete numeric variable) and act accordingly.
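
The points a boxplot draws as outliers follow from the quartiles: by default, values more than 1.5 times the interquartile range beyond the first or third quartile are plotted individually. A sketch with made-up lead times:

```python
import pandas as pd

# Made-up lead times with one extreme value.
lead_time = pd.Series([0, 10, 20, 30, 40, 50, 60, 70, 80, 700], name="lead_time")

q1 = lead_time.quantile(0.25)
q3 = lead_time.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are drawn as outliers
# under the default boxplot whisker setting.
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr
outliers = lead_time[(lead_time < lower_fence) | (lead_time > upper_fence)]
```

Applied to `df['lead_time']`, the same calculation would list the bookings that appear as isolated points above the whiskers.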

#### 14. Distribution of bookings based on nationality

In [None]:
country_df = df['country'].value_counts().reset_index().sort_values(by = 'count', ascending = False)[:10]
country_df

In [None]:
sns.barplot(data = country_df, x = 'country', y = 'count', palette = sns.color_palette('Set2'))
# Barplot of seaborn to create a bar graph

##### 1. Why did you pick the specific chart?

The above plot shows the number of bookings by nationality.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** Portuguese, British, French, Spanish and German are the nationalities that make the most bookings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Incorporating nationality-specific insights into various aspects of the hotel's operations allows for a more tailored and enriching experience for guests, ultimately driving customer loyalty, positive word-of-mouth, and business growth.

#### 15. Distribution of the total number of people that stay in the hotels

In [None]:
sns.set_style('whitegrid')
# Setting style of the background grid

sns.countplot(data = df, y = 'total_people', palette = 'colorblind')
# Countplot of seaborn to create count chart

plt.xlabel('Count')
# Setting xlabel

plt.ylabel('Number of total people for stay')
# Setting ylabel
plt.title('Distribution of number of people in each stay')
# setting title
plt.show()

##### 1. Why did you pick the specific chart?

The count plot shows how many people are usually in a group staying at the hotels.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** Most bookings are for groups of 2 people, but rooms that accommodate 3 or 4 people should also be available.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It will help the business owners decide how to allocate rooms within their hotel based on the room type, the type of beds and the furniture needed.

#### 16. Distribution of total_stay, the number of nights that guests stay in the hotel

In [None]:
total_stay_df = df['total_stay'].value_counts().reset_index().sort_values(by = 'count', ascending = False)[:20]
# Doing value_counts, resetting index, sorting values and slicing the first 20 rows

total_stay_df

In [None]:
total_stay_df = total_stay_df.rename(columns = {'total_stay' : 'total_stay_nights'})
total_stay_df

In [None]:
sns.set_style('whitegrid')
# Setting background grid style

sns.lineplot(data = total_stay_df, x = 'total_stay_nights', y = 'count')
# Lineplot of seaborn to create linechart

plt.show()

##### 1. Why did you pick the specific chart?

The line chart is used to observe the number of nights people like to stay in the hotels.

##### 2. What is/are the insight(s) found from the chart?

**Observation -** Most of the bookings were for 1 to 3 nights, but a few customers can be expected to stay for a week.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This will help the hotel management provide services and facilities that encourage customers to stay for longer periods than they originally planned.

#### 17. Distribution of total_stay vs total_people

In [None]:
plt.figure(figsize = (7, 5))
# Setting size of the figure

sns.scatterplot(data = df, x = 'total_stay', y = 'total_people').set_title('Total stay vs Total people scatter plot')
# Scatterplot of seaborn to see the presence of dots at the intersection of total_stay and total_people

plt.show()

##### 1. Why did you pick the specific chart?

The scatter plot shows how the number of nights stayed (total_stay) is distributed over the group size (total_people).

##### 2. What is/are the insight(s) found from the chart?

Most groups of up to 5 people stay for at most about 30 days. A few bookings are for stays of 50 days or more.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

By leveraging insights into guests' length of stay preferences, hotel management can optimize operations, enhance guest satisfaction, and ultimately drive profitability and competitive advantage in the hospitality industry.

#### 18. Meal type most preferred by customers

In [None]:
plt.figure(figsize = (9, 6))
# Setting the figure size for the chart

sns.countplot(x = 'meal', hue = 'hotel', data = df)
# Creating a count plot using seaborn

plt.xlabel('Meal Type')
# Labeling x axis
plt.ylabel('Booking count', fontsize = 12)
# Labeling y axis
plt.title('Distribution of meal types preferred by customers', fontsize = 15)
# Setting title of our chart
plt.legend(title = 'hotel')
# setting legend title
plt.show()

Types of meal in hotels:

BB - (Bed and Breakfast)

HB- (Half Board)

FB- (Full Board)

SC- (Self Catering)

### 1. Why did you the specific chart?

The bar chart shows which is the most preferred meal type by the guests is BB( Bed and Breakfast)

#### 2. What is/are the insight(s) found from the chart?

1. Guests of both the City and Resort hotels mostly choose the Bed and Breakfast meal type.
2. There is no demand for the FB meal type in the City hotel and very little demand in the Resort hotel.
3. The Self Catering meal type does not appear in the Resort hotel.
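The counts behind the bars can be tabulated directly, which makes "no demand" and "very little demand" precise. This is a minimal sketch using `pd.crosstab`, assuming the `meal` and `hotel` columns from the plot; the `sample` rows are hypothetical and only illustrate the shape of the result.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; the values are invented.
sample = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "City Hotel",
              "Resort Hotel", "Resort Hotel", "Resort Hotel"],
    "meal":  ["BB", "SC", "BB", "BB", "HB", "FB"],
})

# Raw booking counts per meal type and hotel (missing combinations become 0).
meal_counts = pd.crosstab(sample["meal"], sample["hotel"])
print(meal_counts)

# The same table as a share of each hotel's bookings (each column sums to 1).
meal_share = pd.crosstab(sample["meal"], sample["hotel"], normalize="columns")
print(meal_share.round(3))
```

The normalized table is useful because the two hotels have different booking volumes, so raw counts alone can exaggerate differences in preference.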

#### 3. Will the gained insights help creating a positive business impact?

Are there any insights that lead to negative growth? Justify with specific reason.

By focusing on providing exceptional breakfast experiences that cater to guest preferences, the hotel can differentiate itself from competitors, drive guest satisfaction, and ultimately encourage longer stays and repeat visits.

#### 19. Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

plt.figure(figsize = (16, 10))
sns.heatmap(df.corr(numeric_only = True), annot = True)
# heatmap of seaborn to create heatmap chart

plt.title('Correlations between the columns', fontsize = 20)
plt.show()

##### 1. Why did you pick the specific chart?

A correlation heatmap is used to find the relationships between the variables.

##### 2. What is/are the insight(s) found from the chart?

1) is_canceled and same_room_alloted_or_not are negatively correlated. That means a customer is unlikely to cancel a booking when the assigned room differs from the reserved room. We have visualized this above.

2) lead_time and total_stay are positively correlated. That means the longer the customer's stay, the longer the lead time.

3) adults, children and babies are correlated with each other. That means the more people in a booking, the higher the adr.

4) is_repeated_guest and previous_bookings_not_canceled are strongly correlated. Repeat guests may be less likely to cancel their bookings.
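Reading individual cells off a large annotated heatmap is error-prone, so it can help to list the column pairs ranked by the strength of their correlation. The sketch below assumes nothing beyond `DataFrame.corr`; the `sample` DataFrame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; the values are invented.
sample = pd.DataFrame({
    "lead_time":   [10, 50, 100, 200, 300],
    "total_stay":  [1, 2, 4, 5, 7],
    "is_canceled": [0, 1, 0, 1, 1],
})

corr = sample.corr(numeric_only=True)

# Collect each unordered pair of columns once (upper triangle, no diagonal).
pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
]

# Strongest relationships first, regardless of sign.
ranked = sorted(pairs, key=lambda p: abs(p[2]), reverse=True)
for a, b, r in ranked:
    print(f"{a:>12} vs {b:<12} r = {r:+.2f}")
```

A correlation coefficient only measures linear association, so a ranked list like this is a starting point for the insights above rather than proof of cause and effect.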

#### 20. Pair Plot Visualization

In [None]:
# Pairplot visualization code


# Now use sns.pairplot() with the DataFrame
sns.pairplot(data=df)

plt.show()

##### 1. Why did you pick the specific chart?

A pair plot lets us see both the distribution of each single variable and the relationship between every pair of variables, so all columns can be compared with each other in one figure.

##### 2. What is/are the insight(s) found from the chart?

1. From the pair plot above we can observe that cancelled bookings tend to have a shorter total stay.
2. As the total number of people increases, ADR also increases, so ADR is positively related to the number of people.
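Trends read from a dense pair plot are easier to confirm with grouped averages. The following is a minimal sketch, assuming the `is_canceled`, `total_stay`, `total_people` and `adr` columns; the `sample` values are hypothetical and were chosen only to show the mechanics, so they say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; the values are invented.
sample = pd.DataFrame({
    "is_canceled":  [0, 0, 1, 1, 0, 1],
    "total_stay":   [5, 7, 2, 3, 6, 1],
    "total_people": [1, 2, 2, 3, 3, 4],
    "adr":          [60.0, 80.0, 90.0, 120.0, 110.0, 150.0],
})

# Average length of stay for kept (0) versus cancelled (1) bookings.
mean_stay_by_cancel = sample.groupby("is_canceled")["total_stay"].mean()
print(mean_stay_by_cancel)

# Average daily rate for each group size.
mean_adr_by_people = sample.groupby("total_people")["adr"].mean()
print(mean_adr_by_people)
```

If the averages on the real `df` do not move in the direction the plot suggested, the visual impression was probably driven by a few outliers.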

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain briefly.

1. The City hotel is the most preferred, so stakeholders should maintain and improve its services and use strategies such as offers to increase bookings at the Resort hotel.
2. About 26.2 % of bookings were cancelled, so stakeholders should raise the deposit amount so that cancellations go down and profit improves.
3. Agents number 9, 240 and 0 should be rewarded by stakeholders for their outstanding performance, which would also increase competition among agents.
4. The sales team should be pushed to perform in the summer season, as it is the preferred season for the hotel business. In the off season, sales techniques should be changed, since it is harder to get customers then.
5. The number of rooms of types A, D and E should be increased, as they are the most reserved and assigned.
6. Only 4.2 % of guests return to the same hotel, so guests can be given coupons or offers so that they come back to the same hotel.
7. The performance of the TA / TO, Corporate and Direct channels should be strengthened to gain more bookings.
8. Portuguese, British and French dishes should be added, and guests of these nationalities should be made to feel comfortable by catering to their preferences.
9. The Bed and Breakfast meal type should be available at all times, and the techniques for selling other meal types should be refreshed.
10. Very few guests bring children, so parks and gaming zones could be built to attract guests with children.
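Headline figures such as the cancellation rate, the repeat-guest share and the top agents can be recomputed whenever the dataset is refreshed. This is a minimal sketch, assuming columns named `is_canceled`, `is_repeated_guest` and `agent`; the `sample` rows are hypothetical, so the printed numbers are not the project's results.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; the values are invented.
sample = pd.DataFrame({
    "is_canceled":       [0, 1, 0, 0],
    "is_repeated_guest": [0, 0, 0, 1],
    "agent":             [9, 240, 9, 0],
})

# Both columns are 0/1 flags, so the mean is the share of 1s.
cancel_rate = sample["is_canceled"].mean() * 100
repeat_rate = sample["is_repeated_guest"].mean() * 100
print(f"Cancelled bookings: {cancel_rate:.1f} %")
print(f"Repeat guests:      {repeat_rate:.1f} %")

# Agents ranked by number of bookings, most active first.
top_agents = sample["agent"].value_counts().head(3)
print(top_agents)
```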

# **Conclusion**

In conclusion, leveraging guest data and insights to inform strategic decision-making, offer customization, and customer engagement is essential for hotel stakeholders to stay competitive in the dynamic hospitality industry. By continuously updating their dataset, analyzing guest behavior, soliciting feedback, and personalizing their offerings and communication, stakeholders can create a differentiated guest experience that drives loyalty, satisfaction, and business success.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***