# **Project Name**    -



##### **Project Type**    - EDA(Exploratory Data Analysis)
##### **Contribution**    - Individual
##### **Name -**            Tanvi Pawar


# **Project Summary -**

The hotel business thrives on understanding customer preferences and booking patterns to optimize operations and revenue. This case study focuses on analyzing data from two types of hotels to extract insights into their historical and current business performance. The goal is to provide actionable recommendations for improving profitability and customer satisfaction.

I conducted exploratory data analysis (EDA) to uncover trends, patterns, and anomalies in the hotel booking dataset. Starting with raw data cleaning and preprocessing, I ensured the dataset was ready for analysis by removing inconsistencies and noise. Key areas of focus included:

*   Analysis of booking volumes over time.
*   Identifying the demographic and behavioral traits of customers.
*   Examining price distribution across different booking scenarios.
*   Studying the duration of stay and its correlation with booking preferences.
*   Insights into arrival and checkout trends.

Key findings from the analysis include:

*   The highest number of bookings occurred in 2016, with August being the busiest month for both 2016 and 2017.
*   The most popular meal plan among customers was BB (Bed and Breakfast).
*   An optimal stay duration of 5 days ensures the best daily rate for customers.
*   A high rate of booking cancellations was observed, highlighting a need for strategies to reduce cancellations.
*   Full Board (FB) meal options require improvement to better cater to customers staying for extended durations.

The visualizations, interactive plots, and detailed analysis presented here aim to help hotel owners understand their business better and implement strategies to attract more customers while retaining existing ones.



# **GitHub Link -**

https://github.com/Tanvipawar10/Hotel-Booking-Analysis-EDA-

# **Problem Statement**


**Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay in order to get the best daily rate? What if you wanted to predict whether or not a hotel was likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore those questions!**


This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. All personally identifying information has been removed from the data. Explore and analyze the data to discover important factors that govern the bookings.

#### **Define Your Business Objective?**

When planning vacations, business trips, or casual getaways to a new city, hotel bookings are an essential part of the process, and everyone seeks to optimize their stay. For some, optimization means securing a great deal at a lower price, while for others, it could be booking a luxurious suite in a 7-star hotel during a non-peak period.

Using the Hotel Booking dataset, which contains tabular data on guest booking patterns, stay durations, meal preferences, and more for arrivals in 2015, 2016, and 2017, we aim to conduct an in-depth analysis to uncover key insights and trends. Personally, I prioritize avoiding random price surges and prefer paying a fair, optimized rate for my stay. Additionally, ensuring a safe vacation with my loved ones is crucial, so I would favor hotels with fewer crowds and robust health and safety measures.

The objective of our analysis is twofold: to assist potential guests in making informed decisions about the ideal hotel, stay duration, and other preferences, and to provide actionable insights for hotel management to enhance their services and better cater to guest needs.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Importing important libraries for analysis

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


### Dataset Loading

In [None]:
# Mounting Google Drive to access the dataset file

from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
#importing data

hotel_df = pd.read_csv("/content/Hotel Bookings Analysis.csv")

### Dataset First View

In [None]:
# Display the first 5 rows of the dataset
hotel_df.head()

In [None]:
# Display the last 5 rows of the dataset
hotel_df.tail()

### Dataset Rows & Columns count

In [None]:
#Verifying the shape of dataset
hotel_df.shape

The dataset has 119,390 rows and 32 columns in total.

### Dataset Information

In [None]:
hotel_df.info()

#### Duplicate Values

In [None]:
#Counting sum of duplicate value
hotel_df.duplicated().sum()

In [None]:
#Dropping duplicate values in dataset
hotel_df.drop_duplicates(inplace=True)
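
Before dropping duplicates it can help to record how many rows are removed, so the effect of the step is visible later. The following is a minimal sketch on a hypothetical toy frame (`toy_df` and its values are illustrative stand-ins, not rows from the real dataset):

```python
import pandas as pd

# Hypothetical toy frame standing in for hotel_df (values are made up)
toy_df = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "lead_time": [10, 10, 3, 25],
    "adr": [80.0, 80.0, 120.0, 95.5],
})

rows_before = len(toy_df)
# duplicated() marks every row that repeats an earlier row exactly
n_duplicates = int(toy_df.duplicated().sum())

# drop_duplicates returns a new frame; reset_index keeps the index contiguous
deduped_df = toy_df.drop_duplicates().reset_index(drop=True)

print(f"{rows_before} rows before, {n_duplicates} duplicates, {len(deduped_df)} rows kept")
```

Returning a new frame instead of using `inplace=True` keeps the raw data available for comparison.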

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
hotel_df.isnull().sum().sort_values(ascending=False)

In [None]:
# Visualizing the missing values
plt.figure(figsize=(18,8))
colours = ['#34495E', 'seagreen']
sns.heatmap(hotel_df.isnull(), cmap=sns.color_palette(colours))

### What did you know about your dataset?

Only a few columns contain null (NaN) values. The children column has just 4, while the remaining nulls are in the country, agent, and company columns, with agent and company having the most.

The heatmap above visualizes the missing data: the agent and company columns stand out because they contain the most missing values.
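
A heatmap shows where values are missing but not how many. As a complement, a small summary table of missing counts and percentages can be built; the sketch below uses a hypothetical toy frame (`toy_df` is an illustrative stand-in for `hotel_df`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for hotel_df (values are made up)
toy_df = pd.DataFrame({
    "children": [0.0, 1.0, np.nan, 2.0],
    "country": ["PRT", None, "GBR", "PRT"],
    "company": [np.nan, np.nan, np.nan, 40.0],
    "adr": [80.0, 95.5, 120.0, 60.0],
})

# isnull().mean() is the fraction of missing values per column
missing_summary = pd.DataFrame({
    "missing_count": toy_df.isnull().sum(),
    "missing_pct": (toy_df.isnull().mean() * 100).round(1),
})

# Keep only columns that actually have missing values, worst first
missing_summary = missing_summary[missing_summary["missing_count"] > 0]
missing_summary = missing_summary.sort_values("missing_count", ascending=False)
print(missing_summary)
```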

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
hotel_df.columns

In [None]:
# Dataset Describe
hotel_df.describe()

### Variables Description

Here is information on what these columns represent.

1.hotel : H1 = Resort Hotel , H2 = City Hotel.

2.is_canceled : If the booking was canceled(1) or not(0).

3.lead_time : Number of the days that elapsed between the entering date of the booking into the PMS and arrival date.

4.arrival_date_year : Year of arrival date.

5.arrival_date_month : Month of arrival date.

6.arrival_date_week_number : Week number of arrival date.

7.arrival_date_day_of_month : Day of arrival date.

8.stays_in_weekend_nights : Number of Weekend nights(Saturday or Sunday) the guest stayed or booked to stay at the hotel.

9.stays_in_week_nights : Number of Week nights(Monday to Friday) the guest stayed or booked to stay at the hotel.

10.adults : Number of adults.

11.children : Number of children.

12.babies : Number of babies.

13.meal : Kind of meal opted for.

14.country : Country code.

15.market_segment : Which segment the customer belongs to.

16.distribution_channel : How the booking was made, e.g. Corporate, Direct, or TA/TO (travel agents/tour operators).

17.is_repeated_guest : Whether the guest is a repeated guest (1) or not (0).

18.previous_cancellations : Number of previous bookings canceled by the customer.

19.previous_bookings_not_canceled : Number of previous bookings not canceled by the customer.

20.reserved_room_type : Type of room reserved.

21.assigned_room_type : Type of room assigned.

22.booking_changes : Number of changes made to the booking.

23.deposit_type : Deposit type.

24.agent : ID of the travel agency that made the booking.

25.company : ID of the company that made the booking or is responsible for paying it.

26.days_in_waiting_list : Number of Days in waiting list.

27.customer_type : Type of customer.

28.adr : Average daily rate.

29.required_car_parking_spaces : Number of car parking spaces required.

30.total_of_special_requests : Number of additional special requests.

31.reservation_status : Last status of the reservation.

32.reservation_status_date : Date of the specific status.

### Check Unique Values for each variable.

In [None]:
# Check unique values for selected variables.
# print() is used because a notebook cell only displays its last expression
print(hotel_df['hotel'].unique())

print(hotel_df['is_canceled'].unique())
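
To cover every variable rather than two, `nunique()` can be applied to the whole frame. This is a minimal sketch on a hypothetical toy frame; the cut-off of 2 distinct values for "low cardinality" is an arbitrary assumption chosen for the example:

```python
import pandas as pd

# Hypothetical toy frame standing in for hotel_df (values are made up)
toy_df = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel"],
    "is_canceled": [0, 1, 0],
    "meal": ["BB", "HB", "BB"],
    "lead_time": [10, 3, 25],
})

# Number of distinct values in each column
unique_counts = toy_df.nunique()
print(unique_counts)

# List the distinct values only for low-cardinality columns
low_cardinality = unique_counts[unique_counts <= 2].index.tolist()
for column in low_cardinality:
    print(column, toy_df[column].unique())
```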


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# replacing null values in children column with zero
hotel_df.fillna({'children':0},inplace=True)

In [None]:
# replacing null country with unknown
hotel_df.fillna({'country':'unknown'},inplace=True)

The children column has only 4 null values, which we replace with zero. The country column is important for the analysis, so instead of dropping it we fill its nulls with 'unknown'. The company and agent columns, which hold most of the remaining null values, are dropped next.

In [None]:
#Checking again with null values
hotel_df.isnull().sum()

In [None]:
# dropping the company and agent columns
hotel_df.drop(['company','agent'], inplace = True , axis = 1)
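
The cleaning steps above can also be collected into one function, which makes the notebook easier to re-run from the top. The sketch below is one possible arrangement, not the notebook's own code: `clean_bookings` is a hypothetical helper name, and casting `children` to integer is an extra assumption.

```python
import numpy as np
import pandas as pd

def clean_bookings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy; the input frame is left untouched."""
    # Fill the two columns that are kept despite having nulls
    cleaned = df.fillna({"children": 0, "country": "unknown"})
    # errors="ignore" lets the function run even if a column is already gone
    cleaned = cleaned.drop(columns=["company", "agent"], errors="ignore")
    # children is read as float because of the NaNs; store it as integer
    cleaned["children"] = cleaned["children"].astype(int)
    return cleaned

# Hypothetical toy frame standing in for hotel_df (values are made up)
toy_df = pd.DataFrame({
    "children": [0.0, np.nan, 2.0],
    "country": ["PRT", None, "GBR"],
    "agent": [9.0, np.nan, 240.0],
    "company": [np.nan, np.nan, np.nan],
    "adr": [80.0, 95.5, 120.0],
})

cleaned_df = clean_bookings(toy_df)
print(cleaned_df.isnull().sum())
```

Because the function avoids `inplace=True`, running the cell twice gives the same result.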

In [None]:
#Checking how many booking are cancelled and how many are not
hotel_df['is_canceled'].value_counts()

The column at index 1, is_canceled, takes only two values: 0 means the booking was not canceled and 1 means the booking was canceled.

### What all manipulations have you done and insights you found?

The value counts show that 63,371 bookings were not canceled. Canceled bookings are not the main focus here; the analysis is mostly concerned with guests who did not cancel.
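
One way to restrict an analysis to bookings that were not canceled is a boolean filter on `is_canceled`. This is a minimal sketch on a hypothetical toy frame; the name `not_canceled_df` is illustrative and is not used elsewhere in this notebook:

```python
import pandas as pd

# Hypothetical toy frame standing in for hotel_df (values are made up)
toy_df = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    "is_canceled": [0, 1, 0, 0],
    "adr": [80.0, 120.0, 95.5, 60.0],
})

# .copy() gives an independent frame, so adding columns later
# does not raise SettingWithCopyWarning
not_canceled_df = toy_df[toy_df["is_canceled"] == 0].copy()

print(len(toy_df), "bookings in total,", len(not_canceled_df), "not canceled")
```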

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# Pie chart of canceled vs. not canceled bookings
hotel_df['is_canceled'].value_counts().rename({0:'not canceled', 1:'canceled'}).plot.pie(explode=[0.05,0.05],autopct= '%1.1f%%', figsize=(10,8), shadow=True, fontsize= 13)
plt.title('Share of canceled vs. not canceled bookings' , fontsize= 14)
plt.xlabel('')
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

As we wrap up the data cleaning process, we’ll focus on rows where bookings are not canceled, and the country information is available. I chose a pie chart because it visually highlights the proportion of canceled versus non-canceled bookings, making it easy to compare percentages.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that roughly 73% of bookings were not canceled, while roughly 27% were canceled.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

With more than a quarter of bookings being canceled, the hotel could reduce cancellations by implementing strategies like stricter cancellation policies, offering discounts for confirmed bookings, or sending timely reminders to guests to encourage them to retain their reservations.
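
To see whether cancellations are concentrated in one hotel, the cancellation rate can be broken down per hotel. Since `is_canceled` is coded 0/1, its mean within a group is the cancellation rate. A minimal sketch on a hypothetical toy frame (the resulting rates are toy values, not findings):

```python
import pandas as pd

# Hypothetical toy frame standing in for hotel_df (values are made up)
toy_df = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "is_canceled": [1, 0, 0, 1, 0, 0, 1, 0],
})

# Mean of a 0/1 column = share of ones, i.e. the cancellation rate
cancel_rate_pct = (toy_df.groupby("hotel")["is_canceled"].mean() * 100).round(1)
print(cancel_rate_pct)
```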

#### Chart - 2

In [None]:
# We use the parent dataframe to count bookings per arrival year
df_hotel_by_year = hotel_df.groupby('arrival_date_year').size()

In [None]:
# Chart - 2 visualization code
plt.rcParams['figure.figsize']= (8,6)
df_hotel_by_year.plot(kind = 'bar' ,color = ['green','blue','orange'], fontsize= 12)
plt.title('Year wise booking' , fontsize= 12)
plt.xlabel('Arrival year',fontsize = 12)
plt.ylabel('Count of arrival', fontsize = 12)
plt.show()

##### 1. Why did you pick the specific chart?

Given the data spans three consecutive years (2015, 2016, and 2017), I wanted to analyze the distribution of bookings across these years. A bar chart was chosen as it effectively compares the total number of bookings for each year, making trends easy to spot.

##### 2. What is/are the insight(s) found from the chart?

The bar chart shows that the highest number of arrivals occurred in 2016, which was approximately 2.5 times higher than in 2015. However, there was a significant decline in 2017 compared to the previous year.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

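The steep decline in bookings observed in 2017 raises a red flag for management, but it should first be checked against data coverage: if a year is only partially covered by the dataset (the month-wise arrival counts can confirm this), its lower total may be an artifact rather than a real loss of demand. If the drop is real, understanding the factors behind it could help the hotel address potential issues. For example, analyzing customer feedback, pricing, and market trends could lead to actionable strategies to regain growth and maintain customer engagement.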
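
Yearly totals are only comparable when each year is covered equally. A quick check, sketched here on a hypothetical toy frame, counts how many distinct arrival months each year contains and normalizes bookings by that number (the column name `bookings_per_month` is an illustrative choice):

```python
import pandas as pd

# Hypothetical toy frame standing in for hotel_df (values are made up)
toy_df = pd.DataFrame({
    "arrival_date_year": [2015, 2015, 2016, 2016, 2016, 2016, 2017],
    "arrival_date_month": ["July", "August", "January", "July", "July",
                           "December", "January"],
})

by_year = toy_df.groupby("arrival_date_year")
coverage = pd.DataFrame({
    "months_covered": by_year["arrival_date_month"].nunique(),
    "bookings": by_year.size(),
})
# Bookings per covered month is a fairer basis for comparing years
coverage["bookings_per_month"] = coverage["bookings"] / coverage["months_covered"]
print(coverage)
```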

#### Chart - 3

In [None]:
# Chart - 3 visualization code
#plotting the arrival count for per year 2015,2016,2017 in order
plt.rcParams['figure.figsize']= (10,5)
sns.countplot(data = hotel_df , x= 'arrival_date_year' , hue= 'hotel')
plt.title('Count of arrival per year for each hotel' , fontsize= 12)
plt.xlabel('Arrival year',fontsize = 12)
plt.ylabel('Count of arrival', fontsize = 12)
plt.show()

##### 1. Why did you pick the specific chart?

I chose this count plot to visualize the overall data and compare the number of arrivals per year for both hotels. The count plot is effective for showing categorical data distributions and highlighting differences between the two hotels over the years.



##### 2. What is/are the insight(s) found from the chart?

The chart shows the annual count of arrivals for each hotel. We can observe variations in the number of arrivals per year, helping us identify trends or shifts in customer preferences between the two hotels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can guide business decisions. For instance, if one hotel consistently outperforms the other, management can investigate the reasons and replicate successful strategies across both properties. Additionally, understanding yearly fluctuations can help in planning marketing campaigns or adjusting pricing strategies to attract more guests during low-performing years.

In [None]:
# Plotting arrival count over months (all twelve months, in calendar order)
plt.rcParams['figure.figsize']= (10,5)
sns.countplot(data = hotel_df , x= 'arrival_date_month' , hue= 'hotel' , order= ['January','February','March','April','May','June','July','August','September','October','November','December'])
plt.title('Count of arrival per month' , fontsize= 12)
plt.xlabel('Arrival Month',fontsize = 12)
plt.ylabel('Count of arrival', fontsize = 12)
plt.show()
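
A hand-written month list is easy to get wrong (one missing entry silently drops that month from the plot). A sketch of an alternative is to build the order from the standard library and store the column as an ordered categorical. It assumes an English locale, since `calendar.month_name` is locale-dependent, and uses a hypothetical toy frame:

```python
import calendar
import pandas as pd

# 'January' ... 'December'; element 0 of month_name is an empty string
month_order = list(calendar.month_name)[1:]

# Hypothetical toy frame standing in for hotel_df (values are made up)
toy_df = pd.DataFrame({
    "arrival_date_month": ["July", "May", "January", "May", "December"],
})

# An ordered categorical sorts by calendar position instead of alphabetically
toy_df["arrival_date_month"] = pd.Categorical(
    toy_df["arrival_date_month"], categories=month_order, ordered=True
)

monthly_counts = toy_df["arrival_date_month"].value_counts().sort_index()
print(monthly_counts)
```

The same `month_order` list could then be passed as the `order` argument of `sns.countplot`.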

In [None]:
#Plotting arrival count over days
plt.rcParams['figure.figsize']= (20,5)
sns.countplot(data = hotel_df , x= 'arrival_date_day_of_month' , hue= 'hotel')
plt.title('Count of arrival per day for each month' , fontsize= 12)
plt.xlabel('Arrival date',fontsize = 12)
plt.ylabel('Count of arrival', fontsize = 12)
plt.show()

#### Chart - 4

In [None]:
# Chart - 4 visualization code
#Plotting stays on weekend nights
plt.rcParams['figure.figsize']= (20,5)
sns.countplot(data = hotel_df , x= 'stays_in_weekend_nights')
plt.title('Count of stays on weekend nights' , fontsize= 12)
plt.xlabel('Stays in weekend night',fontsize = 12)
plt.ylabel('Count', fontsize = 12)
plt.xlim(0.5,20)
plt.show()

In [None]:
# Plotting a histogram of stays on weekend nights (1 to 7 nights)
sns.histplot(data = hotel_df , x= 'stays_in_weekend_nights', bins = [1,2,3,4,5,6,7])
plt.title('Count of stays on weekend nights' , fontsize= 12)
plt.xlabel('Stays in weekend night',fontsize = 12)
plt.ylabel('Count', fontsize = 12)
plt.show()

In [None]:
# Plotting countplot for week nights
plt.rcParams['figure.figsize']= (20,5)
sns.countplot(data = hotel_df , x= 'stays_in_week_nights')
plt.title('Count of stays on week nights' , fontsize= 12)
plt.xlabel('Stays in week nights',fontsize = 12)
plt.ylabel('Count', fontsize = 12)
plt.show()

##### 1. Why did you pick the specific chart?

The countplot is used to visualize the frequency of stays for specific numbers of weekend or weeknights, while the histplot shows the distribution of stays across defined bins, making it easier to identify patterns.

##### 2. What is/are the insight(s) found from the chart?

The countplots highlight how many guests stayed for different durations on weekend and weeknights. The histplot shows that shorter stays (e.g., 1-2 nights) are more common than longer ones.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights help identify guest preferences, enabling the hotel to adjust pricing, plan promotions, and allocate resources more effectively for short versus long stays.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Number of bookings with at least one adult, per hotel
df_adults = hotel_df.loc[hotel_df['adults'] > 0, ['hotel', 'adults']].groupby('hotel').count()
df_adults

In [None]:
#Visualization for adults
plt.rcParams['figure.figsize']= (20,5)
sns.countplot(data = hotel_df , x= 'adults', hue ='hotel').set_title('Count of adults')
plt.show()

##### 1. Why did you pick the specific chart?

The chart was chosen to analyze the number of adults per booking across hotels, which helps in identifying the guest composition for better hotel categorization.

##### 2. What is/are the insight(s) found from the chart?

Most guests check in as a pair, with both hotels showing a peak at 2 adults per booking.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, this suggests that the City Hotel can be positioned as a "Duo/Couple-Friendly Hotel," catering to pairs regardless of their relationship type.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
# Number of bookings with at least one child, per hotel
df_children = hotel_df.loc[hotel_df['children'] > 0, ['hotel', 'children']].groupby('hotel').count()
df_children

In [None]:
plt.rcParams['figure.figsize']= (20,5)
sns.countplot(data = hotel_df , x= 'children', hue ='hotel').set_title('Count of children')
plt.show()

##### 1. Why did you pick the specific chart?

The chart helps show how many children stay in each hotel, making it easier to categorize the hotels.

##### 2. What is/are the insight(s) found from the chart?

More families with children prefer staying at the City Hotel.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the City Hotel can be promoted as a "Children-Friendly Hotel" to attract more families.

In [None]:
#identifying babies counts from the data
plt.rcParams['figure.figsize']= (20,5)
sns.countplot(data = hotel_df , x= 'babies', hue ='hotel').set_title('Count of babies')
plt.show()

##### 1. Why did you pick the specific chart?

The chart helps show how many guests bring babies to each hotel, assisting in categorizing the hotels.

##### 2. What is/are the insight(s) found from the chart?

There’s no strong preference for one hotel over another, but slightly more guests with babies choose the Resort Hotel.

##### 3. Will the gained insights help creating a positive business impact?

Yes, the Resort Hotel can be labeled as a "Babies-Friendly Hotel" to attract more families with infants.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
# Plotting countplot for deposit type
plt.rcParams['figure.figsize']= (5,5)
sns.countplot(data = hotel_df , x= 'deposit_type')
plt.title('Types of deposits' , fontsize= 12)
plt.xlabel('Types',fontsize = 12)
plt.ylabel('Count', fontsize = 12)
plt.show()

In [None]:
# Let's also check the 'No Deposit' bookings per hotel
hotel_df.loc[hotel_df['deposit_type'] == 'No Deposit', ['deposit_type', 'hotel']].groupby('hotel').count()


##### 1. Why did you pick the specific chart?

I chose this count plot to examine the deposit type, and it shows that the percentage of 'No Deposit' bookings is higher for the Resort Hotel compared to the City Hotel.

##### 2. What is/are the insight(s) found from the chart?

Most guests prefer booking with no deposit, while fewer opt for Non-refundable or Refundable deposits. It's interesting that some guests choose Non-refundable deposits despite the risk of not being able to get their deposit back in case of cancellation—are they very certain about their travel plans?

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Since customers generally avoid paying a pre-deposit, hotels should encourage advance deposits. This would help recognize revenue sooner and reduce the risk of cancellations.
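
Whether deposits actually go together with fewer cancellations can be checked directly with a cross-tabulation of `deposit_type` against `is_canceled`. The sketch below runs on a hypothetical toy frame, so the percentages it prints are toy values and say nothing about the real dataset:

```python
import pandas as pd

# Hypothetical toy frame standing in for hotel_df (values are made up)
toy_df = pd.DataFrame({
    "deposit_type": ["No Deposit", "No Deposit", "No Deposit", "No Deposit",
                     "Non Refund", "Non Refund", "Refundable"],
    "is_canceled": [0, 0, 1, 0, 1, 1, 0],
})

# normalize="index" turns each row into shares that sum to 1
cancel_by_deposit = pd.crosstab(
    toy_df["deposit_type"], toy_df["is_canceled"], normalize="index"
) * 100
print(cancel_by_deposit.round(1))
```

Column `1` of the result is the cancellation rate (in percent) for each deposit type.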

#### Chart - 8

In [None]:
# Chart - 8 visualization code
#Visualisation for repeated guests
plt.rcParams['figure.figsize']= (5,5)
sns.countplot(data = hotel_df , x= 'is_repeated_guest')
plt.title('Repeated guest' , fontsize= 8)
plt.show()

In [None]:
# Let's also check the repeated guests per hotel
hotel_df.loc[hotel_df['is_repeated_guest'] == 1, ['is_repeated_guest', 'hotel']].groupby('hotel').count()


##### 1. Why did you pick the specific chart?

I picked the count plot to analyze the ratio of repeated versus non-repeated guests.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that most guests are first-time visitors, with very few repeated guests.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight may point to the need for improved customer retention strategies. The management team can focus on enhancing service and marketing tactics to encourage repeat visits, which could boost long-term growth.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
# Total stay duration per booking: weekend nights plus week nights
hotel_df['total_night_stays']= hotel_df['stays_in_weekend_nights'] + hotel_df['stays_in_week_nights']

# Count of bookings by total nights stayed
sns.countplot(x = hotel_df['total_night_stays'])
plt.xlim(0.5,20)
plt.title('Nights spent by guests (up to 20 nights)' , fontsize= 8)
plt.xlabel('No of nights',fontsize = 12)
plt.ylabel('Count', fontsize = 12)
plt.show()

##### 1. Why did you pick the specific chart?

I chose this chart to analyze the duration of guests' stays, helping to identify the optimal stay duration overall and for each hotel.

##### 2. What is/are the insight(s) found from the chart?

Most guests (about 80%) stayed for 1 to 5 nights, indicating a common preference for shorter stays.
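
A share like this can be computed rather than read off the chart. The sketch below uses `Series.between`, which is inclusive on both ends by default, on a hypothetical toy frame; the 50% it yields is a toy value, not the figure quoted above:

```python
import pandas as pd

# Hypothetical toy frame standing in for hotel_df (values are made up)
toy_df = pd.DataFrame({
    "stays_in_weekend_nights": [0, 2, 1, 2, 4, 0],
    "stays_in_week_nights": [1, 3, 2, 5, 10, 0],
})

total_nights = toy_df["stays_in_weekend_nights"] + toy_df["stays_in_week_nights"]

# Mean of a boolean mask = fraction of bookings inside the range
share_1_to_5_pct = total_nights.between(1, 5).mean() * 100
print(f"{share_1_to_5_pct:.1f}% of bookings lasted 1 to 5 nights")
```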

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The hotel management team can use this insight to offer additional services or promotions that encourage guests to extend their stays, enhancing revenue and guest satisfaction.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
# Use countplot to compare duration of stay for both hotels
plt.rcParams['figure.figsize']= (10,5)
sns.countplot(data = hotel_df , x = 'total_night_stays', hue = 'hotel')
plt.xlim(0.5,20)
plt.title('Hotel-wise nights spent by guests (up to 20 nights)' , fontsize= 8)
plt.xlabel('No of nights',fontsize = 12)
plt.ylabel('Count', fontsize = 12)
plt.show()

##### 1. Why did you pick the specific chart?

I picked this chart to analyze the duration of stay for each hotel and compare guest behavior at both locations.

##### 2. What is/are the insight(s) found from the chart?

Most guests at City Hotel stayed for 2 or 3 days, while Resort Hotel guests preferred staying for either 1 or 7 days.
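
The most common stay length per hotel can also be extracted numerically, which avoids misreading bar heights. A minimal sketch on a hypothetical toy frame, using the mode (the most frequent value) within each hotel:

```python
import pandas as pd

# Hypothetical toy frame standing in for hotel_df (values are made up)
toy_df = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "total_night_stays": [2, 2, 3, 1, 7, 7, 1, 3],
})

# mode() returns all most-frequent values in sorted order; take the first
most_common_stay = (
    toy_df.groupby("hotel")["total_night_stays"]
    .agg(lambda nights: nights.mode().iloc[0])
)
print(most_common_stay)
```

If two stay lengths are tied, this sketch reports the shorter one; `value_counts()` per hotel shows the full ranking.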

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights can guide targeted strategies. However, City Hotel’s relatively shorter stays may indicate lower occupancy and revenue potential, which could lead to negative growth unless they address this with promotions or services to encourage longer stays.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
# Number of bookings with at least one special request, per hotel
df_special_requests_hotel_wise = hotel_df.loc[hotel_df['total_of_special_requests'] > 0, ['hotel', 'total_of_special_requests']].groupby('hotel').count()
df_special_requests_hotel_wise

In [None]:
plt.rcParams['figure.figsize']= (6,6)
df_special_requests_hotel_wise.plot(kind= 'pie' , autopct= '%.0f%%' , fontsize= 14 , subplots = True)
plt.title('Hotel-wise percentage of special requests')
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a pie chart to show the hotel-wise distribution of special requests, as it effectively displays the percentage of requests from each hotel.

##### 2. What is/are the insight(s) found from the chart?

City Hotel guests made 62% of the total special requests, while Resort Hotel guests made 38%. However, because City Hotel also accounts for the larger share of bookings, this split largely mirrors booking volume and may not reflect a real difference in guest behavior.
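
To separate booking volume from guest behavior, the share of all requests can be put next to the request rate within each hotel. The sketch below uses a hypothetical toy frame and illustrative column names (`share_of_requests_pct`, `rate_within_hotel_pct`); its numbers are toy values:

```python
import pandas as pd

# Hypothetical toy frame standing in for hotel_df (values are made up)
toy_df = pd.DataFrame({
    "hotel": ["City Hotel"] * 5 + ["Resort Hotel"] * 3,
    "total_of_special_requests": [1, 0, 2, 0, 1, 0, 1, 0],
})

# Flag bookings with at least one special request
flagged = toy_df.assign(has_request=toy_df["total_of_special_requests"] > 0)

request_summary = flagged.groupby("hotel")["has_request"].agg(
    bookings="size", with_request="sum"
)
# Share of all requesting bookings that belong to each hotel (what a pie shows)
request_summary["share_of_requests_pct"] = (
    request_summary["with_request"] / request_summary["with_request"].sum() * 100
)
# Share of each hotel's own bookings that include a request
request_summary["rate_within_hotel_pct"] = (
    request_summary["with_request"] / request_summary["bookings"] * 100
)
print(request_summary.round(1))
```

In this toy example the first hotel holds 75% of requesting bookings, yet the within-hotel rates (60% vs. about 33%) are what indicate a behavioral difference.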

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight helps City Hotel staff anticipate more special requests and allocate resources accordingly, which can improve service quality. It does not point to negative growth, provided the requests are handled effectively.

#### Chart - 12

Determining the price for each customer

In [None]:
# Chart - 12 visualization code
# Booking price = average daily rate (adr) x total nights stayed (vectorized)
hotel_df['price'] = hotel_df['adr'] * hotel_df['total_night_stays']

In [None]:
hotel_df[['price','adr','total_night_stays']].head(15)

In [None]:
# Visualization of booking price per arrival month, one line per year
# (sns.lineplot aggregates repeated x values with the mean by default)
plt.rcParams['figure.figsize']= (24,9)
sns.lineplot(data = hotel_df , x = 'arrival_date_month', y ='price', hue = 'arrival_date_year',palette='Dark2')
plt.title('Revenue earned by hotels per year' , fontsize= 12)
plt.xlabel('Arrival month',fontsize = 12)
plt.ylabel('Price', fontsize = 12)
plt.show()

##### 1. Why did you pick the specific chart?

I chose the line plot to visualize the revenue earned by hotels each year, making it easy to track trends over time.

##### 2. What is/are the insight(s) found from the chart?

Revenue is highest in 2016 and 2017, with August being the busiest month in both years.
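
A line plot of `price` shows the average booking price per month, which is not the same as total revenue. Both can be tabulated side by side with a grouped aggregation. This is a minimal sketch on a hypothetical toy frame; the names `total_revenue` and `mean_price` are illustrative:

```python
import pandas as pd

# Hypothetical toy frame standing in for hotel_df (values are made up)
toy_df = pd.DataFrame({
    "arrival_date_year": [2016, 2016, 2016, 2017, 2017],
    "arrival_date_month": ["July", "August", "August", "August", "July"],
    "adr": [100.0, 120.0, 80.0, 150.0, 90.0],
    "total_night_stays": [2, 3, 1, 2, 4],
})
toy_df["price"] = toy_df["adr"] * toy_df["total_night_stays"]

# Sum gives revenue for the month; mean gives the typical booking value
monthly_revenue = toy_df.groupby(
    ["arrival_date_year", "arrival_date_month"]
)["price"].agg(total_revenue="sum", mean_price="mean")
print(monthly_revenue)
```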

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The hotels should focus on improving revenue during all seasons, not just during peak months like August, to maintain consistent growth throughout the year.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
# Visualizing type of meal preferred
plt.figure(figsize = (20,5))
sns.countplot(x = hotel_df['meal'])
plt.title('Preferred meal type')
plt.xlabel('Type of meal required in hotel')
plt.ylabel('Count')
plt.show()

In [None]:
hotel_df['meal'].value_counts(normalize= True)

In [None]:
# Use pie plot for meal distribution
hotel_df['meal'].value_counts().plot.pie(explode= (0.03,0.03,0.03,0.01,0.001), autopct= '%1.1f%%' , shadow = False , figsize= (9,10) ,fontsize = 10, labels= None)
plt.title('% Distribution of meal' , fontsize = 12)
labels = hotel_df['meal'].value_counts().index.tolist()
plt.legend(bbox_to_anchor=(0.85,1),loc='upper left' , labels = labels)
plt.show()

##### 1. Why did you pick the specific chart?

I chose the count plot to show the different types of meals provided by the hotel, and the pie chart to display the percentage distribution of each meal type.



##### 2. What is/are the insight(s) found from the chart?

The chart reveals that the most preferred meal type by customers is BB (Bed and Breakfast), which makes up 78.7% of the meals served.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The hotel management team can focus on improving the availability of FB (Full Board) options to attract more guests who may prefer all-inclusive meal plans, potentially boosting bookings. There's no immediate negative growth observed, but expanding meal options could further increase guest satisfaction.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# Creating a new dataset with the numeric columns only
new_df = hotel_df[['lead_time','arrival_date_year', 'arrival_date_week_number','arrival_date_day_of_month', 'stays_in_weekend_nights', 'stays_in_week_nights','adults','children','babies','is_repeated_guest','previous_cancellations','previous_bookings_not_canceled','booking_changes','days_in_waiting_list','adr','required_car_parking_spaces','total_of_special_requests','total_night_stays','price']]

In [None]:
plt.rcParams['figure.figsize']= (20,10)
sns.heatmap(new_df.corr(), cmap= 'coolwarm', annot=True);

##### 1. Why did you pick the specific chart?

I used a heatmap for multivariate analysis to examine the correlation between multiple variables, providing a clear view of how they are related.

##### 2. What is/are the insight(s) found from the chart?

The chart highlights a strong correlation between total night stays and stays on week nights. This is expected, because total night stays is computed as the sum of week nights and weekend nights, and week nights contribute more to the total than weekend nights.
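
With many variables, the strongest relationships are easier to find in a ranked list than in an annotated heatmap. The sketch below keeps the upper triangle of the correlation matrix, so each pair appears once, and sorts by absolute value. It runs on a hypothetical toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for new_df (values are made up)
toy_df = pd.DataFrame({
    "stays_in_weekend_nights": [0, 2, 1, 2, 4, 0],
    "stays_in_week_nights": [1, 3, 2, 5, 10, 2],
    "adr": [80.0, 95.5, 120.0, 60.0, 70.0, 110.0],
})
toy_df["total_night_stays"] = (
    toy_df["stays_in_weekend_nights"] + toy_df["stays_in_week_nights"]
)

corr = toy_df.corr()

# k=1 excludes the diagonal, so self-correlations of 1.0 are dropped
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
ranked_pairs = (
    corr.where(upper_mask)
    .stack()
    .dropna()
    .sort_values(key=lambda values: values.abs(), ascending=False)
)
print(ranked_pairs)
```

Derived columns such as `total_night_stays` will always rank high against their own components, so those pairs are best read as a sanity check rather than a finding.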

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# pairplot creates its own figure, so no separate plt.figure() call is needed
ax = sns.pairplot(hotel_df)
plt.show()

##### 1. Why did you pick the specific chart?

I chose the pair plot for multivariate analysis to visually explore relationships between multiple variables at once.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that a longer lead time does not necessarily result in cancellations. Additionally, the relationship between lead time and arrival year indicates that people consistently booked rooms in advance during 2015, 2016, and 2017.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

To achieve the business objectives, the hotel management should focus on several key strategies. Resort hotels, with fewer bookings than city hotels, should consider offering packages and promotions to boost bookings. This could include creating attractive deals to draw in more customers and make the resort more appealing.

Regarding food preferences, since BB (Bed and Breakfast) is the most popular choice, it’s crucial for the hotel to maintain its quality. At the same time, they can offer discounts on other types of meals to encourage guests to explore different options. This would not only relieve pressure on the kitchen but also provide variety to customers.

The data shows that repeat bookings are low, so it’s important to find ways to increase return guests. The hotel can achieve this by offering incentives for repeat bookings, marketing tailored offers based on past guest preferences, and understanding the motivations behind those who return. By targeting this group, the hotel can turn occasional guests into loyal customers.

In terms of guest composition, both city and resort hotels see many couples booking stays, so promoting group and family bookings could be a great strategy. Special discounts for family or group reservations can help maximize occupancy and revenue, as well as cater to those looking for more space or amenities suitable for larger groups.

The data also shows that City Hotel guests mostly stay for two or three nights, while Resort Hotel guests typically stay for one or seven nights. Resort hotels can capitalize on this by offering promotions for middle-range stays, encouraging longer visits without overwhelming guests with extended durations.

A key observation is that week-night stays outnumber weekend-night stays. By introducing special weekend promotions or discounted rates, hotels can drive bookings during these quieter periods, thus increasing revenue throughout the week.

Additionally, understanding guest demographics is vital. Families with children seem to prefer city hotels, while couples and families with babies may lean toward resort accommodations. Hotels should cater to these preferences by highlighting specific amenities for each group, such as family-friendly rooms or romantic getaway packages.

Lastly, promoting advance deposits is essential for both hotels. This approach helps secure revenue faster, reduces cancellations, and minimizes the risk of no-shows. With most guests currently booking without a deposit, encouraging this practice can help improve financial stability and reduce potential losses.

By focusing on these strategies, hotels can enhance their bookings, attract more repeat customers, and maximize their overall revenue.

# **Conclusion**

City Hotel holds the largest share of bookings. Across both hotels, about 73% of reservations were not canceled. Families with children more often choose City Hotel, while slightly more guests with babies choose Resort Hotel. City Hotel may also experience higher cancellation rates, possibly due to no-deposit and no-cancellation-charge policies.

2016 saw the highest number of bookings across both hotels, with August being the peak month for guest arrivals. Additionally, guests tend to prefer weekday stays over weekends.

City Hotel can be branded as "Duo/Couple-Friendly" and "Children-Friendly," while Resort Hotel is more suitable for families with babies. Both hotels have relatively few repeat guests.

City Hotel guests typically stay for 2 or 3 days, while Resort Hotel guests often stay for 1 or 7 days. City Hotel guests account for about 62% of total special requests, compared to Resort Hotel's 38% (Chart 11). City Hotel has more "No Deposit" bookings in absolute terms, although the percentage of no-deposit bookings is higher at Resort Hotel.

The price range and standard deviation for City Hotel are lower than Resort Hotel, with Resort Hotel prices exceeding City Hotel rates in peak months like July, August, September, and June. For the remaining months, City Hotel maintains higher prices than Resort Hotel.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***