# **Project Name**    -



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name**            - Devanshu Mankar

# **Project Summary -**

EDA provides a foundational understanding of the hotel booking dataset, revealing critical insights that can drive strategic decision-making. By identifying patterns in booking behaviors, cancellation risks, and customer preferences, hotels can optimize their operations, improve customer satisfaction, and ultimately increase their revenue.



# **GitHub Link -**

https://github.com/devanshumankar/Hotel-booking-analysis.git

# **Problem Statement**


Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay in order to get the best daily rate? What if you wanted to predict whether or not a hotel was likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore those questions! This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. All personally identifying information has been removed from the data. Explore and analyse the data to discover important factors that govern the bookings.


#### **Define Your Business Objective?**

1. Analyze booking patterns over time to identify seasonal trends and peak booking periods.
2. Explore the distribution of bookings across different hotel types (e.g., resorts, city hotels) and room types.
3. Investigate the impact of various factors such as lead time, length of stay, and booking channel on booking cancellations.
4. Examine customer demographics to understand the preferences of different segments.
5. Identify potential areas for improvement or optimization in hotel operations based on EDA findings.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

In [None]:
df=pd.read_csv('/content/drive/MyDrive/Hotel Bookings.csv')
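
The guidelines above ask for exception handling, so the load step could be wrapped in a small helper. This is a minimal sketch, not part of the original notebook: the function name `load_bookings` is hypothetical, and it only guards against a missing or empty file.

```python
from pathlib import Path

import pandas as pd


def load_bookings(csv_path):
    """Read the bookings CSV, raising a clear error if the file is missing or empty."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at: {path}")
    try:
        return pd.read_csv(path)
    except pd.errors.EmptyDataError as exc:
        raise ValueError(f"Dataset at {path} is empty") from exc


# Possible usage in the notebook, with the same path as the cell above:
# df = load_bookings('/content/drive/MyDrive/Hotel Bookings.csv')
```

Failing early with a readable message makes it easier to spot a wrong Drive path before any later cell runs.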

### Dataset First View

In [None]:
# Dataset First Look
df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print('rows :',df.shape[0])
print('columns :',df.shape[1])

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
print('duplicate values:',df.duplicated().sum())

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
plt.figure(figsize=(20, 8))
sns.heatmap(df.isna())
plt.show()

### What did you know about your dataset?

1. The dataset contains 119320 rows and 32 columns.
2. The dataset has 31994 duplicate rows.
3. The dataset has 4 columns with null values: children, company, agent, and country. The company column has the highest number of null values.
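
The null counts summarised above can also be expressed as percentages, which makes it easier to judge how much of each column is missing. A minimal sketch on toy data follows; the helper name `missing_summary` and the toy values are illustrative assumptions, not results from the real dataset.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Return null count and null percentage for each column that has nulls."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "null_count": counts,
        "null_pct": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with at least one null, largest first
    return summary[summary["null_count"] > 0].sort_values("null_count", ascending=False)


# Toy data standing in for the real dataset
toy = pd.DataFrame({
    "children": [0.0, np.nan, 1.0, 2.0],
    "company": [np.nan, np.nan, np.nan, 40.0],
    "adults": [2, 2, 1, 3],
})
print(missing_summary(toy))
```

In the notebook, `missing_summary(df)` would be the equivalent call.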

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

hotel : Resort hotel or city hotel

is_canceled : If the booking was not cancelled(0) or cancelled(1)

lead_time : Number of days that elapsed between the entering date of the booking into the PMS and the arrival date

arrival_date_year : Year of arrival date

arrival_date_month : Month of arrival date

arrival_date_week_number : Week number of arrival date

arrival_date_day_of_month : Day of arrival date

stays_in_weekend_nights : Number of weekend nights (Saturday or Sunday) the customer stayed or booked to stay at the hotel

stays_in_week_nights : Number of week nights (Monday to Friday) the customer stayed or booked to stay at the hotel

adults : Number of adults

children : Number of children

babies : Number of babies

meal : kind of meal opted for

country : Country code

market_segment : Which segment the customer belongs to

distribution_channel : How the customer accessed the stay - Corporate booking/Direct/TA/TO

is_repeated_guest : Whether the guest has stayed before (1) or is coming for the first time (0)

previous_cancellations : Number of previous bookings that were cancelled

previous_bookings_not_canceled : Number of previous bookings that were not cancelled

reserved_room_type : Type of room reserved

assigned_room_type : Type of room assigned

booking_changes : Count of changes made to booking

deposit_type : Deposit type

agent : Booking through agent

company : Booking through company

days_in_waiting_list : Number of days in waiting list

customer_type : Type of customer

adr : Average daily rate

required_car_parking_spaces : Number of car parking spaces the customer required

total_of_special_requests : Number of additional special requirements

reservation_status : Status of the reservation

reservation_status_date : Date of the specific status


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
pd.Series({column:df[column].unique() for column in df})

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
df.drop_duplicates(inplace=True)

In [None]:
# Fill null values with 0 in the children column (numeric 0, so the column stays numeric)
df['children']=df['children'].fillna(0)

# Fill null values with 'others' in the country column
df['country']=df['country'].fillna('others')

# Fill null values with 0 in the agent column
df['agent']=df['agent'].fillna(0)

# Fill null values with 0 in the company column
df['company']=df['company'].fillna(0)

In [None]:
df.isnull().values.any()

In [None]:
df.duplicated().values.any()

In [None]:
# Adding new columns
# First column: total number of people for every booking
df['total_no_of_people'] = df['adults']+df['children']+df['babies']

# Second column: total nights stayed (week nights + weekend nights)
df['total_nights_stay'] = df['stays_in_week_nights']+df['stays_in_weekend_nights']

# Third column: number of adults in bookings with no children and no babies (0 otherwise)
df['guests_with_no_kids'] = df['adults'] *((df['children'] == 0) & (df['babies'] == 0))


# Fourth column: number of guests in bookings that include children or babies (0 otherwise)
df['guests_with_kids'] = df['total_no_of_people'] -df['guests_with_no_kids']

In [None]:
df.shape

In [None]:
df['is_canceled'] = df['is_canceled'].replace({0: 'not_canceled', 1: 'canceled'})
df['is_repeated_guest'] = df['is_repeated_guest'].replace({0: 'not_repeated', 1: 'repeated'})

In [None]:
df

### What all manipulations have you done and insights you found?

1. The dataset contained duplicate rows, which were removed.

2. The dataset had 4 columns with null values, which were filled.

3. The 0 and 1 values in the 'is_canceled' and 'is_repeated_guest' columns were replaced with descriptive labels for better understanding.

4. New columns were added: 'total_no_of_people', 'total_nights_stay', 'guests_with_no_kids', and 'guests_with_kids'.
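
The manipulations listed above could also be collected into one function, so the cleaning can be re-run on a fresh copy of the data. This is a sketch under assumptions: the function name `prepare_bookings` and the toy rows are hypothetical, and only the duplicate removal, null filling, and two of the derived columns are shown.

```python
import numpy as np
import pandas as pd


def prepare_bookings(raw):
    """Apply the cleaning steps described above and return a new DataFrame."""
    out = raw.drop_duplicates().copy()
    # Numeric fills keep 'children', 'agent' and 'company' numeric
    numeric_cols = ["children", "agent", "company"]
    out[numeric_cols] = out[numeric_cols].fillna(0)
    out["country"] = out["country"].fillna("others")
    # Derived columns
    out["total_no_of_people"] = out["adults"] + out["children"] + out["babies"]
    out["total_nights_stay"] = out["stays_in_week_nights"] + out["stays_in_weekend_nights"]
    return out


# Toy rows standing in for the real dataset; the first two rows are duplicates
toy = pd.DataFrame({
    "adults": [2, 2, 1],
    "children": [np.nan, np.nan, 1.0],
    "babies": [0, 0, 1],
    "agent": [9.0, 9.0, np.nan],
    "company": [np.nan, np.nan, 40.0],
    "country": ["PRT", "PRT", None],
    "stays_in_week_nights": [2, 2, 3],
    "stays_in_weekend_nights": [1, 1, 2],
})
clean = prepare_bookings(toy)
print(clean[["total_no_of_people", "total_nights_stay"]])
```

Returning a new frame instead of modifying `df` in place means the raw data stays available for comparison.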

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
hotel_counts = df['hotel'].value_counts()
plt.pie(hotel_counts.values, labels=hotel_counts.index, autopct='%0.1f%%', explode=[0.0, 0.1],
        shadow=True, startangle=90, textprops={'fontsize': 12})
plt.title('Distribution of bookings by hotel')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart shows a part-to-whole relationship, and with only two hotel types the percentage split is easy to read.


##### 2. What is/are the insight(s) found from the chart?

City Hotel has more bookings than Resort Hotel.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight helps stakeholders compare demand between the two hotel types.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
sns.countplot(x='customer_type',data=df,hue='hotel',palette='rainbow')
plt.title('customer type')
plt.show()

##### 1. Why did you pick the specific chart?

I need to count the bookings made by each customer type, split by hotel, and a count plot does this directly.


##### 2. What is/are the insight(s) found from the chart?

Transient customers make the most bookings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

To grow the business, focus on the Resort Hotel and on group customers, which currently account for fewer bookings.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Total head count per guest category across all bookings
x1=df['adults'].sum()
x2=df['children'].sum()
x3=df['babies'].sum()
y=['adults','children','babies']
plt.pie([x1,x2,x3],labels=y,autopct='%0.1f%%',explode=[0.0,0.1,0.1],shadow=True,radius=1.5)
# Draw a white circle in the centre to turn the pie into a donut
plt.pie([1],colors=['w'])
plt.title('Proportion of adults, children and babies in bookings')
plt.show()

##### 1. Why did you pick the specific chart?

A donut chart shows a part-to-whole relationship and the percentage of each guest category in a form that is easy to understand.

##### 2. What is/are the insight(s) found from the chart?

Adults account for more of the guests than children and babies.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The business can grow by continuing to attract adult guests.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
sns.barplot(x='hotel',y='total_of_special_requests',data=df,hue='customer_type',palette='dark:g')
plt.show()

##### 1. Why did you pick the specific chart?



I want to compare the number of special requests across hotel types, and a bar plot makes that comparison easy to read.

##### 2. What is/are the insight(s) found from the chart?

City Hotel receives more special requests.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Fulfilling these requests should leave customers satisfied and more likely to return to the hotel.

#### Chart - 5

In [None]:
# Set the figure size
plt.figure(figsize = (12,8))
# Count plot of meal type, split by hotel
sns.countplot(x = 'meal',hue = 'hotel',data = df)
# Label the x axis
plt.xlabel("Meal Type",fontsize = 15)
# Label the y axis
plt.ylabel("Customer Preference Count",fontsize = 12)
# Chart title
plt.title('Distribution of meal types by customer preference',fontsize = 15)
# Legend title
plt.legend(title ='Hotel')
# Display the chart
plt.show()

##### 1. Why did you pick the specific chart?

I chose this chart because I want to show which meal type customers prefer most in each hotel.

##### 2. What is/are the insight(s) found from the chart?

Both hotels have high demand for the BB meal type.
There is no demand for the FB meal type in City Hotel and very little in Resort Hotel. There is also almost no demand for the SC meal type in Resort Hotel.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Customers prefer the BB meal type most, so the hotels can focus on keeping it well supplied so that customers face no issues and are more likely to return. Resort Hotel can also save money by cutting back on the FB meal type.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
sns.barplot(x='hotel',y='total_nights_stay',data=df,hue='customer_type')
plt.show()

##### 1. Why did you pick the specific chart?

I want to compare total nights stayed across hotel types, and a bar plot makes that comparison easy to read.

##### 2. What is/are the insight(s) found from the chart?

Customers tend to stay more nights at Resort Hotel.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. By arranging evening events alongside the existing facilities, the business can grow.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

# Convert the 'arrival_date_month' column to a Categorical data type with the custom order
df['arrival_date_month'] = pd.Categorical(df['arrival_date_month'], categories=month_order, ordered=True)

# Set the figure size
plt.figure(figsize=(12, 8))
# Count plot of arrivals per month, split by hotel
sns.countplot(x='arrival_date_month', hue='hotel', data=df)

plt.xlabel('Month',fontsize = 15)
plt.ylabel('Count',fontsize = 12)
plt.title('Distribution of bookings by month')
plt.legend(title='Hotel', loc='upper right')  # Add legend with 'Hotel' as title

plt.show()

##### 1. Why did you pick the specific chart?

I chose a count plot because the goal is to show the distribution of booking counts for both hotels by month.

##### 2. What is/are the insight(s) found from the chart?

Bookings are high from March to August (the warmer months) and low from September to February (the colder months).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. This insight can drive growth if the hotels prepare ahead of the March to August peak and maintain good service through it, so that they can take more bookings.
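
The seasonal pattern read from the chart can also be tabulated, which gives exact counts per month and hotel. A minimal sketch on toy data, assuming the column names used above; the helper name `monthly_bookings` and the toy rows are illustrative, not taken from the dataset.

```python
import pandas as pd

MONTH_ORDER = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']


def monthly_bookings(frame):
    """Count bookings per arrival month and hotel, with months in calendar order."""
    months = frame['arrival_date_month'].astype(str)
    counts = frame.groupby([months, 'hotel']).size().unstack(fill_value=0)
    # Reindex so that months with no bookings still appear, as rows of zeros
    return counts.reindex(MONTH_ORDER, fill_value=0)


# Toy rows standing in for the real dataset
toy = pd.DataFrame({
    'arrival_date_month': ['July', 'January', 'July', 'March'],
    'hotel': ['City Hotel', 'Resort Hotel', 'City Hotel', 'City Hotel'],
})
table = monthly_bookings(toy)
print(table)
```

In the notebook, `monthly_bookings(df)` would return the counts behind the count plot.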

#### Chart - 8

In [None]:
# Chart - 8 visualization code
deposit_type_count = df['deposit_type'].value_counts()
plt.bar(x = deposit_type_count.index,height = deposit_type_count.values)
plt.title('Distribution of deposit type while booking')
plt.xlabel('Deposit Type',fontsize = 12)
plt.show()

##### 1. Why did you pick the specific chart?

I want to compare the counts of each deposit type, and a bar plot makes that comparison easy to read.

##### 2. What is/are the insight(s) found from the chart?

Almost all guests pay no deposit in advance.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Requiring a deposit at booking, refunded if the guest cancels, could discourage unwanted bookings, so a larger share of bookings would be genuine.
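
Whether a deposit policy would help depends on how often bookings are canceled under each deposit type. A minimal sketch of that check, assuming `is_canceled` already holds the 'canceled' / 'not_canceled' labels from the wrangling step; the helper name `cancellation_rate_by` and the toy rows are illustrative and do not reflect the real data.

```python
import pandas as pd


def cancellation_rate_by(frame, column):
    """Percentage of canceled bookings within each category of `column`."""
    is_canceled = frame['is_canceled'].eq('canceled')
    return (is_canceled.groupby(frame[column]).mean() * 100).round(1)


# Toy rows standing in for the real dataset
toy = pd.DataFrame({
    'deposit_type': ['No Deposit', 'No Deposit', 'No Deposit', 'No Deposit',
                     'Non Refund', 'Non Refund'],
    'is_canceled': ['not_canceled', 'canceled', 'not_canceled', 'not_canceled',
                    'canceled', 'canceled'],
})
rates = cancellation_rate_by(toy, 'deposit_type')
print(rates)
```

The same helper could be applied to other categorical columns, for example `cancellation_rate_by(df, 'market_segment')`.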

#### Chart - 9

In [None]:
# Chart - 9 visualization code
top_countries = df['country'].value_counts().head(5)
sns.barplot(x=top_countries.index,y=top_countries.values,palette='rainbow')
plt.title('Top 5 countries with most bookings')
plt.show()

##### 1. Why did you pick the specific chart?

I need to compare booking counts across countries, and a bar plot suits that comparison.

##### 2. What is/are the insight(s) found from the chart?

Guests from PRT (Portugal) make more bookings than guests from any other country.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Focus should be given to the other countries, which currently generate fewer bookings.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
x=df['market_segment'].value_counts().head(5)
sns.barplot(x=x.index,y=x.values)
plt.title('Top 5 market segment')
plt.show()

##### 1. Why did you pick the specific chart?

I need to compare a categorical variable against its counts, and a bar plot suits that comparison.

##### 2. What is/are the insight(s) found from the chart?

The chart shows the five market segments that make the most bookings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
x=df['agent'].value_counts().head(5)
sns.barplot(x=x.index,y=x.values,palette= ['#a83232','#a87d32','#6d32a8','#32a889','#c70c0c'],hue=x)
plt.title('Top 5 agents')
plt.show()

##### 1. Why did you pick the specific chart?

I chose this chart because I want to compare booking counts across agents, and a bar chart is easy to understand for this.

##### 2. What is/are the insight(s) found from the chart?

Agent No. 9.0 has made the most bookings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

With this insight, stakeholders can reward the agents with the most bookings, which encourages the other agents.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
customer_type = df['customer_type'].value_counts()

customer_type.plot(figsize=(15,5))

plt.xlabel('Customer Type',fontsize = 20)
plt.ylabel('Count',fontsize = 15)
plt.title('Customer type and their booking count',fontsize = 20)
plt.show()


##### 1. Why did you pick the specific chart?

I chose a line chart because I want to show how booking counts vary across customer types.

##### 2. What is/are the insight(s) found from the chart?

Transient customers have the most bookings of all customer types.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Hotels can run promotional offers to increase bookings in the other categories, for example discounts for groups.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
# Sum of adr per hotel, used here as a proxy for revenue
most_revenue = df.groupby('hotel')['adr'].sum()
total_revenue = most_revenue.sum()
plt.figure(figsize = (10,6))
plt.pie(most_revenue.values,labels = most_revenue.index,autopct =lambda p: 'Rs.{:.0f}\n\n({:.1f}%)'.format(p * total_revenue / 100, p),
        explode = (0.03,0.03),startangle = 80,shadow = True,textprops = {"fontsize":12})
plt.title('Distribution of Total Revenue between both hotels',fontsize = 15)
plt.show()

##### 1. Why did you pick the specific chart?

I chose this chart because I want to show each hotel's share of total revenue.

##### 2. What is/are the insight(s) found from the chart?

City Hotel generates more revenue than Resort Hotel. Given its larger share of bookings, this revenue split is as expected.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight shows where most of the money comes from, so stakeholders can invest more in that hotel to generate more profit.
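
Summing `adr` counts every booking as a single night. One alternative, shown as a sketch and not the method used for the chart above, is to approximate revenue as `adr * total_nights_stay` and to leave out canceled bookings. The helper name `estimated_revenue_by_hotel` and the toy rows are assumptions for illustration.

```python
import pandas as pd


def estimated_revenue_by_hotel(frame):
    """Approximate revenue per hotel as adr * total nights, excluding canceled bookings."""
    kept = frame[frame['is_canceled'] == 'not_canceled']
    revenue = kept['adr'] * kept['total_nights_stay']
    return revenue.groupby(kept['hotel']).sum()


# Toy rows standing in for the real dataset
toy = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'Resort Hotel'],
    'adr': [100.0, 80.0, 120.0],
    'total_nights_stay': [2, 3, 4],
    'is_canceled': ['not_canceled', 'canceled', 'not_canceled'],
})
estimate = estimated_revenue_by_hotel(toy)
print(estimate)
```

Because stay lengths differ between the two hotels, this estimate may split revenue differently from the plain `adr` sum.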

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize = (16,10))
sns.heatmap(df.corr(numeric_only=True),annot = True)
plt.title('Correlations of the columns',fontsize= 20)
plt.show()

##### 1. Why did you pick the specific chart?

A correlation heatmap was used to find potential relationships between variables and to gauge the strength of those relationships.

##### 2. What is/are the insight(s) found from the chart?

1. lead_time and total_nights_stay are positively correlated: bookings with longer stays tend to have longer lead times.

2. total_no_of_people and adr are positively correlated: the more people in a booking, the higher the adr.

3. is_repeated_guest and previous_bookings_not_canceled are strongly correlated, which suggests that repeat guests tend not to cancel their bookings.
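
Reading the strongest pairs off a large heatmap by eye is error-prone, so they could also be listed directly from the correlation matrix. A minimal sketch on toy data; the helper name `top_correlations` and the toy columns are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def top_correlations(frame, n=5):
    """Return the n column pairs with the largest absolute correlation."""
    corr = frame.corr(numeric_only=True)
    # Keep the upper triangle only, so each pair appears once and the diagonal is dropped
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# Toy columns standing in for the real dataset
toy = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [2, 4, 6, 8],
    'c': [1, 3, 2, 4],
})
print(top_correlations(toy))
```

In the notebook, `top_correlations(df, n=10)` would list the pairs worth inspecting in the heatmap.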

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df)
plt.show()

##### 1. Why did you pick the specific chart?

A pair plot shows both the distribution of each variable and the relationship between every pair of variables, so the relationships between all columns are visible in one chart.

##### 2. What is/are the insight(s) found from the chart?


From the pair plot above, total stay decreases as cancellations increase.
As the total number of people increases, adr also increases, so adr tends to rise with the number of people.
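
A pair plot over every column and row can be slow to draw and hard to read. One option, sketched here as an assumption and not what the cell above does, is to plot a random sample of rows for a few chosen columns. The helper name `pairplot_sample`, the sample size, and the column choice in the usage comment are all illustrative.

```python
import pandas as pd


def pairplot_sample(frame, columns, n=2000, seed=0):
    """Return at most `n` randomly chosen rows, restricted to `columns`."""
    subset = frame[columns]
    if len(subset) > n:
        # A fixed seed keeps the sample, and therefore the plot, reproducible
        subset = subset.sample(n=n, random_state=seed)
    return subset


# Possible usage in the notebook:
# sns.pairplot(pairplot_sample(df, ['lead_time', 'adr', 'total_nights_stay', 'total_no_of_people']))
```

Sampling changes how dense the scatter panels look, so any pattern seen in the sample is worth confirming on the full data.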

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. City Hotel is preferred by guests, so stakeholders should maintain service quality at City Hotel to increase repeat customers, and offer discounts at Resort Hotel to increase its bookings.
2. Around 27.52% of bookings are canceled, and most bookings are made without a deposit. Stakeholders should require a 10% advance payment at booking to reduce unnecessary bookings, and offer an extra discount to customers who pay in full in advance.
3. Stakeholders should increase the number of type A rooms, because guests prefer them most, and convert some of the less preferred type L rooms to type A, which also keeps the cost low.
4. Stakeholders should open activity areas such as gaming, golf, and party areas, because most guests travel without kids.
5. Stakeholders should add a mandatory marital status field to bookings to better identify customer needs, and launch packages for couples.
6. Most guests are from Portugal, so stakeholders should add Portuguese cultural activities at the hotels to increase the repeat customer rate.
7. Bookings currently come mostly from TA/TO, so stakeholders should also develop other booking sources and increase direct bookings by offering discounts.
8. Hotels should maintain the availability of the BB meal type so that customers recommend the hotels to more people.
9. Stakeholders should add a feedback system at check-out and act on the feedback, so they know what customers want.

# **Conclusion**

To meet the business objective, I suggest that the client make pricing dynamic and introduce offers and packages to attract new customers. To retain existing customers, ensure a comfortable stay, always collect feedback at check-out, and continuously improve on it.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***