# **Project Name**    - Hotel Booking Analysis




##### **Project Type**    - Exploratory Data Analysis (EDA)
##### **Contribution**    - Individual
##### **Name**    - Rakesh Kumar


# **Project Summary -**

As an overview, hotel reservation systems typically let guests reserve the dates for their stay based on a real-time display of rates and inventory across all booking channels, and finalise the reservation through an online payment portal.

A hotel booking system helps streamline your operations by automating many of these tasks. For example, you can set up automatic confirmation emails to be sent to guests when they make a reservation. This saves you a lot of time and ensures that your guests always have the latest information about their reservation.

In addition, a hotel booking system can help you track your staff's performance and sales. This data can be extremely valuable in identifying areas of improvement and taking action to improve your hotel's overall performance.

For the given dataset we have:

* Imported libraries
* Loaded the dataset
* Visualized missing values
* Handled missing values
* Shown dataset information
* Understood our dataset
* Visualized the data
* Told a story and experimented with charts
* Examined the relationships between variables

# **GitHub Link -**

https://github.com/RakeshReddi26/My_Projects

# **Problem Statement**


For this project we will be analyzing hotel booking data. The dataset contains booking information for a city hotel and a resort hotel, including when the booking was made, the length of stay, the number of adults, children and/or babies, and the number of available parking spaces. The hotel industry is very volatile, and bookings depend on the factors above and many more. The main objective of this project is to explore and analyze the data to discover the important factors that govern bookings and to give insights to hotel management.

The aim of this study is to conduct a comprehensive Exploratory Data Analysis (EDA) of hotel booking data to uncover valuable insights and patterns that can inform strategic decisions for a hotel business. The analysis will focus on understanding booking trends, customer preferences, and factors influencing booking cancellations.

#### **Define Your Business Objective?**

The EDA of hotel booking data is expected to yield actionable insights that can guide the hotel management in making informed decisions, such as optimizing pricing strategies, refining marketing efforts, enhancing guest experience, and revising cancellation policies. Additionally, this analysis will contribute to a deeper understanding of customer behavior and industry dynamics, enabling the hotel to stay competitive and adaptive in a rapidly changing market.

The EDA will involve data cleaning, visualization, statistical analysis, and exploratory techniques. Python programming and relevant libraries (e.g., Pandas, Matplotlib, Seaborn) will be utilized for data manipulation and visualization. The analysis will be documented through visualizations, summary statistics, and concise reports to present findings and insights effectively.

The main goal of online hotel booking systems is to give clients a quick and easy way to make hotel reservations. This includes offering a user-friendly interface, safe payment processing, and access to the most recent data regarding available rooms. These systems should also be able to handle reservations rapidly so that clients can promptly receive the space they require. Overall, it seems that the majority of online hotel reservation systems in use today are accomplishing these goals. Customers may quickly complete the transaction without any problems or delays and have access to extensive information about each property before making their booking decision.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that, for each chart, the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Importing the required libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')


### Dataset Loading

In [None]:
# Loading Dataset
hotel_bookings = pd.read_csv("/content/Hotel Bookings data.csv")
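
The guidelines above ask for exception handling, so here is a minimal sketch of a guarded loader. The helper name `load_bookings` is hypothetical and not part of the original notebook; it only wraps `pd.read_csv` with two early checks.

```python
from pathlib import Path

import pandas as pd


def load_bookings(csv_path):
    """Read the bookings CSV, failing early with a clear message.

    Hypothetical helper: the name and the checks are illustrative.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at: {path}")
    frame = pd.read_csv(path)
    if frame.empty:
        raise ValueError(f"Dataset at {path} contains no rows")
    return frame
```

In the notebook it would be called as `hotel_bookings = load_bookings("/content/Hotel Bookings data.csv")`, which stops with a readable error if the file has not been uploaded.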

### Dataset First View

In [None]:
# To get 1st look of the data set (1st five rows as (.head), last five rows as (.tail))
hotel_bookings

### Dataset Rows & Columns count

In [None]:
# To get count of Rows & Columns
hotel_bookings.shape

### Dataset Information

In [None]:
# Info of the Dataset
hotel_bookings.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# duplicated() returns a boolean mask; sum() counts the True entries
hotel_bookings.duplicated().sum()
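
Counting duplicates depends on which reduction is applied to the boolean mask returned by `duplicated()`. A small sketch on a made-up three-row frame (the data is illustrative only) shows the difference between `count()` and `sum()`:

```python
import pandas as pd

# Toy frame: the third row repeats the first one exactly.
toy = pd.DataFrame({"hotel": ["City", "Resort", "City"],
                    "lead_time": [10, 20, 10]})

mask = toy.duplicated()            # True for rows that repeat an earlier row
n_rows = mask.count()              # number of entries in the mask (all rows)
n_duplicates = mask.sum()          # number of True entries (duplicate rows)
deduplicated = toy.drop_duplicates()
```

Here `n_rows` is the length of the frame, while `n_duplicates` is the figure that answers "how many duplicate rows are there".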

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
hotel_bookings.isna().sum().sum()


In [None]:
# Visualizing the missing values
hotel_bookings.isna().sum()
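
Raw counts are easier to compare when expressed as a share of the rows. The following is a minimal sketch on made-up data, assuming only the column names used in this notebook:

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real bookings table.
toy = pd.DataFrame({
    "company": [np.nan, np.nan, np.nan, 40.0],
    "agent": [9.0, np.nan, 240.0, 9.0],
    "country": ["PRT", "GBR", "PRT", "FRA"],
})

# isna().mean() is the fraction of missing entries per column.
missing_pct = (toy.isna().mean() * 100).sort_values(ascending=False)
# Keep only the columns that actually have gaps.
missing_pct = missing_pct[missing_pct > 0]
```

Calling `missing_pct.plot.barh()` on the result would turn the table into a simple chart of missing values per column.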

In [None]:
# Dropping the company column because most of its values are missing
hotel_bookings.drop(['company'], axis=1, inplace=True)


In [None]:
# Filling the missing values: mean for numeric columns, mode for categorical ones
hotel_bookings.fillna({"agent": hotel_bookings['agent'].mean(),
                       "children": hotel_bookings['children'].mean(),
                       "country": hotel_bookings['country'].mode()[0]},
                      inplace=True)
hotel_bookings

In [None]:
# re-checking the null values
hotel_bookings.isna().sum()
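
One point that may be worth considering: `agent` holds identifiers rather than quantities, so an averaged ID does not correspond to any real agent. The sketch below shows an alternative fill on made-up data. It is not the approach used in this notebook, and the sentinel value `0` for "no agent" is an assumption.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "agent": [9.0, np.nan, 240.0],
    "children": [0.0, np.nan, 2.0],
    "country": ["PRT", None, "PRT"],
})

filled = toy.fillna({
    "agent": 0,                             # assumed sentinel meaning "no agent"
    "children": toy["children"].median(),   # median of the observed counts
    "country": toy["country"].mode()[0],    # most frequent country
})
```

Whether a sentinel or a separate "unknown" category is preferable depends on how the column is used later in the analysis.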

### What did you know about your dataset?

The dataset is about hotel bookings. It records how many people stayed at the hotel, how many nights they stayed, the arrival date, the number of adults, children and babies, and the kind of meal they booked. It covers two types of hotel: City and Resort.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
hotel_bookings.columns

In [None]:
# Dataset Describe
hotel_bookings.describe()

### Variables Description

1. hotel : The category of hotel, either Resort Hotel or City Hotel.
2. is_canceled : Whether the booking was canceled. Values [0, 1], where 0 indicates not canceled and 1 indicates canceled.
3. lead_time : The time between the reservation and the actual arrival.
4. arrival_date_year : Year of arrival date
5. arrival_date_month : Month of arrival date
6. arrival_date_week_number : Week number of arrival date
7. arrival_date_day_of_month : Day of the month of arrival date
8. stays_in_weekend_nights : The number of weekend nights stayed per reservation
9. stays_in_week_nights : The number of weekday nights stayed per reservation
10. adults : Number of adults
11. children : Number of children
12. babies : Number of babies
13. meal : Meal preference per reservation. [BB, FB, HB, SC, Undefined]
14. country : The origin country of the guest
15. market_segment : How the reservation was made and its purpose, e.g. Corporate means a corporate trip and TA means travel agency.
16. distribution_channel : The medium through which the booking was made. [Direct, Corporate, TA/TO, Undefined, GDS]
17. is_repeated_guest : Whether the guest has stayed before. Values [0, 1], where 0 indicates no and 1 indicates a repeated guest.
18. previous_cancellations : Number of earlier cancellations by the customer
19. previous_bookings_not_canceled : Count of previous bookings that were not canceled
20. reserved_room_type : Type of room reserved
21. assigned_room_type : Type of room assigned
22. booking_changes : Count of changes made to the booking
23. deposit_type : Deposit type
24. agent : ID of the agent through which the booking was made
25. days_in_waiting_list : Number of days between the booking being made and its confirmation
26. customer_type : Type of customer (Transient, Group, etc.)
27. required_car_parking_spaces : Number of car parking spaces required
28. total_of_special_requests : Number of additional special requests
29. reservation_status : Last status of the reservation
30. reservation_status_date : Date on which that status was set

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
hotel_bookings.nunique()


In [None]:
# Dropping the adr attribute for having many unique values
hotel_bookings.drop(['adr'], axis=1, inplace=True)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# It describes the attribute with object datatype
hotel_bookings.describe(include=["object"])

In [None]:
#To show the data types
hotel_bookings.dtypes


In [None]:
# It counts the total values in that column
hotel_bookings['hotel'].value_counts()

In [None]:
# It tells the percentage of counts of total values in that column
hotel_bookings.hotel.value_counts(normalize = True)*100

In [None]:
# It counts the total values in that column of top 10
hotel_bookings["country"].value_counts().head(10)

In [None]:
# It tells the percentage of counts of total values in that column of top 10
hotel_bookings.country.value_counts(normalize = True).head(10)*100

In [None]:
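# To show the correlation between numeric columns
# (corr() on text columns raises an error in recent pandas versions)
hotel_bookings.select_dtypes(include=['number']).corr()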


### What all manipulations have you done and insights you found?

I have collected the raw data and transformed it into a format that is easier to understand, access, and analyze. I dropped the attribute with the most null values (company). For attributes with fewer null values, I filled the gaps with the mean for numeric columns and the mode for categorical columns. I also dropped the adr (average daily rate) column, which has many unique values. With these steps the dataset is clean enough to carry out the EDA quickly.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:

#Chart 1 visualization code (Box Plot)
hotel_bookings.boxplot(column =['lead_time'], grid = False)



##### 1. Why did you pick the specific chart?

It is to show the outliers in the lead time attribute.

Lead time - the period of time (most typically measured in calendar days) between when a guest makes the reservation and the actual check-in/arrival date.

##### 2. What is/are the insight(s) found from the chart?

The chart shows many outliers: some guests reserve up to about 700 days in advance, and a number of them book more than 400 days in advance.
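
A box plot marks as outliers the points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule by hand to made-up lead times, so the values are illustrative only:

```python
import pandas as pd

lead_time = pd.Series([0, 5, 12, 30, 45, 60, 90, 120, 400, 700],
                      name="lead_time")

q1, q3 = lead_time.quantile([0.25, 0.75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr     # points above this are drawn as outliers
lower_fence = q1 - 1.5 * iqr     # points below this are drawn as outliers

outliers = lead_time[(lead_time < lower_fence) | (lead_time > upper_fence)]
```

Running the same lines on `hotel_bookings['lead_time']` gives the number and size of the outliers instead of reading them off the plot.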

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.



Yes. Knowing reservations this far in advance tells us how many rooms in a particular hotel are already booked and how many vacancies remain for other customers.

#### Chart - 2

In [None]:
# Chart - 2 visualization code (Heatmap)
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Assuming hotel_bookings is our DataFrame
# Example of dropping non-numeric columns (like categorical columns)
numeric_columns = hotel_bookings.select_dtypes(include=['number'])

plt.figure(figsize=(15, 10))
sns.heatmap(numeric_columns.corr(), annot=True)
plt.show()


##### 1. Why did you pick the specific chart?

This chart shows the correlation between all numeric attributes.

##### 2. What is/are the insight(s) found from the chart?

The heatmap shows that no two numeric attributes are strongly correlated, either positively or negatively.
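
Reading a large heatmap by eye is error-prone, so it can help to list the column pairs ranked by absolute correlation. This is a sketch on made-up data; the `THRESHOLD` of 0.7 is an arbitrary illustrative cut-off, not a value taken from this project.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "lead_time": [10, 50, 200, 300, 20, 150],
    "is_canceled": [0, 0, 1, 1, 0, 1],
    "adults": [2, 1, 2, 1, 2, 2],
})

corr = toy.corr()
# Keep the upper triangle only, so each pair appears once and the
# diagonal (always 1.0) is dropped.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack().dropna().sort_values(key=abs, ascending=False)

THRESHOLD = 0.7   # illustrative cut-off (assumption)
strong_pairs = pairs[pairs.abs() >= THRESHOLD]
```

An empty `strong_pairs` on the real data would support the reading that no pair of columns is strongly correlated.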

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


This matters for later modelling: when two columns are strongly correlated, positively or negatively, one of them can be dropped for better accuracy while predicting.

#### Chart - 3

In [None]:
# Chart - 3 visualization code (Countplot - displays the results as a bar chart)
sns.countplot(x='adults',hue='hotel', data=hotel_bookings)
plt.title("Number of adults in each hotel type")
plt.show()


##### 1. Why did you pick the specific chart?

To visualize how many adults are included in each booking, split by hotel type.

##### 2. What is/are the insight(s) found from the chart?

The bar chart shows the number of bookings for each adult count in the Resort and City hotels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

In the data above, adults most often arrive as couples. The hotels can therefore arrange rooms and offers to be more attractive to couples.

#### Chart - 4

In [None]:
# Chart - 4 visualization code (Countplot)
plt.figure(figsize=(12, 6))
sns.countplot(x='children',hue='hotel', data=hotel_bookings)
plt.title("Number of children in each hotel type")

plt.show()

##### 1. Why did you pick the specific chart?

To visualize how many children are included in each booking, split by hotel type.

##### 2. What is/are the insight(s) found from the chart?

The bar chart shows the number of bookings for each child count in the Resort and City hotels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Bookings with children are rare, which indicates negative growth in this segment. Adding games and toys could make the hotels more attractive to families with children.

#### Chart - 5

In [None]:
# Chart - 5 visualization code (Countplot)
sns.countplot(x='babies',hue='hotel', data=hotel_bookings)
plt.title("Number of babies in each hotel type")

plt.show()

##### 1. Why did you pick the specific chart?

To visualize how many babies are included in each booking, split by hotel type.

##### 2. What is/are the insight(s) found from the chart?

The bar chart shows the number of bookings for each baby count in the Resort and City hotels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Bookings with babies are very rare, so supplies for babies can be kept limited: providing the basic requirements is enough.

#### Chart - 6

In [None]:
# Chart - 6 visualization code (Bar Plot)
# 0 is not canceled, 1 is canceled
plt.figure(figsize=(8,4))
sns.barplot(x= 'arrival_date_year',y='lead_time', hue='is_canceled', data= hotel_bookings)
plt.title('Leadtime and Cancellations in particular year')

plt.show()

##### 1. Why did you pick the specific chart?

To compare the lead time of canceled and non-canceled bookings in each year.

##### 2. What is/are the insight(s) found from the chart?

Lead time is the period between the reservation date and the arrival date. In every year, cancellations are concentrated among bookings with a long lead time.
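
By default `sns.barplot` draws the mean of the `y` variable for each group, so the bars in this chart are average lead times rather than cancellation counts. The numbers behind such a chart can be sketched with a `groupby` on made-up data:

```python
import pandas as pd

toy = pd.DataFrame({
    "arrival_date_year": [2015, 2015, 2015, 2016, 2016, 2016],
    "is_canceled":       [0,    1,    1,    0,    0,    1],
    "lead_time":         [20,   100,  140,  30,   50,   200],
})

# One row per year, one column per cancellation status (0 / 1),
# each cell holding the mean lead time of that group.
mean_lead = (toy.groupby(["arrival_date_year", "is_canceled"])["lead_time"]
                .mean()
                .unstack("is_canceled"))
```

Comparing the two columns of `mean_lead` year by year gives the same comparison as the bar heights, but as exact figures.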

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It is negative growth, but we can turn it positive. For example, after a room is booked, we can tell customers that the amount will be refunded only within a certain time period, or else charge them a certain amount.

#### Chart - 7

In [None]:
# Chart - 7 visualization code (Pie Plot)
# 0 is not canceled, 1 is canceled
hotel_bookings['is_canceled'].value_counts().plot.pie(autopct='%1.1f%%')
plt.title('Overall canceled and not canceled percentage')
plt.show()

##### 1. Why did you pick the specific chart?

To see overall canceled and not canceled percentage in Hotels

##### 2. What is/are the insight(s) found from the chart?

About 63% of bookings were not canceled, whereas the remaining 37% were canceled.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Using this data, we can change our refund policy to reduce the number of cancellations.

#### Chart - 8

In [None]:
# Chart - 8 visualization code (Countplot)
sns.countplot(x='hotel',hue="is_canceled", data=hotel_bookings)
plt.title("Cancelation rates in particular hotel type")
plt.show()

##### 1. Why did you pick the specific chart?

To show the cancellation count for each hotel type.

##### 2. What is/are the insight(s) found from the chart?

In resort hotels, cancellations are low relative to non-canceled bookings. In city hotels, however, the number of cancellations is more than half the number of non-canceled bookings.
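
Because `is_canceled` is coded as 0/1, its mean within a group is that group's cancellation rate. A minimal sketch with made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 5 + ["Resort Hotel"] * 4,
    "is_canceled": [1, 1, 0, 0, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 column per group = share of canceled bookings, in percent.
cancel_rate_pct = toy.groupby("hotel")["is_canceled"].mean() * 100
```

A rate per hotel is easier to compare than raw counts when the two hotel types have very different numbers of bookings.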

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It indicates negative growth for the city hotel, because cancellations there amount to more than half of the non-canceled bookings.

#### Chart - 9

In [None]:
# Chart - 9 visualization code (Histogram plot)
# sns.distplot is deprecated; histplot with kde=True draws the histogram and density curve
sns.histplot(hotel_bookings["agent"], bins=20, kde=True, stat="density"
             ).set(title='Density of bookings through an agent')
plt.show()

##### 1. Why did you pick the specific chart?

It shows the density of bookings across agents, who book rooms on behalf of customers.

##### 2. What is/are the insight(s) found from the chart?

Relatively few bookings are made through agents. Only some agents, with IDs around 100 and between 200 and 300, account for most of these bookings.
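
Since agent values are identifiers, a frequency table per agent can be easier to interpret than a density curve over the ID range. A sketch with made-up IDs:

```python
import pandas as pd

agent = pd.Series([9, 9, 240, 9, 1, 240, 14], name="agent")

# Treat the ID as a label: count bookings per agent, most frequent first.
bookings_per_agent = agent.value_counts()
top_agents = bookings_per_agent.head(2)
top_share_pct = (top_agents / len(agent) * 100).round(1)
```

`top_share_pct` then shows what percentage of all bookings each of the busiest agents handles.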

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


This indicates negative growth for agents, because most customers book the hotels on their own.

#### Chart - 10

In [None]:
# Chart - 10 visualization code (Countplot)
sns.countplot(x='arrival_date_year',hue='hotel', data=hotel_bookings)
plt.title("Hotel types and their arrivals every year")
plt.show()

##### 1. Why did you pick the specific chart?

To see the number of customer arrivals in each year.

##### 2. What is/are the insight(s) found from the chart?

In 2016, arrivals were very high: business doubled compared to 2015. In 2017, however, arrivals decreased.
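
Yearly totals are only comparable if each year covers the same number of months, which is worth checking before reading the chart as growth or decline. The sketch below, on made-up rows, counts the distinct months present in each year and normalizes the bookings by that count:

```python
import pandas as pd

toy = pd.DataFrame({
    "arrival_date_year":  [2015, 2015, 2016, 2016, 2016, 2016,
                           2017, 2017, 2017],
    "arrival_date_month": ["July", "August", "January", "January", "March",
                           "July", "January", "January", "February"],
})

per_year = toy.groupby("arrival_date_year")["arrival_date_month"].agg(
    bookings="count",           # rows (bookings) per year
    months_observed="nunique",  # distinct calendar months present that year
)
per_year["bookings_per_month"] = (per_year["bookings"]
                                  / per_year["months_observed"])
```

If `months_observed` differs between years in the real data, `bookings_per_month` is the fairer figure to compare.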

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This is negative growth in 2017 compared to 2016. Demand was high in 2016, and arrivals should have increased or at least stayed constant, but they decreased. We have to find the reasons for the decrease and prevent it from happening again.

#### Chart - 11

In [None]:
#Chart 11 visualization code (Countplot)
plt.figure(figsize=(12, 6))
sns.countplot(data = hotel_bookings, x = 'arrival_date_month')
plt.title('count of arrivals each month')
plt.show()


##### 1. Why did you pick the specific chart?

It shows the count of arrivals in each month.


##### 2. What is/are the insight(s) found from the chart?

The three busiest arrival months are August, July and May; the three quietest are January, December and November.
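
Month names are plain text, so a count plot draws them in order of appearance rather than calendar order. A sketch of building a chronological order and ranking the months, using made-up arrivals:

```python
import calendar

import pandas as pd

months = pd.Series(["August", "January", "May", "August", "July", "May",
                    "August"], name="arrival_date_month")

# calendar.month_name[0] is an empty string, so skip it
# (names are English in the default locale).
month_order = list(calendar.month_name)[1:]

# Counts in calendar order; months with no arrivals get 0.
arrivals_by_month = months.value_counts().reindex(month_order, fill_value=0)
busiest = arrivals_by_month.sort_values(ascending=False).head(3)
```

Passing `order=month_order` to `sns.countplot` would draw the bars from January to December.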

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, this can be used positively. Because arrivals are lowest in the last two months of the year and in January, renovations can be scheduled for that period so that they do not affect customers. Such changes can be planned well in advance.

#### Chart - 12

In [None]:
# Chart - 12 visualization code (Histogram plot)
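# sns.distplot is deprecated; histplot with kde=True draws the histogram and density curve
sns.histplot(hotel_bookings["arrival_date_day_of_month"], bins=20, kde=True, stat="density"
             ).set(title='Density of arrivals by day of month')
plt.show()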


##### 1. Why did you pick the specific chart?

It shows the density of arrivals on each day of the month.

##### 2. What is/are the insight(s) found from the chart?

Arrivals are densest around the 15th and after the 25th of the month.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Most customers arrive in the middle and at the end of each month.

#### Chart - 13

In [None]:
# Chart - 13 visualization code (Pie chart)
hotel_bookings['customer_type'].value_counts().plot.pie(autopct='%1.1f%%')
plt.title('Percentage of type of customers arriving')
plt.show()

##### 1. Why did you pick the specific chart?

It shows the percentage of each customer type arriving at the hotels.

##### 2. What is/are the insight(s) found from the chart?

A transient guest is an overnight lodging guest who does not intend to stay for any permanent length of time. Transient guests are the largest group, at about 75% of customers. Transient-party guests are individuals or groups occupying fewer than 10 rooms per night. A contract is made between the owner and the renter when the renter pays for a stay in a property on a temporary basis.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Most customers are transient guests who stay only for a short time, which is negative growth. We have to attract customers to stay at our hotels for longer periods.

#### Chart - 14

In [None]:
# Chart - 14 visualization code (Pie chart)
hotel_bookings['deposit_type'].value_counts().plot.pie(autopct='%1.1f%%')
plt.title('Percentage of deposit type')
plt.show()


##### 1. Why did you pick the specific chart?

To see the share of each deposit type chosen by customers.

##### 2. What is/are the insight(s) found from the chart?

Almost 88% of customers (87.6%) book in advance with no deposit, and only 0.1% of bookings are Refundable. Because the hotels do not charge an advance, most guests book without a deposit.
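
The deposit shares can be set next to the cancellation rate of each deposit type to see whether the two are related. A sketch on made-up rows; the category labels are illustrative and should be replaced by the values actually present in `deposit_type`:

```python
import pandas as pd

toy = pd.DataFrame({
    "deposit_type": ["No Deposit"] * 4 + ["Non Refund"] * 2 + ["Refundable"],
    "is_canceled":  [0, 0, 1, 0, 1, 1, 0],
})

by_deposit = toy.groupby("deposit_type")["is_canceled"].agg(
    bookings="count",     # number of bookings with this deposit type
    cancel_rate="mean",   # share of those bookings that were canceled
)
by_deposit["share_pct"] = by_deposit["bookings"] / len(toy) * 100
```

The resulting table holds one row per deposit type with its booking count, cancellation rate, and share of all bookings.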

#### Chart - 15

In [None]:
# Chart - 15 visualization code (Countplot)
plt.figure(figsize=(12, 6))
sns.countplot(data = hotel_bookings, x = 'market_segment',hue='hotel', palette='cool')
plt.title('Types of market segment')
plt.show()


##### 1. Why did you pick the specific chart?

To see the market segments of bookings in each hotel type.

##### 2. What is/are the insight(s) found from the chart?

Hotel market segmentation is the process of grouping hotel guests into categories based on their booking patterns and travel habits. Online TA (online travel agents, who make bookings on behalf of their users) is the largest segment, followed by Offline TA/TO.

#### Chart - 16

In [None]:
# Chart - 16 visualization code (Stacked bar plot)
# 0 is not repeated, 1 is repeated
(hotel_bookings.groupby(['hotel', 'is_repeated_guest'])['is_repeated_guest']
               .count()
               .unstack()
               .plot(kind='bar', stacked=True)
               .set(title='Repeated guests by hotel type'))
plt.show()

##### 1. Why did you pick the specific chart?

This is to show the count of guests who repeated their bookings.

##### 2. What is/are the insight(s) found from the chart?

In both hotel types, repeated guests are few.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

We have to increase the share of repeated guests. That means attracting guests back and collecting feedback at check-out, so that we can improve where we are lagging.
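
To put a number on how few guests return, the share of repeated guests per hotel can be computed with a normalized cross-tabulation. A sketch with made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "is_repeated_guest": [0, 0, 0, 1, 0, 0, 1, 1],
})

# normalize="index" turns each row into shares that sum to 1;
# multiplying by 100 expresses them as percentages.
repeat_share = pd.crosstab(toy["hotel"], toy["is_repeated_guest"],
                           normalize="index") * 100
```

Column `1` of `repeat_share` is the percentage of repeated guests for each hotel type, which could serve as a baseline for tracking any improvement.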

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Hotel staff have to treat customers well so that they stay longer and return again and again. Most customers are transient, which means they stay for a short time, so we have to attract them to stay longer.

Many cancellations are occurring in the hotels, so we have to reduce them. We should ask customers why they are canceling; if the issue was on our side, we can make sure it does not happen again. If there is no particular reason, a cancellation charge may help to reduce cancellations.

The refund policy also matters: we accept reservations with no deposit, so many customers book at no cost, which may be another reason for cancellations. We should either require a deposit for reservations or charge a cancellation fee; at least one of the two should be implemented.

To attract children, we have to make some changes in the hotels, such as renovating them and adding play stations.

We have to collect feedback from our customers so that, by analyzing and acting on it, we can increase the number of repeated customers.

# **Conclusion**

EDA is crucial for data analysis because it offers insight into the data by looking at distributions, correlations, and trends. The EDA in this project clarifies several aspects of hotel reservations and offers suggestions to improve the operations and income of the City Hotel and the Resort Hotel. The EDA not only helps us understand the data before modelling, but also lets us explain what is happening and what should change in the future, and show this to our clients visually. Visualizations are an understandable way for clients to see these findings.

In conclusion, the exploratory data analysis conducted on the hotel booking data has provided valuable insights into various aspects of customer behavior, booking patterns, and influencing factors.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***