<a href="https://colab.research.google.com/github/TheVaibhav125/Hotel-Booking-Analysis-Exploratory-Data-Analysis/blob/main/EDA_Submission_Cohort_Osaka_Vaibhav_Lawande.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name**    - Vaibhav Lawande


# **Project Summary -**

The aim of this project was to perform exploratory data analysis (EDA) on a hotel booking dataset in order to gain insights into customer behavior and booking patterns. The dataset contains booking information for a city hotel and a resort hotel, including when the booking was made, the length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. First, we carried out data processing and manipulation, which involved handling missing values and removing duplicate rows. Then we visualized the data using various graphical techniques and discovered several interesting insights.


# **GitHub Link -**

https://github.com/TheVaibhav125/Hotel-Booking-Analysis-Exploratory-Data-Analysis.git

# **Problem Statement**


Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay to get the best daily rate? What if you wanted to predict whether a hotel is likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore those questions. It contains booking information for a city hotel and a resort hotel, including when the booking was made, the length of stay, the number of adults, children, and/or babies, and the number of available parking spaces. All personally identifying information has been removed from the data. Explore and analyze the data to find important insights.



#### **Define Your Business Objective?**


The primary business objective of this project is to leverage the hotel booking dataset to gain valuable insights into customer behavior and booking patterns. By performing comprehensive exploratory data analysis (EDA), we aim to:

1. **Identify key trends and patterns:**
   - Understand the distribution of bookings across different seasons, weekdays, and lengths of stay.
   - Analyze factors influencing booking decisions, such as the number of adults, children, and babies, as well as the need for parking spaces.

2. **Discover hidden insights:**
   - Uncover potential relationships between various booking characteristics and identify any unusual or unexpected patterns.
   - Gain a deeper understanding of customer preferences and requirements to improve hotel services and marketing strategies.

3. **Support decision-making:**
   - Provide valuable insights to hotel management to optimize pricing strategies, resource allocation, and marketing campaigns.
   - Enable informed decisions based on data-driven analysis and findings.

By achieving these objectives, the business aims to enhance its understanding of customer behavior, improve operational efficiency, and ultimately increase revenue and profitability.


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that, for each chart, the following format is followed and the questions below are answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# Show up to 36 columns when displaying a DataFrame
pd.set_option("display.max_columns", 36)
# The 'seaborn' style was renamed 'seaborn-v0_8' in matplotlib 3.6; use whichever name is available
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')

# Set font sizes and font weights for labels and titles
plt.rcParams["font.weight"] = "bold"
plt.rcParams["axes.labelweight"] = "bold"
plt.rcParams["axes.titlesize"] = 25
plt.rcParams["axes.titleweight"] = 'bold'
plt.rcParams['xtick.labelsize']=15
plt.rcParams['ytick.labelsize']=15
plt.rcParams["axes.labelsize"] = 20
plt.rcParams["legend.fontsize"] = 15
plt.rcParams["legend.title_fontsize"] = 15


### Dataset Loading

In [None]:

from google.colab import drive
drive.mount('/content/drive')

In [None]:
filepath = "/content/Hotel Bookings.csv"

In [None]:
# Load Dataset
hotel_data = pd.read_csv(filepath)

### Dataset First View

In [None]:
# Dataset First Look
hotel_data

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print(f"We have a total of {hotel_data.shape[0]} rows and {hotel_data.shape[1]} columns")

### Dataset Information

In [None]:
# Dataset Info
hotel_data.info()

In [None]:
#Create Copy of Our Dataset

hotel_data_copy = hotel_data.copy()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
hotel_data_copy.duplicated().value_counts() #true means duplicate rows
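
To make the `duplicated()` output easier to read, here is a minimal sketch on a tiny frame. The `toy` DataFrame and its values are hypothetical and only stand in for the hotel data.

```python
import pandas as pd

# Hypothetical toy frame: rows 0 and 2 are identical.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel"],
    "lead_time": [10, 3, 10],
})

# duplicated() flags each repeat of an earlier row; the first occurrence stays False.
dup_mask = toy.duplicated()
n_duplicates = int(dup_mask.sum())

# drop_duplicates() keeps the first occurrence of every row.
deduped = toy.drop_duplicates()
print(n_duplicates, len(deduped))
```

Summing the boolean mask gives the number of rows that `drop_duplicates()` would remove.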

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
check_null_value = hotel_data_copy.isna().sum().sort_values(ascending=False)
check_null_value
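
Raw null counts are easier to judge as a share of all rows. The following sketch shows one way to get that percentage, assuming a hypothetical `toy` frame with invented values rather than the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with some missing entries.
toy = pd.DataFrame({
    "company": [np.nan, np.nan, 40.0, np.nan],
    "agent": [9.0, np.nan, 14.0, 9.0],
    "adults": [2, 1, 2, 2],
})

# isna().mean() is the fraction of missing entries per column; scale it to percent.
missing_pct = (toy.isna().mean() * 100).sort_values(ascending=False)
print(missing_pct.round(1))
```

A column that is missing in most rows (like `company` in this toy frame) is a candidate for a placeholder value rather than row-wise deletion.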

In [None]:
# Visualizing the missing values by using heatmap
plt.figure(figsize=(25,10))
sns.heatmap(hotel_data_copy.isnull(), cbar=False, yticklabels=False ,cmap='viridis')
plt.xlabel("Column name")
plt.title("Location of missing values by column")


### What did you know about your dataset?

This dataset contains records of client stays at hotels. More specifically, it contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, the length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. For this analysis, we focus on only some of these variables.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
hotel_data_copy.columns

In [None]:
# Dataset Describe: min, max, mean, etc. NaN in the mean/25%/50%/75%/max rows indicates a categorical column.
hotel_data_copy.describe(include='all')

### Variables Description

# **Data Description:**


1.   hotel : Hotel(Resort Hotel or City Hotel)

2.   is_canceled : Value indicating if the booking was canceled (1) or not (0)

3.   lead_time : Number of days that elapsed between the entering date of the booking into the PMS (property management system) and the arrival date

4.   arrival_date_year : Year of arrival date

5.   arrival_date_month : Month of arrival date

6.   arrival_date_week_number : Week number of year for arrival date

7.   arrival_date_day_of_month : Day of arrival date

8.   stays_in_weekend_nights : Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel

9. stays_in_week_nights : Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel

10.  adults : Number of adults

11.  children : Number of children

12.   babies : Number of babies

13.   meal : Type of meal booked. Categories are presented in standard hospitality meal packages:


14.   country : Country of origin.
15.   market_segment : Market segment designation. In categories, the term “TA” means “Travel Agents” and “TO” means “Tour Operators”

16.   distribution_channel : Booking distribution channel. The term “TA” means “Travel Agents” and “TO” means “Tour Operators”
17.   is_repeated_guest : Value indicating if the booking name was from a repeated guest (1) or not (0)


18.   previous_cancellations : Number of previous bookings that were cancelled by the customer prior to the current booking

19.   previous_bookings_not_canceled : Number of previous bookings not cancelled by the customer prior to the current booking

20.   reserved_room_type : Code of room type reserved. Code is presented instead of designation for anonymity reasons.

21.   assigned_room_type : Code for the type of room assigned to the booking.

22.   booking_changes : Number of changes/amendments made to the booking from the moment the booking was entered on the PMS until the moment of check-in or cancellation

23.   deposit_type : Indication on if the customer made a deposit to guarantee the booking.

24.   agent : ID of the travel agency that made the booking
25.   company : ID of the company/entity that made the booking or responsible for paying the booking.

26.   days_in_waiting_list : Number of days the booking was in the waiting list before it was confirmed to the customer


27.   customer_type : Type of booking, assuming one of four categories

28.   adr : Average Daily Rate as defined by dividing the sum of all lodging transactions by the total number of staying nights

29.   required_car_parking_spaces : Number of car parking spaces required by the customer

30.   total_of_special_requests : Number of special requests made by the customer (e.g. twin bed or high floor)


31.   reservation_status : Reservation last status, assuming one of three categories



*   Canceled – booking was canceled by the customer

*   Check-Out – customer has checked in but already departed

*   No-Show – customer did not check-in and did inform the hotel of the reason why
  

32.  reservation_status_date : Date at which the last status was set. This variable can be used in conjunction with reservation_status to understand when the booking was canceled or when the customer checked out of the hotel
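
As an illustration of how reservation_status_date can be combined with the arrival columns, the sketch below computes the number of days between arrival and the last status change. The `toy` rows, the `month_number` helper, and the `days_from_arrival_to_status` column are hypothetical names and values introduced only for this example.

```python
import pandas as pd

# Hypothetical toy bookings (values invented for illustration).
toy = pd.DataFrame({
    "arrival_date_year": [2016, 2016],
    "arrival_date_month": ["July", "August"],
    "arrival_date_day_of_month": [10, 5],
    "reservation_status": ["Canceled", "Check-Out"],
    "reservation_status_date": ["2016-06-30", "2016-08-08"],
})

# Map month names to month numbers (January -> 1, ..., December -> 12).
month_names = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
month_number = {name: i for i, name in enumerate(month_names, start=1)}

# Assemble the arrival date from its three component columns.
arrival = pd.to_datetime(pd.DataFrame({
    "year": toy["arrival_date_year"],
    "month": toy["arrival_date_month"].map(month_number),
    "day": toy["arrival_date_day_of_month"],
}))
status_date = pd.to_datetime(toy["reservation_status_date"], format="%Y-%m-%d")

# Negative: status was set before arrival (e.g. a cancellation); positive: after arrival.
toy["days_from_arrival_to_status"] = (status_date - arrival).dt.days
print(toy[["reservation_status", "days_from_arrival_to_status"]])
```

For a canceled booking the value shows how many days before arrival the cancellation happened; for a checked-out booking it equals the length of the stay.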



















In [None]:
# converting object type to datetime
hotel_data_copy['reservation_status_date'] = pd.to_datetime(hotel_data_copy['reservation_status_date'], format = '%Y-%m-%d')

### Check Unique Values for each variable.

In [None]:
columns = hotel_data_copy.columns

for col in columns:
  print(hotel_data_copy[col].unique())


In [None]:
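# Check unique values in categorical columns

# Categorical columns = all columns minus those summarized by describe(),
# leaving out the date column and the high-cardinality country/month columns.
excluded_cols = ['reservation_status_date', 'country', 'arrival_date_month']
categorical_cols = list(set(hotel_data_copy.drop(columns=excluded_cols).columns) - set(hotel_data_copy.describe()))
for col in categorical_cols:
  print(f'Unique values in variable {col} are: {hotel_data_copy[col].unique()}')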
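
An alternative way to pick out the non-numeric columns is `select_dtypes`. This is only a sketch on a hypothetical `toy` frame, not the approach used in the cell above.

```python
import pandas as pd

# Hypothetical toy frame standing in for the hotel data.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel"],
    "meal": ["BB", "HB", "BB"],
    "lead_time": [10, 3, 45],
})

# Keep every column that is not numeric.
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

# Number of distinct values per categorical column.
unique_counts = {col: toy[col].nunique() for col in categorical_cols}
print(unique_counts)
```

Unlike the set difference, this keeps the original column order.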

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
#dropping the duplicate rows
hotel_data_copy= hotel_data_copy.drop_duplicates()

In [None]:
# data set reduced
hotel_data_copy.shape

### What all manipulations have you done and insights you found?

The columns company, agent, country, and children contain null values.

1.  For company and agent, I fill the missing values with 0.

2.  For country, I fill the missing values with the string 'others' (assuming the country was not known when the data was collected, so an 'others' option was selected).

3.   As only 4 values are missing in the children column, they can be replaced with 0, i.e. no children.


In [None]:
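# Fill null values in these columns with 0.
# Assigning the result back avoids the chained in-place fillna, which newer pandas versions warn about.

null_columns = ['agent', 'children', 'company']
hotel_data_copy[null_columns] = hotel_data_copy[null_columns].fillna(0)

# Replace missing country values with 'others'

hotel_data_copy['country'] = hotel_data_copy['country'].fillna('others')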

In [None]:
# Drop the 166 rows in which adults + children + babies is 0, i.e. bookings with no guests.
no_guests = (hotel_data_copy['adults'] + hotel_data_copy['children'] + hotel_data_copy['babies']) == 0
print(f"Rows with no guests: {no_guests.sum()}")
hotel_data_copy.drop(hotel_data_copy[no_guests].index, inplace = True)

In [None]:
# Add New Columns

hotel_data_copy['Total_People'] = hotel_data_copy['adults'] + hotel_data_copy['children'] + hotel_data_copy['babies']
hotel_data_copy['Total_Stay'] = hotel_data_copy['stays_in_weekend_nights'] + hotel_data_copy['stays_in_week_nights']

In [None]:
hotel_data_copy.shape

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

# **1) Which type of hotel is most preferred by the guests?**

In [None]:
# Chart - 1 visualization code
# Visualizing the share of each hotel type with a pie chart.
hotel_data_copy['hotel'].value_counts().plot.pie(explode=[0.05, 0.05], autopct='%1.1f%%', figsize=(10,8),fontsize=20)
plt.title('Pie Chart for Most Preferred Hotel')

# **2) What is the percentage of cancellations?**

In [None]:
hotel_data_copy['is_canceled'].value_counts().plot.pie(explode=[0.05, 0.05], autopct='%1.1f%%', figsize=(10,8),fontsize=20)
plt.title("Cancellation and non Cancellation")

**Observation :**

**0 = not canceled**

**1 = canceled**

**27.5 % of the bookings were cancelled.**
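
The percentage shown on the pie chart can also be computed directly. The sketch below uses a hypothetical eight-booking series, so its 25 % result is illustrative only and is not the dataset's figure.

```python
import pandas as pd

# Hypothetical flags: 1 = canceled, 0 = not canceled.
is_canceled = pd.Series([0, 0, 1, 0, 1, 0, 0, 0], name="is_canceled")

# normalize=True turns the counts into fractions of all bookings.
share_pct = is_canceled.value_counts(normalize=True) * 100
cancel_pct = share_pct.loc[1]

# Because the column only holds 0 and 1, its mean is the same fraction.
cancel_pct_from_mean = is_canceled.mean() * 100
print(cancel_pct, cancel_pct_from_mean)
```

Both routes give the same number, so the mean of `is_canceled` is a convenient shortcut when grouping by other columns.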

# **3) What is the Percentage of repeated guests?**

In [None]:
hotel_data_copy['is_repeated_guest'].value_counts().plot.pie(explode=(0.05,0.05),autopct='%1.1f%%',figsize=(12,8),fontsize=20)

plt.title(" Percentage (%) of repeated guests")

**Repeated guests are very few: only 3.9 %.**

# **4) What is the percentage distribution of "Customer Type"?**

In [None]:
hotel_data_copy['customer_type'].value_counts().plot.pie(explode=[0.05]*4,autopct='%1.1f%%',figsize=(12,8),fontsize=15,labels=None)


labels=hotel_data_copy['customer_type'].value_counts().index.tolist()
plt.title('% Distribution of Customer Type')
plt.legend(bbox_to_anchor=(0.85, 1), loc='upper left', labels=labels)

**1. Contract**

**when the booking has an allotment or other type of contract associated with it**

**2. Group**

**when the booking is associated with a group**

**3. Transient**

**when the booking is not part of a group or contract, and is not associated with another transient booking**

**4. Transient-party**

**when the booking is transient, but is associated with at least one other transient booking**

##### 1. Why did you pick the specific chart?

**Pie charts** are ideal for showing the relative proportions of a small number of categories. Each of the four questions above (hotel type, cancellations, repeated guests, and customer type) asks for a share of the total, which a pie chart shows directly.

##### 2. What is/are the insight(s) found from the chart?

**Insights from the charts**

1. City Hotel is the hotel most preferred by guests, so it has the most bookings.

2. 27.5 % of the bookings were cancelled.

3. Only 3.9 % of guests are repeated guests. To retain guests, management should collect feedback from them and try to improve the services.

4. Transient is the most common customer type at 82.4 %. The percentage of bookings associated with a group is very low.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.



Yes, the gained insights can help create a positive business impact.

1. **Understanding customer preferences:** Knowing that City Hotel is more preferred by guests can help the business focus on improving the services and amenities offered at that hotel to attract even more customers.


2. **Addressing cancellation rates:** The high cancellation rate of 27.5% indicates a need to investigate the reasons behind these cancellations and implement strategies to reduce them. This could involve improving communication with customers during the booking process, offering more flexible cancellation policies, or providing additional incentives for guests to follow through with their reservations.


3. **Encouraging repeat business:** The low percentage of repeat guests (3.9%) suggests that the business may not be doing enough to encourage repeat visits. Implementing loyalty programs, offering discounts or special promotions to repeat guests, and providing excellent customer service can all help to increase the number of repeat bookings.


4. **Optimizing customer type distribution:** The high percentage of transient customers (82.4%) indicates that the business is primarily attracting individual travelers. While this is not necessarily a negative, it may be beneficial to explore ways to attract more group bookings, which can often be more profitable. This could involve partnering with tour operators or travel agents, offering special rates for groups, or creating packages that are specifically tailored to groups.

Overall, the insights gained from the data can help the business make informed decisions about how to improve its services, attract more customers, and increase profitability.

# **Which Agent made the most bookings?**

In [None]:
# Count the bookings made by each agent, highest first
highest_bookings = hotel_data_copy.groupby('agent').size().reset_index(name='Most_Bookings').sort_values(by='Most_Bookings', ascending=False)

# Agent 0 is the placeholder we used for missing agent IDs (no agent recorded), so drop it.
highest_bookings.drop(highest_bookings[highest_bookings['agent']==0].index,inplace=True)

# Take the 10 agents with the most bookings
top_ten_highest_bookings=highest_bookings[:10]

top_ten_highest_bookings

In [None]:
# Visualizing the graph

plt.figure(figsize=(18,8))
sns.barplot(x=top_ten_highest_bookings['agent'],y=top_ten_highest_bookings['Most_Bookings'],order=top_ten_highest_bookings['agent'])
plt.xlabel('Agent No')
plt.ylabel('Number of Bookings')
plt.title("Most Bookings Made by the agent")

##### 1. Why did you pick the specific chart?

A bar chart was chosen to visualize the number of bookings made by each agent because it is an effective way to compare the values of different categories. The bars represent the number of bookings, and the labels on the x-axis represent the agents. This makes it easy to see which agents have made the most bookings.


##### 2. What is/are the insight(s) found from the chart?

**Insights:**

- Agent 9 made the most bookings, followed by Agent 14 and Agent 25.
- The top 10 agents accounted for a significant portion of the total bookings.
- There is a large disparity in the number of bookings made by different agents.

**Implications:**

- The business could consider providing incentives or rewards to agents who consistently make a high number of bookings.
- The business could also explore ways to attract more bookings from agents who are currently making fewer bookings.
- The business could analyze the booking patterns of the top agents to identify best practices that could be shared with other agents.
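
To put a number on how much of the total the top agents account for, one could compute their share of all agent bookings. This is a minimal sketch on hypothetical agent IDs; `top_n` and the values are assumptions for illustration, not results from the dataset.

```python
import pandas as pd

# Hypothetical agent IDs per booking; 0 is the placeholder for "no agent recorded".
agents = pd.Series([9, 9, 9, 9, 14, 14, 7, 3, 0, 0], name="agent")

# Exclude the placeholder before counting.
real_agents = agents[agents != 0]
counts = real_agents.value_counts()  # sorted from most to fewest bookings

# Share of agent bookings handled by the top_n agents.
top_n = 2
top_share_pct = counts.head(top_n).sum() / counts.sum() * 100
print(top_share_pct)
```

On the real data, the same expression with `top_n = 10` would show whether the top ten agents dominate the agent bookings.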

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can help create a positive business impact:

1. **Identifying top-performing agents:** Knowing which agents are making the most bookings can help the business recognize and reward their top performers. This could involve providing them with additional incentives, such as bonuses or commissions, or giving them more opportunities for professional development.
2. **Optimizing agent performance:** The data can be used to identify areas where certain agents may need additional training or support. For example, if an agent has a high cancellation rate, the business can provide them with additional training on how to handle customer inquiries and reservations more effectively.
3. **Improving customer service:** The data can also be used to identify trends in customer behavior and preferences. For example, if a particular agent is receiving a lot of positive feedback from customers, the business can learn from their approach and implement similar practices across the team.
4. **Identifying opportunities for growth:** The data can be used to identify potential areas for growth. For example, if a particular agent is consistently booking a high number of guests from a specific country, the business can explore ways to target more customers from that country.

There are no insights that lead to negative growth. The data provides valuable information that can be used to improve agent performance, customer service, and overall business growth.

# **Which countries do most guests come from?**

In [None]:
# Import folium (a Python library for creating Leaflet maps)
# and plotly express (used for the choropleth map below)
import folium
import plotly.express as px

In [None]:
# Count the guests from each country and keep the top 10.
# rename_axis + reset_index(name=...) yields the same column names on both older and newer pandas versions.
country_df = hotel_data_copy['country'].value_counts().rename_axis('country').reset_index(name='count of guests')[:10]


In [None]:
# Choropleth map of guest counts; plotly's default location mode expects ISO alpha-3 country codes in the country column
guests_map = px.choropleth(country_df, locations='country', color='count of guests', hover_name='country')
guests_map.show()

##### 1. Why did you pick the specific chart?

I chose a choropleth map for the following reasons:

* It shades each country by its guest count, so the geographic spread of guests is visible at a glance.
* It is well suited to country-level data, because readers can locate the main source markets without looking up country codes.
* The color scale makes it easy to compare countries with each other.

Overall, I believe that a choropleth map is an effective way to visualize this data, as it gives a clear and concise picture of where the guests come from.

##### 2. What is/are the insight(s) found from the chart?

**Insights:**

- The map shows the top 10 countries with the highest number of guests.
- The United Kingdom, Germany, France, and Spain are the top 4 countries with the most guests.
- The United States, Italy, Portugal, Brazil, and Ireland are also popular countries of origin for guests.

**Implications:**

- The business could focus its marketing efforts on these countries to attract more guests.
- The business could also consider developing targeted promotions or packages for guests from these countries.
- The business could also explore ways to improve its services and amenities to better meet the needs of guests from these countries.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can help create a positive business impact by:

- Identifying key markets: The data can be used to identify the countries that are sending the most guests to the hotel. This information can be used to tailor marketing and advertising campaigns to these key markets.
- Developing targeted promotions: The data can also be used to develop targeted promotions for guests from specific countries. For example, the hotel could offer discounts or special packages to guests from countries that are showing strong growth in terms of guest arrivals.
- Improving customer service: The data can be used to identify areas where the hotel can improve its customer service for guests from specific countries. For example, if guests from a particular country are frequently complaining about the language skills of the hotel staff, the hotel could provide additional training for its staff in that language.

There are no insights that lead to negative growth. The data provides valuable information that can be used to improve the hotel's marketing, promotions, and customer service efforts, which can all lead to increased revenue and profitability.

# **Which room type is most preferred by guests?**

In [None]:
#set plotsize
plt.figure(figsize=(18,8))

#plotting
sns.countplot(x=hotel_data_copy['assigned_room_type'],order=hotel_data_copy['assigned_room_type'].value_counts().index)
#  set xlabel for the plot
plt.xlabel('Room Type')
# set y label for the plot
plt.ylabel('Count of Room Type')
#set title for the plot
plt.title("Most preferred Room type")

##### 1. Why did you pick the specific chart?

The specific chart chosen for this task is a bar chart, which is a suitable choice for visualizing the count of room types. Bar charts are effective in comparing the frequencies of different categories, making it easy to see which room types are most preferred by guests. Additionally, the bar chart provides a clear and organized representation of the data, allowing viewers to quickly grasp the key insights.

##### 2. What is/are the insight(s) found from the chart?

**Insights:**

- The most preferred room type is A, followed by B, C, D, E, F, G, H, I, and K.
- There is a significant difference in the number of bookings for different room types.
- Room types A and B are the most popular, while room types H, I, and K are the least popular.

**Implications:**

- The hotel could consider increasing the number of rooms of type A and B to meet the high demand.
- The hotel could also explore ways to make room types H, I, and K more appealing to guests.
- The hotel could also analyze the booking patterns for different room types to identify any trends or patterns that could be used to improve room allocation and revenue management.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can help create a positive business impact by:

- Identifying customer preferences: The data shows that the most preferred room type is A, followed by B, C, D, E, F, G, and H. This information can be used to ensure that the hotel has a sufficient supply of these room types to meet customer demand.
- Improving customer satisfaction: By understanding which room types are most popular, the hotel can make sure that these rooms are well-maintained and equipped with the amenities that guests expect. This can lead to improved customer satisfaction and repeat business.
- Optimizing pricing: The data can also be used to optimize pricing for different room types. For example, the hotel may be able to charge a higher rate for room type A since it is the most popular.
- Increasing revenue: By understanding customer preferences and optimizing pricing, the hotel can increase its revenue and profitability.
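
Before adjusting prices by room type, it would help to look at bookings and the average daily rate side by side. The sketch below assumes a hypothetical `toy` frame with invented `adr` values.

```python
import pandas as pd

# Hypothetical toy bookings (room types and rates invented for illustration).
toy = pd.DataFrame({
    "assigned_room_type": ["A", "A", "A", "D", "D", "G"],
    "adr": [80.0, 90.0, 100.0, 120.0, 130.0, 200.0],
})

# Number of bookings and mean average daily rate per room type.
room_summary = (
    toy.groupby("assigned_room_type")["adr"]
       .agg(bookings="count", mean_adr="mean")
       .sort_values("bookings", ascending=False)
)
print(room_summary)
```

A room type can be booked often and still have a low rate, so the two columns should be read together.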

There are no insights that lead to negative growth. The data provides valuable information that can be used to improve the hotel's operations and increase its profitability.

# **In which month most of the bookings happened?**

In [None]:
# groupby arrival_date_month and taking the hotel count
bookings_by_months_df=hotel_data_copy.groupby(['arrival_date_month'])['hotel'].count().reset_index().rename(columns={'hotel':"Counts"})

# Create list of months in order
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

# creating df which will map the order of above months list without changing its values.
bookings_by_months_df['arrival_date_month']=pd.Categorical(bookings_by_months_df['arrival_date_month'],categories=months,ordered=True)

# sorting by arrival_date_month
bookings_by_months_df=bookings_by_months_df.sort_values('arrival_date_month')

bookings_by_months_df
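
The `pd.Categorical` step matters because month names otherwise sort alphabetically. The following sketch shows the difference on a hypothetical three-month frame.

```python
import pandas as pd

months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical toy counts for three months.
toy = pd.DataFrame({
    "arrival_date_month": ["March", "January", "February"],
    "Counts": [30, 10, 20],
})

# Plain strings sort alphabetically.
alphabetical = toy.sort_values("arrival_date_month")["arrival_date_month"].tolist()

# An ordered categorical sorts by the position of each value in `months`.
toy["arrival_date_month"] = pd.Categorical(
    toy["arrival_date_month"], categories=months, ordered=True
)
calendar_order = toy.sort_values("arrival_date_month")["arrival_date_month"].tolist()
print(alphabetical, calendar_order)
```

Without the ordered categorical, a line plot would draw the months in alphabetical order and the seasonal trend would be unreadable.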

In [None]:
# set plot size
plt.figure(figsize=(20,8))

# plotting a lineplot with x = months and y = booking counts
sns.lineplot(x=bookings_by_months_df['arrival_date_month'],y=bookings_by_months_df['Counts'])

# set title for the plot
plt.title('Number of bookings across each month')
#set x label
plt.xlabel('Month')
#set y label
plt.ylabel('Number of bookings')


##### 1. Why did you pick the specific chart?

 The line chart was chosen to visualize the number of bookings across each month because it is an effective way to show trends over time. The line chart clearly shows the rise and fall in the number of bookings throughout the year, making it easy to identify peak and off-peak seasons. Additionally, the line chart allows for easy comparison of the number of bookings in different months.

##### 2. What is/are the insight(s) found from the chart?

**Insights:**

- Most of the bookings happened in the month of August.
- Bookings are high during summer season.
- There is a significant increase in bookings from June to August.
- Bookings are low during off-season months like January and February.

**Implications:**

- The hotel could consider offering special promotions or discounts during the off-season months to attract more guests.
- The hotel could also explore ways to extend the summer season by offering activities or amenities that are appealing to guests even in the shoulder months of May and September.
- The hotel could also analyze the booking patterns for different months to identify any trends or patterns that could be used to improve room allocation and revenue management.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can help create a positive business impact by:

- Identifying seasonal trends: The data shows that there are clear seasonal trends in hotel bookings, with peaks in the summer months (June, July, and August) and troughs in the winter months (January, February, and March). This information can be used to adjust staffing levels, marketing campaigns, and pricing strategies accordingly.
- Optimizing pricing: The data can also be used to optimize pricing for different months. For example, the hotel may be able to charge higher rates during peak season and lower rates during off-peak season.
- Increasing revenue: By understanding seasonal trends and optimizing pricing, the hotel can increase its revenue and profitability.

There are no insights that lead to negative growth. The data provides valuable information that can be used to improve the hotel's operations and increase its profitability.

# **Which distribution channel has the most cancellations?**

In [None]:
canceled_df=hotel_data_copy[hotel_data_copy['is_canceled']==1] # 1= canceled

# count cancellations per distribution channel and hotel
canceled_df=canceled_df.groupby(['distribution_channel','hotel']).size().reset_index().rename(columns={0:'Counts'})

#set plot size and plot barchart
plt.figure(figsize=(12,8))
sns.barplot(x='distribution_channel',y='Counts',hue="hotel",data=canceled_df)

# set labels
plt.xlabel('Distribution channel')
plt.ylabel('Number of cancellations')
plt.title('Cancellations by Distribution Channel')


##### 1. Why did you pick the specific chart?

- It is a simple and easy-to-understand chart that can be used to visualize the relative frequencies of different categories within a dataset.
- It is particularly well-suited for visualizing the distribution of cancellations across different distribution channels, as it allows us to see at a glance which channels have the most cancellations.
- The bar chart also makes it easy to compare the different categories to each other, and to see how they contribute to the overall distribution of cancellations.

Overall, I believe that the bar chart is the most effective way to visualize the data in this case, as it provides a clear and concise representation of the distribution of cancellations across different distribution channels.


##### 2. What is/are the insight(s) found from the chart?

**Insights:**

- The most cancellations are made through the TA/TO distribution channel.
- Within each channel, the bars compare cancellations for the two hotel types in the data, City Hotel and Resort Hotel.
- City hotels account for more cancellations than Resort hotels, consistent with their higher booking volume and higher cancellation rate.

**Implications:**

- The hotel could investigate the reasons why guests are canceling their reservations through the TA/TO distribution channel.
- The hotel could also explore ways to improve the customer experience for guests who book through the TA/TO distribution channel.
- The hotel could also consider offering incentives or discounts to guests who book directly through the hotel's website or call center.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can help create a positive business impact by:

* Identifying high-risk channels: The chart shows that certain distribution channels account for far more cancellations than others. Comparing these counts with each channel's total bookings gives the cancellation rate, which can be used to identify high-risk channels and take steps to reduce the number of cancellations from these channels.
* Improving customer service: The data can also be used to improve customer service for guests who book through high-risk channels. For example, the hotel could provide additional support or communication to guests who book through these channels to help reduce the likelihood of cancellations.
* Optimizing marketing campaigns: The data can also be used to optimize marketing campaigns for different distribution channels. For example, the hotel could focus on promoting lower-risk channels or offering incentives to guests who book through these channels.

There are no insights that lead to negative growth. The data provides valuable information that can be used to improve the hotel's operations and increase its profitability.
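The bar chart above plots cancellation counts, so a channel with many bookings will look risky even if its cancellation rate is average. A minimal sketch of a per-channel rate follows, using the same columns as the chart (`distribution_channel`, `hotel`, `is_canceled`); the helper name `cancellation_rate_by_channel` and the toy rows are assumptions made for illustration.

```python
import pandas as pd

def cancellation_rate_by_channel(df):
    """Bookings, cancellations and cancellation rate per channel and hotel."""
    summary = (
        df.groupby(["distribution_channel", "hotel"])["is_canceled"]
        .agg(bookings="count", cancellations="sum")
        .reset_index()
    )
    # Rate = cancellations / all bookings in the same channel and hotel
    summary["cancel_rate_pct"] = (
        100 * summary["cancellations"] / summary["bookings"]
    )
    return summary

# Toy rows for illustration only (not taken from the real dataset)
toy = pd.DataFrame(
    {
        "distribution_channel": ["TA/TO", "TA/TO", "TA/TO", "TA/TO",
                                 "Direct", "Direct"],
        "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel",
                  "City Hotel", "City Hotel"],
        "is_canceled": [1, 0, 1, 1, 0, 0],
    }
)
rates = cancellation_rate_by_channel(toy)
print(rates)
```

Plotting `cancel_rate_pct` instead of `Counts` would show whether TA/TO bookings are cancelled more often, or whether the channel simply carries the most bookings.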

# **Correlation of the columns**

In [None]:
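plt.figure(figsize=(18,10))
# numeric_only=True keeps only numeric columns; pandas 2.0+ raises an error on text columns otherwise
sns.heatmap(hotel_data_copy.corr(numeric_only=True),annot=True)
plt.title('Correlation of the columns')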

##### 1. Why did you pick the specific chart?

The specific chart chosen for this task is a heatmap, which is a suitable choice for visualizing the correlation between different columns in a dataset. A heatmap is a graphical representation of data where the individual values contained in a matrix are represented as colors. This allows for the easy identification of patterns and relationships between different variables. In this case, the heatmap effectively shows the correlation between different columns of the hotel_data_copy dataframe.


##### 2. What is/are the insight(s) found from the chart?

**Insights:**

- There is a strong positive correlation between the number of adults and the number of children.
- There is a strong positive correlation between the number of adults and the number of babies.
- There is a moderate positive correlation between the number of adults and the number of people.
- There is a strong positive correlation between the number of children and the number of babies.
- There is a moderate positive correlation between the number of children and the number of people.
- There is a moderate positive correlation between the number of babies and the number of people.
- There is a strong positive correlation between the number of adults and the price of the hotel.
- There is a moderate positive correlation between the number of children and the price of the hotel.
- There is a moderate positive correlation between the number of babies and the price of the hotel.
- There is a strong positive correlation between the number of people and the price of the hotel.
- There is a strong positive correlation between `is_canceled` and `lead_time`.
- There is a moderate negative correlation between `is_canceled` and `previous_cancellations`.
- There is a moderate negative correlation between `is_canceled` and `previous_bookings_not_canceled`.
- There is a strong negative correlation between `is_canceled` and `booking_changes`.
- There is a strong negative correlation between `is_canceled` and `days_in_waiting_list`.
- There is a strong negative correlation between `is_canceled` and `required_car_parking_spaces`.
- There is a strong negative correlation between `is_canceled` and `total_of_special_requests`.
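Reading signs and strengths off a large annotated heatmap is error-prone, so it can help to pull out a single column of the correlation matrix and sort it. The sketch below assumes pandas 1.5 or later (for `numeric_only`); the helper name `correlations_with` and the toy values are illustrative and are not results from the real dataset.

```python
import pandas as pd

def correlations_with(df, target="is_canceled"):
    """Pearson correlation of every numeric column with `target`,
    sorted from strongest to weakest by absolute value."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr.sort_values(key=abs, ascending=False)

# Toy rows for illustration only (not taken from the real dataset)
toy = pd.DataFrame(
    {
        "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
        "is_canceled": [0, 0, 1, 1],
        "lead_time": [10, 20, 200, 300],
        "total_of_special_requests": [2, 1, 0, 0],
    }
)
ranked = correlations_with(toy)
print(ranked)
```

Running `correlations_with(hotel_data_copy)` on the real data gives the exact coefficients, which makes it straightforward to confirm the sign and size of each relationship listed above.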

**Implications:**

- The hotel could consider offering discounts or packages for families with children.
- The hotel could also consider offering incentives or discounts to guests who book directly through the hotel's website or call center.
- The hotel could also explore ways to reduce the number of cancellations by offering more flexible cancellation policies or by providing guests with more information about the hotel and its amenities.
- The hotel could also consider offering additional services or amenities to guests who book through high-risk channels.
- The hotel could also explore ways to improve the customer experience for guests who book through high-risk channels.

Overall, the heatmap provides valuable insights into the relationships between different variables in the hotel booking dataset. This information can be used to improve the hotel's operations and increase its profitability.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can help create a positive business impact by:

* Identifying relationships between variables: The heatmap shows the correlations between different variables in the dataset. This information can be used to identify relationships between variables that may not be immediately apparent. For example, the heatmap shows that there is a strong positive correlation between the number of adults and the number of children in a booking. This information could be used to develop targeted marketing campaigns or pricing strategies for families.
* Improving forecasting and decision-making: The heatmap can also be used to improve forecasting and decision-making. For example, the positive correlation between `is_canceled` and `lead_time` could be used to flag bookings made far in advance as a higher cancellation risk and to adjust overbooking or deposit policies accordingly.
* Identifying areas for improvement: The heatmap can also be used to identify areas for improvement. It covers only numeric columns, so categorical fields such as customer country and hotel type need a separate analysis (for example, grouped counts) before they can inform marketing campaigns or customer service for guests from different countries.

There are no insights that lead to negative growth. The data provides valuable information that can be used to improve the hotel's operations and increase its profitability.

#### Chart - 8

# **Relationship between adr and total stay**

In [None]:
# Chart - 8 visualization code
hotel_data_copy.drop(hotel_data_copy[hotel_data_copy['adr'] > 5000].index, inplace = True)

In [None]:
plt.figure(figsize=(16,8))
# Plot total length of stay against average daily rate (adr)
sns.scatterplot(x='Total_Stay', y='adr', data=hotel_data_copy)
plt.title('Relationship between adr and total stay')

##### 1. Why did you pick the specific chart?

The scatter plot is a suitable choice for visualizing the relationship between two continuous variables, in this case, the average daily rate (adr) and the total stay. The scatter plot allows for the identification of patterns and trends in the data, such as whether there is a positive or negative correlation between the two variables and whether there are any outliers.

In this specific case, the scatter plot shows that there is a positive correlation between the adr and the total stay, meaning that as the total stay increases, the adr also tends to increase. This information could be useful for the hotel in setting pricing strategies and understanding the spending habits of its guests.

Additionally, the scatter plot also reveals the presence of some outliers, which are data points that are significantly different from the rest of the data. These outliers could be further investigated to understand the reasons behind their high adr or total stay.


##### 2. What is/are the insight(s) found from the chart?

**Insights:**

- There is a positive correlation between the average daily rate (ADR) and the total stay. This means that guests who stay at the hotel for longer periods of time tend to pay a higher average daily rate.
- There are a few outliers where guests paid a high ADR for a short stay. These outliers may represent guests who booked a luxury suite or who were willing to pay a premium for a short stay.
- The majority of guests paid an ADR of less than 200 per night, regardless of their total stay. This suggests that most demand sits in the lower price range, so the hotel should be careful when raising rates beyond it.
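The direction of the relationship and the share of bookings under a given rate can both be quantified rather than judged by eye. This is a minimal sketch using the `Total_Stay` and `adr` columns from the plot; the helper name `adr_stay_summary`, the default threshold of 200 and the toy rows are assumptions for illustration.

```python
import pandas as pd

def adr_stay_summary(df, adr_cap=200):
    """Pearson correlation between stay length and ADR, plus the
    share of bookings whose ADR is below `adr_cap`."""
    return {
        "pearson_r": df["Total_Stay"].corr(df["adr"]),
        "share_below_cap": (df["adr"] < adr_cap).mean(),
    }

# Toy rows for illustration only (not taken from the real dataset)
toy = pd.DataFrame(
    {
        "Total_Stay": [1, 2, 3, 4],
        "adr": [80.0, 100.0, 120.0, 250.0],
    }
)
summary = adr_stay_summary(toy)
print(summary)
```

A coefficient close to zero on the real data would mean the scatter plot shows little linear relationship, whatever the eye picks out from a few extreme points.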

**Implications:**

- The hotel could consider offering discounts or packages for guests who book longer stays.
- The hotel could also consider offering incentives or discounts to guests who book directly through the hotel's website or call center.
- The hotel could also explore ways to reduce the number of cancellations by offering more flexible cancellation policies or by providing guests with more information about the hotel and its amenities.
- The hotel could also consider offering additional services or amenities to guests who book through high-risk channels.
- The hotel could also explore ways to improve the customer experience for guests who book through high-risk channels.

Overall, the scatterplot provides valuable insights into the relationship between the average daily rate and the total stay. This information can be used to improve the hotel's operations and increase its profitability.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can help create a positive business impact by:

* Identifying trends: The scatterplot shows the relationship between the average daily rate (ADR) and the total stay of guests. This information can be used to identify trends, such as whether guests who stay longer tend to pay a higher or lower ADR.
* Optimizing pricing: The scatterplot can also be used to optimize pricing. For example, the hotel could consider offering discounts to guests who stay for longer periods of time.
* Improving customer satisfaction: The scatterplot can also be used to improve customer satisfaction. For example, the hotel could consider offering additional amenities or services to guests who stay for longer periods of time.

There are no insights that lead to negative growth. The data provides valuable information that can be used to improve the hotel's operations and increase its profitability.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.



Based on the insights gained from the data analysis, the following suggestions are made to the client to achieve their business objective:

1. Focus on reducing cancellations:
   - Offer incentives or discounts to guests who book directly through the hotel's website or call center.
   - Explore ways to reduce the number of cancellations by offering more flexible cancellation policies or by providing guests with more information about the hotel and its amenities.
   - Consider offering additional services or amenities to guests who book through high-risk channels.
   - Explore ways to improve the customer experience for guests who book through high-risk channels.

2. Improve customer satisfaction:
   - Offer discounts or packages for guests who book longer stays.
   - Consider offering additional amenities or services to guests who stay for longer periods of time.
   - Personalize the guest experience by offering tailored recommendations and services based on their preferences and past behavior.
   - Continuously monitor and address guest feedback to identify areas for improvement.

3. Optimize pricing:
   - Use data analysis to identify optimal pricing strategies for different room types, seasons, and market segments.
   - Offer dynamic pricing based on demand and availability.
   - Consider implementing revenue management strategies to maximize occupancy and revenue.

4. Enhance marketing and distribution channels:
   - Focus on strengthening relationships with existing distribution channels.
   - Explore new distribution channels to reach a wider audience.
   - Use data analysis to identify and target specific market segments with tailored marketing campaigns.
   - Continuously monitor and evaluate the effectiveness of marketing campaigns and distribution channels.

5. Improve operational efficiency:
   - Use data analysis to identify areas where operational efficiency can be improved.
   - Implement technology solutions to streamline processes and reduce costs.
   - Train and empower employees to provide excellent customer service and resolve guest issues promptly.

By implementing these suggestions, the client can improve their overall business performance and achieve their desired business objectives.

# **Conclusion**

1. Guests showed a preference for City hotels, making it the busiest type of hotel.
2. 27.5% of all bookings were cancelled.
3. Only 3.9% of guests revisited the hotels, indicating a low retention rate.
4. Over 82% of bookings had no changes made, while around 10% had a single change.
5. Most bookings for City and Resort hotels were made in 2016.
6. City hotels generated more revenue than Resort hotels, with a higher average ADR.
7. City hotels had a higher booking cancellation rate, at almost 30%.
8. Resort hotels had a higher average lead time.
9. Waiting time was higher for City hotels compared to Resort hotels, indicating City hotels were busier.
10. Resort hotels had more repeated guests than City hotels.
11. For both types of hotels, most stays were shorter than 7 days, with guests typically staying up to a week.
12. About 19% of guests did not cancel their bookings despite not getting the room they reserved, while only 2.5% cancelled their booking.
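Headline percentages such as the overall cancellation rate and the share of repeated guests are simple column means, so they are easy to recompute when the data changes. The sketch below assumes a 0/1 column named `is_repeated_guest` (an assumption, as it is not shown in this section) alongside `is_canceled` and `hotel`; the helper name `headline_rates` and the toy rows are illustrative only.

```python
import pandas as pd

def headline_rates(df):
    """Overall cancellation and repeat-guest percentages,
    plus the cancellation percentage per hotel type."""
    return {
        "cancel_pct": 100 * df["is_canceled"].mean(),
        "repeat_pct": 100 * df["is_repeated_guest"].mean(),
        "cancel_pct_by_hotel": (
            100 * df.groupby("hotel")["is_canceled"].mean()
        ).to_dict(),
    }

# Toy rows for illustration only (not taken from the real dataset)
toy = pd.DataFrame(
    {
        "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
        "is_canceled": [1, 0, 0, 0],
        "is_repeated_guest": [0, 0, 0, 1],
    }
)
figures = headline_rates(toy)
print(figures)
```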

In conclusion, the data analysis conducted on the hotel booking dataset has provided valuable insights that can be used to improve the hotel's operations and increase its profitability. The insights gained from the data analysis can be used to:

- Identify trends and patterns in guest behavior.
- Optimize pricing strategies.
- Improve customer satisfaction.
- Enhance marketing and distribution channels.
- Improve operational efficiency.

By implementing the suggestions made in this report, the hotel can achieve its business objective of increasing revenue and profitability. The hotel should continue to monitor its performance and make adjustments to its strategies as needed.

