<a href="https://colab.research.google.com/github/YashPareek1/Hotel-Management-EDA/blob/master/EDA_Hotel_Booking_Analysis_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -    **Hotel Booking Analysis**



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team member name** - Yash Pareek

# **Project Summary -**

The Hotel Booking Analysis project aims to leverage data analytics to optimize
revenue generation and enhance customer satisfaction within the hospitality
industry. This comprehensive analysis focuses on understanding booking patterns, customer preferences, and market trends to make informed decisions and improve overall business performance for hotels.

Key Objectives:
1. Revenue Optimization:
   - Identify peak booking seasons and pricing strategies to maximize revenue.
   - Analyze historical data to predict future demand and occupancy rates.
   - Optimize room pricing based on demand fluctuations and competitor rates.

2. Customer Segmentation:
   - Segment customers based on demographics, booking behaviors, and preferences.
   - Create personalized marketing campaigns and offers to target specific customer segments.
   - Enhance guest experience by tailoring services to different customer profiles.

3. Forecasting and Demand Management:
   - Develop demand forecasting models to ensure optimal resource allocation.
   - Implement overbooking strategies to minimize revenue loss due to cancellations and no-shows.
   - Fine-tune inventory management to meet varying demand levels.

4. Customer Satisfaction:
   - Analyze guest reviews and feedback to identify areas for improvement.
   - Implement changes in services, amenities, and customer interactions based on feedback.
   - Monitor guest satisfaction scores and track improvements over time.

5. Competitor Analysis:
   - Evaluate the performance of competitors in the local market.
   - Benchmark against industry standards and identify areas where the hotel can gain a competitive edge.
   - Adjust pricing and marketing strategies in response to competitor actions.

6. Marketing and Promotion:
   - Utilize data-driven insights to target marketing efforts more effectively.
   - Analyze the performance of marketing campaigns and their impact on bookings.
   - Allocate marketing budgets to channels that yield the highest ROI.

7. Technology Integration:
   - Implement data analytics tools and systems to automate data collection and analysis.
   - Explore the use of AI and machine learning for predictive analytics.
   - Ensure data security and compliance with privacy regulations.

Expected Outcomes:
1. Increased revenue through optimized pricing and demand management.
2. Improved customer satisfaction and loyalty through personalized experiences.
3. Enhanced competitiveness in the market.
4. Data-driven decision-making for better resource allocation.
5. Efficient marketing campaigns leading to a higher return on investment.
6. Adaptation to changing market conditions and customer expectations.

The Hotel Booking Analysis project will empower hotels to make data-driven decisions, stay ahead of the competition, and provide exceptional experiences to their guests, ultimately leading to improved profitability and sustainable growth.

# **GitHub Link -**

Provide your GitHub Link here :- https://github.com/YashPareek1/Hotel-Management-EDA.git

# **Problem Statement**
The hotel industry faces the challenge of optimizing room bookings, revenue generation, and customer satisfaction in an increasingly competitive market. To address these challenges, a comprehensive hotel booking analysis is required to gain insights into booking patterns, customer preferences, and market dynamics. The problem statement can be broken down into several key aspects:

Demand Forecasting: Hotels need to accurately predict future demand for their rooms to optimize pricing, occupancy rates, and resource allocation. The problem is to develop robust demand forecasting models that consider historical booking data, seasonality, special events, and market trends.

Customer Segmentation: Not all guests have the same preferences and booking behaviors. The problem is to segment customers based on demographics, booking habits, and preferences to tailor marketing efforts and services to specific customer groups.

Inventory Management: Overbooking and underbooking can lead to revenue loss and customer dissatisfaction. The problem is to implement effective inventory management strategies to minimize the impact of cancellations and no-shows while maintaining high occupancy rates.

Customer Satisfaction: Guest satisfaction is essential for repeat business and positive reviews. The problem is to analyze guest feedback, reviews, and complaints to identify areas for improvement and implement changes in services and amenities accordingly.

Competitor Analysis: Understanding the competitive landscape is crucial for staying ahead in the market. The problem is to assess the performance of competitors, benchmark against industry standards, and adapt strategies in response to competitor actions.

Marketing and Promotion: Effective marketing campaigns are essential for attracting and retaining customers. The problem is to analyze the performance of marketing efforts, allocate budgets efficiently, and target the right audience to maximize return on investment.

Overall, the goal of this hotel booking analysis is to empower hotels with actionable insights, enabling them to optimize revenue, enhance customer satisfaction, and maintain a competitive edge in a dynamic and rapidly changing industry.



#### **Define Your Business Objective?**

Answer Here.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np

# importing visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
import math

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

url = ('/content/drive/MyDrive/Project/Hotel Booking Analysis/Hotel Bookings.csv')

data = pd.read_csv(url)

### Dataset First View

In [None]:
# Dataset First Look
data.head(50)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows_num , col_num = data.shape
print('the number of rows',rows_num)
print('the number of columns',col_num)

### Dataset Information

In [None]:
# Dataset Info
data.info()

#### Duplicate Values

In [None]:
# Identify duplicate rows, marking all duplicates as True
duplicates = data.duplicated(keep=False)
num_duplicates = duplicates.sum()

print(f"Number of duplicate rows: {num_duplicates}")
data[data.duplicated()].shape

In [None]:
# keep=False removes every copy of a duplicated row, including its first occurrence
data.drop_duplicates(keep=False, inplace=True)
data[data.duplicated()].shape
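
The `keep` argument used in the cells above changes how many rows are flagged and dropped. The following is a minimal sketch on a made-up three-row frame (the `toy` data is illustrative and not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy frame: rows 0 and 1 are identical, row 2 is unique.
toy = pd.DataFrame({"hotel": ["City Hotel", "City Hotel", "Resort Hotel"],
                    "adults": [2, 2, 1]})

# keep=False flags every member of a duplicate group.
flag_all = toy.duplicated(keep=False).sum()

# The default keep='first' flags only the repeats after the first occurrence.
flag_repeats = toy.duplicated().sum()

# keep=False drops both copies; the default keeps one representative row.
dropped_all = toy.drop_duplicates(keep=False)
kept_first = toy.drop_duplicates()

print(flag_all, flag_repeats, len(dropped_all), len(kept_first))
```

With `keep=False`, a booking that appears twice is removed entirely rather than reduced to a single row. The default setting is the usual choice when one copy of each booking should be retained.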

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
# Identify missing values (True for missing, False for non-missing)
missing_data = data.isnull().sum().sort_values()

# Count the total number of missing values in the entire dataset
total_missing = missing_data.sum()

# Display the missing value counts
print("Missing Value Counts:")
print(missing_data)

# Display the total number of missing values
print(f"Total Missing Values: {total_missing}")



We can observe that there are 4 null values in the children column. Most likely these customers had no children and left the field blank, so we fill the missing values with 0.

In [None]:
data['children'] = data['children'].fillna(0)
missing_data = data.isnull().sum().sort_values()
print(missing_data)

In [None]:
# There are 442 missing values in the country column, so we replace them with 'Other' for categorization.
data['country'] = data['country'].fillna('Other')
missing_data = data.isnull().sum().sort_values()
print(missing_data)

Agent and company are the mediums through which a booking is made, so a missing value may mean that the customer booked directly without using either. We therefore replace the missing values in these columns with 0.

In [None]:
data['agent'] = data['agent'].fillna(0)
data['company'] = data['company'].fillna(0)

missing_data = data.isnull().sum().sort_values()
print(missing_data)
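
The separate fills above can also be written as one `fillna` call with a per-column mapping. This is a minimal sketch on made-up rows; the `toy` frame and its values are illustrative only, while the fill values mirror the ones chosen in this notebook:

```python
import numpy as np
import pandas as pd

# Hypothetical rows with gaps in the same four columns.
toy = pd.DataFrame({
    "children": [0.0, np.nan, 2.0],
    "country": ["PRT", None, "GBR"],
    "agent": [9.0, np.nan, np.nan],
    "company": [np.nan, np.nan, 40.0],
})

# One replacement value per column.
fill_values = {"children": 0, "country": "Other", "agent": 0, "company": 0}
filled = toy.fillna(value=fill_values)

# Count whatever is still missing after the fill.
remaining = int(filled.isnull().sum().sum())
print(filled)
print("Missing values left:", remaining)
```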

In [None]:
data[data['babies']+data['children'] + data['adults'] == 0].shape

We can observe that some rows have 0 for babies, children, and adults together, which means the booking has no guests at all, so we drop those rows.

In [None]:
data.drop(data[data['babies']+data['children']+data['adults']==0].index,inplace = True)

In [None]:
# Visualizing the missing values
missing_data = data.isnull().sum().sort_values().to_frame()
plt.figure(figsize=(10, 6))
sns.heatmap(missing_data, cmap='viridis', cbar=False)
plt.title('Missing Data Heatmap')
plt.show()

### What did you know about your dataset?

In [None]:
data.shape

In [None]:
data.info()

A hotel booking dataset typically contains information related to hotel reservations made by guests. Analyzing this dataset can provide valuable insights into various aspects of hotel operations, customer preferences, and market trends.
This dataset contains information about bookings and revenue for resort and hotel properties from July 2015 to August 2017. The dataset consists of 79,069 rows and 32 columns, providing comprehensive details for analysis.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
data.columns

In [None]:
# Dataset Describe
data.describe()

### Variables Description

hotel : type of the hotel (City Hotel or Resort Hotel) categorical

is_canceled : whether the booking was cancelled (cancelled = 1, not cancelled = 0) numerical

lead_time : number of days elapsed between the booking and the arrival date of the guests numerical

arrival_date_year : year of the arrival numerical

arrival_date_month : month of the arrival categorical

arrival_date_week_number : week of the arrival numerical

arrival_date_day_of_month : day of the arrival numerical

stays_in_weekend_nights : number of weekend nights stayed numerical

stays_in_week_nights : number of week nights stayed numerical

adults : number of adults numerical

children : number of children numerical

babies : number of babies numerical

meal : type of the meal categorical

country : country of the guest categorical

market_segment : which segment the customer belongs to categorical

distribution_channel : channel through which the guest made the booking categorical

is_repeated_guest : whether the guest is repeated (repeated = 1, not repeated = 0) categorical

previous_cancellations : number of previous bookings cancelled by the guest numerical

previous_bookings_not_canceled : number of previous bookings completed by the guest numerical

reserved_room_type : type of the room the guest booked categorical

assigned_room_type : room assigned to the guest for the booking categorical

booking_changes : number of changes made in the booking numerical

deposit_type : type of deposit the guest made categorical

agent : ID of the agent categorical

company : ID of the company categorical

days_in_waiting_list : number of days the booking spent on the waiting list numerical

customer_type : type of the customer categorical

adr : average daily rate (ADR) numerical

required_car_parking_spaces : number of car parking spaces required by the guest numerical

total_of_special_requests : number of special requests made by the guest numerical

reservation_status : status of the reservation categorical

reservation_status_date : date on which the reservation status was last set date

### Check Unique Values for each variable.

In [None]:
# Get the number of unique values in each column
unique_counts = data.nunique()

# Print the number of unique values for each column
print("Number of unique values in each column:")
print(unique_counts)


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
df = data.copy()

In [None]:
df['hotel'].value_counts()

In [None]:
df['arrival_date_year'].value_counts()

In [None]:
df['country'].value_counts()

In [None]:
# creating a new column as total_nights by adding stays_in_weekend_nights and stays_in_week_nights

df['total_nights'] = df['stays_in_weekend_nights'] + df['stays_in_week_nights']

In [None]:
# creating a new column as child by adding children and babies

df['child'] = df['children'] + df['babies']

In [None]:
df['total_count'] = df['adults'] + df['children'] + df['babies']

In [None]:
df['distribution_channel'].value_counts()

From the above output we found 'Undefined' values in the distribution_channel column.
Since "TA/TO" is by far the most frequent category, we replace the undefined values with "TA/TO" to keep the column consistent and improve the analysis.

In [None]:
df['distribution_channel'] = df['distribution_channel'].replace(to_replace = 'Undefined', value = 'TA/TO')

In [None]:
df['distribution_channel'].value_counts()
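
The replacement above hard-codes `'TA/TO'`. As a hedged alternative, the most frequent category can be derived from the data itself. The sketch below uses a made-up Series; the values and the names `channel`, `most_common`, and `cleaned` are illustrative assumptions:

```python
import pandas as pd

# Hypothetical distribution_channel values, for illustration only.
channel = pd.Series(["TA/TO", "Direct", "TA/TO", "Undefined", "Corporate", "TA/TO"])

# Most frequent category among the defined values.
most_common = channel[channel != "Undefined"].mode().iloc[0]

# Swap the placeholder for that category.
cleaned = channel.replace("Undefined", most_common)

print(most_common)
print(cleaned.value_counts())
```

Deriving the value this way keeps the cell correct if the category proportions change in a different extract of the data.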

### What all manipulations have you done and insights you found?

In the previous steps, we performed data wrangling by adding new columns and modifying existing columns to enhance our analysis.

We introduced new columns to capture the total number of people involved in each booking. By aggregating the 'children', 'babies', and 'adults' columns into a new column called 'total_count', we can analyze the bookings based on the total number of individuals.

Another new column we added is 'total_nights', which represents the total number of nights for each booking. This allows us to examine the bookings from the perspective of the duration of stay.
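
The two derived columns can be sketched on a small made-up frame as follows. The `toy` rows are illustrative only, and the row-wise `sum(axis=1)` is an alternative to the `+` used in this notebook, not the notebook's own code:

```python
import pandas as pd

# Hypothetical bookings with the same column names as the dataset.
toy = pd.DataFrame({
    "stays_in_weekend_nights": [2, 0],
    "stays_in_week_nights": [3, 1],
    "adults": [2, 1],
    "children": [1.0, 0.0],
    "babies": [0, 0],
})

# Total length of stay per booking.
toy["total_nights"] = toy[["stays_in_weekend_nights", "stays_in_week_nights"]].sum(axis=1)

# Total number of guests per booking.
toy["total_count"] = toy[["adults", "children", "babies"]].sum(axis=1)

print(toy[["total_nights", "total_count"]])
```

One design difference is worth noting: `sum(axis=1)` skips missing values by default, whereas adding columns with `+` yields a missing result whenever any operand is missing.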

Furthermore, we recognized the importance of visual representations in our analysis. By plotting graphs and visualizations, we can gain deeper insights and effectively communicate our findings to stakeholders.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Hotel Booking by Hotel Type

In [None]:
# Chart - 1 visualization code
df1 = df['hotel'].value_counts()
x = df1.index
y = df1.values

plt.figure(figsize= (10,8))
plots = sns.barplot(x=x, y=y/sum(y)*100)

for bar in plots.patches:
  plots.annotate(f'{format(bar.get_height(),".1f")}%',
                 (bar.get_x() + bar.get_width()/2,bar.get_height()),
                 size = 12, xytext = (0,8), ha ='center',va= 'center',
                 textcoords = 'offset points')

plt.title('Bookings by hotel type')
plt.xlabel('Hotel type')
plt.ylabel('Percentage of bookings')
plt.show()

# look up the counts by label rather than by position
city_hotel_bookings = df1['City Hotel']
resort_hotel_bookings = df1['Resort Hotel']
total_bookings = city_hotel_bookings + resort_hotel_bookings
print('the city hotel bookings are',city_hotel_bookings)
print('the resort hotel bookings are',resort_hotel_bookings)
print('the total bookings are',total_bookings)

##### 1. Why did you pick the specific chart?

Answer:- Bar charts are effective for comparing categorical data or showing the distribution of values across different categories. They're especially useful for displaying discrete data, making it easy to interpret and compare values visually. To show the percentage of bookings for city hotels and resort hotels, we used a bar chart.

##### 2. What is/are the insight(s) found from the chart?

Answer From the observation, the dataset contains two types of hotels: city hotels and resort hotels. Of the 79069 bookings in the dataset, 47437 are for city hotels and 31632 are for resort hotels. This means that city hotels account for about 60% of the bookings, while resort hotels account for about 40%. Resort hotels therefore receive noticeably fewer bookings than city hotels.
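
The percentages can be checked directly from the counts quoted in the paragraph above. This is a small arithmetic sketch; the variable names are chosen here for illustration:

```python
# Counts quoted in the paragraph above.
city_hotel_bookings = 47437
resort_hotel_bookings = 31632
total_bookings = city_hotel_bookings + resort_hotel_bookings

# Share of each hotel type, as a percentage rounded to one decimal place.
city_share = round(city_hotel_bookings / total_bookings * 100, 1)
resort_share = round(resort_hotel_bookings / total_bookings * 100, 1)

print(total_bookings, city_share, resort_share)
```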

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can help to create a positive business impact. By analyzing the booking percentages of both city hotels and resort hotels, we can identify key parameters that are important for increasing bookings. This information can be used to make improvements in areas that may lead to increased bookings and overall business growth.

There are no insights from the given analysis that indicate negative growth. The focus is on identifying opportunities for improvement.

#### Chart - 2  Hotel Booking ratio of 'cancelled' and 'not cancelled'

In [None]:
# Chart - 2 visualization code
df2 = df['is_canceled'].value_counts()
x = df2.index.values
y = df2.values

plt.figure(figsize=(10,7))
plots = sns.barplot(x = x, y = y/sum(y)*100)
for bar in plots.patches:
  plots.annotate(f'{format(bar.get_height(),".1f")}%',
                    (bar.get_x() + bar.get_width()/2,
                    bar.get_height()), ha='center',va ='center',
                    size = 12,xytext = (0,8),
                    textcoords = 'offset points')

plt.xlabel('Booking Cancelled (cancelled = 1, not cancelled = 0)')
plt.ylabel('Percentage of bookings')
plt.title('Booking info(Cancelled & Not Cancelled)')
plt.show()

# is_canceled == 1 marks a cancelled booking, 0 a booking that was not cancelled
no_of_bookings_not_cancelled = df2[0]
no_of_bookings_cancelled = df2[1]
total_bookings = no_of_bookings_not_cancelled + no_of_bookings_cancelled

print("bookings not cancelled",(no_of_bookings_not_cancelled))
print("bookings cancelled",(no_of_bookings_cancelled ))
print('Total bookings are', total_bookings)

##### 1. Why did you pick the specific chart?

Answer Bar charts are effective for comparing categorical data or showing the distribution of values across different categories. They're especially useful for displaying discrete data, making it easy to interpret and compare values visually. To show the percentage of bookings that were cancelled and not cancelled, we used a bar chart.

##### 2. What is/are the insight(s) found from the chart?

Answer: From the observation, the dataset contains two booking outcomes: cancelled and not cancelled. Of the 79069 bookings in the dataset, 20762 were cancelled and 58307 were not cancelled. This means that cancelled bookings make up 26.3% of the total, while bookings that were not cancelled make up 73.7%. Roughly one booking in four is cancelled.
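
The cancellation rate follows from the counts quoted in the paragraph above. The sketch below also shows that, for a 0/1 flag such as `is_canceled`, the mean of the flag equals the rate; the short `flags` list is made up for illustration:

```python
# Counts quoted in the paragraph above.
cancelled = 20762
not_cancelled = 58307
total = cancelled + not_cancelled

cancel_rate = round(cancelled / total * 100, 1)
kept_rate = round(not_cancelled / total * 100, 1)
print(total, cancel_rate, kept_rate)

# Hypothetical is_canceled values: the mean of a 0/1 flag is the cancellation rate.
flags = [1, 0, 0, 1, 0, 0, 0, 0]
flag_rate = sum(flags) / len(flags) * 100
print(flag_rate)
```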

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer: Yes, the gained insights can help to create a positive business impact. By analyzing the percentages of cancelled and non-cancelled bookings, we can identify key parameters that are important for increasing bookings. This information can be used to make improvements in areas that may lead to increased bookings and overall business growth.

There are no insights from the given analysis that indicate negative growth. The focus is on identifying opportunities for improvement.

#### Chart - 3  Analysing Bookings by Country

In [None]:
# Chart - 3 visualization code
df3 = df['country'].value_counts().head(10)
x = df3.index
y = df3.values
plt.figure(figsize = (7,5))
plots = sns.barplot(x =x, y= y/sum(y)* 100)

for bar in plots.patches:
  plots.annotate(f'{format(bar.get_height(),".1f")}%',
                 (bar.get_x() + bar.get_width()/2, bar.get_height()),
                 size = 10, xytext = (0,9), ha= 'center',va = 'center',
                 textcoords = 'offset points')

plt.title('Bookings of the guests from different countries')
plt.xlabel('Country')
plt.ylabel('percentage of the bookings')
plt.show()

for i,j in df3.items():
  print("The country",i,'having',j,'bookings')

##### 1. Why did you pick the specific chart?

Answer Bar charts are effective for comparing categorical data or showing the distribution of values across different categories. They're especially useful for displaying discrete data, making it easy to interpret and compare values visually. To show the percentage of bookings by country, we used a bar chart.

##### 2. What is/are the insight(s) found from the chart?

Answer From the observation, the dataset records the country of origin for each booking. The total number of bookings in the dataset is 79069.

*   PRT has the **highest** number of bookings, 24173, which is 37.5% of the bookings from the ten countries shown in the chart.
*   NLD has the **lowest** number of bookings among these ten countries, 1771, which is only 2.7% of their combined bookings.

*   This analysis helps us understand the booking patterns and the significance of different countries in terms of generating bookings. It allows us to identify the countries with the highest potential and focus our marketing efforts accordingly.
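
The chart code divides each count by the sum of the ten plotted counts, so its percentages are shares of the top ten countries rather than of all bookings. The sketch below contrasts the two normalizations on a made-up country column (the `country` values are illustrative) and then applies the all-bookings version to the PRT count quoted above:

```python
import pandas as pd

# Hypothetical country column: 10 bookings across 4 countries.
country = pd.Series(["PRT"] * 5 + ["GBR"] * 3 + ["FRA", "ESP"])
counts = country.value_counts()
top = counts.head(2)

# Share among the plotted countries only (what y / sum(y) computes).
share_of_top = top / top.sum() * 100

# Share among every booking in the column.
share_of_all = top / len(country) * 100

print(share_of_top["PRT"], share_of_all["PRT"])

# PRT share of all 79069 bookings, using the count quoted above.
prt_share_of_all = round(24173 / 79069 * 100, 1)
print(prt_share_of_all)
```

Which denominator to use depends on the question: the top-ten share describes the composition of the chart, while the all-bookings share describes the country's weight in the dataset.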


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Yes, the gained insights can help to create a positive business impact. By analyzing the booking percentages of countries, we can identify key parameters that are important for increasing bookings. This information can be used to make improvements in areas that may lead to increased bookings and overall business growth.

There are no insights from the given analysis that indicate negative growth. The focus is on identifying opportunities for improvement.

#### Chart - 4 Hotel Booking based on room type

In [None]:
# Chart - 4 visualization code
df4 = df['reserved_room_type'].value_counts()
x = df4.index
y = df4.values
plt.figure(figsize = (7,5))
plots = sns.barplot(x= x,y= y/sum(y)*100)

for bar in plots.patches:
  plots.annotate(f'{format(bar.get_height(),".1f")}%',
                 (bar.get_x() + bar.get_width()/2, bar.get_height()),
                 size = 10, xytext = (0,5),ha = 'center',va = 'center',
                 textcoords = 'offset points')

plt.title('Bookings for reserved room type')
plt.xlabel('Room type')
plt.ylabel('Percentage of bookings')
plt.show()

for i,j in df4.items():
  print('The reserved room type is',i,'and the bookings are',j)

##### 1. Why did you pick the specific chart?

Answer Bar charts are effective for comparing categorical data or showing the distribution of values across different categories. They're especially useful for displaying discrete data, making it easy to interpret and compare values visually. To show the percentage of bookings by room type, we used a bar chart.

##### 2. What is/are the insight(s) found from the chart?

Answer From the observation, the dataset records the reserved room type for each booking. The total number of bookings in the dataset is 79069.

*   Room type A has the highest number of bookings, 49860, which is 63.1% of the total.

*   Room type L has the lowest number of bookings, 6, which is only about 0.0076% of the total.

This analysis helps us understand the booking patterns and the significance of different room types in terms of generating bookings. It allows us to identify the room types with the highest demand and focus our marketing efforts accordingly.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Yes, the gained insights can help to create a positive business impact. By analyzing the booking percentages based on room type, we can identify key parameters that are important for increasing bookings. This information can be used to make improvements in areas that may lead to increased bookings and overall business growth.

There are no insights from the given analysis that indicate negative growth. The focus is on identifying opportunities for improvement.

#### Chart - 5 Hotel Booking Month Analysis

In [None]:
# Chart - 5 visualization code
df5 = df['arrival_date_month'].value_counts()
x = df5.index
y = df5.values
plt.figure(figsize = (10,5))
plots = sns.barplot(x =x,y = y/sum(y)* 100)

for bar in plots.patches:
  plots.annotate(f'{format(bar.get_height(),".1f")}%',
                 (bar.get_x()+ bar.get_width()/2, bar.get_height()),
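                 size = 8, xytext = (0,5), ha= 'center',
                 va ='center',textcoords = 'offset points')

# label the chart and render it, as in the earlier charts
plt.title('Bookings by arrival month')
plt.xlabel('Month')
plt.ylabel('Percentage of bookings')
plt.show()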

for i , j in df5.items():
  print('The arrival month of the guest is',i,'and its bookings are',j)

##### 1. Why did you pick the specific chart?

Answer Bar charts are effective for comparing categorical data or showing the distribution of values across different categories. They're especially useful for displaying discrete data, making it easy to interpret and compare values visually. To show the percentage of bookings by month, we used a bar chart.

##### 2. What is/are the insight(s) found from the chart?

Answer  From the observation, the dataset records the arrival month for each booking. The total number of bookings in the dataset is 79069.


*   The month of August has the highest percentage of guest arrivals, accounting for 11.6% of the total bookings.
*   The month of January has the lowest percentage of guest arrivals, with only 5.0% of the total bookings.

This analysis helps us understand the booking patterns and the significance of different months in terms of generating bookings. It allows us to identify the months with the highest demand and focus our marketing efforts accordingly.
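
Because `value_counts()` sorts by frequency, the bars appear in order of booking volume rather than calendar order. A possible way to put the months in chronological order before plotting is sketched below; the `months` Series is made up for illustration and `month_order` is a helper list introduced here:

```python
import pandas as pd

# Hypothetical arrival_date_month values.
months = pd.Series(["August", "January", "August", "July", "January", "August"])

# Calendar order used to arrange the counts.
month_order = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]

# Reindex the counts chronologically; months with no bookings get 0.
counts = months.value_counts().reindex(month_order, fill_value=0)
share = counts / counts.sum() * 100

print(share[share > 0].round(1))
```

Calendar ordering makes seasonal patterns such as a summer peak easier to read off the chart.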

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Yes, the gained insights can help to create a positive business impact. By analyzing the booking percentages based on month, we can identify key parameters that are important for increasing bookings. This information can be used to make improvements in areas that may lead to increased bookings and overall business growth.

There are no insights from the given analysis that indicate negative growth. The focus is on identifying opportunities for improvement.

#### Chart - 6 Analysis of Average Daily Rate

In [None]:
# Chart - 6 visualization code
daily_rate = df.groupby('hotel')['adr'].mean().reset_index().rename(columns = {'adr' :'avg_adr'})
print(daily_rate)
plt.figure(figsize = (10,7))
sns.barplot(x = daily_rate['hotel'], y = daily_rate['avg_adr'])
plt.title('Average daily rate by hotel type')
plt.show()

##### 1. Why did you pick the specific chart?

Answer Bar charts are effective for comparing categorical data or showing the distribution of values across different categories. They're especially useful for displaying discrete data, making it easy to interpret and compare values visually. To show the average daily rate for each hotel type, we used a bar chart.

##### 2. What is/are the insight(s) found from the chart?

Answer From the observation, the dataset records the average daily rate (ADR) for each booking. The total number of bookings in the dataset is 79069.

*   The average daily rate is about 105 for city hotels and about 94 for resort hotels.
*   City hotels have both more bookings and a higher average rate than resort hotels, which suggests that higher demand goes along with higher prices.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Yes, the gained insights can help to create a positive business impact. By analyzing the average price for each hotel type, we can identify key parameters that are important for increasing bookings. This information can be used to make improvements in areas that may lead to increased bookings and overall business growth.

There are no insights from the given analysis that indicate negative growth. The focus is on identifying opportunities for improvement.

#### Chart - 7 Analysis of total number of persons per booking

In [None]:
# Chart - 7 visualization code
df7 = df['total_count'].value_counts()
x = df7.index
y = df7.values

plt.figure(figsize  =(10,5))
plots = sns.barplot(x= x, y =y/sum(y) * 100)
plt.title("Total number of persons for single booking")
plt.xlabel("Number of persons")
plt.ylabel("Percentage of bookings")
plt.show()

for i,j in df7.items():
  percentage = round(j/sum(df7.values)*100,2)
  print(f'The number of people for a booking is {i} and the number of bookings are {j} i.e {percentage}%')

##### 1. Why did you pick the specific chart?

Answer  Bar charts are effective for comparing categorical data or showing the distribution of values across different categories. They're especially useful for displaying discrete data, making it easy to interpret and compare values visually. To show the number of people per booking, we used a bar chart.

##### 2. What is/are the insight(s) found from the chart?

Answer From the observation, the dataset records the number of guests for each booking. The total number of bookings in the dataset is 79069.


*   The majority of bookings are for 2 persons.
*   Single-person bookings also make up a considerable proportion.
*   Bookings for more than two persons represent a smaller percentage of the total.




##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Yes, the gained insights can help to create a positive business impact. By analyzing the booking percentages based on the number of people per booking, we can identify key parameters that are important for increasing bookings. This information can be used to make improvements in areas that may lead to increased bookings and overall business growth.

There are no insights from the given analysis that indicate negative growth. The focus is on identifying opportunities for improvement.

#### Chart - 8 Analysis of repeated guest

In [None]:
# Chart - 8 visualization code
repeated_guest = df[df['is_repeated_guest']== 1]
temp_repeated_guest = pd.DataFrame(repeated_guest.groupby('hotel').size()).rename(columns = {0: 'total repeated guest'})

# calculating the total number of bookings for each type of the hotel
total_bookings = pd.DataFrame(df.groupby('hotel').size()).rename(columns = {0:'total bookings'})

# concatenating the two dataframes for plotting the graph
repeated_guest_to_hotel = pd.concat([temp_repeated_guest,total_bookings], axis = 1)

# calculating the percentage of the guests returned to each type of the hotel
repeated_guest_to_hotel['return %'] = (repeated_guest_to_hotel['total repeated guest']/repeated_guest_to_hotel['total bookings']) * 100

print(repeated_guest_to_hotel)

# plotting the graph for the above dataframe
plt.figure(figsize = (8,5))
sns.barplot(x = repeated_guest_to_hotel.index, y = repeated_guest_to_hotel['return %'])

##### 1. Why did you pick the specific chart?

Answer - Bar charts are effective for comparing categorical data or showing the distribution of values across different categories. They are especially useful for displaying discrete data, making it easy to interpret and compare values visually. We used a bar chart to compare the percentage of repeated guests for each hotel type.

##### 2. What is/are the insight(s) found from the chart?

Answer - The dataset contains country-wise hotel bookings, and the total number of bookings in the dataset is 79069.

*   Repeated guests account for a larger share of bookings at the Resort Hotel.
*   The repeated-booking share for the Resort Hotel is about double that of the City Hotel.
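
As a cross-check on the return percentage, the same figure can be obtained in one step, because the mean of a 0/1 flag equals the fraction of rows where the flag is 1. This is a minimal sketch on made-up rows; only the column names `hotel` and `is_repeated_guest` come from the notebook.

```python
import pandas as pd

# Made-up sample rows; the values are illustrative, not from the dataset.
sample = pd.DataFrame({
    'hotel': ['City Hotel'] * 5 + ['Resort Hotel'] * 5,
    'is_repeated_guest': [0, 0, 0, 0, 1, 0, 0, 0, 1, 1],
})

# Mean of the 0/1 flag per hotel, converted to a percentage
return_pct = sample.groupby('hotel')['is_repeated_guest'].mean().mul(100)
print(return_pct)
```

Applied to the full `df`, this should give the same `return %` values as the concatenated table, which makes it a quick way to validate the chart.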

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer - Yes, the gained insights can help create a positive business impact. By analyzing the booking percentages based on the number of repeated bookings, we can identify key parameters that are important for increasing bookings. This information can be used to make improvements in areas that may lead to increased bookings and overall business growth.

There are no insights from the given analysis that indicate negative growth. The focus is on identifying opportunities for improvement.

#### Chart - 9 - Analysis of average hotel price per month

In [None]:
# Chart - 9 visualization code
# keeping only the bookings that were not canceled, split by hotel type
df_city = df[(df['hotel'] == 'City Hotel') & (df['is_canceled'] == 0)]
df_resort = df[(df['hotel'] == 'Resort Hotel') & (df['is_canceled'] == 0)]

# calculating the mean adr per month for df_city & df_resort and storing them in new variables
city = df_city.groupby('arrival_date_month')['adr'].mean().reset_index()
resort = df_resort.groupby('arrival_date_month')['adr'].mean().reset_index()

# merging both variables on the shared column arrival_date_month
hotel = city.merge(resort, on = 'arrival_date_month')

# renaming the columns in the hotel variable
hotel.columns = ['month','price_for_city','price_for_resort']

# creating a new variable with the months in calendar order
months = ['January','February','March','April','May','June','July','August','September','October','November','December']

# converting the month column to an ordered categorical so that it sorts in calendar order
hotel['month'] = pd.Categorical(hotel['month'], categories = months, ordered = True)
hotel = hotel.sort_values('month').reset_index(drop = True)
print(hotel)

# plotting the line chart comparing the monthly adr of the city and resort hotels
plt.figure(figsize = (10,8))
sns.lineplot(data = hotel, x = 'month', y = 'price_for_city', label = 'City Hotel')
sns.lineplot(data = hotel, x = 'month', y = 'price_for_resort', label = 'Resort Hotel')
plt.ylabel('Price')
plt.legend()
plt.show()

##### 1. Why did you pick the specific chart?

Answer - A line chart is a type of graph that displays information as a series of data points called 'markers' connected by straight line segments. This visualization is particularly useful for showing trends or changes over a continuous interval or time span. We used a line chart to show the trend of hotel prices by month.

##### 2. What is/are the insight(s) found from the chart?

Answer - The dataset contains country-wise hotel bookings, and the total number of bookings in the dataset is 79069.

*   Resort Hotel prices rise in July and reach their highest peak in August.

*   City Hotel prices increase slightly in the middle of the year.
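
The peak month for each hotel type can also be read programmatically instead of from the line chart. The sketch below uses illustrative numbers only; the column names mirror the `hotel` table built in the chart code, and the prices are assumptions.

```python
import pandas as pd

# Illustrative values only, not the notebook's actual monthly averages.
prices = pd.DataFrame({
    'month': ['June', 'July', 'August', 'September'],
    'price_for_city': [119.0, 110.0, 115.0, 112.0],
    'price_for_resort': [110.0, 155.0, 187.0, 93.0],
}).set_index('month')

# idxmax returns the index label (here, the month) of the highest value in each column
peak_month = prices.idxmax()
print(peak_month)
```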







##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer - Yes, the gained insights can help create a positive business impact. By analyzing the average price per month for each hotel type, we can identify key parameters that are important for setting prices. This information can be used to make improvements in areas that may lead to increased bookings and overall business growth.

There are no insights from the given analysis that indicate negative growth. The focus is on identifying opportunities for improvement.

#### Chart - 10 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
num_df = df[['lead_time','previous_cancellations','previous_bookings_not_canceled','booking_changes','days_in_waiting_list','is_repeated_guest','reserved_room_type','total_count']]

# computing the correlation matrix; numeric_only = True skips non-numeric columns such as reserved_room_type
corrmat = num_df.corr(numeric_only = True)
f, ax = plt.subplots(figsize = (10,10))
sns.heatmap(corrmat, annot = True, fmt = ".2f", annot_kws = {'size': 10}, vmin = -1, vmax = 1, square = True)
plt.show()

##### 1. Why did you pick the specific chart?

Answer - A correlation heatmap is a graphical representation of the correlation matrix, where the correlations between variables in a dataset are displayed as a matrix of colors. It is an excellent tool for quickly identifying relationships and patterns in data.

Each cell in the heatmap represents the correlation between two variables. The colors and intensity of the cells typically represent the strength and direction of the correlation:

*   A high positive correlation (strong relationship) is often represented by a brighter or darker color, like dark red.
*   A high negative correlation (strong inverse relationship) might be represented by a different intense color, like dark blue.
*   Little to no correlation is often represented by lighter colors or neutral shades.

##### 2. What is/are the insight(s) found from the chart?

Answer - The dataset contains country-wise hotel bookings, and the total number of bookings in the dataset is 79069.

*   Lead time and booking changes have a slight positive correlation. This suggests that as the lead time increases, there may be a slightly higher likelihood of changes being made to the booking.

*   Lead time is also slightly related to the number of days on the waiting list. This implies that longer lead times may result in a slightly higher chance of being on the waiting list for more days.

*   Overall, the correlations observed in the heatmap are relatively weak, with coefficients close to zero. This suggests that the variables in the dataset may not have strong linear relationships with each other.
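
To back up statements like these with numbers, the correlation matrix can be flattened into a ranked list of variable pairs. The following is a sketch on made-up data, so the coefficients it prints say nothing about the real dataset; only the column names are borrowed from the notebook.

```python
import numpy as np
import pandas as pd

# Small made-up numeric sample
sample = pd.DataFrame({
    'lead_time': [10, 50, 200, 5, 120, 300],
    'booking_changes': [0, 1, 2, 0, 1, 3],
    'days_in_waiting_list': [0, 0, 10, 0, 0, 40],
})

corr = sample.corr()

# Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Order the pairs by absolute correlation, strongest first
ranked = pairs.reindex(pairs.abs().sort_values(ascending=False).index)
print(ranked)
```

Running the same steps on `corrmat` would list the pairs from the heatmap in order of strength, which is easier to scan than the colored grid when most coefficients are close to zero.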

#### Chart - 11 - Pair Plot

In [None]:
# Pair Plot visualization code
final_data = sns.pairplot(num_df, hue = 'previous_cancellations', palette = 'Set2')

##### 1. Why did you pick the specific chart?

Answer - A pair plot is a common tool in exploratory data analysis. It is useful for understanding relationships between variables because it lets you quickly see how all of the variables in a dataset are related to one another.

##### 2. What is/are the insight(s) found from the chart?

Answer The pairplot analysis indicates a lack of pronounced linear connections among the dataset's variables. There are no conspicuous trends or robust correlations, signifying a state of relative independence among the variables. This suggests a scenario where the variables exhibit minimal influence or impact on one another.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer - Based on my analysis, I suggest the following to help the client achieve the business objective:

Enhanced Marketing Strategies: Develop and implement targeted marketing campaigns to reach potential guests. This includes utilizing digital marketing, social media advertising, search engine optimization (SEO), and partnerships with travel agencies or online booking platforms to increase visibility.

Improved Customer Experience: Focus on providing exceptional service and experiences to guests. This can include personalized services, loyalty programs, easy booking processes, and addressing feedback to enhance overall satisfaction.

Optimized Pricing Strategies: Utilize dynamic pricing models to adjust room rates based on demand, seasonality, events, and competitor analysis. Offering competitive prices can attract more bookings.

Enhanced Online Presence and User Experience: A user-friendly and visually appealing website or app can significantly impact bookings. Ensure ease of navigation, quick loading times, mobile optimization, and clear booking processes.

Targeted Segmentation and Customer Profiling: Understand customer preferences through data analysis and segmentation. Tailor offerings and promotions to specific demographics or customer segments to increase engagement and bookings.

Strategic Partnerships and Collaborations: Collaborate with airlines, event organizers, travel influencers, and local businesses to create packages or promotions that attract travelers and drive bookings.

Investment in Technology and Innovation: Adopt new technologies such as AI-powered chatbots for customer service, virtual tours, or augmented reality experiences to differentiate the hotel and attract more guests.

Focus on Reviews and Reputation Management: Encourage positive reviews from satisfied guests and manage negative feedback promptly. A good online reputation can significantly influence potential guests' decisions.

Sustainability and Social Responsibility: Highlighting eco-friendly practices or involvement in social causes can attract socially conscious travelers who prioritize responsible businesses.

# **Conclusion**

The conclusions below are based on the EDA project analysis.

1.   City Hotel bookings are slightly higher than Resort Hotel bookings, so the Resort Hotel should work more on user experience and facilities.
2.   The majority of bookings are for room type A, so the hotel should increase the number of type A rooms and also improve the facilities for the other room types.
3.   There is a positive relation between lead time and cancellations. The hotel should analyze the reasons behind longer lead times leading to cancellations and consider implementing measures to minimize cancellations, such as offering incentives for non-refundable bookings or providing flexible booking options.
4.   The majority of bookings are for 2 people, and there are few bookings for 4 or 5 people, so hotel management should work on attracting families.
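
One way to examine the lead time and cancellation relationship from point 3 is to bucket lead time and compare cancellation rates per bucket. This is a minimal sketch on made-up rows: the bin edges are arbitrary assumptions, and only the column names `lead_time` and `is_canceled` come from the dataset.

```python
import pandas as pd

# Made-up sample; the values are illustrative, not from the dataset.
sample = pd.DataFrame({
    'lead_time':   [3, 12, 25, 40, 75, 95, 150, 210, 300, 400],
    'is_canceled': [0,  0,  0,  1,  0,  1,   1,   0,   1,   1],
})

# Bucket lead time (in days); the bin edges are an arbitrary choice
bins = [0, 30, 90, 180, 365, float('inf')]
labels = ['0-30', '31-90', '91-180', '181-365', '365+']
sample['lead_bucket'] = pd.cut(sample['lead_time'], bins=bins, labels=labels, include_lowest=True)

# Mean of the 0/1 flag per bucket = cancellation rate for that bucket
cancel_rate = sample.groupby('lead_bucket', observed=False)['is_canceled'].mean()
print(cancel_rate)
```

If the rate climbs steadily across buckets on the real data, that would support targeting long-lead bookings with non-refundable incentives.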





