<a href="https://colab.research.google.com/github/debobratopaul/CAPSTONE-PROJECT-HOTEL-BOOKING-ANALYSIS/blob/main/EDA_HOTEL_BOOKING.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Hotel Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

The hotel booking analysis project aimed to extract meaningful insights from a comprehensive dataset to optimize revenue generation and improve customer satisfaction in the hotel industry. By employing exploratory data analysis, visualizations, and statistical techniques, the project explored various aspects of hotel bookings, including seasonal patterns, cancellation behavior, market segments, pricing strategies, distribution channels, and customer preferences. The findings provided valuable insights for strategic decision-making to drive positive business outcomes.

One key observation from the analysis was the existence of clear seasonal patterns in hotel bookings, with peak periods occurring during the summer months and holiday seasons. This insight enables hotels to anticipate and prepare for high demand periods, ensuring optimal staffing, inventory management, and pricing strategies. By efficiently allocating resources, hotels can enhance operational efficiency and maximize revenue potential.


Market segmentation analysis played a crucial role in understanding the preferences and behaviors of different customer groups. Online Travel Agents (OTAs) and the Groups segment were identified as significant contributors to hotel bookings. By tailoring pricing strategies and marketing campaigns to these segments, hotels can effectively target their offerings, maximize customer acquisition, and increase revenue.

Pricing analysis across market segments and hotel types unveiled variations in the average daily rate (ADR). This information allows hotels to optimize their pricing strategies, identify competitive advantages, and capture the right market share. By understanding the market dynamics, hotels can align their pricing with customer expectations and market demand, resulting in improved revenue generation.

Distribution channel analysis provided insights into the preferred booking channels of guests. Online channels, especially OTAs and direct hotel websites, emerged as dominant distribution channels. This knowledge empowers hotels to optimize their channel management strategies, negotiate favorable partnerships with OTAs, and invest in their direct booking platforms to improve profitability.

Customer satisfaction and loyalty were explored through the analysis of booking changes and special requests. Understanding common booking modifications and customer preferences enables hotels to enhance their service offerings, improve the guest experience, and foster loyalty. By tailoring services to meet customer expectations, hotels can achieve higher guest satisfaction levels and drive repeat business.

The project also shed light on the geographical origin of guests, providing valuable market insights. Identifying key source markets enables hotels to customize marketing campaigns, target specific regions with high booking volumes, and optimize their promotional efforts. By effectively allocating marketing resources and focusing on high-potential markets, hotels can expand their customer base and increase market share.

The findings highlighted the importance of understanding seasonal patterns, managing cancellations, segmenting the market, implementing dynamic pricing strategies, optimizing distribution channels, and personalizing guest experiences. By leveraging these insights, hotels can make informed decisions, drive positive business impacts, and establish a competitive edge in the dynamic hotel industry. Ultimately, the project demonstrated the significance of data analysis and strategic decision-making in achieving long-term success in the hotel sector.

# **GitHub Link -**

https://github.com/debobratopaul

# **Problem Statement**


**BUSINESS PROBLEM OVERVIEW**

Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay in order to get the best daily rate? This hotel booking dataset can help you explore those questions!
This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. All personally identifying information has been removed from the data.


#### **Define Your Business Objective?**

Identify the important factors that influence bookings in order to optimize revenue and customer satisfaction, and forecast hotel booking demand for different periods of the year so that the hotel can adjust pricing, inventory, and staffing accordingly.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [1]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import plotly.express as px
%matplotlib inline

### Dataset Loading

In [None]:
# Load Dataset

In [2]:
from google.colab import drive
drive.mount('/content/drive')
hotel_df=pd.read_csv("/content/drive/MyDrive/Hotel Bookings.csv")


Mounted at /content/drive


### Dataset First View

In [None]:
# Dataset First Look
hotel_df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
hotel_df.shape


We see that there are 119390 rows and 32 columns in the dataset.

### Dataset Information

In [None]:
# Dataset Info
hotel_df.info()

#### Missing Values/Null Values

In [None]:
# Visualizing the missing values
missing_percentages = hotel_df.isnull().mean() * 100

# Sorting columns by the percentage of missing values in descending order
missing_percentages = missing_percentages.sort_values(ascending=False)

# Bar plot to visualize missing values
plt.figure(figsize=(10, 6))
sns.barplot(x=missing_percentages.index, y=missing_percentages)
plt.xticks(rotation=90)
plt.ylabel('Percentage of Missing Values')
plt.xlabel('Columns')
plt.title('Missing Values in Hotel Booking Dataset')
plt.show()

### What did you know about your dataset?
This data set contains booking information for a city hotel and a resort hotel. It has 119390 rows and 32 columns, of which 31994 rows are duplicates. The children, company, country and agent columns contain null values. The company column is almost 90 percent null, so it is better to drop it before further analysis. The main goal is to analyze historical booking data to determine the optimal pricing strategy for different room types, seasons, and lengths of stay, and to identify pricing patterns and opportunities to maximize revenue and occupancy rates.
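
The duplicate and null figures quoted above can be reproduced with a small helper. The following is a minimal sketch on made-up data: `summarize_quality` and the `toy` frame are hypothetical names used only for illustration, and the same call could be applied to `hotel_df`.

```python
import numpy as np
import pandas as pd


def summarize_quality(frame):
    """Return (number of duplicate rows, percent of nulls for columns that have any)."""
    n_duplicates = int(frame.duplicated().sum())
    null_pct = frame.isnull().mean() * 100
    null_pct = null_pct[null_pct > 0].sort_values(ascending=False)
    return n_duplicates, null_pct


# Toy data standing in for hotel_df (values are invented for illustration)
toy = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'Resort Hotel', 'Resort Hotel'],
    'children': [0.0, 0.0, np.nan, 1.0],
    'company': [40.0, 40.0, np.nan, np.nan],
})

n_duplicates, null_pct = summarize_quality(toy)
print("Duplicate rows:", n_duplicates)
print(null_pct)
```

Columns without nulls are left out of the result, which keeps the summary short when only a few columns are affected.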

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
hotel_df.columns

In [None]:
# Dataset Describe
hotel_df.describe()


### Variables Description

**hotel**: Type of hotel (Resort Hotel, City Hotel)

**is_canceled**: Whether the booking was canceled (True = 1, False = 0)

**lead_time**: Number of days that elapsed between the entering date of the booking into the PMS and the arrival date

**arrival_date_year**: Year of arrival

**arrival_date_month**: Month of arrival

**arrival_date_week_number**: Week number of the arrival date

**arrival_date_day_of_month**: Day of the month of the arrival date

**stays_in_weekend_nights**: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel

**stays_in_week_nights**: Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel

**adults**: Number of adults

**children**: Number of children

**babies**: Number of babies

**meal**: Kind of meal opted for

**country**: Country code

**market_segment**: Which segment the customer belongs to

**distribution_channel**: How the customer accessed the stay: corporate booking, direct, or TA/TO

**is_repeated_guest**: Whether the guest is a repeated guest or is coming for the first time

**previous_cancellations**: Number of previous cancellations by the guest

**previous_bookings_not_canceled**: Count of previous bookings that were not canceled

**reserved_room_type**: Type of room reserved

**assigned_room_type**: Type of room assigned

**booking_changes**: Count of changes made to the booking

**deposit_type**: Deposit type

**agent**: ID of the agent the booking was made through

**days_in_waiting_list**: Number of days in the waiting list

**customer_type**: Type of customer

**required_car_parking_spaces**: Number of car parking spaces required

**total_of_special_requests**: Number of additional special requests

**reservation_status**: Reservation status

**reservation_status_date**: Date of the specific status


### Check Unique Values for each variable.

In [None]:
hotel_df['hotel'].unique()

In [None]:
hotel_df['is_canceled'].unique()

In [None]:
hotel_df['arrival_date_year'].unique()

In [None]:
hotel_df['meal'].unique()


In [None]:
hotel_df['market_segment'].unique()

In [None]:
hotel_df['distribution_channel'].unique()

In [None]:
hotel_df['children'].unique()

In [None]:
hotel_df['is_repeated_guest'].unique()

In [None]:
hotel_df['reserved_room_type'].unique()

In [None]:
hotel_df['deposit_type'].unique()

In [None]:
hotel_df['reservation_status'].unique()

In [None]:
hotel_df['customer_type'].unique()

In [None]:
hotel_df['agent'].unique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# FIRST OF ALL WE WILL CREATE A COPY OF THE ORIGINAL DATAFRAME TO PERFORM THE ANALYSIS
# Creating a copy of dataframe

df = hotel_df.copy()


**Cleaning data**

Cleaning data is a crucial step before EDA, as it removes ambiguous data that can affect the outcome of the EDA.

**Step 1: Removing duplicate rows if any**

In [None]:
# Show the number of duplicate rows (and the column count)
df[df.duplicated()].shape

In [None]:
# Dropping duplicate values
df.drop_duplicates(inplace = True)
df.shape

**Step 2: Handling missing values.**

In [None]:
df[['company','agent']] = df[['company','agent']].fillna(0)

The 'children' column already uses 0 to mean that no children were present in the group of customers who made the booking. So the 'nan' values are missing values caused by errors in recording the data.

We will replace the null values in this column with the mean value of children.

In [None]:
# Assign the result back instead of using inplace=True on a column selection
df['children'] = df['children'].fillna(df['children'].mean())
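
One point worth checking with mean imputation on a count column: the mean is usually fractional, so a later cast to `int64` truncates the filled value. The sketch below uses made-up values (the `children` series here is illustrative, not taken from the dataset) and compares mean and median filling after the cast.

```python
import numpy as np
import pandas as pd

# Made-up counts with two missing entries
children = pd.Series([0, 0, 0, 2, np.nan, 1, 0, np.nan], dtype='float64')

mean_filled = children.fillna(children.mean())      # fills with 0.5
median_filled = children.fillna(children.median())  # fills with 0.0

# Casting to int64 truncates the fractional mean towards zero
mean_as_int = mean_filled.astype('int64')
median_as_int = median_filled.astype('int64')

print(mean_filled.iloc[4], mean_as_int.iloc[4], median_as_int.iloc[4])
```

In this toy case both strategies end up with the same integer column, so filling with the median (or the mode) may state the intent more directly when the column is cast to integers afterwards.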

The next column with missing values is 'country'. This column represents the country of origin of the customer. Since this column has a string datatype, we will replace the missing values with 'others'.

In [None]:
# Assign the result back instead of using inplace=True on a column selection
df['country'] = df['country'].fillna('others')

There are some rows with total number of adults, children or babies equal to zero. So we will remove such rows.

In [None]:
df[df['adults']+df['babies']+df['children'] == 0].shape

In [None]:
df.drop(df[df['adults']+df['babies']+df['children'] == 0].index, inplace = True)

In [None]:
df=df.drop("company",axis=1)

The company column is dropped because more than 80 percent of its values are null.

**Step 3: Converting columns to appropriate datatypes.**

In [None]:
df[['children', 'agent']] = df[['children', 'agent']].astype('int64')

**Step 4: Adding important columns.**

In [None]:
# Adding total staying days in hotels
df['total_stay'] = df['stays_in_weekend_nights']+df['stays_in_week_nights']

# Adding total people num as column, i.e. total people num = num of adults + children + babies
df['total_people'] = df['adults']+df['children']+df['babies']

**Analysis**

In [None]:
# Count the number of bookings for each hotel type
hotel_type_bookings = df['hotel'].value_counts()

# Group the data by month and count the number of bookings
monthly_bookings = df.groupby('arrival_date_month').size()

# Find the month with the highest number of bookings
highest_month = monthly_bookings.idxmax()

# Calculate the average lead time for bookings
avg_lead_time = df['lead_time'].mean()

# Calculate the average length of stay for bookings
avg_length_of_stay = df['total_stay'].mean()

# Calculate the average number of booking changes
average_booking_changes = df['booking_changes'].mean()

# Calculate the average daily rate (ADR)
average_adr = df['adr'].mean()

# Print the results
print("Each hotel type booking-",hotel_type_bookings)
print("The month with the highest number of bookings is:", highest_month)
print("Average lead time for bookings:", avg_lead_time)
print("Average length of stay for bookings:", avg_length_of_stay)
print("Average number of booking changes:", average_booking_changes)
print("Average Daily Rate (ADR):", average_adr)
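
Grouping by `arrival_date_month` returns the months in alphabetical order, which is fine for `idxmax()` but awkward when reading seasonal patterns. Below is a minimal sketch, on made-up data, of putting the monthly counts in calendar order; `MONTH_ORDER` and the `toy` frame are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Calendar order of the month names used in arrival_date_month
MONTH_ORDER = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

# Toy data standing in for df (values are invented for illustration)
toy = pd.DataFrame({
    'arrival_date_month': ['August', 'July', 'August', 'January', 'July', 'August']
})

# Count bookings per month, then reorder chronologically; absent months get 0
monthly_bookings = (
    toy['arrival_date_month']
    .value_counts()
    .reindex(MONTH_ORDER, fill_value=0)
)

print(monthly_bookings)
print("Busiest month:", monthly_bookings.idxmax())
```

The reordered series can be passed straight to a line or bar plot so that the x-axis runs from January to December.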


To find the overall cancellation rate

In [None]:
# Calculate the total number of bookings
total_bookings = df.shape[0]

# Calculate the number of canceled bookings
canceled_bookings = df['is_canceled'].sum()

# Calculate the cancellation rate
cancellation_rate = canceled_bookings / total_bookings
# Display the cancellation rate
print("Total bookings cancelled:",canceled_bookings)
print("Overall Cancellation Rate: {:.2%}".format(cancellation_rate))
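
Because `is_canceled` is coded as 0/1, its mean is the same ratio as canceled bookings divided by total bookings, and the same idea extends to a per-hotel breakdown. This is a small sketch on invented data; the `toy` frame is an assumption used only for illustration.

```python
import pandas as pd

# Toy data standing in for df (values are invented for illustration)
toy = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'City Hotel', 'Resort Hotel', 'Resort Hotel'],
    'is_canceled': [1, 0, 1, 0, 0],
})

# Overall rate: mean of a 0/1 column equals canceled / total
overall_rate = toy['is_canceled'].mean()
manual_rate = toy['is_canceled'].sum() / toy.shape[0]

# Cancellation rate for each hotel type
rate_by_hotel = toy.groupby('hotel')['is_canceled'].mean()

print("Overall Cancellation Rate: {:.2%}".format(overall_rate))
print(rate_by_hotel)
```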

Total bookings in each year for weekends and week days


In [None]:
# Convert the arrival_date_year column to datetime
df['arrival_date_year'] = pd.to_datetime(df['arrival_date_year'], format='%Y')

# Extract the year from the datetime
df['arrival_year'] = df['arrival_date_year'].dt.year

# Create a new column for weekends (stays_in_weekend_nights > 0)
df['weekend_booking'] = df['stays_in_weekend_nights'] > 0

# Create a new column for weekdays (stays_in_week_nights > 0)
df['weekday_booking'] = df['stays_in_week_nights'] > 0

# Group the data by arrival year and calculate the total bookings for weekends and weekdays

booking_counts = df.groupby('arrival_year')[['weekend_booking', 'weekday_booking']].sum()
booking_counts

Average length of stay for hotel guests?

In [None]:
# Calculate the total length of stay for each booking
df['total_stays'] = df['stays_in_weekend_nights'] + df['stays_in_week_nights']

# Calculate the average length of stay
average_length_of_stay = df['total_stays'].mean()

# Print the average length of stay
print("Average length of stay:", average_length_of_stay)

Most common room type reserved by guests and the most common room type assigned


In [None]:
# Most common room type reserved by guests
most_common_reserved = df['reserved_room_type'].value_counts().idxmax()

# Most common room type assigned
most_common_assigned = df['assigned_room_type'].value_counts().idxmax()

print("Most common room type reserved by guests:", most_common_reserved)
print("Most common room type assigned:", most_common_assigned)

Bookings made by repeated guests, and their booking preferences

In [None]:
# Filter the dataframe for repeated guests
repeated_guests = df[df['is_repeated_guest'] == 1]

# Count the number of bookings made by repeated guests
num_bookings_repeated_guests = len(repeated_guests)

# Analyze the booking preferences of repeated guests
booking_preferences_repeated_guests = repeated_guests['reserved_room_type'].value_counts()

print("Number of bookings made by repeated guests:", num_bookings_repeated_guests)
print("Booking preferences of repeated guests:")
print(booking_preferences_repeated_guests)

What all manipulations have you done and insights you found?

# **Manipulations**

**Conversion of children and agent columns to int:** The children and agent columns have been cast to int64 after their null values were filled. The arrival_date_year column has also been converted to datetime format to enable further analysis based on the year of arrival.

**Creation of weekend_booking and weekday_booking columns:** New columns weekend_booking and weekday_booking have been created based on the values of stays_in_weekend_nights and stays_in_week_nights. These columns indicate whether a booking includes weekends or weekdays.

**Creating columns total_stay and total_people:** New columns total_stay and total_people have been added.

**Grouping and aggregation:** The data has been grouped by arrival_year to calculate the total bookings for weekends and weekdays.

# **Insights**


**Hotel with higher number of bookings:** By analyzing the counts of each hotel (city hotel or resort hotel), we can determine which hotel has a higher number of bookings. This information is valuable for understanding the popularity and demand for each type of hotel.

**Cancellation rate for hotel bookings:** The overall cancellation rate can be calculated by analyzing the is_canceled column. This insight helps understand the level of booking cancellations, which can impact revenue and resource planning.

**Average lead time for hotel bookings:** Calculating the average lead time provides an understanding of how far in advance guests typically make their bookings. This information can assist in revenue management and resource allocation.

**Month or season with the highest number of bookings:** By examining the counts of bookings for each month or season, we can identify the period with the highest demand. This insight aids in revenue forecasting, marketing campaigns, and resource planning.

**Average length of stay for hotel guests:** The average length of stay reveals how long guests typically stay at the hotel. This information is crucial for revenue forecasting, capacity planning, and resource management.

These insights provide valuable information about booking patterns, cancellation rates, lead times, and guest preferences. They can assist in optimizing revenue, pricing strategies, resource allocation, and overall customer satisfaction in the hotel industry.

# **4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables**

# **UNIVARIATE**

#### Chart - 1

**Q1) Which agent makes most no. of bookings?**

In [None]:
# Count bookings per agent; value_counts already sorts in descending order
d1 = df['agent'].value_counts().rename_axis('agent').reset_index(name = 'num_of_bookings')
d1 = d1[d1['agent'] != 0]                                      # 0 represents that booking is not made by an agent
d1 = d1.head(10)                                               # Selecting top 10 performing agents
plt.figure(figsize = (10,5))
sns.barplot(x = 'agent', y = 'num_of_bookings', data = d1, order = d1['agent'])
plt.show()

Agent no. 9 has made the most bookings.

##### 1. Why did you pick the specific chart?

A bar chart was picked because it allows easy comparison between agents. The length of each bar represents the number of bookings, which makes the booking numbers of different agents straightforward to compare. Agents are categorical data, since they are distinct entities, and bar charts are commonly used to visualize categorical data, with one bar per category.

##### 2. What is/are the insight(s) found from the chart?

The insights found from the chart are,

The bar chart allows you to identify the agents with the highest number of bookings. These agents are likely the most successful in terms of generating bookings for the hotel.

The chart can highlight any significant differences in the number of bookings between agents. You can observe if there are a few agents that dominate the bookings, while others have comparatively fewer bookings.

The chart can be used to evaluate the performance of different agents. You can compare the number of bookings against predetermined targets or benchmarks to assess each agent's effectiveness.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can help create a positive business impact, as they provide information that can be used to optimize business strategies and improve performance.

Some insights could also point to negative growth:

If a significant number of agents have very low or zero bookings in the chart, this could indicate that a considerable portion of the agent workforce is not contributing to bookings, leading to negative growth for the business.

If you have data covering multiple time periods and you observe a declining trend in bookings made by agents, this could be a sign of negative growth. It might be due to changes in market conditions, consumer preferences, or ineffective agent strategies.



#### Chart - 2


**Q2) Which room type is in most demand and which room type generates highest adr?**

In [None]:
# Get the count of each room type
room_type_demand = df['reserved_room_type'].value_counts()

# Sort room types by demand in descending order
room_type_demand = room_type_demand.sort_values(ascending=False)

# Set up the figure and axes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 8))

# Bar plot for room type demand
sns.barplot(x=room_type_demand.index, y=room_type_demand.values, ax=ax1, palette='viridis')
ax1.set_xlabel('Room Type')
ax1.set_ylabel('Demand')
ax1.set_title('Room Type Demand')

# Box plot for ADR distribution by room type
sns.boxplot(x=df['assigned_room_type'], y=df['adr'], ax=ax2, palette='Blues', width=0.9)
ax2.set_xlabel('Room Type')
ax2.set_ylabel('ADR')
ax2.set_title('ADR Distribution by Room Type')

# Rotate x-axis labels for better visibility
for ax in (ax1, ax2):
    ax.tick_params(axis='x', rotation=45)

# Adjust the layout and display the plot
plt.tight_layout()
plt.show()

The most demanded room type is A, but room types H, G, F and C have better ADR.

##### 1. Why did you pick the specific chart?

The reasons for these charts are:

Bar charts are straightforward and easy to understand, making them accessible to a wide range of audiences. The simplicity of the chart allows viewers to quickly grasp the key insights or patterns.

Bar charts can have a strong visual impact, especially when there are significant differences or variations between the categories being compared. This can help emphasize important findings or highlight trends in the data.


The box plot allows for easy visual comparison of the ADR distributions among different room types. Each box represents the interquartile range (IQR) of the data, providing insights into the spread and variability of ADR for each room type.

 The box plot also includes whiskers that extend to the minimum and maximum values within a certain range. This allows for the identification of outliers, which can be important in understanding any extreme values or unusual patterns in ADR for specific room types.
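
The quantities a box plot draws can be computed directly. The sketch below assumes the common convention, which is also the seaborn default, that whiskers reach the most extreme points within 1.5 times the IQR of the quartiles. `box_stats` and the ADR values are hypothetical and only illustrate the calculation.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Compute quartiles, whisker ends and outliers as a box plot would show them."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1

    # Fences: points outside these limits are drawn as outliers
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]

    return {
        'q1': q1,
        'median': median,
        'q3': q3,
        'iqr': iqr,
        'whisker_low': inside.min(),    # lowest point still inside the fences
        'whisker_high': inside.max(),   # highest point still inside the fences
        'outliers': values[(values < low_fence) | (values > high_fence)],
    }


# Made-up ADR values for one room type, with one extreme rate
adr_sample = [50, 60, 70, 80, 90, 100, 110, 400]
stats = box_stats(adr_sample)
print(stats)
```

In this toy sample the rate of 400 falls above the upper fence, so it would appear as an individual outlier point rather than stretching the whisker.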

##### 2. What is/are the insight(s) found from the chart?

The most demanded room type is A, but room types H, G and C have better ADR. Hotels should increase the number of rooms of types A and H to maximise revenue.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can help create a positive business impact.

Understanding the most in-demand room type and the room type with the highest ADR allows the hotel to align pricing and inventory strategies to maximize revenue generation.

Meeting the demand for the most in-demand room type can lead to improved customer satisfaction and loyalty. With insights on room demand and ADR, the hotel can also allocate resources effectively to meet customer needs and optimize operational efficiency.

Negative Growth Insight:

If the most in-demand room type is limited in availability or if the room type with the highest ADR becomes less desirable, the hotel may face challenges in maintaining growth. To mitigate this, the hotel should continuously monitor demand patterns, diversify its room offerings, and adapt pricing strategies accordingly. To sum up, these insights can have a positive impact on the business by optimizing revenue, enhancing customer satisfaction, and improving resource allocation. However, it is essential to continuously evaluate market dynamics and adapt strategies to avoid potential negative impacts and foster sustainable growth.

#### Chart - 3

**Q3) Which meal type is the most preferred by customers?**

In [None]:
plt.figure( figsize=(10, 8))

sns.countplot(x = df['meal'])
plt.show()

Most preferred meal type is BB (Bed and breakfast).

##### 1. Why did you pick the specific chart?

The specific chart, a bar chart, was selected for the visualization of the most preferred meals due to its suitability for displaying the frequency or count of categorical data (meal types in this case).

##### 2. What is/are the insight(s) found from the chart?

The most apparent insight from the chart is the meal type with the highest bar, which is the most preferred meal among the available options. Similarly, the meal type with the lowest bar is the least preferred option. Meal type BB is the most preferred and FB is the least preferred.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights will help create a positive business impact in the following ways:

Identifying the most preferred meal type can help the business focus on promoting and optimizing the popular choices. By tailoring the menu to meet customer preferences, the business can increase customer satisfaction and repeat visits.

Knowing the meal types that customers prefer the most can guide marketing efforts. Highlighting these popular meals in promotional materials and advertisements can attract more customers and increase sales.

Yes, there are insights that might lead to negative growth:

If the chart shows that certain meal types have very low or no demand, it could indicate that these items are not well-received by customers. Continuously offering unpopular choices may lead to decreased customer satisfaction and lower overall sales.

If the chart reveals that all meal types are distributed evenly, it might suggest that the market is saturated with similar offerings. This saturation could lead to increased competition and lower profit margins, impacting growth negatively.

# **Hotel wise analysis**

#### Chart - 4

**Q1) What is percentage of bookings in each hotel?**

In [None]:
# Assuming you already have the DataFrame 'df' with the required columns, including 'hotel' and 'is_canceled'

# Calculate the total number of bookings
total_bookings = len(df)

# Group by 'hotel' and count the number of bookings for each hotel
hotel_bookings = df.groupby('hotel')['is_canceled'].count()

# Calculate the percentage of bookings for each hotel
percentage_bookings = (hotel_bookings / total_bookings) * 100

# Set the size of the plot
plt.figure(figsize=(8, 6))

# Create the bar chart with different colors
colors = ['skyblue', 'lightgreen']  # You can add more colors if you have more hotels
percentage_bookings.plot(kind='bar', color=colors, edgecolor='black')

# Set the labels and title
plt.xlabel('Hotel')
plt.ylabel('Percentage of Bookings')
plt.title('Percentage of Bookings in Each Hotel')

# Show the plot
plt.tight_layout()
plt.show()

Around 60% bookings are for City hotel and 40% bookings are for Resort hotel.
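
The same percentages can be obtained in one step with `value_counts(normalize=True)`. This is a minimal sketch on invented data; the `toy_hotels` series is an assumption and its split is not the split of the real dataset.

```python
import pandas as pd

# Toy data standing in for df['hotel'] (values are invented for illustration)
toy_hotels = pd.Series(['City Hotel'] * 3 + ['Resort Hotel'], name='hotel')

# normalize=True returns shares that sum to 1; multiply by 100 for percentages
percentage_bookings = toy_hotels.value_counts(normalize=True) * 100

print(percentage_bookings)
```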

##### 1. Why did you pick the specific chart?

The bar chart is selected for finding the percentage of booking for each hotel because it is an effective and intuitive way to display and compare the percentage values for different categories (hotels, in this case).

##### 2. What is/are the insight(s) found from the chart?

The bar chart shows the percentage of bookings in each hotel, providing insight into the distribution of bookings between the two hotels. We can see that city hotel has 60 percent bookings and 40 percent bookings are from resort hotel.

 The hotel with the higher percentage of bookings is the most preferred among the customers. The chart visually highlights the preferred hotel based on the height of the bar.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the bar chart of the percentage of bookings in each hotel can help create a positive business impact. For example, identifying the most preferred hotel from the chart can help the business focus its marketing efforts on promoting and further enhancing the offerings of that hotel.


An insight that could lead to negative growth is that if the chart shows a significantly lower percentage of bookings for one hotel compared to the other, it might indicate that the less preferred hotel is facing challenges. This could lead to negative growth for that particular hotel and might require further investigation to understand the reasons behind its unpopularity.

#### Chart - 5

**Q2) What is preferred stay length in each hotel?**

In [None]:
not_canceled = df[df['is_canceled'] == 0]
s1 = not_canceled[not_canceled['total_stay'] < 15]
plt.figure(figsize = (10,5))
sns.countplot(x = s1['total_stay'], hue = s1['hotel'])
plt.show()

Most common stay length is less than 4 days and generally people prefer City hotel for short stay, but for long stays, Resort Hotel is preferred.

**Q3) Which hotel has longer waiting time?**

In [None]:
# Group by 'hotel' and calculate the average days in waitlist for each hotel
hotel_waitlist_days = df.groupby('hotel')['days_in_waiting_list'].mean()

# Find the hotel with the highest days in waitlist
hotel_with_highest_waitlist_days = hotel_waitlist_days.idxmax()

# Set the size of the plot
plt.figure(figsize=(8, 6))

# Create the bar chart with different colors for each hotel
colors = ['blue' if hotel != hotel_with_highest_waitlist_days else 'orange' for hotel in hotel_waitlist_days.index]
hotel_waitlist_days.plot(kind='bar', color=colors, edgecolor='black')

# Set the labels and title
plt.xlabel('Hotel')
plt.ylabel('Average Days in Waitlist')
plt.title('Average Days in Waitlist for Each Hotel')

# Show the plot
plt.tight_layout()
plt.show()


City hotel has significantly longer waiting time, hence City Hotel is much busier than Resort Hotel.

##### 1. Why did you pick the specific chart?

The bar chart was selected for visualizing the preferred stay in length and longer waiting time because of its effectiveness in representing and comparing the average values of these two numerical variables across different hotels.Bar charts were selected for these cases because they provide a clear and easy-to-interpret visualization of numerical data (average stay length and waiting time) across different hotel

##### 2. What is/are the insight(s) found from the chart?

The insights gained are:

The most common stay length is less than 4 days. Guests generally prefer City Hotel for short stays, while Resort Hotel is preferred for long stays.

City Hotel has a significantly longer average waiting time, which suggests that City Hotel is much busier than Resort Hotel.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the two bar charts can create a positive business impact by informing strategic decisions and improving operations in the following ways:

Understanding that the most common stay lengths are less than 4 days, that City Hotel is preferred for short stays, and that Resort Hotel is preferred for long stays allows the business to tailor its marketing and service offerings to different customer segments.

Recognizing that City hotels have significantly longer waiting times indicates higher demand and busier operations. This insight can help hotel management optimize resource allocation, such as staffing levels during peak times, to ensure smooth check-in processes and improved guest experiences.

#### Chart - 6

**Q4) How often do guests change their bookings?**


In [None]:
# Chart - 6 visualization code
# Create a histogram of booking changes
plt.hist(df['booking_changes'], bins=20, edgecolor='black')

# Set x-axis label, y-axis label, and title
plt.xlabel('Number of Booking Changes')
plt.ylabel('Frequency')
plt.title('Frequency of Booking Changes')

# Display the histogram
plt.show()


Among bookings that are changed at all, one or two changes are the most common.
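A histogram with 20 bins makes the exact shares hard to read. As a rough sketch, the distribution can also be tabulated directly; the `changes` Series below is invented toy data and `change_shares` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical toy values for the booking_changes column
changes = pd.Series([0, 0, 0, 0, 0, 0, 1, 1, 2, 5], name="booking_changes")

def change_shares(series):
    """Percentage of bookings for each number of changes, ordered by change count."""
    pct = series.value_counts(normalize=True).sort_index() * 100
    return pct.round(2)

shares = change_shares(changes)
changed_pct = (changes > 0).mean() * 100  # share of bookings changed at least once
print(shares)
print(changed_pct)
```

On the real data, `change_shares(df['booking_changes'])` would give the equivalent table.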

##### 1. Why did you pick the specific chart?

The histogram graph is selected to visualize the frequency distribution of a single variable.

##### 2. What is/are the insight(s) found from the chart?

The histogram shows the distribution of the number of booking changes. Among bookings that are changed, one or two changes occur most often.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights will help create a positive business impact. They can be used to understand guest behavior, improve the booking process, and identify areas for further investigation or analysis.

#### Chart - 7


**Q5) Which hotel has the higher booking cancellation rate?**

In [None]:
# Selecting and counting number of cancelled bookings for each hotel.
cancelled_data = df[df['is_canceled'] == 1]
cancel_grp = cancelled_data.groupby('hotel')
D1 = pd.DataFrame(cancel_grp.size()).rename(columns = {0:'total_cancelled_bookings'})

# Counting total number of bookings for each type of hotel
grouped_by_hotel = df.groupby('hotel')
total_booking = grouped_by_hotel.size()
D2 = pd.DataFrame(total_booking).rename(columns = {0: 'total_bookings'})
D3 = pd.concat([D1,D2], axis = 1)

# Calculating cancel percentage
D3['cancel_%'] = round((D3['total_cancelled_bookings']/D3['total_bookings'])*100,2)
D3

In [None]:
plt.figure(figsize = (10,5))
sns.barplot(x = D3.index, y = D3['cancel_%'])
plt.show()

Almost 30% of City Hotel bookings were cancelled.
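Since `is_canceled` is a 0/1 flag, its mean per group is already the cancellation rate, so the two-table computation above can be condensed. A minimal sketch on invented toy data (`bookings` and `cancel_rate_by` are hypothetical names):

```python
import pandas as pd

# Hypothetical toy data: 3 of 10 City bookings and 2 of 10 Resort bookings cancelled
bookings = pd.DataFrame({
    "hotel": ["City Hotel"] * 10 + ["Resort Hotel"] * 10,
    "is_canceled": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] + [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
})

def cancel_rate_by(frame, key):
    """Cancellation rate in percent per group (mean of the 0/1 is_canceled flag)."""
    return (frame.groupby(key)["is_canceled"].mean() * 100).round(2)

rates = cancel_rate_by(bookings, "hotel")
print(rates)
```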

##### 1. Why did you pick the specific chart?

The bar chart was selected to visualize the cancellation rate because it makes it easy to compare a single percentage across the two hotels.


##### 2. What is/are the insight(s) found from the chart?

The insight from the chart is that City Hotel has a higher percentage of cancellations than Resort Hotel, which may suggest that Resort Hotel customers are more loyal.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights will help create a positive business impact. They can guide decision-making, marketing strategies, and resource allocation to enhance customer satisfaction and optimize business operations in the hotel industry.

## <b>  Distribution Channel wise Analysis </b>

#### Chart - 8

**Q1) Which is the most common channel for booking hotels?**

In [None]:
# Chart - 8 visualization code
# Share of bookings (in %) made through each distribution channel
group_by_dc = df.groupby('distribution_channel')
d1 = pd.DataFrame(round((group_by_dc.size()/df.shape[0])*100,2)).reset_index().rename(columns = {0: 'Booking_%'})
plt.figure(figsize = (8,8))
data = d1['Booking_%']
labels = d1['distribution_channel']
# Size explode to the number of channels instead of hard-coding 5
plt.pie(x=data, autopct="%.2f%%", explode=[0.05]*len(d1), labels=labels, pctdistance=0.5)
plt.title("Booking % by distribution channels", fontsize=14);



TA/TO is the most common channel for booking hotels.

**Q2) Which channel is mostly used for early booking of hotels?**

In [None]:
group_by_dc = df.groupby('distribution_channel')
d2 = pd.DataFrame(round(group_by_dc['lead_time'].median(),2)).reset_index().rename(columns = {'lead_time': 'median_lead_time'})
plt.figure(figsize = (7,5))
sns.barplot(x = d2['distribution_channel'], y = d2['median_lead_time'])
plt.show()

TA/TO is mostly used for planning Hotel visits ahead of time.

##### 1. Why did you pick the specific chart?

The pie chart is commonly used to represent the distribution or composition of a categorical variable. It is an effective visualization when you want to show the proportions or percentages of different categories as parts of a whole, which here is the share of hotel bookings made through each distribution channel.

##### 2. What is/are the insight(s) found from the chart?

The insight from the charts is that TA/TO is mostly used for planning hotel visits ahead of time, while other channels are preferred for short-notice visits.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights will help create a positive business impact. Understanding that TA/TO (Travel Agent/Tour Operator) is commonly used for planning hotel visits ahead of time can guide the hotel's marketing efforts. The business can collaborate with travel agents, tour operators, and online travel agencies to increase visibility and attract more customers for advance bookings.

Insights Leading to Negative Growth:

If the hotel heavily relies on TA/TO bookings and neglects other channels, it may lead to negative growth during periods when travel agent or tour operator bookings decline. Diversifying the booking sources can mitigate this risk.

#### Chart - 9

**Q3) Which distribution channel brings better revenue generating deals for hotels?**

In [None]:
# Chart - 9 visualization code
group_by_dc_hotel = df.groupby(['distribution_channel', 'hotel'])
# Use the built-in .mean() rather than .agg(np.mean), which newer pandas versions deprecate
d1 = pd.DataFrame(round(group_by_dc_hotel['adr'].mean(),2)).reset_index().rename(columns = {'adr': 'avg_adr'})
plt.figure(figsize = (7,5))
sns.barplot(x = d1['distribution_channel'], y = d1['avg_adr'], hue = d1['hotel'])
plt.ylim(40,140)
plt.show()


The GDS channel brings higher revenue-generating deals for City Hotel.

Resort Hotel gets its more revenue-generating deals through the Direct and TA/TO channels.

##### 1. Why did you pick the specific chart?

The bar chart was selected because it makes it easy to compare the average ADR of each distribution channel, side by side for the two hotels.

##### 2. What is/are the insight(s) found from the chart?

The GDS channel brings higher revenue-generating deals for City Hotel, even though most bookings come via TA/TO. City Hotel can work to increase its outreach on GDS channels to win more of these higher-value deals.

Resort Hotel gets its more revenue-generating deals through the Direct and TA/TO channels. Resort Hotel needs to increase its outreach on the GDS channel to increase revenue.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the analysis can help both City Hotel and Resort Hotel identify and implement targeted strategies for revenue growth and customer satisfaction. By optimizing their presence across various booking channels and tailoring services to meet the needs of different customer segments, both hotels can create a positive business impact, increase revenue, and strengthen their competitive position in the market.

Let us try to understand what causes people to cancel their bookings.

#### Chart - 10

**Q4) Which significant distribution channel has highest cancellation percentage?**

In [None]:
d1 = pd.DataFrame((group_by_dc['is_canceled'].sum()/group_by_dc.size())*100).drop(index = 'Undefined').rename(columns = {0: 'Cancel_%'})
plt.figure(figsize = (10,5))
sns.barplot(x = d1.index, y = d1['Cancel_%'])
plt.show()

TA/TO has the highest booking cancellation percentage: roughly 30% of bookings made via TA/TO are cancelled.
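Dropping `'Undefined'` by name works for this dataset, but "significant" channels can also be selected by booking volume. The sketch below assumes a hypothetical `min_bookings` threshold and uses invented toy data; `cancel_pct_significant` is not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data: one tiny 'Undefined' channel with a single booking
bookings = pd.DataFrame({
    "distribution_channel": ["TA/TO"] * 6 + ["Direct"] * 4 + ["Undefined"],
    "is_canceled": [1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1],
})

def cancel_pct_significant(frame, min_bookings=3):
    """Cancellation % per channel, keeping only channels with at least min_bookings rows."""
    stats = frame.groupby("distribution_channel")["is_canceled"].agg(
        bookings="count",   # number of bookings in the channel
        cancel_pct="mean",  # mean of the 0/1 flag is the cancellation fraction
    )
    stats["cancel_pct"] = (stats["cancel_pct"] * 100).round(2)
    return stats[stats["bookings"] >= min_bookings]

channel_stats = cancel_pct_significant(bookings)
print(channel_stats)
```

Keeping the `bookings` column next to the percentage makes it visible how much data each rate rests on.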

Let us look at what drives customers to cancel their room bookings.

One possibility is that a longer waiting period or a longer lead time causes cancellations. Let us check that.

In [None]:
# Select bookings with a non-zero waiting time
waiting_bookings = df[df['days_in_waiting_list'] !=0]
fig, axes = plt.subplots(1, 2, figsize=(18, 8))
sns.kdeplot(ax=axes[0],x = 'days_in_waiting_list', hue = 'is_canceled' , data = waiting_bookings)
sns.kdeplot(ax = axes[1], x = df['lead_time'], hue = df['is_canceled'])
plt.show()

We see that most cancelled bookings have a waiting period of less than 150 days, but so do most bookings that are not cancelled. This suggests that the waiting period has no clear effect on cancellations.

Lead time likewise shows no clear effect on cancellations, as the curves for cancelled and non-cancelled bookings are similar for lead time too.
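Reading two overlapping density curves by eye can be misleading, so a numeric check is a useful complement. The sketch below compares the median lead time per cancellation status and computes the Pearson correlation between `lead_time` and the 0/1 `is_canceled` flag. The data is invented purely to show the mechanics and says nothing about the real dataset; `lead_time_vs_cancellation` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical toy data, not taken from the hotel bookings dataset
bookings = pd.DataFrame({
    "lead_time": [5, 10, 20, 40, 60, 90, 120, 200],
    "is_canceled": [0, 0, 0, 1, 0, 1, 1, 1],
})

def lead_time_vs_cancellation(frame):
    """Return median lead time per cancellation status and the lead_time/is_canceled correlation."""
    medians = frame.groupby("is_canceled")["lead_time"].median()
    corr = frame["lead_time"].corr(frame["is_canceled"])  # Pearson by default
    return medians, corr

medians, corr = lead_time_vs_cancellation(bookings)
print(medians)
print(round(corr, 3))
```

Similar medians and a correlation near zero on the real `df` would support the reading of the KDE plots; a clear gap would call for a closer look.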

Now we will check whether not being allotted the requested room type is a cause of booking cancellations.

In [None]:
# Flag bookings where the assigned room type differs from the reserved one (1 = different room)
df['same_room_not_alloted'] = (df['reserved_room_type'] != df['assigned_room_type']).astype(int)
grp_by_canc = df.groupby('is_canceled')

D1 = pd.DataFrame((grp_by_canc['same_room_not_alloted'].sum()/grp_by_canc.size())*100).rename(columns = {0: 'same_room_not_alloted_%'})
plt.figure(figsize = (10,7))
sns.barplot(x = D1.index, y = D1['same_room_not_alloted_%'])
plt.show()

We see that not getting the requested room is not the cause of cancellations. A significant percentage of bookings are not cancelled even when a different room is assigned.
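The chart above shows the share of room mismatches within each cancellation group. The question can also be asked the other way round: how often are bookings cancelled when the room does or does not match? A minimal sketch with `pd.crosstab` on invented toy data (`cancel_rate_by_room_match` is a hypothetical helper):

```python
import pandas as pd

# Hypothetical toy data: the last two bookings were assigned a different room type
bookings = pd.DataFrame({
    "reserved_room_type": ["A", "A", "A", "A", "D", "D", "A", "A"],
    "assigned_room_type": ["A", "A", "A", "A", "D", "D", "D", "C"],
    "is_canceled":        [1,   1,   0,   0,   1,   0,   0,   0],
})

def cancel_rate_by_room_match(frame):
    """Row-normalised cancellation percentages for same-room and different-room bookings."""
    mismatch = frame["reserved_room_type"] != frame["assigned_room_type"]
    room = mismatch.map({False: "same room", True: "different room"})
    table = pd.crosstab(room, frame["is_canceled"], normalize="index") * 100
    return table.rename(columns={0: "not_canceled_%", 1: "canceled_%"}).round(2)

room_table = cancel_rate_by_room_match(bookings)
print(room_table)
```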



Let's see whether not getting the requested room affects the ADR.

In [None]:
#  Create a new column to indicate whether the guests got the same room they reserved
df['got_same_room'] = (df['reserved_room_type'] == df['assigned_room_type']).astype(int)

# Group the data based on whether the guests got the same room or not
grouped_data = df.groupby('got_same_room')

# Calculate the average ADR for each group
average_adr = grouped_data['adr'].mean()

#  Visualize the results using a bar chart
plt.figure(figsize=(8, 6))
average_adr.plot(kind='bar', color='skyblue', edgecolor='black')

# Set the labels and title
plt.xlabel('Got Same Room')
plt.ylabel('Average Daily Rate (ADR)')
plt.title('Impact of Getting the Same Room on ADR')

# Show the plot
plt.tight_layout()
plt.show()

So not getting the requested room does affect the ADR: guests who did not get the same room paid a slightly lower ADR, with a few exceptions.

##### 1. Why did you pick the specific chart?

A kernel density estimate (KDE) plot was chosen because it compares the distribution of `days_in_waiting_list` for waiting bookings (bookings with non-zero waiting days) by cancellation status (`is_canceled`). It displays the estimated probability density of the waiting days for both cancelled and non-cancelled bookings. Bar charts were used for the channel-wise cancellation percentage, the room-allotment comparison, and the ADR comparison.

##### 2. What is/are the insight(s) found from the chart?

TA/TO has the highest booking cancellation percentage: roughly 30% of bookings made via TA/TO are cancelled.

Most cancelled bookings have a waiting period of less than 150 days, but so do most bookings that are not cancelled. This suggests that the waiting period has no clear effect on cancellations.

Lead time likewise shows no clear effect on cancellations, as the curves for cancelled and non-cancelled bookings are similar for lead time too.

From the last chart we see that not getting the requested room does affect the ADR: guests who did not get the same room paid a slightly lower ADR, with a few exceptions.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Several findings point to areas that may lead to negative growth if not addressed appropriately:

The high cancellation rate of 30% for bookings via Travel Agent/Tour Operator (TA/TO) may negatively impact revenue and occupancy. Frequent cancellations can lead to lost revenue opportunities and difficulty in managing room inventory effectively.

The observation that waiting period (days in waiting list) has no significant effect on the cancellation of bookings may indicate a need to review the hotel's reservation management process. If guests frequently cancel their bookings regardless of the waiting period, it could indicate issues with reservation policies, pricing, or customer satisfaction.








# **BIVARIATE**

# **Time wise analysis**

#### Chart - 11

**Which month has the highest booking and also the revenue?**

In [None]:
# Which are the busiest months?
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
d_month = df['arrival_date_month'].value_counts().reset_index()
d_month.columns = ['months', 'Number of guests']
# Order the months chronologically and assign the result so the sort is kept
d_month['months'] = pd.Categorical(d_month['months'], categories=months, ordered=True)
d_month = d_month.sort_values('months').reset_index(drop=True)


data_resort = df[(df['hotel'] == 'Resort Hotel') & (df['is_canceled'] == 0)]
data_city = df[(df['hotel'] == 'City Hotel') & (df['is_canceled'] == 0)]
resort_hotel = data_resort.groupby(['arrival_date_month'])['adr'].mean().reset_index()
city_hotel=data_city.groupby(['arrival_date_month'])['adr'].mean().reset_index()
final_hotel = resort_hotel.merge(city_hotel, on = 'arrival_date_month')
final_hotel.columns = ['month', 'price_for_resort', 'price_for_city_hotel']
final_hotel

resort_guest = data_resort['arrival_date_month'].value_counts().reset_index()
resort_guest.columns=['month','no of guests']
resort_guest

city_guest = data_city['arrival_date_month'].value_counts().reset_index()
city_guest.columns=['month','no of guests']
city_guest

final_guest=resort_guest.merge(city_guest, on = 'month')
final_guest.columns=['month','no of guests in resort','no of guest in city hotel']
final_guest
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
final_guest['month'] = pd.Categorical(final_guest['month'], categories=months, ordered=True)
final_guest = final_guest.sort_values('month').reset_index()

# Which month gets the most visitors?
sns.lineplot(data=final_guest, x='month', y='no of guests in resort', label='Resort')
sns.lineplot(data=final_guest, x='month', y='no of guest in city hotel', label='City Hotel')
plt.legend()
plt.ylabel('Number of guests')
plt.xlabel('Month')
plt.xticks(rotation=45)
plt.title('Number of Guests by Month')
fig = plt.gcf()
fig.set_size_inches(12, 8)
plt.show()

The largest number of guests arrives in August.

Now let's see which month brings in the highest revenue.

In [None]:
# Set the order of months for x-axis labels
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
final_hotel['month'] = pd.Categorical(final_hotel['month'], categories=months, ordered=True)
# drop=True avoids adding an 'index' column, which would make this cell fail when re-run
final_hotel = final_hotel.sort_values('month').reset_index(drop=True)

# Plotting the data
sns.lineplot(data=final_hotel, x='month', y='price_for_resort', label='Resort')
sns.lineplot(data=final_hotel, x='month', y='price_for_city_hotel', label='City Hotel')

# Set labels and title
plt.ylabel('ADR')
plt.xlabel('Month')
plt.xticks(rotation=45)
plt.title('Average ADR by Month')

# Display the legend
plt.legend()

# Adjust the size of the plot
fig = plt.gcf()
fig.set_size_inches(12, 8)

# Show the plot
plt.show()

August brings in the highest revenue: it has the most bookings, and ADR is also at its peak around that time.
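ADR is a per-night rate rather than revenue, so the revenue claim can be checked more directly. The sketch below approximates booking revenue as `adr * total_stay` for non-cancelled bookings; this is an assumption that ignores extras such as meals or parking. The `bookings` frame is toy data and `revenue_by_month` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's df
bookings = pd.DataFrame({
    "arrival_date_month": ["July", "July", "August", "August", "August"],
    "is_canceled": [0, 0, 0, 0, 1],
    "adr": [100.0, 120.0, 150.0, 130.0, 200.0],
    "total_stay": [2, 3, 4, 1, 5],
})

def revenue_by_month(frame):
    """Approximate revenue per arrival month as sum(adr * total_stay) over kept bookings."""
    kept = frame[frame["is_canceled"] == 0]
    revenue = kept["adr"] * kept["total_stay"]
    return revenue.groupby(kept["arrival_date_month"]).sum()

monthly_revenue = revenue_by_month(bookings)
print(monthly_revenue)
```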

##### 1. Why did you pick the specific chart?

The line chart allows you to observe the trend in ADR for each hotel type over the months. By connecting the data points with lines, it shows the overall direction and pattern of ADR changes. This helps in understanding if there are any increasing or decreasing trends in ADR over time.

##### 2. What is/are the insight(s) found from the chart?

The insight from these charts is that the average ADR rises from the beginning of the year to the middle of the year, peaks in August, and then falls towards the end of the year. However, hotels also make some good deals with high ADR at the end of the year.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the line chart of Average Daily Rate (ADR) trends for Resort Hotel and City Hotel across months can help create a positive business impact. The observation that ADR rises from the beginning of the year until the middle and peaks in August suggests high demand for hotel stays during these months. Hotels can use this insight to implement seasonal pricing strategies, increasing room rates during peak-demand months to maximize revenue.

However, there are also insights that might lead to negative growth. The observation that hotels make some good deals with high ADR at the end of the year might indicate missed revenue opportunities. If the hotel is not capitalizing on potential demand during the year-end holidays or festive seasons, it could lose revenue.

#### Chart - 12

**Let's see whether length of stay affects the ADR.**


In [None]:
# Chart - 12 visualization code
plt.figure(figsize = (12,6))
sns.scatterplot(y = 'adr', x = 'total_stay', data = df)
plt.show()


We notice an outlier in `adr`, so we remove it to get a more readable scatter plot.

In [None]:
df.drop(df[df['adr'] > 5000].index, inplace = True)

In [None]:
plt.figure(figsize = (12,6))
sns.scatterplot(y = 'adr', x = 'total_stay', data = df)
plt.show()

From the scatter plot we can see that as `total_stay` increases, the ADR decreases.
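A dense scatter plot can overstate or hide a trend, so a rank correlation gives a quick sanity check. The sketch below computes Spearman's rho as the Pearson correlation of ranks, using only pandas. The data is invented for illustration and `rank_correlation` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical toy data with ADR falling as stays get longer
bookings = pd.DataFrame({
    "total_stay": [1, 2, 3, 5, 8, 12, 20, 30],
    "adr": [150.0, 140.0, 145.0, 120.0, 100.0, 90.0, 70.0, 60.0],
})

def rank_correlation(frame, x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranked columns."""
    return frame[x].rank().corr(frame[y].rank())

rho = rank_correlation(bookings, "total_stay", "adr")
print(round(rho, 4))
```

A clearly negative value on the real `df` would support the reading of the scatter plot; a value near zero would suggest the visual trend is driven by a few long stays.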

##### 1. Why did you pick the specific chart?

Scatter plots provide insights into the distribution of data points across the plot. You can observe the concentration or dispersion of points, identify clusters or groups, and identify any outliers or unusual patterns.


##### 2. What is/are the insight(s) found from the chart?

From the scatter plot we can see that as `total_stay` increases, the ADR decreases. This means that for longer stays, a better deal can be finalised for the customer.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the scatter plot indicating that as the length of total stay increases, the ADR (Average Daily Rate) decreases can indeed help create a positive business impact. Understanding that ADR decreases as the length of total stay increases can be an opportunity for hotels to attract guests for longer durations. Hotels can offer attractive discounts or incentives for guests who book longer stays, encouraging them to extend their reservations.

Insights Leading to Negative Growth:

Lowering ADR for longer stays might inadvertently affect short-stay guests. If short-stay guests perceive that they are paying more compared to longer-stay guests, it could lead to dissatisfaction and negative reviews.

**Where do most guests come from?**

In [None]:
grouped_by_country = df.groupby('country')
d1 = pd.DataFrame(grouped_by_country.size()).reset_index().rename(columns = {0:'Count'}).sort_values('Count', ascending = False)[:10]
sns.barplot(x = d1['country'], y  = d1['Count'])
plt.show()

Most guests are from Portugal and other European countries.

These insights derived from the country-wise bar chart can be used to make informed decisions regarding marketing strategies, customer service enhancements, international market expansion, and overall business planning.



# **MULTIVARIATE**

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
corr_column = df[['lead_time','previous_cancellations','previous_bookings_not_canceled','booking_changes','days_in_waiting_list','adr','required_car_parking_spaces','total_of_special_requests','total_stay','total_people']]

corrmat = corr_column.corr()
f, ax = plt.subplots(figsize=(12, 7))
sns.heatmap(corrmat,annot = True,fmt='.2f', annot_kws={'size': 10},  vmax=.8, square=True);



Positive values indicate a positive correlation, negative values indicate a negative correlation, and values closer to 1 or -1 indicate a stronger correlation.

Total stay length and lead time have a slight correlation.

##### 1. Why did you pick the specific chart?

By visualizing the correlations in a heat map, it becomes easier to compare the strength and direction of multiple correlations simultaneously. We can quickly identify variables that are strongly correlated, weakly correlated, or not correlated at all.

##### 2. What is/are the insight(s) found from the chart?

1) Total stay length and lead time have a slight correlation. This may mean that guests planning longer stays generally book a little further ahead of their arrival.

2) `adr` is slightly correlated with `total_people`, which makes sense: more guests per booking generally means a higher room rate, and therefore a higher ADR.
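Scanning a heatmap for the strongest pairs is error-prone, so the correlation matrix can also be flattened into a ranked list. A minimal sketch, using toy numbers and a hypothetical helper `top_correlations`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with three numeric columns
numbers = pd.DataFrame({
    "lead_time": [10, 20, 30, 40, 50],
    "total_stay": [1, 2, 3, 4, 6],
    "adr": [120.0, 80.0, 110.0, 90.0, 100.0],
})

def top_correlations(frame, n=3):
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = frame.corr()
    # Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

top = top_correlations(numbers, n=1)
print(top)
```

In the notebook, `top_correlations(corr_column)` would list the strongest pairs behind the heatmap.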

#### Chart - 15 - Pair Plot

In [None]:
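# Note: sns.pairplot only draws numeric columns, so 'arrival_date_month' is skipped if it is still stored as text
columns = ['lead_time', 'arrival_date_month', 'stays_in_weekend_nights', 'stays_in_week_nights']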

# Subset the data to include only the selected columns
data_subset = df[columns]

# Create the pair plot
sns.pairplot(data_subset)

# Title for the plot
plt.suptitle('Pair Plot of Lead Time, Arrival Month, Weekend Nights, and Week Nights', y=1.02)

# Adjust the layout
plt.tight_layout()

# Show the plot
plt.show()

Stays in week nights and weekend nights decrease as lead time increases.
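If the arrival month is stored as text, a pair plot cannot place it on a numeric axis. One option, sketched here under the assumption that month names are in English, is to add a numeric month column first. `arrival_month_num` and `add_month_number` are hypothetical names, not part of the original notebook.

```python
import calendar
import pandas as pd

# Map "January" -> 1 ... "December" -> 12 (calendar.month_name[0] is an empty string)
MONTH_NUMBER = {name: number for number, name in enumerate(calendar.month_name) if name}

def add_month_number(frame, column="arrival_date_month"):
    """Return a copy of frame with a numeric arrival_month_num column."""
    out = frame.copy()
    out["arrival_month_num"] = out[column].map(MONTH_NUMBER)
    return out

# Hypothetical toy data
sample = pd.DataFrame({"arrival_date_month": ["August", "January", "December"]})
with_month = add_month_number(sample)
print(with_month)
```

The new column could then replace `arrival_date_month` in the list passed to `sns.pairplot`.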

##### 1. Why did you pick the specific chart?

The pair plot is selected as a visualization tool because it allows us to examine the relationships between multiple variables in a single plot.

##### 2. What is/are the insight(s) found from the chart?

The insight from the chart is that stays in week nights and weekend nights decrease as lead time increases.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. Implement dynamic pricing strategies and optimize room rates based on demand and seasonal trends to maximize revenue potential.

2. Focus on personalized services, value-added amenities, and exceptional guest interactions to foster loyalty and positive reviews.

3. Utilize data-driven marketing campaigns, targeted promotions, and social media engagement to attract new guests and retain existing ones.

4. Streamline the reservation process with user-friendly online platforms and ensure efficient handling of booking changes and room assignments.

5. Analyze cancellation patterns, offer flexible cancellation policies, and use incentives to reduce cancellations and encourage rebooking.

6. Identify loyal and high-value customers, and provide exclusive perks and loyalty programs to reward their patronage.

7. Regularly seek guest feedback to identify areas for improvement and respond to guest needs effectively.

8. Offer special deals, seasonal packages, and rewards to encourage repeat bookings and foster customer loyalty.

9. Continuously monitor market trends, benchmark against competitors, and adapt strategies to maintain a competitive edge in the hospitality industry.

# **Conclusion**

1. Agent No. 9 has made the most bookings, indicating their significance as a valuable partner for the hotel. Strengthening the relationship with this agent can lead to increased bookings and revenue.

2. Room Type A is the most demanded, but Room Types H, G, and C have better ADR. To maximize revenue, the hotel should consider increasing the availability of Room Types A and H while ensuring Room Types H, G, and C maintain their premium offerings.

3. The most preferred meal type is Bed and Breakfast (BB), which can guide hotel dining options and pricing strategies.

4. 60% of bookings are for City Hotel, and 40% for Resort Hotel. This distribution can inform marketing efforts to attract guests to both types of hotels.

5. Short stays (less than 4 days) are common, and guests generally prefer City Hotel for such stays. However, for longer stays, Resort Hotel is preferred. Hotels can optimize room availability and promotions accordingly.

6. City Hotel has significantly longer waiting times, indicating higher demand and busier operations compared to Resort Hotel. Managing waiting lists effectively can enhance guest experience.

7. Bookings that are changed usually have one or two changes. City Hotel has a higher percentage of cancellations compared to Resort Hotel, suggesting potential areas for improvement in customer retention.

8. TA/TO is the most common booking channel, indicating the importance of collaboration with travel agents.
GDS channels bring higher revenue for City Hotel, while Resort Hotel generates more revenue through direct bookings and TA/TO channels.

9. Waiting period has no significant effect on booking cancellations, indicating the need to optimize the booking and confirmation process.
Not getting the same room does not significantly impact cancellations. However, it does affect the ADR, suggesting a focus on room allocation and guest preferences.

10. August has the highest number of guests and revenue, presenting opportunities for targeted promotions and maximizing earnings during peak months.

11. Most guests come from Portugal and other European countries, suggesting targeted marketing efforts in these regions.

12. Total stay length and lead time show a slight correlation, indicating potential adjustments in pricing strategies based on booking lead time.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***