<a href="https://colab.research.google.com/github/Sahariya55/EDA_OF_HOTEL_BOOKING_ANALYSIS/blob/master/EDA_OF_HOTEL_BOOKING_ANALYSIS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Hotel Booking Analysis



##### **Project Type**    - EDA (Exploratory Data Analysis)
##### **Contribution**    - Individual

# **Project Summary -**

# Description :-

Our project aims to analyze a comprehensive hotel booking dataset comprising bookings for both city and resort hotels. By leveraging advanced data analysis techniques, we seek to uncover crucial insights such as the optimal timing for booking rooms, the ideal duration of stay, and factors influencing the likelihood of special requests. Through this analysis, we aim to empower hotel management teams with actionable insights to enhance revenue generation, improve guest satisfaction, and optimize operational efficiency. Our findings will be presented in a detailed report, providing valuable guidance for strategic decision-making in the hospitality industry.

# Project Activities :-

*   Defining the Problem Statement
*   Defining Business Objective
*   Knowing and Understanding the Data
*   Understanding the Variables
*   Data Wrangling
*   Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables
*   Solution to Business Objective
*   Conclusion









# **GitHub Link -**

GitHub Link - https://github.com/Sahariya55/EDA_OF_HOTEL_BOOKING_ANALYSIS


# **Problem Statement**



The hotel industry faces the challenge of optimizing revenue while simultaneously ensuring guest satisfaction, maximizing bookings, and minimizing cancellations. To address this multifaceted challenge, a thorough analysis of hotel booking data is required. The analysis aims to uncover insights into booking trends, guest preferences, demand fluctuations, and factors contributing to cancellations. By leveraging data-driven insights, the hotel can develop strategic initiatives to improve revenue streams, enhance guest experiences, increase booking rates, and reduce cancellation rates. This holistic approach will enable the hotel to achieve its business objectives effectively and maintain a competitive edge in the hospitality market.



#### **Define Your Business Objective?**

*  **Improve Revenue Optimization**
*  **Enhance Guest Satisfaction**
*  **Maximise Bookings**
*  **Minimise Cancellations**













# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import rcParams
import plotly.express as px


import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Mounting to google drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
#Load DataSet
path='/content/drive/MyDrive/Colab Notebooks/capstone project/Module-2/Capstone_project/Hotel Bookings.csv'
df = pd.read_csv(path)

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(df[df.duplicated()])

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
null_val = df.isnull().sum()
print(null_val)

In [None]:
# Visualizing the missing values
sns.heatmap(df.isnull(),cbar=False)

### What did you know about your dataset?

The dataset comes from the hotel industry. We have to explore and analyze it to discover the important factors that govern bookings.

The goal is to understand the dataset and take steps to improve revenue optimization, enhance guest satisfaction, maximise bookings and minimise cancellations.

The dataset has 119390 rows and 32 columns. It contains 31994 duplicate rows and 129425 null values in total: the children column has 4 null values, the country column has 488, the agent column has 16340 and the company column has 112593.
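The figures quoted above can be collected in one place with a small helper. This is a minimal sketch on a made-up frame: `summarize_quality` is a hypothetical name and the toy values are illustrative, not taken from the hotel dataset.

```python
import numpy as np
import pandas as pd

def summarize_quality(frame: pd.DataFrame) -> dict:
    """Return row/column counts, duplicate rows and nulls per column."""
    nulls = frame.isnull().sum()
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        "duplicate_rows": int(frame.duplicated().sum()),
        "total_nulls": int(nulls.sum()),
        # Only list the columns that actually contain nulls
        "nulls_by_column": nulls[nulls > 0].to_dict(),
    }

# Illustrative data only: two identical rows and two missing values
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "children": [0.0, 0.0, np.nan, 1.0],
    "country": ["PRT", "PRT", None, "GBR"],
})
summary = summarize_quality(toy)
print(summary)
```

Calling `summarize_quality(df)` on the full dataset should give the same numbers as the separate cells above, assuming `df` is the frame loaded earlier.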

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description

*  **hotel:** H1 = Resort Hotel, H2 = City Hotel.
*  **is_canceled:** Whether the booking was canceled (1) or not (0).
*  **lead_time:** Number of days that elapsed between the entering date of the booking into the PMS (Property Management System) and the arrival date.
*  **arrival_date_year:** Year of arrival date.
*  **arrival_date_month:** Month of arrival date.
*  **arrival_date_week_number:** Week number of the arrival date.
*  **arrival_date_day_of_month:** Day of the month of the arrival date.
*  **stays_in_weekend_nights:** Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel.
*  **stays_in_week_nights:** Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel.
*  **adults:** Number of adults.
*  **children:** Number of children.
*  **babies:** Number of babies.
*  **meal:** Kind of meal opted for.
*  **country:** Country code.
*  **market_segment:** Market segment through which the booking was made.
*  **distribution_channel:** How the customer accessed the stay: Corporate, Direct or TA/TO (Travel Agents/Tour Operators).
*  **is_repeated_guest:** Whether the booking was from a repeated guest (1) or not (0).
*  **previous_cancellations:** Number of previous bookings that were cancelled by the customer.
*  **previous_bookings_not_canceled:** Number of previous bookings that were not cancelled.
*  **reserved_room_type:** Code of the room type reserved.
*  **assigned_room_type:** Code of the room type assigned to the booking.
*  **booking_changes:** Number of changes made to the booking.
*  **deposit_type:** Deposit type.
*  **agent:** ID of the travel agency that made the booking.
*  **company:** ID of the company that made the booking or is responsible for paying it.
*  **days_in_waiting_list:** Number of days the booking was on the waiting list before it was confirmed to the customer.
*  **customer_type:** Type of booking: Transient, Transient-Party, Contract or Group.
*  **adr:** Average Daily Rate, calculated by dividing the sum of all lodging transactions by the total number of staying nights.
*  **required_car_parking_spaces:** Number of car parking spaces required by the customer.
*  **total_of_special_requests:** Number of special requests made by the customer.
*  **reservation_status:** The last status of the reservation, one of three categories: Canceled (booking was cancelled by the customer), Check-Out, No-Show.
*  **reservation_status_date:** Date of the last status.


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for item in list(df.columns):
  print(f"Column name {item}-No. of unique values: {df[item].nunique()}")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# make a copy  of the dataset
df_copy = df.copy()

In [None]:
df_copy.head()

In [None]:
#we need to clean the dataset by deleting duplicate values from the dataset
df_copy.drop_duplicates(inplace=True)

In [None]:
df_copy.shape

In [None]:
#checking the duplicate is still there in the dataset or not
len(df_copy[df_copy.duplicated()])

In [None]:
#checking the null value percentage
null_percentage = 100*(df_copy.isna().sum()/df_copy.shape[0]).sort_values(ascending=False)
print(null_percentage)

In [None]:
# we can see that in company column 93% data is missing ,so we should drop the column
df_copy.drop(columns=['company'],inplace=True)

In [None]:
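# The other columns have a small share of missing values, so we fill them instead.
# Assigning the result back avoids in-place fills on a column accessor,
# which newer pandas versions warn about and may not apply to df_copy.
df_copy['agent'] = df_copy['agent'].fillna(0)
df_copy['children'] = df_copy['children'].fillna(0)
df_copy['country'] = df_copy['country'].fillna('Others')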

In [None]:
# now again checking for missing or null value count
null_val_2 = df_copy.isnull().sum()
print(null_val_2)

In [None]:
df_copy.info()

In [None]:
# The children, agent and adr columns are stored as float, so we convert them to integer
# (note: casting adr to int truncates the decimal part of the rate)
df_copy['children'] = df_copy['children'].astype(int)
df_copy['agent'] = df_copy['agent'].astype(int)
df_copy['adr'] = df_copy['adr'].astype(int)

In [None]:
df_copy.info()

In [None]:
# adults ,children, babies can't be 0 at the same time
df_copy = df_copy[~((df_copy['adults']==0)&(df_copy['children']==0)&(df_copy['babies']==0))]


In [None]:
df_copy.shape

In [None]:
#creating new column for analysis
df_copy['total_night_stayed']= df_copy['stays_in_week_nights']+df_copy['stays_in_weekend_nights']
df_copy[['stays_in_week_nights','stays_in_weekend_nights','total_night_stayed']]

In [None]:
# Create separate DataFrames for resort and city hotels
# (.copy() avoids SettingWithCopyWarning when columns are reassigned later)
resort_df = df_copy[df_copy['hotel']=='Resort Hotel'].copy()
city_df = df_copy[df_copy['hotel']=='City Hotel'].copy()

In [None]:
resort_df.head()

In [None]:
city_df.head()

### What all manipulations have you done and insights you found?

*  First, we made a copy of the original dataset to work on.
*  Then we cleaned the copy by deleting all duplicate rows.
*  After that, we handled the null and missing values and dropped the company column because it has 93% missing values.
*  Then we changed the datatype of the children, agent and adr columns to integer.
*  We removed bookings in which adults, children and babies are all 0.
*  After that, we added a new column, total_night_stayed, for better analysis.
*  Finally, we split the data into separate DataFrames for city and resort hotels for better understanding.
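The wrangling steps listed above can also be bundled into a single function, which makes them easy to re-run on a fresh copy of the data. The sketch below is one possible arrangement, not the notebook's own code: `clean_bookings` is a hypothetical name, and the toy frame contains only the columns the steps touch, with made-up values.

```python
import numpy as np
import pandas as pd

def clean_bookings(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the wrangling steps from this section and return a new frame."""
    out = raw.drop_duplicates().copy()
    # Drop the mostly empty column if it is present
    out = out.drop(columns=["company"], errors="ignore")
    # Fill the remaining gaps with the same defaults used in the notebook
    out = out.fillna({"agent": 0, "children": 0, "country": "Others"})
    # Cast to integer as in the notebook (this truncates adr)
    out = out.astype({"children": int, "agent": int, "adr": int})
    # Remove bookings with no guests at all
    no_guests = (out["adults"] == 0) & (out["children"] == 0) & (out["babies"] == 0)
    out = out[~no_guests].copy()
    out["total_night_stayed"] = out["stays_in_week_nights"] + out["stays_in_weekend_nights"]
    return out

# Illustrative data only: one duplicate row and one booking with no guests
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "adults": [2, 2, 0, 1],
    "children": [0.0, 0.0, 0.0, np.nan],
    "babies": [0, 0, 0, 0],
    "country": ["PRT", "PRT", "GBR", None],
    "agent": [9.0, 9.0, np.nan, np.nan],
    "company": [np.nan, np.nan, np.nan, 40.0],
    "adr": [75.5, 75.5, 0.0, 120.0],
    "stays_in_week_nights": [2, 2, 1, 5],
    "stays_in_weekend_nights": [1, 1, 0, 2],
})
cleaned = clean_bookings(toy)
print(cleaned)
```

Because the function returns a new frame and never modifies its input, the original `df` stays available for comparison.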







## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

# **Booking Trends Over Time:**

# 1.1 How does the number of bookings vary month-by-month for city hotel type?

In [None]:

# Convert arrival_date_month to datetime format
city_df['arrival_date_month'] = pd.to_datetime(city_df['arrival_date_month'], format='%B')

# Group data by hotel type and month, and count the number of bookings
monthly_bookings_city = city_df.groupby(['hotel', city_df['arrival_date_month'].dt.month]).size()

print(monthly_bookings_city)


In [None]:

# Plotting
monthly_bookings_city.plot(kind='bar',figsize=(10, 6))
plt.title('Number of Bookings by Month for City Hotels')
plt.xlabel('Month')
plt.ylabel('Number of Bookings')
plt.xticks(range(0, 12), labels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.show()

# 1.2 How does the number of bookings vary month-by-month for resort hotel type?

In [None]:
# Convert arrival_date_month to datetime format
resort_df['arrival_date_month'] = pd.to_datetime(resort_df['arrival_date_month'], format='%B')

# Group data by hotel type and month, and count the number of bookings
monthly_bookings_resort = resort_df.groupby(['hotel', resort_df['arrival_date_month'].dt.month]).size()

print(monthly_bookings_resort)

In [None]:
# Plotting
monthly_bookings_resort.plot(kind='bar',figsize=(10, 6))
plt.title('Number of Bookings by Month for Resort Hotels')
plt.xlabel('Month')
plt.ylabel('Number of Bookings')
plt.xticks(range(0, 12), labels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.show()

##### 1. Why did you pick the specific chart?

I chose a bar chart to visualize the number of bookings by month for city and resort hotels because it effectively presents categorical data (months) with numerical values (number of bookings) in a clear and concise manner.


##### 2. What is/are the insight(s) found from the chart?

1. Both city hotels and resort hotels experience fluctuations in booking volumes throughout the year. For example, there is a noticeable increase in bookings during the summer months (June, July, August) for both hotel types, indicating a peak season for tourism and travel.

2. City hotels generally have higher booking volumes compared to resort hotels, especially during the peak summer months. This suggests that city hotels may attract a larger number of tourists or business travelers during certain times of the year.

3. Both city hotels and resort hotels experience lower booking volumes during certain months, such as January and November. These months may represent off-peak periods for tourism or travel, which could be leveraged for targeted marketing campaigns or promotions to attract guests during slower periods.
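The month-by-month counts can also be obtained without converting `arrival_date_month` to datetime, which leaves the original month names in the column untouched. A minimal sketch, assuming the column holds full English month names; `monthly_counts` and the toy frame are hypothetical and only for illustration.

```python
import pandas as pd

# Month names in calendar order
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

def monthly_counts(frame: pd.DataFrame) -> pd.Series:
    """Count bookings per arrival month, ordered January to December."""
    counts = frame["arrival_date_month"].value_counts()
    # Reindex to calendar order; months with no bookings become 0
    return counts.reindex(MONTHS, fill_value=0)

# Illustrative data only
toy = pd.DataFrame({"arrival_date_month": ["August", "January", "August", "July"]})
counts = monthly_counts(toy)
print(counts)
```

The resulting Series can be passed to `.plot(kind='bar')` directly, and the tick labels then come from the index rather than a hand-written list.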

# 2.1 Can we visualize the distribution of bookings across different days of the month for city hotel?

In [None]:
# Group data by arrival day of the month and count the number of bookings
daily_bookings_city = city_df['arrival_date_day_of_month'].value_counts().sort_index()
print(daily_bookings_city)


In [None]:
# Plotting
plt.figure(figsize=(14, 8))
daily_bookings_city.plot(kind='bar', color='skyblue')
plt.title('Distribution of Bookings Across Different Days of the Month for City Hotels')
plt.xlabel('Day of the Month')
plt.ylabel('Number of Bookings')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--')
plt.show()

# 2.2 Can we visualize the distribution of bookings across different days of the month for resort hotel?

In [None]:
# Group data by arrival day of the month and count the number of bookings
daily_bookings_resort = resort_df['arrival_date_day_of_month'].value_counts().sort_index()
print(daily_bookings_resort)

In [None]:
# Plotting
plt.figure(figsize=(14, 8))
daily_bookings_resort.plot(kind='bar', color='skyblue')
plt.title('Distribution of Bookings Across Different Days of the Month for Resort Hotels')
plt.xlabel('Day of the Month')
plt.ylabel('Number of Bookings')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is an effective way to represent categorical data, such as the days of the month, along with their corresponding counts of bookings. Each bar in the chart represents a specific day of the month, and the height of the bar indicates the number of bookings on that day. This clear visual representation makes it easy for viewers to interpret the data.



##### 2. What is/are the insight(s) found from the chart?

1. City Hotel Bookings Distribution:

The distribution of bookings across days of the month for city hotels is fairly even, with only moderate day-to-day fluctuations.
There is no significant outlier: the highest count is observed on the 2nd day of the month (1871 bookings) and the lowest on the 31st day (981 bookings).
Overall, bookings for city hotels follow a relatively stable pattern throughout the month, without any specific trend or anomaly.

2. Resort Hotel Bookings Distribution:

Similarly, the distribution of bookings across days of the month for resort hotels is fairly even, with only moderate day-to-day fluctuations.
The highest count is observed on the 30th day of the month (1213 bookings), while the lowest is observed on the 31st day (751 bookings).
Like city hotels, resort hotels show a relatively stable pattern of bookings throughout the month, without any specific trend or anomaly.
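One caveat when reading the low counts for day 31: only seven months of the year have a 31st day, so that day has fewer opportunities to receive arrivals. A rough way to account for this is to divide each day's count by the number of times the day occurs in the period covered. The sketch below assumes a known date range; `bookings_per_occurrence`, the `start` and `end` values and the toy counts are placeholders, not values taken from the dataset.

```python
import pandas as pd

def bookings_per_occurrence(day_counts: pd.Series, start: str, end: str) -> pd.Series:
    """Divide bookings per day-of-month by how often that day occurs in [start, end]."""
    calendar_days = pd.date_range(start, end, freq="D")
    # Number of times each day-of-month (1..31) appears in the range
    occurrences = pd.Series(calendar_days.day).value_counts().sort_index()
    # Division aligns on the day-of-month index; days without counts are dropped
    return (day_counts / occurrences).dropna()

# Illustrative counts for three days of the month over one calendar year
toy_counts = pd.Series({1: 120, 30: 110, 31: 70})
rate = bookings_per_occurrence(toy_counts, "2016-01-01", "2016-12-31")
print(rate)
```

In this toy example the raw counts differ, but the per-occurrence rate is the same for all three days, which is the kind of effect worth ruling out before calling the 31st a weak booking day.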

# 3.1 What are the trends in booking lead time over the years for city hotel?

In [None]:
# Convert arrival_date_year to datetime format
city_df['arrival_date_year'] = pd.to_datetime(city_df['arrival_date_year'], format='%Y')

# Calculate average booking lead time for each year
avg_lead_time_city = city_df.groupby(city_df['arrival_date_year'].dt.year)['lead_time'].mean()

print(avg_lead_time_city)

In [None]:
# Plotting the trends
plt.figure(figsize=(10, 6))
avg_lead_time_city.plot(kind='line', marker='o', color='red')
plt.title('Trends in Booking Lead Time Over the Years for City Hotels')
plt.xlabel('Year')
plt.ylabel('Average Booking Lead Time (Days)')
plt.grid(True)
plt.show()

# 3.2 What are the trends in booking lead time over the years for resort hotel?

In [None]:
# Convert arrival_date_year to datetime format
resort_df['arrival_date_year'] = pd.to_datetime(resort_df['arrival_date_year'], format='%Y')

# Calculate average booking lead time for each year
avg_lead_time_resort = resort_df.groupby(resort_df['arrival_date_year'].dt.year)['lead_time'].mean()

print(avg_lead_time_resort)

In [None]:
# Plotting the trends
plt.figure(figsize=(10, 6))
avg_lead_time_resort.plot(kind='line', marker='o', color='red')
plt.title('Trends in Booking Lead Time Over the Years for Resort Hotels')
plt.xlabel('Year')
plt.ylabel('Average Booking Lead Time (Days)')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

A line plot is ideal for visualizing trends over time. It effectively illustrates how the average booking lead time changes from year to year, allowing viewers to identify any increasing or decreasing patterns.

##### 2. What is/are the insight(s) found from the chart?

1. In both city and resort hotels, there is a general increasing trend in the average booking lead time over the years. The average lead time has progressively increased from 2015 to 2017 for both types of hotels.

2. The average booking lead time is consistently higher for resort hotels compared to city hotels across all years. This suggests that guests tend to book their stays further in advance for resort accommodations compared to city accommodations.

3. While there is an increasing trend in average lead time, the growth rate appears to be relatively steady over the years for both city and resort hotels. This indicates a consistent pattern of guests booking their stays further in advance over time, rather than sudden spikes or fluctuations.
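To compare the two hotel types year by year, the averages can be placed in a single table instead of two separate Series. A minimal sketch on made-up numbers, assuming `arrival_date_year` is still the raw integer year (in this notebook it is converted to datetime in the cells above, so the year would need to be extracted first):

```python
import pandas as pd

# Illustrative data only
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel", "City Hotel"],
    "arrival_date_year": [2015, 2016, 2015, 2016, 2016],
    "lead_time": [10, 40, 30, 90, 60],
})

# Rows are years, columns are hotel types, values are mean lead time in days
lead_time_table = toy.pivot_table(
    index="arrival_date_year", columns="hotel", values="lead_time", aggfunc="mean"
)
print(lead_time_table)
```

Calling `lead_time_table.plot(marker='o')` would then draw both hotel types on one set of axes, which makes the comparison in insight 2 visible in a single chart.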

# 4. What is the total number of bookings for city and resort hotels over the entire dataset period, and how does it compare between the two hotel types?

In [None]:
# Calculate total number of bookings for city and resort hotels
total_bookings_city = df_copy[df_copy['hotel'] == 'City Hotel'].shape[0]
total_bookings_resort = df_copy[df_copy['hotel'] == 'Resort Hotel'].shape[0]
print(total_bookings_city )
print(total_bookings_resort )

# Plotting
plt.figure(figsize=(8, 6))
plt.bar(['City Hotel', 'Resort Hotel'], [total_bookings_city, total_bookings_resort], color=['blue', 'green'])
plt.title('Total Number of Bookings for City and Resort Hotels')
plt.xlabel('Hotel Type')
plt.ylabel('Total Number of Bookings')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()

##### 1. Why did you pick the specific chart?

Bar plots are well-suited for comparing categorical data, such as different hotel types. In this case, we are comparing the total number of bookings between city and resort hotels, which are categorical variables.

##### 2. What is/are the insight(s) found from the chart?

1. The insight from the chart is that city hotels have a higher total number of bookings compared to resort hotels. In this specific dataset, city hotels have 53,274 bookings, while resort hotels have 33,956 bookings. This indicates that city hotels are more popular or attract more guests compared to resort hotels.

2. The higher booking volume for city hotels may suggest that they cater to a larger or different market segment compared to resort hotels. City hotels may attract business travelers, tourists, or individuals looking for urban accommodations, while resort hotels may cater to vacationers seeking leisure or relaxation in resort destinations.
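The two totals quoted above can be turned into shares, which are easier to compare than raw counts. A short worked example using only those two numbers:

```python
# Totals reported above for the cleaned dataset
total_city = 53274
total_resort = 33956

total = total_city + total_resort
city_share = 100 * total_city / total
resort_share = 100 * total_resort / total

print(f"City: {city_share:.1f}%  Resort: {resort_share:.1f}%")
```

On the full frame, `df_copy['hotel'].value_counts(normalize=True)` should give the same proportions as fractions.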


# Will the gained insights help create a positive business impact for Booking Trends Over Time?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Monthly Bookings by Hotel Type:

Understanding how the number of bookings varies month-by-month for each hotel type can help hotel managers anticipate demand fluctuations throughout the year.

2. Distribution of Bookings Across Different Days of the Month:

Visualizing the distribution of bookings across different days of the month enables hotel managers to identify booking patterns and trends related to arrival days.

3. Trends in Booking Lead Time Over the Years:

Analyzing trends in booking lead time provides valuable insights into changing guest behavior and booking preferences over time.

4. Comparing Total Bookings in City and Resort Hotels:

Knowing the total number of bookings for each hotel type provides hotel managers with valuable information about the demand and popularity of their establishments. Understanding the comparative performance between city and resort hotels allows managers to make strategic decisions to optimize operations and revenue generation.

# **Guest Demographics :**

# 1. Can we visualize the geographical distribution of guests using country codes?

In [None]:
# Group data by country code and count the number of guests from each country
guests_by_country = df_copy['country'].value_counts().reset_index()
guests_by_country.columns = ['Country', 'Number of Guests']

# Sort the DataFrame in descending order based on the number of guests
guests_by_country_sorted = guests_by_country.sort_values(by='Number of Guests', ascending=False)

# Print the sorted DataFrame
print(guests_by_country_sorted)


In [None]:
# Top 10 countries (ranked over all bookings), counting non-canceled reservations
top_countries = df_copy['country'].value_counts().iloc[:10].index
not_canceled = df_copy[df_copy['is_canceled'] == 0]
sns.countplot(x='country', data=not_canceled, order=top_countries, palette='colorblind')
plt.title('Top 10 Countries of Origin of the Guests')
plt.xlabel('Country')
plt.ylabel('Reservation Count')
plt.show()

In [None]:
# Plotting
fig = px.choropleth(guests_by_country, locations='Country', locationmode='ISO-3', color='Number of Guests',
                    hover_name='Country', color_continuous_scale='Viridis',
                    title='Geographical Distribution of Guests by Country')
fig.show()

##### 1. Why did you pick the specific chart?

A countplot of the top 10 countries of origin was chosen because it effectively visualizes the distribution of guests across different countries. By displaying the count of reservations for each country in a bar chart format, it allows for easy comparison of the number of guests from each country.

A choropleth map effectively represents geographical data, allowing viewers to easily understand the distribution of guests across different countries. This allows for quick identification of countries with the highest and lowest numbers of guests.
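A choropleth with `locationmode='ISO-3'` can only place entries whose code is a three-letter country code, so placeholders such as the `'Others'` value introduced during wrangling cannot be drawn. A minimal sketch of filtering them out beforehand: `keep_iso3` is a hypothetical helper, and it only checks the shape of the code, not whether it is a valid ISO 3166-1 alpha-3 code.

```python
import pandas as pd

def keep_iso3(country_counts: pd.DataFrame, column: str = "Country") -> pd.DataFrame:
    """Keep rows whose code consists of exactly three uppercase ASCII letters."""
    is_iso3_like = country_counts[column].str.fullmatch(r"[A-Z]{3}", na=False)
    return country_counts[is_iso3_like].reset_index(drop=True)

# Illustrative data only
toy = pd.DataFrame({
    "Country": ["PRT", "GBR", "Others", "FRA"],
    "Number of Guests": [5, 3, 2, 1],
})
mappable = keep_iso3(toy)
print(mappable)
```

Comparing `len(mappable)` with the length of the unfiltered frame shows how many entries would otherwise be missing from the map.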

##### 2. What is/are the insight(s) found from the chart?

1. Top Guest-Originating Countries: The chart reveals the countries from which the hotel receives the highest number of guests. In this case, Portugal (PRT) has the highest number of guests, followed by the United Kingdom (GBR), France (FRA), Spain (ESP), and Germany (DEU).

2. Global Representation: The chart illustrates the global representation of guests, showing that guests come from a wide range of countries. This indicates the hotel's international appeal and its ability to attract guests from diverse regions.

3. Varied Guest Demographics: The distribution of guests by country highlights the diverse demographic backgrounds of the hotel's guests. Understanding the geographical distribution of guests can help the hotel tailor its services, amenities, and marketing strategies to cater to the needs and preferences of guests from different countries.

# 2. How does the distribution of guest types vary between city and resort hotels?

In [None]:
# Group data by hotel type and guest type, and count the number of guests for each combination
guests_by_hotel_and_guest_type = df_copy.groupby(['hotel', 'customer_type']).size().unstack()

print(guests_by_hotel_and_guest_type)

In [None]:
# Plotting
guests_by_hotel_and_guest_type.plot(kind='bar', figsize=(12, 8))
plt.title('Distribution of Guest Types between City and Resort Hotels')
plt.xlabel('Hotel Type')
plt.ylabel('Number of Guests')
plt.xticks(rotation=0)
plt.legend(title='Guest Type')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

The grouped bar chart allows for a direct visual comparison of guest types between city and resort hotels. Each group of bars represents a hotel type, and each bar within a group shows the number of guests in one guest type category. This gives a clear and intuitive representation of the data, making it easy for viewers to understand the distribution of guest types for each hotel type at a glance.

##### 2. What is/are the insight(s) found from the chart?

1. Transient Guests Dominance: In both city and resort hotels, the majority of guests belong to the "Transient" category. This suggests that transient guests, who typically make individual bookings for short stays, form the largest segment of guests for both types of hotels.

2. Differentiation in Group Guests: While the number of group guests is relatively low compared to transient guests in both city and resort hotels, there is a slight variation between the two. City hotels have a slightly higher number of group guests compared to resort hotels. This could be due to city hotels attracting more business conferences, events, or group tours.

3. Contract Guests Proportion: The number of contract guests is higher in resort hotels compared to city hotels. This indicates that resort hotels may have more contractual arrangements with companies, travel agencies, or other organizations for group bookings or long-term stays.

4. Transient-Party Guests Distribution: The distribution of transient-party guests, who are typically leisure travelers booking together but staying independently, is higher in city hotels compared to resort hotels. This suggests that city hotels may be more popular among groups of leisure travelers who prefer independent stays but book together for convenience.
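Since city hotels have more bookings overall, raw counts can make a guest type look more common in city hotels even when its share is smaller there. Row-normalized percentages put both hotel types on the same scale. A minimal sketch on made-up data:

```python
import pandas as pd

# Illustrative data only: 4 city bookings and 2 resort bookings
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 2,
    "customer_type": ["Transient", "Transient", "Transient", "Group",
                      "Transient", "Contract"],
})

# normalize="index" makes each hotel's row sum to 1; multiply for percentages
share_by_hotel = pd.crosstab(toy["hotel"], toy["customer_type"], normalize="index") * 100
print(share_by_hotel.round(1))
```

In this toy example the city hotel has three times as many transient bookings as the resort in absolute terms, but the difference in share is 75% against 50%.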

# Will the gained insights help create a positive business impact for Guest Demographics?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Targeted Marketing Strategies: Understanding the top guest-originating countries and the predominant guest types allows hotels to tailor their marketing strategies to effectively reach and attract their target audience. This may involve targeted advertising campaigns, promotions, and partnerships with travel agencies or online booking platforms that cater to the specific preferences and demographics of these guests.

2. Enhanced Guest Experience: By recognizing the varied guest demographics and preferences, hotels can personalize their services, amenities, and experiences to better meet the needs and expectations of their diverse guest base. This may involve offering multilingual services, culturally relevant dining options, and customized experiences tailored to different guest segments.

3. Revenue Optimization: Identifying the most lucrative guest segments, such as transient guests or group bookings, allows hotels to optimize their revenue streams by strategically pricing their rooms, offering package deals, and maximizing occupancy rates during peak periods. Additionally, establishing partnerships with corporate clients or travel agencies for contract bookings can provide a steady stream of revenue and long-term business relationships.

# **Booking Behavior Analysis:**

# 1.1 How long do people stay at the City hotels?

In [None]:
city_df['total_night_stayed'].value_counts()

In [None]:
#plotting
plt.figure(figsize=(24,5))
sns.countplot(x='total_night_stayed', data=city_df)
plt.title('Distribution Of Stay Duration For City Hotels', fontsize=20)
plt.show()

# 1.2 How long do people stay at the Resort hotels?

In [None]:
resort_df['total_night_stayed'].value_counts()

In [None]:
#plotting
plt.figure(figsize=(24,5))
sns.countplot(x='total_night_stayed', data=resort_df)
plt.title('Distribution Of Stay Duration For Resort Hotels', fontsize=20)
plt.show()

##### 1. Why did you pick the specific chart?

Countplots are suitable for visualizing the distribution of discrete values, such as the duration of stay in this case. By plotting the count of each value of total_night_stayed, we can observe the frequency of different durations of stay for both city and resort hotels.

##### 2. What is/are the insight(s) found from the chart?

1. Most Common Stay Durations: In both city and resort hotels, the most common stay durations are 1 night, 2 nights, and 3 nights, as indicated by the highest counts in the respective datasets.

2. Short vs. Long Stays: City hotels tend to have a higher proportion of shorter stays, with a significant number of guests staying for 1 to 4 nights. In contrast, resort hotels have a more diverse distribution of stay durations, with a notable number of guests staying for 7 nights, likely representing weekly vacation stays.

3. Weeklong Stays: Resort hotels exhibit a prominent peak in the distribution at 7 nights, indicating that many guests opt for weeklong stays, which is common for leisure-oriented accommodations like resorts.

4. Longer Stays: While shorter stays dominate in both types of hotels, there is also a presence of longer stays, particularly in resort hotels. This suggests that some guests choose to extend their vacations or opt for longer-term stays in resort settings.
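The short-versus-long comparison above can be made more concrete by grouping stay lengths into buckets and reporting each bucket's share. The sketch below is one possible grouping: `stay_buckets`, the bucket edges and the toy values are assumptions chosen for illustration, not part of the original analysis.

```python
import pandas as pd

def stay_buckets(nights: pd.Series) -> pd.Series:
    """Share of bookings (in percent) per stay-length bucket."""
    # Right-inclusive bins: (-1, 0], (0, 3], (3, 6], (6, 7], (7, inf)
    bins = [-1, 0, 3, 6, 7, float("inf")]
    labels = ["0 nights", "1-3 nights", "4-6 nights", "7 nights", "8+ nights"]
    bucketed = pd.cut(nights, bins=bins, labels=labels)
    return bucketed.value_counts(normalize=True, sort=False) * 100

# Illustrative data only
toy_nights = pd.Series([1, 2, 3, 3, 7, 7, 10, 0])
shares = stay_buckets(toy_nights)
print(shares)
```

Applying the same function to `city_df['total_night_stayed']` and `resort_df['total_night_stayed']` would give two directly comparable percentage breakdowns.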

# 2.1 What is the most common Distribution channel for booking City hotels?

In [None]:
# Count the frequency of each distribution channel
channel_counts_city = city_df['distribution_channel'].value_counts()

print(channel_counts_city)

In [None]:
# Plotting the horizontal bar chart
plt.figure(figsize=(10, 6))
channel_counts_city.sort_values().plot(kind='barh', color='yellow')
plt.title('Distribution of Booking Channels for City Hotels')
plt.xlabel('Number of Bookings')
plt.ylabel('Distribution Channel')
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

# 2.2 What is the most common Distribution channel for booking Resort hotels?

In [None]:
# Count the frequency of each distribution channel
channel_counts_resort= resort_df['distribution_channel'].value_counts()

print(channel_counts_resort)

In [None]:
# Plotting the horizontal bar chart
plt.figure(figsize=(10, 6))
channel_counts_resort.sort_values().plot(kind='barh', color='yellow')
plt.title('Distribution of Booking Channels for Resort Hotels')
plt.xlabel('Number of Bookings')
plt.ylabel('Distribution Channel')
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Horizontal bar charts allow for easy comparison of values between categories. Here, drawing the same chart for city hotels and for resort hotels lets us compare how bookings are distributed across channels in each hotel type. The horizontal orientation of the bars also makes the label for each booking channel easy to read, especially when there are many categories.

##### 2. What is/are the insight(s) found from the chart?

1. TA/TO Dominance: In both city hotels and resort hotels, the most common distribution channel is Travel Agents/Tour Operators (TA/TO). This suggests that a significant portion of bookings in both types of hotels are made through third-party travel agencies or tour operators.

2. Direct Bookings: Direct bookings, where guests book directly with the hotel, are also prevalent in both city and resort hotels, although they are less common compared to TA/TO bookings.

3. Corporate Bookings: Corporate bookings, which are likely made by companies for business purposes, constitute a notable portion of the distribution channels for both city hotels and resort hotels.

4. Undefined Category: There are a small number of bookings categorized as "Undefined" in both city hotels and resort hotels. This category may require further investigation to determine the reasons behind it and whether it represents a data anomaly or a specific type of booking channel.
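
Because the two hotel types have different booking volumes, raw counts are hard to compare across the two charts. A minimal sketch of one way to put channel shares next to each other is shown here; the toy frames `city_toy` and `resort_toy` are made-up stand-ins for `city_df` and `resort_df`, not values from the dataset.

```python
import pandas as pd

# Toy stand-ins for city_df and resort_df (made-up rows, for illustration only)
city_toy = pd.DataFrame({'distribution_channel': ['TA/TO', 'TA/TO', 'TA/TO', 'Direct']})
resort_toy = pd.DataFrame({'distribution_channel': ['TA/TO', 'Direct', 'Direct', 'Corporate']})

# normalize=True turns counts into fractions of each hotel's bookings;
# channels missing from one hotel become 0 after fillna
channel_share = pd.concat(
    {
        'City Hotel': city_toy['distribution_channel'].value_counts(normalize=True),
        'Resort Hotel': resort_toy['distribution_channel'].value_counts(normalize=True),
    },
    axis=1,
).fillna(0)

print(channel_share.round(2))
```

With the real data, the same pattern applied to `city_df` and `resort_df` would give one table in which each column sums to 1.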

# 3.1 What is the distribution of Market Segment based on Deposit Type for City hotels?

In [None]:
# Create a pivot table to aggregate the counts of market segments based on deposit types
pivot_table_city = city_df.pivot_table(index='deposit_type', columns='market_segment', aggfunc='size', fill_value=0)

print(pivot_table_city)

In [None]:
# Plotting the stacked bar chart
pivot_table_city.plot(kind='bar', stacked=True, figsize=(10, 6))
plt.title('Distribution of Market Segments Based on Deposit Type for City Hotel')
plt.xlabel('Deposit Type')
plt.ylabel('Number of Bookings')
plt.xticks(rotation=0)
plt.legend(title='Market Segment')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

# 3.2 What is the distribution of Market Segment based on Deposit Type for Resort hotels?

In [None]:
# Create a pivot table to aggregate the counts of market segments based on deposit types
pivot_table_resort = resort_df.pivot_table(index='deposit_type', columns='market_segment', aggfunc='size', fill_value=0)

print(pivot_table_resort)

In [None]:
# Plotting the stacked bar chart
pivot_table_resort.plot(kind='bar', stacked=True, figsize=(10, 6))
plt.title('Distribution of Market Segments Based on Deposit Type for Resort Hotel')
plt.xlabel('Deposit Type')
plt.ylabel('Number of Bookings')
plt.xticks(rotation=0)
plt.legend(title='Market Segment')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?


I chose stacked bar charts to visualize the distribution of market segments based on deposit type for both city and resort hotels because they effectively display the relationship between two categorical variables. They make it easy to compare the contribution of each market segment to the total number of bookings within each deposit type category.


##### 2. What is/are the insight(s) found from the chart?

1. City Hotel: The majority of bookings with "No Deposit" come from the Online TA market segment, followed by Offline TA/TO and Direct bookings. Non Refund and Refundable deposit types have minimal representation in most market segments, except for a few bookings in the Online TA and Offline TA/TO segments.

2. Resort Hotel: Similar to city hotels, the Online TA market segment dominates bookings with "No Deposit", followed by Direct bookings and Offline TA/TO. Non Refund and Refundable deposit types have very limited representation across all market segments.
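
Since "No Deposit" dominates the raw counts, the smaller deposit types are hard to read in a stacked count chart. One possible complement, sketched here on made-up rows (the `toy` frame is illustrative, not from the dataset), is a row-normalized cross-tabulation that shows the segment mix within each deposit type.

```python
import pandas as pd

# Made-up rows standing in for city_df / resort_df
toy = pd.DataFrame({
    'deposit_type': ['No Deposit', 'No Deposit', 'No Deposit', 'No Deposit',
                     'Non Refund', 'Non Refund'],
    'market_segment': ['Online TA', 'Online TA', 'Direct', 'Offline TA/TO',
                       'Groups', 'Offline TA/TO'],
})

# normalize='index' makes every deposit-type row sum to 1
segment_mix = pd.crosstab(toy['deposit_type'], toy['market_segment'], normalize='index')

print(segment_mix.round(2))
```

Plotting `segment_mix` with `kind='bar', stacked=True` would then give bars of equal height, so the composition of each deposit type can be compared directly.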

# 4.1 Which channel is mostly used for the early booking of City hotels?

In [None]:
# Filter the data for early bookings (e.g., lead time less than 30 days)
early_bookings_city = city_df[city_df['lead_time'] < 30]

# Group the data by distribution channel and calculate the average lead time
avg_lead_time_city = early_bookings_city.groupby('distribution_channel')['lead_time'].mean().sort_values()
print(avg_lead_time_city)

In [None]:
# Plotting
plt.figure(figsize=(12, 6))
sns.barplot(x=avg_lead_time_city.index, y=avg_lead_time_city.values, palette='viridis')
plt.title('Average Lead Time for Early Bookings (City Hotel)')
plt.xlabel('Distribution Channel')
plt.ylabel('Average Lead Time (Days)')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

# 4.2 Which channel is mostly used for the early booking of Resort hotels?

In [None]:
# Filter the data for early bookings (e.g., lead time less than 30 days)
early_bookings_resort = resort_df[resort_df['lead_time'] < 30]

# Group the data by distribution channel and calculate the average lead time
avg_lead_time_resort = early_bookings_resort.groupby('distribution_channel')['lead_time'].mean().sort_values()
print(avg_lead_time_resort)

In [None]:
plt.figure(figsize=(12, 6))
sns.barplot(x=avg_lead_time_resort.index, y=avg_lead_time_resort.values, palette='viridis')
plt.title('Average Lead Time for Early Bookings (Resort Hotel)')
plt.xlabel('Distribution Channel')
plt.ylabel('Average Lead Time (Days)')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose bar plots because they are effective for comparing the average lead time across different distribution channels. They make it easy to see which channels tend to book further ahead within the 30-day window used here. The values passed to seaborn's barplot function are already averaged per channel, so each bar shows that precomputed mean.

##### 2. What is/are the insight(s) found from the chart?

1. City hotel:

The distribution channel with the shortest average lead time for early bookings (lead time less than 30 days) is Undefined, with an average lead time of 3 days.

Direct bookings have the next shortest average lead time at approximately 6.29 days.

Corporate bookings follow with an average lead time of about 6.69 days.

GDS (Global Distribution System) bookings have a longer average lead time of approximately 8.71 days.

Finally, bookings made through TA/TO (Travel Agents/Tour Operators) have the longest average lead time among the observed distribution channels, with an average of about 11.56 days.

2. Resort hotel:

Direct bookings have the shortest average lead time among the observed distribution channels, with an average lead time of approximately 5.62 days.

Corporate bookings follow with an average lead time of about 6.32 days.

TA/TO (Travel Agents/Tour Operators) bookings have the longest average lead time among the observed distribution channels for the resort hotel, with an average of approximately 9.90 days.
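
The averages above describe how far ahead bookings inside the 30-day window were made; they do not show which channel accounts for the most bookings at a given lead time. A minimal sketch of that complementary view follows. The 90-day cutoff `EARLY_THRESHOLD_DAYS` and the `toy` rows are assumptions chosen for illustration, not values taken from this analysis.

```python
import pandas as pd

# Made-up rows standing in for city_df / resort_df
toy = pd.DataFrame({
    'distribution_channel': ['TA/TO', 'TA/TO', 'TA/TO', 'Direct', 'Direct', 'Corporate'],
    'lead_time': [120, 200, 10, 95, 5, 3],
})

EARLY_THRESHOLD_DAYS = 90  # hypothetical cutoff for "booked well in advance"

# Keep only bookings made at least EARLY_THRESHOLD_DAYS before arrival,
# then compute each channel's share of those bookings
early = toy[toy['lead_time'] >= EARLY_THRESHOLD_DAYS]
early_share = early['distribution_channel'].value_counts(normalize=True)

print(early_share)
```

Varying the cutoff is a quick way to check whether the ranking of channels is sensitive to how "early" is defined.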


# Will the gained insights help create a positive business impact for Booking Behavior Analysis?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Understanding Stay Durations: Knowing the most common lengths of stay allows hotels to optimize room inventory, staffing levels, and service offerings accordingly. City hotels can focus on accommodating shorter stays efficiently, while resort hotels can tailor experiences and amenities to cater to guests planning weeklong vacations.

2. Optimizing Booking Channels: Recognizing the dominance of TA/TO bookings highlights the importance of partnerships with third-party agencies for both city and resort hotels. Hotels can leverage this insight to strengthen relationships with travel agents and tour operators, negotiate favorable terms, and implement targeted marketing campaigns to attract more guests through these channels.

3. Deposit Type and Market Segments: Understanding the correlation between deposit types and market segments provides hotels with insights into guest preferences and behaviors. This information can inform pricing strategies, promotional offers, and deposit policies tailored to different market segments, ultimately enhancing revenue management and guest satisfaction.

4. Lead Time for Early Bookings: Analyzing lead time by distribution channel helps hotels identify opportunities to optimize booking processes and revenue streams. Hotels can focus on streamlining direct booking channels, leveraging corporate partnerships, and implementing targeted marketing initiatives to drive early bookings and maximize occupancy rates.

# **Guest Behavior Analysis:**

# 1.1 Which meal type do customers prefer most at city hotels?

In [None]:
# Count the frequency of each meal type
meal_counts_city = city_df['meal'].value_counts()

print(meal_counts_city)

In [None]:
# Create a pie chart to visualize the distribution of meal types
plt.figure(figsize=(6, 8))
plt.pie(meal_counts_city, labels=meal_counts_city.index, autopct='%1.1f%%', startangle=140)
plt.title('Distribution of Meal Types (City Hotel)')
plt.tight_layout()
plt.show()

# 1.2 Which meal type do customers prefer most at resort hotels?

In [None]:
# Count the frequency of each meal type
meal_counts_resort = resort_df['meal'].value_counts()

print(meal_counts_resort)

In [None]:
# Create a pie chart to visualize the distribution of meal types
plt.figure(figsize=(6, 8))
plt.pie(meal_counts_resort, labels=meal_counts_resort.index, autopct='%1.1f%%', startangle=140)
plt.title('Distribution of Meal Types (Resort Hotel)')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts are effective for visualizing proportions and percentages, making them suitable for displaying the distribution of meal types. They provide a quick and easy way to compare the relative frequencies of different categories, allowing stakeholders to understand the popularity of each meal type at a glance. Pie charts are also visually appealing and intuitive, making them accessible to a wide audience. I therefore chose pie charts to depict the distribution of meal types in both city and resort hotels.

##### 2. What is/are the insight(s) found from the chart?

1. City Hotel:

The most preferred meal type is "Bed & Breakfast" (BB), with a significantly higher number of bookings compared to other meal types.

"Room Only" (SC) and "Half Board" (HB) are also popular choices, although they have fewer bookings compared to "Bed & Breakfast".

"Full Board" (FB) is the least preferred meal type, with a very small number of bookings.

2. Resort Hotel:

Similar to city hotels, "Bed & Breakfast" (BB) is the most preferred meal type, with a higher number of bookings compared to other options.

"Half Board" (HB) is the second most popular meal type, but it has fewer bookings compared to "Bed & Breakfast".

"Room Only" (SC) has significantly fewer bookings compared to city hotels, indicating that guests at resort hotels are more inclined towards meal-inclusive options.

There are a few bookings categorized as "Undefined", which may require further investigation to understand their nature and possible implications.

# Will the gained insights help create a positive business impact for Guest Behavior Analysis?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Tailored Offerings: Understanding which meal types are most preferred by guests allows hotels to tailor their offerings accordingly. By prioritizing popular meal options such as "Bed & Breakfast," hotels can ensure that they meet the expectations and preferences of a majority of their guests.

2. Revenue Optimization: With knowledge of the most preferred meal types, hotels can optimize their revenue by strategically pricing and promoting these offerings. They can also identify opportunities to upsell meal packages or create special promotions to attract more guests.

3. Guest Satisfaction: Offering the meal types that guests prefer can significantly enhance their overall experience and satisfaction during their stay. Satisfied guests are more likely to leave positive reviews, recommend the hotel to others, and become repeat customers, thereby fostering loyalty and driving future business.

# **Revenue Insights:**



# 1. Which hotel has a higher average daily rate (ADR) and hence seems to make more revenue?

In [None]:
# Calculate the average ADR for each hotel type
grouped_by_hotel = df_copy.groupby('hotel')
d3 = grouped_by_hotel['adr'].mean().reset_index().rename(columns={'adr': 'avg_adr'})
print(d3)

In [None]:
plt.figure(figsize = (8,5))
sns.barplot(x = d3['hotel'], y = d3['avg_adr'] )
plt.title('Average ADR by Hotel Type')
plt.show()

##### 1. Why did you pick the specific chart?


The bar plot chosen here effectively displays the average daily rate (ADR) for each hotel type (city hotel and resort hotel). This visualization allows for a clear comparison of ADR between the two hotel types, making it easy to identify which hotel type has a higher average rate.

##### 2. What is/are the insight(s) found from the chart?

The insight from the chart is that, on average, the City Hotel has a higher average daily rate (ADR) compared to the Resort Hotel. This suggests that the City Hotel may potentially generate more revenue per room compared to the Resort Hotel.







# 2. What is the impact of offering different meal plans on revenue generation?

In [None]:
# Group the data by meal plan and sum ADR for each plan (summed ADR is used here as a proxy for total revenue)
meal_revenue = df_copy.groupby('meal')['adr'].sum().reset_index()
print(meal_revenue)

In [None]:
# Plotting
plt.figure(figsize=(10, 6))
sns.barplot(x='meal', y='adr', data=meal_revenue, palette='Set2')
plt.title('Revenue Generated by Meal Plan')
plt.xlabel('Meal Plan')
plt.ylabel('Total Revenue')
plt.show()

##### 1. Why did you pick the specific chart?


The bar chart chosen here is effective for comparing summed ADR, used as a revenue proxy, across different meal plans. It provides a clear visual representation of how the total varies for each meal plan, making it easy to identify which meal plan generates the highest revenue. The use of different colors for each bar enhances readability and allows for quick comparison.

##### 2. What is/are the insight(s) found from the chart?

From the bar chart, it's evident that the Bed & Breakfast (BB) meal plan generates the highest revenue, followed by Half Board (HB) and Self Catering (SC). Full Board (FB) generates the least revenue among the meal plans. Some bookings are categorized as "Undefined"; they contribute to revenue but are not associated with a specific meal plan. Because these totals are summed ADR, they partly reflect how often each plan is booked, and BB is also the most frequently booked plan. With that caveat, the chart suggests that offering a variety of meal plans, especially those with higher revenue potential like BB and HB, can positively impact overall revenue generation.
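
ADR is a per-night rate, so summing it across bookings does not account for length of stay and also counts canceled bookings. A rough alternative, sketched here under stated assumptions, multiplies ADR by the number of nights for bookings that were not canceled. The columns `stays_in_weekend_nights` and `stays_in_week_nights` are assumed to be present in `df_copy`, the helper columns `total_nights` and `est_revenue` are hypothetical names, and the `toy` rows are made up.

```python
import pandas as pd

# Made-up rows standing in for df_copy
toy = pd.DataFrame({
    'meal': ['BB', 'BB', 'HB', 'SC'],
    'adr': [100.0, 80.0, 120.0, 60.0],
    'stays_in_weekend_nights': [1, 0, 2, 0],
    'stays_in_week_nights': [2, 1, 5, 1],
    'is_canceled': [0, 1, 0, 0],
})

# Only bookings that were not canceled can produce room revenue
completed = toy[toy['is_canceled'] == 0].copy()

# Estimated revenue per booking = nightly rate x number of nights stayed
completed['total_nights'] = (completed['stays_in_weekend_nights']
                             + completed['stays_in_week_nights'])
completed['est_revenue'] = completed['adr'] * completed['total_nights']

revenue_by_meal = (completed.groupby('meal')['est_revenue']
                   .sum()
                   .sort_values(ascending=False))
print(revenue_by_meal)
```

This is still an estimate: it ignores discounts, extras, and any fees retained on cancellations.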

# 3. Can we visualize the revenue contribution from different market segments?

In [None]:
# Calculate revenue contribution from each market segment
market_segment_revenue = df_copy.groupby('market_segment')['adr'].sum().reset_index()
print(market_segment_revenue)

In [None]:
# Plot the per-segment totals computed above (passing df_copy would plot the mean ADR instead)
plt.figure(figsize=(10, 6))
sns.barplot(x='market_segment', y='adr', data=market_segment_revenue, palette='Set2')
plt.title('Revenue Contribution by Market Segment')
plt.xlabel('Market Segment')
plt.ylabel('Total Revenue')
plt.xticks(rotation=45)
plt.show()

#### 1. Why did you pick the specific chart?


I chose a bar chart because it effectively displays the revenue contribution from different market segments. The x-axis represents the market segments, allowing for easy comparison, while the y-axis represents the total revenue generated by each segment. The bar heights provide a visual comparison of revenue contributions, making it easy to identify which segments contribute the most to the overall revenue.

#### 2. What is/are the insight(s) found from the chart?

1. Online Travel Agents (TA): Online TA contributes the highest revenue among all market segments, indicating that a significant portion of bookings comes from online travel agencies.

2. Direct Bookings: Direct bookings also make a substantial contribution to revenue, suggesting that guests booking directly with the hotel contribute significantly to overall revenue.

3. Corporate Bookings: Corporate bookings contribute a considerable amount to revenue, indicating that business travelers play a significant role in revenue generation.

4. Groups: Revenue from group bookings is notable, indicating that group events or tours contribute significantly to overall revenue.

5. Offline Travel Agents (TA/TO): Revenue from offline TA/TO is also significant but slightly lower compared to online TA.

6. Other Segments: Aviation and Complementary segments contribute relatively lower revenue compared to other segments, indicating that they might represent niche or specialized markets.

# 4. Can we visualize the revenue contribution from different booking channels?

In [None]:
# Calculate revenue contribution from each distribution channel
distribution_channel_revenue = df_copy.groupby('distribution_channel')['adr'].sum().reset_index()
print(distribution_channel_revenue)

In [None]:
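# Plot the per-channel totals computed above (passing df_copy would plot the mean ADR instead)
plt.figure(figsize=(10, 6))
sns.barplot(x='distribution_channel', y='adr', data=distribution_channel_revenue, palette='Set2')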
plt.title('Revenue Contribution by Distribution Channel')
plt.xlabel('Distribution Channel')
plt.ylabel('Total Revenue')
plt.xticks(rotation=45)
plt.show()

#### 1. Why did you pick the specific chart?

I chose a bar plot because it effectively compares the revenue contribution across different distribution channels. The x-axis represents the distribution channels, allowing for easy comparison, while the y-axis represents the total revenue generated by each channel. The height of each bar corresponds directly to the revenue contribution, making the data easy to interpret.

#### 2. What is/are the insight(s) found from the chart?

1. TA/TO Dominance: The distribution channel "TA/TO" (Travel Agents/Tour Operators) contributes the most to revenue, with significantly higher revenue compared to other channels. This suggests that a substantial portion of bookings comes from third-party travel agencies or tour operators.

2. Direct Bookings: Direct bookings follow TA/TO as the second-highest contributor to revenue. While their revenue is notably lower than TA/TO, direct bookings still make a significant impact on overall revenue generation.

3. Corporate Contributions: Corporate bookings contribute a considerable amount to revenue, although their contribution is lower compared to TA/TO and direct bookings.

4. GDS and Undefined Channels: The revenue contribution from the GDS (Global Distribution System) channel and the "Undefined" category is relatively minor compared to other channels. This indicates that these channels play a smaller role in revenue generation compared to TA/TO, direct bookings, and corporate bookings.

# Will the gained insights help create a positive business impact for Revenue Insights?
Are there any insights that lead to negative growth? Justify with specific reason.

1. City Hotel ADR Superiority: Knowing that the City Hotel has a higher average daily rate (ADR) compared to the Resort Hotel suggests that the City Hotel may be able to generate more revenue per room. This insight could lead to strategies focused on optimizing room rates and maximizing revenue from each guest stay at the City Hotel.

2. Meal Plan Revenue Optimization: Understanding which meal plans generate the highest revenue, such as Bed & Breakfast (BB) and Half Board (HB), allows hotels to tailor their offerings and marketing efforts to promote these higher-revenue meal plans. This could involve highlighting the benefits of these meal plans, offering attractive packages, or adjusting pricing strategies to encourage more bookings for these plans.

3. Market Segment Revenue Focus: Identifying the market segments that contribute the most revenue, such as Online Travel Agents (TA) and direct bookings, enables hotels to prioritize marketing and sales efforts towards these segments. This might involve strengthening partnerships with online travel agencies, enhancing direct booking channels, or implementing targeted marketing campaigns to attract business from these lucrative segments.

4. Distribution Channel Optimization: Recognizing the dominance of certain distribution channels, such as TA/TO and direct bookings, underscores the importance of channel management strategies. Hotels can focus on optimizing distribution channels that yield the highest revenue by negotiating better terms with third-party agencies, improving direct booking channels, or investing in technology to streamline distribution processes.

# **Cancellation Patterns:**

# 1. How do cancellation rates vary across different months and seasons?

In [None]:
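# Group data by month and calculate cancellation rate, in calendar order
# (groupby alone would sort the month names alphabetically)
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']
monthly_cancellations = df_copy.groupby('arrival_date_month')['is_canceled'].mean().reindex(month_order)
print(monthly_cancellations)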


In [None]:
#plotting
plt.figure(figsize=(10, 6))
monthly_cancellations.plot(marker='o')
plt.title('Cancellation Rate by Month')
plt.xlabel('Month')
plt.ylabel('Cancellation Rate')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()

#### 1. Why did you pick the specific chart?

I chose a line plot to visualize the cancellation rate by month because it effectively displays the trend over time. Line plots are ideal for showing changes in a variable (in this case, cancellation rate) across ordered categories (months). The markers on the line highlight individual data points, making it easy to identify any significant fluctuations or patterns in cancellation rates throughout the year.

#### 2. What is/are the insight(s) found from the chart?

1. Seasonal Variation: The cancellation rate varies across different months of the year, indicating a seasonal pattern. For example, months like August and July have higher cancellation rates, suggesting increased cancellations during peak travel seasons, which typically coincide with summer vacations in many regions. Conversely, months like November and January have lower cancellation rates, possibly due to fewer travel activities during off-peak seasons or holidays.

2. Potential Trends: While there is variation from month to month, there may also be underlying trends or patterns worth exploring further. For instance, there seems to be a general trend of higher cancellation rates during warmer months (e.g., July, August) and lower cancellation rates during colder months (e.g., November, January). However, this trend may vary depending on factors such as geographic location, holidays, and events.

3. Strategic Insights: Understanding the seasonal variation in cancellation rates can help hotels and travel businesses optimize their operations and marketing strategies. For example, during peak travel seasons with higher cancellation rates, hotels may implement flexible booking policies or overbooking strategies to mitigate revenue loss from cancellations. Conversely, during slower months with lower cancellation rates, they may focus on attracting more bookings through targeted promotions or discounts.
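
The question also asks about seasons, which the monthly grouping does not compute directly. A minimal sketch of a seasonal roll-up is shown here. The mapping `SEASON_BY_MONTH` assumes Northern Hemisphere meteorological seasons, which is an assumption and not something stated in the dataset, and the `toy` rows are made up.

```python
import pandas as pd

# Assumed month-to-season mapping (Northern Hemisphere, meteorological seasons)
SEASON_BY_MONTH = {
    'December': 'Winter', 'January': 'Winter', 'February': 'Winter',
    'March': 'Spring', 'April': 'Spring', 'May': 'Spring',
    'June': 'Summer', 'July': 'Summer', 'August': 'Summer',
    'September': 'Autumn', 'October': 'Autumn', 'November': 'Autumn',
}

# Made-up rows standing in for df_copy
toy = pd.DataFrame({
    'arrival_date_month': ['July', 'August', 'August', 'January', 'November', 'April'],
    'is_canceled': [1, 1, 0, 0, 0, 1],
})

# Map each arrival month to its season, then average the 0/1 cancellation flag
toy['season'] = toy['arrival_date_month'].map(SEASON_BY_MONTH)
seasonal_rate = toy.groupby('season')['is_canceled'].mean()

print(seasonal_rate)
```

Because `is_canceled` is a 0/1 flag, its mean within each group is the cancellation rate for that season.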

# 2. Can we visualize the reasons for cancellations (e.g., lead time, previous cancellations) using interactive charts?

In [None]:
# Grouping the data by lead time and previous cancellations and calculating the mean cancellation rate
lead_time_prev_cancel_cancellation_rate = df.groupby(['lead_time', 'previous_cancellations'])['is_canceled'].mean().reset_index()

# Display the calculated cancellation rates
print(lead_time_prev_cancel_cancellation_rate)

In [None]:
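# Create a scatter plot; map the 0/1 flag to text labels so Plotly colors it as two discrete groups
plot_df = df.assign(cancellation_status=df['is_canceled'].map({0: 'Not Canceled', 1: 'Canceled'}))
fig = px.scatter(plot_df, x='lead_time', y='previous_cancellations', color='cancellation_status',
                 hover_data=['lead_time', 'previous_cancellations'],
                 title='Reasons for Cancellations',
                 labels={'lead_time': 'Lead Time', 'previous_cancellations': 'Previous Cancellations'},
                 color_discrete_map={'Not Canceled': 'blue', 'Canceled': 'red'})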

# Update layout
fig.update_layout(xaxis_title='Lead Time',
                  yaxis_title='Previous Cancellations',
                  legend_title='Cancellation Status')

# Show plot
fig.show()

#### 1. Why did you pick the specific chart?


I chose the scatter plot because it allows us to visualize the relationship between lead time, previous cancellations, and cancellation status (whether the booking was canceled or not) in a single plot. The use of color coding helps distinguish between canceled and non-canceled bookings, while the hover data feature provides additional information when interacting with specific data points. This visualization enables us to identify any patterns or trends in cancellation behavior based on lead time and previous cancellations.






#### 2. What is/are the insight(s) found from the chart?

1. Cancellation Distribution: We observe various points representing different combinations of lead time and previous cancellations. Canceled bookings are denoted by red points, while non-canceled bookings are shown in blue.

2. Cancellation Patterns: There seems to be a correlation between lead time, previous cancellations, and the likelihood of cancellation. Bookings with longer lead times and higher numbers of previous cancellations appear to have a higher probability of being canceled.

3. Clusters: Certain clusters of points indicate distinct patterns. For instance, there are clusters of red points (canceled bookings) with high lead times and/or previous cancellations, suggesting areas where cancellations are more prevalent.

4. Outliers: Some outliers might represent unusual cases where bookings with very high lead times or previous cancellations are either always canceled or never canceled.
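
In a scatter plot with this many bookings, points overlap heavily, so the pattern described above can be hard to confirm by eye. One way to check it numerically, sketched here, is to bucket lead time and compute the cancellation rate per bucket. The bin edges and labels are hypothetical choices, and the `toy` rows are made up.

```python
import pandas as pd

# Made-up rows standing in for df
toy = pd.DataFrame({
    'lead_time': [0, 5, 20, 45, 80, 150, 200, 400],
    'is_canceled': [0, 0, 1, 0, 1, 1, 0, 1],
})

# Hypothetical bucket edges in days; pd.cut intervals are right-inclusive by default
bins = [-1, 30, 90, 180, 365, float('inf')]
labels = ['0-30', '31-90', '91-180', '181-365', '366+']
toy['lead_time_bucket'] = pd.cut(toy['lead_time'], bins=bins, labels=labels)

# Cancellation rate (mean of the 0/1 flag) and number of bookings per bucket
bucket_rate = (toy.groupby('lead_time_bucket', observed=False)['is_canceled']
               .agg(['mean', 'count']))

print(bucket_rate)
```

Reporting the count next to the rate helps avoid over-reading buckets that contain only a handful of bookings.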

# 3. What are the trends in cancellation rates for repeat guests versus new guests?

In [None]:
# Categorize guests as repeat or new based on the is_repeated_guest flag
df['guest_type'] = 'New'
df.loc[df['is_repeated_guest'] == 1, 'guest_type'] = 'Repeat'

# Group data by arrival date month and guest type, and calculate cancellation rates
cancellation_rates = df.groupby(['arrival_date_month', 'guest_type'])['is_canceled'].mean().reset_index()
print(cancellation_rates)

In [None]:
# plotting
plt.figure(figsize=(12, 6))
sns.lineplot(data=cancellation_rates, x='arrival_date_month', y='is_canceled', hue='guest_type', marker='o')
plt.title('Cancellation Rates for Repeat Guests vs. New Guests Over Time')
plt.xlabel('Arrival Date Month')
plt.ylabel('Cancellation Rate')
plt.xticks(rotation=45)
plt.grid(True)
plt.legend(title='Guest Type')
plt.show()

#### 1. Why did you pick the specific chart?

A line plot allows us to compare the cancellation rates for different guest types (repeat and new) across various months, providing insights into how these rates change over time. Line plots are effective in displaying trends and patterns over an ordered variable, such as time (months in this case). The lines connecting the data points make it easy to track changes and identify any notable patterns or fluctuations.

#### 2. What is/are the insight(s) found from the chart?

1. Higher Cancellation Rate for New Guests: Across most months, new guests tend to have higher cancellation rates compared to repeat guests. This suggests that guests who are staying for the first time at the hotel are more likely to cancel their bookings, potentially due to uncertainties or changes in plans.

2. Consistent Pattern Across Months: The pattern of higher cancellation rates for new guests holds relatively consistent across different months throughout the year. This indicates that the influence of guest type on cancellation behavior remains relatively stable over time.

3. Variation in Magnitude: While the general trend of higher cancellation rates for new guests is consistent, there is some variation in the magnitude of cancellation rates between different months. For example, in June and April, the difference in cancellation rates between new and repeat guests is particularly pronounced.

4. Repeat Guests Show Lower Cancellation Rates: Repeat guests exhibit lower cancellation rates than new guests across most months. This suggests that guests who have previously stayed at the hotel are more likely to follow through with their bookings, indicating a higher level of commitment or satisfaction with their past experiences.

# 4. How many bookings were cancelled?

In [None]:
# Calculate total number of canceled bookings
total_cancelled_bookings = df['is_canceled'].sum()
print("Total number of canceled bookings:", total_cancelled_bookings)

In [None]:
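# Count the number of canceled and non-canceled bookings
# (sort_index keeps 0 = not canceled first, matching the tick labels used in the plot)
cancelled_counts = df['is_canceled'].value_counts().sort_index()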

print(cancelled_counts)

In [None]:
# Plotting
plt.figure(figsize=(6, 6))
cancelled_counts.plot(kind='bar', color=['green', 'red'])
plt.title('Cancelled Bookings')
plt.xlabel('Cancellation Status')
plt.ylabel('Number of Bookings')
plt.xticks([0, 1], ['Not Canceled', 'Canceled'], rotation=0)
plt.show()

#### 1. Why did you pick the specific chart?

I chose a bar chart because it effectively displays the comparison between canceled and non-canceled bookings using distinct bars for each category. The use of contrasting colors (green for non-canceled and red for canceled) makes it easy to differentiate between the two categories. The chart's simplicity and clarity make it suitable for quickly understanding the distribution of canceled bookings in the dataset.

#### 2. What is/are the insight(s) found from the chart?


The chart shows that 44,224 bookings were canceled and 75,166 were not. Cancellation is therefore a common occurrence in the dataset, affecting a significant portion of the bookings.
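
To put the two counts above on a common scale, the share of canceled bookings can be worked out directly. This small sketch uses only the two figures quoted in the insight; the variable names are illustrative.

```python
# Counts quoted in the insight above
canceled = 44_224
not_canceled = 75_166

total = canceled + not_canceled
rate = canceled / total

print(f"{canceled:,} of {total:,} bookings canceled ({rate:.1%})")
# prints: 44,224 of 119,390 bookings canceled (37.0%)
```

On the full DataFrame, `df['is_canceled'].mean()` gives the same proportion, because the column is a 0/1 flag.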

# 5. What are cancellations by various market segments?

In [None]:
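# Group data by market segment and cancellation status, then count the number of bookings
# (fill_value=0 avoids NaN for segments that have no bookings in one of the two statuses)
cancellation_by_market_segment = df.groupby(['market_segment', 'is_canceled']).size().unstack(fill_value=0)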

print(cancellation_by_market_segment)

In [None]:
# Plotting
cancellation_by_market_segment.plot(kind='bar', stacked=True, figsize=(10, 6))
plt.title('Cancellations by Market Segment')
plt.xlabel('Market Segment')
plt.ylabel('Number of Bookings')
plt.xticks(rotation=45)
plt.legend(title='Cancellation Status', labels=['Not Canceled', 'Canceled'])
plt.show()

#### 1. Why did you pick the specific chart?

I chose a stacked bar chart because it effectively visualizes the distribution of cancellations across different market segments while also allowing for a comparison between canceled and not canceled bookings within each segment. The stacked bars make it easy to see the total number of bookings in each segment as well as the proportion of cancellations within those segments. This visualization helps in understanding which market segments have higher cancellation rates and provides insights into potential areas for improvement in managing cancellations.

#### 2. What is/are the insight(s) found from the chart?

1.The "Groups" market segment has the highest number of cancellations, with 12,097 bookings canceled out of 19,811 total bookings.

2.The "Online TA" segment also has a significant number of cancellations, with 20,739 bookings canceled out of 56,477 total bookings.

3.The "Offline TA/TO" segment has a high number of cancellations as well, with 8,311 bookings canceled out of 24,219 total bookings.

4.Other segments such as "Corporate" and "Direct" also show notable cancellation numbers, with 992 and 1,934 bookings canceled, respectively.

5.The "Undefined" segment has a negligible number of cancellations, with only 2 bookings canceled out of an unspecified total.
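
Because segment sizes differ widely, raw cancellation counts and cancellation rates can rank segments differently. The following is a minimal sketch that recomputes the rates from the counts quoted above; the small `counts` table is rebuilt by hand for illustration and is not read from the notebook's `df`.

```python
import pandas as pd

# Counts quoted in the insights above (canceled, total) for three segments
counts = pd.DataFrame(
    {"canceled": [12097, 20739, 8311], "total": [19811, 56477, 24219]},
    index=["Groups", "Online TA", "Offline TA/TO"],
)

# Cancellation rate in percent, sorted from highest to lowest
counts["cancel_rate_pct"] = (counts["canceled"] / counts["total"] * 100).round(1)
print(counts.sort_values("cancel_rate_pct", ascending=False))
```

Sorting by `cancel_rate_pct` puts "Groups" first, while sorting by `canceled` would put "Online TA" first.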

# 6.What are cancellations by various distribution channels?

In [None]:
# Group data by distribution channel and cancellation status, and count the number of bookings
cancellations_by_channel = df.groupby(['distribution_channel', 'is_canceled']).size().unstack(fill_value=0)

# Reset index to make the DataFrame suitable for visualization
cancellations_by_channel.reset_index(inplace=True)

# Rename columns for clarity
cancellations_by_channel.columns.name = None  # Remove column index name
cancellations_by_channel.columns = ['Distribution Channel', 'Not Canceled', 'Canceled']

# Display the DataFrame
print(cancellations_by_channel)

In [None]:
# Plotting
plt.figure(figsize=(10, 6))
cancellations_by_channel.plot(kind='bar', x='Distribution Channel', stacked=True,
                              color=['green', 'red'], ax=plt.gca())
plt.title('Cancellations by Distribution Channel')
plt.xlabel('Distribution Channel')
plt.ylabel('Number of Bookings')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Cancellation Status', labels=['Not Canceled', 'Canceled'])
plt.tight_layout()
plt.show()

#### 1. Why did you pick the specific chart?

I chose a stacked bar chart because it shows the cancellation status (canceled vs. not canceled) for each distribution channel. Stacking the bars makes it easy to compare the total number of bookings along with the proportion of canceled bookings within each channel, giving a clear picture of cancellation patterns across distribution channels.

#### 2. What is/are the insight(s) found from the chart?

1.TA/TO (Travel Agents/Tour Operators): This channel has the highest number of both canceled and not canceled bookings, with a significant number of cancellations compared to other channels.

2.Direct Bookings: While direct bookings have fewer cancellations compared to TA/TO bookings, they still have a notable number of cancellations.

3.Corporate Bookings: Corporate bookings show a relatively lower number of cancellations compared to TA/TO and direct bookings.

4.GDS (Global Distribution System): The GDS channel has a minimal number of cancellations compared to other channels, indicating a lower cancellation rate.

5.Undefined Channel: The undefined channel has a negligible number of bookings, with a small proportion being canceled.
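
The stacked counts above are dominated by the largest channel, which makes proportions hard to compare for the smaller ones. One possible complement, sketched here on hypothetical toy data rather than the notebook's `df`, is to normalize each channel to shares with `pd.crosstab(..., normalize="index")`.

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's df
toy = pd.DataFrame({
    "distribution_channel": ["TA/TO", "TA/TO", "TA/TO", "TA/TO",
                             "Direct", "Direct", "Corporate", "GDS"],
    "is_canceled": [1, 1, 0, 0, 0, 1, 0, 0],
})

# normalize="index" divides each row by its total, giving shares per channel
share = pd.crosstab(toy["distribution_channel"], toy["is_canceled"], normalize="index")
share.columns = ["Not Canceled", "Canceled"]  # columns are ordered 0, 1
print(share)
```

Calling `share.plot(kind='bar', stacked=True)` on such a table would give a 100% stacked bar chart in which every channel has the same height.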

# 7.What is the effect of deposit type on cancellations?



In [None]:
# Calculate cancellation rate for each deposit type
deposit_cancellation_rate = df.groupby('deposit_type')['is_canceled'].mean()

# Display the cancellation rates
print(deposit_cancellation_rate)

In [None]:
# Plotting
plt.figure(figsize=(8, 6))
deposit_cancellation_rate.plot(kind='bar', color='darkgreen')
plt.title('Cancellation Rate by Deposit Type')
plt.xlabel('Deposit Type')
plt.ylabel('Cancellation Rate')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

#### 1. Why did you pick the specific chart?

I chose a bar chart to visualize the cancellation rates for different deposit types because it compares multiple categories (deposit types) and their associated cancellation rates directly. Bar charts are commonly used for such comparisons, making the relative differences in cancellation rates across deposit types easy to interpret.

#### 2. What is/are the insight(s) found from the chart?

1.No Deposit: Bookings made without any deposit (No Deposit) have a cancellation rate of approximately 28.4%. This suggests that a significant portion of bookings without a deposit end up being canceled.

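2.Non Refundable: Bookings with a non-refundable deposit have a very high cancellation rate, close to 99.4%. This means that nearly all bookings recorded with a non-refundable deposit were canceled, which is counterintuitive and should be investigated further (for example, by checking how many bookings fall into this category and how they were recorded) before drawing conclusions about deposit policy.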

3.Refundable: Bookings with a refundable deposit have a cancellation rate of around 22.2%. While this cancellation rate is lower compared to non-refundable bookings, it still indicates that a portion of guests with refundable deposits end up canceling their bookings.
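
A rate on its own hides how many bookings it is based on. As a hedged sketch, using hypothetical toy data and category labels rather than the notebook's `df`, the rate and the booking count can be reported side by side with named aggregation:

```python
import pandas as pd

# Hypothetical toy data; the labels and values are illustrative only
toy = pd.DataFrame({
    "deposit_type": ["No Deposit"] * 5 + ["Non Refund"] * 2 + ["Refundable"],
    "is_canceled": [0, 0, 1, 0, 1, 1, 1, 0],
})

# Mean of a 0/1 column is the cancellation rate; count is the group size
summary = toy.groupby("deposit_type")["is_canceled"].agg(
    cancel_rate="mean", bookings="count"
)
print(summary)
```

A very high or very low rate that rests on only a handful of bookings deserves less weight than one backed by thousands.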

# 8.Which hotel has higher booking cancellation rate?

In [None]:
# Calculate total bookings and cancellations for each hotel
total_bookings = df.groupby('hotel')['is_canceled'].count()
total_cancellations = df.groupby('hotel')['is_canceled'].sum()

# Calculate cancellation rates
cancellation_rates = (total_cancellations / total_bookings) * 100
print(cancellation_rates)

In [None]:
# Plotting
plt.figure(figsize=(8, 6))
cancellation_rates.plot(kind='bar', color=['blue', 'orange'])
plt.title('Booking Cancellation Rates by Hotel')
plt.xlabel('Hotel')
plt.ylabel('Cancellation Rate (%)')
plt.xticks(rotation=0)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

#### 1. Why did you pick the specific chart?

I chose a bar chart because it compares the cancellation rates of the two hotels directly. Each bar represents one hotel's cancellation rate, so the difference is visible at a glance, and the axis labels and title provide the necessary context. A bar chart is a suitable choice for comparing a single metric across a small number of categories, such as hotel types.

#### 2. What is/are the insight(s) found from the chart?

The chart shows that the City Hotel has a higher booking cancellation rate than the Resort Hotel. Approximately 41.73% of bookings at the City Hotel are canceled, while the cancellation rate at the Resort Hotel is lower, at around 27.76%. Guests are therefore more likely to cancel their bookings at the City Hotel than at the Resort Hotel.

# Will the gained insights help creating a positive business impact for Cancellation Patterns?

1.Seasonal Adaptation: Understanding the seasonal variation in cancellation rates enables hotels to adapt their strategies accordingly. During peak travel seasons with higher cancellation rates, hotels can implement flexible booking policies or overbooking strategies to mitigate revenue loss from cancellations. Conversely, during slower months with lower cancellation rates, hotels may focus on attracting more bookings through targeted promotions or discounts.

2.Operational Optimization: By identifying cancellation trends, hotels can optimize their operations to better manage cancellations. For example, they can adjust staffing levels, inventory management, and resource allocation based on anticipated cancellation patterns, ensuring efficient utilization of resources.

3.Customer Experience Enhancement: Insights into cancellation patterns can help hotels improve the overall customer experience. By analyzing the reasons for cancellations and addressing common pain points, such as rigid cancellation policies or lack of flexibility, hotels can enhance customer satisfaction and loyalty.

4.Revenue Maximization: By strategically managing cancellations, hotels can maximize revenue generation. For example, implementing dynamic pricing strategies or offering incentives for non-refundable bookings can help minimize revenue loss from cancellations and optimize revenue potential.

# **Operational Efficiency:**

# 1.How does the availability of parking spaces correlate with booking volumes and revenue?

In [None]:
# Extracting relevant columns
parking_data = df_copy[['required_car_parking_spaces', 'booking_changes', 'adr']]

# Calculating correlation coefficients
correlation_matrix = parking_data.corr()
print(correlation_matrix)

In [None]:
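# Plot the correlation matrix computed above directly;
# calling .corr() on it a second time would distort the values
plt.figure(figsize=(10,5))
sns.heatmap(correlation_matrix, vmin=-1, vmax=1, annot=True, cmap='coolwarm')
plt.show()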

#### 1. Why did you pick the specific chart?

I chose a correlation heatmap to visualize the relationships between the variables 'required_car_parking_spaces', 'booking_changes', and 'adr'. This heatmap provides a clear and concise overview of the correlation coefficients between these variables, allowing for easy interpretation of their associations.

#### 2. What is/are the insight(s) found from the chart?

1.Required Car Parking Spaces vs. Booking Changes: There is a very weak positive correlation (0.050659) between the number of required car parking spaces and the number of booking changes. This suggests that there is a slight tendency for bookings with more required parking spaces to have more changes, but the relationship is not strong.

2.Required Car Parking Spaces vs. Average Daily Rate (ADR): There is a very weak positive correlation (0.039013) between the number of required car parking spaces and the ADR. This indicates that there is a slight tendency for bookings with more required parking spaces to have a slightly higher ADR, but again, the relationship is not significant.

3.Booking Changes vs. ADR: There is a very weak positive correlation (0.010186) between the number of booking changes and the ADR. This suggests that there is a slight tendency for bookings with more changes to have a slightly higher ADR, but the relationship is minimal.
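
The coefficients quoted above are pairwise Pearson correlations of the raw columns. As a hedged illustration on made-up numbers (not the notebook's `df_copy`), the sketch below shows that calling `.corr()` once gives these pairwise values, whereas calling it a second time on the resulting matrix yields a different matrix that should not be read as the correlations between the variables.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({
    "required_car_parking_spaces": [0, 1, 0, 0, 1, 0, 2, 0],
    "booking_changes": [0, 0, 1, 0, 2, 0, 1, 0],
    "adr": [80, 120, 95, 60, 150, 70, 110, 90],
})

first = toy.corr()     # pairwise Pearson correlations of the columns
second = first.corr()  # correlation of the correlation matrix: not the same thing

print(first.round(3))
print(second.round(3))
print("Identical:", np.allclose(first.values, second.values))
```

Only `first` is the matrix that belongs in the heatmap and in the interpretation.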

# Will the gained insights help creating a positive business impact for Operational Efficiency?

1.Optimizing Parking Space Allocation: Understanding the weak correlation between required car parking spaces and booking changes can help hotel management optimize parking space allocation. While there is a slight tendency for bookings with more required parking spaces to have more changes, this relationship is not strong enough to drive significant operational changes. However, it may still be useful for management to consider when allocating parking spaces to ensure efficient use of available resources.

2.Revenue Management: The weak positive correlation between required car parking spaces and ADR suggests that there may be a slight impact on revenue generation. While bookings with more required parking spaces tend to have a slightly higher ADR, the relationship is not significant enough to drive substantial revenue optimization strategies. Revenue management efforts should focus on more impactful factors affecting ADR, such as room type, seasonality, and pricing strategies.

3.Monitoring Booking Changes: The weak positive correlation between booking changes and ADR indicates that there may be some relationship between the two variables. Monitoring booking changes can provide insights into guest behavior and preferences, which can inform operational decisions such as staffing levels, inventory management, and service offerings. However, the impact on ADR may be minimal, and other factors may have a more significant influence on revenue optimization.

# Correlation Heatmap

In [None]:
# Select the numerical columns to include in the correlation analysis
corr_df_data = df_copy[['lead_time', 'previous_cancellations', 'previous_bookings_not_canceled', 'booking_changes', 'days_in_waiting_list', 'adr',
                        'required_car_parking_spaces', 'total_of_special_requests', 'total_night_stayed']]
# Preview the first rows rather than printing the full DataFrame
print(corr_df_data.head())


In [None]:
# Calculate the correlation matrix and plot it as a heatmap
corr_df = corr_df_data.corr()
plt.figure(figsize=(10,5))
sns.heatmap(corr_df, vmin=-1, vmax=1, annot=True, cmap='coolwarm')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a heatmap because it is a good way to visualize the correlation between the numerical features in the dataset. It provides a clear and concise overview of the pairwise correlations between variables, making it easy to identify patterns and relationships.

##### 2. What is/are the insight(s) found from the chart?

1.Lead Time and Previous Cancellations: There is a positive correlation between lead time and previous cancellations. This suggests that as the lead time increases (i.e., the time between booking and arrival), the likelihood of previous cancellations also increases. It implies that guests who have previously canceled their bookings tend to book further in advance.

2.Booking Changes and Previous Cancellations: There is a weak positive correlation between booking changes and previous cancellations. This indicates that guests who have previously canceled their bookings may be more likely to make changes to their bookings before arrival.

3.Required Car Parking Spaces and Total Special Requests: There is a weak positive correlation between the number of required car parking spaces and the total number of special requests. This suggests that guests who require more car parking spaces may also have more special requests, such as specific room preferences or amenities.

4.Total Night Stayed and ADR (Average Daily Rate): There is no significant correlation between the total number of nights stayed and the average daily rate (ADR). This implies that the length of stay does not strongly influence the ADR, indicating that guests may not receive discounted rates for longer stays.
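
With nine variables the heatmap contains 36 distinct pairs, so it can help to list the strongest ones explicitly. The helper below, `top_correlations`, is a hypothetical name introduced for this sketch and is not part of the notebook; it keeps the upper triangle of the correlation matrix so each pair appears once, then sorts by absolute value. It is shown on toy data.

```python
import numpy as np
import pandas as pd

def top_correlations(frame, n=3):
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = frame.corr()
    # Keep only entries above the diagonal so each pair is listed once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Toy data for illustration only
frame = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [5, 3, 4, 1, 2],
    "d": [1, 0, 1, 0, 1],
})
top = top_correlations(frame, n=2)
print(top)
```

In the notebook this could be applied to `corr_df_data`, assuming that DataFrame holds only numeric columns.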

# **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

**Improve Revenue Optimization:**

1.Dynamic Pricing Strategies: Implement dynamic pricing strategies based on demand fluctuations, seasonal patterns, and booking lead times to optimize revenue generation.

2.Upselling and Cross-selling: Offer personalized upselling and cross-selling options during the booking process to increase average order value and maximize revenue per guest.

3.Promotional Packages: Introduce attractive promotional packages or bundled offers targeting specific market segments to stimulate demand and increase revenue.

4.Enhance Ancillary Revenue: Explore opportunities to enhance ancillary revenue streams, such as offering premium services, experiences, or add-on amenities to guests.

**Enhance Guest Satisfaction:**

1.Personalized Guest Experience: Implement personalized guest experience initiatives by leveraging guest data and preferences to provide tailored services and recommendations.

2.Streamlined Booking Process: Simplify and streamline the booking process to enhance convenience for guests, offering flexible booking options and transparent policies.

3.High-Quality Service Standards: Maintain high-quality service standards across all touchpoints, including pre-arrival communication, on-site services, and post-stay follow-up, to ensure guest satisfaction and loyalty.

4.Feedback Mechanisms: Establish effective feedback mechanisms to capture guest feedback and sentiments, enabling continuous improvement based on guest insights and preferences.

**Maximize Bookings:**

1.Optimized Distribution Channels: Optimize distribution channels by leveraging data analytics to identify high-performing channels and allocate resources effectively to maximize bookings and reach target markets.

2.Strategic Partnerships: Form strategic partnerships with online travel agencies, tour operators, and other relevant partners to expand reach and attract a diverse range of guests.

3.Targeted Marketing Campaigns: Develop targeted marketing campaigns tailored to different market segments, demographics, and booking behaviors to drive demand and increase bookings.

4.Incentives and Loyalty Programs: Offer incentives, discounts, and loyalty programs to incentivize repeat bookings, referrals, and extended stays, thereby maximizing occupancy and revenue.

**Minimize Cancellations:**

1.Flexible Booking Policies: Implement flexible booking policies, including free cancellation options and transparent terms, to reduce barriers to booking and minimize cancellations.

2.Proactive Communication: Proactively communicate with guests through personalized emails or messages, providing relevant information, updates, and incentives to encourage commitment and reduce cancellations.

3.Overbooking Strategies: Implement overbooking strategies based on historical data and demand forecasts to optimize occupancy rates while mitigating the impact of cancellations and no-shows.

4.Cancellation Analysis: Continuously analyze cancellation patterns and reasons to identify trends, anticipate potential cancellations, and implement proactive measures to address underlying issues.

# **Conclusion**

1.  The analysis highlights seasonal variations in booking volumes for both city hotels and resort hotels. Peak seasons, notably during the summer months, drive increased demand for accommodations. City hotels consistently maintain higher booking volumes throughout the year, indicating potential attractiveness for both business and leisure travelers. Conversely, off-peak months present opportunities for targeted marketing efforts to stimulate demand. Understanding these patterns enables hotels to optimize strategies and resources to meet guest expectations and maximize revenue across different seasons.

2. Both city hotels and resort hotels exhibit consistent distributions of bookings across different days of the month, reflecting stable patterns without notable outliers. While fluctuations occur, there is no discernible trend or anomaly, suggesting regular booking activity throughout the month for both hotel types. This consistency provides valuable insights for operational planning and resource allocation, enabling hotels to effectively manage capacity and meet guest demands across various days of the month.

3. The analysis reveals a consistent upward trend in average booking lead time for both city and resort hotels from 2015 to 2017. Resort hotels consistently exhibit higher lead times compared to city hotels throughout these years. Despite the increasing trend, the growth rate remains relatively steady, indicating a gradual shift towards longer advance bookings without significant fluctuations. This insight can inform strategic decisions related to pricing, marketing campaigns, and resource allocation to better accommodate guest booking behaviors and optimize revenue.

4. The analysis highlights that city hotels exhibit a higher total number of bookings compared to resort hotels in the dataset. City hotels, with 53,274 bookings, outpace resort hotels, which have 33,956 bookings. This disparity suggests distinct market segments for each hotel type, with city hotels potentially attracting business travelers and urban tourists, while resort hotels cater to vacationers seeking leisure and relaxation. Understanding these differences can inform targeted marketing strategies and service offerings to better meet the needs of each segment and optimize revenue.

5. The analysis of guest origins underscores the hotel's global reach and diverse clientele. With Portugal, the United Kingdom, France, Spain, and Germany emerging as top guest-originating countries, the hotel has a strong international presence. This insight enables the hotel to tailor its services and marketing strategies to meet the unique needs and preferences of guests from different regions, enhancing guest satisfaction and fostering a welcoming environment for a global audience.

6. The analysis highlights the dominance of transient guests across both city and resort hotels, indicating their significance in the hospitality industry. While city hotels attract a slightly higher number of group guests, resort hotels accommodate a larger proportion of contract guests, potentially due to their settings and amenities. Furthermore, the distribution of transient-party guests underscores the distinct preferences and booking behaviors observed between city and resort accommodations, guiding tailored marketing strategies and service offerings to meet diverse guest needs.

7. The analysis reveals that short-term stays, spanning 1 to 3 nights, are prevalent across both city and resort hotels, reflecting common guest booking preferences. City hotels attract a higher proportion of shorter stays, while resort hotels accommodate a diverse range of durations, with a notable peak at 7 nights, indicative of weeklong vacation stays. This highlights the distinct booking patterns between urban and leisure-oriented accommodations, guiding strategic decisions to optimize room inventory and tailor guest experiences for different stay durations.

8. The analysis underscores the dominance of Travel Agents/Tour Operators (TA/TO) as the primary distribution channel for bookings in both city and resort hotels, highlighting the significant role of third-party intermediaries in the hospitality industry. Additionally, direct bookings and corporate bookings contribute substantially to the distribution channels, indicating diverse booking sources for both types of hotels. However, the presence of a small number of bookings categorized as "Undefined" warrants further investigation to ensure data accuracy and understand any underlying factors influencing this category.

9. The analysis reveals that in both city and resort hotels, the majority of bookings with "No Deposit" originate from the Online TA market segment, indicating a strong preference for this channel among guests who do not require deposits. Direct bookings also contribute significantly, while Non Refund and Refundable deposit types show minimal representation across all market segments, underscoring a consistent trend in deposit preferences across different booking channels.

10. The analysis highlights variations in average lead times across different distribution channels for both city and resort hotels. Direct bookings generally exhibit the shortest lead times, indicating a preference for last-minute reservations or bookings made directly by guests. Corporate bookings follow with slightly longer lead times, suggesting more planned or organized travel arrangements. TA/TO bookings, on the other hand, consistently show the longest lead times, implying a longer booking process often associated with third-party intermediaries.

11. The analysis reveals that "Bed & Breakfast" is the preferred meal type for guests in both city and resort hotels, indicating a preference for meal-inclusive options. While "Half Board" is also popular, it is less favored compared to "Bed & Breakfast." Interestingly, the "Room Only" option has fewer bookings in resort hotels compared to city hotels, suggesting that guests staying at resort accommodations may prioritize meal-inclusive packages. The presence of "Undefined" categories warrants further investigation to ensure accurate categorization and understand potential implications for guest preferences.

12. The analysis indicates that the City Hotel boasts a higher average daily rate (ADR) compared to the Resort Hotel on average. This finding suggests that the City Hotel has the potential to generate more revenue per room in comparison to the Resort Hotel. Such insights into revenue differentials between the two types of accommodations can inform strategic decisions regarding pricing strategies, marketing efforts, and resource allocation to optimize revenue generation for both hotels.

13. The analysis of meal plan revenue reveals that Bed & Breakfast (BB) generates the highest revenue, followed by Half Board (HB) and Self Catering (SC), while Full Board (FB) generates the least revenue. Additionally, bookings categorized as "Undefined" also contribute to revenue. This insight underscores the importance of offering a diverse range of meal plans, particularly those with higher revenue potential like BB and HB, to maximize overall revenue generation. Such findings can guide strategic decisions regarding meal plan offerings and marketing strategies aimed at enhancing revenue optimization for the hotel.

14. The analysis highlights that revenue from Online Travel Agents (TA) is the highest among all market segments, followed closely by Direct Bookings and Corporate Bookings. Group bookings also make a notable contribution to revenue, indicating the significance of group events or tours. While Offline Travel Agents (TA/TO) contribute significantly, revenue from Aviation and Complementary segments is comparatively lower. These insights emphasize the importance of online channels and direct bookings in revenue generation, alongside the significant contribution of corporate and group bookings.

15. The analysis underscores the dominance of the TA/TO distribution channel in revenue generation, indicating a strong reliance on third-party travel agencies and tour operators. Direct bookings follow closely behind, making a significant contribution to overall revenue. While corporate bookings also contribute notably, channels like GDS and Undefined have comparatively minor roles in revenue generation. These insights emphasize the importance of strategic partnerships with travel agents and the direct booking channel in maximizing revenue for the hotel.

16. The analysis reveals a clear seasonal pattern in cancellation rates, with higher rates during peak travel seasons and lower rates during off-peak periods. This insight underscores the importance of strategic planning to adapt to fluctuating demand throughout the year. By implementing flexible booking policies and targeted promotions, hotels can effectively optimize revenue and mitigate the impact of cancellations, ultimately enhancing overall operational efficiency and guest satisfaction.

17. The analysis of cancellation patterns highlights the importance of lead time and previous cancellations in predicting the likelihood of future cancellations. Clusters of data points indicate specific scenarios where cancellations are more prevalent, potentially informing targeted strategies to mitigate cancellations. Identifying outliers can also provide valuable insights into extreme cases that may require special attention or intervention to minimize their impact on overall cancellation rates.

18. The analysis reveals a consistent trend of higher cancellation rates among new guests compared to repeat guests across various months. This underscores the importance of guest segmentation and targeted strategies to address the unique needs and concerns of new guests, potentially through personalized communication or incentive programs to enhance booking commitment. Additionally, the lower cancellation rates among repeat guests highlight the significance of guest loyalty and satisfaction in fostering booking reliability and revenue stability for the hotel.

19. The analysis underscores the prevalence of booking cancellations within the dataset, with 44,224 bookings canceled compared to 75,166 bookings that were not canceled. This highlights the substantial impact of cancellations on the hotel's operations and revenue management. Understanding the factors contributing to cancellations and implementing strategies to mitigate their occurrence, such as flexible booking policies or targeted marketing efforts, could be crucial for optimizing revenue and enhancing overall operational efficiency.

20. The analysis reveals varying cancellation patterns across different market segments. While the "Groups" segment exhibits the highest cancellation rate, other segments such as "Online TA" and "Offline TA/TO" also experience significant cancellations. Understanding these patterns can aid in developing targeted strategies to reduce cancellations and improve overall booking stability, thereby enhancing operational efficiency and revenue optimization.

21. The analysis highlights TA/TO as the dominant channel with both the highest number of bookings and cancellations, suggesting potential areas for targeted intervention to mitigate cancellations. Direct bookings, while fewer in number, still contribute to cancellations, emphasizing the importance of strategies to enhance booking stability across all channels. Understanding these patterns can inform tailored approaches to minimize cancellations and optimize revenue generation effectively.

22. The analysis shows a strong association between deposit type and cancellation rates. Bookings with non-refundable deposits exhibit by far the highest cancellation rate (close to 99.4%), a counterintuitive result that warrants further investigation. Bookings made without any deposit (about 28.4%) or with refundable deposits (about 22.2%) show considerably lower cancellation rates. Deposit type is therefore an important factor in booking stability, but the non-refundable figure should be validated before it informs deposit policy.

23. The comparison between City and Resort Hotels reveals a notable disparity in booking cancellation rates, with the City Hotel experiencing a significantly higher cancellation rate of approximately 41.73%, compared to the Resort Hotel's lower rate of around 27.76%. This insight underscores the importance of understanding factors contributing to cancellation behaviors and implementing strategies tailored to each hotel type to mitigate revenue loss and enhance guest satisfaction.

24. The analysis reveals subtle correlations among required car parking spaces, booking changes, and average daily rates (ADR). While there's a slight tendency for bookings with more parking spaces to experience more changes and possibly higher ADR, these relationships are weak. These insights suggest that factors beyond parking requirements primarily influence booking changes and ADR, underscoring the need for a comprehensive approach to revenue optimization and guest satisfaction.

25. The analysis highlights correlations between various factors impacting guest behavior and hotel operations. While lead time and previous cancellations exhibit a positive relationship, indicating potential booking patterns, weak correlations between booking changes, parking needs, and special requests suggest nuanced guest preferences. Additionally, the lack of correlation between total nights stayed and ADR underscores the complexity of pricing strategies and guest dynamics in revenue optimization efforts.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***