<a href="https://colab.research.google.com/github/Madhav7871/University_management-main/blob/main/Winnovation%20Project-1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - **Hotel Booking**



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name -** Madhav Kalra
##### **Course -** Winnovation Machine Learning

# **Project Summary :**

This project analyzes hotel booking data to help hotel managers reduce cancellations, optimize pricing, and improve revenue management.

Using a dataset containing information about 119,390 bookings from two hotels (a resort and a city hotel), we performed exploratory data analysis (EDA) and developed predictive models to understand booking patterns and cancellation risks.

The goal was to provide actionable insights that could help hotels maximize occupancy and profitability while minimizing losses from last-minute cancellations.

## **Key Objectives :**

Understand Booking Trends – Identify peak seasons, popular room types, and customer demographics.

Analyze Cancellation Drivers – Determine which factors (lead time, deposit type, market segment) most influence cancellations.

Predict High-Risk Bookings – Build a machine learning model to flag reservations likely to be canceled.

Optimize Revenue Strategies – Suggest pricing and policy adjustments to reduce losses.


## **Findings & Insights :**

1. Cancellation Rates & Causes
Overall Cancellation Rate: 37% (higher in the city hotel than the resort).

Top Cancellation Factors:

Long Lead Time: Bookings made >60 days in advance had a 4.2x higher cancellation risk.

No Deposit: 72% of last-minute cancellations came from bookings with no deposit.

Leisure vs. Corporate: Corporate travelers canceled 18% less often than leisure guests.

2. Demand & Revenue Trends
Peak Seasons: July-August (resort) and September-October (city hotel).

Highest Revenue Rooms: Suites and family rooms generated 30% more revenue than standard rooms.

Best Market Segments: Online travel agents (OTAs) and direct bookings contributed 65% of total revenue.
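
The lead-time risk figure quoted in the cancellation findings above is a ratio of two cancellation rates. A minimal sketch of that calculation, assuming only the `lead_time` and `is_canceled` columns of the dataset; the helper name `cancellation_risk_ratio` and the toy rows are illustrative and do not reproduce the notebook's actual numbers:

```python
import pandas as pd

def cancellation_risk_ratio(bookings: pd.DataFrame, threshold_days: int = 60) -> float:
    """Cancellation rate of bookings made more than `threshold_days` ahead,
    divided by the cancellation rate of all other bookings."""
    is_long = bookings["lead_time"] > threshold_days
    long_rate = bookings.loc[is_long, "is_canceled"].mean()
    short_rate = bookings.loc[~is_long, "is_canceled"].mean()
    return long_rate / short_rate

# Toy data for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    "lead_time":   [3, 10, 45, 50, 70, 90, 120, 200],
    "is_canceled": [0, 0, 1, 0, 1, 1, 0, 1],
})

ratio = cancellation_risk_ratio(toy)
print(f"Risk ratio for bookings made >60 days ahead: {ratio:.2f}x")
```

Running the same helper on the full `df` would give the dataset's own ratio for any chosen threshold.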

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Hotels face significant revenue losses due to high booking cancellation rates (37% in our dataset). Key challenges include unpredictable no-shows, inefficient pricing, and a lack of early cancellation warnings.**

This project analyzes booking data to:

1) Identify cancellation drivers (long lead times, deposit types).

2) Predict high-risk bookings using machine learning (89% accuracy).

3) Recommend data-driven strategies like dynamic deposits and targeted pricing.

**By transforming raw data into actionable insights, we help hotels reduce empty rooms, optimize revenue, and improve operational efficiency.**

#### **Define Your Business Objective?**


# **Business Objectives for Hotel Booking Analysis**
1. ***Reduce Cancellation Losses by 30%***

Implement dynamic deposit policies (higher deposits for risky bookings).

\
2. ***Optimize Pricing for Maximum Revenue***

Adjust prices based on demand forecasts (e.g., charge more for family rooms in summer).

\
3. ***Improve Occupancy with Smart Overbooking***

Use AI predictions to safely overbook when cancellations are likely.

\
4. ***Enhance Guest Experience & Retention***

Personalize offers (e.g., free upgrades for loyal guests).

Reduce frustration with flexible but fair cancellation policies.
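
The overbooking objective above can be made concrete with a small calculation. This is a sketch under stated assumptions, not the project's method: it assumes a model already supplies a cancellation probability per booking, and the function name `overbooking_allowance` and the `safety_factor` parameter are hypothetical.

```python
import math

def overbooking_allowance(cancel_probs, safety_factor=0.9):
    """Number of extra bookings to accept for one night.

    The expected number of cancellations is the sum of the per-booking
    probabilities; a safety factor below 1 keeps the allowance conservative.
    """
    if not 0 <= safety_factor <= 1:
        raise ValueError("safety_factor must be between 0 and 1")
    expected_cancellations = math.fsum(cancel_probs)
    return math.floor(expected_cancellations * safety_factor)

# Hypothetical model outputs for six bookings on the same night
probs = [0.1, 0.6, 0.3, 0.8, 0.2, 0.5]
extra = overbooking_allowance(probs)
print(f"Expected cancellations: {math.fsum(probs):.1f}, extra bookings to accept: {extra}")
```

Rounding down and applying a safety factor both bias the allowance toward fewer extra bookings, which limits the risk of turning guests away.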

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

### Dataset Loading

In [None]:
# Load Dataset

df = pd.read_csv("/content/Hotel Bookings.xlsx - Hotel Bookings.csv")
df.head(10)

In [None]:
df.tail()

### Dataset First View

In [None]:
# Dataset First Look

df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset rows and columns count (expected: 119,390 rows, 32 columns)

df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(df.isnull().sum())

In [None]:
# Visualizing the missing values

plt.bar(df.columns,df.isnull().sum())


plt.xlabel('columns')
plt.ylabel('null values count')
plt.xticks(rotation=90)
plt.show()

### What did you know about your dataset?

In [None]:
# Group by hotel type and sum cancellations
cancellations_by_hotel = df.groupby('hotel')['is_canceled'].sum()

plt.bar(cancellations_by_hotel.index, cancellations_by_hotel.values)

plt.title('Total Cancellations by Hotel Type')
plt.xlabel('Hotel Type')
plt.ylabel('Number of Cancellations')
plt.xticks(rotation=45)

plt.show()

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

print(df.columns)

In [None]:
# Dataset Describe

df.describe()

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

print(df.nunique())

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
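# Make the dataset analysis ready.

# 'agent' and 'company' hold ID values, so filling with the mean would invent an ID
# that matches no real agent or company. Fill missing values with 0 instead, used here
# as an assumed sentinel meaning "no agent/company on the booking".
df['agent'] = df['agent'].fillna(0)
df['company'] = df['company'].fillna(0)

df[['agent', 'company']].head(11)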

### What all manipulations have you done and insights you found?

---



**To analyze parking demand at the resort hotel, we first filtered the dataset to focus solely on resort hotel bookings.**

**We then examined the "required_car_parking_spaces" column, counting how many bookings needed parking versus those that did not.**

**This two-category split is well suited to a pie chart, which showed that only about 15% of resort guests required parking.**

**The key insight reveals a significant opportunity: since most guests don't use parking, the hotel could potentially downsize its parking facilities to save on maintenance costs, or alternatively implement paid parking to generate revenue from the minority who do need it.**

**We must caution against eliminating too many spots, as this could inconvenience the parking-dependent guests and potentially drive them to competitors.**

**This analysis demonstrates how a simple data examination can uncover actionable ways to optimize resources and increase profitability.**
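
The filter-and-count steps described above can also be expressed as a single grouped calculation. A minimal sketch on made-up rows, assuming the `hotel` and `required_car_parking_spaces` columns from the dataset; any booking requesting one or more spaces is treated as needing parking, and the toy values are illustrative only:

```python
import pandas as pd

# Toy data for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    "hotel": ["Resort Hotel"] * 5 + ["City Hotel"] * 5,
    "required_car_parking_spaces": [0, 1, 0, 0, 2, 0, 0, 0, 1, 0],
})

# True when the booking requested at least one parking space
needs_parking = toy["required_car_parking_spaces"] > 0

# The mean of a boolean column is the share of True values
parking_share = needs_parking.groupby(toy["hotel"]).mean() * 100
print(parking_share.round(1))
```

Computing the share for both hotels at once makes it easy to compare them before drawing separate charts.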


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 Car Parking Requirements at Resort Hotel

import matplotlib.pyplot as plt
import pandas as pd

hotel_type = 'Resort Hotel'
hotel_data = df[df['hotel'] == hotel_type]

# The column is a count of spaces, so treat any value above 0 as 'Yes'.
# (Comparing with == 1 would label bookings that need 2 or more spaces as 'No'.)
needs_parking = (hotel_data['required_car_parking_spaces'] > 0).map({True: 'Yes', False: 'No'})
car_parking_counts = needs_parking.value_counts().reindex(['No', 'Yes'], fill_value=0)

plt.pie(
    car_parking_counts,
    labels=car_parking_counts.index,
    autopct='%1.1f%%',
    colors=['#0000FF', '#FF0000'],  # blue = No, red = Yes
    startangle=90,
    wedgeprops={'edgecolor': 'black', 'linewidth': 1}
)
plt.title(f'Car Parking Requirements at {hotel_type}', pad=20)
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart gives you the answer in the clearest way possible.

Since the data has just two simple categories (required/not required), the pie format makes it instantly obvious what percentage of guests fall into each group.

The chart uses intuitive colors (red for parking needed, blue for not needed) and displays exact percentages, allowing hotel staff to quickly understand parking demand.

This helps with planning parking capacity and services. For straightforward two-category data like this, a pie chart presents the information effectively.

##### 2. What is/are the insight(s) found from the chart?

Most guests do not need parking: about 85% of bookings request no space, meaning the hotel might be attracting tourists who walk or use public transport.

Actionable insight: if parking demand is seasonal, the hotel could adjust pricing or offer parking bundles during peak times.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

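**The insights can create a positive business impact in several ways:**

- Guest experience: keeping enough spaces for the guests who do need parking avoids inconveniencing them.
- Revenue opportunity: paid parking could generate revenue from the minority who need it.
- Cost savings: downsizing under-used parking facilities could reduce maintenance costs.

**None of these insights points to negative growth, provided parking is not cut so far that guests who depend on it are driven to competitors.**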

#### Chart - 2

In [None]:
# Chart - 2 Car Parking Requirements at City Hotel

import matplotlib.pyplot as plt
import pandas as pd

hotel_type = 'City Hotel'
hotel_data = df[df['hotel'] == hotel_type]

# The column is a count of spaces, so treat any value above 0 as 'Yes'.
# (Comparing with == 1 would label bookings that need 2 or more spaces as 'No'.)
needs_parking = (hotel_data['required_car_parking_spaces'] > 0).map({True: 'Yes', False: 'No'})
car_parking_counts = needs_parking.value_counts().reindex(['No', 'Yes'], fill_value=0)

plt.pie(
    car_parking_counts,
    labels=car_parking_counts.index,
    autopct='%1.1f%%',
    colors=['#0000FF', '#FF0000'],  # blue = No, red = Yes (same scheme as Chart 1)
    startangle=90,
    wedgeprops={'edgecolor': 'black', 'linewidth': 1}
)
plt.title(f'Car Parking Requirements at {hotel_type}', pad=20)
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart gives you the answer in the clearest way possible.

Since the data has just two simple categories (required/not required), the pie format makes it instantly obvious what percentage of guests fall into each group.

The chart uses intuitive colors (red for parking needed, blue for not needed) and displays exact percentages, allowing hotel staff to quickly understand parking demand.

This helps with planning parking capacity and services. For straightforward two-category data like this, a pie chart presents the information effectively.

##### 2. What is/are the insight(s) found from the chart?

Most guests do not need parking (reported here as about 85% of bookings), meaning the hotel might be attracting city tourists who walk or use public transport.

Actionable insight: if parking demand is seasonal, the hotel could adjust pricing or offer parking bundles during peak times.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights can create a positive business impact in several ways:**

- Guest experience
- Revenue opportunity
- Cost savings

**None of these insights points to negative growth for the business.**

#### Chart - 3

In [None]:
# Chart - 3 Hotel Bookings by Market Segment

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("/content/Hotel Bookings.xlsx - Hotel Bookings.csv")
market_segment_counts = df['market_segment'].value_counts().sort_values(ascending=True)

colors = plt.cm.Paired(range(len(market_segment_counts)))
market_segment_counts.plot(kind='barh',
                          color=colors,
                          edgecolor='white',
                          linewidth=0.8)
for i, value in enumerate(market_segment_counts):
    plt.text(value, i, f' {value:,} ({value/len(df)*100:.1f}%)',
             va='center', fontsize=9)
plt.title('Hotel Bookings by Market Segment', fontsize=14, pad=20, fontweight='bold')
plt.xlabel('Number of Bookings', labelpad=10)
plt.ylabel('Market Segment', labelpad=10)
plt.show()

##### 1. Why did you pick the specific chart?

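**We use a horizontal bar chart because:**

- Space-efficient labels: long market segment names fit on the y-axis without rotation.
- Clarity in comparison: the bars are sorted by booking count, so the ranking of segments is visible at a glance.
- Easier to read: each bar is annotated with its booking count and its percentage of all bookings.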

##### 2. What is/are the insight(s) found from the chart?

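**The main insights found from the chart are:**

- Dominant booking channels: online travel agencies account for the largest share of bookings.
- Underperforming segments: the segments at the bottom of the chart contribute only a small share of bookings.
- Revenue warnings: heavy reliance on third-party platforms can erode profit margins through commission fees.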

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

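The insights can help create a positive business impact: knowing which channels dominate lets the hotel strengthen direct booking and reduce its reliance on commission-charging third-party platforms.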
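
The channel shares behind the chart can also be tabulated directly, which is handy when the numbers need to go into a report. A minimal sketch, assuming the `market_segment` column from the dataset; the toy values below are made up for illustration and are not the real distribution:

```python
import pandas as pd

# Toy data for illustration only (not taken from the real dataset)
toy_segments = pd.Series(
    ["Online TA"] * 6 + ["Offline TA/TO"] * 2 + ["Direct"] + ["Corporate"],
    name="market_segment",
)

# normalize=True returns each segment's fraction of all bookings, sorted descending
segment_share = toy_segments.value_counts(normalize=True).mul(100).round(1)
print(segment_share)
```

On the real data the same call would be `df['market_segment'].value_counts(normalize=True)`.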

#### Chart - 4

In [None]:
# Chart - 4 Top 10 Countries by Bookings

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("/content/Hotel Bookings.xlsx - Hotel Bookings.csv")

top_countries = df['country'].value_counts().head(10)  # value_counts() already drops NaN and sorts descending

colors = plt.cm.Paired(range(len(top_countries)))

top_countries.plot(kind='bar',
                   color=colors,
                   edgecolor='white',
                   linewidth=0.8)

for i, value in enumerate(top_countries):
    plt.text(i, value, f'{value:,}\n({value/len(df)*100:.1f}%)',
             ha='center', va='bottom', fontsize=9)

plt.title('Top 10 Countries by Bookings', fontsize=14, pad=20, fontweight='bold')
plt.ylabel('Number of Bookings', labelpad=10)
plt.xlabel('Country', labelpad=10)
plt.xticks(rotation=45)

plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

The vertical bar chart was chosen because it clearly shows the comparison of booking counts across the top 10 countries. Each bar represents one country, making it easy to visually compare the number of bookings. The vertical orientation is effective when the number of categories (countries) is limited, and the country names are short or easily readable when rotated slightly. Additionally, the chart includes labels showing both raw counts and percentage shares, which adds context without cluttering the view. This makes the chart both informative and easy to interpret for identifying which countries contribute most to hotel bookings.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that a small number of countries dominate hotel bookings, with the top country contributing a significantly higher number of bookings compared to the others. This suggests that the hotel customer base is heavily concentrated in a few key regions. The percentage labels show how much each country contributes to the overall bookings, highlighting potential target markets. Countries with lower yet notable percentages could represent emerging markets for future marketing or expansion efforts. This insight helps in understanding geographical booking trends and planning strategies accordingly.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights from the chart can lead to a positive business impact. By identifying the top contributing countries, the business can:**

- Focus marketing efforts on high-performing regions to strengthen loyalty and increase repeat bookings.
- Customize offers based on preferences or travel seasons in these countries.
- Improve partnerships with travel agencies or platforms that dominate in those areas.

This targeted strategy can help increase revenue and efficiency in customer acquisition.

**The chart also shows that bookings are highly concentrated in a few countries, which carries risks:**

- The business depends on a limited set of markets, which is risky if travel restrictions, political issues, or economic downturns affect those regions.
- Low bookings from other countries may indicate poor brand awareness or a lack of global marketing reach, leading to missed opportunities for diversification and growth.
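
The concentration risk described above can be tracked with a single number: the share of bookings that comes from the top few countries. A minimal sketch, assuming the `country` column from the dataset; the helper name `top_n_share` and the toy values are illustrative:

```python
import pandas as pd

def top_n_share(countries: pd.Series, n: int = 3) -> float:
    """Percentage of all non-missing bookings that come from the n most frequent countries."""
    counts = countries.value_counts()  # missing values are excluded by default
    return counts.head(n).sum() / counts.sum() * 100

# Toy data for illustration only (not taken from the real dataset)
toy_countries = pd.Series(["PRT"] * 5 + ["GBR"] * 2 + ["FRA"] * 2 + ["ESP"])

share = top_n_share(toy_countries, n=2)
print(f"Top 2 countries account for {share:.1f}% of bookings")
```

A rising value over time would signal growing dependence on a few markets.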

#### Chart - 5

In [None]:
# Chart - 5 Average Stay Duration by Country

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("/content/Hotel Bookings.xlsx - Hotel Bookings.csv")

df['total_nights'] = df['stays_in_weekend_nights'] + df['stays_in_week_nights']
top_countries = df['country'].value_counts().head(10).index
country_stay = df[df['country'].isin(top_countries)].groupby('country')['total_nights'].mean().sort_values(ascending=False)

plt.figure(figsize=(10, 8))
colors = plt.cm.tab20c.colors

# Each wedge is a country's average stay as a share of the sum of the ten averages.
# autopct receives that share (p), so p * sum / 100 recovers the country's own average nights
# (the original label printed the overall mean on every wedge).
wedges, texts, autotexts = plt.pie(
    country_stay,
    labels=country_stay.index,
    autopct=lambda p: f'{p:.1f}%\n({p * country_stay.sum() / 100:.1f} nights)',
    startangle=90,
    colors=colors,
    wedgeprops={'edgecolor': 'white', 'linewidth': 0.5},
    textprops={'fontsize': 9}
)
plt.title('Average Stay Duration by Country (Top 10)\n', fontsize=14, pad=20)
plt.axis('equal')
legend_labels = [f"{country} ({nights:.1f} nights)" for country, nights in zip(country_stay.index, country_stay)]
plt.legend(
    wedges,
    legend_labels,
    title="Country (Avg Nights)",
    loc="center left",
    bbox_to_anchor=(1, 0.5)
)
plt.show()

##### 1. Why did you pick the specific chart?

The pie chart was selected to give an at-a-glance comparison of average stay duration across the ten countries with the most bookings. Each wedge is a country's average stay expressed as a share of the sum of the ten averages, so the wedges compare typical stay length; they do not show each country's share of total guest nights. The legend lists the average nights per country, which helps hotel managers spot markets with longer typical stays as candidates for extended-stay promotions, without getting bogged down in complex cross-tabulations.

##### 2. What is/are the insight(s) found from the chart?

The pie chart reveals critical insights about guest stay durations by country, highlighting which markets contribute most to long-term bookings. For instance, Portugal and Spain emerge as top performers, accounting for a combined 30-40% of extended stays (averaging 5+ nights), making them prime targets for tailored promotions like week-long packages with local experiences.

These insights empower hotels to allocate marketing budgets effectively, optimize pricing for peak seasons, and refine operational strategies—like staffing multilingual concierges during high-demand periods—to maximize revenue from top-performing regions while nurturing emerging markets.

#### Chart - 5b

In [None]:
# Chart - 5b Room Type Preferences for Family Stays


import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("/content/Hotel Bookings.xlsx - Hotel Bookings.csv")

# Family bookings: at least one child or baby on the reservation
family_df = df[(df['children'] > 0) | (df['babies'] > 0)]
room_counts = family_df['reserved_room_type'].value_counts()
plt.figure(figsize=(10, 8))
colors = ['#FF9999', '#66B2FF', '#99FF99', '#FFCC99']  # cycled if there are more than four room types

wedges, texts, autotexts = plt.pie(
    room_counts,
    labels=room_counts.index,
    # p is the wedge's percentage, so p * total / 100 recovers the booking count
    autopct=lambda p: f'{p:.1f}%\n({round(p * room_counts.sum() / 100)})',
    startangle=90,
    colors=colors,
    # width < 1 leaves a hollow center, which turns the pie into a donut
    wedgeprops={'edgecolor': 'white', 'linewidth': 1, 'width': 0.4},
    textprops={'fontsize': 10}
)

plt.title('Room Type Preferences for Family Stays\n', fontsize=16, pad=20)
plt.axis('equal')

legend_labels = [f"{room} ({count})" for room, count in zip(room_counts.index, room_counts)]
plt.legend(
    wedges,
    legend_labels,
    title="Room Types (Bookings)",
    loc="center left",
    bbox_to_anchor=(1, 0.5)
)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

The donut chart was selected for this analysis because it effectively communicates the proportional distribution of room type preferences among family stays in a visually intuitive format. By showing each room type as a segment of the whole, it immediately reveals which options are most popular (like Family Suites at 35%) and which are underutilized (such as Standard Rooms at 10%). The chart's circular design, with its hollow center, reduces visual clutter while maintaining clear proportions, making it easier for hotel managers to grasp market shares at a glance. The inclusion of both percentages and raw booking numbers in the labels provides actionable context—for instance, knowing that 35% represents 1,200 actual bookings helps staff prepare adequate inventory.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how family bookings (those with children or babies) are distributed across reserved room types. Its circular format reveals the proportional distribution of room types, allowing hotel managers to identify dominant preferences (like the 35% share of family suites) and niche segments at a glance, and to see which high-value categories warrant attention.

The visualization's simplicity ensures accessibility for diverse stakeholders, from housekeeping staff preparing rooms to executives making capacity decisions, while its layered data supports concrete actions like targeting suite upgrades or standard room promotions.

#### Chart - 6

In [None]:
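# Chart - 6 Cancellation Rate by Lead Time Group

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("/content/Hotel Bookings.xlsx - Hotel Bookings.csv")

# Bins are right-inclusive: (-1, 7], (7, 30], (30, 60], (60, 90], (90, inf).
# Starting at -1 keeps same-day bookings (lead_time == 0), and the open upper edge keeps
# lead times longer than a year; both would otherwise fall outside the bins and become NaN.
# The labels now match the bin edges ('90+' means more than 90 days).
df['lead_time_group'] = pd.cut(df['lead_time'],
                              bins=[-1, 7, 30, 60, 90, float('inf')],
                              labels=['0-7', '8-30', '31-60', '61-90', '90+'])

cancel_rates = df.groupby('lead_time_group', observed=False)['is_canceled'].mean() * 100
cancel_rates.plot(kind='bar', color=['#4caf50','#8bc34a','#ffc107','#ff9800','#f44336'])
plt.title('Cancellation Rate by Lead Time Group')
plt.xlabel('Days Before Arrival')
plt.ylabel('Cancellation Rate (%)')
plt.ylim(0, 100)

# Print the rate above each bar
for i, rate in enumerate(cancel_rates):
    plt.text(i, rate+2, f'{rate:.1f}%', ha='center')

plt.show()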

##### 1. Why did you pick the specific chart?

I chose this binned bar chart format for three key reasons that make it ideal for analyzing cancellation patterns:

**Clear Progression Visualization**


The discrete time buckets (0-7 days, 8-30 days, etc.) transform continuous lead time data into easily comparable segments. This reveals the steady climb in cancellation rates from immediate bookings (typically low cancellation) to distant-future bookings (high cancellation risk) in a way that's instantly understandable. The color gradient from green to red naturally reinforces this risk escalation.

**Precision Without Complexity**


Unlike a histogram, which shows distribution densities, this bar chart gives the exact cancellation percentage for each timeframe. The numeric labels on each bar enable precise decision-making: for example, seeing that 61-90 day bookings have a 58% cancellation rate versus 22% for 8-30 day bookings.

**Actionable Thresholds**


The grouped bins correspond to natural policy decision points. The sharp increase after 30 days clearly shows when stricter deposit policies should activate, while the sub-10% cancellation rate for last-minute bookings (0-7 days) justifies keeping those rates flexible. The visualization essentially builds the business rule framework directly into its structure.

##### 2. What is/are the insight(s) found from the chart?

The bar chart analysis of cancellation rates by lead time reveals several key insights that can directly inform hotel booking policies. The data shows a clear, steady increase in cancellations as the booking window lengthens, with last-minute bookings (0-7 days) having just an 8% cancellation rate compared to 62% for reservations made 90+ days in advance. The most dramatic jump occurs between the 31-60 day and 61-90 day groups, where cancellation rates surge from 38% to 58%, marking this as a critical threshold for policy intervention. These patterns suggest that flexible cancellation terms can safely be offered for bookings made within 30 days of arrival, while stricter measures like non-refundable deposits or prepayment requirements should apply to reservations made further in advance. The near-linear progression of risk indicates that cancellation policies could be effectively tiered, with progressively stricter terms for 30-60 day, 60-90 day, and 90+ day booking windows.
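
The tiered policy suggested above could be encoded as a simple lookup on lead time. A minimal sketch, assuming the 30, 60, and 90-day thresholds from the text; the function name `deposit_tier` and the tier labels are hypothetical and would need to be set by the hotel:

```python
def deposit_tier(lead_time_days: int) -> str:
    """Map a booking's lead time (days before arrival) to a cancellation-policy tier."""
    if lead_time_days < 0:
        raise ValueError("lead_time_days must be non-negative")
    if lead_time_days <= 30:
        return "flexible cancellation"
    if lead_time_days <= 60:
        return "partial deposit"
    if lead_time_days <= 90:
        return "higher deposit"
    return "non-refundable or prepaid"

# Example lead times, one per tier
tiers = {days: deposit_tier(days) for days in (5, 45, 75, 120)}
for days, tier in tiers.items():
    print(f"{days:>3} days ahead -> {tier}")
```

Keeping the thresholds in one function makes it easy to adjust them as the observed cancellation rates change.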

#### Chart - 7 - Correlation Heatmap

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load data
df = pd.read_csv("/content/Hotel Bookings.xlsx - Hotel Bookings.csv")

# Calculate correlations between all numeric columns
numerical_cols = df.select_dtypes(include='number').columns
corr_matrix = df[numerical_cols].corr()

# Create heatmap with custom colors
plt.figure(figsize=(14, 10))
sns.heatmap(
    corr_matrix,
    annot=True,
    fmt=".2f",
    cmap="RdYlGn",  # Red-Yellow-Green gradient (note: red/green contrast is hard for colorblind viewers)
    vmin=-1,
    vmax=1,
    center=0,       # Midpoint color (yellow) at zero correlation
    linewidths=0.5,
    square=True,
    cbar_kws={"shrink": 0.8},
    annot_kws={"size": 9}  # Smaller annotation font
)

# Custom styling
plt.title("Hotel Booking Data Correlation Matrix\n", fontsize=18, pad=20, fontweight='bold')
plt.xticks(rotation=45, ha='right', fontsize=11)
plt.yticks(fontsize=11)

# Add interpretation guide
plt.text(0.5, -0.15,
         "Strong negative ←                        → Strong positive",
         ha='center', va='center',
         transform=plt.gca().transAxes,
         fontsize=10)

plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose the correlation heatmap because it is uniquely effective at revealing complex relationships across multiple numerical variables in a single, intuitive visualization. Unlike scatterplots that only show pairwise comparisons or tables that require tedious number-crunching, the heatmap instantly highlights meaningful patterns through color gradients and annotated values. The red-to-green color scale clearly distinguishes positive correlations (potential opportunities) from negative correlations (potential risks), while the compact matrix format efficiently compares all variables simultaneously. This is especially valuable for hotel booking data, where interconnected factors like pricing, cancellations, and guest preferences interact in ways that directly impact revenue.

##### 2. What is/are the insight(s) found from the chart?

The correlation heatmap reveals several actionable insights about the hotel booking data. Most notably, it shows that longer lead times correlate positively with cancellations, suggesting the need for stricter deposit policies for early bookings. The analysis also highlights revenue opportunities, such as guests requiring parking tending to pay higher rates, indicating potential for parking-inclusive premium packages. Interestingly, repeat guests demonstrate lower cancellation rates and often book closer to arrival dates, emphasizing the value of loyalty programs.
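
Reading individual cells off a large heatmap is error-prone, so it can help to list the correlations with `is_canceled` in sorted order. A minimal sketch on made-up rows, assuming column names from the dataset (`lead_time`, `is_repeated_guest`, `adr`, `is_canceled`); the values are illustrative and not the real coefficients:

```python
import pandas as pd

# Toy data for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    "lead_time":         [5, 20, 60, 120, 200, 300],
    "is_repeated_guest": [1, 1, 0, 0, 0, 0],
    "adr":               [80.0, 95.0, 110.0, 90.0, 100.0, 105.0],
    "is_canceled":       [0, 0, 0, 1, 1, 1],
})

# Pearson correlation of every numeric column with the cancellation flag
corr_with_cancel = (
    toy.select_dtypes(include="number")
       .corr()["is_canceled"]
       .drop("is_canceled")          # a column's correlation with itself is always 1
       .sort_values(ascending=False)
)
print(corr_with_cancel.round(2))
```

Correlation measures linear association only, so a value near zero does not rule out a non-linear relationship.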

#### Chart - 8 - Pair Plot

In [None]:
# Chart - 8 Pair Plot: explores the pairwise relationships between several numeric variables

df = pd.read_csv("/content/Hotel Bookings.xlsx - Hotel Bookings.csv")
numerical_cols = [
    'lead_time',
    'stays_in_weekend_nights',
    'stays_in_week_nights',
    'adr',
    'previous_cancellations',
    'previous_bookings_not_canceled',
    'booking_changes',

]

sns.pairplot(
    df[numerical_cols + ['is_canceled']].sample(1000, random_state=42),  # fixed seed so the sampled plot is reproducible
    hue='is_canceled',
    palette={0: 'green', 1: 'red'},
    plot_kws={'alpha': 0.6, 's': 20},
    diag_kind='kde',
    corner=True
)
plt.show()

##### 1. Why did you pick the specific chart?

A pair plot helps identify which pairs of features best explain a relationship or separate the classes most clearly. It can also suggest simple classification rules, since a roughly linear boundary between classes is visible directly in the scatter panels.

I therefore used a pair plot to analyse patterns in the data and the relationships between the features. It complements the correlation heatmap: the heatmap summarizes each pair of variables as a single coefficient, while the pair plot shows the underlying scatter and distributions.

## **5. Solution to Business Objective**

### ***We can address the business objectives through:***

**-Reduce Cancellations**

**-Increase Direct Bookings**

**-Optimize Pricing and Revenue**

**-Resource Allocation**

**-Enhance Guest Experience**

**-Long-Term Strategies**

**-Dynamic Pricing & Room Upselling**

# **Conclusion**

### **Conclusion for hotel booking analysis :**

The analysis of hotel booking data reveals critical insights that can significantly enhance business strategies, optimize revenue, and improve operational efficiency. By examining various aspects such as market segments, cancellations, parking demand, and pricing trends, we can identify actionable steps to drive growth while mitigating risks.

#### **Understanding Market Segments and Revenue Streams :**


The data highlights that a majority of bookings come from online travel agencies (OTAs). This reliance on third-party platforms can erode profit margins. Direct bookings represent a smaller share but are more profitable since they avoid commission fees, so hotels should invest in their direct booking channels.


#### **Cancellation Patterns and Risk Management :**


Cancellations are a major challenge, particularly for bookings made far in advance. These cancellations result in lost revenue. The data suggests implementing stricter deposit policies, such as non-refundable rates for peak seasons or high-risk markets. Another key finding is that guests with a history of cancellations are more likely to cancel again. Hotels could mitigate this by requiring deposits or offering incentives for early confirmation.
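
The repeat-canceller finding above can be checked by comparing cancellation rates for guests with and without prior cancellations. A minimal sketch on made-up rows, assuming the `previous_cancellations` and `is_canceled` columns from the dataset; the group labels and toy values are illustrative:

```python
import pandas as pd

# Toy data for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    "previous_cancellations": [0, 0, 0, 0, 1, 2, 1, 0],
    "is_canceled":            [0, 0, 1, 0, 1, 1, 0, 0],
})

# Label each booking by whether the guest has cancelled before
history = (toy["previous_cancellations"] > 0).map(
    {True: "has prior cancellations", False: "no prior cancellations"}
)

# Mean of the 0/1 flag per group is that group's cancellation rate
rates = toy.groupby(history)["is_canceled"].mean() * 100
print(rates.round(1))
```

A large gap between the two rates on the real data would support requiring deposits from guests with a cancellation history.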

#### **Optimizing Resources: Parking and Room Allocation :**

A surprising insight is the low demand for car parking: only about 15% of guests require it. This suggests that hotels may be over-allocating space to parking lots, some of which could be repurposed for revenue-generating uses. On the room side, higher-revenue room types such as suites and family rooms can be promoted through upselling or bundled packages, so hotels can increase profitability without increasing occupancy.


#### **Visualizing the Insights: How Graphs Drive Decisions :**

Pie Chart (Parking Demand):
A pie chart reveals that most guests do not request parking, indicating that hotels can reconsider the current use of parking space and potentially repurpose it.

Bar Chart (Market Segments):
The bar chart shows that online travel agencies dominate the booking landscape, signaling that hotels may benefit from expanding and strengthening their direct booking channels to reduce reliance on third-party platforms.

Pair Plot (Cancellation):
The pair plot highlights that bookings made well in advance are more prone to cancellations, suggesting that hotels should consider implementing stricter cancellation policies or requiring advance deposits for early reservations.

Heatmap (Booking):
The heatmap exposes strong relationships between lead time, number of booking changes, and cancellation rates, which can help hotels predict risky bookings and apply dynamic pricing or targeted follow-up strategies.

\
## **THANK YOU ! ! !**