# **Project Name**    -



##### **Project Type**    - EDA/Regression/Classification/Unsupervised
##### **Contribution**    - Individual
##### **Team Member 1 -**
##### **Team Member 2 -**
##### **Team Member 3 -**
##### **Team Member 4 -**

# **Project Summary -**

This project aims to analyze hotel booking data to identify patterns, optimize revenue, and improve operational efficiency. The dataset contains various features such as booking details, customer demographics, room types, and booking behaviors. By exploring these variables, the goal is to derive actionable insights that will help the hotel reduce booking cancellations, optimize pricing strategies, and better serve customers.

Key objectives include:

*   Minimizing Cancellations: By identifying trends that lead to cancellations (e.g., lead time, market segments), the hotel can take proactive measures to encourage confirmed bookings and target repeat guests.
*   Optimizing Pricing: Dynamic pricing strategies based on demand forecasting and customer segmentation will maximize the Average Daily Rate (ADR) and increase overall revenue.
*   Improving Customer Segmentation: Analyzing customer types and booking channels will allow the hotel to tailor offers, marketing strategies, and services for different segments, improving customer experience and loyalty.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Write Problem Statement Here.**

#### **Define Your Business Objective?**

The business objective is to transform the hotel’s operational and financial performance by utilizing predictive analytics and customer behavior insights to deliver targeted, data-driven decisions. This includes:

Increasing Booking Stability: Enhance booking stability by leveraging advanced data analytics to identify trends that predict cancellations and proactively engage with customers, reducing no-show rates and boosting confirmed bookings.

Maximizing Profitability through Personalized Pricing: Use machine learning models to optimize pricing strategies in real-time based on booking trends, lead times, and customer profiles, ensuring revenue maximization for both high and low-demand periods.

Tailoring Guest Experiences for Long-Term Loyalty: Develop a tailored guest experience by using booking history, special requests, and customer segmentation to provide personalized services and offers, increasing satisfaction and fostering repeat business.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

### Dataset First View

In [None]:
# Dataset First Look
df=pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Hotel Bookings.csv")
df

In [None]:
df.columns

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
n_rows, n_columns = df.shape
print("Number of rows:", n_rows)
print("Number of columns:", n_columns)


### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
import missingno as msno
msno.bar(df)



### What did you know about your dataset?

The dataset comprises 119,390 hotel booking records with 32 columns, including details about guests, reservations, and hotel attributes. It features both numerical and categorical data types, with some columns containing missing values. Key columns include hotel type, booking status, lead time, arrival date, number of guests, meal plan, country of origin, market segment, and distribution channel. This dataset is valuable for analyzing booking patterns, cancellation trends, and guest demographics to inform hotel operations and customer satisfaction strategies.
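To see which columns contain missing values and how severe the gaps are, a small helper can report the null count and percentage per column. This is a minimal sketch: the function name `missing_summary` and the toy frame are illustrative assumptions, and in the notebook the helper would be called on `df`.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return null count and null percentage for columns with at least one null."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns that actually have gaps, largest first
    summary = summary[summary["missing_count"] > 0]
    return summary.sort_values("missing_count", ascending=False)


# Toy data for illustration only (not taken from the hotel dataset)
sample = pd.DataFrame({
    "children": [0.0, np.nan, 1.0, 0.0],
    "country": ["PRT", None, None, "GBR"],
    "adr": [75.0, 98.0, 120.5, 60.0],
})
report = missing_summary(sample)
print(report)
```

A percentage view like this helps decide between imputing a column and dropping it.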

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
columns=df.columns
columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

hotel: Type of hotel (e.g., Resort Hotel, City Hotel).

is_canceled: Indicates whether the booking was canceled (1) or not (0).

lead_time: Number of days between booking and arrival.

arrival_date_year: Year of arrival.

arrival_date_month: Month of arrival.

arrival_date_week_number: Week number of arrival.

arrival_date_day_of_month: Day of the month of arrival.

stays_in_weekend_nights: Number of weekend nights reserved.

stays_in_week_nights: Number of week nights reserved.

adults: Number of adults in the booking.

children: Number of children in the booking.

babies: Number of babies in the booking.

meal: Meal plan selected (e.g., Bed & Breakfast, Half Board, Full Board).

country: Country of origin of the guest.

market_segment: Market segment designation (e.g., Travel Agent, Tour Operator).

distribution_channel: Distribution channel used for booking.

is_repeated_guest: Indicates whether the guest is a repeated guest (1) or not (0).

previous_cancellations: Number of previous bookings that were canceled by the guest.

previous_bookings_not_canceled: Number of previous bookings that were not canceled by the guest.

reserved_room_type: Type of room reserved.

assigned_room_type: Type of room assigned.

booking_changes: Number of changes/amendments made to the booking.

deposit_type: Type of deposit made (e.g., No Deposit, Non Refund, Refundable).

agent: ID of the travel agent who made the booking.

company: ID of the company that made the booking.

days_in_waiting_list: Number of days the booking was in the waiting list before being confirmed.

customer_type: Type of customer (e.g., Contract, Group, Transient).

adr: Average Daily Rate (price per day).

required_car_parking_spaces: Number of car parking spaces required.

total_of_special_requests: Number of special requests made by the guest.

reservation_status: Status of the reservation (e.g., Canceled, Check-Out, No-Show).

reservation_status_date: Date of the reservation status.


# **Check Unique Values for each variable.**

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
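# Handling missing values (assign the result back; inplace fillna on a column selection is unreliable in recent pandas)
# 'children' is a count: fill with the column mean (cast to int in a later cell)
df["children"] = df["children"].fillna(df["children"].mean())
# 'agent' and 'company' are ID columns, so a mean is not meaningful; 0 marks "no agent/company"
df["agent"] = df["agent"].fillna(0)
df["company"] = df["company"].fillna(0)
# 'country' is categorical: fill with the most frequent value (mode)
df["country"] = df["country"].fillna(df["country"].mode()[0])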


In [None]:
# Remove duplicates values
df.drop_duplicates(inplace=True)

In [None]:
df.info()

In [None]:
df["reservation_status_date"]=pd.to_datetime(df["reservation_status_date"])


In [None]:
df["children"]=df["children"].astype(int)

In [None]:
df["agent"]=df["agent"].astype(int)

In [None]:
df["company"]=df["company"].astype(int)

In [None]:
df.info()

### What all manipulations have you done and insights you found?

#### **Data Cleaning and Preparation:**

Handling Missing Values: Addressed missing data in 'children', 'country', 'agent', and 'company' by imputation: a numeric fill for 'children', 'agent', and 'company', and the most frequent value for 'country'. No rows were dropped because of missing values.

Duplicate Removal: Dropped exact duplicate rows with `drop_duplicates()`.

Data Type Conversion: Converted 'reservation_status_date' to datetime and cast 'children', 'agent', and 'company' to integers after imputation.

Outlier Detection: Outliers in numerical columns like 'lead_time' and 'adr' are examined in the boxplots and scatter plots of Section 4, because extreme values can skew analyses. No outlier rows are removed in the code above.
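One common way to flag outliers in columns such as 'lead_time' and 'adr' is the 1.5 × IQR rule, the same rule a boxplot uses to place its whiskers. The sketch below is an assumption about how this could be done, not necessarily the method used in this notebook; the helper names and the toy values are hypothetical.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper fences of the k * IQR rule."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return float(q1 - k * iqr), float(q3 + k * iqr)


def flag_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask that is True where a value falls outside the fences."""
    lower, upper = iqr_bounds(values, k)
    return (values < lower) | (values > upper)


# Toy ADR values for illustration only (not taken from the hotel dataset)
adr = pd.Series([80.0, 95.0, 100.0, 105.0, 110.0, 120.0, 5400.0])
mask = flag_outliers(adr)
print(adr[mask])  # the rows that would be flagged
```

Flagging rather than deleting keeps the decision about each extreme value (data entry error or genuine premium booking) explicit.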

#### **Exploratory Data Analysis (EDA):**


Booking Trends: Analyzed booking patterns over time, revealing peak booking periods and seasonal variations.

Cancellation Analysis: Examined cancellation rates across different months and hotel types, highlighting periods with higher cancellation frequencies.

Guest Demographics: Assessed the distribution of guests by country, identifying the most frequent nationalities.

Market Segment Analysis: Evaluated the effectiveness of various market segments (e.g., 'Online TA', 'Offline TA/TO') in generating bookings.

Revenue Insights: Investigated the relationship between 'adr' (Average Daily Rate) and booking cancellations, uncovering that higher rates were associated with increased cancellations.

Length of Stay: Studied the average length of stay, noting that most bookings were for 1-2 nights.

Special Requests: Analyzed the impact of special requests on booking cancellations, finding that guests with special requests had a lower cancellation rate.
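Several of the points above compare cancellation rates across groups (hotel type, month, market segment, special requests). A minimal sketch of that computation follows; the helper name `cancellation_rate_by` and the toy rows are illustrative assumptions, and the numbers do not come from the hotel dataset.

```python
import pandas as pd


def cancellation_rate_by(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Number of bookings and cancellation rate (%) for each value of `column`."""
    grouped = frame.groupby(column)["is_canceled"].agg(
        bookings="count",
        cancel_rate="mean",  # mean of a 0/1 flag is the share of cancellations
    )
    grouped["cancel_rate"] = (grouped["cancel_rate"] * 100).round(1)
    return grouped.sort_values("cancel_rate", ascending=False)


# Toy data for illustration only
sample = pd.DataFrame({
    "market_segment": ["Online TA", "Online TA", "Online TA", "Online TA",
                       "Direct", "Direct", "Corporate", "Corporate"],
    "is_canceled": [1, 1, 1, 0, 0, 1, 0, 0],
})
rates = cancellation_rate_by(sample, "market_segment")
print(rates)
```

In the notebook the same helper could be called with `df` and any categorical column, for example `'hotel'` or `'deposit_type'`.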

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(df['lead_time'], bins=30, kde=True)
plt.title('Distribution of Lead Time')
plt.show()


##### 1. Why did you pick the specific chart?

A histogram with a KDE overlay was chosen to understand the distribution of a numerical variable (lead_time).

##### 2. What is/are the insight(s) found from the chart?

The histogram shows how lead_time (the number of days between booking and arrival) is distributed. For example, if most bookings are made with a short lead time, it could indicate a higher level of last-minute bookings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive: Short lead times could imply that customers are more likely to book closer to the date, indicating a need for last-minute promotional offers.


Negative: If lead times are too short, it might suggest difficulty in forecasting demand or optimizing room availability.
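Whether bookings are mostly last-minute can be checked numerically alongside the histogram. The sketch below reports the median lead time and the share of bookings made within a short window; the 7-day threshold, the helper name `lead_time_profile`, and the toy values are assumptions for illustration.

```python
import pandas as pd


def lead_time_profile(lead_time: pd.Series, short_days: int = 7) -> dict:
    """Median lead time and share (%) of bookings made within `short_days` of arrival."""
    return {
        "median_days": float(lead_time.median()),
        "share_short_pct": round(float((lead_time <= short_days).mean() * 100), 1),
    }


# Toy lead times in days, for illustration only
lead = pd.Series([0, 2, 5, 14, 30, 90, 200, 365])
profile = lead_time_profile(lead)
print(profile)
```

In the notebook this would be called as `lead_time_profile(df['lead_time'])`.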

#### Chart - 2

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(10, 6))
sns.boxplot(x='hotel', y='adr', data=df)
plt.title('ADR Distribution by Hotel Type')
plt.xlabel('Hotel')
plt.ylabel('ADR')
plt.show()


##### 1. Why did you pick the specific chart?

The boxplot was chosen because it shows the distribution, median, and outliers of ADR, making it easy to compare pricing trends across hotel types and detect pricing inconsistencies.


##### 2. What is/are the insight(s) found from the chart?

The boxplot compares the median ADR, the interquartile range, and the outliers for each hotel type, showing which hotel type charges more on a typical night and where prices vary most.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer: Yes. The insights help optimize pricing, forecast revenue, identify upselling opportunities, and improve customer segmentation, leading to increased profitability.

Answer: Yes, a wide ADR spread and low ADR outliers can lead to negative growth. A wide spread indicates inconsistent pricing, which can confuse customers and reduce brand trust. Excessive low ADR outliers might suggest over-reliance on discounts, which can harm profit margins and brand value.
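The spread that the boxplot displays can also be tabulated, which makes the comparison between hotel types easier to quote. A minimal sketch, assuming the helper name `adr_spread_by_hotel` and toy values that are not taken from the dataset:

```python
import pandas as pd


def adr_spread_by_hotel(frame: pd.DataFrame) -> pd.DataFrame:
    """Median, quartiles, and interquartile range of ADR for each hotel type."""
    grouped = frame.groupby("hotel")["adr"]
    summary = pd.DataFrame({
        "median": grouped.median(),
        "q1": grouped.quantile(0.25),
        "q3": grouped.quantile(0.75),
    })
    summary["iqr"] = summary["q3"] - summary["q1"]
    return summary


# Toy data for illustration only
sample = pd.DataFrame({
    "hotel": ["City Hotel"] * 5 + ["Resort Hotel"] * 5,
    "adr": [80.0, 100.0, 120.0, 140.0, 160.0, 50.0, 60.0, 70.0, 200.0, 220.0],
})
spread = adr_spread_by_hotel(sample)
print(spread)
```

A larger `iqr` for one hotel type corresponds to the taller box in the plot.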




#### Chart - 3

In [None]:
# Chart - 3 visualization code
import seaborn as sns
import matplotlib.pyplot as plt

# Bookings per deposit type, split by meal plan via hue
sns.countplot(x='deposit_type', data=df, hue='meal')
plt.title('Bookings by Deposit Type and Meal Type')
plt.show()

# Bookings per deposit type, split by market segment via hue
sns.countplot(x='deposit_type', data=df, hue='market_segment')
plt.title('Bookings by Deposit Type and Market Segment')
plt.show()


##### 1. Why did you pick the specific chart?

The count plot was chosen to visualize the distribution and frequency of categorical data (e.g., meal, market_segment), helping identify which categories are most common in the dataset.

##### 2. What is/are the insight(s) found from the chart?

*   Meal Type & Deposit Type: Customers with no deposit may prefer certain meal options (e.g., BB or Half-Board).
*   Market Segment & Deposit Type: Certain market segments (e.g., Corporate) might be more likely to book with a deposit.
*   Customer Behavior: Patterns show which meal types or market segments are more committed with deposits.
*   Business Focus: High deposit combinations suggest areas for targeted marketing and strategy optimization.




##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Understanding the most common categories allows businesses to target their marketing and optimize offerings for the largest customer groups, improving revenue and customer satisfaction.

Yes, imbalanced booking counts can lead to negative growth.
Reason: Over-reliance on a single category (e.g., a dominant market segment) could make the business vulnerable to shifts in demand, limiting growth opportunities and creating risks if customer preferences change.
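The counts in the two plots can be turned into shares with a cross-tabulation, which makes any dominance of one category explicit. This is a sketch on toy rows that are not taken from the dataset; in the notebook the two columns of `df` would be passed instead.

```python
import pandas as pd

# Toy data for illustration only
sample = pd.DataFrame({
    "deposit_type": ["No Deposit", "No Deposit", "No Deposit",
                     "Non Refund", "Non Refund", "No Deposit"],
    "market_segment": ["Online TA", "Direct", "Online TA",
                       "Groups", "Groups", "Corporate"],
})

# normalize="index" makes each row (deposit type) sum to 1,
# so every cell is the share of that deposit type coming from a segment
shares = pd.crosstab(sample["deposit_type"], sample["market_segment"], normalize="index")
print(shares.round(2))
```

Reading across a row shows how concentrated each deposit type is in a single market segment.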

#### Chart - 4

In [None]:
# Count of bookings by deposit_type

sns.countplot(x='deposit_type', data=df)
plt.title('Bookings by Deposit Type')
plt.show()

##### 1. Why did you pick the specific chart?

 I chose the count plot to visualize the distribution of deposit types in bookings. It clearly shows how many bookings fall under each deposit category, making it easy to compare customer preferences and booking patterns.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals which deposit types (no deposit, refundable, non-refundable) are most popular among customers, indicating preferences and commitment levels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer: Yes. Understanding the preferred deposit type can help optimize pricing models, encourage non-refundable bookings for better cash flow, and design more effective cancellation policies.

Answer: Yes, if the majority of bookings are no deposit, it might indicate low commitment, leading to higher cancellation rates and unstable revenue. Over-reliance on refundable deposits could result in unpredictable cash flow.




#### Chart - 5

Distribution of previous_cancellations

In [None]:

sns.histplot(df['previous_cancellations'], bins=10, kde=True)
plt.title('Distribution of Previous Cancellations')
plt.show()

##### 1. Why did you pick the specific chart?

Answer: The histogram with KDE was chosen to visualize the distribution of previous cancellations. It helps show how often customers cancel bookings and if there are patterns or outliers in cancellation behavior.

##### 2. What is/are the insight(s) found from the chart?

Answer: The chart shows how many customers have no previous cancellations and how often cancellations occur. Peaks at higher cancellation numbers indicate frequent repeat cancellations by certain customers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer: Yes. Understanding previous cancellation patterns allows for better risk management, enabling the business to adjust cancellation policies and target customers with a history of cancellations to reduce potential losses.

Answer: Yes, if there are frequent repeat cancellations, it could lead to revenue loss due to unpredictable booking behavior. This suggests the need for stricter cancellation policies or more careful customer targeting.

#### Chart - 6

**line plot showing the Average Daily Rate (ADR) over time.**

In [None]:
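# Chart - 6 visualization code
# Work on a copy of the two needed columns so df keeps 'reservation_status_date' for later cells
adr_by_date = df[['reservation_status_date', 'adr']].copy()
adr_by_date['reservation_status_date'] = pd.to_datetime(adr_by_date['reservation_status_date'])
# Average ADR per calendar month of the status date; plotting every raw row gives an unreadable line
monthly_adr = adr_by_date.groupby(adr_by_date['reservation_status_date'].dt.to_period('M'))['adr'].mean()
monthly_adr.plot(figsize=(12, 6))
plt.title('Average Daily Rate (ADR) Over Time')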
plt.xlabel('Date')
plt.ylabel('ADR')
plt.show()


##### 1. Why did you pick the specific chart?

Answer: A line plot is used to show how the Average Daily Rate (ADR) changes over time. It is ideal for tracking trends, seasonality, and fluctuations in prices over a period, providing a clear view of price variations.

##### 2. What is/are the insight(s) found from the chart?

Answer:

*   Trends: The plot shows if ADR is increasing or decreasing over time.
*   Peaks & Valleys: It helps identify high-demand periods when ADR is at its peak and low-demand periods when the rate drops.
*   Seasonality: Seasonal patterns can be seen if ADR fluctuates regularly during certain months, indicating price changes based on demand (e.g., holidays, weekends, or off-peak times).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer: Yes, understanding ADR trends helps in pricing strategy, enabling the business to adjust rates according to demand fluctuations, optimize revenue during high-demand periods, and improve profitability during low-demand times.

Answer: Yes, if the ADR shows a consistent decline over time, it could indicate a drop in hotel pricing or a lack of demand, which could signal revenue loss and indicate a need for adjustments in pricing or marketing strategies.

#### Chart - 7

**This scatter plot visualizes the relationship between lead time (the time between booking and arrival) and the Average Daily Rate (ADR).**

In [None]:
# Chart - 7 visualization code
plt.figure(figsize=(10, 6))
sns.scatterplot(x='lead_time', y='adr', data=df)
plt.title('Lead Time vs ADR')
plt.xlabel('Lead Time')
plt.ylabel('ADR')
plt.show()


##### 1. Why did you pick the specific chart?

Answer: A scatter plot is ideal for visualizing the relationship between two continuous variables like lead time and ADR. It helps identify trends, correlations, or outliers in how booking lead time affects the Average Daily Rate.


##### 2. What is/are the insight(s) found from the chart?

Answer:

*   Correlation: The scatter plot shows whether there’s a positive or negative correlation between lead time and ADR.
*   Outliers: It helps spot any outliers, such as extremely high lead time or ADR values that don't fit the general trend.
*   Booking Behavior: It can reveal if customers booking far in advance tend to pay more or less.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer: Yes, by analyzing the relationship between lead time and ADR, businesses can adjust pricing strategies to optimize rates. For example, if longer lead times result in lower ADR, dynamic pricing could be implemented for last-minute bookings to maximize revenue.

Answer: Yes, if the plot shows that longer lead times correlate with significantly lower ADR, it could indicate that customers booking in advance are receiving discounts or less revenue is generated for early bookings. This might require reviewing pricing strategies for early reservations to maintain profitability.
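The direction and strength of the relationship seen in the scatter plot can be quantified with a correlation coefficient. The sketch below computes Pearson's r and Spearman's rho (obtained here as Pearson's r on the ranks); the helper name and the toy values are assumptions and say nothing about the actual dataset.

```python
import pandas as pd


def pearson_and_spearman(x: pd.Series, y: pd.Series) -> tuple[float, float]:
    """Pearson r (linear) and Spearman rho (monotonic) between two series."""
    pearson = x.corr(y)
    # Spearman's rho equals Pearson's r computed on the ranks
    spearman = x.rank().corr(y.rank())
    return float(pearson), float(spearman)


# Toy data: ADR falls as lead time grows, but not along a straight line
lead_time = pd.Series([1, 5, 10, 50, 100, 300])
adr = pd.Series([200.0, 150.0, 140.0, 120.0, 110.0, 105.0])
r, rho = pearson_and_spearman(lead_time, adr)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

Spearman is often the more informative of the two for skewed variables such as lead time, because it only assumes a monotonic relationship.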

#### Chart - 8

In [None]:
# Chart - 8 visualization code
plt.figure(figsize=(8, 6))
df['is_canceled'].value_counts().plot(kind='pie', autopct='%1.1f%%', startangle=90, cmap='Set3')
plt.title('Proportion of Canceled vs Non-Canceled Reservations')
plt.ylabel('')
plt.show()


##### 1. Why did you pick the specific chart?

Answer: A pie chart is chosen because it effectively shows the proportion of canceled vs. non-canceled reservations. It provides a quick and intuitive view of the relative distribution between these two categories.

##### 2. What is/are the insight(s) found from the chart?

Answer:

*   Cancellation Rate: The chart reveals the percentage of reservations that were canceled versus those that were not canceled.
*   Impact on Operations: If a significant portion of bookings are canceled, it could indicate a need to review booking and cancellation policies or improve customer retention strategies.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer: Yes, understanding the proportion of cancellations helps in optimizing reservation systems, enhancing customer engagement, and formulating better cancellation policies to minimize revenue loss.

Answer: Yes, a high percentage of canceled reservations can negatively impact revenue prediction and result in operational inefficiencies. This suggests the need for better control of cancellations through stricter policies or customer retention measures.

#### Chart - 9

**Insights and Analysis for the Special Requests by Reservation Status Boxplot:**

In [None]:
sns.boxplot(x='reservation_status', y='total_of_special_requests', data=df)
plt.title('Special Requests by Reservation Status')
plt.show()

##### 1. Why did you pick the specific chart?

Answer: A boxplot is chosen to visualize the distribution of special requests across different reservation statuses. It helps in identifying the spread, median, and outliers for the number of special requests based on the reservation status.

##### 2. What is/are the insight(s) found from the chart?

Answer:

*   Spread of Special Requests: The chart shows the variation in the number of special requests based on reservation status, highlighting any trends (e.g., higher special requests for certain statuses).
*   Outliers: The boxplot reveals if there are any significant outliers in special requests for particular reservation statuses.
*   Comparison Across Statuses: It helps compare the number of special requests for different reservation statuses (e.g., Canceled, Check-Out, No-Show).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer: Yes, understanding special requests based on reservation status can guide better customer service and operational planning, helping the business prepare resources or offer personalized services for guests with higher special request numbers.

Answer: Yes, if there are outliers with an unusually high number of special requests for certain reservation statuses, this could indicate operational inefficiencies or the need for more staff/resources for handling unique requests, potentially leading to higher costs or customer dissatisfaction.

#### Chart - 10

**Insights and Analysis for the Lead Time vs ADR with Bubble Size Representing Adults Scatter Plot:**

In [None]:
# Chart - 10 visualization code
plt.figure(figsize=(10, 6))
plt.scatter(df['lead_time'], df['adr'], s=df['adults']*10, alpha=0.5)
plt.title('Lead Time vs ADR with Bubble Size Representing Adults')
plt.xlabel('Lead Time')
plt.ylabel('ADR')
plt.show()


##### 1. Why did you pick the specific chart?

Answer: This scatter plot with bubble size was chosen to visualize the relationship between lead time and ADR while also incorporating adults (number of people) as a third dimension through the bubble size. This provides a more detailed understanding of how these three variables interact.

##### 2. What is/are the insight(s) found from the chart?


Answer:

*   Lead Time and ADR Correlation: The scatter plot shows how lead time correlates with ADR (whether longer or shorter booking times result in higher or lower rates).
*   Impact of Number of Adults: The size of the bubbles indicates how the number of adults in the booking influences ADR. Larger bubbles represent more adults, which could be linked to higher or lower ADR depending on the relationship.
*   Patterns and Outliers: Any unusual patterns or outliers, such as a group of bookings with a large number of adults but a low ADR, can be spotted.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer: Yes, understanding how lead time, ADR, and the number of adults interact can help refine pricing strategies. For example, knowing that larger groups (with more adults) tend to pay more could encourage targeted promotions or discounts.

Answer: Yes, if the plot shows that longer lead times correlate with lower ADR, especially for larger groups, it may indicate that early bookings are discounted too much or are not priced appropriately for larger groups. This can reduce profitability and lead to negative growth.

#### Chart - 11

In [None]:
# Chart - 11 visualization
plt.figure(figsize=(12, 6))
sns.boxplot(data=df, x='is_canceled', y='lead_time', palette='muted')
plt.title('Cancellation vs Lead Time')
plt.xlabel('Cancellation Status')
plt.ylabel('Lead Time (days)')
plt.show()


##### 1. Why did you pick the specific chart?

I chose the boxplot because it effectively compares the distribution of lead_time for canceled vs. non-canceled bookings, highlighting the median, spread, and outliers in the data. It's ideal for visualizing differences between categories in a clear and concise way.





##### 2. What is/are the insight(s) found from the chart?

The chart shows that canceled bookings generally have shorter lead times, but there are exceptions with some cancellations happening after longer periods.





##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Better management of bookings and pricing for last-minute cancellations.

Negative Impact: Difficulty in forecasting demand and potential loss from long lead-time cancellations.
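The comparison read off the boxplot can be confirmed with a grouped summary of lead time per cancellation status. This is a minimal sketch on made-up values, which say nothing about the real dataset; in the notebook the same `groupby` would be run on `df`.

```python
import pandas as pd

# Toy data for illustration only
sample = pd.DataFrame({
    "is_canceled": [0, 0, 0, 1, 1, 1],
    "lead_time": [3, 20, 45, 10, 110, 250],
})

# Median is robust to the long right tail that lead time usually has
summary = sample.groupby("is_canceled")["lead_time"].agg(["median", "mean", "count"])
print(summary)
```

Comparing the two medians gives the direction of the difference directly, without relying on a visual reading of the boxes.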





#### Chart - 12

In [None]:
# Chart - 12 visualization code
demographics = df[['adults', 'children', 'babies']].sum()
demographics.plot(kind='pie', autopct='%1.1f%%', figsize=(8, 8), colors=['#66b3ff', '#99ff99', '#ffcc99'])
plt.title('Proportion of Children, Adults, and Babies in Bookings')
plt.ylabel('')
plt.show()


##### 1. Why did you pick the specific chart?

I chose the pie chart because it shows at a glance how the total number of guests is split among adults, children, and babies.

##### 2. What is/are the insight(s) found from the chart?

The chart shows the share of adults, children, and babies among all guests, which indicates how much of the demand comes from families with children compared with adult-only parties.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Knowing the guest mix helps plan room configurations, amenities, and packages for the dominant group.

Negative Impact: If offerings are designed only around the dominant group, smaller groups such as families with babies may be underserved, limiting growth in that segment.





#### Chart - 13

In [None]:
# Chart - 13 visualization code
plt.figure(figsize=(12, 6))
sns.boxplot(data=df, x='arrival_date_month', y='stays_in_weekend_nights', palette='coolwarm')
plt.title('Stays in Weekend Nights by Arrival Month')
plt.xlabel('Arrival Month')
plt.ylabel('Stays in Weekend Nights')
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

I picked the boxplot to show the distribution, median, spread, and outliers of weekend stays for each arrival month.

##### 2. What is/are the insight(s) found from the chart?

The chart shows seasonal trends in weekend stays, highlighting months with higher or lower bookings and identifying any outliers.





##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights can drive better marketing during peak months, boosting bookings. Low weekend stays in certain months may require targeted efforts to avoid negative growth.
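Seasonal patterns are easier to read when the months appear in calendar order, but `arrival_date_month` holds month names as text and is otherwise plotted in order of appearance. The sketch below converts the names to an ordered categorical; the helper name `order_months` and the toy rows are assumptions, and `calendar.month_name` yields English names only under the default locale.

```python
import calendar

import pandas as pd

# 'January' ... 'December' (index 0 of calendar.month_name is an empty string)
MONTH_ORDER = list(calendar.month_name)[1:]


def order_months(months: pd.Series) -> pd.Series:
    """Convert month names to an ordered categorical that sorts by calendar order."""
    return months.astype(pd.CategoricalDtype(categories=MONTH_ORDER, ordered=True))


# Toy data for illustration only
sample = pd.DataFrame({
    "arrival_date_month": ["July", "January", "December", "March"],
    "stays_in_weekend_nights": [2, 1, 0, 2],
})
sample["arrival_date_month"] = order_months(sample["arrival_date_month"])
ordered = sample.sort_values("arrival_date_month")
print(ordered)
```

For the plot itself, passing `order=MONTH_ORDER` to `sns.boxplot` has the same effect on the x-axis.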





#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
import seaborn as sns
import matplotlib.pyplot as plt

# Correlation matrix for numerical features only; without numeric_only=True,
# pandas 2.x raises an error on text columns such as 'hotel'
correlation_matrix = df.corr(numeric_only=True)

# Plot the heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?

The heatmap shows the correlation between different features. Strong positive or negative correlations suggest relationships between variables, which can help inform feature selection or highlight important patterns in the data.





##### 2. What is/are the insight(s) found from the chart?

*   Strong correlations: Features like 'lead_time' and 'stays_in_weekend_nights' or 'adults' and 'children' could show strong relationships, indicating patterns in customer behavior.
*   Weak correlations: Some features might not show strong relationships with others, implying less influence or redundancy between them.
*   Actionable insights: Identifying which features are highly correlated helps focus on key variables for predictive modeling, reducing noise and improving model accuracy.
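With many numeric columns the annotated heatmap gets crowded, so it can help to list the strongest pairs directly. The sketch below extracts them from the correlation matrix; the helper name `top_correlations` and the toy frame are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def top_correlations(frame: pd.DataFrame, n: int = 5) -> pd.Series:
    """Strongest pairwise correlations (by absolute value) among numeric columns."""
    corr = frame.corr(numeric_only=True)
    # Keep the upper triangle only, so each pair appears once and the diagonal is dropped
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    pairs = upper.stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# Toy data for illustration only; 'label' is text and is ignored by corr
sample = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
    "label": list("vwxyz"),
})
top = top_correlations(sample, n=3)
print(top)
```

Pairs near +1 or -1 are candidates for redundancy in later modeling, while pairs near 0 carry little linear information about each other.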




#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
import seaborn as sns
import matplotlib.pyplot as plt

# Selecting relevant columns (you can adjust the columns based on your dataset)
pairplot_data = df[['lead_time', 'stays_in_weekend_nights', 'adults', 'children', 'adr']]

# Create the pair plot
sns.pairplot(pairplot_data)
plt.suptitle('Pair Plot of Selected Features', y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

I picked the pair plot because it visually shows the relationships between multiple numerical features, helping to identify correlations, patterns, and trends. It aids in exploratory data analysis (EDA) and provides a quick overview of how features interact with each other.





##### 2. What is/are the insight(s) found from the chart?

The pair plot reveals correlations between variables, the distribution of each feature, potential outliers, and possible clustering in the data, helping to identify relationships and patterns.





## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Reduce Cancellations: Identify patterns of cancellations (e.g., lead time, customer type) and offer incentives for non-cancellable bookings, while targeting repeat guests to reduce the risk.

Optimize Pricing: Use dynamic pricing based on demand, customer type, and booking channel to maximize ADR, and upsell room upgrades or additional services.

Improve Customer Segmentation: Tailor offers and marketing strategies based on segments (e.g., corporate, leisure), and predict high-risk cancellations for proactive engagement.

# **Conclusion**

In conclusion, by analyzing the booking data and implementing strategies focused on reducing cancellations, optimizing pricing, and improving customer segmentation, the client can effectively increase revenue and improve operational efficiency. Leveraging insights from data such as lead time, market segment, and customer behavior will allow the hotel to make informed decisions, enhance customer experience, and foster long-term loyalty, ultimately achieving their business objectives.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***