# **Project Name**   - Hotel Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

This project analyzes hotel booking data to understand customer behavior and identify trends that can support decision-making. The dataset, Hotel Bookings.csv, is loaded and preprocessed with Python libraries such as NumPy, Pandas, Matplotlib, Seaborn, and Plotly.

**Data Preprocessing**

Data Cleaning:

*   Missing values in the agent and company columns are filled with the respective column means.
*   Duplicate rows are removed, and rows that still contain missing values are dropped.
*   The cleaned dataset is free from missing values.

Filtering Data:

*   Bookings with no adults, children, or babies are excluded so that only valid bookings are analyzed.

**Analysis and Visualizations**

Best Time to Book a Hotel:

*   Data is split into Resort Hotel and City Hotel categories, considering only non-canceled bookings.
*   The average daily rate (ADR) for each month is calculated for both types of hotels.
*   A function sort_month is used to put the months in calendar order for analysis.
*   Result: The sorted monthly ADRs for both hotel types are plotted to identify the best time of year to book a hotel.

Guest Rush Periods:

*   Monthly guest counts for Resort and City Hotels are calculated and merged.
*   The data is sorted to reflect the chronological order of months.
*   Result: The rush periods, showing the number of guests each month for both hotel types, are identified.

Optimal Length of Stay:

*   The dataset is filtered to include only non-canceled bookings.
*   The total number of nights (weekend and week nights) is calculated.
*   The mean total number of nights is determined to find the optimal length of stay.
*   Result: The optimal length of stay to get the best daily rate is identified.

Predicting High Number of Special Requests:

*   The provided code does not directly address this prediction. Future work could build a predictive model using features such as lead time, previous cancellations, and customer demographics to estimate the likelihood of a high number of special requests.

**Visualizations**

*   Lead Time Distribution: A histogram of lead times shows how far in advance bookings are made and how often each lead time occurs.
*   Monthly Bookings: A bar plot of the number of bookings per month identifies peak booking periods.
*   Cancellation Rates by Booking Type: A count plot of cancellation status split by the number of booking changes offers insight into how booking modifications relate to cancellations.
*   Lead Time vs. Hotel Type: A box plot compares lead times across hotel types, providing insight into the planning behavior of guests.
*   Special Requests vs. Customer Satisfaction: A box plot explores the relationship between the number of special requests and customer satisfaction, using repeat-guest status as a proxy for satisfaction.

**Conclusion**

This project analyzes hotel booking data to reveal booking trends, favorable booking periods, and customer behavior. The visualizations make these patterns easier to understand and help hotel management make informed decisions. Future work could add predictive analytics to further support the decision-making process.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The hospitality industry is highly competitive, with hotels constantly seeking ways to optimize their operations, improve customer satisfaction, and increase profitability. One crucial aspect of this is understanding customer booking behavior and identifying trends that can help in strategic planning. This project aims to analyze hotel booking data to uncover insights that can aid in making informed decisions about pricing, marketing, and resource allocation.

#### **Define Your Business Objective?**

Data Cleaning and Preprocessing:

*   Address missing values in key columns (agent, company) by filling them with appropriate statistical measures.
*   Remove duplicate entries to ensure data integrity.
*   Filter the data to focus on valid bookings with non-zero guests.

Identify the Best Time to Book a Hotel:

*   Determine the average daily rate (ADR) for Resort and City Hotels across different months.
*   Identify the months with the lowest ADR to help customers find the best time to book and help hotel management plan promotions.

Analyze Guest Rush Periods:

*   Calculate the monthly guest counts for Resort and City Hotels.
*   Identify peak periods of guest influx to aid in resource planning and management.

Determine the Optimal Length of Stay:

*   Analyze the total number of nights stayed (combining weekend and week nights) to find the optimal length of stay that offers the best daily rate.

Predict High Number of Special Requests:

*   Develop a framework to predict whether a hotel booking is likely to receive a disproportionately high number of special requests, using features like lead time, booking changes, and customer demographics.

Visualizations:

Create meaningful visualizations to present key findings, including:

*   Distribution of lead times.
*   Monthly booking trends.
*   Cancellation rates by booking type.
*   Relationship between lead time and hotel type.
*   Correlation between special requests and customer satisfaction.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following questions are answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from calendar import month_name

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Dataset Loading

In [None]:
# Load Dataset
df=pd.read_csv('/content/Hotel Bookings.csv')
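
Since the guidelines mention exception handling, the loading step could be wrapped in a small helper. The sketch below is only one possible approach: `load_bookings` is a hypothetical name that does not appear in the original notebook, and the checks shown (file exists, file is not empty) are assumptions about what is worth guarding against.

```python
from pathlib import Path

import pandas as pd


def load_bookings(csv_path):
    """Load the bookings CSV, failing early with a readable message.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    path = Path(csv_path)
    # Fail with a clear message if the file is not where we expect it
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at: {path}")
    try:
        return pd.read_csv(path)
    except pd.errors.EmptyDataError as exc:
        # pandas raises EmptyDataError when the file has no columns to parse
        raise ValueError(f"Dataset at {path} is empty") from exc
```

In this notebook it could be called as `df = load_bookings('/content/Hotel Bookings.csv')`.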

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows,columns=df.shape
print(f'Rows:{rows}')
print(f'Columns:{columns}')

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_count = df.duplicated().sum()
print(f'Duplicate count: {duplicate_count}')
df.drop_duplicates(inplace=True)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
null_values=df.isnull().sum()
print(null_values)
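
The raw null counts can be hard to scan on a wide dataset. As a minimal sketch, the hypothetical helper `missing_summary` below (not part of the original notebook) lists only the columns that have gaps, with their share of missing rows, sorted from worst to best.

```python
import pandas as pd


def missing_summary(frame):
    """Return count and percentage of missing values per column, worst first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only the columns that actually contain missing values
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)
```

Calling `missing_summary(df)` here would show at a glance how much of each affected column would have to be imputed or dropped.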


In [None]:
# Visualizing the missing values
plt.figure(figsize=(12, 8))
sns.heatmap(df.isnull(), cbar=False, cmap='viridis', yticklabels=False)
plt.title('Missing Values Heatmap')
plt.xlabel('Columns')
plt.ylabel('Rows')
plt.show()

### What did you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
# Print columns and their data types
print("Dataset Columns and Data Types:")
print(df.dtypes)


In [None]:
# Dataset Describe
df.describe()

### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# Number of unique values in each column
print(df.nunique())
# Unique values in the 'hotel' column
print("Unique values in 'hotel' column:")
print(df['hotel'].unique())

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
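# Fill missing agent and company values with the column mean.
# Assign the result back: fillna(inplace=True) on a selected column does not
# update df when pandas Copy-on-Write is active (the default from pandas 3.0).
df['agent'] = df['agent'].fillna(df['agent'].mean())
df['company'] = df['company'].fillna(df['company'].mean())
df.drop_duplicates(inplace=True)

# Drop rows that still contain missing values and confirm none remain
df.dropna(inplace=True)
print(df.isnull().sum())

# Inspect the guest-count columns
list_cols = ["children", "adults", "babies"]
for i in list_cols:
    print(f"{i} has unique values as {df[i].unique()}")

# Keep only bookings with at least one guest and use them for the rest of the analysis
no_guests = (df['children'] == 0) & (df['adults'] == 0) & (df['babies'] == 0)
final_data = df[~no_guests]
df = final_data.copy()
print(df.shape)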
#Best time of year to book the hotel
# Convert arrival_date_month to a categorical type with the correct order
df['arrival_date_month'] = pd.Categorical(df['arrival_date_month'],
    categories=['January', 'February', 'March', 'April', 'May', 'June', 'July',
                'August', 'September', 'October', 'November', 'December'],
    ordered=True)

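# Calculate average ADR per month
# (observed=False keeps every month category and avoids a FutureWarning in recent pandas)
avg_adr_per_month = df.groupby('arrival_date_month', observed=False)['adr'].mean().reset_index()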

# Plot the average ADR per month
plt.figure(figsize=(12, 6))
sns.barplot(x='arrival_date_month', y='adr', data=avg_adr_per_month, palette='viridis')
plt.title('Average Daily Rate (ADR) by Month')
plt.xlabel('Month')
plt.ylabel('Average Daily Rate (ADR)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
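
The project summary describes comparing monthly ADR separately for Resort and City Hotels using only non-canceled bookings, whereas the plot above pools all bookings. A minimal sketch of that split is given below; `monthly_adr_by_hotel` and `MONTH_ORDER` are hypothetical names introduced for illustration, and the column names are the ones already used in this notebook.

```python
from calendar import month_name

import pandas as pd

MONTH_ORDER = list(month_name)[1:]  # 'January' ... 'December'


def monthly_adr_by_hotel(frame):
    """Mean ADR per arrival month and hotel type, non-canceled bookings only."""
    stayed = frame[frame["is_canceled"] == 0]
    # Work with plain strings so an ordered-categorical month column behaves the same
    stayed = stayed.assign(arrival_date_month=stayed["arrival_date_month"].astype(str))
    table = stayed.pivot_table(
        index="arrival_date_month", columns="hotel", values="adr", aggfunc="mean"
    )
    # Put months in calendar order; months absent from the data become NaN rows
    return table.reindex(MONTH_ORDER)
```

For example, `monthly_adr_by_hotel(df).plot(marker='o')` would draw one line per hotel type, which makes the cheapest months for each type easier to compare.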


In [None]:
#let's analyze the optimal length of stay to get the best daily rate.
# Calculate length of stay (total nights)
df['length_of_stay'] = df['stays_in_weekend_nights'] + df['stays_in_week_nights']

# Calculate average ADR by length of stay
avg_adr_by_stay = df.groupby('length_of_stay')['adr'].mean().reset_index()

# Plot the average ADR by length of stay
plt.figure(figsize=(12, 6))
sns.lineplot(x='length_of_stay', y='adr', data=avg_adr_by_stay, marker='o')
plt.title('Average Daily Rate (ADR) by Length of Stay')
plt.xlabel('Length of Stay (nights)')
plt.ylabel('Average Daily Rate (ADR)')
plt.grid(True)
plt.tight_layout()
plt.show()
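
Very long stays are usually represented by only a handful of bookings, so their average ADR can be noisy. The sketch below restricts the comparison to non-canceled bookings, as the project summary describes, and drops stay lengths with too few bookings. The helper name `adr_by_length_of_stay` and the `min_bookings` threshold of 30 are assumptions made for illustration, not values from the original analysis.

```python
import pandas as pd


def adr_by_length_of_stay(frame, min_bookings=30):
    """Mean ADR and booking count per total nights, non-canceled bookings only."""
    stayed = frame[frame["is_canceled"] == 0]
    nights = stayed["stays_in_weekend_nights"] + stayed["stays_in_week_nights"]
    stats = stayed["adr"].groupby(nights).agg(["mean", "size"])
    stats = stats.rename(columns={"mean": "mean_adr", "size": "bookings"})
    stats.index.name = "length_of_stay"
    # Ignore stay lengths backed by too few bookings to give a stable average
    return stats[stats["bookings"] >= min_bookings]
```

Sorting the result with `.sort_values('mean_adr')` would list the stay lengths with the lowest average daily rate first.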


In [None]:
# Check the unique values in the 'hotel' column
print(df['hotel'].unique())

# Calculate the average number of special requests for each hotel type
avg_special_requests = df.groupby('hotel')['total_of_special_requests'].mean().reset_index()

# Plotting
plt.figure(figsize=(10, 6))
sns.barplot(x='hotel', y='total_of_special_requests', data=avg_special_requests)
plt.title('Average Number of Special Requests by Hotel Type')
plt.xlabel('Hotel Type')
plt.ylabel('Average Number of Special Requests')
plt.show()



### What all manipulations have you done and insights you found?

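Manipulations performed: missing values in agent and company were filled with the column means, duplicate rows were removed, and rows with any remaining missing values were dropped. Bookings with zero adults, children, and babies were identified and excluded. arrival_date_month was converted to an ordered categorical so that months plot in calendar order, and a length_of_stay column was derived as the sum of weekend and week nights.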

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(10,6))
sns.histplot(df['lead_time'],bins=30,kde=True)
plt.title('Distribution of lead time')
plt.xlabel('Lead Time')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

I chose this histogram with a kernel density estimate (KDE) overlay for visualizing the distribution of lead time because it offers several key advantages:

*   Clear Distribution View
*   Smoothed Density Estimation
*   Pattern Identification
*   Outlier Detection
*   Comprehensive Insight

##### 2. What is/are the insight(s) found from the chart?

Distribution Insight: The histogram provides a clear view of how lead time values are distributed across the range, showing the frequency of occurrences within specified bins.

Comprehensive Insight: By using both the histogram and KDE, we gain a more comprehensive view of the data, facilitating better interpretation and insights that can inform business decisions or further statistical analysis.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. Lead time here is the number of days between booking and arrival, so understanding its distribution can help the hotel forecast demand, plan staffing, and time promotions. A potentially negative finding would be a large share of bookings made either very far in advance or at the last minute: the former leaves more time for plans to change and may raise cancellation risk, while the latter makes occupancy harder to predict. Identifying these patterns allows the business to address them proactively, for example through deposit or cancellation policies, and potentially turn them into opportunities for improvement.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
df['arrival_date_month'] = pd.Categorical(df['arrival_date_month'],
                                            categories=['January', 'February', 'March', 'April', 'May', 'June',
                                                        'July', 'August', 'September', 'October', 'November', 'December'],
                                            ordered=True)
monthly_bookings = df['arrival_date_month'].value_counts().sort_index()
plt.figure(figsize=(12, 6))
sns.barplot(x=monthly_bookings.index, y=monthly_bookings.values)
plt.title('Number of Bookings per Month')
plt.xlabel('Month')
plt.ylabel('Number of Bookings')
plt.show()


##### 1. Why did you pick the specific chart?

I chose a bar chart because it effectively displays the number of bookings per month, allowing for easy comparison of booking volumes across different months.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that certain months have higher booking volumes compared to others, indicating seasonal trends or periods of peak demand throughout the year.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, understanding seasonal booking patterns can help businesses optimize staffing, pricing strategies, and marketing efforts to capitalize on peak periods, thereby potentially increasing revenue.

If the chart reveals unexpected dips or fluctuations in booking numbers for crucial months, it could lead to negative growth by impacting revenue projections and operational planning. For instance, a significant drop in bookings during peak seasons could indicate missed opportunities or challenges in attracting customers during critical periods.
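
Bookings per month do not account for party size, and the project summary also mentions monthly guest counts per hotel type. A minimal sketch of that calculation follows. The names `monthly_guests_by_hotel` and `MONTH_ORDER` are hypothetical, and counting only non-canceled bookings as arrivals is an assumption of this sketch.

```python
from calendar import month_name

import pandas as pd

MONTH_ORDER = list(month_name)[1:]  # 'January' ... 'December'


def monthly_guests_by_hotel(frame):
    """Total guests per arrival month and hotel type, non-canceled bookings only."""
    stayed = frame[frame["is_canceled"] == 0].copy()
    # A guest is any adult, child, or baby on the booking; treat missing counts as 0
    stayed["guests"] = stayed[["adults", "children", "babies"]].fillna(0).sum(axis=1)
    stayed["arrival_date_month"] = stayed["arrival_date_month"].astype(str)
    table = stayed.pivot_table(
        index="arrival_date_month", columns="hotel", values="guests", aggfunc="sum"
    )
    # Calendar order; months with no arrivals show up as NaN rows
    return table.reindex(MONTH_ORDER)
```

`monthly_guests_by_hotel(df).plot(kind='bar')` would then show the rush periods for both hotel types side by side.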

#### Chart - 3

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(10,6))
sns.countplot(x='is_canceled',hue='booking_changes',data=df)
plt.title('Cancellations by Number of Booking Changes')
plt.xlabel('Cancellation Status')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a countplot because it compares the number of canceled and non-canceled bookings for each number of booking changes. This type of plot suits categorical data and shows how booking volumes are spread across these conditions.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how many bookings were canceled or kept at each number of booking changes. Because it displays raw counts rather than rates, comparing the bar heights within each cancellation status gives only an indication of whether bookings with changes are canceled more or less often than those without.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, understanding these cancellation patterns can help businesses optimize their booking policies and procedures. It allows them to identify scenarios where cancellations are more likely, enabling proactive measures to reduce cancellations and improve revenue stability.

If the chart reveals that bookings with frequent changes have significantly higher cancellation rates, it could lead to negative growth by impacting revenue stability and operational efficiency. High cancellation rates in such scenarios might indicate dissatisfaction or uncertainty among customers, potentially affecting the business's reputation and financial outcomes.
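
A count plot shows volumes, so the actual cancellation rate per number of booking changes has to be computed separately. The sketch below is one way to do it. `cancellation_rate_by_changes` is a hypothetical helper, and pooling change counts of `cap` or more into a single bucket (default 3) is an assumption chosen to keep sparse groups from dominating the table.

```python
import pandas as pd


def cancellation_rate_by_changes(frame, cap=3):
    """Share of canceled bookings per number of booking changes.

    Change counts of `cap` or more are pooled into one '<cap>+' bucket.
    """
    changes = frame["booking_changes"].clip(upper=cap)
    # Label each booking '0', '1', ... and pool the tail into e.g. '3+'
    labels = changes.astype(int).astype(str).where(changes < cap, f"{cap}+")
    rate = frame["is_canceled"].groupby(labels).agg(["mean", "size"])
    return rate.rename(columns={"mean": "cancel_rate", "size": "bookings"})
```

The `bookings` column is kept alongside the rate so that rates based on very few bookings can be read with appropriate caution.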

#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(10,6))
sns.boxplot(x='hotel',y='lead_time',data=df)
plt.title('Lead Time vs. Hotel Type')
plt.xlabel('Hotel Type')
plt.ylabel('Lead Time')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a boxplot because it effectively displays the distribution of lead time across different types of hotels. This type of plot is useful for comparing the central tendency, spread, and potential outliers of lead time data between different categories (in this case, types of hotels).

##### 2. What is/are the insight(s) found from the chart?

The boxplot reveals the median lead time, quartiles, and any outliers for each hotel type. It shows whether lead times vary significantly between different types of hotels and if there are any notable differences in booking patterns.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, understanding how lead times vary across different hotel types can help businesses tailor their operational strategies accordingly. For example, it can inform staffing levels, resource allocation, and customer service expectations based on typical lead time patterns for each type of hotel.

If the boxplot indicates that one hotel type consistently has longer lead times than the other, this could contribute to negative growth if not managed. Bookings made far in advance leave more time for guests' plans to change, which may mean higher cancellation rates and lost revenue from rooms that are held and then released. Addressing such patterns proactively, for example through deposit or cancellation policies, helps limit the impact on growth and profitability.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(10,6))
sns.boxplot(x='total_of_special_requests', y='is_repeated_guest', data=df)
plt.title('Special Requests vs. Customer Satisfaction')
plt.xlabel('Number of Special Requests')
plt.ylabel('Repeated Guest (0 = No, 1 = Yes)')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a boxplot because it effectively shows the distribution of customer satisfaction ratings (represented by is_repeated_guest) across different levels of special requests. This plot type is suitable for comparing how customer satisfaction varies based on the number of special requests made.

##### 2. What is/are the insight(s) found from the chart?

The boxplot is meant to show whether the number of special requests is related to customer satisfaction, using is_repeated_guest as a proxy. Because is_repeated_guest is a binary flag (0 or 1), the median and quartiles can only fall on 0 or 1, so the plot conveys only coarse information about the share of repeat guests at each level of special requests.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, understanding how customer satisfaction varies based on special requests can help businesses tailor their service offerings. For example, it can inform training programs for staff handling special requests, or influence pricing strategies for packages that include additional services.

If the boxplot indicates that higher numbers of special requests are associated with lower customer satisfaction ratings, it could lead to negative growth if not addressed. This might suggest issues with service delivery, customer expectations management, or the need for improvements in fulfilling special requests effectively. Addressing such insights promptly is crucial to maintain or enhance customer satisfaction levels and ensure positive business growth.
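
Since is_repeated_guest is a 0/1 flag, its mean within a group is simply the share of repeat guests, which is often easier to read than a boxplot of a binary variable. The sketch below computes that share per number of special requests. `repeat_share_by_requests` is a hypothetical helper, and treating repeat-guest share as a satisfaction proxy is the notebook's own assumption rather than something the data confirms.

```python
import pandas as pd


def repeat_share_by_requests(frame):
    """Share of repeat guests and booking count per number of special requests."""
    grouped = frame.groupby("total_of_special_requests")["is_repeated_guest"]
    stats = grouped.agg(["mean", "size"])
    # The mean of a 0/1 flag is the proportion of rows where the flag is 1
    return stats.rename(columns={"mean": "repeat_share", "size": "bookings"})
```

Plotting `repeat_share_by_requests(df)['repeat_share']` as a bar chart would show directly whether the share rises or falls with more requests.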

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain Briefly.

1.   Understanding Special Requests Trends
     *   Insight: The average number of special requests differs between "Resort Hotel" and "City Hotel". If one type receives significantly more special requests, it indicates that guests at this type of hotel may have higher expectations or specific needs.
     *   Action: Tailor services and staff training to better meet these expectations. For example, if "Resort Hotel" guests make more special requests, enhance concierge services and ensure prompt handling of requests.
2.   Enhancing Guest Experience
     *   Insight: By identifying the common types of special requests (e.g., extra pillows, early check-in), the hotel can proactively offer these services or make them more easily accessible.
     *   Action: Create pre-arrival questionnaires for guests to fill out their preferences and requests. Implement a more flexible check-in/check-out system and ensure popular requests are accommodated without hassle.
3.   Optimizing Operations
     *   Insight: Understanding the lead time and booking patterns can help in better managing resources and staff.
     *   Action: Use booking data to predict busy periods and allocate resources accordingly. Implement dynamic staffing models where more staff are available during peak times identified from booking patterns.
4.   Personalizing Marketing Efforts
     *   Insight: Market segments and distribution channels give insight into where most bookings come from and what kind of guests are booking.
     *   Action: Develop targeted marketing campaigns for the most lucrative segments. Offer personalized deals and discounts for repeat guests or those booking through specific channels that bring in high-value customers.
5.   Leveraging Technology
     *   Insight: Analysis of special requests and customer preferences can be greatly enhanced with the use of technology.
     *   Action: Implement a CRM system to track guest preferences and requests. Use AI and machine learning models to predict and suggest potential special requests based on past data.
6.   Improving Facilities Based on Feedback
     *   Insight: Feedback and special requests provide direct input from guests about what they value.
     *   Action: Regularly review and analyze feedback and special requests to identify areas for improvement. Invest in facilities that are frequently requested or mentioned in feedback.

# **Conclusion**

The boxplot analysis of special requests vs. customer satisfaction reveals valuable insights:

Insights: It shows how customer satisfaction ratings vary across different levels of special requests. The plot highlights the median satisfaction level, quartiles, and potential outliers for each category of special requests.

Positive Impact: Understanding this relationship can help businesses enhance customer satisfaction by optimizing service delivery for special requests. This includes improving staff training, refining service packages, and better managing customer expectations.

Negative Impact: High numbers of special requests correlating with lower satisfaction ratings could indicate operational inefficiencies or gaps in service delivery. Addressing these issues is crucial to mitigate negative impacts on customer retention and overall business growth.

Overall, leveraging insights from this analysis can empower businesses to tailor their services effectively, thereby enhancing customer satisfaction and fostering positive business outcomes.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***