# **Project Name**    -



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay in order to get the best daily rate? What if you wanted to predict whether or not a hotel was likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore those questions! This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. All personally identifying information has been removed from the data. Explore and analyse the data to discover important factors that govern the bookings.

# **GitHub Link -**

GitHub Link here :

# **Problem Statement**


- Project Title: "Guest Booking Analysis."
- The objective of the project is to analyze booking information for a city hotel and a resort hotel and to explore the factors that influence hotel bookings.
- The dataset contains important information, including when the booking was made (lead time), arrival date, length of stay, number of adults, children, and babies, and required car parking spaces, enabling a comprehensive insight into the factors that affect hotel bookings.
- Notably, all personally identifiable information has been removed from the dataset to ensure data confidentiality and compliance.
- The analysis uses popular libraries such as Pandas, Matplotlib, Seaborn, and NumPy, which provide robust functionality for data manipulation, visualization, and numerical computation.
- Through this hotel booking analysis, we can identify important patterns, trends, and relationships in the data, helping to better understand customer behavior and booking trends.

#### **Define Your Business Objective?**

The business objective of the "Hotel Booking Analysis" project is to explore booking information for a city hotel and a resort hotel. The analysis aims to address questions such as the best time of year to book hotel rooms, the optimal length of stay for the best daily rate, and predicting whether a hotel is likely to receive a disproportionately high number of special requests. By gaining insights into these factors, the project aims to help businesses in the hospitality industry make data-driven decisions and enhance customer satisfaction.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code, and deployment-ready code will be a plus; students who provide them will be awarded additional credits.

     The additional credits will give an advantage during Star Student selection.

     [ Note: Deployment-ready code means the whole .ipynb notebook should execute in one go without a single error logged. ]

3.   Every piece of logic should have proper comments.
4.   You may add as many charts as you want. For each chart, answer the following questions in this format:
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints: Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load the CSV file from Google Drive into a DataFrame
df = pd.read_csv('/content/drive/MyDrive/EDA PROJECT/Hotel Bookings.csv')

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_count = df.duplicated().sum()
print('Number of duplicate rows : ', duplicate_count)

In [None]:
duplicate_count_columns = df.duplicated(subset=['hotel', 'arrival_date_year']).sum()

# Display the count
print('Number of duplicate rows for columns "hotel" and "arrival_date_year":', duplicate_count_columns)

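This code counts the rows whose combination of 'hotel' and 'arrival_date_year' values has already appeared earlier in the dataset; the duplicated() method with the subset parameter checks for duplicates only within the specified columns. Because there are only two hotel types and three arrival years, almost every row is flagged, so this count says little about genuinely duplicated bookings.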
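To see why the subset count is so much larger than the full-row count, here is a minimal sketch on a hypothetical five-row toy DataFrame (not the real data). A full-row check flags only exact copies, while a subset check flags every repeat of a (hotel, year) pair after its first occurrence, so the subset count always equals the number of rows minus the number of distinct combinations.

```python
import pandas as pd

# Hypothetical toy data standing in for the bookings table
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "arrival_date_year": [2015, 2015, 2015, 2016, 2016],
    "adr": [100.0, 100.0, 85.0, 120.0, 95.0],
})

# Full-row duplicates: only row 1 is an exact copy of row 0
full_dupes = toy.duplicated().sum()

# Subset duplicates: every row after the first of each (hotel, year) pair is flagged
subset_dupes = toy.duplicated(subset=["hotel", "arrival_date_year"]).sum()

# Number of distinct (hotel, year) combinations
n_combos = len(toy.drop_duplicates(subset=["hotel", "arrival_date_year"]))

print(full_dupes, subset_dupes, n_combos)
```

On the full dataset, 2 hotel types × 3 arrival years gives 6 combinations, and 119,390 − 6 = 119,384, which matches the subset count reported later in this notebook.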

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values

sns.heatmap(df.isnull(), cbar = False, cmap = 'viridis')
plt.title('Missing Values Heatmap')
plt.show()

* There are 4 missing values in the "children" column, which represents the number of children included in the booking. These missing values might indicate cases where the number of children was not specified during booking.
* There are 488 missing values in the "country" column, which represents the guest's country of origin. The missing values could be due to instances where the country information was not provided or recorded.
* There are 16,340 missing values in the "agent" column, which represents the ID of the travel agency that made the booking. The missing values might indicate bookings made without the involvement of any specific travel agency.
* There are 112,593 missing values in the "company" column, which represents the ID of the company/entity making the booking or responsible for payment. The missing values may suggest individual bookings without the involvement of any company.
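Raw counts are easier to judge as percentages of all rows. A minimal sketch on a hypothetical toy frame (the same expression should work on `df`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with some missing values (not the real dataset)
toy = pd.DataFrame({
    "children": [0, 1, np.nan, 2],
    "country": ["PRT", None, "GBR", "ESP"],
    "company": [np.nan, np.nan, np.nan, 40.0],
})

# Share of missing values per column, as a percentage, largest first
missing_pct = toy.isnull().mean().mul(100).round(2).sort_values(ascending=False)
print(missing_pct)
```

For example, 112,593 missing values out of 119,390 rows means the "company" column is roughly 94% empty.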

### What did you know about your dataset?




* The dataset contains booking information for both city hotels and resort hotels. It includes various attributes such as lead time and arrival date, length of stay, the number of adults, children, and/or babies, required car parking spaces, meal type, market segment, distribution channel, previous booking and cancellation history, deposit type, customer type, and more.
* The dataset consists of 119,390 rows and 32 columns.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

* **hotel**: The type of hotel (Resort Hotel or City Hotel).
* **is_canceled**: Whether the booking was canceled (1) or not canceled (0).
* **lead_time**: The number of days between the booking date and the arrival date.
* **arrival_date_year**: The year of the arrival date.
* **arrival_date_month**: The month of the arrival date.
* **arrival_date_week_number**: The week number of the arrival date.
* **arrival_date_day_of_month**: The day of the month of the arrival date.
* **stays_in_weekend_nights**: The number of weekend nights (Saturday or Sunday) the guest stays.
* **stays_in_week_nights**: The number of weekday nights (Monday to Friday) the guest stays.
* **adults**: The number of adults included in the booking.
* **children**: The number of children included in the booking.
* **babies**: The number of babies (infants) included in the booking.
* **meal**: The type of meal booked.
* **country**: The country of origin of the guest.
* **market_segment**: The market segment designation (e.g., Online TA, Offline TA/TO, Groups).
* **distribution_channel**: The booking distribution channel (e.g., Direct, TA/TO).
* **is_repeated_guest**: Whether the guest is a repeated guest (1) or not (0).
* **previous_cancellations**: The number of previous bookings that were canceled by the guest.
* **previous_bookings_not_canceled**: The number of previous bookings that were not canceled by the guest.
* **reserved_room_type**: The type of room reserved.
* **assigned_room_type**: The type of room assigned to the guest at check-in.
* **booking_changes**: The number of changes made to the booking before check-in.
* **deposit_type**: The type of deposit made for the booking (e.g., No Deposit, Non-Refundable).
* **agent**: ID of the travel agency that made the booking.
* **company**: ID of the company/entity making the booking or responsible for payment.
* **days_in_waiting_list**: The number of days the booking was on the waiting list before it was confirmed.
* **customer_type**: The type of booking customer (e.g., Transient, Contract).
* **adr**: Average Daily Rate, the average revenue earned per room occupied.
* **required_car_parking_spaces**: The number of car parking spaces required by the guest.
* **total_of_special_requests**: The total number of special requests made by the guest.
* **reservation_status**: The current status of the booking (e.g., Canceled, Check-Out).
* **reservation_status_date**: The date of the last update to the reservation status.


### Check Unique Values for each variable.

In [None]:
# Check the number of unique values for each variable
df.nunique()

In [None]:
unique_values = {column : df[column].unique() for column in df.columns}
print(unique_values)

#### This code provides a comprehensive view of the unique values present in each column, helping us understand the distinct categories of categorical variables as well as the range and spread of numerical variables.

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Fill the few missing 'children' values with the column median
median_children = df['children'].median()
df['children'] = df['children'].fillna(median_children)

In [None]:
# Fill missing country, agent, and company values with a placeholder label
fill_value = 'Unknown'
for column in ['country', 'agent', 'company']:
    df[column] = df[column].fillna(fill_value)

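#### The purpose of filling missing values with 'Unknown' in these columns is to maintain data completeness for further analysis. Instead of leaving missing values as NaN (Not a Number) or blank, filling them with a placeholder value like 'Unknown' allows better handling of the data during analysis, especially when treating "country," "agent," and "company" as categorical variables. Note that filling the numeric ID columns "agent" and "company" with a string converts them to the object dtype, so they are no longer treated as numeric (for example, they are excluded from a numeric-only correlation matrix).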
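The dtype change mentioned above can be checked on a tiny hypothetical column; this sketch also uses the assign-back form of `fillna` rather than `inplace=True` on a column selection.

```python
import numpy as np
import pandas as pd

# Hypothetical 'agent' column: numeric IDs stored as floats, with a gap
toy = pd.DataFrame({"agent": [9.0, np.nan, 240.0]})
print(toy["agent"].dtype)  # float64

# Assign the filled column back to the DataFrame
toy["agent"] = toy["agent"].fillna("Unknown")
print(toy["agent"].dtype)  # object: floats and a string are now mixed
```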

In [None]:
df.isnull().sum()

#### As the output above shows, after replacing the missing values there are no missing values left in any column, so the dataset is now complete and ready for further analysis and visualization.

### What all manipulations have you done and insights you found?

* **Checked for duplicate values**: The dataset contains 31,994 fully duplicated rows (these rows were not dropped).
* **Checked for duplicate rows on specific columns**: Based on the combination of values in the 'hotel' and 'arrival_date_year' columns, 119,384 rows are duplicates, because these two columns have only a handful of distinct combinations.
* **Addressed missing values**: Replaced the missing values in the 'children' column with the median value, and in the 'country', 'agent', and 'company' columns with the placeholder value 'Unknown'. This ensured there were no missing values in any column.

## ***4. Data Visualization, Storytelling & Experimenting with charts: Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
sns.countplot(data = df , x= "stays_in_weekend_nights" )
plt.xlabel('Number of Weekend Nights')
plt.ylabel('Count')
plt.title('Distribution of Stays in Weekend Nights')
plt.show()

##### 1. Why did you pick the specific chart?

##### This count plot shows the distribution of the number of weekend nights guests stay in the hotel, providing insights into booking patterns and preferences related to weekend stays.

##### 2. What is/are the insight(s) found from the chart?

In [None]:
sns.countplot(data=df, x='arrival_date_year')
plt.xlabel('Year')
plt.ylabel('Count')
plt.title('Distribution of Bookings by Year')

plt.show()

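#### This count plot shows the distribution of bookings across different years, allowing us to observe any trends or patterns in the number of bookings over time. Note that the data covers arrivals from July 2015 to August 2017, so 2015 and 2017 are partial years and their totals are not directly comparable with 2016.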

In [None]:
sns.countplot(data = df, x = 'stays_in_week_nights')
plt.xlabel('Number of Weekday Nights')
plt.ylabel('Count')
plt.title('Distribution of Stays in Weekday Nights')

plt.show()

##### This count plot will show the distribution of bookings based on the number of weekday nights guests stay in the hotel. It will provide insights into the booking patterns and preferences related to weekday stays.
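The business objective asks about the optimal length of stay for the best daily rate. One way to approach it, sketched here on hypothetical toy data, is to add the weekend and weekday nights into a derived `total_nights` column (an assumed name, not part of the dataset) and average `adr` per stay length.

```python
import pandas as pd

# Hypothetical toy bookings (not the real dataset)
toy = pd.DataFrame({
    "stays_in_weekend_nights": [0, 1, 2, 0, 1],
    "stays_in_week_nights":    [1, 2, 5, 3, 2],
    "adr":                     [120.0, 100.0, 80.0, 90.0, 110.0],
})

# 'total_nights' is a hypothetical derived column: weekend + weekday nights
toy["total_nights"] = toy["stays_in_weekend_nights"] + toy["stays_in_week_nights"]

# Mean average daily rate for each length of stay
adr_by_stay = toy.groupby("total_nights")["adr"].mean()
print(adr_by_stay)
```

On the real data, very long stays are rare, so their averages rest on few bookings and should be read with caution.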

In [None]:
sns.boxplot(data=df, x='is_canceled', y='adr', hue='hotel')
plt.xlabel('Canceled (1) or Not Canceled (0)')
plt.ylabel('Average Daily Rate (adr)')
plt.title('Box Plot: Cancellation Status vs. Average Daily Rate')
plt.show()

In [None]:
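# Bar height is the mean number of weekend nights per booking; error bars show the confidence interval
sns.barplot(data=df, x='hotel', y='stays_in_weekend_nights')
plt.xlabel('Hotel Type')
plt.ylabel('Average Number of Weekend Nights')
plt.title('Bar Plot: Average Weekend Nights Stayed by Hotel Type')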
plt.show()

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### The insights gained from this bar plot can be valuable for hotel management to understand the preferences and booking behaviors of guests for weekend stays. They can use this information to optimize their pricing strategies, promotional offers, and hotel policies to attract more guests for weekend bookings. Additionally, it can help them tailor their services and amenities to the expected length of stay for guests in each hotel type, leading to a positive business impact in terms of increased occupancy rates and guest satisfaction.



#### Chart - 2

In [None]:
# Chart - 2 visualization code
# One count plot of reserved room types per hotel type (col='hotel' creates a separate panel for each)
g = sns.catplot(data=df, x='reserved_room_type', kind='count', col='hotel')
g.set_axis_labels('Reserved Room Type', 'Count')
plt.suptitle('Reserved Room Type by Hotel Type (Separate Plots per Hotel)', y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

#### This plot shows the count of each reserved room type for both hotel types, creating a separate plot for each hotel.

#### The x-axis represents the reserved room type, with one panel for each hotel type (Resort Hotel and City Hotel).

##### 2. What is/are the insight(s) found from the chart?

#### The insights gained from this plot can help hotel management understand the distribution of room types for each hotel type (City Hotel and Resort Hotel).
#### It can provide valuable information on which room types are more popular in each hotel category.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### The insights gained from these visualizations have the potential to create a positive business impact by guiding data-driven decisions and enhancing various aspects of the hotel business. By leveraging these insights, the hotel can improve customer satisfaction and operational efficiency, leading to growth and success in the hospitality industry.

#### Chart - 3

In [None]:
df['customer_type'].value_counts()

In [None]:
# Chart - 3 visualization code
sns.countplot(data = df , x = 'customer_type')
plt.xlabel('Customer Type')
plt.ylabel('Count')
plt.title('Distribution of Customer Types')
plt.show()

##### 1. Why did you pick the specific chart?

### This plot helps in understanding the distribution of the different customer types who made bookings at the hotel. It shows which customer types are dominant and can help hotel management tailor their services and marketing strategies to the needs of different customer segments.
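A count plot shows absolute numbers; to read off each customer type's share of bookings directly, `value_counts(normalize=True)` can be used. A minimal sketch on a hypothetical toy Series (applying it to `df['customer_type']` should work the same way):

```python
import pandas as pd

# Hypothetical toy customer types (not the real dataset)
customer_type = pd.Series(
    ["Transient", "Transient", "Transient-Party", "Contract", "Transient", "Group"],
    name="customer_type",
)

# Percentage share of each customer type, rounded to one decimal place
shares = customer_type.value_counts(normalize=True).mul(100).round(1)
print(shares)
```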

##### 2. What is/are the insight(s) found from the chart?

#### This insight can lead to a positive business impact by enhancing customer satisfaction and loyalty, resulting in increased repeat bookings and positive word-of-mouth recommendations.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### Additionally, understanding the customer types can guide revenue management decisions, such as tailoring offers to specific customer segments to maximize revenue and occupancy rates.

#### Chart - 4

In [None]:
df['required_car_parking_spaces'].value_counts()

In [None]:
# Chart - 4 visualization code
sns.countplot(data = df, x = 'required_car_parking_spaces' )
plt.xlabel('Required Car Parking Spaces')
plt.ylabel('Count')
plt.title('Distribution of Required Car Parking Spaces')
plt.show()

##### 1. Why did you pick the specific chart?

#### This plot will help in understanding the distribution of bookings based on the number of required car parking spaces. It will show how many bookings require different numbers of parking spaces, and whether most bookings do not require any parking space or if there is a demand for multiple parking spaces.

##### 2. What is/are the insight(s) found from the chart?

#### This insight can be useful for hotel management to plan and allocate parking spaces accordingly. If there is a high demand for parking spaces, the hotel may consider expanding its parking facilities to accommodate more guests, which can lead to a positive business impact by enhancing guest satisfaction and attracting more guests with vehicles.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### If there is low demand for parking spaces, the hotel can explore other ways to use the space effectively and optimize resources for better cost management.

#### Chart - 5

In [None]:
agent_count = df['agent'].value_counts().reset_index()
agent_count.columns = ['agent' , 'count']


In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(10, 6))
sns.barplot(data = agent_count , x = 'agent',y = 'count', palette = 'viridis')
plt.xlabel('Agent')
plt.ylabel('Count')
plt.title('Number of Customers Booked Hotel through Each Agent')
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

#### This bar plot provides a clear view of the number of customers who booked the hotel through each agent, allowing us to identify the most popular and least popular agents based on the booking count.
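With several hundred agent IDs, a bar for every agent is hard to read. One option, sketched here on hypothetical toy data with an assumed `top_n` parameter, is to keep only the N most frequent agents before plotting; the resulting frame has the same `agent`/`count` columns as `agent_count`, so it could be passed to the same `sns.barplot` call.

```python
import pandas as pd

# Hypothetical toy agent column, including the 'Unknown' placeholder
toy = pd.DataFrame({"agent": [9.0, 9.0, 240.0, "Unknown", 9.0, 240.0, 1.0]})

top_n = 2  # hypothetical cut-off; choose e.g. 10 or 20 for the real data

# value_counts sorts by frequency, so head(top_n) keeps the busiest agents
top_agents = toy["agent"].value_counts().head(top_n).reset_index()
top_agents.columns = ["agent", "count"]
print(top_agents)
```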

##### 2. What is/are the insight(s) found from the chart?

* Agent Booking Distribution
* Most Preferred Agents
* Least Preferred Agents
* Agent Performance Evaluation

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### Understanding the distribution of bookings among different agents can help the hotel management strategize their marketing efforts and improve collaboration with top-performing agents. This can lead to a positive business impact by increasing the number of bookings and overall revenue for the hotel.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
hotel_by_country = df.groupby(['country', 'hotel']).size().unstack()
hotel_by_country.plot(kind = 'bar', stacked = True,figsize=(12, 6))
plt.xlabel('Country')
plt.ylabel('Count')
plt.title('Distribution of Hotel Types within Each Country')
plt.xticks(rotation=90)

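# Legend labels come from the DataFrame columns (sorted alphabetically: City Hotel, Resort Hotel),
# so they are not hard-coded here to avoid mislabeling the bars
plt.legend(title='Hotel Type', loc='upper right')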
plt.tight_layout()
plt.show()

#### In the stacked bar plot, each country will be represented on the x-axis, and the bars will be stacked to show the distribution of hotel types (Resort Hotel and City Hotel) within each country. The different segments of the stacked bars will represent the count of bookings for each hotel type within the respective country.
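Because `groupby` sorts the group keys, the unstacked columns come out in alphabetical order (City Hotel, then Resort Hotel), and `DataFrame.plot` uses those column names as the legend entries. A quick check on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy bookings (not the real dataset)
toy = pd.DataFrame({
    "country": ["PRT", "PRT", "GBR", "GBR", "GBR"],
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
})

# Count bookings per (country, hotel) and pivot hotel types into columns
counts = toy.groupby(["country", "hotel"]).size().unstack(fill_value=0)
print(list(counts.columns))
print(counts)
```

Any legend labels passed by hand therefore have to follow this column order, or they will be attached to the wrong bars.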

In [None]:
heatmap_data = df.pivot_table(index='country', columns='hotel', values='is_canceled', aggfunc='count')
plt.figure(figsize=(10, 8))
sns.heatmap(heatmap_data, cmap='viridis', annot=True, fmt='g', cbar_kws={'label': 'Number of Bookings'})

plt.xlabel('Hotel Type')
plt.ylabel('Country')
plt.title('Count of Hotel Bookings by Country and Hotel Type')
plt.show()

##### 1. Why did you pick the specific chart?

#### In the heatmap, countries are represented on the y-axis and the hotel types (Resort Hotel and City Hotel) on the x-axis. Each cell shows the count of hotel bookings for the corresponding country and hotel type, and the color intensity represents the count; with the viridis colormap, brighter (yellow) cells indicate higher booking counts and darker (purple) cells indicate lower ones.
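Using `values='is_canceled'` with `aggfunc='count'` simply counts rows per cell. A sketch on hypothetical toy data shows that `pd.crosstab` gives the same counts, with 0 instead of NaN for country/hotel pairs that have no bookings:

```python
import pandas as pd

# Hypothetical toy bookings (not the real dataset)
toy = pd.DataFrame({
    "country": ["PRT", "PRT", "GBR", "ESP"],
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "City Hotel"],
    "is_canceled": [0, 1, 0, 0],
})

# Row counts per (country, hotel) via pivot_table, as in the heatmap cell above
via_pivot = toy.pivot_table(index="country", columns="hotel",
                            values="is_canceled", aggfunc="count")

# The same frequency table via crosstab (empty cells become 0)
via_crosstab = pd.crosstab(toy["country"], toy["hotel"])

print(via_pivot)
print(via_crosstab)
```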

##### 2. What is/are the insight(s) found from the chart?

#### Distribution of Hotel Types within Each Country: The stacked bar plot will provide insights into the distribution of hotel types (Resort Hotel and City Hotel) within each country. It will show which hotel type is more dominant in each country.

#### Count of Hotel Bookings by Country and Hotel Type: The heatmap will give a visual representation of the count of hotel bookings for each country and hotel type. It will help in identifying countries with higher booking counts for each hotel category.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### Understanding the relationship between country and hotel bookings through these visualizations can have several business impacts:

#### **Targeted Marketing**: The insights can be used to tailor marketing strategies for each hotel type in different countries, supporting targeted promotions and advertising campaigns.

#### **Demand Analysis**: The heatmap can help identify countries with high booking demand for each hotel type. This information can be used to allocate resources and optimize capacity planning accordingly.

#### Overall, these visualizations can provide valuable insights into the booking preferences of guests from different countries and help hotel management make data-driven decisions to enhance customer satisfaction and drive positive business outcomes.

#### Chart - 7

In [None]:
df['adults'].value_counts()

In [None]:
# Chart - 7 visualization code
average_adult_by_hotel = df.groupby('hotel')['adults'].mean().reset_index()
sns.barplot(data=average_adult_by_hotel, x='hotel', y='adults', palette='pastel')

plt.xlabel('Hotel Type')
plt.ylabel('Average Number of Adults')
plt.title('Relationship between Number of Adults and Hotel Type')
plt.show()


##### 1. Why did you pick the specific chart?

#### In this bar plot, the x-axis represents the two hotel types (Resort Hotel and City Hotel), while the y-axis represents the average number of adults for the corresponding hotel type.

##### 2. What is/are the insight(s) found from the chart?

#### This bar plot provides insight into the average number of adults for each hotel type. It can help identify whether there is any difference in the average number of adults between Resort Hotel and City Hotel.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### Understanding the relationship between the number of adults and the type of hotel can have several business impacts:

#### **Room Capacity Planning**
#### **Marketing and Advertising**: Knowing the average number of adults for each hotel type can help in targeting marketing campaigns and creating promotional offers that cater to the preferences of guests traveling in different group sizes.
#### **Pricing Strategies**
#### **Customer Experience**

#### Chart - 8

In [None]:
df['meal'].value_counts()

In [None]:
# Chart - 8 visualization code
meal_by_country = df.pivot_table(index = 'country', columns = 'meal', values = 'is_canceled',aggfunc = 'count')
plt.figure(figsize=(12, 6))
sns.heatmap(meal_by_country, cmap='viridis', annot=True, fmt='g', cbar_kws={'label': 'Number of Bookings'})
plt.xlabel('Meal Type')
plt.ylabel('Country')
plt.title('Distribution of Meal Types within Each Country')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

#### In this heatmap, the countries are represented on the y-axis, while the different meal types (e.g., BB, HB, FB) are represented on the x-axis. Each cell shows the count of bookings for the corresponding country and meal type, and the color intensity represents the count; with the viridis colormap, brighter (yellow) cells indicate higher booking counts.

##### 2. What is/are the insight(s) found from the chart?

#### The heatmap provides insights into the distribution of meal types for each country. It shows which meal type is dominant in each country and how preferences for meal types vary across countries.
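Raw counts are dominated by the countries with the most bookings, which makes meal preferences hard to compare between countries. Normalizing each row turns counts into shares. A minimal sketch on hypothetical toy data using `pd.crosstab(..., normalize="index")`:

```python
import pandas as pd

# Hypothetical toy bookings (not the real dataset)
toy = pd.DataFrame({
    "country": ["PRT", "PRT", "PRT", "GBR"],
    "meal": ["BB", "BB", "HB", "BB"],
})

# Each row sums to 1: the share of each meal type within a country
meal_share = pd.crosstab(toy["country"], toy["meal"], normalize="index")
print(meal_share.round(3))
```

The resulting frame could be passed to `sns.heatmap` in place of the raw counts (with `fmt='.2f'` for the annotations).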

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### Understanding the relationship between the type of meal and the country of origin of the guest can have several business impacts:

#### **Catering Services**: The insights gained from this plot can help hotels plan their catering services based on the preferred meal types of guests from different countries.

#### **Menu Planning**: The data can influence menu planning, offering a variety of meal options that cater to the preferences of guests from diverse backgrounds.


#### **Special Offers**: Understanding the preferred meal types for each country can help hotels create special offers and packages that include meals tailored to the preferences of specific guest segments.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
df['market_segment'].value_counts()

In [None]:
sns.countplot(data = df , x = 'market_segment',palette='viridis')
plt.xlabel('Market Segment')
plt.ylabel('Count')
plt.title('Distribution of Bookings by Market Segment')
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

#### In the count plot, each bar represents a market segment, and the height of the bar indicates the number of bookings in that market segment.

##### 2. What is/are the insight(s) found from the chart?

#### The countplot will provide insights into the distribution of bookings across different market segments. It will help identify which market segments have higher booking counts and are more popular among guests.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### Understanding the distribution of bookings across market segments can have several business impacts:

### **Targeted Marketing**:  Hotels can use this information to tailor their marketing strategies and promotional campaigns to target specific market segments more effectively.

### **Revenue Optimization**: By identifying market segments with higher booking counts, hotels can optimize their revenue management strategies to maximize profits.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
plt.figure(figsize=(12, 6))
sns.countplot(data=df, x='market_segment', hue='reserved_room_type', palette='viridis')
plt.xlabel('Market Segment')
plt.ylabel('Count')
plt.title('Distribution of Reserved Room Types within Each Market Segment')
plt.xticks(rotation=90)
plt.legend(title='Reserved Room Type', loc='upper right')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

#### In the grouped bar plot, each group of bars represents a market segment, and each colored bar within a group shows the count of one reserved room type in that market segment.

##### 2. What is/are the insight(s) found from the chart?

#### The grouped bar plot will provide insights into the distribution of reserved room types within each market segment. It will help identify which room types are more popular in different market segments and whether certain room types are preferred by specific customer groups.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### Understanding the relationship between market segments and reserved room types can have several business impacts:

### **Customer Segmentation**: Hotels can use this information to segment their customers based on their preferred room types and create targeted offers and promotions for different market segments.

### **Pricing Strategies**: The insights from this visualization can assist hotels in setting differentiated pricing strategies based on the room types preferred by various market segments.


#### Chart - 11

In [None]:
# Chart - 11 visualization code
sns.boxplot(data = df , x = 'market_segment', y = 'days_in_waiting_list', palette = 'viridis')
plt.xlabel('Market Segment')
plt.ylabel('Days in Waiting List')
plt.title('Box Plot: Days in Waiting List vs. Market Segment')
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

#### In the box plot, each box will represent a market segment, and the y-axis will represent the distribution of "days_in_waiting_list" for each market segment.

##### 2. What is/are the insight(s) found from the chart?

####  The box plot will provide insights into how the waiting list duration varies across different market segments. It will help identify any significant differences in the waiting list durations for guests from various market segments.
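If most bookings in a segment have zero days on the waiting list, its box collapses to a line at zero and a few outliers dominate the plot. A numeric summary per segment can complement the box plot; a sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy bookings (not the real dataset)
toy = pd.DataFrame({
    "market_segment": ["Groups", "Groups", "Groups", "Online TA", "Online TA"],
    "days_in_waiting_list": [0, 10, 40, 0, 0],
})

# Median, mean, and maximum waiting days for each market segment
summary = toy.groupby("market_segment")["days_in_waiting_list"].agg(["median", "mean", "max"])
print(summary.round(2))
```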


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### Understanding the relationship between market segments and the number of days spent on the waiting list can have several business impacts:

### **Customer Service**: The insights gained from this plot can help hotels prioritize guests on the waiting list based on their market segment, ensuring higher levels of customer service and satisfaction.


### **Capacity Planning**: The waiting list data can be used to optimize capacity planning and make decisions about inventory management to accommodate guests on the waiting list.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
sns.countplot(data = df , x = 'is_repeated_guest' , palette = 'viridis')
plt.xlabel('Is Repeated Guest')
plt.ylabel('Count')
plt.title('Distribution of Repeated and Non-Repeated Guests')
# 'labels' (plural) replaces the 0/1 tick labels with readable text
plt.xticks(ticks=[0, 1], labels=['Not Repeated', 'Repeated'])
plt.show()

##### 1. Why did you pick the specific chart?

#### This count plot makes it easy to compare the number of bookings from repeat guests with those from first-time guests. It can help hotels understand their customer base better and make informed decisions to enhance customer retention and drive positive business outcomes.



##### 2. What is/are the insight(s) found from the chart?

#### The count plot will provide insights into the distribution of guests who are repeated visitors to the hotel versus those who are not. It will help identify the proportion of repeat guests among all the bookings.
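Because `is_repeated_guest` is coded as 0/1, its mean is exactly the proportion of bookings made by repeat guests. A minimal sketch on hypothetical toy values:

```python
import pandas as pd

# Hypothetical toy flags: 1 = repeated guest, 0 = first-time guest
is_repeated = pd.Series([0, 0, 0, 1, 0, 0, 0, 1], name="is_repeated_guest")

# Mean of a 0/1 column = share of ones; multiply by 100 for a percentage
repeat_share = is_repeated.mean() * 100
print(f"{repeat_share:.1f}% of bookings are from repeat guests")
```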

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### Understanding the proportion of repeated guests can have several business impacts:

### **Customer Service**: Knowing the percentage of repeat guests can help hotels focus on providing excellent customer service and personalized experiences to retain their loyal customers.


### **Marketing Strategies**: Hotels can tailor their marketing strategies to target both repeat and non-repeat guests differently, focusing on retention and acquisition strategies.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
sns.boxplot(data = df , x = 'hotel', y = 'lead_time', palette = 'viridis')
plt.xlabel('Hotel Type')
plt.ylabel('Lead Time (Days)')
plt.title('Relationship between Lead Time and Hotel Type')
plt.show()

##### 1. Why did you pick the specific chart?

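#### A box plot compares the distribution of a numerical variable (lead time) across categories (hotel type), showing the median, quartiles and outliers for each group side by side.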

##### 2. What is/are the insight(s) found from the chart?

### The box plot will provide insights into the distribution of lead times for each hotel type. It will show the median, quartiles, and outliers for both Resort Hotel and City Hotel.
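The quantities drawn in the box plot can also be computed numerically. This sketch uses the `hotel` and `lead_time` columns from the chart code. It applies the 1.5 × IQR rule, which is seaborn's default `whis=1.5` for `boxplot`, to count the points drawn as outliers. The toy rows are hypothetical.

```python
import pandas as pd

def lead_time_box_stats(df: pd.DataFrame, whis: float = 1.5) -> pd.DataFrame:
    """Quartiles and outlier counts of lead_time for each hotel type."""
    def stats(s: pd.Series) -> pd.Series:
        q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
        iqr = q3 - q1
        low, high = q1 - whis * iqr, q3 + whis * iqr
        return pd.Series({
            "q1": q1,
            "median": median,
            "q3": q3,
            # Points beyond the whisker limits are the ones drawn individually
            "n_outliers": int(((s < low) | (s > high)).sum()),
        })
    return df.groupby("hotel")["lead_time"].apply(stats).unstack()

# Hypothetical toy rows, for illustration only
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 5 + ["Resort Hotel"] * 5,
    "lead_time": [1, 2, 3, 4, 100, 10, 20, 30, 40, 50],
})
print(lead_time_box_stats(toy))
```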

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

### **Revenue Management**: This visualization can help hotels identify trends in lead times and plan their pricing and revenue management strategies accordingly. For instance, if guests tend to book City Hotel stays with shorter lead times, the hotel can consider offering last-minute deals to attract more bookings.
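Before designing last-minute deals, one could check the "shorter lead times" premise by measuring the share of bookings made shortly before arrival. This is a minimal sketch. The 7-day cutoff for "last minute" is an arbitrary assumption and is not defined anywhere in this analysis. The toy rows are hypothetical.

```python
import pandas as pd

def last_minute_share(df: pd.DataFrame, max_days: int = 7) -> pd.Series:
    """Fraction of each hotel's bookings made at most `max_days` before arrival."""
    # The 7-day default is an illustrative assumption; adjust it to the business definition
    is_last_minute = df["lead_time"] <= max_days
    return is_last_minute.groupby(df["hotel"]).mean()

# Hypothetical toy rows, for illustration only
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "lead_time": [0, 3, 30, 60, 2, 100, 200, 300],
})
print(last_minute_share(toy))
```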

#### Chart - 14 - Correlation Heatmap

In [None]:
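# Correlation Heatmap visualization code
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize = (16, 12))
# numeric_only=True skips text columns such as 'hotel' (they raise an error in pandas >= 2.0)
correlation_matrix = df.corr(numeric_only = True)
# fmt='.2f' prints each coefficient with two decimal places
sns.heatmap(correlation_matrix, annot = True, cmap = 'coolwarm', fmt = '.2f')
plt.title('Correlation Heatmap')
plt.show()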

##### 1. Why did you pick the specific chart?

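#### A correlation heatmap summarizes the pairwise linear relationships between all numerical variables in a single grid, so strongly related pairs are easy to spot by color.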

##### 2. What is/are the insight(s) found from the chart?

#### The correlation heatmap displays the Pearson correlation coefficient (the pandas default) for each pair of numerical variables in the dataset. Values range from -1 to 1. Values near 1 indicate a strong positive linear relationship, values near -1 a strong negative linear relationship, and values near 0 little or no linear relationship.
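With many variables, the heatmap is easier to read when the matrix is flattened into a ranked list of pairs. This is a minimal sketch built on `DataFrame.corr`, which computes Pearson coefficients by default. The toy frame is hypothetical. It is constructed so that `x` and `y` are exactly linear (coefficient 1) and `z` is negatively related to both.

```python
import pandas as pd

def strongest_correlations(df: pd.DataFrame, top_n: int = 5) -> pd.Series:
    """Variable pairs ranked by absolute Pearson correlation, each pair listed once."""
    corr = df.corr(numeric_only=True)  # non-numeric columns are skipped
    cols = list(corr.columns)
    # Walk the upper triangle only, so (a, b) and (b, a) are not both listed
    pairs = pd.Series({
        (a, b): corr.loc[a, b]
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
    })
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(top_n)

# Hypothetical toy rows, for illustration only
toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "x": [1, 2, 3, 4, 5],
    "y": [3, 5, 7, 9, 11],  # y = 2x + 1, a perfect positive linear relationship
    "z": [5, 3, 4, 1, 2],
})
print(strongest_correlations(toy))
```

Ranking by absolute value puts strong negative relationships next to strong positive ones, which mirrors how the `coolwarm` colormap highlights both ends of the scale.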

#### Chart - 15 - Pair Plot

In [None]:
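# Pair Plot visualization code
import matplotlib.pyplot as plt
import seaborn as sns

numerical_columns = df.select_dtypes(include = 'number')

sns.pairplot(numerical_columns)
# plt.title would label only a single subplot; suptitle labels the whole pair-plot figure
plt.suptitle('Pair Plot of Numerical Variables', y = 1.02)
plt.show()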

##### 1. Why did you pick the specific chart?

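## **Identifying Patterns**: a pair plot shows every pairwise scatter plot at once, so relationships between numerical variables stand out.
## **Outlier Detection**: extreme values appear as isolated points far from the main cloud in each scatter plot.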

##### 2. What is/are the insight(s) found from the chart?

#### The pair plot will display scatter plots for each pair of numerical variables in the dataset. The diagonal of the pair plot will show the distribution of each individual numerical variable.

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain Briefly.

## Based on the analysis in the "Hotel Booking Analysis" project, here are some suggested solutions to achieve the business objectives:

### **Optimize Booking Policies**: Analyze the booking patterns and trends to identify the best time of year to book a hotel room and the optimal length of stay to get the best daily rate. This information can be used to optimize booking policies and offer targeted promotions during specific periods.
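One way to look for the optimal length of stay is to average the daily rate over bookings of each length. This sketch assumes the dataset's `stays_in_weekend_nights`, `stays_in_week_nights` and `adr` (average daily rate) columns, which this section does not name. Stays longer than `max_nights` are pooled into one bucket so the long tail is not split into tiny groups. The toy rows are hypothetical.

```python
import pandas as pd

def adr_by_stay_length(df: pd.DataFrame, max_nights: int = 14) -> pd.DataFrame:
    """Booking count and mean daily rate for each total length of stay."""
    nights = df["stays_in_weekend_nights"] + df["stays_in_week_nights"]
    has_stay = nights > 0  # zero-night bookings have no length of stay to compare
    # Stays longer than max_nights are grouped together in the max_nights bucket
    stay_bucket = nights[has_stay].clip(upper=max_nights).rename("total_nights")
    return df.loc[has_stay, "adr"].groupby(stay_bucket).agg(["count", "mean"])

# Hypothetical toy rows, for illustration only
toy = pd.DataFrame({
    "stays_in_weekend_nights": [0, 1, 2, 0, 4],
    "stays_in_week_nights": [1, 1, 3, 0, 10],
    "adr": [100.0, 90.0, 80.0, 50.0, 70.0],
})
print(adr_by_stay_length(toy, max_nights=7))
```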

### **Enhance Customer Experience:** Use insights gained from customer preferences and behavior to tailor hotel services and amenities. For example, if guests tend to stay more on weekends, hotels can introduce special weekend packages or events to enhance the guest experience.

### **Predict Special Requests:** Utilize machine learning algorithms to predict whether a hotel is likely to receive a disproportionately high number of special requests. This can help hotels prepare in advance and allocate resources accordingly.
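Before a model can be trained, "disproportionately high" has to become a concrete label. This is a minimal sketch. It assumes the dataset's `total_of_special_requests` column and an arbitrary threshold of 2 or more requests per booking. The per-group base rate it returns is not a machine learning model. It is only a naive baseline that a real classifier (for example one built with scikit-learn) would need to beat. The toy rows are hypothetical.

```python
import pandas as pd

def add_high_request_label(df: pd.DataFrame, threshold: int = 2) -> pd.DataFrame:
    """Return a copy with a 0/1 'high_requests' target column."""
    out = df.copy()
    # The threshold is an illustrative assumption, not a definition from the analysis
    out["high_requests"] = (out["total_of_special_requests"] >= threshold).astype(int)
    return out

def base_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Naive baseline: each group's historical share of high-request bookings."""
    return df.groupby(column)["high_requests"].mean()

# Hypothetical toy rows, for illustration only
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 3 + ["Resort Hotel"] * 3,
    "total_of_special_requests": [0, 3, 1, 2, 2, 0],
})
labelled = add_high_request_label(toy)
print(base_rate_by(labelled, "hotel"))
```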

### **Targeted Marketing Strategies:** Leverage the insights on market segments, preferred room types, and meal preferences to design targeted marketing campaigns for different customer groups. This can help attract more bookings and improve customer satisfaction.


### **Resource Optimization:** Optimize the allocation of parking spaces based on the demand for parking facilities. If there is high demand for parking, consider expanding parking facilities to accommodate more guests, and if there is low demand, explore alternative uses for the space to optimize resources.
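Parking demand per hotel can be quantified before deciding whether to expand or repurpose space. This sketch assumes the dataset's `required_car_parking_spaces` column. That column records spaces requested per booking, so it measures demand rather than capacity. The toy rows are hypothetical.

```python
import pandas as pd

def parking_demand(df: pd.DataFrame) -> pd.DataFrame:
    """Share of bookings asking for parking and total spaces requested, per hotel."""
    spaces = df["required_car_parking_spaces"]
    return pd.DataFrame({
        "share_requesting": spaces.gt(0).groupby(df["hotel"]).mean(),
        "spaces_requested": spaces.groupby(df["hotel"]).sum(),
    })

# Hypothetical toy rows, for illustration only
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "required_car_parking_spaces": [0, 0, 1, 0, 1, 2, 0, 0],
})
print(parking_demand(toy))
```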

### **Collaboration with Preferred Agents:** Identify the most preferred and least preferred booking agents based on the booking distribution. Strengthen collaborations with top-performing agents to increase bookings and explore ways to improve relationships with less preferred agents.
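"Most preferred" and "least preferred" can be made concrete by ranking agents on booking count. Using count as the definition of preference is an assumption. This sketch also assumes an `agent` column of agent IDs, where missing values mean no agent is recorded and are excluded from the ranking. The toy rows are hypothetical.

```python
import pandas as pd

def rank_agents(df: pd.DataFrame, n: int = 5):
    """Return the n agents with the most bookings and the n with the fewest."""
    counts = df["agent"].value_counts()  # missing agent IDs (NaN) are dropped by default
    return counts.head(n), counts.tail(n)

# Hypothetical toy rows, for illustration only
toy = pd.DataFrame({"agent": [9, 9, 9, 240, 240, 1, None]})
top, bottom = rank_agents(toy, n=1)
print(top, bottom, sep="\n")
```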

### **Country-Specific Strategies:** Tailor marketing and pricing strategies based on country-specific booking patterns. This can help attract more bookings from specific countries and improve international guest satisfaction.

### **Capacity Planning:** Use insights on the number of adults and room types preferred by guests to optimize capacity planning. Ensure the availability of room types that are in high demand and plan staffing accordingly.

### **Enhance Revenue Management**: Utilize insights on lead times and booking trends to develop effective revenue management strategies. Offer last-minute deals or promotions during periods of low bookings to maximize revenue.

### **Repeat Guest Engagement:** Implement strategies to enhance repeat guest engagement and loyalty. Offer personalized packages or loyalty rewards to encourage repeat bookings and increase customer retention.

#### By implementing these solutions, the "Hotel Booking Analysis" project can lead to improved guest satisfaction, increased occupancy rates, and better business performance for both city hotels and resort hotels.


# **Conclusion**

# In conclusion, the "Hotel Booking Analysis" project has provided valuable insights into the booking patterns and trends for city hotels and resort hotels. Through thorough exploratory data analysis, we have gained a deeper understanding of the factors that influence hotel bookings and customer preferences.

## ***Key Findings:***

### **Seasonal Booking Trends:** The analysis revealed seasonal patterns in hotel bookings, with certain months experiencing higher booking volumes. Hotels can leverage this information to optimize pricing and promotions during peak and off-peak seasons.
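Monthly volumes are easiest to compare in calendar order. `value_counts` alone sorts by count, and plotting a text month column tends to follow order of appearance, so neither gives calendar order. This sketch assumes an `arrival_date_month` column holding full English month names. The toy rows are hypothetical.

```python
import pandas as pd

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def bookings_per_month(df: pd.DataFrame) -> pd.Series:
    """Booking counts per arrival month, in calendar order."""
    counts = df["arrival_date_month"].value_counts()
    # reindex puts months in calendar order and shows months with no bookings as 0
    return counts.reindex(MONTHS, fill_value=0)

# Hypothetical toy rows, for illustration only
toy = pd.DataFrame({"arrival_date_month": ["August", "July", "August", "January"]})
monthly = bookings_per_month(toy)
print(monthly.idxmax())
```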

### **Lead Time Impact:** The lead time, or the duration between booking and arrival, plays a crucial role in hotel bookings. Guests booking with shorter lead times tend to stay for a shorter duration. Hotels can offer last-minute deals to attract more bookings from this segment.
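The link between short lead times and short stays can be checked by bucketing lead time and comparing median stay lengths. This sketch assumes the dataset's `lead_time`, `stays_in_weekend_nights` and `stays_in_week_nights` columns. The bucket edges are arbitrary choices, not values taken from the analysis. The toy rows are hypothetical.

```python
import pandas as pd

def stay_length_by_lead_time(df: pd.DataFrame,
                             edges=(0, 7, 30, 90, 365, float("inf"))) -> pd.Series:
    """Median total nights for each lead-time bucket (bucket edges are arbitrary)."""
    nights = df["stays_in_weekend_nights"] + df["stays_in_week_nights"]
    # include_lowest=True keeps same-day bookings (lead_time == 0) in the first bucket
    lead_bucket = pd.cut(df["lead_time"], bins=list(edges), include_lowest=True)
    # observed=False keeps empty buckets in the result (their median is NaN)
    return nights.groupby(lead_bucket, observed=False).median()

# Hypothetical toy rows, for illustration only
toy = pd.DataFrame({
    "lead_time": [0, 5, 20, 100, 400],
    "stays_in_weekend_nights": [0, 1, 1, 2, 2],
    "stays_in_week_nights": [1, 0, 2, 2, 5],
})
print(stay_length_by_lead_time(toy))
```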

### **Weekend Stays:** Guests staying at city hotels tend to stay more on weekends, while those at resort hotels have longer stays during the week. Hotel management can customize offerings and events based on these preferences.
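One way to test the weekend pattern is to compare each hotel's share of weekend nights against 2/7 ≈ 0.29, the share expected if nights were spread evenly over the week. That baseline assumes `stays_in_weekend_nights` counts Saturday and Sunday nights only. The column names are assumptions, and the toy rows are hypothetical.

```python
import pandas as pd

EVEN_SPREAD = 2 / 7  # weekend share if nights were spread evenly over the week

def weekend_vs_week(df: pd.DataFrame) -> pd.DataFrame:
    """Mean weekend/week nights per booking and weekend share of all nights, per hotel."""
    cols = ["stays_in_weekend_nights", "stays_in_week_nights"]
    grouped = df.groupby("hotel")[cols]
    totals = grouped.sum()
    out = grouped.mean().add_prefix("mean_")
    out["weekend_share"] = totals["stays_in_weekend_nights"] / totals.sum(axis=1)
    return out

# Hypothetical toy rows, for illustration only
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "stays_in_weekend_nights": [2, 1, 0, 2],
    "stays_in_week_nights": [1, 1, 5, 5],
})
summary = weekend_vs_week(toy)
print(summary["weekend_share"] > EVEN_SPREAD)
```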

### **Room Type Preferences:** The distribution of room types varies between city hotels and resort hotels. Understanding these preferences can help optimize room capacity planning and resource allocation.
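A row-normalized cross-tabulation shows the room-type mix within each hotel, so hotels of different sizes can be compared directly. This sketch assumes a `reserved_room_type` column. Any other room-type column can be passed through the `column` parameter. The toy rows are hypothetical.

```python
import pandas as pd

def room_type_mix(df: pd.DataFrame, column: str = "reserved_room_type") -> pd.DataFrame:
    """Share of each room type within each hotel; every row sums to 1."""
    return pd.crosstab(df["hotel"], df[column], normalize="index")

# Hypothetical toy rows, for illustration only
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 3 + ["Resort Hotel"] * 4,
    "reserved_room_type": ["A", "A", "D", "A", "E", "E", "E"],
})
print(room_type_mix(toy).round(3))
```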

### **Country-Specific Booking Behaviors:** Different countries exhibit distinct booking behaviors and preferences. Hotels can tailor marketing strategies to attract more bookings from specific countries and enhance international guest satisfaction.

### **Agent Booking Distribution:** Identifying the most preferred and least preferred booking agents can guide marketing efforts and enhance collaborations with top-performing agents.

### **Meal Type Choices:** Meal preferences can differ between guests from various countries. Hotels can use this information to offer meal options that align with the preferences of their guests.

### **Parking Space Demand:** Understanding the demand for parking spaces can help optimize parking facilities, leading to improved guest satisfaction and resource management.

### **Market Segments:** Analyzing market segments and their booking behavior can guide targeted marketing efforts and pricing strategies.

#### Overall, the insights gained from this analysis can help hotels in the hospitality industry make data-driven decisions to enhance customer satisfaction, increase occupancy rates, and improve business performance. By implementing the suggested solutions, hotels can create positive business impacts and ensure continued success in the competitive market.
