
# **Project Name**    - Hotel Booking



##### **Project Type**    - EDA
##### **Contribution**    - Kushang Shah (Individual)

# **Project Summary -**

## **Exploratory Data Analysis (EDA) Summary: Understanding Hotel Bookings**

Exploratory Data Analysis (EDA) is a crucial step in the data analysis process, allowing us to understand the structure and patterns within our data. In this summary, we delve into an EDA conducted on hotel booking data, aiming to extract meaningful insights and trends.

### **Dataset Overview:**
The dataset contains information about hotel bookings, including attributes such as arrival dates, customer details (e.g., country of origin, party size), booking channels, and reservation details. It covers both hotel types: resort hotels and city hotels.

### **Data Exploration:**
#### **Data Cleaning**: The data was first cleaned to handle duplicate rows, missing values, outliers, and inconsistencies. This step ensured the dataset's integrity and reliability for analysis.
#### **Descriptive Statistics**: Basic statistics such as mean, median, standard deviation, and quartiles were calculated for numerical features like lead time, length of stay, and number of adults/children. This provided a snapshot of the central tendencies and spread of the data.
#### **Distribution Analysis**: Histograms and density plots were used to visualize the distribution of key variables, revealing their skewness, multimodality, and outliers. For instance, booking lead time showed a right-skewed distribution, indicating that most bookings are made with relatively short lead times.
#### **Temporal Trends**: Booking volumes were analyzed across months and years to explore temporal patterns. This uncovered seasonality effects, with peak booking periods in certain months, possibly driven by holidays or tourism seasons.
#### **Segmentation Analysis**: Customers were segmented by guest attributes (e.g., nationality, customer type) and booking characteristics (e.g., duration of stay, room type). This segmentation shed light on distinct booking behaviors among customer groups, enabling targeted marketing strategies.


### **Insights and Trends:**

#### **Seasonal Variations**: The analysis revealed fluctuations in booking volumes across seasons, with summer and holiday periods showing higher demand than off-peak periods. This insight can inform revenue management strategies and resource allocation.
#### **Booking Channels**: Examining booking channels (e.g., online travel agencies, direct bookings) revealed the platforms customers prefer for making reservations. Understanding channel preferences can guide marketing efforts and partnership decisions.
#### **Cancellation Patterns**: Analyzing cancellation rates and the booking attributes associated with cancellations provided insights into customer behavior and booking volatility. Factors that influence cancellations, such as deposit and cancellation policies, can be optimized to minimize revenue loss.
#### **Booking Lead Time**: Exploring the lead time distribution highlighted booking patterns with implications for inventory management and pricing. Shorter lead times may call for dynamic pricing to capitalize on last-minute bookings.
## **Conclusion**:
Through comprehensive exploratory data analysis, valuable insights have been gleaned regarding hotel booking trends, customer behavior, and operational dynamics. These insights can inform strategic decision-making processes, ranging from revenue management to customer experience enhancement. Continued analysis and refinement of these findings will facilitate data-driven optimization of hotel operations and service delivery.

# **GitHub Link -**

#### **GitHub Link:** [EDA Project - Hotel Booking](https://github.com/KushangShah/EDA_Project-Hotel_Bookings/tree/main)



# **Problem Statement**


##### The primary objective is to gain comprehensive insights into the underlying **patterns**, **trends**, and **dynamics** of the booking process.

##### 1. **Booking Patterns**: What typical booking patterns do we observe in terms of timing, duration, and seasonality? Do discernible trends or fluctuations exist in booking volumes over different time periods?

##### 2. **Booking Dynamics**: What are the temporal trends in booking volumes and cancellation rates? Are there seasonal variations, and if so, how do they affect hotel occupancy and revenue?
##### 3. **Customer Segmentation**: How can customers be segmented based on demographics, booking behaviors, and preferences? What characterizes each segment, and how can tailored marketing strategies be developed to meet their needs?
##### 4. **Operational Efficiency**: What factors contribute to booking lead time, and how can inventory management and pricing strategies be optimized accordingly? Are there patterns in room type preferences, booking channels, and deposit types that influence operational efficiency?
##### 5. **Revenue Management**: How do pricing dynamics, such as ADR and booking changes, affect revenue generation? What are the implications of special requests, car parking requirements, and meal preferences for revenue maximization?




#### **Define Your Business Objective?**

**Business Objective:**

The primary business objective of conducting exploratory data analysis (EDA) on hotel bookings is to leverage data-driven insights to optimize revenue generation, enhance operational efficiency, and improve customer satisfaction within the hospitality industry. By delving into the dataset and extracting meaningful patterns and trends, the ultimate goal is to inform strategic decision-making processes and drive tangible outcomes for the hotel management.

1. **Revenue Optimization:**
   - Identify factors influencing revenue generation, such as pricing dynamics, booking patterns, and customer preferences.
   - Utilize insights to implement dynamic pricing mechanisms, targeted promotions, and revenue management strategies.

2. **Operational Efficiency:**
   - Enhance resource allocation, inventory management, and staff scheduling based on demand patterns and booking trends.
   - Optimize room allocation, booking channels, and distribution strategies to improve operational efficiency.
   - Streamline processes to reduce cancellations and optimize room utilization across both short and long booking lead times.

3. **Customer Satisfaction:**
   - Understand customer preferences, behaviors, and satisfaction drivers to deliver personalized experiences.
   - Segment customers based on demographics, booking behaviors, and preferences to tailor marketing efforts and services.
   - Anticipate and fulfill customer needs, preferences, and special requests to enhance overall satisfaction and loyalty.

4. **Risk Management and Decision Support:**
   - Identify potential risks, such as overbooking, cancellations, and revenue volatility, and develop mitigation strategies.
   - Provide decision support for strategic initiatives, investment opportunities, and expansion plans based on data-driven insights.
   - Monitor key performance indicators (KPIs) and metrics to track progress, evaluate performance, and adapt strategies accordingly.

Overall, the business objective of the EDA on hotel bookings is to leverage data analytics to drive strategic decision-making, optimize operations, and create value for both the hotel management and customers. By harnessing the power of data, the aim is to achieve sustainable growth, competitive advantage, and excellence in service delivery within the hospitality sector.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code, and deployment-ready code will be a plus; students who provide them will be awarded additional credits.

     The additional credits will give an advantage over other students during Star Student selection.

     [Note: Deployment-ready code means the whole .ipynb notebook should execute in one go without a single error logged.]

3.   Every piece of logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the following questions in this format:
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints: Do the visualization in a structured way, following the "UBM" rule:

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount("/content/drive")

In [None]:
hb_df = pd.read_csv("/content/drive/MyDrive/CSV files/Hotel Bookings.csv")

### Dataset First View

In [None]:
# Dataset First Look
hb_df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
hb_df.shape

### Dataset Information

In [None]:
# Dataset Info
hb_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count (rows that exactly repeat an earlier row)
hb_df.duplicated().sum()

In [None]:
# using drop_duplicates to get unique number of rows
hb_df.drop_duplicates(inplace=True)
unique_rows = hb_df.shape[0]
unique_rows

#### Missing Values/Null Values

In [None]:
# Count null values in each column and show the six columns with the most
hb_df.isna().sum().sort_values(ascending=False).head(6)

In [None]:
# Handling those null values
# (assigning the result back avoids the chained-assignment warning that
#  column-level fillna(..., inplace=True) triggers in recent pandas versions)
hb_df["company"] = hb_df["company"].fillna(0)          # 0 where no company is recorded
hb_df["agent"] = hb_df["agent"].fillna(0)              # 0 where no agent is recorded
hb_df["country"] = hb_df["country"].fillna("others")   # "others" where the country is not given
hb_df["children"] = hb_df["children"].fillna(0)        # 0 where children is not mentioned

In [None]:
# Verify that all missing values have been handled
hb_df.isna().sum().sort_values(ascending=False)

### What did you know about your dataset?

The hotel booking dataset contains 119,390 rows × 32 columns.
Of these, 87,396 rows are unique and 31,994 are duplicates.

After dropping the duplicates, the following columns still contained null values:
```
company               82137
agent                 12193
country                 452
children                  4
```
These were filled with 0 (`company`, `agent`, `children`) or "others" (`country`).
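As a quick sanity check of the duplicate and null handling described above, here is a minimal sketch on a made-up toy frame (not the real data). It uses the same pandas calls as the cleaning cells, except that all four columns are filled in a single `fillna` dictionary call, which is one of several equivalent ways to do it.

```python
import pandas as pd

# Toy frame with made-up values (not the real dataset): row 2 repeats row 0,
# and the agent/company/country/children columns contain gaps
toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "City Hotel"],
    "agent": [9.0, None, 9.0, 240.0],
    "company": [None, None, None, 40.0],
    "country": ["PRT", None, "PRT", "GBR"],
    "children": [0.0, 1.0, 0.0, None],
})

# duplicated() marks every repeat after the first occurrence (NaNs compare equal),
# so its sum equals the number of rows drop_duplicates() removes
n_duplicates = int(toy.duplicated().sum())
toy = toy.drop_duplicates()

# Null counts per column, largest first, counted after de-duplication
null_counts = toy.isna().sum().sort_values(ascending=False)

# Fill all four columns in one call and assign the result back
toy = toy.fillna({"company": 0, "agent": 0, "country": "others", "children": 0})
print(n_duplicates, len(toy))
print(null_counts)
```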



The dataset contains the following 32 columns:
1. hotel: Hotel type (City Hotel or Resort Hotel).
2. is_canceled: Binary indicator if the booking was canceled (1) or not (0).
3. lead_time: Number of days between the booking date and the arrival date.
4. arrival_date_year: Year of arrival date.
5. arrival_date_month: Month of arrival date.
6. arrival_date_week_number: Week number of arrival date.
7. arrival_date_day_of_month: Day of arrival date.
8. stays_in_weekend_nights: Number of weekend nights (Saturday or Sunday) the guest stayed.
9. stays_in_week_nights: Number of week nights (Monday to Friday) the guest stayed.
10. adults: Number of adults.
11. children: Number of children.
12. babies: Number of babies.
13. meal: Type of meal booked (e.g., BB for Bed & Breakfast).
14. country: Country of origin of the guest.
15. market_segment: Market segment designation (e.g., Online Travel Agents, Offline Travel Agents).
16. distribution_channel: Booking distribution channel (e.g., Direct, Corporate).
17. is_repeated_guest: Binary indicator if the guest is a repeated guest (1) or not (0).
18. previous_cancellations: Number of previous cancellations by the guest.
19. previous_bookings_not_canceled: Number of previous bookings not canceled by the guest.
20. reserved_room_type: Type of room reserved.
21. assigned_room_type: Type of room assigned to the guest.
22. booking_changes: Number of changes made to the booking.
23. deposit_type: Type of deposit made (e.g., No Deposit, Non Refund, Refundable).
24. agent: ID of the travel agency that made the booking.
25. company: ID of the company/entity that made the booking or is responsible for payment.
26. days_in_waiting_list: Number of days the booking was in the waiting list before it was confirmed to the guest.
27. customer_type: Type of booking (e.g., Contract, Group, Transient).
28. adr: Average Daily Rate, the average rental income per paid occupied room in a given time period.
29. required_car_parking_spaces: Number of car parking spaces requested by the guest.
30. total_of_special_requests: Number of special requests made by the guest (e.g., twin bed, high floor).
31. reservation_status: Reservation last status (e.g., Check-Out, Canceled).
32. reservation_status_date: Date at which the last status was set.
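Several of these columns combine naturally. The sketch below uses hypothetical rows. It rebuilds a single arrival date from `arrival_date_year`, `arrival_date_month`, and `arrival_date_day_of_month`, then steps back by `lead_time` to estimate the booking date. It assumes month names are full English names, as in this dataset, and an English default locale for `calendar.month_name`. The variable names are illustrative.

```python
import calendar
import pandas as pd

# Hypothetical rows using the dataset's column names; values are made up
toy = pd.DataFrame({
    "arrival_date_year": [2015, 2016],
    "arrival_date_month": ["July", "February"],
    "arrival_date_day_of_month": [1, 29],
    "lead_time": [10, 0],
})

# Map full English month names to month numbers (calendar.month_name[0] is "")
month_num = {name: i for i, name in enumerate(calendar.month_name) if name}

# pd.to_datetime accepts a frame with year/month/day columns
arrival_date = pd.to_datetime(pd.DataFrame({
    "year": toy["arrival_date_year"],
    "month": toy["arrival_date_month"].map(month_num),
    "day": toy["arrival_date_day_of_month"],
}))

# lead_time counts days from booking to arrival, so subtract it to get the booking date
booking_date = arrival_date - pd.to_timedelta(toy["lead_time"], unit="D")
print(pd.DataFrame({"arrival": arrival_date, "booked": booking_date}))
```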




## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
hb_df.columns

In [None]:
# Dataset Describe
hb_df.describe()

### Variables Description

- **is_canceled:**
  - 27.49% of bookings were canceled (the mean of this 0/1 column is 0.2749).
- **lead_time:**
  - The average lead time is approximately 79.89 days, with a standard deviation of around 86.05 days.
- **arrival_date_year:**
  - Bookings span from 2015 to 2017.
- **arrival_date_week_number and arrival_date_day_of_month:**
  - These columns give the week number and day of the month of the arrival date, respectively.
- **stays_in_weekend_nights and stays_in_week_nights:**
  - On average, guests stay for approximately 1 weekend night and 2.63 week nights.
- **adults, children, and babies:**
  - Average numbers of adults, children, and babies per booking are provided.
- **previous_cancellations and previous_bookings_not_canceled:**
  - These columns indicate the number of previous cancellations and bookings not canceled by the guest.
- **booking_changes:**
  - On average, there are around 0.27 booking changes per booking.
- **agent and company:**
  - These are identifiers for the travel agency and the company involved in the booking; after null handling, 0 means no agent or company was recorded.
- **days_in_waiting_list:**
  - Most bookings spent no time on the waiting list (the median is 0 days). A small number of very long waits pull the mean up, so the mean does not describe a typical wait.
- **adr (Average Daily Rate):**
  - The average daily rate is around 106.34 currency units.
- **required_car_parking_spaces and total_of_special_requests:**
  - These columns provide average counts for requested car parking spaces and special requests per booking.
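A small, made-up example makes two of these statements easy to verify. First, the mean of a 0/1 column such as `is_canceled` is the cancellation rate. Second, a heavily skewed column such as `days_in_waiting_list` can have a mean far above its median.

```python
import pandas as pd

# Made-up values, used only to show how describe()-style statistics behave
toy = pd.DataFrame({
    "is_canceled": [1, 0, 0, 1, 0, 0, 0, 0],
    "days_in_waiting_list": [0, 0, 0, 0, 0, 0, 0, 80],
})

# The mean of a 0/1 column is the share of 1s, i.e. the cancellation rate
cancel_rate = toy["is_canceled"].mean()

# A single long wait inflates the mean while the median stays at 0
wait_mean = toy["days_in_waiting_list"].mean()
wait_median = toy["days_in_waiting_list"].median()
print(f"cancel rate {cancel_rate:.2%}, wait mean {wait_mean}, wait median {wait_median}")
```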

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in hb_df.columns:
  print(f"Unique values for {col}: {hb_df[col].unique()}\n")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
hb_df.info()

In [None]:
# Changing Data type of some columns
hb_df[['children', 'company', 'agent']] = hb_df[['children', 'company', 'agent']].astype('int64')
hb_df['reservation_status_date'] = pd.to_datetime(hb_df['reservation_status_date'], format='%Y-%m-%d')

In [None]:
# Adding important columns for data visualization

# Adding total stays from weekend stays and week stays
hb_df['total_stay'] = hb_df['stays_in_weekend_nights'] + hb_df['stays_in_week_nights']

# Adding total people from adult children and babies
hb_df['total_people'] = hb_df['adults'] + hb_df['children'] + hb_df['babies']

In [None]:
# Preview the new columns, then confirm the updated schema
print(hb_df[['total_stay', 'total_people']].head())
hb_df.info()

### What all manipulations have you done and insights you found?

#### 1. **Data types**: Converted columns to the appropriate data types: `children`, `company`, and `agent` to integers, and `reservation_status_date` to datetime.

#### 2. **Created new columns from existing ones to gain more insight.**
  - Created the `total_stay` column from the weekend-night and week-night stay columns, giving the total length of each stay regardless of weekday or weekend.
    - This shows the total number of nights booked for each reservation.
  - Created the `total_people` column by combining adults, children, and babies.
    - This gives the total number of guests per booking.
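The two derived columns can be checked on a few hypothetical rows. The extra filter for bookings with zero nights or zero guests is a suggested data-quality check, not something this notebook performs.

```python
import pandas as pd

# Made-up rows with the dataset's column names
toy = pd.DataFrame({
    "stays_in_weekend_nights": [2, 0, 1],
    "stays_in_week_nights": [3, 0, 2],
    "adults": [2, 0, 1],
    "children": [1, 0, 0],
    "babies": [0, 0, 1],
})

# Same formulas as the wrangling cell above
toy["total_stay"] = toy["stays_in_weekend_nights"] + toy["stays_in_week_nights"]
toy["total_people"] = toy["adults"] + toy["children"] + toy["babies"]

# Bookings with zero nights or zero guests are worth inspecting before plotting
suspicious = toy[(toy["total_stay"] == 0) | (toy["total_people"] == 0)]
print(suspicious)
```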
    



## ***4. Data Visualization, Storytelling & Experimenting with charts: Understand the relationships between variables***

In [None]:
hb_df.info()

#### Chart - 1

In [None]:
# Chart - 1 Count of Canceled vs. Not Canceled Bookings: Bar chart
plt.figure(figsize=(8,6))
sns.countplot(data=hb_df, x='is_canceled', hue='hotel')
plt.xlabel('Booking status (0 = not canceled, 1 = canceled)')
plt.ylabel('Count')
plt.title('Count of Canceled vs. Not Canceled Bookings')
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a countplot for the distribution of canceled vs. not canceled bookings because it provides a clear visual representation of the balance between the two categories. It's a straightforward way to compare the number of canceled bookings with the number of bookings that were not canceled.**

##### 2. What is/are the insight(s) found from the chart?

**The chart compares canceled (1) and not canceled (0) bookings for each hotel type. Overall, about 27.5% of bookings were canceled, so non-canceled bookings outnumber canceled ones by roughly 2.6 to 1. In other words, about one booking in four never turns into a stay. Because the bars show raw counts, the hotel type with more bookings will also show more cancellations. A fair comparison of the two hotels therefore needs the cancellation rate within each hotel rather than the count.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

**The insights from this chart can inform business decisions. A cancellation rate of roughly one in four suggests reviewing cancellation and deposit policies, improving customer communication, or introducing incentives that discourage cancellations, such as flexible rebooking options or personalized offers. On the negative side, every late cancellation can leave a room unsold. A high cancellation rate therefore leads directly to lost revenue and less reliable occupancy forecasts. Compensating through overbooking carries its own risk: if fewer guests cancel than expected, the hotel may have to turn guests away, which damages satisfaction and reputation.**
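Because the countplot shows raw counts, a cancellation rate per hotel type is a useful companion. Here is a minimal sketch on made-up rows with unequal group sizes. In the notebook, the same `groupby` would run on `hb_df`.

```python
import pandas as pd

# Made-up bookings: 6 city-hotel rows and 4 resort-hotel rows
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 6 + ["Resort Hotel"] * 4,
    "is_canceled": [1, 1, 1, 0, 0, 0, 1, 0, 0, 0],
})

# Raw counts hide the rate when group sizes differ,
# so compute the cancellation rate within each hotel type
cancel_rate_by_hotel = toy.groupby("hotel")["is_canceled"].mean()
print(cancel_rate_by_hotel)
```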

#### Chart - 2

In [None]:
# Chart - 2 Lead Time Distribution:

# figure size
plt.figure(figsize=(8,6))

# creating graph
sns.histplot(x='lead_time', data=hb_df, bins = 25, hue='hotel')

# labeling
plt.xlabel('Lead Time')
plt.ylabel("Frequency")
plt.title("Lead Time Distribution")

# show graph
plt.show()

##### 1. Why did you pick the specific chart?

**I opted for a histogram to visualize the distribution of lead time because it allows us to understand the frequency or density distribution of lead times. A histogram bins the lead time values into intervals and shows the number of occurrences within each interval. This helps in identifying the typical lead times and any patterns or outliers in the data.**

##### 2. What is/are the insight(s) found from the chart?

**The histogram shows a right-skewed lead time distribution: most bookings are made with relatively short lead times, while a long tail of bookings is made many months in advance. Splitting the bars by hotel type shows whether city and resort guests plan differently. Knowing the balance between last-minute bookers and early planners can inform marketing strategies, pricing decisions, and resource allocation.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

**The insights from the lead time distribution can positively impact business decisions by enabling better resource management, pricing strategies, and marketing efforts. For example, if the majority of bookings occur with short lead times, the business can focus on targeting last-minute travelers with promotional offers or adjusting pricing dynamically to maximize revenue. However, if there is a lack of bookings with longer lead times, it could indicate a need to improve marketing efforts to attract customers earlier in the booking cycle, potentially leading to negative growth if not addressed promptly.**
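The right skew can be quantified rather than read off the histogram. This sketch uses made-up lead times. The booking-window bin edges (one week, one month, three months, one year) are illustrative choices, not values from the original analysis.

```python
import pandas as pd

# Made-up lead times (days): mostly short, with one far-ahead booking
lead_time = pd.Series([0, 1, 2, 3, 5, 8, 300], name="lead_time")

# Right skew shows up as positive skewness and a mean well above the median
print(lead_time.skew(), lead_time.mean(), lead_time.median())

# Bucket lead times into booking windows; right=False makes each bin [a, b)
labels = ["0-6 days", "7-29 days", "30-89 days", "90-364 days", "365+ days"]
window = pd.cut(lead_time, bins=[0, 7, 30, 90, 365, float("inf")],
                labels=labels, right=False)

# For a categorical, sort=False keeps the bin order and includes empty bins
window_counts = window.value_counts(sort=False)
print(window_counts)
```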

#### Chart - 3

In [None]:
# Chart - 3 Arrival Date Year Distribution

plt.figure(figsize=(8,6))
sns.countplot(x='arrival_date_year', data=hb_df, hue='hotel')
plt.xlabel('Arrival Date Year')
plt.ylabel("Count")
plt.title("Arrival Date Year Distribution")
plt.show()

##### 1. Why did you pick the specific chart?

**A grouped bar chart (a countplot with `hue='hotel'`) was chosen to visualize the distribution of bookings across years while distinguishing the hotel type (resort hotel or city hotel). This chart type lets booking counts be compared between hotel types within each year, revealing any differences in booking trends between the two types of accommodation over time.**

##### 2. What is/are the insight(s) found from the chart?

**The grouped bar chart shows the distribution of bookings across years for both resort hotels and city hotels. It reveals which hotel type attracts more bookings in each year, whether booking preferences between the hotel types shift over time, and whether each hotel type follows a consistent trend.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

**The insights gained from this chart can positively impact business strategies by informing decisions related to marketing, pricing, and resource allocation for each hotel type. For example, if the chart shows that one hotel type consistently attracts more bookings over the years, the business can focus its marketing efforts and investment in infrastructure on that particular type of accommodation to capitalize on its popularity. However, if there are indications of negative growth, such as a decline in bookings for both hotel types over consecutive years, it may signal broader industry trends or shifts in consumer preferences that require strategic adjustments to prevent further decline and stimulate growth.**
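The counts behind the grouped bar chart can also be tabulated, which makes year-over-year changes explicit. This is a sketch on made-up rows. One caution: this dataset is commonly described as covering arrivals from July 2015 to August 2017. If so, 2015 and 2017 are partial years, and raw yearly totals are not directly comparable.

```python
import pandas as pd

# Made-up rows: arrival year and hotel type
toy = pd.DataFrame({
    "arrival_date_year": [2015, 2015, 2016, 2016, 2016, 2017],
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "City Hotel",
              "Resort Hotel", "City Hotel"],
})

# Same counts as the grouped bar chart, as a table (years x hotel types)
year_counts = pd.crosstab(toy["arrival_date_year"], toy["hotel"])

# Year-over-year change per hotel type; the first year has no prior value (NaN)
yoy_change = year_counts.pct_change()
print(year_counts)
print(yoy_change)
```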

#### Chart - 4

In [None]:
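# Chart - 4 Arrival Date Month Distribution

# Calendar order for the x-axis (countplot otherwise orders months by first appearance)
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

plt.figure(figsize=(12, 6))
sns.countplot(data=hb_df, x='arrival_date_month', hue="hotel", order=month_order)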
plt.xticks(rotation=45)
plt.xlabel('Arrival Date Month')
plt.ylabel('Count')
plt.title("Arrival Date Month Distribution")
plt.show()

##### 1. Why did you pick the specific chart?

**A grouped bar chart was selected to visualize the distribution of bookings across different months while also considering the hotel type. This chart type allows for a comparison of booking counts between resort hotels and city hotels within each month, providing insights into any seasonal variations or differences in booking patterns between the two types of accommodations.**

##### 2. What is/are the insight(s) found from the chart?

**The grouped bar chart reveals the distribution of bookings across different months for both resort hotels and city hotels. Insights can be gleaned by examining patterns such as peak booking months, differences in booking behavior between hotel types during specific months, or any consistent trends in booking preferences throughout the year.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

**The insights gained from this chart can positively impact business strategies by informing decisions related to seasonal pricing, marketing campaigns, and resource allocation for each hotel type. For example, if the chart shows that resort hotels experience a surge in bookings during summer months, the business can implement targeted marketing promotions or adjust pricing strategies to capitalize on seasonal demand. However, if there are indications of negative growth, such as a decline in bookings for both hotel types during traditionally busy months, it may signal broader economic or industry-related challenges that require strategic adjustments to mitigate negative impacts and stimulate growth.**
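By default, `sns.countplot` orders string categories by their first appearance in the data, not by calendar month, so an explicit `order` is needed for a calendar axis. The same calendar order can be enforced in a table with an ordered categorical. Here it is sketched on made-up values, assuming English month names.

```python
import calendar
import pandas as pd

# Made-up arrival months, deliberately out of calendar order
toy = pd.DataFrame({"arrival_date_month": ["March", "January", "August", "August"]})

# calendar.month_name[1:] gives January..December (English locale assumed)
month_order = list(calendar.month_name)[1:]

# An ordered categorical makes counts follow the calendar, including empty months
months = pd.Categorical(toy["arrival_date_month"], categories=month_order, ordered=True)
monthly_counts = pd.Series(months).value_counts(sort=False)
print(monthly_counts)
```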

#### Chart - 5

In [None]:
# Chart - 5 Week Number vs. Total Stay

plt.figure(figsize=(12,8))
sns.scatterplot(data=hb_df, x='arrival_date_week_number', y='total_stay', hue='hotel', size='total_stay')
plt.xlabel('Arrival Week Number')
plt.ylabel("Total Stay")
plt.title("Week Number V/S Total Stay")
plt.show()

##### 1. Why did you pick the specific chart?

**I selected a scatter plot because it shows how the total stay of individual bookings varies with arrival week number (an ordered, discrete variable running from 1 to 53). Color (hue) separates the two hotel types, and point size repeats the total stay so that long stays stand out visually.**

##### 2. What is/are the insight(s) found from the chart?

**The scatter plot reveals the distribution of total stay for each booking across different arrival week numbers. By examining the plot, we can identify any trends, clusters, or outliers in the data. For example, we can observe whether there's a pattern of longer stays during specific weeks or if there are variations in total stay between different hotel types. The size of the points also provides information about the duration of the stay, with larger points indicating longer stays.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

**The insights gained from this chart can positively impact business strategies related to capacity planning, resource allocation, and revenue optimization. For instance, if the plot shows a concentration of larger points (indicating longer stays) during certain weeks, the business can adjust staffing levels or inventory accordingly to meet the anticipated demand. However, if there are indications of negative growth, such as a decrease in total stay over time or a lack of bookings during peak weeks, it may signal potential revenue loss or underutilization of resources. In such cases, the business may need to investigate the reasons behind these trends and implement strategies to stimulate demand and encourage longer stays, such as targeted promotions or package deals.**
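With tens of thousands of points, the scatter plot overplots heavily. A per-week summary is one alternative. This sketch computes the median total stay for each arrival week and hotel type on made-up rows. The resulting frame can be drawn as one line per hotel with `weekly_median_stay.plot()`.

```python
import pandas as pd

# Made-up bookings: arrival week, hotel type and total nights
toy = pd.DataFrame({
    "arrival_date_week_number": [27, 27, 27, 28, 28],
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    "total_stay": [2, 4, 7, 3, 10],
})

# One value per (week, hotel): the median stay, with hotels as columns
weekly_median_stay = (
    toy.groupby(["arrival_date_week_number", "hotel"])["total_stay"]
    .median()
    .unstack("hotel")
)
print(weekly_median_stay)
```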

#### Chart - 6

In [None]:
# Chart - 6 Customer Type Distribution
# get numbers of customer type
customer_type = hb_df['customer_type'].value_counts()
# customer_type

# Get the label and count for the pie chart
customer_type_label = customer_type.index
customer_type_size = customer_type.values
# print(customer_type_label, customer_type_size)

# create Pie chart
plt.figure(figsize=(8, 6))
plt.pie(customer_type_size, labels=customer_type_label, autopct='%1.1f%%')
plt.title('Customer Type Distribution')
plt.axis('equal')
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a pie chart because it effectively visualizes the distribution of different customer types as proportions of a whole. Each slice of the pie represents a customer type, and the size of each slice corresponds to the proportion of that customer type within the entire dataset. This type of chart is ideal for showcasing categorical data and comparing the relative sizes of different categories.**

##### 2. What is/are the insight(s) found from the chart?

**The pie chart provides a clear overview of the distribution of customer types. By examining the chart, we can easily identify which customer types make up the majority and minority of bookings. Additionally, we can compare the relative proportions of different customer types, which may reveal patterns or trends in customer behavior.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

**The insights gained from this chart can positively impact business strategies related to customer segmentation, marketing campaigns, and service offerings. For example, if the pie chart shows that a significant portion of bookings comes from a specific customer type (e.g., transient), the business can tailor its marketing efforts and services to better cater to the needs and preferences of that customer segment, potentially leading to increased customer satisfaction and loyalty. However, if there are indications of negative growth, such as a decline in bookings from high-value customer segments, it may signal a need to reassess marketing strategies, improve customer experiences, or introduce targeted promotions to regain lost customers and prevent further decline in revenue.**
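The pie chart's percentages can be reproduced as a table, which is easier to compare across charts. This sketch uses made-up values. In the notebook, `hb_df['customer_type']` would replace the toy series.

```python
import pandas as pd

# Made-up customer types for eight bookings
toy = pd.Series(["Transient", "Transient", "Transient", "Contract",
                 "Transient-Party", "Transient", "Group", "Transient"],
                name="customer_type")

# Percentage share per type, rounded like autopct='%1.1f%%'
share_pct = (toy.value_counts(normalize=True) * 100).round(1)
print(share_pct)
```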

#### Chart - 7

In [None]:
# Chart - 7 Market Segment Distribution

# Getting value counts from market_segment
market_segment = hb_df['market_segment'].value_counts()
# market_segment

# getting labels and sizes (values) to create the pie chart
label = market_segment.index
size = market_segment.values
# print(label, size)

# Set the font size before drawing so it applies to this chart
plt.rcParams['font.size'] = 10

# Creating pie chart of market segment
plt.figure(figsize=(10, 8))
plt.pie(size, labels=label, autopct='%1.1f%%', startangle=190)
plt.title("Market Segment Distribution")
plt.axis('equal')
# place the legend outside the pie, vertically centered to its right
plt.legend(label, loc="center left", bbox_to_anchor=(1, 0.5))
plt.show()

##### 1. Why did you pick the specific chart?

**A pie chart was selected to visualize the distribution of bookings across market segments because it clearly shows the proportion of bookings attributed to each segment. Its simplicity makes it easy to understand and to compare the relative sizes of the segments.**

##### 2. What is/are the insight(s) found from the chart?

**The pie chart reveals the distribution of bookings among market segments. For example, it may show that a large share of bookings comes from one segment, indicating where marketing efforts and tailored services matter most. It can also highlight underrepresented segments that may warrant attention or investment to capture a larger share of the market.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

**The insights gained from this chart can positively impact business strategies related to marketing, customer segmentation, and product/service offerings. For instance, if the pie chart indicates that a particular market segment accounts for a small portion of bookings, the business can focus on developing targeted marketing campaigns or special promotions to attract customers from that segment, potentially leading to increased revenue. Conversely, if there are indications of negative growth, such as a decline in bookings from key market segments, it may signal a need to reassess marketing strategies or address issues impacting customer satisfaction within those segments to prevent further decline and stimulate growth.**
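Small slices, such as the rarest market segments, can be hard to read in a pie chart. One common remedy, sketched here, is to fold every category below a share threshold into an "Other" slice before plotting. The helper name `group_small_categories` and the 5% threshold are hypothetical choices, and the counts are made up.

```python
import pandas as pd


def group_small_categories(counts: pd.Series, min_share: float = 0.05) -> pd.Series:
    """Fold categories below min_share of the total into a single 'Other' entry."""
    share = counts / counts.sum()
    large = counts[share >= min_share]
    other_total = counts[share < min_share].sum()
    if other_total > 0:
        large = pd.concat([large, pd.Series({"Other": other_total})])
    return large


# Made-up segment counts (labels follow the dataset's market_segment values)
counts = pd.Series({"Online TA": 60, "Offline TA/TO": 20, "Direct": 15,
                    "Corporate": 3, "Aviation": 2})
pie_counts = group_small_categories(counts)
print(pie_counts)
```

The result can be passed straight to `plt.pie(pie_counts.values, labels=pie_counts.index, autopct='%1.1f%%')`.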

#### Chart - 8

In [None]:
# Chart - 8 Distribution Channel Distribution

# Getting value count from distribution_channel
distribution_channel = hb_df['distribution_channel'].value_counts()
# distribution_channel

# getting label and size(value) from distribution_channel
label = distribution_channel.index
size = distribution_channel.values
# print(size, label)

# creating pie chart
plt.figure(figsize=(10, 8))
plt.pie(size, labels=label, autopct='%1.1f%%')
plt.legend(label, loc="center left", bbox_to_anchor=(1, 0, 1, 1))
plt.title("Distribution Channel Distribution")
plt.axis('equal')
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a pie chart because it effectively displays the distribution of bookings across different distribution channels as proportions of a whole. This type of chart allows for easy comparison of the relative sizes of each distribution channel and provides a clear visual representation of their contributions to the overall booking volume.**

##### 2. What is/are the insight(s) found from the chart?

**The pie chart reveals the proportion of bookings attributed to each distribution channel. By examining the chart, we can identify which distribution channels are the most prominent and which ones contribute less to the overall booking volume. Additionally, we can compare the relative importance of different distribution channels and assess their impact on the business.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this chart can positively impact business strategies related to distribution channel management, marketing efforts, and revenue optimization. For example, if the pie chart shows that a significant portion of bookings comes from a particular distribution channel (e.g., online travel agencies), the business can focus on strengthening partnerships with these channels or investing more resources in targeted marketing campaigns to further leverage their reach and drive additional bookings. However, if there are indications of negative growth, such as a decline in bookings from key distribution channels or an overreliance on a single channel, it may signal a need to diversify distribution channels, enhance direct booking channels, or renegotiate terms with existing partners to mitigate risks and ensure sustainable growth.**
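Over-reliance on a single channel can be quantified rather than judged from the pie. A minimal sketch, assuming the `distribution_channel` column used in the code above; `channel_concentration` is a hypothetical helper, and the Herfindahl-Hirschman index (HHI, the sum of squared shares) is a standard concentration measure swapped in here, not something the original analysis computes:

```python
import pandas as pd


def channel_concentration(df: pd.DataFrame, column: str = "distribution_channel") -> dict:
    """Hypothetical helper: largest channel's share and the HHI of channel shares.

    HHI equals 1/n when n channels have equal shares and 1.0 when one channel
    takes every booking, so higher values mean more concentration.
    """
    # value_counts sorts descending, so the first entry is the largest channel
    shares = df[column].value_counts(normalize=True)
    return {
        "top_channel": shares.index[0],
        "top_share": float(shares.iloc[0]),
        "hhi": float((shares ** 2).sum()),
    }


# Toy data for illustration only (not from the hotel bookings dataset)
toy = pd.DataFrame({"distribution_channel": ["TA/TO"] * 8 + ["Direct"] * 2})
print(channel_concentration(toy))
```

Tracking the HHI over time, rather than once, would show whether dependence on one channel is growing.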

#### Chart - 9

In [None]:
# Chart - 9 Room Type Comparison

# Counting bookings for each (reserved, assigned) room type pair;
# pairs that never occur get 0 instead of NaN
room_type_count = hb_df.groupby(['reserved_room_type', 'assigned_room_type']).size().unstack(fill_value=0)
# room_type_count

# create a stacked bar chart
room_type_count.plot(kind='bar', stacked=True, figsize=(8, 6))

# Adding labels
plt.title('Reserved Room Type vs. Assigned Room Type')
plt.xlabel('Reserved room type')
plt.ylabel('Count')
plt.legend(title='Assigned room type')

# Show graph
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a stacked bar chart because it effectively visualizes the relationship between reserved room types and assigned room types. This type of chart allows us to compare the distribution of assigned room types within each reserved room type, providing insights into how often guests receive the room type they originally booked.**

##### 2. What is/are the insight(s) found from the chart?

**The stacked bar chart reveals the frequency of different room type combinations, showing how often each reserved room type is assigned to the same or a different room type. By examining the chart, we can identify patterns such as whether certain reserved room types are consistently assigned to specific room types or whether there are deviations from the original reservations.**
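Because the stacked bars show raw counts, heavily booked room types dominate the picture and the match rate has to be estimated by eye. A minimal sketch that computes it directly, assuming the `reserved_room_type` and `assigned_room_type` columns used above; `room_assignment_summary` is a hypothetical helper and the toy rows are illustrative:

```python
import pandas as pd


def room_assignment_summary(df: pd.DataFrame):
    """Hypothetical helper: share of bookings that got the reserved room type."""
    matched = df["reserved_room_type"] == df["assigned_room_type"]
    overall = float(matched.mean())
    # Match rate within each reserved room type
    per_type = matched.groupby(df["reserved_room_type"]).mean()
    # Row-normalized crosstab: each row sums to 1, so reserved types of very
    # different popularity can be compared on the same scale
    mix = pd.crosstab(df["reserved_room_type"], df["assigned_room_type"],
                      normalize="index")
    return overall, per_type, mix


# Toy data for illustration only (not from the hotel bookings dataset)
toy = pd.DataFrame({
    "reserved_room_type": ["A", "A", "A", "A", "D", "D"],
    "assigned_room_type": ["A", "A", "A", "D", "D", "E"],
})
overall, per_type, mix = room_assignment_summary(toy)
print(round(overall, 3))
print(per_type)
print(mix.round(2))
```

Passing `mix` to `.plot(kind='bar', stacked=True)` instead of raw counts would give bars of equal height, which makes mismatches easier to compare across room types.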

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this chart can positively impact business operations and customer satisfaction. For example, if the chart shows that a high percentage of guests receive the room type they originally booked, it indicates efficient room allocation processes and may contribute to positive guest experiences. On the other hand, if there are instances where guests are frequently assigned different room types than what they reserved, it could lead to dissatisfaction and negative reviews, potentially impacting repeat bookings and revenue. In such cases, the business may need to review its room allocation procedures, improve communication with guests, or address any issues with room availability to minimize negative impacts on customer satisfaction and revenue growth.**

#### Chart - 10

In [None]:
# Chart - 10 Deposit Type Distribution

# Getting value counts from deposit_type
deposit_type = hb_df['deposit_type'].value_counts()
# deposit_type

# getting label and size (value) from deposit_type
label = deposit_type.index
size = deposit_type.values
# print(size, label)

# creating graph
plt.figure(figsize=(10, 8))
plt.pie(size, labels=label, autopct='%1.1f%%')

# labeling: anchor the legend's center-left point at the middle of the right edge
plt.legend(label, loc='center left', bbox_to_anchor=(1, 0.5))
plt.title('Deposit Type Distribution')
plt.axis('equal')

# show graph
plt.show()

##### 1. Why did you pick the specific chart?

**I selected a pie chart because it effectively visualizes the distribution of bookings across different deposit types as proportions of a whole. This type of chart allows for easy comparison of the relative sizes of each deposit type and provides a clear representation of their contributions to the overall booking volume.**

##### 2. What is/are the insight(s) found from the chart?

**The pie chart reveals the proportion of bookings attributed to each deposit type. By examining the chart, we can identify which deposit types are the most common and which ones are less frequently used. Additionally, we can compare the relative importance of different deposit types and assess their impact on the business.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this chart can positively impact business strategies related to revenue management and customer acquisition. For example, if the pie chart shows that a significant portion of bookings is made with non-refundable deposits, the business can adjust pricing or introduce incentives for refundable options to give guests more flexibility; whether that raises or lowers cancellation rates should be checked against the is_canceled column rather than assumed. However, if there are indications of negative growth, such as a decline in bookings with refundable deposits, it may signal a need to reassess pricing strategies or booking policies to mitigate potential revenue loss and ensure sustainable growth.**
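One way to relate deposit type to cancellations is to compute the cancellation rate within each deposit type. A minimal sketch, assuming `hb_df` has the `deposit_type` column used above and a 0/1 `is_canceled` column; `cancellation_rate_by` is a hypothetical helper and the toy data is made up:

```python
import pandas as pd


def cancellation_rate_by(df: pd.DataFrame, column: str = "deposit_type") -> pd.DataFrame:
    """Hypothetical helper: booking count and cancellation rate per category."""
    # is_canceled is 0/1, so its mean within a group is the cancellation rate
    return (
        df.groupby(column)["is_canceled"]
        .agg(bookings="count", cancel_rate="mean")
        .sort_values("cancel_rate", ascending=False)
    )


# Toy data for illustration only (not from the hotel bookings dataset)
toy = pd.DataFrame({
    "deposit_type": ["No Deposit"] * 4 + ["Non Refund"] * 2,
    "is_canceled": [0, 0, 1, 0, 1, 1],
})
print(cancellation_rate_by(toy))
```

The same helper works for other categorical columns, for example `cancellation_rate_by(hb_df, "market_segment")`.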

#### Chart - 11

In [None]:
# Chart - 11 Average Daily Rate (ADR) Distribution

# setting screen size
plt.figure(figsize=(10, 8))

# creating graph
plt.hist(hb_df['adr'], bins=100, color='lightgreen', edgecolor='black')

# labeling
plt.title("Average Daily Rate (ADR) Distribution")
plt.xlabel('ADR')
plt.ylabel("Frequency")

# show graph
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a histogram because it effectively displays the distribution of ADR values, allowing us to visualize the frequency of different ADR ranges within the dataset. Histograms are particularly useful for exploring the distribution and spread of continuous numerical variables like ADR.**

##### 2. What is/are the insight(s) found from the chart?

**The histogram provides insights into the distribution of ADR values across the dataset. By examining the chart, we can observe the central tendency, spread, and shape of the ADR distribution. For example, we can identify whether the ADR values are normally distributed, skewed to the left or right, or exhibit other patterns such as bimodality. Additionally, we can assess the frequency of ADR values within specific ranges to understand their prevalence.**
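The histogram's shape can be backed with a few summary numbers: a mean well above the median and a positive skewness indicate a right tail, and the interquartile range (IQR) measures spread without being pulled by extreme rates. A minimal sketch, assuming the `adr` column plotted above; `describe_adr` is a hypothetical helper and the sample values are illustrative:

```python
import pandas as pd


def describe_adr(adr: pd.Series) -> dict:
    """Hypothetical helper: centre, spread, skew and zero-rate share of ADR values."""
    q1, median, q3 = adr.quantile([0.25, 0.5, 0.75])
    return {
        "mean": float(adr.mean()),
        "median": float(median),
        "iqr": float(q3 - q1),
        "skew": float(adr.skew()),  # > 0 means a longer right tail
        "share_zero_or_less": float((adr <= 0).mean()),
    }


# Illustrative values only (not from the hotel bookings dataset)
sample = pd.Series([0, 50, 60, 70, 80, 90, 400], dtype=float)
summary = describe_adr(sample)
print(summary)
```

In the notebook this would be `describe_adr(hb_df['adr'])`; a high `share_zero_or_less` would be one concrete form of the "disproportionate number of low ADR values" mentioned below.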

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from the histogram can positively impact business strategies related to pricing, revenue management, and market positioning. For instance, if the histogram shows a cluster of ADR values around a certain range, the business can use this information to optimize pricing strategies, adjust room rates, or target specific customer segments to maximize revenue. However, if there are indications of negative growth, such as a disproportionate number of low ADR values or a wide variability in ADR across different bookings, it may signal potential revenue challenges or pricing inefficiencies. In such cases, the business may need to review pricing strategies, evaluate competitor pricing, or explore opportunities to enhance the value proposition to guests to mitigate negative impacts on revenue growth.**

#### Chart - 12

In [None]:
# Chart - 12 Special Requests vs. Total Stay

# screen setting
plt.figure(figsize=(8,6))

# creating graph
sns.scatterplot(data=hb_df, y='total_stay', x='total_of_special_requests', hue='hotel', size='is_canceled')

# labeling
plt.title('Special Requests vs. Total Stay')
plt.xlabel("Total Number of Special Requests")
plt.ylabel("Total Stay (nights)")

# show graph
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a scatter plot because it shows how two numeric variables relate: the number of special requests (a small integer count) and the total stay duration in nights. Scatter plots help reveal patterns, clusters, outliers, and possible correlations, and the hue and size encodings add hotel type and cancellation status to the same view.**

##### 2. What is/are the insight(s) found from the chart?

**The scatter plot reveals the distribution of data points representing the combination of total special requests and total stay duration for each booking. By examining the chart, we can identify any patterns or trends in the relationship between special requests and total stay. For example, we can observe whether there is a linear or nonlinear association between the two variables, whether there are clusters or outliers, and whether there is any correlation between them.**
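Since `total_of_special_requests` takes only a few integer values, the scatter plot stacks points into vertical stripes, and overlapping points hide how many bookings sit at each position. A minimal sketch of checking the relationship numerically, assuming the `total_of_special_requests` and `total_stay` columns used above; `requests_vs_stay` is a hypothetical helper. It computes Spearman's rank correlation as the Pearson correlation of the ranks, which avoids a SciPy dependency:

```python
import pandas as pd


def requests_vs_stay(df: pd.DataFrame):
    """Hypothetical helper: Spearman correlation and median stay per request count."""
    # Spearman's rho = Pearson correlation of the (average-tied) ranks
    rho = df["total_of_special_requests"].rank().corr(df["total_stay"].rank())
    # Median stay for each number of special requests, robust to very long stays
    median_stay = df.groupby("total_of_special_requests")["total_stay"].median()
    return float(rho), median_stay


# Toy data for illustration only (not from the hotel bookings dataset)
toy = pd.DataFrame({
    "total_of_special_requests": [0, 0, 1, 1, 2, 3],
    "total_stay": [2, 3, 3, 4, 5, 7],
})
rho, median_stay = requests_vs_stay(toy)
print(round(rho, 3))
print(median_stay)
```

A rank correlation suits this pair because the request count is ordinal and stay lengths are skewed; it measures whether the relationship is monotonic rather than strictly linear.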

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this scatter plot can positively impact business strategies related to customer service, guest satisfaction, and revenue optimization. For instance, if the scatter plot shows a positive correlation between total special requests and total stay duration, it suggests that guests with longer stays are more likely to make special requests. In such cases, the business can use this information to enhance the guest experience by proactively addressing their needs and preferences, potentially leading to increased satisfaction, loyalty, and positive word-of-mouth recommendations. However, if there are indications of negative growth, such as a weak or negative correlation between special requests and total stay, it may signal missed opportunities to capitalize on longer stays or potential dissatisfaction among guests with fewer special requests. In such cases, the business may need to reevaluate its service offerings, communication strategies, or pricing incentives to better align with guest expectations and maximize revenue potential.**

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

Let's find the correlation between the numerical columns.

Columns such as 'is_canceled', 'arrival_date_year', 'arrival_date_week_number', 'arrival_date_day_of_month', 'is_repeated_guest', 'company' and 'agent' are stored as numbers but represent categories, flags, date parts, or IDs, so we don't need to check them for correlation.

Also, since we have added the total_stay and total_people columns, we can drop the adults, children, babies, stays_in_weekend_nights and stays_in_week_nights columns: they are the components of those totals, so keeping them would only add redundant, mechanically correlated pairs.
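The column selection described above can be sketched as follows, assuming `hb_df` already contains the engineered `total_stay` and `total_people` columns; `correlation_matrix` and `EXCLUDE` are hypothetical names, and the toy frame is illustrative only:

```python
import pandas as pd

# Hypothetical list: numeric-typed columns the notes above treat as categorical
# codes, plus the raw components replaced by total_stay and total_people
EXCLUDE = [
    "is_canceled", "arrival_date_year", "arrival_date_week_number",
    "arrival_date_day_of_month", "is_repeated_guest", "company", "agent",
    "adults", "children", "babies",
    "stays_in_weekend_nights", "stays_in_week_nights",
]


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: Pearson correlations of the remaining numeric columns."""
    numeric = df.select_dtypes(include="number")
    # errors="ignore" skips names that are absent, e.g. already dropped earlier
    keep = numeric.drop(columns=EXCLUDE, errors="ignore")
    return keep.corr()


# Toy data for illustration only (not from the hotel bookings dataset)
toy = pd.DataFrame({
    "lead_time": [10, 50, 100, 200],
    "adr": [120.0, 100.0, 90.0, 80.0],
    "total_stay": [1, 2, 3, 5],
    "is_canceled": [0, 1, 1, 0],
    "adults": [2, 2, 1, 2],
    "hotel": ["City Hotel"] * 4,
})
corr = correlation_matrix(toy)
print(corr.round(2))
```

In the notebook, the returned matrix could then be drawn with `sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)`; fixing `vmin`/`vmax` keeps the color scale symmetric around zero.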

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***