# **Project Name**    -



##### **Project Type**    - EDA/Hotel Booking Analysis
##### **Contribution**    - Individual/Nisha


# **Project Summary -**

The hotel booking analysis project aims to explore and understand various aspects of the hotel industry, including booking trends, customer preferences, and market dynamics. By examining historical data and patterns, this analysis seeks to provide valuable insights to inform strategic decisions and optimize revenue generation for hotels.

**Key Components of the Analysis:**

**Booking Trends:** Analyzing historical booking data to identify patterns such as peak seasons, fluctuations in demand, and trends in booking volume. Factors such as seasonality, holidays, and major events will be examined to understand their impact on booking behavior.

**Customer Segmentation:** Segmenting customers based on demographics, behavior, and preferences to develop targeted marketing strategies and personalized offerings. By understanding the distinct needs and preferences of different customer segments, hotels can enhance customer satisfaction and loyalty.

**Channel Performance:** Evaluating the performance of various booking channels, including direct bookings, online travel agencies (OTAs), and third-party platforms. This analysis will help hotels optimize their distribution strategies and allocate resources effectively to maximize bookings and revenue.

**Cancellation Rates:** Analyzing cancellation rates to identify potential areas for improvement in booking policies, pricing strategies, or customer service. High cancellation rates may indicate issues with the booking process or dissatisfaction among customers, requiring remedial action.

**Revenue Management:** Implementing revenue management techniques to optimize pricing, inventory allocation, and distribution channels. This includes dynamic pricing strategies, demand forecasting, and competitor analysis to maximize revenue and profitability.

**Competitor Analysis:** Monitoring competitors' pricing, offerings, and marketing strategies to identify strengths, weaknesses, and opportunities for differentiation. Understanding competitor positioning and market trends is crucial for hotels to maintain a competitive edge in the industry.

**Customer Satisfaction:** Assessing customer satisfaction through feedback, reviews, and ratings to identify areas for improvement in service quality and guest experience. By addressing customer concerns and enhancing the overall guest experience, hotels can build long-term loyalty and reputation.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Despite leveraging insights from hotel data, the hospitality industry faces persistent challenges such as high cancellation rates, low customer retention, and over-reliance on single booking channels. These issues hinder hotels' growth and success in the competitive market landscape. Addressing these challenges requires practical solutions to improve operational efficiency and enhance guest experience.

#### **Define Your Business Objective?**

**Answer -** The objective of this project is to help hotels increase revenue and improve operations by reducing cancellation rates, increasing customer retention, diversifying booking channels, optimizing pricing strategies, and enhancing operational efficiency.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code and deployment-ready code will be a plus. Students who provide them will be awarded some additional credits.

     The additional credits will give an advantage over other students during Star Student selection.

     [ Note: Deployment-ready code is defined as a notebook (.ipynb) that can be executed end to end in one go without a single error being logged. ]

3.   Every piece of logic should have proper comments.
4.   You may add as many charts as you want. Make sure that each chart follows the format below and answers the questions listed under it.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

```python
# Import Libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
```


### Dataset Loading

```python
# Load Dataset
df_dataset = pd.read_csv("/content/drive/MyDrive/Hotel Bookings.csv")
```


### Dataset First View

```python
# Dataset First Look
df_dataset.head()
```

### Dataset Rows & Columns count

```python
# Dataset Rows & Columns count
df_dataset.shape
```

### Dataset Information

```python
# Dataset Info
df_dataset.info()
```


#### Duplicate Values

```python
# Dataset Duplicate Value Count
num_duplicates = df_dataset.duplicated().sum()
print("Number of duplicate rows:", num_duplicates)
```

#### Missing Values/Null Values

```python
# Missing Values/Null Values Count (per column)
null_counts = df_dataset.isnull().sum()
print("Number of null values per column:")
print(null_counts)
```


```python
# Visualizing the missing values
sns.heatmap(df_dataset.isnull(), cbar=False, cmap='YlGnBu')
plt.title('Missing Values in the Dataset')
plt.show()
```


### What did you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

```python
# Dataset Columns
df_dataset.columns
```


```python
# Dataset Describe
df_dataset.describe(include='all')
```

### Variables Description

Answer Here

### Check Unique Values for each variable.

```python
# Check Unique Values for each variable.
for column in df_dataset.columns:
    unique_values = df_dataset[column].unique()
    print(f"Unique values in {column}:", len(unique_values))
```

## 3. ***Data Wrangling***

### Data Wrangling Code

```python
# check missing values
df_dataset.isnull().sum().sort_values(ascending = False)
```

Since the 'company' and 'agent' columns contain company and agent ID numbers, a null value most likely means that the customer did not book the hotel through any agent or company. We therefore replace the null values in the 'company' column with 'Unknown' and those in the 'agent' column with 'No agent'. The 'country' column also contains some null values, which we fill with 'No name'.

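```python
# Write your code to make your dataset analysis ready.

# Fill null values in the 'country' column with 'No name'
# (assign the result back instead of chained inplace=True, which recent pandas warns about)
df_dataset['country'] = df_dataset['country'].fillna('No name')
```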


```python
# Fill null values in the 'company' column with 'Unknown'
df_dataset['company'] = df_dataset['company'].fillna('Unknown')
```

```python
# Fill null values in the 'agent' column with 'No agent' using fillna()
df_dataset['agent'] = df_dataset['agent'].fillna('No agent')
```
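The three column-wise fills above can also be expressed as a single `fillna` call with a column-to-value mapping, assigning the result back rather than calling `inplace=True` on a selected column (a pattern that recent pandas versions warn about). This is a minimal sketch on a small, invented toy frame standing in for `df_dataset`, not the real data:

```python
import numpy as np
import pandas as pd

# Invented toy rows standing in for df_dataset (not real bookings)
toy = pd.DataFrame({
    "country": ["PRT", np.nan, "GBR"],
    "agent": [9.0, np.nan, 240.0],
    "company": [np.nan, 40.0, np.nan],
})

# One fillna call with a column -> replacement mapping, assigned back to the frame
fill_values = {"country": "No name", "company": "Unknown", "agent": "No agent"}
toy = toy.fillna(fill_values)

print(toy.isnull().sum().sum())  # 0
```

On the notebook's data the same idea would read `df_dataset = df_dataset.fillna(fill_values)`.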

```python
# Inspect the distinct values in the 'children' column
df_dataset['children'].unique()
```

The 'children' column already uses 0 for bookings with no children, so the 'nan' values are not a hidden "no children" code; they are missing entries caused by errors in recording the data.

We will replace the null values in this column with the mean number of children.

```python
# Fill null values in the 'children' column with the column mean
df_dataset['children'] = df_dataset['children'].fillna(df_dataset['children'].mean())
```

```python
# Checking if all null values are removed
df_dataset.isnull().sum().sort_values(ascending = False)[:6]
```

Some rows have zero adults, zero children and zero babies, i.e. bookings with no guests at all. We remove such rows.

```python
# Drop bookings whose total number of guests (adults + children + babies) is zero
df_dataset.drop(df_dataset[df_dataset['adults'] + df_dataset['babies'] + df_dataset['children'] == 0].index, inplace = True)
```

```python
# Display the cleaned dataset
df_dataset
```

#### **Converting columns to appropriate data types**

```python
# Change the data type of 'reservation_status_date' to datetime
df_dataset['reservation_status_date'] = pd.to_datetime(df_dataset['reservation_status_date'], format = '%Y-%m-%d')
```

#### **Adding derived columns that combine existing columns**

```python
# Add total number of nights stayed in the hotel
df_dataset['total_stay'] = df_dataset['stays_in_weekend_nights'] + df_dataset['stays_in_week_nights']

# Add total number of people, i.e. total_people = adults + children + babies
df_dataset['total_people'] = df_dataset['adults'] + df_dataset['children'] + df_dataset['babies']
```
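The zero-guest filter and the two derived columns can be wrapped in one function so the same wrangling can be re-run or checked on a small frame. This is a sketch: `add_guest_columns` is a hypothetical helper name, and the toy values are invented:

```python
import pandas as pd

def add_guest_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop bookings with no guests, then add derived columns."""
    # A booking with 0 adults, 0 children and 0 babies has no guests at all
    no_guests = (df["adults"] + df["children"] + df["babies"]) == 0
    out = df.loc[~no_guests].copy()
    # Total nights = weekend nights + week nights
    out["total_stay"] = out["stays_in_weekend_nights"] + out["stays_in_week_nights"]
    # Total guests = adults + children + babies
    out["total_people"] = out["adults"] + out["children"] + out["babies"]
    return out

# Invented toy rows; the middle row has no guests and should be dropped
toy = pd.DataFrame({
    "adults": [2, 0, 1],
    "children": [1.0, 0.0, 0.0],
    "babies": [0, 0, 1],
    "stays_in_weekend_nights": [1, 2, 0],
    "stays_in_week_nights": [3, 1, 2],
})
result = add_guest_columns(toy)
print(result[["total_stay", "total_people"]])
```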

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

```python
# Chart - 1 visualization code
# Count bookings by cancellation flag (0 = not canceled, 1 = canceled)
canceled_counts = df_dataset['is_canceled'].value_counts()

# Readable labels instead of the raw 0/1 flag values
labels = canceled_counts.index.map({0: 'Not Canceled', 1: 'Canceled'})

# Create a pie chart
plt.pie(canceled_counts, labels=labels, autopct='%1.1f%%', colors=['lightblue', 'lightcoral'], startangle=90)

# Add title
plt.title('Distribution of Canceled and Not Canceled Bookings')

# Equal aspect ratio ensures that pie is drawn as a circle
plt.axis('equal')

# Show the plot
plt.show()
```


##### 1. Why did you pick the specific chart?

Answer - I chose a pie chart to visualize the distribution of canceled and not-canceled bookings because it represents proportions effectively and gives a clear visual comparison between the two categories.

##### 2. What is/are the insight(s) found from the chart?

Answer - The pie chart shows what share of all bookings were canceled versus not canceled, i.e. the overall cancellation rate. This rate indicates how much potential revenue and room capacity is lost to cancellations, which helps hotels decide whether targeted measures (for example, stricter cancellation policies) are needed to improve performance and profitability.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer - The insights gained from analyzing the distribution of canceled and not-canceled bookings can lead to positive business impact if used effectively, for example by adjusting booking and cancellation policies. However, some insights point to negative growth if they are not addressed: a large share of canceled bookings leaves rooms empty at short notice and directly reduces revenue.
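Because `is_canceled` is a 0/1 flag, its mean is the fraction of canceled bookings, so the cancellation rate behind the pie chart can also be reported as a number, overall and per hotel. A minimal sketch, assuming the column names used in this notebook; `cancellation_rates` is a hypothetical helper and the toy rows are invented:

```python
import pandas as pd

def cancellation_rates(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: percentage of canceled bookings overall and per hotel.

    is_canceled is 0/1, so its mean is the fraction of canceled bookings.
    """
    overall = pd.Series({"All hotels": df["is_canceled"].mean() * 100})
    per_hotel = df.groupby("hotel")["is_canceled"].mean() * 100
    return pd.concat([overall, per_hotel]).round(1)

# Invented toy rows: 1 of 4 bookings canceled, all of them at the City Hotel
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "is_canceled": [1, 0, 0, 0],
})
rates = cancellation_rates(toy)
print(rates)
```

On the real data this would be called as `cancellation_rates(df_dataset)`.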

#### Chart - 2

```python
# Chart - 2 visualization code
# Histogram for visualization of lead time:
lead_time_data = df_dataset['lead_time']
plt.hist(lead_time_data, bins=30, color='skyblue', edgecolor='black')

# Add labels and title
plt.xlabel('Lead Time (Days)')
plt.ylabel('Frequency')
plt.title('Distribution of Lead Times for Bookings')

# Show the plot
plt.show()
```

##### 1. Why did you pick the specific chart?

Answer - I chose a histogram to visualize the distribution of lead times for bookings because histograms are well suited to displaying the frequency distribution of continuous data such as lead times.

##### 2. What is/are the insight(s) found from the chart?

Answer - The histogram reveals the most common lead times for bookings, whether there are any outliers or unusual patterns in lead times, and how widely lead times are spread.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer - By identifying peak periods of booking lead times, hotels can adjust staffing levels and inventory management to meet demand more effectively. Additionally, understanding the distribution of lead times can inform pricing strategies and promotional efforts to maximize revenue during high-demand periods.

#### Chart - 3

```python
# Chart - 3 visualization code
# Group bookings by hotel type (defined here so this cell runs even before Chart - 7)
grouped_by_hotel = df_dataset.groupby('hotel')

# Average number of days spent on the waiting list, per hotel type
df3 = (grouped_by_hotel['days_in_waiting_list'].mean()
       .reset_index()
       .rename(columns={'days_in_waiting_list': 'avg_waiting_period'}))

# Plotting
plt.figure(figsize=(8, 5))
plt.bar(df3['hotel'], df3['avg_waiting_period'], color='skyblue')
plt.xlabel('Hotel Type')
plt.ylabel('Average Waiting Period (Days)')
plt.title('Average Waiting Period by Hotel Type')
plt.xticks(rotation=30)  # Rotate x-axis labels for better readability
plt.show()
```

##### 1. Why did you pick the specific chart?

Answer - To compare the average waiting period (in days) of the two hotel types.

##### 2. What is/are the insight(s) found from the chart?

Answer - The City Hotel has a significantly longer average waiting period than the Resort Hotel, which suggests that the City Hotel is much busier.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer - While insights into the average waiting period can help hotels identify opportunities to improve customer experience and operational efficiency, failing to address long wait times can lead to negative growth, including loss of competitiveness and damage to reputation. Hotels should therefore proactively address issues related to wait times to drive positive business outcomes and ensure long-term success.

#### Chart - 4

```python
# Chart - 4 visualization code
# Visualization by bar chart: number of bookings per country
grp_by_country = df_dataset.groupby('country')
df = pd.DataFrame(grp_by_country.size()).rename(columns = {0:'no. of bookings'})
# groupby() orders countries alphabetically, so sort by count before keeping the top 10
df = df.sort_values('no. of bookings', ascending=False)[:10]
plt.figure(figsize=(10, 5))
sns.barplot(x = df.index, y = df['no. of bookings'])
plt.xlabel('Country')
plt.ylabel('No. of bookings')
plt.title('Top 10 Countries by Number of Bookings')
plt.show()
```

##### 1. Why did you pick the specific chart?

Answer - To see which countries most of the customers visiting these hotels come from.

##### 2. What is/are the insight(s) found from the chart?

Answer - The chart ranks the ten countries that contribute the most bookings, which shows the source markets the hotels depend on most.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer - Knowing which countries most customers come from helps the business tailor its advertising and services to what these guests like, which can improve the business and increase revenue. However, if too many customers come from only a few countries, the business could struggle if something bad happens in those places (for example, an economic downturn or travel restrictions). It is therefore good for the business to also attract customers from other countries to reduce this risk.
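When picking the "top N" categories, the order of the counts matters: `groupby(...).size()` returns counts sorted by the group key (here the country code), while `value_counts()` sorts by count, so only the latter puts the most frequent countries first. A small illustration on invented counts:

```python
import pandas as pd

# Invented bookings: 'AGO' comes first alphabetically but has the fewest bookings
toy = pd.DataFrame({"country": ["PRT"] * 5 + ["GBR"] * 3 + ["FRA"] * 2 + ["AGO"]})

# groupby().size() is ordered by the group key (country code), not by count
by_code = toy.groupby("country").size()
print(by_code.index[:2].tolist())  # ['AGO', 'FRA']

# value_counts() is ordered by count, so nlargest/head picks the true top N
top_n = toy["country"].value_counts().nlargest(2)
print(top_n.index.tolist())  # ['PRT', 'GBR']
```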

#### Chart - 5

```python
# Chart - 5 visualization code
# Selecting and counting repeated customers' bookings
repeated_data = df_dataset[df_dataset['is_repeated_guest'] == 1]
repeat_grp = repeated_data.groupby('hotel')
total_repeated_guests = repeat_grp.size()

# Counting total bookings
total_booking = df_dataset.groupby('hotel').size()

# Calculating repeat %
repeat_percentage = (total_repeated_guests / total_booking) * 100

# Plotting
plt.figure(figsize=(10, 5))
plt.bar(total_repeated_guests.index, repeat_percentage, color='pink')
plt.xlabel('Hotel Type')
plt.ylabel('Percentage of Repeated Guests')
plt.title('Percentage of Repeated Guests by Hotel Type')
plt.show()
```

##### 1. Why did you pick the specific chart?

Answer - I picked a bar chart because it displays the percentage of repeated guests for each hotel type clearly and directly.

##### 2. What is/are the insight(s) found from the chart?

Answer - The Resort Hotel has a higher percentage of repeated guests than the City Hotel.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer - Hotels with higher percentages of repeated guests may focus on enhancing their loyalty programs, personalized marketing efforts, and customer service initiatives to further strengthen customer relationships and encourage repeat business. Hotels with lower percentages of repeated guests may experience decreased revenue and market share if they fail to address underlying issues affecting customer retention.
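The repeat-guest percentage can also be computed in one step: within each hotel, the mean of the 0/1 `is_repeated_guest` flag is the share of repeated guests. A sketch on an invented toy frame comparing both approaches:

```python
import pandas as pd

# Invented toy bookings: 4 City Hotel rows, 2 Resort Hotel rows
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 2,
    "is_repeated_guest": [0, 0, 0, 1, 1, 0],
})

# Approach used in the chart: repeated bookings / all bookings, per hotel
repeated = toy[toy["is_repeated_guest"] == 1].groupby("hotel").size()
ratio_pct = repeated / toy.groupby("hotel").size() * 100

# One-step alternative: the mean of a 0/1 flag is the share of 1s
mean_pct = toy.groupby("hotel")["is_repeated_guest"].mean() * 100

print(mean_pct)
```

One difference worth knowing: if a hotel had no repeated guests at all, the ratio approach would give `NaN` for it (the hotel is missing from the numerator), while the mean-based version gives 0.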

#### Chart - 6

```python
# Chart - 6 visualization code
# Ten agents with the most bookings (change 10 to the desired number of top agents)
agent_counts = df_dataset['agent'].value_counts().nlargest(10)

# Plotting
plt.figure(figsize=(12, 6))
agent_counts.plot(kind='bar', color='purple')
plt.xlabel('Agent')
plt.ylabel('Number of Bookings')
plt.title('Top 10 Agents by Number of Bookings')
plt.xticks(rotation=90)  # Rotate x-axis labels for better readability
plt.tight_layout()  # Adjust layout to prevent overlapping labels
plt.show()
```

##### 1. Why did you pick the specific chart?

Answer - A bar plot is suitable for visualizing categorical data such as agents and their booking counts.

##### 2. What is/are the insight(s) found from the chart?

Answer - The chart identifies the agents responsible for the highest number of bookings. If the 'No agent' category (created when filling missing values) appears among the bars, it groups bookings made without any agent and is not an agent itself.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer - Identifying top-performing agents allows hotels to nurture these relationships and potentially negotiate better terms or incentives, leading to increased bookings and revenue.

#### Chart - 7

```python
# Chart - 7 visualization code
# Visualization by bar chart: share of bookings per hotel type
grouped_by_hotel = df_dataset.groupby('hotel')
# Divide by the total number of bookings in df_dataset to get percentages
d1 = pd.DataFrame((grouped_by_hotel.size()/df_dataset.shape[0])*100).reset_index().rename(columns = {0:'Booking %'})
plt.figure(figsize = (8,5))
sns.barplot(x = d1['hotel'], y = d1['Booking %'])
plt.show()
```

##### 1. Why did you pick the specific chart?

Answer - To see the percentage of bookings for each hotel type (City Hotel and Resort Hotel).

##### 2. What is/are the insight(s) found from the chart?

Answer - A larger share of bookings is made for the City Hotel than for the Resort Hotel, i.e. more guests prefer to stay in the city hotel.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer - The insight that more guests prefer to stay in city hotels can have a positive business impact. Understanding customer preferences allows hotels to tailor their offerings and services to the needs and expectations of their target market. By catering to customers who prefer urban accommodation, hotels can potentially attract more guests, improve occupancy rates, and increase revenue.
However, focusing solely on city hotels and neglecting the demand for resort hotels could lead to negative growth.
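A quick way to compute the booking share per hotel is `value_counts(normalize=True)`, which divides each count by the total number of rows. The resulting shares always add up to 100%, a useful sanity check that the denominator is the whole dataset. A toy example with invented rows:

```python
import pandas as pd

# Invented rows: 3 City Hotel bookings and 1 Resort Hotel booking
toy = pd.DataFrame({"hotel": ["City Hotel", "City Hotel", "City Hotel", "Resort Hotel"]})

# normalize=True divides each count by the total number of rows (here 4)
booking_share = toy["hotel"].value_counts(normalize=True).mul(100).round(1)
print(booking_share)
```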

#### Chart - 8

```python
# Chart - 8 visualization code
# Convert 'reservation_status_date' column to datetime format (no-op if already converted during wrangling)
df_dataset['reservation_status_date'] = pd.to_datetime(df_dataset['reservation_status_date'])

# Sort DataFrame by 'reservation_status_date'
df_sorted = df_dataset.sort_values(by='reservation_status_date')

# Group by 'reservation_status_date' and 'reservation_status', and count the number of occurrences
status_counts = df_sorted.groupby(['reservation_status_date', 'reservation_status']).size().unstack(fill_value=0)

# Plotting
plt.figure(figsize=(12, 6))

# Plot each reservation status separately
for status in status_counts.columns:
    plt.plot(status_counts.index, status_counts[status], marker='o', label=status, linewidth=2)

plt.xlabel('Date')
plt.ylabel('Number of Bookings')
plt.title('Trend of Reservation Status Over Time')
plt.legend(title='Reservation Status')
plt.grid(True)
plt.tight_layout()
plt.show()
```

##### 1. Why did you pick the specific chart?

Answer - A line chart is an appropriate choice for visualizing trends over time.


##### 2. What is/are the insight(s) found from the chart?

Answer - The number of canceled bookings is high between 2015 and 2016.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer - By analyzing the trend of reservation status over time, hotels can gain insights into booking patterns, such as peak booking periods, seasonal variations, and overall booking trends. Understanding these patterns allows hotels to better allocate resources, adjust staffing levels, and optimize inventory management to meet fluctuating demand effectively.

A persistent upward trend in the number of canceled bookings over time may indicate issues with the booking process, dissatisfaction with hotel services, or external factors affecting travel plans.
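Daily counts of reservation statuses are noisy. One option (an alternative aggregation, not the method used in the chart above) is to bucket the dates by calendar month with `pd.Grouper` before unstacking, which gives one point per month per status. A sketch on invented dates:

```python
import pandas as pd

# Invented reservations spread over two months
toy = pd.DataFrame({
    "reservation_status_date": pd.to_datetime(
        ["2015-07-01", "2015-07-15", "2015-08-03", "2015-08-20", "2015-08-21"]
    ),
    "reservation_status": ["Canceled", "Check-Out", "Canceled", "Canceled", "Check-Out"],
})

# pd.Grouper with freq="MS" buckets dates by calendar month (labelled by month start)
monthly = (
    toy.groupby([pd.Grouper(key="reservation_status_date", freq="MS"), "reservation_status"])
    .size()
    .unstack(fill_value=0)
)
print(monthly)
```

The resulting `monthly` frame can be plotted with the same loop over `status_counts.columns` used above.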

#### Chart - 9

```python
# Chart - 9 visualization code
# Median lead time per distribution channel
group_by_dc = df_dataset.groupby('distribution_channel')
chart = pd.DataFrame(round(group_by_dc['lead_time'].median(),2)).reset_index().rename(columns = {'lead_time': 'median_lead_time'})
plt.figure(figsize = (7,5))
sns.barplot(x = chart['distribution_channel'], y = chart['median_lead_time'])
plt.show()
```


##### 1. Why did you pick the specific chart?

Answer - To see which distribution channel is used most for booking hotels early, i.e. which channel has the longest median lead time.

##### 2. What is/are the insight(s) found from the chart?

Answer - TA/TO is mostly used for planning hotel visits ahead of time, whereas other channels are preferred for last-minute visits.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer - Knowing that most people use travel agents or tour operators (TA/TO) to plan hotel visits in advance can help hotels tailor their marketing strategies and partnerships to reach potential guests during the planning phase. This insight could lead to positive impacts such as increased bookings and revenue. However, relying solely on TA/TO for bookings may limit the hotel's visibility to spontaneous travelers who prefer other channels. If hotels don't adapt to reach these customers through alternative channels, they may miss out on potential revenue, leading to negative growth.

#### Chart - 10

```python
# Chart - 10 visualization code
# Grouping by hotel type and calculating the sum of adults, children, and babies
guest_counts = df_dataset.groupby('hotel')[['adults', 'children', 'babies']].sum()

# Plotting
guest_counts.plot(kind='bar', stacked=True, figsize=(10, 6))
plt.xlabel('Hotel Type')
plt.ylabel('Number of Guests')
plt.title('Distribution of Guests by Hotel Type')
plt.xticks(rotation=0)
plt.legend(title='Guest Type')
plt.tight_layout()
plt.show()
```

##### 1. Why did you pick the specific chart?

Answer - I chose a stacked bar chart to visualize the distribution of guests (adults, children, babies) by hotel type because it allows a clear comparison of both the total number of guests and its composition within each hotel type.

##### 2. What is/are the insight(s) found from the chart?

Answer - The chart clearly shows that adults account for far more stays in both hotel types than children and babies.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer - Understanding the demographic profile of guests in different hotel types allows hotels to tailor their marketing strategies and promotions to target specific customer segments effectively.

If a hotel relies heavily on one demographic group (e.g., families with children) and neglects other potential guest segments, it may limit its market reach and growth opportunities.

#### Chart - 11

```python
# Chart - 11 visualization code
# Grouping by arrival date month and calculating the count of bookings
monthly_bookings = df_dataset.groupby('arrival_date_month').size().reset_index(name='bookings_count')

# Sorting the data by month order
months_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
monthly_bookings['arrival_date_month'] = pd.Categorical(monthly_bookings['arrival_date_month'], categories=months_order, ordered=True)
monthly_bookings = monthly_bookings.sort_values('arrival_date_month')

# Plotting
plt.figure(figsize=(10, 6))
sns.barplot(data=monthly_bookings, x='arrival_date_month', y='bookings_count', palette='viridis')
plt.xlabel('Month')
plt.ylabel('Number of Bookings')
plt.title('Number of Bookings per Month')
plt.xticks(rotation=45)
plt.show()
```

##### 1. Why did you pick the specific chart?

Answer - A bar chart is suitable for comparing the number of bookings across different months.

##### 2. What is/are the insight(s) found from the chart?

Answer - The chart clearly shows that the largest number of guests stay in hotels in August.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer - Understanding booking patterns across months provides insight into seasonal demand fluctuations. This information can help hotels anticipate peak seasons and adjust their marketing strategies, pricing, and staffing levels accordingly.

If there are significant disparities in booking volumes across different months, it may indicate seasonal vulnerabilities or challenges. For example, if a hotel experiences a sharp decline in bookings during the off-peak season, it could lead to revenue loss and decreased profitability.

#### Chart - 12

```python
# Chart - 12 visualization code
# Average ADR (average daily rate) per distribution channel and hotel type
group_by_dc_hotel = df_dataset.groupby(['distribution_channel', 'hotel'])
d5 = group_by_dc_hotel['adr'].mean().round(2).reset_index().rename(columns = {'adr': 'avg_adr'})
plt.figure(figsize = (7,5))
sns.barplot(x = d5['distribution_channel'], y = d5['avg_adr'], hue = d5['hotel'])
plt.ylim(40,140)
plt.show()
```


##### 1. Why did you pick the specific chart?

Answer - To see which distribution channel brings the higher-revenue deals (higher average ADR) for each hotel.

##### 2. What is/are the insight(s) found from the chart?

Answer - The GDS channel brings higher-revenue deals for the City Hotel, whereas most of its bookings come via TA/TO. The City Hotel can increase its outreach on GDS channels to win more high-revenue deals.

The Resort Hotel gets its higher-revenue deals through the Direct and TA/TO channels. The Resort Hotel needs to increase its outreach on the GDS channel to increase revenue.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer - The insights suggest that both the City Hotel and the Resort Hotel can optimize their revenue by leveraging different booking channels. Focusing on GDS channels for the City Hotel and increasing outreach on GDS for the Resort Hotel can diversify their revenue streams and potentially increase profitability. However, over-reliance on a single channel may lead to vulnerability if market conditions change or if there are disruptions in that channel. Therefore, while these insights can create positive business impact in the short term, hotels should maintain a balanced approach to distribution channels to mitigate the risk of negative growth in the long term.
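The comparison above mixes two quantities: average ADR per booking and booking volume per channel. A `pivot_table` can show both side by side for each channel and hotel. This is a sketch on invented values, assuming the column names `distribution_channel`, `hotel` and `adr` used in this notebook:

```python
import pandas as pd

# Invented bookings (not real rates)
toy = pd.DataFrame({
    "distribution_channel": ["TA/TO", "TA/TO", "GDS", "Direct", "TA/TO"],
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "adr": [100.0, 80.0, 130.0, 90.0, 110.0],
})

# Average ADR per channel (rows) and hotel (columns); missing combinations stay NaN
avg_adr = toy.pivot_table(index="distribution_channel", columns="hotel",
                          values="adr", aggfunc="mean").round(2)

# Number of bookings per channel and hotel; missing combinations become 0
counts = toy.pivot_table(index="distribution_channel", columns="hotel",
                         values="adr", aggfunc="count", fill_value=0)

print(avg_adr)
print(counts)
```

Reading the two tables together shows whether a channel with a high average ADR also brings a meaningful number of bookings.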

#### Chart - 13

```python
# Chart - 13 visualization code
# Median lead time per hotel type
grouped_by_hotel = df_dataset.groupby('hotel')
data = grouped_by_hotel['lead_time'].median().reset_index().rename(columns = {'lead_time':'median_lead_time'})
plt.figure(figsize = (8,5))
sns.barplot(x = data['hotel'], y = data['median_lead_time'])
plt.show()
```

##### 1. Why did you pick the specific chart?

Answer - To compare the median lead time of each hotel type.

##### 2. What is/are the insight(s) found from the chart?

Answer - The City Hotel has a slightly higher median lead time. The median lead time is also high for both hotel types, which means customers generally plan their hotel visits well in advance.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer - Knowing that people book hotels early can help hotels plan better and advertise more effectively. But if hotels rely too much on early bookings, they might have problems if too many people book at once or if they can't change prices easily. It's important for hotels to find a balance between early bookings and flexibility to maximize revenue.

#### Chart - 14 - Correlation Heatmap

```python
# Correlation Heatmap visualization code
# Selecting relevant numerical columns
numerical_columns = ['stays_in_weekend_nights', 'stays_in_week_nights', 'adults', 'children']

# Creating a correlation matrix
correlation_matrix = df_dataset[numerical_columns].corr()

# Plotting the heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap of Numerical Variables')
plt.show()
```

##### 1. Why did you pick the specific chart?

Answer - The correlation heatmap was chosen because it gives a visual representation of the correlation coefficients between numerical variables in the dataset.

##### 2. What is/are the insight(s) found from the chart?

Answer - The heatmap shows how strongly the selected numerical variables (weekend nights, week nights, adults and children) are related. Each cell holds the correlation coefficient between two variables, ranging from -1 (perfect negative linear relationship) through 0 (no linear relationship) to 1 (perfect positive linear relationship).
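Rather than reading every cell of the heatmap, the variable pairs can be listed once each and sorted by absolute correlation. A minimal sketch; `strongest_pairs` is a hypothetical helper, and the toy columns are invented:

```python
import pandas as pd

def strongest_pairs(corr: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: list each variable pair once, strongest correlation first."""
    cols = corr.columns
    # Upper triangle only: j > i skips the diagonal (always 1) and mirrored duplicates
    pairs = pd.Series({
        (cols[i], cols[j]): corr.iloc[i, j]
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    })
    # Sort by absolute value so strong negative correlations rank high too
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)

# Invented toy columns: b is exactly 2 * a, c is only weakly related
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 1, 3, 2],
})
pairs = strongest_pairs(toy.corr())
print(pairs)
```

On the notebook's data this would be `strongest_pairs(correlation_matrix)`.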

#### Chart - 15 - Pair Plot

```python
# Pair Plot visualization code
# Selecting relevant columns
relevant_columns = ['total_of_special_requests', 'booking_changes', 'arrival_date_week_number']

# Creating a copy of the DataFrame and filtering it to include only the relevant columns
df_filtered = df_dataset[relevant_columns].copy()

# Plotting the pair plot
sns.pairplot(df_filtered, diag_kind='kde', markers='.')
plt.suptitle('Pair Plot of Total Special Requests, Booking Changes, and Arrival Date Week Number', y=1.02)
plt.show()
```

##### 1. Why did you pick the specific chart?

Answer - The pair plot was chosen because it shows the pairwise relationships between several variables at once, together with each variable's distribution on the diagonal.


##### 2. What is/are the insight(s) found from the chart?

Answer - The pair plot shows the pairwise relationships among total special requests, booking changes and arrival date week number.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

**Diversify Customer Base**: By targeting customers from various countries, the client can reduce dependency on specific markets and mitigate the risk of revenue fluctuations due to factors impacting those markets.

**Optimize Pricing Strategy:** Adjusting pricing based on the correlation between ADR and total guests allows the client to maximize revenue while ensuring competitiveness, especially in the face of competition from City hotels.

**Enhance Guest Experience:** Improving service efficiency and reducing waiting times can lead to higher guest satisfaction, positive reviews, and repeat business, ultimately driving revenue growth.

**Implement Effective Marketing Strategies:** Tailoring marketing efforts based on customer booking behaviors and exploring alternative channels helps the client reach a wider audience and capitalize on revenue opportunities.

**Improve Loyalty Programs:** Investing in loyalty programs and service quality can foster customer loyalty, leading to repeat visits and positive word-of-mouth, which are crucial for sustainable revenue growth.

**Diversify Booking Channels:** Maintaining a balanced approach to booking channels reduces dependency on any single channel and minimizes the risk of revenue loss due to disruptions or changes in the market landscape.

**Address High Cancellation Rates:** Streamlining the booking process and implementing policies to reduce waiting times can lower cancellation rates and enhance customer satisfaction, ultimately leading to increased revenue.


By implementing these recommendations, the client can leverage the insights gained from the analysis to optimize operations, improve customer satisfaction, and drive revenue growth, thereby achieving their business objectives effectively.







# **Conclusion**


The analysis provided valuable insights that can guide strategic decision-making, optimize revenue generation, and enhance operational efficiency within the hotel industry. By implementing the recommendations outlined in the analysis, hotels can position themselves for sustainable growth and success in an increasingly competitive market landscape.







### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***