# **Project Name**   - Hotel Booking Analysis



##### **Project Type**    - EDA on Hotel Bookings Dataset
##### **Contribution**    - Khalid Karim
##### **Team Member 1 -**
##### **Team Member 2 -**
##### **Team Member 3 -**
##### **Team Member 4 -**

# **Project Summary -**


**Project Title**: Optimizing Hotel Operations through Data-Driven Insights

**Objective**:
The primary aim of this project is to analyze hotel booking data to identify trends, optimize pricing strategies, enhance customer satisfaction, and improve operational efficiency.

**Scope**:
This analysis focuses on data collected from 2015 to 2017, covering guest demographics, booking channels, occupancy rates, revenue metrics, and customer feedback.

***Key Components***:

**Data Collection**:

Importing data from the dataset, including reservation details, guest profiles, room inventory, and historical occupancy data.

**Data Cleaning and Preparation**:

Ensure data accuracy and completeness by removing duplicates, filling in missing values, and standardizing formats.

**Analysis Techniques**:

* ***Descriptive Analysis***: *Summarize historical booking patterns, including occupancy rates and average daily rates.*
* ***Predictive Analysis***: *Use time series forecasting to predict future occupancy and revenue trends from historical data.*
* ***Segmentation Analysis***: *Identify distinct customer segments based on booking behavior and preferences.*

**Key Metrics to Evaluate**:

* ***Occupancy Rate***: *Calculate and analyze trends over time.*
* ***Revenue per Available Room (RevPAR)***: *Assess financial performance.*
* ***Cancellation Rates***: *Examine patterns to mitigate future cancellations.*
* ***Length of Stay***: *Analyze typical stays to develop targeted marketing strategies.*
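A minimal sketch of how some of these metrics could be computed with pandas, using the dataset's `hotel`, `is_canceled`, `adr`, `stays_in_weekend_nights` and `stays_in_week_nights` columns. The rows are made up. The dataset has no room-inventory column, so the `revpar` helper takes a hypothetical `available_room_nights` figure that would have to come from outside the data.

```python
import pandas as pd

# Toy bookings table using column names from the hotel bookings dataset (values are made up)
bookings = pd.DataFrame({
    'hotel': ['Resort Hotel', 'Resort Hotel', 'City Hotel', 'City Hotel'],
    'is_canceled': [0, 1, 0, 0],
    'stays_in_weekend_nights': [2, 1, 0, 1],
    'stays_in_week_nights': [5, 2, 2, 3],
    'adr': [100.0, 80.0, 120.0, 90.0],
})

# Length of stay = weekend nights + week nights
bookings['total_nights'] = bookings['stays_in_weekend_nights'] + bookings['stays_in_week_nights']

# Cancellation rate: the mean of a 0/1 flag is the share of canceled bookings
cancellation_rate = bookings.groupby('hotel')['is_canceled'].mean()

# Average length of stay, counting only bookings that were not canceled
kept = bookings[bookings['is_canceled'] == 0]
avg_stay = kept.groupby('hotel')['total_nights'].mean()

# Realised room revenue from kept bookings: ADR x nights
room_revenue = (kept['adr'] * kept['total_nights']).sum()


def revpar(room_revenue, available_room_nights):
    """RevPAR = room revenue / available room-nights.
    available_room_nights is a hypothetical input: the dataset has no room inventory."""
    return room_revenue / available_room_nights


print(cancellation_rate)
print(avg_stay)
print(revpar(room_revenue, 100))
```

Occupancy rate has the same limitation as RevPAR: it needs the number of available rooms, which would have to be supplied separately.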

***Visualization:***

*Create charts with data visualization libraries (e.g., Matplotlib, Seaborn) to present insights in an accessible format for stakeholders.*

***Recommendations:***

*Based on analysis, provide actionable recommendations to enhance pricing strategies, improve customer loyalty programs, and optimize marketing efforts.*

***Expected Outcomes:***

* *Improved understanding of booking trends and customer behavior.*
* *Enhanced pricing strategies leading to increased revenue.*
* *Targeted marketing campaigns that resonate with identified customer segments.*
* *Increased customer satisfaction through tailored services and improved operational practices.*

# **GitHub Link -**

Link - https://github.com/Khalid619-Goku/Exploratory-Data-Analysis-in-Hotel-Bookings/blob/main/README.md

# **Problem Statement**


**Write Problem Statement Here.**

#### **Define Your Business Objective?**

Answer Here.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
     [ Note: - Deployment Ready Code is defined as: the whole .ipynb notebook should be executable in one go without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ast
import seaborn as sns
import os

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
database = r'/content/drive/My Drive/Hotel Bookings.csv'
df = pd.read_csv(database)

### Dataset First View

In [None]:
# Dataset First Look
df

In [None]:
# First 5 rows of the dataframe
df.head()

In [None]:
# Last 5 rows of the DataFrame
df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df_rows, df_columns = df.shape
print(f'Number of rows: {df_rows}')
print(f'Number of columns: {df_columns}')

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_count = df.duplicated().sum()
print(f'The dataset has {duplicate_count} duplicate rows')

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
null_count = df.isnull().sum()
print(f'Null value count per column:\n{null_count}\n')
print('The total Missing/Null value count is ', null_count.sum())

In [None]:
# Visualizing the missing values
plt.figure(figsize=(12, 6))
sns.barplot(x=null_count.index, y=null_count.values)
plt.title('Missing Values per Column')
plt.xlabel('Columns')
plt.ylabel('Number of Missing Values')
plt.xticks(rotation=90)
plt.show()

### What did you know about your dataset?

This dataset has a total of 129,425 missing/null values. The columns `country`, `agent` and `company` account for 488, 16,340 and 112,593 of them respectively; the remaining 4 are in the `children` column.
Most of the columns are of type int64, fewer are object, and the fewest are float64.
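Raw null counts are easier to judge as a share of all rows. A minimal sketch on a small made-up frame with gaps in the same columns; on the real data the same two expressions can be run on `df`.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps in the columns mentioned above (values are made up)
sample = pd.DataFrame({
    'country': ['PRT', None, 'FRA', 'DEU'],
    'agent': [9.0, np.nan, np.nan, 240.0],
    'company': [np.nan, np.nan, np.nan, 40.0],
    'adr': [75.0, 98.0, 107.0, 103.0],
})

# Count and percentage of missing values per column, largest first
missing = pd.DataFrame({
    'missing_count': sample.isnull().sum(),
    'missing_pct': sample.isnull().mean() * 100,
}).sort_values('missing_count', ascending=False)

print(missing)
```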

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

After observing the description of the data, we found that the dataset has 119,390 entries. Lead time is about 104 days on average and the average daily rate (ADR) is about 101. The mean of `is_canceled` is about 0.37, so roughly 37% of bookings were canceled. `arrival_date_day_of_month` ranges from 1 to 31; `describe()` reports only this range, not how often guests check in on each day. A single booking stays up to 19 weekend nights.
The largest bookings list 55 adults and up to 10 children or babies; these are maximum head counts per booking, not percentages.
`is_repeated_guest` is a 0/1 flag (maximum 1), and the most previous cancellations recorded for one guest is 26.
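The `describe()` table mixes 0/1 flags and count columns, and its rows mean different things for each. A minimal sketch on made-up rows showing which row answers which question:

```python
import pandas as pd

# Made-up rows; is_canceled and is_repeated_guest are 0/1 flags, adults is a head count
toy = pd.DataFrame({
    'is_canceled': [1, 0, 0, 1, 0],
    'is_repeated_guest': [0, 0, 1, 0, 0],
    'adults': [2, 1, 2, 55, 2],
})

stats = toy.describe()

# For a 0/1 flag, the "mean" row is the share of rows where the flag is 1
cancel_share = stats.loc['mean', 'is_canceled']
repeat_share = stats.loc['mean', 'is_repeated_guest']

# For a count column, "max" is the largest single value, not a percentage
max_adults = stats.loc['max', 'adults']

print(f'{cancel_share:.0%} canceled, {repeat_share:.0%} repeated guests, up to {max_adults:.0f} adults')
```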

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_values = {col: df[col].unique() for col in df.columns}
for col, values in unique_values.items():
  # len(unique()) counts NaN as one distinct value
  print(f'Number of unique values in {col}: {len(values)}')
  # print(f'Unique values for {col} are {values}')  # uncomment to list the values themselves
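`len(unique())` and pandas' `nunique()` can disagree on columns with missing values, such as `country` here. A small sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

col = pd.Series(['PRT', 'FRA', np.nan, 'PRT', np.nan])

# unique() keeps NaN as one of the distinct values
with_nan = len(col.unique())
# nunique() drops NaN by default; dropna=False counts it
without_nan = col.nunique()
also_with_nan = col.nunique(dropna=False)

print(with_nan, without_nan, also_with_nan)
```

On the full DataFrame, `df.nunique()` returns the per-column counts in one call, excluding missing values.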

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
df.dropna(subset=['country','agent','company'],inplace=True)
df

In [None]:
df.isnull().sum()

### What all manipulations have you done and insights you found?

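As I stated earlier, I found null values in the columns `country`, `agent` and `company`, so I dropped every row with a missing value in any of them, since the missing values could bias the analysis. Note, however, that `company` alone is missing in 112,593 of the 119,390 rows, so this step keeps less than 6% of the bookings, and the charts below describe only that remaining subset.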
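The Project Summary lists filling in missing values as an alternative to dropping them. A minimal sketch comparing both options on a made-up frame; coding a missing `agent` or `company` as 0 (no agent or company) and a missing `country` as 'Unknown' is an assumption for illustration, not a documented convention of this dataset.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    'country': ['PRT', None, 'GBR', 'PRT'],
    'agent': [9.0, np.nan, 240.0, np.nan],
    'company': [np.nan, np.nan, 40.0, np.nan],
})

# Option 1 (used above): drop any row with a gap in these columns
dropped = raw.dropna(subset=['country', 'agent', 'company'])

# Option 2 (sketch): fill the gaps with assumed placeholder values so every booking is kept
filled = raw.fillna({'country': 'Unknown', 'agent': 0, 'company': 0})

print(len(raw), len(dropped), len(filled))
```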

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='hotel')
plt.title('Number of Bookings by Hotel Type')
plt.xlabel('Hotel Type')
plt.ylabel('Number of Bookings')
plt.show()


##### 1. Why did you pick the specific chart?

I wanted to see which hotel type receives more bookings. I chose a count (bar) plot because it compares the two categories directly.

##### 2. What is/are the insight(s) found from the chart?

The insight is that Resort Hotels generate more bookings than City Hotels. It is unclear whether they sustain this throughout the year, so I check that hypothesis in the next chart.
These values are from 2015-2017.
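To put a number on the difference the countplot shows, `value_counts` can return either the counts or the shares directly. A sketch on a made-up column; on the real data the call would be `df['hotel'].value_counts(normalize=True)`.

```python
import pandas as pd

hotels = pd.Series(['Resort Hotel', 'City Hotel', 'Resort Hotel', 'Resort Hotel'])

# Counts behind a countplot, and the same counts as shares of all bookings
counts = hotels.value_counts()
shares = hotels.value_counts(normalize=True)

print(counts)
print(shares.round(2))
```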

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Answer**- In this data, Resort Hotels receive far more bookings than City Hotels. Even if they cannot sustain this throughout the year, it suggests that City Hotels are relatively behind them.
So, it seems fair to say that Resort Hotels are currently booming in the hospitality sector.

#### Chart - 2

In [None]:
# Chart - 2 visualization code: number of bookings per month for each hotel type
# Convert the column values to datetime
df['reservation_status_date'] = pd.to_datetime(df['reservation_status_date'])
# Extract month and year from the column
# Note: reservation_status_date is the date of the booking's last status change
# (e.g. check-out or cancellation), not the arrival date
df['month'] = df['reservation_status_date'].dt.month
df['year'] = df['reservation_status_date'].dt.year

# Split the data by hotel type
resort_hotel = df[df['hotel'] == 'Resort Hotel']
city_hotel = df[df['hotel'] == 'City Hotel']

# discrete=True draws one bar per month value, aligned with the tick labels
plt.figure(figsize=(12, 8))
sns.histplot(data=resort_hotel, x='month', discrete=True, color='blue')
plt.title('Number of bookings for Resorts each month', bbox={'facecolor': '0.8', 'pad': 3})
plt.xlabel('Month')
plt.ylabel('Number of Bookings')
plt.xticks(ticks=range(1, 13), labels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.show()

# Same chart for the City hotels
plt.figure(figsize=(12, 8))
sns.histplot(data=city_hotel, x='month', discrete=True, color='green')
plt.title('Number of bookings for City Hotels each month', bbox={'facecolor': '0.8', 'pad': 3})
plt.xlabel('Month')
plt.ylabel('Number of Bookings')
plt.xticks(ticks=range(1, 13), labels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.show()

In [None]:
plt.figure(figsize=(12,8))
sns.countplot(data=resort_hotel,x='year',hue='hotel')
plt.title('Number of bookings for Resorts each year',bbox={'facecolor':'0.8','pad':3})
plt.xlabel('Year')
plt.ylabel('Number of Bookings')
plt.show()

# Same chart for the City hotels
plt.figure(figsize=(12,8))
sns.countplot(data=city_hotel,x='year',hue='hotel')
plt.title('Number of bookings for City Hotels each year',bbox={'facecolor':'0.8','pad':3})
plt.xlabel('Year')
plt.ylabel('Number of Bookings')
plt.show()

##### 1. Why did you pick the specific chart?

As stated, the previous bar plot showed which hotel type gets more bookings: from 2015-2017, significantly more bookings went to Resort Hotels than to City Hotels. This raised the question of whether people prefer Resorts over City Hotels throughout the year. To test this hypothesis, I created a histogram of monthly bookings for each hotel type, plus a yearly count plot, to see how bookings are spread over time.

##### 2. What is/are the insight(s) found from the chart?

The two charts show that City Hotels get bookings throughout the year, while Resort Hotels get most of theirs in November, when they exceed anything City Hotels reach in a single month. Approximate values read from the charts:

| Month | City Hotels | Resort Hotels |
|---|---|---|
| Jan | 60% | almost 4% |
| Feb | 10% | almost 4% |
| Mar | 10% | almost 4% |
| Apr | 0% | almost 1% |
| May | 50% | almost 10% |
| Jun | 50% | almost 2% |
| Jul | 20% | almost 4% |
| Aug | almost 100% | almost 3% |
| Sep | 50% | almost 3% |
| Oct | almost 100% | almost 4% |
| Nov | 10% | ***almost 120%*** |
| Dec | 10% | almost 4% |
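Monthly shares can be computed instead of read off the bars. `pd.crosstab` with `normalize='columns'` gives each month's share of a hotel type's bookings, so each column sums to 1. A sketch on made-up rows, using the `month` and `hotel` columns that the chart code above works with.

```python
import pandas as pd

toy = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'City Hotel', 'City Hotel',
              'Resort Hotel', 'Resort Hotel', 'Resort Hotel'],
    'month': [1, 8, 8, 10, 11, 11, 5],
})

# Rows: month, columns: hotel type; each cell is that month's share of the hotel's bookings
monthly_share = pd.crosstab(toy['month'], toy['hotel'], normalize='columns')

print((monthly_share * 100).round(1))
```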

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code: kept vs canceled bookings per month for each hotel type
plt.figure(figsize=(10, 6))
sns.histplot(data=resort_hotel, x='month', discrete=True, hue='is_canceled', palette='Paired_r')
plt.title('Number of bookings cancelled for Resort hotel each month', bbox={'facecolor': '0.8', 'pad': 3})
plt.xlabel('Month')
plt.ylabel('Number of Bookings')
plt.xticks(ticks=range(1, 13), labels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.show()

# Same chart for the City hotels
plt.figure(figsize=(10, 6))
sns.histplot(data=city_hotel, x='month', discrete=True, hue='is_canceled', palette='CMRmap')
plt.title('Number of bookings cancelled for City hotel each month', bbox={'facecolor': '0.8', 'pad': 3})
plt.xlabel('Month')
plt.ylabel('Number of Bookings')
plt.xticks(ticks=range(1, 13), labels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.show()

##### 1. Why did you pick the specific chart?

A histogram with `is_canceled` as the hue shows the contrast between kept and canceled bookings for each hotel type.

##### 2. What is/are the insight(s) found from the chart?
I wanted to see the contrast between bookings and cancellations for both hotel types.
For Resort Hotels, cancellations are highest in November, i.e. 5%.
In April, cancellations are around 2%, but April otherwise has no booking records, so the bar suggests that in April all the customers cancelled their bookings.
For City Hotels, the cancellations are not that significant.
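A cancellation rate per month separates how often bookings are canceled from how many bookings there are, which matters for a month like April where every recorded status is a cancellation. A sketch on made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    'hotel': ['Resort Hotel'] * 4 + ['City Hotel'] * 3,
    'month': [11, 11, 11, 4, 8, 8, 10],
    'is_canceled': [1, 0, 0, 1, 0, 1, 0],
})

# Share of canceled bookings per hotel type and month (mean of the 0/1 flag);
# months with no bookings for a hotel type show up as NaN
cancel_rate = toy.groupby(['hotel', 'month'])['is_canceled'].mean().unstack('hotel')

print(cancel_rate)
```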

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


**Overall**, the above charts suggest that people prefer RESORT Hotels in the holiday season, i.e. from late October to late January. The cancellation rates are also high in these months, especially between November and December, as shown in the figure.

As for CITY hotels, their business remains fairly constant throughout the year, although a rise can be seen in late January, August and October. This suggests that travellers prefer city hotels and that these are the main travelling months. Many guests likely stay in city hotels during work-related trips.

In summary, Resorts do more business during the holiday season and earn substantial revenue in that period.
City Hotels do business for most of the year except the peak holiday season, i.e. November-December, so their revenue still lags behind Resorts.

**SUGGESTION** Some Resorts can partner up with City Hotels and help each other to upgrade.

#### Chart - 4

In [None]:
# Group countries with fewer than `min_count` bookings into a single 'Other' slice
def group_small_counts(distribution, min_count=3):
  filtered = distribution[distribution >= min_count].copy()
  other = distribution[distribution < min_count].sum()
  if other > 0:
    filtered['Other'] = other
  return filtered

# Bookings per country for each hotel type
resort_distribution = group_small_counts(resort_hotel['country'].value_counts())
city_distribution = group_small_counts(city_hotel['country'].value_counts())


# Draw a pie chart of a country distribution with a legend beside it
def pie_chart(distribution, title):
  plt.figure(figsize=(10, 12))
  plt.title(title)
  plt.pie(distribution, labels=distribution.index, autopct='%1.1f%%', startangle=90)
  plt.axis('equal')
  plt.legend(loc='upper left', bbox_to_anchor=(1, 0, 0.5, 1))
  plt.show()

pie_chart(resort_distribution, 'Resort Hotel')
pie_chart(city_distribution, 'City Hotel')

# Total adults, babies and children per country
grouped_data = df.groupby('country')[['adults', 'babies', 'children']].sum().reset_index()

# Display the grouped data
print(grouped_data)

##### 1. Why did you pick the specific chart?

A pie chart is very useful for showing each category's share of the total in a visually appealing format.

##### 2. What is/are the insight(s) found from the chart?

I plotted the country frequency data in pie charts to see which nationalities chose which type of hotel.

**Resort Hotel**

Top 3 countries are - 38.25% Portugal, 17.6% Australia, 11.8% France

**City Hotel**

Top 3 countries are - 36.2% Portugal, 21.3% Germany, 10.6% France.



**None of the countries had babies or children checked in except Portugal, and it appears that most of these families checked in at Resort Hotels.**
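The grouped table above is split by country only, so it cannot show which hotel type the families chose. Adding `hotel` to the grouping is one way to check that. A sketch on made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    'country': ['PRT', 'PRT', 'PRT', 'FRA'],
    'hotel': ['Resort Hotel', 'Resort Hotel', 'City Hotel', 'City Hotel'],
    'children': [2.0, 1.0, 1.0, 0.0],
    'babies': [0, 1, 0, 0],
})

# Total children and babies per country and hotel type
kids = toy.groupby(['country', 'hotel'])[['children', 'babies']].sum()
kids['kids_total'] = kids['children'] + kids['babies']

print(kids)
```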





##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Guests from Portugal and France appear to spend more on holiday travel, choosing comfortable resorts to spend quality time with their families. Therefore, to get more engagement, both resort and city hotels should emphasise a relaxing family environment with different activities, good service, swimming pools, games, etc.

#### Chart - 5

In [None]:
# Helper functions: count the values of a column and draw the counts as a bar chart

def get_count_from_column_bar(df, column_label):
  # Count each distinct value and return the result as a two-column DataFrame
  df_grpd = df[column_label].value_counts()
  df_grpd = pd.DataFrame({'index': df_grpd.index, 'count': df_grpd.values})
  return df_grpd


def plot_bar_chart_from_column(df, column_label, t1):
  df_grpd = get_count_from_column_bar(df, column_label)
  fig, ax = plt.subplots(figsize=(14, 6))
  c = ['g', 'r', 'b', 'c', 'y']  # bar colours, cycled if there are more bars
  ax.bar(df_grpd['index'], df_grpd['count'], width=0.4, align='edge', edgecolor='black', linewidth=4, color=c, linestyle=':', alpha=0.5)
  plt.title(t1, bbox={'facecolor': '0.8', 'pad': 3})
  plt.xlabel(column_label)
  plt.ylabel('Count')
  plt.xticks(rotation=15)  # rotate the x-axis labels so they do not overlap
  plt.show()

plot_bar_chart_from_column(df, 'distribution_channel', 'Distribution Channel Volume')

##### 1. Why did you pick the specific chart?

A bar chart to visualise the volume of each hotel distribution channel.

##### 2. What is/are the insight(s) found from the chart?

Hotels get the most customers (120%) from the Corporate channel. Both online and offline TA/TO also provide a good number (60%) of customers to the hotels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

In the digital age, getting more engagement depends on strong online marketing and a strong networking/referral chain.

#### Chart - 6

In [None]:
# Average ADR per hotel type
average_adr = df.groupby('hotel')['adr'].mean()
fig, ax = plt.subplots(figsize=(8, 5))
average_adr.plot(kind='barh', color=('b', 'g'), ax=ax)
plt.xlabel("Average ADR", fontdict={'fontsize': 12, 'fontweight': 5, 'color': 'Brown'})
plt.ylabel("Hotel Name", fontdict={'fontsize': 12, 'fontweight': 5, 'color': 'Brown'})
plt.title("Average ADR of Each Hotel Type", fontdict={'fontsize': 12, 'fontweight': 5, 'color': 'Black'}, bbox={'facecolor': '0.8', 'pad': 3})
plt.show()


##### 1. Why did you pick the specific chart?

A horizontal bar plot of the average ADR per hotel type clearly shows the contrast between them.

##### 2. What is/are the insight(s) found from the chart?

City Hotels have an average ADR of about 120, whereas Resort Hotels average about 55.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It supports the hypothesis that people, mostly solo travellers, tend to choose City Hotels throughout the year for work, travel and short stays because they are in the heart of the city, even though their average daily rate is higher.

But when it comes to vacations and holidays, Resort Hotels are preferred over City Hotels, as people like a calm, relaxing environment away from the city to spend quality time with their families.

#### Chart - 7

In [None]:
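# Chart - 7 visualization code: total revenue per hotel type
# Revenue per booking = ADR x total nights stayed (same definition as in Chart 12);
# computed here because 'revenue' does not exist yet at this point in the notebook
df['total_stays_in_nights'] = df['stays_in_weekend_nights'] + df['stays_in_week_nights']
df['revenue'] = df['adr'] * df['total_stays_in_nights']

plt.figure(figsize=(8, 5))
hotel_wise_revenue = df.groupby('hotel')['revenue'].sum()
ax = hotel_wise_revenue.plot(kind='bar', color=('gold', 'blue'))
plt.xlabel("Hotel", fontdict={'fontsize': 12, 'fontweight': 5, 'color': 'Brown'})
plt.ylabel("Total Revenue", fontdict={'fontsize': 12, 'fontweight': 5, 'color': 'Brown'})
plt.title("Total Revenue", fontdict={'fontsize': 12, 'fontweight': 5, 'color': 'Green'}, bbox={'facecolor': '0.8', 'pad': 3})
plt.show()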

##### 1. Why did you pick the specific chart?

A bar plot to compare the total revenue of the different hotel types.

##### 2. What is/are the insight(s) found from the chart?

Even though the ADR of City Hotels is much higher than that of Resort Hotels, the total revenue generated by Resort Hotels surpasses City Hotels by a significant amount.
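Total revenue is the sum of ADR × nights over all bookings, so a hotel type with a lower average ADR can still earn more in total if it has more bookings or longer stays. A small made-up example of that decomposition:

```python
import pandas as pd

toy = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'Resort Hotel', 'Resort Hotel', 'Resort Hotel'],
    'adr': [120.0, 130.0, 60.0, 55.0, 50.0],
    'total_stays_in_nights': [1, 2, 7, 5, 6],
})
toy['revenue'] = toy['adr'] * toy['total_stays_in_nights']

# Bookings, average ADR, average stay and total revenue per hotel type
summary = toy.groupby('hotel').agg(
    bookings=('revenue', 'size'),
    mean_adr=('adr', 'mean'),
    mean_nights=('total_stays_in_nights', 'mean'),
    total_revenue=('revenue', 'sum'),
)
print(summary)
```

Here the City Hotel rows have the higher average ADR, yet the Resort Hotel rows earn more in total because of longer stays and more bookings.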

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

People tend to spend more with their families during the holiday season, even when prices rise.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
plot_bar_chart_from_column(df, 'assigned_room_type', 'Assignment of rooms by type')

##### 1. Why did you pick the specific chart?

A bar plot is very useful to visualise the contrast between categories.

##### 2. What is/are the insight(s) found from the chart?

The top 3 assigned room types are A, D and E.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code
def total_nights(df, hotel_type):
    df['total_stays_in_nights'] = df['stays_in_week_nights'] + df['stays_in_weekend_nights']
    filtered_data = df[(df['hotel'] == hotel_type) & (df['adr'] < 1000)]

    plt.figure(figsize=(12, 6))
    sns.lineplot(y='total_stays_in_nights', x='adr', data=filtered_data)
    plt.title(f'ADR vs Total Stay in Nights ({hotel_type})')
    plt.xlabel('Average Daily Rate (ADR)')
    plt.ylabel('Total Stay in Nights')
    plt.show()

total_nights(df, 'Resort Hotel')
total_nights(df, 'City Hotel')

##### 1. Why did you pick the specific chart?

A line plot shows the trend of total stay in nights against ADR.

##### 2. What is/are the insight(s) found from the chart?

For Resort Hotels, an ADR of 0-50 corresponds to a total stay of about 15 nights, and the stay length decreases gradually afterwards.

For City Hotels, at an ADR of 0-50 the total stay slopes down from 2 nights, but at an ADR of 75-110 it spikes to 7 nights and 5 nights.
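A line plot over every distinct ADR value is noisy; binning ADR into ranges like the ones quoted above makes the comparison explicit. A sketch on made-up rows; the bin edges are an illustrative choice.

```python
import pandas as pd

toy = pd.DataFrame({
    'adr': [30.0, 45.0, 80.0, 100.0, 150.0],
    'total_stays_in_nights': [10, 6, 7, 5, 2],
})

# Group ADR into ranges (illustrative edges) and average the stay length per range
bins = [0, 50, 75, 110, 1000]
toy['adr_range'] = pd.cut(toy['adr'], bins=bins)
# observed=False keeps empty ranges (shown as NaN) so every bin appears
stay_by_range = toy.groupby('adr_range', observed=False)['total_stays_in_nights'].mean()

print(stay_by_range)
```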

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart 10 visualisation code: number of bookings per market segment, largest first
market_segment_counts = df['market_segment'].value_counts()
plt.figure(figsize=(15, 6))
market_segment_counts.plot(kind='bar', color=['r', 'g', 'y', 'b', 'pink', 'black', 'brown'], fontsize=20)
plt.title('Bookings by Market Segment')
plt.ylabel('Number of Bookings')
plt.show()

##### 1. Why did you pick the specific chart?

Highlights the differences between the categories of market segment.

##### 2. What is/are the insight(s) found from the chart?

Corporate and Online TA top the market segments.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code
repeated_guest_counts =df.groupby(['hotel','is_repeated_guest']).size().unstack(fill_value=0)
for hotel_type in repeated_guest_counts.index:
    counts = repeated_guest_counts.loc[hotel_type]
    plt.figure(figsize=(10, 6))
    plt.pie(counts, labels=['Not Repeated', 'Repeated'], autopct='%1.1f%%', startangle=90,explode=(0.1, 0)  )
    plt.title(f'Repeated Guests in {hotel_type}')
    plt.legend()
    plt.axis('equal')
    plt.show()

##### 1. Why did you pick the specific chart?

A pie chart shows the percentage of repeated guests at each hotel type.

##### 2. What is/are the insight(s) found from the chart?

Repeated guests make up 5.9% of guests at Resort Hotels and 10.6% at City Hotels.
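The pie percentages can be cross-checked without a chart: the mean of `is_repeated_guest` per hotel is the share of repeated guests. A sketch on made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    'hotel': ['City Hotel'] * 5 + ['Resort Hotel'] * 4,
    'is_repeated_guest': [1, 0, 0, 0, 0, 0, 0, 0, 1],
})

# Mean of the 0/1 flag = share of repeated guests per hotel type
repeat_share = toy.groupby('hotel')['is_repeated_guest'].mean()

print((repeat_share * 100).round(1))
```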

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

People tend to spend more at Resort Hotels, but usually only once; most of them do not return.

In City Hotels, by contrast, the share of repeated guests is nearly twice that of Resort Hotels.

City Hotels are likely more convenient for people who travel often, because of their location in the heart of the city and easy commute.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
df['total_stays_in_nights'] = df['stays_in_weekend_nights'] + df['stays_in_week_nights']
df['revenue'] = df['adr'] * df['total_stays_in_nights']

# Create the scatter plot
plt.figure(figsize=(12, 6))
sns.scatterplot(data=df, x='lead_time', y='revenue', hue='hotel')
plt.title('Lead Time vs Revenue for Each Hotel Type')
plt.xlabel('Lead Time (Days)')
plt.ylabel('Revenue per Booking')
plt.legend(title='Hotel Type')
plt.grid()
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot shows how revenue per booking relates to lead time.

##### 2. What is/are the insight(s) found from the chart?

Hotels generate more revenue when the lead time (days) is shorter.

Resort Hotels collect the highest revenue at short lead times, whereas City Hotels' revenue per booking stays roughly flat regardless of lead time.
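Whether revenue falls as lead time grows can be summarised per hotel with a rank (Spearman) correlation. This sketch computes it as the Pearson correlation of the ranks, which avoids a SciPy dependency; the rows are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    'hotel': ['Resort Hotel'] * 4 + ['City Hotel'] * 4,
    'lead_time': [5, 20, 60, 120, 5, 20, 60, 120],
    'revenue': [900.0, 700.0, 400.0, 200.0, 300.0, 310.0, 310.0, 300.0],
})

# Spearman correlation = Pearson correlation of the ranks (ties get average ranks)
corr_by_hotel = {
    hotel: group['lead_time'].rank().corr(group['revenue'].rank())
    for hotel, group in toy.groupby('hotel')
}
print(corr_by_hotel)
```

A value near -1 means revenue drops steadily as lead time grows; a value near 0 means no monotone relationship.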

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code
sns.set(style="whitegrid")

# Create the box plot of ADR per assigned room type
plt.figure(figsize=(12, 6))
sns.boxplot(x='assigned_room_type', y='adr', data=df)
plt.title('ADR Distribution by Assigned Room Type')
plt.xlabel('Assigned Room Type')
plt.ylabel('ADR')
plt.grid()
plt.show()

In [None]:
# Chart - 13 visualization code
sns.set(style="whitegrid")

# Create the box plot of reserved vs assigned room type
plt.figure(figsize=(12, 6))
sns.boxplot(x='assigned_room_type', y='reserved_room_type', data=df)
plt.title('Reserved vs Assigned Room Type')
plt.xlabel('Assigned Room Type')
plt.ylabel('Reserved Room Type')
plt.grid()
plt.show()

##### 1. Why did you pick the specific chart?

A box plot helps to identify outliers, which may bias the dataset.

##### 2. What is/are the insight(s) found from the chart?

When the assigned room types were compared with ADR, the most in-demand rooms were A, D and F, and most of the guests got them.
Room E, however, has more outliers, which may suggest that only a few of the guests who reserved it were assigned it.

When reserved rooms were compared with assigned rooms, F and D were overall the most reserved rooms that were also assigned.
Here too the outliers are at room E, suggesting the same.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

As we can see, the most reserved rooms are A, D and F, but guests also clearly like room E. For some reason, however, many were not assigned room E, which may create a bad impression and reduce repeat bookings. Therefore hotels should prioritise assigning the reserved rooms.
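Whether guests get the room they reserved can also be checked directly by comparing `reserved_room_type` with `assigned_room_type`, rather than inferring it from box-plot outliers. A sketch on made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    'reserved_room_type': ['A', 'A', 'D', 'E', 'E', 'F'],
    'assigned_room_type': ['A', 'C', 'D', 'E', 'F', 'F'],
})

# Flag bookings whose assigned room differs from the reserved one
toy['room_changed'] = toy['reserved_room_type'] != toy['assigned_room_type']

# Share of changed assignments per reserved room type
change_rate = toy.groupby('reserved_room_type')['room_changed'].mean()

# Full reserved-vs-assigned table (counts)
room_table = pd.crosstab(toy['reserved_room_type'], toy['assigned_room_type'])

print(change_rate)
print(room_table)
```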

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
numeric_df = df.select_dtypes(include=['float64', 'int64'])

# Calculate the correlation matrix
correlation_matrix = numeric_df.corr()

# Create a heatmap
plt.figure(figsize=(20, 18))
sns.heatmap(correlation_matrix, annot=True, fmt=".2f", cmap='coolwarm', square=True)
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?

A correlation heatmap shows the positive and negative relationships among all the numeric variables.

##### 2. What is/are the insight(s) found from the chart?

There are many insights; some of them are:

Revenue is positively correlated with `total_stays_in_nights`, `stays_in_weekend_nights` and `stays_in_week_nights`, whereas it is strongly negatively correlated with `arrival_date_year`.
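With many columns the heatmap is hard to scan; sorting one column of the correlation matrix gives the same information as a ranked list. A sketch on made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    'stays_in_weekend_nights': [0, 1, 2, 2, 4],
    'stays_in_week_nights': [1, 2, 3, 5, 5],
    'arrival_date_year': [2017, 2016, 2016, 2015, 2015],
    'adr': [100.0, 90.0, 110.0, 95.0, 105.0],
})
toy['revenue'] = toy['adr'] * (toy['stays_in_weekend_nights'] + toy['stays_in_week_nights'])

# Correlation of every numeric column with revenue, strongest first (by absolute value)
revenue_corr = toy.corr(numeric_only=True)['revenue'].drop('revenue')
revenue_corr = revenue_corr.reindex(revenue_corr.abs().sort_values(ascending=False).index)

print(revenue_corr.round(2))
```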

#### Chart - 15 - Pair Plot

In [None]:
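# Pair Plot visualization code
# Random sample of up to 150 bookings (fixed seed so the plot is reproducible)
sample_df = df.sample(n=min(150, len(df)), random_state=42)
# Numeric columns to compare, plus 'hotel' for colouring
relevant_columns = sample_df[['adults', 'children', 'babies', 'adr', 'lead_time', 'is_canceled', 'hotel']]
# pairplot creates its own figure, so no plt.figure() call is needed
g = sns.pairplot(relevant_columns, hue='hotel', kind='reg')
g.figure.suptitle('Pairplot of Hotel Data', y=1.02)
plt.show()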

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

As we can see, people tend to spend more when they are on holiday with their families, and most bookings come through corporate channels and online travel agents. Therefore the clients should boost their online marketing and build a strong corporate network.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***