
# **Hotel Booking Analysis**



##### **Project Type**    - EDA
##### **Contribution**    - Individual



# **Project Summary -**

In this project we are given a hotel booking dataset and perform exploratory data analysis (EDA) on it to uncover customer behaviour and booking patterns. The dataset covers two types of hotels: a resort hotel and a city hotel. It contains details about each customer's preferred booking channel, their booking and arrival times, their booking preferences such as room type, and their special requests such as car parking space and meals. The EDA consists of data reading, data cleaning, univariate and bivariate analysis, and data visualisation. Together these steps reveal the patterns in the bookings. Finally, we list our insights and conclusions.

# **GitHub Link -**

https://github.com/Vipin1184/Hotel-Booking-Analysis

# **Problem Statement**



Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay in order to get the best daily rate? What if you wanted to predict whether or not a hotel was likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore those questions! This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. All personally identifying information has been removed from the data. Explore and analyse the data to discover important factors that govern the bookings.

#### **Define Your Business Objective?**

Identify the key factors that influence booking patterns and customer satisfaction, in order to optimize revenue and enhance the guest experience.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that for each chart the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# numpy.math was only an alias of the standard library's math module (removed in NumPy 2.0)
import math

### Dataset Loading

In [None]:
# Mount google drive
from google.colab import drive
drive.mount("/content/drive")

In [None]:
# Load Dataset
file_path = "/content/drive/MyDrive/Hotel Bookings.csv"
df = pd.read_csv(file_path)
df
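
The loading cell above assumes a Colab session with Google Drive mounted. Since the guidelines ask for exception handling, here is a minimal sketch of a loader that fails with a readable message when the file is missing. The helper name `load_bookings` is hypothetical and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_bookings(csv_path):
    """Read the bookings CSV, raising a clear error if the file is missing.

    `load_bookings` is an illustrative helper, not part of the original notebook.
    """
    path = Path(csv_path)
    if not path.is_file():
        # Fail early with a short message instead of a long pandas traceback.
        raise FileNotFoundError(f"Dataset not found at: {path}")
    return pd.read_csv(path)
```

In the notebook this could be used as `df = load_bookings(file_path)` in place of the direct `pd.read_csv` call.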

### Dataset First View

In [None]:
# Dataset First Look
df.head()

In [None]:
df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
missing_values = df.isnull().sum()

plt.figure(figsize=(10, 6))
missing_values.plot(kind='bar')
plt.xlabel('Columns')
plt.ylabel('Number of Missing Values')
plt.title('Missing Values in Each Column')
plt.show()

### What did you know about your dataset?

The dataset is about hotel bookings. It contains information such as the hotel type, cancellations and reservation details. It has 119390 rows and 32 columns. Some rows are duplicated, some columns contain missing values, and some columns have unsuitable data types.
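
Raw counts of missing values are easier to judge as a share of all rows. The sketch below shows one way to tabulate both. The helper `missing_summary` and the toy frame are assumptions made for illustration; the values are not taken from the real dataset.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Return count and percentage of missing values for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with at least one missing value, largest first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy data for illustration only; these are not values from the real dataset.
toy = pd.DataFrame({
    "country": ["PRT", None, "GBR", "PRT"],
    "company": [np.nan, np.nan, np.nan, 40.0],
    "adr": [75.0, 98.0, 120.5, 60.0],
})
print(missing_summary(toy))
```

On the real data the same call would be `missing_summary(df)`.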


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description

* **hotel** : Type of hotel (city, resort).
* **is_canceled** : Whether the booking was cancelled or not (0, 1).
* **lead_time** : Number of days between booking and arrival.
* **arrival_date_year** : Year of arrival.
* **arrival_date_month** : Month of arrival.
* **arrival_date_week_number** : Week number of arrival.
* **arrival_date_day_of_month** : Day of the month of arrival.
* **stays_in_weekend_nights** : Number of weekend nights stayed (Saturday, Sunday).
* **stays_in_week_nights** : Number of week nights stayed (other than Saturday and Sunday).
* **adults** : Number of adults.
* **children** : Number of children.
* **babies** : Number of babies.
* **meal** : Type of meal booked.
* **country** : Country of the customer.
* **market_segment** : Market segment designation (Online TA/TO, Offline TA/TO, direct, corporate).
* **distribution_channel** : Preferred booking channel (direct, corporate, TA/TO).
* **is_repeated_guest** : Whether the guest is a repeated guest or not (0, 1).
* **previous_cancellations** : Number of previous bookings cancelled by the customer.
* **previous_bookings_not_canceled** : Number of previous bookings not cancelled by the customer.
* **reserved_room_type** : Type of room booked.
* **assigned_room_type** : Type of room assigned.
* **booking_changes** : Number of changes made to the booking.
* **deposit_type** : Type of deposit (e.g. no deposit, non-refundable).
* **agent** : ID of the travel agent.
* **company** : ID of the company.
* **days_in_waiting_list** : Number of days the booking was on the waiting list.
* **customer_type** : Type of customer (e.g. transient, contract).
* **adr** : Average daily rate (price per day).
* **required_car_parking_spaces** : Number of car parking spaces required by the customer.
* **total_of_special_requests** : Number of special requests made by the customer.
* **reservation_status** : Current status of the booking (e.g. checked out, cancelled).
* **reservation_status_date** : Date of the last reservation status.


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# create a copy of original dataframe
df_copy = df.copy()
df_copy.shape

In [None]:
# drop duplicates
df_copy.drop_duplicates(inplace=True)
df_copy.duplicated().sum()

In [None]:
# checking the shape
df_copy.shape

In [None]:
# drop 'company' column
df_copy.drop(['company'],axis=1,inplace=True)

In [None]:
# drop 'agent' column
df_copy.drop(['agent'],axis=1,inplace=True)

In [None]:
# drop 'arrival_date_week_number' column
df_copy.drop(['arrival_date_week_number'],axis = 1, inplace = True)

In [None]:
# checking the shape
df_copy.shape

In [None]:
# Creating column 'total_person' and 'total_stays'
df_copy['total_person'] = df_copy['adults'] + df_copy['children'] + df_copy['babies']
df_copy['total_stays'] = df_copy['stays_in_week_nights'] + df_copy['stays_in_weekend_nights']
df_copy.head()

In [None]:
# Checking the info
df_copy.info()

In [None]:
# Changing the data type of 'reservation_status_date' to datetime datatype
df_copy['reservation_status_date'] = pd.to_datetime(df_copy['reservation_status_date'], format='%Y-%m-%d')

In [None]:
# Dropping the unnecessary columns
df_copy.drop(['adults','children','babies','stays_in_week_nights','stays_in_weekend_nights'],axis=1,inplace=True)


In [None]:
# Checking the shape
df_copy.shape

In [None]:
# Handling the missing values
df_copy['country'] = df_copy['country'].fillna('others')
df_copy['total_person'] = df_copy['total_person'].fillna(0)

df_copy.isnull().sum()

### What all manipulations have you done and insights you found?


**Manipulations**

* First, we made a copy of the original dataset to keep it as a reference point, so that none of our operations affect the original data.
* The dataset contains 31994 duplicate rows, so we dropped them.
* We dropped the 'company' column because it contains a large number of missing values.
* We dropped the 'agent' and 'arrival_date_week_number' columns because they were not relevant to this analysis.
* We created two columns:
  1. 'total person' by adding adults, children and babies.
  2. 'total stays' by adding week night and weekend night stays.

* We changed the data type of the reservation status date column to datetime.
* We dropped the columns 'adults', 'children', 'babies', 'weekend night stays' and 'week night stays' because they have been merged into 'total person' and 'total stays'.
* Lastly, we handled the missing values. Four columns originally had missing values. Two of them, 'agent' and 'company', were dropped, which leaves 'country' and 'children'; the latter has been merged into 'total person'. We replaced the null values of 'country' with 'others' and the null values of 'total person' with 0.
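
The manipulations listed above can be sketched as a single function, which makes the cleaning easy to rerun on a fresh copy of the data. This is an illustrative consolidation rather than the notebook's actual cells: the name `clean_bookings` and the toy frame are assumptions, and the toy values are not from the real dataset.

```python
import numpy as np
import pandas as pd


def clean_bookings(raw):
    """Apply the wrangling steps described above and return a new DataFrame."""
    out = raw.drop_duplicates().copy()
    # errors="ignore" lets the sketch run even if a column is already absent.
    out = out.drop(columns=["company", "agent", "arrival_date_week_number"], errors="ignore")
    # A missing 'children' value makes the sum NaN; it is then filled with 0 as in the notebook.
    out["total_person"] = (out["adults"] + out["children"] + out["babies"]).fillna(0)
    out["total_stays"] = out["stays_in_week_nights"] + out["stays_in_weekend_nights"]
    out["reservation_status_date"] = pd.to_datetime(
        out["reservation_status_date"], format="%Y-%m-%d"
    )
    out["country"] = out["country"].fillna("others")
    merged = ["adults", "children", "babies", "stays_in_week_nights", "stays_in_weekend_nights"]
    return out.drop(columns=merged)


# Toy rows for illustration only: the first two are duplicates of each other.
toy = pd.DataFrame({
    "adults": [2, 2, 1],
    "children": [0.0, 0.0, np.nan],
    "babies": [0, 0, 0],
    "stays_in_week_nights": [3, 3, 1],
    "stays_in_weekend_nights": [1, 1, 0],
    "country": ["PRT", "PRT", None],
    "company": [np.nan, np.nan, np.nan],
    "agent": [9.0, 9.0, np.nan],
    "arrival_date_week_number": [27, 27, 30],
    "reservation_status_date": ["2015-07-01", "2015-07-01", "2015-07-25"],
})
cleaned = clean_bookings(toy)
print(cleaned)
```

Returning a new frame instead of using `inplace=True` keeps the input untouched, which matches the intent of working on a copy.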

**Insights**

* **Improved Data Integrity**: Removing duplicates ensures a cleaner dataset for analysis.

* **Enhanced Data Quality**: Dropping the 'company' column with its numerous missing values improves the overall quality of the dataset.

* **Streamlined Data**: Removing less relevant columns like 'agent' and 'arrival_date_week_number' focuses the analysis on key factors.

* **Simplified Analysis**: Creating the 'total_person' and 'total_stays' columns makes the comparisons simple.

* **Time-Based Insights**: Converting 'reservation_status_date' to datetime format enables time-series analysis, helping identify booking patterns over time.

* **Reduced Missing Values**: Handling missing values in 'country' and 'total_person' simplifies the dataset for analysis.

* Finally, our dataset is ready for visualisation.





## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

# Univariate analysis

#### Chart - 1


In [None]:
# Chart - 1 Visualising the percentage of bookings of both hotels.
exp2 = [0,0.05]
hotel_counts = df_copy['hotel'].value_counts()
plt.pie(hotel_counts.values, labels=hotel_counts.index, autopct='%1.1f%%', explode=exp2)
plt.title('Distribution of Bookings of Hotels')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart shows the share of total bookings held by city hotels and resort hotels.

##### 2. What is/are the insight(s) found from the chart?

City hotels account for more bookings than resort hotels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can help create a positive business impact for city hotels.

**Positive insight:**
Marketing and sales efforts can be focused on city hotels to increase revenue and bookings.

**Negative insight:**
We cannot focus solely on city hotels. We also have to find ways to increase bookings at the resort hotel and keep the two segments balanced.

#### Chart - 2

In [None]:
# Chart - 2 Visualising the percentage of cancellation.
cancellation_counts = df_copy['is_canceled'].value_counts()
plt.pie(cancellation_counts.values, labels=cancellation_counts.index, autopct='%1.1f%%')
plt.title('Distribution of cancellation and non cancellation')
plt.show()


##### 1. Why did you pick the specific chart?

We use this pie chart to show the percentage of cancelled and non-cancelled bookings. Here '0' means not cancelled and '1' means cancelled.

##### 2. What is/are the insight(s) found from the chart?

We found that 27.5% of bookings were cancelled and 72.5% were not.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights help in creating a positive business impact.

**Positive impact:**
A lower cancellation rate generally leads to more revenue. It may also indicate that customers are satisfied with their bookings.

**Negative impact:**
27.5% of bookings were cancelled. Although most bookings are kept, we still have to find the reasons behind the cancellations.
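
Because the pie chart labels its slices with the raw codes 0 and 1, mapping them to readable labels before counting can make the chart self-explanatory. The sketch below assumes a toy series; in the notebook the input would be `df_copy['is_canceled']`, and the label texts are illustrative choices.

```python
import pandas as pd

# Toy flags for illustration only; the real column is df_copy['is_canceled'].
is_canceled = pd.Series([0, 0, 1, 0, 1, 0, 0, 0], name="is_canceled")

# Map the 0/1 codes to readable labels before counting.
labels = is_canceled.map({0: "Not cancelled", 1: "Cancelled"})

# normalize=True returns shares instead of counts; scale to percentages.
share = labels.value_counts(normalize=True).mul(100).round(1)
print(share)
```

The resulting `share.index` can be passed directly as the `labels` argument of `plt.pie`.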

#### Chart - 3

In [None]:
# Chart - 3 Visualising the bookings by Distribution channel
sns.countplot(x='distribution_channel', data=df_copy)
plt.title('Distribution of Bookings by Distribution Channel')
plt.xlabel('Distribution Channel')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A count plot shows how bookings are distributed across the distribution channels.

##### 2. What is/are the insight(s) found from the chart?

The maximum number of bookings come through the TA/TO channel, which suggests that customers prefer booking through travel agents and tour operators.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, this shows a positive business impact.

**Positive impact**
We should focus on our existing TA/TO partners to increase revenue.

**Negative impact**
Channels other than TA/TO are not very popular among customers. Depending too heavily on the TA/TO channel makes the business vulnerable to changes in the travel industry or to a major partner ending the relationship. TA/TO partners also typically charge commissions, which can eat into profit margins.



#### Chart - 4

In [None]:
# Chart - 4 visualising the count of room types.
sns.countplot(x='reserved_room_type', data=df_copy,color = 'purple')
plt.title('Count of Each Room Type Reserved')
plt.xlabel('Reserved Room Type')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?


We selected this bar chart to show the distribution of reserved room types.

##### 2. What is/are the insight(s) found from the chart?

Room type 'A' is the most preferred by customers. This could be because type 'A' rooms offer more features at a better price.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained can definitely help create a positive business impact.

**Positive impact:** By ensuring a sufficient number of type 'A' rooms, hotels can increase their revenue.

**Negative impact:** If the less popular room types remain vacant, overall revenue can suffer. Hotels should find the reasons behind the lower demand for certain rooms and explore strategies to raise it.

#### Chart - 5

In [None]:
# Chart - 5 visualising the count of customer types.
customer_type_counts = df_copy['customer_type'].value_counts()
plt.barh(customer_type_counts.index, customer_type_counts.values,color = 'green')
plt.title('Distribution of Customer Types')
plt.xlabel('Count')
plt.ylabel('Customer type')
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart allows easy comparison of the number of bookings for each customer type.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that the maximum number of bookings are made by "Transient" customers, followed by "Transient-Party" and then "Contract".

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights show a positive impact.

**Positive impact:** Hotels should focus on 'Transient' and 'Transient-Party' customers in order to generate more revenue.

**Negative impact:** Hotels should also pay attention to the other customer types in order to balance demand against seasonality or economic changes.

#### Chart - 6

In [None]:
# Chart - 6 visualising the percentage of meal types.
plt.figure(figsize=(8,6))
meal_counts = df_copy['meal'].value_counts()
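# Explode only the largest slice; building the list from len(meal_counts) avoids hard-coding the number of meal types
plt.pie(meal_counts.values, labels=meal_counts.index, autopct='%1.1f%%', explode=[0.05] + [0] * (len(meal_counts) - 1))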
plt.title('Distribution of Meal Types')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart is an effective way to display the proportions of different meal types chosen by guests.

##### 2. What is/are the insight(s) found from the chart?

The chart clearly shows that the 'BB' meal plan is the most popular choice among guests.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights show a positive impact.

**Positive impact:** Hotels should ensure sufficient variety and quantity for the 'BB' meal plan to meet the demand.

**Negative impact:** The chart also shows the low popularity of the 'HB', 'FB' and 'SC' meal plans. This could lead to missed revenue opportunities.

#### Chart - 7

In [None]:
# Chart - 7 visualising the count of bookings by market segment.
sns.countplot(x='market_segment', data=df_copy)
plt.title('Distribution of Bookings by Market Segment')
plt.xlabel('Market Segment')
plt.ylabel('Count')
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

To visualise the distribution of bookings across different market segments.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that the 'Online TA' market segment contributes the most bookings. This indicates that customers prefer this channel for booking.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights show some positive impact.

**Positive impact:** Hotels should strengthen their relationships with online and offline travel agents, as they are the primary source of bookings, in order to generate more revenue.

**Negative impact:** Hotels should develop strategies to increase bookings through other channels to reach a broader range of customer segments. This can increase profit margins.

# Bivariate and Multivariate analysis

#### Chart - 8

In [None]:
# Chart - 8 visualising the number of bookings on monthly basis.
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
df_copy['arrival_date_month'] = pd.Categorical(df_copy['arrival_date_month'], categories=month_order, ordered=True)

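# observed=False keeps all twelve months in the result and avoids the pandas FutureWarning for categorical keys
monthly_bookings = df_copy.groupby('arrival_date_month', observed=False)['hotel'].count()
plt.figure(figsize=(12, 6))
# Convert the categorical index to plain strings so matplotlib plots the months in order
plt.plot(monthly_bookings.index.astype(str), monthly_bookings.values, marker = 'o')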
plt.xlabel('Month')
plt.ylabel('Number of Bookings')
plt.title('Monthly Booking Trend')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

A line chart is a good choice for showing the trend of bookings from month to month.

##### 2. What is/are the insight(s) found from the chart?

The graph shows that the highest numbers of bookings occur in May, June, July and August. This might be due to the summer holidays. Additionally, these months often have pleasant weather in many destinations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights create some positive impact.
Knowing that May, June, July and August are peak months, hotels can adjust their pricing strategies and promote special offers and packages during these months to attract more guests and increase revenue.

#### Chart - 9

In [None]:
# Chart - 9 visualising the top 10 countries in terms of bookings.
country_person = df_copy.groupby('country')['total_person'].sum().sort_values(ascending=False)
top_country = country_person.head(10)
plt.figure(figsize=(8,6))
sns.barplot(x=top_country.index, y=top_country.values)
plt.xlabel('Country')
plt.ylabel('Total_People')
plt.title('Top 10 countries vs total people')
plt.show()

##### 1. Why did you pick the specific chart?

We chose a bar chart because it is well suited to comparing a numerical value (total persons) across categories (the top 10 countries).

##### 2. What is/are the insight(s) found from the chart?

The maximum number of guests come from Portugal (PRT), which suggests that most bookings are also made from Portugal.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights create a positive impact.

**Positive impact:** Knowing the top contributing countries, hotels can tailor promotions, packages and even language options to attract guests from those countries.

**Negative impact:** Focusing solely on the top countries might lead to neglecting potential growth from other emerging markets.
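
The chart above sums `total_person` per country, which counts guests rather than bookings. A minimal sketch of how the two measures could be compared side by side follows; the toy frame is an assumption for illustration, and in the notebook the input would be `df_copy`.

```python
import pandas as pd

# Toy data for illustration only; not values from the real dataset.
toy = pd.DataFrame({
    "country": ["PRT", "PRT", "GBR", "FRA", "GBR", "PRT"],
    "total_person": [2, 1, 4, 2, 3, 2],
})

# Guests per country: sum of people over all bookings.
guests_by_country = toy.groupby("country")["total_person"].sum()
# Bookings per country: number of rows.
bookings_by_country = toy["country"].value_counts()

comparison = pd.DataFrame({
    "bookings": bookings_by_country,
    "guests": guests_by_country,
})
print(comparison.sort_values("bookings", ascending=False))
```

In this toy example the country with the most bookings is not the one with the most guests, so it is worth checking both before drawing conclusions about booking volume.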

#### Chart - 10

In [None]:
# Chart - 10 visualising the average adr of hotels on monthly basis.
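# observed=False keeps every month category and avoids the pandas FutureWarning for categorical keys
adr_by_month_hotel = df_copy.groupby(['arrival_date_month', 'hotel'], observed=False)['adr'].mean().unstack()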
adr_by_month_hotel.plot(kind='bar')
plt.xlabel('Month')
plt.ylabel('Average ADR')
plt.title('Average ADR by Month and Hotel Type')
plt.show()

##### 1. Why did you pick the specific chart?

I picked this grouped bar chart to compare the average daily rate (ADR) across months, split by hotel type.

##### 2. What is/are the insight(s) found from the chart?

Both city hotel and resort hotel have higher ADR during peak seasons like summer months.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights create a positive business impact.

**Positive impact:** By understanding seasonal trends, hotels can adjust their pricing strategies to maximize revenue. They can increase prices during the peak season and offer discounts or promotions during the off-season to attract more guests.



#### Chart - 11

In [None]:
# Chart - 11 visualising the total revenue of both hotels.
df_copy['revenue'] = df_copy['adr'] * df_copy['total_stays']
revenue_by_hotel = df_copy.groupby('hotel')['revenue'].sum()
revenue_by_hotel.plot(kind='bar')
plt.xlabel('Hotel Type')
plt.ylabel('Total Revenue')
plt.title('Total Revenue by Hotel Type')
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart is a good choice for comparing total revenue between the two hotel types.

##### 2. What is/are the insight(s) found from the chart?

We found that city hotels generate more revenue.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights create a positive impact.

**Positive impact:** Focus on refining pricing strategies for city hotels, especially during peak demand periods, to capture maximum revenue, and on enhancing the guest experience at city hotels to maintain high occupancy rates and positive reviews.

**Negative impact:** Resort hotels generate less revenue, so they should develop strategies to increase their occupancy rates. They might offer discounts and packages to attract customers.
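
The revenue column above is computed as `adr * total_stays` for every booking, including cancelled ones. The sketch below compares that figure with revenue from non-cancelled bookings only. It rests on the assumption that a cancelled booking generates no revenue, which the dataset does not state, and it uses toy values for illustration.

```python
import pandas as pd

# Toy data for illustration only; not values from the real dataset.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "is_canceled": [0, 1, 0, 0],
    "adr": [100.0, 120.0, 80.0, 90.0],
    "total_stays": [2, 3, 4, 1],
})
toy["revenue"] = toy["adr"] * toy["total_stays"]

# Revenue over all bookings, as in the chart above.
all_bookings = toy.groupby("hotel")["revenue"].sum()

# Assumption: cancelled bookings bring in no revenue, so keep only is_canceled == 0.
realised = toy[toy["is_canceled"] == 0].groupby("hotel")["revenue"].sum()

print(pd.DataFrame({"all_bookings": all_bookings, "realised": realised}))
```

If the two columns rank the hotels differently on the real data, the revenue comparison should be read with that caveat in mind.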

#### Chart - 12

In [None]:
# Chart - 12 visualising the total revenue based on customer types.
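df_copy['revenue'] = df_copy['adr'] * df_copy['total_stays']
# Named after the grouping key so it is not confused with the per-hotel totals of Chart 11
revenue_by_customer_type = df_copy.groupby('customer_type')['revenue'].sum()
revenue_by_customer_type.plot(kind='bar')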
plt.xlabel('Customer Type')
plt.ylabel('Total Revenue')
plt.title('Total Revenue by customer Type')
plt.show()

##### 1. Why did you pick the specific chart?

We chose a bar chart to compare total revenue across customer types.

##### 2. What is/are the insight(s) found from the chart?

The chart clearly shows that Transient customers generate the most revenue. This information is crucial for attracting and retaining this high-value segment.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights create some positive impact.

**Positive impact:** Hotels should develop specific packages, promotions or loyalty programs tailored to the needs and preferences of each valuable customer segment. This can enhance customer satisfaction and create a more personalized guest experience, leading to positive reviews and brand loyalty.

**Negative impact:** Focusing too heavily on a single customer type can make the business vulnerable. A balanced approach that caters to a variety of customer types reduces dependence on a single segment.

#### Chart - 13

In [None]:
# Chart - 13 visualising the count of cancellations of both the hotels.
grouped_df = df_copy.groupby(['hotel', 'is_canceled'])['hotel'].count().unstack()
grouped_df.plot(kind='bar', stacked = True)
plt.title('Distribution of booking cancellation based on hotel type')
plt.xlabel('Hotel type')
plt.ylabel('Count')
plt.xticks(rotation=0)
plt.show()
print('Legend: 0 = not canceled, 1 = canceled')

##### 1. Why did you pick the specific chart?

A stacked bar chart allows easy comparison of cancelled and non-cancelled booking counts between city hotels and resort hotels.

##### 2. What is/are the insight(s) found from the chart?

City hotels have more cancellations than resort hotels, which is likely at least partly because city hotels also have more bookings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights create some positive impact. For both hotel types, cancelled bookings are fewer than bookings that were kept. Hotels can implement strategies to minimize those cancellations, which directly affects revenue. Tailoring cancellation policies or fees to the hotel type can lead to increased revenue and fewer last-minute vacancies. Proactive communication with guests can improve their overall experience, leading to better reviews and repeat bookings.
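
Since the stacked bars show counts, a hotel with more bookings will naturally show more cancellations. Comparing the cancellation rate per hotel removes that effect. The sketch below uses toy data for illustration; in the notebook the input would be `df_copy`.

```python
import pandas as pd

# Toy data for illustration only; not values from the real dataset.
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 5 + ["Resort Hotel"] * 4,
    "is_canceled": [1, 1, 0, 0, 0, 1, 0, 0, 0],
})

# The mean of a 0/1 flag is the share of ones, i.e. the cancellation rate.
cancel_rate = toy.groupby("hotel")["is_canceled"].mean().mul(100).round(1)
print(cancel_rate)
```

The rate per hotel answers whether one hotel type is cancelled more often relative to its size, which the raw counts alone cannot show.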

#### Chart - 14 - Correlation Heatmap

In [None]:
# Chart - 14 visualising the correlation between columns.
col = ['is_canceled', 'lead_time', 'is_repeated_guest', 'previous_cancellations', 'days_in_waiting_list', 'adr', 'total_person', 'total_stays']
corr_matrix = df_copy[col].corr()
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()


##### 1. Why did you pick the specific chart?

A heatmap is well suited to showing the correlations between numerical columns.

##### 2. What is/are the insight(s) found from the chart?

1.   'lead_time' and 'is_canceled' show a moderate positive correlation, suggesting that bookings made further in advance are more likely to be canceled.
2.   'adr' (average daily rate) and 'total_person' have a positive correlation, indicating that bookings with more people tend to have higher average daily rates.
3.   'is_repeated_guest' and 'is_canceled' have a negative correlation, implying that repeat guests are less likely to cancel their bookings.
4.   'is_repeated_guest' is negatively correlated with 'adr', 'total_person' and 'total_stays'.
5.   Most other variable pairs show weak or no significant correlation.
6.   The highest positive correlation is 0.39, between 'total_person' and 'adr'. The strongest negative correlation is -0.16, between 'total_person' and 'is_repeated_guest'.
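
Reading the strongest pairs off a heatmap by eye is error-prone. One possible way to list them programmatically is sketched below. The helper `top_correlations` and the toy frame are assumptions made for illustration; in the notebook the input would be `df_copy[col]`.

```python
import numpy as np
import pandas as pd


def top_correlations(frame, n=3):
    """Return the n column pairs with the largest absolute correlation."""
    corr = frame.corr()
    # Keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    # Order by absolute strength, strongest first, while keeping the sign.
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.reindex(order).head(n)


# Toy data for illustration only; not values from the real dataset.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
    "d": [1, 0, 1, 0, 1],
})
print(top_correlations(toy, n=3))
```

Sorting by absolute value keeps strong negative correlations in view, which a plain descending sort would push to the bottom.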





#### Chart - 15 - Pair Plot

In [None]:
# Chart - 15 visualising the pairwise relations between numerical columns.
col = ['is_canceled', 'lead_time', 'is_repeated_guest', 'previous_cancellations', 'days_in_waiting_list', 'adr', 'total_person', 'total_stays']
sns.pairplot(df_copy[col])
plt.suptitle('Pair Plot')
plt.show()

##### 1. Why did you pick the specific chart?

To visualise the pairwise relationships between multiple numerical variables.

##### 2. What is/are the insight(s) found from the chart?

The insights here are similar to those from the heatmap above.

1.   'lead_time' and 'is_canceled' show a moderate positive trend, suggesting that bookings made further in advance are more likely to be canceled.
2.   'adr' (average daily rate) and 'total_person' have a positive trend, indicating that bookings with more people tend to have higher average daily rates.
3.   'is_repeated_guest' and 'is_canceled' have a negative trend, implying that repeat guests are less likely to cancel their bookings.
4.   'is_repeated_guest' shows a negative trend with 'adr', 'total_person' and 'total_stays'.
5.   Most other variable pairs show weak or no significant relations.


## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

**Optimize pricing strategies:**

1. Hotels should adjust rates based on demand, seasonality and competitor pricing.

2. Offer discounts and packages during the off-season to attract more guests.

**Enhance the guest experience:**

1. Focus on improving customer satisfaction for both city and resort hotels.

2. Personalise the guest experience based on customer preferences and needs.

**Diversify marketing and distribution channels:**

Develop strategies to increase bookings through other channels, reaching a broader range of customers and reducing dependence on a single channel.

**Optimize the resource allocation:**

1. Ensure sufficient availability of popular room types to meet the demand.

2. Explore strategies to increase the occupancy of other room types.

Lastly, monitor booking patterns, cancellations and guest preferences to make informed decisions.

# **Conclusion**

1. City hotels have more bookings and more cancellations than resort hotels.
2. In both hotels, bookings are highest during the summer months (May to August).
3. Transient customers are the most valuable segment.
4. Online travel agents (TA/TO) are the primary booking channel.
5. City hotels are more popular and generate more revenue.
6. Room type 'A' and the 'BB' meal plan are the most popular among guests.
7. Stays typically last 1 to 4 nights.
8. The maximum number of bookings come from Portugal (PRT), followed by Great Britain (GBR).
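
The point about length of stay is not backed by a chart in this notebook. A minimal sketch of how it could be checked follows, using a toy series in place of `df_copy['total_stays']`; the values are illustrative assumptions, not results from the real data.

```python
import pandas as pd

# Toy stay lengths for illustration only; the real column is df_copy['total_stays'].
total_stays = pd.Series([1, 2, 2, 3, 3, 3, 4, 7, 1, 2], name="total_stays")

# between() is inclusive on both ends by default, so this covers 1, 2, 3 and 4 nights.
share_1_to_4 = round(total_stays.between(1, 4).mean() * 100, 1)

print(total_stays.describe())
print(f"Share of stays lasting 1 to 4 nights: {share_1_to_4}%")
```

Running the same two lines on the real column would show whether the 1 to 4 night range actually covers most stays.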

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***