<a href="https://colab.research.google.com/github/GunjanKishore21/Data-Analysis-for-Hotel-Bookings/blob/main/eda_copy2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name   - Hotel Booking Analysis**





##### **Project Type   - EDA**
##### **Contribution    - Individual**
##### **Name- Gunjan Kishore**

# **Project Summary -**

**This project performs Exploratory Data Analysis (EDA) on a hotel booking dataset that contains information about booking patterns, cancellation rates, customer preferences and behaviour, and other relevant factors. The goal is to identify insights that can support decision making, improve the customer experience, and optimise hotel management strategies.**





# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


* Clean the data by handling missing values, duplicates, data type inconsistencies, and any other data-related issues.
* Visualize the cleaned data to gain insights into booking patterns, room type preferences, market segments, and other relevant factors.
* Generate visualizations (such as bar plots, count plots, and pie charts) to show the relationships between different variables.
* Identify patterns or trends in the data that could be useful from a business perspective.

#### **Define Your Business Objective?**


* Enhancing Customer Experience: Improving customer satisfaction by understanding customer preferences, such as room types, amenities, and booking channels, and tailoring services accordingly.

* Reducing Cancellations: Minimizing booking cancellations by analyzing cancellation patterns, implementing flexible policies, and offering incentives to encourage customers to keep their bookings.

* Increasing Market Share: Expanding market share by identifying key market segments, targeting specific customer groups, and developing marketing campaigns to attract new customers.

* Optimizing Operations: Streamlining operations by analyzing booking trends, optimizing staff allocation, and improving inventory management to reduce costs and enhance efficiency.

* Managing Seasonality: Managing seasonality by understanding peak booking periods, adjusting pricing and marketing strategies accordingly, and offering promotions during off-peak times to maintain a steady flow of bookings.

* Competitive Analysis: Monitoring competitor performance, pricing strategies, and customer reviews to stay competitive and identify opportunities for improvement.

* Sustainability Initiatives: Implementing sustainability initiatives, such as energy-efficient practices and waste reduction programs, to appeal to environmentally conscious customers and reduce operational costs.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


### Dataset Loading

In [None]:
# Load Dataset
df=pd.read_csv("Hotel Bookings (1).csv")

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df1=df[df.duplicated()]
len(df1)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values(chart-1)
sns.heatmap(df.isnull(),cbar=False)
plt.show()

In [None]:
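# Visualizing the missing values(chart-2)
# Bar plot of the number of missing values in each column
null_counts=df.isnull().sum()
sns.barplot(x=null_counts.index,y=null_counts.values)
plt.xticks(rotation=90)
plt.ylabel("Missing values")
plt.show()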

### What did you know about your dataset?




* The dataset describes hotel bookings and has 119390 rows and 32 columns.

*   Total number of duplicated rows in the dataset: 31994
*   Columns with missing or null values:

> country = 488

> children = 4

> agent = 16340

> company = 112593

* The dataset includes int, float, and object (string) data types.
* The column reservation_status_date has an incorrect data type: it is stored as object (string) rather than datetime.
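The missing-value counts listed above are easier to compare when they are shown next to the share of rows they represent. The following is a minimal sketch on a small made-up DataFrame (`toy` and its values are hypothetical, not the real bookings data); the same two calls can be applied to `df`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the bookings data (hypothetical values, not the real dataset)
toy = pd.DataFrame({
    "country": ["PRT", None, "GBR", "FRA"],
    "children": [0.0, 1.0, np.nan, 2.0],
    "company": [np.nan, np.nan, np.nan, 40.0],
})

# Number of missing values per column and the percentage of rows they represent
missing_count = toy.isnull().sum()
missing_pct = (toy.isnull().mean() * 100).round(1)

# Combine both into one table, with the sparsest column first
missing_summary = (
    pd.DataFrame({"missing": missing_count, "percent": missing_pct})
    .sort_values("missing", ascending=False)
)
print(missing_summary)
```

A column that is missing in most rows (like `company` in the toy data) is a candidate for dropping, while a column with only a few missing rows can be handled by dropping those rows.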

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
columns=df.columns
columns

In [None]:
# Dataset Describe
df.describe(include='all').T

### Variables Description

**hotel**--> Type of hotel (H1 = Resort Hotel, H2 = City Hotel)

**is_canceled**--> Whether the booking was cancelled (1) or not (0)

**lead_time**--> Number of days that elapsed between the date the booking was entered into the PMS and the arrival date

**arrival_date_year**--> Year of arrival date

**arrival_date_month**--> Month of arrival date

**arrival_date_week_number**--> Week number of arrival date

**arrival_date_day_of_month**--> Day of the month of arrival date

**stays_in_weekend_nights**--> Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel

**stays_in_week_nights**--> Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel

**adults**--> Number of adults

**children**--> Number of children

**babies**--> Number of babies

**meal**--> Kind of meal opted for

**country**--> Country code

**market_segment**--> Which segment the customer belongs to

**distribution_channel**--> How the customer accessed the stay: corporate booking / direct / TA/TO

**is_repeated_guest**--> Whether the guest has booked before (1) or is coming for the first time (0)

**previous_cancellations**--> Number of previous bookings cancelled by the customer

**previous_bookings_not_canceled**--> Number of previous bookings not cancelled by the customer

**reserved_room_type**--> Type of room reserved

**assigned_room_type**--> Type of room assigned

**booking_changes**--> Count of changes made to booking

**deposit_type**--> Deposit type

**agent**--> ID of the agent the booking was made through

**company**--> ID of the company that made the booking

**days_in_waiting_list**--> Number of days in waiting list

**customer_type**--> Type of customer

**adr**--> Average daily rate

**required_car_parking_spaces**--> Number of car parking spaces required

**total_of_special_requests**--> Number of additional special requirements

**reservation_status**--> Reservation status

**reservation_status_date**--> Date of the specific status


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
col=df.columns.tolist()                                                 # making a list of all column names of dataset.
for i in col:                                                           #iteration over list(i.e col)
 print("No. of unique values in ",i,"is-",df[i].nunique())
                                                                        #nunique=count the number of unique value in a column of dataset(df).

In [None]:
col=df.columns.tolist()
for j in col:
  print(f"Unique value in {j}:{df[j].unique()}")
  print('-'*100)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# creating a copy of dataset.
newdf=df.copy()
newdf.head()

In [None]:
#Dropping all the duplicates values from the dataset.

newdf.drop_duplicates(inplace=True)

In [None]:
#Checking shape of dataset after dropping duplicates.

newdf.shape

In [None]:
#Dropping the columns [agent, company], which have the most missing values.

newdf.drop(['agent','company'],axis=1,inplace=True)

In [None]:
#Checking columns
print(newdf.columns)

In [None]:
#Dropping all the null and NaN values.

newdf.dropna(inplace=True)


In [None]:
#Checking for Null values

newdf.isnull().sum()

In [None]:
sns.heatmap(newdf.isna())

In [None]:
# change the datatype
newdf['reservation_status_date']=pd.to_datetime(newdf['reservation_status_date'])


In [None]:
newdf.info()

In [None]:
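#checking for outliers

# Draw one boxplot per numeric column; non-numeric columns cannot be box-plotted.
for out in newdf.select_dtypes(include='number').columns:
  plt.figure(figsize=(6,2))        # create the figure before plotting on it
  sns.boxplot(x=newdf[out])
  plt.title(out)
  plt.show()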

In [None]:
# Removing the extreme outliers seen in the boxplots by keeping only rows below each threshold.
newdf=newdf[newdf['adults']<50]
newdf=newdf[newdf['children']<10]
newdf=newdf[newdf['babies']<9]
newdf=newdf[newdf['days_in_waiting_list']<350]
newdf=newdf[newdf['adr']<5000]
newdf=newdf[newdf['lead_time']<700]


In [None]:
newdf.info()

### What all manipulations have you done and insights you found?

* The dataset contained duplicate rows, so we removed them.

* Two columns (agent, company) had an excessive number of missing values, so we dropped them.
* We also removed the remaining rows containing NaN/null values.
* We converted reservation_status_date to the datetime data type.


* We checked for outliers using boxplots and removed the extreme values.
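The first three cleaning steps above can also be collected into one small helper so that they are applied in the same order every time. This is only a sketch: `clean_bookings` and the `toy` data are hypothetical names introduced for illustration, and the helper returns a new DataFrame instead of modifying its input.

```python
import numpy as np
import pandas as pd

def clean_bookings(raw, sparse_cols=("agent", "company")):
    """Return a cleaned copy: no duplicate rows, no sparse columns, no rows with nulls."""
    cleaned = raw.drop_duplicates()
    # Drop only those sparse columns that are actually present in the data
    cleaned = cleaned.drop(columns=[c for c in sparse_cols if c in cleaned.columns])
    # Remove any rows that still contain missing values
    cleaned = cleaned.dropna()
    return cleaned.reset_index(drop=True)

# Toy stand-in for the bookings data (hypothetical values)
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "children": [0.0, 0.0, np.nan, 1.0],
    "agent": [9.0, 9.0, np.nan, np.nan],
    "company": [np.nan, np.nan, np.nan, np.nan],
})

cleaned = clean_bookings(toy)
print(cleaned)
```

Dropping the sparse columns before calling `dropna()` matters: if `company` were still present, almost every row would contain a null and would be removed.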





## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

# (Chart - 1) What is the average lead time in each hotel?


In [None]:
sns.barplot(x=newdf['arrival_date_year'],y=newdf['lead_time'],hue=newdf['hotel'],palette=['olive','cyan'])
plt.title("Average Lead time per year in different hotels")
plt.show()

##### 1. Why did you pick the specific chart?

***--Bar plots are excellent for comparing values across categories or groups. This one visualizes the average lead time per year for the two hotels, i.e. city hotel and resort hotel, in the form of bars.***
***
***The length of each bar indicates the average lead time per year for that hotel. The plot is simple and easy for a wide audience to understand without detailed explanation.***











##### 2. What is/are the insight(s) found from the chart?

***We found that the average lead time of the resort hotel is larger than that of the city hotel, and that it increases every year, reaching its maximum in 2017.***
***
***We can also see that the gap between the bars (average lead time) of the two hotels narrows every year. It was largest in 2015 and smallest in 2017, although the resort hotel still has a larger average lead time than the city hotel.***
***
***The decreasing gap between the lead times of resort hotels and city hotels could indicate a shift in customer booking patterns. It may suggest that customers are increasingly booking city hotels further in advance, possibly due to changing travel preferences or improved marketing strategies targeting city hotels.***

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***--A larger lead time for the resort hotel could indicate that customers are booking their stays further in advance, possibly due to high demand or limited availability. While this may suggest that the resort hotel is popular and in demand, it could also lead to increased customer satisfaction issues if guests prefer shorter lead times or encounter difficulty securing reservations.***
***
***If the gap between the lead times of the resort hotel and the city hotel is decreasing over time, it may suggest that the resort hotel is becoming more efficient in managing bookings and optimizing revenue. This could lead to better revenue performance for the business overall.***
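The bar heights and the gap between the two hotels discussed above can be checked numerically with a grouped mean. The sketch below uses a made-up `toy` DataFrame (hypothetical values); the column names match the dataset, so the same calls could be run on `newdf`.

```python
import pandas as pd

# Toy stand-in for the bookings data (hypothetical values)
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel",
              "Resort Hotel", "City Hotel", "Resort Hotel"],
    "arrival_date_year": [2015, 2015, 2015, 2015, 2016, 2016],
    "lead_time": [10, 30, 40, 80, 50, 70],
})

# Average lead time for every (year, hotel) pair, with one column per hotel
mean_lead = (
    toy.groupby(["arrival_date_year", "hotel"])["lead_time"]
    .mean()
    .unstack("hotel")
)

# Difference between the two hotels for each year
gap = mean_lead["Resort Hotel"] - mean_lead["City Hotel"]
print(mean_lead)
print(gap)
```

Reading the gap as a number per year makes it clear whether the difference is really shrinking or only appears to be because of the chart scale.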



# (Chart - 2) Which hotel is most preferred by customers?



In [None]:
# Chart - 2 visualization code
# Count the number of hotels in each category
hotel_counts = newdf['hotel'].value_counts()
hotel_counts

In [None]:
# Create a pie chart
plt.pie(hotel_counts, labels=hotel_counts.index, autopct='%1.1f%%')
plt.title('Most Preferred Hotel')
plt.show()

##### 1. Why did you pick the specific chart?


***--A pie chart helps us see the proportion of each category, i.e. City Hotel (61.4%) and Resort Hotel (38.6%), and it is effective for categorical data. It is a circular plot representing the whole dataset, where each slice shows the share of one category.***

##### 2. What is/are the insight(s) found from the chart?


***From the above chart, we can see that most people prefer the city hotel over the resort hotel, i.e. 61.4% of bookings are for the city hotel.***
***
***City hotels are typically located in urban areas with easy access to attractions, restaurants, and public transportation. This insight could indicate that customers prefer the convenience and cultural experiences offered by city locations.***
***

***Preference for city hotels over resort hotels may vary seasonally. For example, city hotels may be more popular during business travel seasons or cultural events, while resort hotels may be preferred during leisure travel seasons.***

***

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


***From this insight we can clearly see that most people prefer the city hotel over the resort hotel, so we can plan and strategize with this in mind. As more people prefer the city hotel, it may need efficient management, good staff, and greater cleanliness to hold its position in a competitive market.***

# (Chart - 3) Proportion of cancellation

In [None]:
# Chart - 3 visualization code
# Order the counts as 0 (not cancelled), 1 (cancelled) so that they match the labels below
cancel=newdf['is_canceled'].value_counts().reindex([0,1])
plt.pie(cancel,labels=['Not cancelled','Cancelled'],autopct='%1.2f%%',colors=['pink','lightblue'])
plt.title("Proportion of cancellation")
plt.show()


##### 1. Why did you pick the specific chart?

***--A pie chart is a suitable choice here because it is effective for showing the proportion of each category in the dataset. It is a circular plot in which each category of the categorical variable is represented by a slice.***

##### 2. What is/are the insight(s) found from the chart?

***--From this chart we can clearly see that 27.58% of bookings were cancelled, while the majority, i.e. 72.42%, were not cancelled.***
***
***The majority of people across both resort and city hotels did not cancel their bookings. This suggests that most guests are satisfied with their bookings or have reasons to stick to their plans.***

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***--This insight is valuable for understanding the overall cancellation behavior in the dataset and could inform strategies for managing booking cancellations in the future. We must also analyze the factors that cause cancellations and address them to reduce the cancellation rate.***
***
***Offer incentives for guests to keep their bookings, such as discounts on additional services or future stays. This can encourage guests to maintain their reservations even if their plans change.***
***
***Provide excellent customer service throughout the booking process and stay. A positive experience can increase guest satisfaction and reduce the likelihood of cancellations.***
***
***Encourage guests to provide feedback and reviews after their stay. This can help identify areas for improvement and enhance the overall guest experience, reducing the likelihood of cancellations in the future.***







# (Chart - 4) Cancellation rate in City hotel Vs Resort Hotel

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(8,5))
sns.countplot(data=newdf,x='hotel',hue='is_canceled',palette=['lightgreen','orange'])
plt.legend(['Not Cancelled','Cancelled'])
plt.title("Cancellation rate in City hotel VS Resort hotel ")
plt.show()

##### 1. Why did you pick the specific chart?

***--A count plot was chosen because it shows the frequency of each category in the dataset.***
***Here it counts the cancelled and non-cancelled bookings for each hotel.***

##### 2. What is/are the insight(s) found from the chart?

**--From this chart we can see that the majority of guests did not cancel their bookings in either the resort or the city hotel.**
* ***Resort hotel: The cancellation rate in the resort hotel is comparatively lower than in the city hotel, which shows that these guests were less inclined to cancel. This could indicate that guests booking resort hotels are more committed to their reservations, possibly due to the nature of resort stays or the types of guests they attract.***
***
 * ***City hotel: City hotels have a higher cancellation rate than resort hotels. This suggests that some factors may be driving the higher cancellations, such as business travelers changing plans, last-minute changes in itinerary, or the availability of a wider range of accommodation options in urban areas, making it easier for guests to switch hotels.***


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***To reduce cancellations, city hotels may need to analyze these factors and implement strategies to increase booking commitment. This could include offering more flexible cancellation policies, providing incentives for early bookings, or improving the overall guest experience to encourage guests to stick to their plans.***
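A count plot shows absolute numbers, so a hotel with more bookings will also tend to show more cancellations. To compare cancellation rates directly, the mean of `is_canceled` per hotel can be used, since the column holds 0 and 1. The sketch below uses a made-up `toy` DataFrame (hypothetical values) rather than the real data.

```python
import pandas as pd

# Toy stand-in for the bookings data (hypothetical values)
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "is_canceled": [1, 1, 0, 0, 1, 0, 0, 0],
})

# The mean of a 0/1 column is the fraction of rows equal to 1,
# i.e. the cancellation rate of each hotel
rate_by_hotel = toy.groupby("hotel")["is_canceled"].mean()
print(rate_by_hotel)
```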

# (Chart - 5) Most Popular market Segment...



In [None]:
# Chart - 5 visualization code
# Count the bookings in each market segment, ordered from most to least frequent
segment_order=newdf['market_segment'].value_counts().index
sns.countplot(x=newdf['market_segment'],order=segment_order)
plt.xticks(rotation=90)
plt.title("Booking per Market segment")
plt.show()

##### 1. Why did you pick the specific chart?

***--A count plot is chosen because it compares the number of bookings across the different market segments.***


##### 2. What is/are the insight(s) found from the chart?

***--In this visualization, Online TA is clearly the major market segment, where most bookings were made, whereas Aviation and Complementary are among the smallest segments.***
***
***Online Travel Agencies (TA): Bookings through online travel agencies dominate, suggesting that most guests prefer to book through these platforms. This indicates the importance of online presence and partnerships with TAs for hotel bookings.***
***
***Aviation Segment: Bookings attributed to travelers associated with the aviation industry, which could include airline crew, aviation staff, or passengers on layovers, make up only a small portion of overall bookings.***
***
***Complementary Segment: The complementary segment is also small, so it contributes little to the overall booking volume.***
***
***Groups Segment: Group bookings, such as those for conferences, events, or tours, make up a much smaller portion of bookings than Online TA. This could have implications for marketing strategies and targeted promotions to attract more group bookings.***


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***--The chart illustrates the distribution of bookings across the different segments. It also shows the popularity of each segment among guests.***
***
***Understanding the distribution of bookings across different market segments can help tailor marketing strategies to target specific segments more effectively. For example, strengthening partnerships with online TAs, which bring in the most bookings, or running targeted promotions for smaller segments such as groups.***

# (Chart - 6) Let's see which room types people reserve and which room types get assigned.

In [None]:
# Chart - 6 visualization code
cols3=['lightblue','lightgreen','yellow','brown','pink','orange','grey','red','blue','black']
plt.figure(figsize=(8,5))
sns.countplot(x=newdf['reserved_room_type'],palette=cols3)
plt.title("Reserved room")
plt.show()


plt.figure(figsize=(8,5))
sns.countplot(x=newdf['assigned_room_type'],palette=cols3)
plt.title("Assigned room")
plt.show()


##### 1. Why did you pick the specific chart?

***--A count plot shows the number of bookings for each room type reserved or assigned in the data.***

##### 2. What is/are the insight(s) found from the chart?

***-- We found that most people reserved 'A' type rooms and were mostly assigned them, followed by D, E and so on.***
***Nobody reserved 'L' or 'P' type rooms.***
***We also found that 'I' and 'K' type rooms were assigned even though nobody reserved them.***
***
***The fact that most people reserve and are assigned 'A' type rooms suggests that this room type is popular among guests. This could be due to various factors such as room size, amenities, location within the hotel, or pricing.***
***
***The observation that 'I' and 'K' type rooms are assigned but not reserved indicates a potential discrepancy between room demand and reservation patterns. These room types may be more popular among guests than initially anticipated, suggesting that the hotel could consider adjusting its room allocation strategy to meet this demand.***
***
***The fact that 'L' and 'P' type rooms are neither reserved nor assigned suggests that these room types may be less desirable to guests. The hotel could explore reasons for this, such as room condition, location within the hotel, or pricing, and consider strategies to make these rooms more appealing or adjust their availability.***


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Understanding room reservation and assignment patterns can help the hotel improve its inventory management. By analyzing which room types are most in demand and which are underutilized, the hotel can optimize its room allocation strategy to maximize revenue and guest satisfaction.***
***
***The popularity of certain room types could also inform the hotel's pricing strategy. Rooms that are in high demand could be priced higher, while less popular room types could be offered at discounted rates to increase their attractiveness to guests.***
***
***Ensuring that guests are assigned their preferred room types, when available, can enhance the overall guest experience. This could lead to higher guest satisfaction and potentially increase repeat bookings and positive reviews.***
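Two separate count plots show how often each room type is reserved and assigned, but not how often a guest receives a different room type than the one reserved. A cross-tabulation answers that directly. The sketch below uses a made-up `toy` DataFrame (hypothetical values); the column names match the dataset.

```python
import pandas as pd

# Toy stand-in for the bookings data (hypothetical values)
toy = pd.DataFrame({
    "reserved_room_type": ["A", "A", "A", "D", "E"],
    "assigned_room_type": ["A", "A", "D", "D", "I"],
})

# Rows: reserved type, columns: assigned type, cells: number of bookings
room_matrix = pd.crosstab(toy["reserved_room_type"], toy["assigned_room_type"])

# Share of bookings where the assigned type differs from the reserved type
mismatch_share = (toy["reserved_room_type"] != toy["assigned_room_type"]).mean()

print(room_matrix)
print(mismatch_share)
```

Off-diagonal cells in `room_matrix` are bookings where the guest was moved to another room type, which is where room types that are assigned but never reserved would show up.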

# (Chart - 7) In which month do most customers visit the hotel?

In [None]:
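# Chart - 7 visualization code
# Plot the months in calendar order rather than in order of appearance
month_order=['January','February','March','April','May','June','July','August','September','October','November','December']
sns.countplot(x=newdf['arrival_date_month'],hue=newdf['hotel'],order=month_order,palette=['lightgreen','pink'])
plt.xticks(rotation=90)
plt.title("Most preferred Month")
plt.show()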

##### 1. Why did you pick the specific chart?

***--A count plot is well suited to counting occurrences of categorical data in the form of bars. Here we use it to find the month with the maximum number of visits to the hotel.***

##### 2. What is/are the insight(s) found from the chart?

***--We found that most people prefer to visit in July and August, and that January is the least preferred month for both hotel types, i.e. city hotel and resort hotel.***
***
***Knowing the peak months allows the business to focus its marketing efforts and promotions during these times to attract more guests. Special packages or events tailored to these months can be highly effective.***
***
***Dependence on peak seasons like July and August can lead to seasonal fluctuations in revenue. The least preferred month, January, indicates a potential need for strategies to attract guests during off-peak periods. This could include offering discounts, promotions, or special events to increase bookings.***

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***--Yes, it will create a positive impact, as understanding seasonal preferences can help in revenue management strategies. For example, during peak months, hotels can adjust their pricing strategies to maximize revenue, while during off-peak months, they can offer incentives to attract guests. Since we know that people prefer July and August for a visit, we can give different offers (such as membership cards, discounts for visitors on their second visit, or complimentary meals) and make better arrangements for visitors (greater cleanliness, better appliances, better services, etc.) so that they have a good impression, recommend the hotel, and visit again. This might help to grow the business.***

# (Chart - 8) How many cancellations are there per year?

In [None]:
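# Count cancelled (1) and not cancelled (0) bookings for each arrival year
cancelyear=newdf.groupby('arrival_date_year')['is_canceled'].value_counts()
cancelyear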

In [None]:
# Chart - 8 visualization code

sns.countplot(x=newdf['arrival_date_year'],hue=newdf['is_canceled'],palette=['green','red'])
plt.title("Cancellation per year")
plt.show()

##### 1. Why did you pick the specific chart?

***--This chart is used to find how many cancellations were made per year and to compare cancelled and non-cancelled bookings per year. It gives the count of cancelled and non-cancelled bookings for each year.***

##### 2. What is/are the insight(s) found from the chart?

***--It seems that 2016 had the most bookings of all three years. It also had the most cancellations. The fewest bookings were made in 2015, which also had the fewest cancellations.***
***
***--There is also a large increase in bookings from 2015 to 2016, but a decrease from 2016 to 2017.***
***
***The observation that 2016 had both the most bookings and the most cancellations suggests that cancellation counts rise with booking volume, so years are better compared by cancellation rate than by raw counts. External factors or market conditions in 2016 may also have influenced both booking behavior and cancellations.***
***



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***--There is positive growth from 2015 to 2016 but a decline from 2016 to 2017; this is a point of concern, and the factors causing the fall should be investigated. One factor to rule out first is coverage: if the data does not span all twelve months of each year, the yearly totals are not directly comparable. Understanding the reasons behind the fluctuations in booking and cancellation rates can provide insights into customer behavior and preferences. This can help the business tailor its offerings and services to better meet customer needs and reduce cancellations.***
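Because a year with more bookings will usually also have more cancellations, the yearly counts are easier to interpret alongside a cancellation rate. The sketch below uses a made-up `toy` DataFrame (hypothetical values) to show that the year with the most cancellations is not necessarily the year with the highest rate.

```python
import pandas as pd

# Toy stand-in for the bookings data (hypothetical values)
toy = pd.DataFrame({
    "arrival_date_year": [2015] * 2 + [2016] * 8 + [2017] * 4,
    "is_canceled": [1, 0] + [1, 1, 1, 0, 0, 0, 0, 0] + [1, 1, 0, 0],
})

# Bookings and cancellations per year, then the share of bookings cancelled
yearly = toy.groupby("arrival_date_year")["is_canceled"].agg(
    bookings="count", cancellations="sum"
)
yearly["cancel_rate"] = yearly["cancellations"] / yearly["bookings"]
print(yearly)
```

In the toy data, 2016 has the most bookings and the most cancellations but the lowest cancellation rate, which is why counts and rates should be read together.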

# (Chart - 9) Which deposit type is preferred by customers?

In [None]:
# Chart - 9 visualization code
sns.countplot(x=newdf['deposit_type'],palette=['turquoise','coral','olive'])
plt.title("Deposit Type")
plt.show()

##### 1. Why did you pick the specific chart?

***--The chart we've selected visualizes the count of different deposit types using a count plot. This type of plot is suitable for categorical data and provides a clear overview of the distribution of deposit types in the dataset.***

##### 2. What is/are the insight(s) found from the chart?

***--We can observe the frequency of each deposit type, which helps in understanding the preferred deposit methods among customers. By analyzing the distribution, we can identify which deposit type is most commonly used or if there's a significant imbalance among the different deposit types.***
***
***As we can see, "No Deposit" is the most common deposit type, i.e. for most bookings no deposit is taken from the customer at booking time, which might be one reason bookings get cancelled.***
***
 ***Because no deposit is taken for these bookings, there is a possible link between this deposit type and cancellations. Customers may be more likely to cancel bookings for which no deposit is required, as there is no financial commitment involved.***


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***By analyzing the frequency of each deposit type, the business can understand which deposit methods are most commonly used by customers. This can help in tailoring payment options and policies to better suit customer preferences.***
***
***Understanding the impact of different deposit types on cancellations can have revenue implications. The business may consider implementing deposit policies for certain types of bookings to reduce cancellations and secure revenue.***
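The possible link between deposit type and cancellations mentioned above can be checked by comparing the share of cancelled bookings within each deposit type. This is a minimal sketch on a made-up `toy` DataFrame (hypothetical values, so the resulting rates say nothing about the real data); `pd.crosstab` with `normalize="index"` turns each row into proportions.

```python
import pandas as pd

# Toy stand-in for the bookings data (hypothetical values)
toy = pd.DataFrame({
    "deposit_type": ["No Deposit"] * 4 + ["Non Refund"] * 2 + ["Refundable"] * 2,
    "is_canceled": [1, 0, 0, 0, 1, 1, 0, 1],
})

# Each row sums to 1: column 0 is the share not cancelled, column 1 the share cancelled
share = pd.crosstab(toy["deposit_type"], toy["is_canceled"], normalize="index")
print(share)
```

If the hypothesis holds on the real data, the cancelled share in the "No Deposit" row would be higher than in the other rows; if it is not, the count plot alone should not be used to explain cancellations.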

# (Chart - 10) Which countries have the most cancelled bookings?

In [None]:
# Chart - 10 visualization code
plt.figure(figsize=(8,8))
# Keep only the cancelled bookings, then take the 10 countries with the most cancellations
cancelled=newdf[newdf['is_canceled']==1]
top=cancelled['country'].value_counts()[:10]
plt.pie(top,autopct='%1.1f%%',labels=top.index)
plt.title("Country with cancelled booking(top 10)")
plt.show()

##### 1. Why did you pick the specific chart?

***-- The pie chart is chosen to visualize the distribution of cancelled bookings across the top 10 countries. It effectively shows each country's share of the cancellations made from these ten countries.***

##### 2. What is/are the insight(s) found from the chart?

***--We found that Portugal (PRT) accounts for the largest share of cancelled bookings among the top 10 countries. Countries such as the United Kingdom (GBR) and France (FRA) also appear among the top 10. Note that each percentage is a share of the top 10 total, not of all cancellations.***

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***--From this insight we can see which countries account for the most cancelled bookings and explore the reasons behind cancellations from those countries, which might have a positive impact on the business and reduce the cancellation rate.***
***
***Understanding cancellation patterns by country can help the business tailor its marketing and promotional strategies to different market segments. For example, offering targeted promotions or flexible cancellation policies to customers from countries with higher cancellation rates may help reduce cancellations.***
***
***High cancellation rates from certain countries can have revenue implications. The business may need to adjust pricing strategies or implement measures to reduce cancellations, such as requiring deposits or offering incentives for non-cancellable bookings.***
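The number of cancellations from a country and the cancellation rate of that country are different quantities: a country with many bookings can have the most cancellations while cancelling a smaller share of its bookings than others. The sketch below computes both on a made-up `toy` DataFrame (hypothetical values, not the real results).

```python
import pandas as pd

# Toy stand-in for the bookings data (hypothetical values)
toy = pd.DataFrame({
    "country": ["PRT"] * 5 + ["GBR"] * 4 + ["FRA"] * 3,
    "is_canceled": [1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0],
})

# Number of cancelled bookings per country (top 10 at most)
cancelled = toy[toy["is_canceled"] == 1]
top_cancel_counts = cancelled["country"].value_counts().head(10)

# Share of each country's bookings that were cancelled, highest first
cancel_rate = (
    toy.groupby("country")["is_canceled"].mean().sort_values(ascending=False)
)

print(top_cancel_counts)
print(cancel_rate)
```

In the toy data PRT has the most cancellations, but FRA has the highest cancellation rate, so the two rankings should be reported separately.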


# (Chart - 11) Let's check which type of customer visits the most?

In [None]:
# Chart - 11 visualization code
sns.countplot(x=newdf['customer_type'],palette=['teal','yellow','coral','olive'])
plt.title("Type of Customers")
plt.show()


##### 1. Why did you pick the specific chart?

***--This chart visualizes the frequency of the different types of customers visiting the hotel. It shows the frequency of each customer type as bars along the x axis.***

##### 2. What is/are the insight(s) found from the chart?

***--We observe that most customers are of the transient type, followed by transient-party and contract, with group being the least common.***
***
***Transient Customers: Transient customers, who book individually rather than as part of a group or contract, are the most common customer type. This suggests that the hotel caters primarily to individual, typically short-stay guests, which could have implications for pricing, room turnover, and service offerings tailored to this customer segment.***
***
***Transient Party: The transient-party segment, i.e. transient bookings that are associated with at least one other transient booking (for example, people travelling together on separate bookings), is the second most common customer type. Understanding the needs and preferences of this segment can help the hotel tailor its offerings and services for guests travelling together.***
***
***Contract and Group: The lower frequency of contract and group visits suggests that these segments may not be as prominent in the hotel's customer base. The business may consider strategies to attract more contract and group bookings, such as offering group discounts or packages tailored to these segments.***
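
To put numbers behind the bar heights, the share of each customer type can be computed directly. This is a minimal sketch on made-up data, assuming a `customer_type` column as used in the chart; the `customers` frame is a hypothetical stand-in for `newdf`.

```python
import pandas as pd

# Toy stand-in for newdf['customer_type'] (counts are made up for illustration).
customers = pd.DataFrame({
    "customer_type": ["Transient"] * 6
                     + ["Transient-Party"] * 2
                     + ["Contract", "Group"],
})

# normalize=True returns proportions; multiply by 100 to express them as percentages.
share = (
    customers["customer_type"]
    .value_counts(normalize=True)
    .mul(100)
    .round(1)
)
print(share)
```

Reporting percentages alongside the count plot makes it easier to compare segments across datasets of different sizes.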


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***--Since transient customers may not have a long-standing relationship with the business, there may be a need to focus on quick, efficient, and convenient services that cater to their needs. This could include streamlined booking processes, fast check-ins, and responsive customer service.***

***Understanding the distribution of customer types can also help the business optimize its revenue management, for example by offering dynamic pricing for transient customers based on demand patterns or by adjusting room availability for transient-party bookings during peak event seasons.***

# (Chart - 12) Let's see which distribution channel is used most

In [None]:
# Chart - 12 visualization code
sns.countplot(x=newdf['distribution_channel'],hue=newdf['arrival_date_year'])
plt.title("Distribution of channel per year")
plt.show()

##### 1. Why did you pick the specific chart?

***--Count plots provide a clear visual representation of the distribution of bookings across different channels. Each bar represents the count of bookings for a specific channel, so by comparing bar heights viewers can quickly identify which channels have the highest and lowest booking counts.***

***--A count plot, which is essentially a histogram showing the counts of observations in each categorical bin, is a good option for visualizing the distribution of hotel bookings across different channels.***

##### 2. What is/are the insight(s) found from the chart?

***--Most bookings come from travel agencies and tour operators (TA/TO), followed by the direct and corporate channels. TA/TO bookings peaked in 2016. The Global Distribution System (GDS) accounts for the smallest share of total bookings.***
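
The counts behind the grouped bars can also be read from a cross-tabulation of channel against year. The following is a small sketch on made-up rows, assuming the `distribution_channel` and `arrival_date_year` columns used in the chart; the `bookings` frame is illustrative only.

```python
import pandas as pd

# Toy stand-in for the cleaned bookings frame (rows are made up for illustration).
bookings = pd.DataFrame({
    "distribution_channel": ["TA/TO", "TA/TO", "TA/TO", "Direct",
                             "Direct", "Corporate", "GDS", "TA/TO"],
    "arrival_date_year": [2015, 2016, 2016, 2016, 2017, 2017, 2016, 2017],
})

# Rows are channels, columns are years; margins adds row and column totals.
channel_by_year = pd.crosstab(
    bookings["distribution_channel"],
    bookings["arrival_date_year"],
    margins=True,
    margins_name="Total",
)
print(channel_by_year)
```

The table gives the exact number per channel and year, which is hard to read off a count plot when some channels are much smaller than others.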

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***--This insight can be valuable for marketing strategies and partnership decisions. It indicates the effectiveness of partnerships with travel agencies and tour operators: if a large portion of bookings comes from this channel, it may signify strong relationships or successful collaborations with these entities. Understanding which channels contribute the most bookings can help in allocating resources effectively and optimizing marketing budgets.***

# (Chart - 13) Which type of hotel experiences more booking changes?

In [None]:
# Chart - 13 visualization code
# Put the hotel type on the x axis; each bar shows the mean number of booking changes
sns.barplot(x=newdf['hotel'],y=newdf['booking_changes'],palette=['orange','seagreen'])
plt.title("Changes in hotel bookings")
plt.show()

##### 1. Why did you pick the specific chart?

***--This chart allows a clear comparison of booking changes between the city hotel and the resort hotel. The taller bar indicates the hotel with more booking changes per booking on average. Bar charts are intuitive and familiar to most people, making them easy to interpret without requiring extensive explanation or additional context.***

##### 2. What is/are the insight(s) found from the chart?

***--We can clearly see that, on average, more booking changes were made at the resort hotel than at the city hotel.***
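
A seaborn bar plot shows the mean of the plotted variable by default, so the comparison above is about average changes per booking. The sketch below, on made-up values, computes that mean together with the total and the share of bookings that were changed at least once. It assumes `hotel` and `booking_changes` columns; the `bookings` frame and the output column names are illustrative.

```python
import pandas as pd

# Toy stand-in for the cleaned bookings frame (values are made up for illustration).
bookings = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "booking_changes": [0, 0, 1, 1, 0, 1, 1, 2],
})

changes = bookings.groupby("hotel")["booking_changes"].agg(
    mean_changes="mean",                       # what the bar height represents
    total_changes="sum",                       # total number of changes
    share_changed=lambda s: (s > 0).mean(),    # fraction of bookings changed at least once
)
print(changes)
```

Looking at all three figures helps distinguish a hotel where a few bookings are changed many times from one where many bookings are changed once.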

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***--These insights may be helpful for marketing strategies and budget planning, since financial plans and budgets can be updated to reflect the changes made. Management may conduct a detailed analysis to understand why the resort hotel experienced more changes and how these changes have affected key performance indicators such as occupancy rates, revenue, and customer satisfaction. Engaging with customers and gathering their feedback would also help the business better understand their preferences and expectations.***

# (Chart - 14) - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
continous_data=newdf.select_dtypes(include='number',exclude='object')
cd=continous_data.columns
cd

In [None]:
plt.figure(figsize=(10,10))
sns.heatmap(continous_data.corr(),annot=True,fmt='.2f',cmap='coolwarm')
plt.title("Correlation Between Data")
plt.show()

##### 1. Why did you pick the specific chart?

***--A heatmap is well suited to showing the correlation between the numerical features of the dataset.***

***It represents the relationship between variables through colour.***

***The chart displays the data as a grid of coloured cells, where each cell represents the correlation between two variables, ranging from -1 to 1.***


##### 2. What is/are the insight(s) found from the chart?

***--A positive value indicates a positive correlation: the two variables tend to increase or decrease together. A negative value indicates the opposite: one variable tends to increase as the other decreases.***

***A value closer to 1 indicates a stronger positive correlation.***

***A value closer to -1 indicates a stronger negative correlation.***

***A value of zero indicates no linear correlation.***
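
With many numeric columns, the strongest relationships are easier to find in a ranked list than in the heatmap grid. The sketch below ranks variable pairs by absolute correlation. It runs on a small made-up frame whose column names merely resemble those in the dataset; `df` and `pairs` are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy data: adr falls exactly as lead_time rises; booking_changes is unrelated to both.
df = pd.DataFrame({
    "lead_time": [10, 20, 30, 40, 50],
    "adr": [100, 90, 80, 70, 60],
    "booking_changes": [0, 1, 0, 1, 0],
    "label": list("abcde"),  # non-numeric column, excluded below
})

corr = df.select_dtypes(include="number").corr()

# Keep only the upper triangle so each pair appears once and the diagonal is dropped.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = (
    corr.where(upper)
    .stack()
    .dropna()
    .sort_values(key=abs, ascending=False)  # strongest first, regardless of sign
)
print(pairs)
```

Sorting by absolute value keeps strong negative correlations at the top of the list alongside strong positive ones.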


# (Chart - 15)  Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(newdf)
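# pairplot builds a grid of axes, so title the whole figure rather than the last subplot
plt.suptitle("Pair Plot of the Hotel Bookings", y=1.02)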
plt.show()

##### 1. Why did you pick the specific chart?

***--A pair plot gives an overview of the pairwise relationships between the variables of the dataset. The relationships are shown as individual plots arranged in a grid.***

##### 2. What is/are the insight(s) found from the chart?

***--The plot shows a scatterplot for each pair of numerical variables in the dataset. Each scatterplot gives insight into the relationship between two variables.***

***The diagonal of a pair plot is where each variable is compared to itself, so instead of showing a scatterplot (which wouldn't make sense for a variable compared to itself), a histogram is used to visualize the distribution of that variable. This allows you to see the spread and shape of each variable's distribution, which can be useful for understanding the underlying data and identifying potential issues like skewness, multimodality, or outliers.***
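
Because a pair plot draws one panel per pair of variables, plotting every numeric column of a large frame can be slow and hard to read. One possible approach, sketched here under assumptions, is to pick a few columns of interest and sample the rows first. The helper `pairplot_subset`, its defaults, and the toy `bookings` frame are hypothetical and not part of the notebook.

```python
import pandas as pd

def pairplot_subset(df, columns, max_rows=2000, seed=0):
    """Select a few columns and at most max_rows rows for a readable pair plot."""
    subset = df[columns].dropna()
    if len(subset) > max_rows:
        # A fixed seed keeps the sample, and therefore the plot, reproducible.
        subset = subset.sample(n=max_rows, random_state=seed)
    return subset

# Toy stand-in for the cleaned bookings frame (values are made up for illustration).
bookings = pd.DataFrame({
    "lead_time": range(5000),
    "adr": [50 + (i % 100) for i in range(5000)],
    "total_of_special_requests": [i % 4 for i in range(5000)],
    "hotel": ["City Hotel", "Resort Hotel"] * 2500,
})

subset = pairplot_subset(
    bookings, ["lead_time", "adr", "total_of_special_requests"]
)
print(subset.shape)
```

The resulting frame can then be passed to `sns.pairplot` in place of the full dataset.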

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain Briefly.

**1. Average Lead Time: Since resort hotels have the highest average lead time and it is increasing each year, the client could focus on marketing strategies that target early planners, such as offering early booking discounts or promoting the benefits of booking in advance.**

**2. Hotel Type Preference: Given that city hotels are preferred over resort hotels, the client could consider investing more in city hotel amenities and services or expanding their city hotel offerings.**

**3. Booking Cancellation Proportions: Since the majority of bookings are not canceled, the client could emphasize this reliability factor in their marketing to attract more customers. They could also consider implementing flexible cancellation policies to appeal to more cautious bookers.**

**4. Cancellation Rates in City Hotel and Resort: Understanding the cancellation rates in different types of hotels can help the client tailor their booking policies. They could potentially adjust cancellation fees or offer incentives to reduce cancellations.**

**5. Most Popular Market Segment (Aviation): Knowing that aviation is the most popular market segment could lead the client to explore partnerships or promotions targeting travelers who frequently use air travel.**

**6. Preferred Booking Months: Since July and August are the most preferred months for booking, the client could focus their marketing efforts and offers during these periods. For January, which is the least preferred month, they could consider running promotions to attract more bookings during this slower period.**

**7. Country with Maximum Cancellations (Portugal): For bookings from Portugal, the client could investigate the reasons behind the high number of cancellations and potentially tailor their offerings or marketing strategies for this market to reduce cancellations.**

**8. Type of Customer Visit (Transient): Since transient customers are the most common customer type, the client could focus on offering promotions or loyalty programs to encourage repeat visits and enhance customer loyalty among this segment.**

**9. Distribution Channel Preferences (Travel Agencies/Operators): Given that travel agencies and tour operators are the most used distribution channel, the client could strengthen partnerships with these channels, offer them exclusive deals, or invest more in marketing through these channels to attract more customers.**

**10. Hotel Changes (Resort vs. City): Since resort hotels have more booking changes than city hotels, the client could investigate the reasons behind these changes and work on strategies to reduce them. This could include improving reservation management systems, enhancing staff training, or offering more flexible booking options to reduce the need for changes.**
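
The seasonal recommendation in point 6 depends on reading monthly bookings in calendar order rather than alphabetical order. A minimal sketch of that step follows, assuming an `arrival_date_month` column that holds English month names; the toy `bookings` frame is made up for illustration.

```python
import calendar
import pandas as pd

# Toy stand-in for the cleaned bookings frame (rows are made up for illustration).
bookings = pd.DataFrame({
    "arrival_date_month": ["August", "July", "January", "August",
                           "July", "August", "March"],
})

# calendar.month_name[0] is an empty string, so skip it to get January..December.
month_order = list(calendar.month_name)[1:]

# Count bookings per month, then reorder by calendar month; missing months become 0.
monthly = (
    bookings["arrival_date_month"]
    .value_counts()
    .reindex(month_order, fill_value=0)
)
print(monthly)
```

With the months in calendar order, peak and off-peak periods can be identified at a glance when planning promotions.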






# **Conclusion**

**Booking trends**: The analysis shows that the number of bookings fluctuates throughout the year, peaking in July and August. This indicates seasonal variation in demand, which hotels can leverage to optimize pricing and marketing strategies.

**Customer behavior**: The analysis reveals that the majority of bookings come through travel agents and tour operators (TA/TO). This channel is important for driving bookings, so hotels should maintain strong partnerships with these agents. Because transient customers are the most common customer type, quick and efficient service is also recommended to build a strong relationship with them.

**Cancellation rate**: The analysis shows that most cancellations come from Portugal, Great Britain and France, so the hotels need to identify the causes and respond with more promotions, adjusted pricing strategies or other measures that reduce the cancellation rate.

**Preferences**: The analysis also reveals customer preferences in hotel type and room type. Services should be designed with these preferences in mind to satisfy customers and earn positive feedback, which in turn can be used to refine business strategy.


