<a href="https://colab.research.google.com/github/Yash0221/Exploratory-Data-Analysis---Hotel-Booking-Analysis/blob/main/EDA_Project_Hotel_Booking_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -



##### **Project Type**    - Exploratory Data Analysis - Hotel Booking Analysis
##### **Contribution**    - Individual
##### **Team Member 1 -** Yash Dangayach

# **Project Summary -**

This project analyses real-world hotel reservation records for a city hotel and a resort hotel, covering bookings, cancellations, guest information, and related fields. The records span the years 2015 through 2017. The primary objective is to understand and present the dataset from the perspective of both the hotel and the customer, i.e.,

*   The causes of cancellations of reservations across different parameters
*   The ideal time to reserve a lodging
*   High season

and offer recommendations on how to lower these cancellations and boost hotel profitability.



# **GitHub Link -**

GitHub Link - [EDA Hotel Booking Analysis GitHub Link](https://github.com/Yash0221/Exploratory-Data-Analysis---Hotel-Booking-Analysis)

# **Problem Statement**


In this project, the objective is to analyze a Hotel Booking dataset comprising information from both city hotels and resort hotels. The dataset includes details such as booking time, length of stay, number of adults, children, and/or babies, as well as information about available parking spaces, among other factors. The goal is to explore and analyze the data to identify crucial factors that influence hotel bookings. Key questions to address include determining the optimal time of year to book a hotel room, identifying the ideal length of stay for obtaining the best daily rate, and predicting whether a hotel is likely to receive an unusually high number of special requests. The analysis aims to uncover insights into the significant factors governing hotel bookings and their attributes, facilitating better understanding and decision-making in the hospitality industry.

#### **Define Your Business Objective?**

Analyse the hotel booking data and identify the factors that affect bookings.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns
import ast
from datetime import datetime
from datetime import date
import math  # numpy.math was removed in NumPy 2.0; the standard library module is equivalent
from numpy import loadtxt
from matplotlib import rcParams

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

In [None]:
#Hotel booking dataset read using pd.read_csv
dataset = '/content/Hotel Bookings.csv'
df = pd.read_csv(dataset)

### Dataset First View

In [None]:
# Dataset First Look
df.head(5)

In [None]:
df.tail(5)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count: drop_duplicates removes the duplicate rows in place.
df.drop_duplicates(inplace = True)

# total rows = 119390, Duplicate Rows = 31994
uni_num_of_rows = df.shape[0]

uni_num_of_rows # now unique rows = 87396

In [None]:
df = df.reset_index(drop=True)  # renumber the rows 0..n-1 after dropping duplicates
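
The duplicate count quoted in the comments above can be measured before anything is dropped. A minimal sketch on a toy frame; `sample` and its values are illustrative assumptions, not rows from the hotel dataset:

```python
import pandas as pd

# Toy data standing in for the bookings table (hypothetical values).
sample = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "lead_time": [10, 10, 45, 7],
})

total_rows = len(sample)
# duplicated() flags every row that repeats an earlier row exactly.
duplicate_rows = int(sample.duplicated().sum())
deduped = sample.drop_duplicates().reset_index(drop=True)

print(total_rows, duplicate_rows, len(deduped))
```

Counting first keeps the three figures (total, duplicate, unique) checkable against each other: total minus duplicates should equal the unique row count.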

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
null_value = df.isnull()  # boolean mask: True wherever a value is missing

null_value.sum().sum() # total number of missing cells; pandas already stores them as NaN

In [None]:
# Count the missing values in each column, sorted from most to fewest
miss_value = df.isnull().sum().sort_values(ascending = False)
miss_value # count of null values in each column
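
Raw counts are hard to judge without the row count next to them, so it can help to express missing values as a percentage per column. A minimal sketch, assuming a toy frame `sample` with made-up values:

```python
import numpy as np
import pandas as pd

# Toy data with deliberately missing entries (hypothetical values).
sample = pd.DataFrame({
    "company": [np.nan, np.nan, np.nan, 40.0],
    "agent": [9.0, np.nan, 14.0, 9.0],
    "country": ["PRT", "GBR", "PRT", "FRA"],
})

# isnull().mean() gives the fraction of missing entries in each column.
missing_pct = (sample.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```

A column that is mostly empty is a candidate for dropping, while a column with a small share of gaps is a candidate for filling.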

### What did you know about your dataset?


This dataset is a single file that compares booking information for two hotels: a city hotel and a resort hotel. It includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. The dataset contains a total of 119390 rows and 32 columns. Of these rows, 31994 are duplicates, which were removed above, leaving 87396 unique rows. The columns hold integer, float, and string (object) data; some columns have a data type that does not match their content, and these are corrected during data wrangling. Listing the unique values of every column shows which values each column actually takes.



## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df_column = df.columns
df_column

In [None]:
# Dataset Describe
df.describe()

### Variables Description

## Description of individual Variable



  
**The columns and the data it represents are listed below:**

1. **hotel :** Name of the hotel (Resort Hotel or City Hotel)

2. **is_canceled :** If the booking was canceled (1) or not (0)

3. **lead_time :** Number of days between the booking date and the arrival date

4. **arrival_date_year :** Year of arrival date

5. **arrival_date_month :** Month of arrival date

6. **arrival_date_week_number :** Week number of year for arrival date

7. **arrival_date_day_of_month :** Day of arrival date

8. **stays_in_weekend_nights :** Number of weekend nights (Saturday or Sunday) spent at the hotel by the guests.

9. **stays_in_week_nights :** Number of weeknights (Monday to Friday) spent at the hotel by the guests.

10. **adults :** Number of adults among guests

11. **children :** Number of children among guests

12. **babies :** Number of babies among guests

13. **meal :** Type of meal booked

14. **country :** Country of guests

15. **market_segment :** Designation of market segment

16. **distribution_channel :** Name of booking distribution channel

17. **is_repeated_guest :** If the booking was from a repeated guest (1) or not (0)

18. **previous_cancellations :** Number of previous bookings that were cancelled by the customer prior to the current booking

19. **previous_bookings_not_canceled :** Number of previous bookings not cancelled by the customer prior to the current booking

20. **reserved_room_type :** Code of room type reserved

21. **assigned_room_type :** Code of room type assigned

22. **booking_changes :** Number of changes/amendments made to the booking

23. **deposit_type :** Type of the deposit made by the guest

24. **agent :** ID of travel agent who made the booking

25. **company :** ID of the company that made the booking

26. **days_in_waiting_list :** Number of days the booking was in the waiting list

27. **customer_type :** Type of customer, assuming one of four categories

28. **adr :** Average Daily Rate, as defined by dividing the sum of all lodging transactions by the total number of staying nights

29. **required_car_parking_spaces :** Number of car parking spaces required by the customer

30. **total_of_special_requests :** Number of special requests made by the customer

31. **reservation_status :** Reservation status (Canceled, Check-Out or No-Show)

32. **reservation_status_date :** Date at which the last reservation status was updated

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
print(df.apply(lambda col: col.unique()))

## 3. ***Data Wrangling***

### Data Wrangling Code

# Data Cleaning

In [None]:
#to fill the NAN value in the column, let's check which column has null value, already stored the same
miss_value[:4]

In [None]:
# Check the percentage of null values in each column, starting with company
percentage_company_null = miss_value['company'] / uni_num_of_rows*100
percentage_company_null

In [None]:
# It is better to drop the column "company" altogether, since the number of missing values is extremely high compared to the number of rows
df.drop(['company'], axis =1 , inplace= True)

In [None]:
# Now let's check for agent
percentage_agent_null = miss_value['agent'] / uni_num_of_rows*100
percentage_agent_null

In [None]:
# agent has relatively few null values, so we fill them with 0 (treated here as "no agent")

df['agent'] = df['agent'].fillna(value = 0)
df['agent'].isnull().sum() # re-check that the column has no null values

In [None]:
# Check the percentage of null values in the country column

percentage_country_null = miss_value['country'] / uni_num_of_rows *100
percentage_country_null

In [None]:
# The country col has few null values, so we replace them with 'others' as the country name.

df['country'] = df['country'].fillna(value = 'others')
df['country'].isnull().sum() # re-check that the column has no null values

In [None]:
# Check the percentage of null values in the children col

percentage_children_null = miss_value['children'] / uni_num_of_rows*100
percentage_children_null

In [None]:
# The children col has few null values, so we replace them with 0.

df['children'] = df['children'].fillna(value = 0)
df['children'].isnull().sum() # re-check that the column has no null values

In [None]:
#let's check whether database having any other null value

df.isnull().sum() # As we have seen, no column has any null value
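
The cleaning steps above (drop `company`, fill `agent`, `country`, and `children`) can also be written as one pass, which keeps the fill rules in a single place. A minimal sketch on a toy frame; `sample`, `fill_values`, and `cleaned` are illustrative names, and the fill values mirror the ones chosen in the cells above:

```python
import numpy as np
import pandas as pd

# Toy data with the same kinds of gaps as the booking columns (hypothetical values).
sample = pd.DataFrame({
    "agent": [9.0, np.nan],
    "country": ["PRT", None],
    "children": [np.nan, 2.0],
    "company": [np.nan, np.nan],
})

# One fill value per column; DataFrame.fillna accepts a column -> value mapping.
fill_values = {"agent": 0, "country": "others", "children": 0}
cleaned = sample.drop(columns=["company"]).fillna(value=fill_values)

print(cleaned.isnull().sum().sum())
```

Because nothing is modified in place, the original frame stays available for comparison.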

# Change in datatype for required columns

In [None]:
#showing the info of the data to check datatype
df.info()

In [None]:
# The children & agent columns have datatype float although they contain only integer values, so let's change the datatype to 'int64'
df[['children', 'agent']] = df[['children', 'agent']].astype('int64')
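
Casting float to integer silently truncates any fractional part, so it may be worth confirming that the values are whole numbers first. A minimal sketch, assuming a toy frame `sample` with made-up values:

```python
import pandas as pd

# Toy float columns that should hold whole numbers (hypothetical values).
sample = pd.DataFrame({"children": [0.0, 2.0, 1.0], "agent": [9.0, 0.0, 240.0]})

for column in ["children", "agent"]:
    # A float column is safe to cast when every value equals its rounded self.
    is_whole = (sample[column] == sample[column].round()).all()
    if not is_whole:
        raise ValueError(f"{column} holds fractional values")

sample = sample.astype({"children": "int64", "agent": "int64"})
print(sample.dtypes)
```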

# Addition of new column as per requirement

In [None]:
#total stay in nights
df['total_stay_in_nights'] = df['stays_in_week_nights'] + df['stays_in_weekend_nights']
df['total_stay_in_nights'] # total stay in nights = week nights + weekend nights

In [None]:
# We have created a col for revenue using total stay * adr
df['revenue'] = df['total_stay_in_nights'] *df['adr']
df['revenue']

In [None]:
# Also, for information, we will add a column with total guest coming for each booking
df['total_guest'] = df['adults'] + df['children'] + df['babies']
df['total_guest'].sum()

In [None]:
# For readability, replace the values (0, 1) in the 'is_canceled' col with 'not canceled' and 'is canceled'.

df['is_canceled'] = df['is_canceled'].replace([0,1], ['not canceled', 'is canceled'])
df['is_canceled']

In [None]:
#Same for 'is_repeated_guest' col
df['is_repeated_guest'] = df['is_repeated_guest'].replace([0,1], ['not repeated', 'repeated'])
df['is_repeated_guest']

In [None]:
#Now, we will check overall revenue hotel wise
hotel_wise_total_revenue = df.groupby('hotel')['revenue'].sum()
hotel_wise_total_revenue

In [None]:
df[['hotel', "revenue"]]

### What all manipulations have you done and insights you found?

**We have done a few manipulations on the data.**

**----Addition of columns----**

A few columns needed for the analysis are not in the data but can be derived from the given columns.

a) **Total Guests:** This column helps us evaluate the total guest volume. We get this value by adding the number of adults, children, and babies.

b) **Revenue:** We compute revenue by multiplying adr by the total stay in nights (week nights plus weekend nights). This column is used to analyse the revenue and growth of each hotel.
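
The derived columns can be checked by hand on a couple of rows. A minimal sketch, assuming a toy frame `sample` with made-up values; the column names are the ones used in the notebook:

```python
import pandas as pd

# Two hypothetical bookings.
sample = pd.DataFrame({
    "stays_in_week_nights": [2, 3],
    "stays_in_weekend_nights": [1, 0],
    "adr": [100.0, 80.0],
    "adults": [2, 1],
    "children": [1, 0],
    "babies": [0, 0],
})

# Total nights = week nights + weekend nights.
sample["total_stay_in_nights"] = (
    sample["stays_in_week_nights"] + sample["stays_in_weekend_nights"]
)
# Revenue = nights stayed x average daily rate.
sample["revenue"] = sample["total_stay_in_nights"] * sample["adr"]
# Guests = adults + children + babies.
sample["total_guest"] = sample[["adults", "children", "babies"]].sum(axis=1)

print(sample[["total_stay_in_nights", "revenue", "total_guest"]])
```

For the first booking this gives 3 nights, a revenue of 3 x 100 = 300, and 3 guests.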



**----Deletion of columns----**

a) **company:** This column is almost entirely null, so we deleted it, as it would not have any impact on the analysis.



**----Replacement of values in columns----**

a) **is_canceled & is_repeated_guest:** These columns contain only 0 and 1 as values. In 'is_canceled', which represents the cancellation status of the booking, we replaced 0 and 1 with 'not canceled' and 'is canceled'. In the same way, in 'is_repeated_guest' we replaced 0 and 1 with 'not repeated' and 'repeated'. These labels make the visualizations easier to understand.


**----Changes in data type of values in columns----**

a) **agent & children:** These columns contained float values, which does not make sense because they represent the ID of the agent and the count of guests. So we changed the data type of these columns from float to integer.


**----Removed null values & duplicate entries----**

a) Before visualizing any data from the dataset, we have to do data wrangling. For that, we checked the null values in all columns. A column with a very high number of null values was dropped using the 'drop' method; this is how we dropped the 'company' column. Where a column had only a small number of null values, we filled them with suitable values using .fillna().

b) In the same way, we checked for duplicates and found that a number of rows were duplicated. We removed those rows from the dataset using the .drop_duplicates() method.



**In this way, we removed unnecessary data and made our data clean and ready to analyse.**



## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Helper that counts how often each value occurs in a column.
def get_count_from_column_bar(df, column_label):
  df_grpd = df[column_label].value_counts()
  df_grpd = pd.DataFrame({'index':df_grpd.index, 'count':df_grpd.values})
  return df_grpd


# Plot a bar chart of the value counts of a column.
def plot_bar_chart_from_column(df, column_label, t1):
  df_grpd = get_count_from_column_bar(df, column_label)
  fig, ax = plt.subplots(figsize=(14, 6))
  c= ['g','r','b','c','y']  # bar colours
  ax.bar(df_grpd['index'], df_grpd['count'], width = 0.4, align = 'edge', edgecolor = 'black', linewidth = 4, color = c, linestyle = ':', alpha = 0.5)
  plt.title(t1, bbox={'facecolor':'0.8', 'pad':3})
  plt.ylabel('Count')
  plt.xticks(rotation = 15) # rotate the x-axis labels for readability
  plt.xlabel(column_label)
  plt.show()

In [None]:
# Chart - 1 visualization code

def get_count_from_column(df, column_label):
  df_grpd = df[column_label].value_counts()
  df_grpd = pd.DataFrame({'index':df_grpd.index, 'count':df_grpd.values})
  return df_grpd

# plot a pie chart from grouped data
def plot_pie_chart_from_column(df, column_label, t1, exp):
  df_grpd = get_count_from_column(df, column_label)
  fig, ax = plt.subplots(figsize=(14,9))
  ax.pie(df_grpd.loc[:, 'count'], labels=df_grpd.loc[:, 'index'], autopct='%1.2f%%',startangle=90,shadow=True, labeldistance = 1, explode = exp)
  plt.title(t1, bbox={'facecolor':'0.8', 'pad':3})
  ax.axis('equal')
  plt.legend()
  plt.show()

In [None]:
exp1 = [0.05,0.05]
plot_pie_chart_from_column(df, 'hotel', 'Booking percentage of Hotel by Name', exp1)


##### 1. Why did you pick the specific chart?




***To show which hotel received more bookings.***

##### 2. What is/are the insight(s) found from the chart?

***Here, we found that City Hotel has more bookings (61.12%) than Resort Hotel (38.87%). Hence, we can say that City Hotel is in higher demand.***
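
The percentages printed on a pie chart can also be computed directly, which makes them easy to quote and verify. A minimal sketch, assuming a toy series `hotels` with made-up values:

```python
import pandas as pd

# Toy hotel column: 3 city bookings and 2 resort bookings (hypothetical values).
hotels = pd.Series(["City Hotel"] * 3 + ["Resort Hotel"] * 2, name="hotel")

# normalize=True returns each value's share of the total instead of a raw count.
share_pct = hotels.value_counts(normalize=True).mul(100).round(2)
print(share_pct)
```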

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes, this data can have a positive business impact for both hotels:**

**City Hotel: provide more services to attract more guests and increase revenue.**

**Resort Hotel: find ways to attract guests, and study what City Hotel does to attract them.**

#### Chart - 2

In [None]:
# Chart - 2 visualization code
exp4 = [0,0.2]
plot_pie_chart_from_column(df, 'is_canceled', 'Cancellation volume of Hotel', exp4)

##### 1. Why did you pick the specific chart?

**This chart presents the cancellation rate of the hotel bookings.**

##### 2. What is/are the insight(s) found from the chart?

**Here, we found that overall more than 27% of bookings were cancelled.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Here, we can see that more than 27% of bookings are cancelled.**


**Solution: Investigate the reasons for cancellations and address them at the business level.**
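
One way to start looking for reasons is to break the cancellation rate down by another column. A minimal sketch, assuming a toy frame `sample` with made-up values and the 'is canceled' / 'not canceled' labels used in this notebook:

```python
import pandas as pd

# Four hypothetical bookings.
sample = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "is_canceled": ["is canceled", "not canceled", "not canceled", "not canceled"],
})

# Boolean flag: True for cancelled bookings; its mean is the cancellation rate.
canceled = sample["is_canceled"].eq("is canceled")
overall_rate = canceled.mean() * 100
rate_by_hotel = canceled.groupby(sample["hotel"]).mean().mul(100)

print(overall_rate)
print(rate_by_hotel)
```

The same grouping works for any categorical column, for example `deposit_type` or `market_segment`.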


#### Chart - 3

In [None]:
# Chart - 3 visualization code
plot_bar_chart_from_column(df, 'distribution_channel', 'Distribution Channel Volume')

##### 1. Why did you pick the specific chart?

**This chart shows which channel accounts for the largest volume of bookings. We chose a bar chart to present the counts in descending order.**

##### 2. What is/are the insight(s) found from the chart?

**TA/TO (Travel Agents / Tour Operators) is clearly the highest, so we recommend continuing to take bookings through TA/TO.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes, this shows a positive business impact.**

**A higher number of TA/TO bookings will help increase the hotel's revenue.**

#### Chart - 4

In [None]:
# Chart - 4 visualization code
exp2 = [0.2, 0,0,0,0,0,0,0,0,0,0,0.1]
plot_pie_chart_from_column(df, 'arrival_date_month', 'Month-wise booking', exp2)

##### 1. Why did you pick the specific chart?

**To show each month's percentage share of overall bookings.**

##### 2. What is/are the insight(s) found from the chart?

**The percentages show that May, July, and August are the months with the most bookings, likely due to the holiday season. We recommend aggressive advertising to attract more customers.**
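
Month-wise results are easier to compare in calendar order than in frequency order. A minimal sketch, assuming a toy series `arrivals` with made-up values:

```python
import pandas as pd

month_order = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]

# Toy arrival months (hypothetical values).
arrivals = pd.Series(["August", "May", "August", "July", "May", "August"])

# reindex puts the counts in calendar order; months with no bookings become 0.
counts = arrivals.value_counts().reindex(month_order, fill_value=0)
busiest = counts.nlargest(3).index.tolist()

print(counts)
print(busiest)
```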

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes, a higher volume of visitors will help the hotel manage revenue during downtime and will also help employee satisfaction and retention.**

#### Chart - 5

In [None]:
# Chart - 5 visualization code
exp3 = [0,0.3]
plot_pie_chart_from_column(df, 'is_repeated_guest', 'Guest repeating status', exp3)


##### 1. Why did you pick the specific chart?

**To show the percentage share of repeated & non-repeated guests.**

##### 2. What is/are the insight(s) found from the chart?

**Here, we can see that the number of repeated guests is very low compared to the total number of guests.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**We can give attractive offers to non-repeat customers during off seasons to enhance revenue.**

#### Chart - 6

In [None]:
# Chart - 6 visualization code
plot_bar_chart_from_column(df, 'assigned_room_type', 'Assignment of room by type')

##### 1. Why did you pick the specific chart?

**To show the distribution, by volume, of the room types allotted.**

##### 2. What is/are the insight(s) found from the chart?

**This chart shows that room type 'A' is the most preferred by guests.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes, there is a positive impact: room types 'A', 'D', and 'E' are the most preferred by guests, possibly due to the better services offered in these room types.**

#### Chart - 7

In [None]:
guest_month_wise = pd.DataFrame(df[['arrival_date_month', 'total_guest']])
guest_month_wise_df = guest_month_wise.groupby(['arrival_date_month'])['total_guest'].sum()
guest_month_wise_df.sort_values(ascending = False, inplace = True)

In [None]:
df['total_guest']

In [None]:
# Chart - 7 visualization code
market_segment_df = pd.DataFrame(df['market_segment'])
market_segment_df_data = market_segment_df.groupby('market_segment')['market_segment'].count()  # bookings per segment
market_segment_df_data.sort_values(ascending = False, inplace = True)
plt.figure(figsize=(15,6))
market_segment_df_data.plot(kind = 'bar', color=['g', 'r', 'c', 'b', 'y', 'black', 'brown'], fontsize = 20, legend=True)


##### 1. Why did you pick the specific chart?

**This chart shows the market segments through which bookings were made.**

##### 2. What is/are the insight(s) found from the chart?

**Online TA is the segment guests use most frequently to book a hotel.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes, it creates a positive business impact: guests prefer the Online TA market segment for booking hotels.**

#### Chart - 8

In [None]:
# Chart - 8 visualization code
guest_country_wise = pd.DataFrame(df[['country', 'total_guest']])
guest_country_wise_df = guest_country_wise.groupby(['country'])['total_guest'].sum()
guest_country_wise_df.sort_values(ascending = False, inplace = True)
top_10_country_by_guest = guest_country_wise_df.head(10)

In [None]:
# Bar chart of the top 10 countries by total number of guests

plt.figure(figsize=(12, 6))
sns.barplot(x=top_10_country_by_guest.index, y=top_10_country_by_guest)
plt.title('Top 10 Countries by Guest')
plt.xlabel('Country Code')
plt.ylabel('Number of Guests')

# Adding country code explanations
country_codes = {
    'PRT': 'Portugal',
    'GBR': 'Great Britain & Northern Ireland',
    'FRA': 'France',
    'ESP': 'Spain',
    'DEU': 'Germany',
    'ITA': 'Italy',
    'IRL': 'Ireland',
    'BRA': 'Brazil',
    'BEL': 'Belgium',
    'NLD': 'Netherlands'
}

# Fall back to the raw country code if it is missing from the lookup above
plt.xticks(ticks=range(len(top_10_country_by_guest)), labels=[country_codes.get(code, code) for code in top_10_country_by_guest.index], rotation=30)

plt.show()


##### 1. Why did you pick the specific chart?

**To see which countries most guests come from.**

***The chart shows the top 10 countries.***

##### 2. What is/are the insight(s) found from the chart?

**As we can see, the largest number of guests comes from Portugal.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**We can do more advertising and provide attractive offers to guests from Portugal to increase customer volume.**

#### Chart - 9

In [None]:
# Chart - 9 visualization code
order = ['January', 'February', 'March', 'April', 'May', 'June',
         'July', 'August', 'September', 'October', 'November', 'December']

# Count the non-canceled bookings per arrival month, in calendar order
ordered_hotel_df = df[df['is_canceled'] == 'not canceled']['arrival_date_month'].value_counts().reindex(order)

plt.subplots(figsize=(10, 6))

plt.plot(ordered_hotel_df.index, ordered_hotel_df.values, linewidth=4, color='r', linestyle='dotted', marker='+', markersize=20, alpha=1)

# Label each point with its booking count
for xy in zip(ordered_hotel_df.index, ordered_hotel_df.values):
    plt.annotate(text="{}".format(xy[1]//1), xy=xy, textcoords='data')

plt.xticks(rotation=60)

plt.xlabel("Months", fontdict={'fontsize': 12, 'fontweight': 5, 'color': 'Brown'})
plt.ylabel("Counts", fontdict={'fontsize': 12, 'fontweight': 5, 'color': 'Brown'})
plt.title("Month-wise Booking", fontdict={'fontsize': 20, 'fontweight': 5, 'color': 'Green'})

plt.show()



In [None]:
plt.figure(figsize = (8,5))
hotel_wise_revenue = df.groupby('hotel')['revenue'].sum()
hotel_wise_revenue
ax = hotel_wise_revenue.plot(kind = 'bar', color = ('b', 'y'))
plt.xlabel("Hotel", fontdict={'fontsize': 12, 'fontweight' : 5, 'color' : 'Brown'})
plt.ylabel("Total Revenue", fontdict={'fontsize': 12, 'fontweight' : 5, 'color' : 'Brown'} )
plt.title("Total Revenue", fontdict={'fontsize': 12, 'fontweight' : 5, 'color' : 'Green'} )

In [None]:
average_adr = df.groupby('hotel')['adr'].mean()
average_adr
plt.subplots(figsize=(8, 5))
average_adr.plot(kind = 'barh', color = ('g', 'r'))
plt.xlabel("Average ADR", fontdict={'fontsize': 12, 'fontweight' : 5, 'color' : 'Brown'})
plt.ylabel("Hotel Name", fontdict={'fontsize': 12, 'fontweight' : 5, 'color' : 'Brown'} )
plt.title("Average ADR of Hotel", fontdict={'fontsize': 12, 'fontweight' : 5, 'color' : 'Green'} )

##### 1. Why did you pick the specific chart?

**To specify the average ADR for both hotels**

##### 2. What is/are the insight(s) found from the chart?

**As we can see, the average ADR of City Hotel is higher than that of Resort Hotel, so City Hotel earns more revenue per night stayed. Profit cannot be read from this chart, since the dataset contains no cost data.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Here, we can do more advertising for City Hotel to attract more customers, which should result in higher profit.**

#### Chart - 10

In [None]:
# Chart - 10 visualization code
plt.figure(figsize = (12,6))
sns.scatterplot(y = 'total_stay_in_nights', x = 'adr', data = df[df['adr'] < 1000])
plt.show()

##### 1. Why did you pick the specific chart?

**To compare total length of stay with ADR and show how one relates to the other.**

##### 2. What is/are the insight(s) found from the chart?

**Here, we found that ADR tends to be higher for shorter stays.**
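
A scatter plot with many overlapping points can be backed up with a simple summary: the mean ADR per stay-length bucket. A minimal sketch, assuming a toy frame `sample` with made-up values; the bucket edges in `bins` are an arbitrary illustrative choice:

```python
import pandas as pd

# Six hypothetical bookings.
sample = pd.DataFrame({
    "total_stay_in_nights": [1, 2, 3, 8, 10, 15],
    "adr": [120.0, 110.0, 100.0, 80.0, 70.0, 60.0],
})

# Buckets are right-inclusive: (0, 3], (3, 7], (7, 30].
bins = [0, 3, 7, 30]
labels = ["1-3", "4-7", "8+"]
sample["stay_bucket"] = pd.cut(sample["total_stay_in_nights"], bins=bins, labels=labels)

# observed=True leaves out buckets that contain no bookings.
mean_adr = sample.groupby("stay_bucket", observed=True)["adr"].mean()
print(mean_adr)
```

If the mean ADR falls from one bucket to the next, that supports the reading of the scatter plot.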

#### Chart - 11

In [None]:
# Chart - 11 visualization code
hotel_wise_meal = df.groupby(['hotel', 'meal'])['meal'].count().unstack()  # bookings per meal type for each hotel
hotel_wise_meal.plot(kind ='bar', figsize = (12,8))  # DataFrame.plot creates its own figure
plt.show()
hotel_wise_meal

##### 1. Why did you pick the specific chart?

**To show the guests' meal preference hotel-wise.**

##### 2. What is/are the insight(s) found from the chart?

**As we can see, the BB (Bed & Breakfast) meal is the most preferred by guests in both hotels. The hotels can offer more delicious dishes in this meal plan to retain customers and attract new ones.**

#### Chart - 12 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
corr_df = df[['lead_time','previous_cancellations', 'previous_bookings_not_canceled', 'total_guest',
                    'booking_changes', 'days_in_waiting_list', 'adr', 'required_car_parking_spaces', 'total_of_special_requests']].corr()
f, ax = plt.subplots(figsize=(12, 12))
sns.heatmap(corr_df, annot = True, fmt='.2f', annot_kws={'size': 10},  vmax=1, square=True, cmap="YlGnBu")

##### 1. Why did you pick the specific chart?

**To understand the relationship between the different numerical variables.**

##### 2. What is/are the insight(s) found from the chart?

**The highest correlation coefficient between any two variables is 0.39 (positive) and the lowest is -0.09 (negative).**
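
The strongest and weakest pairs can be pulled out of a correlation matrix programmatically instead of being read off the heatmap. A minimal sketch on a toy frame; `sample` and its columns `a`, `b`, `c` are made-up placeholders for the numerical booking columns:

```python
import numpy as np
import pandas as pd

# Toy numerical columns (hypothetical values).
sample = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [5, 3, 4, 1, 2],
})

corr = sample.corr()

# Keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values()

print(pairs.idxmax(), pairs.max())
print(pairs.idxmin(), pairs.min())
```

Excluding the diagonal matters because every variable correlates perfectly with itself, which would otherwise always be reported as the maximum.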

## **5. Solution to Business Objective**

**Business objective attained as follows:**

1. For a hotel business to flourish, we need to consider high revenue generation, customer satisfaction, and employee retention.

2. We achieved this by showing the client, through the pie chart distribution, which months are high in revenue generation.

3. The bar chart distributions show which room types are most reserved and which months visitors are most likely to arrive, which helps increase revenue.

4. With this, the client can prepare well in advance so that guests face minimal grievances in the long run, which helps further enhance the client's hospitality.

5. The scatter plot showed patterns such as ADR falling sharply for longer stays, so in the off season the client can engage with offices for bulk bookings, which will also help generate extra revenue.

6. We showed the trend of visitor arrivals at the client's locations, which lets the client engage visitors well in advance for their entertainment and leisure activities.

7. We were also able to correlate the numerical variables and show the maximum and minimum correlation between them, so that the relationships lying between those values can be strengthened through various means.

# **Conclusion**

1. City Hotel seems to be more preferred among travellers, and it also generates more revenue and profit.

2. Most bookings are made in July and August compared to the rest of the months.

3. Room Type A is the most preferred room type among travellers.

4. Most bookings are made from Portugal and Great Britain.

5. Most guests stay for 1-4 days in the hotels.

6. City Hotel retains a larger number of guests.

7. Around one-fourth of the total bookings get cancelled. More cancellations are from City Hotel.

8. New guests tend to cancel bookings more than repeated customers.

9. Lead time, number of days on the waiting list, or assignment of the reserved room to the customer does not affect cancellation of bookings.

10. Corporate has the highest percentage of repeated guests while TA/TO has the lowest, whereas for cancelled bookings TA/TO has the highest percentage while Corporate has the lowest.

11. The length of stay decreases as ADR increases, probably to reduce cost.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***