<a href="https://colab.research.google.com/github/Ruchikkale09/Python-for-data-science/blob/main/EDA_Submission_for_hotal_booking.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Hotel Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary - Hotel Booking Analysis** 

**BUSINESS PROBLEM OVERVIEW**

Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay in order to get the best daily rate? What if you wanted to predict whether or not a hotel was likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore those questions!
This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. All personally identifying information has been removed from the data.
Explore and analyze the data to discover important factors that govern the bookings.

#### **Define Your Business Objective?** 

Explore and analyze the data to discover important factors that govern the bookings.

# **GitHub Link -**

Provide your GitHub Link here.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart addresses the following points.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule. 

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import math  # standard-library math module; the numpy.math alias is not available in recent NumPy releases
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import rcParams
import matplotlib
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
print(matplotlib.__version__)


In [None]:
pip install matplotlib==3.1.3

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')
path = '/content/drive/MyDrive/'


In [None]:
df = pd.read_csv(path+'Hotel Bookings.csv', encoding = 'latin-1')

### Dataset First View

In [None]:
# Dataset First look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(df[df.duplicated()])

There are **31994** duplicate rows in total, which need to be removed.

In [None]:
print("In main df total we have",len(df),"rows")
df = df.drop_duplicates()
print("After removing duplicates we have total",len(df),'rows')

#### Missing Values/Null Values

In [None]:
df['children'].unique()


In [None]:
# Missing Values/Null Values Count
print(df.isnull().sum())

**Replacing missing values in the country column with the string 'No available'**

In [None]:
# Assign the result back instead of using inplace=True on a selected column
df['country'] = df['country'].fillna('No available')

**Filling missing values in the children, agent and company columns with each column's mean**


In [None]:
df['children'] = df['children'].fillna(df['children'].mean())
df['agent'] = df['agent'].fillna(df['agent'].mean())
df['company'] = df['company'].fillna(df['company'].mean())

**After handling missing values, every column should show a null count of 0**

In [None]:
print(df.isnull().sum())

**Some rows have a combined total of adults, children and babies equal to zero, so we remove those rows.**

In [None]:
df[df['adults']+df['babies']+df['children'] == 0].shape

In [None]:
df.drop(df[df['adults']+df['babies']+df['children'] == 0].index, inplace = True)

**Converting columns to appropriate datatypes.**

In [None]:
# Converting datatype of columns 'children', 'company' and 'agent' from float to int.
df[['children', 'company', 'agent']] = df[['children', 'company', 'agent']].astype('int64')

In [None]:
# Changing the datatype of column 'reservation_status_date' to datetime.
df['reservation_status_date'] = pd.to_datetime(df['reservation_status_date'], format = '%Y-%m-%d')

**Adding important columns.**

In [None]:
# Adding total staying days in hotels
df['total_stay'] = df['stays_in_weekend_nights']+df['stays_in_week_nights']

# Adding total people num as column, i.e. total people num = num of adults + children + babies
df['total_people'] = df['adults']+df['children']+df['babies']

In [None]:
# Visualizing the missing values
# Checking Null Value by plotting Heatmap
sns.heatmap(df.isnull(), cbar=False)

### What did you know about your dataset?

This data set contains booking information for a city hotel and a resort hotel and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. All personally identifying information has been removed from the data. We will perform exploratory data analysis with Python to get insights from the data.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description 

* **hotel:** Type of hotel (H1 = Resort Hotel, H2 = City Hotel)
* **is_canceled:** Value indicating if the booking was canceled (1) or not (0)
* **lead_time:** Number of days that elapsed between the entering date of the booking into the PMS and the arrival date
* **arrival_date_year:** Year of arrival date
* **arrival_date_month:** Month of arrival date
* **arrival_date_week_number:** Week number of year for arrival date
* **arrival_date_day_of_month:** Day of arrival date
* **stays_in_weekend_nights:** Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
* **stays_in_week_nights:** Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
* **adults:** Number of adults
* **children:** Number of children
* **babies:** Number of babies
* **meal:** Type of meal booked. Categories are presented in standard hospitality meal packages: Undefined/SC – no meal package; BB – Bed & Breakfast; HB – Half board (breakfast and one other meal – usually dinner); FB – Full board (breakfast, lunch and dinner)
* **country:** Country of origin. Categories are three-letter ISO 3166 country codes (for example PRT, GBR, FRA)
* **market_segment:** Market segment designation. In categories, the term “TA” means “Travel Agents” and “TO” means “Tour Operators”
* **distribution_channel:** Booking distribution channel. The term “TA” means “Travel Agents” and “TO” means “Tour Operators”
* **is_repeated_guest:** Value indicating if the booking was from a repeated guest (1) or not (0)
* **previous_cancellations:** Number of previous bookings that were cancelled by the customer prior to the current booking
* **previous_bookings_not_canceled:** Number of previous bookings not cancelled by the customer prior to the current booking
* **reserved_room_type:** Type of room reserved
* **assigned_room_type:** Type of room assigned
* **booking_changes:** Count of changes made to the booking
* **deposit_type:** Type of deposit made to guarantee the booking
* **agent:** ID of the agent through which the booking was made
* **company:** ID of the company/entity that made the booking or is responsible for paying the booking. ID is presented instead of designation for anonymity reasons
* **days_in_waiting_list:** Number of days in waiting list
* **customer_type:** Type of customer
* **adr:** Average Daily Rate as defined by dividing the sum of all lodging transactions by the total number of staying nights
* **required_car_parking_spaces:** Number of car parking spaces required
* **total_of_special_requests:** Number of special requests made by the customer
* **reservation_status:** Last status of the reservation
* **reservation_status_date:** Date on which the last reservation status was set
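
The arrival date is spread over three columns (year, month name, day of month). As a minimal sketch, assuming English month names and using a made-up two-row frame called `parts`, the three columns can be combined into a single datetime column, which is convenient for time-based grouping:

```python
import pandas as pd

# Hypothetical two-row sample that mimics the three arrival-date columns
parts = pd.DataFrame({
    'arrival_date_year': [2015, 2016],
    'arrival_date_month': ['July', 'August'],
    'arrival_date_day_of_month': [1, 15],
})

# Build strings such as '2015-July-1' and parse them; %B is the full month name
arrival_date = pd.to_datetime(
    parts['arrival_date_year'].astype(str) + '-'
    + parts['arrival_date_month'] + '-'
    + parts['arrival_date_day_of_month'].astype(str),
    format='%Y-%B-%d',
)
print(arrival_date)
```

The name `arrival_date` is only illustrative; the notebook itself keeps the three columns separate.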

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.


print( df.apply(lambda col:col.unique()))

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
hotel_df = df.copy()
hotel_df.info()

In [None]:
# Selecting and counting number of cancelled bookings for each hotel.
cancelled_data = hotel_df[hotel_df['is_canceled'] == 1]
cancel_grp = cancelled_data.groupby('hotel')
D1 = pd.DataFrame(cancel_grp.size()).rename(columns = {0:'total_cancelled_bookings'})

print(D1)

In [None]:
# Which hotel has more guests
total_people = hotel_df['total_people'].sum()
print('Total number of people across all bookings is', total_people)
print('Number of people per hotel type:')
# Select the column before summing so non-numeric columns (e.g. dates) are not aggregated
p1 = hotel_df.groupby('hotel')['total_people'].sum()
print(p1)

### What all manipulations have you done and insights you found?

The dataframe contained 31994 duplicate rows, which were removed.
Four columns had missing values: 'company', 'agent', 'country' and 'children'. Missing 'country' values were replaced with the string 'No available', and missing 'children', 'agent' and 'company' values were replaced with the column mean.
Two columns, total_stay and total_people, were added to the dataframe.
Rows in which 'adults', 'children' and 'babies' were all zero represent bookings with no guests, so these rows were removed.
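
The cleaning steps used in this notebook can be sketched as one reusable function. This is only an illustration on a tiny made-up sample; the function name `clean_bookings` and the `sample` frame are hypothetical, and the steps simply mirror the cells above (drop duplicates, fill missing values, drop bookings without guests, add the two derived columns).

```python
import pandas as pd

def clean_bookings(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's cleaning steps and return a new DataFrame."""
    out = raw.drop_duplicates().copy()
    # Text column: use a placeholder string for missing countries
    out['country'] = out['country'].fillna('No available')
    # Numeric columns: fill with the column mean, then cast to integers
    for col in ['children', 'agent', 'company']:
        out[col] = out[col].fillna(out[col].mean()).astype('int64')
    # Drop bookings that list no guests at all
    guests = out['adults'] + out['children'] + out['babies']
    out = out[guests != 0].copy()
    # Derived columns
    out['total_stay'] = out['stays_in_weekend_nights'] + out['stays_in_week_nights']
    out['total_people'] = out['adults'] + out['children'] + out['babies']
    return out.reset_index(drop=True)

# Tiny made-up sample: one duplicate row, some missing values, one booking with no guests
sample = pd.DataFrame({
    'country': ['PRT', 'PRT', None, 'GBR'],
    'children': [0.0, 0.0, None, 0.0],
    'agent': [9.0, 9.0, None, 240.0],
    'company': [None, None, 40.0, None],
    'adults': [2, 2, 1, 0],
    'babies': [0, 0, 0, 0],
    'stays_in_weekend_nights': [1, 1, 0, 2],
    'stays_in_week_nights': [2, 2, 3, 1],
})
cleaned = clean_bookings(sample)
print(cleaned)
```

Note that 'agent' and 'company' hold IDs, so a mean-filled value does not correspond to a real agent or company; a sentinel such as 0 would be an alternative worth considering.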

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

### **Univariate analysis**

#### Chart 1  Which agent makes most no. of bookings?


In [None]:
# Chart - 1 Which agent makes most no. of bookings?
# Count bookings per agent; naming the columns explicitly keeps them the same across pandas versions
d1 = hotel_df['agent'].value_counts().rename_axis('agent').reset_index(name='num_of_bookings')
d1['percent'] = round((d1['num_of_bookings'] / d1['num_of_bookings'].sum()) * 100, 2)
d1 = d1.head(10).reset_index(drop=True)                        # Selecting top 10 performing agents
plt.figure(figsize = (11,5))
sns.barplot(x = 'agent', y = 'percent', data = d1, order = d1['agent'])
# Write the percentage above each bar
for i, pct in enumerate(d1['percent']):
    plt.text(i, pct, pct, ha = 'center')
plt.title('Most no. of bookings by the agent', fontsize=20)
plt.ylabel('Share of bookings (%)', fontsize=15)
plt.xlabel('Agent number', fontsize=15)
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a bar plot because it presents the data visually and makes it easy to compare values across categories.**

##### 2. What is/are the insight(s) found from the chart?

* **Agents 9 and 240 made the most bookings.**
* **Agents 8 and 1 made few bookings compared with agents 9 and 240.**

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

* **Positive impact:** Yes. Agents 9 and 240 account for the largest share of bookings, and if the hotels maintain a good relationship with these agents they are likely to keep bringing in bookings.
* **Negative impact:** Agents 1 and 6 bring in fewer bookings, which has a negative impact.

#### Chart - 2 Number of cancellation bookings

In [None]:
# Chart - 2 visualization code
# Total number of cancellations
total_cancelations  = hotel_df['is_canceled'].sum()
print("Total cancellations",total_cancelations)

# City hotel booking cancellations
city_hotel_cancelations = hotel_df.loc[hotel_df['hotel'] == 'City Hotel']['is_canceled'].sum()
print("City hotel number of booking cancellations", city_hotel_cancelations)

# Resort hotel booking cancellations
resort_hotel_cancelations =  hotel_df.loc[hotel_df['hotel'] == 'Resort Hotel']['is_canceled'].sum()
print("Resort hotel number of booking cancellations",resort_hotel_cancelations)

# Total Cancellation percentage
percentage_total_cancelations =  round((total_cancelations /hotel_df.shape[0]) *100,2)
percentage_city_hotel_cancelations = round((city_hotel_cancelations/hotel_df.loc[hotel_df['hotel'] == 'City Hotel'].shape[0] )*100)
percentage_resort_hotel_cancelations = round((resort_hotel_cancelations/hotel_df.loc[hotel_df['hotel'] == 'Resort Hotel'].shape[0] )*100)

print(f'Total Cancellation percentage is {percentage_total_cancelations}')
print(f'Total Resort Hotel Cancellation percentage is {percentage_resort_hotel_cancelations}')
print(f'Total City Hotel Cancellation percentage is {percentage_city_hotel_cancelations}')

plt.figure(figsize=(18,7))
#Canceled=1, Not canceled= 0
labels = ['0-Not canceled','1-Canceled']
# sort_index() puts the counts in 0, 1 order so they line up with the labels
hotel_df['is_canceled'].value_counts().sort_index().plot.pie(labels = labels, autopct='%1.1f%%', shadow=True, figsize=(10,8), fontsize=20)
plt.title('Total Cancellation Percentage', weight='bold')
plt.legend()
plt.show()

##### 1. Why did you pick the specific chart?

I used a pie chart because it gives a simple, easy-to-understand picture of the share of hotel bookings that were canceled.

##### 2. What is/are the insight(s) found from the chart?

Around 27.5% of all bookings were canceled.
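
Because `is_canceled` is a 0/1 column, its mean is the cancellation rate, so the overall and per-hotel percentages can be computed without counting each group by hand. The following is a minimal sketch on made-up data; the frame `bookings` is hypothetical and the numbers are not those of the real dataset.

```python
import pandas as pd

# Made-up bookings: 3 city hotel rows (2 canceled) and 2 resort hotel rows (none canceled)
bookings = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'City Hotel', 'Resort Hotel', 'Resort Hotel'],
    'is_canceled': [1, 0, 1, 0, 0],
})

# Mean of a 0/1 column = fraction of ones
overall_pct = round(bookings['is_canceled'].mean() * 100, 2)
by_hotel_pct = (bookings.groupby('hotel')['is_canceled'].mean() * 100).round(2)

print(overall_pct)
print(by_hotel_pct)
```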

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

* **Positive impact:** Yes. Knowing how many bookings are canceled is the starting point for investigating the reasons for cancellation.
* **Negative impact:** I could not identify any negative impact, because this chart only counts canceled bookings.

#### Chart - 3 Which month has more number of bookings

In [None]:
# Chart - 3 visualization code
print(hotel_df['arrival_date_month'].value_counts())
plt.figure(figsize=(12,5))
graph=sns.countplot(data= hotel_df, x='arrival_date_month', hue = 'hotel',order = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December'], palette = "Set1")
graph.set(title='Months of Arrival')
graph.set_xlabel('Months')
graph.set_ylabel('Reservation Count')


##### 1. Why did you pick the specific chart?

**A count plot shows the number of observations in each category of a categorical variable. Here it counts the bookings per arrival month, split by hotel type (resort or city).**

##### 2. What is/are the insight(s) found from the chart?

**August and July** have more bookings than any other month for both hotel types.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

**The chart suggests that July and August are likely to bring the most bookings in future years as well, so the hotels can make extra arrangements for these months. They can also run offers in these months to increase bookings further.**
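
Month names sort alphabetically by default, which is why the count plot needs an explicit `order`. A small sketch of the same idea for tables, assuming a made-up series `arrivals`: converting the column to an ordered categorical makes `sort_index()` follow calendar order.

```python
import pandas as pd

month_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
               'August', 'September', 'October', 'November', 'December']

# Made-up arrival months
arrivals = pd.Series(['July', 'August', 'January', 'August', 'July', 'August'],
                     name='arrival_date_month')

# An ordered categorical remembers the calendar order of its categories
as_category = pd.Series(pd.Categorical(arrivals, categories=month_order, ordered=True))
monthly_counts = as_category.value_counts().sort_index()
print(monthly_counts)
```

Months that never occur in the data still appear with a count of zero, which keeps the table aligned with the twelve-month axis of the chart.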

#### Chart - 4 most preferred room type by the customers

In [None]:
# Chart - 4 visualization code
# Count bookings per assigned room type; naming the columns explicitly keeps them the same across pandas versions
room_type = hotel_df['assigned_room_type'].value_counts().rename_axis('assigned_room_type').reset_index(name='no_of_booking')
room_type['percent'] = round((room_type['no_of_booking'] / room_type['no_of_booking'].sum()) * 100, 2)
print(room_type)
fig, axes = plt.subplots(1, 2, figsize=(18, 8))
sns.barplot(ax = axes[0], x = 'assigned_room_type', y = 'percent', data = room_type, order = room_type['assigned_room_type'])
sns.boxplot(ax = axes[1], x = room_type['assigned_room_type'], y = room_type['percent'])
# Label each bar on the left-hand axes with its percentage
for i, pct in enumerate(room_type['percent']):
    axes[0].text(i, pct, pct, ha = 'center')
axes[0].set_title('Most preferred room types', fontsize=15)
axes[0].set_ylabel('Share of bookings (%)', fontsize=12)
axes[0].set_xlabel('assigned_room_type', fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a bar plot because it presents the data visually and makes it easy to compare values across categories.**

##### 2. What is/are the insight(s) found from the chart?

**Type A rooms are the most preferred, with 46283 bookings, followed by type D rooms with 22419 bookings.**

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

**Type A rooms are the most preferred, which has a positive impact on the business. Room types H, I, K and L are the least preferred, which has a negative impact: type A has 46283 bookings while type L has only one.**

#### Chart - 5  Which meal is preferred more by customers

In [None]:
# Chart - 5 visualization code
meal = hotel_df.meal.value_counts(normalize=True)
print(meal)
# Take the labels from the index so each slice is labeled with its own meal type
meal_labels = meal.index

plt.figure(figsize=(10,10))
plt.pie(meal, explode=None, labels=meal_labels,  autopct='%1.1f%%', startangle=40,wedgeprops = { 'linewidth' : 1, 'edgecolor' : 'gray' }) 
plt.title('Meal Types', weight='bold')
plt.legend()
plt.show()

##### 1. Why did you pick the specific chart?

**I used a pie chart because it gives a simple, easy-to-understand picture of which meal type customers prefer most.**

##### 2. What is/are the insight(s) found from the chart?

**The BB meal type is the most preferred and the FB meal type is the least preferred.**

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

* **BB is the most preferred meal type, which has a positive impact on the business.**


* **The Undefined and FB meal types are the least preferred, which has a negative impact on the business.**

* **The BB meal type was chosen in 67907 bookings and the FB meal type in only 360.**

#### Chart - 6 From which countries do most customers visit these hotels?

In [None]:
# Chart - 6 visualization code
grp_by_country = hotel_df.groupby('country')
d2 = pd.DataFrame(grp_by_country.size()).rename(columns = {0:'no. of bookings'}).sort_values('no. of bookings', ascending = False)
d2 = d2[:10]  
print(d2)                  # we will only choose top 10 countries
sns.barplot(x = d2.index, y = d2['no. of bookings'])
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a bar plot because it presents the data visually, which makes comparison easy.**

##### 2. What is/are the insight(s) found from the chart?

Most bookings come from **PRT, GBR and FRA**.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

* **Positive impact:** Guests from PRT, GBR and FRA make the most bookings, so the hotels can target them with special booking offers in future.
* **Negative impact:** Many countries have only a single booking, so these countries contribute very little business.

#### Chart - 7 Which distribution channel is mostly used for hotel booking?

In [None]:
# Chart - 7 visualization code
# distribution channel value count
distribution_channel_counts = hotel_df['distribution_channel'].value_counts()
distribution_channel_counts

In [None]:
# distribution channel count in df format
distribution_channel_df = hotel_df['distribution_channel'].value_counts().rename_axis('distribution_channel').reset_index(name='count')
distribution_channel_df

In [None]:
# booking by distribution channel in percent 
distribution_channel_df_percent = round((distribution_channel_counts/hotel_df.shape[0])*100,2).rename_axis('distribution_channel').reset_index(name='% booking')
distribution_channel_df_percent

In [None]:
#Visualization of mostly used distribution channels using barplot
plt.figure(figsize=(14,7))
sns.barplot(data=distribution_channel_df_percent, x="distribution_channel", y="% booking")
for i in range(len(distribution_channel_df_percent['distribution_channel'])):
        plt.text(i,distribution_channel_df_percent['% booking'][i],distribution_channel_df_percent['% booking'][i],ha = 'center') 
plt.title("Mostly used distribution Channels", fontsize = 20)
plt.xlabel('Distribution Channel', fontsize = 15)

plt.ylabel('Booking by distribution channel in percent', fontsize = 15)

##### 1. Why did you pick the specific chart?

**I chose a bar plot because it gives a simple, easy-to-understand picture.**

##### 2. What is/are the insight(s) found from the chart?

**The most used distribution channel is TA/TO, with 69028 bookings, or 79.13% of the total.**

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

* TA/TO is the most used distribution channel, which has a positive impact.
* The GDS and Undefined channels are the least used, which has a negative impact.
* TA/TO accounts for 79.13% of bookings and GDS for 0.21%.
* Other channels could offer the same facilities that the TA/TO channel provides.
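
The percentage share of each channel can also be obtained in one step with `value_counts(normalize=True)`. This is a minimal sketch on a made-up series named `channels`; naming the columns through `rename_axis` and `reset_index(name=...)` avoids depending on the default column names, which differ between pandas versions.

```python
import pandas as pd

# Made-up channel values: 3 of 5 bookings come via TA/TO
channels = pd.Series(['TA/TO', 'TA/TO', 'Direct', 'TA/TO', 'Corporate'],
                     name='distribution_channel')

share = (
    (channels.value_counts(normalize=True) * 100)
    .round(2)
    .rename_axis('distribution_channel')   # name of the category column
    .reset_index(name='% booking')         # name of the value column
)
print(share)
```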

#### Chart - 8  which year had highest bookings?

In [None]:
# Chart - 8 visualization code

year_count = hotel_df['arrival_date_year'].value_counts().sort_index().rename_axis('arrival_date_year').reset_index(name='no_of_booking')
year_count

In [None]:

# Visualization of year wise booking using countplot chart
plt.figure(figsize=(14,7))
grp = sns.countplot(x=hotel_df['arrival_date_year'],hue=hotel_df['hotel'])

plt.title('Year wise Bookings', fontsize = 20)
plt.xlabel('Arrival_date_year', fontsize = 15)
plt.ylabel('Count of bookings', fontsize = 15)

##### 1. Why did you pick the specific chart?

**I chose a count plot because it is easy to understand.**

##### 2. What is/are the insight(s) found from the chart?

**2016 had the highest number of bookings and 2015 had the lowest.**

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

* 2016 had the highest number of bookings, which has a positive impact.
* 2015 had the lowest number of bookings, which has a negative impact.
* There were 42313 bookings in 2016 and 13284 bookings in 2015.

### **Bivariate and Multivariate analysis**

#### Chart - 9 Which hotel has longer waiting time?

In [None]:
# Chart - 9 visualization code
Waiting_time = pd.DataFrame(hotel_df.groupby('hotel')['days_in_waiting_list'].mean().round().reset_index())
Waiting_time
     

In [None]:

# Visualization of hotel which has longer waiting time by using barplot
plt.figure(figsize=(14,7))
sns.barplot(x=Waiting_time['hotel'],y=Waiting_time['days_in_waiting_list'])
plt.title('Waiting time for each hotel type', fontsize=20)
plt.xlabel('Type of hotel',fontsize=15)
plt.ylabel('Waiting time', fontsize=15)

##### 1. Why did you pick the specific chart?

**I chose a bar plot because it gives an easy-to-understand picture of which hotel has the longer waiting time.**

##### 2. What is/are the insight(s) found from the chart?

**The city hotel has an average waiting time of about one day and the resort hotel about zero days, which suggests that the city hotel is busier than the resort hotel.**

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

* **The city hotel has a longer waiting time, which has a positive impact on business.**
* **The resort hotel has a shorter waiting time, which has a negative impact on business.**
* **The mean number of days in the waiting list is about 1 for the city hotel and about 0 for the resort hotel.**
* **The resort hotel needs to improve its facilities so that its bookings increase.**

#### Chart - 10 Which distribution channel brings better revenue generating deals for hotels?

In [None]:
# Chart - 10 visualization code
group_by_dc_hotel = hotel_df.groupby(['distribution_channel', 'hotel'])
d5 = pd.DataFrame(round((group_by_dc_hotel['adr'].mean()),2)).reset_index().rename(columns = {'adr': 'avg_adr'})
print(d5)
plt.figure(figsize = (15,10))
g1=sns.barplot(y = d5['distribution_channel'], x = d5['avg_adr'], hue = d5['hotel'],palette = "Blues")

plt.show()

##### 1. Why did you pick the specific chart?

**I chose a horizontal bar plot because it gives an easy-to-read comparison of the average ADR for each distribution channel and hotel type.**

##### 2. What is/are the insight(s) found from the chart?

**The GDS channel brings higher-revenue deals for the City Hotel, even though most bookings come via TA/TO. The City Hotel could increase its outreach on GDS channels to win more high-revenue deals.**

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

**The Resort Hotel gets its higher-revenue deals through the Direct and TA/TO channels. The Resort Hotel needs to increase its outreach on the GDS channel to increase revenue.**

#### Chart - 11 Booking cancellation Analysis

##### Which significant distribution channel has highest cancellation percentage?

In [None]:
group_by_dc = hotel_df.groupby('distribution_channel')
d1 = pd.DataFrame((group_by_dc['is_canceled'].sum()/group_by_dc.size())*100).drop(index = 'Undefined').rename(columns = {0: 'Cancel_%'})
print(d1)
plt.figure(figsize = (10,5))
g1 = sns.barplot(x = d1.index, y = d1['Cancel_%'])

plt.show()

In [None]:
waiting_bookings = hotel_df[hotel_df['days_in_waiting_list'] !=0]  # Selecting bookings with non zero waiting time
fig, axes = plt.subplots(1, 2, figsize=(18, 8))
sns.kdeplot(ax=axes[0],x = 'days_in_waiting_list', hue = 'is_canceled' , data = waiting_bookings)
sns.kdeplot(ax = axes[1], x = hotel_df['lead_time'], hue = hotel_df['is_canceled'])
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12 Which hotel type has the highest ADR?

In [None]:
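# Chart - 12 visualization code
highest_adr = hotel_df.groupby('hotel')['adr'].mean().reset_index()
print(highest_adr)
# Bar plot of the mean ADR per hotel type
sns.barplot(x = 'hotel', y = 'adr', data = highest_adr)
plt.show()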

##### 1. Why did you pick the specific chart?

**I chose a bar plot because it gives a simple picture that is easy to understand.**

##### 2. What is/are the insight(s) found from the chart?

**The City Hotel has the highest ADR, which means it earns more per night on average.**

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

* **The City Hotel has a high ADR, which has a positive impact.**
* **The Resort Hotel has a lower ADR than the City Hotel, which has a negative impact.**
* **The City Hotel has an ADR of 111.27 and the Resort Hotel an ADR of 99.05, so the City Hotel earns more per night.**
* **The Resort Hotel should improve its facilities in order to increase revenue.**
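
ADR is a per-night average, so a higher ADR does not by itself mean higher total revenue. As a rough sketch, one could estimate revenue per hotel by multiplying `adr` by `total_stay` for bookings that were not canceled. The column `est_revenue` and the frame `stays` are hypothetical, and the calculation assumes the ADR applies to every night of the stay.

```python
import pandas as pd

# Made-up bookings with the columns used in this notebook
stays = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'Resort Hotel', 'Resort Hotel'],
    'is_canceled': [0, 1, 0, 0],
    'adr': [100.0, 120.0, 80.0, 90.0],
    'total_stay': [2, 3, 5, 4],
})

# Only bookings that were not canceled generate lodging revenue
completed = stays[stays['is_canceled'] == 0].copy()
# Assumption: revenue of a booking is roughly nightly rate times number of nights
completed['est_revenue'] = completed['adr'] * completed['total_stay']
revenue_by_hotel = completed.groupby('hotel')['est_revenue'].sum()
print(revenue_by_hotel)
```

In this toy sample the hotel with the lower ADR has the higher estimated revenue because its stays are longer, which is why both measures are worth checking.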






#### Chart - 13 which hotel has longer waiting time?

In [None]:
# Chart - 13 visualization code
Waiting_time = hotel_df.groupby('hotel')['days_in_waiting_list'].mean().reset_index()
Waiting_time
     

In [None]:

# Visualization of hotel which has longer waiting time by using barplot
plt.figure(figsize=(14,7))
sns.barplot(x=Waiting_time['hotel'],y=Waiting_time['days_in_waiting_list'])
plt.title('Waiting time for each hotel type', fontsize=20)
plt.xlabel('Type of hotel',fontsize=15)
plt.ylabel('Waiting time', fontsize=15)

##### 1. Why did you pick the specific chart?

**I chose a bar plot because it gives an easy-to-understand picture of which hotel has the longer waiting time.**

##### 2. What is/are the insight(s) found from the chart?

**The city hotel has the longer waiting time, which suggests that it is busier than the resort hotel.**

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

* **The city hotel has a longer waiting time, which has a positive impact on business.**
* **The resort hotel has a shorter waiting time, which has a negative impact on business.**
* **The mean number of days in the waiting list is about 1.02 for the city hotel and about 0.32 for the resort hotel.**
* **The resort hotel needs to improve its facilities so that its bookings increase.**
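
A mean close to zero usually indicates that most bookings never enter the waiting list, so the mean alone can hide how the waiting time is distributed. The sketch below, on a made-up frame `waits`, reports the mean together with the median and the share of bookings that waited at all; the output column names are illustrative.

```python
import pandas as pd

# Made-up data: one city hotel booking waited 10 days, nobody else waited
waits = pd.DataFrame({
    'hotel': ['City Hotel'] * 5 + ['Resort Hotel'] * 5,
    'days_in_waiting_list': [0, 0, 0, 0, 10, 0, 0, 0, 0, 0],
})

summary = waits.groupby('hotel')['days_in_waiting_list'].agg(
    mean_days='mean',
    median_days='median',
    share_waited=lambda s: (s > 0).mean(),   # fraction of bookings with any wait
)
print(summary)
```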

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

hotel_df.head(2)

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(20,10))
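# numeric_only=True restricts the correlation to numeric columns (text and date columns are skipped)
sns.heatmap(hotel_df.corr(numeric_only=True),annot=True)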
plt.title('Correlation of the columns')
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a heatmap because it gives a general overview of the numeric values and uses a color-coded scale.**

##### 2. What is/are the insight(s) found from the chart?

* **The arrival_date_year and arrival_date_week_number columns have a negative correlation of -0.51.**
* **The stays_in_week_nights and total_stay columns have a positive correlation of 0.95.**
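
Reading the strongest pairs off a large annotated heatmap is error-prone, so it can help to list them. The following sketch uses a made-up frame `toy`; it keeps only the upper triangle of the correlation matrix so that each pair appears once, then sorts by absolute value. Since `total_stay` is computed from `stays_in_week_nights`, a high correlation between these two columns is expected by construction.

```python
import numpy as np
import pandas as pd

# Made-up numeric columns, including the derived total_stay
toy = pd.DataFrame({
    'stays_in_weekend_nights': [0, 1, 2, 0, 2, 1],
    'stays_in_week_nights': [1, 3, 5, 2, 4, 2],
    'adults': [2, 1, 2, 2, 1, 3],
})
toy['total_stay'] = toy['stays_in_weekend_nights'] + toy['stays_in_week_nights']

corr = toy.corr(numeric_only=True)
# Mask everything except the strict upper triangle (drops the diagonal and mirrored pairs)
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
# One row per column pair, strongest absolute correlation first
pairs = upper.stack().dropna().sort_values(key=abs, ascending=False)
print(pairs.head(3))
```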

#### Chart - 15 - Pair Plot 

In [None]:
# Pair Plot visualization code
df2= hotel_df[['hotel','is_canceled','stays_in_week_nights','stays_in_weekend_nights','total_stay','total_people','agent','company']]

#sns.pairplot(pair_plot1, hue="hotel")

In [None]:
df2.head()

In [None]:
sns.pairplot(df2, hue="hotel")

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ? 
Explain Briefly.

1) To grow the hotel business, factors such as revenue generation, customer satisfaction and the facilities provided by the hotel are important.

2) The analysis addresses these factors by showing the client which hotel is most preferred, the percentage of repeated guests, the meal type guests prefer most, which hotel has the highest ADR, and so on.

3) The most preferred room type is shown with a bar plot, so the client can prepare well in advance and use this insight to improve their hospitality.

4) The analysis shows which meal type is most preferred, so the client can offer it to guests.

5) The most preferred months are shown with a count plot, so the client can prepare in advance and face fewer grievances.

6) A bar plot shows which hotel type has the higher ADR, so the client can see which hotel earns more per night.

7) The analysis shows which hotel is the busiest, so the client can make appropriate changes to the facilities of the less busy hotel type.

8) The analysis shows the relationship between repeated guests and previous bookings that were not cancelled, so the client can give preference to repeated guests.

9) A bar plot shows the relationship between ADR and the total number of people, so the client can give preference to larger groups.

# **Conclusion**

1. **Agents 9 and 240 account for the largest share of bookings, which has a positive impact. If the hotels maintain a good relationship with these agents, they are likely to keep bringing in bookings.**
2. **Around 27.5% of all bookings were canceled.**
3. **Type A rooms are the most preferred, with 46283 bookings, followed by type D rooms with 22419 bookings.**
4. **The BB meal type is the most preferred and the FB meal type is the least preferred.**
5. **August and July have more bookings than any other month for both hotel types.**
6. **Most bookings come from PRT, GBR and FRA.**
7. **The most used distribution channel is TA/TO, with 69028 bookings, or 79.13% of the total.**
8. **The city hotel has an average waiting time of about one day and the resort hotel about zero days, which suggests that the city hotel is busier than the resort hotel.**
9. **The GDS channel brings higher-revenue deals for the City Hotel, even though most bookings come via TA/TO. The City Hotel could increase its outreach on GDS channels to win more high-revenue deals.**
10. **The arrival_date_year and arrival_date_week_number columns have a negative correlation of -0.51.**
11. **The stays_in_week_nights and total_stay columns have a positive correlation of 0.95.**
12. **The city hotel has the longer waiting time, which suggests that it is busier than the resort hotel.**


### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***