**PROJECT NAME: Hotel Booking Analysis
Exploratory Data Analysis**


##### **Project Type**    - EDA
##### **Contribution**    - Individual


 **Project Summary**

Almabetter Hotel Booking EDA Project Using Python by Vishal Pal.
This project analyses hotel reservations for a city hotel and a resort hotel. The dataset has 119390 rows and 32 columns. The workflow is divided into three stages: data collection, data cleaning and manipulation, and EDA (exploratory data analysis). During data collection we inspect the dataset and its columns, including hotel, is_canceled, lead_time, arrival_date_year, arrival_date_month, arrival_date_week_number, arrival_date_day_of_month and stays_in_weekend_nights, using head(), tail(), info(), describe(), the columns attribute and similar methods. We then identify the distinct values of each column, list them in tabular form, and verify the data type of each column. Columns with inaccurate data types are identified and fixed afterward. Duplicate rows are also found and removed from the dataset during the data cleaning phase.

Before visualizing any data, we must first perform data manipulation. To do that, we examined the null values in each column.
If a column has a high percentage of null values, we drop it using the 'drop' method. This is why the 'company' column was removed. When a column has only a few null values, we fill them with suitable values using the fillna() method.
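The null-handling rule described above can be sketched on a small made-up frame. This is a minimal illustration, not the notebook's own code: the toy values and the 50% cut-off (`NULL_THRESHOLD_PCT`) are assumptions chosen for the example, while the fill choices (mode for `agent`, the placeholder `'other'` for `country`) mirror the ones used later in the wrangling cells.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the booking data (values are made up).
toy_df = pd.DataFrame({
    "company": [np.nan, np.nan, np.nan, 40.0, np.nan],
    "agent": [9.0, 9.0, np.nan, 240.0, 9.0],
    "country": ["PRT", None, "GBR", "PRT", "FRA"],
})

# Share of missing values per column, in percent.
null_pct = toy_df.isnull().mean() * 100

# Hypothetical cut-off: drop any column that is more than half empty.
NULL_THRESHOLD_PCT = 50
cols_to_drop = null_pct[null_pct > NULL_THRESHOLD_PCT].index.tolist()
cleaned_df = toy_df.drop(columns=cols_to_drop)

# Fill the few remaining gaps: mode for the numeric ID, a placeholder for text.
cleaned_df["agent"] = cleaned_df["agent"].fillna(cleaned_df["agent"].mode()[0])
cleaned_df["country"] = cleaned_df["country"].fillna("other")

print(cols_to_drop)
print(cleaned_df)
```

Assigning the result of `fillna()` back to the column, rather than calling it with `inplace=True` on a column selection, avoids the chained-assignment warnings raised by recent pandas versions.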

To gain a better understanding of the data and support the business goals, many charts are used for data visualization.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Have you ever wondered what the ideal season of the year is to reserve a hotel room? Or how long you should stay to get the best daily rate? What if you wanted to predict whether a hotel is likely to receive an unusually high number of special requests? You can investigate those questions using hotel reservation data! This dataset comprises reservation details for a city hotel and a resort hotel, including the date the reservation was made, the duration of the stay, the number of adults, children and babies, and the number of parking spaces required. The data contains no information that could be used to identify a specific person. Investigate and evaluate the information to find the key factors that drive reservations.


#### **BUSINESS OBJECTIVE**

Analyse the reservation data for the City Hotel and Resort Hotel to learn more about the various factors that influence a reservation.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Dataset Loading

In [None]:
# Load Dataset
database = ('/content/drive/My Drive/Almabetter/Copy of Hotel Bookings.csv')
hotel_booking_df = pd.read_csv(database)


In [None]:
# DataSet_link-
DataSet_link = "https://drive.google.com/file/d/1BSvruwIUbxi6pSixDncOlA9yIzPYNbYv/view?usp=drive_link"

### Dataset First View

In [None]:
# Dataset First Look
hotel_booking_df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print(hotel_booking_df.columns)
print(hotel_booking_df.shape)



### Dataset Information

In [None]:
# Dataset Info
hotel_booking_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicates=hotel_booking_df.duplicated()

#Here we use value_counts method to find number of duplicates value
duplicates.value_counts()


In [None]:
#We store the duplicate values in duplicate_row variable
duplicate_row = hotel_booking_df[duplicates]
print(duplicate_row)

In [None]:
#Removing the duplicate value
hotel_booking_df.drop_duplicates(inplace=True,subset=None)
#Unique rows after dropping the duplicate values
uni_num_rows = hotel_booking_df.shape
print(uni_num_rows)

In [None]:
hotel_booking_df

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
miss_value = hotel_booking_df.isnull().sum().sort_values(ascending=False)
miss_value

In [None]:
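# Visualizing the missing values
null_value=hotel_booking_df.isnull()

# Plot the number of missing values for every column that has at least one
null_counts = null_value.sum()
null_counts[null_counts > 0].sort_values(ascending=False).plot(kind='bar', figsize=(10,5), title='Missing values per column')
plt.ylabel('Missing values')
plt.show()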


### What did you know about your dataset?

The dataset is a single file that compares booking details for two hotels: a city hotel and a resort hotel. It includes details such as the date the reservation was made, the length of the stay, the number of adults, children and babies, the number of parking spaces required, and more. There are 32 columns and 119390 rows in the entire dataset. The dataset contains 31944 duplicate rows, which are removed. Every column has a data type, such as integer, float or text. We note that some of these data types are inaccurate and correct them afterward. We also count the distinct values in each column to see the range of values each column actually takes.
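As a minimal sketch of the duplicate check summarised above, the following uses a made-up three-column frame (the values are assumptions for illustration only) to show how `duplicated()` and `drop_duplicates()` relate to the row counts and data types that were inspected.

```python
import pandas as pd

# Toy frame; the second row repeats the first one exactly.
toy_df = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "lead_time": [10, 10, 45, 3],
    "adr": [95.0, 95.0, 120.5, 80.0],
})

# duplicated() marks every repeat of an earlier row as True.
n_duplicates = int(toy_df.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row.
deduped_df = toy_df.drop_duplicates().reset_index(drop=True)

print(f"{n_duplicates} duplicate row(s); shape {toy_df.shape} -> {deduped_df.shape}")
print(deduped_df.dtypes)
```

The number of rows removed always equals the number of `True` values returned by `duplicated()`, which gives a quick consistency check between the two steps.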

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df_columns = hotel_booking_df.columns
df_columns

In [None]:
# Dataset Describe
hotel_booking_df.describe()

### Variables Description

**The columns and the data they represent are listed below:**
1. **hotel:** Name of hotel (Resort Hotel or City Hotel).
2. **is_canceled:** Whether the booking was cancelled (1) or not (0).
3. **lead_time:** Number of days between the booking date and the arrival date.
4. **arrival_date_year:** Year of arrival date.
5. **arrival_date_month:** Month in which the guest arrives.
6. **arrival_date_week_number:** Week number of the year for the arrival date.
7. **arrival_date_day_of_month:** Day of the month of the arrival date.
8. **stays_in_weekend_nights:** Number of weekend nights (Saturday or Sunday)
   spent at the hotel by the guest.
9. **stays_in_week_nights:** Number of week nights (Monday to Friday) spent at the
   hotel by the guest.
10. **adults:** Number of adults among guests.
11. **children:** Number of children among guests.
12. **babies:** Number of babies among guests.
13. **meal:** Type of meal booked.
14. **country:** Country of guests.
15. **market_segment:** Designation of market segment.
16. **distribution_channel:** Name of booking distribution channel.
17. **is_repeated_guest:** Whether the booking was from a repeated guest (1) or not (0).
18. **previous_cancellations:** Number of previous bookings that were cancelled by
   the customer prior to the current booking.
19. **previous_bookings_not_canceled:** Number of previous bookings not cancelled by the customer prior to the current booking.
20. **reserved_room_type:** Code of room type reserved.
21. **assigned_room_type:** Code of room type assigned.
22. **booking_changes:** Number of changes/amendments made to the booking.
23. **deposit_type:** Type of the deposit made by the guest.
24. **agent:** ID of travel agent who made the booking.
25. **company:** ID of the company that made the booking.
26. **days_in_waiting_list:** Number of days the booking was in the waiting list.
27. **customer_type:** Type of customer, assuming one of four categories.
28. **adr:** Average daily rate, as defined by dividing the sum of all lodging
   transactions by the total number of staying nights.
29. **required_car_parking_spaces:** Number of car parking spaces required by the
   customer.
30. **total_of_special_requests:** Number of special requests made by the customer.
31. **reservation_status:** Reservation status (Canceled, Check-Out or No-Show).
32. **reservation_status_date:** Date on which the last reservation status was
   updated.


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
hotel_booking_df.nunique().sort_values(ascending=False)


## 3. ***Data Wrangling***

In [None]:
hotel_booking_df['is_canceled'].sort_values(ascending=False)

In [None]:
hotel_booking_df.is_canceled.value_counts()

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
#Checking the percentage of null values in each column, starting from company
percentage_company_null = (hotel_booking_df['company'].isnull().sum()/len(hotel_booking_df)*100)
print(percentage_company_null)


In [None]:
#To fill the nan values in the column, let's check which columns has null value-
hotel_booking_df.isnull().sum().sort_values(ascending=False).head(5)

In [None]:
# It is better to drop the column 'company' altogether since the number of missing value are very high.
hotel_booking_df.drop(['company'],axis=1,inplace=True)
hotel_booking_df.shape


In [None]:
#Now check for agent column
percentage_agent_null=hotel_booking_df['agent'].isnull().sum()/len(hotel_booking_df)*100
print(percentage_agent_null)


In [None]:
# Only 13.95 percent of the values in 'agent' are null, so let's fill them with the mode of the column.
# Calculate the mode of the 'agent' column
mode_value = hotel_booking_df['agent'].mode()[0]

# Fill NaN values in the 'agent' column with the mode value
# (assign back instead of inplace=True to avoid chained-assignment warnings)
hotel_booking_df['agent'] = hotel_booking_df['agent'].fillna(mode_value)

#Rechecking that column has no null value
hotel_booking_df['agent'].isnull().sum()

In [None]:

#checking the percentage null value in country column
percentage_country_null= hotel_booking_df['country'].isnull().sum()/len(hotel_booking_df)*100
print(percentage_country_null)

In [None]:
#As the country column has very few null values, we replace them with the placeholder country name 'other'
hotel_booking_df['country'] = hotel_booking_df['country'].fillna('other')
#Rechecking the number of nulls in country, whether it is zero or not
hotel_booking_df['country'].isnull().sum()

In [None]:
#Checking the percentage of null value in children column
percentage_children_null=hotel_booking_df['children'].isnull().sum()/len(hotel_booking_df)*100
percentage_children_null

In [None]:
# Only 0.00457 percent of the values in 'children' are null, so let's fill them with the mode of the column.
# Calculate the mode of the 'children' column
mode_value = hotel_booking_df['children'].mode()[0]

# Fill NaN values in the 'children' column with the mode value
# (assign back instead of inplace=True to avoid chained-assignment warnings)
hotel_booking_df['children'] = hotel_booking_df['children'].fillna(mode_value)

#Rechecking that column has no null value
hotel_booking_df['children'].isnull().sum()

In [None]:
#Let's check whether the dataset has any other null values
hotel_booking_df.isnull().sum()

In [None]:
#Check which columns need a change of datatype
#Showing the info of the data to check the datatype of each column
hotel_booking_df.info()

In [None]:
#Let's create a new column "total_stay_in_nights" by adding week nights and weekend nights.
hotel_booking_df['total_stay_in_nights']= hotel_booking_df['stays_in_weekend_nights'] + hotel_booking_df['stays_in_week_nights']
hotel_booking_df['total_stay_in_nights']

In [None]:
# We have created a column for revenue using total stay * adr
hotel_booking_df['revenue']= hotel_booking_df['total_stay_in_nights'] * hotel_booking_df['adr']
hotel_booking_df['revenue']

In [None]:
hotel_booking_df.columns

In [None]:
#Also for information, we will add a column with total guest coming for each booking
hotel_booking_df['total_guest']= hotel_booking_df['adults']+hotel_booking_df['children']+hotel_booking_df['babies']
hotel_booking_df['total_guest'].sum()

In [None]:
#For understanding , from column "is_canceled": we will replace the value from(0,1) to not_canceled or is_canceled.
hotel_booking_df['is_canceled'] = hotel_booking_df['is_canceled'].replace([0,1],['not canceled','is canceled'])
hotel_booking_df['is_canceled']

In [None]:
#Same for 'is_repeated_guest' col
hotel_booking_df['is_repeated_guest'] = hotel_booking_df['is_repeated_guest'].replace([0,1],['not repeated','repeated'])
hotel_booking_df['is_repeated_guest']

In [None]:
#Now we will check overall revenue hotel wise
hotel_wise_total_revenue = hotel_booking_df.groupby('hotel')['revenue'].sum()
hotel_wise_total_revenue

In [None]:
hotel_booking_df[['hotel','revenue']].sort_values(by='revenue',ascending=False)

### What all manipulations have you done and insights you found?

**The changes made in the dataset are:**
**COLUMNS ADDED**
**1.** TOTAL STAY IN NIGHTS: Obtained by summing the weekend nights and the week nights of each booking.
**2.** NUMBER OF GUESTS: Obtained by summing the number of adults, children and babies. This column lets us assess the overall number of guests.
**3.** REVENUE: ADR is multiplied by the total stay in nights to find the revenue of each booking. This column will be used to examine each hotel's growth and profitability.
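The derived columns created in the wrangling cells can be checked on a tiny made-up frame. This is a sketch under the assumption that the column names match the notebook (`total_stay_in_nights`, `total_guest`, `revenue`); the numbers themselves are invented for illustration.

```python
import pandas as pd

# Three made-up bookings.
toy_df = pd.DataFrame({
    "stays_in_weekend_nights": [2, 0, 1],
    "stays_in_week_nights": [3, 1, 4],
    "adr": [100.0, 80.0, 50.0],
    "adults": [2, 1, 2],
    "children": [1.0, 0.0, 0.0],
    "babies": [0, 0, 1],
})

# Nights and guests are simple row-wise sums.
derived_df = toy_df.assign(
    total_stay_in_nights=lambda d: d["stays_in_weekend_nights"] + d["stays_in_week_nights"],
    total_guest=lambda d: d["adults"] + d["children"] + d["babies"],
)

# Revenue per booking = nights stayed * average daily rate.
derived_df["revenue"] = derived_df["total_stay_in_nights"] * derived_df["adr"]

print(derived_df[["total_stay_in_nights", "total_guest", "revenue"]])
```

Note that revenue depends on the length of stay and the ADR only; the guest count does not enter the calculation.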

**COLUMN DELETED**
**1.** COMPANY: As we have seen, this column contains almost no data. We therefore dropped it, because it could not contribute to the analysis.

**Replace Values in Column BY**

is_canceled and is_repeated_guest: These columns only held the values 0 and 1. In "is_canceled", the values 0 and 1 indicate whether the booking was cancelled, so we replaced them with "not canceled" and "is canceled". The same procedure is used in the column "is_repeated_guest", where 0 and 1 are replaced with "not repeated" and "repeated". These labels make the charts easier to understand during visualization.

**Changes To Data Types In Columns**

**1. Agent & Children:** We verified that these columns contain float values, which do not make sense here because they represent an agent ID and a guest count. Therefore, the data type of these columns should be changed from 'float' to 'integer'.
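The wrangling cells above do not show the conversion itself, so here is a minimal sketch of how it could be done with `astype`. It assumes the missing values in both columns have already been filled, since `astype(int)` raises an error on NaN; the toy values are made up.

```python
import pandas as pd

# Toy columns stored as floats, as they are after loading the CSV.
toy_df = pd.DataFrame({
    "agent": [9.0, 240.0, 9.0],
    "children": [0.0, 2.0, 1.0],
})

# Guard: integer conversion fails on NaN, so nulls must be filled beforehand.
assert toy_df[["agent", "children"]].notnull().all().all()

# Convert both columns to 64-bit integers in one call.
toy_df = toy_df.astype({"agent": "int64", "children": "int64"})

print(toy_df.dtypes)
```

On the real DataFrame the same `astype` call would be applied to `hotel_booking_df` after the `fillna` steps.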

**Removed duplicate entries and null values**

**1.** Data wrangling must be done before any data from the dataset can be visualized.

To that end, we examined the null values in each column. If a column has a high percentage of null values, we drop it using the 'drop' method. This is why the "company" column was removed. When a column has only a few null values, we fill them with suitable values using the fillna() function.

**2.** In the same way, we checked whether there was any data duplication and discovered that some rows were duplicated. We used the drop_duplicates() method to delete those rows from the dataset.
By doing this, we have eliminated the unnecessary data.



## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:

# Let's create a function which will give us a bar chart of the value counts of a column.
# Helper: prepares a DataFrame with the count of each unique value in the specified column.
def get_count_from_column_bar(df,column_label):
    df_grpd = df[column_label].value_counts()
    df_grpd = pd.DataFrame({'index':df_grpd.index,'count':df_grpd.values})
    return df_grpd

# Plot a bar chart from the counts returned by the helper above.
def plot_bar_chart_from_column(df,column_label,t1):
  df_grpd = get_count_from_column_bar(df,column_label)
  fig,ax = plt.subplots(figsize=(14,9))
  c= ['g','r','b','c','y']
  ax.bar(df_grpd['index'],df_grpd['count'],color=c,width=0.4,align='edge',edgecolor='black',linewidth=4,linestyle=':',alpha=0.5)
  plt.title(t1,bbox={'facecolor':'0.8','pad':3})
  plt.ylabel('Count')
  plt.xticks(rotation=15) # rotate the x-axis labels for readability
  plt.show()

In [None]:

#Let's create a function which will give us a pie chart of the value counts of a column.
#Chart - 1 visualization code
def get_count_from_column(df,column_label):
  df_grpd = df[column_label].value_counts()
  df_grpd = pd.DataFrame({'index':df_grpd.index,'count':df_grpd.values})
  return df_grpd

#Plot a pie chart from grouped data
def plot_pie_chart_from_column(df,column_label,t1,exp):
  df_grpd = get_count_from_column(df,column_label)
  fig,ax = plt.subplots(figsize=(14,9))
  ax.pie(df_grpd.loc[:,'count'],labels=df_grpd.loc[:,'index'],autopct='%1.2f%%',startangle=90,labeldistance=1.2,explode=exp)
  plt.title(t1,bbox={'facecolor':'0.8','pad':3})
  ax.axis('equal')
  plt.legend()
  plt.show()

In [None]:
exp1 = [0.05,0.05]
plot_pie_chart_from_column(hotel_booking_df,'hotel','Booking percentage of Hotel by Name',exp1)

##### 1. Why did you pick the specific chart?

To show which hotel received the larger share of bookings.

##### 2. What is/are the insight(s) found from the chart?

The City Hotel accounts for 61.13% of bookings and the Resort Hotel for 38.87%. Hence the City Hotel is booked more often.
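The percentages shown in a pie chart can also be computed directly, which is useful for quoting exact figures. A minimal sketch on five made-up bookings (so the numbers differ from the real dataset):

```python
import pandas as pd

# Five made-up bookings.
hotels = pd.Series(
    ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    name="hotel",
)

# normalize=True returns fractions instead of counts; scale to percent.
share_pct = (hotels.value_counts(normalize=True) * 100).round(2)

print(share_pct)
```

Applied to the real data, the same call on `hotel_booking_df['hotel']` should reproduce the labels that `autopct` prints on the pie chart.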

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, for both hotels this data points to a positive business impact:

**City Hotel:** Provide more services to attract more guests and increase revenue.

**Resort Hotel:** Find ways to attract more guests, and study what the City Hotel does to attract its guests.


#### Chart - 2

In [None]:
# Chart - 2 visualization code
exp4 =[0,0.1]
plot_pie_chart_from_column(hotel_booking_df,'is_canceled','Cancellation volume of Hotel',exp4)

##### 1. Why did you pick the specific chart?

**This chart presents the cancellation rate of the hotel bookings.**

##### 2. What is/are the insight(s) found from the chart?

**Here, we found that overall 27.49% of bookings were cancelled.**


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Here we can see that overall more than 25% of bookings were cancelled, so we can investigate the reasons for this high cancellation rate and find ways to address it at the business level.**
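One possible first step in investigating the cancellation rate is to split it by hotel. The sketch below uses made-up rows and assumes the relabelled values `'is canceled'` and `'not canceled'` from the wrangling step; it is an illustration, not a result from the real dataset.

```python
import pandas as pd

# Six made-up bookings with the relabelled cancellation column.
toy_df = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "City Hotel",
              "Resort Hotel", "Resort Hotel", "City Hotel"],
    "is_canceled": ["is canceled", "not canceled", "is canceled",
                    "not canceled", "not canceled", "not canceled"],
})

# The mean of a boolean flag is the share of True values, i.e. the cancellation rate.
cancel_rate_pct = (
    toy_df["is_canceled"].eq("is canceled")
    .groupby(toy_df["hotel"])
    .mean()
    .mul(100)
)

print(cancel_rate_pct)
```

The same pattern works for any other grouping column, such as `market_segment` or `deposit_type`, to see where cancellations concentrate.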

#### Chart - 3

In [None]:
# Chart - 3 visualization code
plot_bar_chart_from_column(hotel_booking_df,'distribution_channel','Distribution Channel Volume')


##### 1. Why did you pick the specific chart?

This chart shows the volume of bookings made through each distribution channel. We chose a bar graph so that the counts can be compared in descending order.

##### 2. What is/are the insight(s) found from the chart?

As can be clearly seen, TA/TO (Travel Agents / Tour Operators) is the largest channel, so we recommend continuing to take bookings through TA/TO.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, this shows a positive business impact.
A higher number of TA/TO bookings will help to increase the hotel's revenue.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
exp2 = [0.2,0.0,0,0,0,0,0,0,0,0,0,0]
plot_pie_chart_from_column(hotel_booking_df,'arrival_date_month', 'Month-Wise Booking',exp2)

##### 1. Why did you pick the specific chart?

**To show the percentage share of booking in each month.**

##### 2. What is/are the insight(s) found from the chart?

**The percentages above show that May, July and August are the months with the most bookings, due to the holiday season. We recommend aggressive advertising to attract more customers.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes, an increased volume of visitors will help the hotel manage revenue during slow periods, and will also help employee satisfaction and retention.**

#### Chart - 5

In [None]:
# Chart - 5 visualization code
exp3 = [0,0.2]
plot_pie_chart_from_column(hotel_booking_df,'is_repeated_guest','Guest repeating status',exp3)

##### 1. Why did you pick the specific chart?

**To show the percentage share of repeated & non-repeated guests.**

##### 2. What is/are the insight(s) found from the chart?

**Here, we can see that the number of repeated guests is very small compared to the overall number of guests.**


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**We can give attractive offers to non-repeating customers during the off-season to increase revenue.**

#### Chart - 6

In [None]:
# Chart - 6 visualization code
plot_bar_chart_from_column(hotel_booking_df,'assigned_room_type','Assigned of room by type')


##### 1. Why did you pick the specific chart?

**To show the distribution, by volume, of the room types that are allotted.**

##### 2. What is/are the insight(s) found from the chart?

**This chart shows that room type A is the most preferred by guests.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes, there is a positive impact: room types 'A', 'D' and 'E' are preferred by guests, possibly due to better services in those room types.**

#### Chart - 7

In [None]:
# Chart - 7 visualization code
# Count the bookings in each market segment and sort them in descending order
market_segment_df = pd.DataFrame(hotel_booking_df['market_segment'])
market_segment_df_data = market_segment_df.groupby('market_segment')['market_segment'].count()
market_segment_df_data.sort_values(ascending=False,inplace=True)
plt.figure(figsize=(15,6))
market_segment_df_data.plot(kind = 'bar', color=['r','g','y','b','pink','black','brown'],fontsize=20,legend=True)
plt.show()

##### 1. Why did you pick the specific chart?

**This chart shows the market segments through which the hotels were booked.**

##### 2. What is/are the insight(s) found from the chart?

**Online TA is the market segment guests use most frequently to book a hotel.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes, it creates a positive business impact: guests use the Online TA market segment most often to book a hotel.**

#### Chart - 8

In [None]:
# Chart - 8 visualization code
#Top 10 countries by guest
guest_country_wise = pd.DataFrame(hotel_booking_df[['country','total_guest']])
guest_country_wise_df = guest_country_wise.groupby(['country'])['total_guest'].sum()
guest_country_wise_df.sort_values(ascending= False, inplace =True)
top_10_country_by_guest = guest_country_wise_df.head(10)

In [None]:
plt.figure(figsize=(15,6))
sns.barplot(x=top_10_country_by_guest.index,y=top_10_country_by_guest).set(title='Top 10 Countries by Guest')
print("\n\nPRT = Portugal\nGBR = United Kingdom\nFRA = France\nESP = Spain\nDEU = Germany\nITA = Italy\nIRL = Ireland\nBRA = Brazil\nBEL = Belgium\nNLD = Netherlands")

##### 1. Why did you pick the specific chart?

**We use this chart to see which countries the majority of visitors come from. A chart of the top 10 nations is displayed.**

##### 2. What is/are the insight(s) found from the chart?

**As we can see, the majority of visitors are from Portugal.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**To increase the number of customers, we can step up our marketing efforts and offer attractive deals to visitors from Portugal.**

#### Chart - 9

In [None]:
# Chart - 9 visualization code
#Average ADR of city hotel and resort hotel
average_adr = hotel_booking_df.groupby('hotel')['adr'].mean()
average_adr
average_adr.plot(kind='barh', color=('g','r'))
plt.xlabel("Average ADR",fontdict={'fontsize':12,'fontweight':5,'color':'brown'})
plt.ylabel("Hotel Name", fontdict={'fontsize':12,'fontweight':5,'color':'Brown'})
plt.title("Average ADR of Hotel", fontdict={'fontsize':12, 'fontweight':5,'color':'Green'})

##### 1. Why did you pick the specific chart?

**This bar chart clearly shows the difference in ADR between the resort hotel and the city hotel.**

##### 2. What is/are the insight(s) found from the chart?

**The average daily rate (ADR) of the City Hotel is higher than that of the Resort Hotel.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes, the insight from this graph is very useful: the average daily rate (ADR) of the City Hotel is higher than that of the Resort Hotel, so we can look for the reasons behind the Resort Hotel's lower ADR and work on them.**

#### Chart - 10

In [None]:
# Chart - 10 visualization code
#Revenue-wise comparison of city and resort hotel
plt.figure(figsize=(8,5))
hotel_wise_revenue = hotel_booking_df.groupby('hotel')['revenue'].sum()
hotel_wise_revenue
ax = hotel_wise_revenue.plot(kind='bar',color=['blue','green'])
plt.xlabel('Hotel', fontdict={'fontsize':12,'fontweight':5,'color':'Brown'})
plt.ylabel('Total Revenue',fontdict={'fontsize':12,'fontweight':5,'color':'Brown'})
plt.title('Total Revenue',fontdict={'fontsize':12,'fontweight':5,'color':'Green'})

##### 1. Why did you pick the specific chart?

**This bar chart clearly shows the difference in revenue between the resort hotel and the city hotel.**

##### 2. What is/are the insight(s) found from the chart?

**The total revenue of the City Hotel is higher than that of the Resort Hotel.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


**Yes, the insight from this graph is very useful: the revenue of the City Hotel is higher than that of the Resort Hotel, so we can look for the reasons behind the Resort Hotel's lower revenue and work on them.**

#### Chart - 11

In [None]:
# Chart - 11 visualization code
# Count of each meal type per hotel (DataFrame.plot creates its own figure, so no plt.figure call is needed)
hotel_wise_meal = hotel_booking_df.groupby(['hotel','meal'])['meal'].count().unstack()
hotel_wise_meal.plot(kind='bar',figsize=(12,8))
plt.show()
hotel_wise_meal

##### 1. Why did you pick the specific chart?

**This chart clearly displays the meal types chosen by visitors at each hotel.**

##### 2. What is/are the insight(s) found from the chart?

**As you can see, guests at both hotels prefer the BB (Bed & Breakfast) meal. To increase customer retention and attract new customers, the hotels could improve their dinner offering.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes. Since guests at both hotels prefer the BB (Bed & Breakfast) meal, improving the dinner offering could increase customer retention and attract new customers.**

#### Chart - 12

In [None]:
#Chart - 12 visualization code
#Chart to indicate the effect of increased ADR on "total stays in night."
plt.figure(figsize=(12,6))
sns.scatterplot(y='total_stay_in_nights',x='adr',data=hotel_booking_df[hotel_booking_df['adr']<1000])
plt.show()

##### 1. Why did you pick the specific chart?

**This scatter plot shows the relationship between ADR and the total number of nights stayed.**

##### 2. What is/are the insight(s) found from the chart?

**The insight we get is that as ADR increases, the total number of nights stayed decreases.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes, the insight can help to improve the business, as it shows an inverse relationship between ADR and the number of nights stayed: if we decrease the ADR by some amount, it may result in an increase in the total number of nights stayed.**
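A scatter plot shows the direction of a relationship but not its strength, so it can help to put a number on it. The sketch below computes a rank (Spearman) correlation on made-up values, applying the same `adr < 1000` outlier filter as the chart; ranking first and then taking the Pearson correlation is used here so that only pandas is required. A correlation by itself does not show that lowering the ADR would cause longer stays.

```python
import pandas as pd

# Made-up bookings; the last row is an extreme ADR outlier.
toy_df = pd.DataFrame({
    "adr": [40.0, 60.0, 90.0, 120.0, 200.0, 5400.0],
    "total_stay_in_nights": [10, 7, 5, 3, 2, 1],
})

# Same outlier filter as the scatter plot.
filtered = toy_df[toy_df["adr"] < 1000]

# Spearman correlation = Pearson correlation of the ranks.
spearman_r = filtered["adr"].rank().corr(filtered["total_stay_in_nights"].rank())

print(round(spearman_r, 3))
```

A value close to -1 indicates that longer stays go with lower rates, a value near 0 indicates no monotonic relationship.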

#### Chart - 13

In [None]:
# Chart - 13 visualization code
#Data visualization code for month wise booking.
order =['January','February','March','April','May','June','July','August','September','October','November','December']
plt.figure(figsize=(12,6))
sns.countplot(x='arrival_date_month',data=hotel_booking_df,order=order)
plt.show()


##### 1. Why did you pick the specific chart?

**This chart shows the month wise hotel booking.**


##### 2. What is/are the insight(s) found from the chart?

**As we can see, the number of bookings is higher in vacation months, like July and August.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Yes. The number of bookings is higher in vacation months, so we can give attractive offers in those months, which can lead to an increase in the number of visiting guests.**
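The month counts behind this chart can also be tabulated in calendar order by treating the month as an ordered categorical. A minimal sketch on seven made-up arrivals (the names `arrivals` and `month_counts` are illustrative):

```python
import pandas as pd

month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

# Seven made-up arrival months.
arrivals = ["July", "August", "January", "July", "March", "August", "July"]

# An ordered categorical sorts by calendar position instead of alphabetically.
arrival_month = pd.Series(pd.Categorical(arrivals, categories=month_order, ordered=True))

# Counts per month in calendar order; months without bookings appear with 0.
month_counts = arrival_month.value_counts().sort_index()
busiest_month = month_counts.idxmax()

print(month_counts)
print("Busiest month:", busiest_month)
```

Without the explicit category order, the months would sort alphabetically (April, August, December, ...), which hides the seasonal pattern.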

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
corr_df = hotel_booking_df[['lead_time','previous_cancellations','previous_bookings_not_canceled','total_guest','booking_changes','days_in_waiting_list','adr','required_car_parking_spaces','total_of_special_requests']].corr()
f,ax = plt.subplots(figsize=(12,12))
sns.heatmap(corr_df,annot=True,fmt='.2f',annot_kws={'size':10}, vmax=1,square=True,cmap='YlGnBu')

##### 1. Why did you pick the specific chart?

**To understand the relationships among the various numerical quantities.**

##### 2. What is/are the insight(s) found from the chart?

**The highest correlation between two variables in the heatmap is 0.39 (positive) and the lowest is -0.09 (negative).**
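Reading the extreme values off a heatmap by eye is error-prone, so the strongest pairs can be extracted from the correlation matrix directly. The sketch below uses a made-up three-column frame, so its values do not match the real heatmap; only the technique carries over.

```python
import numpy as np
import pandas as pd

# Made-up numeric columns.
toy_df = pd.DataFrame({
    "lead_time": [10, 50, 200, 30, 120],
    "adr": [120.0, 90.0, 60.0, 110.0, 75.0],
    "total_guest": [2, 2, 1, 3, 2],
})
corr_df = toy_df.corr()

# Keep only the upper triangle; k=1 drops the diagonal of 1.0 self-correlations.
upper_mask = np.triu(np.ones(corr_df.shape, dtype=bool), k=1)
pairs = corr_df.where(upper_mask).stack().dropna()

strongest_positive = pairs.idxmax()
strongest_negative = pairs.idxmin()

print("Highest:", strongest_positive, round(pairs.max(), 2))
print("Lowest:", strongest_negative, round(pairs.min(), 2))
```

Masking the lower triangle avoids counting each pair twice, and excluding the diagonal prevents the trivial self-correlation of 1.0 from being reported as the maximum.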

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

sns.pairplot(hotel_booking_df)
plt.show()

##### 1. Why did you pick the specific chart?

**This pair plot shows all the pairwise relationships between the different variables in the dataset.**

##### 2. What is/are the insight(s) found from the chart?

**The insight from this graph is very useful: the revenue of the City Hotel is higher than that of the Resort Hotel, so we can look for the reasons behind the Resort Hotel's lower revenue and work on them.**

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
1. For a hotel business to prosper, a few factors must be taken into account, including high revenue generation, customer satisfaction and employee retention.
2. Using the pie chart of month-wise bookings, we can show the client which months bring in the most bookings.
3. Revenue can be increased by using the bar charts to see which room types are booked most often and when visitors are most likely to travel.
4. As a result, the client can prepare properly in advance, minimising long-term complaints and contributing to further improvement of their hospitality.
5. Outliers, such as bookings with far more guests than average, are scattered across the plots. They point to bulk reservations, so encouraging clients to contact the hotel's offices for bulk reservations during the off-season can help generate more revenue.
6. We can display the visitor arrival trend at the client's venues, allowing the client to plan entertainment and leisure activities for visitors in advance.
7. We were also able to correlate the numerical variables and identify the maximum and minimum correlation between them, which shows where the underlying figures could be improved through different channels.

# **Conclusion**

1. City Hotel generates greater revenue and appears to be more popular among travellers.
2. Compared to the other months, the majority of reservations are made for July and August.
3. Travellers favour room type A over all other room types.
4. Portugal and the United Kingdom make the most reservations.
5. The majority of visitors stay in hotels for 1-4 days.
6. The City Hotel retains a large number of visitors.
7. About one-fourth of all reservations are cancelled. City Hotel is the source of more cancellations.
8. Compared to returning customers, new guests cancel reservations more frequently.
9. Booking cancellations are unaffected by lead time, waiting list length or whether the client is assigned the reserved room.
10. Corporate has the highest percentage of repeated guests while TA/TO has the lowest, whereas for cancelled bookings TA/TO has the highest percentage while Corporate has the lowest.
11. The length of the stay decreases as ADR increases, probably to reduce the cost.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***