<a href="https://colab.research.google.com/github/akshatbhuryan/ak/blob/main/Hotel_Booking_EDA_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - **Hotel Booking**

##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

This dataset contains booking information for a city hotel and a resort hotel. It includes details such as when the booking was made, whether the booking was cancelled, the arrival day, week, month and year, the number of weekend and week nights stayed, the length of stay, the number of adults, children and babies, and the number of parking spaces, among other things. The dataset contains 119,390 rows and 32 columns.


# **GitHub Link -**

My GitHub Link.

* **https://github.com/akshatbhuryan**

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following questions are answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import math


### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
filepath = '/content/drive/MyDrive/Hotel Bookings.csv'
df = pd.read_csv(filepath)
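
The guidelines ask for exception handling and a notebook that runs in one go. A minimal sketch of a guarded loader is shown here; `load_bookings` is a hypothetical helper name, not part of the original notebook, and the checks it performs are only one reasonable choice.

```python
from pathlib import Path

import pandas as pd


def load_bookings(path):
    """Read the bookings CSV, failing early with a readable message."""
    csv_path = Path(path)
    # Fail before pandas does, so the error names the missing file clearly.
    if not csv_path.is_file():
        raise FileNotFoundError(f"Bookings file not found: {csv_path}")
    frame = pd.read_csv(csv_path)
    # A header-only file parses fine but leaves nothing to analyse.
    if frame.empty:
        raise ValueError(f"Bookings file has no rows: {csv_path}")
    return frame
```

In the notebook this could be called as `df = load_bookings(filepath)` in place of the direct `pd.read_csv` call.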

### Dataset First View

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

In [None]:
df.shape

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
int(df.duplicated().sum())

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()
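
Raw null counts are easier to act on when shown next to the share of rows they represent. The sketch here uses a small made-up frame (`sample`, with invented values) as a stand-in for `df`; the same expressions should apply to the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for df; values are invented.
sample = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    "children": [0.0, np.nan, 1.0, 0.0],
    "agent": [9.0, np.nan, np.nan, 240.0],
    "company": [np.nan, np.nan, np.nan, 40.0],
})

# Count and percentage of missing values per column.
missing = pd.DataFrame({
    "missing_count": sample.isnull().sum(),
    "missing_pct": (sample.isnull().mean() * 100).round(1),
})

# Keep only columns that actually have gaps, worst first.
missing = missing[missing["missing_count"] > 0].sort_values(
    "missing_count", ascending=False
)
print(missing)
```

A column that is missing in most rows is a candidate for dropping, while a column with only a few gaps is usually better filled.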

### What did you know about your dataset?

The dataset is about hotel bookings. It records what happens at the two hotels: the bookings themselves, the arrival date of each customer, customer details, cancellations, meal types, market segments, the room types reserved and assigned, the reservation status, and so on.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
list(df.columns)

In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description

* **hotel:** H1 = Resort Hotel, H2 = City Hotel

* **is_canceled:** Whether the booking was cancelled (1) or not (0)

* **lead_time:** Number of days that elapsed between the entering date of the booking into the PMS and the arrival date

* **arrival_date_year:** Year of arrival date

* **arrival_date_month:** Month of arrival date

* **arrival_date_week_number:** Week number of arrival date

* **arrival_date_day_of_month:** Day of the month of arrival date

* **stays_in_weekend_nights:** Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel

* **stays_in_week_nights:** Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel

* **adults:** Number of adults

* **children:** Number of children

* **babies:** Number of babies

* **meal:** Kind of meal opted for

* **country:** Country code

* **market_segment:** Which segment the customer belongs to

* **distribution_channel:** Channel through which the booking was made (Corporate, Direct, TA/TO)

* **is_repeated_guest:** Whether the guest has stayed before (1) or is coming for the first time (0)

* **previous_cancellations:** Number of previous bookings that the customer cancelled

* **previous_bookings_not_canceled:** Number of previous bookings that the customer did not cancel

* **reserved_room_type:** Type of room reserved

* **assigned_room_type:** Type of room assigned

* **booking_changes:** Count of changes made to the booking

* **deposit_type:** Type of deposit made for the booking

* **agent:** ID of the agent through which the booking was made

* **days_in_waiting_list:** Number of days the booking spent on the waiting list

* **customer_type:** Type of customer

* **required_car_parking_spaces:** Number of car parking spaces required

* **total_of_special_requests:** Number of additional special requests

* **reservation_status:** Status of the reservation

* **reservation_status_date:** Date on which the reservation status was set

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in df.columns.tolist():
  print("Number of unique values in",i,'is',df[i].nunique())

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Dropping the company column, because out of 119390 rows 112593 rows are null.

df.drop('company',axis=1,inplace=True)
df.shape

In [None]:
# Dropping the duplicates.

# drop_duplicates returns a new DataFrame, so the result has to be assigned back.
df = df.drop_duplicates()
df.shape
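
By default `DataFrame.drop_duplicates` returns a new frame and leaves the original untouched, so the shape only changes when the result is assigned. The sketch here shows the difference on a small made-up frame (`sample` is hypothetical).

```python
import pandas as pd

# Hypothetical frame with one exact duplicate row.
sample = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel"],
    "lead_time": [10, 10, 45],
})

# The result is discarded here, so sample still holds all three rows.
sample.drop_duplicates()
rows_before = len(sample)

# Assigning the result keeps the de-duplicated frame.
deduplicated = sample.drop_duplicates().reset_index(drop=True)
rows_after = len(deduplicated)

print(rows_before, rows_after)
```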

In [None]:
# Filling the NaN values in the agent column with the rounded mean of the column.

df['agent'] = df['agent'].fillna(float(round(np.mean(df['agent']))))
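
If the `agent` column holds agent identifiers rather than quantities, the mean of those identifiers does not correspond to any real agent. One alternative, shown only as a sketch under that assumption, is to mark missing agents with a sentinel value and keep a separate flag. The column name `booked_via_agent` and the sentinel `0` are hypothetical choices, and `sample` is a made-up frame.

```python
import numpy as np
import pandas as pd

# Hypothetical agent IDs with gaps; values are invented.
sample = pd.DataFrame({"agent": [9.0, np.nan, 240.0, np.nan]})

# Record whether an agent was involved before the gaps are filled.
sample["booked_via_agent"] = sample["agent"].notna()

# Use 0 as a "no agent" sentinel instead of an averaged ID.
sample["agent"] = sample["agent"].fillna(0).astype(int)

print(sample)
```

With the flag in place, the number of agent bookings per hotel can be counted directly instead of summing identifiers.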

In [None]:
df['children'] = df['children'].fillna(0)


In [None]:
# converting the data type to datetime.
df['reservation_status_date']= pd.to_datetime(df['reservation_status_date'])

In [None]:
df['customer_type'].unique()

In [None]:
df.groupby('hotel')['adults'].sum()

In [None]:
# Finding all the customers.
df['total_customers'] = df['adults']+df['children']+df['babies']

In [None]:
df.head()

In [None]:
# Getting the unique values from customer type column.
df['customer_type'].unique()

In [None]:
# Converting the arrival year to datetime.
df['arrival_date_year'] = pd.to_datetime(df['arrival_date_year'],format='%Y')

### What all manipulations have you done and insights you found?

First, I checked the null values and the data type of each column. I found that most of the values in the company column are null, so I dropped that column. Then I looked for null values in the other columns and filled them with values that make sense for each column. Finally, I created a new column called total_customers by adding up the columns related to the customers: adults, children and babies.
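
Adding columns with `+` propagates missing values, which is why `children` has to be filled before `total_customers` is computed. A row-wise `sum` is an alternative that skips missing values on its own. The sketch here compares the two on a made-up frame (`sample` is hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical guest counts with one missing children value.
sample = pd.DataFrame({
    "adults": [2, 1, 2],
    "children": [0.0, np.nan, 1.0],
    "babies": [0, 0, 1],
})

# Plain addition: any NaN in a row makes the total NaN.
naive_total = sample["adults"] + sample["children"] + sample["babies"]

# Row-wise sum skips NaN by default, treating it as zero guests.
safe_total = sample[["adults", "children", "babies"]].sum(axis=1).astype(int)

print(naive_total.tolist(), safe_total.tolist())
```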

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

df.groupby('hotel')['total_customers'].sum().plot(kind='bar',color = ['#FFC436','#0174BE'])
plt.title("Hotel preferences of customers", fontsize = 20)
plt.xlabel('Hotels', fontsize = 15)
plt.ylabel('No. of customers', fontsize = 10)
plt.show()


##### 1. Why did you pick the specific chart?

A bar plot represents categorical data with rectangular bars whose lengths are proportional to the values they represent. It makes it easy to compare the total number of customers of the two hotels.

##### 2. What is/are the insight(s) found from the chart?

This chart shows which hotel the customers prefer. The difference is clear: the number of customers for the City hotel is approximately double that of the Resort hotel.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The chart shows a clear difference between the City hotel and the Resort hotel. Most customers prefer the City hotel, which is a positive point for the City hotel. On the other hand, far fewer customers choose the Resort hotel, which is a negative point for the Resort hotel. Customer preference is one of the main drivers of growth.

#### Chart - 2

In [None]:
# Chart - 2 visualization code

df.groupby('hotel')['is_canceled'].sum().plot(kind='pie',
                              figsize=(15,6),
                               autopct="%1.1f%%",
                               startangle=90,
                               shadow=True,
                               title='Hotels cancelled'
                              )

##### 1. Why did you pick the specific chart?

A pie chart is a circular statistical plot that can display only one series of data. The whole circle represents 100% of the data, and the area of each slice represents the share of one part of the data.

##### 2. What is/are the insight(s) found from the chart?

This chart shows how the cancelled bookings are split between the two hotels, as percentages.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

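Cancellations hurt the business of both hotels. The chart shows that nearly 75% of all cancelled bookings were for the City hotel and only about 25% were for the Resort hotel. These figures are each hotel's share of the total cancellations, not the fraction of its own bookings that were cancelled, so they partly reflect the larger number of bookings at the City hotel. Even so, the City hotel carries most of the cancellations, which is bad for its business, while the smaller share is the good part for the Resort hotel.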
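
The pie chart is built from the sum of `is_canceled` per hotel, so each slice is a hotel's share of all cancellations. The cancellation rate within a hotel is a different quantity: the mean of `is_canceled` per hotel. The sketch here computes both on a made-up frame (`sample`, with invented values) to show that they can diverge.

```python
import pandas as pd

# Hypothetical bookings: 6 at the City hotel, 2 at the Resort hotel.
sample = pd.DataFrame({
    "hotel": ["City Hotel"] * 6 + ["Resort Hotel"] * 2,
    "is_canceled": [1, 1, 1, 0, 0, 0, 1, 0],
})

# What the pie chart shows: each hotel's share of all cancellations.
cancelled = sample.groupby("hotel")["is_canceled"].sum()
share_of_cancellations = (cancelled / cancelled.sum() * 100).round(1)

# Cancellation rate: cancelled bookings as a share of that hotel's own bookings.
cancellation_rate = (sample.groupby("hotel")["is_canceled"].mean() * 100).round(1)

print(share_of_cancellations)
print(cancellation_rate)
```

In this invented example the City hotel accounts for 75% of the cancellations, yet both hotels have the same 50% cancellation rate, because the City hotel simply has more bookings.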

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Total weekend nights and week nights booked at each hotel.
nights_by_hotel = df.groupby('hotel')[['stays_in_weekend_nights','stays_in_week_nights']].sum()
nights_by_hotel.plot(kind='bar', title='Stays in week and weekend', xlabel='Hotels',
                     ylabel='No. of nights', grid=True, color=['#FFC436','#0174BE'])

##### 1. Why did you pick the specific chart?

A bar plot represents categorical data with rectangular bars whose lengths are proportional to the values they represent. Grouped bars make it easy to compare week nights and weekend nights side by side for each hotel.

##### 2. What is/are the insight(s) found from the chart?

The chart compares the number of week nights and weekend nights stayed at each of the two hotels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Bookings for the City hotel are higher than for the Resort hotel. Comparing week nights with weekend nights, we can clearly see that guests stay more week nights than weekend nights at both hotels.
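
Raw totals favour week nights partly because a week has more of them. Assuming a weekend means Saturday and Sunday nights, there are 5 week nights for every 2 weekend nights. The sketch here divides each total by the number of such nights in a week to put them on a comparable scale; `sample` is a made-up frame with invented values.

```python
import pandas as pd

# Hypothetical bookings; values are invented.
sample = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "stays_in_weekend_nights": [2, 0, 1, 2, 2],
    "stays_in_week_nights": [3, 5, 2, 5, 3],
})

night_columns = ["stays_in_weekend_nights", "stays_in_week_nights"]
totals = sample.groupby("hotel")[night_columns].sum()

# Assumption: 2 weekend nights and 5 week nights in every calendar week.
nights_per_week = pd.Series({"stays_in_weekend_nights": 2, "stays_in_week_nights": 5})

# Nights stayed per available night of each kind.
normalised = totals.div(nights_per_week, axis="columns")
print(normalised)
```

If the normalised weekend figure is higher than the week figure for a hotel, weekend nights are in higher demand there even though the raw total is lower.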

#### Chart - 4

In [None]:
# sns.barplot(x='hotel',y='agent',data=df)
df.groupby('hotel')['agent'].sum().plot(kind='pie',
                              figsize=(15,6),
                               autopct="%1.1f%%",
                               startangle=90,
                               shadow=True,
                               title='Booking through agent'
                              )

##### 1. Why did you pick the specific chart?

A pie chart is a circular statistical plot that can display only one series of data. The whole circle represents 100% of the data, and the area of each slice represents the share of one part of the data.

##### 2. What is/are the insight(s) found from the chart?

This chart is about bookings made through agents. It shows how those bookings are split between the two hotels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

A large share of the agent bookings belongs to the Resort hotel, while only a small share belongs to the City hotel, as the proportions in the chart clearly show.

#### Chart - 5

In [None]:
df.groupby('arrival_date_year')['total_customers'].sum().plot(kind='line')
plt.xlabel('Years',fontsize=12)
plt.ylabel('No. of customers',fontsize=12)
plt.title('Arrived customers in different years',fontsize=16)
plt.show()

##### 1. Why did you pick the specific chart?

A line chart is a type of chart used to show information that changes over time. Line charts are created by plotting a series of several points and connecting them with a straight line. Line charts are used to track changes over short and long periods.

##### 2. What is/are the insight(s) found from the chart?

This chart shows the number of customers who arrived at the hotels in each year.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The chart clearly shows growth in the number of customers from 2015 to 2016, after which the number starts falling, which is not a good sign for the hotels.
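
Yearly totals are only comparable when every year covers the same number of months. Whether that holds for this dataset can be checked by counting the distinct arrival months per year. The sketch here does so on a made-up frame (`sample`, with invented values and plain integer years) and derives a per-month figure.

```python
import pandas as pd

# Hypothetical arrivals; values are invented.
sample = pd.DataFrame({
    "arrival_date_year": [2015, 2015, 2016, 2016, 2016, 2017],
    "arrival_date_month": ["July", "August", "January", "July", "December", "January"],
    "total_customers": [2, 3, 2, 2, 1, 4],
})

# Total customers and number of distinct months observed in each year.
summary = sample.groupby("arrival_date_year").agg(
    customers=("total_customers", "sum"),
    months_covered=("arrival_date_month", "nunique"),
)

# Average customers per observed month, comparable across partial years.
summary["customers_per_month"] = summary["customers"] / summary["months_covered"]
print(summary)
```

If a year turns out to cover fewer months than the others, a drop in its yearly total does not by itself indicate a fall in demand.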

#### Chart - 6

In [None]:
# Chart - 6 visualization code
df.groupby('hotel')['is_repeated_guest'].sum().plot(kind='bar',color = ['#FFC436','#0174BE'])
plt.title("Number of repeated customers", fontsize = 20)
plt.xlabel('Hotels', fontsize = 15)
plt.ylabel('No. of customers', fontsize = 10)
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot represents categorical data with rectangular bars whose lengths are proportional to the values they represent. It makes it easy to compare the number of repeated customers of the two hotels.

##### 2. What is/are the insight(s) found from the chart?

The chart shows the number of repeated customers, that is, customers who have come to the hotel more than once.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This chart shows that some customers are repeated guests, which means that they liked the hotel and booked it again. This is a very good sign for the growth of the hotels. There is no major difference between the two hotels in the number of repeated customers. This is a positive impact for the business.
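
Because the two hotels have different numbers of bookings, a similar count of repeated guests can hide different repeat rates. The sketch here computes the rate per hotel on a made-up frame; `sample` and its values are invented, and `repeat_rate_pct` is a hypothetical column name.

```python
import pandas as pd

# Hypothetical bookings; values are invented.
sample = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 2,
    "is_repeated_guest": [1, 0, 0, 0, 1, 0],
})

# Repeated guests and total bookings per hotel.
repeat = sample.groupby("hotel")["is_repeated_guest"].agg(
    repeated="sum", bookings="count"
)

# Share of each hotel's bookings that came from repeated guests.
repeat["repeat_rate_pct"] = (repeat["repeated"] / repeat["bookings"] * 100).round(1)
print(repeat)
```

In this invented example both hotels have one repeated guest, but the rate is twice as high at the hotel with fewer bookings.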

## **5. Solution to Business Objective / Conclusion**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

This hotel booking analysis covers two hotels, the City hotel and the Resort hotel. The Resort hotel is much less preferred than the City hotel. On the other hand, a large number of the bookings for the City hotel were cancelled, which is not a good thing for the hotel: customers reach the hotel but, for some reason, cancel the booking.

People mostly prefer to travel on weekends, but the analysis shows that more nights are booked on week nights than on weekend nights. This may happen because the hotels charge more on weekends than on weekdays.

Customers of the Resort hotel mostly depend on agents for booking, which may be one of the main reasons for its low number of bookings. The agent also charges a fee, so there should be a platform where customers can approach the hotels directly.

Looking at year-by-year growth, there was sharp growth between 2015 and 2016, but between 2016 and 2017 the numbers fell drastically, which is not a good sign for the future.

Based on this analysis, the hotels should charge appropriately for their infrastructure and the services they provide. Hotel service plays a vital role in the growth of a hotel: if customers like the hotel service or room service, they are likely to come back to the same hotel and to recommend it to others.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***