<a href="https://colab.research.google.com/github/AnjanaAnoop/Hotel-Booking-Analysis-EDA-Project/blob/main/Hotel_Booking_Analysis_EDA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Hotel Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** Anjana K
##### **Team Member 2 -**


# **Project Summary -**

The Hotel Booking Analysis project uses real-world booking records for a city hotel and a resort hotel covering the period 2015 - 2017. The data includes the type of hotel booked, average daily rate (ADR), booking details, arrival date, length of stay, the number of adults, children, and/or babies, customer country, meal preferences, customer type, parking space details, reservation status, booking channels, cancellation details, and booking lead time, among other details. Data analysis and data visualization are performed using Python libraries.

I started the Hotel Booking Analysis project to analyse the data and explore the key factors that govern hotel bookings. I loaded the provided hotel bookings dataset (a CSV file) into a pandas DataFrame. I used the df.info() method to learn more about the dataset and the value_counts() method to check how many data types it contains. Columns with a high share of missing values are removed using the drop() method. I performed data wrangling using the value_counts() and sort_values() methods to count and sort the values. Finally, the visualization follows the "UBM" structure: U - Univariate Analysis, B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical), and M - Multivariate Analysis, using the matplotlib and seaborn libraries.

This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things.

All personally identifying information has been removed from the data.

# **GitHub Link -**

https://github.com/AnjanaAnoop

# **Problem Statement**


**The hotel bookings dataset has information about the number of adults, children, and/or babies, booking cancellation details, the length of the stay, distribution channel details, among other details. The task is to perform data analysis and visualization to explore the key factors that govern hotel bookings.**

#### **Define Your Business Objective?**

The main objective of this project is to explore the key factors driving hotel bookings, namely:

*   To know which hotel generates more revenue.
*   To identify the most common customer type.
*   To find the most preferred length of stay in each hotel.
*   To understand the peak season.
*   To know the meal type most preferred by customers.
*   To find the percentage of bookings in each hotel.
*   To identify which hotel has the highest booking cancellation percentage.
*   To find which country's customers make the most bookings.
*   To know the average ADR for each hotel.
*   To identify which room type is in most demand and which room type generates the highest ADR.
*   To know the most preferred distribution channel for bookings.
*   To know which distribution channel has the highest cancellation percentage.
*   To identify which types of customers make the most bookings.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.preprocessing import StandardScaler
pd.set_option('display.max_columns', 500)
# To ignore the warnings
import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

### Dataset First View

In [None]:
# Dataset First Look - Reading and viewing the csv file
df = pd.read_csv('/content/drive/MyDrive/Hotel Booking Analysis EDA/Hotel Bookings.csv')
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
row,column = df.shape
print("Total number of rows in dataframe :",row)
print("Total number of columns in dataframe :",column)


### Dataset Information

In [None]:
# Dataset Info - To get a concise summary of the dataframe
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df[df.duplicated()].shape

In [None]:
# Dropping Duplicate values
df.drop_duplicates(inplace = True)

In [None]:
# Dataset shape after removing the duplicates
df.shape

#### Missing Values/Null Values

In [None]:
# Checking the missing values in the columns
df.isnull().sum().sort_values(ascending = False)

We can see that we have 4 columns (company, agent, country, children) with missing values. Let's check these values as percentages.

In [None]:
# Column-wise null percentage
round(100*(df.isnull().sum().sort_values(ascending = False)/len(df.index)),2)

The columns “agent” and “company” have a high percentage of missing values. As these columns won’t be relevant for our analysis, we can delete them.
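The same decision can be made from a cut-off rather than by reading the table. The sketch below is illustrative only: it runs on a small made-up frame, and the 50% `threshold` is a hypothetical value chosen for the example, not a figure taken from this analysis.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the bookings data (values are made up).
toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    "agent": [9.0, np.nan, np.nan, np.nan],
    "company": [np.nan, np.nan, np.nan, np.nan],
    "children": [0.0, 1.0, np.nan, 2.0],
})

# Share of missing values per column, as a percentage.
# isnull().mean() is equivalent to isnull().sum() / len(toy).
null_pct = toy.isnull().mean().mul(100).round(2)

# Hypothetical cut-off: drop columns with more than 50% missing values.
threshold = 50
cols_to_drop = null_pct[null_pct > threshold].index.tolist()
trimmed = toy.drop(columns=cols_to_drop)

print(null_pct.sort_values(ascending=False))
print("Dropped columns:", cols_to_drop)
```

A cut-off keeps the rule explicit and repeatable, but whether a sparse column is also irrelevant to the analysis remains a judgment call.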

In [None]:
# Deleting 'agent' and 'company' columns
df.drop(['agent','company'],axis=1,inplace = True)

The columns “children” and “country” have a low percentage of missing values. We will remove every row that has a missing value in these columns.

In [None]:
# Deleting rows with empty cells
df.dropna(axis = 0, inplace = True)

Let's check the missing values again.

In [None]:
# Checking the missing values in the columns
df.isnull().sum().sort_values(ascending = False)

In [None]:
# Visualizing the missing values
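One possible way to fill this cell is a bar chart of the missing-value percentage per column. By this point the rows and columns with missing values have already been dropped from `df`, so such a chart is only informative on the data as originally loaded. The sketch below therefore uses a small made-up frame named `raw` (a hypothetical name) in place of the unmodified dataset.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for the dataset as loaded, before any cleaning.
raw = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    "agent": [9.0, np.nan, np.nan, np.nan],
    "company": [np.nan, np.nan, np.nan, np.nan],
    "children": [0.0, 1.0, np.nan, 2.0],
})

# Percentage of missing values per column, largest first.
null_pct = raw.isnull().mean().mul(100).sort_values(ascending=False)

# Bar chart: one bar per column.
fig, ax = plt.subplots(figsize=(6, 3))
null_pct.plot(kind="bar", ax=ax)
ax.set_ylabel("Missing values (%)")
ax.set_title("Share of missing values per column")
fig.tight_layout()
```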

### What do you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe - To get some basic statistical details of the numerical columns
df.describe()

### Variables Description

**The columns and the data they represent are listed below :**

1.   hotel : Type of hotel (City or Resort)
2.   is_canceled : Whether the booking was cancelled (1) or not (0)
3.   lead_time : Number of days between the booking date and the arrival date
4.   arrival_date_year : Year of arrival date
5.   arrival_date_month : Month of arrival date
6.   arrival_date_week_number : Week number of arrival date
7.   arrival_date_day_of_month : Day of arrival date
8.   stays_in_weekend_nights : Number of weekend nights spent at the hotel by the guests
9.   stays_in_week_nights : Number of week nights spent at the hotel by the guests
10.  adults : Number of adults among the guests
11.  children : Number of children among the guests
12.  babies : Number of babies among the guests
13.  meal : Type of meal booked
14.  country : Country of guests
15.  market_segment : Designation of the market segment
16.  distribution_channel : Name of booking distribution channel
17.  is_repeated_guest : Whether the booking was made by a repeated guest (1) or not (0)
18.  previous_cancellations : Number of previous bookings that were cancelled by the customer prior to the current booking
19.  previous_bookings_not_canceled : Number of previous bookings that were not cancelled by the customer prior to the current booking
20.  reserved_room_type : Code of room type reserved
21.  assigned_room_type : Code of room type assigned




### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in df.columns:
  if df[col].nunique()<500:
    print(f"The unique values in {col} column are :")
    print(df[col].unique())
    print('\n')

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
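As a starting point for this cell, two derived columns are useful for the objectives above: total length of stay and total number of guests. This is a minimal sketch on made-up rows. The source column names come from the variables description, while `total_stay` and `total_guests` are hypothetical names introduced here, and dropping zero-guest rows is an assumption about what counts as a valid booking.

```python
import pandas as pd

# Made-up rows using column names from the variables description.
bookings = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "stays_in_weekend_nights": [0, 2, 1, 0],
    "stays_in_week_nights": [2, 3, 4, 0],
    "adults": [2, 1, 2, 0],
    "children": [0.0, 1.0, 2.0, 0.0],
    "babies": [0, 0, 1, 0],
})

# A count column that once held missing values is typically stored as float;
# with the missing rows removed it can be cast back to integers.
bookings["children"] = bookings["children"].astype(int)

# Total nights per booking (weekend nights + week nights).
bookings["total_stay"] = (
    bookings["stays_in_weekend_nights"] + bookings["stays_in_week_nights"]
)

# Total guests per booking (adults + children + babies).
bookings["total_guests"] = bookings[["adults", "children", "babies"]].sum(axis=1)

# Assumption: a booking with no guests at all is treated as invalid and removed.
bookings = bookings[bookings["total_guests"] > 0]

print(bookings[["hotel", "total_stay", "total_guests"]])
```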

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
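A univariate chart that fits the "percentage of bookings in each hotel" objective is a pie chart of the `hotel` column. The sketch below is one possible version and uses made-up values; in the notebook the series would be `df['hotel']`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for df['hotel'].
hotel = pd.Series(["City Hotel"] * 3 + ["Resort Hotel"], name="hotel")

# Percentage of bookings per hotel.
share = hotel.value_counts(normalize=True).mul(100).round(1)

# Pie chart with the percentage printed on each wedge.
fig, ax = plt.subplots(figsize=(4, 4))
ax.pie(share.values, labels=share.index.tolist(), autopct="%1.1f%%", startangle=90)
ax.set_title("Share of bookings by hotel")

print(share)
```

A pie chart is reasonable here only because there are two categories; with many categories a bar chart is usually easier to read.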

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code
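For the cancellation objective, a bivariate (categorical - numerical) option is a bar chart of the cancellation percentage per hotel. Because `is_canceled` is coded 1/0, its mean within each group is the cancellation rate. The following sketch uses made-up rows and is not the notebook's actual result.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; is_canceled is 1 for a cancelled booking and 0 otherwise.
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "is_canceled": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 column = fraction of cancelled bookings; scale to percent.
cancel_pct = toy.groupby("hotel")["is_canceled"].mean().mul(100)

# One bar per hotel.
fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(cancel_pct.index.tolist(), cancel_pct.values)
ax.set_ylabel("Cancelled bookings (%)")
ax.set_title("Cancellation rate by hotel")
fig.tight_layout()

print(cancel_pct)
```

Replacing `"hotel"` with `"distribution_channel"` in the `groupby` would give the same view per booking channel.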

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code
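To look at the peak season, bookings can be counted per arrival month. `value_counts()` sorts by frequency, so the counts need to be reindexed into calendar order before plotting. The sketch below assumes that `arrival_date_month` stores full English month names, and it runs on made-up values.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for df['arrival_date_month'].
arrivals = pd.Series(
    ["July", "August", "August", "January", "July", "August"],
    name="arrival_date_month",
)

# Calendar order, written out so the result does not depend on locale settings.
month_order = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Count bookings per month, then put the months in calendar order;
# months with no bookings get a count of 0.
monthly = arrivals.value_counts().reindex(month_order, fill_value=0)
peak_month = monthly.idxmax()

# Line chart of bookings across the year.
fig, ax = plt.subplots(figsize=(7, 3))
ax.plot(monthly.index.tolist(), monthly.values, marker="o")
ax.set_ylabel("Number of bookings")
ax.set_title("Bookings by arrival month")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()

print("Peak month:", peak_month)
```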

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
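A correlation heatmap only makes sense for numeric columns, and recent pandas versions raise an error if `corr()` is called on a frame that still contains text columns. The sketch below, on made-up rows, selects the numeric columns first and then draws the matrix with seaborn. The colour map and figure size are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows mixing one text column with numeric ones.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel", "City Hotel"],
    "lead_time": [10, 50, 100, 200, 300],
    "stays_in_week_nights": [1, 2, 3, 4, 5],
    "is_canceled": [0, 0, 1, 1, 1],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = toy.select_dtypes(include="number").corr()

# Annotated heatmap; fixing the colour scale to [-1, 1] keeps 0 at the midpoint.
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation between numeric columns")
fig.tight_layout()
```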

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***