<a href="https://colab.research.google.com/github/Jatingpt/Hotel-Booking-Analysis/blob/main/Hotel_Booking_Analysis_Capstone_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Problem Statement:**

## **Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay in order to get the best daily rate? What if you wanted to predict whether or not a hotel was likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore those questions!**
## **Explore and analyze the data to discover important factors that govern the bookings.**

## **About Dataset**

## **This dataset contains booking information for a city hotel and a resort hotel. It includes when the booking was made, the length of stay, the number of adults, children and/or babies, and the number of car parking spaces required, among other things. All personally identifying information has been removed from the data.**

## **Approach used:**

The approach we used in this project is outlined below:

1) **Loading the data:** We loaded the dataset into a Colab notebook by reading the CSV file.

2) **Data Cleaning and Processing:** We removed null values where possible and, for some columns, replaced them with appropriate values based on reasonable assumptions.

3) **Analysis and Visualization:** We first explored the variables that can play an important role in the analysis, then examined how they affect one another, and finally answered our hypothetical questions.
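As a rough illustration of the cleaning step, the sketch below fills NaNs in the columns that typically contain gaps in this dataset (`children`, `country`, `agent`, `company`). The helper name `handle_missing_values` and the fill values (0 for the count/ID columns, `"Unknown"` for `country`) are assumptions for illustration, not necessarily the choices made later in this notebook.

```python
import numpy as np
import pandas as pd


def handle_missing_values(df: pd.DataFrame) -> pd.DataFrame:
    """Fill NaNs with placeholder values (hypothetical choices, not the notebook's own)."""
    cleaned = df.copy()
    # Assumption: a missing children count means the booking had no children.
    cleaned["children"] = cleaned["children"].fillna(0)
    # Assumption: agent/company are IDs, and NaN means "no agent/company", encoded as 0.
    cleaned[["agent", "company"]] = cleaned[["agent", "company"]].fillna(0)
    # An unknown country of origin gets an explicit label instead of NaN.
    cleaned["country"] = cleaned["country"].fillna("Unknown")
    return cleaned


# Tiny illustrative frame that reuses the dataset's column names.
sample = pd.DataFrame({
    "children": [0.0, np.nan, 2.0],
    "agent": [304.0, np.nan, 9.0],
    "company": [np.nan, np.nan, 40.0],
    "country": ["PRT", None, "GBR"],
})
cleaned = handle_missing_values(sample)
print(cleaned.isnull().sum().sum())  # 0
```

Working on a copy keeps the raw frame available for comparison, which makes it easier to check how many values each assumption actually touched.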

## **Python libraries used:**

* **NumPy**

* **Pandas**

* **Seaborn**

* **Matplotlib**

* **klib**




# **What is EDA?** 

**EDA** stands for **“Exploratory Data Analysis”**. EDA is used to **investigate** the data and **summarize** its key **insights**.
It gives you a basic understanding of your data: its **distribution**, null values and much more.
You can explore data either with graphs or with **Python functions**.

The following steps are involved in the **process of EDA:**

* **Acquiring and loading data**
* **Understanding the variables**
* **Cleaning dataset**
* **Exploring and Visualizing Data**
* **Analyzing relationships between variables**
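The "understanding the variables" step can be sketched with plain pandas. `quick_overview` is a hypothetical helper, and the three-row frame only mimics a few of the real columns; on the full data the same calls report the shape, type, missing-value count and cardinality of every column.

```python
import pandas as pd


def quick_overview(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, number of NaNs and number of distinct non-null values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isnull().sum(),
        "n_unique": df.nunique(),
    })


# Illustrative rows only; values echo the dataset's style.
sample = pd.DataFrame({
    "hotel": ["Resort Hotel", "Resort Hotel", "City Hotel"],
    "lead_time": [342, 7, 23],
    "agent": [None, 304.0, 394.0],
})
summary = quick_overview(sample)
print(sample.shape)       # (rows, columns)
print(summary)
print(sample.describe())  # count/mean/std/min/quartiles/max of numeric columns
```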

## **Understanding the column names**
- **hotel** - Name of the hotel (City Hotel or Resort Hotel)
- **is_canceled** - Whether the booking was canceled (0 = not canceled, 1 = canceled)
- **lead_time** - Number of days between the booking date and the arrival date
- **arrival_date_year** - Year of arrival
- **arrival_date_month** - Month of arrival
- **arrival_date_week_number** - Week number of the arrival date
- **arrival_date_day_of_month** - Day of the month of the arrival date
- **stays_in_weekend_nights** - Number of weekend nights (Saturday or Sunday) booked at the hotel
- **stays_in_week_nights** - Number of week nights (Monday to Friday) booked at the hotel
- **adults** - Number of adults in the booking
- **children** - Number of children in the booking
- **babies** - Number of babies in the booking
- **meal** - Type of meal booked
- **country** - Country of origin of the customer (as stated by them)
- **market_segment** - Market segment through which the booking was made
- **distribution_channel** - Channel through which the booking was made
- **is_repeated_guest** - Whether the customer had made a booking before (0 = no, 1 = yes)
- **previous_cancellations** - Number of previous bookings canceled by the customer
- **previous_bookings_not_canceled** - Number of previous bookings not canceled by the customer
- **reserved_room_type** - Room type reserved by the customer
- **assigned_room_type** - Room type assigned to the customer
- **booking_changes** - Number of changes made to the booking
- **deposit_type** - Type of deposit made for the booking (No Deposit / Refundable / Non Refund)
- **agent** - ID of the travel agency that made the booking
- **company** - ID of the company that made the booking
- **days_in_waiting_list** - Number of days the booking was on the waiting list before being confirmed
- **customer_type** - Type of customer (Transient, Group, etc.)
- **adr** - Average daily rate (total lodging charges divided by the number of nights stayed)
- **required_car_parking_spaces** - Number of car parking spaces required by the customer
- **total_of_special_requests** - Total number of special requests made by the customer
- **reservation_status** - Last status of the reservation: checked out, canceled or no-show
- **reservation_status_date** - Date on which the last reservation status was set

- **Total number of rows in data** - 119390
- **Total number of columns** - 32
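Several of these columns combine naturally into per-booking totals. The sketch below is one possible way to derive them; the helper `add_stay_features` and the new columns `total_stay`, `total_guests` and `booking_value` are hypothetical, and `booking_value` assumes that `adr` times the number of nights approximates the lodging revenue of a booking. The stay, adult and `adr` values follow rows 2 to 4 of the loaded data; the `children`/`babies` values are made up.

```python
import pandas as pd


def add_stay_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive per-booking totals from existing columns (hypothetical helper)."""
    out = df.copy()
    # Total nights = weekend nights + week nights
    out["total_stay"] = out["stays_in_weekend_nights"] + out["stays_in_week_nights"]
    # children can be NaN in the raw file; a missing count is treated as 0 here
    out["total_guests"] = out["adults"] + out["children"].fillna(0) + out["babies"]
    # adr is a per-night rate, so adr * nights approximates the booking's lodging revenue
    out["booking_value"] = out["adr"] * out["total_stay"]
    return out


sample = pd.DataFrame({
    "stays_in_weekend_nights": [0, 0, 0],
    "stays_in_week_nights": [1, 1, 2],
    "adults": [1, 1, 2],
    "children": [0, None, 0],  # illustrative, includes one missing value
    "babies": [0, 0, 0],       # illustrative
    "adr": [75.0, 75.0, 98.0],
})
featured = add_stay_features(sample)
print(featured[["total_stay", "total_guests", "booking_value"]])
```

On the full data, a quick `df.shape == (119390, 32)` check before adding such columns confirms that the expected file was loaded.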



# **Table of Contents**
- Loading Data

- Checking for NaN values

- Handling NaNs

- Analysis

**We performed EDA and tried to answer the following questions:**

Question 1) From which country do the most customers book hotels?

Question 2) Tabulate all countries with their number of repeated customers, showing the countries with the highest and lowest counts.

Question 3) Which hotel has the higher chance of cancellation?

Question 4) How often do guests repeat their stay at each of the two hotels?

Question 5) Which hotel has the longer waiting time?

Question 6) Find the number of customers who booked the Resort Hotel and the City Hotel and did not cancel their booking.

Question 7) For both the Resort Hotel and the City Hotel, find the three months with the most bookings and the average rent across all months.

Question 8) Find the average rent and waiting time for different types of customers for the City Hotel and the Resort Hotel.

Question 9) Find the agent who made the most bookings for the Resort Hotel and for the City Hotel.

Question 10) Find the number of customers who booked through each mode, and the mode through which the most bookings were made.

Question 11) Find the most popular room types booked and their respective rents.
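Several of these questions reduce to a `value_counts` or `groupby` on one or two columns. The sketch below answers Questions 1, 3 and 5 on a tiny made-up frame that reuses the dataset's column names; it is an illustration of the pandas pattern, not the analysis performed later in this notebook.

```python
import pandas as pd

# Made-up bookings; only the column names match the real dataset.
bookings = pd.DataFrame({
    "hotel": ["Resort Hotel", "Resort Hotel", "City Hotel", "City Hotel", "City Hotel"],
    "country": ["PRT", "GBR", "PRT", "PRT", "FRA"],
    "is_canceled": [0, 1, 1, 1, 0],
    "days_in_waiting_list": [0, 0, 10, 0, 5],
})

# Question 1: country with the most bookings
top_country = bookings["country"].value_counts().idxmax()

# Question 3: cancellation rate per hotel (the mean of a 0/1 flag is the share of 1s)
cancel_rate = bookings.groupby("hotel")["is_canceled"].mean()

# Question 5: average number of waiting-list days per hotel
avg_wait = bookings.groupby("hotel")["days_in_waiting_list"].mean()

print(top_country)  # PRT
print(cancel_rate)
print(avg_wait)
```

Using the mean of `is_canceled` rather than a raw count of cancellations keeps the comparison fair when the two hotels have very different booking volumes.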

In [1]:
# Install klib (not preinstalled in Colab)
!pip install klib

# Data-handling and plotting libraries
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import klib

# Render plots inline in the notebook
%matplotlib inline

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting klib
  Downloading klib-1.0.1-py3-none-any.whl (20 kB)
Collecting Jinja2<4.0.0,>=3.0.3
  Downloading Jinja2-3.1.2-py3-none-any.whl (133 kB)
Installing collected packages: Jinja2, klib
  Attempting uninstall: Jinja2
    Found existing installation: Jinja2 2.11.3
    Uninstalling Jinja2-2.11.3:
      Successfully uninstalled Jinja2-2.11.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
flask 1.1.4 requires Jinja2<3.0,>=2.10.1, but you have jinja2 3.1.2 which is incompatible.
Successfully installed Jinja2-3.1.2 klib-1.0.1


## **Mounting Google Drive**

In [2]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


## **Loading the data**

In [4]:
file_path = '/content/drive/MyDrive/Copy of Hotel Bookings.csv'
df = pd.read_csv(file_path)
df

| | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | ... | deposit_type | agent | company | days_in_waiting_list | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Resort Hotel | 0 | 342 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | No Deposit | | | 0 | Transient | 0.00 | 0 | 0 | Check-Out | 2015-07-01 |
| 1 | Resort Hotel | 0 | 737 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | No Deposit | | | 0 | Transient | 0.00 | 0 | 0 | Check-Out | 2015-07-01 |
| 2 | Resort Hotel | 0 | 7 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | No Deposit | | | 0 | Transient | 75.00 | 0 | 0 | Check-Out | 2015-07-02 |
| 3 | Resort Hotel | 0 | 13 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | No Deposit | 304.0 | | 0 | Transient | 75.00 | 0 | 0 | Check-Out | 2015-07-02 |
| 4 | Resort Hotel | 0 | 14 | 2015 | July | 27 | 1 | 0 | 2 | 2 | ... | No Deposit | 240.0 | | 0 | Transient | 98.00 | 0 | 1 | Check-Out | 2015-07-03 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 119385 | City Hotel | 0 | 23 | 2017 | August | 35 | 30 | 2 | 5 | 2 | ... | No Deposit | 394.0 | | 0 | Transient | 96.14 | 0 | 0 | Check-Out | 2017-09-06 |
| 119386 | City Hotel | 0 | 102 | 2017 | August | 35 | 31 | 2 | 5 | 3 | ... | No Deposit | 9.0 | | 0 | Transient | 225.43 | 0 | 2 | Check-Out | 2017-09-07 |
| 119387 | City Hotel | 0 | 34 | 2017 | August | 35 | 31 | 2 | 5 | 2 | ... | No Deposit | 9.0 | | 0 | Transient | 157.71 | 0 | 4 | Check-Out | 2017-09-07 |
| 119388 | City Hotel | 0 | 109 | 2017 | August | 35 | 31 | 2 | 5 | 2 | ... | No Deposit | 89.0 | | 0 | Transient | 104.40 | 0 | 0 | Check-Out | 2017-09-07 |
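The arrival date is spread across `arrival_date_year`, `arrival_date_month` (full month name) and `arrival_date_day_of_month`. One way to combine them, and to make months sort in calendar order rather than alphabetically, is sketched below. The helper `add_arrival_date` and the `arrival_date` column are hypothetical additions; the first two sample rows echo dates from the output above, and the third is made up.

```python
import pandas as pd

# Chronological order of the full month names used in arrival_date_month
MONTH_ORDER = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def add_arrival_date(df: pd.DataFrame) -> pd.DataFrame:
    """Combine the year/month/day columns into one datetime column (hypothetical helper)."""
    out = df.copy()
    date_str = (
        out["arrival_date_year"].astype(str)
        + "-" + out["arrival_date_month"]
        + "-" + out["arrival_date_day_of_month"].astype(str).str.zfill(2)
    )
    # %B parses full English month names such as "July"
    out["arrival_date"] = pd.to_datetime(date_str, format="%Y-%B-%d")
    # An ordered categorical makes sorting and grouping follow the calendar, not the alphabet
    out["arrival_date_month"] = pd.Categorical(
        out["arrival_date_month"], categories=MONTH_ORDER, ordered=True
    )
    return out


sample = pd.DataFrame({
    "arrival_date_year": [2017, 2015, 2016],
    "arrival_date_month": ["August", "July", "January"],
    "arrival_date_day_of_month": [31, 1, 5],
})
dated = add_arrival_date(sample)
print(dated.sort_values("arrival_date_month"))
```

With the month column ordered this way, `df.groupby("arrival_date_month", observed=True).size()` lists monthly booking counts in calendar order, which is convenient when looking for the busiest months.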
