<a href="https://colab.research.google.com/github/debobratopaul/CAPSTONE-PROJECT-HOTEL-BOOKING-ANALYSIS/blob/main/EDA_HOTEL_BOOKING.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Hotel Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

The hotel booking analysis project aimed to extract meaningful insights from a comprehensive dataset to optimize revenue generation and improve customer satisfaction in the hotel industry. By employing exploratory data analysis, visualizations, and statistical techniques, the project explored various aspects of hotel bookings, including seasonal patterns, cancellation behavior, market segments, pricing strategies, distribution channels, and customer preferences. The findings provided valuable insights for strategic decision-making to drive positive business outcomes.

One key observation from the analysis was the existence of clear seasonal patterns in hotel bookings, with peak periods occurring during the summer months and holiday seasons. This insight enables hotels to anticipate and prepare for high demand periods, ensuring optimal staffing, inventory management, and pricing strategies. By efficiently allocating resources, hotels can enhance operational efficiency and maximize revenue potential.


Market segmentation analysis played a crucial role in understanding the preferences and behaviors of different customer groups. Online Travel Agents (OTAs) and the Groups segment were identified as significant contributors to hotel bookings. By tailoring pricing strategies and marketing campaigns to these segments, hotels can effectively target their offerings, maximize customer acquisition, and increase revenue.

Pricing analysis across market segments and hotel types unveiled variations in the average daily rate (ADR). This information allows hotels to optimize their pricing strategies, identify competitive advantages, and capture the right market share. By understanding the market dynamics, hotels can align their pricing with customer expectations and market demand, resulting in improved revenue generation.

Distribution channel analysis provided insights into the preferred booking channels of guests. Online channels, especially OTAs and direct hotel websites, emerged as dominant distribution channels. This knowledge empowers hotels to optimize their channel management strategies, negotiate favorable partnerships with OTAs, and invest in their direct booking platforms to improve profitability.

Customer satisfaction and loyalty were explored through the analysis of booking changes and special requests. Understanding common booking modifications and customer preferences enables hotels to enhance their service offerings, improve the guest experience, and foster loyalty. By tailoring services to meet customer expectations, hotels can achieve higher guest satisfaction levels and drive repeat business.

The project also shed light on the geographical origin of guests, providing valuable market insights. Identifying key source markets enables hotels to customize marketing campaigns, target specific regions with high booking volumes, and optimize their promotional efforts. By effectively allocating marketing resources and focusing on high-potential markets, hotels can expand their customer base and increase market share.

The findings highlighted the importance of understanding seasonal patterns, managing cancellations, segmenting the market, implementing dynamic pricing strategies, optimizing distribution channels, and personalizing guest experiences. By leveraging these insights, hotels can make informed decisions, drive positive business impacts, and establish a competitive edge in the dynamic hotel industry. Ultimately, the project demonstrated the significance of data analysis and strategic decision-making in achieving long-term success in the hotel sector.

# **GitHub Link -**

https://github.com/debobratopaul

# **Problem Statement**


**BUSINESS PROBLEM OVERVIEW**

Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay in order to get the best daily rate? This hotel booking dataset can help you explore those questions!
This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. All personally identifying information has been removed from the data.


#### **Define Your Business Objective?**

Identify the key factors that influence bookings in order to optimize revenue and customer satisfaction, and forecast hotel booking demand for different periods of the year so that the hotel can adjust pricing, inventory, and staffing accordingly.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following questions are answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
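
As a rough illustration of the UBM order, the sketch below builds one summary of each kind on a tiny made-up table. The rows and values are hypothetical; only the column names `hotel`, `market_segment`, and `adr` follow this notebook's dataset. Each summary could then be handed to a plotting call.

```python
import pandas as pd

# Hypothetical toy rows; only the column names mirror the hotel bookings data.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "market_segment": ["Online TA", "Direct", "Online TA", "Direct"],
    "adr": [100.0, 80.0, 120.0, 60.0],
})

# U: univariate, one variable at a time (e.g. input for a count plot).
univariate = toy["hotel"].value_counts()

# B: bivariate, numerical vs categorical (e.g. input for a bar plot).
bivariate = toy.groupby("hotel")["adr"].mean()

# M: multivariate, one numerical value across two categoricals (e.g. a heatmap).
multivariate = toy.pivot_table(
    index="hotel", columns="market_segment", values="adr", aggfunc="mean"
)

print(univariate, bivariate, multivariate, sep="\n\n")
```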





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import plotly.express as px
%matplotlib inline

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')
hotel_df=pd.read_csv("/content/drive/MyDrive/Hotel Bookings.csv")
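
The guidelines above ask for exception handling so that the notebook runs in one go. A minimal sketch of a guarded loader is shown below; `load_bookings` is a hypothetical helper name, not part of pandas or of the original notebook, and the two checks are assumptions about what is worth catching early.

```python
from pathlib import Path

import pandas as pd


def load_bookings(csv_path):
    """Read the bookings CSV, failing early with a readable message."""
    path = Path(csv_path)
    # Check the path first so a wrong Drive location is reported clearly.
    if not path.is_file():
        raise FileNotFoundError(f"Bookings file not found: {path}")
    frame = pd.read_csv(path)
    # A header-only file would load without error but break later cells.
    if frame.empty:
        raise ValueError(f"Bookings file has no rows: {path}")
    return frame
```

With this helper, the load step could be written as `hotel_df = load_bookings("/content/drive/MyDrive/Hotel Bookings.csv")`.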

### Dataset First View

In [None]:
# Dataset First Look
hotel_df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
hotel_df.shape

### Dataset Information

In [None]:
# Dataset Info
hotel_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
hotel_df[hotel_df.duplicated()].shape

In [None]:
# Dropping duplicate values
hotel_df.drop_duplicates(inplace = True)
hotel_df.shape

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
hotel_df.isna().sum().sort_values(ascending=False)[:6].reset_index()

In [None]:
# Visualizing the missing values

# Create a heatmap to visualize missing values
plt.figure(figsize=(12, 8))
sns.heatmap(hotel_df.isnull(), cmap='viridis', cbar=False)
plt.title('Missing Values Heatmap', fontsize=16)
plt.show()

### What did you know about your dataset?

This dataset contains booking information for a city hotel and a resort hotel. It has 119390 rows and 32 columns, of which 31994 rows are duplicates. The 'children', 'company', 'country', and 'agent' columns contain null values. The 'company' column is almost 90 percent null, so it is better to drop it before further analysis.

The main goal is to analyze historical booking data to determine the optimal pricing strategy for different room types, seasons, and lengths of stay, and to identify pricing patterns and opportunities to maximize revenue and occupancy rates.
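
The share of nulls per column can be checked directly rather than read off the heatmap. The sketch below does this on a small made-up frame standing in for `hotel_df`; the values and the 50 percent cut-off are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for hotel_df.
toy = pd.DataFrame({
    "company": [np.nan, np.nan, np.nan, 40.0],
    "agent": [9.0, np.nan, 240.0, 9.0],
    "adr": [75.0, 98.0, 110.5, 60.0],
})

# isna().mean() gives the fraction of nulls per column; scale it to percent.
null_pct = (toy.isna().mean() * 100).sort_values(ascending=False)
print(null_pct)

# Columns above a chosen threshold are candidates for dropping.
threshold = 50  # assumed cut-off, not taken from the notebook
to_drop = null_pct[null_pct > threshold].index.tolist()
print(to_drop)
```

Running the same two lines on `hotel_df` would give the exact percentage for the 'company' column.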

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
hotel_df.columns

In [None]:
# Dataset Describe
hotel_df.describe()

### Variables Description

**hotel**: Type of hotel (Resort Hotel, City Hotel)

**is_canceled**: Whether the booking was canceled (True=1, False=0)

**lead_time**: Number of days that elapsed between the entering date of the booking into the PMS and the arrival date

**arrival_date_year**: Year of arrival

**arrival_date_month**: Month of arrival

**arrival_date_week_number**: Week number of the arrival date

**arrival_date_day_of_month**: Day of the month of the arrival date

**stays_in_weekend_nights**: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel

**stays_in_week_nights**: Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel

**adults**: Number of adults

**children**: Number of children

**babies**: Number of babies

**meal**: Kind of meal opted for

**country**: Country code

**market_segment**: Which segment the customer belongs to

**distribution_channel**: How the customer accessed the stay (corporate booking, direct, TA/TO)

**is_repeated_guest**: Whether the guest has stayed before (1) or is coming for the first time (0)

**previous_cancellations**: Number of previous bookings canceled by the customer

**previous_bookings_not_canceled**: Count of previous bookings that were not canceled

**reserved_room_type**: Type of room reserved

**assigned_room_type**: Type of room assigned

**booking_changes**: Count of changes made to the booking

**deposit_type**: Deposit type

**agent**: ID of the agent the booking was made through

**days_in_waiting_list**: Number of days on the waiting list

**customer_type**: Type of customer

**required_car_parking_spaces**: Number of car parking spaces required

**total_of_special_requests**: Number of additional special requests

**reservation_status**: Reservation status

**reservation_status_date**: Date on which the reservation status was last set

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
hotel_df['hotel'].unique()

In [None]:
hotel_df['is_canceled'].unique()

In [None]:
hotel_df['arrival_date_year'].unique()

In [None]:
hotel_df['meal'].unique()

In [None]:
hotel_df['market_segment'].unique()

In [None]:
hotel_df['distribution_channel'].unique()

In [None]:
hotel_df['children'].unique()

In [None]:
hotel_df['is_repeated_guest'].unique()

In [None]:
hotel_df['reserved_room_type'].unique()

In [None]:
hotel_df['deposit_type'].unique()

In [None]:
hotel_df['reservation_status'].unique()

In [None]:
hotel_df['customer_type'].unique()

In [None]:
hotel_df['agent'].unique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# FIRST OF ALL WE WILL CREATE A COPY OF THE ORIGINAL DATAFRAME TO PERFORM THE ANALYSIS
# Creating a copy of dataframe

df = hotel_df.copy()

**Cleaning data**

Cleaning the data is a crucial step before EDA, as it removes ambiguous data that can affect the outcome of the analysis.

**Step1: Handling missing values.**

The company column has more than 80 percent null values, so we drop it.

In [None]:
# Dropping the company columns
df=df.drop("company",axis=1)

The agent column holds agent ID numbers. Some customers may not have booked through any agent, in which case the value is null. We replace the null values in this column with 0.

In [None]:
df['agent'] = df['agent'].fillna(0)

The 'children' column already uses 0 to indicate that no children were in the group that made the booking, so 'nan' values are missing values caused by data-recording errors.

We will replace the null values in this column with the mean number of children.

In [None]:
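# Assign the result back; fillna(inplace=True) on a column selection triggers warnings in newer pandas
df['children'] = df['children'].fillna(df['children'].mean())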
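
Because a later step casts 'children' to `int64`, a fractional mean is truncated on conversion. The sketch below shows the effect on a made-up series and compares it with filling the most frequent value. The data is hypothetical, and the mode fill is offered only as a possible alternative, not as what this notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical child counts with one missing entry.
children = pd.Series([0, 0, 1, 2, np.nan])

# Mean imputation, as in the cell above: the gap becomes 0.75.
mean_filled = children.fillna(children.mean())

# Casting to int64 truncates the fraction, so the imputed value ends up as 0.
mean_as_int = mean_filled.astype("int64")

# Alternative (assumption): fill with the most frequent value, already a whole number.
mode_filled = children.fillna(children.mode().iloc[0]).astype("int64")

print(mean_filled.iloc[-1], mean_as_int.iloc[-1], mode_filled.iloc[-1])
```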

The next column with missing values is 'country', which represents the customer's country of origin. Since this column has a string datatype, we replace the missing values with 'others'.

In [None]:
# Assign the result back rather than filling in place on a column selection
df['country'] = df['country'].fillna('others')

Some rows have a combined total of zero guests (adults, children, and babies), so we remove them.

In [None]:
df[df['adults']+df['babies']+df['children'] == 0].shape

In [None]:
df.drop(df[df['adults']+df['babies']+df['children'] == 0].index, inplace = True)

**Step 2: Converting columns to appropriate datatypes.**

In [None]:
df[['children', 'agent']] = df[['children', 'agent']].astype('int64')

# Convert 'reservation_status_date' column to datetime format
df['reservation_status_date'] = pd.to_datetime(df['reservation_status_date'])

**Step 3: Adding important columns.**

In [None]:
# Adding total staying days in hotels
df['total_stay'] = df['stays_in_weekend_nights']+df['stays_in_week_nights']

# Adding total people num as column, i.e. total people num = num of adults + children + babies
df['total_people'] = df['adults']+df['children']+df['babies']

**Analysis**

In [None]:
# Count the number of bookings for each hotel type
hotel_type_bookings = df['hotel'].value_counts()

# Calculate the average lead time for bookings
avg_lead_time = df['lead_time'].mean()

# Calculate the average length of stay for bookings
avg_length_of_stay = df['total_stay'].mean()

# Calculate the average number of booking changes
average_booking_changes = df['booking_changes'].mean()

# Calculate the average daily rate (ADR)
average_adr = df['adr'].mean()

# Print the results
print("Each hotel type booking-",hotel_type_bookings)
print("Average lead time for bookings:", avg_lead_time)
print("Average length of stay for bookings:", avg_length_of_stay)
print("Average number of booking changes:", average_booking_changes)
print("Average Daily Rate (ADR):", average_adr)


Next, we find the yearly bookings and cancellations.

In [None]:
# Calculate the total number of bookings
total_bookings_per_year = df.groupby('arrival_date_year')['hotel'].count()
print("Total bookings per year-",total_bookings_per_year)

# Calculate the total number of cancellations
canceled_bookings_df = df[df['is_canceled'] == 1]
total_cancellations_per_year = canceled_bookings_df.groupby('arrival_date_year')['hotel'].count()
print("Total cancellations per year-",total_cancellations_per_year)



### What all manipulations have you done and insights you found?

# **Manipulations**

All the duplicate rows have been removed.

The null values in the agent column have been replaced by 0, and the company column has been dropped because of its large number of null values. The null values in the children column have been replaced by the column mean, and the null values in the country column by 'others'.

Rows in which the combined number of adults, children, and babies is zero have been removed.

The reservation_status_date column has been converted to datetime format to enable date-based analysis, and the children and agent columns have been converted to int.

New columns total_stay and total_people have been added.


# **Insights**

Unique values of each columns have been identified.


Each hotel type booking-

City Hotel  -    53274

Resort Hotel  -  33956



Average lead time for bookings: 79.97101914478964

Average length of stay for bookings: 3.6285337613206465

Average number of booking changes: 0.268497076693798

Average Daily Rate (ADR): 106.51803072337499

Total bookings per year-

2015  -  13284

2016  -  42313

2017   -   31633

Total cancellations per year-

2015  -   2703

2016  -  11200

2017  -  10106
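
The raw counts above are easier to compare as rates. The short sketch below divides the yearly cancellations by the yearly bookings, using only the figures already listed; the variable names are illustrative.

```python
# Counts copied from the yearly totals listed above.
bookings = {2015: 13284, 2016: 42313, 2017: 31633}
cancellations = {2015: 2703, 2016: 11200, 2017: 10106}

# Cancellation rate = cancelled bookings / total bookings, as a percentage.
rates = {
    year: round(100 * cancellations[year] / bookings[year], 2)
    for year in bookings
}

for year, rate in rates.items():
    print(f"{year}: {rate}% of bookings cancelled")
```

On these figures the cancellation rate rises from roughly 20 percent in 2015 to roughly 32 percent in 2017.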

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***