<a href="https://colab.research.google.com/github/SUDHANSHU0/Almabatter/blob/main/SUDHANHU_RAI_EDAProject_HotelBookings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Hotel Booking



##### **Project Type**    - EDA
##### **Contribution**    - SUDHANSHU RAI (Individual)

# **Project Summary -**

**Hotel Booking Analysis – Project Summary**

**Introduction**

This project focuses on performing Exploratory Data Analysis (EDA) on a hotel booking dataset to understand customer booking behavior, uncover trends, and gain actionable insights. EDA plays a crucial role in any data science project, as it helps us understand the data’s structure, quality, and relationships before applying any advanced modeling or predictive techniques.

**About the Dataset**

The dataset consists of booking records from two types of hotels: City Hotels and Resort Hotels. Each record provides detailed information such as the type of hotel, booking and arrival dates, the number of adults, children, and babies, country of origin, room details, customer type, and whether or not the booking was canceled. These variables allow for a thorough analysis of booking trends and customer behavior.

**Step 1: Data Cleaning**

Before diving into the analysis, the dataset was cleaned to ensure accuracy and reliability. This involved identifying and handling missing values, such as filling in missing entries for variables like “children,” “agent,” “company,” and “country.” Duplicated records were also removed to avoid skewed results. Additionally, data types for some columns were converted to the appropriate format (e.g., converting agent and company IDs to integers), and new calculated columns like total stay duration and total number of guests were created to enrich the analysis.

**Step 2: Exploring the Data**

After cleaning the data, I performed descriptive and visual exploration:

**▪ Basic Statistical Analysis:**
I calculated the average values for variables such as lead time (the number of days between booking and arrival), length of stay (both weekend and weekday nights), and the number of adults, children, and babies. These statistics provided a general understanding of guest behavior and preferences.

**▪ Data Visualization Techniques:**
To gain deeper insights, I used a variety of visualizations:

* Bar charts to compare the number of canceled and non-canceled bookings by hotel type.

* Pie charts to show the distribution of customer types (e.g., Transient, Group).

* Scatter plots to examine the relationship between total stay duration and the number of special requests.

 **Key Insights from the Analysis**

**Hotel Preference:**
City Hotels received more bookings than Resort Hotels, suggesting higher demand in urban locations, possibly due to business travel or better connectivity.

**Booking Cancellations:**
About 27.5% of bookings were canceled, and City Hotels recorded more cancellations than Resort Hotels. This indicates the need to investigate possible reasons, such as booking policies or overbooking strategies.

**Country-wise Guest Origins:**
Most bookings came from Portugal, the United Kingdom, France, and Spain. This insight can help hotels focus their marketing strategies on these high-volume regions.

**Seasonal Trends:**
Bookings peaked in the summer months, with July and August the busiest, highlighting a peak travel season. Hotels can prepare for higher occupancy and adjust staffing and pricing strategies accordingly.

**Customer Types:**
Most guests were Transient customers, meaning individual travelers rather than corporate or group bookings. This information is useful for tailoring services and promotions.

 **Why This Analysis Matters**

The insights obtained from this project can significantly benefit hotel management and marketing teams. By understanding customer preferences, seasonal trends, and cancellation behaviors, hotels can:

* Optimize pricing and availability based on demand.

* Improve marketing campaigns targeted at key customer groups or regions.

* Enhance guest experience by addressing frequent special requests.

* Develop smarter cancellation and booking policies to reduce revenue loss.

 **Conclusion**

This project provided a comprehensive walkthrough of data cleaning, exploration, and visualization techniques using Python. I learned how to handle real-world data, identify patterns, and extract insights that can drive business decisions. Exploratory Data Analysis proved to be an essential foundation for understanding the dataset and preparing it for more advanced analytics. It also highlighted how data-driven decisions can lead to better operations, marketing, and customer satisfaction in the hospitality industry.

# **Links -**

#### **GitHub Link:** - [EDA-PROJECT OF HOTEL BOOKING](https://github.com/SUDHANSHU0/Almabatter/blob/main/SUDHANHU_RAI_EDAProject_HotelBookings.ipynb)

#### **NoteBook Link(colab):** - [EDA PROJECT NOTEBOOK OF HOTEL BOOKING](https://colab.research.google.com/drive/1cErm4yCg3D5C8OXMDAl6MpMTIAloHnRn?usp=sharing)

# **Problem Statement**


### Problem Statement
We want to study hotel booking data to understand how people book rooms.  
This will help hotels make better choices, plan well, and give good service to customers.  


## **Define Your Business Objective?**

### 1 Make More Money
- Learn when and how people book rooms
- Use this to set smart prices and give offers

### 2 Work Better and Faster
- Help hotels plan for busy times
- Use staff and rooms the right way
- Stop problems like cancellations or empty rooms

### 3 Keep Customers Happy
- Know what different guests like
- Give good service so they return

### 4 Avoid Problems
- Watch out for things like too many cancellations
- Use data to make smart hotel decisions


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the questions in the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# df = pd.read_csv("/content/Hotel Bookings.csv")
url = 'https://drive.google.com/uc?id=1C9AxF9fcVzMw0Bgs0NaRrNML2WwX1Ehm'
df = pd.read_csv(url)
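
The guidelines above ask for exception handling, and reading from a remote URL is the step most likely to fail when the notebook is run in one go. The following is only a minimal sketch of one way to guard it: `load_bookings` is a hypothetical helper (not part of this notebook), and the file name and in-memory CSV in the demo are made up for illustration.

```python
import io

import pandas as pd


def load_bookings(sources):
    """Return the first source that pandas can read as CSV.

    Hypothetical helper: `sources` is an ordered list of URLs,
    file paths, or file-like objects to try in turn.
    """
    errors = []
    for source in sources:
        try:
            return pd.read_csv(source)
        except (OSError, ValueError) as exc:
            # Missing files and network failures raise OSError subclasses;
            # parsing problems raise ValueError subclasses.
            errors.append(f"{source!r}: {exc}")
    raise RuntimeError("No readable source:\n" + "\n".join(errors))


# Demo with made-up inputs: the first path does not exist,
# so the helper falls back to the in-memory CSV.
demo = load_bookings([
    "missing_file_for_demo.csv",
    io.StringIO("hotel,is_canceled\nCity Hotel,1\nResort Hotel,0\n"),
])
print(demo.shape)
```

In the notebook itself, the list could hold the Drive URL first and a local copy of the CSV second, assuming such a copy exists.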

### Dataset First View

In [None]:
# Dataset First Look
df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Check how many duplicate rows are in the dataset.

duplicate_count = df.duplicated().sum()
print("Duplicate rows:", duplicate_count)

In [None]:
# Keep only unique records and get total number of unique rows.

df.drop_duplicates(inplace=True)
print("Unique rows after removing duplicates:", df.shape[0])
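
To make the duplicate count easier to interpret, here is a small sketch on a made-up table (the values are illustrative, not taken from the dataset). `duplicated()` marks every repeat of a row except its first occurrence, so the count it reports equals the number of rows that `drop_duplicates()` removes.

```python
import pandas as pd

# Made-up rows: the first, second, and fourth are identical.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "lead_time": [10, 10, 5, 10],
})

# Repeats beyond the first occurrence are flagged as duplicates.
n_dupes = int(toy.duplicated().sum())

# Returning a new frame keeps the original available for comparison.
deduped = toy.drop_duplicates().reset_index(drop=True)

print("Duplicates:", n_dupes)
print("Unique rows:", len(deduped))
```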

#### Missing Values/Null Values

In [None]:
# Show the six columns with the most missing (null) values.

missing_values = df.isnull().sum().sort_values(ascending=False).head(6)
print(missing_values)

In [None]:
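# Handle Missing Data
# Assign the result back instead of calling fillna(inplace=True) on a column
# selection, which newer pandas versions warn about and may not apply to df.

df['company'] = df['company'].fillna(0)
df['agent'] = df['agent'].fillna(0)
df['country'] = df['country'].fillna('Others')
df['children'] = df['children'].fillna(0)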

In [None]:
# Verify that no missing values remain after filling.
df.isna().sum().sort_values(ascending=False)
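
As an alternative sketch, the same defaults can be applied in a single call by passing a column-to-value mapping to `DataFrame.fillna`. The table below is made up for illustration; only the column names and default values come from this notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows with gaps in the four columns handled above.
toy = pd.DataFrame({
    "company": [np.nan, 40.0],
    "agent": [9.0, np.nan],
    "country": ["PRT", None],
    "children": [np.nan, 2.0],
})

# One default per column; columns not listed are left untouched.
defaults = {"company": 0, "agent": 0, "country": "Others", "children": 0}
filled = toy.fillna(value=defaults)

# Total number of missing cells left in the table.
remaining = int(filled.isna().sum().sum())
print("Missing values left:", remaining)
```

Keeping the defaults in one dictionary also documents the cleaning decisions in a single place.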

### What did you know about your dataset?

**Summary of Dataset**

* Rows: 119,390 | Columns: 32

* Unique rows: 87,396 | Duplicates: 31,994

* Columns with the most nulls (counted after removing duplicates): company (82,137), agent (12,193), country (452), children (4)

**Data includes:**

* Hotel type (City, Resort)

* Booking status, date info, guest count

* Agent, company, country, stay duration

* Customer type, ADR, room type, special requests

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe().round(2)

### Variables Description

**Variable Description (with Summary Stats)**

* is_canceled: ~27.5% canceled bookings

* lead_time: Avg = 80 days

* arrival_date_year: 2015–2017

* stays: Avg = 1 weekend night + 2.6 week nights

* people: Adults, children, babies info

* adr: Avg ~106.3 (rate per night)

* special requests: Avg = 0.57 per booking

* agent, company: Travel agency/company IDs
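
The averages listed above can be read off `df.describe()`. As a small illustration of why the mean of `is_canceled` is a cancellation share, here is a sketch on made-up values (not the real data): the column holds only 0 and 1, so its mean is the fraction of rows equal to 1.

```python
import pandas as pd

# Made-up bookings: one of four is canceled.
toy = pd.DataFrame({
    "is_canceled": [0, 1, 0, 0],
    "lead_time": [10, 200, 30, 80],
    "adr": [90.0, 120.0, 100.0, 110.0],
})

# Column means; for a 0/1 column this is the share of 1s.
means = toy[["is_canceled", "lead_time", "adr"]].mean()

cancel_share_pct = means["is_canceled"] * 100
print(means)
print(f"Canceled: {cancel_share_pct:.1f}%")
```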

### Check Unique Values for each variable.

In [None]:
# Unique Values in Each Column

for col in df.columns:
  print(f"{col} - Unique Values: {df[col].nunique()}")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
df.info()

In [None]:
# Data Type Conversion + New Columns
# Convert data types

df[['children', 'company', 'agent']] = df[['children', 'company', 'agent']].astype(int)
df['reservation_status_date'] = pd.to_datetime(df['reservation_status_date'])

In [None]:
# Create new columns
df['total_stay'] = df['stays_in_weekend_nights'] + df['stays_in_week_nights']
df['total_people'] = df['adults'] + df['children'] + df['babies']

In [None]:
print(df[['total_stay', 'total_people']].head())

### What all manipulations have you done and insights you found?

1. Data type conversion:
   * I converted the `children`, `company`, and `agent` columns to integers. After the missing values were filled with 0, these columns hold only whole-number counts and IDs, so an integer type represents them accurately. I also parsed `reservation_status_date` as a datetime so that it can be used in date-based analysis.

2. New columns created from existing ones:
   * `total_stay` adds `stays_in_weekend_nights` and `stays_in_week_nights`. It gives the total number of nights per booking, regardless of whether the nights fall on weekends or weekdays, for both types of hotels.
   * `total_people` adds `adults`, `children`, and `babies`. It gives the total number of guests on a single booking, which helps assess occupancy and service needs more accurately.
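
The derived columns also make a simple sanity check possible. The sketch below uses made-up rows and flags bookings whose `total_stay` or `total_people` is zero. Whether such rows should be dropped is a judgment call that this notebook does not make; the check is shown only as an assumption about what might be worth inspecting.

```python
import pandas as pd

# Made-up bookings; the second row has no nights and no guests.
toy = pd.DataFrame({
    "stays_in_weekend_nights": [2, 0, 1],
    "stays_in_week_nights": [3, 0, 4],
    "adults": [2, 0, 1],
    "children": [1, 0, 0],
    "babies": [0, 0, 0],
})

# Build both derived columns in one step.
toy = toy.assign(
    total_stay=toy["stays_in_weekend_nights"] + toy["stays_in_week_nights"],
    total_people=toy[["adults", "children", "babies"]].sum(axis=1),
)

# Rows with zero nights or zero guests may deserve a closer look.
suspect = toy[(toy["total_stay"] == 0) | (toy["total_people"] == 0)]
print(suspect)
```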



## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

In [None]:
df.info()

#### Chart - 1

In [None]:
# Chart - Booking Cancellation Comparison by Hotel Type
plt.figure(figsize=(8,6))
sns.countplot(data=df, x='is_canceled', hue='hotel')
plt.xlabel('Booking Cancellation Status')
plt.ylabel('Count')
plt.title('Count of Canceled vs Not Canceled Bookings')
plt.show()

##### 1. Why did you pick the specific chart?

**📊Chart Type: Bar Chart**

📌Used to compare the number of canceled vs. not canceled bookings for different hotel types.

##### 2. What is/are the insight(s) found from the chart?

➡️ **City hotels have more cancellations than resort hotels.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

✅Yes. Helps the hotel focus on reducing cancellations by adjusting policies or communication.  
⚠️ High cancellations in city hotels could lead to revenue loss.
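
Because the count plot compares raw counts, a hotel type with more bookings will usually show more cancellations as well. A cancellation rate per hotel type separates the two effects. The sketch below uses made-up rows, chosen so that the counts differ while the rates are equal; it is not a result from the dataset.

```python
import pandas as pd

# Made-up bookings: 4 for the city hotel, 2 for the resort hotel.
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 2,
    "is_canceled": [1, 1, 0, 0, 1, 0],
})

# Count, number canceled, and share canceled for each hotel type.
summary = toy.groupby("hotel")["is_canceled"].agg(
    bookings="count",
    canceled="sum",
    cancel_rate="mean",
)
print(summary)
```

Applying the same `groupby` to `df` would show whether City Hotels cancel at a higher rate or simply have more bookings.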

#### Chart - 2

In [None]:
# Chart - 2 Types of Hotels Booked by Percentage
hotel_data = df['hotel'].value_counts()

# Draw pie chart
plt.pie(hotel_data, labels=hotel_data.index, autopct='%1.1f%%')
plt.title('Types of Hotels Booked')
plt.axis('equal')  # Makes the pie look like a circle
plt.show()


##### 1. Why did you pick the specific chart?

**📊Chart Type: Pie Chart**

📌To see what type of hotel people book more – City Hotel or Resort Hotel.

##### 2. What is/are the insight(s) found from the chart?

➡️ *Most of the bookings are for City Hotels.*

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

✅Yes. Hotels can use this information to invest more in the hotel type with higher demand.

⚠️ If resort hotels are not getting many bookings, they can improve offers or advertise more.

#### Chart - 3

In [None]:
# Chart 3: Top 10 Countries Contributing to Hotel Bookings
top_countries = df['country'].value_counts().head(10)
top_countries.plot(kind='bar', color='teal')
plt.title('Top 10 Countries by Number of Bookings')
plt.xlabel('Country')
plt.ylabel('Number of Bookings')
plt.show()

##### 1. Why did you pick the specific chart?

**📊Chart Type: Bar Chart**

📌Displays the top 10 countries with the most hotel bookings.

##### 2. What is/are the insight(s) found from the chart?

➡️ **Portugal contributes the most bookings, followed by countries such as the UK and France.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

✅Yes. Hotels can target marketing in these top countries.

 ⚠️ Not promoting in these countries may reduce customer reach.
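
To put numbers next to the bars, the share of bookings per country can be computed with `value_counts(normalize=True)`. The sketch below runs on a made-up list of country codes, so the percentages are illustrative only.

```python
import pandas as pd

# Made-up country codes for eight bookings.
countries = pd.Series(
    ["PRT", "PRT", "PRT", "GBR", "GBR", "FRA", "ESP", "DEU"],
    name="country",
)

# Share of bookings per country, in percent, for the two largest.
share = countries.value_counts(normalize=True).head(2).mul(100).round(1)

# Combined share covered by those countries.
combined = share.sum()
print(share)
print(f"Top 2 combined: {combined:.1f}%")
```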

#### Chart - 4

In [None]:
# Chart - 4 Monthly Booking Trends for Resort & City Hotels

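# Plot months in calendar order rather than in order of first appearance
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

plt.figure(figsize=(15, 5))
sns.countplot(data=df, x='arrival_date_month', hue="hotel", order=month_order)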
plt.xticks(rotation=45)
plt.xlabel('Arrival Month')
plt.ylabel('Count')
plt.title("Arrival Date Month Distribution")
plt.show()

##### 1. Why did you pick the specific chart?

**📊 Chart Type: Grouped Bar Chart**

📌 Illustrates the number of bookings each month, comparing between resort and city hotels.

##### 2. What is/are the insight(s) found from the chart?

➡️ July and August have the most bookings — summer holidays.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

✅Yes. Hotels can prepare more staff and rooms during busy months.

⚠️ Failing to plan for high seasons can hurt the customer experience.
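
The monthly comparison can also be produced as a table. The sketch below, on made-up rows, turns the month names into an ordered categorical so that grouping follows the calendar instead of the alphabet. The `month_order` list is an assumption about how the months are spelled in `arrival_date_month`.

```python
import pandas as pd

# Assumed spelling of the month names in arrival_date_month.
month_order = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Made-up arrivals for illustration.
toy = pd.DataFrame({
    "arrival_date_month": ["August", "July", "January", "August", "July", "August"],
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel",
              "City Hotel", "City Hotel", "Resort Hotel"],
})

# An ordered categorical sorts by calendar position, not alphabetically.
toy["arrival_date_month"] = pd.Categorical(
    toy["arrival_date_month"], categories=month_order, ordered=True
)

# Bookings per month and hotel type; observed=True skips months with no rows.
monthly = (
    toy.groupby(["arrival_date_month", "hotel"], observed=True)
    .size()
    .unstack(fill_value=0)
)
print(monthly)
```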

#### Chart - 5

In [None]:
#Chart - 5 Percentage of Different Customer Types
customer_type = df['customer_type'].value_counts()
plt.figure(figsize=(8, 6))
plt.pie(customer_type.values, labels=customer_type.index, autopct='%1.1f%%')
plt.title('Customer Type Distribution')
plt.axis('equal')
plt.show()

##### 1. Why did you pick the specific chart?

**📊 Chart Type: Pie Chart**

📌 Shows the proportion of bookings from different customer types (e.g., transient, group, etc.).

##### 2. What is/are the insight(s) found from the chart?

➡️ Most bookings are by transient customers (not part of groups).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

✅Yes. Helps hotels focus on what type of customer needs more attention.

⚠️Ignoring smaller segments (like group customers) may reduce long-term bookings.

#### Chart - 6

In [None]:
# Chart - 6 Booking Sources by Market Segment (Online, Corporate, etc.)

market_segment = df['market_segment'].value_counts()
plt.figure(figsize=(10, 8))
plt.pie(market_segment.values, labels=market_segment.index, autopct='%1.1f%%', startangle=190)
plt.title("Market Segment Distribution")
plt.axis('equal')
plt.legend(market_segment.index, loc="center left", bbox_to_anchor=(1, 0, 1, 1))
plt.show()

##### 1. Why did you pick the specific chart?

**📊 Chart Type: Pie Chart**

📌 Represents the distribution of bookings across various market segments.

##### 2. What is/are the insight(s) found from the chart?

➡️ Most bookings are made via Online Travel Agents (OTA), followed by direct bookings and corporate channels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

✅Yes. Hotels can focus partnerships and marketing efforts on channels that drive the most traffic.

⚠️ Over-reliance on OTAs can be risky if their policies change or fees increase.

#### Chart - 7

In [None]:
# Chart - 7 Relationship Between Guest Requests and Length of Stay

plt.figure(figsize=(8,6))
sns.scatterplot(data=df, y='total_stay', x='total_of_special_requests', hue='hotel', size='is_canceled')
plt.title('Special Requests vs Total Stay')
plt.xlabel("Total Number of Special Requests")
plt.ylabel("Total Stay")
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

**📊 Chart Type: Scatter Plot**

📌 Explores the correlation between the number of special requests and total stay duration, differentiated by hotel and cancellation status.

##### 2. What is/are the insight(s) found from the chart?

➡️ Guests staying longer appear to make more special requests. Some of these bookings were still canceled.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

✅Yes. Helps hotel staff to better plan amenities and fulfill customer needs based on expected length of stay.

⚠️ Not fulfilling special requests for long-stay guests might lead to dissatisfaction or cancellations.
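
A scatter plot with many overlapping points makes the strength of this relationship hard to judge by eye. One way to check it, sketched here on made-up values, is a Pearson correlation plus the average stay length for each number of requests. The numbers below are illustrative and say nothing about the real dataset.

```python
import pandas as pd

# Made-up bookings for illustration.
toy = pd.DataFrame({
    "total_stay": [1, 2, 3, 4, 5, 7],
    "total_of_special_requests": [0, 1, 0, 2, 1, 3],
})

# Pearson correlation (the pandas default) between the two columns.
corr = toy["total_stay"].corr(toy["total_of_special_requests"])

# Average stay length for each number of special requests.
avg_stay = toy.groupby("total_of_special_requests")["total_stay"].mean()

print(f"Correlation: {corr:.2f}")
print(avg_stay)
```

A value near 0 on the real data would suggest the two variables are only weakly related, whatever the plot seems to show.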

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

What should the hotel do to reach its business goals?

1. Set Better Prices: Use booking and season data to change room prices smartly. Charge more in busy times, less in slow times.

2. Make Guests Happier: Look at what guests ask for and how long they stay. Offer better service and personalized help.

3. Add More Food Choices: If many guests want different meals, give them more food options and special offers.

4. Smarter Ads: Use customer data to show the right ads to the right people (like families, solo travelers, etc.).

5. Make Booking Easier: Let people book easily and cancel if needed. Use a simple system that works fast.


# **Conclusion**

Using the data, the hotel can make smart choices to grow its business.
They can earn more, keep guests happy, and beat their competition by improving prices, service, food, ads, and booking systems.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***