# HOTEL BOOKING DEMAND ANALYSIS

## DOMAIN:
HOSPITALITY & TOURISM ANALYTICS

## OBJECTIVE OF THE PROJECT



The main goal of this project is to analyze hotel booking data to understand:

- What factors affect booking cancellations
- Guest booking patterns
- Seasonal trends
- Customer behavior
- How hotels can reduce cancellations and improve revenue

This analysis helps hotels make better decisions in marketing, pricing, staffing, and inventory management.


## ABOUT THE DATASET

#### ✅ Source URL

https://www.kaggle.com/datasets/jessemostipak/hotel-booking-demand

#### ✅ Rows:
~85,600 bookings (85,597 in the file analyzed here)

#### ✅ Columns:
32 features

#### ✅ Data From Years:
2015 – 2017

#### ✅ Hotels included:

- City Hotel
- Resort Hotel




#### The dataset contains detailed information such as:

- Booking dates
- Stay duration
- Number of guests
- Meal type
- Market segment (online, travel agents, direct, etc.)
- Deposit type
- ADR (Average Daily Rate)
- Whether the customer cancelled or not

## DATA LOADING AND INITIAL OVERVIEW

### Import required libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Load the dataset

In [None]:
df = pd.read_csv("HOTEL BOOKINGS DATASET FOR PROJECT.csv")

### View the shape of the dataset

`df.shape` returns a `(rows, columns)` tuple, showing the total number of rows and columns in the dataset.

In [None]:
df.shape

(85597, 32)

### Display first 5 rows

This helps you quickly understand the structure of the dataset and how values are stored.

In [None]:
df.head()

Unnamed: 0,hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,...,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
0,Resort Hotel,0,342,2015,July,27,1,0,0,2,...,No Deposit,,,0,Transient,0.0,0,0,Check-Out,7/1/2015
1,Resort Hotel,0,737,2015,July,27,1,0,0,2,...,No Deposit,,,0,Transient,0.0,0,0,Check-Out,7/1/2015
2,Resort Hotel,0,7,2015,July,27,1,0,1,1,...,No Deposit,,,0,Transient,75.0,0,0,Check-Out,7/2/2015
3,Resort Hotel,0,13,2015,July,27,1,0,1,1,...,No Deposit,304.0,,0,Transient,75.0,0,0,Check-Out,7/2/2015
4,Resort Hotel,0,14,2015,July,27,1,0,2,2,...,No Deposit,240.0,,0,Transient,98.0,0,1,Check-Out,7/3/2015


### Check data types of each column

`df.info()` shows each column's data type (integer, float, or object/string), its non-null count (from which missing values can be read off), and the DataFrame's memory usage.

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 85597 entries, 0 to 85596
Data columns (total 32 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   hotel                           85597 non-null  object 
 1   is_canceled                     85597 non-null  int64  
 2   lead_time                       85597 non-null  int64  
 3   arrival_date_year               85597 non-null  int64  
 4   arrival_date_month              85597 non-null  object 
 5   arrival_date_week_number        85597 non-null  int64  
 6   arrival_date_day_of_month       85597 non-null  int64  
 7   stays_in_weekend_nights         85597 non-null  int64  
 8   stays_in_week_nights            85597 non-null  int64  
 9   adults                          85597 non-null  int64  
 10  children                        85593 non-null  float64
 11  babies                          85597 non-null  int64  
 12  meal                            

### Summary statistics

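This gives basic statistical summaries: the mean, median (the 50% row), standard deviation, and min/max for numeric columns, plus the number of unique values, the most frequent value, and its frequency for categorical columns. It helps identify outliers and unusual patterns early.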

In [None]:
df.describe(include='all')

Unnamed: 0,hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,...,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
count,85597,85597.0,85597.0,85597.0,85597,85597.0,85597.0,85597.0,85597.0,85597.0,...,85597,72633.0,5133.0,85597.0,85597,85597.0,85597.0,85597.0,85597,85597
unique,2,,,,12,,,,,,...,3,,,,4,,,,3,926
top,City Hotel,,,,August,,,,,,...,No Deposit,,,,Transient,,,,Canceled,10/21/2015
freq,45537,,,,9803,,,,,,...,70859,,,,63351,,,,42975,1461
mean,,0.515953,109.859586,2016.034931,,27.538488,15.706345,0.968363,2.636085,1.850953,...,,109.992662,186.664913,2.950466,,97.304385,0.070458,0.467353,,
std,,0.499748,111.329765,0.740025,,13.986958,8.787501,1.044739,2.049997,0.59519,...,,117.973836,132.959831,19.96494,,53.326963,0.259725,0.743321,,
min,,0.0,0.0,2015.0,,1.0,1.0,0.0,0.0,0.0,...,,1.0,6.0,0.0,,-6.38,0.0,0.0,,
25%,,0.0,19.0,2015.0,,16.0,8.0,0.0,1.0,2.0,...,,9.0,62.0,0.0,,62.8,0.0,0.0,,
50%,,1.0,73.0,2016.0,,28.0,16.0,1.0,2.0,2.0,...,,34.0,178.0,0.0,,88.0,0.0,0.0,,
75%,,1.0,170.0,2017.0,,39.0,23.0,2.0,3.0,2.0,...,,240.0,274.0,0.0,,120.0,0.0,1.0,,


## DATA PRE-PROCESSING (CLEANING)

### ✅ Handling Missing Values

#### Check missing values:

In [None]:
df.isnull().sum().sort_values(ascending=False)

company                           80464
agent                             12964
country                             487
children                              4
arrival_date_month                    0
arrival_date_week_number              0
hotel                                 0
is_canceled                           0
stays_in_weekend_nights               0
arrival_date_day_of_month             0
adults                                0
stays_in_week_nights                  0
babies                                0
meal                                  0
lead_time                             0
arrival_date_year                     0
distribution_channel                  0
market_segment                        0
previous_bookings_not_canceled        0
is_repeated_guest                     0
reserved_room_type                    0
assigned_room_type                    0
booking_changes                       0
previous_cancellations                0
deposit_type                          0
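Raw counts are easier to judge as a share of all rows: in the output above, `company` is missing in roughly 94% of the 85,597 rows and `agent` in about 15%. The sketch below shows one way to tabulate counts and percentages together; `sample` is a small illustrative stand-in for `df`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the bookings DataFrame; values are illustrative only.
sample = pd.DataFrame({
    "company": [np.nan, np.nan, 110.0, np.nan],
    "agent": [304.0, np.nan, 240.0, 9.0],
    "country": ["PRT", "GBR", None, "PRT"],
    "adults": [2, 2, 1, 2],
})

# Count of missing values and their share of all rows, per column.
missing_count = sample.isnull().sum()
missing_pct = (sample.isnull().mean() * 100).round(1)

# Keep only columns that actually have gaps, largest first.
missing_summary = (
    pd.DataFrame({"missing": missing_count, "percent": missing_pct})
    .query("missing > 0")
    .sort_values("missing", ascending=False)
)
print(missing_summary)
```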


#### Fill missing children values with 0

In [None]:
df["children"] = df["children"].fillna(0)

#### Fill agent and company with 0 (they are ID numbers, so 0 marks bookings with no agent or company)

In [None]:
df["agent"] = df["agent"].fillna(0)
df["company"] = df["company"].fillna(0)

#### Fill country with 'Unknown'

In [None]:
df["country"] = df["country"].fillna("Unknown")
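The three fill steps above can also be expressed as a single `fillna` call with a column-to-value mapping, which keeps the imputation rules in one place. This is a sketch on an illustrative `sample` frame, not a replacement for the cells above; the final cast to integer is an optional extra that assumes the ID and count columns hold whole numbers.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the bookings DataFrame; values are illustrative only.
sample = pd.DataFrame({
    "children": [0.0, np.nan, 2.0],
    "agent": [304.0, np.nan, 240.0],
    "company": [np.nan, np.nan, 110.0],
    "country": ["PRT", None, "GBR"],
})

# One mapping describes the fill value for every affected column.
fill_values = {"children": 0, "agent": 0, "company": 0, "country": "Unknown"}
cleaned = sample.fillna(value=fill_values)

# With no NaN left, the float columns can be stored as integers
# (assumption: these columns only ever hold whole numbers).
cleaned = cleaned.astype({"children": int, "agent": int, "company": int})

print(cleaned)
```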

### ✅ Remove Duplicates

Duplicates can distort statistics, correlations, and visualizations. Removing them keeps the results accurate.

In [None]:
df.drop_duplicates(inplace=True)
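Before dropping anything, it can be worth counting how many rows are affected. If the data has no unique booking identifier, two identical rows may be distinct bookings that happen to share every attribute, so the count helps judge how much information is being discarded. A minimal sketch on an illustrative `sample` frame:

```python
import pandas as pd

# Toy stand-in for the bookings DataFrame; values are illustrative only.
sample = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "lead_time": [10, 10, 45, 10],
    "adr": [88.0, 88.0, 120.0, 95.0],
})

# duplicated() marks every repeat of an earlier, fully identical row.
n_duplicates = int(sample.duplicated().sum())

# Drop the repeats and rebuild a gap-free index.
deduped = sample.drop_duplicates().reset_index(drop=True)

print(f"removed {n_duplicates} of {len(sample)} rows")
```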

### ✅ Convert data types

Dates must be converted to the datetime type for time-series analysis, extracting the month or year, and visualizations.

#### Convert date columns:

In [None]:
df["reservation_status_date"] = pd.to_datetime(df["reservation_status_date"])

#### Create a full arrival date:

In [None]:
df["arrival_date"] = pd.to_datetime(
    df["arrival_date_year"].astype(str) + "-" +
    df["arrival_date_month"] + "-" +
    df["arrival_date_day_of_month"].astype(str)
)
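Both conversions above rely on pandas inferring the date layout. Passing an explicit `format` makes the assumption visible and turns unexpected strings into `NaT` that can be counted. The sketch below uses illustrative data; the formats assume English month names for the arrival date and month/day/year strings (such as `10/21/2015`) for the status date, as seen in the outputs above.

```python
import pandas as pd

# Toy stand-in for the bookings DataFrame; values are illustrative only.
sample = pd.DataFrame({
    "arrival_date_year": [2015, 2016, 2017],
    "arrival_date_month": ["July", "February", "August"],
    "arrival_date_day_of_month": [1, 29, 31],
    "reservation_status_date": ["7/1/2015", "10/21/2015", "8/31/2017"],
})

# Build strings such as "2015-July-1" from the three arrival columns.
date_strings = (
    sample["arrival_date_year"].astype(str) + "-"
    + sample["arrival_date_month"] + "-"
    + sample["arrival_date_day_of_month"].astype(str)
)

# %B matches a full month name; errors="coerce" yields NaT on bad input.
arrival_date = pd.to_datetime(date_strings, format="%Y-%B-%d", errors="coerce")
status_date = pd.to_datetime(
    sample["reservation_status_date"], format="%m/%d/%Y", errors="coerce"
)

# Count rows that failed to parse so they can be inspected.
n_unparsed = int(arrival_date.isna().sum() + status_date.isna().sum())
print(arrival_date.tolist(), n_unparsed)
```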

### ✅ Create New Derived Columns

These new columns help analyze hotel trends such as seasonal demand, room occupancy, pricing changes, and cancellation patterns.

#### Total nights stayed

In [None]:
df["total_nights"] = df["stays_in_weekend_nights"] + df["stays_in_week_nights"]

#### Total number of guests

In [None]:
df["total_guests"] = df["adults"] + df["children"] + df["babies"]
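The two totals also make simple sanity checks possible. The first rows of the dataset include bookings with zero nights, and the summary statistics show a minimum of 0 adults, so some bookings may have no guests or no nights at all. A sketch of how such rows could be flagged, using an illustrative `sample` frame and the hypothetical name `suspicious`:

```python
import pandas as pd

# Toy stand-in for the bookings DataFrame; values are illustrative only.
sample = pd.DataFrame({
    "adults": [2, 0, 1, 2],
    "children": [0.0, 0.0, 1.0, 0.0],
    "babies": [0, 0, 0, 1],
    "stays_in_weekend_nights": [0, 2, 1, 0],
    "stays_in_week_nights": [0, 3, 2, 1],
})
sample["total_nights"] = (
    sample["stays_in_weekend_nights"] + sample["stays_in_week_nights"]
)
sample["total_guests"] = sample["adults"] + sample["children"] + sample["babies"]

# Flag bookings that have nobody staying or no nights booked.
no_guests = sample["total_guests"] == 0
no_nights = sample["total_nights"] == 0
suspicious = sample[no_guests | no_nights]

print(suspicious[["total_guests", "total_nights"]])
```

Whether to remove such rows depends on the analysis; they are only flagged here.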

#### Arrival month & year

In [None]:
df["arrival_month"] = df["arrival_date"].dt.month
df["arrival_year"] = df["arrival_date"].dt.year
df["arrival_month_name"] = df["arrival_date"].dt.month_name()
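Keeping both the numeric month and its name is useful because month names sort alphabetically, not by calendar. As a sketch of how these columns might feed a seasonal view, the example below computes a cancellation rate per month and orders it by the numeric month; `sample` and `cancel_rate` are illustrative names, and the values are made up.

```python
import pandas as pd

# Toy stand-in for the bookings DataFrame; values are illustrative only.
sample = pd.DataFrame({
    "arrival_date": pd.to_datetime(
        ["2015-07-01", "2015-07-15", "2016-01-10", "2016-01-20", "2016-08-05"]
    ),
    "is_canceled": [0, 1, 0, 0, 1],
})
sample["arrival_month"] = sample["arrival_date"].dt.month
sample["arrival_month_name"] = sample["arrival_date"].dt.month_name()

# The mean of a 0/1 flag is the share of cancelled bookings.
cancel_rate = (
    sample.groupby(["arrival_month", "arrival_month_name"])["is_canceled"]
    .mean()
    .reset_index(name="cancel_rate")
    .sort_values("arrival_month")  # calendar order, not alphabetical
)
print(cancel_rate)
```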

### ✅ Final Check After Cleaning

This confirms that no unwanted missing values remain after cleaning.

In [None]:
df.isnull().sum()

hotel                             0
is_canceled                       0
lead_time                         0
arrival_date_year                 0
arrival_date_month                0
arrival_date_week_number          0
arrival_date_day_of_month         0
stays_in_weekend_nights           0
stays_in_week_nights              0
adults                            0
children                          0
babies                            0
meal                              0
country                           0
market_segment                    0
distribution_channel              0
is_repeated_guest                 0
previous_cancellations            0
previous_bookings_not_canceled    0
reserved_room_type                0
assigned_room_type                0
booking_changes                   0
deposit_type                      0
agent                             0
company                           0
days_in_waiting_list              0
customer_type                     0
adr                         