# **Project Name**    -  **Hotel Booking Analysis**



Dataset : https://drive.google.com/file/d/1C9AxF9fcVzMw0Bgs0NaRrNML2WwX1Ehm/view?usp=drive_link

##### **Project Type**    - EDA


# **Project Summary -**

The Hotel Booking Exploratory Data Analysis (EDA) offers a deep understanding of booking trends, customer preferences, cancellations, and pricing strategies. This analysis is essential for hotel managers and marketers aiming to optimize operations, reduce cancellations, and enhance profitability.

1. **Dataset Overview and Initial Inspection**

The analysis begins by loading the hotel booking dataset, which typically contains variables such as booking date, customer type, room type, price, number of nights, and cancellation status. Initial inspection of the data focuses on identifying missing values, erroneous data, and basic descriptive statistics. This step helps ensure data quality and prepares the dataset for meaningful analysis.

2. **Booking Trends Over Time**

The booking trends analysis provides valuable insights into peak booking periods and off-peak seasons. By aggregating bookings on a monthly or seasonal basis, one can identify high-demand times, such as holidays or specific months, which may suggest opportunities for strategic pricing adjustments. For instance, hotels may see peak demand during summer or major holidays, prompting a price increase or promotional offers during slower months. This kind of insight can help hotels plan staff allocation and marketing campaigns accordingly.

3. **Customer Segmentation**

Customer segmentation is a crucial part of the EDA, enabling the analysis of various customer types, such as transient, group, and contract customers. Understanding which segments dominate can guide marketing efforts and help tailor services to specific customer needs. For instance, if the majority of bookings come from transient customers, marketing campaigns can be focused on individuals, while group or corporate bookings may require specialized service offerings.

4. **Room Type Preferences**

Room preference analysis reveals the most and least popular room types among guests. Visualizing room demand helps hotels adjust their room offerings, pricing strategies, and even room allocation. If certain room types, such as suites or deluxe rooms, are frequently booked, hotels might consider adjusting their availability or offering premium prices. Conversely, less popular room types could be targeted with promotions or discounts to increase occupancy.

5. **Cancellation Analysis**

Understanding cancellation behavior is critical for managing bookings and minimizing lost revenue. The cancellation analysis identifies the proportion of bookings that are canceled and explores factors such as lead time and special requests that may contribute to cancellations. Hotels can use this information to implement policies like non-refundable bookings or stricter cancellation rules for bookings with long lead times. Additionally, cancellations are often higher when guests book well in advance, indicating a need for strategies like offering incentives for early, non-refundable bookings.

6. **Price and Lead Time Analysis**

The relationship between price and lead time provides actionable insights for revenue management. Typically, bookings made well in advance are cheaper, while last-minute bookings command higher prices. Analyzing the price distribution by room type also helps hotels adjust pricing based on demand for certain rooms. For example, if the price of deluxe rooms is significantly higher but demand remains strong, the hotel could further raise the price to capitalize on the popularity.

7. **Duration of Stay**

Analyzing the duration of stay helps understand whether guests tend to book short weekend stays or longer vacations. This insight allows hotels to tailor packages and promotions accordingly. For example, if most guests book for 2-3 nights, hotels might offer extended stay discounts to encourage longer visits.

8. **Booking Sources and Special Requests**

Understanding the sources of bookings, whether through direct hotel websites, online travel agencies (OTAs), or agents, helps in refining marketing strategies. Hotels might focus on promoting direct bookings to avoid paying high commissions to OTAs. Additionally, analyzing special requests can give insights into guest preferences, enabling hotels to enhance the guest experience by fulfilling common requests like late checkouts or specific room views.

9. **Correlation Analysis**

Finally, a correlation matrix reveals relationships between key variables, such as the link between lead time, price, and cancellations. These insights can be leveraged to build predictive models for demand forecasting or cancellation prediction.

# **Problem Statement**


The hotel industry operates in a highly competitive and dynamic environment, where maximizing occupancy rates, minimizing cancellations, and optimizing pricing strategies are critical to profitability. Hotels need to understand customer behavior, booking trends, and operational inefficiencies to make data-driven decisions. However, a lack of clear insights into peak booking periods, customer preferences, and the factors contributing to cancellations poses challenges to effective resource allocation, marketing, and pricing strategies.

This study aims to analyze hotel booking data to uncover patterns and trends related to customer segmentation, room preferences, booking sources, and pricing. Specifically, the analysis will focus on:

* Booking Trends: Identifying peak and off-peak booking periods to help hotels adjust their marketing campaigns and staffing strategies.
* Customer Segmentation: Understanding different customer types (e.g., transient, group) and their booking behaviors to tailor services and promotions.
* Room Type Preferences: Analyzing demand for different room types to optimize pricing and room availability.
* Cancellation Analysis: Investigating the factors that contribute to booking cancellations and proposing measures to reduce them.
* Pricing Strategies: Examining how room prices vary based on booking lead times, room types, and seasonality to optimize revenue.
* Duration of Stay: Exploring guest stay patterns to offer targeted promotions and packages.
* Booking Sources: Understanding where bookings originate to refine marketing strategies and reduce reliance on third-party platforms.

The objective is to provide actionable insights that can help hotel management enhance occupancy rates, minimize revenue losses from cancellations, and improve customer satisfaction through data-driven decision-making.

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns



### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
path = "/content/drive/MyDrive/Dataset/Hotel Bookings.csv"
df = pd.read_csv(path)
df.head(5)


### Dataset First View

In [None]:
# Dataset First Look
df.head()



### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
sns.heatmap(df.isnull())

### What did you know about your dataset?

The dataset has 119390 rows and 32 columns. It uses three data types: 4 columns are float64, 16 columns are int64, and 12 columns are object. The dataset contains 31994 duplicate rows. The columns country, agent, and company have 488, 16340, and 112593 null values respectively.
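The individual checks above can also be gathered into one helper, sketched below. The function name `summarize_frame` and the three-row `toy` frame are illustrative assumptions and not part of the original notebook; with the real data one would pass `df` instead.

```python
import pandas as pd


def summarize_frame(frame: pd.DataFrame) -> dict:
    """Collect shape, dtype counts, duplicate rows and per-column null counts."""
    null_counts = frame.isnull().sum()
    return {
        "shape": frame.shape,
        "dtype_counts": frame.dtypes.astype(str).value_counts().to_dict(),
        "duplicate_rows": int(frame.duplicated().sum()),
        # Keep only the columns that actually contain nulls
        "null_counts": null_counts[null_counts > 0].to_dict(),
    }


# Made-up rows for illustration only (not the hotel data)
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel"],
    "lead_time": [10, 10, 45],
    "country": ["PRT", "PRT", None],
})

summary = summarize_frame(toy)
print(summary)
```

Calling `summarize_frame(df)` on the loaded dataset should reproduce the figures listed above in a single dictionary.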

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description
The cell below treats a column as categorical when it has at most 12 unique values, and additionally treats country and agent as categorical because they are identifiers. On that basis the columns split as follows.

Continuous: arrival_date_day_of_month, company, days_in_waiting_list, reservation_status_date, stays_in_week_nights, previous_bookings_not_canceled, previous_cancellations, lead_time, adr, booking_changes, arrival_date_week_number, adults, stays_in_weekend_nights.

Categorical: total_of_special_requests, babies, arrival_date_year, deposit_type, arrival_date_month, distribution_channel, reservation_status, market_segment, is_canceled, hotel, children, meal, customer_type, required_car_parking_spaces, assigned_room_type, is_repeated_guest, reserved_room_type, country, agent.



In [None]:
# Check Unique Values for each variable.
def var(df):
    # Count unique values (NaN counted as one value) for every column
    unique_list = pd.DataFrame([[col, df[col].nunique(dropna=False)] for col in df.columns],
                               columns=['name', 'uniques'])

    total_var = set(df.columns)
    # Low-cardinality columns are categorical; country and agent are added
    # explicitly. Column names in this dataset are lowercase.
    cat_var = set(unique_list['name'][(unique_list['uniques'] <= 12) |
                                      unique_list['name'].isin(['country', 'agent'])])
    con_var = total_var - cat_var

    return cat_var, con_var


cat_var, con_var = var(df)

print("Continuous Variables (", len(con_var), ")\n", con_var, '\n\n'
      "Categorical Variables(", len(cat_var), ")\n", cat_var)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
df = df.drop(['agent','company'],axis=1)   #dropping column with high null values
df = df.dropna(axis = 0)    #dropping null values.


In [None]:
df.isnull().sum()

### What all manipulations have you done and insights you found?

Manipulations:

* Handling Missing Values: The agent and company columns, which have 16340 and 112593 missing values, were dropped. The remaining rows containing missing values (for example, rows with no country) were then removed.

* Converting Data Types: No type conversions are applied in the cell above. Date columns such as reservation_status_date would need converting to datetime before any date-based analysis.

* Removing Duplicates: The 31994 duplicate rows found during inspection are not removed by the cell above, so they are included in the charts that follow.

Insights:

* Clean data is critical for accurate analysis, as missing or erroneous data can lead to incorrect conclusions.

* Proper formatting of date columns allows for time-based trend analysis.
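The wrangling cell above drops columns and rows with missing values. If duplicate removal and date parsing are also wanted, they might look like the sketch below. The helper name `dedupe_and_parse_dates`, the ISO-style date strings and the `toy` frame are assumptions made for illustration. Note that dropping duplicates changes the row count, so any percentages computed afterwards would differ from those computed on the unmodified data.

```python
import pandas as pd


def dedupe_and_parse_dates(frame: pd.DataFrame,
                           date_col: str = "reservation_status_date") -> pd.DataFrame:
    """Drop exact duplicate rows and convert one column to datetime."""
    out = frame.drop_duplicates().copy()
    # errors="coerce" turns unparseable strings into NaT instead of raising
    out[date_col] = pd.to_datetime(out[date_col], errors="coerce")
    return out


# Made-up rows for illustration only (not the hotel data)
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel"],
    "reservation_status_date": ["2015-07-01", "2015-07-01", "2016-01-15"],
})

cleaned = dedupe_and_parse_dates(toy)
print(cleaned.dtypes)
print(len(toy), "rows before,", len(cleaned), "rows after")
```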

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(10, 6))
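# Pie chart of the cancelled booking percentage (0 = not cancelled, 1 = cancelled)
cancel_counts = df['is_canceled'].value_counts().reindex([0, 1])
plt.pie(x = cancel_counts, labels = ['Not Cancelled', 'Cancelled'], autopct='%1.1f%%', colors=('green', 'red'))
plt.title('Cancelled Booking')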
plt.show()

##### 1. Why did you pick the specific chart?

The primary purpose of this analysis is to show the percentage of canceled vs. non-canceled bookings. A pie chart excels at visualizing parts of a whole, making it easy to understand what proportion of the total bookings are canceled compared to those that are completed.

##### 2. What is/are the insight(s) found from the chart?

*According to the pie chart, 63% of bookings were not canceled and 37% of the bookings were canceled at the Hotel.*

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

*While the insight highlights a majority of successful bookings, the presence of cancellations warrants further investigation and strategic actions to optimize the booking process, enhance customer satisfaction, and potentially reduce the cancellation rate for continued positive business growth.*
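The percentages printed on the pie chart can also be computed directly, which is handy when the exact figures are needed in text. This is a minimal sketch; `cancellation_share` and the four-row `toy` frame are hypothetical names and data used only for illustration.

```python
import pandas as pd


def cancellation_share(frame: pd.DataFrame) -> pd.Series:
    """Percentage of bookings per cancellation status, rounded to one decimal."""
    share = frame["is_canceled"].value_counts(normalize=True).mul(100).round(1)
    # The dataset encodes 0 = not canceled and 1 = canceled
    return share.rename(index={0: "Not canceled", 1: "Canceled"})


# Made-up rows for illustration only (not the hotel data)
toy = pd.DataFrame({"is_canceled": [0, 0, 0, 1]})

share = cancellation_share(toy)
print(share)
```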

#### Chart - 2

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(10, 6))
plt.pie(x = df['hotel'].value_counts(), labels = df['hotel'].value_counts().index, autopct='%1.1f%%', colors=("red", "green"))    # Creating Pie Chart
plt.title(' Hotel Type')
plt.show()

##### 1. Why did you pick the specific chart?

To see what percentage of customers chose each hotel type.

##### 2. What is/are the insight(s) found from the chart?

66.7% of customers chose the city hotel over the resort hotel.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insight that 66.7% of customers choose city hotels over resort hotels suggests an opportunity to strategically focus on city hotel offerings. By tailoring marketing strategies, services, and promotions to align with the predominant customer preference, the business can positively impact revenue. But to sustain positive growth, it's crucial to balance resources and offerings to cater to both segments effectively.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x="customer_type")   # Create a Count Plot
plt.title('Distribution of Customer Type')
plt.show()

##### 1. Why did you pick the specific chart?

To see the distribution of customer types.

##### 2. What is/are the insight(s) found from the chart?

Transient customers are the largest group. Transient-party customers are also significant. Contract customers are fewer, and group customers are almost negligible.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.



The dominance of transient and transient-party customer types, with a lesser presence of contract and almost negligible group customers, suggests an opportunity to tailor business strategies towards transient-focused services. By optimizing marketing and offerings for transient and transient-party segments, the business can likely capitalize on its predominant customer base.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(12, 6))
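# Plot the months in calendar order so the seasonal pattern is readable
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']
sns.countplot(data=df, x="arrival_date_month", order=month_order)   # Create a Count Plot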
plt.title(' Distribution of Arriving Month')
plt.xlabel('Arriving Month')
plt.ylabel('Customer Count')
plt.show()

##### 1. Why did you pick the specific chart?

To see the distribution of the number of customers by month.

##### 2. What is/are the insight(s) found from the chart?

Customer arrivals increase from January to August and then decrease from September to December. August has the highest number of arriving customers, while January has the lowest.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The observed seasonal pattern with increasing customer arrivals from January to August and a subsequent decrease from September to December provides actionable insights for positive business impact. Leveraging this knowledge, the business can optimize resource allocation, marketing efforts, and promotions to capitalize on peak months, ensuring a positive impact on revenue. However, neglecting the potential decrease in customer arrivals during the latter part of the year may lead to underutilized capacity and missed revenue opportunities, potentially resulting in negative growth. Strategically aligning services and promotions with seasonal trends is vital for sustained positive business impact.
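Reading a seasonal pattern off monthly counts depends on the months being in calendar order rather than in order of appearance or alphabetical order. A minimal sketch of that ordering step follows; it assumes `arrival_date_month` holds full English month names, and `arrivals_by_month` and the `toy` frame are hypothetical.

```python
import pandas as pd

MONTH_ORDER = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]


def arrivals_by_month(frame: pd.DataFrame) -> pd.Series:
    """Count bookings per arrival month, indexed in calendar order."""
    counts = frame["arrival_date_month"].value_counts()
    # Months with no bookings are kept with a count of 0
    return counts.reindex(MONTH_ORDER, fill_value=0)


# Made-up rows for illustration only (not the hotel data)
toy = pd.DataFrame({"arrival_date_month": ["August", "January", "August", "July"]})

monthly = arrivals_by_month(toy)
print(monthly)
print("Busiest month:", monthly.idxmax())
```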

#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(15, 8))
plt.subplot(1, 2, 1)
sns.countplot(x='children',hue='hotel', data= df, palette='cool')
plt.title("Number of Children in both hotels",fontweight="bold", size=20)
plt.subplot(1, 2, 2)
sns.countplot(data = df, x = 'children', hue='is_canceled', palette='Set2')
plt.title('Children vs Cancelations',fontweight="bold", size=20)
plt.subplots_adjust(right=1.7)

plt.show()

##### 1. Why did you pick the specific chart?

To see how the presence of children relates to hotel choice and to cancellations.

##### 2. What is/are the insight(s) found from the chart?

The first chart shows that most guests arrive without children and prefer the city hotel over the resort hotel, while guests with children prefer the resort hotel. The second chart shows that the cancellation rate is low whether or not guests have children.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

To ensure sustained positive business impact, the business should continue to differentiate its services for families with and without children, ensuring a family-friendly environment and maintaining low cancellation rates. Neglecting the specific needs of families with children may result in missed opportunities for growth within this segment. Therefore, a comprehensive strategy that caters to the diverse preferences of families can contribute to positive business outcomes.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
plt.figure(figsize=(8, 4))

# Create the countplot
ax1 = sns.countplot(x='hotel', hue='is_canceled', data=df, palette="Set2")

# Set plot title and axis labels
plt.title('Reservation status in different hotels', size=20, color='Black')
plt.xlabel('Hotel',color='Black')
plt.ylabel('Number of Reservations', color='Black')

# Relabel the legend (0 = not cancelled, 1 = cancelled) and place it outside the axes
plt.legend(['Not Cancelled', 'Cancelled'], loc='upper left', bbox_to_anchor=(1, 1))

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

To see the booking difference between the Resort Hotel and the City Hotel.

##### 2. What is/are the insight(s) found from the chart?

In comparison to resort hotels, city hotels have more bookings. It's possible that resort hotels are more expensive than those in cities.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


#### Chart - 7

In [None]:
# Chart - 7 visualization code
plt.figure(figsize = (10, 6))
sns.barplot(data = df, x = 'hotel', y = 'is_repeated_guest')    # Create Bar Plot
plt.xlabel('Hotel Type')
plt.ylabel('Repeated Guest')
plt.title(' HotelType vs Repeated Guest')
plt.show()

##### 1. Why did you pick the specific chart?

To observe repeated guests by hotel type.

##### 2. What is/are the insight(s) found from the chart?

The resort hotel has a higher share of repeated guests than the city hotel.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insight that the resort hotel attracts more repeated guests compared to the city hotel presents a positive opportunity for business impact. Leveraging this knowledge, the business can tailor marketing strategies, loyalty programs, and customer engagement initiatives to further enhance guest retention in the resort segment.
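Because `is_repeated_guest` is a 0/1 flag, the bar plot above shows its mean, that is, the share of bookings made by repeated guests, not a raw count. The sketch below puts both figures side by side. The helper `repeat_guest_summary` and the `toy` frame are hypothetical and only illustrate how share and count can tell different stories.

```python
import pandas as pd


def repeat_guest_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and share of repeated guests per hotel type."""
    return frame.groupby("hotel")["is_repeated_guest"].agg(
        repeat_count="sum",   # number of bookings flagged as repeated
        repeat_share="mean",  # fraction of bookings flagged as repeated
        bookings="count",     # total bookings for the hotel
    )


# Made-up rows for illustration only (not the hotel data)
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 2,
    "is_repeated_guest": [0, 0, 0, 1, 1, 0],
})

repeat_summary = repeat_guest_summary(toy)
print(repeat_summary)
```

In the toy data both hotels have one repeated guest, yet the resort's share is twice as high, which is the distinction the bar plot captures.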

#### Chart - 8

In [None]:
# Chart - 8 visualization code
plt.figure(figsize=(12,6))
sns.lineplot(x='arrival_date_month', y='adr', hue='hotel', data= df)
plt.show()

##### 1. Why did you pick the specific chart?

To observe relation between adr and arrival month.

##### 2. What is/are the insight(s) found from the chart?

This line plot helps identify seasonal demand patterns, distinguish pricing strategies for different hotel types, and recognize key opportunities for revenue optimization during peak and off-peak periods.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

For the Resort Hotel, ADR is highest during July, August and September; for the City Hotel, ADR is slightly higher during March, April and May.
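The monthly ADR comparison can also be tabulated, which makes it easier to quote specific months. The sketch below is one possible approach; `mean_adr_by_month`, the month-name assumption and the `toy` values are illustrative and not taken from the dataset.

```python
import pandas as pd

MONTH_ORDER = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]


def mean_adr_by_month(frame: pd.DataFrame) -> pd.DataFrame:
    """Mean ADR per arrival month (rows) and hotel type (columns)."""
    table = frame.pivot_table(index="arrival_date_month", columns="hotel",
                              values="adr", aggfunc="mean")
    # Calendar order, dropping months that have no bookings at all
    return table.reindex(MONTH_ORDER).dropna(how="all")


# Made-up rows for illustration only (not the hotel data)
toy = pd.DataFrame({
    "arrival_date_month": ["July", "July", "March", "July"],
    "hotel": ["Resort Hotel", "Resort Hotel", "City Hotel", "City Hotel"],
    "adr": [150.0, 170.0, 100.0, 90.0],
})

adr_table = mean_adr_by_month(toy)
print(adr_table)
```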


#### Chart - 9

In [None]:
# Chart - 9 visualization code
plt.figure(figsize=(12,5))

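# Average daily rate per person (bookings with zero guests give inf or NaN here)
df['adr_pp'] = df['adr'] / (df['adults'] + df['children'])
# Copy the non-cancelled bookings so that adding a column does not raise SettingWithCopyWarning
actual_guests = df.loc[df["is_canceled"] == 0].copy()
# Total price of a stay = nightly rate x number of nights
actual_guests['price'] = actual_guests['adr'] * (actual_guests['stays_in_weekend_nights'] + actual_guests['stays_in_week_nights'])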
sns.lineplot(data = actual_guests, x = 'arrival_date_month', y = 'price', hue = 'hotel')
plt.show()

##### 1. Why did you pick the specific chart?

To observe the plot between arrival month and price.

##### 2. What is/are the insight(s) found from the chart?

Resort hotel prices are much higher, while city hotel prices do not fluctuate as much.
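The price used in this chart is the nightly rate multiplied by the number of nights. The per-person rate computed in the same cell divides by the number of guests, which can be zero. The sketch below shows one way to compute both columns while leaving zero-guest bookings as missing values rather than infinity. The helper `add_price_columns` and the `toy` rows are assumptions for illustration.

```python
import pandas as pd


def add_price_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with per-person rate (adr_pp) and total stay price."""
    out = frame.copy()
    guests = out["adults"] + out["children"]
    # where() keeps positive guest counts and turns zeros into NaN,
    # so the division yields NaN instead of inf
    out["adr_pp"] = out["adr"] / guests.where(guests > 0)
    nights = out["stays_in_weekend_nights"] + out["stays_in_week_nights"]
    out["price"] = out["adr"] * nights
    return out


# Made-up rows for illustration only (not the hotel data)
toy = pd.DataFrame({
    "adults": [2, 0],
    "children": [0, 0],
    "adr": [100.0, 50.0],
    "stays_in_weekend_nights": [1, 0],
    "stays_in_week_nights": [2, 1],
})

priced = add_price_columns(toy)
print(priced[["adr_pp", "price"]])
```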

#### Chart - 10

In [None]:
# Chart - 10 visualization code
columns_of_interest = ['lead_time', 'adr']
sns.set(style="ticks", color_codes=True)
sns.pairplot(df[columns_of_interest])
plt.show()

##### 1. Why did you pick the specific chart?

To observe the relation between lead time and adr.

##### 2. What is/are the insight(s) found from the chart?

The plot suggests a positive relationship between adr and lead time. This is an association only; it does not show that increasing lead time would raise adr.
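A pair plot shows the shape of a relationship but not its strength. One way to put a number on it is the Pearson correlation coefficient, sketched below with pandas. The function name `lead_time_adr_correlation` and the `toy` values are hypothetical; the result on the real data may differ in both size and sign.

```python
import pandas as pd


def lead_time_adr_correlation(frame: pd.DataFrame) -> float:
    """Pearson correlation between lead_time and adr (ranges from -1 to 1)."""
    corr_matrix = frame[["lead_time", "adr"]].corr()  # Pearson by default
    return float(corr_matrix.loc["lead_time", "adr"])


# Made-up rows for illustration only (not the hotel data)
toy = pd.DataFrame({
    "lead_time": [5, 30, 90, 200],
    "adr": [70.0, 95.0, 100.0, 120.0],
})

r = lead_time_adr_correlation(toy)
print(f"Pearson r between lead_time and adr: {r:.2f}")
```

A value near 0 would indicate little linear relationship even if the scatter plot looks dense, so the coefficient is a useful companion to the plot.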

#### Chart - 11

In [None]:
# Chart - 11 visualization code
import plotly.express as px

# Count non-cancelled bookings per country of origin
country_visitors = df[df['is_canceled'] == 0].groupby(['country']).size().reset_index(name = 'count')


px.choropleth(country_visitors,
                    locations = "country",
                    color= "count" ,
                    hover_name= "country", # column to add to hover information
                    color_continuous_scale="Viridis",
                    title="Home country of visitors")

#### Chart - 12

In [None]:
# Chart - 12 visualization code
df.replace([np.inf, -np.inf], np.nan, inplace=True)
axes = df.hist(figsize=(20,14)) # assign the result of df.hist() to a new variable, axes
plt.show()

##### 1. Why did you pick the specific chart?

Histograms visually reveal the distribution of the data. This can help in understanding customer behaviour, operational efficiency, and risk assessment.

##### 2. What is/are the insight(s) found from the chart?

A histogram represents numerical data grouped into bins and is an accurate way to show its distribution graphically. It is a type of bar plot in which the X-axis shows the bin ranges and the Y-axis shows the frequency.

# **Conclusion**

* Increasing prices are associated with a higher rate of cancellations. To mitigate reservation cancellations, hotels could refine their pricing strategies by offering reduced rates for specific locations and providing discounts to customers.

* The resort hotel experiences a higher ratio of cancellations compared to the city hotels. Therefore, hotels should consider offering competitive room price discounts on weekends and holidays.

* During the month of January, hotels can launch marketing campaigns with attractive offers to boost their revenue, especially since cancellations tend to peak during this period.

* Enhancing the quality of hotels and their services, particularly in Portugal, can be an effective approach.
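The cancellation comparisons in these conclusions (by hotel type and by month) can be checked with a grouped mean of `is_canceled`. The sketch below shows the idea; `cancellation_rate_by` and the `toy` rows are hypothetical, so the printed rates say nothing about the real dataset.

```python
import pandas as pd


def cancellation_rate_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Cancellation rate in percent for each value of the given column."""
    # The mean of a 0/1 flag is the fraction of rows equal to 1
    return frame.groupby(column)["is_canceled"].mean().mul(100).round(1)


# Made-up rows for illustration only (not the hotel data)
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "arrival_date_month": ["January", "January", "August", "August"] * 2,
    "is_canceled": [1, 1, 0, 0, 0, 0, 0, 1],
})

rate_by_hotel = cancellation_rate_by(toy, "hotel")
rate_by_month = cancellation_rate_by(toy, "arrival_date_month")
print(rate_by_hotel)
print(rate_by_month)
```

Running `cancellation_rate_by(df, "hotel")` and `cancellation_rate_by(df, "arrival_date_month")` on the cleaned data would give the figures needed to confirm or revise the statements above.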