<a href="https://colab.research.google.com/github/ANDUGULA-SAI-KIRAN/Hotel-Booking-EDA/blob/main/Hotel_booking_analysis_EDA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Hotel Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Team
##### **Team Member 1 - ANDUGULA SAI KIRAN(self)**
##### **Team Member 2 - Y CHANDRODAY**



# **Project Summary -**

**In hotel bookings and hospitality management, a comprehensive analysis can yield valuable insights into guest behaviors, preferences, and industry trends. This project aims to unravel the nuances of hotel bookings using a range of data exploration and analysis techniques. The dataset, with features such as booking cancellations, lead times, guest demographics, and reservation details, offers a wealth of information to be unearthed.**

A primary objective of this project is to investigate booking cancellations. We aim to discover patterns that may influence the likelihood of a booking being cancelled. This understanding can improve strategies to minimize cancellations and optimize booking processes.

Seasonal booking patterns come into play as we delve into the temporal distribution of bookings across months. Such analysis can uncover trends, revealing peak seasons and quieter periods, thereby facilitating efficient resource allocation and marketing strategies. Moreover, guest demographics analysis provides a window into the origins of guests and their accompanying adults, children, and babies. This information could be invaluable for targeted marketing campaigns.

An exploration of booking channels contributes to a better grasp of the distribution channels through which reservations are made. By identifying trends associated with different channels, the hotel can optimize marketing efforts and distribution strategies to maximize bookings.

Furthermore, the interaction between special requests and cancellations is scrutinized. Are special requests correlated with booking cancellations? This analysis could uncover whether meeting guest expectations for special requests might influence their decision to cancel a booking.

Demographics and customer types are vital aspects of the hospitality industry. By identifying patterns in customer types and analyzing their booking behaviors, the project aims to provide insights that could lead to tailored services and enhanced customer experiences.

The dataset's numeric features offer the opportunity for correlation analysis, affording a closer look at relationships between different variables. A correlation heatmap and pair plot can uncover hidden dependencies, potentially guiding decision-making processes.

In conclusion, this hotel booking analysis project sets out to unlock the power of data for informed decision-making in the hospitality sector. By blending exploratory data analysis and correlation studies, the project aims to empower hotel managers with insights that can optimize resource allocation, marketing strategies, and guest experiences. The resulting impact on guest satisfaction and operational efficiency could potentially reshape the landscape of hotel management.

# **GitHub Link -**

https://github.com/ANDUGULA-SAI-KIRAN/Hotel-Booking-EDA

# **Problem Statement**


**In the dynamic landscape of hotel management, the challenge is to understand the complexities of booking behaviors, cancellations, and guest preferences: uncovering seasonal trends, understanding guest demographics, deciphering the role of booking channels, optimizing operational efficiency, and enhancing guest satisfaction. The expected results are optimized operational strategies, prediction of booking cancellations, identification of influential factors, services tailored to guest expectations, and a competitive edge in the ever-evolving world of hospitality management.**


#### **Define Your Business Objective?**

**The primary business objective for hotel booking is to optimize revenue generation, operational efficiency, and guest satisfaction by leveraging data-driven insights and strategies. This encompasses several specific goals aimed at achieving sustainable growth and providing exceptional experiences to guests.**

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that for each chart the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#from google.colab import drive
#drive.mount('/content/drive')

### Dataset Loading

Data set link - https://drive.google.com/file/d/1HhW1A9kcUbE4l2kVsH4yQ7eln2p9VYmF/view?usp=sharing

In [None]:
# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
hotel_df = pd.read_csv('/content/Hotel Bookings.csv')
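
Since the guidelines ask for exception handling, the load step could be wrapped as in the minimal sketch below. The helper name `load_bookings` is hypothetical and not part of the original notebook; it only checks that the file exists before reading it.

```python
from pathlib import Path

import pandas as pd


def load_bookings(csv_path):
    """Read the bookings CSV, raising a clear error if the file is missing.

    `load_bookings` is an illustrative helper, not part of the original notebook.
    """
    path = Path(csv_path)
    if not path.is_file():
        # Fail early with a readable message instead of a long pandas traceback
        raise FileNotFoundError(f"Dataset not found at {path}; upload it or fix the path.")
    return pd.read_csv(path)
```

In this notebook the call would be `hotel_df = load_bookings('/content/Hotel Bookings.csv')`.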

### Dataset First View

In [None]:
# Dataset First Look
pd.set_option('display.max_columns',32)  #displays all column names
hotel_df.head() #displays 1st five rows

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
hotel_df.shape # displays the count of rows and columns

### Dataset Information

In [None]:
# Dataset Info
hotel_df.info()  #displays the total count of non-null values and data types

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(hotel_df[hotel_df.duplicated()])  #gives total number of duplicated values

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(hotel_df.isnull().sum()) #gives total count of null values in each column

In [None]:
# Visualizing the missing values
sns.heatmap(hotel_df.isnull(), cbar = False)
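
The raw null counts are easier to judge as a share of all rows. The sketch below shows one way to tabulate both; the helper `null_summary` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def null_summary(df):
    """Return null count and null percentage per column, largest first.

    Columns without nulls are omitted. Hypothetical helper for illustration.
    """
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "null_count": counts,
        "null_pct": (counts / len(df) * 100).round(2),
    })
    return summary[summary["null_count"] > 0].sort_values("null_count", ascending=False)


# Toy data standing in for hotel_df (values are made up)
toy = pd.DataFrame({
    "company": [np.nan, np.nan, np.nan, 40.0],
    "agent": [9.0, np.nan, 240.0, 1.0],
    "adr": [75.0, 98.0, 120.5, 60.0],
})
print(null_summary(toy))
```

Calling `null_summary(hotel_df)` would give the same table for the real dataset.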

### What did you know about your dataset?

The dataset offers a wealth of information about hotel bookings, such as factors that affect bookings, guest demographics, preferences, cancellations, and booking outcomes. The aim is to provide valuable insights to hotel management to enhance operations and services, ultimately improving guest experiences.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
hotel_df.columns

In [None]:
# Dataset Describe
hotel_df.describe(include = 'all')

### Variables Description

* **hotel:** Type of hotel (resort hotel or city hotel)

* **is_canceled:** Indicates whether the booking was canceled (1) or not (0).

* **lead_time:** Number of days that elapsed between the entering date of booking into the PMS and the arrival date.
* **arrival_date_year:** Year of arrival.
* **arrival_date_month:** Month of arrival.
* **arrival_date_week_number:** Week number of the year for arrival.
* **arrival_date_day_of_month:** Date of arrival in that month.
* **stays_in_weekend_nights:** Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at hotel.
* **stays_in_week_nights:** Number of weeknights (Monday to Friday) the guest stayed or booked to stay at hotel.
* **adults:** Number of adults among the guests.
* **children:** Number of children among the guests.
* **babies:** Number of babies among the guests.
* **meal:** Type of meal opted during stay.
* **country:** Country of origin of the guest.
* **market_segment:** Designation of market segment.
* **distribution_channel:** Name of Booking distribution channel.
* **is_repeated_guest:** Indicates if the guest is a repeated guest (1) or not repeated guest (0).
* **previous_cancellations:** Number of previous booking cancellations by the guest prior to current booking.
* **previous_bookings_not_canceled:** Number of previous bookings not cancelled by the guest prior to current booking.
* **reserved_room_type:** Code of the room type reserved.
* **assigned_room_type:** Code of the room type assigned.
* **booking_changes:** Number of changes made to the booking.
* **deposit_type:** Type of deposit made for the booking.
* **agent:** ID of the travel agent making the booking.
* **company:** ID of the company/entity making the booking.
* **days_in_waiting_list:** Number of days the booking was in the waiting list before confirmed.
* **customer_type:** Type of booking, such as transient, contract, group, or other.
* **adr:** Average daily rate.
* **required_car_parking_spaces:** Number of car parking spaces required by the client.
* **total_of_special_requests:** Number of special requests made by the guest.
* **reservation_status:** Current status of the reservation.
* **reservation_status_date:** Date at which the last status was updated.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
print(hotel_df.apply(lambda col: col.unique()))

## 3. ***Data Wrangling***

### Data Wrangling Code

**Step 1: Removing duplicate rows if any**

In [None]:
#1 checking for any duplicate rows
print('total number of rows and columns',hotel_df.shape)
hotel_df.drop_duplicates(inplace = True)
print('total number of rows and columns after removing duplicates',hotel_df.shape)

**step 2: handling missing values**

In [None]:
# 2 checking for null values in the data set and replacing them according to column data type
hotel_df.isnull().sum().sort_values(ascending = False) # gives the sum of null values in each column

#2.1 Deletion of company column:
hotel_df.drop(['company'], axis =1, inplace = True)

#2.2 Filling null values with 0 for column agent
# (assigning the result back avoids chained-assignment warnings in recent pandas):
hotel_df['agent'] = hotel_df['agent'].fillna(0)

#2.3 replacing null values with 'others'
hotel_df['country'] = hotel_df['country'].fillna('others')

#2.4 replacing null values in children column with 0:
hotel_df['children'] = hotel_df['children'].fillna(0)

**step 3: Type casting(conversion of data type)**

In [None]:
#type casting children & agent columns from float to int:
hotel_df[['children','agent']] = hotel_df[['children','agent']].astype('int64')

#type casting reservation_status_date from object to date time:
hotel_df['reservation_status_date'] = pd.to_datetime(hotel_df['reservation_status_date'], format = '%Y-%m-%d')

**step 4: column assignment/adding**


In [None]:
#assigning a new column "total_guests" by adding the following columns: adults, children, babies
hotel_df['total_guests'] = hotel_df.children + hotel_df.babies + hotel_df.adults

#assigning new column "total_stay" by adding following columns stays_in_weekend_nights & stays_in_week_nights
hotel_df['total_stay'] = hotel_df.stays_in_weekend_nights + hotel_df.stays_in_week_nights

**step 5: removing outliers:**

In [None]:
'''
the following rows have a total guest count of 0 while the other entries in these rows are filled with
values; we can assume bookings were made and the guests didn't check in, so these rows are candidates
for deletion (this cell only counts them, it does not drop them)
'''
print('total number of rows with total guests as 0 is:',hotel_df[hotel_df['total_guests'] == 0].shape)
# or hotel_df[hotel_df['children'] + hotel_df['adults'] + hotel_df['babies'] ==0].shape


In [None]:
#displaying the outlier in scatter plot
plt.figure(figsize = (12,6))
sns.scatterplot(data= hotel_df, x= 'total_stay', y= 'adr', label = 'scatter plot with outlier')

In [None]:
#displaying the plot after removing outlier
hotel_df.drop(hotel_df[hotel_df['adr']> 5000].index, inplace = True ) #removing the outlier which is above 5000
plt.figure(figsize = (12,6))
sns.scatterplot(data= hotel_df, x= 'total_stay', y= 'adr', label = 'scatter plot without outlier')
plt.show()
print('shape of data frame is ',hotel_df.shape)

### What all manipulations have you done and insights you found?

**1. Duplicate rows were investigated within the dataset:** The initial dataset comprised 119,390 rows, and removing duplicate entries left 87,396 rows.

**2. handling cleaning & missing values:**

2.1 The "company" column contains 82,137 null values out of 87,396, which is 94%. This level of missing data can significantly impact the accuracy and reliability of any analysis performed on the data set, hence deleting the company column makes sense.

2.2 The "agent" column represents the agent ID. It contains 12,193 null values out of 87,396, which is 14%. This indicates that no agent was assigned for those bookings. By replacing nulls with 0, the column remains consistent in terms of data type (integer) and can be included in data manipulation without issues related to missing data.

2.3 The "country" column represents the guest's country of origin. It contains 452 null values out of 87,396, which is 0.52%; the country code for these bookings was missing. By replacing null values with 'others', the column remains consistent in terms of data type (string) and can be included in data manipulation without issues related to missing data.

2.4 The "children" column represents the number of children who stayed at the hotel along with adults. It contains only 4 null values out of 87,396, which were replaced with 0.

**Therefore, by this step all duplicated rows have been removed and the null values in each remaining column have been replaced.**

**3. Type casting:**

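Type casting involves changing the data type of a variable or value from one type to another. It plays a crucial role in data manipulation, supporting data compatibility, data integrity, and mathematical operations. Here, "children" and "agent" were cast from float to integer, and "reservation_status_date" from object to datetime.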

**4. column assignment/adding:**

- A new column "total_guests" has been added; it represents the total count of guests, obtained by adding the columns "adults", "children", and "babies".
- A new column "total_stay" has been added; it represents the total length of stay in nights, obtained by adding the columns "stays_in_weekend_nights" and "stays_in_week_nights".

**5. Removing Outlier:**
Outliers are data points that deviate significantly from the majority of data points in a dataset. These extreme values can distort statistical measures and models, leading to inaccurate or biased results.
Here, after observing the plot, most average daily rate values lie within around 500 units and only one data point is above 5000, which doesn't make sense, hence that outlier was removed. **After this step the data frame has 87,395 rows and 33 columns.**
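
The notebook removes the outlier with a fixed cut-off (`adr > 5000`) chosen by eye. As a point of comparison only, the sketch below shows the common interquartile-range (IQR) rule on made-up values. The function name `iqr_upper_fence` and the toy series are assumptions; this rule was not used in the analysis and would likely flag many more rows on the real data than the fixed threshold does.

```python
import pandas as pd


def iqr_upper_fence(values, k=1.5):
    """Upper outlier fence: Q3 + k * (Q3 - Q1). Illustrative helper."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    return q3 + k * (q3 - q1)


# Made-up ADR values with one extreme point, standing in for hotel_df['adr']
toy_adr = pd.Series([80.0, 95.0, 100.0, 110.0, 120.0, 5400.0])
fence = iqr_upper_fence(toy_adr)
flagged = toy_adr[toy_adr > fence]
print(fence, flagged.tolist())
```

A fixed threshold is easier to justify when, as here, a single point is clearly implausible; a rule such as IQR is more useful when the cut-off has to be chosen without inspecting a plot.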

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

### **Univariate Analysis -**

#### **Chart - 1 City Hotel vs. Resort hotel bookings Analysis**

In [None]:
# Chart - 1 visualization code
hotel_type = hotel_df['hotel'].value_counts() # value count gives the sum of similar value in columns
hotel_type.reset_index()

In [None]:
# Chart - 1 visualization plot
hotel_type.plot.pie(autopct='%.2f%%', textprops={'weight': 'bold'}) #plots a pie plot; x/y arguments are not needed for a Series
plt.title('Hotel Booking percentage', fontweight="bold", size=12 )
plt.xlabel('')
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart is effective for representing categorical data as a part-to-whole relationship (percentage of each category).

##### 2. What is/are the insight(s) found from the chart?

The data shows that the city hotel has more bookings, 61%, compared to 39% for the resort hotel.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**From the observations made from the plot**

**City Hotel:**

**positive business impact:** City hotels have year-round booking demand for various reasons, such as business travelers, tourists, and conventions. This consistent flow of visitors can provide stable revenue throughout the year.

**Negative business impact:** Even with year-round demand, competition is high. To maximize bookings, prices may need to be kept low relative to market pricing, which compromises profit margins, while still not compromising on services.

**Resort Hotel:**

**positive business impact:** Resort hotels have their own advantage of providing guests a relaxing, leisure-focused experience. Resort hotels are usually booked for vacations, holiday trips, wedding ceremonies, etc., which leads to longer stays.

Apart from room rates, resort hotels offer various amenities like spas, swimming pools, loud music, dance floors, recreational activities, etc., which generate additional revenue.


**Negative business impact:** Resort hotels are highly seasonal, with peak demand during vacations and festival holidays; this can lead to reduced occupancy and revenue in the off-season. Since resorts provide various amenities, maintenance costs remain high even during the off-season.

#### **Chart - 2 Seasonal and Yearly Hotel Booking Patterns**

In [None]:
# Chart - 2 visualization code/plot
#yearly booking patterns - comparison of bookings for both hotels by year
sns.countplot(x= 'arrival_date_year', data = hotel_df, hue = 'hotel')
plt.title('Yearly booking patterns', fontweight = 'bold', size=12)
plt.xlabel('year',fontweight = 'bold')
plt.ylabel('No. of bookings',fontweight = 'bold')
plt.show()

In [None]:
#sum of city hotel bookings month wise
city_hotel_df = hotel_df[hotel_df['hotel'] == 'City Hotel'] #gives output of only city hotels list
city_hotel_booking = city_hotel_df.arrival_date_month.value_counts() #groups the column month and gives sum of guests arrival in  specific month
city_hotel_booking

In [None]:
#sum of resort hotel bookings month wise
resort_hotel_df =  hotel_df[hotel_df['hotel'] == 'Resort Hotel'] #gives output of only resort hotel list
resort_hotel_booking = resort_hotel_df['arrival_date_month'].value_counts() #groups the column month and gives sum of guests arrival in  specific month
resort_hotel_booking

In [None]:
#seasonal booking patterns - comparison of bookings for both hotels by months
plt.figure(figsize = (12,6))
sns.countplot(data = hotel_df, x = 'arrival_date_month', hue = 'hotel')
plt.title('Seasonal booking patterns', fontweight = 'bold', size = 12)
plt.xlabel('booking months')
plt.ylabel('No. of bookings')
plt.show()
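
By default the months on the x-axis appear in the order they occur in the data rather than in calendar order, which makes seasonal patterns harder to read. A minimal sketch of one way to impose calendar order follows; `MONTH_ORDER` and `monthly_counts` are hypothetical names, and the toy frame stands in for `hotel_df`.

```python
import pandas as pd

# Calendar order for the month names used in 'arrival_date_month'
MONTH_ORDER = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def monthly_counts(df, month_col="arrival_date_month"):
    """Count bookings per month and return them in calendar order."""
    counts = df[month_col].value_counts()
    # Months missing from the data are reported as 0 instead of being dropped
    return counts.reindex(MONTH_ORDER, fill_value=0)


toy = pd.DataFrame({"arrival_date_month": ["July", "August", "July", "January"]})
print(monthly_counts(toy))
```

The same list can be passed to seaborn as `sns.countplot(..., order=MONTH_ORDER)` so that the bars follow the calendar.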

##### 1. Why did you pick the specific chart?

**Count plot:** A count plot (seaborn's `countplot`) is used to visualize the frequency of categorical data; it displays the number of occurrences of each category within a single categorical variable.

##### 2. What is/are the insight(s) found from the chart?

**Bookings by Year:**

Booking patterns by year show more bookings in 2016 than in 2015 and 2017; there isn't enough data to explain why bookings were higher.

**Bookings by Month:**

Booking patterns by month show high bookings in July and August and low bookings in November, December, and January.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Bookings by month:**

**positive:** The plot clearly shows that bookings are high in July and August and low in November, December, and January.
Based on this, we can schedule more staff/service employees in high-booking months and fewer in low-booking months. This not only optimises revenue over the long term but also ensures adequate services and amenities for all guests in busy months.

#### **Chart - 3 Average Daily Rate Analysis**

In [None]:
# Chart - 3 visualization code
adr_mean = hotel_df.groupby('hotel')['adr'].mean().reset_index() #grouping by column 'hotel' and calculating mean
adr_mean

In [None]:
# Chart - 3 visualization plot
sns.barplot(x=adr_mean.hotel, y=adr_mean.adr) #plots a bar plot
plt.title('Average Daily Rate', fontweight = 'bold', size = 12)
plt.xlabel('Hotel Type', fontweight = 'bold')
plt.ylabel('Rate Units', fontweight = 'bold')
plt.show()

##### 1. Why did you pick the specific chart?

**Bar plot:** A bar plot is used to visualize the relationship between categorical variable and a numerical variable. Here hotel is a categorical variable and adr is numerical variable

##### 2. What is/are the insight(s) found from the chart?

The "City hotel" has a higher average daily rate (ADR) than the "Resort hotel." The mean ADR of the "City hotel" is approximately 110.89, whereas the mean ADR of the "Resort hotel" is approximately 99.03.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Resort hotels and city hotels serve their own purposes. City hotels are mostly occupied for business meetings/conferences, city tourism, etc., while resort hotels are mostly located on city outskirts with greenery and lake views. Guests mostly prefer resort hotels for vacations and wedding events, to relax.

**areas to focus on for business improvements:**

**- identifying city hotels' strengths:** identifying factors that contribute to the city hotel's success, such as location, services, amenities, and pricing strategy, and implementing them at the resort hotel.

**- Diversify offerings:** diversifying the services and amenities at hotels, such as conference facilities, restaurants, and entertainment options, can attract more guests.

**- Market research:** conducting market research to understand why guests prefer the city hotel over the resort hotel, and using this information to tailor marketing efforts and services to meet customer expectations.

**- Online presence/ digital marketing:** implementing digital marketing strategies to attract more customers, improving their user experience and making transactions smooth. (The next plot shows booking percentage by distribution channel for more insights.)

#### **Chart - 4 Booking Channel Preference Analysis**

In [None]:
# Chart - 4 visualization code
# rename_axis/reset_index(name=...) fixes the column names, which otherwise differ between pandas versions
booking_distribution = (hotel_df['distribution_channel'].value_counts()  # bookings per channel, total bookings = 87395
                        .rename_axis('channel').reset_index(name='bookings'))
booking_distribution #shows the distribution channel with number of bookings

In [None]:
total_bookings = booking_distribution['bookings'].sum()
booking_distribution['booking_percentage'] = (booking_distribution['bookings']/total_bookings) * 100
booking_distribution #a new column has been created showing percentage of booking channels

In [None]:
# Chart - 4 visualization plot
sns.barplot(data = booking_distribution, x = 'channel', y= 'booking_percentage')
plt.xlabel('Booking Distribution Channels',fontweight = 'bold')
plt.ylabel('Booking Percentage',fontweight = 'bold')
plt.title('Booking Channel Preference Analysis', fontweight = 'bold', size = 12 )
plt.show()

##### 1. Why did you pick the specific chart?

**Bar plot:** A bar plot is used to visualize the relationship between categorical variable and a numerical variable. Here booking distribution channel is a categorical variable and booking percentage is numerical variable

##### 2. What is/are the insight(s) found from the chart?

The majority of bookings (79.11%) come through Travel Agents/Tour Operators (TA/TO), indicating the significance of this distribution channel. This insight highlights the importance of maintaining strong relationships with TA/TO partners.


Direct bookings account for a smaller but notable share (14.86%) of the total bookings, indicating the value of the hotel's online presence and marketing efforts in attracting guests directly. Encouraging more direct bookings can lead to cost savings from reduced commissions to third-party agents.
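
Percentage shares like the ones quoted here can also be obtained in one step with `value_counts(normalize=True)`. The sketch below uses a made-up series, and the helper name `share_by_category` is an assumption for illustration.

```python
import pandas as pd


def share_by_category(series):
    """Percentage share of each category, rounded to two decimals."""
    return (series.value_counts(normalize=True) * 100).round(2)


# Made-up channel labels standing in for hotel_df['distribution_channel']
toy_channels = pd.Series(["TA/TO"] * 8 + ["Direct", "Corporate"])
shares = share_by_category(toy_channels)
print(shares)
```

On the real data, `share_by_category(hotel_df['distribution_channel'])` would return the share for every channel at once.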

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

From the observations made from the graph, we can set benchmarks for each distribution channel and monitor them monthly or quarterly; this shows which distribution channel to focus on for improvement.

#### **Chart - 5 Meal Preference Analysis**

In [None]:
# Chart - 5 visualization code
# rename_axis/reset_index(name=...) fixes the column names, which otherwise differ between pandas versions
preferred_meal_df = (hotel_df['meal'].value_counts()  # number of bookings per meal type
                     .rename_axis('meal').reset_index(name='bookings'))
preferred_meal_df

In [None]:
# mapping meal codes to full names; any other code (e.g. 'Undefined') is labelled 'Undefined'
meal_names = {'BB': 'Bed and Breakfast', 'SC': 'Self-Catering', 'HB': 'Half Board', 'FB': 'Full Board'}

In [None]:
preferred_meal_df['meal_names'] = preferred_meal_df['meal'].map(meal_names).fillna('Undefined')
preferred_meal_df

In [None]:
# Chart - 5 visualization plot
sns.barplot(data = preferred_meal_df, x = 'meal_names', y = 'bookings') #plots a bar plot
plt.xlabel('Type of Meal', fontweight = 'bold')
plt.ylabel('No. of Meals', fontweight = 'bold')
plt.title('Meal Preference Analysis',fontweight='bold', size = 12)
plt.xticks(rotation = 40)
plt.show()

##### 1. Why did you pick the specific chart?

**Bar plot:** A bar plot is used to visualize the relationship between categorical variable and a numerical variable. Here meal type is a categorical variable and count of meals is numerical variable

##### 2. What is/are the insight(s) found from the chart?

"Bed and Breakfast" (BB) is the most popular meal option, representing a significant majority (approximately 79.62%) of bookings, followed by "Self-Catering" (SC) at 11.10% and "Half Board" (HB) at 10.64%. The "Full Board" (FB) option has a much smaller share of bookings (approximately 0.82%). Understanding these meal preferences can help tailor dining services and marketing strategies to guest preferences and potentially increase revenue by promoting certain meal plans.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive impact:** Since most customers preferred the BB meal type, the business can promote and offer attractive packages with this meal plan.

**Negative impact:** The low number of bookings for Full Board meal indicates a potential negative impact on revenue from this meal option. The business might consider adjusting pricing or marketing strategies to increase the popularity of Full Board packages.

For further understanding, collecting customer feedback and preferences could provide more insight into why guests prefer certain meals over others.

#### **Chart - 6 Customer Type Analysis**

In [None]:
# Chart - 6 visualization code
customer = hotel_df['customer_type'].value_counts() #counts the bookings for each customer type
customer

In [None]:
# Chart - 6 visualization plot
sns.countplot(data = hotel_df, x = 'customer_type') # plots the count plot
plt.xlabel('Customer Type', fontweight='bold')
plt.ylabel('No. of Bookings', fontweight='bold')
plt.title('Customer Type Analysis',fontweight='bold', size=12)
plt.show()

##### 1. Why did you pick the specific chart?

**Bar plot:** A bar plot is used to visualize the relationship between categorical variable and a numerical variable. Here customer type is a categorical variable and no. of bookings is numerical variable

##### 2. What is/are the insight(s) found from the chart?

"Transient" customers account for a significant majority of bookings (82.36%), followed by "Transient-party" (13.4%), "Contract" (3.6%), and "Group" (0.62%).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive business impact:** Transient customers make the most bookings, so the business can continue to target and cater to this segment; strategies to attract and retain these customers should be implemented.

**Negative growth or concerns:** The low number of bookings from group customers may indicate a negative impact on revenue from this segment; the business might consider marketing campaigns or promotions to increase revenue in this segment.

**additional consideration:** understanding the reasons behind bookings in each segment, such as property size, amenities, and nearby attractions, helps in designing promotions.

#### **Chart - 7 Repeated Guests Analysis**

In [None]:
# Chart - 7 visualization code
# rename_axis/reset_index(name=...) fixes the column names, which otherwise differ between pandas versions
repeated_guests = (hotel_df['is_repeated_guest'].value_counts()  # number of bookings per flag value
                   .rename_axis('Booking_type').reset_index(name='bookings'))
repeated_guests

In [None]:
def repeated_guest(col): #replacing flag value '0' with 'Not repeated Guest' and '1' with 'Repeated Guest'
  if col == 0:
    return 'Not repeated Guest'
  else:
    return 'Repeated Guest'

repeated_guests['Booking_type'] = repeated_guests['Booking_type'].apply(repeated_guest)
repeated_guests

In [None]:
# Chart - 7 visualization plot
repeated_guests.plot.pie(y='bookings',explode=[0.03, 0.03], autopct='%1.2f%%',
                         figsize=(14,7), labels = repeated_guests.Booking_type, textprops={'weight': 'bold'})
plt.title('Percentage of Repeated Guests ',fontsize = 12, fontweight = 'bold')
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart is effective for representing categorical data as a part-to-whole relationship (percentage of each category).

##### 2. What is/are the insight(s) found from the chart?

The majority of guests are 'Not repeated Guests', who dominate with 96.09%, while 'Repeated Guests' make up 3.91%.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**positive business impacts:**

**Customer retention:** Having repeated guests is a positive sign, as it indicates that the business has successfully retained a portion of its customer base; repeated customers provide consistent revenue for hotels.

**word of mouth marketing:** Satisfied/repeated guests may recommend the hotel to their friends and family, contributing to positive word-of-mouth marketing.

**Negative business impacts:**

**Limited growth:** Having repeated guests is beneficial, but the low percentage indicates that the majority of guests are not returning, which might limit growth through repeat bookings.

**Customer attrition:** It's essential to understand why 96% of guests are not returning. Customer dissatisfaction could be one reason; addressing the reasons for this attrition is important to mitigate negative impacts.

#### **Chart - 8 Booking Confirmation Time**

In [None]:
# Chart - 8 visualization code/plot
waiting_time = hotel_df.groupby('hotel')['days_in_waiting_list'].mean().reset_index() #grouping by hotel type and calculating mean of days in waiting list
sns.barplot(x=waiting_time['hotel'], y=waiting_time['days_in_waiting_list']) # plots a bar plot
plt.xlabel('Hotel type', fontweight = 'bold')
plt.ylabel('waiting time in days', fontweight = 'bold')
plt.title('Booking confirmation time', fontweight = 'bold', size =12)
plt.show()

##### 1. Why did you pick the specific chart?

**Bar plot:** A bar plot is used to visualize the relationship between a categorical variable and a numerical variable. Here hotel type is the categorical variable and waiting time is the numerical variable.

##### 2. What is/are the insight(s) found from the chart?

The data shows that guests at the "City Hotel" spend longer on the waiting list on average (approximately 1.02 days) than guests at the "Resort Hotel" (approximately 0.32 days).
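
A mean waiting time can be driven by a small number of long waits, so it may help to look at the median and the share of waitlisted bookings alongside it. The sketch below is a minimal illustration on hypothetical toy data (`toy_df` and its values are assumptions); only the column names match the dataset.

```python
import pandas as pd

# Hypothetical toy data: most bookings never enter the waiting list
toy_df = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "days_in_waiting_list": [0, 0, 0, 8, 0, 0, 0, 2],
})

# Named aggregation: mean, median, and share of bookings with any wait
waiting_summary = toy_df.groupby("hotel")["days_in_waiting_list"].agg(
    mean_days="mean",
    median_days="median",
    share_waitlisted=lambda s: (s > 0).mean(),
)
print(waiting_summary)
```

In this toy example both hotels waitlist the same share of bookings, and the difference in means comes entirely from how long the waitlisted bookings wait.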

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**

**Efficiency Improvements:** The insights suggest that the "Resort Hotel" has a more efficient booking process with shorter waiting times. This can be seen as a positive aspect for the "Resort Hotel," as shorter waiting times are generally preferred by guests.

**Customer Satisfaction:** Shorter waiting times can contribute to higher customer satisfaction, as guests appreciate a quick and hassle-free booking experience. Satisfied customers are more likely to return and recommend the hotel to others.

**Operational Efficiency:** For the "Resort Hotel," shorter waiting times may indicate effective reservation management, reducing the chances of overbooking or underbooking rooms.

**Negative business Impacts:**

**Long Waiting Time:** For the "City Hotel" the longer average waiting time could lead to potential negative impacts. Guests who have to wait longer for their bookings to be confirmed may become frustrated or choose other hotels with faster confirmation.

**Reduced Bookings confirmations:** Lengthy waiting times may discourage potential guests from completing their bookings, resulting in reduced conversion rates and potential loss of revenue.

**Operational Challenges:** The "City Hotel" may face operational challenges in managing reservations with longer waiting times. It may need to allocate more resources to handle booking confirmations efficiently.


#### **Chart - 9 Booking Analysis by weekday or weekend**

In [None]:
# Chart - 9 visualization code
# Build an 'arrival_date' datetime column from the year, month-name, and day-of-month columns
hotel_df['arrival_date'] = pd.to_datetime(hotel_df['arrival_date_year'].astype(str) + '-' +
                                          hotel_df['arrival_date_month'] + '-' +
                                          hotel_df['arrival_date_day_of_month'].astype(str))

# Extract the day of the week from the arrival date (0 = Monday, 6 = Sunday)
hotel_df['day_of_week'] = hotel_df['arrival_date'].dt.dayofweek

# Create a new column to categorize the days as 'Weekday' or 'Weekend'
hotel_df['day_category'] = hotel_df['day_of_week'].apply(lambda x: 'Weekend' if x >= 5 else 'Weekday')

# Group by the day category and hotel type, and count the number of bookings
booking_counts = hotel_df.groupby(['day_category', 'hotel'])['hotel'].count().unstack(fill_value=0)

booking_counts

In [None]:
# Chart - 9 visualization plot
#booking_counts.plot(kind='bar', stacked=False)
booking_counts.plot.bar()
plt.xlabel('Week Category', fontweight='bold')
plt.ylabel('No. of Bookings', fontweight='bold')
plt.xticks(rotation = 0)
plt.title('Bookings by Weekday vs. Weekend', fontweight='bold', size=12)
plt.show()

##### 1. Why did you pick the specific chart?

**Bar plot:** A bar plot is used to visualize the relationship between a categorical variable and a numerical variable. Here week category is the categorical variable and number of bookings is the numerical variable.

##### 2. What is/are the insight(s) found from the chart?

- Weekday bookings are significantly higher than weekend bookings for both City Hotel and Resort Hotel. Note that the weekday category spans five days and the weekend category only two, so the raw counts are not directly comparable per day.
- City Hotel has more bookings than Resort Hotel.
- The difference in the number of bookings between City Hotel and Resort Hotel is more pronounced on weekdays.
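
Because the two categories cover a different number of days, a per-day average gives a fairer comparison than raw counts. The sketch below is a minimal illustration: `booking_counts_toy` mimics the shape of `booking_counts` with hypothetical numbers, and `days_per_category` is an assumed helper, not part of the notebook.

```python
import pandas as pd

# Hypothetical counts in the same layout as booking_counts (illustrative values)
booking_counts_toy = pd.DataFrame(
    {"City Hotel": [500, 200], "Resort Hotel": [300, 140]},
    index=pd.Index(["Weekday", "Weekend"], name="day_category"),
)

# Number of calendar days that fall in each category (Mon-Fri vs Sat-Sun)
days_per_category = pd.Series({"Weekday": 5, "Weekend": 2})

# Divide each row by its number of days to get average bookings per day
bookings_per_day = booking_counts_toy.div(days_per_category, axis=0)
print(bookings_per_day)
```

In this toy example the weekend is at least as busy per day as a weekday, even though the raw weekday totals are much larger.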

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**

**Optimizing weekday business:** The substantially higher weekday bookings suggest an opportunity to focus marketing and promotional efforts on weekday guests. "City Hotels" can continue to invest in city-specific attractions, services, and partnerships to maintain their competitive advantage.

**Potential Negative Growth:**

**Weekend booking opportunities:** While weekend bookings are lower than weekday bookings for both hotel types, there is potential for growth in this area. Hotels could explore strategies to promote weekend stays, such as offering weekend getaway packages, entertainment options, or family-friendly activities.

**Reducing weekday-weekend disparity:** There is a significant difference in bookings between weekdays and weekends for both hotels, which could lead to uneven revenue distribution. Efforts to reduce this disparity, such as offering weekend discounts or events, might help balance bookings throughout the week.

### **Bivariate Analysis -**

#### **Chart - 10 ADR Across Distribution Channel**

In [None]:
# Chart - 10 visualization code
dist_channel = hotel_df.groupby(['distribution_channel','hotel'])['adr'].mean().reset_index()
dist_channel # mean ADR grouped by distribution channel and hotel

In [None]:
# Chart - 10 visualization plot
plt.figure(figsize=(8,6))
sns.barplot(data=dist_channel,x='distribution_channel',y='adr',hue='hotel')
plt.title('ADR Across Distribution Channel', fontweight = 'bold', size =12)
plt.xlabel('Distribution channels', fontweight = 'bold')
plt.ylabel('Average Daily Rate', fontweight = 'bold')
plt.show()

##### 1. Why did you pick the specific chart?

**Bar plot:** A bar plot is used to visualize the relationship between a categorical variable and a numerical variable. Here distribution channel is the categorical variable and ADR is the numerical variable.

##### 2. What is/are the insight(s) found from the chart?

- ADR in 'City Hotel' tends to be higher than in 'Resort Hotel' across various distribution channels.
- 'Resort Hotel' has a much lower ADR than 'City Hotel' in the Corporate and Global Distribution System (GDS) channels.
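
The gap between the two hotel types can be read off more easily from a wide table than from the long-format `dist_channel` frame. The sketch below is a minimal illustration; `dist_channel_toy` has the same columns as `dist_channel` but hypothetical values, and the `gap` column is an assumed name.

```python
import pandas as pd

# Hypothetical mean ADR per channel and hotel (illustrative values only)
dist_channel_toy = pd.DataFrame({
    "distribution_channel": ["Corporate", "Corporate", "Direct", "Direct"],
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    "adr": [80.0, 50.0, 110.0, 105.0],
})

# One row per channel, one column per hotel
adr_wide = dist_channel_toy.pivot(
    index="distribution_channel", columns="hotel", values="adr"
)

# Positive gap means the City Hotel charges more in that channel
adr_wide["gap"] = adr_wide["City Hotel"] - adr_wide["Resort Hotel"]
print(adr_wide.sort_values("gap", ascending=False))
```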

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**

**Pricing strategy:** Insights into ADR variations can help hotel management optimize pricing strategies for different distribution channels and hotel types. Higher ADRs in certain channels or hotel types can be leveraged for increased revenue.

**Negative Business Impact:**

The "Undefined" channel in the "City Hotel" has an exceptionally low ADR. This may indicate issues with pricing or marketing strategies in this channel. Addressing this could lead to increased revenue and positive growth.


#### **Chart - 11 Room Type Preferences Analysis**

In [None]:
# Chart - 11.1 visualization code
sns.countplot(data = hotel_df, x= 'assigned_room_type', hue = 'hotel') #plots a count plot for assigned room type
plt.xlabel('Assigned Room Type',fontweight = 'bold')
plt.ylabel('No. of Bookings',fontweight = 'bold')
plt.title('Assigned Room Type Ratio',fontweight = 'bold', size =12)
plt.show()

In [None]:
# Chart - 11.2 visualization code
sns.countplot(data = hotel_df, x= 'reserved_room_type', hue = 'hotel') #plots a count plot for reserved room type
plt.xlabel('Reserved Room Type',fontweight = 'bold')
plt.ylabel('No. of bookings',fontweight = 'bold')
plt.title('Reserved Room Type Ratio',fontweight = 'bold', size =12)
plt.show()

##### 1. Why did you pick the specific chart?

**Count plot:** A count plot (seaborn's `countplot`) visualizes the frequency of categorical data: it displays the number of occurrences of each category within a single categorical variable.

##### 2. What is/are the insight(s) found from the chart?

In both plots (reserved room type and assigned room type), the majority of guests stay in room types 'A' and 'D', followed by 'E', 'F', and 'G'.
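
The two count plots look similar, but they do not show how often a guest actually receives the room type that was reserved. The sketch below is a minimal illustration on hypothetical toy data (`toy_df` and its values are assumptions); it cross-tabulates the two columns and computes the share of matching bookings.

```python
import pandas as pd

# Hypothetical toy data using the dataset's column names (illustrative values)
toy_df = pd.DataFrame({
    "reserved_room_type": ["A", "A", "A", "D", "D", "E"],
    "assigned_room_type": ["A", "A", "D", "D", "D", "F"],
})

# Rows: reserved type, columns: assigned type, cells: number of bookings
room_matrix = pd.crosstab(
    toy_df["reserved_room_type"], toy_df["assigned_room_type"]
)

# Share of bookings where the assigned room matches the reservation
match_rate = (toy_df["reserved_room_type"] == toy_df["assigned_room_type"]).mean()
print(room_matrix)
print(f"Match rate: {match_rate:.1%}")
```

Off-diagonal cells in `room_matrix` correspond to bookings where the assigned room differs from the reserved one.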

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

However, city hotels and resort hotels serve different purposes, so room preferences are likely to differ.

**City hotel room preferences:**

**- Business travelers:** City hotels are often booked by business travelers, who tend to prefer a spacious work desk and high-speed internet. These travelers prioritize convenience and functionality.

**- Noise considerations:** If the hotel is in a busy area, guests may prefer higher floors or rooms facing away from busy streets to reduce noise levels.

**- Budget constraints:** Price-sensitive travelers may choose more affordable room types, especially if they plan to spend most of their time exploring the city rather than staying at the hotel.

**Resort hotel room preferences:**

**- Leisure and vacation:** resort hotels often cater to leisure travelers and vacationers who may seek spacious and themed rooms.

**- Amenities and activities:** Guests at resort hotels often look for amenities and activities like water sports, restaurant, music area. Their room preferences may align with easy access to these facilities.


#### **Chart - 12 Special Requests Analysis**

In [None]:
# Chart - 12 visualization code
hotel_df['total_of_special_requests'].value_counts() # number of bookings for each count of special requests

In [None]:
# Chart - 12 visualization plot
sns.countplot(data = hotel_df, x='total_of_special_requests', hue = 'hotel')
plt.xlabel('No. of Special Requests', fontweight = 'bold')
plt.ylabel('No. of bookings', fontweight = 'bold')
plt.title('Special Requests Analysis', fontweight = 'bold', size =12)
plt.show()

##### 1. Why did you pick the specific chart?

**Count plot:** A count plot (seaborn's `countplot`) visualizes the frequency of categorical data: it displays the number of occurrences of each category within a single categorical variable.

##### 2. What is/are the insight(s) found from the chart?

43,893 out of 80,395 bookings (54.6%) have no special requests.

29,017 bookings (36.1%) have exactly one special request.

2,317 bookings (2.9%) have two special requests, and so on.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impacts:**

**Efficiency and Cost Savings:** The high number of bookings with few or no special requests can streamline hotel operations, reduce complexity, and potentially lead to cost savings in providing services. This allows the hotel to allocate resources more efficiently.

**Personalized Service:** For bookings with special requests, hotel staff can focus on providing personalized services to enhance the guest experience. Meeting specific guest needs can lead to higher customer satisfaction and loyalty.

**Negative Growth or Concerns:**

**Resource Allocation:** While meeting special requests is essential for guest satisfaction, a high volume of requests can strain resources. The hotel must strike a balance between accommodating requests and maintaining operational efficiency.

**Potential Negative Impact:** If the hotel cannot efficiently handle a high volume of special requests or consistently meet guest expectations, it could lead to negative guest reviews and decreased customer satisfaction.

#### **Chart - 13 Booking Cancellation Across Hotels Analysis**

In [None]:
# Chart - 13 visualization code
def booking_cancelation(col): # map the 'is_canceled' flag to a label (dataset encoding: 1 = canceled, 0 = not canceled)
  if col == 1:
    return 'Booking canceled'
  else:
    return 'Not canceled'

booking_cancellation = hotel_df.copy()
booking_cancellation['is_canceled'] = hotel_df['is_canceled'].apply(booking_cancelation)

In [None]:
booking_cancellation['is_canceled'].value_counts() # number of bookings per cancelation status

In [None]:
# Chart - 13 visualization plot
sns.countplot(data =booking_cancellation, x='is_canceled', hue = 'hotel')
plt.xlabel('Cancelation status', fontweight='bold')
plt.ylabel('No. of bookings',fontweight='bold')
plt.title('Booking Cancelation Analysis',fontweight='bold', size = 13)
plt.show()

##### 1. Why did you pick the specific chart?

**Count plot:** A count plot (seaborn's `countplot`) visualizes the frequency of categorical data: it displays the number of occurrences of each category within a single categorical variable.

##### 2. What is/are the insight(s) found from the chart?

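**Cancellation rate:** In this dataset `is_canceled = 1` marks a canceled booking and `0` a booking that was not canceled. Read with that encoding, the counts are 24,024 canceled bookings against 63,371 non-canceled bookings, so roughly 27% of bookings are canceled. This is still a substantial cancellation rate.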

**Potential Revenue loss:** The large number of canceled bookings implies that the hotel may experience revenue loss due to cancellations. When customers cancel their bookings, it can lead to vacant rooms, which means potential revenue that could have been earned is lost. This insight underscores the importance of managing and reducing booking cancellations.
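
Because `is_canceled` is a 0/1 flag, its mean within a group is that group's cancellation rate. The sketch below is a minimal illustration on hypothetical toy data (`toy_df` and its values are assumptions), and it assumes the dataset's documented encoding of 1 = canceled.

```python
import pandas as pd

# Hypothetical toy data; assumes is_canceled == 1 means the booking was canceled
toy_df = pd.DataFrame({
    "hotel": ["City Hotel"] * 5 + ["Resort Hotel"] * 5,
    "is_canceled": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
})

# Mean of a 0/1 flag = share of ones, expressed here as a percentage
cancel_rate = toy_df.groupby("hotel")["is_canceled"].mean().mul(100)
print(cancel_rate.round(1))
```

A rate per hotel is easier to compare than raw counts, since the two hotels have different numbers of bookings.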

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Optimizing Revenue Management:** Understanding the cancellation rate can help the hotel optimize its revenue management strategies. It can adjust pricing, overbooking policies, and promotional strategies to minimize revenue loss due to cancellations.

**Enhancing Customer Experience:** By analyzing the reasons for cancellations, the hotel can identify areas for improvement in its services or policies. Addressing these issues can lead to a better customer experience and potentially reduce cancellations.

**Resource Allocation:** The hotel can adjust staffing and resource allocation based on booking patterns. For instance, during peak cancellation periods, the hotel can reduce staff or services temporarily to minimize operational costs.

**Negative Growth impact (if not taken care):**

**Reduced Customer Loyalty:** Frequent cancellations can also lead to reduced customer loyalty. Guests who repeatedly experience cancellations or uncertainties may choose other hotels in the future, affecting long-term customer relationships.

**Revenue Loss:** If the high cancellation rate is not effectively managed, it can lead to a significant loss of revenue over time. This can impact the overall business potential.

#### **Chart - 14 Booking Changes Analysis**

In [None]:
# Chart - 14 visualization code/plot
print(hotel_df['booking_changes'].value_counts()) # number of bookings for each count of changes (printed, since it is not the cell's last expression)
sns.countplot(data = hotel_df, x= 'booking_changes', hue = 'hotel') # plots a count plot
plt.xlabel('No. of changes per booking', fontweight = 'bold')
plt.ylabel('No. of bookings', fontweight = 'bold')
plt.title('Booking Changes Analysis', fontweight = 'bold' ,size=12)
plt.show()

##### 1. Why did you pick the specific chart?

**Count plot:** A count plot (seaborn's `countplot`) visualizes the frequency of categorical data: it displays the number of occurrences of each category within a single categorical variable.

##### 2. What is/are the insight(s) found from the chart?

The data shows that most bookings have no changes: 71,494 bookings have 0 changes, and most of the remaining bookings have only a small number of changes.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**

**Customer Preferences and Flexibility:** Understanding the frequency of booking changes can help the hotel tailor its booking policies and offerings to match customer preferences for flexibility. It can offer flexible booking options that allow customers to make changes without penalty, potentially attracting more bookings.

**Resource Allocation:** The hotel can adjust its staffing and room allocation strategies based on booking change patterns. For example, during peak periods of changes, the hotel can allocate more resources to handle customer requests efficiently.

**Negative business impact:**

**Operational Challenges:** If the hotel does not effectively manage bookings with frequent changes, it can lead to operational challenges. For instance, last-minute changes can strain resources (staff and amenities) and room availability.

**Customer Experience:** Flexibility is an important factor, but too many changes can disrupt booking management and lead to dissatisfaction. Achieving the right balance between flexibility and stability is crucial.

**Loss of Efficiency:** Managing a large number of booking changes can be administratively difficult and lead to inefficiencies if not handled properly.

#### **Chart - 15 Booking Cancelation Analysis by Deposit type**

In [None]:
# Chart - 15 visualization code
deposit = hotel_df.groupby('hotel')['deposit_type'].value_counts() # number of bookings per deposit type within each hotel
deposit

In [None]:
# Chart - 15 visualization plot
sns.countplot(data = hotel_df, x='deposit_type', hue = 'hotel')
plt.xlabel('Deposit Type', fontweight = 'bold')
plt.ylabel('No. of bookings', fontweight = 'bold')
plt.title('Booking Cancelation By Deposit Type', fontweight = 'bold', size =12)
plt.show()

##### 1. Why did you pick the specific chart?

**Count plot:** A count plot (seaborn's `countplot`) visualizes the frequency of categorical data: it displays the number of occurrences of each category within a single categorical variable.

##### 2. What is/are the insight(s) found from the chart?

The data shows the distribution of deposit types for guests at City Hotel and Resort Hotel.

**For City Hotel:**

- "No Deposit" is the most common deposit type (52,568 bookings).
- "Non Refund" is the second most common deposit type (844 bookings).
- "Refundable" is the least common deposit type (15 bookings)

**For Resort Hotel:**

- "No Deposit" is the most common deposit type (33,683 bookings).
- "Non Refund" is the second most common deposit type (193 bookings).
- "Refundable" is also present but less common compared to the other two types (92 bookings).
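
The counts above describe how bookings are distributed across deposit types; they do not yet relate deposit type to cancellations. One way to make that link is a row-normalized cross-tabulation. The sketch below is a minimal illustration on hypothetical toy data (`toy_df` and its values are assumptions), again assuming `is_canceled == 1` means canceled.

```python
import pandas as pd

# Hypothetical toy data using the dataset's column names (illustrative values)
toy_df = pd.DataFrame({
    "deposit_type": ["No Deposit"] * 4 + ["Non Refund"] * 2,
    "is_canceled": [0, 0, 0, 1, 1, 1],
})

# normalize="index" makes each row sum to 1, so column 1 is the
# cancellation rate within each deposit type
cancel_by_deposit = pd.crosstab(
    toy_df["deposit_type"], toy_df["is_canceled"], normalize="index"
)
print(cancel_by_deposit)
```

Adding `hotel_df["hotel"]` as a second row key to `pd.crosstab` would give the same breakdown per hotel.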

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business impacts:**

**Understanding Customer Preferences:** These insights help hotels understand their guests' deposit preferences. For example, the majority of guests at both hotels prefer "No Deposit," indicating that flexibility in payment is valued.

**Booking Flexibility:** Offering a variety of deposit options, including "No Deposit," can attract a broader range of guests and enhance booking flexibility.

**Revenue Management:** Knowing which deposit types are most popular allows hotels to optimize their revenue management strategies, such as pricing and booking policies, to maximize revenue while meeting customer preferences.

**Negative business Impact:**

**Risk management:** Although "No Deposit" is popular, it poses a risk to hotels because guests can cancel without penalty. If not managed properly, a high percentage of "No Deposit" bookings may lead to revenue loss due to cancellations.

**Cash Flow Considerations:** Depending on the deposit type, hotels may receive payments at different times. For example, "No Deposit" guests pay upon arrival, while "Refundable" guests may have already paid in advance. Managing cash flow and financial planning is important to ensure the hotel's stability.

**Customer Satisfaction:** Mismanagement of deposit types, especially when guests expect refunds, can lead to dissatisfaction and negative reviews if refunds are delayed or mishandled.


### **Multi-variate Analysis -**

#### **Chart - 16 Correlation Heatmap**

In [None]:
# Correlation Heatmap visualization code
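hotel_df.corr(numeric_only=True) # restrict to numeric columns; string columns such as 'hotel' cannot be correlated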

In [None]:
# Correlation Heatmap visualization plot
plt.figure(figsize=(20,12))
ax = sns.heatmap(hotel_df.corr(numeric_only=True), annot = True,linewidth = 0.5, fmt = '.2f', cmap = 'plasma')
plt.title('correlation of hotel analysis', fontweight = 'bold', fontsize = 30)
plt.xticks(fontsize = 20)
plt.yticks(fontsize = 20)
plt.show()


**-1 indicating a strong negative correlation**

**0 indicating no correlation**

**1 indicating a strong positive correlation between two variables**

##### 1. Why did you pick the specific chart?

Heatmap: A correlation heatmap is a graphical representation of a correlation matrix, where the correlation coefficients between multiple variables are displayed in a matrix format using colors to represent the strength and direction of the relationships between the variables. Correlation coefficients quantify the degree to which two variables are linearly related to each other.
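
When the question is how every variable relates to one target such as `is_canceled`, a sorted column of the correlation matrix can be easier to read than the full heatmap. The sketch below is a minimal illustration on hypothetical toy data (`toy_df` and its values are assumptions); `numeric_only=True` assumes pandas 1.5 or newer.

```python
import pandas as pd

# Hypothetical toy data using the dataset's column names (illustrative values)
toy_df = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel"] * 3,
    "is_canceled": [0, 0, 1, 1, 0, 1],
    "lead_time": [5, 10, 120, 200, 30, 90],
    "total_of_special_requests": [2, 1, 0, 0, 1, 0],
})

# Pearson correlation of every numeric column with is_canceled,
# dropping the trivial self-correlation and sorting ascending
corr_with_cancel = (
    toy_df.corr(numeric_only=True)["is_canceled"]
    .drop("is_canceled")
    .sort_values()
)
print(corr_with_cancel)
```

Negative values sit at the top of the output and positive values at the bottom, which makes the direction of each relationship explicit.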

##### 2. What is/are the insight(s) found from the chart?

**1. correlation between "is_canceled" & "lead_time":** "is_canceled" has a weak positive correlation with "lead_time" (0.18). This suggests that as the lead time increases, the likelihood of a booking being canceled tends to increase as well.

**2. correlation between "is_canceled" & "total_of_special_requests":** "is_canceled" has a negative correlation with "total_of_special_requests" (-0.12). Fewer special requests may indicate a higher chance of cancellation.

**3. correlation between "is_canceled" & "required_car_parking_spaces":** "is_canceled" has a negative correlation with "required_car_parking_spaces" (-0.18), suggesting that bookings with parking space requirements tend to have a lower cancellation rate.

**4. Correlation between "total_stay" and "lead time":** "total_stay" has a positive correlation with "lead_time" (0.32). This suggests that longer stays are associated with longer lead times.

**5. "stay in weekend nights" has a correlation of 0.24 with "lead time":** This suggests a weak positive linear relationship between the number of weekend nights stayed and the lead time. As the lead time increases, the number of weekend nights stayed tends to increase slightly.

**6. "stay in week nights" has a correlation of 0.31 with "lead time":** This indicates a weak positive linear relationship between the number of week nights stayed and the lead time. As the lead time increases, the number of week nights stayed tends to increase slightly.

**7. "stay in week nights" has a correlation of 0.56 with "stay in weekend nights":** This suggests a moderate positive linear relationship between the number of week nights stayed and the number of weekend nights stayed. When guests stay more week nights, they also tend to stay more weekend nights, which is a logical relationship.

**8. "previous booking not canceled" has a correlation of 0.44 with "is a repeated guest":** This indicates a moderate positive linear relationship between the two variables. Guests who have not canceled previous bookings are more likely to be repeated guests.

**9. "previous booking not canceled" has a correlation of 0.39 with "previous cancellation":** There is a moderate positive linear relationship between "previous booking not canceled" and "previous cancellation." This suggests that guests with more previous non-canceled bookings also tend to have more previous cancellations, plausibly because both counts grow with the number of past bookings.

**10. "adr" has a correlation of 0.26 with "adults":**
There is a weak positive linear relationship between the average daily rate (adr) and the number of adults. As the number of adults increases, the average daily rate tends to increase slightly.

**10.1 "adr" has a correlation of 0.35 with "children":** There is a moderate positive linear relationship between the average daily rate (adr) and the number of children. As the number of children increases, the average daily rate tends to increase.

**11. "total guests" has a correlation of 0.41 with "adr":** There is a moderate positive linear relationship between the total number of guests and the average daily rate (adr). As the number of guests increases, the average daily rate tends to increase.



#### **Chart - 17 Pair Plots for various columns**

**Pair plot-1 for Booking related columns:**


- 'lead_time': This column represents the number of days between booking and arrival. It can be interesting to see how lead time relates to other booking-related variables.
- 'stays_in_weekend_nights': This column represents the number of weekend nights stayed.
- 'stays_in_week_nights': This column represents the number of week nights stayed.
- 'booking_changes': This column represents the number of changes made to the booking.

In [None]:
# Pair Plot-1 visualization

Booking_related_columns = hotel_df[['hotel','lead_time', 'stays_in_weekend_nights', 'stays_in_week_nights' , 'booking_changes']].copy()

sns.pairplot(data = Booking_related_columns, hue = 'hotel')

**Pair plot-2 Guest-related columns**

- 'adults': This column represents the number of adults in the booking.
- 'children': This column represents the number of children in the booking.
- 'babies': This column represents the number of babies in the booking.
- 'total_of_special_requests': This column represents the total number of special requests made by guests during their stay.


In [None]:
# Pair Plot-2 visualization

Guest_related_columns = hotel_df[['hotel','adults', 'children', 'babies', 'total_of_special_requests']].copy()
Guest_related_columns
sns.pairplot(Guest_related_columns, hue = 'hotel')

**pair plot-3 Booking cancelation history**

- 'previous_cancellations': The number of previous cancellations by the guest.
- 'previous_bookings_not_canceled': The number of previous bookings that were not canceled by the guest.
- 'is_repeated_guest': Indicates whether the guest is a repeated guest (binary, 0 or 1).
- 'adr': The average daily rate of the booking, included to compare pricing against cancellation history.



In [None]:
# Pair Plot-3 visualization

Booking_cancelation_history = hotel_df[['hotel', 'previous_cancellations', 'previous_bookings_not_canceled', 'is_repeated_guest', 'adr']]

sns.pairplot(Booking_cancelation_history, hue = 'hotel')

##### 1. Why did you pick the specific chart?

A pair plot shows the pairwise relationships between the numerical variables in a dataset, which helps in exploring relationships, correlations, and outliers.






##### 2. What is/are the insight(s) found from the chart?

**Pair plot-1 Booking related Columns:**
- A positive correlation between 'stays_in_weekend_nights' and 'stays_in_week_nights,' indicating that guests who stay more on weekends also tend to stay more on weekdays.
- 'lead_time' has a near-zero correlation with 'booking_changes', suggesting that how far in advance guests book has little linear relationship with how many changes they make to the booking.

**Pair plot-2 Guest related Columns:**
- Observing the plot, it becomes evident that guests with a higher number of adults tend to make more special requests and additionally, in resort hotels there is an increased frequency of special requests.

**Pair plot-3 Booking cancelation history:**
- In the context of resort hotels, it's notable that repeated guests tend to have a higher rate of previous booking cancellations. Furthermore, within this category of hotels, there is a discernible pattern where the average daily rate (ADR) tends to be higher when previous cancellations have occurred.







## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

**Marketing and Promotion:** Develop targeted marketing campaigns to attract a diverse range of customers, such as families, couples, and corporate groups. Leverage online and offline marketing channels, including social media, email marketing, and partnerships with travel agencies.

**Analysing guests preferences:** Utilize data analytics to gain insights into guest preferences and behavior. Analyze booking patterns, customer reviews, and historical data to make informed decisions about pricing policies, promotions, and service improvements.

**Pricing Strategy:** Implement dynamic pricing strategies to optimize room rates based on demand and seasonality. Offer packages and discounts during off-peak periods to attract more visitors. Stay informed about industry trends and the competitive landscape. Adapt to changing consumer preferences.

**Diversification of Revenue Streams:** Explore opportunities to generate additional revenue, such as hosting events, offering spa services, or partnering with local businesses for tours and activities.

**Reduce Operational Costs:** Implement energy-saving practices and eco-friendly initiatives, streamline housekeeping schedules for efficiency, and evaluate and optimize the supply chain for food and beverage services. Invest in energy-efficient technologies like lighting and HVAC systems. Use automated check-in and check-out procedures to reduce manpower.

**Online Reputation Management:** Monitor and respond to guest reviews and feedback on platforms like TripAdvisor and Yelp. Address concerns promptly and use positive feedback as testimonials in marketing materials.

**Staff Training:** Invest in staff training and development to ensure high-quality service. Happy and well-trained employees contribute to positive guest experiences.

**Customer Experience Enhancement:** Continuously improve the guest experience to encourage return visits and positive reviews. This can include personalized services, exceptional dining experiences, recreational activities, and maintaining high cleanliness standards.

**Customer Relationship Management (CRM):** Implement a CRM system to manage guest information and preferences. Use this data to provide personalized experiences, including room preferences, special occasions, and loyalty programs.

# **Conclusion**

**1. City Hotel vs. Resort Hotel Bookings:** City hotels have a higher proportion of bookings (61%) compared to resort hotels (39%), indicating that city hotels are more popular among guests. This insight can guide marketing strategies and resource allocation.

**2. Seasonal and Yearly Hotel Booking Patterns:** Bookings were highest in the year 2016, but the reasons for this peak are unclear.
There are seasonal patterns with higher bookings in July and August and lower bookings in November, December, and January. These insights can help with staff scheduling and inventory management.

**3. Average Daily Rate (ADR) Comparison:** City hotels have a higher overall mean ADR (110.89) compared to resort hotels (99.03), suggesting that city hotels can potentially command higher room rates. Pricing strategies can be adjusted accordingly.

**4. Booking Channel Preference:** The majority of bookings (79.11%) come through Travel Agents/Tour Operators (TA/TO), highlighting their significance. Encouraging more direct bookings (14.86%) can be cost-effective. Maintaining strong relationships with TA/TO partners is crucial.

**5. Room Type Preferences:** Guests tend to prefer room types 'A' and 'D' followed by 'E,' 'F,' and 'G' in both reserved and assigned room types. This information can inform room allocation strategies and renovations.

**6. Meal Preference:** "Bed and Breakfast" (BB) is the most popular meal plan (79.62%), followed by "Self-Catering" (SC) and "Half Board" (HB). Focusing on promoting these preferred meal plans can increase revenue.

**7. Customer Type Analysis:** "Transient" customers constitute the majority (82.36%), followed by "Transient-Party," "Contract," and "Group." Tailoring services to the preferences of the predominant customer type can enhance guest satisfaction.

**8. Repeated Guests:** The analysis reveals a significant dominance of "New booking" (96.09%) over "Repeated booking" (3.91%). Implementing loyalty programs and incentives for repeated bookings can be explored to increase guest retention.

**9. Special Requests During Booking:** A significant portion of bookings (54.6%) do not have any special requests, while 36.1% have only one special request, and a smaller percentage have multiple special requests. Hotel management can optimize resources and services based on the prevalence of special requests.

**10. Booking Cancellation Analysis:**
- The count of canceled bookings (63,371) compared with non-canceled bookings (24,024) points to a high cancellation rate.
- Potential revenue loss due to canceled bookings emphasizes the need to manage and reduce cancellations through strategies like flexible cancellation policies.

**11. Booking Changes Analysis:** The data shows that a substantial number of bookings (71,494) have no changes, indicating stability in initial reservations. Monitoring patterns of booking changes can help improve booking management processes.

**12. Booking Confirmation Time:** Guests at the "City Hotel" spend longer on the waiting list on average (approximately 1.02 days) than guests at the "Resort Hotel" (approximately 0.32 days). This insight highlights the importance of efficient reservation handling and waitlist management, particularly for city hotels.

**13. Booking Cancellation Analysis by Deposit Type:** Deposit types for guests at both City and Resort hotels are detailed:

- "No Deposit" is the most common deposit type in both hotel types, highlighting the importance of flexible booking options.
- "Non Refund" is also prevalent, indicating a subset of non-refundable bookings.
- "Refundable" deposits are less common but still present, giving guests flexibility in their booking choices.

This information can inform deposit policies and marketing strategies.
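A breakdown of deposit types by hotel, together with how often each group cancels, can be sketched with a grouped aggregation. The column names `hotel`, `deposit_type`, and `is_canceled` (with 1 meaning canceled) are assumptions about the dataset, the helper `cancellations_by_deposit` is hypothetical, and the toy rows are made up for illustration.

```python
import pandas as pd


def cancellations_by_deposit(df: pd.DataFrame) -> pd.DataFrame:
    """Per hotel and deposit type: number of bookings and cancellation rate (%)."""
    summary = df.groupby(["hotel", "deposit_type"])["is_canceled"].agg(
        bookings="size",          # number of bookings in the group
        cancel_rate_pct="mean",   # mean of a 0/1 flag is the canceled share
    )
    summary["cancel_rate_pct"] = (summary["cancel_rate_pct"] * 100).round(2)
    return summary


# Toy data, made up for illustration (not the project's dataset).
toy = pd.DataFrame(
    {
        "hotel": ["City Hotel", "City Hotel", "City Hotel",
                  "Resort Hotel", "Resort Hotel"],
        "deposit_type": ["No Deposit", "No Deposit", "Non Refund",
                         "No Deposit", "Refundable"],
        "is_canceled": [0, 1, 1, 0, 0],
    }
)

deposit_summary = cancellations_by_deposit(toy)
print(deposit_summary)
```

Reporting the booking count next to the rate helps avoid over-reading percentages from small groups, which is likely to matter for the less common "Refundable" category.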


These conclusions provide valuable insights for decision-making in areas such as marketing, pricing, service offerings, and customer relationship management within the hotel business.
