# **Project Name**    -



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -**  Dharmesh J Gadhiya



# **Project Summary -**

The hotel booking dataset spans three years (**2015, 2016, and 2017**) and covers two types of hotel: **City Hotel** and **Resort Hotel**. It provides year-, month-, and day-wise booking information about guest stays, including weekend nights (Saturday-Sunday) and week nights (Monday-Friday), along with the number of guests (adults, children, babies) in each booking.

The analysis also examines the market segments to determine the sources of bookings, such as direct bookings from customers, online travel agents, offline travel agents, and tour operators. It identifies repeated and new guests, examines which types of customers make the most bookings, and assesses the demand for parking spaces. Additionally, it tracks the total number of special requests made by guests.

The data was cleaned, manipulated, and analyzed, followed by visualizations to provide insights into these various aspects of hotel bookings.


# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**



The hotel booking dataset contains detailed information about bookings. The goal is to understand the patterns in hotel bookings, customer behavior, and cancellations:

*   Understand the cancellation rates by hotel type, lead time, arrival year/month, and other relevant features.
*   Check whether guests tend to stay more on weekends or weekdays, and whether this affects cancellations.
*   Segment customers based on demographics (adults, children, country) and booking behavior (repeated guests, previous cancellations, market segment) to identify patterns.
*   Analyze whether the room price (ADR, Average Daily Rate) is related to cancellations or to whether the guest is a first-time or repeat customer.

The objective is to help hotel managers make better decisions about pricing, marketing targeting, trends, and guest experience. It also helps in reducing cancellations and increasing the number of successful bookings.





#### **Define Your Business Objective?**

The business objective of the analysis is to provide insights that help hotels make smarter choices to increase bookings, reduce cancellations, and enhance customer satisfaction.
The insights gained can drive targeted marketing and reduce costs, contributing to long-term business growth.


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





In [None]:
from google.colab import drive
drive.mount('/content/drive/')

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Load Dataset
hotel_booking_df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Module2Numerical_Programming_in_Python/Capstone project/Hotel Bookings.csv')

### Dataset First View

In [None]:
# Dataset First Look
hotel_booking_df.head() # Data view Top

In [None]:
hotel_booking_df.tail(5)# Data View Bottom

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
# Rows count
print(f"Total number of rows are: {hotel_booking_df.shape[0]}")
# Columns count
print(f"Total number of columns are: {hotel_booking_df.shape[1]}")

### Dataset Information

In [None]:
# Dataset Info
hotel_booking_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_val_count = hotel_booking_df.duplicated().sum()
print(f"{duplicate_val_count}")

In [None]:
plt.figure(figsize=(10,5))
hotel_booking_df.duplicated().value_counts().plot(kind='bar')  # Count duplicated vs. unique rows
plt.title('Distribution of Duplicated Rows')  # Add a descriptive title
plt.xlabel('Is Duplicated')  # Add x-axis label
plt.ylabel('Count')  # Add y-axis label
plt.show()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
# Count missing values per column, keep only columns with at least one, largest first
missing_data = hotel_booking_df.isnull().sum().loc[lambda x: x > 0].sort_values(ascending=False)
missing_data

In [None]:
# Visualizing the missing values
# Set style
sns.set_style("darkgrid")
plt.figure(figsize=(12, 5))  # figsize is (width, height)
# Create a line plot of the missing-value count per column
sns.lineplot(x=missing_data.values, y=missing_data.index)
# Add titles and labels
plt.title('Top 4 Columns with Missing Values', fontsize=10, fontweight='bold')
plt.xlabel('Number of Missing Values', fontsize=8)
plt.ylabel('Column Names', fontsize=8)
# Add a data label showing the missing-value count for each column
for index, value in enumerate(missing_data.values):
    plt.text(value, index, f'{value}', color='black', va='center', fontsize=9)

# Show the plot
plt.tight_layout()
plt.show()


### What did you know about your dataset?

The dataset has three years of booking data (2015-2017) for Resort and City Hotels.
It has 119,390 rows and 32 columns.
The columns use three data types: float64 (4), int64 (16), and object (12).
There are 31,994 duplicate rows, and four columns have missing values: company (112,593), agent (16,340), country (488), and children (4).
It includes details such as booking status, stay duration, guest counts (adults, children, babies), and room rates (ADR).
It also shows customer preferences, booking methods, and cancellations, which helps analyze trends and improve hotel operations.
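
The figures quoted above can be collected in one step rather than read off several cells. The sketch below is a minimal illustration, not part of the original notebook: `summarize_frame` is a hypothetical helper, and the small `toy` frame only stands in for `hotel_booking_df`.

```python
import pandas as pd


def summarize_frame(df: pd.DataFrame) -> dict:
    """Collect the headline figures of a dataset overview (hypothetical helper)."""
    missing = df.isnull().sum()
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        # Number of columns per data type, keyed by the dtype name
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
        # Rows that repeat an earlier row exactly
        "duplicate_rows": int(df.duplicated().sum()),
        # Only columns with at least one missing value, largest first
        "missing_by_column": missing[missing > 0].sort_values(ascending=False).to_dict(),
    }


# Toy data standing in for hotel_booking_df (assumption for illustration only)
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "children": [0.0, 0.0, None, 1.0],
    "country": ["PRT", "PRT", None, "GBR"],
})
summary = summarize_frame(toy)
print(summary)
```

Calling `summarize_frame(hotel_booking_df)` on the real data should reproduce the row, column, duplicate, and missing-value counts listed above, which makes the written summary easy to re-check after any reload.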




## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
hotel_booking_df.columns

In [None]:
# Dataset Describe
hotel_booking_df.describe()

### Variables Description

hotel :-
* H1 = Resort Hotel
* H2 = City Hotel

is_canceled :- Whether the booking was cancelled (1) or not (0)

lead_time :- Number of days that elapsed between the entering date of the booking into the PMS and the arrival date

arrival_date_year :-  Year of arrival date

arrival_date_month :- Month of arrival date

arrival_date_week_number :- Week number for arrival date

arrival_date_day_of_month :- Day of arrival date

stays_in_weekend_nights :- Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel

stays_in_week_nights :- Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel

adults :- Number of adults

children :- Number of children

babies :- Number of babies

meal :- Kind of meal opted for
*   BB :- Bed & Breakfast
*   FB :- Full Board (Breakfast, Lunch and Dinner)
*   HB :- Half Board (Breakfast and Dinner normally)
*   SC/Undefined :- no meal opted

country :-  Country name

market_segment :- Which segment the customer belongs to

distribution_channel :- How the customer accessed the stay: Corporate booking/Direct/TA/TO

is_repeated_guest :- Whether the guest is a first-time guest (0) or a repeated guest (1)

previous_cancellations :- Number of previous bookings cancelled by the customer

previous_bookings_not_canceled :- Count of previous bookings not cancelled by the customer

reserved_room_type :- Type of room reserved

assigned_room_type :- Type of room assigned

booking_changes :- Count of changes made to booking

deposit_type :- Deposit type

agent :- Agent through which the booking was made (stored as an ID)

company :- Company through which the booking was made (stored as an ID)

days_in_waiting_list :- Number of days in waiting list

customer_type :- Type of customer

adr :- Average Daily Rate (sum of all lodging transactions divided by the total number of staying nights)

required_car_parking_spaces :- Number of car parking spaces required by the guest

total_of_special_requests :- Number of special requests made by the guest

reservation_status :- Last status of the reservation (Canceled, Check-Out or No-Show)

reservation_status_date :- Date on which the last status was set


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in hotel_booking_df.columns:
  print(f'Unique values count of {col} :',hotel_booking_df[col].nunique())

In [None]:
hotel_booking_df['hotel'].unique()

In [None]:
hotel_booking_df['is_canceled'].unique()

In [None]:
hotel_booking_df['arrival_date_year'].unique()

In [None]:
hotel_booking_df['arrival_date_month'].unique()

In [None]:
hotel_booking_df['arrival_date_week_number'].unique()

In [None]:
hotel_booking_df['arrival_date_day_of_month'].unique()

In [None]:
hotel_booking_df['stays_in_weekend_nights'].unique()# (Saturday or Sunday)

In [None]:
hotel_booking_df['stays_in_week_nights'].unique()#(Monday to Friday)

In [None]:
hotel_booking_df['adults'].unique()

In [None]:
hotel_booking_df['children'].unique()

In [None]:
hotel_booking_df['babies'].unique()

In [None]:
hotel_booking_df['meal'].unique()

In [None]:
hotel_booking_df['country'].unique()

In [None]:
hotel_booking_df['market_segment'].unique()

In [None]:
hotel_booking_df['distribution_channel'].unique()

In [None]:
hotel_booking_df['is_repeated_guest'].unique()

In [None]:
hotel_booking_df['previous_cancellations'].unique()

In [None]:
hotel_booking_df['previous_bookings_not_canceled'].unique()

In [None]:
hotel_booking_df['reserved_room_type'].unique()

In [None]:
hotel_booking_df['assigned_room_type'].unique()

In [None]:
hotel_booking_df['booking_changes'].unique()

In [None]:
hotel_booking_df['deposit_type'].unique()

In [None]:
hotel_booking_df['agent'].unique()

In [None]:
hotel_booking_df['company'].unique()

In [None]:
hotel_booking_df['days_in_waiting_list'].unique()

In [None]:
hotel_booking_df['customer_type'].unique()

In [None]:
hotel_booking_df['adr'].unique()

In [None]:
hotel_booking_df['required_car_parking_spaces'].unique()

In [None]:
hotel_booking_df['total_of_special_requests'].unique()

In [None]:
hotel_booking_df['reservation_status'].unique()

In [None]:
hotel_booking_df['reservation_status_date'].unique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Copy the main dataset into a working DataFrame
hotel_df = hotel_booking_df.copy()

In [None]:
# Write your code to make your dataset analysis ready.
# Replace null values in the company, agent and children columns with 0 and in the country column with 'others'
hotel_df[['company','agent','children']] = hotel_df[['company','agent','children']].fillna(0)
hotel_df[['country']] = hotel_df[['country']].fillna('others')

In [None]:
hotel_df.isnull().sum().sort_values(ascending=False)

In [None]:
# Change the data type of company, agent and children from float to int
hotel_df[['company','agent','children']] = hotel_df[['company','agent','children']].astype('int64')

In [None]:
# Convert reservation_status_date to datetime format
hotel_df['reservation_status_date']= pd.to_datetime(hotel_df['reservation_status_date'],format='%Y-%m-%d')

In [None]:
# Check the number of duplicate rows in the data
hotel_df.duplicated().sum()

In [None]:
hotel_df.drop_duplicates(inplace=True)

In [None]:
# Add new columns to the dataset
# total_stays_hotel: sum of stays_in_week_nights (Monday to Friday) and stays_in_weekend_nights (Saturday and Sunday)
hotel_df['total_stays_hotel'] = hotel_df['stays_in_week_nights'] + hotel_df['stays_in_weekend_nights']

In [None]:
# total_guests: sum of adults, children and babies
hotel_df['total_guests'] = hotel_df['adults'] + hotel_df['children'] + hotel_df['babies']

In [None]:
hotel_df.shape

In [None]:
hotel_df.info()

### What all manipulations have you done and insights you found?

The dataset has missing values in the children, country, agent, and company columns.
Missing values in **agent, company, and children were replaced with 0**, and missing values in country were replaced with **others**.
The data type of reservation_status_date was changed from object to **datetime**, and agent, company, and children were converted to **integer**.
Two new columns were created: total_stays_hotel and total_guests.
The dataset also contained duplicate rows, which were dropped.
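
The same steps can be bundled into one function so that the cleaning is repeatable and testable on a small sample. This is a sketch under stated assumptions: `clean_bookings` is a hypothetical helper name, and the `toy` frame contains only the columns that the steps above touch.

```python
import pandas as pd


def clean_bookings(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps described above and return a new frame (hypothetical helper)."""
    out = df.copy()
    # Missing company / agent / children values become 0, stored as integers
    for col in ["company", "agent", "children"]:
        out[col] = out[col].fillna(0).astype("int64")
    # Missing countries get the placeholder label 'others'
    out["country"] = out["country"].fillna("others")
    # Parse the status date from text into datetime
    out["reservation_status_date"] = pd.to_datetime(
        out["reservation_status_date"], format="%Y-%m-%d"
    )
    # Drop exact duplicate rows before deriving new columns
    out = out.drop_duplicates()
    out["total_stays_hotel"] = out["stays_in_week_nights"] + out["stays_in_weekend_nights"]
    out["total_guests"] = out["adults"] + out["children"] + out["babies"]
    return out


# Toy rows standing in for hotel_booking_df (assumption for illustration only)
toy = pd.DataFrame({
    "company": [None, None, 40.0],
    "agent": [9.0, 9.0, None],
    "children": [None, None, 2.0],
    "country": [None, None, "PRT"],
    "reservation_status_date": ["2015-07-01", "2015-07-01", "2016-03-15"],
    "stays_in_week_nights": [2, 2, 3],
    "stays_in_weekend_nights": [1, 1, 0],
    "adults": [2, 2, 1],
    "babies": [0, 0, 1],
})
cleaned = clean_bookings(toy)
print(cleaned)
```

Working on a copy keeps the raw frame intact, so the function can be re-run from a clean state at any point in the notebook.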


## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# Which hotel type has more bookings?
# Count bookings per hotel type and plot each type's share as a pie chart
hotel_counts = hotel_df['hotel'].value_counts()
plt.figure(figsize=(8, 5))
plt.pie(hotel_counts.values, labels=hotel_counts.index, explode=(0, 0.04), shadow=True, autopct='%1.1f%%', startangle=90)
plt.title('Distribution of Hotel Types', fontsize=16)
plt.show()
print(hotel_counts)

##### 1. Why did you pick the specific chart?

I picked a pie chart to visualize **which hotel type has more bookings.**
A pie chart shows each category's share of the total as a percentage.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that **City Hotel** is the most preferred, with **61.1% (53,428)** of the bookings, while **Resort Hotel** has **38.9% (33,968)**. City Hotel attracts more guests than Resort Hotel.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights show that City Hotels attract more bookings, possibly helped by the additional services they provide, which supports their revenue. Resort Hotels have fewer bookings, indicating that guests are less interested in them than in City Hotels.
To improve, Resort Hotels need to upgrade their services, market more effectively, and learn from the success of City Hotels to avoid further decline and attract more guests.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# What is the year-wise cancellation count by hotel type?  (bivariate analysis)
# Keep only cancelled bookings (is_canceled == 1) for both hotel types
hotel_canceled_df = hotel_df[hotel_df['is_canceled'] == 1]
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.set_style('whitegrid')
sns.countplot(x=hotel_canceled_df['arrival_date_year'],hue=hotel_canceled_df['hotel'])
plt.legend()
plt.title('Year-wise Cancellations by Hotel Type',fontsize=14)
plt.xlabel('Year',fontsize=14)
plt.ylabel('No. of cancelled bookings',fontsize=14)
plt.subplot(1, 2, 2)
# Pie chart of each hotel type's share of all cancellations over the 3 years
# Labels come from the counts themselves so they always match the slices
cancel_counts = hotel_canceled_df['hotel'].value_counts()
plt.pie(x=cancel_counts,labels=cancel_counts.index,startangle=90,autopct="%0.1f%%",shadow=True,textprops={'fontsize':12},colors=sns.color_palette('pastel'))
plt.legend(bbox_to_anchor=(1,1))
plt.title('Hotel Booking Cancellations')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?


I picked a count plot to visualize **the year-wise cancellations by hotel type**.
It makes it easy to compare cancellations for City Hotels and Resort Hotels year by year, and it highlights differences and patterns clearly.


##### 2. What is/are the insight(s) found from the chart?

The chart shows that City Hotels have a higher number of cancellations than Resort Hotels.
Over the three years, **City Hotels account for 66.8% of all cancellations and Resort Hotels for 33.2%**. The year 2016 has the highest number of cancelled bookings for City Hotel compared to Resort Hotel.
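
A share of all cancellations depends on how many bookings each hotel type has, so it helps to put the cancellation rate next to it. The sketch below is an illustration on toy data, not a result from the notebook: `cancellation_summary` is a hypothetical helper, and the toy numbers are chosen so that both hotels have the same rate even though one has a much larger share.

```python
import pandas as pd


def cancellation_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Compare cancellation counts, rates, and shares per hotel (hypothetical helper)."""
    grouped = df.groupby("hotel")["is_canceled"]
    summary = pd.DataFrame({
        "bookings": grouped.size(),       # all bookings per hotel
        "cancellations": grouped.sum(),   # is_canceled is 0/1, so the sum is a count
        "cancel_rate": grouped.mean(),    # cancellations divided by bookings
    })
    # Share of the overall cancellation total, which is what a pie chart displays
    summary["share_of_all_cancellations"] = (
        summary["cancellations"] / summary["cancellations"].sum()
    )
    return summary


# Toy data (assumption for illustration only): 6 city bookings, 2 resort bookings
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 6 + ["Resort Hotel"] * 2,
    "is_canceled": [1, 1, 1, 0, 0, 0, 1, 0],
})
summary = cancellation_summary(toy)
print(summary)
```

In the toy data the city hotel holds 75% of all cancellations while both hotels cancel half of their bookings. Running the helper on `hotel_df` would show whether the 66.8% share reflects a higher rate or simply more bookings.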



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insight points to negative growth: City Hotels have more cancellations than Resort Hotels.
City Hotels can address the high number of cancellations by offering better prices, flexible booking options, or an improved guest experience. These changes will reduce cancellations and help increase revenue.
Promoting their reliability can attract more bookings and grow their business.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Which countries are in the top 10 by bookings, split by hotel type?

# Group by hotel type and country and count the number of bookings
hotel_country_counts = (
    hotel_df.groupby(['hotel', 'country']).size()
    .reset_index(name='TotalBookings')  # new column holding the booking count
    .sort_values(by='TotalBookings', ascending=False))

# Get the 10 countries with the most bookings across both hotel types
top_countries = (hotel_country_counts.groupby('country')['TotalBookings'].sum()
    .nlargest(10)  # keep the 10 largest totals
    .index)

country_filtered_df = hotel_country_counts[hotel_country_counts['country'].isin(top_countries)]

In [None]:
city_df = hotel_df[(hotel_df['hotel'] == 'City Hotel')]
hotel_country_group= city_df.groupby('country').size().reset_index(name='CityTotalBookings')
hotel_country_group.sort_values(by='CityTotalBookings',ascending=False).head(10)

In [None]:
resort_df = hotel_df[(hotel_df['hotel'] == 'Resort Hotel')]
hotel_country_group_r= resort_df.groupby('country').size().reset_index(name='ResortTotalBookings')
hotel_country_group_r.sort_values(by='ResortTotalBookings',ascending=False).head(10)

In [None]:
# Visualization
plt.figure(figsize=(12, 6))
sns.set_style('whitegrid')

sns.barplot(
    data=country_filtered_df,
    x='country',
    y='TotalBookings',
    hue='hotel',
    palette='viridis'
)
# Chart Details
plt.title('Top 10 Countries by Hotel Type Bookings', fontsize=14)
plt.xlabel('Country', fontsize=10)
plt.ylabel('Number of Bookings', fontsize=10)
plt.xticks(rotation=45)
plt.legend(title='Hotel Type')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

I picked a bar plot to visualize **which countries are in the top 10 by bookings for each hotel type**. The chart shows the number of bookings for each country side by side, making it easy to compare values.

##### 2. What is/are the insight(s) found from the chart?

The chart highlights which hotel type is more popular in the top countries. City Hotels are booked the most in countries like Portugal and the United Kingdom.
Resort Hotels are more popular in Portugal and Spain.
Germany and France have a balanced booking pattern across both types of hotels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insight has a positive business impact: Portugal and the United Kingdom are strong markets for both Resort Hotels and City Hotels, while the United Kingdom, France, and Germany are key markets for urban accommodation and are ideal for business-focused promotions.
City Hotels and Resort Hotels should focus on countries where they are most popular and improve in countries where they are not performing well. By using better marketing, competitive pricing, and enhancing guest experiences based on local preferences, they can grow their business and avoid losing potential customers.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Which year was the most profitable for each hotel?

# Add a column for total revenue (ADR * total_guests * total_stays_hotel)
hotel_df['total_revenue'] = hotel_df['adr'] * hotel_df['total_guests'] * hotel_df['total_stays_hotel']
# Group by hotel type and year, summing the total revenue
revenue_by_year = hotel_df.groupby(['hotel', 'arrival_date_year'])['total_revenue'].sum().reset_index()
# Year with the highest total revenue for each hotel
most_profitable_year = revenue_by_year.loc[revenue_by_year.groupby('hotel')['total_revenue'].idxmax()]
# Split the yearly revenue by hotel type
city_hotel_revenue = revenue_by_year[revenue_by_year['hotel'] == 'City Hotel']
resort_hotel_revenue = revenue_by_year[revenue_by_year['hotel'] == 'Resort Hotel']

# Plot pie charts
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].pie(city_hotel_revenue['total_revenue'], labels=city_hotel_revenue['arrival_date_year'],
            autopct='%1.1f%%', startangle=140, colors=sns.color_palette('pastel'))
axes[0].set_title('City Hotel Revenue Distribution by Year',fontsize=10)

axes[1].pie(resort_hotel_revenue['total_revenue'], labels=resort_hotel_revenue['arrival_date_year'],
            autopct='%1.1f%%', startangle=140, colors=sns.color_palette('pastel'))
axes[1].set_title('Resort Hotel Revenue Distribution by Year',fontsize=10)

plt.tight_layout()
plt.show()
print(most_profitable_year)

##### 1. Why did you pick the specific chart?

I picked pie charts to show **which year was the most profitable for each hotel**. A pie chart shows each category's share of the total as a percentage.

##### 2. What is/are the insight(s) found from the chart?

The pie charts clearly show how much each year contributed to the total revenue of each hotel type.
For City Hotels, **2016** was the most profitable year, contributing **47.8%** of the revenue, while **2017** contributed less, at **43.9%**. For **Resort Hotels**, the most profitable year was **2017 (41.2%)** of total revenue.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Resort Hotel revenue increased in 2017 compared with 2015 and 2016, which is a positive business impact. City Hotel revenue was lowest in 2015 and increased in 2016, but 2017 revenue was lower than in 2016. This indicates slight negative growth for City Hotels compared to Resort Hotels. To attract more guests, targeted campaigns and seasonal discounts can be modeled on the years with higher revenue. If a month has more bookings, hire more staff and create special offers to handle the demand and improve results.
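
Because ADR is a rate per night, one alternative way to estimate revenue is to multiply it by the number of nights only and to count only bookings that were not cancelled. The sketch below shows that variant on toy data. Both choices are assumptions that differ from the formula used in the chart code, so the resulting percentages would not match the ones quoted above; `revenue_by_hotel_year` is a hypothetical helper.

```python
import pandas as pd


def revenue_by_hotel_year(df: pd.DataFrame) -> pd.DataFrame:
    """Estimate room revenue per hotel and year (hypothetical helper).

    Assumptions: revenue is adr * nights, and cancelled bookings earn nothing.
    """
    stayed = df[df["is_canceled"] == 0].copy()
    nights = stayed["stays_in_week_nights"] + stayed["stays_in_weekend_nights"]
    stayed["room_revenue"] = stayed["adr"] * nights
    totals = (
        stayed.groupby(["hotel", "arrival_date_year"])["room_revenue"]
        .sum()
        .reset_index()
    )
    # Each year's share of its own hotel's total, as shown in a per-hotel pie chart
    totals["share_within_hotel"] = (
        totals["room_revenue"] / totals.groupby("hotel")["room_revenue"].transform("sum")
    )
    return totals


# Toy bookings (assumption for illustration only)
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "City Hotel", "Resort Hotel"],
    "arrival_date_year": [2016, 2016, 2017, 2017],
    "adr": [100.0, 80.0, 50.0, 200.0],
    "stays_in_week_nights": [2, 1, 2, 1],
    "stays_in_weekend_nights": [1, 0, 0, 1],
    "is_canceled": [0, 1, 0, 0],
})
totals = revenue_by_hotel_year(toy)
print(totals)
```

Comparing this table with the one from the chart code shows how sensitive the "most profitable year" conclusion is to the revenue definition.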

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Which market segments and distribution channels bring the most bookings?
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.countplot(data=hotel_df,x='market_segment',palette='viridis')
plt.title('Types of market segment',fontweight="bold", size=16)
plt.xlabel('Market Segment', fontsize=14)
plt.ylabel('No. of Bookings', fontsize=14)
plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for better readability
plt.subplot(1, 2, 2)
# Create a horizontal bar chart: channels go on the y-axis so the axis labels below match
sns.countplot(data=hotel_df, y='distribution_channel', palette='viridis')
# Add labels and title
plt.title('Distribution Channels by Booking Count', fontweight="bold", size=16)
plt.xlabel('No. of Bookings', fontsize=14)
plt.ylabel('Distribution Channel', fontsize=14)
plt.tight_layout()
plt.show()

In [None]:
hotel_df['distribution_channel'].value_counts(normalize=True)

##### 1. Why did you pick the specific chart?

I picked count plots to show the booking distribution and answer **which market segments and distribution channels are used the most**.

##### 2. What is/are the insight(s) found from the chart?

The charts show that, for both market segment and distribution channel, most bookings come through online and offline travel agencies, including booking websites such as Booking.com and EasyMakeMyTrip.com.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Online and offline travel agencies (such as booking websites) are the most used distribution channels, so hotels should focus on strengthening these partnerships.
Market segment and distribution channel data provide insights into guest preferences and help optimize marketing and operational strategies.

#### Chart - 6

The code for this chart groups the bookings by arrival month and hotel and averages the lead_time column. The lead_time column represents the number of days between when the booking was made and the expected arrival date.

In [None]:
# Chart - 6 visualization code
# Which Hotel Type (Resort or City) Has a Longer Lead Time for Each Month?
month_list = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
# Group by 'arrival_date_month' and calculate the average 'lead_time'
lead_time_by_month = hotel_df.groupby(['arrival_date_month','hotel'])['lead_time'].mean().reset_index()
# Order the months chronologically instead of alphabetically
lead_time_by_month['arrival_date_month'] = pd.Categorical(lead_time_by_month['arrival_date_month'], categories= month_list, ordered=True)
lead_time_by_month = lead_time_by_month.sort_values('arrival_date_month').reset_index(drop=True)

plt.figure(figsize=(15, 6))
sns.barplot(data=lead_time_by_month, x='arrival_date_month', y='lead_time',hue='hotel', palette='viridis')
# Customize the plot
plt.title('Lead Time by Month for Different Hotel Types')
plt.xlabel('Arrival Month',fontsize=14)
plt.ylabel('Average Lead Time (days)', fontsize=14)
plt.xticks(rotation=45)  # Rotate month labels for better readability

plt.tight_layout()
plt.show()
#print(lead_time_by_month)

##### 1. Why did you pick the specific chart?

I picked a bar plot to show **which hotel type (Resort or City) has a longer lead time in each month**. It compares City and Resort Hotels by arrival month.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that City Hotels have a higher lead time than Resort Hotels in **July and August**. This pattern is useful for seasonal planning.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Longer lead times have a positive business impact because they reflect advance booking and make planning easier. Shorter lead times have a negative impact, so more offers and discounts should be promoted where they occur. A balanced approach that considers both advance bookings and last-minute offers is essential for sustainable growth.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
# What is the most preferred length of stay by guests?

# Keep stays shorter than 12 nights so the chart stays readable
stay = hotel_df[hotel_df['total_stays_hotel'] < 12]
plt.figure(figsize=(12, 6))
sns.countplot(x='total_stays_hotel', hue='hotel', data=stay, palette='viridis')
plt.title('Preferred Length of Stay by Hotel Type', fontsize=14)
plt.xlabel('Total Length of Stay', fontsize=10)
plt.ylabel('Number of Bookings', fontsize=10)
plt.xticks(rotation=45)
plt.legend(title='Hotel Type')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

I picked a count plot to show **the most preferred length of stay by guests**. It is based on the total nights stayed, which is the sum of weekend nights and week nights for each booking.

##### 2. What is/are the insight(s) found from the chart?

Looking at the total stay length, guests of both hotels prefer short stays (1-2 nights) or somewhat longer stays (3-4 nights).
* City Hotel: most bookings are for 3-4 nights.
* Resort Hotel: most bookings are for 1-2 nights.
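
The most common stay length per hotel can also be read from a small table instead of the bar heights. This is a minimal sketch on toy data; `most_common_stay` is a hypothetical helper that recomputes the total nights from the two stay columns.

```python
import pandas as pd


def most_common_stay(df: pd.DataFrame) -> pd.DataFrame:
    """Return the most frequent total stay length for each hotel (hypothetical helper)."""
    nights = df["stays_in_week_nights"] + df["stays_in_weekend_nights"]
    counts = (
        df.assign(total_nights=nights)
        .groupby(["hotel", "total_nights"])
        .size()
        .rename("bookings")
        .reset_index()
    )
    # For each hotel, keep the row with the highest booking count
    top = counts.loc[counts.groupby("hotel")["bookings"].idxmax()]
    return top.set_index("hotel")


# Toy bookings (assumption for illustration only)
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 3 + ["Resort Hotel"] * 3,
    "stays_in_week_nights": [2, 3, 1, 1, 0, 5],
    "stays_in_weekend_nights": [1, 0, 0, 0, 1, 2],
})
top = most_common_stay(toy)
print(top)
```

When two stay lengths are tied, `idxmax` keeps the first one, so the full `counts` table is worth inspecting before quoting a single value.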

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Based on the total number of bookings, the average stay in both hotels is about 2-3 days. This insight can guide seasonal offers, pricing adjustments, and marketing campaigns tailored to holiday trends. Offering weekend packages or discounts for short stays can attract more guests.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
# What percentage of guests are repeated guests vs. first-time guests?
guest_counts = hotel_df['is_repeated_guest'].value_counts()
# Map the 0/1 codes to readable labels so they always match the counts
guest_labels = guest_counts.index.map({0: '0 New guests', 1: '1 Repeated guests'})
plt.figure(figsize=(7, 7))
plt.pie(x=guest_counts,labels=guest_labels,
        startangle=360,autopct="%0.1f%%",shadow=True,textprops={'fontsize':12},colors=sns.color_palette('pastel'))
plt.legend(bbox_to_anchor=(1,1))
plt.title('Percentage of repeated guests')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

I picked a pie chart to show the percentage of repeated vs. new guests and answer **what percentage of guests are repeated guests**.

##### 2. What is/are the insight(s) found from the chart?

The chart shows the percentage of bookings made by repeated guests and by new guests.
Across **City and Resort Hotels, 3.9% of bookings are from repeated guests and 96.1% are from new guests**.
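
The pie chart combines both hotels, so it does not show which hotel type has the higher share of repeated guests. A per-hotel breakdown could look like the sketch below, which uses toy data; `repeat_guest_share` is a hypothetical helper.

```python
import pandas as pd


def repeat_guest_share(df: pd.DataFrame) -> pd.Series:
    """Percentage of bookings from repeated guests per hotel (hypothetical helper)."""
    # is_repeated_guest is 0/1, so its mean is the fraction of repeated guests
    share = df.groupby("hotel")["is_repeated_guest"].mean().mul(100).round(1)
    return share.rename("repeat_guest_pct")


# Toy bookings (assumption for illustration only)
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 2,
    "is_repeated_guest": [0, 0, 0, 1, 0, 1],
})
share = repeat_guest_share(toy)
print(share)
```

Applied to `hotel_df`, the result would answer the chart's question for each hotel type separately.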

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

First-time guests can be targeted with welcome offers or follow-up campaigns to convert them into repeat customers. The low share of repeated guests is a negative sign that may indicate customer dissatisfaction, so services should be improved based on feedback.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
# In which month is the room price highest for each hotel type?
month_list = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
# Group by 'arrival_date_month' and calculate the average 'adr'
adr_time_by_month = hotel_df.groupby(['arrival_date_month','hotel'])['adr'].mean().reset_index()
# Order the months chronologically instead of alphabetically
adr_time_by_month['arrival_date_month'] = pd.Categorical(adr_time_by_month['arrival_date_month'], categories= month_list, ordered=True)
adr_time_by_month = adr_time_by_month.sort_values('arrival_date_month').reset_index(drop=True)
plt.figure(figsize=(14, 6))
sns.lineplot(data=adr_time_by_month, x='arrival_date_month', y='adr',hue='hotel', palette='viridis')
# Customize the plot
plt.title('Average Daily Rate (ADR) by Month for Each Hotel')
plt.xlabel('Month',fontsize=14)
plt.ylabel('Average Daily Rate (ADR)', fontsize=14)
plt.xticks(rotation=45)  # Rotate month labels for better readability

plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

I picked a line plot to show **in which month the room price is highest for each hotel type**. It shows the month-wise trend of the average rate for each hotel.

##### 2. What is/are the insight(s) found from the chart?

Following seasonal trends, Resort Hotels generally see their highest Average Daily Rate (ADR) during the summer months of June, July, and August, while City Hotels tend to have their mid-range ADR during the spring period of April, May, and June.
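
The peak month for each hotel can be confirmed from a month-by-hotel table rather than read off the line plot. The sketch below uses toy data and a hypothetical helper, `monthly_adr`; it applies the same ordered-month idea as the chart code so that months sort chronologically.

```python
import pandas as pd

MONTHS = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']


def monthly_adr(df: pd.DataFrame) -> pd.DataFrame:
    """Mean ADR per month (rows) and hotel (columns), months in calendar order (hypothetical helper)."""
    ordered = df.assign(
        arrival_date_month=pd.Categorical(
            df["arrival_date_month"], categories=MONTHS, ordered=True
        )
    )
    # observed=True keeps only the months that actually occur in the data
    return ordered.pivot_table(
        index="arrival_date_month", columns="hotel", values="adr",
        aggfunc="mean", observed=True,
    )


# Toy bookings (assumption for illustration only)
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "arrival_date_month": ["January", "May", "May", "August", "January"],
    "adr": [80.0, 120.0, 100.0, 200.0, 50.0],
})
table = monthly_adr(toy)
peak = table.idxmax()  # month with the highest mean ADR for each hotel
print(table)
print(peak)
```

On the real data, `peak` would list the highest-ADR month for each hotel type, which can be compared directly with the seasonal pattern described above.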


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights can have a significant business impact: knowing the months with the highest ADR lets hotels apply seasonal pricing strategies that maximize revenue during peak seasons. ADR trends help optimize pricing strategies and boost profitability while staying competitive in the market. Certain months (e.g., holidays, summer) have higher room rates due to increased demand.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
# Which meal plan is the most preferred by guests?
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.countplot(x='meal', data=hotel_df,hue='hotel', palette='viridis')
plt.title('Meal Plan Distribution', fontsize=14)
plt.xlabel('Meal Plan', fontsize=10)
plt.ylabel('Count', fontsize=10)
plt.xticks(rotation=45)
plt.subplot(1, 2, 2)
plt.pie(x=hotel_df['meal'].value_counts(),labels=hotel_df['meal'].value_counts().index,
        startangle=360,autopct="%0.1f%%",shadow=True,textprops={'fontsize':12},colors=sns.color_palette('pastel'))
plt.legend(bbox_to_anchor=(1,1))
plt.title('Percentage of Meal Plans')
plt.subplots_adjust(right=1.0)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I picked a count plot and a pie chart to answer the question **Which meal plan is the most preferred by guests?** The count plot shows the meal plan distribution for each hotel, and the pie chart shows the overall percentage of each meal plan.

meal :- Kind of meal opted for
*   BB :- Bed & Breakfast
*   FB :- Full Board (Breakfast, Lunch and Dinner)
*   HB :- Half Board (Breakfast and Dinner normally)
*   SC/Undefined :- no meal opted

##### 2. What is/are the insight(s) found from the chart?

Most guests choose the BB (Bed & Breakfast) meal plan: **about 78% of bookings across City and Resort Hotels are on the BB plan.**
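
The pie chart gives the overall share only. A per-hotel share can be computed with a normalized cross-tabulation. This is a sketch under stated assumptions: `toy_df` is a made-up frame that reuses the `hotel` and `meal` column names, and its percentages are illustrative, not results from the dataset.

```python
import pandas as pd

# Hypothetical toy data; the real hotel_df has the same 'hotel' and 'meal' columns
toy_df = pd.DataFrame({
    'hotel': ['City Hotel'] * 5 + ['Resort Hotel'] * 5,
    'meal': ['BB', 'BB', 'BB', 'BB', 'SC', 'BB', 'BB', 'BB', 'HB', 'FB'],
})

# normalize='index' makes each hotel's row sum to 1; multiply by 100 for percentages
meal_share = pd.crosstab(toy_df['hotel'], toy_df['meal'], normalize='index') * 100

# Overall share across both hotels, matching what the pie chart displays
overall_share = toy_df['meal'].value_counts(normalize=True) * 100

print(meal_share.round(1))
print(overall_share.round(1))
```

Comparing the two tables shows whether the overall BB share hides a difference between hotel types.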


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight can create a positive business impact. Because most guests choose the BB (Bed & Breakfast) plan, hotels can use these meal preferences to design custom packages for long-stay or frequent travelers, improving satisfaction and loyalty.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
# Which customer type is most common?
customer_counts = hotel_df.groupby(['hotel', 'customer_type']).size().reset_index(name='count')
plt.figure(figsize=(10, 6))
sns.countplot(x='customer_type', data=hotel_df, palette='viridis')
plt.title('Customer Type Distribution', fontsize=14)
plt.xlabel('Customer Type', fontsize=10)
plt.ylabel('Count', fontsize=10)
#plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
hotel_df['customer_type'].value_counts(normalize=True)

##### 1. Why did you pick the specific chart?

I picked a count plot to answer the question **Which customer type is most common?**
A count plot shows the number of bookings in each category.

* Transient: Bookings that are not part of a group or contract and are not associated with another transient booking.
* Contract: Bookings that have an allotment or other type of contract associated with them.
* Group: Bookings that are associated with a group.
* Transient-Party: Transient bookings that are associated with at least one other transient booking.

##### 2. What is/are the insight(s) found from the chart?

Transient is the most common customer type, likely because many guests book short vacation stays.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight shows that both hotel types are booked mostly by Transient guests, typically for short stays. Hotels can promote group offers and vacation packages for Resort Hotels and corporate deals for City Hotels.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
# Which room type is assigned most often in each hotel?
room_type_booking = hotel_df.groupby(['hotel', 'assigned_room_type'])['assigned_room_type'].count().reset_index(name='count')
plt.figure(figsize=(10, 6))
sns.barplot(x='assigned_room_type', y='count', hue='hotel', data= room_type_booking, palette='Set2')

plt.title('Most Common Room Types Assigned', fontsize=14)
plt.xlabel('Room Type', fontsize=12)
plt.ylabel('Number of Bookings', fontsize=12)
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

I picked a bar plot to answer the question **Which room type is booked most often in each hotel?** A bar plot makes it easy to compare counts across categories.

##### 2. What is/are the insight(s) found from the chart?

In both City and Resort Hotels, room types **A and D** are the most commonly booked.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight has a positive impact: the highest booking counts show that room types **A and D** are the most preferred by customers. When certain room types are in high demand, the hotel can increase their availability during peak seasons, leading to higher occupancy and revenue.
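
The top room types per hotel can be extracted directly from the grouped counts instead of being read from the bars. The sketch below assumes a made-up `toy_df` with the `hotel` and `assigned_room_type` columns; the counts are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; counts are made up for illustration
toy_df = pd.DataFrame({
    'hotel': ['City Hotel'] * 6 + ['Resort Hotel'] * 6,
    'assigned_room_type': ['A', 'A', 'A', 'D', 'D', 'B',
                           'A', 'A', 'D', 'D', 'D', 'E'],
})

# Count bookings per (hotel, room type), as in the chart code
counts = (
    toy_df.groupby(['hotel', 'assigned_room_type'])
    .size()
    .reset_index(name='count')
)

# Sort by count within each hotel, then keep the two most frequent room types
top_two = (
    counts.sort_values(['hotel', 'count'], ascending=[True, False])
    .groupby('hotel')
    .head(2)
    .reset_index(drop=True)
)
print(top_two)
```

Changing `head(2)` to another number returns a longer or shorter ranking per hotel.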

#### Chart - 13

In [None]:
hotel_df['deposit_type'].value_counts(normalize=True)

In [None]:
# Chart - 13 visualization code
# What is the distribution of deposit types?

plt.figure(figsize=(7, 7))
sns.countplot(x='deposit_type', data=hotel_df,hue='hotel', palette='viridis')
plt.legend(bbox_to_anchor=(1, 1))
plt.title('Deposit Type')
plt.xlabel('Deposit Type')
plt.ylabel('No. of Bookings')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

I picked a count plot to answer the question **What is the distribution of deposit types?** The chart shows the count of each deposit type for each hotel.

Deposit Type:
* No Deposit: Likely the most common, as it offers flexibility to guests.
* Non-Refundable: Indicates bookings with a guaranteed commitment.
* Refundable: A smaller portion, as flexible policies are less common.

##### 2. What is/are the insight(s) found from the chart?

In both City and Resort Hotels, **No Deposit** is by far the most common choice, at about **87%** of bookings. Only about **1%** of bookings are made with a refundable deposit.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight points to a positive impact: No Deposit is the most common deposit type, and offering it is an effective strategy to attract first-time guests and increase booking volumes.

#### Chart - 14 - Correlation Heatmap

In [None]:
hotel_df.select_dtypes(include=np.number).info()

In [None]:
# Correlation Heatmap visualization code
# Select numerical columns from your dataset
numerical_cols = ['is_canceled','lead_time','arrival_date_year','arrival_date_week_number','arrival_date_day_of_month','adults','children',
 'babies','is_repeated_guest','previous_cancellations','previous_bookings_not_canceled','booking_changes','days_in_waiting_list','adr',
 'required_car_parking_spaces','total_of_special_requests','total_stays_hotel','total_revenue']
# Adding a numeric version of 'hotel' to the dataset for analysis
hotel_numeric = pd.factorize(hotel_df['hotel'])[0]
hotel_df['hotel_numeric'] = hotel_numeric
numerical_cols.append('hotel_numeric')  # Include the numeric version of 'hotel'
# Calculate correlation matrix
correlation_matrix = hotel_df[numerical_cols].corr()
plt.figure(figsize=(16, 10))
sns.heatmap(correlation_matrix, annot=True,fmt=".2f",cmap='coolwarm',square=True,cbar=True)
plt.title("Correlation Heatmap of Numerical Features", fontsize=14)
plt.tight_layout()
plt.show()

# The coolwarm colormap shows negative correlations in cooler colors (blue) and
# positive correlations in warmer colors (red).
# Near-white shades represent low or no correlation.

##### 1. Why did you pick the specific chart?

I picked a heatmap because it visualizes the correlations between the numerical values in the dataset.

1. Understand Correlation Values:
The correlation matrix in the heatmap shows how different numerical features relate to each other. The correlation coefficient values range from -1 to 1:

 * +1 : Perfect positive correlation (both variables increase together).
 *  0 :  No correlation (variables are unrelated).
 * -1 : Perfect negative correlation (one variable increases while the other decreases).

*The heatmap uses a color scale:* **cooler colors (blue) represent negative correlations and warmer colors (red) represent positive correlations. Near-white shades represent low or no correlation**.
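
To make the coefficient concrete, the sketch below computes the Pearson correlation from its definition and compares it with pandas' built-in `Series.corr`. The helper `pearson_r` and the columns of `toy_df` are hypothetical names introduced only for this illustration; the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical toy columns; the values are made up for illustration
toy_df = pd.DataFrame({
    'nights': [1, 2, 3, 4, 5],
    'revenue': [100.0, 210.0, 290.0, 405.0, 500.0],  # rises with nights
    'discount': [50.0, 40.0, 30.0, 20.0, 10.0],      # falls linearly as nights rise
})

def pearson_r(x, y):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    numerator = (x_centered * y_centered).sum()
    denominator = np.sqrt((x_centered ** 2).sum() * (y_centered ** 2).sum())
    return numerator / denominator

r_manual = pearson_r(toy_df['nights'], toy_df['revenue'])
r_pandas = toy_df['nights'].corr(toy_df['revenue'])  # Pearson is the default method
r_negative = pearson_r(toy_df['nights'], toy_df['discount'])

print(r_manual, r_pandas, r_negative)
```

The manual and pandas values agree, and the perfectly linear decreasing column gives a coefficient of -1, which is the deep-blue end of the heatmap's scale.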

##### 2. What is/are the insight(s) found from the chart?

* lead_time and adr (average daily rate) show a strong positive correlation.
* is_canceled and adr show a negative correlation, which means canceled bookings tend to have lower room prices.
* total_stays_hotel and total_revenue are highly positively correlated, which suggests that the longer guests stay, the higher the revenue. Hotels can focus on increasing the length of stays to boost revenue.
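
Reading individual cells off a large heatmap is error-prone, so it can help to list the feature pairs ranked by absolute correlation. This is a minimal sketch, assuming a made-up `toy_df` that borrows four column names from the notebook; the numbers are hypothetical and say nothing about the real correlations.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy data; total_revenue is built here as adr * total_stays_hotel
toy_df = pd.DataFrame({
    'lead_time': [40, 10, 60, 90, 20, 70],
    'adr': [100.0, 95.0, 120.0, 80.0, 110.0, 105.0],
    'total_stays_hotel': [1, 3, 5, 2, 4, 2],
    'total_revenue': [100.0, 285.0, 600.0, 160.0, 440.0, 210.0],
})

corr = toy_df.corr()

# Take each unordered pair of columns once, skipping the diagonal
pairs = pd.Series({
    (a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)
})

# Rank by absolute value so strong negative correlations are not buried
ranked = pairs.sort_values(key=abs, ascending=False)
print(ranked)
```

Running the same ranking on `correlation_matrix` from the heatmap cell gives a sorted list to check the written insights against.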

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

# Select a subset of numerical columns (replace these with the columns from your dataset)
numerical_cols = ['lead_time', 'adr', 'stays_in_weekend_nights', 'stays_in_week_nights', 'adults', 'children']
# 'hotel' is a categorical feature, so encode it as integers to use it as the pair plot hue
hotel_df['hotel_numeric'] = pd.factorize(hotel_df['hotel'])[0]
# pd.factorize assigns codes in order of first appearance.
# Hotel_type :-
# resort hotel 0
# city hotel 1
numerical_cols.append('hotel_numeric')  # Adding 'hotel_numeric' to the list
# Create a pair plot
sns.pairplot(hotel_df[numerical_cols], diag_kind='kde', hue='hotel_numeric')
# plt.title would only label the last subplot; suptitle labels the whole figure
plt.suptitle("Pair Plot of Numerical Features by Hotel Type", y=1.02)
plt.show()



##### 1. Why did you pick the specific chart?

A Seaborn pair plot visualizes the relationships between different numerical variables in the dataset and compares these relationships across hotel types.

##### 2. What is/are the insight(s) found from the chart?

City Hotels may cluster around higher values than Resort Hotels for some features. The scatter panels also show whether pairs of numerical features, such as lead_time and adr, move together or in opposite directions.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

 * City Hotels have a much higher cancellation rate across the years, which can impact revenue. They can use cancellation policies, flexible booking options, or discounts to retain bookings.
 * Improve marketing and services in countries with fewer bookings to attract more guests.
 * Offer holiday packages for peak seasons, such as extended-stay discounts.
 * Optimize the hotel website and offer exclusive discounts to customers who book directly, securing a better margin by cutting out agency fees.
 * Shorter lead times may indicate last-minute bookings. This can be addressed by introducing offers and discounts that encourage earlier bookings, and by making the last-minute booking service easier to use.
 * Most guests are not repeated guests, which may indicate guest dissatisfaction. Improve guest services and collect feedback to raise service quality.
 * Improving breakfast quality and promoting HB and FB plans as packages can attract more guests.
 * Offering limited-time promotions for Non-Refundable or Refundable deposits might balance flexibility and commitment, enhancing customer satisfaction and revenue.
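
The recommendations on lead time and cancellations can be backed by a simple bucketed view. The sketch below groups bookings into lead-time buckets with `pd.cut` and computes the cancellation rate per bucket. The bucket edges, labels, and `toy_df` values are assumptions made for illustration and do not reflect the dataset's actual results.

```python
import pandas as pd

# Hypothetical toy data using the notebook's 'lead_time' and 'is_canceled' column names
toy_df = pd.DataFrame({
    'lead_time': [0, 3, 5, 20, 25, 45, 60, 120, 200, 300],
    'is_canceled': [0, 0, 0, 0, 1, 0, 1, 1, 1, 1],
})

# Assumed bucket edges (in days); -1 as the lower edge keeps lead_time == 0 in the first bucket
bins = [-1, 7, 30, 90, 365]
labels = ['0-7 days', '8-30 days', '31-90 days', '91-365 days']
toy_df['lead_time_bucket'] = pd.cut(toy_df['lead_time'], bins=bins, labels=labels)

# The mean of a 0/1 flag is the share of canceled bookings in each bucket
cancel_rate = (
    toy_df.groupby('lead_time_bucket', observed=False)['is_canceled'].mean()
)
print(cancel_rate)
```

Adding `'hotel'` to the `groupby` keys would split the same rates by hotel type.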

# **Conclusion**

* City Hotels account for **61.1%** of bookings compared with **38.9%** for Resort Hotels, making City Hotels the most preferred hotel type among guests.
* Cancellations are high, so hotels need to understand why guests are canceling.
* City Hotels account for a much larger share of cancellations (**66.8%**) than Resort Hotels (**33.2%**) over the three years.
  * The year 2016 saw the most cancellations, especially for City Hotels.
* For City Hotels, 2016 was the highest-revenue year, with 47.8% of revenue.
* For Resort Hotels, 2017 was the highest-revenue year, with 41.2% of revenue.
* The majority of guests are from Portugal, and most bookings come from European countries. Portugal has the highest count, with 48.59k bookings.
   * City Hotels are most booked by guests from Portugal and the United Kingdom.
   * Resort Hotels are most popular with guests from Portugal and Spain.
* In both the market segment and distribution channel analyses, most bookings (79.1%) come through online/offline travel agencies, for example booking websites.
* Resort Hotels have high lead times, with the longest lead times in May, June, and September.
* The average stay in both hotels is 2-3 days.
* Only 3.9% of guests were repeated guests; 96.1% were new guests.
* Resort Hotels have their highest Average Daily Rate (ADR) during the summer months of June, July, and August, while City Hotels have mid-range ADR during the spring months of April, May, and June.
* BB (Bed & Breakfast) is the most preferred meal plan, chosen in 77.8% of bookings.
* 82% of bookings are of the Transient type, typically short stays.
* In both City and Resort Hotels, room types **A and D** are the most commonly booked.
* No Deposit is the most common deposit type, at 87% of bookings.


In [None]:
# to_csv returns None when a path is given, so there is nothing useful to assign
hotel_df.to_csv('final_hotel_booking.csv')
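
By default `to_csv` also writes the row index, which comes back as an extra `Unnamed: 0` column when the file is read again. The sketch below shows the difference using an in-memory buffer instead of a file. `toy_df` is a hypothetical stand-in for `hotel_df`.

```python
import io

import pandas as pd

# Hypothetical toy frame standing in for hotel_df
toy_df = pd.DataFrame({
    'hotel': ['City Hotel', 'Resort Hotel'],
    'adr': [100.0, 80.5],
})

# Default behavior: the index is written as an unnamed first column
with_index = io.StringIO()
toy_df.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False writes only the data columns
without_index = io.StringIO()
toy_df.to_csv(without_index, index=False)
without_index.seek(0)
reloaded = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded.columns))
```

If the saved file is meant to be reloaded later, passing `index=False` keeps the column set identical to the original frame.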

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***