<a href="https://colab.research.google.com/github/CodeWithRom/Product-dissection/blob/main/hotel_booking_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Hotel Booking Analysis



##### **Project Type**    - Exploratory Data Analysis
##### **Contribution**    - Individual

# **Problem Statement**


BUSINESS PROBLEM OVERVIEW AND OBJECTIVE

Hotel booking analysis explores booking information for both city and resort hotels. It matters to any hotel business because it helps identify the clients who are most likely to book and what they look for in their accommodation.

In the hotel industry, it is important to enhance the customer experience. Understanding factors such as length of stay, special requests, and guest preferences contributes to a better overall experience and to more effective marketing. Booking data also supports demand forecasting: by examining historical bookings, hotels can forecast future demand more accurately. This enables better resource planning, ensuring that hotels are adequately staffed and prepared to meet the needs of guests during peak periods.

The analysis also supports several operational goals:

*   Optimizing inventory management: adjusting room availability based on historical booking trends, minimizing overbooking or underutilization of rooms, and maximizing revenue per available room.
*   Identifying seasonal trends.
*   Enhancing operational efficiency, for example through check-in and check-out ratios per day.
*   Reducing operational costs by adjusting staffing levels based on the analysis.

Analyzing data can help identify potential risks and challenges, allowing hotels to implement proactive measures. This includes preparing for high-demand events, managing cancellations effectively, and mitigating the impact of unforeseen circumstances. Hotels that use data analytics for booking analysis gain a competitive advantage: they can stay ahead of market trends, adapt quickly to changes, and continuously improve their services based on customer feedback and preferences.

# **Data Summary and Documentation**

The dataset contains the following columns:

1.   hotel - Type of hotel (H1 Resort Hotel / H2 City Hotel)
2.   is_canceled - Whether the hotel booking was canceled (1) or not (0)
3.   lead_time - Number of days between the booking date and the arrival date
4.   arrival_date_year - Year of the arrival date
5.   arrival_date_month - Month of the arrival date
6.   arrival_date_week_number - Week number of the arrival date
7.   arrival_date_day_of_month - Day of the month of the arrival date
8.   stays_in_weekend_nights - Number of weekend nights (Saturday/Sunday) the guest stayed or booked to stay at the hotel
9.   stays_in_week_nights - Number of weekday nights the guest stayed or booked to stay
10.  adults - Number of adults
11.  children - Number of children
12.  babies - Number of babies
13.  meal - Type of meal booked
14.  country - Country of origin
15.  market_segment - Segment the customer belongs to (Online Travel Agency / Direct walk-in / Corporate / Offline TA/TO)
16.  distribution_channel - How the customer accessed the stay (Corporate / Online Travel Agency / Direct walk-in / Travel Agency, Travel Office)
17.  is_repeated_guest - Whether the guest has stayed before (1) or is coming for the first time (0)
18.  previous_cancellations - Number of previous bookings the customer canceled
19.  previous_bookings_not_canceled - Number of previous bookings the customer did not cancel
20.  reserved_room_type - Type of room booked
21.  assigned_room_type - Type of room assigned
22.  booking_changes - Number of changes made to the booking
23.  deposit_type - Type of deposit made for the booking
24.  agent - ID of the travel agency
25.  company - ID of the company making the booking
26.  days_in_waiting_list - Number of days the booking was on the waiting list
27.  customer_type - Type of booking customer (e.g., transient, contract, group)
28.  adr - Average daily rate (i.e., average room revenue per paid occupied room)
29.  required_car_parking_spaces - Number of car parking spaces required
30.  total_of_special_requests - Number of additional special requests
31.  reservation_status - Status of the reservation
32.  reservation_status_date - Date on which the reservation status was last set
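
The code cells refer to columns by their exact names, so a misspelled name (for example `is_cancelled` instead of `is_canceled`) raises a `KeyError`. The sketch below shows one possible way to check a loaded frame against the names the analysis relies on. The helper `missing_columns`, the `EXPECTED_COLUMNS` list, and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# A subset of the column names used by the notebook's code cells
EXPECTED_COLUMNS = [
    "hotel", "is_canceled", "lead_time", "arrival_date_year",
    "arrival_date_month", "arrival_date_week_number",
    "arrival_date_day_of_month", "adr", "country", "market_segment",
]


def missing_columns(frame, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the frame."""
    return [name for name in expected if name not in frame.columns]


# Tiny made-up frame that uses the misspelled 'is_cancelled' on purpose
toy = pd.DataFrame({"hotel": ["City Hotel"], "is_cancelled": [0], "lead_time": [10]})
print(missing_columns(toy, expected=["hotel", "is_canceled", "lead_time"]))
```

Running such a check right after loading the data surfaces naming problems before any chart code fails.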



# **GitHub Link -**

https://github.com/CodeWithRom/CodeWithRom/blob/46ea6f1d97811438d8ec5ccdb53bea57ef5a592e/hotel_booking_analysis1.ipynb

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4.   You may add as many charts as you want. Make sure that the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights
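
As a minimal sketch of the exception handling mentioned in guideline 2, the dataset load could be wrapped so that a missing or empty file produces a clear message instead of a raw traceback. The function name `load_bookings` and the throwaway sample file are hypothetical and only serve the illustration.

```python
import os
import tempfile

import pandas as pd


def load_bookings(path):
    """Read the bookings CSV, raising a clear error if it cannot be used."""
    try:
        frame = pd.read_csv(path)
    except FileNotFoundError:
        raise FileNotFoundError(
            f"Dataset not found at '{path}'. Upload it or fix the path."
        ) from None
    except pd.errors.EmptyDataError:
        raise ValueError(f"'{path}' is empty, so there is nothing to analyze.") from None
    return frame


# Demonstration with a throwaway CSV written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    pd.DataFrame({"hotel": ["City Hotel"], "is_canceled": [0]}).to_csv(sample_path, index=False)
    sample = load_bookings(sample_path)

print(sample.shape)
```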

# ***Let's Begin !***

## ***1. Let's Understand the Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline


#Connecting Google Drive with the Colab
from google.colab import drive
drive.mount('/content/drive')

from google.colab import files #importing files in colab from pc
uploaded = files.upload()

### Dataset Loading

In [None]:
# Load Dataset
# Read the CSV file into a DataFrame (read_csv already returns a DataFrame)
df = pd.read_csv('Hotel Bookings.csv')
# Display the dataset
df

In [None]:
df.columns

In [None]:
# Dataset First
df.head()

In [None]:
df.tail()

### Dataset Information

In [None]:
# Dataset Info
df.info()

In [None]:
df.describe() #statistical description of data

#### Missing Values/Null Values




In [None]:
# Missing Values/Null Values Count
print(df.isnull().sum())

In [None]:
# Replace the null values of the "children" column with 0.
df['children'] = df['children'].fillna(0)

# Replace the null values in the "agent" column with the mean of that column.
# Assigning the result back avoids the chained-assignment warning that inplace=True triggers on a column.
df['agent'] = df['agent'].fillna(df['agent'].mean())

# Drop the column "company" as it has more than 94% data missing
df = df.drop(columns=['company'])

# Country has 452 rows or 0.41% data missing which is negligible, hence we will remove these rows.
df = df.dropna(axis=0)

# Ensure that all the null values have been replaced and no column has null values left
df.isnull().sum()
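
The percentages quoted in the comments above can be computed rather than read off by hand. The following is a small sketch on made-up data; the helper name `missing_summary` and the toy values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Count and percentage of missing values per column, worst first."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(2)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    # Keep only the columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("percent", ascending=False)


# Toy frame with invented values
toy = pd.DataFrame({
    "children": [0, np.nan, 2, 1],
    "company": [np.nan, np.nan, np.nan, 40.0],
    "adr": [80.0, 95.5, 120.0, 60.0],
})
print(missing_summary(toy))
```

Calling the same helper on the raw frame before any cleaning gives the figures needed to decide between filling, dropping rows, or dropping a column.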

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in df.columns.tolist():
  print("No. of unique values in",i,"is",df[i].nunique())

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

In [None]:
# Define a function to get the total number of customers booking from each market segment for the given country
def function_for_country(country):
  # Filter the rows for the requested country (avoid shadowing the built-in name `filter`)
  country_rows = df[df['country'] == country]

  # Count the filtered bookings in each market segment
  total_count = country_rows['market_segment'].value_counts()

  # Build the text report
  output = f"Booking breakdown for {country}:\n"
  for segment, count in total_count.items():
    output += f"{segment} : {count} customers\n"
  return output

country_in = input('Enter Country :')
output_result = function_for_country(country_in)
print(output_result)
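
Because `input()` pauses execution, a notebook that must run in one go may prefer a non-interactive variant. The sketch below returns the counts as a Series and treats an unknown country as an empty result. The name `segment_breakdown`, the toy data, and the assumption that country codes are stored in upper case are all illustrative.

```python
import pandas as pd


def segment_breakdown(frame, country):
    """Bookings per market segment for one country code (empty if unknown)."""
    # Assumption: country codes are stored in upper case, e.g. 'PRT'
    rows = frame[frame["country"] == country.strip().upper()]
    return rows["market_segment"].value_counts()


# Toy data with invented bookings
toy = pd.DataFrame({
    "country": ["PRT", "PRT", "GBR", "PRT"],
    "market_segment": ["Online TA", "Direct", "Online TA", "Online TA"],
})

for code in ["PRT", "XYZ"]:
    counts = segment_breakdown(toy, code)
    if counts.empty:
        print(f"No bookings found for {code}")
    else:
        print(f"Booking breakdown for {code}: {counts.to_dict()}")
```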

#### Chart - 1

In [None]:
# Write your code to make your dataset analysis ready.
# Chart - 1 visualization code
# Calculate correlation matrix
numeric_columns = df.select_dtypes(include='number')
correlation_matrix = numeric_columns.corr()
#correlation_matrix = df.corr()
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='Reds', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Extract relevant columns
demand_data = df[['arrival_date_week_number', 'stays_in_weekend_nights']]

# Group by week number and calculate the mean demand for each week
weekly_demand = demand_data.groupby('arrival_date_week_number')['stays_in_weekend_nights'].mean().reset_index()

# Plot the demand trend over weeks
plt.figure(figsize=(12, 6))
plt.plot(weekly_demand['arrival_date_week_number'], weekly_demand['stays_in_weekend_nights'], marker='o')
plt.title('Weekly Demand for Hotel Bookings')
plt.xlabel('Week Number')
plt.ylabel('Average Stays in Weekend Nights')
plt.grid(True)
plt.show()

#### Chart - 3

In [None]:
# Chart - 3 visualization code

# Filter the data for canceled bookings
canceled_bookings = df[df['is_canceled'] == 1]

# Group by the 'arrival_date_year' and count the canceled bookings
canceled_by_year = canceled_bookings.groupby('arrival_date_year')['is_canceled'].count()

# Group by the 'arrival_date_year' and count the total bookings
total_by_year = df.groupby('arrival_date_year')['is_canceled'].count()

# Calculate the percentage of bookings canceled for each year
percentage_canceled_by_year = (canceled_by_year / total_by_year) * 100

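# Plot a bar chart: a pie chart would rescale the yearly rates to sum to 100% and hide the actual values
plt.figure(figsize=(8, 6))
plt.bar(percentage_canceled_by_year.index.astype(str), percentage_canceled_by_year.values)
plt.title('Percentage of Bookings Canceled by Year')
plt.xlabel('Arrival Year')
plt.ylabel('Canceled Bookings (%)')
plt.show()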
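
Since `is_canceled` holds only 0 and 1, the per-year cancellation percentage can also be obtained in one step as the group mean. The sketch below checks this on invented data; the toy values are an assumption for illustration.

```python
import pandas as pd

# Invented bookings: 1 of 4 canceled in 2015, 2 of 4 canceled in 2016
toy = pd.DataFrame({
    "arrival_date_year": [2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016],
    "is_canceled": [1, 0, 0, 0, 1, 1, 0, 0],
})

# The mean of a 0/1 column is the share of ones, i.e. the cancellation rate
rate_by_year = toy.groupby("arrival_date_year")["is_canceled"].mean() * 100
print(rate_by_year)
```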

#### Chart - 4

In [None]:
# Chart - 4 visualization code
#Data Exploration
# Explore the distribution of hotel types
hotel_distribution = df['hotel'].value_counts()

# Plot a bar chart to visualize the distribution
plt.figure(figsize=(8, 6))
sns.barplot(x=hotel_distribution.index, y=hotel_distribution.values, palette='viridis')
plt.title('Distribution of Hotel Types')
plt.xlabel('Hotel Type')
plt.ylabel('Count')
plt.show()

# Explore average lead time for each hotel type
average_lead_time = df.groupby('hotel')['lead_time'].mean()

# Plot a bar chart to visualize the average lead time
plt.figure(figsize=(8, 6))
sns.barplot(x=average_lead_time.index, y=average_lead_time.values, palette='mako')
plt.title('Average Lead Time for Hotel Types')
plt.xlabel('Hotel Type')
plt.ylabel('Average Lead Time (days)')
plt.show()

# Explore cancellation rates for each hotel type
cancellation_rates = df.groupby('hotel')['is_canceled'].mean()

# Plot a bar chart to visualize cancellation rates
plt.figure(figsize=(8, 6))
sns.barplot(x=cancellation_rates.index, y=cancellation_rates.values, palette='rocket')
plt.title('Cancellation Rates by Hotel Type')
plt.xlabel('Hotel Type')
plt.ylabel('Cancellation Rate')
plt.show()


#### Chart - 5

In [None]:
# Chart - 5 visualization code
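# Put months in calendar order first; plain month names would otherwise sort alphabetically
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
df['arrival_date_month'] = pd.Categorical(df['arrival_date_month'], categories=month_order, ordered=True)

# Explore booking counts per arrival month
monthly_trends = df['arrival_date_month'].value_counts().sort_index()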

# Plot trends with bar plot
plt.figure(figsize=(18, 8))
sns.countplot(data=df,x='arrival_date_month',order=monthly_trends.index, color='skyblue')
plt.title('Monthly Arrival Trends')
plt.xlabel('Month')
plt.ylabel('Number of Bookings')

plt.show()


#Number of bookings done
monthly_bookings = df['arrival_date_month'].value_counts()

# Print the result
print("Number of Bookings for Each Month:")
print(monthly_bookings)

# Plot a bar chart to visualize the distribution of bookings for each month
plt.figure(figsize=(12, 6))
sns.barplot(x=monthly_bookings.index, y=monthly_bookings.values, palette='viridis')
plt.title('Number of Bookings for Each Month')
plt.xlabel('Month')
plt.ylabel('Number of Bookings')
plt.show()

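# Approximate revenue per available room (RevPAR): ADR is counted only when the booking was not canceled.
# This is a proxy, since true RevPAR divides room revenue by the number of available rooms, which the dataset does not include.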
df['revenue_per_room'] = df['adr'] * (1 - df['is_canceled'])
revpar = df.groupby('arrival_date_month')['revenue_per_room'].mean()

# Plot RevPAR over time
plt.figure(figsize=(12, 6))
sns.lineplot(x=revpar.index, y=revpar.values)
plt.title('Monthly Revenue per Available Room (RevPAR)')
plt.xlabel('Month')
plt.ylabel('RevPAR')
plt.show()

#forecasting Demand

# Create a pivot table
pivot_table1 = pd.pivot_table(df, values='is_canceled', index='market_segment', columns='arrival_date_year', aggfunc=['sum'], fill_value=0)

# Print the pivot table
print("  Pivot Table - total number of Cancellations :")
print("\n",pivot_table1)

# Create a pivot table for demand by market segment and arrival date
demand_pivot = pd.pivot_table(df, values='is_canceled', index='arrival_date_month', columns='market_segment', aggfunc='count', fill_value=0)

# Plot the line chart
plt.figure(figsize=(15, 8))
sns.lineplot(data=demand_pivot, dashes=False)
plt.title('Demand by Market Segment Over Time')
plt.xlabel('Arrival Date')
plt.ylabel('Number of Bookings')
plt.legend(title='Market Segment', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xticks(rotation=45)
plt.show()


seasonal_demand = df.pivot_table(index='arrival_date_month', columns='arrival_date_day_of_month', values='is_canceled', aggfunc='count')

plt.figure(figsize=(14, 8))
sns.heatmap(seasonal_demand, cmap='YlGnBu', annot=True, fmt='g', cbar_kws={'label': 'Demand'})
plt.title('Seasonal Demand Variation')
plt.xlabel('Day of Month')
plt.ylabel('Month')
plt.show()

plt.figure(figsize=(10, 6))
channel_demand = df.groupby('distribution_channel')['is_canceled'].count().sort_values(ascending=False)
sns.barplot(x=channel_demand.index, y=channel_demand.values, errorbar=None)  # errorbar=None replaces the deprecated ci=None
plt.title('Booking Channel Contribution to Demand')
plt.xlabel('Distribution Channel')
plt.ylabel('Demand')
plt.xticks(rotation=45)
plt.show()
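
For reference, RevPAR is conventionally room revenue divided by the number of available room-nights. The dataset has no room inventory, so the sketch below uses a hypothetical inventory (`rooms_available=10` over 7 nights) and invented stays purely to illustrate the arithmetic.

```python
import pandas as pd


def revpar(room_revenue, rooms_available, nights):
    """Room revenue divided by available room-nights."""
    available_room_nights = rooms_available * nights
    if available_room_nights <= 0:
        raise ValueError("rooms_available and nights must be positive")
    return room_revenue / available_room_nights


# Invented stays: each row is one booking that was not canceled
stays = pd.DataFrame({"adr": [100.0, 80.0, 120.0], "nights": [2, 3, 1]})

# Revenue per stay is ADR times nights: 200 + 240 + 120
room_revenue = (stays["adr"] * stays["nights"]).sum()

# Hypothetical inventory: 10 rooms over a 7-night window
print(revpar(room_revenue, rooms_available=10, nights=7))
```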


#### Chart - 6

In [None]:
# Chart - 6 visualization code
# Convert 'arrival_date_month' to categorical for proper ordering in the visualization
df['arrival_date_month'] = pd.Categorical(df['arrival_date_month'],categories=['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'], ordered=True)

# Line chart: Impact of seasonality on average daily rate (adr)
plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x='arrival_date_month', y='adr')
plt.title('Impact of Seasonality on Average Daily Rate (ADR)')
plt.xlabel('Month')
plt.ylabel('Average Daily Rate (ADR)')
plt.xticks(rotation=45)
plt.show()

# Bar chart: Impact of length of stay on average daily rate (adr)
df['total_nights'] = df['stays_in_week_nights']+df['stays_in_weekend_nights']

plt.figure(figsize=(15, 8))
sns.barplot(x='total_nights', y='adr', data=df, palette='Set1')
plt.title('Impact of Length of Stay on Average Daily Rate (ADR)')
plt.xlabel('Number of Nights')
plt.ylabel('Average Daily Rate (ADR)')
plt.show()


# Bar plot: Impact of customer type on average daily rate (adr)
plt.figure(figsize=(10, 6))
ax = sns.barplot(x='adr', y='customer_type', data=df, errorbar=None, palette='pastel')
plt.title('Impact of Customer Type on Average Daily Rate (ADR)')
plt.xlabel('Average Daily Rate (ADR)')
plt.ylabel('Customer Type')
# Add annotations
for p in ax.patches:
    ax.annotate(f'{p.get_width():.2f}', (p.get_width(), p.get_y() + p.get_height() / 2),ha='center', va='center', xytext=(5, 0), textcoords='offset points')
plt.show()

#### Chart - 7

In [None]:
# Chart - 7 visualization code

# Boxplot: Lead time distribution for different stay durations
plt.figure(figsize=(10, 6))
sns.boxplot(x='stays_in_week_nights', y='lead_time', data=df)
plt.title('Lead Time Distribution for Different Stay Durations')
plt.xlabel('Number of Week Nights')
plt.ylabel('Lead Time (days)')
plt.show()

# Calculate correlation between lead time and adr
correlation_lead_time_adr = df['lead_time'].corr(df['adr'])
print(f"Correlation between Lead Time and ADR: {correlation_lead_time_adr}")

# Calculate total revenue for each lead time value (ADR multiplied by the total nights of the stay)
df['total_revenue'] = df['adr'] * (df['stays_in_week_nights'] + df['stays_in_weekend_nights'])
revenue_by_lead_time = df.groupby('lead_time')['total_revenue'].sum().reset_index()

# Line chart: Impact of lead time on total revenue
plt.figure(figsize=(10, 6))
sns.lineplot(x='lead_time', y='total_revenue', data=revenue_by_lead_time)
plt.title('Impact of Lead Time on Total Revenue')
plt.xlabel('Lead Time (days)')
plt.ylabel('Total Revenue')
plt.show()


#### Chart - 8

In [None]:
# Chart - 8 visualization code
#Visualization 1: Length of Stay Distribution
df['total']=df['stays_in_weekend_nights'] + df['stays_in_week_nights']
# Distribution of length of stay
plt.figure(figsize=(10, 6))
sns.histplot(df['total'], bins=20, kde=True)
plt.title('Distribution of Length of Stay')
plt.xlabel('Length of Stay (Nights)')
plt.ylabel('Frequency')
plt.show()
#This visualization gives you insights into the distribution of the length of stay, combining weekend and week nights.

#Visualization 2: Special Requests Analysis

# Countplot for special requests
plt.figure(figsize=(8, 5))
sns.countplot(x='total_of_special_requests', data=df, palette='viridis')
plt.title('Distribution of Special Requests')
plt.xlabel('Number of Special Requests')
plt.ylabel('Frequency')
plt.show()
#This countplot shows the distribution of the total number of special requests made by guests.

#Visualization 3: Guest Preferences Analysis

# Countplot for meal preferences
plt.figure(figsize=(8, 5))
sns.countplot(x='meal', data=df, palette='Set2')
plt.title('Meal Preferences of Guests')
plt.xlabel('Meal Type')
plt.ylabel('Frequency')
plt.show()
#This countplot displays the distribution of meal preferences (such as BB, HB, and SC) among guests.

#Analysis 1: Average Length of Stay by Customer Type
# Boxplot for average length of stay by customer type

plt.figure(figsize=(10, 6))
sns.boxplot(x='customer_type', y='total', data=df)
plt.title('Average Length of Stay by Customer Type')
plt.xlabel('Customer Type')
plt.ylabel('Length of Stay (total Nights)')
plt.show()

#Analysis 2: Impact of Required Car Parking Spaces on ADR for Different Customer Types

plt.figure(figsize=(12, 8))
sns.barplot(x='customer_type', y='adr', hue='required_car_parking_spaces', data=df, errorbar=None, palette='viridis')
plt.title('Impact of Required Car Parking Spaces on ADR for Different Customer Types')
plt.xlabel('Customer Type')
plt.ylabel('Average Daily Rate (ADR)')
plt.legend(title='Required Parking Spaces', loc='upper right', bbox_to_anchor=(1.2, 1))
plt.show()


#### Chart - 9

In [None]:
# Chart - 9 visualization code
#1. Check-in/Check-out Processes Analysis:
#Enhancing Operational Efficiency: By understanding the factors influencing the booking process, hotels can streamline their operations.
#This includes optimizing check-in/check-out processes, allocating resources efficiently, and ensuring that staff is well-prepared for busy periods.

# Visualize the distribution of lead time (time between booking and check-in)
plt.figure(figsize=(10, 6))
sns.histplot(df['lead_time'], bins=20, kde=True)
plt.title('Distribution of Lead Time')
plt.xlabel('Lead Time (Days)')
plt.ylabel('Frequency')
plt.show()
#This visualization helps understand the lead time distribution, which can influence the efficiency of check-in/check-out processes.

#2. Resource Allocation:

# Visualize the distribution of total guests (adults + children + babies)
df['total_guests'] = df['adults'] + df['children'] + df['babies']

plt.figure(figsize=(10, 6))
sns.histplot(df['total_guests'], bins=20, kde=True)
plt.title('Distribution of Total Guests')
plt.xlabel('Number of Guests')
plt.ylabel('Frequency')
plt.show()
#This visualization provides insights into the distribution of the total number of guests, helping in resource allocation planning.

#3. Busy Period Preparation:

# This visualization shows which days of the week receive the most arrivals.
# Build the arrival date from its year, month name, and day-of-month columns
df['arrival_date'] = pd.to_datetime(df['arrival_date_year'].astype(str) + '-' + df['arrival_date_month'].astype(str) + '-' + df['arrival_date_day_of_month'].astype(str), format='%Y-%B-%d')
df['day_of_week'] = df['arrival_date'].dt.day_name()

plt.figure(figsize=(10, 6))
sns.countplot(x='day_of_week', data=df, order=df['day_of_week'].value_counts().index, palette='viridis')
plt.title('Number of Bookings for Each Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Number of Bookings')
plt.show()
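
The date assembly in this cell can be tried in isolation. The sketch below uses three invented rows, one of them an impossible date, and assumes English month names so that the `%B` directive matches. Passing `errors="coerce"` is an illustrative choice that turns unparseable dates into `NaT` instead of raising.

```python
import pandas as pd

# Invented rows; the last one (30 February) is deliberately invalid
toy = pd.DataFrame({
    "arrival_date_year": [2015, 2016, 2015],
    "arrival_date_month": ["July", "February", "February"],
    "arrival_date_day_of_month": [1, 29, 30],
})

# Join the three parts into text such as '2015-July-1'
date_text = (
    toy["arrival_date_year"].astype(str)
    + "-" + toy["arrival_date_month"].astype(str)
    + "-" + toy["arrival_date_day_of_month"].astype(str)
)

# An explicit format avoids guessing; invalid dates become NaT
toy["arrival_date"] = pd.to_datetime(date_text, format="%Y-%B-%d", errors="coerce")
toy["day_of_week"] = toy["arrival_date"].dt.day_name()
print(toy[["arrival_date", "day_of_week"]])
```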

#### Chart - 10

In [None]:
# Chart - 10 visualization code

# Visualize monthly booking trends for demand forecasting
monthly_bookings = df.groupby(['arrival_date_year', 'arrival_date_month']).size().reset_index(name='count')

plt.figure(figsize=(12, 6))
sns.lineplot(x='arrival_date_month', y='count', hue='arrival_date_year', data=monthly_bookings)
plt.title('Monthly Booking Trends for Demand Forecasting')
plt.xlabel('Month')
plt.ylabel('Number of Bookings')
plt.show()

#This visualization helps in understanding the monthly booking trends, aiding in demand forecasting and resource planning.

#2. Staffing Levels and Labor Costs:

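# Approximate staffing demand with the total guest count (adults + children + babies).
# This is a proxy that assumes staffing scales with the number of guests; it is not an actual staff count.
df['total_staff'] = df['adults'] + df['children'] + df['babies']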

# Visualize the total staff needed over time
staff_over_time = df.groupby(['arrival_date_year', 'arrival_date_month'])['total_staff'].sum().reset_index()

plt.figure(figsize=(12, 6))
sns.lineplot(x='arrival_date_month', y='total_staff', hue='arrival_date_year', data=staff_over_time)
plt.title('Total Staff Needed Over Time')
plt.xlabel('Month')
plt.ylabel('Total Staff')
plt.show()
#This visualization helps in understanding the variation in staffing levels over time.

#3. Labor Cost Analysis:

# Calculate labor costs based on staff levels and average cost per staff member
average_cost_per_staff = 100  # Placeholder value, replace with actual cost
df['labor_costs'] = df['total_staff'] * average_cost_per_staff

# Visualize labor costs over time
labor_costs_over_time = df.groupby(['arrival_date_year', 'arrival_date_month'])['labor_costs'].sum().reset_index()

plt.figure(figsize=(12, 6))
sns.lineplot(x='arrival_date_month', y='labor_costs', hue='arrival_date_year', data=labor_costs_over_time)
plt.title('Labor Costs Over Time')
plt.xlabel('Month')
plt.ylabel('Labor Costs')
plt.show()
#This visualization helps in understanding the variation in labor costs over time based on staffing levels.

#4. Occupancy Rate Analysis:
# Approximate occupancy with the share of bookings that were not canceled (true occupancy needs room inventory, which the dataset does not include)
df['occupancy_rate'] = 1 - df['is_canceled']

# Visualize occupancy rate over time
occupancy_rate_over_time = df.groupby(['arrival_date_year', 'arrival_date_month'])['occupancy_rate'].mean().reset_index()

plt.figure(figsize=(12, 6))
sns.lineplot(x='arrival_date_month', y='occupancy_rate', hue='arrival_date_year', data=occupancy_rate_over_time)
plt.title('Occupancy Rate Over Time')
plt.xlabel('Month')
plt.ylabel('Occupancy Rate')
plt.show()

#### Chart - 11

In [None]:
#Demographic Distribution: Grouped Bar Chart


plt.figure(figsize=(12, 8))
sns.countplot(x='customer_type', hue='market_segment', data=df, palette='Set2')
plt.title('Demographic Distribution by Customer Type and Market Segment')
plt.xlabel('Customer Type')
plt.ylabel('Count')
plt.legend(title='Market Segment', loc='upper right', bbox_to_anchor=(1.2, 1))
plt.show()

#This chart reveals the distribution of market segments within each customer type, helping identify which segments are more prevalent among different types of guests.

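#Booking Trends Over Time: Line Chart
# Note: reservation_status_date is the date the reservation status was last set, so it is used here as a proxy for the booking timeline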
plt.figure(figsize=(12, 6))
df['reservation_date'] = pd.to_datetime(df['reservation_status_date'])
df['reservation_year_month'] = df['reservation_date'].dt.to_period('M')
booking_trends = df.groupby('reservation_year_month')['is_canceled'].count()
booking_trends.plot(kind='line', marker='o', color='skyblue')
plt.title('Booking Trends Over Time')
plt.xlabel('Year-Month')
plt.ylabel('Number of Bookings')
plt.show()

#This line chart illustrates booking trends over time, helping identify peak periods and seasonality for targeted marketing campaigns.

#Source of Bookings: Pie Chart
plt.figure(figsize=(15, 12))
source_counts = df['distribution_channel'].value_counts()
plt.pie(source_counts, labels=source_counts.index, autopct='%1.1f%%', startangle=90, colors=sns.color_palette('pastel'))
plt.title('Distribution of Booking Sources')
plt.show()

#A pie chart depicting the distribution of booking sources helps allocate marketing resources effectively, emphasizing channels that contribute most to bookings.

#Market Segment Analysis: Box Plot
plt.figure(figsize=(12, 8))
sns.boxplot(x='market_segment', y='adr', data=df, palette='viridis')
plt.title('ADR Distribution Across Market Segments')
plt.xlabel('Market Segment')
plt.ylabel('Average Daily Rate (ADR)')
plt.show()

#### Chart - 12

In [None]:
#Risk management

# Create a pivot table for cancellations by month and year
cancellation_pivot = pd.pivot_table(df[df['is_canceled'] == 1], values='is_canceled', index='arrival_date_month', columns='arrival_date_year', aggfunc='count', fill_value=0)

# Plot the line chart
plt.figure(figsize=(12, 8))
sns.lineplot(data=cancellation_pivot, markers=True)
plt.title('Monthly Cancellation Trend Over Years')
plt.xlabel('Month')
plt.ylabel('Number of Cancellations')
plt.show()

#Lead Time Distribution for Canceled Bookings:

#A histogram illustrating the distribution of lead times for canceled bookings can reveal patterns related to how far in advance customers tend to cancel their reservations.


plt.figure(figsize=(10, 6))
sns.histplot(df[df['is_canceled'] == 1]['lead_time'], bins=20, kde=True)
plt.title('Lead Time Distribution for Canceled Bookings')
plt.xlabel('Lead Time (days)')
plt.ylabel('Frequency')
plt.show()

#Booking Changes and Cancellation Correlation:

#A violin plot comparing the number of booking changes for canceled and non-canceled bookings can show whether the two are related. This can help in identifying patterns where frequent changes may lead to increased cancellations.

plt.figure(figsize=(10, 8))
sns.violinplot(x='is_canceled', y='booking_changes', data=df)
plt.title('Distribution of Booking Changes for Canceled vs. Not Canceled Bookings')
plt.xlabel('Cancellation Status (1: Canceled, 0: Not Canceled)')
plt.ylabel('Number of Booking Changes')
plt.show()


# Count the bookings that spent time on the waiting list, by arrival month and day of month
unforeseen_impact = df[df['days_in_waiting_list'] > 0].pivot_table(index='arrival_date_month', columns='arrival_date_day_of_month', values='is_canceled', aggfunc='count')

plt.figure(figsize=(14, 8))
sns.heatmap(unforeseen_impact, cmap='YlGnBu', annot=True, fmt='g', cbar_kws={'label': 'Waitlisted Bookings'})
plt.title('Impact of Unforeseen Circumstances on Bookings')
plt.xlabel('Day of Month')
plt.ylabel('Month')
plt.show()
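
To complement the lead-time histogram for canceled bookings, the cancellation rate can be compared across lead-time buckets. The sketch below uses `pd.cut` on invented data; the bucket edges and labels are hypothetical choices, not values taken from the analysis.

```python
import pandas as pd

# Invented bookings
toy = pd.DataFrame({
    "lead_time": [0, 5, 20, 45, 100, 200, 365, 10],
    "is_canceled": [0, 0, 1, 0, 1, 1, 1, 0],
})

# Hypothetical bucket edges in days; each interval includes its right edge
bins = [-1, 7, 30, 90, 10_000]
labels = ["0-7", "8-30", "31-90", "91+"]
toy["lead_bucket"] = pd.cut(toy["lead_time"], bins=bins, labels=labels)

# Mean of the 0/1 flag gives the cancellation rate within each bucket
rate_by_bucket = toy.groupby("lead_bucket", observed=True)["is_canceled"].mean()
print(rate_by_bucket)
```

On real data, a rate that climbs with the bucket would support the idea that bookings made far in advance carry more cancellation risk.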

