<a href="https://colab.research.google.com/github/Ananya1994das/Projects_/blob/main/Module_2%2C_Hotel_Booking_Analysis_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -  **Hotel Booking Analysis**



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary**


The goal of this project is to analyze a dataset of hotel bookings and extract key insights that help hotel management understand booking patterns, guest behavior, and potential areas for improvement. The dataset contains booking information for a city hotel and a resort hotel, including when the booking was made, the length of stay, the number of adults, children, and babies, and the number of car parking spaces required, among other things. It consists of 119,390 observations and 32 variables, and each observation represents one hotel booking. The principal aim of this project is to dig deep into this data, identify significant trends and patterns in hotel booking activity, and present the findings visually for easier understanding.

This analysis helps us understand the data and draw insights that support business decisions about the factors affecting hotel bookings. In the hotel industry, booking cancellations and the average daily rate (ADR) are two important factors that affect the business. By understanding the factors behind cancellations, hotels can take precautions to reduce the cancellation rate. By understanding how ADR varies with different variables, hotels can prepare in advance to generate more revenue and run a profitable business.

Exploratory Data Analysis (EDA) refers to the critical process of performing initial investigations on data in order to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations. The following steps are part of EDA:


1. **Descriptive Statistics.** Summary statistics: calculate basic metrics like mean, median, and standard deviation to understand the distribution of numerical data (e.g., lead time, ADR).
2. **Missing Data Analysis.** Identify missing values: check for and handle missing values, especially in columns with a lot of nulls like company and agent.
3. **Categorical Data Analysis.** Frequency counts: examine how often each category appears in variables like hotel type, meal type, and market segment.
4. **Correlation Analysis.** Correlation matrix: determine relationships between numerical variables, like the correlation between lead time and cancellations.
5. **Bivariate Analysis.** Compare groups: analyze how numerical values vary across different categories (e.g., ADR across different hotel types).
6. **Time Series Analysis.** Trends over time: analyze how bookings, cancellations, and ADR change over months or years to identify seasonality.
7. **Multivariate Analysis.** Explore multiple variables: look at relationships between several variables simultaneously to uncover complex patterns.
8. **Outlier Detection.** Identify outliers: find and analyze extreme values that might affect the analysis, such as very high or low ADR.
9. **Cancellation Analysis.** Cancellation patterns: study factors that contribute to cancellations, such as lead time or previous cancellation history.
10. **Revenue Analysis.** ADR insights: examine factors that influence the average daily rate (ADR), such as room type or booking period.
11. **Customer Behavior Analysis.** Repeat guests: analyze the behavior of repeat guests compared to first-time guests.
12. **Geographical Analysis.** Country distribution: visualize where most guests come from and how it affects booking patterns.
13. **Feature Engineering.** Create new variables: develop new features based on insights (e.g., categorize lead time into short/long).
14. **Hypothesis Testing.** Statistical validation: use tests to confirm whether observed patterns are statistically significant.
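As an illustration of the feature engineering step, the sketch below buckets lead time into categories with `pd.cut`. The toy values, the bin edges (30 and 180 days), and the column name `lead_time_bucket` are assumptions made for this example; they are not taken from the project's data.

```python
import pandas as pd

# Toy lead times in days (illustrative values, not taken from the dataset)
toy = pd.DataFrame({"lead_time": [0, 5, 30, 31, 120, 400]})

# Hypothetical cut-offs: up to 30 days = short, 31 to 180 = medium, above 180 = long
bins = [-1, 30, 180, float("inf")]
labels = ["short", "medium", "long"]

# pd.cut assigns each value to the interval it falls in (right edge inclusive by default)
toy["lead_time_bucket"] = pd.cut(toy["lead_time"], bins=bins, labels=labels)

print(toy)
print(toy["lead_time_bucket"].value_counts().sort_index())
```

The same call could be applied to the real `lead_time` column once suitable cut-offs have been chosen from its distribution.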

**Problem Statement**

We explore a hotel booking dataset, which contains booking information for a city hotel and a resort hotel, to discover the important factors that govern bookings. We analyze key aspects of hotel bookings to identify major weaknesses and gain insights that help run a profitable hotel business.

**Business Objective**

The analysis is guided by questions such as:

* What is the best time of year to book a hotel room?
* What is the optimal length of stay to get the best daily rate?
* Is a hotel likely to receive a disproportionately high number of special requests?

In the hotel industry, cancellations and the average daily rate (ADR) are two important factors in running the business effectively.

* By understanding the factors that determine whether a booking is cancelled, hotels can take the necessary precautions to reduce the cancellation rate.
* By understanding the patterns in ADR against different variables, hotels can prepare in advance to generate more revenue and run a profitable business.

Our goal is to understand such factors in the given dataset by performing Exploratory Data Analysis.

# **GitHub Link -**

https://github.com/Ananya1994das/Projects_/blob/main/Module_2%2C_Hotel_Booking_Analysis_Project.ipynb


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


5. You have to create at least 20 logical & meaningful charts having important insights.

[ Hints : - Do the Vizualization in  a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]







# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Importing the dataset
hotel_booking_df=pd.read_csv('/content/drive/MyDrive/Projects/Hotel Bookings.csv')

### Dataset First View

In [None]:
# Dataset first look: first five rows
hotel_booking_df.head()


In [None]:
hotel_booking_df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
shape=hotel_booking_df.shape
print('No. of rows =',shape[0])
print('No. of columns =',shape[1])

### Dataset Information

In [None]:
# Dataset Info
hotel_booking_df.info()


In [None]:
hotel_booking_df.describe()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
hotel_booking_df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
for i in hotel_booking_df.columns.tolist():
  if hotel_booking_df[i].isnull().sum()>0:
    print(f'total number of missing values in {i} column is {hotel_booking_df[i].isnull().sum()}')


In [None]:
# Check Unique Values for each variable.
for i in hotel_booking_df.columns.tolist():
  print("No. of unique values in ",i,"is",hotel_booking_df[i].nunique(),".")

## ***2. Understanding Your Variables***

**What did you know about your dataset?**

The dataset contains the following variables:-

**Variables Description**


The dataset contains 32 columns, each representing a different attribute related to hotel bookings. These columns include:

**hotel:** Type of the hotel (categorical)

**is_canceled:** Whether the booking is cancelled (cancelled = 1, not cancelled = 0) (categorical, coded as 0/1)

**lead_time:** The number of days elapsed between the booking and the arrival date of the guests (numerical)

**arrival_date_year:** Year of the arrival (numerical)

**arrival_date_month:** Month of the arrival, stored as the month name (categorical)

**arrival_date_week_number:** Week of the arrival (numerical)

**arrival_date_day_of_month:** Day of the arrival (numerical)

**stays_in_weekend_nights:** Number of weekend nights stayed (numerical)

**stays_in_week_nights:** Number of week nights stayed (numerical)

**adults:** Number of adults (numerical)

**children:** Number of children (numerical)

**babies:** Number of babies (numerical)

**meal:** Type of meal (categorical)

**country:** Country of the guest (categorical)

**market_segment:** Which segment the customer belongs to (categorical)

**distribution_channel:** The channel through which the guest made the booking (categorical)

**is_repeated_guest:** Whether the guest is a repeated guest (repeated = 1, not repeated = 0) (categorical, coded as 0/1)

**previous_cancellations:** Number of previous bookings cancelled by the guest (numerical)

**previous_bookings_not_canceled:** Number of previous bookings not cancelled by the guest (numerical)

**reserved_room_type:** Type of the room the guest booked (categorical)

**assigned_room_type:** Room assigned to the guest for the booking (categorical)

**booking_changes:** Number of changes made to the booking (numerical)

**deposit_type:** Type of deposit the guest made (categorical)

**agent:** ID of the agent (categorical)

**company:** ID of the company (categorical)

**days_in_waiting_list:** Number of days the booking spent on the waiting list (numerical)

**customer_type:** Type of the customer (categorical)

**adr:** Average daily rate (ADR) (numerical)

**required_car_parking_spaces:** Number of car parking spaces required by the guest (numerical)

**total_of_special_requests:** Number of special requests made by the guest (numerical)

**reservation_status:** Status of the reservation (categorical)

**reservation_status_date:** Date on which the reservation status was last set (date, stored as object)


*   Null/missing values in the agent, country, children and company columns (completeness issue).
*   The hotel, meal, country, market_segment, distribution_channel, assigned_room_type, reserved_room_type, deposit_type, customer_type, reservation_status, is_canceled and is_repeated_guest columns can be of categorical dtype (validity issue).
*   The dtype of the children, company and agent columns should be int instead of float (validity issue).
*   The dtype of the reservation_status_date column should be datetime instead of object (validity issue).
*   31994 duplicate rows in the dataset (validity issue).
*   The company column and the arrival_date_week_number column are redundant.
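The completeness and duplicate checks behind the issues listed above can be summarised in one small report. The sketch below uses a toy frame with made-up values; the names `toy` and `null_report` are hypothetical and only illustrate the pattern.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values that mimics a few columns of the booking data
toy = pd.DataFrame({
    "country": ["PRT", "GBR", None, "PRT", "PRT"],
    "children": [0.0, 1.0, np.nan, 0.0, 0.0],
    "company": [np.nan, np.nan, np.nan, 40.0, np.nan],
})

# Count and percentage of nulls per column, keeping only columns that have nulls
null_report = pd.DataFrame({
    "null_count": toy.isnull().sum(),
    "null_pct": (toy.isnull().mean() * 100).round(1),
})
null_report = null_report[null_report["null_count"] > 0].sort_values("null_pct", ascending=False)

# Number of rows that are exact duplicates of an earlier row
duplicate_rows = int(toy.duplicated().sum())

print(null_report)
print("duplicate rows:", duplicate_rows)
```

Sorting by percentage makes it easier to decide which columns to drop and which to impute.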


## 3. ***Data Wrangling***

In [None]:
# 'agent' & 'company' are two columns with huge amount of null values in it.
# These are also not so essential for this analysis so we can drop these two columns.

new_hotel_df = hotel_booking_df.drop(['agent','company'],axis = 1)
new_hotel_df

# 'new_hotel_df' is the new dataframe we created which does not include 'agent' & 'company' columns.

In [None]:
# 'children' & 'country' are the two remaining columns which have null values.
# We replace the null values in these columns, as they have far fewer nulls than the previous two columns.

# Replacing null values of 'country' column with the mode value of the column.
# mode() returns a Series, so [0] is used to take the value itself rather than its string representation.
new_hotel_df['country'] = new_hotel_df['country'].fillna(new_hotel_df['country'].mode()[0])

# Replacing null values of 'children' column with the rounded mean value of the column.
new_hotel_df['children'] = new_hotel_df['children'].fillna(round(new_hotel_df['children'].mean()))

In [None]:
# Re-checking the number of null values to ensure the data cleaning is successful.
new_hotel_df.isnull().sum()

# We observe that there are no more null values in the dataset so, the data is cleaned.

In [None]:
#deleting all duplicate rows
new_hotel_df.drop_duplicates(inplace=True)

In [None]:
# changing dtype of certain columns to category dtype
cat_lst=['hotel','meal','country','market_segment','distribution_channel','assigned_room_type','reserved_room_type','deposit_type','customer_type','reservation_status', 'is_canceled', 'is_repeated_guest']
for i in cat_lst:
  new_hotel_df[i]=new_hotel_df[i].astype('category')

In [None]:
# changing dtype of certain columns to int dtype
int_lst=['children']
for i in int_lst:
  new_hotel_df[i]=new_hotel_df[i].astype(int)

In [None]:
# changing reservation_status_date column dtype from object to datetime
new_hotel_df['reservation_status_date']= pd.to_datetime(new_hotel_df['reservation_status_date'],format='%Y-%m-%d')

In [None]:
# Creating a new column which will show total number of night stays.
new_hotel_df['total_night_stays'] = new_hotel_df['stays_in_weekend_nights']+ new_hotel_df['stays_in_week_nights']
new_hotel_df

In [None]:
# Creating a new column which will show total number of members.
new_hotel_df.loc[:, 'Total_members'] = (
    new_hotel_df['adults'] +
    new_hotel_df['children'] +
    new_hotel_df['babies']
)


In [None]:
# Creating a new column which will show total number of night stays.
new_hotel_df.loc[:, 'total_no._of_nights'] = (
    new_hotel_df['stays_in_weekend_nights'] +
    new_hotel_df['stays_in_week_nights']
)



In [None]:
#dropping the row containing value of ADR=5400
new_hotel_df=new_hotel_df[new_hotel_df['adr']<5000]


In [None]:
#checking the shape of the dataset after all manipulations
new_hotel_df.shape


In [None]:
new_hotel_df.describe()

In [None]:
new_hotel_df.info()

### What all manipulations have you done and insights you found?

Dropped all the duplicate rows.

Dropped the company and agent columns because of their many null values (about 94% of the company values are null).

Replaced the null values in the country and children columns with the mode and the rounded mean respectively, as these columns have far fewer null values.

Changed the dtype of certain columns to category, int, or datetime.

Dropped the row with ADR = 5400, which is an outlier.

By converting the reservation_status_date column to datetime, we can extract information based on quarters, months, or years. This allows us to analyze booking patterns over different time periods.

We also added the column 'total_no._of_nights' (stored as 'total_night_stays' as well), which represents the total number of nights for each booking, and the column 'Total_members', which represents the total number of guests. These allow us to examine bookings from the perspective of the duration of stay and the size of the party.
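Once a date column has the datetime dtype, pandas exposes its components through the `.dt` accessor. The sketch below shows this on three made-up dates; the derived column names (`status_year`, `status_month`, `status_quarter`) are hypothetical and do not exist in the dataset.

```python
import pandas as pd

# Three made-up dates in the same 'YYYY-MM-DD' format as reservation_status_date
toy = pd.DataFrame({"reservation_status_date": ["2015-07-01", "2016-02-15", "2017-08-31"]})
toy["reservation_status_date"] = pd.to_datetime(toy["reservation_status_date"], format="%Y-%m-%d")

# The .dt accessor extracts calendar components from a datetime column
toy["status_year"] = toy["reservation_status_date"].dt.year
toy["status_month"] = toy["reservation_status_date"].dt.month_name()
toy["status_quarter"] = toy["reservation_status_date"].dt.quarter

print(toy)
```

These derived columns can then be used as grouping keys for counts per year, month, or quarter.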

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

**Let's look into individual attributes and make useful plots to gain insights.**

# Overview of the type of hotel

Since there are only 2 types of hotel (resort or city), a simple bar chart or pie chart is enough to show their shares.

In [None]:
# Enlarging the pie chart
plt.rcParams['figure.figsize'] = 6,6

# Labels: tolist() converts the index of the value counts to a plain list
labels = new_hotel_df['hotel'].value_counts().index.tolist()

# Convert value counts to list
sizes = new_hotel_df['hotel'].value_counts().tolist()

# explode sets how far each wedge is offset from the centre (here only the second wedge is pulled out)
explode = (0, 0.1)

# Determine colour of pie chart
colors = ['lightskyblue','yellow']

# Putting them together: sizes holds the counts, explode the offset of each wedge, colors the wedge colours.
# autopct displays the percent value using Python string formatting; '%1.1f%%' shows one decimal place.
# startangle rotates the start of the pie counter-clockwise from the x-axis by the given angle in degrees.
# textprops adjusts the size of the text.
plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%',startangle=90, textprops={'fontsize': 14})
plt.show()

**What do we see here?**

A large proportion of the bookings are for the city hotel. One possible explanation is that resort hotels tend to be on the expensive side, so most people stick with the city hotel, although this chart alone cannot confirm that.


# Let's have an overview of the number of people who booked the hotel.

In [None]:
# Looking into adults.
# Using groupby to group according to hotel types only.
new_hotel_df['adults'].groupby(new_hotel_df['hotel']).describe()

In [None]:
# Looking into children.
# Using groupby to group according to hotel types only.
new_hotel_df['children'].groupby(new_hotel_df['hotel']).describe()

**What do we see here?**

The mean number of children per booking appears to be higher for the resort hotel. This suggests that resort hotels are the preferred choice for families.

# Overview of canceled bookings

In [None]:
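# 'is_canceled' is a category column at this point, so its categories are renamed to readable labels
# (1 -> 'canceled', 0 -> 'not_canceled') instead of replacing the values one by one.
new_hotel_df['is_canceled'] = new_hotel_df['is_canceled'].cat.rename_categories({1: 'canceled', 0: 'not_canceled'})
canceled_data = new_hotel_df['is_canceled']

sns.countplot(x=canceled_data)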


**What do we see here?**

It seems that approximately 40% of the bookings get canceled, which needs attention. The hotels should look into why these cancellations happen, for example by collecting feedback from guests who cancel.

# Let's look into cancellations among the different types of hotel.

In [None]:
# Cancellation counts per hotel (is_canceled now holds the labels 'canceled' and 'not_canceled')
booking_cancelled = new_hotel_df.groupby(['hotel','is_canceled'])['is_canceled'].count().unstack()
booking_cancelled

In [None]:
# Visualising cancellation data.
booking_cancelled.plot(kind='bar',figsize=(11,6),color=['seagreen','greenyellow'],fontsize=13)
plt.title('CANCELLATION STATUS BY HOTEL',fontsize = 10)
plt.xlabel('HOTEL',fontsize = 15)
plt.ylabel('COUNT',fontsize = 15)
plt.show()

**What do we see here?**

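The city hotel accounts for a large number of the cancellations. This was expected, since most of the bookings belong to the city hotel. Note that the chart shows counts, so comparing the hotels fairly requires the share of canceled bookings within each hotel.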
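To compare the hotels on a like-for-like basis, the counts can be turned into row percentages. The sketch below does this with `pd.crosstab(..., normalize="index")` on a small made-up frame; the values are illustrative only and say nothing about the real cancellation rates.

```python
import pandas as pd

# Made-up bookings: 6 for one hotel and 4 for the other
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 6 + ["Resort Hotel"] * 4,
    "is_canceled": ["canceled", "canceled", "canceled",
                    "not_canceled", "not_canceled", "not_canceled",
                    "canceled", "not_canceled", "not_canceled", "not_canceled"],
})

# normalize="index" divides each row by its total, giving the share per hotel
rates = pd.crosstab(toy["hotel"], toy["is_canceled"], normalize="index") * 100

print(rates.round(1))
```

Applied to the cleaned data, the same call would show whether the city hotel's larger cancellation count also reflects a higher cancellation rate.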


# Overview of arrival period

In [None]:
lst3 = ['hotel', 'arrival_date_year', 'arrival_date_month','arrival_date_day_of_month' ]
period_arrival = new_hotel_df[lst3]
sns.countplot(data= period_arrival, x = 'arrival_date_year', hue = 'hotel')

In [None]:
plt.figure(figsize=(20,5))

sns.countplot(data = period_arrival, x = 'arrival_date_month', hue = 'hotel', order = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']).set_title('Graph showing number of arrival per month',fontsize=20)
plt.xlabel('Month')
plt.ylabel('Count')

In [None]:
plt.figure(figsize=(15,5))

sns.countplot(data = period_arrival, x = 'arrival_date_day_of_month', hue = 'hotel').set_title('Graph showing number of arrival per day', fontsize = 20)

**So what do we see?**

2016 appears to be the year with the most bookings, while the count for 2017 is lower. This comparison is only meaningful if every year is fully covered by the data, so partial years should be kept in mind when reading the chart.

Bookings also rise towards the middle of the year and peak in August, which suggests that the summer is the peak period for hotel bookings.

Arrivals by day of the month fluctuate up and down without a clear trend.

# Let's dig deeper into whether the stay is over a weekend or weekday.

In [None]:
plt.figure(figsize=(15,5))
sns.countplot( data=new_hotel_df, x = 'stays_in_weekend_nights').set_title('Number of stays on weekend nights', fontsize = 20)

In [None]:
plt.figure(figsize=(15,5))
sns.countplot(data = new_hotel_df, x = 'stays_in_week_nights' ).set_title('Number of stays on weekday night' , fontsize = 20)

**What do we see this time?**

It seems that the majority of stays include weekend nights, which is plausible given the weekly holidays.

# Analysis on total nights stays

In [None]:
# Total count of night stays.
night_stays = new_hotel_df['total_night_stays'].value_counts().sort_values()
night_stays

In [None]:
# Visualisation of night stay data till 20 nights.
sns.set(style='whitegrid')
plt.figure(figsize=(12,8))
sns.countplot(x=new_hotel_df['total_night_stays'],palette = 'colorblind')
plt.xlim(0.5,20.5)
plt.title('NIGHT STAYS',fontsize=20)
plt.xlabel('NUMBER OF NIGHTS', fontsize=15)
plt.ylabel('COUNT', fontsize=15)
plt.show()

# The most preferred numbers of night stays by guests are 1, 2, 3, 4, 5 & 7.

**What do we see?**

In general, the largest number of customers stay for 3 nights.

# Kind of visitors coming for a stay

In [None]:
sns.countplot(data = new_hotel_df, x = 'adults', hue = 'hotel').set_title("Number of adults", fontsize = 20)

In [None]:
sns.countplot(data = new_hotel_df, x = 'children', hue = 'hotel').set_title("Number of children", fontsize = 20)

In [None]:
sns.countplot(data = new_hotel_df, x = 'babies', hue = 'hotel').set_title("Number of babies", fontsize = 20)

**What do we see here?**

It seems that the majority of visitors travel in pairs. Those who travel with children or babies show no specific preference for the type of hotel.

# Looking into which countries the visitors are from

**We only consider visitors whose bookings were not canceled.**

In [None]:
country_visitors =new_hotel_df[new_hotel_df['is_canceled'] == 'not_canceled'].groupby(['country']).size().reset_index(name = 'count')

# We will be using Plotly.express to plot a choropleth map. Big fan of Plotly here!
import plotly.express as px

px.choropleth(country_visitors,
                    locations = "country",
                    color= "count",
                    hover_name= "country", # column to add to hover information
                    color_continuous_scale=px.colors.sequential.Plasma,
                    title="Home country of visitors")

**What do we see?**

A large number of visitors come from western Europe, with France, the UK and Portugal being the highest.


# ADR based on top 10 countries

In [None]:
# Countries with best 'adr'.
# Considering top 10 countries.
country_adr = new_hotel_df.groupby(['country'])['adr'].mean().sort_values(ascending = False)[0:10]
print(country_adr)

# Visualisation.
country_adr.plot(kind='bar', figsize=(14,6),color='olivedrab',width=0.8)
plt.title('ADR OF TOP 10 COUNTRIES',fontsize=20)
plt.xlabel('Country',fontsize=15)
plt.ylabel('ADR',fontsize=15)
plt.show()

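# 'DJI' has the highest mean ADR.

**What do we see?**

'DJI' has the highest average ADR among all countries. A mean like this can rest on very few bookings, so the number of bookings per country should be checked before drawing conclusions.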
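A ranking by mean ADR can be dominated by countries with only a handful of bookings. The sketch below shows one way to guard against that by requiring a minimum number of bookings per country; the country codes, ADR values, and the threshold `min_bookings` are all made up for illustration.

```python
import pandas as pd

# Made-up data: 'AAA' has a single, very expensive booking
toy = pd.DataFrame({
    "country": ["AAA", "BBB", "BBB", "BBB", "CCC", "CCC", "CCC"],
    "adr": [300.0, 100.0, 120.0, 110.0, 90.0, 95.0, 85.0],
})

# Mean ADR together with the number of bookings behind it
country_stats = toy.groupby("country")["adr"].agg(mean_adr="mean", bookings="count")

# Hypothetical threshold: only rank countries with at least this many bookings
min_bookings = 3
reliable = (
    country_stats[country_stats["bookings"] >= min_bookings]
    .sort_values("mean_adr", ascending=False)
)

print(country_stats)
print(reliable)
```

In the toy data the single-booking country tops the raw ranking but disappears once the threshold is applied.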

# Looking into market segments and distribution channel



In [None]:
num_segments = new_hotel_df['market_segment'].nunique()

# Define a custom color palette with enough colors
custom_colors = sns.color_palette("Set2", num_segments)

plt.figure(figsize=(10, 5))
sns.countplot(data=new_hotel_df, x='market_segment', hue='market_segment', palette=custom_colors, dodge=False, legend=False)
plt.title('Types of Market Segment', fontsize=20)
plt.show()

In [None]:
plt.figure(figsize=(10,5))
sns.countplot(data = new_hotel_df, x = 'distribution_channel').set_title('Types of distribution channel', fontsize = 20)

**What do we see here?**

The majority of bookings come through distribution channels and market segments that involve travel agencies (online or offline).
Marketing can therefore focus on these travel agencies' websites and on working with them, since most visitors tend to reach out to them.

# Looking into deposit types

In [None]:

plt.figure(figsize=(8,5))
sns.countplot(data =new_hotel_df, x = 'deposit_type').set_title('Graph showing types of deposits', fontsize = 20)

**What do we see here?**

The majority of bookings do not require a deposit. This could help explain the high share of canceled bookings noted earlier, since guests lose nothing by cancelling.

# Overview of repeated guests

In [None]:
sns.countplot(data =new_hotel_df, x = 'is_repeated_guest').set_title('Graph showing whether guest is repeated guest', fontsize = 20)

**What do we see here?**

The number of repeated guests is low.
The hotels should target previous guests with offers or attractive packages so that they visit again.

# Looking at types of guests

In [None]:
sns.countplot(data = new_hotel_df, x = 'customer_type').set_title('Graph showing type of guest', fontsize = 20)

**What do we see here?**

The majority of bookings are transient, which means that the booking is not part of a group or contract. This is consistent with individuals arranging their own stays, although the distribution channel chart above shows that most bookings still go through travel agencies.

# Looking into prices per month per hotel
Average daily rate (ADR) = sum of all lodging transactions / total number of staying nights

Average daily rate per person = ADR / (adults + children)

We need to find the average daily rate per person.
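The per-person formula divides by the number of adults plus children, which is undefined for a booking that lists no guests. The sketch below shows one way to handle that case by turning a zero guest count into a missing value before dividing; the toy numbers are made up, and treating such bookings as missing is an assumption rather than the notebook's method.

```python
import numpy as np
import pandas as pd

# Made-up bookings; the third one lists no adults and no children
toy = pd.DataFrame({
    "adr": [100.0, 90.0, 0.0, 60.0],
    "adults": [2, 1, 0, 2],
    "children": [0, 2, 0, 1],
})

# Guests counted for the per-person rate (babies are left out, as in the formula above)
guests = toy["adults"] + toy["children"]

# Replace a zero guest count with NaN so the division yields NaN instead of inf
toy["adr_pp"] = toy["adr"] / guests.replace(0, np.nan)

print(toy)
```

Rows with a missing `adr_pp` can then be excluded explicitly from any per-person summary.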

In [None]:
# Resizing plot
plt.figure(figsize=(12,5))

# Calculating average daily rate per person
# (bookings with no adults and no children produce inf or NaN here)
new_hotel_df['adr_pp'] = new_hotel_df['adr'] / (new_hotel_df['adults'] + new_hotel_df['children'])

# Keep only the bookings that were not canceled; .copy() avoids a SettingWithCopyWarning below
actual_guests = new_hotel_df.loc[new_hotel_df["is_canceled"] == 'not_canceled'].copy()

# Total price of a stay = ADR * total number of nights
actual_guests['price'] = actual_guests['adr'] * (actual_guests['stays_in_weekend_nights'] + actual_guests['stays_in_week_nights'])
sns.lineplot(data = actual_guests, x = 'arrival_date_month', y = 'price', hue = 'hotel')

**What can we see here?**

The price of a stay at the resort hotel is much higher in August, which may be one reason resort hotels receive fewer bookings. City hotel prices, on the other hand, stay fairly stable throughout the year, which may help explain why city hotels are preferred.

August also has the highest number of bookings, so the resort hotel could consider reducing its prices to attract more bookings.

# Meal Analysis

In [None]:
# Unique meal type.
unique_meal_count=new_hotel_df['meal'].value_counts()
unique_meal_count

In [None]:
# Percentage of meal type count.
percentage_meal_count = new_hotel_df['meal'].value_counts(normalize= True)*100
percentage_meal_count

In [None]:
# Visualisation of percentage meal count
percentage_meal_count.plot(kind = 'pie',labels=percentage_meal_count.index,figsize=(15,10),autopct='%0.1f%%',colors=['lightcoral','yellow','royalblue','brown','white'],fontsize=15)
plt.title('PERCENTAGE OF MEAL PREFERENCE', fontsize=20)
plt.show()

**What do we see?**



The most preferred meal plan is BB (bed and breakfast), which includes a room for the night and breakfast the next morning. It is popular because most customers visit for travel purposes: they normally want one meal included but plan to eat the rest of their meals outside the hotel.

# Requirement of Car parking

In [None]:
# Car parking requirement count.
car_parking = new_hotel_df['required_car_parking_spaces'].value_counts()
car_parking

In [None]:
# Visualisation.
car_parking.plot(kind='line',color='brown',linestyle=':',linewidth=5,figsize =(12,6))
plt.title('CAR PARKING SPACE ANALYSIS',fontsize = 20)
plt.ylabel('COUNT',fontsize = 15)
plt.xlabel('NUMBER OF PARKING SPACE',fontsize = 15)
plt.grid(True)
plt.show()

**What do we see?**

Most bookings do not require a car parking space. Those that do mostly require a single space.

# How do waiting days affect ADR?

In [None]:
# command for applying a style
plt.style.use('dark_background')

In [None]:
# Scatter plot of waiting days against ADR, using the cleaned dataframe so the ADR outlier is excluded
plt.scatter(new_hotel_df['days_in_waiting_list'],new_hotel_df['adr'])
plt.xlabel('Waiting days')
plt.ylabel('ADR')
plt.title('Waiting days VS ADR')
plt.show()

**What do we see?**

Bookings with few waiting days tend to show higher ADR values. This is an association rather than proof of cause, but it suggests that hotels may earn more by keeping the waiting period short.
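A scatter plot shows the direction of a relationship but not its strength. One way to quantify it is a correlation coefficient, sketched below on made-up numbers. The Spearman coefficient is computed here as the Pearson correlation of the ranks, which avoids any dependency beyond pandas; the values are illustrative and do not come from the dataset.

```python
import pandas as pd

# Made-up values in which ADR falls as the waiting time grows
toy = pd.DataFrame({
    "days_in_waiting_list": [0, 0, 10, 50, 100, 200],
    "adr": [150.0, 120.0, 100.0, 90.0, 80.0, 60.0],
})

# Pearson correlation measures the linear association
pearson = toy["days_in_waiting_list"].corr(toy["adr"])

# Spearman correlation = Pearson correlation of the ranks (ties get average ranks)
spearman = toy["days_in_waiting_list"].rank().corr(toy["adr"].rank())

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

A coefficient close to zero on the real data would indicate that waiting days explain little of the variation in ADR.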

# ADR on the basis of month

In [None]:

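# Total ADR per arrival month
plt.style.use('ggplot')
# groupby sorts month names alphabetically, so reindex them into calendar order
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
               'August', 'September', 'October', 'November', 'December']
monthly_adr = hotel_booking_df.groupby('arrival_date_month')['adr'].sum().reindex(month_order)
plt.plot(monthly_adr)
plt.xlabel('Months')
plt.ylabel('Total ADR')
plt.title('Months vs ADR')
plt.xticks(rotation='vertical')
plt.show()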

**What do we see?**

August has the highest total ADR and January has the lowest.
In the months that generate less ADR, hotels should provide more attractive offers to bring in more customers, so that they can earn more in those months as well.
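The total ADR of a month depends both on how many bookings it has and on how high the rates are. The sketch below separates the two by reporting the sum, the mean, and the booking count side by side; the toy values and the shortened month list are assumptions for illustration.

```python
import pandas as pd

# Shortened calendar order for the toy example
month_order = ["January", "July", "August"]

# Made-up bookings: August has many cheaper bookings, January a single expensive one
toy = pd.DataFrame({
    "arrival_date_month": ["August", "August", "August", "August", "January", "July", "July"],
    "adr": [100.0, 110.0, 90.0, 100.0, 150.0, 120.0, 130.0],
})

# Sum, mean and count per month, reordered from alphabetical to calendar order
monthly = (
    toy.groupby("arrival_date_month")["adr"]
    .agg(total_adr="sum", mean_adr="mean", bookings="count")
    .reindex(month_order)
)

print(monthly)
```

In the toy data the month with the highest total is not the month with the highest mean, which is why both views are worth checking on the real data.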

## **5. Solution to Business Objective**

* Since the share of Resort Hotel bookings is lower than that of City Hotel bookings, some pricing adjustments may be required. We also saw that resort hotels are preferred by families, so add-on activity packages could be included to justify the pricing.

* To reduce the number of cancelled bookings, the hotels should collect feedback from customers. This may give insights into why the cancellation rate is high.

* The drop in bookings compared to the previous year may signal a problem that management should look into. Targeting the months from May to August would be preferable, as those are the peak months due to the summer period.

* Attractive packages can be provided for customers staying for long periods. This may increase the number of night stays.

* For countries with a low number of bookings, reducing the cost of booking by up to 50% can be a good start for attracting visitors, whereas for countries with more visitors, seasonal offers can be a good way to encourage repeat visits.

* For the market segments that contribute few bookings, hotels should provide domain-specific offers so that these market segments can also bring more customers to the hotels.


* The majority of bookings come through distribution channels and market segments that involve travel agencies (online or offline). Marketing can focus on these travel agencies' websites and on working with them, since most visitors tend to reach out to them.

* The high rate of cancellations may be partly due to the fact that most bookings require no deposit.

* Given that the hotels have few repeated guests, management should target previous guests with advertising to increase the number of returning guests.

* Most bookings request no car parking space, and those that do usually need only one. Since some guests prefer to travel by personal vehicle, the hotels should make sure that parking is available where it is requested.

* Bookings with few waiting days tend to show higher ADR values. This is an association rather than proof of cause, but hotels may earn more by keeping the waiting period short.










# **Conclusion**

To address the lower booking rates for Resort Hotels compared to City Hotels, pricing customizations and family-oriented activity packages should be considered. Collecting customer feedback can provide insights to reduce cancellation rates. The drop in bookings compared to the previous year highlights the need for targeted strategies, especially during peak months (May-August). Offering attractive packages for long stays, discounts for low-booking countries, and domain-specific offers for underperforming market segments can help increase bookings. Collaboration with travel agencies and addressing high no-deposit policies could reduce cancellations. Additionally, enhancing car parking availability and focusing on reducing waiting days can improve profitability and guest retention.