<a href="https://colab.research.google.com/github/Anas182000/HOTEL-BOOKING-ANALYSIS/blob/main/ON_MY_OWN_HOTEL_BOOKING_ANALYSIS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay in order to get the best daily rate? What if you wanted to predict whether or not a hotel was likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore those questions!
# This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. All personally identifying information has been removed from the data. Explore and analyze the data to discover important factors that govern the bookings.

# **When planning a vacation or leisure trip, we usually book a hotel before the arrival date. Prices during festive or vacation periods are often higher than on normal days, and booking a room becomes harder. In this project we analyse the factors responsible for these price fluctuations and for hotel room bookings.**


In [1]:
from google.colab import drive

drive.mount("/content/drive")

Mounted at /content/drive


In [None]:
#importing all the important libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np




In [None]:
# Loading the dataset from the mounted Drive
path = "/content/drive/MyDrive/HOTEL BOOKING ANALYSIS/Hotel Bookings.csv"
df = pd.read_csv(path)

In [None]:
df.describe()   # summary statistics (count, mean, std, min, quartiles, max) for the numeric columns

In [None]:
df.info()   # column names, non-null counts, dtypes, and memory usage

In [None]:
df.shape  # (number of rows, number of columns)

In [None]:
df.head()   # first n rows of the data frame (five by default)

In [None]:
df.tail()   # last n rows of the data frame (five by default)

In [None]:
df.columns

# **1**) Getting useful and meaningful insights from the dataset

To get meaningful results we first have to explain each column in the dataset. This is an important step, and we will do this work in an Excel spreadsheet.



1)is_canceled : Value indicating if the booking was canceled (1) or not (0)

2)lead_time : Number of days that elapsed between the entering date of the booking into the PMS (property management system) and the arrival date

3)arrival_date_year : Year of arrival date

4)arrival_date_month : Month of arrival date

5)arrival_date_week_number : Week number of year for arrival date

6)arrival_date_day_of_month : Day of arrival date

7)stays_in_weekend_nights : Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel

8)stays_in_week_nights : Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel

9)adults : Number of adults

10)children : Number of children

11)babies : Number of babies

12)meal : Type of meal booked. Categories are presented in standard hospitality meal packages:

13)country : Country of origin.

14)market_segment : Market segment designation. In categories, the term “TA” means “Travel Agents” and “TO” means “Tour Operators”

15)distribution_channel : Booking distribution channel. The term “TA” means “Travel Agents” and “TO” means “Tour Operators”

16)is_repeated_guest : Value indicating if the booking name was from a repeated guest (1) or not (0)

17)previous_cancellations : Number of previous bookings that were cancelled by the customer prior to the current booking

18)previous_bookings_not_canceled : Number of previous bookings not cancelled by the customer prior to the current booking

19)reserved_room_type : Code of room type reserved. Code is presented instead of designation for anonymity reasons.

20)assigned_room_type : Code for the type of room assigned to the booking.

21)booking_changes : Number of changes/amendments made to the booking from the moment the booking was entered on the PMS until the moment of check-in or cancellation

22)deposit_type : Indication on if the customer made a deposit to guarantee the booking.

23)agent : ID of the travel agency that made the booking

24)company : ID of the company/entity that made the booking or responsible for paying the booking.

25)days_in_waiting_list : Number of days the booking was in the waiting list before it was confirmed to the customer

26)customer_type : Type of booking, assuming one of four categories

27)adr : Average Daily Rate as defined by dividing the sum of all lodging transactions by the total number of staying nights

28)required_car_parking_spaces : Number of car parking spaces required by the customer

29)total_of_special_requests : Number of special requests made by the customer (e.g. twin bed or high floor)

30)hotel : Hotel(Resort Hotel or City Hotel)

31)reservation_status : Reservation last status, assuming one of three categories

a)Canceled – booking was canceled by the customer 

b)Check-Out – customer has checked in but has already departed

c)No-Show – customer did not check-in and did inform the hotel of the reason why

32)reservation_status_date : Date at which the last status was set. This variable can be used together with reservation_status to understand when the booking was canceled or when the customer checked out of the hotel
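The ADR definition in item 27 can be checked with a small worked example. This is only a sketch: the three nightly charges below are made-up numbers, not values from the dataset.

```python
# Hypothetical booking: three nights with these lodging charges (illustrative numbers only).
lodging_transactions = [90.0, 110.0, 100.0]
staying_nights = len(lodging_transactions)

# ADR = sum of all lodging transactions / total number of staying nights
adr = sum(lodging_transactions) / staying_nights
print(adr)
```

So a guest who pays 300 in total over three nights has an ADR of 100, regardless of how the charges were spread across the nights.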

In [None]:
df["reservation_status_date"]

In [None]:
#Changing the object type to date time
df['reservation_status_date'] = pd.to_datetime(df['reservation_status_date'], errors='coerce')

In [None]:
df["reservation_status_date"]
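To see what `errors='coerce'` does in the conversion above, here is a minimal sketch on a hypothetical three-value Series (the dates are invented for illustration). Values that cannot be parsed become `NaT` instead of raising an error, so it is worth counting them afterwards.

```python
import pandas as pd

# Hypothetical values: two valid dates and one impossible date (30 February).
raw = pd.Series(["2015-07-01", "2016-02-30", "2017-09-14"])

# errors="coerce" turns anything unparseable into NaT instead of raising.
parsed = pd.to_datetime(raw, errors="coerce")

# Count how many values were lost in the conversion.
n_bad = int(parsed.isna().sum())
print(parsed.dtype, n_bad)
```

If the same check on `df['reservation_status_date']` reports a non-zero count, those rows had malformed dates in the CSV.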

# First of all, we will clean the data by removing duplicate rows and handling the null values

In [None]:
# Counting the duplicate rows in the dataset
# What is a duplicate observation? df.duplicated() flags a row as True when it exactly matches an earlier row
duplicate_Value = df.duplicated().value_counts()    # True = duplicate rows, False = original rows
duplicate_Value
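How `duplicated()` and `drop_duplicates()` behave is easiest to see on a tiny frame. The sketch below uses hypothetical toy data, not the hotel dataset. Only exact matches on every column are flagged, and the first occurrence is kept.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 2 are identical.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel"],
    "adr": [100.0, 80.0, 100.0],
})

# duplicated() marks each repeat of an earlier row as True (the first occurrence stays False).
dup_mask = toy.duplicated()
n_duplicates = int(dup_mask.sum())

# drop_duplicates() keeps the first occurrence of each distinct row.
deduped = toy.drop_duplicates()
print(n_duplicates, len(deduped))
```

Rows that are only "nearly" the same (for example, differing in one column) are not treated as duplicates by these methods.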

# *1)Finding The Duplicate and Original Rows in a Dataset*

In [None]:
#Now we will visualize the duplicate and original rows with a count plot
plt.figure(figsize=(10,10))
sns.countplot(x=df.duplicated())


# **Conclusion**  : There are 31994 duplicate rows in our dataset, so we will drop those rows from the dataset

In [None]:
#Dropping the duplicate rows from the dataset
df= df.drop_duplicates()

In [None]:
#Checking the number of rows and columns again
print(f'We have a total of {df.shape[0]} rows and {df.shape[1]} columns.')

In [None]:
#Looking for the Null/NaN/Missing Values :
df.isna().sum().sort_values(ascending=False)[:10]

In [None]:
# Filling/replacing null values with 0.
# Assigning the result back avoids chained inplace calls, which newer pandas versions warn about.
null_columns=['agent','children','company']
df[null_columns] = df[null_columns].fillna(0)


# Replacing missing country values with 'Undefined'
df['country'] = df['country'].fillna('Undefined')


In [None]:
#Verifying that the null values have been handled
df.isna().sum().sort_values(ascending=False)[:6]
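As an alternative to filling the columns one by one, `fillna` also accepts a per-column mapping. The following is a sketch on a hypothetical toy frame that borrows the dataset's column names. The fill values match the ones used in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in the same columns the notebook fills.
toy = pd.DataFrame({
    "agent": [9.0, np.nan],
    "children": [np.nan, 2.0],
    "company": [np.nan, np.nan],
    "country": ["PRT", None],
})

# One fillna call with a column -> fill value mapping; it returns a new frame.
filled = toy.fillna({"agent": 0, "children": 0, "company": 0, "country": "Undefined"})

remaining_nulls = int(filled.isna().sum().sum())
print(remaining_nulls)
```

Keep in mind that 0 in `agent` and `company` then means "no agent/company recorded", not a real ID.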

In [None]:
df['arrival_date_month'].value_counts()

## *2) Most Preferred Month for Travel*

In [None]:
df['arrival_date_month'].value_counts().plot.pie(explode=[0.05]*12, autopct='%1.1f%%', shadow=True, figsize=(10,8),fontsize=20) 
plt.title("pie chart for most preferred month")  

In [None]:
df["hotel"].value_counts()

# *3) Most Preferred Hotel by Travellers*

In [None]:
df['hotel'].value_counts().plot.pie(explode=[0.05,0.05],autopct='%1.1f%%', shadow=True, figsize=(10,8),fontsize=20) 
plt.title("Pie Chart for most preferred Hotel ")  

## **Conclusion**: The hotel most preferred by travellers is the City Hotel

# *4) Checking which hotel has the better ADR (Average Daily Rate)*

In [None]:
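#Grouping the bookings by hotel
group_by_hotel=df.groupby('hotel')

In [None]:
#Calculating the average ADR per hotel (selecting 'adr' first, since mean() on text columns fails in recent pandas)
highest_adr=group_by_hotel['adr'].mean().reset_index()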

#Setting The Plot Size 
plt.figure(figsize=(8,10))

#Setting The Labels Of the Graph
plt.xlabel('Hotel Type')
plt.ylabel('ADR')
plt.title("Avg ADR of each Hotel Type")

#Plotting The Graph
sns.barplot(x=highest_adr['hotel'],y=highest_adr['adr'])

# **CONCLUSION** : The City Hotel has the higher average ADR, which suggests it earns more revenue per occupied night
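ADR is a per-night figure, so total revenue also depends on how many nights guests stay. A rough way to check this is to multiply ADR by the total nights of each booking. The sketch below uses the dataset's column names on hypothetical toy rows. It is an estimate under the assumption that every booking is actually stayed (cancellations are ignored).

```python
import pandas as pd

# Hypothetical toy bookings using the dataset's column names.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel"],
    "adr": [100.0, 120.0, 90.0],
    "stays_in_weekend_nights": [0, 2, 2],
    "stays_in_week_nights": [1, 1, 5],
})

# Rough revenue estimate per booking: nightly rate times total nights.
toy["total_nights"] = toy["stays_in_weekend_nights"] + toy["stays_in_week_nights"]
toy["est_revenue"] = toy["adr"] * toy["total_nights"]

# Compare average ADR with estimated revenue for each hotel.
summary = toy.groupby("hotel").agg(
    mean_adr=("adr", "mean"),
    est_revenue=("est_revenue", "sum"),
)
print(summary)
```

In this toy example the resort has the lower ADR but the higher estimated revenue because of longer stays, so both numbers are worth looking at on the real data.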

# *5) Which country do the most travellers come from?*

ABBREVIATIONS OF COUNTRIES :


GBR - Great Britain

PRT - Portugal

FRA - France

ESP - Spain

ITA - Italy

IRL - Ireland

BRA - Brazil

DEU - Germany

NLD - Netherlands

BEL - Belgium

In [None]:
Data_country=df["country"].value_counts()[:10]   # Collecting the ten countries with the most travellers

In [None]:
#Visualizing the percentage of travellers from each country with a pie chart
Data_country.plot.pie(explode=[0.05]*10,autopct='%1.1f%%', shadow=True, figsize=(10,8),fontsize=20)
plt.title("Pie Chart of Travellers by Country (Top 10)")

# CONCLUSION : The largest number of travellers come from PRT (Portugal)

In [None]:
# Counting the travellers from each country (top ten)
# rename_axis + reset_index(name=...) gives the same column names on old and new pandas versions
Data_country=df['country'].value_counts().rename_axis('country').reset_index(name='count of guests')[:10]


# Visualizing the counts with a bar plot
plt.figure(figsize=(20,8))
sns.barplot(x=Data_country['country'],y=Data_country['count of guests'])
plt.xlabel('Country')
plt.ylabel('Number of guests',fontsize=12)
plt.title("Number of guests from different Countries")
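The column names produced by `value_counts().reset_index()` changed in pandas 2.0 (older versions give `index` and the column name, newer ones give the column name and `count`), so a rename keyed on `'index'` can silently stop working. A version-independent pattern is sketched below on a hypothetical toy Series.

```python
import pandas as pd

# Hypothetical toy column of country codes.
countries = pd.Series(["PRT", "PRT", "GBR", "FRA", "PRT", "GBR"], name="country")

# Name the index explicitly, then name the counts column while resetting the index.
counts = (
    countries.value_counts()
    .rename_axis("country")
    .reset_index(name="count of guests")
)
print(counts)
```

The same pattern applies to any other `value_counts()` table built in this notebook.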

In [None]:
df["customer_type"].value_counts()

# *6) Which Type of Customer Has the Most Bookings?*

In [None]:
df["customer_type"].value_counts().plot.pie(explode=[0.05]*4,autopct='%1.1f%%', shadow=True, figsize=(10,8),fontsize=20)
plt.title("Pie Chart for type of customer ")

# CONCLUSION : Transient (82.4%) is the most common customer type among the hotel bookings

# *7) Which agent has brought the maximum number of travellers?*

In [None]:
# Highest number of bookings made by agents
Highest_bookings= df.groupby('agent').size().reset_index(name='Most_Bookings').sort_values(by='Most_Bookings',ascending=False)

# Agent 0 is the placeholder we used for missing (NaN) agent IDs, i.e. bookings without an agent, so we drop it
Highest_bookings.drop(Highest_bookings[Highest_bookings['agent']==0].index,inplace=True) 

# taking top 10 bookings made by agent
top_ten_highest_bookings=Highest_bookings[:10]

top_ten_highest_bookings

In [None]:
#Visualizing the result with a bar plot

plt.figure(figsize=(20,10))
sns.barplot(x=top_ten_highest_bookings['agent'],y=top_ten_highest_bookings['Most_Bookings'],order=top_ten_highest_bookings['agent'])
plt.xlabel('Agent Number')
plt.ylabel('Number of Bookings')
plt.title("Most Bookings Made by the Agent")


# CONCLUSION : **Agent Number 9** has the maximum number of bookings, so offering Agent 9 a better commission could encourage even more bookings through them

# *8) Which type of meal is most in demand among travellers?*

Types of Meal in Hotels:

BB - (Bed and Breakfast)

HB - (Half Board)

FB - (Full Board)

SC - (Self Catering)


In [None]:
df["meal"].value_counts()

In [None]:
plt.figure(figsize=(20,8))
sns.countplot(x=df["meal"])
plt.xlabel('Meal Type')
plt.ylabel('Count')
plt.title("Most Preferred Meal Type")

In [None]:
#plotting a pie chart
df["meal"].value_counts().plot.pie(explode=[0.05]*5,autopct='%1.1f%%' , shadow=True, figsize=(10,8),fontsize=20)
plt.title("Pie Chart for Meal Type")


# CONCLUSION : BB (Bed & Breakfast) is the meal type most preferred by travellers

# *9) What is the Percentage of Cancelled Bookings?*

0 ----> Denotes bookings that were not cancelled

1 ----> Denotes bookings that were cancelled

In [None]:
df["is_canceled"].value_counts()

In [None]:
df['is_canceled'].value_counts().plot.pie(explode=[0.05, 0.05], autopct='%1.1f%%', shadow=True, figsize=(10,10),fontsize=30)
plt.title("Cancelled and Non Cancelled Booking")

# CONCLUSION : Only 27.5% of the bookings were cancelled
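The percentage shown in the pie chart can also be computed directly. Because `is_canceled` is a 0/1 flag, its mean is the cancellation rate. The sketch below uses a hypothetical four-booking Series rather than the real column.

```python
import pandas as pd

# Hypothetical flags: 1 = cancelled, 0 = not cancelled.
is_canceled = pd.Series([0, 1, 0, 0])

# The mean of a 0/1 column is the share of ones.
cancel_pct = is_canceled.mean() * 100

# value_counts(normalize=True) gives the share of every category at once.
shares = is_canceled.value_counts(normalize=True) * 100
print(cancel_pct)
print(shares)
```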

# *10) Most Reserved Room Type by Travellers*

In [None]:
#only taking the top six reserved room types
reserved_type_room=df["reserved_room_type"].value_counts().rename_axis('room_type').reset_index(name='count_of_room')[:6]




In [None]:
#Visualizing the data with a bar plot
plt.figure(figsize=(20,8))
sns.barplot(x=reserved_type_room['room_type'],y=reserved_type_room['count_of_room'])
plt.xlabel('Room_Type')
plt.ylabel('Count_Room_Type',fontsize=12)
plt.title("Most Reserved Room Type")

# CONCLUSION : Room type A is the most reserved room type, so raising its price could be one way to generate more revenue

# *11) Which Deposit Type Has the Most Cancellations?*

In [None]:
df["deposit_type"].value_counts()

In [None]:
# Visualizing the data with a count plot
#Setting The Plot size
plt.figure(figsize=(12,8))


sns.countplot(x=df['deposit_type'],hue=df['is_canceled'])
plt.title("Cancellations by Deposit Type")

# CONCLUSION : In absolute numbers, the deposit type "No Deposit" has the most cancellations compared to the other deposit types
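Raw counts favour whichever deposit type has the most bookings, so it can help to look at the cancellation rate within each type as well. The following sketch uses hypothetical toy rows with the dataset's column names. The real proportions may look quite different.

```python
import pandas as pd

# Hypothetical toy bookings: six without a deposit, one non-refundable.
toy = pd.DataFrame({
    "deposit_type": ["No Deposit"] * 6 + ["Non Refund"],
    "is_canceled": [1, 1, 0, 0, 0, 0, 1],
})

# For each deposit type: number of cancellations, number of bookings, and their ratio.
summary = toy.groupby("deposit_type")["is_canceled"].agg(
    cancellations="sum",
    bookings="count",
    cancel_rate="mean",
)
print(summary)
```

Here "No Deposit" has more cancellations in total but a lower cancellation rate, which is the kind of difference the count plot alone cannot show.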

In [None]:
df["stays_in_weekend_nights"].value_counts()[:8]

In [None]:
df["stays_in_week_nights"].value_counts()

# *12) Checking the Percentage of Repeated Guests in the Hotels*


0----> Denotes Non_Repeated_Guests

1----> Denotes Repeated_Guests


In [None]:
#Counting Repeated and Non Repeated Guests
df["is_repeated_guest"].value_counts()

In [None]:
#Visualizing The Repeated And Non Repeated Guests Through Pie Chart
df["is_repeated_guest"].value_counts().plot.pie(explode=[0.05,0.05],autopct='%1.1f%%', shadow=True, figsize=(10,10),fontsize=30)

# overall retention rate (share of repeated guests across both hotels)

reten_guest = df.groupby(['is_repeated_guest'])['is_repeated_guest'].count()
reten_dig = reten_guest.plot(kind = 'pie', subplots = True, figsize=(10,5), autopct='%1.1f%%', title = 'Overall retention rate')


# CONCLUSION : The retention percentage of the hotels is very low (3.9%)
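The figure above is for all bookings together. To compare the two hotels, the same 0/1 flag can be averaged per hotel. This is a sketch on hypothetical toy rows, so the percentages it prints say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical toy bookings using the dataset's column names.
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 2,
    "is_repeated_guest": [0, 0, 0, 1, 0, 1],
})

# Mean of the 0/1 flag per hotel = share of bookings made by repeated guests.
retention_pct = toy.groupby("hotel")["is_repeated_guest"].mean() * 100
print(retention_pct)
```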

# *13) Checking the Car Parking Spaces Required by Customers*

In [None]:
#Count of Car Parking Spaces
df["required_car_parking_spaces"].value_counts()

In [None]:
#Visualizing the data with a pie chart
df["required_car_parking_spaces"].value_counts().plot.pie(explode=[0.05]*5,autopct='%1.1f%%', shadow=True, figsize=(10,10),fontsize=30)
plt.title(" Percentage Of Car Parking Spaces Required ",fontsize=30)

 

# CONCLUSION : Almost 91.6% of customers don't need a parking space, so parking is not a major concern for generating revenue

# *14) Booking Changes*

In [None]:
booking_changes_df=df['booking_changes'].value_counts().rename_axis('number_booking_changes').reset_index(name='Counts')

plt.figure(figsize=(12,8))
sns.barplot(x=booking_changes_df['number_booking_changes'],y=booking_changes_df['Counts']*100/df.shape[0])
plt.title("% of Booking change")
plt.xlabel('Number of booking changes')
plt.ylabel('Percentage(%)')

In [None]:
df["booking_changes"].value_counts()

In [None]:
df

In [None]:
df.corr(numeric_only=True)   # correlation matrix of the numeric columns (text columns are skipped)
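A correlation matrix is only defined for numeric columns, and recent pandas versions raise an error if text columns such as `hotel` are included. One way to make this explicit, sketched here on a hypothetical toy frame, is to select the numeric columns first with `select_dtypes`.

```python
import pandas as pd

# Hypothetical toy frame: one text column and two numeric columns.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    "lead_time": [10, 20, 30, 40],
    "adr": [50.0, 60.0, 70.0, 80.0],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric = toy.select_dtypes(include="number")
corr = numeric.corr()
print(corr)
```

On the real data, sorting one column of the matrix (for example `corr["adr"]`) is a quick way to see which numeric features move together with the daily rate.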