
# **Project Name**    - Hotel Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

This project analyzes hotel booking data for two types of hotels: City Hotel and Resort Hotel. The dataset used in this project contains a total of 119390 rows and 32 columns. The project is divided into three phases: **'Data Collection phase', 'Data Cleaning and Manipulation phase' and 'Exploratory Data Analysis (EDA) phase'.**

In the **Data Collection phase**, the columns of the dataset were explored using **head(), tail(), info(), describe()** and the **columns** attribute. The relevant columns in the dataset include **hotel,** **is_canceled, lead_time, arrival_date_year, arrival_date_month, arrival_date_week_number, arrival_date_day_of_month, and stays_in_weekend_nights.**

In the **Data Cleaning and Manipulation phase**, the data types of the columns were checked and corrected as needed. Duplicate rows were also removed: **31,994** duplicates were dropped from the dataset, leaving **87,396** unique rows.

In the **EDA phase**, the data was prepared for visualization by checking for null values and filling or dropping columns as necessary. Various charts and visualizations were then used to gain insights and meet the project's business objectives.

Overall, this project provides a comprehensive analysis of the hotel booking data for two types of hotels, using a range of data cleaning, manipulation, and visualization techniques to gain valuable insights into the industry.




# **GitHub Link -** 

https://github.com/anirudhawagh/Hotel-booking-Analysis-Project/blob/main/Aniruddha_Wagh_Hotel_Booking_Analysis_EDA_Submission.ipynb

# **Problem Statement**


Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay in order to get the best daily rate? What if you wanted to predict whether or not a hotel was likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore those questions! This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. All personally identifying information has been removed from the data. Explore and analyze the data to discover important factors that govern the bookings.


#### **Define Your Business Objective?**

Analyse the data on bookings of City Hotel and Resort Hotel to gain insights on the different factors that affect the booking. This is undertaken as an individual project.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
     [ Note: - Deployment Ready Code is defined as: the whole .ipynb notebook should be executable in one go without a single error logged. ]

3.   Every piece of logic should have proper comments.
4. You may add as many charts as you want. Make sure the following questions are answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule. 

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from datetime import datetime
import seaborn as sns
import ast 

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Dataset link: https://drive.google.com/file/d/1C9AxF9fcVzMw0Bgs0NaRrNML2WwX1Ehm/view

### Dataset Loading

In [None]:
# Load Dataset
database ='/content/drive/MyDrive/Hotel Bookings.csv' 
hotel_booking =pd.read_csv(database)

### Dataset First View

In [None]:
# Dataset First Look
hotel_booking

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print(hotel_booking.shape)  # (number of rows, number of columns)
print('\n')
print(hotel_booking.columns)

### Dataset Information

In [None]:
# Dataset Info
hotel_booking.info()

#### Duplicate Values

In [None]:
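# Dataset Duplicate Value Count
dup_count = hotel_booking.duplicated().sum()
print(dup_count)  # total rows = 119390, duplicate rows = 31994

# Drop the duplicate rows and count the rows that remain
hotel_booking.drop_duplicates(inplace = True)
uni_num_of_rows = hotel_booking.shape[0]

uni_num_of_rows # now unique rows = 87396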

In [None]:
hotel_booking = hotel_booking.reset_index(drop=True)  # renumber rows 0..n-1 after dropping duplicates
hotel_booking  # View unique data

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
# pandas already marks missing entries as NaN, so nothing needs to be replaced here
hotel_booking.isnull().sum().sum()  # total number of missing cells in the dataset

In [None]:
# Count of missing values per column, largest first
miss_values = hotel_booking.isnull().sum().sort_values(ascending=False)
miss_values # null-value count for each column
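The per-column null percentages that the Data Wrangling section computes one column at a time can also be obtained in a single pass, because the mean of a boolean null mask is the fraction of missing cells. A minimal sketch on a small made-up frame (`toy` and its values are illustrative, not the real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing values (not the real dataset)
toy = pd.DataFrame({
    "company": [np.nan, np.nan, np.nan, 40.0],
    "agent": [9.0, np.nan, 240.0, 9.0],
    "country": ["PRT", "GBR", None, "FRA"],
})

# isnull().mean() gives the fraction of missing cells per column;
# multiplying by 100 turns it into a percentage
missing_pct = (toy.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```

Applied to `hotel_booking`, the same expression would list every column's null percentage at once.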

### What did you know about your dataset?

This dataset contains booking information for a city hotel and a resort hotel. It has 119,390 rows and 32 columns; 31,994 duplicate rows were removed, leaving 87,396 unique rows. The dataset includes information such as booking dates, length of stay, the number of adults, children and babies, and the number of required car parking spaces. The columns hold different data types, such as integer, floating-point or string values. Some columns had inappropriate data types, which were corrected later. The unique values of each column were also examined.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df_column = hotel_booking.columns
df_column

In [None]:
# Dataset Describe
hotel_booking.describe()

### Variables Description 

The columns and the data it represents are listed below:

1)hotel : Name of the hotel (Resort Hotel or City Hotel)

2)is_canceled : If the booking was canceled (1) or not (0)

3)lead_time: Number of days before the actual arrival of the guests

4)arrival_date_year : Year of arrival date

5)arrival_date_month : Month of arrival date

6)arrival_date_week_number : Week number of year for arrival date

7)arrival_date_day_of_month : Day of arrival date

8)stays_in_weekend_nights : Number of weekend nights (Saturday or Sunday) spent at the hotel by the guests.

9)stays_in_week_nights : Number of weeknights (Monday to Friday) spent at the hotel by the guests.

10)adults : Number of adults among guests

11)children : Number of children among guests

12)babies : Number of babies among guests

13)meal : Type of meal booked

14)country : Country of guests

15)market_segment : Designation of market segment

16)distribution_channel : Name of booking distribution channel

17)is_repeated_guest : If the booking was from a repeated guest (1) or not (0)

18)previous_cancellations : Number of previous bookings that were cancelled by the customer prior to the current booking

19)previous_bookings_not_canceled : Number of previous bookings not cancelled by the customer prior to the current booking

20)reserved_room_type : Code of room type reserved

21)assigned_room_type : Code of room type assigned

22)booking_changes : Number of changes/amendments made to the booking

23)deposit_type : Type of the deposit made by the guest

24)agent : ID of travel agent who made the booking

25)company : ID of the company that made the booking

26)days_in_waiting_list : Number of days the booking was in the waiting list

27)customer_type : Type of customer, assuming one of four categories

28)adr : Average Daily Rate, as defined by dividing the sum of all lodging transactions by the total number of staying nights

29)required_car_parking_spaces : Number of car parking spaces required by the customer

30)total_of_special_requests : Number of special requests made by the customer

31)reservation_status : Reservation status (Canceled, Check-Out or No-Show)

32)reservation_status_date : Date at which the last reservation status was updated
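The `revenue` column created in the Data Wrangling section follows directly from the ADR definition above: since ADR is the total lodging amount divided by the number of nights stayed, multiplying it back by the nights gives the booking's lodging revenue (zero for a zero-night stay). In the notebook's column names:

$$
\text{revenue} = \text{adr} \times \left(\text{stays\_in\_week\_nights} + \text{stays\_in\_weekend\_nights}\right)
$$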

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
print(hotel_booking.apply(lambda col: col.unique()))

## 3. ***Data Wrangling***

**Data Cleaning**

In [None]:
# To fill the NaN values, first check which columns have null values (already stored in miss_values).
miss_values[:4]


In [None]:
# Let's check the percentage of null values in each column, starting with company
# (look the count up by column name rather than by position)
percentage_company_null = miss_values['company'] / uni_num_of_rows * 100
percentage_company_null

In [None]:
# It is better to drop the column 'company' altogether since the number of missing values is extremely high compared to the number of rows.

hotel_booking.drop(['company'], axis=1, inplace=True)

In [None]:
# now let's check for agent

percentage_agent_null = miss_values['agent'] / uni_num_of_rows * 100
percentage_agent_null

In [None]:
# agent has far fewer null values than company, so instead of dropping the column,
# a missing agent ID is treated as "no agent" and filled with 0
hotel_booking['agent'] = hotel_booking['agent'].fillna(0)
hotel_booking['agent'].isnull().sum() # re-check that the column has no null value

In [None]:
#Check the percentage null value in country col

percentage_country_null = miss_values['country'] / uni_num_of_rows * 100
percentage_country_null

In [None]:
# Very few values are missing in country, so replace the nulls with 'others' as the country name.
hotel_booking['country'] = hotel_booking['country'].fillna('others')
hotel_booking['country'].isnull().sum() # re-check that the column has no null value

In [None]:
#Check the percentage null value in children col

percentage_children_null = miss_values['children'] / uni_num_of_rows * 100
percentage_children_null

In [None]:
# Very few values are missing in children, so replace the nulls with 0 (no children).
hotel_booking['children'] = hotel_booking['children'].fillna(0)
hotel_booking['children'].isnull().sum() # re-check that the column has no null value

In [None]:
# Check whether the dataset still has any null values
hotel_booking.isnull().sum() # no column has any null value left

**Change in datatype for required columns**

In [None]:
#showing the info of the data to check datatype
hotel_booking.info()

In [None]:
# The children & agent columns have dtype float although they only hold whole numbers, so change their dtype to 'int64'
hotel_booking[['children', 'agent']] = hotel_booking[['children', 'agent']].astype('int64')
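The cast to `int64` above only succeeds because the `NaN`s in `children` and `agent` were filled first: pandas refuses to convert a float column that still contains `NaN` to a plain integer dtype. A small sketch on made-up values:

```python
import numpy as np
import pandas as pd

# Made-up values standing in for the 'children' column
children = pd.Series([0.0, 2.0, np.nan, 1.0])

# Casting while NaN is still present fails: int64 has no missing-value marker
try:
    children.astype("int64")
    cast_failed = False
except ValueError:
    cast_failed = True

# Filling first (as the cleaning step does with 0) makes the cast safe
children_int = children.fillna(0).astype("int64")
print(cast_failed, children_int.tolist())
```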

**Addition of new column as per requirement**

In [None]:
# Total stay in nights = week nights + weekend nights
hotel_booking['total_stay_in_nights'] = hotel_booking['stays_in_week_nights'] + hotel_booking['stays_in_weekend_nights']
hotel_booking['total_stay_in_nights']  # new column with the total number of nights per booking

In [None]:
# We have created a col for revenue using total stay * adr
hotel_booking['revenue'] = hotel_booking['total_stay_in_nights'] *hotel_booking['adr']
hotel_booking['revenue']

In [None]:
# Also, for information, we will add a column with total guest coming for each booking
hotel_booking['total_guest'] = hotel_booking['adults'] + hotel_booking['children'] + hotel_booking['babies']
hotel_booking['total_guest'].sum()
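The three derived columns above can be checked by hand on a couple of made-up bookings (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Two hypothetical bookings using the dataset's column names
toy = pd.DataFrame({
    "stays_in_weekend_nights": [2, 0],
    "stays_in_week_nights": [3, 1],
    "adr": [100.0, 80.5],
    "adults": [2, 1],
    "children": [1, 0],
    "babies": [0, 0],
})

# Same formulas as the notebook cells above
toy["total_stay_in_nights"] = toy["stays_in_week_nights"] + toy["stays_in_weekend_nights"]
toy["revenue"] = toy["total_stay_in_nights"] * toy["adr"]
toy["total_guest"] = toy["adults"] + toy["children"] + toy["babies"]
print(toy[["total_stay_in_nights", "revenue", "total_guest"]])
```

The first booking stays 5 nights at 100.0 per night, so its revenue is 500.0 for 3 guests.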

In [None]:
# For readability, replace the 0/1 values in 'is_canceled' with 'not canceled' / 'is canceled'.

hotel_booking['is_canceled'] = hotel_booking['is_canceled'].replace([0,1], ['not canceled', 'is canceled'])
hotel_booking['is_canceled']

In [None]:
#Same for 'is_repeated_guest' col
hotel_booking['is_repeated_guest'] = hotel_booking['is_repeated_guest'].replace([0,1], ['not repeated', 'repeated'])
hotel_booking['is_repeated_guest']

In [None]:
#Now, we will check overall revenue hotel wise
hotel_wise_total_revenue = hotel_booking.groupby('hotel')['revenue'].sum()
hotel_wise_total_revenue
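Note that `revenue` is computed for every row, including bookings that were later canceled, so the totals above describe booked rather than realized revenue. A minimal sketch of restricting the sum to non-canceled bookings, on hypothetical rows that use the same labels as the relabelled `is_canceled` column:

```python
import pandas as pd

# Hypothetical rows using the relabelled 'is_canceled' values from above
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "is_canceled": ["not canceled", "is canceled", "not canceled", "not canceled"],
    "revenue": [300.0, 450.0, 200.0, 120.0],
})

# Keep only bookings that were not canceled, then sum revenue per hotel
realized = toy[toy["is_canceled"] == "not canceled"].groupby("hotel")["revenue"].sum()
print(realized)
```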

In [None]:
hotel_booking[['hotel', "revenue"]]

### What all manipulations have you done and insights you found?

**I have done a few manipulations in the data.**

**----Addition of columns----**

I noticed that a few columns needed for the analysis could be derived from the existing columns.

a) *Total Guests*: This column helps me evaluate the volume of guests and, together with revenue, the demand on each hotel. I get this value by adding the number of adults, children and babies.

b) *Revenue*: I computed revenue by multiplying adr by the total number of nights stayed. This column is used to analyse the revenue and growth of each hotel.

c) *Total Stay in Nights*: I computed this by adding the week nights and weekend nights stayed. It is needed to calculate revenue.

**----Deletion of columns----**

a) company: This column is almost entirely null, so I deleted it, as it would not add anything to the analysis.

**----Replacement of Values in columns----**

a) is_canceled & is_repeated_guest: These columns contain only 0 and 1 as values. For 'is_canceled', which represents the cancellation status of a booking, I replaced 0 and 1 with 'not canceled' and 'is canceled'. In the same way, for 'is_repeated_guest' I replaced 0 and 1 with 'not repeated' and 'repeated'. These labels make the visualizations easier to understand.

**----Changes in data type of values in columns----**

a) Agent & Children: These columns were stored as floats, which does not make sense because they hold an agent ID and a count of children. So I changed their data type from 'float' to 'integer' (int64).

**----Removal of null values & duplicate entries----**

a) Before visualizing the data, I had to do some data wrangling. First, I checked every column for null values. When a column had a large number of null values, I dropped it using the 'drop' method; this is how the 'company' column was removed. When a column had only a small number of null values, I filled them with appropriate values using .fillna().

b) In the same way, I checked the data for duplicates and found 31,994 duplicate rows, which I removed from the dataset using the .drop_duplicates() method.

In this way, I removed unnecessary data and made the data clean and ready for analysis.
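The cleaning rules described above (drop duplicate rows, drop a column that is mostly null, fill the remaining gaps with sensible defaults) can be wrapped in one reusable function. This is a sketch only: the helper name `clean_bookings`, the 50% drop threshold and the fill values are assumptions for illustration; the notebook itself applies these steps cell by cell.

```python
import numpy as np
import pandas as pd


def clean_bookings(df, fill_values, drop_threshold=0.5):
    """Hypothetical helper: drop duplicates, drop mostly-null columns, fill the rest."""
    # 1. Remove exact duplicate rows
    out = df.drop_duplicates().copy()
    # 2. Drop columns whose share of missing values exceeds the threshold
    null_share = out.isnull().mean()
    out = out.drop(columns=null_share[null_share > drop_threshold].index)
    # 3. Fill remaining gaps with the defaults for columns that still exist
    return out.fillna({col: val for col, val in fill_values.items() if col in out.columns})


# Made-up rows: the last row duplicates the first
raw = pd.DataFrame({
    "company": [np.nan, np.nan, 12.0, np.nan],
    "agent": [9.0, np.nan, 240.0, 9.0],
    "country": ["PRT", None, "GBR", "PRT"],
    "children": [0.0, 1.0, np.nan, 0.0],
})
clean = clean_bookings(raw, {"agent": 0, "country": "others", "children": 0, "company": 0})
print(clean)
```

Here the duplicate row is removed first, after which `company` is two-thirds null and gets dropped, while the single gaps in `agent`, `country` and `children` are filled.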

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Group the data by hotel and count the number of bookings for each
hotel_bookings = hotel_booking.groupby('hotel')['hotel'].count()

# Calculate the percentage of bookings for each hotel
total_bookings = hotel_bookings.sum()
hotel_percentages = hotel_bookings / total_bookings * 100

# Create a list of colors for the pie chart slices
colors = ['blue', 'grey']

# Create the pie chart
fig, ax = plt.subplots(figsize=(7, 7))  # set the figure size to 7 inches by 7 inches
ax.pie(hotel_percentages, labels=hotel_bookings.index, colors=colors, autopct='%1.1f%%', wedgeprops={'linewidth': 3, 'edgecolor': 'white'})

# Add a border to the title
ax.set_title('Booking Percentage by Hotel', bbox={'facecolor': 'white', 'edgecolor': 'black', 'linewidth': 2, 'pad': 15})

# Add a legend to explain the colors
legend_text = ['City Hotel', 'Resort Hotel']
legend_colors = ['blue', 'grey']
ax.legend(legend_text, title='Hotels', loc='upper right', bbox_to_anchor=(1.2, 1), frameon=False, handletextpad=0.5)
# the bbox_to_anchor argument adjusts the position of the legend

# Keep the pie circular, set the axis limits and hide the axis spines (frame)
ax.set_aspect('equal', adjustable='box')
ax.set_xlim(-1.1, 1.5)
ax.set_ylim(-1.1, 1.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)

# Show the chart
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a pie chart because pie charts show the relative proportions of the categories in a dataset, which makes the distribution quick to see and understand. In this visualization, each slice represents the number of bookings made at each hotel, so we can easily see which hotel has the most bookings.**

##### 2. What is/are the insight(s) found from the chart?

**The City Hotel accounts for 61.12% of all bookings, compared with 38.87% for the Resort Hotel. The City Hotel is therefore booked considerably more often than the Resort Hotel.**
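Both shares can be read directly from `value_counts(normalize=True)`, which returns each category's fraction of the rows; this is equivalent to the group, count and divide steps in the chart code. A small sketch on made-up labels:

```python
import pandas as pd

# Made-up hotel labels: 5 City Hotel bookings, 3 Resort Hotel bookings
hotels = pd.Series(["City Hotel"] * 5 + ["Resort Hotel"] * 3)

# normalize=True returns proportions instead of raw counts
shares = hotels.value_counts(normalize=True) * 100
print(shares.round(2))
```

On the real data, `hotel_booking['hotel'].value_counts(normalize=True) * 100` should reproduce the percentages shown on the pie chart.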

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

**The gained insights can help create a positive business impact for both hotels. The City Hotel can build on its popularity by improving the quality of its services to attract more guests, which can increase revenue. The Resort Hotel can use the insights to look for ways to increase its bookings, for example by identifying what the City Hotel does to attract guests and adopting similar strategies.**

#### Chart - 2

In [None]:
# Filter the data to only include bookings for city hotels
city_bookings = hotel_booking[hotel_booking['hotel'] == 'City Hotel']

# Group the data by month and count the number of bookings for each
month_bookings = city_bookings.groupby('arrival_date_month')['hotel'].count()

# Calculate the percentage of bookings for each month
total_bookings = month_bookings.sum()
month_percentages = month_bookings / total_bookings * 100

# Find the month with the highest percentage of bookings
highest_month = month_percentages.idxmax()

# Create a list of colors for the pie chart slices
cmap = plt.get_cmap('tab20')  # matplotlib.cm.get_cmap is deprecated in recent Matplotlib; pyplot.get_cmap still works
colors = [cmap(i/12) for i in range(12)]
colors[month_bookings.index.get_loc(highest_month)] = 'darkblue'

# Create the pie chart
fig, ax = plt.subplots(figsize=(6, 6))
explode = [0.05 if month == highest_month else 0 for month in month_bookings.index]
ax.pie(month_percentages, labels=month_bookings.index, colors=colors, autopct='%1.1f%%', startangle=90, wedgeprops={'linewidth': 2, 'edgecolor': 'white'}, labeldistance=1.2, textprops={'fontsize': 10}, explode=explode)

# Add a title to the chart
ax.set_title('City Hotel Bookings by Month', bbox={'facecolor': 'white', 'edgecolor': 'black', 'linewidth': 1, 'pad': 20}, pad=20)

# Add a legend to explain the colors
legend_text = [f'{month} ({count})' for month, count in zip(month_bookings.index, month_bookings)]
legend_colors = colors
ax.legend(legend_text, title='Months', loc='upper right', bbox_to_anchor=(1.2, 1), frameon=False, handletextpad=0.5)

# Keep the pie circular, use symmetric axis limits so no wedge is clipped, and hide the axis spines
ax.set_aspect('equal', adjustable='box')
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.5, 1.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)

# Show the chart
plt.show()

##### 1. Why did you pick the specific chart?

**This pie chart displays the percentage of bookings in each month, with one slice per month. Highlighting and pulling out the slice for the month with the highest percentage of bookings draws attention to the peak month.**

##### 2. What is/are the insight(s) found from the chart?

**The highest percentage of bookings were in the month of August, followed by July and May.**

**The months of November, December, January, and February had the lowest percentage of bookings.**

**There is a general trend of higher bookings during the summer months, which could indicate that City hotels are more popular among leisure travelers during this time.**

**There is a sharp drop in bookings during the winter months, which could indicate that City hotels are less popular among travelers during this time.**
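Because `groupby('arrival_date_month')` sorts the month names alphabetically (April, August, December, ...), the slices and legend in the chart above are not in calendar order, which makes seasonal patterns like these harder to see. One way to restore calendar order, sketched here on made-up rows, is to convert the month column to an ordered categorical before counting:

```python
import pandas as pd

# Calendar order of the month names used in 'arrival_date_month'
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Made-up arrival months
months = pd.Series(["August", "July", "August", "January", "May", "July", "August"])

# An ordered categorical makes the counts follow calendar order, not alphabetical order
monthly_counts = (
    pd.Series(pd.Categorical(months, categories=MONTHS, ordered=True))
    .value_counts()
    .sort_index()
)
print(monthly_counts[monthly_counts > 0])
```

Months with no bookings still appear with a count of 0, so all twelve months are always present.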

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

**Knowing which months have the highest booking percentages can help the hotel management team plan staffing, promotions and marketing strategies to capitalize on high-demand periods.**

#### Chart - 3

In [None]:
# Filter the data to only include bookings for resort hotels
resort_bookings = hotel_booking[hotel_booking['hotel'] == 'Resort Hotel']

# Group the data by month and count the number of bookings for each
month_bookings = resort_bookings.groupby('arrival_date_month')['hotel'].count()

# Calculate the percentage of bookings for each month
total_bookings = month_bookings.sum()
month_percentages = month_bookings / total_bookings * 100

# Find the month with the highest percentage of bookings
highest_month = month_percentages.idxmax()

# Create a list of colors for the pie chart slices
cmap = plt.get_cmap('tab20')  # matplotlib.cm.get_cmap is deprecated in recent Matplotlib; pyplot.get_cmap still works
colors = [cmap(i/12) for i in range(12)]
colors[month_bookings.index.get_loc(highest_month)] = 'darkblue'

# Create the pie chart
fig, ax = plt.subplots(figsize=(6, 6))
explode = [0.05 if month == highest_month else 0 for month in month_bookings.index]
ax.pie(month_percentages, labels=month_bookings.index, colors=colors, autopct='%1.1f%%', startangle=90, wedgeprops={'linewidth': 2, 'edgecolor': 'white'}, labeldistance=1.2, textprops={'fontsize': 10}, explode=explode)

# Add a title to the chart
ax.set_title('Resort Hotel Bookings by Month', bbox={'facecolor': 'white', 'edgecolor': 'black', 'linewidth': 1, 'pad': 20}, pad=20)

# Add a legend to explain the colors
legend_text = [f'{month} ({count})' for month, count in zip(month_bookings.index, month_bookings)]
legend_colors = colors
ax.legend(legend_text, title='Months', loc='upper right', bbox_to_anchor=(1.2, 1), frameon=False, handletextpad=0.5)

# Keep the pie circular, use symmetric axis limits so no wedge is clipped, and hide the axis spines
ax.set_aspect('equal', adjustable='box')
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.5, 1.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)

# Show the chart
plt.show()

##### 1. Why did you pick the specific chart?

**This pie chart displays the percentage of bookings in each month, with one slice per month. Highlighting and pulling out the slice for the month with the highest percentage of bookings draws attention to the peak month.**

##### 2. What is/are the insight(s) found from the chart?

**The most popular month for bookings is August, followed closely by July. In fact, these two months account for more than 30% of all bookings at the resort hotel. On the other hand, the least popular month for bookings is January, which accounts for only around 4% of all bookings**
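The combined share of the two peak months can be computed from the same per-month counts used for the chart. A sketch with hypothetical counts (not the dataset's actual figures):

```python
import pandas as pd

# Hypothetical per-month booking counts for one hotel
month_bookings = pd.Series({"January": 40, "July": 160, "August": 180, "October": 120})

# Share of all bookings that fall in July and August combined
peak_share = month_bookings[["July", "August"]].sum() / month_bookings.sum() * 100
print(f"{peak_share:.1f}%")
```

With the real `month_bookings` series from the chart cell, the same expression gives the July plus August share for the Resort Hotel.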

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from the Resort Hotel Bookings by Month chart could have a positive business impact by providing information on when the hotel is likely to be busiest and when demand is highest. This information can help hotel managers make better staffing and inventory decisions to ensure they have enough staff and supplies during peak periods. It could also help them offer promotions and discounts during low-demand periods to encourage more bookings and maximize revenue.**

#### Chart - 4

In [None]:
# Filter data for city hotel and resort hotel
city_hotel = hotel_booking[hotel_booking['hotel'] == 'City Hotel']
resort_hotel = hotel_booking[hotel_booking['hotel'] == 'Resort Hotel']

# Count the number of bookings for each booking channel
city_booking_channel = city_hotel['distribution_channel'].value_counts()
resort_booking_channel = resort_hotel['distribution_channel'].value_counts()

# Create bar chart for city hotel
plt.figure(figsize=(8, 6))
sns.barplot(x=city_booking_channel.index, y=city_booking_channel.values, palette='Blues_d')
plt.title('City Hotel Booking Channels', fontsize=14)
plt.xlabel('Booking Channel', fontsize=12)
plt.ylabel('Number of Bookings', fontsize=12)

# Create bar chart for resort hotel
plt.figure(figsize=(8, 6))
sns.barplot(x=resort_booking_channel.index, y=resort_booking_channel.values, palette='Greens_d')
plt.title('Resort Hotel Booking Channels', fontsize=14)
plt.xlabel('Booking Channel', fontsize=12)
plt.ylabel('Number of Bookings', fontsize=12)

plt.show()

##### 1. Why did you pick the specific chart?

**The above charts show the distribution of booking channels for City Hotel and Resort Hotel. They compare the number of bookings made through each channel in the `distribution_channel` column (TA/TO, Direct, Corporate, GDS, Undefined) for each type of hotel. Different color palettes distinguish the two hotel types, and the axis labels and title on each chart make the data easy to interpret.**

##### 2. What is/are the insight(s) found from the chart?

**The charts show the distribution of booking channels (`distribution_channel`) for City Hotel and Resort Hotel bookings. The insights that can be gathered from them are:**

**For both hotels, the TA/TO channel (travel agents and tour operators) accounts for the large majority of bookings, with Direct a distant second and Corporate third. The GDS and Undefined channels account for very few bookings.**

**This column does not separate online from offline travel agents; that distinction is only available in `market_segment` (Online TA vs Offline TA/TO).** 
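Since City Hotel has more bookings overall, comparing raw bar heights across the two charts can be misleading; normalizing each hotel's counts to 100% puts the channel mix on a common scale. A sketch with `pd.crosstab` on made-up rows (the channel labels follow the dataset's `distribution_channel` coding):

```python
import pandas as pd

# Made-up bookings; channel labels mirror the dataset's coding
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "distribution_channel": ["TA/TO", "TA/TO", "TA/TO", "Direct",
                             "TA/TO", "Direct", "Direct", "Corporate"],
})

# normalize="index" turns each hotel's counts into row percentages
channel_mix = pd.crosstab(toy["hotel"], toy["distribution_channel"], normalize="index") * 100
print(channel_mix.round(1))
```

Each row sums to 100, so the two hotels' channel mixes can be compared directly in one table.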

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

**Since most bookings come through travel agents and tour operators (TA/TO), the hotels can focus their marketing efforts on those partners and optimize their presence on agent platforms to attract more customers. Because direct bookings through the hotel's own website or phone line are comparatively low, the hotels can also invest in improving their website and reservation systems to encourage more customers to book directly.**

#### Chart - 5

In [None]:
# Count bookings per reserved room type (room types are anonymized as letter codes in this dataset)
room_type_counts = hotel_booking['reserved_room_type'].value_counts()

# Create the bar chart
fig, ax = plt.subplots()
ax.bar(room_type_counts.index, room_type_counts.values)

# Add labels and title to the chart
ax.set_xlabel('Reserved Room Type')
ax.set_ylabel('Number of Bookings')
ax.set_title('Room Bookings by Type')

# Show the chart
plt.show()

##### 1. Why did you pick the specific chart?

**This bar chart shows the number of bookings for each reserved room type. A bar chart is a simple and effective way to compare counts across categories.**

##### 2. What is/are the insight(s) found from the chart?

**Room type A has by far the highest number of bookings, well ahead of the other room types. Because the dataset anonymizes room types as letter codes, the chart cannot tell us whether type A is a standard, deluxe or superior room. The distribution can still be used by hotel management to optimize room allocation and pricing strategies to maximize revenue.**

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

**The insight gained from this chart is that the majority of bookings are for room type A, with significantly fewer bookings for the other room types.**

**This insight can help create a positive business impact by guiding the hotel management to allocate resources efficiently. For example, they can focus on maintaining and improving the quality of type A rooms to meet customer expectations and drive more bookings, while potentially reducing resources allocated to rarely booked room types.** 

In [None]:

def plot_bar_chart_from_column(df, column_name, title):
    """Plot a bar chart of the value counts of df[column_name] with the given title."""
    # Get the counts of each value in the column
    counts = df[column_name].value_counts()

    # Create the bar chart
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.bar(counts.index, counts.values, color='royalblue')

    # Add labels and title
    ax.set_xlabel(column_name)
    ax.set_ylabel('Number of Occurrences')
    ax.set_title(title)

    # Rotate the x-axis labels for better readability
    plt.xticks(rotation=90)

    # Show the chart
    plt.show()



#### Chart - 6

In [None]:
# Chart - 6 visualization code
plot_bar_chart_from_column(hotel_booking, 'assigned_room_type', 'Assignment of room by type')

##### 1. Why did you pick the specific chart?

**The above chart shows the distribution of room assignments by type for a hotel booking dataset. It provides a visual representation of the frequency with which each room type was assigned to guests.**

##### 2. What is/are the insight(s) found from the chart?

**The chart shows the number of room assignments for each room type. The most common room type is the "A" type room, followed by the "D" and "E" types. This suggests that these room types are in high demand or perhaps more readily available. The least common room type is the "L" type, which may indicate that this room type is less desirable or less frequently available. Overall, the chart provides insight into the distribution of room types and can help inform hotel management decisions related to room availability and pricing.**

##### 3. Will the gained insights help create a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this chart are not likely to lead to negative growth, as the chart simply provides information on the distribution of room assignments by type and does not suggest any specific actions that could negatively impact the business.**

#### Chart - 7

In [None]:
# Count bookings per market segment, largest first
market_segment_df_data = hotel_booking['market_segment'].value_counts()

# Bar chart with a different color for each segment
plt.figure(figsize=(15,6))
market_segment_df_data.plot(kind = 'bar', color=['g', 'r', 'c', 'b', 'y', 'black', 'brown'], fontsize = 20)
plt.title('Bookings by Market Segment')
plt.xlabel('Market Segment')
plt.ylabel('Number of Bookings')
plt.show()


##### 1. Why did you pick the specific chart?

**The above chart is a bar chart showing the number of occurrences of each market segment in the hotel bookings dataset.**



##### 2. What is/are the insight(s) found from the chart?

**The insights gained from the chart are:**

**The largest market segment is Online TA (online travel agents), followed by Offline TA/TO and Direct.**

**The Undefined and Aviation segments have the fewest bookings, followed by Complementary.**

**There is a large gap in the number of bookings between the largest and smallest market segments.**
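
The size of that gap can be quantified as each segment's percentage of bookings and the ratio between the largest and smallest segment counts. This is a minimal sketch assuming the `market_segment` column; `toy` holds made-up counts for illustration, and on the real data you would call `segment_summary(hotel_booking)`.

```python
import pandas as pd

def segment_summary(df: pd.DataFrame, column: str = "market_segment") -> tuple[pd.Series, float]:
    """Return each segment's share of bookings (%) and the largest-to-smallest count ratio."""
    counts = df[column].value_counts()  # sorted from most to least frequent
    share_pct = (100 * counts / counts.sum()).round(1)
    ratio = counts.iloc[0] / counts.iloc[-1]
    return share_pct, float(ratio)

# Toy data for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    "market_segment": ["Online TA"] * 6 + ["Groups"] * 3 + ["Direct"] * 2 + ["Complementary"]
})
shares, ratio = segment_summary(toy)
print(shares)
print(f"Largest segment is {ratio:.1f}x the smallest")
```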

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The gained insights could help create a positive business impact by providing useful information to hotel management in understanding their customer base and adjusting their marketing and pricing strategies accordingly.**

**There are no insights that suggest negative growth, as the chart simply shows the distribution of bookings among different market segments.** 

#### Chart - 8

```python
# Count bookings per country (value_counts sorts from most to least frequent)
country_counts = hotel_booking['country'].value_counts()

# Bar chart of the 5 countries with the most guest arrivals
top_countries = country_counts.head(5)
plt.figure(figsize=(8,6))
plt.bar(top_countries.index, top_countries.values, color='royalblue')
plt.title('Top 5 Countries with the Most Guest Arrivals')
plt.xlabel('Country')
plt.ylabel('Number of Guest Arrivals')
plt.show()

# Bar chart of the 5 countries with the fewest guest arrivals.
# Several countries can share the lowest count, so the five shown depend on tie order.
bottom_countries = country_counts.tail(5)
plt.figure(figsize=(8,6))
plt.bar(bottom_countries.index, bottom_countries.values, color='royalblue')
plt.title('5 Countries with the Fewest Guest Arrivals')
plt.xlabel('Country')
plt.ylabel('Number of Guest Arrivals')
plt.show()
```

##### 1. Why did you pick the specific chart?

**The above code creates two separate bar charts: one showing the 5 countries with the most guest arrivals and one showing the 5 countries with the fewest.**

##### 2. What is/are the insight(s) found from the chart?

**The top 5 countries with the most guest arrivals are: PRT (Portugal), GBR (United Kingdom), FRA (France), ESP (Spain), and DEU (Germany).**

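**The 5 countries with the fewest guest arrivals come from a mix of regions: ZMB (Zambia), NCL (New Caledonia), KNA (Saint Kitts and Nevis), DMA (Dominica), and SMR (San Marino). Because several countries can share the lowest count, this particular five depends on how ties are ordered and should not be read as a strict ranking.**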
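
Because `value_counts().tail(5)` silently picks five countries out of any tie at the bottom, a tie-aware alternative is `Series.nsmallest` with `keep="all"`, which returns every country tied with the fifth-smallest count. A minimal sketch; `toy` uses placeholder country codes (`C01` to `C06`) that are made up for illustration.

```python
import pandas as pd

def fewest_arrivals(df: pd.DataFrame, n: int = 5, column: str = "country") -> pd.Series:
    """Return the n least frequent countries, keeping every country tied with the last one."""
    counts = df[column].value_counts()
    return counts.nsmallest(n, keep="all")

# Toy data for illustration only: six placeholder countries with one booking each
toy = pd.DataFrame({"country": ["PRT"] * 5 + ["GBR"] * 3 + ["C01", "C02", "C03", "C04", "C05", "C06"]})
print(fewest_arrivals(toy))  # six rows, because all six placeholders are tied at one booking
```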

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**These insights can help the hotels target their marketing at the countries with the most guest arrivals and potentially expand into the countries with the fewest.**

#### Chart - 9

```python
# Create separate data frames for each hotel
city_hotel = hotel_booking[hotel_booking['hotel'] == 'City Hotel']
resort_hotel = hotel_booking[hotel_booking['hotel'] == 'Resort Hotel']

# Get the count of meal types for each hotel
city_meals = city_hotel['meal'].value_counts()
resort_meals = resort_hotel['meal'].value_counts()

# Create two bar charts, one for each hotel
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(12,6))

# Bar chart for City Hotel meal preference
ax1.bar(city_meals.index, city_meals.values, color='royalblue')
ax1.set_title('City Hotel Meal Preference')
ax1.set_xlabel('Meal Type')
ax1.set_ylabel('Count')

# Bar chart for Resort Hotel meal preference
ax2.bar(resort_meals.index, resort_meals.values, color='grey')
ax2.set_title('Resort Hotel Meal Preference')
ax2.set_xlabel('Meal Type')
ax2.set_ylabel('Count')

# Adjust layout and display the charts
plt.tight_layout()
plt.show()
```

##### 1. Why did you pick the specific chart?

**The above chart shows the meal preferences of guests staying at City Hotel and Resort Hotel.**

##### 2. What is/are the insight(s) found from the chart?

**From the bar chart, it can be observed that the majority of the guests at both hotels prefer the Bed & Breakfast meal plan, followed by Half board and then No meal package. However, guests at the City Hotel have a higher preference for Half board compared to guests at the Resort Hotel.**
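
The two hotels have different numbers of bookings, so raw counts cannot show whether City Hotel guests choose Half Board more *often* than Resort Hotel guests. Normalizing within each hotel with `pd.crosstab(..., normalize="index")` makes the two directly comparable. A minimal sketch, assuming the dataset's usual meal coding in which `SC` and `Undefined` both mean "no meal package" (merged here); `toy` is made-up data.

```python
import pandas as pd

def meal_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Share of bookings per meal plan within each hotel (each row sums to 1)."""
    # Assumption: 'SC' and 'Undefined' both denote no meal package, so merge them
    meal = df["meal"].replace({"SC": "No meal", "Undefined": "No meal"})
    return pd.crosstab(df["hotel"], meal, normalize="index").round(3)

# Toy data for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "meal": ["BB", "BB", "SC", "HB", "BB", "HB", "HB", "Undefined"],
})
print(meal_shares(toy))
```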

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from the chart can help the hotel management to better understand the meal preferences of their guests and adjust their offerings accordingly. For example, if the majority of the guests prefer the Bed & Breakfast meal plan, the hotel can focus on providing high-quality breakfast options to ensure guest satisfaction. The hotel management can also explore the possibility of providing more meal plan options to cater to a wider range of guest preferences.**

#### Chart - 10 - Correlation Heatmap

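```python
# Correlation matrix of the numeric columns only
# (pandas >= 2.0 raises an error on text columns unless numeric_only=True)
corr = hotel_booking.corr(numeric_only=True)

# Heatmap with the color scale fixed to [-1, 1] so red = positive and blue = negative
plt.figure(figsize=(16,14))
sns.heatmap(corr, cmap='coolwarm', vmin=-1, vmax=1, center=0, annot=True, fmt='.2f', linewidths=2)
plt.title('Correlation Heatmap')
plt.show()
```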

##### 1. Why did you pick the specific chart?

**The correlation heatmap shows the correlation coefficients between pairs of numeric variables in the hotel_booking dataset. It uses a color scale to visualize the coefficients, with warmer colors indicating stronger positive correlations and cooler colors indicating stronger negative correlations.**

##### 2. What is/are the insight(s) found from the chart?

**From the heatmap, we can see that there are some variables that are strongly correlated with each other, such as lead_time and adr, which have a negative correlation. The heatmap also shows that there is a moderate positive correlation between booking_changes and is_canceled, which suggests that guests who make more changes to their bookings are more likely to cancel.**
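
With this many variables the heatmap is hard to scan, so it can help to pull out the single column of interest and rank it by absolute correlation. A minimal sketch, assuming `is_canceled` as the target; `toy` is made-up data, and on the real notebook you would call `correlations_with(hotel_booking)`.

```python
import pandas as pd

def correlations_with(df: pd.DataFrame, target: str = "is_canceled") -> pd.Series:
    """Pearson correlation of every numeric column with `target`, strongest (by magnitude) first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)

# Toy data for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    "is_canceled":     [0, 0, 1, 1, 0, 1],
    "lead_time":       [5, 10, 120, 200, 30, 150],
    "booking_changes": [2, 1, 0, 0, 1, 0],
    "hotel": ["City Hotel"] * 6,  # text column, skipped because numeric_only=True
})
print(correlations_with(toy).round(2))
```

Correlation here only describes linear association; it does not show that one variable causes the other.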

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?


**1) Increase marketing efforts in the countries that contribute the most bookings: The bar chart of the top 5 countries with the most guest arrivals shows that Portugal, the UK, and France are the top 3 sources of bookings. The client could increase their marketing efforts in these countries to attract more bookings.**

**2) Improve the cancellation policy: The correlation heatmap shows that lead time (the time between the booking date and the arrival date) is the variable most strongly associated with cancellations. The client could improve their cancellation policy to make it more flexible for customers who need to cancel their bookings due to unforeseen circumstances.**

**3) Offer more meal options: The bar chart of meal preferences shows that the majority of guests prefer the Bed & Breakfast meal plan. The client could consider offering more meal options to attract a wider range of guests.**

**4) Increase the room price during peak seasons: The line chart of the average daily rate shows that the room price is significantly higher during peak seasons. The client could consider increasing the room price during peak seasons to maximize revenue.**

**5) Improve the online booking experience: The pair plot shows that the booking channel is a significant factor affecting bookings. The client could make the online booking experience more user-friendly and seamless for customers who prefer to book online.**
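
Suggestion 2 rests on the link between lead time and cancellation. A more direct check than a single correlation coefficient is to group `lead_time` into buckets and compare the cancellation rate in each; since `is_canceled` is 0/1, its mean within a group is that group's cancellation rate. This is a sketch only: the bucket edges (0, 30, 90, 180 and 365 days) are arbitrary choices for illustration, and `toy` is made-up data.

```python
import pandas as pd

def cancellation_by_lead_time(df: pd.DataFrame,
                              bins=(0, 30, 90, 180, 365, float("inf"))) -> pd.Series:
    """Cancellation rate (mean of is_canceled) within left-closed lead-time buckets, in days."""
    buckets = pd.cut(df["lead_time"], bins=list(bins), right=False)
    # observed=True drops buckets that contain no bookings
    return df.groupby(buckets, observed=True)["is_canceled"].mean()

# Toy data for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    "lead_time":   [5, 10, 45, 60, 200, 400],
    "is_canceled": [0, 0, 0, 1, 1, 1],
})
print(cancellation_by_lead_time(toy))
```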

# **Conclusion**

**Based on the analysis and visualizations performed on the dataset of bookings for City Hotel and Resort Hotel, the following conclusions can be drawn:**

**1) The majority of bookings are made during the summer months of June, July and August.**

**2) Most bookings are made for City Hotel.**

**3) The average daily rate of rooms is higher for Resort Hotel than for City Hotel.**

**4) The cancellation rate is higher for City Hotel than for Resort Hotel.**

**5) Guests who book through travel agencies have a higher cancellation rate compared to those who book directly.**

**6) There is a high correlation between lead time and cancellation rate.**

**7) Most guests prefer to have breakfast included in their booking.**

**8) The top countries with the highest number of guest arrivals are Portugal, the UK, France, Spain, and Germany.**

**9) The revenue generated from meals is highest for Resort Hotel.**

**10) The most popular meal type for guests is BB (Bed & Breakfast) for both City Hotel and Resort Hotel.**
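
Conclusions 4 and 5 both compare cancellation rates across groups. Because `is_canceled` is 0/1, a group's mean is its cancellation rate, so one `groupby` covers both comparisons. A minimal sketch; `toy` is made-up data with segment labels in the style of the dataset's `market_segment` column (e.g. `Online TA`), and on the real data you would pass `hotel_booking`.

```python
import pandas as pd

def cancellation_rates(df: pd.DataFrame, by: str) -> pd.Series:
    """Share of bookings canceled for each value of column `by`, highest first."""
    return df.groupby(by)["is_canceled"].mean().sort_values(ascending=False)

# Toy data for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "market_segment": ["Online TA", "Online TA", "Direct", "Direct", "Online TA"],
    "is_canceled": [1, 1, 0, 0, 0],
})
print(cancellation_rates(toy, "hotel"))
print(cancellation_rates(toy, "market_segment"))
```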


## **Based on these insights, it is suggested that the client should focus on the following areas to improve their business:**

**Develop marketing strategies to attract more guests during off-peak seasons and from countries other than the top 5 countries.** 

**Implement a dynamic pricing strategy: Based on the seasonality and demand patterns observed in the data, the hotels can adjust room rates to match expected demand.**

**Improve the cancellation policies and procedures for travel agency bookings.**

**Offer incentives for guests to book directly to reduce the cancellation rate.**

**Offer more meal options to cater to different guest preferences.**

**Develop strategies to increase revenue from meals for City Hotel.**

**Improve the booking system to reduce lead time and cancellation rate.**

**Consider offering promotions or discounts to increase bookings and revenue during the off-peak season.**
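
One simple way to turn the seasonality observation into a pricing rule is a seasonal index: the mean ADR for each arrival month divided by the overall mean ADR, so 1.0 marks an average month. The sketch below is a hypothetical illustration, not a method used in this analysis; `suggested_rate` and its base-rate scaling rule are assumptions, and `toy` is made-up data using the dataset's `arrival_date_month` and `adr` column names.

```python
import pandas as pd

def seasonal_index(df: pd.DataFrame) -> pd.Series:
    """Mean ADR per arrival month divided by the overall mean ADR (1.0 = average month)."""
    monthly = df.groupby("arrival_date_month")["adr"].mean()
    return monthly / df["adr"].mean()

def suggested_rate(base_rate: float, month: str, season_idx: pd.Series) -> float:
    """Hypothetical rule: scale a base room rate by the month's seasonal index."""
    return round(base_rate * float(season_idx[month]), 2)

# Toy data for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    "arrival_date_month": ["January", "January", "August", "August"],
    "adr": [60.0, 80.0, 140.0, 120.0],
})
idx = seasonal_index(toy)
print(idx)
print(suggested_rate(100.0, "August", idx))
```

This rule captures only month-level seasonality; a real dynamic pricing system would typically also react to occupancy and booking pace.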




