![](https://searchengineland.com/figz/wp-content/seloads/2014/04/hotel-bell-customer-service-ss-1920.jpg)

With the spread of internet access and smartphones, the number of internet users worldwide has passed the 3 billion mark. The internet has made many everyday tasks easier, and that convenience has reshaped the travel and hotel industry, giving rise to the online hotel booking engine. Nowadays, when people plan a vacation or holiday, they start by exploring places and deals online.

Online hotel booking engines have streamlined the booking process. Their main advantage is that travelers can book a hotel in advance and avoid inconvenience later. That's why many hotels are integrating a booking engine into their websites.

Benefits of a hotel booking engine: it increases a hotel's revenue and profit. As the number of internet users keeps growing, so does the number of people who book hotels online. Hoteliers used to pay commission to agents, whereas a booking engine brings them business directly, which adds to their profit.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
pd.set_option('display.max_columns',None)
df = pd.read_csv('/kaggle/input/hotel-booking-demand/hotel_bookings.csv')
df.head()

In [None]:
df.shape # Number of rows and columns

In [None]:
df.describe()

### Percentage of Missing Values

In [None]:
percentage_missing_values = round(df.isnull().sum()*100/len(df),2).reset_index()
percentage_missing_values.columns = ['column_name','percentage_missing_values']
percentage_missing_values = percentage_missing_values.sort_values('percentage_missing_values',ascending = False)
percentage_missing_values

As we can see, only three columns show a noticeable share of missing values. That's a good sign.
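The same percentages can also be computed a little more compactly. The sketch below is only an illustration on a made-up toy frame (the `toy` data and its values are assumptions, not rows from the dataset): it relies on the fact that the mean of a boolean mask is the fraction of `True` values.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the bookings data (values are made up).
toy = pd.DataFrame({
    "country": ["PRT", None, "GBR", "ESP"],
    "agent": [9.0, np.nan, np.nan, 240.0],
    "adults": [2, 2, 1, 2],
})

# isna() yields booleans; their column-wise mean is the fraction missing.
missing_pct = toy.isna().mean().mul(100).round(2).sort_values(ascending=False)

# Keep only the columns that actually contain gaps.
missing_pct = missing_pct[missing_pct > 0]
print(missing_pct)
```

On the real data, replacing `toy` with `df` should give the same numbers as the table above, restricted to the columns that have gaps.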

### Hotel Type

In [None]:
plt.figure(figsize=(12,8))
ax = sns.countplot(x="hotel", data = df)
plt.title('Hotel Type')
plt.xlabel('Hotel')
plt.ylabel('Total Bookings')
for p in ax.patches:
    ax.annotate((p.get_height()),(p.get_x()+0.4 , p.get_height()+100)) 

### Is Canceled?

Sometimes customers tend to cancel their reservation due to various reasons. Let's see how many of them have canceled.

In [None]:
plt.figure(figsize=(12,8))
ax = sns.countplot(x="is_canceled", data = df, palette="RdYlGn")
plt.title('Is Canceled?')
plt.xlabel('Is Canceled?')
plt.ylabel('Total Count')
for p in ax.patches:
    ax.annotate((p.get_height()),(p.get_x()+0.4 , p.get_height()+100)) 

Roughly one-third of the customers canceled their reservation. This is a serious revenue issue for the hotel. We will come back to predicting cancellations at a later stage of this notebook. Moving on.
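To put a number on the cancellation share rather than reading it off the bars, one option is to take the mean of the 0/1 flag. This is a minimal sketch on a made-up toy column (the `toy` values are assumptions), not output from the real dataset.

```python
import pandas as pd

# Toy 0/1 flags standing in for df['is_canceled'] (values are made up).
toy = pd.DataFrame({"is_canceled": [0, 1, 0, 0, 1, 0]})

# The mean of a 0/1 column is the share of ones, i.e. the cancellation rate.
cancel_rate = toy["is_canceled"].mean()

# value_counts(normalize=True) gives the share of each class.
shares = toy["is_canceled"].value_counts(normalize=True).sort_index()

print(f"Cancellation rate: {cancel_rate:.1%}")
print(shares)
```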

### Arrival Time

Seasonality plays an important role in the hotel and hospitality industry.

In [None]:
def month_converter(month):
    months = ['January', 'February', 'March', 'April', 'May', 'June','July', 'August', 'September', 'October', 'November', 'December']
    return months.index(month) + 1
df['arrival_month'] = df['arrival_date_month'].apply(month_converter)
df['arrival_year_month'] = df['arrival_date_year'].astype(str) + " _ " + df['arrival_month'].astype(str)

plt.figure(figsize=(24,8))
ax = sns.countplot(x="arrival_year_month", data = df, palette="CMRmap_r")
plt.title('Arrival Year_Month')
plt.xlabel('arrival_year_month')
plt.ylabel('Total Count')
for p in ax.patches:
    ax.annotate((p.get_height()),(p.get_x()+0.2 , p.get_height()+50)) 

As we can see, most customers book hotel rooms for the April-June season, and bookings are low from November to January.
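One thing to keep in mind with a label like `2015 _ 7` is that, as plain text, it does not sort chronologically (`2016 _ 10` sorts before `2016 _ 2`). A possible alternative, sketched here with a hypothetical helper `year_month_key` that is not part of the notebook, is a zero-padded key. It assumes English month names, as in the `arrival_date_month` column, and an English or C locale for `%B`.

```python
from datetime import datetime


def year_month_key(year: int, month_name: str) -> str:
    """Return a zero-padded 'YYYY-MM' key that sorts chronologically as text."""
    # %B parses a full month name such as 'July' (locale dependent).
    month = datetime.strptime(month_name, "%B").month
    return f"{year}-{month:02d}"


keys = [
    year_month_key(2016, "October"),
    year_month_key(2015, "July"),
    year_month_key(2016, "February"),
]

# Plain string sorting now matches calendar order.
ordered = sorted(keys)
print(ordered)
```

Such a key could be passed to the `order` argument of `sns.countplot` if the bars ever come out in the wrong sequence.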

### Arrival Month Vs Weekday

In [None]:
# Build the arrival date from its year, month and day columns in one vectorized call
df['arrival_date'] = pd.to_datetime(dict(year=df['arrival_date_year'], month=df['arrival_month'], day=df['arrival_date_day_of_month']))
df['arrival_day_of_week'] = df['arrival_date'].dt.day_name()
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
df['arrival_day_of_week'] = pd.Categorical(df['arrival_day_of_week'],categories = weekdays)
arrivals = pd.pivot_table(df,columns = 'arrival_day_of_week',index = 'arrival_month',values = 'reservation_status',aggfunc = 'count')
fig, ax = plt.subplots(figsize = (16,11))
ax = sns.heatmap(arrivals ,annot=True, fmt="d",cmap = 'rocket_r')

### Total Guests

In [None]:
df[['adults','children','babies']] = df[['adults','children','babies']].fillna(0).astype(int)
df['total_guests'] = df['adults']+ df['children']+ df['babies']
plt.figure(figsize=(12,8))
ax = sns.countplot(x="total_guests", data = df,palette = 'twilight_shifted')
plt.title('Number of Guests')
plt.xlabel('total_guests')
plt.ylabel('Count')
for p in ax.patches:
    ax.annotate((p.get_height()),(p.get_x()+0.1 , p.get_height()+100)) 

As we can see, most bookings are for 2 guests. It is also interesting that some bookings list over 50 guests, which suggests that these were probably corporate bookings.
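The guess about corporate bookings can be checked by looking at the rows with unusually large parties. The sketch below shows the idea on a made-up toy frame (all values are assumptions); the column names mirror those in the dataset.

```python
import pandas as pd

# Toy bookings standing in for the real data (values are made up).
toy = pd.DataFrame({
    "adults": [2, 55, 1, 2],
    "children": [0, 0, 1, 0],
    "babies": [0, 0, 0, 0],
    "market_segment": ["Online TA", "Direct", "Online TA", "Groups"],
})

# Row-wise sum of the three guest columns.
toy["total_guests"] = toy[["adults", "children", "babies"]].sum(axis=1)

# Inspect only the unusually large parties and how they were booked.
large = toy[toy["total_guests"] > 50]
print(large[["total_guests", "market_segment"]])
```

On the real data, the same filter on `df` (optionally adding `customer_type`) would show whether these bookings actually come from a corporate or group channel.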

### Country of Origin

In [None]:
plt.figure(figsize=(20,8))
df_country = df['country'].value_counts().nlargest(25).astype(int)
ax = sns.barplot(x=df_country.index, y=df_country.values)  # keyword arguments are required by recent seaborn versions
plt.title('Country')
plt.xlabel('Country')
plt.ylabel('Count')
for p in ax.patches:
    ax.annotate((p.get_height()),(p.get_x(), p.get_height()+100)) 

Most of the guests came from Portugal, which suggests that this dataset probably belongs to hotels in Portugal.

### Market Segment

In [None]:
plt.figure(figsize=(12,8))
ax = sns.countplot(x="market_segment", data = df,palette = 'magma',order = df['market_segment'].value_counts().index)
plt.title('Market Segment')
plt.xlabel('market_segment')
plt.ylabel('Count')
for p in ax.patches:
    ax.annotate((p.get_height()),(p.get_x()+0.2 , p.get_height()+100)) 

Most of the bookings were made online.

### Distribution Channel

In [None]:
plt.figure(figsize=(12,8))
ax = sns.countplot(x="distribution_channel", data = df,palette = 'viridis',order = df['distribution_channel'].value_counts().index)
plt.title('Distribution Channel')
plt.xlabel('distribution_channel')
plt.ylabel('Count')
for p in ax.patches:
    ax.annotate((p.get_height()),(p.get_x()+0.3 , p.get_height()+100)) 

### Repeated Guests

In [None]:
plt.figure(figsize=(12,8))
ax = sns.countplot(x="is_repeated_guest", data = df, palette="RdYlGn")
plt.title('Is Repeated Guest?')
plt.xlabel('is_repeated_guest')
plt.ylabel('Total Count')
for p in ax.patches:
    ax.annotate((p.get_height()),(p.get_x()+0.4 , p.get_height()+100)) 

Some guests stayed at the hotel more than once, which suggests that they were satisfied with the overall service.

### Customer Type

In [None]:
plt.figure(figsize=(12,8))
ax = sns.countplot(x="customer_type", data = df, palette="nipy_spectral",order = df['customer_type'].value_counts().index)
plt.title('Customer Type')
plt.xlabel('customer_type')
plt.ylabel('Total Count')
for p in ax.patches:
    ax.annotate((p.get_height()),(p.get_x()+0.3 , p.get_height()+100)) 

### Car Parking

Vehicle parking is one of the main considerations in hotel booking. Customers generally tend to book hotels that provide good parking facilities.

In [None]:
plt.figure(figsize=(12,8))
ax = sns.countplot(x="required_car_parking_spaces", data = df, palette="jet_r",order = df['required_car_parking_spaces'].value_counts().index)
plt.title('Total Car Parking Spaces Required')
plt.xlabel('required_car_parking_spaces')
plt.ylabel('Total Count')
for p in ax.patches:
    ax.annotate((p.get_height()),(p.get_x()+0.35 , p.get_height()+100)) 

### Deposit Type

In [None]:
plt.figure(figsize=(12,8))
ax = sns.countplot(x="deposit_type", data = df, palette="jet_r",order = df['deposit_type'].value_counts().index)
plt.title('Deposit Type')
plt.xlabel('deposit_type')
plt.ylabel('Total Count')
for p in ax.patches:
    ax.annotate((p.get_height()),(p.get_x()+0.35 , p.get_height()+100)) 

### Period of Stay

In [None]:
df['total_nights_stayed'] = df['stays_in_weekend_nights'] + df['stays_in_week_nights']
plt.figure(figsize=(20,8))
ax = sns.countplot(x="total_nights_stayed", data = df, palette="tab10")
plt.title('Total Nights Stayed')
plt.xlabel('total_nights_stayed')
plt.ylabel('Total Count')
for p in ax.patches:
    ax.annotate((p.get_height()),(p.get_x()-0.1 , p.get_height()+100)) 

Most customers stayed between 1 and 4 nights.
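The length-of-stay reading can be backed up with a couple of summary numbers. This is a rough sketch on a made-up toy frame (the values are assumptions), using the same two columns that are summed above.

```python
import pandas as pd

# Toy stays standing in for the real columns (values are made up).
toy = pd.DataFrame({
    "stays_in_weekend_nights": [0, 2, 1, 0, 2],
    "stays_in_week_nights": [1, 5, 2, 3, 2],
})

# Total nights per booking: weekend nights plus week nights.
nights = toy["stays_in_weekend_nights"] + toy["stays_in_week_nights"]

# Share of bookings lasting 1 to 4 nights (between() is inclusive on both ends).
share_1_to_4 = nights.between(1, 4).mean()
median_nights = nights.median()

print(f"Share of 1-4 night stays: {share_1_to_4:.0%}")
print(f"Median nights: {median_nights}")
```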

### Reservation Status

In [None]:
plt.figure(figsize=(20,8))
ax = sns.countplot(x="reservation_status", data = df, palette="tab20")
plt.title('Reservation Status')
plt.xlabel('reservation_status')
plt.ylabel('Total Count')
for p in ax.patches:
    ax.annotate((p.get_height()),(p.get_x()+0.35 , p.get_height()+100)) 

Again, as we can see, there were a lot of cancellations.

# To Be Continued...
