Hello 🙌, welcome to my notebook! In this notebook we will walk through a complete Exploratory Data Analysis (EDA) of the Hotel Booking Demand dataset, including time series visualizations. Feel free to share any questions or suggestions. Thank you!

![](https://cdn.galaxy.tf/thumb/sizeW1920/uploads/2s/cms_image/001/566/208/1566208111_5d5a706f22c27-thumb.jpg) (https://singapore.amarahotels.com)

In [None]:
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

data1 = pd.read_csv('../input/hotel-booking-demand/hotel_bookings.csv')

- Context: 

    1. Have you ever wondered when the best time of year to book a hotel room is? 
    2. Or the optimal length of stay in order to get the best daily rate? 
    3. What if you wanted to predict whether or not a hotel was likely to receive a disproportionately high number of special requests?


- Content: This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things.

- Inspiration: This data set is ideal for anyone looking to practice exploratory data analysis (EDA) or to get started building predictive models.

In [None]:
data1.shape

In [None]:
data1.info()

In [None]:
data1.head()

Feature Definitions:
1. hotel: Hotel (H1 = Resort Hotel or H2 = City Hotel)
2. is_canceled: Value indicating if the booking was canceled (1) or not (0)
3. lead_time: Number of days that elapsed between the date the booking was entered into the PMS (property management system) and the arrival date
4. arrival_date_year: Year of arrival date
5. arrival_date_month: Month of arrival date
6. arrival_date_week_number: Week number of year for arrival date
7. arrival_date_day_of_month: Day of arrival date
8. stays_in_weekend_nights: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
9. stays_in_week_nights: Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
10. adults: Number of adults
11. children: Number of children
12. babies: Number of babies
13. meal: Type of meal booked. Categories are presented in standard hospitality meal packages: Undefined/SC = no meal package; BB = Bed & Breakfast; HB = Half board (breakfast and one other meal, usually dinner); FB = Full board (breakfast, lunch and dinner)
14. country: Country of origin. Categories are represented in the ISO 3166-3:2013 format
15. market_segment: Market segment designation. In categories, the term “TA” means “Travel Agents” and “TO” means “Tour Operators”
16. distribution_channel: Booking distribution channel. The term “TA” means “Travel Agents” and “TO” means “Tour Operators”
17. is_repeated_guest: Value indicating if the booking name was from a repeated guest (1) or not (0)
18. previous_cancellations: Number of previous bookings that were cancelled by the customer prior to the current booking
19. previous_bookings_not_canceled: Number of previous bookings not cancelled by the customer prior to the current booking
20. reserved_room_type: Code of room type reserved. Code is presented instead of designation for anonymity reasons.
21. assigned_room_type: Code for the type of room assigned to the booking. Sometimes the assigned room type differs from the reserved room type.
22. booking_changes: Number of changes/amendments made to the booking from the moment the booking was entered on the PMS
23. deposit_type: Indicates whether the customer made a deposit to guarantee the booking (No Deposit, Non Refund, or Refundable)
24. agent: ID of the travel agency that made the booking
25. company: ID of the company/entity that made the booking or is responsible for paying the booking. ID is presented instead of designation for anonymity reasons.
26. days_in_waiting_list: Number of days the booking was in the waiting list before it was confirmed to the customer
27. customer_type: Type of booking (Contract, Group, Transient, or Transient-Party)
28. adr: Average Daily Rate as defined by dividing the sum of all lodging transactions by the total number of staying nights
29. required_car_parking_spaces: Number of car parking spaces required by the customer
30. total_of_special_requests: Number of special requests made by the customer (e.g. twin bed or high floor)
31. reservation_status: Last status of the reservation
32. reservation_status_date: Date at which the last status was set
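The `adr` definition is a simple ratio, and its denominator is the sum of the two stay columns. A minimal sketch with made-up numbers: the three bookings and the `lodging_revenue` column are hypothetical (the dataset does not contain a revenue column), so this only illustrates the arithmetic, not how the dataset's values were produced.

```python
import pandas as pd

# Hypothetical bookings; lodging_revenue is an illustrative column, not in hotel_bookings.csv
toy = pd.DataFrame({
    "stays_in_weekend_nights": [2, 0, 1],
    "stays_in_week_nights": [3, 2, 0],
    "lodging_revenue": [500.0, 180.0, 95.0],
})

# Total staying nights = weekend nights + week nights
toy["total_nights"] = toy["stays_in_weekend_nights"] + toy["stays_in_week_nights"]

# ADR = sum of lodging transactions / total staying nights
toy["adr"] = toy["lodging_revenue"] / toy["total_nights"]
print(toy[["total_nights", "adr"]])
```

If a booking has zero total nights, the ratio is undefined, so such rows need separate handling before computing or interpreting ADR.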

In [None]:
# Missing-value chart: share of missing values in each column
import matplotlib.pyplot as plt

plt.figure(figsize=(15, 10))
data1.isnull().mean(axis=0).plot.barh()
plt.title("Ratio of missing values per column")
plt.show()

In [None]:
data1.isnull().sum().sort_values(ascending=False)

In [None]:
# Number of unique values in each column

def nunique_counts(data):
    for i in data.columns:
        count = data[i].nunique()
        print(i, ": ", count)

nunique_counts(data1)
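The `nunique_counts` helper prints one count per column. pandas can return the same numbers in a single call with `DataFrame.nunique()`, as a Series that is easier to sort or filter. A sketch on a small made-up frame standing in for `data1`:

```python
import pandas as pd

# Toy frame with made-up values, standing in for data1
df = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel"],
    "is_canceled": [0, 1, 0],
    "agent": [9.0, None, 9.0],
})

# One count per column; NaN is not counted by default (dropna=True)
counts = df.nunique()
print(counts)

# With dropna=False, NaN counts as its own value
counts_with_nan = df.nunique(dropna=False)
print(counts_with_nan)
```

The default matches the loop above, since `Series.nunique()` also ignores NaN unless `dropna=False` is passed.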

In [None]:
data1[data1.isnull().any(axis=1)].head(3)  # first rows with at least one missing value

In [None]:
data1['company'].unique()[:10]

In [None]:
data1['agent'].unique()[:10]

In [None]:
data1['country'].unique()[:10]

In [None]:
data1['children'].unique()[:10]

In [None]:
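var = ["company", "agent", "country", 'children']

# Assign the result back: a chained data1[col].fillna(..., inplace=True)
# may not update data1 when pandas Copy-on-Write is enabled.
data1[var] = data1[var].fillna(0)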

- The dataset has 119,390 rows and 32 columns.
- The company and agent columns have many missing values, while country and children have only a few. Instead of dropping them, I fill the missing values with 0.
- Before filling, I preview the unique values of these columns.
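A compact way to both summarize and fill the gaps: put the count and the share of missing values per column side by side, then fill the four columns with one `fillna` assignment. The frame below is a toy example with made-up values, not rows from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values and gaps in the same four columns
df = pd.DataFrame({
    "company": [np.nan, 40.0, np.nan, np.nan],
    "agent": [9.0, np.nan, 240.0, 9.0],
    "country": ["PRT", "GBR", np.nan, "PRT"],
    "children": [0.0, np.nan, 1.0, 0.0],
})

# Count and share of missing values per column, most missing first
summary = pd.DataFrame({
    "n_missing": df.isnull().sum(),
    "ratio": df.isnull().mean(),
}).sort_values("n_missing", ascending=False)
print(summary)

# Fill the four columns in one assignment (no inplace, no loop)
cols = ["company", "agent", "country", "children"]
df[cols] = df[cols].fillna(0)
print(df)
```

Note that filling `country` with 0 puts an integer into a text column; a string placeholder would keep the column homogeneous, but 0 is kept here to match the notebook.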

In [None]:
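# Year and day are already integers; the round trip through to_datetime keeps them as integers
data1['arrival_date_year'] = pd.to_datetime(data1.arrival_date_year, format='%Y').dt.year
# Month names ('July', 'August', ...) become month numbers (7, 8, ...)
data1['arrival_date_month'] = pd.to_datetime(data1.arrival_date_month, format='%B').dt.month
data1['arrival_date_day_of_month'] = pd.to_datetime(data1.arrival_date_day_of_month, format='%d').dt.day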

In [None]:
data1['reservation_status_date'] = pd.to_datetime(data1['reservation_status_date']) #datetime format
data1['reservation_status_Y'] = data1['reservation_status_date'].dt.year # getting year
data1['reservation_status_M'] = data1['reservation_status_date'].dt.month # getting month
data1['reservation_status_D'] = data1['reservation_status_date'].dt.day # getting day

In [None]:
data1.head()

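- I convert the arrival month from its name to its number (year and day are parsed the same way and remain integers), so the three arrival columns are numeric and can later be combined into dates for time series visualization.
- For reservation_status_date, I convert the column to datetime and extract the year, month, and day into separate columns.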
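Once the three arrival columns are numeric, `pd.to_datetime` can assemble a full date from a frame whose columns are named `year`, `month`, and `day`. A sketch on three hypothetical rows; the combined `arrival_date` column is an illustration, not a column the notebook creates.

```python
import pandas as pd

# Step 1: month names -> month numbers, the same parse the conversion cell uses
months = pd.to_datetime(pd.Series(["July", "November", "February"]), format="%B").dt.month

# Hypothetical rows in the numeric form the arrival columns end up in
df = pd.DataFrame({
    "arrival_date_year": [2015, 2016, 2017],
    "arrival_date_month": months,
    "arrival_date_day_of_month": [1, 29, 28],
})

# Step 2: to_datetime assembles dates from columns named year/month/day
df["arrival_date"] = pd.to_datetime(
    df[["arrival_date_year", "arrival_date_month", "arrival_date_day_of_month"]]
    .rename(columns={"arrival_date_year": "year",
                     "arrival_date_month": "month",
                     "arrival_date_day_of_month": "day"})
)
print(df)
```

A column like this could then serve as a `DatetimeIndex` for resampling by arrival date rather than by reservation status date.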

In [None]:
import plotly.offline as py 
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go 
import plotly.tools as tools

# Aggregate bookings and guests per arrival year
custom_aggregation = {}
custom_aggregation["arrival_date_day_of_month"] = "count"  # one row per booking
custom_aggregation["adults"] = "sum"
custom_aggregation["children"] = "sum"
custom_aggregation["babies"] = "sum"
data2 = data1.groupby("arrival_date_year").agg(custom_aggregation)
data2.columns = ["Booking Count", 'Nb.Adults', 'Nb.Children', 'Nb.Babies']
data2['Year'] = data2.index.astype(str)  # string years give a categorical x-axis

booking = go.Bar(x=data2['Year'], y=data2["Booking Count"], name='Booking')
adults = go.Bar(x=data2['Year'], y=data2["Nb.Adults"], name='Adults')
children = go.Bar(x=data2['Year'], y=data2["Nb.Children"], name='Children')
babies = go.Bar(x=data2['Year'], y=data2["Nb.Babies"], name='Babies')

fig = tools.make_subplots(rows=2, 
                          cols=2, 
                          subplot_titles=('Number of Bookings',
                                          'Number of Adults', 
                                          'Number of Children',
                                          'Number of Babies'))

fig.append_trace(booking, 1, 1)
fig.append_trace(adults, 1, 2)
fig.append_trace(children, 2, 1)
fig.append_trace(babies, 2, 2)

fig['layout'].update(height=800, width=800, title=' ')
py.iplot(fig, filename='combined-savings')

In [None]:
import plotly.express as px

custom_aggregation = {}
custom_aggregation["arrival_date_day_of_month"] = "count"
data2 = data1.groupby("hotel").agg(custom_aggregation)
data2.columns = ["Booking Count"]
data2['Hotel'] = data2.index

fig = px.bar(data2, x='Hotel', y="Booking Count", color="Hotel")
fig['layout'].update(height=400, width=550, title='Number of Bookings by Hotel Type')
fig.show()

In [None]:
feature = ["hotel", 'is_canceled']
data2 = pd.crosstab(data1[feature[0]], data1[feature[1]])
data2['Hotel'] = data2.index


_0 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2[0],
            name='Not Canceled')

_1 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2[1],
            name='Is Canceled')


feature = ["hotel", 'is_repeated_guest']
data2 = pd.crosstab(data1[feature[0]], data1[feature[1]])
data2['Hotel'] = data2.index

_0a = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2[0],
            name='New Guest')

_1a = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2[1],
            name='Repeated Guest')


fig = tools.make_subplots(rows=1, 
                          cols=2, 
                          #specs=[[{}, {}], [{'colspan': 1}, None]],
                          subplot_titles=('Is Canceled based on Hotel Type',
                                          'Guest based on Hotel Type'))

fig.append_trace(_0, 1, 1)
fig.append_trace(_1, 1, 1)

fig.append_trace(_0a, 1, 2)
fig.append_trace(_1a, 1, 2)

fig['layout'].update(height=500, width=900, title=' ', boxmode='group')
py.iplot(fig, filename='combined-savings')
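The grouped bars show raw counts. Because the two hotels have different numbers of bookings, counts alone do not compare how often each hotel's bookings are canceled. `pd.crosstab(..., normalize="index")` divides each row by its total, giving the canceled share per hotel. A sketch on hypothetical bookings (the numbers are made up):

```python
import pandas as pd

# Hypothetical bookings, not real counts from the dataset
df = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "is_canceled": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Raw counts, as plotted in the cell above
counts = pd.crosstab(df["hotel"], df["is_canceled"])
print(counts)

# Each row divided by its total: share of kept (0) vs canceled (1) bookings per hotel
rates = pd.crosstab(df["hotel"], df["is_canceled"], normalize="index")
print(rates)
```

On the notebook's data the same call would take `data1["hotel"]` and `data1["is_canceled"]`; plotting `rates` instead of `counts` is an optional variation, not part of the original analysis.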

In [None]:
feature = ["hotel", 'market_segment']
data2 = pd.crosstab(data1[feature[0]], data1[feature[1]])
data2['Hotel'] = data2.index

trace0 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2.Aviation,
            name='Aviation')

trace1 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2.Complementary,
            name='Complementary')

trace2 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2.Corporate,
            name='Corporate')

trace3 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2.Direct,
            name='Direct')

trace4 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2.Groups,
            name='Groups')

trace5 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2['Offline TA/TO'],
            name='Offline TA/TO')

trace6 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2['Online TA'],
            name='Online TA')

#---------------------------------------------------------------------

feature = ["is_canceled", 'market_segment']
data2 = pd.crosstab(data1[feature[0]], data1[feature[1]])
data2['Cancelation'] = data2.index

trace7 = go.Bar(
            x = data2['Cancelation'].index.values,
            y = data2.Aviation,
            name='Aviation')

trace8 = go.Bar(
            x = data2['Cancelation'].index.values,
            y = data2.Complementary,
            name='Complementary')

trace9 = go.Bar(
            x = data2['Cancelation'].index.values,
            y = data2.Corporate,
            name='Corporate')

trace10 = go.Bar(
            x = data2['Cancelation'].index.values,
            y = data2.Direct,
            name='Direct')

trace11 = go.Bar(
            x = data2['Cancelation'].index.values,
            y = data2.Groups,
            name='Groups')

trace12 = go.Bar(
            x = data2['Cancelation'].index.values,
            y = data2['Offline TA/TO'],
            name='Offline TA/TO')

trace13 = go.Bar(
            x = data2['Cancelation'].index.values,
            y = data2['Online TA'],
            name='Online TA')



data = [trace0, trace1, trace2, trace3, trace4, trace5, trace6,
       trace7,trace8,trace9,trace10,trace11,trace12,trace13]

fig = tools.make_subplots(rows=2, 
                          cols=1, 
                          #specs=[[{}, {}], [{'colspan': 1}, None]],
                          subplot_titles=('Market Segment Based on Hotel Type',
                                          'Cancelation Based on Market Segment'))

fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 1)
fig.append_trace(trace3, 1, 1)
fig.append_trace(trace4, 1, 1)
fig.append_trace(trace5, 1, 1)
fig.append_trace(trace6, 1, 1)

fig.append_trace(trace7, 2, 1)
fig.append_trace(trace8, 2, 1)
fig.append_trace(trace9, 2, 1)
fig.append_trace(trace10, 2, 1)
fig.append_trace(trace11, 2, 1)
fig.append_trace(trace12, 2, 1)
fig.append_trace(trace13, 2, 1)

fig['layout'].update(height=750, width=900, title=' ', boxmode='group')
py.iplot(fig, filename='combined-savings')

In [None]:
feature = ["hotel", 'customer_type']
data2 = pd.crosstab(data1[feature[0]], data1[feature[1]])
data2['Hotel'] = data2.index

trace0 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2.Contract,
            name='Contract')

trace1 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2.Group,
            name='Group')

trace2 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2.Transient,
            name='Transient')

trace3 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2['Transient-Party'],
            name='Transient-Party')

#---------------------------------------------------------------------

feature = ["is_canceled", 'customer_type']
data2 = pd.crosstab(data1[feature[0]], data1[feature[1]])
data2['Cancelation'] = data2.index

trace4 = go.Bar(
            x = data2['Cancelation'].index.values,
            y = data2.Contract,
            name='Contract')

trace5 = go.Bar(
            x = data2['Cancelation'].index.values,
            y = data2.Group,
            name='Group')

trace6 = go.Bar(
            x = data2['Cancelation'].index.values,
            y = data2.Transient,
            name='Transient')

trace7 = go.Bar(
            x = data2['Cancelation'].index.values,
            y = data2['Transient-Party'],
            name='Transient-Party')


fig = tools.make_subplots(rows=2, 
                          cols=1, 
                          #specs=[[{}, {}], [{'colspan': 1}, None]],
                          subplot_titles=('Customer Type Based on Hotel Type',
                                          'Cancelation Based on Customer Type'))

fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 1)
fig.append_trace(trace3, 1, 1)

fig.append_trace(trace4, 2, 1)
fig.append_trace(trace5, 2, 1)
fig.append_trace(trace6, 2, 1)
fig.append_trace(trace7, 2, 1)

fig['layout'].update(height=750, width=900, title=' ', boxmode='group')
py.iplot(fig, filename='combined-savings')

In [None]:
feature = ["hotel", 'deposit_type']
data2 = pd.crosstab(data1[feature[0]], data1[feature[1]])
data2['Hotel'] = data2.index

trace0 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2['No Deposit'],
            name='No Deposit')

trace1 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2['Non Refund'],
            name='Non Refund')

trace2 = go.Bar(
            x = data2['Hotel'].index.values,
            y = data2.Refundable,
            name='Refundable')

#---------------------------------------------------------------------

feature = ["is_canceled", 'deposit_type']
data2 = pd.crosstab(data1[feature[0]], data1[feature[1]])
data2['Cancelation'] = data2.index

trace3 = go.Bar(
            x = data2['Cancelation'].index.values,
            y = data2['No Deposit'],
            name='No Deposit')

trace4 = go.Bar(
            x = data2['Cancelation'].index.values,
            y = data2['Non Refund'],
            name='Non Refund')

trace5 = go.Bar(
            x = data2['Cancelation'].index.values,
            y = data2.Refundable,
            name='Refundable')

data = [trace0, trace1, trace2,
        trace3,trace4,trace5]

fig = tools.make_subplots(rows=1, 
                          cols=2, 
                          #specs=[[{}, {}], [{'colspan': 1}, None]],
                          subplot_titles=('Deposit Type Based on Hotel Type',
                                          'Cancelation Based on Deposit Type'))

fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 1)

fig.append_trace(trace3, 1, 2)
fig.append_trace(trace4, 1, 2)
fig.append_trace(trace5, 1, 2)

fig['layout'].update(height=500, width=900, title=' ', boxmode='group')
py.iplot(fig, filename='combined-savings')

In [None]:
from plotly.subplots import make_subplots

feature = ["hotel", 'meal']
data2 = pd.crosstab(data1[feature[0]], data1[feature[1]])

labels = ["BB", "FB", "HB", "SC", "Undefined"]

fig = make_subplots(rows=1, 
                    cols=2, 
                    specs=[[{'type':'domain'}, {'type':'domain'}]])

# Read the counts from the crosstab instead of hard-coding them
fig.add_trace(go.Pie(
                    labels=labels, 
                    values=data2.loc['City Hotel', labels].tolist(), 
                    name="City Hotel"),
                    1, 1)

fig.add_trace(go.Pie(
                    labels=labels, 
                    values=data2.loc['Resort Hotel', labels].tolist(), 
                    name="Resort Hotel"),
                    1, 2)


fig.update_traces(hole=.5, 
                  hoverinfo="label+percent+name")


fig.update_layout(
                  annotations=[dict(text='City Hotel', x=0.18, y=0.5, font_size=15, showarrow=False),
                               dict(text='Resort Hotel', x=0.85, y=0.5, font_size=15, showarrow=False)])

fig.update_traces(textposition='inside')
fig.update_layout(uniformtext_minsize=12, uniformtext_mode='hide')

fig['layout'].update(height=500, width=900, title='Meal Type', boxmode='group')
fig.show()

In [None]:
custom_aggregation = {}
custom_aggregation["arrival_date_day_of_month"] = "count"
data2 = data1.groupby("country").agg(custom_aggregation)
data2.columns = ["Booking Count"]
data2['Country'] = data2.index

labels = data2['Country'].tolist()
values = data2['Booking Count'].tolist()

fig = px.pie(data2, values=values, names=labels)
fig.update_traces(textposition='inside')
fig.update_layout(uniformtext_minsize=12, uniformtext_mode='hide')
fig['layout'].update(height=500, width=700, title='Booking by Country', boxmode='group')
fig.show()

In [None]:
fig = px.box(data1, x="market_segment", y="lead_time", color="market_segment", boxmode="overlay")

fig.update_traces(quartilemethod="inclusive")
fig['layout'].update(height=500, width=750, title='Lead Time Boxplot by Market Segment')
fig.show()

In [None]:
fig = px.box(data1, x="hotel", y="lead_time", color="hotel", boxmode="overlay")

fig['layout'].update(height=500, width=750, title='Lead Time Boxplot by Hotel Type')
fig.update_traces(quartilemethod="inclusive")
fig.show()
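The box plots show lead-time quartiles visually. `groupby(...).describe()` gives the same kind of summary as numbers, which is handy for quoting exact medians. A sketch on made-up rows; pandas interpolates quantiles linearly by default, which may differ slightly from Plotly's `quartilemethod="inclusive"` on real data.

```python
import pandas as pd

# Made-up lead times for two segments
df = pd.DataFrame({
    "market_segment": ["Groups", "Groups", "Groups", "Direct", "Direct", "Direct"],
    "lead_time": [100, 200, 300, 0, 10, 20],
})

# Count, mean, std, min, quartiles and max per segment
summary = df.groupby("market_segment")["lead_time"].describe()
print(summary[["count", "25%", "50%", "75%"]])
```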

In [None]:
custom_aggregation = {}
custom_aggregation["reservation_status_D"] = "count"
data2 = data1.groupby("reserved_room_type").agg(custom_aggregation)
data2.columns = ["Booking Count"]
data2['Reserved Room Type'] = data2.index

trace0 = go.Bar(
            x = data2['Reserved Room Type'].index.values,
            y = data2['Booking Count'],
            name='Nb. of Booking of Reserved Room Type')

custom_aggregation = {}
custom_aggregation["reservation_status_D"] = "count"
data2 = data1.groupby("assigned_room_type").agg(custom_aggregation)
data2.columns = ["Booking Count"]
data2['Assigned Room Type'] = data2.index

trace1 = go.Bar(
            x = data2['Assigned Room Type'].index.values,
            y = data2['Booking Count'],
            name='Nb. of Booking of Assigned Room Type')

data = [trace0, trace1]

fig = tools.make_subplots(rows=2, 
                          cols=1, 
                          #specs=[[{}, {}], [{'colspan': 1}, None]],
                          subplot_titles=('Nb. of Booking of Reserved Room Type',
                                          'Nb. of Booking of Assigned Room Type'))

fig.append_trace(trace0, 1, 1)

fig.append_trace(trace1, 2, 1)

fig['layout'].update(height=650, width=950, title=' ', boxmode='group')
py.iplot(fig, filename='combined-savings')
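The `assigned_room_type` definition notes that the assigned room sometimes differs from the reserved one. The two bar charts show each distribution separately; comparing the columns row by row gives the share of bookings whose room type changed. A sketch on hypothetical rows (the `room_changed` column is introduced here for illustration):

```python
import pandas as pd

# Hypothetical bookings
df = pd.DataFrame({
    "reserved_room_type": ["A", "A", "D", "E", "A"],
    "assigned_room_type": ["A", "C", "D", "F", "A"],
})

# True when the guest got a different room type than reserved
df["room_changed"] = df["reserved_room_type"] != df["assigned_room_type"]
print(df["room_changed"].mean())  # share of bookings with a different room type

# Reserved (rows) against assigned (columns)
print(pd.crosstab(df["reserved_room_type"], df["assigned_room_type"]))
```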

In [None]:
feature = ["reserved_room_type", 'market_segment']
data2 = pd.crosstab(data1[feature[0]], data1[feature[1]])

# One pie per market segment, two pies per row
segments = ['Aviation', 'Complementary', 'Corporate', 'Direct',
            'Groups', 'Offline TA/TO', 'Online TA', 'Undefined']

fig = make_subplots(rows=4, 
                    cols=2, 
                    specs=[[{'type':'domain'}, {'type':'domain'}] for _ in range(4)],
                    subplot_titles=segments)

labels = data2.index.tolist()  # reserved room type codes
for i, segment in enumerate(segments):
    fig.add_trace(go.Pie(
                        labels=labels, 
                        values=data2[segment].tolist(), 
                        name=segment),
                        i // 2 + 1, i % 2 + 1)  # row, col

fig.update_traces(textposition='inside')
fig.update_layout(uniformtext_minsize=12, uniformtext_mode='hide')

fig['layout'].update(height=900, width=700, title='Reserved Room Type Based on Market Segment', boxmode='group')
fig.show()

In [None]:
feature = ["assigned_room_type", 'market_segment']
data2 = pd.crosstab(data1[feature[0]], data1[feature[1]])

# One pie per market segment, two pies per row
segments = ['Aviation', 'Complementary', 'Corporate', 'Direct',
            'Groups', 'Offline TA/TO', 'Online TA', 'Undefined']

fig = make_subplots(rows=4, 
                    cols=2, 
                    specs=[[{'type':'domain'}, {'type':'domain'}] for _ in range(4)],
                    subplot_titles=segments)

labels = data2.index.tolist()  # assigned room type codes
for i, segment in enumerate(segments):
    fig.add_trace(go.Pie(
                        labels=labels, 
                        values=data2[segment].tolist(), 
                        name=segment),
                        i // 2 + 1, i % 2 + 1)  # row, col

fig.update_traces(textposition='inside')
fig.update_layout(uniformtext_minsize=12, uniformtext_mode='hide')

fig['layout'].update(height=900, width=700, title='Assigned Room Type Based on Market Segment', boxmode='group')
fig.show()

In [None]:
fig = px.box(data1, x="reserved_room_type", y="booking_changes", color="reserved_room_type", boxmode="overlay")

fig['layout'].update(height=500, width=850, title='Booking Changes based on Room Type')
fig.update_traces(quartilemethod="inclusive")
fig.show()

In [None]:
fig = px.box(data1, x="reserved_room_type", y="days_in_waiting_list", color="reserved_room_type", boxmode="overlay")

fig['layout'].update(height=500, width=850, title='Waiting List based on Room Type')
fig.update_traces(quartilemethod="inclusive")
fig.show()

In [None]:
fig = px.box(data1, x="market_segment", y="total_of_special_requests", color="market_segment", boxmode="overlay")

fig['layout'].update(height=500, width=850, title='Special Request Based on Market Segment')
fig.update_traces(quartilemethod="inclusive")
fig.show()
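The box plot shows the spread of special requests in each segment. A single number per segment, the share of bookings with at least one special request, makes the comparison behind the third conclusion easier to read. A sketch on made-up rows; applying it to the notebook's data would mean replacing `df` with `data1` (untested here).

```python
import pandas as pd

# Made-up bookings
df = pd.DataFrame({
    "market_segment": ["Online TA", "Online TA", "Groups", "Groups", "Direct"],
    "total_of_special_requests": [2, 1, 0, 0, 1],
})

# Share of bookings per segment with one or more special requests
share = (df["total_of_special_requests"] > 0).groupby(df["market_segment"]).mean()
print(share)
```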

In [None]:
fig = make_subplots(rows=2, 
                    cols=1, 
                    subplot_titles=('Weekly Number of Booking',
                                    'Monthly Number of Booking'))

custom_aggregation = {}
custom_aggregation["reservation_status_D"] = "count"
data1 = data1.set_index(pd.DatetimeIndex(data1['reservation_status_date']))
data2 = data1.resample('W').agg(custom_aggregation)
data2.columns = ["Booking Count"]
data2['Date'] = data2.index

x = data2['Date'].tolist()
y = data2['Booking Count'].tolist()


fig.add_trace(go.Scatter(x=x, y=y,name='Weekly Nb. of Book'), 1, 1)

custom_aggregation = {}
custom_aggregation["reservation_status_D"] = "count"
data1 = data1.set_index(pd.DatetimeIndex(data1['reservation_status_date']))
data2 = data1.resample('M').agg(custom_aggregation)
data2.columns = ["Booking Count"]
data2['Date'] = data2.index

x = data2['Date'].tolist()
y = data2['Booking Count'].tolist()

fig.add_trace(go.Scatter(x=x, y=y, mode='lines+markers',name='Monthly Nb. of Book'), 2, 1)

fig['layout'].update(height=700, width=900, title='Booking Time Series')
fig.show()
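The weekly and monthly series come from `resample`. In pandas, `'W'` is an alias for weeks anchored on Sunday (`'W-SUN'`): each bin covers Monday through Sunday and is labeled with the Sunday that closes it, while `'M'` bins are labeled with the last day of the month. A sketch on four hypothetical status dates (1 January 2017 was a Sunday):

```python
import pandas as pd

# Four hypothetical reservation status dates
dates = pd.to_datetime(["2017-01-02", "2017-01-04", "2017-01-08", "2017-01-09"])
df = pd.DataFrame({"reservation_status_D": dates.day}, index=dates)

# Mon 2017-01-02 .. Sun 2017-01-08 fall in the bin labeled 2017-01-08
weekly = df.resample("W").agg({"reservation_status_D": "count"})
print(weekly)
```

pandas 2.2 and later warn that `'M'` is deprecated in favor of `'ME'`; the monthly cells still run but may emit a FutureWarning.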

In [None]:
fig = make_subplots(rows=2, 
                    cols=1, 
                    subplot_titles=('Weekly Number of Arrivals',
                                    'Monthly Number of Arrivals'))

# Index by the actual arrival date built from the numeric year/month/day columns;
# indexing by reservation_status_date would only repeat the booking chart above.
arrival_date = pd.to_datetime(
    data1[['arrival_date_year', 'arrival_date_month', 'arrival_date_day_of_month']]
    .rename(columns={'arrival_date_year': 'year',
                     'arrival_date_month': 'month',
                     'arrival_date_day_of_month': 'day'}))
arrivals = data1.set_index(pd.DatetimeIndex(arrival_date))  # data1 itself is unchanged

custom_aggregation = {}
custom_aggregation["arrival_date_day_of_month"] = "count"  # one row per booking

data2 = arrivals.resample('W').agg(custom_aggregation)
data2.columns = ["Nb. of Arrivals"]
fig.add_trace(go.Scatter(x=data2.index, y=data2["Nb. of Arrivals"], name='Weekly Nb. of Arrivals'), 1, 1)

data2 = arrivals.resample('M').agg(custom_aggregation)
data2.columns = ["Nb. of Arrivals"]
fig.add_trace(go.Scatter(x=data2.index, y=data2["Nb. of Arrivals"], mode='lines+markers', name='Monthly Nb. of Arrivals'), 2, 1)

fig['layout'].update(height=700, width=900, title='Arrival Time Series')
fig.show()

In [None]:
fig = make_subplots(rows=2, 
                    cols=1, 
                    subplot_titles=('Weekly Average Rate',
                                    'Monthly Average Rate'))

custom_aggregation = {}
custom_aggregation["adr"] = "mean"
data1 = data1.set_index(pd.DatetimeIndex(data1['reservation_status_date']))
data2 = data1.resample('W').agg(custom_aggregation)
data2.columns = ["ADR"]
data2['Date'] = data2.index

x = data2['Date'].tolist()
y = data2['ADR'].tolist()


fig.add_trace(go.Scatter(x=x, y=y, name='Weekly Average Rate'), 1, 1)

custom_aggregation = {}
custom_aggregation["adr"] = "mean"
data1 = data1.set_index(pd.DatetimeIndex(data1['reservation_status_date']))
data2 = data1.resample('M').agg(custom_aggregation)
data2.columns = ["ADR"]
data2['Date'] = data2.index

x = data2['Date'].tolist()
y = data2['ADR'].tolist()

fig.add_trace(go.Scatter(x=x, y=y, mode='lines+markers', name='Monthly Average Rate'), 2, 1)

fig['layout'].update(height=700, width=900, title='Average Rate Time Series')
fig.show()

In [None]:
fig = make_subplots(rows=2, 
                    cols=1, 
                    subplot_titles=('Weekly Total Stays',
                                    'Monthly Total Stays'))

custom_aggregation = {}
custom_aggregation["stays_in_weekend_nights"] = "sum"
custom_aggregation["stays_in_week_nights"] = "sum"
data1 = data1.set_index(pd.DatetimeIndex(data1['reservation_status_date']))
data2 = data1.resample('W').agg(custom_aggregation)
data2.columns = ["Stays Weekend", 'Stays Weekdays']
data2['Date'] = data2.index

x = data2['Date'].tolist()
y = data2['Stays Weekend'].tolist()
z = data2['Stays Weekdays'].tolist()

fig.add_trace(go.Scatter(x=x, y=y, name='Weekly Stays in Weekend'), 1, 1)
fig.add_trace(go.Scatter(x=x, y=z, name='Weekly Stays in Weekdays'), 1, 1)

custom_aggregation = {}
custom_aggregation["stays_in_weekend_nights"] = "sum"
custom_aggregation["stays_in_week_nights"] = "sum"
data1 = data1.set_index(pd.DatetimeIndex(data1['reservation_status_date']))
data2 = data1.resample('M').agg(custom_aggregation)
data2.columns = ["Stays Weekend", 'Stays Weekdays']
data2['Date'] = data2.index

x = data2['Date'].tolist()
y = data2['Stays Weekend'].tolist()
z = data2['Stays Weekdays'].tolist()

fig.add_trace(go.Scatter(x=x, y=y, name='Monthly Stays in Weekend', mode='lines+markers'), 2, 1)
fig.add_trace(go.Scatter(x=x, y=z, name='Monthly Stays in Weekdays', mode='lines+markers'), 2, 1)


fig['layout'].update(height=700, width=900, title='Total Stays Time Series')
fig.show()

Conclusion:
1. The Average Rate Time Series chart shows that the lowest rate falls in the last week of November, and in every year the rate in that period is low compared to other months.
2. The optimal length of stay to get the best daily rate is 7 days, starting around November 29 and ending around December 6.
3. Can we predict whether a hotel is likely to receive a disproportionately high number of special requests? Yes: bookings in almost every market segment usually include special requests.
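Conclusion 2 picks the 7-day stretch with the lowest rate. On a daily ADR series, a 7-day rolling mean followed by `idxmin` finds such a window mechanically. A minimal sketch on made-up daily rates (illustrative numbers, not the notebook's results); on the real data the daily series could be built with something like `data1['adr'].resample('D').mean()`, which is an assumption rather than a cell in this notebook.

```python
import pandas as pd

# Made-up daily average rates: a cheaper 7-day stretch in the middle
days = pd.date_range("2016-11-20", periods=20, freq="D")
adr = pd.Series([100.0] * 9 + [60.0] * 7 + [100.0] * 4, index=days)

# Mean rate over each 7-day window, labeled by the window's last day
weekly_mean = adr.rolling(7).mean()
end = weekly_mean.idxmin()            # last day of the cheapest window
start = end - pd.Timedelta(days=6)    # first day of that window
print(start.date(), "to", end.date(), weekly_mean.min())
```

The first six values of `weekly_mean` are NaN (incomplete windows) and are skipped by `idxmin`.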


Don't forget to upvote! Thank you! :)