## Hotel booking demand

The Hotel Booking demand dataset contains booking information for a city hotel and a resort hotel. It includes information such as booking time, length of stay, number of adults, children/babies, number of available parking spaces, among other things. This dataset is ideal for anyone looking to practice their exploratory data analysis (EDA) or get started in building predictive models.

### Abstract
This data article describes two datasets with hotel demand data. One of the hotels (H1) is a resort hotel and the other (H2) is a city hotel. Both datasets share the same structure, with 31 variables describing the 40,060 observations of H1 and the 79,330 observations of H2. Each observation represents a hotel booking. Both datasets comprise bookings due to arrive between the 1st of July 2015 and the 31st of August 2017, including bookings that effectively arrived and bookings that were canceled. Since this is real hotel data, all data elements pertaining to hotel or customer identification were deleted. Due to the scarcity of real business data for scientific and educational purposes, these datasets can play an important role in research and education in revenue management, machine learning, or data mining, as well as in other fields.

### Context
Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay in order to get the best daily rate? What if you wanted to predict whether or not a hotel was likely to receive a disproportionately high number of special requests?

This hotel booking dataset can help you explore those questions!

### Content
This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things.

All personally identifying information has been removed from the data.

Read more about the dataset at https://www.sciencedirect.com/science/article/pii/S2352340918315191#ab0010

### Acknowledgements
The data is originally from the article Hotel Booking Demand Datasets, written by Nuno Antonio, Ana Almeida, and Luis Nunes for Data in Brief, Volume 22, February 2019.

The data was downloaded and cleaned by Thomas Mock and Antoine Bichat for #TidyTuesday during the week of February 11th, 2020.

| Variable | Type | Description | Source/Engineering |
|---|---|---|---|
| ADR | Numeric | Average Daily Rate as defined by [5] | BO, BL and TR / Calculated by dividing the sum of all lodging transactions by the total number of staying nights |
| Adults | Integer | Number of adults | BO and BL |
| Agent | Categorical | ID of the travel agency that made the booking | BO and BL |
| ArrivalDateDayOfMonth | Integer | Day of the month of the arrival date | BO and BL |
| ArrivalDateMonth | Categorical | Month of arrival date with 12 categories: “January” to “December” | BO and BL |
| ArrivalDateWeekNumber | Integer | Week number of the arrival date | BO and BL |
| ArrivalDateYear | Integer | Year of arrival date | BO and BL |
| AssignedRoomType | Categorical | Code for the type of room assigned to the booking. Sometimes the assigned room type differs from the reserved room type due to hotel operation reasons (e.g. overbooking) or by customer request. Code is presented instead of designation for anonymity reasons | BO and BL |
| Babies | Integer | Number of babies | BO and BL |
| BookingChanges | Integer | Number of changes/amendments made to the booking from the moment the booking was entered on the PMS until the moment of check-in or cancellation | BO and BL / Calculated by adding the number of unique iterations that change some of the booking attributes, namely: persons, arrival date, nights, reserved room type or meal |
| Children | Integer | Number of children | BO and BL / Sum of both payable and non-payable children |
| Company | Categorical | ID of the company/entity that made the booking or responsible for paying the booking. ID is presented instead of designation for anonymity reasons | BO and BL |
| Country | Categorical | Country of origin. Categories are represented in the ISO 3155–3:2013 format [6] | BO, BL and NT |
| CustomerType | Categorical | Type of booking, assuming one of four categories: Contract – when the booking has an allotment or other type of contract associated to it; Group – when the booking is associated to a group; Transient – when the booking is not part of a group or contract, and is not associated to other transient booking; Transient-party – when the booking is transient, but is associated to at least other transient booking | BO and BL |
| DaysInWaitingList | Integer | Number of days the booking was in the waiting list before it was confirmed to the customer | BO / Calculated by subtracting the date the booking was confirmed to the customer from the date the booking entered on the PMS |
| DepositType | Categorical | Indication on if the customer made a deposit to guarantee the booking. This variable can assume three categories: No Deposit – no deposit was made; Non Refund – a deposit was made in the value of the total stay cost; Refundable – a deposit was made with a value under the total cost of stay | BO and TR / Value calculated based on the payments identified for the booking in the transaction (TR) table before the booking's arrival or cancellation date. In case no payments were found the value is “No Deposit”. If the payment was equal or exceeded the total cost of stay, the value is set as “Non Refund”. Otherwise the value is set as “Refundable” |
| DistributionChannel | Categorical | Booking distribution channel. The term “TA” means “Travel Agents” and “TO” means “Tour Operators” | BO, BL and DC |
| IsCanceled | Categorical | Value indicating if the booking was canceled (1) or not (0) | BO |
| IsRepeatedGuest | Categorical | Value indicating if the booking name was from a repeated guest (1) or not (0) | BO, BL and C / Variable created by verifying if a profile was associated with the booking customer. If so, and if the customer profile creation date was prior to the creation date for the booking on the PMS database it was assumed the booking was from a repeated guest |
| LeadTime | Integer | Number of days that elapsed between the entering date of the booking into the PMS and the arrival date | BO and BL / Subtraction of the entering date from the arrival date |
| MarketSegment | Categorical | Market segment designation. In categories, the term “TA” means “Travel Agents” and “TO” means “Tour Operators” | BO, BL and MS |
| Meal | Categorical | Type of meal booked. Categories are presented in standard hospitality meal packages: Undefined/SC – no meal package; BB – Bed & Breakfast; HB – Half board (breakfast and one other meal – usually dinner); FB – Full board (breakfast, lunch and dinner) | BO, BL and ML |
| PreviousBookingsNotCanceled | Integer | Number of previous bookings not cancelled by the customer prior to the current booking | BO and BL / In case there was no customer profile associated with the booking, the value is set to 0. Otherwise, the value is the number of bookings with the same customer profile created before the current booking and not canceled |
| PreviousCancellations | Integer | Number of previous bookings that were cancelled by the customer prior to the current booking | BO and BL / In case there was no customer profile associated with the booking, the value is set to 0. Otherwise, the value is the number of bookings with the same customer profile created before the current booking and canceled |
| RequiredCarParkingSpaces | Integer | Number of car parking spaces required by the customer | BO and BL |
| ReservationStatus | Categorical | Reservation last status, assuming one of three categories: Canceled – booking was canceled by the customer; Check-Out – customer has checked in but already departed; No-Show – customer did not check-in and did inform the hotel of the reason why | BO |
| ReservationStatusDate | Date | Date at which the last status was set. This variable can be used in conjunction with the ReservationStatus to understand when was the booking canceled or when did the customer checked-out of the hotel | BO |
| ReservedRoomType | Categorical | Code of room type reserved. Code is presented instead of designation for anonymity reasons | BO and BL |
| StaysInWeekendNights | Integer | Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel | BO and BL / Calculated by counting the number of weekend nights from the total number of nights |
| StaysInWeekNights | Integer | Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel | BO and BL / Calculated by counting the number of week nights from the total number of nights |
| TotalOfSpecialRequests | Integer | Number of special requests made by the customer (e.g. twin bed or high floor) | BO and BL / Sum of all special requests |

In [None]:
from IPython.display import Image
Image(filename='NLP.PNG')

!pip install pivottablejs

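In [None]:
# Interactive pivot table. Run this cell only after all_data has been loaded
# with pd.read_csv further down; the original call used an undefined name `data`.
from pivottablejs import pivot_ui
pivot_ui(all_data)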

In [None]:
# Importing the pandas and numpy package
import pandas as pd
import numpy as np

In [6]:
# reading the contents of the data set
all_data = pd.read_csv('hotel_bookings.csv')

In [7]:
all_data.head()

Unnamed: 0,hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,...,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
0,Resort Hotel,0,342,2015,July,27,1,0,0,2,...,No Deposit,,,0,Transient,0.0,0,0,Check-Out,2015-07-01
1,Resort Hotel,0,737,2015,July,27,1,0,0,2,...,No Deposit,,,0,Transient,0.0,0,0,Check-Out,2015-07-01
2,Resort Hotel,0,7,2015,July,27,1,0,1,1,...,No Deposit,,,0,Transient,75.0,0,0,Check-Out,2015-07-02
3,Resort Hotel,0,13,2015,July,27,1,0,1,1,...,No Deposit,304.0,,0,Transient,75.0,0,0,Check-Out,2015-07-02
4,Resort Hotel,0,14,2015,July,27,1,0,2,2,...,No Deposit,240.0,,0,Transient,98.0,0,1,Check-Out,2015-07-03


In [8]:
all_data.tail()

Unnamed: 0,hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,...,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
119385,City Hotel,0,23,2017,August,35,30,2,5,2,...,No Deposit,394.0,,0,Transient,96.14,0,0,Check-Out,2017-09-06
119386,City Hotel,0,102,2017,August,35,31,2,5,3,...,No Deposit,9.0,,0,Transient,225.43,0,2,Check-Out,2017-09-07
119387,City Hotel,0,34,2017,August,35,31,2,5,2,...,No Deposit,9.0,,0,Transient,157.71,0,4,Check-Out,2017-09-07
119388,City Hotel,0,109,2017,August,35,31,2,5,2,...,No Deposit,89.0,,0,Transient,104.4,0,0,Check-Out,2017-09-07
119389,City Hotel,0,205,2017,August,35,29,2,7,2,...,No Deposit,9.0,,0,Transient,151.2,0,2,Check-Out,2017-09-07


# Shape of the data-set


The way a data-set is arranged into rows and columns is referred to as the shape of data.

It is good practice to check that the complete data-set was imported into the pandas data frame from our data source, i.e. the CSV file in our case. We can do this by checking the number of rows and columns, which is the shape property of the data frame.

## Shape is used to get the dimensions of the dataframe.

In [9]:
# Get the shape of the dataset: 119390 rows and 32 columns (before any clean-up)
all_data.shape

(119390, 32)

## Size is used to get the number of elements in the dataframe.

In [None]:
# size is the number of rows multiplied by the number of columns
print(all_data.size)
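
As a quick sanity check on how `shape` and `size` relate, here is a minimal sketch. It uses a tiny made-up frame called `demo` (an assumption for illustration, not the real `hotel_bookings.csv`), so it runs on its own.

```python
import pandas as pd

# Tiny stand-in frame with made-up values; the real notebook uses all_data
demo = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel"],
    "adults": [2, 1, 2],
    "adr": [75.0, 96.14, 104.4],
})

n_rows, n_cols = demo.shape   # shape is a (rows, columns) tuple
n_cells = demo.size           # size is the total number of cells

# size should always equal rows * columns
print(n_rows, n_cols, n_cells, n_rows * n_cols == n_cells)
```

The same comparison on `all_data` is a cheap way to confirm that nothing unexpected happened during import.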

In [10]:
# Get the column types and non-null counts to see whether any column
# (for example the target) needs its type changed with astype().
all_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119390 entries, 0 to 119389
Data columns (total 32 columns):
hotel                             119390 non-null object
is_canceled                       119390 non-null int64
lead_time                         119390 non-null int64
arrival_date_year                 119390 non-null int64
arrival_date_month                119390 non-null object
arrival_date_week_number          119390 non-null int64
arrival_date_day_of_month         119390 non-null int64
stays_in_weekend_nights           119390 non-null int64
stays_in_week_nights              119390 non-null int64
adults                            119390 non-null int64
children                          119386 non-null float64
babies                            119390 non-null int64
meal                              119390 non-null object
country                           118902 non-null object
market_segment                    119390 non-null object
distribution_channel              119390 n

## Clean up the data
The first step is figuring out what we need to clean. In practice, many of the things that need cleaning only show up as errors while you perform operations on the data. Based on each error, you decide how to clean the data.

In [11]:
# Count the missing values per column with isna() to find the columns that need clean-up.
# From the output below, the children, country, agent and company columns contain missing values.
all_data.isna().sum()

hotel                                  0
is_canceled                            0
lead_time                              0
arrival_date_year                      0
arrival_date_month                     0
arrival_date_week_number               0
arrival_date_day_of_month              0
stays_in_weekend_nights                0
stays_in_week_nights                   0
adults                                 0
children                               4
babies                                 0
meal                                   0
country                              488
market_segment                         0
distribution_channel                   0
is_repeated_guest                      0
previous_cancellations                 0
previous_bookings_not_canceled         0
reserved_room_type                     0
assigned_room_type                     0
booking_changes                        0
deposit_type                           0
agent                              16340
company         
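
Raw counts are easier to judge next to the share of rows they represent. The sketch below shows one way to build such a summary. It assumes a small made-up frame named `demo`; the column names mirror the dataset but the values are invented.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for all_data, with gaps in some columns
demo = pd.DataFrame({
    "children": [0.0, np.nan, 1.0, 0.0],
    "country": ["PRT", "GBR", None, "PRT"],
    "agent": [np.nan, np.nan, 304.0, 240.0],
    "adults": [2, 2, 1, 1],
})

missing = demo.isna().sum()                       # count of NaN per column
missing_pct = (demo.isna().mean() * 100).round(1)  # share of NaN per column, in percent

# Keep only the columns that actually have gaps, worst first
summary = pd.DataFrame({"missing": missing, "percent": missing_pct})
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)
print(summary)
```

A column that is mostly empty is usually a candidate for dropping or for a placeholder value, while a column with a handful of gaps can often be filled.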

## Drop rows of NAN

In [13]:
# Show the rows that contain at least one NaN value
rows_with_nan = all_data[all_data.isna().any(axis=1)]
display(rows_with_nan.head())

# Drop only the rows in which every column is NaN; rows with some missing values are kept
all_data = all_data.dropna(how='all')
all_data.head()

Unnamed: 0,hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,...,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
0,Resort Hotel,0,342,2015,July,27,1,0,0,2,...,No Deposit,,,0,Transient,0.0,0,0,Check-Out,2015-07-01
1,Resort Hotel,0,737,2015,July,27,1,0,0,2,...,No Deposit,,,0,Transient,0.0,0,0,Check-Out,2015-07-01
2,Resort Hotel,0,7,2015,July,27,1,0,1,1,...,No Deposit,,,0,Transient,75.0,0,0,Check-Out,2015-07-02
3,Resort Hotel,0,13,2015,July,27,1,0,1,1,...,No Deposit,304.0,,0,Transient,75.0,0,0,Check-Out,2015-07-02
4,Resort Hotel,0,14,2015,July,27,1,0,2,2,...,No Deposit,240.0,,0,Transient,98.0,0,1,Check-Out,2015-07-03


Unnamed: 0,hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,...,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
0,Resort Hotel,0,342,2015,July,27,1,0,0,2,...,No Deposit,,,0,Transient,0.0,0,0,Check-Out,2015-07-01
1,Resort Hotel,0,737,2015,July,27,1,0,0,2,...,No Deposit,,,0,Transient,0.0,0,0,Check-Out,2015-07-01
2,Resort Hotel,0,7,2015,July,27,1,0,1,1,...,No Deposit,,,0,Transient,75.0,0,0,Check-Out,2015-07-02
3,Resort Hotel,0,13,2015,July,27,1,0,1,1,...,No Deposit,304.0,,0,Transient,75.0,0,0,Check-Out,2015-07-02
4,Resort Hotel,0,14,2015,July,27,1,0,2,2,...,No Deposit,240.0,,0,Transient,98.0,0,1,Check-Out,2015-07-03
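
Because `dropna(how='all')` only removes rows that are entirely empty, it helps to compare it with the other options. The sketch below is a hedged illustration on a made-up frame `demo`. Filling missing `agent` values with 0 to mean "no agent" is an assumption made here for the example, not something the dataset documentation prescribes.

```python
import numpy as np
import pandas as pd

# Made-up rows; row 2 is completely empty
demo = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", None, "City Hotel"],
    "children": [0.0, np.nan, np.nan, 1.0],
    "agent": [np.nan, 9.0, np.nan, 9.0],
})

dropped_all = demo.dropna(how="all")               # removes only the fully empty row
dropped_any = demo.dropna(how="any")               # removes every row with at least one NaN
dropped_subset = demo.dropna(subset=["children"])  # removes rows missing a specific column

# Alternative to dropping: fill gaps with a placeholder (assumed convention: 0)
filled = demo.fillna({"children": 0, "agent": 0})

print(len(dropped_all), len(dropped_any), len(dropped_subset))
```

With columns such as `agent` and `company`, where many bookings have no value, `how="any"` would discard most of the data, so filling or dropping by subset is often the gentler choice.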


In [None]:
all_data.tail()

## Get rid of text in reservation status date column

In [14]:
# Safeguard: drop any row whose reservation_status_date starts with the text 'Or' instead of a date
all_data = all_data[all_data['reservation_status_date'].str[0:2]!='Or']

In [15]:
# Get the head after getting rid of text in the date column
all_data.head(5)

Unnamed: 0,hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,...,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
0,Resort Hotel,0,342,2015,July,27,1,0,0,2,...,No Deposit,,,0,Transient,0.0,0,0,Check-Out,2015-07-01
1,Resort Hotel,0,737,2015,July,27,1,0,0,2,...,No Deposit,,,0,Transient,0.0,0,0,Check-Out,2015-07-01
2,Resort Hotel,0,7,2015,July,27,1,0,1,1,...,No Deposit,,,0,Transient,75.0,0,0,Check-Out,2015-07-02
3,Resort Hotel,0,13,2015,July,27,1,0,1,1,...,No Deposit,304.0,,0,Transient,75.0,0,0,Check-Out,2015-07-02
4,Resort Hotel,0,14,2015,July,27,1,0,2,2,...,No Deposit,240.0,,0,Transient,98.0,0,1,Check-Out,2015-07-03
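
Checking for one specific text prefix only catches that one kind of bad value. A more general approach, sketched below, is to try parsing the column as dates and treat anything that fails as text. The frame `demo` and the stray value `"Order Date"` are made up for illustration.

```python
import pandas as pd

# Made-up column with one non-date entry
demo = pd.DataFrame({
    "reservation_status_date": ["2015-07-01", "2015-07-02", "Order Date", "2017-09-07"],
})

# errors="coerce" turns anything that is not a valid date into NaT
parsed = pd.to_datetime(
    demo["reservation_status_date"], format="%Y-%m-%d", errors="coerce"
)

bad_rows = demo[parsed.isna()]   # rows whose value could not be parsed
clean = demo[parsed.notna()].assign(
    reservation_status_date=parsed[parsed.notna()]
)

print(bad_rows)
print(clean.dtypes)
```

This both removes the stray text and leaves the column with a proper datetime type, which is convenient for later grouping by month or year.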


## Make columns correct type

In [16]:
# Get the tail after getting rid of text in the date column
all_data.tail(5)

Unnamed: 0,hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,...,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
119385,City Hotel,0,23,2017,August,35,30,2,5,2,...,No Deposit,394.0,,0,Transient,96.14,0,0,Check-Out,2017-09-06
119386,City Hotel,0,102,2017,August,35,31,2,5,3,...,No Deposit,9.0,,0,Transient,225.43,0,2,Check-Out,2017-09-07
119387,City Hotel,0,34,2017,August,35,31,2,5,2,...,No Deposit,9.0,,0,Transient,157.71,0,4,Check-Out,2017-09-07
119388,City Hotel,0,109,2017,August,35,31,2,5,2,...,No Deposit,89.0,,0,Transient,104.4,0,0,Check-Out,2017-09-07
119389,City Hotel,0,205,2017,August,35,29,2,7,2,...,No Deposit,9.0,,0,Transient,151.2,0,2,Check-Out,2017-09-07


In [17]:
# Display the current data types of the columns
all_data.dtypes

hotel                              object
is_canceled                         int64
lead_time                           int64
arrival_date_year                   int64
arrival_date_month                 object
arrival_date_week_number            int64
arrival_date_day_of_month           int64
stays_in_weekend_nights             int64
stays_in_week_nights                int64
adults                              int64
children                          float64
babies                              int64
meal                               object
country                            object
market_segment                     object
distribution_channel               object
is_repeated_guest                   int64
previous_cancellations              int64
previous_bookings_not_canceled      int64
reserved_room_type                 object
assigned_room_type                 object
booking_changes                     int64
deposit_type                       object
agent                             
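
Once the current types are known, columns can be converted with `astype()`. The following is a minimal sketch on a made-up frame `demo`. The choice of target types (nullable `Int64` for `children`, `category` for the month, `bool` for the flag) is an assumption for illustration, not a step the original notebook ran.

```python
import numpy as np
import pandas as pd

# Made-up values that mimic the dtypes shown above
demo = pd.DataFrame({
    "children": [0.0, 1.0, np.nan, 2.0],          # float64 because of the NaN
    "arrival_date_month": ["July", "July", "August", "August"],
    "is_canceled": [0, 1, 0, 1],
})

converted = demo.assign(
    # Nullable integer type keeps the missing value instead of failing
    children=demo["children"].astype("Int64"),
    # Category saves memory for columns with few distinct values
    arrival_date_month=demo["arrival_date_month"].astype("category"),
    # Boolean flag is easier to read than 0/1
    is_canceled=demo["is_canceled"].astype(bool),
)

print(converted.dtypes)
```

A plain `astype('int64')` on `children` would raise an error while the column still contains NaN, which is why the nullable type is used in this sketch.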

## Each column is considered a variable with multiple distinct values across which we can derive certain insights and prepare reports from our data set.

## Pivot Table

#### Here we decided to shorten our data frame for further analysis by only keeping a few columns across which we want to analyze the data by creating a pivot table.
#### We created a list of columns and created another smaller data frame on which we can reshape our data and do further analysis. This is a handy technique to stay focused on the columns we have to do further analysis on and prevent unnecessary processing load.

In [59]:
# Columns kept for the shorter analysis frame
short_columns = [
    'hotel', 'is_canceled', 'adr', 'adults', 'arrival_date_month',
    'arrival_date_week_number', 'arrival_date_year', 'babies',
    'booking_changes', 'children', 'days_in_waiting_list', 'lead_time',
    'previous_bookings_not_canceled', 'previous_cancellations',
    'required_car_parking_spaces', 'stays_in_weekend_nights',
    'stays_in_week_nights', 'total_of_special_requests', 'reserved_room_type',
    'is_repeated_guest', 'country', 'customer_type', 'reservation_status',
]
all_data_short = pd.DataFrame(all_data, columns=short_columns)
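
With a shortened frame in hand, a pivot table can be built with `pd.pivot_table`. The sketch below is only an illustration on a made-up frame `demo`. The choice of `hotel` as rows, `arrival_date_year` as columns and mean `adr` as the value is an assumption, not the report the notebook goes on to build.

```python
import pandas as pd

# Made-up bookings with the same column names as all_data_short
demo = pd.DataFrame({
    "hotel": ["Resort Hotel", "Resort Hotel", "City Hotel", "City Hotel", "City Hotel"],
    "arrival_date_year": [2015, 2016, 2015, 2016, 2016],
    "adr": [75.0, 98.0, 96.0, 104.0, 150.0],
})

# One row per hotel, one column per year, each cell holding the mean daily rate
mean_adr = pd.pivot_table(
    demo,
    index="hotel",
    columns="arrival_date_year",
    values="adr",
    aggfunc="mean",
)

print(mean_adr)
```

Swapping `values` or `aggfunc` (for example `"sum"` on `total_of_special_requests`) reshapes the same data to answer a different question.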

In [71]:
data2 = all_data[['hotel','is_canceled','country','adr','adults', 'reservation_status']]

In [72]:
data2.head()

Unnamed: 0,hotel,is_canceled,country,adr,adults,reservation_status
0,Resort Hotel,0,PRT,0.0,2,Check-Out
1,Resort Hotel,0,PRT,0.0,2,Check-Out
2,Resort Hotel,0,GBR,75.0,1,Check-Out
3,Resort Hotel,0,GBR,75.0,1,Check-Out
4,Resort Hotel,0,GBR,98.0,2,Check-Out


In [75]:
data2.reservation_status.value_counts()

Check-Out    75166
Canceled     43017
No-Show       1207
Name: reservation_status, dtype: int64

In [84]:
CheckOut, Canceled, NoShow = data2.reservation_status.value_counts()

In [79]:
data2.hotel.value_counts()

City Hotel      79330
Resort Hotel    40060
Name: hotel, dtype: int64

In [85]:
data2['hotel'].value_counts()

City Hotel      79330
Resort Hotel    40060
Name: hotel, dtype: int64

In [80]:
city_hotel, resort_hotel = data2.hotel.value_counts()

In [81]:
city_hotel

79330
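
Unpacking `value_counts()` into variables depends on the order of the result, which is sorted by frequency and could change with different data. A safer pattern, sketched below on a made-up series, is to look each count up by its label.

```python
import pandas as pd

# Made-up reservation statuses
status = pd.Series(
    ["Check-Out", "Canceled", "Check-Out", "No-Show", "Check-Out", "Canceled"],
    name="reservation_status",
)

counts = status.value_counts()

# Look up by label so the result does not depend on sort order;
# .get returns the default (0) if a label never occurs
check_out = counts.get("Check-Out", 0)
canceled = counts.get("Canceled", 0)
no_show = counts.get("No-Show", 0)

print(check_out, canceled, no_show)
```

The same lookup works for the hotel counts, for example `counts.get('City Hotel', 0)`.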

In [60]:
all_data_short['hotel'].unique()

array(['Resort Hotel', 'City Hotel'], dtype=object)

#### Add num_Checkout column

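In [None]:
# The first two characters of reservation_status ('Ch', 'Ca', 'No') cannot be cast to int,
# so flag checked-out bookings directly: 1 for 'Check-Out', 0 otherwise
all_data_short['num_Checkout'] = (all_data_short['reservation_status'] == 'Check-Out').astype('int32')
all_data_short.tail()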

#### Prepare a report across both hotels, aggregating is_canceled, total_of_special_requests, reservation_status and days_in_waiting_list for each.

In [None]:
## room type with the busiest month
## 
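
One possible shape for such a report is a `groupby` with named aggregations for the numeric columns, plus a cross-tabulation for the categorical `reservation_status`. The sketch below uses a made-up frame `demo`. The output column names (`bookings`, `cancel_rate` and so on) are hypothetical labels chosen for this example.

```python
import pandas as pd

# Made-up bookings with the columns named in the report heading
demo = pd.DataFrame({
    "hotel": ["Resort Hotel", "Resort Hotel", "City Hotel", "City Hotel", "City Hotel"],
    "is_canceled": [0, 1, 0, 1, 1],
    "total_of_special_requests": [1, 0, 2, 0, 1],
    "days_in_waiting_list": [0, 0, 10, 0, 2],
    "reservation_status": ["Check-Out", "Canceled", "Check-Out", "Canceled", "No-Show"],
})

# Numeric columns: one summary row per hotel
report = demo.groupby("hotel").agg(
    bookings=("is_canceled", "size"),
    cancellations=("is_canceled", "sum"),
    cancel_rate=("is_canceled", "mean"),
    special_requests=("total_of_special_requests", "sum"),
    mean_days_waiting=("days_in_waiting_list", "mean"),
)

# Categorical column: count of each status per hotel
status_counts = pd.crosstab(demo["hotel"], demo["reservation_status"])

print(report)
print(status_counts)
```

Summing `reservation_status` directly would not make sense because it is text, which is why it is counted separately here.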

# Exploratory Data Analysis (EDA)

In [None]:
# Class imbalance
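
With the counts shown earlier (75,166 check-outs against 43,017 cancellations and 1,207 no-shows), the target is skewed toward bookings that were not canceled. A minimal way to quantify this is sketched below on a made-up `is_canceled` series. The ratio of largest to smallest class is just one common summary, chosen here as an assumption.

```python
import pandas as pd

# Made-up target column: 7 kept bookings, 3 canceled
is_canceled = pd.Series([0, 0, 0, 1, 0, 1, 0, 0, 1, 0], name="is_canceled")

class_counts = is_canceled.value_counts()                # absolute count per class
class_share = is_canceled.value_counts(normalize=True)   # share per class

# Ratio of the majority class to the minority class
imbalance_ratio = class_counts.max() / class_counts.min()

print(class_counts)
print(class_share)
print(round(imbalance_ratio, 2))
```

A ratio well above 1 suggests that plain accuracy can be misleading when a predictive model is evaluated on this target.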

### Question 1: Which hotel (H1: Resort Hotel, H2: City Hotel) received the most special requests? How many bookings were cancelled?

In [57]:
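# Caution: this overwrites the 'hotel' column with a numeric score
# (special requests minus the cancellation flag), so the group-bys in the next cells
# group by that score and not by hotel name. Use a new column name to keep 'hotel' intact.
all_data_short['hotel'] = all_data_short['total_of_special_requests'] - all_data_short['is_canceled']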

In [58]:
all_data_short.groupby(['hotel'])[['total_of_special_requests']].sum()

Unnamed: 0_level_0,total_of_special_requests
hotel,Unnamed: 1_level_1
-1,0
0,7318
1,31640
2,21544
3,6297
4,1226
5,190


In [None]:
# all_data_short.groupby(['hotel']).sum()

In [46]:
all_data_short.groupby(['hotel'])[['is_canceled']].sum()

Unnamed: 0_level_0,is_canceled
hotel,Unnamed: 1_level_1
-1,33556
0,7318
1,2866
2,446
3,36
4,2
5,0


In [None]:
groups = all_data_short.groupby('hotel')

In [53]:
all_data_short['hotel'].unique()

array([ 0,  1, -1,  3,  2,  4,  5], dtype=int64)

In [54]:
all_data['hotel'].unique()

array(['Resort Hotel', 'City Hotel'], dtype=object)

In [68]:
# Group by Hotel
all_data_short.groupby('hotel')[['total_of_special_requests']].sum()

Unnamed: 0_level_0,total_of_special_requests
hotel,Unnamed: 1_level_1
City Hotel,43387
Resort Hotel,24828


In [66]:
resort = groups.get_group('Resort Hotel')
resort.head()

Unnamed: 0,hotel,is_canceled,adr,adults,arrival_date_month,arrival_date_week_number,arrival_date_year,babies,booking_changes,children,...,previous_cancellations,required_car_parking_spaces,stays_in_weekend_nights,stays_in_week_nights,total_of_special_requests,reserved_room_type,is_repeated_guest,country,customer_type,reservation_status
0,Resort Hotel,0,0.0,2,July,27,2015,0,3,0.0,...,0,0,0,0,0,C,0,PRT,Transient,Check-Out
1,Resort Hotel,0,0.0,2,July,27,2015,0,4,0.0,...,0,0,0,0,0,C,0,PRT,Transient,Check-Out
2,Resort Hotel,0,75.0,1,July,27,2015,0,0,0.0,...,0,0,0,1,0,A,0,GBR,Transient,Check-Out
3,Resort Hotel,0,75.0,1,July,27,2015,0,0,0.0,...,0,0,0,1,0,A,0,GBR,Transient,Check-Out
4,Resort Hotel,0,98.0,2,July,27,2015,0,0,0.0,...,0,0,0,2,1,A,0,GBR,Transient,Check-Out


In [67]:
groups.get_group('Resort Hotel')[['total_of_special_requests']].sum()

total_of_special_requests    24828
dtype: int64

In [None]:
# Split the grouped data into one frame per hotel (H1 = Resort Hotel, H2 = City Hotel)
get_H1 = groups.get_group('Resort Hotel')
get_H2 = groups.get_group('City Hotel')

In [None]:
get_H1

In [None]:
get_H2

In [None]:
# get the number of bookings for the Resort Hotel
len(get_H1)

In [None]:
# get the number of bookings for the City Hotel
len(get_H2)

In [None]:
all_data_short['reservation_status'].unique()
In [None]:
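# Sum only the numeric columns for each number of special requests (text columns cannot be summed meaningfully)
get_H1_request = get_H1.groupby('total_of_special_requests').sum(numeric_only=True)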

display(get_H1_request)

#all_data_H1.groupby(['total_of_special_requests']).sum()

In [None]:
# * all_data['Price Each'].astype('float')      - all_data['is_canceled'].astype('float')
# all_data_H1['Hotel'] = all_data_H1['total_of_special_requests'].astype('int').sum()
# ,nan_df_H2.head()

display(get_H1,get_H1_request.sum())


In [None]:
# Total number of special requests for the Resort Hotel (H1)
total_requests_H1 = get_H1['total_of_special_requests'].astype('int').sum()

In [None]:
total_requests_H1

In [None]:
# Total special requests per hotel
all_data_short.groupby('hotel')[['total_of_special_requests']].sum()

In [None]:
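import matplotlib.pyplot as plt

# groupby sorts month names alphabetically, so reorder them into calendar order
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']
requests_by_month = (all_data.groupby('arrival_date_month')['total_of_special_requests']
                     .sum()
                     .reindex(month_order))

# Arrival date month range
months = range(1, 13)

plt.bar(months, requests_by_month)
plt.xticks(months)
plt.ylabel('Total of special requests')
plt.xlabel('arrival_date_month')
plt.show()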


### Question 2: Special requests and reservation status for each hotel

In [None]:
# Total special requests for each hotel, broken down by reservation status
all_data.groupby(['hotel', 'reservation_status'])['total_of_special_requests'].sum()

### Question 3: What is the nationality of our customers? And which of our hotels did they book?
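
A possible starting point for this question is to count bookings per `country` and then cross-tabulate country against hotel. The sketch below runs on a made-up frame `demo`, with `country` holding codes such as PRT and GBR as in the real data.

```python
import pandas as pd

# Made-up bookings; one row has no country recorded
demo = pd.DataFrame({
    "hotel": ["Resort Hotel", "Resort Hotel", "Resort Hotel",
              "City Hotel", "City Hotel", "City Hotel"],
    "country": ["PRT", "GBR", "GBR", "PRT", "PRT", None],
})

# Most common countries of origin (missing values are skipped by default)
top_countries = demo["country"].value_counts().head(10)

# Number of bookings per country and hotel
by_hotel = pd.crosstab(demo["country"], demo["hotel"])

print(top_countries)
print(by_hotel)
```

On the full dataset the 488 bookings without a country would be left out of both tables unless they are filled with a placeholder first.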

### Question 4: What products are most often sold together?

### Question 5: What product sold the most? Why do you think it sold the most?