# Demand Prediction: Hotel Occupancy

## <span style="color:red">Problem</span>

A hotel chain wants to improve its demand management and pricing policy by predicting intraday bookings, that is, customers who request a room during the day without a prior reservation. This intraday demand is affected by the hotel's characteristics and also by external factors such as adverse transport or weather events. Using the given hotel occupancy dataset, build a model that predicts how many customers will arrive and occupy a room on any given date and hour (intraday bookings).


## <span style="color:red">Data</span>

The data consists of same-day bookings for a hotel, that is, how many people arrived on a given date and hour to get a room. It includes the hotel's current occupancy and price data, as well as external data that could affect intraday occupancy (events, nearby airport data, weather).


### <span style="color:blue">Features</span>

#### Time variables

- **date**: date of occupancy


- **hour**: hour of occupancy (0 to 23)


- **last_hour**: is this the last hour of the day, between 11pm and 12am? (binary: 1 (yes), 0 (no))



#### Hotel variables 

- **start_available**: number of rooms available at the start of each date (numeric)


- **price**: average room price (numeric)


- **occupied_start**: number of rooms already occupied at the start of each date (numeric)

 
- **otb_today_hour**: "on the books": number of new rooms that have been occupied so far at the corresponding date and hour (numeric)




#### Event variables

- **comedy_event**: Is there a comedy event on this date? (binary: 1 (yes), 0 (no))


- **concert**: Is there a concert on this date? (binary: 1 (yes), 0 (no))


- **large_event**: Is there a large event on this date? (binary: 1 (yes), 0 (no))


- **show**: Is there a show event on this date? (binary: 1 (yes), 0 (no))


- **sports**: Is there a sports event on this date? (binary: 1 (yes), 0 (no))


#### Transport variables

- **airport_open**: is the airport open? (binary: 1 (yes), 0 (no))


- **flights_cancelled**: any unexpected flight cancellation on this date? (binary: 1 (yes), 0 (no))


- **flights_cancelled_notice**: any flight cancellation on this date with previous notice? (binary: 1 (yes), 0 (no))


- **travel_disruption**: any unexpected transport disruption on this date? (binary: 1 (yes), 0 (no))


- **travel_disruption_notice**: any transport disruption on this date with previous notice? (binary: 1 (yes), 0 (no))


- **weather**: are there adverse weather conditions on this date? (binary: 1 (yes), 0 (no))  


#### Other variables


- **national_holiday**: is the date a national holiday? (binary: 1 (yes), 0 (no))


- **school_holiday**: is the date a school holiday? (binary: 1 (yes), 0 (no))

       
### <span style="color:blue">Target variable</span>

- **occupied**: number of new rooms occupied at a given date and hour (numeric)
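
The first rows of the training-data preview suggest that `otb_today_hour` is the running total of `occupied` over the earlier hours of the same date. The sketch below checks that reading on those five preview rows only. The helper name `otb_from_occupied` is hypothetical, and whether the relationship holds across the full dataset is an assumption to verify during exploration.

```python
import pandas as pd


def otb_from_occupied(df: pd.DataFrame) -> pd.Series:
    """Running total of `occupied` over earlier hours of the same date.

    Hypothetical helper: it encodes one possible reading of `otb_today_hour`.
    """
    ordered = df.sort_values(["date", "hour"])
    cumulative = ordered.groupby("date")["occupied"].cumsum()
    # Subtract the current hour so only earlier hours are counted.
    return (cumulative - ordered["occupied"]).reindex(df.index)


# First five rows of the training-data preview (2014-04-01, hours 0 to 4).
sample = pd.DataFrame({
    "date": ["2014-04-01"] * 5,
    "hour": [0, 1, 2, 3, 4],
    "occupied": [1, 1, 0, 0, 1],
    "otb_today_hour": [0, 1, 2, 2, 2],
})

reconstructed = otb_from_occupied(sample)
matches = bool((reconstructed == sample["otb_today_hour"]).all())
print(matches)  # True for these five rows
```

If the same check passes on the whole training set, `otb_today_hour` only uses bookings from earlier hours, so it would be known at prediction time.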


### <span style="color:blue">Train/Test sets</span>

The train set contains three years' worth of occupancy and events data, and the test set covers one year.
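
Because the test set follows the training period in time, one option is to validate the same way: fit on the earlier part of the training data and hold out its final year. The sketch below is a minimal version of that idea. The function name `split_by_date`, the cutoff `2016-04-01` (chosen because the preview shows training dates from 2014-04-01 to 2017-03-31), and the demo rows are all assumptions made for illustration.

```python
import pandas as pd


def split_by_date(df: pd.DataFrame, cutoff: str):
    """Split rows into (before cutoff, on or after cutoff) using the `date` column."""
    dates = pd.to_datetime(df["date"])
    boundary = pd.Timestamp(cutoff)
    return df[dates < boundary], df[dates >= boundary]


# Made-up rows standing in for train_data; values are illustrative only.
demo = pd.DataFrame({
    "date": ["2014-04-01", "2015-06-15", "2016-03-31", "2016-04-01", "2017-03-31"],
    "occupied": [1, 0, 2, 1, 0],
})

fit_part, validation_part = split_by_date(demo, "2016-04-01")
print(len(fit_part), len(validation_part))
```

A chronological split avoids scoring the model on dates that precede some of the rows it was fitted on, which a random split would allow.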

## <span style="color:red">Before starting</span>

Given the problem and data:
- Which machine learning approach do you think is better suited, classification or regression?
- What range of values should your model be able to return?

Answer in the cell below.

## <span style="color:red">Coding starts here</span>

In [1]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


### Import packages
More can be added here on top of the default ones if necessary.

In [2]:
import pandas as pd
import seaborn

**Import Training Data**

In [3]:
train_data = pd.read_csv('https://github.com/youtalspectra/spectra_ml_example/raw/master/data/hotel_train.csv')

In [5]:
train_data

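| Unnamed: 0 | date | hour | occupied | start_available | price | occupied_start | comedy_event | concert | flights_cancelled | flights_cancelled_notice | ... | national_holiday | school_holiday | show | sports | travel_disruption | travel_disruption_notice | weather | otb_today_hour | airport_open | last_hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-04-01 | 0 | 1 | 140 | 67.0 | 191 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2014-04-01 | 1 | 1 | 139 | 67.0 | 191 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 2014-04-01 | 2 | 0 | 138 | 64.0 | 191 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| 3 | 2014-04-01 | 3 | 0 | 138 | 66.0 | 191 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| 4 | 2014-04-01 | 4 | 1 | 138 | 64.0 | 191 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 26299 | 2017-03-31 | 19 | 1 | 38 | 72.0 | 293 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 1 | 0 |
| 26300 | 2017-03-31 | 20 | 0 | 37 | 72.0 | 293 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 1 | 0 |
| 26301 | 2017-03-31 | 21 | 0 | 37 | 73.0 | 293 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 0 | 0 |
| 26302 | 2017-03-31 | 22 | 1 | 37 | 73.0 | 293 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 0 | 0 |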


## Exploratory Data Analysis

Explore, pre-process and/or clean the data here. 

What is the type and/or range of values for each feature/variable? Are there any relationships or correlations between the different variables? Is any transformation of the data needed before fitting any model?
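
As one possible starting point for these questions, the sketch below builds a per-column summary of type, range, number of distinct values, and linear correlation with the target. The helper name `summarize_features` and the demo frame are hypothetical; in the notebook the function would be called on `train_data` instead.

```python
import pandas as pd


def summarize_features(df: pd.DataFrame, target: str = "occupied") -> pd.DataFrame:
    """Summarize numeric columns: dtype, min, max, distinct values, correlation with target."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "dtype": numeric.dtypes.astype(str),
        "min": numeric.min(),
        "max": numeric.max(),
        "n_unique": numeric.nunique(),
        # Pearson correlation of each numeric column with the target column.
        "corr_with_target": numeric.corrwith(numeric[target]),
    })


# Made-up rows standing in for train_data; values are illustrative only.
demo = pd.DataFrame({
    "date": pd.to_datetime(["2014-04-01"] * 4),
    "hour": [0, 1, 2, 3],
    "occupied": [1, 1, 0, 0],
    "price": [67.0, 67.0, 64.0, 66.0],
    "concert": [0, 1, 0, 0],
})

summary = summarize_features(demo)
print(summary)
```

Columns with only two distinct values (0 and 1) are the binary flags, and a non-numeric `date` column is left out of the summary, which is a hint that it needs converting or expanding into numeric features before fitting.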

## Model fitting

Fit/optimize your model here, and get the model training score.

## Predictions

Make predictions on the following test set and get the model score here. Remember to apply the same pre-processing to the test set as was done on the training set!

In [None]:
test_data = pd.read_csv('https://github.com/youtalspectra/spectra_ml_example/raw/master/data/hotel_test_1year.csv')
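
One way to keep the pre-processing identical on both sets is to put it in a single function and call it on each frame. The sketch below assumes that `Unnamed: 0` is a leftover row index that can be dropped and that calendar features derived from `date` are wanted. Both are assumptions, and the `preprocess` name and the demo rows are hypothetical.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same cleaning and feature steps to any frame with the dataset's columns."""
    out = df.drop(columns=["Unnamed: 0"], errors="ignore").copy()
    out["date"] = pd.to_datetime(out["date"])
    out["day_of_week"] = out["date"].dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = out["date"].dt.month
    return out


# Made-up rows standing in for train_data and test_data.
train_demo = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "date": ["2014-04-01", "2014-04-01"],
    "hour": [0, 1],
    "occupied": [1, 1],
})
test_demo = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "date": ["2017-04-01", "2017-04-01"],
    "hour": [0, 1],
    "occupied": [0, 2],
})

train_ready = preprocess(train_demo)
test_ready = preprocess(test_demo)

# Both frames should end up with the same columns in the same order.
same_columns = list(train_ready.columns) == list(test_ready.columns)
print(same_columns)
```

Any step that learns something from the data, such as scaling or encoding, would be fitted on the training frame only and then reused unchanged on the test frame.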