# Welcome to the Revenue Management Simulation!

First, make sure you've read my blog post about the project, so you understand the steps. Then follow along to reproduce the project on your local computer.

For more information about what's happening behind the scenes, take a look at the `.py` scripts in the `code` folder.

As-of-date (abbreviated 'AOD') is 2017-08-01. We will be trying to predict cancellations and demand for each day in August 2017.

### Let's start with some imports:

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
from dbds import generate_hotel_dfs

pd.options.display.max_rows = 150
pd.options.display.max_columns = 250
pd.options.display.max_colwidth = None

aod = '2017-08-01'  # as-of date (AOD) stated above

### Let's also make sure we have a `code/pickle` folder.

In [2]:
!mkdir pickle
# create the pickle folder; if it already exists, mkdir just prints a message like the one below

A subdirectory or file pickle already exists.
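
If the shell command gives you trouble, here is a minimal cross-platform sketch that uses only the standard library. The helper name `ensure_dir` is my own for illustration and is not part of the project's scripts.

```python
from pathlib import Path


def ensure_dir(path):
    """Create the folder if it is missing; do nothing if it already exists."""
    folder = Path(path)
    folder.mkdir(parents=True, exist_ok=True)
    return folder


# Same folder name the notebook uses, relative to the current directory.
pickle_dir = ensure_dir("pickle")
print(pickle_dir.is_dir())
```

Because of `exist_ok=True`, running the cell a second time is harmless and prints no error.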


### First, let's initialize all required pickle files

In [3]:
from init_pickle_files import init_pickle_files
init_pickle_files()  # initialize the required pickle files

h1_res.pick already exists
h2_res.pick already exists
X1_cxl.pick already exists
X2_cxl.pick already exists
y1_cxl.pick already exists
y2_cxl.pick already exists


### Now, let's get our reservations ready for modeling, and calculate actual hotel statistics.

Time to execute: < 30 seconds

In [4]:
from dbds import generate_hotel_dfs 
h2_res, h2_dbd = generate_hotel_dfs("../data/H2.csv", capacity=226)
h1_res, h1_dbd = generate_hotel_dfs(res_filepath="../data/H1.csv", capacity=187)

h1_res.to_pickle("pickle/h1_res.pick")
h1_dbd.to_pickle("pickle/h1_dbd.pick")
h2_res.to_pickle("pickle/h2_res.pick")
h2_dbd.to_pickle("pickle/h2_dbd.pick")

Hotel dataframes generated successfully!
Hotel capacity: 226 rooms
Hotel data date range: 2015-07-01 to 2017-08-31
Hotel dataframes generated successfully!
Hotel capacity: 187 rooms
Hotel data date range: 2015-07-01 to 2017-08-31
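
Once the pickles are saved, a later session can load them instead of regenerating the DataFrames. The sketch below shows the round trip on a tiny stand-in frame written to a temporary folder; the column names mirror the ones shown later in this notebook, but the values and the file name `demo_res.pick` are made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Tiny stand-in frame; the real h1_res / h1_dbd frames have many more columns.
demo = pd.DataFrame(
    {
        "StayDate": pd.to_datetime(["2017-08-01", "2017-08-02"]),
        "RoomsOTB": [170, 178],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_res.pick"  # hypothetical file name
    demo.to_pickle(path)                # same call the cell above uses
    restored = pd.read_pickle(path)     # load it back in a later session

# Unlike a CSV round trip, the datetime dtype is preserved without parse_dates.
print(restored.dtypes)
```

In the notebook itself, the equivalent call would be along the lines of `pd.read_pickle("pickle/h1_res.pick")`.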


### Now let's calculate current OTB (on the books) statistics, as of every date from 2016-07-01 to 2017-08-30.

This is necessary to get demand features for each future arrival date, namely:
* On the books, same time last year (STLY)
* Recent booking trends (T-30, T-15, T-5)
* Pace
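
To make the three features above concrete, here is a small sketch with made-up numbers. It assumes STLY means "364 days earlier" (52 weeks, so the day of week lines up), that the T-30/T-15/T-5 trends are room pickup over the last 30, 15, and 5 days, and that pace is the difference between current OTB and STLY OTB. These are common revenue-management conventions, but they are assumptions here; the project's exact definitions live in the `.py` scripts.

```python
from datetime import date, timedelta


def stly(d):
    """Same time last year, shifted 364 days (52 weeks) so the weekday matches."""
    return d - timedelta(days=364)


def pickup(otb_by_aod, as_of, days_back):
    """Rooms added for one stay date between (as_of - days_back) and as_of."""
    earlier = as_of - timedelta(days=days_back)
    if as_of not in otb_by_aod or earlier not in otb_by_aod:
        return None  # no snapshot saved for one of the two dates
    return otb_by_aod[as_of] - otb_by_aod[earlier]


def pace(otb_now, otb_stly):
    """Positive pace means we are ahead of where we were last year."""
    return otb_now - otb_stly


# Hypothetical rooms-on-the-books snapshots for a single stay date.
snapshots = {
    date(2017, 7, 2): 120,
    date(2017, 7, 17): 140,
    date(2017, 7, 27): 155,
    date(2017, 8, 1): 165,
}
aod_demo = date(2017, 8, 1)

print(stly(aod_demo))                                        # 2016-08-02
print([pickup(snapshots, aod_demo, n) for n in (30, 15, 5)])  # [45, 25, 10]
print(pace(165, 150))                                         # 15
```

This also shows why an OTB snapshot is needed for every as-of date: each feature compares today's snapshot against an earlier one.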

**Running the below (commented) code will take several hours the first time.** It trains and runs the predictive cancellation model (along with many other calculations), and saves a pickled file for each day, for each hotel. For information about exactly what's happening with each iteration, take a look at `sim.py`. 

In the real world, this wouldn't be necessary, as we could simply save one file per day and access it whenever we need to. But since we're starting from a single historical dataset, it's a necessary step here.

**To avoid this, I've saved the aggregated output into one CSV for each hotel.** These files are `../data/h1_stats.csv` and `../data/h2_stats.csv`.
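
For intuition, here is a rough sketch of how per-day snapshot files could be stacked into a single CSV. It is not the project's aggregation code: the file names (`demo_<date>.pick`, `demo_stats.csv`) and the numbers are hypothetical, and everything is written to a temporary folder.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Write two hypothetical per-day snapshot files, one per as-of date.
    for aod_str, rooms in [("2017-07-31", [160, 150]), ("2017-08-01", [165, 148])]:
        snap = pd.DataFrame(
            {
                "AsOfDate": aod_str,
                "StayDate": ["2017-08-30", "2017-08-31"],
                "RoomsOTB": rooms,
            }
        )
        snap.to_pickle(tmp / f"demo_{aod_str}.pick")

    # Stack every snapshot into one long frame and save it as a single CSV.
    frames = [pd.read_pickle(p) for p in sorted(tmp.glob("demo_*.pick"))]
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(tmp / "demo_stats.csv", index=False)

    # Reading it back is fast, which is the point of shipping the CSVs.
    reloaded = pd.read_csv(tmp / "demo_stats.csv", parse_dates=["AsOfDate", "StayDate"])

print(reloaded)
```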

In [5]:
# from save_sims import save_historical_OTB
# # save_historical_OTB saves the historical OTB data for both hotels
# # (this takes several hours)
# save_historical_OTB(h1_dbd, h1_res, h2_dbd, h2_res)

In [6]:
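# These process the files produced by save_historical_OTB.
# from agg import prep_demand_features

# NOTE: prelim_csv_out points at the raw reservation files as written;
# pick a different path before uncommenting so they are not overwritten.
# h1_sim = prep_demand_features(1, results_csv_out='../data/h1_stats.csv', prelim_csv_out='../data/H1.csv')
# h2_sim = prep_demand_features(2, results_csv_out='../data/h2_stats.csv', prelim_csv_out='../data/H2.csv')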


In [7]:
# take the faster route (use the CSVs in the repo):
date_cols = ['StayDate',
 'STLY_StayDate',
 'AsOfDate',
 'STLY_AsOfDate',
 'AsOfDate_STLY',
 'StayDate_STLY']

# infer_datetime_format is deprecated in pandas 2.x, so it is omitted here
h1_sim = pd.read_csv("../data/h1_stats.csv", parse_dates=date_cols)
h1_sim.drop(columns=["STLY_StayDate", "AsOfDate_STLY", "Unnamed: 0"], errors='ignore', inplace=True)  # remove duplicate columns
h1_sim.reset_index(inplace=True)

h2_sim = pd.read_csv("../data/h2_stats.csv", parse_dates=date_cols)
h2_sim.drop(columns=["STLY_StayDate", "AsOfDate_STLY", "Unnamed: 0"], errors='ignore', inplace=True)  # remove duplicate columns
h2_sim.reset_index(inplace=True)


print(h1_sim)




       index                       id  DOW  RoomsOTB    RevOTB  CxlForecast  \
0          0  2016-07-31 - 2016-07-31  Sun     170.0  28570.36         25.0   
1          1  2016-07-31 - 2016-08-01  Mon     178.0  29525.52         31.0   
2          2  2016-07-31 - 2016-08-02  Tue     182.0  30820.89         35.0   
3          3  2016-07-31 - 2016-08-03  Wed     174.0  30144.62         38.0   
4          4  2016-07-31 - 2016-08-04  Thu     179.0  32412.13         42.0   
...      ...                      ...  ...       ...       ...          ...   
11738  11738  2017-08-01 - 2017-08-27  Sun     165.0  31468.98         36.0   
11739  11739  2017-08-01 - 2017-08-28  Mon     169.0  32690.81         31.0   
11740  11740  2017-08-01 - 2017-08-29  Tue     172.0  32283.29         36.0   
11741  11741  2017-08-01 - 2017-08-30  Wed     163.0  29308.46         30.0   
11742  11742  2017-08-01 - 2017-08-31  Thu     148.0  25598.71         26.0   

       TRN_RoomsOTB  TRN_RevOTB  TRN_CxlForecast  T

### Now we can predict demand for each day in August, using the below two cells.

Note: Ignore the pricing information. I attempted to use price as a feature to predict demand, but it didn't work: I have neither historical selling prices nor competitor pricing data, so there was no way to teach the model that increasing price reduces demand.

In [8]:
from demand import model_demand
h1_demand = model_demand(1, h1_sim, aod)

Converting features to appropriate types...
Training Random Forest model to predict remaining transient demand...
Training set size: 8220 samples
Test set size: 3523 samples
Splitting data into training and test sets...
(8220, 39) (8220,) (3523, 39) (3523,)
x test        week_of_year  RoomsOTB  RoomsOTB_STLY  TRN_RoomsOTB  TRN_RoomsOTB_STLY  \
1655           41.0     174.0          174.0          15.0               46.0   
9763           22.0     176.0          172.0         122.0              113.0   
11701          34.0     173.0          172.0         132.0              141.0   
3529           47.0      51.0           22.0          49.0               19.0   
7646           17.0     166.0           89.0          25.0               74.0   
...             ...       ...            ...           ...                ...   
1144           39.0     175.0          185.0          75.0               95.0   
9684           24.0     166.0          166.0         129.0              131.0   
9598  

In [9]:
h2_demand = model_demand(2, h2_sim, aod)

Converting features to appropriate types...
Training Random Forest model to predict remaining transient demand...
Training set size: 8220 samples
Test set size: 3523 samples
Splitting data into training and test sets...
(8220, 39) (8220,) (3523, 39) (3523,)
Model ready.

R² score on test set (stay dates Aug 1 - Aug 31, 2017):                        0.971
MAE (Mean Absolute Error) score on test set (stay dates Aug 1 - Aug 31, 2017): 1.66
MSE (Mean Squared Error) score on test set (stay dates Aug 1 - Aug 31, 2017):  6.998

Calculating optimal selling prices...

df_demand    index                       id  DOW  RoomsOTB    RevOTB  CxlForecast  \
0      0  2016-07-31 - 2016-07-31  Sun     212.0  23157.67         10.0   
1      1  2016-07-31 - 2016-08-01  Mon     189.0  22065.26         12.0   
2      2  2016-07-31 - 2016-08-02  Tue     210.0  24525.32         16.0   
3      3  2016-07-31 - 2016-08-03  Wed     218.0  25384.31         19.0   
4      4  2016-07-31 - 2016-08-04  Thu     213.0 
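
The R², MAE, and MSE figures printed in the model output above follow the standard definitions. Here is a small pure-Python sketch of those formulas on made-up numbers, so you can see what each score measures. The helper name `regression_metrics` is mine; `demand.py` may well compute these with a library instead.

```python
def regression_metrics(y_true, y_pred):
    """Return R², MAE, and MSE for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # average size of a miss, in rooms
    mse = sum(e * e for e in errors) / n    # penalizes large misses more heavily

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variation
    r2 = 1 - ss_res / ss_tot                             # share of variation explained

    return {"r2": r2, "mae": mae, "mse": mse}


# Hypothetical actual vs. predicted remaining demand for four stay dates.
actual = [10, 12, 15, 20]
predicted = [11, 12, 14, 18]
scores = regression_metrics(actual, predicted)
print(scores["mae"], scores["mse"], round(scores["r2"], 3))  # 1.0 1.5 0.894
```

Read this way, an MAE of 1.66 means the forecast misses remaining demand by about 1.7 rooms per stay date on average.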

## That's it! Now you're ready to look through all of the DataFrames we've created.