# Revenue Management Simulations

The revenue management problem concerns allocating a limited set of resources among different classes of customers in order to maximize revenue. The agent must determine which classes of customers to accept at different locations, allocating carefully so that resources are not exhausted too early. The agent also has to account for the probability that customers of each class arrive. If a customer arrives and is accepted by the agent, the customer consumes some amount of each resource and provides some amount of revenue.

The state space of the environment is the amount of each resource available to the agent, represented by $S = [0,B_1] \times [0,B_2] \times \dots \times [0,B_k]$, where $B_i$ is the maximum availability of resource $i$ and $k$ is the total number of resources. The action space is the set of binary vectors of length $n$, $A = \{0,1\}^n$, where $n$ is the number of customer classes; entry $j$ indicates whether class $j$ is accepted (1) or rejected (0). The reward is the revenue from selling resources to the customer class that arrives, and it is zero if the customer is rejected or the required resources are not available.

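The state transitions depend on the arrival $P_t$, which equals either a customer class $j_t \in [n]$ or $\emptyset$ (no arrival):
* If $P_t = \emptyset$, then no arrival occurred, $reward = 0$, and $S_{t+1} = S_t$.
* If $P_t = j_t$ and $a_{jt} = 0$, then the arrival was rejected, $reward = 0$, and $S_{t+1} = S_t$.
* If $P_t = j_t$, $a_{jt} = 1$, and $S_t-A_{jt}^T > 0$, then the arrival was accepted and enough resources were available, so the resources are purchased: $S_{t+1} = S_t - A_{jt}^T$ with $reward = f_{jt}$, where $A_{jt}$ is the resource consumption of class $j_t$ and $f_{jt}$ is its revenue.
* If $P_t = j_t$ and $a_{jt} = 1$ but the available resources are insufficient, then $reward = 0$ and $S_{t+1} = S_t$.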
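
The transition cases above can be sketched as a small function. This is only an illustration and not the `or_suite` implementation: the function name `transition` and its arguments are hypothetical, and the feasibility check is written as "no resource goes negative", which is an assumption about how the boundary case (a resource reaching exactly zero) is handled.

```python
import numpy as np

def transition(state, action, arrival, A, f):
    """One step of the dynamics described above (illustrative sketch).

    state   : length-k vector of remaining resources S_t
    action  : length-n binary accept/reject vector a_t
    arrival : index j_t of the arriving class, or None for no arrival
    A       : k x n matrix; column j is the resource usage of class j
    f       : length-n vector of revenues
    """
    state = np.asarray(state, dtype=float)
    if arrival is None:
        # No arrival: state unchanged, zero reward.
        return state, 0.0
    if action[arrival] == 0:
        # Arrival rejected: state unchanged, zero reward.
        return state, 0.0
    remaining = state - A[:, arrival]
    if np.all(remaining >= 0):
        # Assumption: consuming the last unit of a resource is allowed.
        return remaining, float(f[arrival])
    # Accepted, but resources are insufficient: nothing is sold.
    return state, 0.0

# Hypothetical instance: 3 resources, 2 customer classes.
A = np.array([[1, 0], [0, 1], [1, 1]])
f = np.array([2.0, 3.0])
next_state, reward = transition([1, 1, 1], [1, 1], 0, A, f)
print(next_state, reward)
```

Returning the unchanged state in every zero-reward branch keeps the function total, so it can be called in a simulation loop without special cases.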

### Package Installation

In [2]:
import or_suite
import numpy as np

import copy

import os
from stable_baselines3.common.monitor import Monitor
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
import pandas as pd


import gym

### Experimental Parameters


The revenue management problem has several experimental parameters:
* `epLen`, an int, is the number of time steps to run the experiment for.
* `nEps`, an int, is the number of episodes. Here it is set to 2.
* `numIters`, an int, is the number of iterations. Here it is set to 50.
* `seed` sets the seed used to generate random numbers.
* `dirPath`, a string, is the location where the data files are stored.
* `deBug`, a bool, prints information to the command line when set to true.
* `saveTrajectory`, a bool, saves the trajectory information of the simulation when set to true.
* `render`, a bool, renders the algorithm when set to true.
* `pickle`, a bool, saves the information to a pickle file when set to true.

In [3]:
CONFIG =  or_suite.envs.env_configs.airline_default_config

epLen = CONFIG['epLen']
nEps = 2
numIters = 50

In [4]:
DEFAULT_SETTINGS = {'seed': 1, 
                    'recFreq': 1, 
                    'dirPath': '../data/', 
                    'deBug': False, 
                    'nEps': nEps, 
                    'numIters': numIters, 
                    'saveTrajectory': True, 
                    'epLen' : 5,
                    'render': False,
                    'pickle': False
                    }


revenue_env = gym.make('Airline-v0', config=CONFIG)
mon_env = Monitor(revenue_env)

### Specifying Agents

We compare the following agents (`SB PPO` is commented out in the cells below, so only the other three are run):

* `SB PPO` is Proximal Policy Optimization. Each policy update is "clipped" by a parameter so that the policy does not move too far in a single update.
* `Random` is a baseline that randomly selects whether to accept or reject each customer class.
* `BayesSelector` is an optimization-based algorithm that determines which actions to take based on the current inventory levels and the expected number of future arrivals.
    * (`round_flag` = True) - Allocate based on the proportion of types accepted across all rounds being larger than 1/2
* `BayesSelectorBadRounding` is similar to the `BayesSelector` agent, but its rounding is less accurate.
    * (`round_flag` = False) - Allocate with a random policy that accepts each type according to a Bernoulli sample drawn with the proportion of that type accepted across all rounds
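
The two rounding rules can be illustrated in isolation. The sketch below is not the `bayes_selectorAgent` code: the helper `round_actions` is hypothetical, and it assumes the acceptance proportions (one number in $[0,1]$ per customer class) have already been computed by the optimization step, which is not shown.

```python
import numpy as np

def round_actions(proportions, round_flag=True, rng=None):
    """Turn fractional acceptance proportions into a binary action vector."""
    proportions = np.asarray(proportions, dtype=float)
    if round_flag:
        # Deterministic rule: accept a class when its proportion exceeds 1/2.
        return (proportions > 0.5).astype(int)
    # Randomized rule: accept each class with probability equal to its proportion.
    rng = np.random.default_rng() if rng is None else rng
    return rng.binomial(1, proportions)

# Hypothetical proportions for three customer classes.
props = [0.9, 0.5, 0.2]
print(round_actions(props, round_flag=True))
print(round_actions(props, round_flag=False, rng=np.random.default_rng(1)))
```

The deterministic rule always returns the same action for the same proportions, while the randomized rule can reject a class with a high proportion or accept one with a low proportion on any given step.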


In [5]:
epLen=100
agents = { #'SB PPO': PPO(MlpPolicy, mon_env, gamma=1, verbose=0, n_steps=epLen),
'Random': or_suite.agents.rl.random.randomAgent(),
'BayesSelector': or_suite.agents.airline_revenue_management.bayes_selector.bayes_selectorAgent(epLen, round_flag=True),
'BayesSelectorBadRounding': or_suite.agents.airline_revenue_management.bayes_selector.bayes_selectorAgent(epLen, round_flag=False),
}

### Experiment Set-Up

In each experiment, we first set up the parameters of the environment, including the config dictionary that defines the revenue management environment. We then set the desired number of episodes and iterations, and collect the settings for running the experiment in the `DEFAULT_SETTINGS` dictionary. Next, we create an instance of the environment and a monitor for it. Finally, the experiment is run by calling `run_single_algo` (or `run_single_sb_algo` for the SB PPO agent). The results are written to a CSV file, which is used to produce the line plot and radar plot for each agent.

## Basic Simulation

In this example, we use the default configuration and 50 iterations with 2 episodes. This is a synthetic example with 2 classes where each class has a 1 in 3 chance of arriving (and there is a 1/3 chance that no class arrives). There are also 3 types of resources available for the customers.
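
To make the arrival process concrete, the sketch below samples arrivals when each of 2 classes arrives with probability 1/3 and no customer arrives otherwise. The helper `sample_arrival` is hypothetical and only mirrors the description above; it is not taken from the environment's source.

```python
import numpy as np

def sample_arrival(class_probs, rng):
    """Return the index of the arriving class, or None if nobody arrives."""
    class_probs = np.asarray(class_probs, dtype=float)
    # Leftover probability mass is the chance that no customer arrives.
    p_none = max(0.0, 1.0 - class_probs.sum())
    probs = np.append(class_probs, p_none)
    outcome = rng.choice(len(probs), p=probs)
    return None if outcome == len(class_probs) else int(outcome)

rng = np.random.default_rng(0)
draws = [sample_arrival([1 / 3, 1 / 3], rng) for _ in range(3000)]
# Empirical frequencies should each be close to 1/3.
for outcome in (0, 1, None):
    print(outcome, draws.count(outcome) / len(draws))
```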

### Agents

In [6]:
agents = { # 'SB PPO': PPO(MlpPolicy, mon_env, gamma=1, verbose=0, n_steps=epLen),
'Random': or_suite.agents.rl.random.randomAgent(),
'BayesSelector': or_suite.agents.airline_revenue_management.bayes_selector.bayes_selectorAgent(epLen, round_flag=True),
'BayesSelectorBadRounding': or_suite.agents.airline_revenue_management.bayes_selector.bayes_selectorAgent(epLen, round_flag=False),
}

### Running Algorithm

In [7]:
path_list_line = []
algo_list_line = []
path_list_radar = []
algo_list_radar= []
for agent in agents:
    print(agent)
    DEFAULT_SETTINGS['dirPath'] = '../data/airline_'+str(agent)
    if agent == 'SB PPO':
        or_suite.utils.run_single_sb_algo(mon_env, agents[agent], DEFAULT_SETTINGS)
    else:
        or_suite.utils.run_single_algo(revenue_env, agents[agent], DEFAULT_SETTINGS)

    path_list_line.append('../data/airline_'+str(agent))
    algo_list_line.append(str(agent))
    if agent != 'SB PPO':
        path_list_radar.append('../data/airline_'+str(agent))
        algo_list_radar.append(str(agent))

Random
Writing to file data.csv
BayesSelector
Writing to file data.csv
BayesSelectorBadRounding
Writing to file data.csv


In [8]:
fig_path = '../figures/'
fig_name = 'revenue'+'_line_plot'+'.pdf'
or_suite.plots.plot_line_plots(path_list_line, algo_list_line, fig_path, fig_name, int(nEps / 40)+1)

additional_metric = {}
fig_name = 'revenue'+'_radar_plot'+'.pdf'
or_suite.plots.plot_radar_plots(path_list_radar, algo_list_radar,
fig_path, fig_name,
additional_metric
)



                  Algorithm  Reward      Time     Space
0                    Random    1.72  6.770615  -3727.14
1             BayesSelector    2.72  2.849841 -28247.98
2  BayesSelectorBadRounding    2.88  2.845717 -26245.44


In [9]:
from IPython.display import IFrame
IFrame("../figures/revenue_line_plot.pdf", width=600, height=280)

In [10]:
IFrame("../figures/revenue_radar_plot.pdf", width=600, height=450)

### Results
Based on the table above, both Bayes Selector agents outperform the Random agent (rewards of 2.72 and 2.88 versus 1.72). In this run, the 'Bad Rounding' version obtains a slightly higher reward than the normal version (2.88 versus 2.72), so the two are very close.
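
As a quick arithmetic check, the relative gain of each agent over the Random baseline can be computed directly from the reward column of the table above. The helper `relative_gain` is introduced here only for illustration.

```python
def relative_gain(reward, baseline):
    """Percentage improvement of `reward` over `baseline`."""
    return 100.0 * (reward - baseline) / baseline

# Reward values copied from the table printed above.
rewards = {
    "Random": 1.72,
    "BayesSelector": 2.72,
    "BayesSelectorBadRounding": 2.88,
}
for name, reward in rewards.items():
    gain = relative_gain(reward, rewards["Random"])
    print(f"{name}: reward {reward:.2f}, {gain:+.1f}% vs Random")
```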


## Simulation with Dual Degeneracies

In this example, we again start from the default configuration, a synthetic example with 2 classes and 3 types of resources. Unlike the previous example, the parameters are chosen such that the solution experiences dual degeneracy (see [here](https://arxiv.org/abs/1906.06361) for a discussion). Note that the cell below overrides the arrival probabilities `P`, so that in each of its 5 rows the two classes arrive with probabilities $1-p$ and $p$ rather than 1/3 each.

### Experimental Parameters

In [11]:
p = .45 # either do .44 or .45
CONFIG['P'] = np.asarray([[1-p, p],[1-p,p],[1-p,p],[1-p,p],[1-p,p]])

DEFAULT_SETTINGS = {'seed': 1, 
                    'recFreq': 1, 
                    'dirPath': '../data/', 
                    'deBug': False, 
                    'nEps': nEps, 
                    'numIters': numIters, 
                    'saveTrajectory': True, 
                    'epLen' : 5,
                    'render': False,
                    'pickle': False
                    }


revenue_env = gym.make('Airline-v0', config=CONFIG)
mon_env = Monitor(revenue_env)

### Specifying Agents

In [12]:
agents = { # 'SB PPO': PPO(MlpPolicy, mon_env, gamma=1, verbose=0, n_steps=epLen),
'Random': or_suite.agents.rl.random.randomAgent(),
'BayesSelector': or_suite.agents.airline_revenue_management.bayes_selector.bayes_selectorAgent(epLen),
'BayesSelectorBadRounding': or_suite.agents.airline_revenue_management.bayes_selector.bayes_selectorAgent(epLen, round_flag=False),
}

### Running Algorithm

In [13]:
path_list_line = []
algo_list_line = []
path_list_radar = []
algo_list_radar= []
for agent in agents:
    print(agent)
    DEFAULT_SETTINGS['dirPath'] = '../data/airline_'+str(agent)
    if agent == 'SB PPO':
        or_suite.utils.run_single_sb_algo(mon_env, agents[agent], DEFAULT_SETTINGS)
    else:
        or_suite.utils.run_single_algo(revenue_env, agents[agent], DEFAULT_SETTINGS)

    path_list_line.append('../data/airline_'+str(agent))
    algo_list_line.append(str(agent))
    if agent != 'SB PPO':
        path_list_radar.append('../data/airline_'+str(agent))
        algo_list_radar.append(str(agent))
        
        
fig_path = '../figures/'
fig_name = 'revenue'+'_line_plot'+'.pdf'
or_suite.plots.plot_line_plots(path_list_line, algo_list_line, fig_path, fig_name, int(nEps / 40)+1)

# 
additional_metric = {}
fig_name = 'revenue'+'_radar_plot'+'.pdf'
or_suite.plots.plot_radar_plots(path_list_radar, algo_list_radar,
fig_path, fig_name,
additional_metric
)

Random
Writing to file data.csv
BayesSelector
Writing to file data.csv
BayesSelectorBadRounding
Writing to file data.csv
                  Algorithm  Reward      Time     Space
0                    Random    2.32  6.583894  -3610.36
1             BayesSelector    3.14  2.966289 -27388.78
2  BayesSelectorBadRounding    3.14  3.022140 -26070.02


In [14]:
IFrame("../figures/revenue_line_plot.pdf", width=600, height=280)

In [15]:
IFrame("../figures/revenue_radar_plot.pdf", width=600, height=450)

### Results
Once again, both Bayes Selector agents outperform the Random agent (3.14 versus 2.32). Here the 'Bad Rounding' version obtains the same reward (3.14) as the normal version.


## Simulation with Different Parameters

The following parameters come from [this paper](https://courses.cit.cornell.edu/orie6590/projects/spring_2021/sam_tan.pdf) written by ORIE 6590 students. This custom configuration is intended as a nontrivial example.

### Experimental Parameters

In [16]:
epLen = 4
A = np.asarray([[1, 1, 0,0,0,0], [ 0,0, 1, 1, 1, 1], [ 0,0, 0,0, 1, 1] ])
tau = 23
P = np.ones((tau, A.shape[1]))/3
c = [5, 5, 5]
f = range(10, 16)
CONFIG = {'A': A, 'f': f, 'P': P, 'starting_state': c , 'tau': tau}
nEps = 2
numIters = 50

In [17]:
m = 6
l = 3
A = np.identity(m)
for i in range(l):
    for j in range(l):
        if i != j:
            demand_col = np.zeros((m, 1))
            demand_col[2 * i + 1] = 1.0
            demand_col[2 * j] = 1.0
            A=  np.append(A, demand_col, axis = 1)
A = np.append(A, A, axis = 1)
tau = 20
P = np.array([0.01327884, 0.02244177, 0.07923761, 0.0297121,  0.02654582, 0.08408091, 0.09591975, 0.00671065, 0.08147508, 0.00977341, 0.02966204, 0.121162, 0.00442628, 0.00748059, 0.02641254, 0.00990403, 0.00884861, 0.02802697, 0.03197325, 0.00223688, 0.02715836, 0.0032578,  0.00988735, 0.04038733])
P = np.array([P]*tau)
c = [2]*6
f = np.array([33, 28, 36, 34, 17, 20, 39, 24, 31, 19, 30, 48, 165, 140, 180, 170, 85, 100,195, 120, 155, 95, 150, 240])
CONFIG = {'epLen':epLen, 'A': A, 'f': f, 'P': P, 'starting_state': c , 'tau': tau}
epLen = CONFIG['epLen']
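
The structure of the matrix `A` built above can be checked on its own. The sketch below repeats the same construction and inspects the result; the reading given in the comments (single-resource classes, two-resource classes, and a second copy of every class) is an inference from the code and the numbers, not a statement from the referenced paper.

```python
import numpy as np

m, l = 6, 3  # same values as in the cell above

# First m columns: classes that each use exactly one resource.
A = np.identity(m)
# Next l*(l-1) columns: classes that use two resources, one odd-indexed
# (2*i + 1) and one even-indexed (2*j), for every ordered pair i != j.
for i in range(l):
    for j in range(l):
        if i != j:
            demand_col = np.zeros((m, 1))
            demand_col[2 * i + 1] = 1.0
            demand_col[2 * j] = 1.0
            A = np.append(A, demand_col, axis=1)
# Duplicate every column, giving two copies of each class.
A = np.append(A, A, axis=1)

f = np.array([33, 28, 36, 34, 17, 20, 39, 24, 31, 19, 30, 48,
              165, 140, 180, 170, 85, 100, 195, 120, 155, 95, 150, 240])

print(A.shape)                            # (6, 24): 6 resources, 24 classes
print(np.array_equal(f[12:], 5 * f[:12]))  # True: second copy pays 5x the revenue
```

So the 24 customer classes consist of 12 distinct resource-usage patterns, each appearing once with a lower revenue and once with a revenue five times higher.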


In [18]:
DEFAULT_SETTINGS = {'seed': 1, 
                    'recFreq': 1, 
                    'dirPath': '../data/', 
                    'deBug': False, 
                    'nEps': nEps, 
                    'numIters': numIters, 
                    'saveTrajectory': True, 
                    'epLen' : 5,
                    'render': False,
                    'pickle': False
                    }


revenue_env = gym.make('Airline-v0', config=CONFIG)
mon_env = Monitor(revenue_env)

### Specifying Agents

In [19]:
agents = { # 'SB PPO': PPO(MlpPolicy, mon_env, gamma=1, verbose=0, n_steps=epLen),
'Random': or_suite.agents.rl.random.randomAgent(),
'BayesSelector': or_suite.agents.airline_revenue_management.bayes_selector.bayes_selectorAgent(epLen),
'BayesSelectorBadRounding': or_suite.agents.airline_revenue_management.bayes_selector.bayes_selectorAgent(epLen, round_flag=False),
}

### Running Algorithm

In [20]:
path_list_line = []
algo_list_line = []
path_list_radar = []
algo_list_radar= []
for agent in agents:
    print(agent)
    DEFAULT_SETTINGS['dirPath'] = '../data/airline_'+str(agent)
    if agent == 'SB PPO':
        or_suite.utils.run_single_sb_algo(mon_env, agents[agent], DEFAULT_SETTINGS)
    else:
        or_suite.utils.run_single_algo(revenue_env, agents[agent], DEFAULT_SETTINGS)

    path_list_line.append('../data/airline_'+str(agent))
    algo_list_line.append(str(agent))
    if agent != 'SB PPO':
        path_list_radar.append('../data/airline_'+str(agent))
        algo_list_radar.append(str(agent))

Random
Writing to file data.csv
BayesSelector
Writing to file data.csv
BayesSelectorBadRounding
Writing to file data.csv


In [21]:
fig_path = '../figures/'
fig_name = 'revenue'+'_line_plot'+'.pdf'
or_suite.plots.plot_line_plots(path_list_line, algo_list_line, fig_path, fig_name, int(nEps / 40)+1)

# 
additional_metric = {}
fig_name = 'revenue'+'_radar_plot'+'.pdf'
or_suite.plots.plot_radar_plots(path_list_radar, algo_list_radar,
fig_path, fig_name,
additional_metric
)



                  Algorithm  Reward      Time     Space
0                    Random  120.02  5.851176  -3786.72
1             BayesSelector  134.54  2.714183 -27144.78
2  BayesSelectorBadRounding  126.76  2.647503 -26858.46


In [22]:
IFrame("../figures/revenue_line_plot.pdf", width=600, height=280)

In [23]:
IFrame("../figures/revenue_radar_plot.pdf", width=600, height=450)

### Results

Once again, the Bayes Selector agents outperform the Random agent and accumulate a higher reward. For this set of parameters, however, the normal Bayes Selector agent accumulates a noticeably higher reward than the "Bad Rounding" agent (134.54 versus 126.76).