# OR Suite 

Reinforcement learning (RL) is a natural model for problems involving real-time sequential decision making, including inventory control, resource allocation, ridesharing systems, and ambulance routing. In these models, an agent interacts with a system that has stochastic transitions and rewards, and aims to control the system by maximizing its cumulative reward over the trajectory. In practice, reinforcement learning has proven to be an effective technique for learning complex control policies.
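The interaction described above can be sketched as a simple loop. The example below is a minimal illustration and not part of the OR Suite package: `ToyInventoryEnv`, its cost coefficients, and the constant-order policy are all hypothetical, chosen only to show an agent acting, observing a stochastic transition, and accumulating reward over one episode.

```python
import random


class ToyInventoryEnv:
    """Hypothetical single-item inventory system, used only for illustration."""

    def __init__(self, ep_len=10, seed=0):
        self.ep_len = ep_len
        self.rng = random.Random(seed)
        self.t = 0
        self.inventory = 0

    def reset(self):
        self.t = 0
        self.inventory = 0
        return self.inventory

    def step(self, order):
        # Stochastic transition: random demand is subtracted from the stock.
        demand = self.rng.randint(0, 5)
        self.inventory += order - demand
        # Reward is the negative cost: 1 per unit held, 2 per unit backordered.
        reward = -(max(self.inventory, 0) + 2 * max(-self.inventory, 0))
        self.t += 1
        done = self.t >= self.ep_len
        return self.inventory, reward, done


def run_episode(env, policy):
    """Run one episode and return the cumulative reward."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward


# A constant-order policy: always order 2 units, regardless of the state.
print(run_episode(ToyInventoryEnv(ep_len=10, seed=0), lambda state: 2))
```

Swapping the lambda for another function changes the policy without touching the loop itself.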

# Step 1: Package Installation
First, we import the necessary packages.

In [11]:
import or_suite
import numpy as np

import copy

import os
from stable_baselines3.common.monitor import Monitor
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
import pandas as pd


import gym

ModuleNotFoundError: No module named 'or_suite'
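The `ModuleNotFoundError` above means that `or_suite` could not be found in the active Python environment. A minimal sketch for checking which of the required top-level modules are available before running the rest of the notebook is shown below; `find_missing` is a hypothetical helper, and the list of names is simply taken from the import cell.

```python
import importlib.util


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


required = ["or_suite", "numpy", "pandas", "gym", "stable_baselines3"]
print("Missing modules:", find_missing(required))
```

An empty list indicates that every module can be imported from the current environment.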

# Step 2: Pick problem parameters for the environment

Here we use the inventory control environment with multiple suppliers, registered with Gym as `MultipleSuppliers-v0`. The package has default specifications for all of the environments in the file `or_suite/envs/env_configs.py`, and so we use the default configuration for this problem, `inventory_control_multiple_suppliers_default_config`.

In addition, we need to specify the number of episodes for learning, and the number of iterations (in order to plot average results with confidence intervals).

In [2]:
CONFIG =  or_suite.envs.env_configs.inventory_control_multiple_suppliers_default_config
# CONFIG = or_suite.envs.env_configs.oil_environment_default_config
CONFIG['epLen'] = 500
epLen = CONFIG['epLen']
nEps = 10
numIters = 30
print(epLen)


500
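Repeating the experiment `numIters` times is what makes it possible to report an average with a confidence interval. The sketch below shows one common way to do this, using a normal approximation with `z = 1.96` for a 95% interval. This is an assumption for illustration; the notebook does not state how `or_suite` computes its intervals, and `mean_with_ci` is a hypothetical helper.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return the sample mean and the half-width of a normal-approximation CI."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, 0.0
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


# Hypothetical cumulative rewards from three independent iterations.
mean, half_width = mean_with_ci([1.0, 2.0, 3.0])
print(f"{mean:.2f} +/- {half_width:.2f}")
```

With more iterations the half-width shrinks roughly in proportion to the square root of the number of iterations.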


# Step 3: Pick simulation parameters

Next we need to specify parameters for the simulation. These include the random seed (`seed`), the frequency at which metrics are recorded (`recFreq`), the directory path for saving the data files (`dirPath`), a debug mode that prints the trajectory (`deBug`), and so on.

In [3]:
DEFAULT_SETTINGS = {'seed': 1, 
                    'recFreq': 1, 
                    'dirPath': '../data/ambulance/', 
                    'deBug': False, 
                    'nEps': nEps, 
                    'numIters': numIters, 
                    'saveTrajectory': True, 
                    'epLen' : CONFIG['epLen'],
                    'render': False,
                    'pickle': False
                    }

env = gym.make('MultipleSuppliers-v0', config=CONFIG)
mon_env = Monitor(env)
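Fixing the `seed` entry is what makes repeated runs comparable. The sketch below illustrates the idea with Python's standard `random` module. It is not `or_suite`'s own seeding logic, and `sample_demands` is a hypothetical helper.

```python
import random


def sample_demands(seed, n_steps, max_demand=20):
    """Draw a demand sequence from a generator initialised with the given seed."""
    rng = random.Random(seed)
    return [rng.randint(0, max_demand) for _ in range(n_steps)]


first = sample_demands(seed=1, n_steps=5)
second = sample_demands(seed=1, n_steps=5)
print(first == second)  # True: the same seed reproduces the same sequence
```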

# Step 4: Pick list of algorithms

We have several heuristics implemented for each of the environments defined, in addition to a `Random` policy, and some `RL discretization based` algorithms. 

The `Random` agent selects its orders at random and serves as a baseline.

The `TBS` agent follows a tailored base-surge policy with parameters $r$ and $S$. In its usual formulation, the policy orders a fixed amount $r$ from each supplier with a longer lead time in every period, and orders from the supplier with the shortest lead time whatever is needed to raise the inventory position to $S$. Below it is constructed with $r = [14]$ and $S = 0$ as initial values; both parameters are tuned in Step 5.

In [4]:
agents = {
    # 'SB PPO': PPO(MlpPolicy, mon_env, gamma=1, verbose=0, n_steps=epLen),
    'Random': or_suite.agents.rl.random.randomAgent(),
    'TBS': or_suite.agents.inventory_control_multiple_suppliers.base_surge.base_surgeAgent([14], 0)
}
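As a rough illustration of what a tailored base-surge (TBS) rule computes in a single period, the sketch below orders a constant amount from each slower supplier and tops the inventory position up to $S$ through the fastest one. It follows the textbook description of the policy rather than the `base_surgeAgent` source. The function `tbs_orders`, the placement of the fastest supplier last in the order vector, and the use of a single scalar inventory position are all assumptions.

```python
def tbs_orders(inventory_position, r, S):
    """Return one period's orders: r for the slower suppliers, then the surge order.

    inventory_position: on-hand stock plus outstanding orders minus backorders.
    r: list of constant order amounts, one per slower supplier (may be empty).
    S: order-up-to level for the fastest supplier.
    """
    surge = max(0, S - inventory_position)
    return list(r) + [surge]


print(tbs_orders(inventory_position=3, r=[14], S=6))   # [14, 3]
print(tbs_orders(inventory_position=10, r=[14], S=6))  # [14, 0]
print(tbs_orders(inventory_position=3, r=[], S=6))     # [3]
```

With an empty `r`, the rule reduces to an order-up-to policy for a single supplier.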

# Step 5: Run Simulations

Run each agent in the environment. For `TBS`, the parameters are tuned over a grid of $(r, S)$ pairs.

In [5]:
path_list_line = []
algo_list_line = []
path_list_radar = []
algo_list_radar= []

# Each entry of param_list is a pair [r, S] to try when tuning TBS.
max_order = CONFIG['max_order']
param_list = []
for r in range(max_order+1):
    for S in range(max_order + 1):
        param_list.append([r,S])
        
for agent in agents:
    print(agent)
    DEFAULT_SETTINGS['dirPath'] = '../data/inventory_control_'+str(agent)+'/'
    if agent == 'SB PPO':
        or_suite.utils.run_single_sb_algo(mon_env, agents[agent], DEFAULT_SETTINGS)
    elif agent == 'TBS':
        or_suite.utils.run_single_algo_tune(env, agents[agent], param_list, DEFAULT_SETTINGS)
    else:
        or_suite.utils.run_single_algo(env, agents[agent], DEFAULT_SETTINGS)

    path_list_line.append('../data/inventory_control_'+str(agent))
    algo_list_line.append(str(agent))
    if agent != 'SB PPO':
        path_list_radar.append('../data/inventory_control_'+str(agent))
        algo_list_radar.append(str(agent))

Random
**************************************************
Running experiment
**************************************************
**************************************************
Experiment complete
**************************************************
**************************************************
Saving data
**************************************************
Writing to file data.csv
**************************************************
Data save complete
**************************************************
TBS
**************************************************
Running experiment
**************************************************
**************************************************
Experiment complete
**************************************************
...
**************************************************
Saving data
**************************************************
Writing to file data.csv
**************************************************
Data save complete
**************************************************
[8, 6]
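The tuning step evaluates every pair in `param_list`, and the final line of output, `[8, 6]`, appears to be the selected $(r, S)$ pair. The sketch below shows the general shape of such an exhaustive grid search. It is not the implementation of `run_single_algo_tune`: `toy_average_reward` is a contrived stand-in for a simulated average reward (deliberately peaked at the same pair), and `max_order=10` is an arbitrary value chosen for the example.

```python
import itertools


def toy_average_reward(r, S):
    """Hypothetical stand-in for a simulated average reward; peaks at r=8, S=6."""
    return -((r - 8) ** 2) - ((S - 6) ** 2)


def grid_search(max_order, evaluate):
    """Evaluate every (r, S) pair in the grid and return the best one."""
    grid = itertools.product(range(max_order + 1), repeat=2)
    return max(grid, key=lambda pair: evaluate(*pair))


best_r, best_S = grid_search(max_order=10, evaluate=toy_average_reward)
print([best_r, best_S])  # [8, 6]
```

The grid has `(max_order + 1) ** 2` entries, so the cost of tuning grows quadratically with `max_order`.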


# Step 6: Generate Figures

Create line and radar plots to compare the different algorithms.

In [6]:
fig_path = '../figures/'
fig_name = 'inventory'+'_line_plot'+'.pdf'
or_suite.plots.plot_line_plots(path_list_line, algo_list_line, fig_path, fig_name, int(nEps / 40)+1)

additional_metric = {}
fig_name = 'inventory'+'_radar_plot'+'.pdf'
or_suite.plots.plot_radar_plots(path_list_radar, algo_list_radar,
                                fig_path, fig_name,
                                additional_metric)

# TODO: Import figures and display


  Algorithm        Reward      Time          Space
0    Random -1.471704e+06  2.798928 -457055.200000
1       TBS -5.480759e+05  2.734572 -457071.833333
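To put the two reward figures in the table side by side, the following sketch computes the relative improvement of TBS over the random baseline. `relative_improvement` is a hypothetical helper, and the two rewards are copied from the table above.

```python
def relative_improvement(baseline_reward, new_reward):
    """Fractional improvement of new_reward over baseline_reward."""
    return (new_reward - baseline_reward) / abs(baseline_reward)


random_reward = -1.471704e06
tbs_reward = -5.480759e05
print(f"{relative_improvement(random_reward, tbs_reward):.1%}")  # 62.8%
```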


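# Step 7: Run Simulation with One Supplier

We now repeat the experiment with a single supplier, so the TBS policy has no $r$ component and only $S$ is tuned.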


In [12]:
CONFIG = {'lead_times': [5],
          'demand_dist': lambda x: np.random.poisson(10),
          'supplier_costs': [100],
          'hold_cost': 1,
          'backorder_cost': 19,
          'max_inventory': 1000,
          'max_order': 200,
          'epLen': 100,
          'starting_state': None}
epLen = CONFIG['epLen']
nEps = 100
numIters = 5

In [13]:
DEFAULT_SETTINGS = {'seed': 1, 
                    'recFreq': 1, 
                    'dirPath': '../data/ambulance/', 
                    'deBug': False, 
                    'nEps': nEps, 
                    'numIters': numIters, 
                    'saveTrajectory': True, 
                    'epLen' : CONFIG['epLen'],
                    'render': False,
                    'pickle': False
                    }

env = gym.make('MultipleSuppliers-v0', config=CONFIG)
mon_env = Monitor(env)

In [14]:
agents = { # 'SB PPO': PPO(MlpPolicy, mon_env, gamma=1, verbose=0, n_steps=epLen),
'Random': or_suite.agents.rl.random.randomAgent(),
'TBS': or_suite.agents.inventory_control_multiple_suppliers.base_surge.base_surgeAgent([],0)
}

In [15]:
path_list_line = []
algo_list_line = []
path_list_radar = []
algo_list_radar= []

# Each entry of param_list is a pair [r, S]; with one supplier, r is empty and only S varies.
max_order = CONFIG['max_order']
param_list = []
for S in range(max_order + 1):
    param_list.append([[], S])
        
for agent in agents:
    print(agent)
    DEFAULT_SETTINGS['dirPath'] = '../data/inventory_control_'+str(agent)+'/'
    if agent == 'SB PPO':
        or_suite.utils.run_single_sb_algo(mon_env, agents[agent], DEFAULT_SETTINGS)
    elif agent == 'TBS':
        or_suite.utils.run_single_algo_tune(env, agents[agent], param_list, DEFAULT_SETTINGS)
    else:
        or_suite.utils.run_single_algo(env, agents[agent], DEFAULT_SETTINGS)

    path_list_line.append('../data/inventory_control_'+str(agent))
    algo_list_line.append(str(agent))
    if agent != 'SB PPO':
        path_list_radar.append('../data/inventory_control_'+str(agent))
        algo_list_radar.append(str(agent))

Random
**************************************************
Running experiment
**************************************************
**************************************************
Experiment complete
**************************************************
**************************************************
Saving data
**************************************************
Writing to file data.csv
**************************************************
Data save complete
**************************************************
TBS
[] 0
**************************************************
Running experiment
**************************************************
**************************************************
Experiment complete
**************************************************
...
[] 200
**************************************************
Running experiment
**************************************************


In [16]:
fig_path = '../figures/'
fig_name = 'inventory'+'_line_plot'+'.pdf'
or_suite.plots.plot_line_plots(path_list_line, algo_list_line, fig_path, fig_name, int(nEps / 40)+1)

additional_metric = {}
fig_name = 'inventory'+'_radar_plot'+'.pdf'
or_suite.plots.plot_radar_plots(path_list_radar, algo_list_radar,
                                fig_path, fig_name,
                                additional_metric)

  Algorithm     Reward      Time    Space
0    Random -1058249.4  4.464966 -86288.0
1       TBS  -129644.2  4.396332 -86296.0
