# Experiment for Ridesharing


In this notebook, we walk through the ridesharing problem using three different environment configurations that are set up in the package.

The following two variables 1) choose between the two versions of the environment (without or with travel time) and 2) turn on a grid search to tune the $\alpha$ parameter of the max_weight agent.

In [1]:
has_travel_time = False
algo_tune_on = False

# Step 1: Package Installation
First we import the necessary packages.

In [2]:
import or_suite
import numpy as np
import itertools as it

import copy

import os
from stable_baselines3.common.env_checker import check_env
from stable_baselines3.common.monitor import Monitor
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
import pandas as pd


import gym
import networkx as nx

# Configuration of $K4$ graph with uniform length of edges
The first configuration is a simple $K4$ graph in which every edge has length 10. In this setup, the closest_car agent outperforms the max_weight agent. The max_weight agent tries to keep cars spread roughly uniformly across the system, so that any incoming request has a car available nearby. On a complete graph with uniform edge lengths, however, every other node is equally far from the request, so any node with a car is an equally good place to dispatch from. Weighting nodes by $\alpha$ and by the number of cars available then yields worse actions than those of the closest_car agent.
The details of the configuration are specified below.


* The network is a $K4$ graph with uniform distance of 10 for all edges.
* There are 10 available cars in the system.
* The fare parameter $f$ is 3, and the cost parameter $c$ is 1.
* The average velocity $v$ is 3.
* $\gamma$ and $d_{threshold}$ are 1 and 20 respectively.
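
To make the comparison above concrete, here is a minimal sketch of the two dispatch rules on this $K4$ configuration. The functions `closest_car` and `max_weight`, the car distribution, and the rule "pick the node maximizing $\alpha_n \times$ cars at $n$" are simplified assumptions for illustration, not the or_suite agents themselves.

```python
from itertools import combinations

# Hypothetical stand-ins for illustration; these are not the or_suite agents.
NODES = [0, 1, 2, 3]
EDGE_LENGTH = 10

# K4: every pair of distinct nodes is joined by an edge of length 10.
dist = {(u, u): 0 for u in NODES}
for u, v in combinations(NODES, 2):
    dist[(u, v)] = dist[(v, u)] = EDGE_LENGTH


def closest_car(cars, source):
    """Dispatch from the nearest node that currently holds a car."""
    candidates = [n for n in NODES if cars[n] > 0]
    return min(candidates, key=lambda n: dist[(n, source)])


def max_weight(cars, alpha):
    """Dispatch from the node maximizing alpha[n] * cars[n] (assumed rule)."""
    return max(NODES, key=lambda n: alpha[n] * cars[n])


cars = [1, 5, 2, 2]   # 10 cars spread over the four nodes (made-up split)
alpha = [1, 1, 1, 1]  # uniform weights
source = 0            # the request originates at node 0

print(closest_car(cars, source))  # node 0, at distance 0 from the request
print(max_weight(cars, alpha))    # node 1, which holds the most cars but is 10 away
```

In this toy example the closest-car rule serves the request from the node where it originates, while the weight-based rule sends a car across an edge of length 10 only to rebalance the fleet.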

# Step 2: Pick problem parameters for the environment

Here we use the ridesharing environment as outlined in `or_suite/envs/ridesharing/rideshare_graph.py`. In addition, we need to specify the number of episodes for learning and the number of iterations (in order to plot average results with confidence intervals).

In [3]:
CONFIG =  or_suite.envs.env_configs.rideshare_graph_2cities_config
CONFIG['epLen'] = 10
epLen = CONFIG['epLen']
nEps = 1000
numIters = 20
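
Repeating the experiment for `numIters` independent iterations is what makes confidence intervals possible. The sketch below shows one common way to turn per-iteration results into a mean and a normal-approximation 95% interval. The helper `mean_with_ci` and the reward values are made up for illustration, and or_suite's plotting code may compute its intervals differently.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, half_width) of a normal-approximation 95% interval."""
    mean = statistics.fmean(values)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, z * sem


# Made-up final rewards from four iterations, purely for illustration.
rewards = [9.0, 10.0, 11.0, 10.0]
mean, half_width = mean_with_ci(rewards)
print(f"{mean:.2f} +/- {half_width:.2f}")
```

With only 20 iterations, a Student-t quantile would give a slightly wider and more conservative interval than `z=1.96`.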

# Step 3: Pick simulation parameters

Next we need to specify parameters for the simulation. This includes setting a seed, the frequency to record the metrics, directory path for saving the data files, a deBug mode which prints the trajectory, etc.

In [4]:
DEFAULT_SETTINGS = {'seed': 1, 
                    'recFreq': 1, 
                    'dirPath': '../data/rideshare/', 
                    'deBug': False, 
                    'nEps': nEps, 
                    'numIters': numIters, 
                    'saveTrajectory': True, 
                    'epLen' : epLen,
                    'render': False,
                    'pickle': False
                    }

starting_state = CONFIG['starting_state']
num_cars = CONFIG['num_cars']
num_nodes = len(starting_state)

if has_travel_time:
    rideshare_env = gym.make('Rideshare-v1', config=CONFIG)
else:
    rideshare_env = gym.make('Rideshare-v0', config=CONFIG)
mon_env = Monitor(rideshare_env)

scaling_list = [0.01, 0.1, 1., 10.]
observation_space = rideshare_env.observation_space
action_space = rideshare_env.action_space


# Step 4: Pick list of algorithms

We have several heuristics implemented for each of the environments defined, in addition to a `Random` policy and some discretization-based RL algorithms.

In [7]:
agents = { #'SB PPO': PPO(MlpPolicy, mon_env, gamma=1, verbose=0, n_steps=epLen),
#'Random': or_suite.agents.rl.random.randomAgent(),
#'maxweightfixed' : or_suite.agents.rideshare.max_weight_fixed.maxWeightFixedAgent(CONFIG['epLen'], CONFIG, [1 for _ in range(num_nodes)]),
'closestcar' : or_suite.agents.rideshare.closest_car.closetCarAgent(CONFIG['epLen'], CONFIG),
'randomcar' : or_suite.agents.rideshare.random_car.randomCarAgent(CONFIG['epLen'], CONFIG),
'DiscreteQL' : or_suite.agents.rl.discrete_ql.DiscreteQl(action_space, observation_space, epLen, scaling_list[0]),
'DiscreteMB': or_suite.agents.rl.discrete_mb.DiscreteMB(action_space, observation_space, epLen, scaling_list[0], 0, False)
}

#param_list = [list(p) for p in it.product(np.linspace(0,1,4),repeat = len(starting_state))]
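
The commented-out `param_list` line enumerates a grid of candidate $\alpha$ vectors, with four evenly spaced values in $[0, 1]$ for each node. A minimal, dependency-free sketch of the same construction follows. The helper name `alpha_grid` is hypothetical, and four nodes are assumed to match the $K4$ graph.

```python
import itertools as it


def alpha_grid(num_nodes, points=4):
    """All combinations of `points` evenly spaced values in [0, 1] per node."""
    values = [i / (points - 1) for i in range(points)]
    return [list(p) for p in it.product(values, repeat=num_nodes)]


grid = alpha_grid(num_nodes=4)
print(len(grid))          # 4 ** 4 = 256 candidate alpha vectors
print(grid[0], grid[-1])  # the all-zeros and all-ones corners of the grid
```

The grid grows as `points ** num_nodes`, so this exhaustive search is only practical for small graphs.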

(10, 4, 4, 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 4, 4, 6, 6)


# Step 5: Run Simulations

Run the different heuristics in the environment.

In [8]:
path_list_line = []
algo_list_line = []
path_list_radar = []
algo_list_radar= []

linspace_alpha = []

for agent in agents:
    print(agent)
    DEFAULT_SETTINGS['dirPath'] = '../data/rideshare_'+str(agent)+'_'+str(num_cars)
    # if algo_tune_on and agent == 'maxweightfixed':
    #     or_suite.utils.run_single_algo_tune(rideshare_env,agents[agent], param_list, DEFAULT_SETTINGS)
    if algo_tune_on and agent == 'DiscreteQL':
        or_suite.utils.run_single_algo_tune(rideshare_env,agents[agent], scaling_list, DEFAULT_SETTINGS)
    if algo_tune_on and agent == 'DiscreteMB':
        or_suite.utils.run_single_algo_tune(rideshare_env,agents[agent], scaling_list, DEFAULT_SETTINGS)
    if agent == 'SB PPO':
        or_suite.utils.run_single_sb_algo(mon_env, agents[agent], DEFAULT_SETTINGS)
    else:
        or_suite.utils.run_single_algo(rideshare_env, agents[agent], DEFAULT_SETTINGS)

    path_list_line.append('../data/rideshare_'+str(agent)+'_'+str(num_cars))
    algo_list_line.append(str(agent))
    if agent != 'SB PPO':
        path_list_radar.append('../data/rideshare_'+str(agent)+'_'+str(num_cars))
        algo_list_radar.append(str(agent))

DiscreteMB


KeyboardInterrupt: 

# Step 6: Generate Figures

Create a chart to compare the different algorithms. The ridesharing environment offers three more metrics: the acceptance rate, and the mean and variance of the response time (computed below from dispatch distances). They are named ACPT, MN, and VAR, respectively.

In [None]:
fig_path = '../figures/'
fig_name = 'rideshare_'+'_line_plot'+'.pdf'
or_suite.plots.plot_line_plots(path_list_line, algo_list_line, fig_path, fig_name, int(nEps / 40)+1)

graph = nx.Graph(CONFIG['edges'])
lengths = or_suite.envs.ridesharing.rideshare_graph.RideshareGraphEnvironment.find_lengths(rideshare_env, graph, graph.number_of_nodes())

# Extra radar-plot metrics, each computed from a trajectory and the pairwise `lengths`
additional_metric = {
    'ACPT': lambda traj: or_suite.utils.acceptance_rate(traj, lambda x, y: lengths[x, y]),
    'MN': lambda traj: or_suite.utils.mean_dispatch_dist(traj, lambda x, y: lengths[x, y]),
    'VAR': lambda traj: or_suite.utils.var_dispatch_dist(traj, lambda x, y: lengths[x, y]),
}

fig_name = 'rideshare_'+'_'+'_radar_plot'+'.pdf'
or_suite.plots.plot_radar_plots(path_list_radar, algo_list_radar,
                                fig_path, fig_name,
                                additional_metric)

    Algorithm    Reward      Time    Space      ACPT          MN        VAR
0  closestcar  9.998924  5.453197  -9871.2  1.000000  [-0.81076]  -6.771578
1   randomcar  9.998634  5.005686  -9930.2  0.999970  [-5.03099] -26.060830
2  DiscreteQL  8.498688  4.935753 -11482.4  0.813465  [-5.09903] -25.762623
3  DiscreteMB  8.948711  4.379160 -11423.2  0.840285  [-5.06364] -25.805790
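
For intuition about the ACPT, MN, and VAR columns, the sketch below computes an acceptance rate and the mean and variance of dispatch distances from a toy trajectory. The record layout and the three helper functions are simplified assumptions; the real or_suite trajectory format and the `or_suite.utils` implementations differ. In particular, the MN and VAR values in the table are negative, which suggests the library reports negated quantities, whereas the sketch returns plain ones.

```python
import statistics

# Simplified, made-up records, one per request. The real or_suite trajectory
# format differs; this layout is an assumption for illustration only.
trajectory = [
    {"accepted": True, "dispatch_dist": 0},
    {"accepted": True, "dispatch_dist": 10},
    {"accepted": False, "dispatch_dist": 10},
    {"accepted": True, "dispatch_dist": 0},
]


def acceptance_rate(traj):
    """Fraction of requests that were accepted."""
    return sum(step["accepted"] for step in traj) / len(traj)


def mean_dispatch_dist(traj):
    """Average distance between the dispatched car and the request."""
    return statistics.fmean([step["dispatch_dist"] for step in traj])


def var_dispatch_dist(traj):
    """Population variance of the dispatch distances."""
    return statistics.pvariance([step["dispatch_dist"] for step in traj])


print(acceptance_rate(trajectory))     # 3 of 4 requests accepted
print(mean_dispatch_dist(trajectory))  # (0 + 10 + 10 + 0) / 4
print(var_dispatch_dist(trajectory))   # every distance is 5 away from the mean
```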
