# OR Suite 

Reinforcement learning (RL) is a natural model for problems involving real-time sequential decision making, including inventory control, resource allocation, ridesharing systems, and ambulance routing. In these models, an agent interacts with a system that has stochastic transitions and rewards, and aims to control the system by maximizing its cumulative reward over the trajectory. Reinforcement learning has been shown in practice to be an effective technique for learning complex control policies.
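The interaction described above can be sketched as a simple loop. The environment below is a hypothetical toy queue written only for illustration (it is not part of `or_suite`), and the names `ToyQueueEnv`, `run_episode`, and `serve_if_busy` are assumptions of this sketch.

```python
import random


class ToyQueueEnv:
    """Hypothetical toy system: a queue that grows randomly and is drained by the agent."""

    def __init__(self, ep_len, seed=0):
        self.ep_len = ep_len
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.t = 0
        self.queue = 0
        return self.queue

    def step(self, action):
        # Stochastic transition: one job arrives with probability 0.5.
        arrival = 1 if self.rng.random() < 0.5 else 0
        served = min(self.queue, action)
        self.queue = self.queue - served + arrival
        # Reward: the action costs 1 per unit, and each job still waiting costs 2.
        reward = -1.0 * action - 2.0 * self.queue
        self.t += 1
        done = self.t >= self.ep_len
        return self.queue, reward, done


def run_episode(env, policy):
    """Roll out one episode and return the cumulative reward."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
    return total


def serve_if_busy(state):
    """A simple policy: serve one job whenever the queue is non-empty."""
    return 1 if state > 0 else 0


episode_return = run_episode(ToyQueueEnv(ep_len=20), serve_if_busy)
print(episode_return)
```

The quantity an RL agent tries to maximize is the value returned by `run_episode`, taken in expectation over the random transitions.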

# Ridesharing Code Demo

Reinforcement learning (RL) is a natural model for problems involving real-time sequential decision making. In these models, a principal interacts with a system having stochastic transitions and rewards and aims to control the system online (by exploring available actions using real-time feedback) or offline (by exploiting known properties of the system).

This project aims to provide a unified landscape for scaling reinforcement learning algorithms to operations research domains.

In this notebook, we walk through the Ambulance Routing problem with a 1-dimensional reinforcement learning environment in the space $X = [0, 1]$. Each ambulance in the problem can be located anywhere in $X$, so the state space is $S = X^k$, where $k$ is the number of ambulances. For this example there will be only one ambulance, so $k = 1$.

The default distribution for call arrivals is $\text{Beta}(5, 2)$ over $[0,1]$; however, any probability distribution defined over the interval $[0,1]$ is valid. The probability distribution can also change with each timestep.
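As a rough illustration of the arrival model, the sketch below draws call locations from $\text{Beta}(5, 2)$ using only the standard library. The function names and the time-varying rule in `shifting_call` are assumptions made for this example, not the package's arrival code.

```python
import random


def sample_call(rng, timestep, alpha=5.0, beta=2.0):
    """Draw one call location in [0, 1] from Beta(alpha, beta).

    `timestep` is unused here; a time-varying arrival model could map it
    to different (alpha, beta) values.
    """
    return rng.betavariate(alpha, beta)


def shifting_call(rng, timestep):
    """Hypothetical time-varying model: calls skew left early on and right later."""
    if timestep < 5:
        return rng.betavariate(2.0, 5.0)
    return rng.betavariate(5.0, 2.0)


rng = random.Random(1)
calls = [sample_call(rng, t) for t in range(10_000)]
mean_location = sum(calls) / len(calls)  # Beta(5, 2) has mean 5 / (5 + 2)
print(round(mean_location, 2))
```

Because $\text{Beta}(5, 2)$ has mean $5/7 \approx 0.71$, most calls arrive in the right half of the interval.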

For example, in a problem with two ambulances, imagine the ambulances are initially located at $0.4$ and $0.6$, and the distance function being used is the $\ell_1$ norm. The agent could choose to move the ambulances to $0.342$ and $0.887$. If a call arrived at $0.115$, ambulance 1, which was at $0.342$, would respond to that call, and the state at the end of the iteration would be ambulance 1 at $0.115$ and ambulance 2 at $0.887$. The agent could then choose new locations to move the ambulances to, and the cycle would repeat.
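The two-ambulance example above can be reproduced in a few lines. This is a minimal sketch of the dispatch step only; `respond_to_call` is a name introduced here for illustration.

```python
def respond_to_call(locations, call):
    """Send the nearest ambulance to `call` and return (responder index, new state).

    On the interval [0, 1] the l1 distance is just the absolute difference.
    """
    distances = [abs(loc - call) for loc in locations]
    responder = distances.index(min(distances))
    new_locations = list(locations)
    new_locations[responder] = call
    return responder, new_locations


# The agent has moved the ambulances from (0.4, 0.6) to (0.342, 0.887).
responder, new_state = respond_to_call([0.342, 0.887], call=0.115)
print(responder, new_state)  # 0 [0.115, 0.887]  (index 0 is ambulance 1)
```

The distances are $|0.342 - 0.115| = 0.227$ and $|0.887 - 0.115| = 0.772$, so the first ambulance responds.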

# Step 1: Package Installation
First we import the necessary packages.

In [1]:
import or_suite
import numpy as np
import itertools as it
import networkx as nx

import copy

import os
from stable_baselines3.common.env_checker import check_env
from stable_baselines3.common.monitor import Monitor
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
import pandas as pd


import gym

# Step 2: Pick problem parameters for the environment

Here we use the ridesharing environment, registered with `gym` as `Rideshare-v1`. The package has default specifications for all of the environments in the file `or_suite/envs/env_configs.py`, and so we use one of the defaults for the ridesharing problem, `rideshare_graph_exp3_config`.

In addition, we need to specify the number of episodes for learning, and the number of iterations (in order to plot average results with confidence intervals).

In [2]:
CONFIG = or_suite.envs.env_configs.rideshare_graph_exp3_config
CONFIG['epLen'] = 500
epLen = CONFIG['epLen']
nEps = 2
numIters = 1

In [3]:
print(epLen)

500
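To see why the number of iterations matters for confidence intervals, here is a small sketch that averages per-iteration totals and attaches a normal-approximation interval. The reward values are made up for illustration, and `mean_with_ci` is a hypothetical helper, not an `or_suite` function.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, half_width) of a normal-approximation 95% confidence interval."""
    mean = statistics.mean(values)
    if len(values) < 2:
        # A single iteration gives no spread estimate.
        return mean, 0.0
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


# Hypothetical total rewards from four independent iterations of the same experiment.
iteration_rewards = [100.0, 110.0, 90.0, 100.0]
mean, half_width = mean_with_ci(iteration_rewards)
print(mean, round(half_width, 2))
```

With `numIters = 1`, as set in this notebook, there is only one sample per algorithm, so no meaningful interval can be drawn; increasing `numIters` is what makes the averaged plots informative.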


# Step 3: Pick simulation parameters

Next we need to specify parameters for the simulation. These include a seed, the frequency at which metrics are recorded, the directory path for saving the data files, and a `deBug` mode that prints the trajectory.

In [4]:
DEFAULT_SETTINGS = {'seed': 1, 
                    'recFreq': 1, 
                    'dirPath': '../data/rideshare/', 
                    'deBug': False, 
                    'nEps': nEps, 
                    'numIters': numIters, 
                    'saveTrajectory': True, 
                    'epLen' : epLen,
                    'render': False,
                    'pickle': False
                    }

starting_state = CONFIG['starting_state']
num_cars = CONFIG['num_cars']
num_nodes = len(starting_state)

rideshare_env = gym.make('Rideshare-v1', config=CONFIG)
mon_env = Monitor(rideshare_env)

In [5]:
print(num_nodes)

7
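The roles of the `seed` and `recFreq` settings can be illustrated with a toy runner. This is only a sketch of how such settings might be consumed, under the assumption that `recFreq` means "record once every `recFreq` steps"; it is not the package's experiment code, and `simulate` is a hypothetical function.

```python
import random


def simulate(settings):
    """Hypothetical runner: seeds the RNG and records a metric every `recFreq` steps."""
    rng = random.Random(settings['seed'])
    records = []
    total = 0.0
    for step in range(settings['epLen']):
        total += rng.random()  # stand-in for a per-step reward
        if step % settings['recFreq'] == 0:
            records.append((step, total))
    return records


settings = {'seed': 1, 'recFreq': 100, 'epLen': 500}
first_run = simulate(settings)
second_run = simulate(settings)
print(len(first_run), first_run == second_run)  # 5 True
```

Fixing the seed makes two runs identical, and a larger `recFreq` trades detail in the recorded metrics for smaller data files.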


# Step 4: Pick list of algorithms

We have several heuristics implemented for each of the environments defined, in addition to a `Random` policy, and some `RL discretization based` algorithms. 

The `Stable` agent only moves ambulances when responding to an incoming call and not in between calls. This means the policy $\pi$ chosen by the agent for any given state $X$ will be $\pi_h(X) = X$.

The `Median` agent takes a list of all past call arrivals sorted by arrival location, and partitions it into $k$ quantiles where $k$ is the number of ambulances. The algorithm then selects the middle data point in each quantile as the locations to station the ambulances.
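The two heuristics described above can be sketched as follows. This is an illustrative reading of the descriptions, not the package's implementation; in particular, the way the sorted calls are split into $k$ groups is an assumption of this sketch.

```python
def stable_policy(state):
    """Stable agent: keep every ambulance where it is, i.e. pi(X) = X."""
    return list(state)


def median_policy(past_calls, k):
    """Median agent: split the sorted calls into k groups and take each group's middle point."""
    ordered = sorted(past_calls)
    n = len(ordered)
    # Group i covers ordered[bounds[i]:bounds[i + 1]]; group sizes differ by at most one.
    bounds = [i * n // k for i in range(k + 1)]
    groups = [ordered[bounds[i]:bounds[i + 1]] for i in range(k)]
    # Empty groups (fewer calls than ambulances) are skipped in this sketch.
    return [group[len(group) // 2] for group in groups if group]


past_calls = [0.9, 0.1, 0.7, 0.3, 0.8, 0.2]
print(stable_policy([0.4, 0.6]))       # [0.4, 0.6]
print(median_policy(past_calls, k=2))  # [0.2, 0.8]
```

With six past calls and two ambulances, the sorted calls split into $\{0.1, 0.2, 0.3\}$ and $\{0.7, 0.8, 0.9\}$, and the middle point of each group is chosen.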

In [6]:
agents = {
    # 'SB PPO': PPO(MlpPolicy, mon_env, gamma=1, verbose=0, n_steps=epLen),
    # 'Random': or_suite.agents.rl.random.randomAgent(),
    'maxweightfixed': or_suite.agents.rideshare.max_weight_fixed.maxWeightFixedAgent(CONFIG['epLen'], CONFIG, [1 for _ in range(num_nodes)]),
    'closestcar': or_suite.agents.rideshare.closest_car.closetCarAgent(CONFIG['epLen'], CONFIG),
    'randomcar': or_suite.agents.rideshare.random_car.randomCarAgent(CONFIG['epLen'], CONFIG),
}
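For intuition about a "closest car" style heuristic, the sketch below picks the occupied node nearest to a request's source on a small graph. It assumes, from the agent's name alone, that such a heuristic dispatches the nearest available car; the graph, the `cars_at_node` layout, and both functions are hypothetical and are not taken from `or_suite`.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm on an adjacency dict {node: {neighbor: travel_time}}."""
    dist = {source: 0.0}
    frontier = [(0.0, source)]
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > dist.get(node, float('inf')):
            continue  # stale queue entry
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float('inf')):
                dist[neighbor] = candidate
                heapq.heappush(frontier, (candidate, neighbor))
    return dist


def closest_car_node(graph, cars_at_node, request_source):
    """Return the node holding at least one car that is nearest to the request source."""
    dist = shortest_distances(graph, request_source)
    occupied = [n for n, count in cars_at_node.items() if count > 0 and n in dist]
    return min(occupied, key=lambda n: dist[n]) if occupied else None


# Hypothetical undirected 4-node line graph 0 - 1 - 2 - 3 with unit travel times.
graph = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
cars_at_node = {0: 2, 1: 0, 2: 0, 3: 1}
print(closest_car_node(graph, cars_at_node, request_source=2))  # 3
```

Because the example graph is undirected, distances from the request source equal distances to it, so a single shortest-path computation suffices.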

# Step 5: Run Simulations

Run the different heuristics in the environment.

In [7]:
#param_list = [list(p) for p in it.product(np.linspace(0,1,4),repeat = len(starting_state))]
#print(len(param_list))

In [8]:
path_list_line = []
algo_list_line = []
path_list_radar = []
algo_list_radar = []

linspace_alpha = []

for agent in agents:
    print(agent)
    DEFAULT_SETTINGS['dirPath'] = '../data/rideshare_'+str(agent)+'_'+str(num_cars)
    #if agent == 'maxweightfixed':
    #    or_suite.utils.run_single_algo_tune(rideshare_env,agents[agent], param_list, DEFAULT_SETTINGS)
    if agent == 'SB PPO':
        or_suite.utils.run_single_sb_algo(mon_env, agents[agent], DEFAULT_SETTINGS)
    else:
        or_suite.utils.run_single_algo(rideshare_env, agents[agent], DEFAULT_SETTINGS)

    path_list_line.append('../data/rideshare_'+str(agent)+'_'+str(num_cars))
    algo_list_line.append(str(agent))
    if agent != 'SB PPO':
        path_list_radar.append('../data/rideshare_'+str(agent)+'_'+str(num_cars))
        algo_list_radar.append(str(agent))

maxweightfixed
**************************************************
Running experiment
**************************************************
{'iter': 0, 'episode': 0, 'step': 0, 'oldState': array([0, 0, 3, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 5, 1]), 'action': 5, 'reward': 140.0, 'newState': array([0, 0, 3, 0, 3, 2, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 3, 5]), 'info': {'request': array([3, 5]), 'acceptance': True}}
{'iter': 0, 'episode': 0, 'step': 1, 'oldState': array([0, 0, 3, 0, 3, 2, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 3, 5]), 'action': 2, 'reward': 0.0, 'newState': array([0, 0, 3, 0, 3, 2, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 4, 1]), 'info': {'request': array([4, 1]), 'acceptance': False}}
...
       0, 0, 0, 2, 5, 2]), 'action': 5, 'reward': 140.0, 'newState': array([2, 0, 0, 3, 0, 0, 2, 2, 2, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 2, 4]), 'info': {'request': array([2, 4]), 'acceptance': True}}
{'iter': 0, 'episode': 0, 'step': 467, 'oldState': array([2, 0, 0, 3, 0, 0, 2, 2, 2, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 2, 4]), 'action': 0, 'reward': 110.0, 'newState': array([1, 0, 0, 3, 1, 0, 2

       0, 0, 0, 1, 0, 6]), 'info': {'request': array([0, 6]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 33, 'oldState': array([0, 0, 1, 0, 2, 5, 0, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 6]), 'action': 2, 'reward': 110.0, 'newState': array([0, 0, 0, 0, 2, 5, 0, 4, 1, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 6, 0]), 'info': {'request': array([6, 0]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 34, 'oldState': array([0, 0, 0, 0, 2, 5, 0, 4, 1, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 6, 0]), 'action': 4, 'reward': 110.0, 'newState': array([0, 0, 0, 0, 2, 5, 0, 0, 2, 6, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 6, 1]), 'info': {'request': array([6, 1]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 35, 'oldState': array([0, 0, 0, 0, 2, 5, 0, 0, 2, 6, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 6, 1]), 'action': 4, 'reward': 130.0, 'newState': array([0, 0, 0, 0, 1, 5, 1, 0

       0, 0, 0, 1, 5, 1]), 'info': {'request': array([5, 1]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 97, 'oldState': array([0, 1, 2, 0, 0, 4, 1, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 5, 1]), 'action': 5, 'reward': 140.0, 'newState': array([0, 1, 2, 0, 1, 3, 1, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 2, 5]), 'info': {'request': array([2, 5]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 98, 'oldState': array([0, 1, 2, 0, 1, 3, 1, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 2, 5]), 'action': 2, 'reward': 140.0, 'newState': array([0, 1, 1, 0, 1, 3, 1, 1, 1, 5, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 6, 3]), 'info': {'request': array([6, 3]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 99, 'oldState': array([0, 1, 1, 0, 1, 3, 1, 1, 1, 5, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 6, 3]), 'action': 6, 'reward': 140.0, 'newState': array([0, 2, 1, 0, 1, 3, 0, 

       0, 0, 0, 2, 3, 4]), 'info': {'request': array([3, 4]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 159, 'oldState': array([1, 2, 1, 0, 0, 0, 3, 1, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 3, 4]), 'action': 0, 'reward': 110.0, 'newState': array([0, 2, 2, 0, 0, 0, 3, 1, 1, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 5, 3]), 'info': {'request': array([5, 3]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 160, 'oldState': array([0, 2, 2, 0, 0, 0, 3, 1, 1, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 5, 3]), 'action': 6, 'reward': 0.0, 'newState': array([0, 3, 2, 0, 0, 0, 3, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 4]), 'info': {'request': array([0, 4]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 161, 'oldState': array([0, 3, 2, 0, 0, 0, 3, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 4]), 'action': 1, 'reward': 90.0, 'newState': array([0, 2, 2, 0, 1, 0, 3, 

       0, 0, 0, 1, 1, 4]), 'info': {'request': array([1, 4]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 231, 'oldState': array([1, 1, 0, 4, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 1, 4]), 'action': 1, 'reward': 120.0, 'newState': array([2, 0, 0, 4, 1, 1, 0, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 3, 5]), 'info': {'request': array([3, 5]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 232, 'oldState': array([2, 0, 0, 4, 1, 1, 0, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 3, 5]), 'action': 3, 'reward': 140.0, 'newState': array([2, 0, 0, 3, 1, 1, 0, 4, 1, 5, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 1, 5]), 'info': {'request': array([1, 5]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 233, 'oldState': array([2, 0, 0, 3, 1, 1, 0, 4, 1, 5, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 1, 5]), 'action': 0, 'reward': 130.0, 'newState': array([1, 0, 0, 3, 2, 1, 0

       0, 0, 0, 1, 2, 6]), 'info': {'request': array([2, 6]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 301, 'oldState': array([0, 1, 0, 0, 3, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 2, 6]), 'action': 1, 'reward': 0.0, 'newState': array([0, 2, 0, 0, 3, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 3]), 'info': {'request': array([6, 3]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 302, 'oldState': array([0, 2, 0, 0, 3, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 3]), 'action': 6, 'reward': 140.0, 'newState': array([0, 2, 0, 0, 3, 2, 1, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 6, 3]), 'info': {'request': array([6, 3]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 303, 'oldState': array([0, 2, 0, 0, 3, 2, 1, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 6, 3]), 'action': 6, 'reward': 140.0, 'newState': array([0, 2, 0, 0, 3, 2, 0

       0, 0, 0, 2, 2, 6]), 'info': {'request': array([2, 6]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 353, 'oldState': array([1, 0, 0, 0, 1, 5, 0, 4, 1, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 2, 6]), 'action': 0, 'reward': 130.0, 'newState': array([0, 0, 0, 0, 2, 5, 0, 6, 2, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 2, 6]), 'info': {'request': array([2, 6]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 354, 'oldState': array([0, 0, 0, 0, 2, 5, 0, 6, 2, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 2, 6]), 'action': 4, 'reward': 0.0, 'newState': array([0, 0, 0, 0, 3, 5, 0, 6, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 1, 6]), 'info': {'request': array([1, 6]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 355, 'oldState': array([0, 0, 0, 0, 3, 5, 0, 6, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 1, 6]), 'action': 4, 'reward': 0.0, 'newState': array([0, 0, 0, 0, 3, 5, 1, 0

       0, 0, 0, 2, 0, 4]), 'info': {'request': array([0, 4]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 406, 'oldState': array([2, 0, 0, 1, 1, 0, 3, 5, 1, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 0, 4]), 'action': 0, 'reward': 100.0, 'newState': array([1, 0, 0, 1, 1, 1, 3, 4, 1, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 2, 6]), 'info': {'request': array([2, 6]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 407, 'oldState': array([1, 0, 0, 1, 1, 1, 3, 4, 1, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 2, 6]), 'action': 0, 'reward': 130.0, 'newState': array([0, 0, 0, 1, 3, 1, 3, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 6, 3]), 'info': {'request': array([6, 3]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 408, 'oldState': array([0, 0, 0, 1, 3, 1, 3, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 6, 3]), 'action': 6, 'reward': 140.0, 'newState': array([0, 0, 0, 1, 3, 1, 2

       0, 0, 0, 0, 3, 4]), 'info': {'request': array([3, 4]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 439, 'oldState': array([0, 1, 0, 0, 2, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 3, 4]), 'action': 1, 'reward': 0.0, 'newState': array([0, 1, 0, 0, 2, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 1]), 'info': {'request': array([6, 1]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 440, 'oldState': array([0, 1, 0, 0, 2, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 1]), 'action': 6, 'reward': 140.0, 'newState': array([0, 1, 0, 0, 2, 2, 3, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 4, 1]), 'info': {'request': array([4, 1]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 441, 'oldState': array([0, 1, 0, 0, 2, 2, 3, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 4, 1]), 'action': 4, 'reward': 120.0, 'newState': array([0, 1, 0, 0, 1, 2, 3

       0, 0, 0, 0, 5, 1]), 'info': {'request': array([5, 1]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 477, 'oldState': array([0, 1, 4, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 5, 1]), 'action': 4, 'reward': 0.0, 'newState': array([0, 1, 4, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 3]), 'info': {'request': array([6, 3]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 478, 'oldState': array([0, 1, 4, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 3]), 'action': 4, 'reward': 130.0, 'newState': array([0, 1, 4, 2, 1, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 5, 3]), 'info': {'request': array([5, 3]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 479, 'oldState': array([0, 1, 4, 2, 1, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 5, 3]), 'action': 4, 'reward': 130.0, 'newState': array([0, 1, 4, 2, 0, 0, 0

**************************************************
Data save complete
**************************************************
randomcar
**************************************************
Running experiment
**************************************************
{'iter': 0, 'episode': 0, 'step': 0, 'oldState': array([0, 0, 3, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 5, 1]), 'action': 4, 'reward': 130.0, 'newState': array([0, 0, 3, 0, 2, 3, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 5, 3]), 'info': {'request': array([5, 3]), 'acceptance': True}}
{'iter': 0, 'episode': 0, 'step': 1, 'oldState': array([0, 0, 3, 0, 2, 3, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 5, 3]), 'action': 2, 'reward': 0.0, 'newState': array([0, 0, 3, 0, 2, 3, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 4, 2]), 'info': {'request': array([4, 2]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 2, 'oldState': array(

       0, 0, 0, 1, 0, 4]), 'info': {'request': array([0, 4]), 'acceptance': True}}
{'iter': 0, 'episode': 0, 'step': 27, 'oldState': array([1, 1, 1, 0, 2, 3, 0, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 4]), 'action': 0, 'reward': 100.0, 'newState': array([0, 1, 1, 0, 2, 3, 0, 4, 1, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 0, 5]), 'info': {'request': array([0, 5]), 'acceptance': True}}
{'iter': 0, 'episode': 0, 'step': 28, 'oldState': array([0, 1, 1, 0, 2, 3, 0, 4, 1, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 0, 5]), 'action': 4, 'reward': 0.0, 'newState': array([0, 1, 1, 0, 4, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 4, 1]), 'info': {'request': array([4, 1]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 29, 'oldState': array([0, 1, 1, 0, 4, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 4, 1]), 'action': 4, 'reward': 120.0, 'newState': array([0, 1, 1, 0, 3, 3, 0, 1,

       0, 0, 0, 2, 0, 6]), 'info': {'request': array([0, 6]), 'acceptance': True}}
{'iter': 0, 'episode': 0, 'step': 77, 'oldState': array([1, 0, 3, 0, 2, 0, 1, 1, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 0, 6]), 'action': 6, 'reward': 0.0, 'newState': array([1, 1, 3, 0, 2, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 4, 2]), 'info': {'request': array([4, 2]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 78, 'oldState': array([1, 1, 3, 0, 2, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 4, 2]), 'action': 2, 'reward': 0.0, 'newState': array([1, 2, 3, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 4, 3]), 'info': {'request': array([4, 3]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 79, 'oldState': array([1, 2, 3, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 4, 3]), 'action': 1, 'reward': 0.0, 'newState': array([1, 2, 3, 0, 2, 0, 1, 0, 0,

{'iter': 0, 'episode': 0, 'step': 154, 'oldState': array([1, 2, 5, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 4, 3]), 'action': 0, 'reward': 0.0, 'newState': array([1, 2, 5, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 3]), 'info': {'request': array([6, 3]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 155, 'oldState': array([1, 2, 5, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 3]), 'action': 0, 'reward': 0.0, 'newState': array([1, 2, 5, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 0]), 'info': {'request': array([6, 0]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 156, 'oldState': array([1, 2, 5, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 0]), 'action': 1, 'reward': 0.0, 'newState': array([1, 2, 5, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 4, 3]), 'info': {'req

       0, 0, 0, 0, 1, 5]), 'info': {'request': array([1, 5]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 242, 'oldState': array([0, 5, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 5]), 'action': 1, 'reward': 140.0, 'newState': array([0, 4, 1, 1, 1, 1, 0, 5, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 4, 2]), 'info': {'request': array([4, 2]), 'acceptance': True}}
{'iter': 0, 'episode': 0, 'step': 243, 'oldState': array([0, 4, 1, 1, 1, 1, 0, 5, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 4, 2]), 'action': 2, 'reward': 0.0, 'newState': array([0, 4, 1, 1, 1, 1, 0, 5, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 2, 4]), 'info': {'request': array([2, 4]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 244, 'oldState': array([0, 4, 1, 1, 1, 1, 0, 5, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 2, 4]), 'action': 4, 'reward': 0.0, 'newState': array([0, 4, 1, 1, 1, 2, 0, 

       0, 0, 0, 0, 2, 4]), 'info': {'request': array([2, 4]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 299, 'oldState': array([0, 3, 3, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 2, 4]), 'action': 2, 'reward': 120.0, 'newState': array([0, 3, 2, 1, 1, 1, 0, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 1, 4]), 'info': {'request': array([1, 4]), 'acceptance': True}}
{'iter': 0, 'episode': 0, 'step': 300, 'oldState': array([0, 3, 2, 1, 1, 1, 0, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 1, 4]), 'action': 1, 'reward': 120.0, 'newState': array([0, 2, 2, 1, 1, 1, 0, 4, 1, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 4, 1]), 'info': {'request': array([4, 1]), 'acceptance': True}}
{'iter': 0, 'episode': 0, 'step': 301, 'oldState': array([0, 2, 2, 1, 1, 1, 0, 4, 1, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 4, 1]), 'action': 4, 'reward': 120.0, 'newState': array([0, 2, 2, 1, 1, 1, 

       0, 0, 0, 1, 2, 6]), 'info': {'request': array([2, 6]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 385, 'oldState': array([1, 2, 2, 1, 1, 0, 1, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 2, 6]), 'action': 1, 'reward': 0.0, 'newState': array([1, 2, 2, 2, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 3]), 'info': {'request': array([6, 3]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 386, 'oldState': array([1, 2, 2, 2, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 3]), 'action': 4, 'reward': 130.0, 'newState': array([1, 2, 2, 2, 0, 0, 1, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 4, 1]), 'info': {'request': array([4, 1]), 'acceptance': True}}
{'iter': 0, 'episode': 0, 'step': 387, 'oldState': array([1, 2, 2, 2, 0, 0, 1, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 4, 1]), 'action': 3, 'reward': 0.0, 'newState': array([1, 2, 2, 2, 0, 0, 1, 

       0, 0, 0, 0, 6, 1]), 'info': {'request': array([6, 1]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 421, 'oldState': array([0, 2, 0, 1, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 1]), 'action': 1, 'reward': 0.0, 'newState': array([0, 2, 0, 1, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 4, 2]), 'info': {'request': array([4, 2]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 422, 'oldState': array([0, 2, 0, 1, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 4, 2]), 'action': 3, 'reward': 0.0, 'newState': array([0, 2, 0, 1, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 5, 2]), 'info': {'request': array([5, 2]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 423, 'oldState': array([0, 2, 0, 1, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 5, 2]), 'action': 4, 'reward': 130.0, 'newState': array([0, 2, 0, 1, 0, 2, 3,

       0, 0, 0, 1, 5, 2]), 'info': {'request': array([5, 2]), 'acceptance': True}}
{'iter': 0, 'episode': 0, 'step': 455, 'oldState': array([1, 1, 1, 2, 0, 2, 1, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 5, 2]), 'action': 3, 'reward': 0.0, 'newState': array([1, 1, 1, 2, 0, 2, 1, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 4, 3]), 'info': {'request': array([4, 3]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 456, 'oldState': array([1, 1, 1, 2, 0, 2, 1, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 4, 3]), 'action': 0, 'reward': 0.0, 'newState': array([1, 1, 1, 3, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 4]), 'info': {'request': array([1, 4]), 'acceptance': False}}
{'iter': 0, 'episode': 0, 'step': 457, 'oldState': array([1, 1, 1, 3, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 4]), 'action': 0, 'reward': 0.0, 'newState': array([1, 1, 1, 3, 0, 2, 1, 0,

{'iter': 0, 'episode': 1, 'step': 22, 'oldState': array([0, 2, 2, 1, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 5, 3]), 'action': 6, 'reward': 0.0, 'newState': array([0, 2, 2, 1, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 2]), 'info': {'request': array([6, 2]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 23, 'oldState': array([0, 2, 2, 1, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 2]), 'action': 3, 'reward': 0.0, 'newState': array([0, 2, 2, 1, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 6]), 'info': {'request': array([0, 6]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 24, 'oldState': array([0, 2, 2, 1, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 6]), 'action': 1, 'reward': 110.0, 'newState': array([0, 1, 2, 1, 1, 2, 1, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 1, 5]), 'info': {'requ

{'iter': 0, 'episode': 1, 'step': 74, 'oldState': array([0, 0, 0, 0, 0, 5, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 2, 6]), 'action': 6, 'reward': 0.0, 'newState': array([0, 0, 0, 0, 0, 5, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 2]), 'info': {'request': array([6, 2]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 75, 'oldState': array([0, 0, 0, 0, 0, 5, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 2]), 'action': 6, 'reward': 140.0, 'newState': array([0, 0, 0, 0, 0, 5, 3, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 4, 0]), 'info': {'request': array([4, 0]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 76, 'oldState': array([0, 0, 0, 0, 0, 5, 3, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 4, 0]), 'action': 5, 'reward': 0.0, 'newState': array([0, 0, 0, 0, 0, 5, 3, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 4]), 'info': {'reque

       0, 0, 0, 0, 2, 6]), 'info': {'request': array([2, 6]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 107, 'oldState': array([1, 1, 2, 1, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 2, 6]), 'action': 6, 'reward': 0.0, 'newState': array([1, 1, 2, 1, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 4, 0]), 'info': {'request': array([4, 0]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 108, 'oldState': array([1, 1, 2, 1, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 4, 0]), 'action': 2, 'reward': 0.0, 'newState': array([1, 1, 2, 1, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 1]), 'info': {'request': array([6, 1]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 109, 'oldState': array([1, 1, 2, 1, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 1]), 'action': 0, 'reward': 0.0, 'newState': array([1, 1, 2, 1, 0, 2, 2, 0

       0, 0, 0, 0, 1, 5]), 'info': {'request': array([1, 5]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 209, 'oldState': array([0, 0, 3, 1, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 5]), 'action': 4, 'reward': 0.0, 'newState': array([0, 0, 3, 1, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 2]), 'info': {'request': array([6, 2]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 210, 'oldState': array([0, 0, 3, 1, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 2]), 'action': 5, 'reward': 0.0, 'newState': array([0, 0, 3, 1, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 4, 2]), 'info': {'request': array([4, 2]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 211, 'oldState': array([0, 0, 3, 1, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 4, 2]), 'action': 2, 'reward': 0.0, 'newState': array([0, 0, 3, 1, 2, 2, 1, 0

       0, 0, 0, 0, 3, 6]), 'info': {'request': array([3, 6]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 259, 'oldState': array([0, 3, 4, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 3, 6]), 'action': 2, 'reward': 0.0, 'newState': array([0, 3, 4, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 4]), 'info': {'request': array([1, 4]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 260, 'oldState': array([0, 3, 4, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 4]), 'action': 1, 'reward': 120.0, 'newState': array([0, 2, 4, 1, 1, 0, 0, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 1, 4]), 'info': {'request': array([1, 4]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 261, 'oldState': array([0, 2, 4, 1, 1, 0, 0, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 1, 4]), 'action': 3, 'reward': 0.0, 'newState': array([0, 2, 4, 1, 1, 0, 0, 

       0, 0, 0, 0, 5, 3]), 'info': {'request': array([5, 3]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 306, 'oldState': array([0, 1, 4, 0, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 5, 3]), 'action': 2, 'reward': 0.0, 'newState': array([0, 1, 4, 0, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 3]), 'info': {'request': array([6, 3]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 307, 'oldState': array([0, 1, 4, 0, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 6, 3]), 'action': 2, 'reward': 0.0, 'newState': array([0, 1, 4, 0, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 6]), 'info': {'request': array([0, 6]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 308, 'oldState': array([0, 1, 4, 0, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 6]), 'action': 6, 'reward': 0.0, 'newState': array([0, 1, 4, 0, 2, 1, 1, 0

       0, 0, 0, 0, 3, 4]), 'info': {'request': array([3, 4]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 379, 'oldState': array([0, 1, 2, 1, 2, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 3, 4]), 'action': 4, 'reward': 0.0, 'newState': array([0, 1, 2, 1, 2, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 5, 3]), 'info': {'request': array([5, 3]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 380, 'oldState': array([0, 1, 2, 1, 2, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 5, 3]), 'action': 4, 'reward': 130.0, 'newState': array([0, 1, 2, 1, 1, 1, 2, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 5, 3]), 'info': {'request': array([5, 3]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 381, 'oldState': array([0, 1, 2, 1, 1, 1, 2, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 5, 3]), 'action': 4, 'reward': 130.0, 'newState': array([0, 1, 2, 1, 0, 1, 2

       0, 0, 0, 0, 1, 6]), 'info': {'request': array([1, 6]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 416, 'oldState': array([1, 1, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 6]), 'action': 2, 'reward': 0.0, 'newState': array([1, 1, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 6]), 'info': {'request': array([0, 6]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 417, 'oldState': array([1, 1, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 6]), 'action': 4, 'reward': 0.0, 'newState': array([1, 1, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 2, 5]), 'info': {'request': array([2, 5]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 418, 'oldState': array([1, 1, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 2, 5]), 'action': 2, 'reward': 140.0, 'newState': array([1, 1, 2, 2, 1, 1, 0,

{'iter': 0, 'episode': 1, 'step': 460, 'oldState': array([2, 0, 1, 2, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 3, 5]), 'action': 3, 'reward': 140.0, 'newState': array([2, 0, 1, 1, 1, 2, 1, 5, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 6, 2]), 'info': {'request': array([6, 2]), 'acceptance': True}}
{'iter': 0, 'episode': 1, 'step': 461, 'oldState': array([2, 0, 1, 1, 1, 2, 1, 5, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 6, 2]), 'action': 5, 'reward': 0.0, 'newState': array([2, 0, 1, 1, 1, 2, 1, 5, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 3, 6]), 'info': {'request': array([3, 6]), 'acceptance': False}}
{'iter': 0, 'episode': 1, 'step': 462, 'oldState': array([2, 0, 1, 1, 1, 2, 1, 5, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 3, 6]), 'action': 4, 'reward': 0.0, 'newState': array([2, 0, 1, 1, 1, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 5, 1]), 'info': {'re

# Step 6: Generate Figures

Create line and radar plots to compare the heuristic algorithms (`maxweightfixed`, `closestcar`, and `randomcar`).

In [9]:
# Line plot comparing the algorithms
fig_path = '../figures/'
fig_name = 'rideshare_'+'_line_plot'+'.pdf'
or_suite.plots.plot_line_plots(path_list_line, algo_list_line, fig_path, fig_name, int(nEps / 40)+1)

# Pairwise distances between nodes of the rideshare graph (defined before the metrics that use them)
graph = nx.Graph(CONFIG['edges'])
lengths = or_suite.envs.ridesharing.rideshare_graph.RideshareGraphEnvironment.find_lengths(rideshare_env, graph, graph.number_of_nodes())

# Extra radar-plot metrics: acceptance rate, and mean and variance of dispatch distance
additional_metric = {
    'ACPT': lambda traj: or_suite.utils.acceptance_rate(traj, lambda x, y: lengths[x, y]),
    'MN': lambda traj: or_suite.utils.mean_dispatch_dist(traj, lambda x, y: lengths[x, y]),
    'VAR': lambda traj: or_suite.utils.var_dispatch_dist(traj, lambda x, y: lengths[x, y]),
}

# Radar plot comparing the algorithms on the standard and additional metrics
fig_name = 'rideshare_'+'_'+'_radar_plot'+'.pdf'
or_suite.plots.plot_radar_plots(path_list_radar, algo_list_radar, fig_path, fig_name, additional_metric)

# TODO: Import figures and display


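|   | Algorithm      | Reward  | Time    | Space     | ACPT  | MN     | VAR       |
|---|----------------|---------|---------|-----------|-------|--------|-----------|
| 0 | maxweightfixed | 44530.0 | 0.45409 | -623018.0 | 0.696 | -8.10  | -162.1900 |
| 1 | closestcar     | 44430.0 | 0.45621 | -584849.0 | 0.712 | -7.75  | -173.6375 |
| 2 | randomcar      | 12310.0 | 0.46670 | -591073.0 | 0.221 | -39.61 | -783.1479 |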
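
The `ACPT` metric comes from `or_suite.utils.acceptance_rate`, whose implementation is not shown in this notebook. As a rough illustration only, the sketch below computes an acceptance rate from trajectory records shaped like the logged output above. It assumes each record's `info['acceptance']` flag marks whether that step's request was accepted. The function name `acceptance_rate_sketch` and the trimmed-down records are hypothetical, not part of the library.

```python
from typing import Dict, List


def acceptance_rate_sketch(traj: List[Dict]) -> float:
    """Fraction of steps whose request was accepted (assumed meaning of ACPT)."""
    if not traj:
        return 0.0
    accepted = sum(1 for step in traj if step['info']['acceptance'])
    return accepted / len(traj)


# Hypothetical, trimmed-down records: only the fields used here are kept.
# The reward and acceptance values are copied from four steps of the logged
# output above (episode 1, steps 159, 160, 301 and 302 of the first run).
records = [
    {'step': 159, 'reward': 110.0, 'info': {'acceptance': True}},
    {'step': 160, 'reward': 0.0, 'info': {'acceptance': False}},
    {'step': 301, 'reward': 0.0, 'info': {'acceptance': False}},
    {'step': 302, 'reward': 140.0, 'info': {'acceptance': True}},
]

print(acceptance_rate_sketch(records))  # 0.5
```

In these sample records the rejected steps are also the ones with zero reward, which would fit the table, where `randomcar` has both the lowest acceptance rate and the lowest total reward.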
