# Introduction to our PettingZoo Environment

> We have created a framework that integrates reinforcement learning (RL) with a microscopic traffic simulation tool to explore the potential of RL in optimizing urban route choice.

> We use [SUMO](https://sumo.dlr.de/docs/index.html), an open-source, microscopic, and continuous traffic simulator.

## Related work

> Some methods have utilized RL for optimal route choice (Thomasini et al. [2023](https://alaworkshop2023.github.io/papers/ALA2023_paper_69.pdf/)). These approaches
are typically based on macroscopic traffic simulations, which model relationships among traffic
flow characteristics such as density, flow, and mean speed of a traffic stream. In contrast, our
approach employs a microscopic model, which focuses on interactions between individual vehicles.

> Additionally, Tavares and Bazzan ([2012](https://www.researchgate.net/publication/235219033_Reinforcement_learning_for_route_choice_in_an_abstract_traffic_scenario)) address optimal route choice at the microscopic level, with rewards generated by a predefined function. In our approach, by contrast, rewards are provided dynamically by a continuous traffic simulator.
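
To make the macroscopic view mentioned above more concrete, here is a minimal sketch of one classical relation among density, flow, and mean speed. It assumes the Greenshields linear speed-density model, which is chosen only for illustration and is not claimed to be the model used in the cited works; the function names and numeric values are hypothetical.

```python
def greenshields_speed(density, free_flow_speed, jam_density):
    """Mean speed (km/h) under the Greenshields linear speed-density model."""
    if not 0 <= density <= jam_density:
        raise ValueError("density must lie between 0 and jam_density")
    return free_flow_speed * (1.0 - density / jam_density)


def flow(density, free_flow_speed, jam_density):
    """Flow (veh/h) from the identity flow = density * mean speed."""
    return density * greenshields_speed(density, free_flow_speed, jam_density)


# Illustrative values only: 50 km/h free-flow speed, 120 veh/km jam density.
for k in (0, 30, 60, 90, 120):
    print(k, greenshields_speed(k, 50.0, 120.0), flow(k, 50.0, 120.0))
```

A macroscopic model of this kind describes the traffic stream as a whole, whereas a microscopic simulator such as SUMO tracks each vehicle individually.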

#### Import libraries

In [1]:
import sys
import os
from tqdm import tqdm
import numpy as np

sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '../../')))

from routerl.environment.environment import TrafficEnvironment
from routerl.services import plotter
from routerl.keychain import Keychain as kc

from routerl.create_agents import create_agent_objects
from routerl.utilities import check_device
from routerl.utilities import get_params
from routerl.utilities import set_seeds

In [2]:
check_device()
set_seeds()
params = get_params("params_main.json")

[INFO] Running on device: cpu
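
The cell above loads `params_main.json`, and later cells index the result by section. As a rough sketch of that pattern, assuming the file is plain JSON with one top-level object per section, the loading step could look like the following. The helper `load_params` and the example keys are hypothetical and are not taken from RouteRL's `get_params`.

```python
import json
import tempfile
from pathlib import Path


def load_params(path):
    """Read a JSON parameter file and return its contents as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical contents; the real params_main.json has its own keys and values.
example = {"runner": {"num_episodes": 1}, "plotter": {"smooth_by": 50}}

with tempfile.TemporaryDirectory() as tmp:
    params_path = Path(tmp) / "params_example.json"
    params_path.write_text(json.dumps(example), encoding="utf-8")
    loaded = load_params(params_path)

# Each section is then available by its top-level key.
print(loaded["runner"])
```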


#### Environment initialization

> In this example, the environment initially contains only human agents.

In [3]:
kwargs = {
    "learning_type": "markow",
    "gamma_c": 0.2,
    "gamma_u": 0.2,
    "remember": 5,
    "greedy": 0.3,
    "noise_alpha": 0,
    "noise_taste": 0.8,
    "noise_random": 0.2,
    "network": "test",
}

In [4]:
env = TrafficEnvironment(params[kc.RUNNER], params[kc.ENVIRONMENT], params[kc.SIMULATOR],
                         params[kc.AGENT_GEN], params[kc.AGENTS], params[kc.PLOTTER], **kwargs)

[CONFIRMED] Environment variable exists: SUMO_HOME
[SUCCESS] Added module directory: /opt/homebrew/opt/sumo/share/sumo/tools
here
   origins  destinations                                               path  \
0        0             0  441496282#0,441496282#1,441496282#2,441496282#...   
1        0             0  441496282#0,441496282#1,441496282#2,441496282#...   

   free_flow_time  
0               0  
1               0  


In [5]:
print("Number of total agents is: ", len(env.all_agents), "\n")
print("Agents are: ", env.all_agents, "\n")
print("Number of human agents is: ", len(env.human_agents), "\n")
print("Number of machine agents (autonomous vehicles) is: ", len(env.machine_agents), "\n")

Number of total agents is:  20 

Agents are:  [<agent.HumanAgent object at 0x103215fa0>, <agent.HumanAgent object at 0x1519f0dc0>, <agent.HumanAgent object at 0x103227ac0>, <agent.HumanAgent object at 0x103227910>, <agent.HumanAgent object at 0x103288430>, <agent.HumanAgent object at 0x1032276a0>, <agent.HumanAgent object at 0x10315ce50>, <agent.HumanAgent object at 0x1032887c0>, <agent.HumanAgent object at 0x10315cd60>, <agent.HumanAgent object at 0x103227c10>, <agent.HumanAgent object at 0x1519f0f40>, <agent.HumanAgent object at 0x1032883d0>, <agent.HumanAgent object at 0x103227cd0>, <agent.HumanAgent object at 0x103288760>, <agent.HumanAgent object at 0x28adb0430>, <agent.HumanAgent object at 0x28adb0550>, <agent.HumanAgent object at 0x28adb0d60>, <agent.HumanAgent object at 0x28adb0dc0>, <agent.HumanAgent object at 0x28adb0df0>, <agent.HumanAgent object at 0x28adb0e20>] 

Number of human agents is:  20 

Number of machine agents (autonomous vehicles) is:  0 



> Reset the environment and the connection with SUMO

#### Human learning

In [6]:
from pathlib import Path

RoutingZoo = str(Path.home() / "Documents/Simulator_human_behaviour")
sys.path.append(RoutingZoo)
import utilities_RZ as URZ

In [7]:
num_episodes =  1

env.start()

for episode in range(num_episodes):
    
    env.step()

    URZ_help = URZ.Utilities(env,RoutingZoo,episode,**kwargs)

    URZ_help.data()

    URZ_help.Replace_data()

env.close()

 Retrying in 1 seconds




KeyError: 'demand'

import shutil

import pandas as pd

num_episodes = 1

# User-specific folder holding the per-episode record files.
episodes_dir = '/Users/zoltanvarga/Documents/RouteRL/tutorials/PettingZooEnv/training_records/episodes'

env.start()

for episode in range(num_episodes):
    env.step()

    # Collect each human agent's id, stored utilities, and stored noises.
    agent_ids = []  # renamed from `id` to avoid shadowing the builtin
    utilities = []
    noises = []
    for agent in env.human_agents:
        agent_ids.append(agent.id)
        utilities.append(agent.stored_utilities)
        noises.append(agent.stored_noises)

    data = pd.DataFrame([agent_ids, utilities, noises]).T
    data = data.rename(columns={0: 'id', 1: 'utilities', 2: 'noises'})
    data.to_csv(f'{RoutingZoo}/training_records/agents/ep_{episode+1}.csv', index=False)

    # Rename the episode record, then copy it into the analysis folder.
    source = f'{episodes_dir}/ep_ep{episode+1}.csv'
    os.rename(f'{episodes_dir}/ep{episode+1}.csv', source)
    destination = f'{RoutingZoo}/training_records/episodes/ep_ep{episode+1}.csv'
    shutil.copy2(source, destination)

env.stop()
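
The cell above renames the episode record and copies it into the analysis folder using absolute, user-specific paths. A more portable variant might derive both locations from base directories. The sketch below shows the same rename-then-copy step with `pathlib`, using temporary directories so it can run anywhere; the names `archive_episode`, `records_dir`, and `analysis_dir`, and the dummy file contents, are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def archive_episode(records_dir, analysis_dir, episode):
    """Rename ep{n}.csv to ep_ep{n}.csv and copy it into analysis_dir."""
    source = Path(records_dir) / f"ep{episode}.csv"
    renamed = source.with_name(f"ep_ep{episode}.csv")
    source.rename(renamed)

    target_dir = Path(analysis_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(renamed, target_dir / renamed.name))


with tempfile.TemporaryDirectory() as tmp:
    records_dir = Path(tmp) / "training_records" / "episodes"
    analysis_dir = Path(tmp) / "analysis" / "episodes"
    records_dir.mkdir(parents=True)
    # Dummy record with made-up columns, only to have a file to move.
    (records_dir / "ep1.csv").write_text("id,travel_time\n0,13.0\n")

    copied = archive_episode(records_dir, analysis_dir, episode=1)
    copied_name = copied.name
    copied_text = copied.read_text()

print(copied_name)
```

Keeping the base directories in variables means only two values change when the notebook is run on another machine.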

import sys
from pathlib import Path

# Add the Simulator_human_behaviour folder to the import path
simulator_path = str(Path.home() / "Documents/Simulator_human_behaviour")
sys.path.append(simulator_path)

In [11]:
import data_analysis as da

In [12]:
program = da.Table_record_creator(1,URZ_help.model,0,2)

In [13]:
program.table_record()

Run was succesful


Unnamed: 0,network,model,demand,Bounded,Greedy,link_value,link_std,TT_value,TT_std,entropy_value,entropy_std
0,test,markow,20,0.2,0.3,0.0,0.0,13.0,0.0,0.0,0.0
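
The record printed above is plain CSV, so it can be read back with the standard library. The sketch below parses those two lines exactly as shown; it makes no assumption about what each column means beyond its header name.

```python
import csv
import io

# The two lines printed above, copied verbatim.
record_text = (
    "Unnamed: 0,network,model,demand,Bounded,Greedy,link_value,link_std,"
    "TT_value,TT_std,entropy_value,entropy_std\n"
    "0,test,markow,20,0.2,0.3,0.0,0.0,13.0,0.0,0.0,0.0\n"
)

rows = list(csv.DictReader(io.StringIO(record_text)))
first = rows[0]

# csv returns strings, so convert the fields needed for arithmetic.
demand = int(first["demand"])
tt_value = float(first["TT_value"])
print(first["network"], first["model"], demand, tt_value)
```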


#### Mutation

> Mutation: a portion of the human agents is converted into machine agents (autonomous vehicles). You can adjust the number of agents to be mutated in the <code style="color:white">/params_main.json</code> file.

In [None]:
env.mutation()
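
Conceptually, mutation splits the human population into two groups. The following is a minimal sketch of that idea, assuming the agents to convert are chosen uniformly at random; how RouteRL actually selects them is not stated here, and the classes `HumanAgent` and `MachineAgent` and the function `mutate` below are hypothetical stand-ins rather than the library's own.

```python
import random
from dataclasses import dataclass


@dataclass
class HumanAgent:  # hypothetical stand-in, not RouteRL's class
    id: int


@dataclass
class MachineAgent:  # hypothetical stand-in, not RouteRL's class
    id: int


def mutate(human_agents, num_to_mutate, rng):
    """Convert num_to_mutate randomly chosen humans into machine agents."""
    if not 0 <= num_to_mutate <= len(human_agents):
        raise ValueError("num_to_mutate must be between 0 and the number of humans")
    chosen = set(rng.sample(range(len(human_agents)), num_to_mutate))
    machines = [MachineAgent(a.id) for i, a in enumerate(human_agents) if i in chosen]
    humans = [a for i, a in enumerate(human_agents) if i not in chosen]
    return humans, machines


rng = random.Random(0)  # fixed seed so the split is reproducible
humans, machines = mutate([HumanAgent(i) for i in range(20)], 10, rng)
print(len(humans), len(machines))
```

The total number of agents is unchanged by the split; only their types differ.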

In [8]:
print("Number of total agents is: ", len(env.all_agents), "\n")
print("Agents are: ", env.all_agents, "\n")
print("Number of human agents is: ", len(env.human_agents), "\n")
print("Number of machine agents (autonomous vehicles) is: ", len(env.machine_agents), "\n")

Number of total agents is:  20 

Agents are:  [<agent.HumanAgent object at 0x106e629a0>, <agent.HumanAgent object at 0x106e75460>, <agent.HumanAgent object at 0x15e4af670>, <agent.HumanAgent object at 0x106e62e20>, <agent.HumanAgent object at 0x15efcf2b0>, <agent.HumanAgent object at 0x15e4affa0>, <agent.HumanAgent object at 0x106ebc520>, <agent.HumanAgent object at 0x1076087f0>, <agent.HumanAgent object at 0x107608850>, <agent.HumanAgent object at 0x106ee6760>, <agent.HumanAgent object at 0x106e757f0>, <agent.HumanAgent object at 0x107881f40>, <agent.HumanAgent object at 0x107881f70>, <agent.HumanAgent object at 0x106ee6370>, <agent.HumanAgent object at 0x106ebcca0>, <agent.HumanAgent object at 0x106ee63d0>, <agent.HumanAgent object at 0x1078812e0>, <agent.HumanAgent object at 0x15e4af730>, <agent.HumanAgent object at 0x15e4afe20>, <agent.HumanAgent object at 0x106ebc220>] 

Number of human agents is:  20 

Number of machine agents (autonomous vehicles) is:  0 



In [None]:
env.machine_agents

[Machine 7,
 Machine 17,
 Machine 12,
 Machine 11,
 Machine 10,
 Machine 8,
 Machine 13,
 Machine 16,
 Machine 5,
 Machine 6]

In [None]:
episodes = 1

for episode in range(episodes):
    print(f"\nStarting episode {episode + 1}")
    env.reset()
    
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()

        if termination or truncation:
            action = None
        else:
            # Policy action or random sampling
            action = env.action_space(agent).sample()
        print(f"Agent {agent} takes action: {action}")
        
        env.step(action)
        print(f"Agent {agent} has stepped, environment updated.\n")



Starting episode 1
Agent 5 takes action: 0
Agent 5 has stepped, environment updated.

Agent 6 takes action: 1
Agent 6 has stepped, environment updated.

Agent 7 takes action: 1
Agent 7 has stepped, environment updated.

Agent 8 takes action: 1
Agent 8 has stepped, environment updated.

Agent 10 takes action: 0
Agent 10 has stepped, environment updated.

Agent 11 takes action: 1
Agent 11 has stepped, environment updated.

Agent 12 takes action: 0
Agent 12 has stepped, environment updated.

Agent 13 takes action: 1
Agent 13 has stepped, environment updated.

Agent 16 takes action: 1
Agent 16 has stepped, environment updated.

Agent 17 takes action: 0
Agent 17 has stepped, environment updated.

Agent 5 takes action: None
Agent 5 has stepped, environment updated.

Agent 6 takes action: None
Agent 6 has stepped, environment updated.

Agent 7 takes action: None
Agent 7 has stepped, environment updated.

Agent 8 takes action: None
Agent 8 has stepped, environment updated.

Agent 10 takes act

<code style="color:white">agent_iter(max_iter=2**63)</code> returns an iterator that yields the current agent of the environment. It terminates when all agents in the environment are done or when max_iter steps have been executed.

<code style="color:white">last(observe=True)</code> returns the observation, reward, termination flag, truncation flag, and info for the agent currently able to act, matching the five values unpacked in the loop above. The returned reward is the cumulative reward that the agent has received since it last acted. If observe is set to False, the observation will not be computed, and None will be returned in its place. Note that a single agent being done does not imply the environment is done.

<code style="color:white">reset()</code> resets the environment and sets it up for use when called the first time. This method must be called before any other method.

<code style="color:white">step(action)</code> takes and executes the action of the current agent in the environment, then automatically switches control to the next agent.
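
To illustrate how these four methods interact, here is a small pure-Python toy that mimics the turn-based loop. It is a sketch under simplifying assumptions (each agent acts once, the reward is 1.0 for action 1 and 0.0 otherwise) and is not PettingZoo's or RouteRL's implementation; the class `ToyAECEnv` and the agent names are hypothetical.

```python
class ToyAECEnv:
    """Toy turn-based environment mimicking agent_iter / last / step."""

    def __init__(self, agents):
        self.possible_agents = list(agents)
        self.agents = []

    def reset(self):
        self.agents = list(self.possible_agents)
        self.rewards = {a: 0.0 for a in self.agents}
        self.terminations = {a: False for a in self.agents}
        self._index = 0  # position of the agent whose turn it is

    def agent_iter(self, max_iter=2**63):
        steps = 0
        while self.agents and steps < max_iter:
            yield self.agents[self._index]
            steps += 1

    def last(self):
        agent = self.agents[self._index]
        observation = None  # the toy has nothing to observe
        return observation, self.rewards[agent], self.terminations[agent], False, {}

    def step(self, action):
        agent = self.agents[self._index]
        if self.terminations[agent]:
            # A finished agent must pass None and is then removed.
            if action is not None:
                raise ValueError("a terminated agent must step with action None")
            self.agents.pop(self._index)
            self._index = self._index % len(self.agents) if self.agents else 0
            return
        # Reward earned since this agent last acted (toy rule).
        self.rewards[agent] = 1.0 if action == 1 else 0.0
        self.terminations[agent] = True  # each agent acts once in this toy
        self._index = (self._index + 1) % len(self.agents)


env = ToyAECEnv(["agent_5", "agent_6"])
env.reset()
log = []
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    action = None if termination or truncation else 1
    log.append((agent, action, reward))
    env.step(action)
print(log)
```

As in the episode output above, every agent acts once, then each is visited a second time with action `None` before the iterator ends.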

> Close SUMO connection.

In [None]:
env.stop()

FatalTraCIError: Connection closed by SUMO.

In [None]:
from routerl.services import plotter
plotter(params[kc.PLOTTER])