# Introduction to our PettingZoo Environment

> We have created a framework that integrates reinforcement learning (RL) with a microscopic traffic simulation tool to explore the potential of RL in optimizing urban route choice.

> We use [SUMO](https://sumo.dlr.de/docs/index.html), an open-source, microscopic, and continuous traffic simulation package.

## Related work

> Previous work has applied RL to optimal route choice (Thomasini et al. [2023](https://alaworkshop2023.github.io/papers/ALA2023_paper_69.pdf/)). These approaches are typically based on macroscopic traffic simulations, which model relationships among aggregate traffic flow characteristics such as the density, flow, and mean speed of a traffic stream. In contrast, our framework employs a microscopic model, which captures the interactions between individual vehicles.
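
For reference, the relationship that macroscopic models build on is the fundamental identity of traffic flow, which links flow $q$ (vehicles per hour), density $k$ (vehicles per kilometre), and space-mean speed $v_s$ (kilometres per hour):

$$
q = k \, v_s
$$

For example, a stream with a density of $k = 30$ veh/km moving at $v_s = 50$ km/h carries a flow of $q = 30 \times 50 = 1500$ veh/h.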

> Additionally, a method proposed by Tavares and Bazzan ([2012](https://www.researchgate.net/publication/235219033_Reinforcement_learning_for_route_choice_in_an_abstract_traffic_scenario)) addresses optimal route choice at the microscopic level, with rewards generated by a predefined function. In our approach, by contrast, rewards are provided dynamically by a continuous traffic simulator.

#### Import libraries

```python
import os
import sys

# Add the directory two levels up (which contains the RouteRL package) to the
# import path before importing any project modules
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '../../')))

import numpy as np
from tqdm import tqdm
from pettingzoo.test import api_test

from keychain_main import Keychain as kc
from RouteRL.environment.environment import TrafficEnvironment
from RouteRL.services import plotter
from RouteRL.create_agents import create_agent_objects
from RouteRL.utilities import check_device, get_params, set_seeds
```

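```python
check_device()                       # report the device in use
set_seeds()                          # seed the random number generators
params = get_params(kc.PARAMS_PATH)  # load the experiment parameters
```

```text
[INFO] Running on device: cpu
```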

#### Environment initialization

> In this example, the environment initially contains only human agents.

```python
env = TrafficEnvironment(
    params[kc.RUNNER],
    params[kc.ENVIRONMENT],
    params[kc.SIMULATOR],
    params[kc.AGENT_GEN],
    params[kc.AGENTS],
    params[kc.PHASE],
)
```

```text
[CONFIRMED] Environment variable exists: SUMO_HOME
[SUCCESS] Added module directory: C:\Program Files (x86)\Eclipse\Sumo\tools
```


```python
print("Number of total agents is: ", len(env.all_agents), "\n")
print("Agents are: ", env.all_agents, "\n")
print("Number of human agents is: ", len(env.human_agents), "\n")
print("Number of machine agents (autonomous vehicles) is: ", len(env.machine_agents), "\n")
```

```text
Number of total agents is:  20

Agents are:  [Human 0, Human 1, Human 2, Human 3, Human 4, Human 5, Human 6, Human 7, Human 8, Human 9, Human 10, Human 11, Human 12, Human 13, Human 14, Human 15, Human 16, Human 17, Human 18, Human 19]

Number of human agents is:  20

Number of machine agents (autonomous vehicles) is:  0
```



> Start the connection with SUMO and reset the environment.

```python
env.start()
env.reset()
```

```text
({}, {})
```

```python
num_episodes = 100

# Human-only phase: there are no machine agents yet, so step() is called without an action
for episode in range(num_episodes):
    env.step()
```

> Mutation: a portion of the human agents is converted into machine agents (autonomous vehicles). You can adjust the number of agents to be mutated in the `params.json` file.
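
The mutation logic itself lives inside RouteRL and is not shown in this notebook. As a rough, hypothetical sketch of the idea (the `Agent` class and `mutate` function below are illustrative, not RouteRL's API), a mutation can be modelled as sampling a random subset of human agents and replacing each one with a machine agent that keeps the same ID:

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(repr=False)
class Agent:
    id: int
    kind: str  # "Human" or "Machine"

    def __repr__(self) -> str:
        return f"{self.kind} {self.id}"


def mutate(agents: List[Agent], n_mutations: int, seed: Optional[int] = None) -> List[Agent]:
    """Return a new agent list in which n_mutations randomly chosen humans become machines."""
    humans = [a for a in agents if a.kind == "Human"]
    if n_mutations > len(humans):
        raise ValueError("cannot mutate more agents than there are human agents")
    rng = random.Random(seed)
    chosen_ids = {a.id for a in rng.sample(humans, n_mutations)}
    # Each selected human is replaced by a machine agent with the same ID.
    new_machines = [Agent(a.id, "Machine") for a in humans if a.id in chosen_ids]
    untouched = [a for a in agents if a.id not in chosen_ids]
    # Newly created machines come first, followed by the unchanged agents;
    # the total number of agents stays the same.
    return new_machines + untouched


agents = [Agent(i, "Human") for i in range(20)]
mutated = mutate(agents, n_mutations=10, seed=0)
print(mutated)
```

Keeping the IDs makes it easy to trace which human driver each AV replaced (for instance, `Human 2` becomes `Machine 2`).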

```python
env.mutation()
```

```python
print("Number of total agents is: ", len(env.all_agents), "\n")
print("Agents are: ", env.all_agents, "\n")
print("Number of human agents is: ", len(env.human_agents), "\n")
print("Number of machine agents (autonomous vehicles) is: ", len(env.machine_agents), "\n")
```

```text
Number of total agents is:  20

Agents are:  [Machine 2, Machine 17, Machine 9, Machine 8, Machine 7, Machine 4, Machine 10, Machine 15, Machine 0, Machine 1, Human 3, Human 5, Human 6, Human 11, Human 12, Human 13, Human 14, Human 16, Human 18, Human 19]

Number of human agents is:  10

Number of machine agents (autonomous vehicles) is:  10
```



```python
env.machine_agents
```

```text
[Machine 2,
 Machine 17,
 Machine 9,
 Machine 8,
 Machine 7,
 Machine 4,
 Machine 10,
 Machine 15,
 Machine 0,
 Machine 1]
```

```python
episodes = 1

for episode in range(episodes):
    env.reset()
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()

        if termination or truncation:
            print(f"Agent {agent} is terminating\n")
            action = None
        else:
            # this is where you would insert your policy
            action = env.action_space(agent).sample()

        print(f"Agent {agent} is going to step with action {action}\n")
        env.step(action)
```

```text
Agent 8 is going to step with action 0

Agent 9 is going to step with action 1

Agent 0 is going to step with action 1

Agent 15 is going to step with action 0

Agent 2 is going to step with action 0

Agent 17 is going to step with action 1

Agent 7 is going to step with action 1

Agent 4 is going to step with action 0

Agent 10 is going to step with action 0

Agent 1 is going to step with action 0

Agent 8 is terminating

Agent 8 is going to step with action None

Agent 9 is terminating

Agent 9 is going to step with action None

Agent 0 is terminating

Agent 0 is going to step with action None

Agent 15 is terminating

Agent 15 is going to step with action None

Agent 2 is terminating

Agent 2 is going to step with action None

Agent 17 is terminating

Agent 17 is going to step with action None

Agent 7 is terminating

Agent 7 is going to step with action None

Agent 4 is terminating

Agent 4 is going to step with action None

Agent 10 is terminating

Agent 10 is going to step with a
```

`agent_iter(max_iter=2**63)` returns an iterator that yields the current agent of the environment. It terminates when all agents in the environment are done or when `max_iter` steps have been executed.

`last(observe=True)` returns the observation, reward, termination, truncation, and info for the agent currently able to act. The returned reward is the cumulative reward that the agent has received since it last acted. If `observe` is set to `False`, the observation is not computed and `None` is returned in its place. Note that a single agent being terminated or truncated does not imply that the environment is done.

`reset()` resets the environment and sets it up for use when called for the first time. In the standard PettingZoo API it must be called before any other method; in this environment, `env.start()` is called first to open the connection with SUMO.

`step(action)` executes the current agent's action in the environment and automatically switches control to the next agent.
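
To make this turn-taking concrete without SUMO, here is a self-contained toy. The `ToyRouteEnv` class and its congestion-based reward are hypothetical illustrations, not RouteRL or PettingZoo code. It follows the same cycle as the agent loop earlier in this notebook: every agent acts once, the episode ends, each finished agent is then stepped with `None` so that it is removed, and `agent_iter` stops once no agents remain.

```python
import random
from typing import Dict, Iterator, List, Optional, Tuple


class ToyRouteEnv:
    """Hypothetical AEC-style toy: each agent picks one of n_routes, then the episode ends."""

    def __init__(self, agent_ids: List[int], n_routes: int = 2, seed: int = 0) -> None:
        self.possible_agents = list(agent_ids)
        self.n_routes = n_routes
        self.rng = random.Random(seed)

    def reset(self) -> None:
        self.agents = list(self.possible_agents)
        self.actions: Dict[int, int] = {}
        self.rewards = {a: 0.0 for a in self.agents}
        self.terminations = {a: False for a in self.agents}
        self.truncations = {a: False for a in self.agents}
        self._next = 0
        self.agent_selection = self.agents[0]

    def agent_iter(self, max_iter: int = 2**63) -> Iterator[int]:
        # Yield the agent whose turn it is until every agent has been removed.
        for _ in range(max_iter):
            if not self.agents:
                return
            yield self.agent_selection

    def last(self) -> Tuple[None, float, bool, bool, dict]:
        # Observations are omitted (None) to keep the toy small.
        a = self.agent_selection
        return None, self.rewards[a], self.terminations[a], self.truncations[a], {}

    def step(self, action: Optional[int]) -> None:
        agent = self.agent_selection
        if self.terminations[agent] or self.truncations[agent]:
            # A finished agent must be stepped with None; this removes it.
            if action is not None:
                raise ValueError("finished agents must be stepped with None")
            self.agents.remove(agent)
            if self.agents:
                self.agent_selection = self.agents[0]
            return
        self.actions[agent] = action
        self._next += 1
        if self._next < len(self.agents):
            self.agent_selection = self.agents[self._next]
            return
        # All agents have chosen: penalise each one by the number of agents on its route.
        loads = [list(self.actions.values()).count(r) for r in range(self.n_routes)]
        for a, route in self.actions.items():
            self.rewards[a] = -float(loads[route])
            self.terminations[a] = True
        self.agent_selection = self.agents[0]


env = ToyRouteEnv([8, 9, 0, 15])
env.reset()
log = []
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    # Random policy as a placeholder; finished agents must pass None.
    action = None if termination or truncation else env.rng.randrange(env.n_routes)
    log.append((agent, action, reward))
    env.step(action)
```

Because each agent acts only once here, the reward it sees at its terminating step is its whole-episode reward; in a multi-step environment, `last()` would instead report the reward accumulated since the agent's previous action.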

> Close the SUMO connection.

```python
env.stop()
```

```python
# plotter was already imported at the top of the notebook
plotter(params[kc.PLOTTER])
```

```text
<RouteRL.services.plotter.Plotter at 0x28af775cbd0>
```

> Check the file `RouteRL/network_and_config/agents_data.csv` to adjust the number of agents.
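
If you prefer to do this programmatically, the sketch below writes a trimmed copy of the file. The column layout of `agents_data.csv` is not shown in this notebook, so the sketch only assumes (hypothetically) a single header line followed by one row per agent; `keep_first_n_agents` is an illustrative helper, not part of RouteRL.

```python
import csv
import itertools
import tempfile
from pathlib import Path


def keep_first_n_agents(src: Path, dst: Path, n: int) -> int:
    """Write the header and the first n agent rows of src to dst; return the row count.

    Assumes one header line followed by one row per agent (hypothetical layout).
    """
    with src.open(newline="") as f_in:
        reader = csv.reader(f_in)
        header = next(reader)
        rows = list(itertools.islice(reader, n))
    with dst.open("w", newline="") as f_out:
        writer = csv.writer(f_out)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


# Demo on a throwaway file with made-up columns.
demo_dir = Path(tempfile.mkdtemp())
demo_src = demo_dir / "agents_data.csv"
demo_src.write_text("id,kind\n0,h\n1,h\n2,h\n3,h\n4,h\n")
kept = keep_first_n_agents(demo_src, demo_dir / "agents_data_small.csv", 3)
```

Writing to a separate file keeps the original intact; copy the trimmed file over `agents_data.csv` once you are happy with it.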

It would be nice to give AVs a different colour from human drivers, so that every time we run SUMO we can tell which vehicle is which.
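
One possible approach, assuming the simulation is driven through TraCI, is `traci.vehicle.setColor(vehID, color)`, which takes an RGB or RGBA tuple with components from 0 to 255 (typically called once the vehicles have been inserted into the simulation); alternatively, a `color` attribute can be set on each `<vehicle>` in the route file. The sketch below is hypothetical: how RouteRL maps agents to SUMO vehicle IDs is not shown here, so it takes a plain `{vehicle_id: kind}` dictionary, and the red/yellow scheme is arbitrary. It accepts any object with a `setColor` method, so it can be given `traci.vehicle` or, as in the demo, a recording stub.

```python
from typing import Dict, List, Protocol, Tuple

RGBA = Tuple[int, int, int, int]

# Arbitrary example scheme: red for machine agents (AVs), yellow for human drivers.
COLOURS: Dict[str, RGBA] = {
    "Machine": (255, 0, 0, 255),
    "Human": (255, 255, 0, 255),
}


class VehicleAPI(Protocol):
    """Anything with a TraCI-style setColor method, e.g. traci.vehicle."""

    def setColor(self, vehID: str, color: RGBA) -> None: ...


def colour_vehicles(vehicle_api: VehicleAPI, kind_by_vehicle: Dict[str, str]) -> None:
    """Colour each vehicle according to its agent kind ("Human" or "Machine")."""
    for veh_id, kind in kind_by_vehicle.items():
        vehicle_api.setColor(veh_id, COLOURS[kind])


class RecordingVehicleAPI:
    """Offline stand-in that records calls instead of talking to SUMO."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, RGBA]] = []

    def setColor(self, vehID: str, color: RGBA) -> None:
        self.calls.append((vehID, color))


# With a live TraCI connection you would pass traci.vehicle instead of the stub.
stub = RecordingVehicleAPI()
colour_vehicles(stub, {"2": "Machine", "3": "Human"})
print(stub.calls)
```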