# RouteRL Quickstart

We simulate a simple network topology in which human drivers and, later, autonomous vehicles (AVs) make routing decisions to maximize their rewards (i.e., minimize travel times) over a sequence of days.

* For the first 100 days, we model a human-driven system, where drivers update their routing policies using behavioral models to optimize rewards.
* Each day, we simulate the impact of joint actions using the [`SUMO`](https://eclipse.dev/sumo/) traffic simulator, which returns the reward for each agent.
* After 100 days, we introduce 10 `Autonomous Vehicles` as `PettingZoo` agents, allowing them to use any `MARL` algorithm to maximise rewards.
* Finally, we analyse basic results from the simulation.
  




#### Import libraries

In [None]:
import sys
import os
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '../../')))

from routerl import TrafficEnvironment

#### Define hyperparameters

> Parameters can be customized further by changing the entries of `routerl/environment/params.json`. Users can create a dictionary with their preferred adjustments and pass it to the `TrafficEnvironment` class.

In [10]:
human_learning_episodes = 100


env_params = {
    "agent_parameters" : {
        "num_agents" : 100,
        "new_machines_after_mutation": 10,
        "human_parameters" : {
            "model" : "w_avg"
        },
        "machine_parameters" :
        {
            "behavior" : "malicious",
        }
    },
    "simulator_parameters" : {
        "network_name" : "two_route_yield"
    },  
    "plotter_parameters" : {
        "phases" : [0, human_learning_episodes],
        "smooth_by" : 50,
    },
    "path_generation_parameters":
    {
        "number_of_paths" : 3,
        "beta" : -5,
    }
}
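The dictionary above names only the entries that should differ from the defaults. One way to picture how such a nested override could be combined with a defaults file is a recursive merge. The sketch below is an illustration only: `deep_merge` and the values in `defaults` are hypothetical, and RouteRL's own handling of `params.json` may work differently.

```python
from copy import deepcopy


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return a new dict in which nested entries of `overrides` replace those of `defaults`."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them key by key
            merged[key] = deep_merge(merged[key], value)
        else:
            # Leaf value (or type mismatch): the override wins
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults, NOT the real contents of params.json
defaults = {
    "agent_parameters": {"num_agents": 50, "human_parameters": {"model": "some_model"}},
    "path_generation_parameters": {"number_of_paths": 4, "beta": -3},
}
overrides = {
    "agent_parameters": {"num_agents": 100, "human_parameters": {"model": "w_avg"}},
    "path_generation_parameters": {"number_of_paths": 3},
}

merged = deep_merge(defaults, overrides)
print(merged["agent_parameters"]["num_agents"])   # 100
print(merged["path_generation_parameters"])       # {'number_of_paths': 3, 'beta': -3}
```

Entries that the override does not mention (here `beta`) keep their default value, and `defaults` itself is left unmodified.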

#### Environment initialization

> In our setup, road networks initially consist of human agents, with AVs introduced later.

- First, we initialize the environment with `TrafficEnvironment`, which at this point contains only human agents.
- Then we open the connection with `SUMO` by calling `env.start()`.

In [None]:
env = TrafficEnvironment(seed=42, **env_params)

[CONFIRMED] Environment variable exists: SUMO_HOME
[SUCCESS] Added module directory: /opt/homebrew/opt/sumo/share/sumo/tools
TrafficEnvironment with 100 agents.            
0 machines and 100 humans.            
Machines: []            
Humans: [Human 0, Human 1, Human 2, Human 3, Human 4, Human 5, Human 6, Human 7, Human 8, Human 9, Human 10, Human 11, Human 12, Human 13, Human 14, Human 15, Human 16, Human 17, Human 18, Human 19, Human 20, Human 21, Human 22, Human 23, Human 24, Human 25, Human 26, Human 27, Human 28, Human 29, Human 30, Human 31, Human 32, Human 33, Human 34, Human 35, Human 36, Human 37, Human 38, Human 39, Human 40, Human 41, Human 42, Human 43, Human 44, Human 45, Human 46, Human 47, Human 48, Human 49, Human 50, Human 51, Human 52, Human 53, Human 54, Human 55, Human 56, Human 57, Human 58, Human 59, Human 60, Human 61, Human 62, Human 63, Human 64, Human 65, Human 66, Human 67, Human 68, Human 69, Human 70, Human 71, Human 72, Human 73, Human 74, Human 75, Huma

In [12]:
print("Number of total agents is: ", len(env.all_agents), "\n")
print("Agents are: ", env.all_agents, "\n")
print("Number of human agents is: ", len(env.human_agents), "\n")
print("Number of machine agents (autonomous vehicles) is: ", len(env.machine_agents), "\n")

Number of total agents is:  100 

Agents are:  [Human 0, Human 1, Human 2, Human 3, Human 4, Human 5, Human 6, Human 7, Human 8, Human 9, Human 10, Human 11, Human 12, Human 13, Human 14, Human 15, Human 16, Human 17, Human 18, Human 19, Human 20, Human 21, Human 22, Human 23, Human 24, Human 25, Human 26, Human 27, Human 28, Human 29, Human 30, Human 31, Human 32, Human 33, Human 34, Human 35, Human 36, Human 37, Human 38, Human 39, Human 40, Human 41, Human 42, Human 43, Human 44, Human 45, Human 46, Human 47, Human 48, Human 49, Human 50, Human 51, Human 52, Human 53, Human 54, Human 55, Human 56, Human 57, Human 58, Human 59, Human 60, Human 61, Human 62, Human 63, Human 64, Human 65, Human 66, Human 67, Human 68, Human 69, Human 70, Human 71, Human 72, Human 73, Human 74, Human 75, Human 76, Human 77, Human 78, Human 79, Human 80, Human 81, Human 82, Human 83, Human 84, Human 85, Human 86, Human 87, Human 88, Human 89, Human 90, Human 91, Human 92, Human 93, Human 94, Human 95, Hu

> Reset the environment and the connection with SUMO

#### Human learning

In [13]:
env.start()

# Each step simulates one day in which the human agents choose routes and learn
for episode in range(human_learning_episodes):
    env.step()


 Retrying in 1 seconds
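During the loop above, each human agent revises its route choice from day to day. As a rough intuition for what a weighted-average style of learning can look like, the sketch below keeps a short memory of observed travel times per route and picks the route with the lowest weighted mean. Everything in it (`WeightedAverageDriver`, the weights, the travel times) is hypothetical and is not RouteRL's `w_avg` implementation.

```python
from collections import deque


class WeightedAverageDriver:
    """Toy day-to-day learner: recent travel times count more than older ones."""

    def __init__(self, n_routes: int, weights=(0.5, 0.3, 0.2)):
        self.weights = weights  # weight of the most recent day comes first
        # One bounded memory per route; the oldest entry is dropped automatically
        self.memory = [deque(maxlen=len(weights)) for _ in range(n_routes)]

    def expected_time(self, route: int) -> float:
        history = self.memory[route]
        if not history:
            return 0.0  # unexplored routes look attractive, so they get tried
        used = self.weights[:len(history)]
        return sum(w * t for w, t in zip(used, history)) / sum(used)

    def choose(self) -> int:
        # Pick the route with the lowest expected travel time
        return min(range(len(self.memory)), key=self.expected_time)

    def learn(self, route: int, travel_time: float) -> None:
        # Newest observation goes to the front of the memory
        self.memory[route].appendleft(travel_time)


# Made-up travel times (minutes) for two routes over three days
route_times = {0: [10.0, 12.0, 14.0], 1: [11.0, 11.0, 11.0]}
driver = WeightedAverageDriver(n_routes=2)
for day in range(3):
    for route in (0, 1):
        driver.learn(route, route_times[route][day])

print(driver.choose())  # 1
```

Route 0 has been getting slower, so its weighted mean (12.6) ends up above route 1's steady 11.0 and the driver switches to route 1.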


#### Mutation

> Mutation: a portion of the human agents is converted into machine agents (autonomous vehicles). Here `new_machines_after_mutation` is set to 10, so 10 of the 100 humans become AVs.

In [14]:
env.mutation()

In [15]:
print("Number of total agents is: ", len(env.all_agents), "\n")
print("Agents are: ", env.all_agents, "\n")
print("Number of human agents is: ", len(env.human_agents), "\n")
print("Number of machine agents (autonomous vehicles) is: ", len(env.machine_agents), "\n")

Number of total agents is:  100 

Agents are:  [Machine 1, Machine 15, Machine 10, Machine 91, Machine 22, Machine 73, Machine 5, Machine 52, Machine 81, Machine 77, Human 0, Human 2, Human 3, Human 4, Human 6, Human 7, Human 8, Human 9, Human 11, Human 12, Human 13, Human 14, Human 16, Human 17, Human 18, Human 19, Human 20, Human 21, Human 23, Human 24, Human 25, Human 26, Human 27, Human 28, Human 29, Human 30, Human 31, Human 32, Human 33, Human 34, Human 35, Human 36, Human 37, Human 38, Human 39, Human 40, Human 41, Human 42, Human 43, Human 44, Human 45, Human 46, Human 47, Human 48, Human 49, Human 50, Human 51, Human 53, Human 54, Human 55, Human 56, Human 57, Human 58, Human 59, Human 60, Human 61, Human 62, Human 63, Human 64, Human 65, Human 66, Human 67, Human 68, Human 69, Human 70, Human 71, Human 72, Human 74, Human 75, Human 76, Human 78, Human 79, Human 80, Human 82, Human 83, Human 84, Human 85, Human 86, Human 87, Human 88, Human 89, Human 90, Human 92, Human 93, Hu

In [16]:
env.machine_agents

[Machine 1,
 Machine 15,
 Machine 10,
 Machine 91,
 Machine 22,
 Machine 73,
 Machine 5,
 Machine 52,
 Machine 81,
 Machine 77]
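Conceptually, the mutation step moves a subset of agent IDs from the human group to the machine group while the total stays at 100. The sketch below illustrates that bookkeeping with a hypothetical `mutate` helper. It assumes the converted agents are drawn uniformly at random, which the notebook does not state, and it is not RouteRL's `env.mutation()` implementation.

```python
import random


def mutate(human_ids, n_machines, seed=None):
    """Split agent IDs into (machine_ids, remaining_human_ids)."""
    rng = random.Random(seed)
    # Assumption: machines are sampled uniformly without replacement
    machine_ids = rng.sample(human_ids, n_machines)
    chosen = set(machine_ids)
    remaining = [agent_id for agent_id in human_ids if agent_id not in chosen]
    return machine_ids, remaining


machines, humans = mutate(list(range(100)), n_machines=10, seed=42)
print(len(machines), len(humans))  # 10 90
```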

> Human and AV agents interact with the environment over multiple episodes, with AVs following a random policy as defined in the PettingZoo environment [loop](https://pettingzoo.farama.org/content/basic_usage/).

In [17]:
episodes = 1

for episode in range(episodes):
    print(f"\nStarting episode {episode + 1}")
    env.reset()
    
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()

        if termination or truncation:
            action = None
        else:
            # Policy action or random sampling
            action = env.action_space(agent).sample()
        print(f"Agent {agent} takes action: {action}")
        
        env.step(action)
        print(f"Agent {agent} has stepped, environment updated.\n")



Starting episode 1
Agent 52 takes action: 1
Agent 52 has stepped, environment updated.

Agent 5 takes action: 0
Agent 5 has stepped, environment updated.

Agent 1 takes action: 0
Agent 1 has stepped, environment updated.

Agent 15 takes action: 0
Agent 15 has stepped, environment updated.

Agent 81 takes action: 1
Agent 81 has stepped, environment updated.

Agent 91 takes action: 1
Agent 91 has stepped, environment updated.

Agent 77 takes action: 0
Agent 77 has stepped, environment updated.

Agent 73 takes action: 0
Agent 73 has stepped, environment updated.

Agent 10 takes action: 1
Agent 10 has stepped, environment updated.

Agent 22 takes action: 0
Agent 22 has stepped, environment updated.

Agent 52 takes action: None
Agent 52 has stepped, environment updated.

Agent 5 takes action: None
Agent 5 has stepped, environment updated.

Agent 1 takes action: None
Agent 1 has stepped, environment updated.

Agent 15 takes action: None
Agent 15 has stepped, environment updated.

Agent 81 t
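The loop above samples actions at random. To try a learned policy instead, the call to `env.action_space(agent).sample()` could be replaced by a per-agent policy object. The sketch below shows a simple epsilon-greedy bandit as one such stand-in. `EpsilonGreedyPolicy` is a hypothetical helper, not part of RouteRL or PettingZoo, and the rewards are made up (negative travel times, so higher is better).

```python
import random


class EpsilonGreedyPolicy:
    """Bandit-style route chooser: explore with probability epsilon, else exploit."""

    def __init__(self, n_actions: int, epsilon: float = 0.1, seed=None):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.values = [0.0] * n_actions  # running mean reward per action
        self.counts = [0] * n_actions

    def act(self) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))  # explore
        # Exploit: action with the highest estimated reward
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, action: int, reward: float) -> None:
        # Incremental mean of the rewards observed for this action
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


# Made-up rewards for three routes
policy = EpsilonGreedyPolicy(n_actions=3, epsilon=0.0, seed=0)
policy.update(0, -12.0)
policy.update(1, -9.0)
policy.update(2, -15.0)
print(policy.act())  # 1
```

Inside the PettingZoo loop, one policy per AV could supply `action = policies[agent].act()`, and the reward returned by `env.last()` could be passed to `update` together with that agent's previous action.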

In [18]:
env.stop_simulation()