# RouteRL Quickstart

We simulate a simple network topology in which human drivers and, later, autonomous vehicles (AVs) make routing decisions to maximize their rewards (i.e., minimize travel times) over a sequence of days.

* For the first 100 days, we model a human-driven system, where drivers update their routing policies using behavioral models to optimize rewards.
* Each day, we simulate the impact of joint actions using the [`SUMO`](https://eclipse.dev/sumo/) traffic simulator, which returns the reward for each agent.
* After 100 days, we introduce 10 `Autonomous Vehicles` as `PettingZoo` agents, allowing them to use any `MARL` algorithm to maximize rewards.
* Finally, we analyze basic results from the simulation.
  




<p align="center">
  <img src="../../docs/img/two_route_net_1.png" alt="Two-route network" />
  <img src="../../docs/img/two_route_net_1_2.png" alt="Two-route network" />
</p>  

#### Import libraries

In [1]:
import sys
import os
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '../../')))

from routerl import TrafficEnvironment

#### Define hyperparameters

> Parameters can be customized further by adjusting the entries of `routerl/environment/params.json`. Users can create a dictionary with their preferred adjustments and pass it as an argument to the `TrafficEnvironment` class.

In [2]:
human_learning_episodes = 100


env_params = {
    "agent_parameters" : {
        "num_agents" : 100,
        "new_machines_after_mutation": 10,
        "human_parameters" : {
            "model" : "gawron"
        },
        "machine_parameters" :
        {
            "behavior" : "selfish",
        }
    },
    "simulator_parameters" : {
        "network_name" : "two_route_yield"
    },  
    "plotter_parameters" : {
        "phases" : [0, human_learning_episodes],
        "smooth_by" : 50,
    },
    "path_generation_parameters":
    {
        "number_of_paths" : 3,
        "beta" : -5,
    }
}
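
To illustrate how a nested dictionary of adjustments like the one above could be combined with a set of defaults, here is a minimal sketch of a recursive merge. It is not RouteRL's actual implementation: the `deep_merge` helper and the values in `defaults` are hypothetical and serve only as an illustration (the real defaults live in `routerl/environment/params.json`).

```python
from copy import deepcopy

def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with `overrides` applied recursively."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dictionaries: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Leaf value (or type mismatch): the override wins.
            merged[key] = deepcopy(value)
    return merged

# Hypothetical defaults, for illustration only (not the contents of params.json).
defaults = {
    "agent_parameters": {"num_agents": 50, "human_parameters": {"model": "placeholder"}},
    "plotter_parameters": {"smooth_by": 10},
}
overrides = {"agent_parameters": {"num_agents": 100, "human_parameters": {"model": "gawron"}}}

merged = deep_merge(defaults, overrides)
print(merged["agent_parameters"]["num_agents"])   # overridden value
print(merged["plotter_parameters"]["smooth_by"])  # untouched default
```

With a merge of this kind, only the entries that differ from the defaults need to appear in the user dictionary.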

#### Environment initialization

In our setup, road networks initially consist of human agents, with AVs introduced later.

- The `TrafficEnvironment` is initialized first.
- The traffic network is instantiated and the paths between the designated origin and destination points are determined.
- The driver (agent) objects are created.

In [3]:
env = TrafficEnvironment(seed=42, **env_params)

[CONFIRMED] Environment variable exists: SUMO_HOME
[SUCCESS] Added module directory: C:\Program Files (x86)\Eclipse\Sumo\tools


<p >
  <img src="plots_saved/0_0.png" width="600" />
</p>  

In [4]:
print("Number of total agents is: ", len(env.all_agents), "\n")
print("Number of human agents is: ", len(env.human_agents), "\n")
print("Number of machine agents (autonomous vehicles) is: ", len(env.machine_agents), "\n")

Number of total agents is:  100 

Number of human agents is:  100 

Number of machine agents (autonomous vehicles) is:  0 



> Reset the environment and the connection with SUMO

In [5]:
env.start()

#### Human learning

In [6]:
for episode in range(human_learning_episodes):
    env.step()

<p align="center">
  <img src="plots_saved/human_learning.png"/>
</p> 
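
For some intuition about the day-to-day learning that the loop above runs, here is a toy sketch that uses neither RouteRL nor SUMO. It assumes two routes whose travel time grows linearly with the number of drivers, drivers who smooth their cost estimates exponentially, and a logit route choice. All names and constants (`travel_time`, `choose`, `ALPHA`, `BETA`, the free-flow times) are hypothetical, and this is not the `gawron` model as implemented in RouteRL.

```python
import math
import random

rng = random.Random(42)           # fixed seed so the sketch is reproducible
NUM_DRIVERS, NUM_DAYS = 100, 100
FREE_FLOW = [10.0, 12.0]          # hypothetical free-flow travel times per route
SLOPE = [0.10, 0.05]              # hypothetical congestion sensitivity per route
ALPHA = 0.2                       # learning rate for the cost estimates
BETA = -0.5                       # logit sensitivity (negative: lower cost preferred)

def travel_time(route: int, flow: int) -> float:
    """Linear congestion: travel time grows with the number of drivers on the route."""
    return FREE_FLOW[route] + SLOPE[route] * flow

def choose(costs: list[float]) -> int:
    """Logit choice between two routes based on the estimated costs."""
    weights = [math.exp(BETA * c) for c in costs]
    return 0 if rng.random() < weights[0] / sum(weights) else 1

# Each driver starts from the free-flow estimates.
estimates = [list(FREE_FLOW) for _ in range(NUM_DRIVERS)]
mean_times = []

for day in range(NUM_DAYS):
    actions = [choose(est) for est in estimates]
    flows = [actions.count(0), actions.count(1)]
    times = [travel_time(r, flows[r]) for r in (0, 1)]
    for est, a in zip(estimates, actions):
        # Only the chosen route's estimate is updated with the experienced time.
        est[a] = (1 - ALPHA) * est[a] + ALPHA * times[a]
    mean_times.append(sum(times[a] for a in actions) / NUM_DRIVERS)

print(f"Mean travel time on day 1: {mean_times[0]:.2f}")
print(f"Mean travel time on last day: {mean_times[-1]:.2f}")
```

In RouteRL the travel times come from the SUMO simulation rather than from a closed-form congestion function.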

#### Mutation

> Mutation: a portion of human agents are converted into machine agents (autonomous vehicles). 

In [7]:
env.mutation()

In [8]:
print("Number of total agents is: ", len(env.all_agents), "\n")
print("Number of human agents is: ", len(env.human_agents), "\n")
print("Number of machine agents (autonomous vehicles) is: ", len(env.machine_agents), "\n")

Number of total agents is:  100 

Number of human agents is:  90 

Number of machine agents (autonomous vehicles) is:  10 



In [9]:
env.machine_agents

[Machine 1,
 Machine 15,
 Machine 10,
 Machine 91,
 Machine 22,
 Machine 73,
 Machine 5,
 Machine 52,
 Machine 81,
 Machine 77]
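
Conceptually, mutation moves a subset of the human agents into the machine pool. The sketch below illustrates that idea with plain Python lists of integer ids. The `mutate` helper is hypothetical, and the random selection is an assumption made for illustration; it does not reflect how RouteRL selects or constructs agents internally.

```python
import random

def mutate(human_ids: list[int], num_machines: int, seed: int = 42):
    """Move `num_machines` randomly chosen ids from the human pool to the machine pool."""
    rng = random.Random(seed)
    machine_ids = rng.sample(human_ids, num_machines)   # sampling without replacement
    chosen = set(machine_ids)
    remaining_humans = [i for i in human_ids if i not in chosen]
    return remaining_humans, machine_ids

humans, machines = mutate(list(range(100)), num_machines=10)
print(len(humans), len(machines))   # 90 10
```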

> Human and AV agents interact with the environment over multiple episodes, with AVs following a random policy as defined in the PettingZoo environment [loop](https://pettingzoo.farama.org/content/basic_usage/).

#### PettingZoo stepping loop

In [10]:
episodes = 100

for episode in range(episodes):
    print(f"\nStarting episode {episode + 1}")
    env.reset() # reset the environment 
    
    for agent in env.agent_iter(): # returns an iterator that yields the current agent of the environment
        observation, reward, termination, truncation, info = env.last() # returns observation, reward etc for the agent able to act

        if termination or truncation:
            action = None
        else:
            action = env.action_space(agent).sample() # policy action or random sampling
        #print(f"Agent {agent} takes action: {action}")
        
        env.step(action)


Starting episode 1

Starting episode 2

Starting episode 3

Starting episode 4

Starting episode 5

Starting episode 6

Starting episode 7

Starting episode 8

Starting episode 9

Starting episode 10

Starting episode 11

Starting episode 12

Starting episode 13

Starting episode 14

Starting episode 15

Starting episode 16

Starting episode 17

Starting episode 18

Starting episode 19

Starting episode 20

Starting episode 21

Starting episode 22

Starting episode 23

Starting episode 24

Starting episode 25

Starting episode 26

Starting episode 27

Starting episode 28

Starting episode 29

Starting episode 30

Starting episode 31

Starting episode 32

Starting episode 33

Starting episode 34

Starting episode 35

Starting episode 36

Starting episode 37

Starting episode 38

Starting episode 39

Starting episode 40

Starting episode 41

Starting episode 42

Starting episode 43

Starting episode 44

Starting episode 45

Starting episode 46

Starting episode 47

Starting episode 48
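
The random sampling in the loop above is where a learned policy would go. As a minimal sketch, the class below implements an epsilon-greedy bandit over a discrete set of routes. It is independent of RouteRL and PettingZoo; the names `EpsilonGreedyPolicy`, `act`, and `update` are hypothetical, and the toy rewards are made-up negative travel times used only to exercise the class.

```python
import random

class EpsilonGreedyPolicy:
    """Epsilon-greedy bandit over `n_actions` discrete actions (e.g. routes)."""

    def __init__(self, n_actions: int, epsilon: float = 0.1, seed: int = 0):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n_actions
        self.values = [0.0] * n_actions   # running mean reward per action

    def act(self) -> int:
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.values[a])

    def update(self, action: int, reward: float) -> None:
        # Incremental mean: v <- v + (r - v) / n
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

# Toy usage with made-up rewards (negative travel times); route 1 is the fastest here.
toy_rewards = [-12.0, -9.0, -15.0]
policy = EpsilonGreedyPolicy(n_actions=3, epsilon=0.2)
for _ in range(500):
    a = policy.act()
    policy.update(a, toy_rewards[a])

best = max(range(3), key=lambda a: policy.values[a])
print("Best estimated route:", best)
```

One possible way to plug this in would be to keep one policy per machine agent, call `act()` in place of `env.action_space(agent).sample()`, and call `update` once that agent's reward becomes available from `env.last()`.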



#### Plot results 

> The resulting plots will be available in the `plots` folder.

In [11]:
env.plot_results()

| |  |
|---------|---------|
| **Action shifts of human and AV agents** ![](plots_saved/actions_shifts.png) | **Action shifts of all vehicles in the network** ![](plots_saved/actions.png) |
| ![](plots_saved/rewards.png) | ![](plots_saved/travel_times.png) |


<p align="center">
  <img src="plots_saved/tt_dist.png" width="700" />
</p>


> Stop the simulation and close the connection with `SUMO`.

In [12]:
env.stop_simulation()