# RouteRL Quickstart

Here you will simulate a simple network topology in which first humans, and then CAVs, make routing decisions to maximise their rewards (i.e. minimise travel times) over a sequence of days.

* We first simulate a human-only system for 100 days, where drivers update their routing policies with behavioural models to maximise rewards.
* Each day we simulate the impact of the joint actions with the `SUMO` traffic simulator, which returns the reward of each agent.
* After 100 days we introduce 10 `Autonomous Vehicles` as `PettingZoo` agents, which can use any `RL` algorithm to maximise rewards.
* Finally, we look at some basic results.
  




#### Imports

In [None]:
import sys
import os
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '../../')))

from routerl import TrafficEnvironment

#### Parameterize

> Further adjustments can be made by modifying the parameters in <code style="color:white">routerl/environment/params.json</code> # be clear - what do you do? You import .json and then overwrite, or create a fresh dict? What is the typical routine for route RL?
>

#PS. "color:white" does not work for me - I have white background in github and I ain't see sh** :) - quite non-zero chances others may not see it as well


In [6]:
human_learning_episodes =  100 # RK: Why this sole parameter here? Here you introduce params 


env_params = {
    "agent_parameters" : {
        "num_agents" : 100,
        "new_machines_after_mutation": 10, #name of this param suggests that after you will have 110 agents, not 10 mutating within 100.
        "human_parameters" : {
            "model" : "w_avg" # maybe something more representative and closer to equilibrium?
        },
        "machine_parameters" :
        {
            "behavior" : "malicious",
        }
    },
    "simulator_parameters" : {
        "network_name" : "two_route_yield"
    },  
    "plotter_parameters" : {
        "phases" : [0, human_learning_episodes], # comment
        "smooth_by" : 50, #is this parameter important?
    },
    "path_generation_parameters": # to run the external path generator januX
    {
        "number_of_paths" : 3,
        "beta" : -5, # comment all of them if possible. 
    }
}
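One possible reading of the note about `params.json` is that the defaults live in the JSON file and the dictionary passed to `TrafficEnvironment` overrides only the keys it names. The sketch below illustrates that routine with a recursive merge. It is an assumption about the workflow, not RouteRL's actual loading code: `deep_update` is a hypothetical helper and the default values are made up for illustration.

```python
import copy
import json

def deep_update(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with `overrides` merged in recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them key by key.
            merged[key] = deep_update(merged[key], value)
        else:
            # Otherwise the override simply replaces the default.
            merged[key] = copy.deepcopy(value)
    return merged

# Hypothetical stand-in for the contents of params.json (illustrative values only).
defaults = json.loads("""
{
  "agent_parameters": {"num_agents": 50, "human_parameters": {"model": "default_model"}},
  "path_generation_parameters": {"number_of_paths": 2, "beta": -1}
}
""")

# Only the keys we want to change are listed here.
overrides = {
    "agent_parameters": {"num_agents": 100, "human_parameters": {"model": "w_avg"}},
    "path_generation_parameters": {"number_of_paths": 3},
}

params = deep_update(defaults, overrides)
print(json.dumps(params, indent=2))
```

With a merge like this, keys that are not overridden (here `beta`) keep their default values, and the defaults themselves are left untouched.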

#### Environment initialization

In our setup, road networks initially consist of human agents, with AVs introduced later.

First we initialize Environment with `TrafficEnvironment` which... 

In [None]:
env = TrafficEnvironment(seed=42, **env_params)

#print was not needed here (?) it is below

[CONFIRMED] Environment variable exists: SUMO_HOME
[SUCCESS] Added module directory: /opt/homebrew/opt/sumo/share/sumo/tools
TrafficEnvironment with 100 agents.            
0 machines and 100 humans.            
Machines: []            
Humans: [Human 0, Human 1, Human 2, Human 3, Human 4, Human 5, Human 6, Human 7, Human 8, Human 9, Human 10, Human 11, Human 12, Human 13, Human 14, Human 15, Human 16, Human 17, Human 18, Human 19, Human 20, Human 21, Human 22, Human 23, Human 24, Human 25, Human 26, Human 27, Human 28, Human 29, Human 30, Human 31, Human 32, Human 33, Human 34, Human 35, Human 36, Human 37, Human 38, Human 39, Human 40, Human 41, Human 42, Human 43, Human 44, Human 45, Human 46, Human 47, Human 48, Human 49, Human 50, Human 51, Human 52, Human 53, Human 54, Human 55, Human 56, Human 57, Human 58, Human 59, Human 60, Human 61, Human 62, Human 63, Human 64, Human 65, Human 66, Human 67, Human 68, Human 69, Human 70, Human 71, Human 72, Human 73, Human 74, Human 75, Huma

In [None]:
print("Number of total agents is: ", len(env.all_agents), "\n")
#print("Agents are: ", env.all_agents, "\n")
print("Number of human agents is: ", len(env.human_agents), "\n")
print("Number of machine agents (autonomous vehicles) is: ", len(env.machine_agents), "\n")

Number of total agents is:  100 

Agents are:  [Human 0, Human 1, Human 2, Human 3, Human 4, Human 5, Human 6, Human 7, Human 8, Human 9, Human 10, Human 11, Human 12, Human 13, Human 14, Human 15, Human 16, Human 17, Human 18, Human 19, Human 20, Human 21, Human 22, Human 23, Human 24, Human 25, Human 26, Human 27, Human 28, Human 29, Human 30, Human 31, Human 32, Human 33, Human 34, Human 35, Human 36, Human 37, Human 38, Human 39, Human 40, Human 41, Human 42, Human 43, Human 44, Human 45, Human 46, Human 47, Human 48, Human 49, Human 50, Human 51, Human 52, Human 53, Human 54, Human 55, Human 56, Human 57, Human 58, Human 59, Human 60, Human 61, Human 62, Human 63, Human 64, Human 65, Human 66, Human 67, Human 68, Human 69, Human 70, Human 71, Human 72, Human 73, Human 74, Human 75, Human 76, Human 77, Human 78, Human 79, Human 80, Human 81, Human 82, Human 83, Human 84, Human 85, Human 86, Human 87, Human 88, Human 89, Human 90, Human 91, Human 92, Human 93, Human 94, Human 95, Hu

> Reset the environment and the connection with SUMO
>
RK: Which part of the code does this refer to?

RK: I am missing some information about the agents: their id, properties, origin, destination, time, reward, etc. Show something about them.

#### Human learning

In [None]:
env.start() # comment this
human_learning_episodes =  100 # here, sufficient
for episode in range(human_learning_episodes):
    env.step() #comment what happens here.
#RK: I miss some plot of the human learning to show what is going on here.

NameError: name 'env' is not defined
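To give a feel for what day-to-day human learning can look like without running SUMO, here is a toy sketch. Everything in it is an assumption made for illustration: the two-route network, the congestion formula, the constants, and the weighted-average update (loosely inspired by the name of the `w_avg` model, not taken from RouteRL's implementation).

```python
import random

FREE_FLOW = (10.0, 12.0)   # toy free-flow travel times in minutes
CAPACITY = (40.0, 60.0)    # toy route capacities in vehicles

def travel_times(counts):
    """Toy congestion model: travel time grows with the number of drivers on a route."""
    return [t0 * (1.0 + (n / cap) ** 2) for t0, n, cap in zip(FREE_FLOW, counts, CAPACITY)]

def simulate_humans(num_drivers=100, days=100, alpha=0.2, epsilon=0.05, seed=42):
    """Return the mean experienced travel time for each simulated day."""
    rng = random.Random(seed)
    # Each driver starts with a slightly different expected cost per route.
    expected = [[t0 + rng.uniform(0.0, 5.0) for t0 in FREE_FLOW] for _ in range(num_drivers)]
    mean_time_per_day = []
    for _ in range(days):
        choices = []
        for costs in expected:
            if rng.random() < epsilon:
                choices.append(rng.randrange(2))                   # occasional exploration
            else:
                choices.append(0 if costs[0] <= costs[1] else 1)   # route expected to be faster
        counts = [choices.count(0), choices.count(1)]
        times = travel_times(counts)
        for costs, route in zip(expected, choices):
            # Weighted average of the old expectation and today's experienced travel time.
            costs[route] = (1.0 - alpha) * costs[route] + alpha * times[route]
        mean_time_per_day.append(sum(times[route] for route in choices) / num_drivers)
    return mean_time_per_day

history = simulate_humans()
print(f"day 1: {history[0]:.1f} min, day 100: {history[-1]:.1f} min")
```

Plotting `history` against the day index would give the kind of learning curve the human phase is meant to produce.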

#### Mutation

Now a portion of human agents are converted into machine agents (autonomous vehicles). 

In [None]:
env.mutation() # convert a portion of the human agents into machine agents (AVs)

#and show now the new agent to refer to his RL-specific properties (act/learn or sth that changes after mutation)

In [10]:
print("Number of total agents is: ", len(env.all_agents), "\n")
#print("Agents are: ", env.all_agents, "\n") not needed
print("Number of human agents is: ", len(env.human_agents), "\n")
print("Number of machine agents (autonomous vehicles) is: ", len(env.machine_agents), "\n")
print("Machine agents are: ", env.machine_agents, "\n")

NameError: name 'env' is not defined
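The bookkeeping behind the mutation step can be sketched as follows: a subset of the existing agents changes type, so the total number of agents stays the same. This is a minimal illustration, assuming the converted agents are picked at random. The `mutate` helper is hypothetical and is not RouteRL's selection logic.

```python
import random

def mutate(human_ids, num_machines, seed=42):
    """Move `num_machines` randomly chosen ids from the human list to a machine list."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(human_ids)), num_machines))
    machines = [agent_id for i, agent_id in enumerate(human_ids) if i in chosen]
    humans = [agent_id for i, agent_id in enumerate(human_ids) if i not in chosen]
    return humans, machines

human_ids = list(range(100))
humans_after, machines = mutate(human_ids, num_machines=10)

# Total, humans, machines
print(len(humans_after) + len(machines), len(humans_after), len(machines))  # prints: 100 90 10
```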

> Human and AV agents interact with the environment over multiple episodes, with AVs following a random policy as defined in the PettingZoo environment [loop](https://pettingzoo.farama.org/content/basic_usage/).

### RL part - or however you name it.

In [None]:
episodes = 1 # number of episodes to run with the AVs present

for episode in range(episodes): # why in the loop if this is just 1?
    print(f"\nStarting episode {episode + 1}")
    env.reset() # reset the environment before the agents start acting
    
    for agent in env.agent_iter(): # yields the agent that is currently able to act
        # observation, cumulative reward since the agent last acted, and status flags
        observation, reward, termination, truncation, info = env.last()

        if termination or truncation: # do we use it? if not - delete this
            action = None
        else:
            # Policy action or random sampling
            action = env.action_space(agent).sample() # random action from this agent's action space
            # I do not see why shall we not use actual RL in this basic tutorial? what's the point of showing random policies? I know pettingZoo does that, but still.
        print(f"Agent {agent} takes action: {action}")
        
        env.step(action) # execute the action and hand control to the next agent
        print(f"Agent {agent} has stepped, environment updated.\n")


In [None]:
env.stop_simulation() # why outside of the above cell?
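The random sampling in the loop above could be swapped for a learned policy. As a minimal sketch of what "any RL algorithm" might mean in the simplest case, here is an epsilon-greedy bandit that learns which of three routes gives the best reward. The class `EpsilonGreedyRouter`, the travel times, and the noise model are all hypothetical and independent of RouteRL. In the real environment the reward would come from `env.last()` rather than from a toy formula.

```python
import random

class EpsilonGreedyRouter:
    """Bandit-style learner that keeps a running mean reward per route."""

    def __init__(self, num_routes, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.values = [0.0] * num_routes   # estimated mean reward of each route
        self.counts = [0] * num_routes     # how often each route was chosen
        self._rng = random.Random(seed)

    def act(self):
        if self._rng.random() < self.epsilon:
            return self._rng.randrange(len(self.values))                   # explore
        return max(range(len(self.values)), key=self.values.__getitem__)   # exploit

    def learn(self, action, reward):
        self.counts[action] += 1
        # Incremental mean: new = old + (reward - old) / n
        self.values[action] += (reward - self.values[action]) / self.counts[action]

# Toy rewards: negative travel times in minutes with bounded noise; route 1 is fastest.
mean_travel_time = [12.0, 9.0, 15.0]
agent = EpsilonGreedyRouter(num_routes=3, epsilon=0.1, seed=0)
noise = random.Random(1)

for day in range(500):
    route = agent.act()
    reward = -(mean_travel_time[route] + noise.uniform(-1.0, 1.0))
    agent.learn(route, reward)

best = max(range(3), key=agent.values.__getitem__)
print("estimated rewards:", [round(v, 2) for v in agent.values], "best route:", best)
```

Because rewards are negative and the estimates start at zero, untried routes look attractive at first, so the learner tries every route early on before settling on the fastest one.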

RK: And now we need to conclude. What happened, what we see, what are the outcomes, what are the results, where can we see them? What can be extended and modified, etc.

I do not see why this is here - RK

<code style="color:white">agent_iter(max_iter=2**63)</code> returns an iterator that yields the current agent of the environment. It terminates when all agents in the environment are done or when max_iter steps have been executed.

<code style="color:white">last(observe=True)</code> returns observation, reward, termination, truncation, and info for the agent currently able to act. The returned reward is the cumulative reward that the agent has received since it last acted. If observe is set to False, the observation will not be computed, and None will be returned in its place. Note that a single agent being done does not imply the environment is done.

<code style="color:white">reset()</code> resets the environment and sets it up for use when called the first time. This method must be called before any other method.

<code style="color:white">step(action)</code> takes and executes the action of the agent in the environment, and automatically switches control to the next agent.
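The control flow of these four methods can be tried out without SUMO or PettingZoo installed. The sketch below uses a hypothetical stand-in, `ToyAECEnv`, that only mimics the method names and the hand-over between agents. It is a simplification, not the PettingZoo implementation: each agent acts exactly once, and rewards and observations are dummies.

```python
import random

class ToySpace:
    """Minimal discrete action space with a `sample()` method."""
    def __init__(self, n, rng):
        self.n = n
        self._rng = rng

    def sample(self):
        return self._rng.randrange(self.n)

class ToyAECEnv:
    """Hypothetical stand-in mimicking agent_iter / last / reset / step."""
    def __init__(self, agent_names, num_routes=3, seed=0):
        self.possible_agents = list(agent_names)
        self._space = ToySpace(num_routes, random.Random(seed))
        self._pending = []   # agents that still have to act
        self.chosen = {}     # action recorded for each agent

    def reset(self):
        self._pending = list(self.possible_agents)
        self.chosen = {}

    def action_space(self, agent):
        return self._space

    def agent_iter(self, max_iter=2**63):
        steps = 0
        # Stop when every agent is done or after max_iter steps.
        while self._pending and steps < max_iter:
            yield self._pending[0]
            steps += 1

    def last(self):
        # Dummy observation, reward, termination, truncation, info for the current agent.
        return None, 0.0, False, False, {}

    def step(self, action):
        # Record the action and hand control to the next agent.
        agent = self._pending.pop(0)
        self.chosen[agent] = action

toy_env = ToyAECEnv(["AV 0", "AV 1", "AV 2"])
toy_env.reset()
for agent in toy_env.agent_iter():
    observation, reward, termination, truncation, info = toy_env.last()
    action = None if termination or truncation else toy_env.action_space(agent).sample()
    toy_env.step(action)
print(toy_env.chosen)
```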