# Train Traffic Lights Agents

Uses functions written by @Binetruy.

- builds a network from an .osm file and vehicle trajectories
- adds vehicle inflows on the roads
- customizes an environment for RL
- integrates the environment with RLlib and runs the simulation


In [1]:
from flow.core.params import VehicleParams
from flow.core.params import NetParams, SumoCarFollowingParams, SumoLaneChangeParams
from flow.core.params import InitialConfig
from flow.core.params import EnvParams
from flow.core.params import SumoParams
from flow.controllers import RLController, IDMController
from flow.networks.IssyOSMNetwork import IssyOSMNetwork
from flow.core.params import InFlows
from collections import OrderedDict
import json
import ray
from ray.rllib.agents.registry import get_agent_class
from ray.tune import run_experiments
from ray.tune.registry import register_env
from flow.utils.registry import make_create_env
from flow.utils.rllib import FlowParamsEncoder



## Importing the Issy network

Check that IssyOSMNetwork imports correctly.

In [2]:
from flow.networks.IssyOSMNetwork import ADDITIONAL_NET_PARAMS, EDGES_DISTRIBUTION

print(ADDITIONAL_NET_PARAMS)
print(EDGES_DISTRIBUTION)

{'speed_limit': 50}
['-100822066', '4794817', '4783299#0', '155558218']


## Adding vehicle inflows

`IDMController`: the Intelligent Driver Model is a car-following model that specifies vehicle dynamics through a differential equation for the acceleration $\dot{v}$.

`RLController`: a trainable autonomous vehicle whose actions are dictated by an RL agent.

In the cell below, both the `human` and `rl` vehicle types are driven by `IDMController`.
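As a rough illustration of what a car-following law of this kind computes, here is the standard IDM acceleration formula written as a plain Python function. The function name is hypothetical and the parameter values are illustrative defaults chosen for this sketch; they are not necessarily the ones Flow's `IDMController` uses.

```python
import math


def idm_acceleration(v, gap, dv, v0=30.0, T=1.0, a=1.0, b=1.5, delta=4, s0=2.0):
    """Standard IDM acceleration (illustrative parameter values).

    v    : speed of the follower (m/s)
    gap  : bumper-to-bumper distance to the leader (m)
    dv   : approach rate, follower speed minus leader speed (m/s)
    v0   : desired speed, T: desired time headway, a: max acceleration,
    b    : comfortable deceleration, s0: minimum gap
    """
    # Desired dynamic gap s*(v, dv)
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a * b)))
    # Free-road term minus interaction term
    return a * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


free_start = idm_acceleration(v=0.0, gap=1e9, dv=0.0)     # empty road, standing start
at_cruise = idm_acceleration(v=30.0, gap=1e9, dv=0.0)     # empty road, at desired speed
tailgating = idm_acceleration(v=10.0, gap=5.0, dv=0.0)    # far too close to the leader
print(free_start, at_cruise, tailgating)
```

On an empty road the vehicle accelerates at roughly `a` from standstill and stops accelerating near `v0`; when the gap is smaller than the desired gap the result is negative, i.e. the vehicle brakes.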

In [3]:
vehicles = VehicleParams()
vehicles.add("human",
             acceleration_controller=(IDMController, {}),
             car_following_params=SumoCarFollowingParams(
                 speed_mode="right_of_way"),
             lane_change_params=SumoLaneChangeParams(
                 lane_change_mode=2722)
             )
vehicles.add("rl",
             acceleration_controller=(IDMController, {}),
             car_following_params=SumoCarFollowingParams(
                 speed_mode="right_of_way"),
             lane_change_params=SumoLaneChangeParams(
                 lane_change_mode=2722),
             color="cyan")

- `vehs_per_hour`: number of vehicles per hour, evenly spaced. For example, since there are $60 \times 60 = 3600$ seconds in an hour, setting it to $\frac{3600}{5}=720$ makes a vehicle enter the network every $5$ seconds.

- `probability`: the probability that a vehicle enters the network at each second. For example, if it is set to $0.2$, then at every second of the simulation a vehicle has a $\frac{1}{5}$ chance of entering the network.

- `period`: the time in seconds between two consecutive vehicle insertions. For example, setting it to $5$ makes a vehicle enter the network every $5$ seconds (equivalent to setting `vehs_per_hour` to $720$).

<font color='red'>
$\rightarrow$ Exactly one of these three parameters must be set!
</font>
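The arithmetic linking the three parameters can be checked with a few helper functions. This is a minimal sketch; the function names are hypothetical and are not part of Flow or SUMO.

```python
SECONDS_PER_HOUR = 60 * 60  # 3600


def period_from_vehs_per_hour(vehs_per_hour):
    """Seconds between two insertions for an evenly spaced flow."""
    return SECONDS_PER_HOUR / vehs_per_hour


def vehs_per_hour_from_period(period):
    """Hourly flow for one insertion every `period` seconds."""
    return SECONDS_PER_HOUR / period


def expected_vehs_per_hour(probability):
    """Mean hourly flow when one insertion is attempted every second."""
    return probability * SECONDS_PER_HOUR


print(period_from_vehs_per_hour(720))        # 5.0
print(vehs_per_hour_from_period(5))          # 720.0
print(round(expected_vehs_per_hour(0.2)))    # 720
```

Note the difference in kind: `vehs_per_hour` and `period` give regular spacing, whereas `probability` gives random arrivals that only match 720 vehicles per hour on average.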

In [4]:
inflow = InFlows()

inflow.add(veh_type      = "human",
           edge          = "4794817",
           probability   = 0.01, 
           depart_speed  = 7,
           depart_lane   = 0)

inflow.add(veh_type      = "human",
           edge          = "4783299#0",
           probability   = 0.2,
           depart_speed  = 7,
           depart_lane   = 0)

inflow.add(veh_type       = "rl",
           edge           = "4783299#0",
           probability    = 0.05,
           depart_speed   = 7,
           depart_lane    = 0,
           color          = "blue")

inflow.add(veh_type       = "human",
           edge           = "-100822066",
           probability    = 0.25,
           depart_speed   = 7,
           depart_lane    = 0)

inflow.add(veh_type       = "rl",
           edge           = "-100822066",
           probability    = 0.05,
           depart_speed   = 7,
           depart_lane    = 0,
           color          = "blue")

inflow.add(veh_type       = "human",
           edge          = "155558218",
           probability   = 0.01,
           depart_speed  = 7,
           depart_lane   = 0)
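To get a feel for the demand these probabilities imply, the expected hourly inflow per edge can be estimated offline. The sketch below copies the `(veh_type, edge, probability)` triples from the cell above into a plain list and assumes one independent insertion attempt per second; the helper name is hypothetical, and the simulator may insert fewer vehicles if an entry edge is congested.

```python
SECONDS_PER_HOUR = 3600

# (veh_type, edge, probability), copied from the inflow.add(...) calls
inflow_spec = [
    ("human", "4794817", 0.01),
    ("human", "4783299#0", 0.2),
    ("rl", "4783299#0", 0.05),
    ("human", "-100822066", 0.25),
    ("rl", "-100822066", 0.05),
    ("human", "155558218", 0.01),
]


def expected_hourly_demand(spec):
    """Sum the expected vehicles per hour for each entry edge."""
    demand = {}
    for _veh_type, edge, probability in spec:
        demand[edge] = demand.get(edge, 0.0) + probability * SECONDS_PER_HOUR
    return {edge: round(value) for edge, value in demand.items()}


demand = expected_hourly_demand(inflow_spec)
total = sum(demand.values())
rl_total = round(sum(p * SECONDS_PER_HOUR for t, _e, p in inflow_spec if t == "rl"))

for edge, value in demand.items():
    print(f"{edge:>12}: {value} veh/h")
print(f"total: {total} veh/h, of which rl: {rl_total} veh/h")
```

Under these assumptions the two main approaches carry about 900 and 1080 vehicles per hour, and the two minor ones about 36 each.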

## Customizing an environment for RL

More methods are documented at: http://berkeleyflow.readthedocs.io/en/latest/

In [5]:
from flow.envs.IssyEnv import IssyEnv1

## Running a simulation with RLlib training

For an environment to be used in training, it must be importable from `flow.envs`.


<font color='red'>
Copy the environment you created into a .py file and import it in `flow/envs/__init__.py`.
Use the absolute path of the .osm file.
</font> 

In [6]:
# possible actions: for each traffic light id, the list of allowed signal states
action_spec = OrderedDict({ "30677963": [ "GGGGrrrGGGG", "rrrrGGGrrrr"],
                            "30763263": ["GGGGGGGGGG",  "rrrrrrrrrr"],
                            "30677810": [ "GGrr", "rrGG"]})
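Each key of `action_spec` is a traffic light id and each string is a SUMO signal state with one character per controlled link (`G` for green, `r` for red). The sketch below checks that all states of a given light have the same length and enumerates the joint choices. It assumes the agent picks one state per light at each decision; whether `IssyEnv1` builds its action space exactly this way is an assumption, and the helper names are hypothetical.

```python
from collections import OrderedDict
from itertools import product

action_spec = OrderedDict([
    ("30677963", ["GGGGrrrGGGG", "rrrrGGGrrrr"]),
    ("30763263", ["GGGGGGGGGG", "rrrrrrrrrr"]),
    ("30677810", ["GGrr", "rrGG"]),
])


def state_lengths(spec):
    """Return, per traffic light, the sorted set of state-string lengths."""
    return {tl_id: sorted({len(s) for s in states}) for tl_id, states in spec.items()}


def joint_actions(spec):
    """Enumerate every combination of one state per traffic light."""
    return list(product(*spec.values()))


lengths = state_lengths(action_spec)
actions = joint_actions(action_spec)

# A light whose states differ in length would show more than one value here.
print(lengths)
print(len(actions))   # 8 joint actions (2 x 2 x 2)
print(actions[0])
```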

In [7]:
horizon  = 2000
SIM_STEP = 0.2
n_veh    = 12
rollouts = 10
n_cpus   = 3
discount_rate = 0.999
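A few derived quantities help sanity-check these settings. The sketch below uses only arithmetic: `horizon * rollouts` and `n_cpus + 1` are the expressions the later cells pass to RLlib and `ray.init`, and $1/(1-\gamma)$ is the usual rule of thumb for how many steps a discount factor effectively looks ahead.

```python
horizon = 2000          # environment steps per episode
SIM_STEP = 0.2          # simulated seconds per step
rollouts = 10
n_cpus = 3
discount_rate = 0.999

episode_seconds = horizon * SIM_STEP            # simulated time per episode
train_batch_size = horizon * rollouts           # value used for config['train_batch_size']
total_cpus = n_cpus + 1                         # value passed to ray.init(num_cpus=...)
effective_horizon = 1.0 / (1.0 - discount_rate) # steps that carry most of the discounted weight

print(f"simulated time per episode: {episode_seconds:.0f} s")
print(f"train batch size: {train_batch_size} steps")
print(f"CPUs requested from Ray: {total_cpus}")
print(f"effective discount horizon: about {effective_horizon:.0f} steps")
```

With these values an episode covers about 400 simulated seconds, and the discount factor weights roughly the next 1000 steps, i.e. about half an episode.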

In [8]:
# SUMO PARAM
sumo_params = SumoParams(sim_step=SIM_STEP, render=True, restart_instance=True, overtake_right=True)

# ENVIRONMENT PARAM
ADDITIONAL_ENV_PARAMS = {"beta": n_veh, "action_spec": action_spec, "algorithm": "DQN", "tl_constraint_min": 100,  "tl_constraint_max": 600, "sim_step": SIM_STEP}
env_params = EnvParams(additional_params=ADDITIONAL_ENV_PARAMS, horizon=horizon, warmup_steps=1)

# NETWORK PARAM
path_file  = '/home/lino/Documents/DQN_CIL4SYS/DQN_CIL4SYS/notebooks/issy.osm'
net_params = NetParams(inflows=inflow, osm_path=path_file) 

# NETWORK
network = IssyOSMNetwork

# INITIAL CONFIG
initial_config = InitialConfig(edges_distribution=EDGES_DISTRIBUTION)


flow_params = dict( exp_tag   = "ISSY_trial01", 
                    env_name  = IssyEnv1,  
                    network   = IssyOSMNetwork,
                    simulator = 'traci',
                    sim       = sumo_params,
                    env       = env_params,
                    net       = net_params,
                    veh       = vehicles,
                    initial   = initial_config)

## Set up RLlib

Configure the RLlib DQN algorithm used to train the RL model.

In [9]:
def setup_DQN_exp():

    alg_run   = 'DQN'
    agent_cls = get_agent_class(alg_run)
    config    = agent_cls._default_config.copy()
    config['num_workers']      = n_cpus
    config['train_batch_size'] = horizon * rollouts
    config['gamma']            = discount_rate
    config['clip_actions']     = False  # FIXME(ev) temporary ray bug
    config['horizon']          = horizon
    config["hiddens"]          = [256]
    config['model'].update({'fcnet_hiddens': [32, 32]})

    # save the flow params for replay
    flow_json = json.dumps(flow_params, cls=FlowParamsEncoder, sort_keys=True, indent=4)
    config['env_config']['flow_params'] = flow_json
    config['env_config']['run'] = alg_run

    create_env, gym_name = make_create_env(params=flow_params, version=0)

    # Register as rllib env
    register_env(gym_name, create_env)
    
    return alg_run, gym_name, config

Configure the RLlib PPO algorithm used to train the RL model.

See: https://ray.readthedocs.io/en/latest/rllib-algorithms.html#proximal-policy-optimization-ppo

In [10]:
def setup_PPO_exp():

    alg_run   = 'PPO'
    agent_cls = get_agent_class(alg_run)
    config    = agent_cls._default_config.copy()
    config['num_workers']      = n_cpus
    config['train_batch_size'] = horizon * rollouts
    config['gamma']            = discount_rate
    config['use_gae']          = True
    config['lambda']           = 0.97
    config['kl_target']        = 0.02
    config['num_sgd_iter']     = 10
    config['clip_actions']     = False  # FIXME(ev) temporary ray bug
    config['horizon']          = horizon
    config['model'].update({'fcnet_hiddens': [32, 32]})

    # save the flow params for replay
    flow_json = json.dumps(flow_params,cls=FlowParamsEncoder,sort_keys=True,indent=4)
    config['env_config']['flow_params'] = flow_json
    config['env_config']['run'] = alg_run

    create_env, gym_name = make_create_env(params=flow_params,version=0)

    # Register as rllib env
    register_env(gym_name, create_env)
    
    return alg_run, gym_name, config
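Both setup functions start from `agent_cls._default_config.copy()` and then modify the nested `config['model']` and `config['env_config']` dictionaries. Because `dict.copy()` is shallow, those nested dictionaries are still shared with the defaults. The sketch below demonstrates the effect on a small stand-in dictionary (hypothetical values, not the real RLlib defaults) and shows `copy.deepcopy` as one way to avoid it; whether this matters in practice depends on whether the defaults are reused in the same process.

```python
import copy

# Hypothetical stand-in for agent_cls._default_config
DEFAULT_CONFIG = {
    "gamma": 0.99,
    "model": {"fcnet_hiddens": [256, 256]},
    "env_config": {},
}

# Shallow copy: top-level keys are independent, nested dicts are shared.
shallow = DEFAULT_CONFIG.copy()
shallow["gamma"] = 0.999
shallow["model"].update({"fcnet_hiddens": [32, 32]})
gamma_after_shallow = DEFAULT_CONFIG["gamma"]                             # unchanged
hiddens_after_shallow = list(DEFAULT_CONFIG["model"]["fcnet_hiddens"])    # modified too

# Restore the stand-in defaults, then repeat with a deep copy.
DEFAULT_CONFIG["model"]["fcnet_hiddens"] = [256, 256]
deep = copy.deepcopy(DEFAULT_CONFIG)
deep["model"].update({"fcnet_hiddens": [32, 32]})
hiddens_after_deep = list(DEFAULT_CONFIG["model"]["fcnet_hiddens"])       # unchanged

print(gamma_after_shallow, hiddens_after_shallow, hiddens_after_deep)
```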

## Run the experiment

In [11]:
alg_run, gym_name, config = setup_DQN_exp()

ray.init(num_cpus=n_cpus + 1)

2020-05-09 18:37:50,976	INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-05-09_18-37-50_975982_4604/logs.
2020-05-09 18:37:51,086	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:27839 to respond...
2020-05-09 18:37:51,206	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:56425 to respond...
2020-05-09 18:37:51,211	INFO services.py:809 -- Starting Redis shard with 2.93 GB max memory.
2020-05-09 18:37:51,235	INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-05-09_18-37-50_975982_4604/logs.
2020-05-09 18:37:51,239	INFO services.py:1475 -- Starting the Plasma object store with 4.4 GB memory using /dev/shm.


{'node_ip_address': '192.168.0.48',
 'redis_address': '192.168.0.48:27839',
 'object_store_address': '/tmp/ray/session_2020-05-09_18-37-50_975982_4604/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-05-09_18-37-50_975982_4604/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2020-05-09_18-37-50_975982_4604'}

In [12]:
exp_tag = {"run": alg_run,
           "env": gym_name,
           "config": {**config},
           "checkpoint_freq": 20,
           "checkpoint_at_end": True,
           "max_failures": 999,
           "stop": {"training_iteration": 20}}


trials = run_experiments({flow_params["exp_tag"]: exp_tag})

2020-05-09 18:37:59,181	INFO trial_runner.py:176 -- Starting a new experiment.
2020-05-09 18:37:59,198	ERROR log_sync.py:34 -- Log sync requires cluster to be setup with `ray up`.


== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/0 GPUs
Memory usage on this node: 3.6/14.7 GB

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 4/4 CPUs, 0/0 GPUs
Memory usage on this node: 3.6/14.7 GB
Result logdir: /home/lino/ray_results/ISSY_trial01
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:
 - DQN_IssyEnv1-v0_0:	RUNNING

(pid=5038)   _np_qint8 = np.dtype([("qint8", np.int8, 1)])
(pid=5038)   _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
(pid=5038)   _np_qint16 = np.dtype([("qint16", np.int16, 1)])
(pid=5038)   _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
(pid=5038)   _np_qint32 = np.dtype([("qint32", np.int32, 1)])
(pid=5038)   np_resource = np.dtype([("resource", np.ubyte, 1)])
(pid=5038) Success.
(pid=5038) 2020-05-09 18:38:02,598	INFO rollout_worker.py:319 -- Creating policy evaluation worker 0 on CPU (please

(pid=5041) Success.
(pid=5040) Success.
(pid=5039) Success.
(pid=5041) 2020-05-09 18:38:07,791	INFO rollout_worker.py:319 -- Creating policy evaluation worker 1 on CPU (please ignore any CUDA init errors)
(pid=5041) 2020-05-09 18:38:07.804514: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
(pid=5040) 2020-05-09 18:38:07,791	INFO rollout_worker.py:319 -- Creating policy evaluation worker 3 on CPU (please ignore any CUDA init errors)
(pid=5040) 2020-05-09 18:38:07.804001: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
(pid=5039) 2020-05-09 18:38:07,815	INFO rollout_worker.py:319 -- Creating policy evaluation worker 2 on CPU (please ignore any CUDA init errors)
(pid

(pid=5041) Success.
(pid=5039) Success.
(pid=5040) Success.
(pid=5041) Success.
(pid=5041) 2020-05-09 18:38:17,027	INFO sampler.py:304 -- Raw obs from env: { 0: { 'agent0': np.ndarray((112,), dtype=float64, min=0.0, max=1.0, mean=0.205)}}
(pid=5041) 2020-05-09 18:38:17,027	INFO sampler.py:305 -- Info return from env: {0: {'agent0': None}}


2020-05-09 18:38:22,073	ERROR worker.py:1654 -- Possible unhandled error from worker: ray_RolloutWorker:sample() (pid=5041, host=lino-iMac)
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/evaluation/rollout_worker.py", line 453, in sample
    batches = [self.input_reader.next()]
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 56, in next
    batches = [self.get_data()]
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 97, in get_data
    item = next(self.rollout_provider)
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 313, in _env_runner
    soft_horizon)
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 401, in _process_observations
    policy_id).transform(raw_obs)
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/sit

(pid=5040) Success.


2020-05-09 18:38:39,080	ERROR worker.py:1654 -- Possible unhandled error from worker: ray_RolloutWorker:sample() (pid=5040, host=lino-iMac)
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/evaluation/rollout_worker.py", line 453, in sample
    batches = [self.input_reader.next()]
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 56, in next
    batches = [self.get_data()]
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 97, in get_data
    item = next(self.rollout_provider)
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 313, in _env_runner
    soft_horizon)
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 401, in _process_observations
    policy_id).transform(raw_obs)
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/sit

(pid=5039) Success.


2020-05-09 18:38:47,634	ERROR trial_runner.py:550 -- Error processing event.
Traceback (most recent call last):
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 498, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 342, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/worker.py", line 2247, in get
    raise value
ray.exceptions.RayTaskError: ray_DQN:train() (pid=5038, host=lino-iMac)
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 369, in train
    raise e
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 358, in train
    result = Trainable.train(self)
  File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/ray/

(pid=5041) Error in atexit._run_exitfuncs:
(pid=5041) Traceback (most recent call last):
(pid=5041)   File "/home/lino/Documents/flow/flow/envs/base.py", line 688, in terminate
(pid=5041)     self.k.close()
(pid=5041)   File "/home/lino/Documents/flow/flow/core/kernel/kernel.py", line 109, in close
(pid=5041)     self.simulation.close()
(pid=5041)   File "/home/lino/Documents/flow/flow/core/kernel/simulation/traci.py", line 64, in close
(pid=5041)     self.kernel_api.close()
(pid=5041)   File "/home/lino/anaconda3/envs/flow/lib/python3.6/site-packages/traci/connection.py", line 355, in close
(pid=5040) Error in atexit._run_exitfuncs:
(pid=5040) Traceback (most recent call last):
(pid=5040)   File "/home/lino/Documents/flow/flow/envs/base.py", line 688, in terminate
(pid=5040)     self.k.close()
(pid=5040)

2020-05-09 18:38:56,509	ERROR worker.py:1716 -- listen_error_messages_raylet: Error 111 connecting to 192.168.0.48:27839. Connection refused.
2020-05-09 18:38:56,512	ERROR worker.py:1616 -- print_logs: Error 111 connecting to 192.168.0.48:27839. Connection refused.
2020-05-09 18:38:56,512	ERROR import_thread.py:89 -- ImportThread: Error 111 connecting to 192.168.0.48:27839. Connection refused.


KeyboardInterrupt: 