# About

This Jupyter notebook demonstrates the different ways in which the wolf library can be used.

The first section covers running DQN experiments with your own config, and the second covers running from an existing config file.

The third section covers using wolf on real-world networks, and the final section covers QMIX training on environments created by wolf.

# 1 Running wolf

This section walks through the different ways of running wolf.

## 1.1 Creating a Config and Running Code

Imports

In [1]:
import os
import pprint

In [2]:
import ray
from ray.tune import tune
from ray import rllib
import yaml
import wolf
from wolf.utils.configuration.configuration import Configuration

Instructions for updating:
non-resource variables are not supported in the long term


The cell below creates a config for running on a simple single-intersection environment.

In [3]:
config = yaml.safe_load("""
"ray":
  "init":
    "local_mode": false
    "log_to_driver": true
    "logging_level": "WARNING"
  "run_experiments":
    "experiments":
      "global_agent":
        "run": "APEX"
        "checkpoint_freq": 1
        "checkpoint_at_end": true
        "stop":
          "training_iteration": 100
        "config":

          ####################
          ####################
          # OTHERS
          ####################
          ####################

          # "framework": "tf"
          "log_level": "WARNING"

          ####################
          ####################
          # RL ALGO PARAMS
          ####################
          ####################

          "num_gpus": 1
          "num_workers": 2
          "target_network_update_freq": 100
          "learning_starts": 0
          "timesteps_per_iteration": 1000

          ####################
          ####################
          # EVALUATION
          ####################
          ####################

          # "evaluation_interval": 10 # Evaluate with every `evaluation_interval` training iterations.
          # "evaluation_num_episodes": 10
          # "in_evaluation": False
          # "evaluation_config":
          #   "explore": False
          # "evaluation_num_workers": 0
          # "custom_eval_function": null
          # "use_exec_api": False

          ####################
          ####################
          # EXPLORATION
          ####################
          ####################

          # "exploration_config":
          #   "type": "EpsilonGreedy"
          #   "epsilon_schedule":
          #     "type": "ExponentialSchedule" # check vizu_exp_schedule.py to find the right params for this exploration
          #     "schedule_timesteps": 10000
          #     "initial_p": 1.0
          #     "decay_rate": 0.01

          ####################
          ####################
          # MODEL
          ####################
          ####################

          "model":
            "custom_model": "tdtse"
            "custom_model_config":
              "filters_size": 32
              "dense_layer_size_by_node": 64 # size by node
              "use_progression": false

          ####################
          ####################
          # ENV
          ####################
          ####################

          "gamma": 0.99 # 0.995
          "horizon": null # if null, horizon will be choosen by env
          "env": "traffic_env_test0"
          "env_config":
            "render": False
            "simulator": "traci"
            "sim_params":
              "restart_instance": True
              "sim_step": 1
              "print_warnings": False
              "render": False
            "env_state_params": null
            "group_agents_params": null
            "multi_agent_config_params":
              "name": "shared_policy"
              "params": {}
            "agents_params":
              "name": "global_agent"
              "params":
                "default_policy": null
                "global_reward": false
                "action_params":
                  "name": "ExtendChangePhaseConnector"
                  "params": {}
                "obs_params":
                  "name": "TDTSEConnector"
                  "params":
                    "obs_params":
                      "num_history": 60 # same results with  tbh :O 30
                      "detector_position": [5, 100]
                    "phase_channel": true
                "reward_params":
                  "name": "QueueRewardConnector"
                  "params":
                    "stop_speed": 2
"general":
  "id": "main"
  "seed": null
  "repeat": 1
  "is_tensorboardX": false
  "sumo_home": "/home/ncarrara/sumo_binaries/bin"
  "workspace": "test0/results"
  "logging":
    "version": 1
    "disable_existing_loggers": false
    "formatters":
      "standard":
        "format": "[%(name)s] %(levelname)s - %(message)s"
    "handlers":
      "default":
        "level": "WARNING"
        "formatter": "standard"
        "class": "logging.StreamHandler"
    "loggers":
      "":
        "handlers": ["default"]
        "level": "WARNING"
        "propagate": false
      "some.logger.you.want.to.enable.in.the.code":
        "handlers": ["default"]
        "level": "ERROR"
        "propagate": false
""")

The cell below uses the config dict above to create a wolf configuration object and to load the custom utilities needed to run training on this traffic network.

In [4]:
C = Configuration() \
    .load(config) \
    .load_custom_trainable().load_sumo() \
    .load_custom_models()

Initialize ray.

In [5]:
# initialize ray
ray.init(**C.ray().init())

{'node_ip_address': '192.168.42.11',
 'raylet_ip_address': '192.168.42.11',
 'redis_address': '192.168.42.11:42388',
 'object_store_address': '/tmp/ray/session_2020-08-05_10-42-13_979455_25687/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-08-05_10-42-13_979455_25687/sockets/raylet',
 'webui_url': 'localhost:8265',
 'session_dir': '/tmp/ray/session_2020-08-05_10-42-13_979455_25687'}

The following two cells create the params object that tune needs to start DQN training on the provided traffic network environment.

In [8]:
def on_episode_step(info):
    episode = info["episode"]
    pass


def setup_run_exp_params(C):
    from ray.rllib.env.group_agents_wrapper import _GroupAgentsWrapper

    from wolf.scripts.misc.vizu_exp_schedule import show_schedule
    def resolve_multi_agent_config(spec):
        if "exploration_config" in spec["config"]:
            if "epsilon_schedule" in spec["config"]["exploration_config"]:
                show_schedule(spec["config"]["exploration_config"]["epsilon_schedule"])
        from wolf.utils.configuration.registry import R
        config = spec["config"]
        create_env = R.env_factory(config["env"])
        env = create_env(config["env_config"])
        if isinstance(env, _GroupAgentsWrapper):
            return env.env.multi_agent_config
        else:
            return env.multi_agent_config

    # setup config
    run_ex_params = C.ray()["run_experiments"]
    for name, exp in run_ex_params["experiments"].items():
        config = exp["config"]
        config["multiagent"] = ray.tune.sample.sample_from(lambda spec: resolve_multi_agent_config(spec))
        config["callbacks"] = {
            "on_episode_step": on_episode_step,
        }
        exp["local_dir"] = C["general"]["workspace"]

        def trial_name_string(trial):
            name = "{}_{}".format(trial.trainable_name, trial.trial_id)
            return name

        exp["trial_name_creator"] = trial_name_string

    pprint.pprint(run_ex_params, depth=4)
    return run_ex_params
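Two details of this cell are easier to see in isolation: the `multiagent` entry is not computed immediately but wrapped in a function that is called later with the trial spec, and each trial is named from its trainable name and id. The sketch below mimics both ideas in plain Python. `Deferred` and `resolve` are hypothetical stand-ins, not Ray classes, and the `SimpleNamespace` object only imitates the two trial attributes that `trial_name_string` reads.

```python
from types import SimpleNamespace


class Deferred:
    """Hypothetical stand-in for a value that is computed later from a spec."""

    def __init__(self, func):
        self.func = func


def resolve(config, spec):
    """Return a copy of `config` with each Deferred value replaced by func(spec)."""
    return {
        key: value.func(spec) if isinstance(value, Deferred) else value
        for key, value in config.items()
    }


def trial_name_string(trial):
    """Same naming rule as in the cell above."""
    return "{}_{}".format(trial.trainable_name, trial.trial_id)


# The deferred entry can only be filled in once the spec (and its env name) is known.
config = {
    "env": "traffic_env_test0",
    "multiagent": Deferred(lambda spec: {"resolved_for": spec["config"]["env"]}),
}
spec = {"config": config}
resolved = resolve(config, spec)
print(resolved["multiagent"])

# Stand-in trial object exposing only the attributes used by the naming function.
fake_trial = SimpleNamespace(trainable_name="APEX", trial_id="00000")
print(trial_name_string(fake_trial))
```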

In [9]:
params = setup_run_exp_params(C)

{'experiments': {'global_agent': {'checkpoint_at_end': True,
                                  'checkpoint_freq': 1,
                                  'config': {'callbacks': {...},
                                             'env': 'traffic_env_test0',
                                             'env_config': {...},
                                             'gamma': 0.99,
                                             'horizon': None,
                                             'learning_starts': 0,
                                             'model': {...},
                                             'multiagent': tune.sample_from(<function setup_run_exp_params.<locals>.<lambda> at 0x7fd3884b4ea0>),
                                             'num_gpus': 1,
                                             'num_workers': 2,
                                             'target_network_update_freq': 100,
                                             'timesteps_per_iteration': 1000},
 

Start training using tune.

In [34]:
trials = tune.run_experiments(**params)

Shutdown ray.

In [7]:
ray.shutdown()

## 1.2 Running using Existing Configuration Files

Existing configuration files can be run from the command line using the commands below.

```
cd sow45_code
python wolf/scripts/main.py <path_to_config>
```

For example:

```
python wolf/scripts/main.py wolf/tests/traffic_env/test0/global_agent.yaml
```

In [32]:
os.getcwd()

'/home/parth/repos/traffic-management/sow45_code/demos'

In [33]:
# change directory to sow45_code
os.chdir('../')
os.getcwd()

'/home/parth/repos/traffic-management/sow45_code'

In [None]:
!python wolf/scripts/main.py wolf/tests/traffic_env/test0/global_agent.yaml
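The `os.chdir` step above changes the working directory for the rest of the notebook session. A possible alternative, sketched below, is to leave the notebook's directory alone and pass the repository root as `cwd` to `subprocess.run`. The helpers `build_wolf_command` and `run_wolf` are hypothetical; the script and config paths are the ones used above, and the commented-out call assumes the notebook runs from `sow45_code/demos`.

```python
import os
import subprocess
import sys


def build_wolf_command(config_path):
    """Build the argument list for wolf's main script (hypothetical helper)."""
    return [sys.executable, os.path.join("wolf", "scripts", "main.py"), config_path]


def run_wolf(repo_root, config_path):
    """Run wolf/scripts/main.py from `repo_root` without changing the notebook's cwd."""
    full_config = os.path.join(repo_root, config_path)
    if not os.path.isfile(full_config):
        raise FileNotFoundError("config not found: " + full_config)
    return subprocess.run(build_wolf_command(config_path), cwd=repo_root, check=True)


# Building the command has no side effects, so it is safe to inspect first.
command = build_wolf_command("wolf/tests/traffic_env/test0/global_agent.yaml")
print(command[1:])

# Assumed layout: the notebook lives in sow45_code/demos, so the root is one level up.
# run_wolf(os.path.abspath(".."), "wolf/tests/traffic_env/test0/global_agent.yaml")
```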

## 1.3 Running on Real-World Network

There are two types of networks:
- Artificially generated networks: created programmatically.
- Real-world networks: created by importing a `sumo.cfg` file and the SUMO configuration (`.xml`) files.

This section describes the changes needed in the configuration file to train on a real-world network with wolf.

**1. Change `env` property:**

`"env": "<registered_env_name>"` such as `"env": "traffic_env_test0"` changes to

`"env": "real_world_network"`

**2. Change `name` property inside `multi_agent_config_params` property:**

Change it from:

`"multi_agent_config_params":
  "name": "shared_policy"` to

`"multi_agent_config_params":
  "name": "independent_policy"`
  
A shared policy is used when the intersections are identical, so that parameters can be shared between the agents' networks.

For real-world networks this is almost never the case, since intersections nearly always differ in some way.
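The difference between the two settings can be pictured as a mapping from agent ids to policy ids. The sketch below is an illustration only: the function names and the intersection ids are hypothetical, and wolf's actual `shared_policy` and `independent_policy` implementations may differ.

```python
def shared_policy_mapping(agent_ids, policy_id="shared"):
    """Every agent uses the same policy, so all agents update one set of weights."""
    return {agent_id: policy_id for agent_id in agent_ids}


def independent_policy_mapping(agent_ids):
    """Each agent gets its own policy, so nothing is shared between intersections."""
    return {agent_id: "policy_" + agent_id for agent_id in agent_ids}


agents = ["tl_0", "tl_1", "tl_2"]  # hypothetical intersection ids
shared = shared_policy_mapping(agents)
independent = independent_policy_mapping(agents)

# Count the distinct policies that would have to be trained in each setting.
print(len(set(shared.values())))
print(len(set(independent.values())))
```

With a shared policy the number of trained networks stays at one regardless of how many intersections are controlled, whereas independent policies grow with the number of intersections.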

**3. Add the following properties to `env_config` property:**

`"net_params":
  "template":
    "net": "<path_to_network_file>.net.xml"
    "rou": "<path_to_routes_file>.rou.xml"
    "vtype": "<path_to_vehicle_types_or_routes_file>.rou.xml"
    "add": "<path_to_detectors_file>.xml"
  "controlled_tls" : <list_of_intersection_ids_that_are_to_be_controlled>`

For example, for the china5ups network, the following is added:

`"net_params":
  "template":
    "net": "wolf/sumo_net/wujiang/china_net_5p_ups_LD_noUturn.net.xml"
    "rou": "wolf/sumo_net/wujiang/china_flows_1hr45min_noUturn_ups.rou.xml"
    "vtype": "wolf/sumo_net/wujiang/china_flows_1hr45min_noUturn_ups.rou.xml"
    "add": "wolf/sumo_net/wujiang/china_net_5p_ups_loop_detectors.xml"
  "controlled_tls" : ['main_center']`

**4. Change detector configuration property in `agents_params.params.obs_params.params.obs_params`:**
    
For artificial networks, the detector position is specified as follows:
    
`"detector_position": [5, 295]`

where these numbers represent the distance of loop detectors from the stop line.

For real-world networks, the detector configuration is specified using:

`"num_detector_group": 2`

where the value is the number of detector groups. All other detector information is passed through the `add` file specified in the previous point.
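Taken together, the four changes can be applied to a config dictionary as in the sketch below. The function `to_real_world_config` is hypothetical and operates on the experiment-level `config` block (the one containing `env` and `env_config`). It assumes `detector_position` should be removed when `num_detector_group` is set, which the text implies but does not state explicitly.

```python
import copy


def to_real_world_config(exp_config, net_params, num_detector_group=2):
    """Return a copy of an artificial-network config adapted to a real-world network."""
    cfg = copy.deepcopy(exp_config)
    env_config = cfg["env_config"]

    # 1. Switch to the real-world environment.
    cfg["env"] = "real_world_network"
    # 2. Intersections differ, so each agent gets its own policy.
    env_config["multi_agent_config_params"]["name"] = "independent_policy"
    # 3. Point the environment at the SUMO network files.
    env_config["net_params"] = copy.deepcopy(net_params)
    # 4. Replace explicit detector positions with a detector-group count
    #    (assumption: the two settings are mutually exclusive).
    obs = env_config["agents_params"]["params"]["obs_params"]["params"]["obs_params"]
    obs.pop("detector_position", None)
    obs["num_detector_group"] = num_detector_group
    return cfg


# Trimmed-down version of the artificial-network config from section 1.1.
artificial = {
    "env": "traffic_env_test0",
    "env_config": {
        "multi_agent_config_params": {"name": "shared_policy", "params": {}},
        "agents_params": {
            "name": "global_agent",
            "params": {
                "obs_params": {
                    "name": "TDTSEConnector",
                    "params": {
                        "obs_params": {"num_history": 60, "detector_position": [5, 100]},
                        "phase_channel": True,
                    },
                }
            },
        },
    },
}
# Placeholders are kept as in the text; fill in real paths before use.
net_params = {
    "template": {
        "net": "<path_to_network_file>.net.xml",
        "rou": "<path_to_routes_file>.rou.xml",
        "vtype": "<path_to_vehicle_types_or_routes_file>.rou.xml",
        "add": "<path_to_detectors_file>.xml",
    },
    "controlled_tls": ["main_center"],
}

real_world = to_real_world_config(artificial, net_params)
print(real_world["env"], real_world["env_config"]["multi_agent_config_params"]["name"])
```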

For more examples, see the following configs under `wolf/tests/traffic_env/`:

`test4_1`, `test4_2`: These are configs that work with the china5ups network.

Each of these folders contains configurations for different agents.

## 1.4 Running using QMIX

The commands for running QMIX are different because QMIX has not yet been fully integrated into the `wolf` framework. This is also why it currently lives in a separate folder.

In [3]:
os.getcwd()

'/home/parth/repos/traffic-management/sow45_code/demos'

In [6]:
os.chdir('../')
os.getcwd()

'/home/parth/repos/traffic-management/sow45_code'

In [None]:
!python qmix/src/main.py --config=qmix_traf --env-config=traf

Here the configuration is split into one part for the algorithm (`--config`) and one for the environment (`--env-config`).

Passing `traf` to `--env-config` makes the program read `traf.yaml` from the `qmix/src/config/envs/` directory.

In this file you can change:
- `test_env`, to switch between the environments registered in wolf.
- `render`, to turn rendering of the simulation on or off.

Other environment-specific configuration properties can be changed there as well.
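The relationship between the `--env-config` value and the file that gets read can be sketched as follows. Both helpers are hypothetical; only the script path, the two flags, and the `qmix/src/config/envs/` directory come from the text above.

```python
import os


def qmix_env_config_path(env_config_name):
    """Path of the YAML file selected by --env-config, relative to the repo root."""
    return os.path.join("qmix", "src", "config", "envs", env_config_name + ".yaml")


def qmix_command(config_name, env_config_name):
    """Argument list equivalent to the shell command used above."""
    return [
        "python",
        "qmix/src/main.py",
        "--config=" + config_name,
        "--env-config=" + env_config_name,
    ]


print(qmix_env_config_path("traf"))
print(" ".join(qmix_command("qmix_traf", "traf")))
```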