Add a training example using RLLib (#72)
* fix remote env problems

* finish this script

* update this example
PENG Zhenghao committed Sep 13, 2021
1 parent df2c115 commit 65cb8e4
Showing 6 changed files with 238 additions and 34 deletions.
33 changes: 1 addition & 32 deletions documentation/source/get_start.rst
@@ -73,35 +73,4 @@ The following script is a minimal example for instantiating a MetaDrive environment
.. Note:: Please note that each process should only have a single MetaDrive instance due to the limits of the underlying simulation engine. As a workaround, we provide an asynchronous version of MetaDrive through the `Ray framework <https://github.com/ray-project/ray>`_; please find the environment in `remote_env.py <https://github.com/decisionforce/metadrive/blob/main/metadrive/envs/remote_env.py>`_.
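
For reference, a minimal sketch of the asynchronous usage, mirroring the updated test in this commit and assuming :code:`RemoteMetaDrive` accepts the same config dict as :code:`MetaDriveEnv`:

.. code-block:: python

    from metadrive.envs.remote_env import RemoteMetaDrive

    # Each RemoteMetaDrive hosts its own MetaDrive instance in a separate Ray worker process,
    # so several of them can coexist within one training script.
    envs = [RemoteMetaDrive(dict(map=7)) for _ in range(3)]
    observations = [env.reset() for env in envs]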


Out-of-the-box Environments
#############################


.. warning:: This section is under construction!

In addition, we provide several predefined environments for different purposes, listed in the following table.
Please feel free to open an issue if you want to request new environments.

+-------------------------+-------------------+----------------+---------------------------------------------------------+
| Gym Environment Name | Random Seed Range | Number of Maps | Comments |
+=========================+===================+================+=========================================================+
| `MetaDrive-test-v0` | [0, 200) | 200 | Test set, unchanged across all experiments. |
+-------------------------+-------------------+----------------+---------------------------------------------------------+
| `MetaDrive-validation-v0` | [200, 1000) | 800 | Validation set. |
+-------------------------+-------------------+----------------+---------------------------------------------------------+
| `MetaDrive-v0` | [1000, 1100) | 100 | Default training setting, for quick start. |
+-------------------------+-------------------+----------------+---------------------------------------------------------+
| `MetaDrive-10envs-v0` | [1000, 1100) | 10 | Training environment with 10 maps. |
+-------------------------+-------------------+----------------+---------------------------------------------------------+
| `MetaDrive-1000envs-v0` | [1000, 1100) | 1000 | Training environment with 1000 maps. |
+-------------------------+-------------------+----------------+---------------------------------------------------------+
| `MetaDrive-training0-v0` | [3000, 4000) | 1000 | First set of 1000 environments. |
+-------------------------+-------------------+----------------+---------------------------------------------------------+
| `MetaDrive-training1-v0` | [5000, 6000) | 1000 | Second set of 1000 environments. |
+-------------------------+-------------------+----------------+---------------------------------------------------------+
| `MetaDrive-training2-v0` | [7000, 8000) | 1000 | Third set of 1000 environments. |
+-------------------------+-------------------+----------------+---------------------------------------------------------+
| ... | | | *More map sets can be added upon request* |
+-------------------------+-------------------+----------------+---------------------------------------------------------+

You can also try out our example of using RLLib to train RL policies in :ref:`Training with RLLib`.
1 change: 1 addition & 0 deletions documentation/source/index.rst
@@ -53,6 +53,7 @@ Please feel free to contact us if you have any suggestions or ideas!
   action_and_dynamics.rst
   config_system.rst
   read_data_from_dataset.rst
   training_with_rllib.rst

.. toctree::
   :hidden:
26 changes: 26 additions & 0 deletions documentation/source/training_with_rllib.rst
@@ -0,0 +1,26 @@
########################
Training with RLLib
########################


We provide a script demonstrating how to use `RLLib <https://docs.ray.io/en/latest/rllib.html>`_ to
train RL agents:

.. code-block:: shell

    # Make sure the current folder does not have a sub-folder named metadrive
    python -m metadrive.examples.train_generalization_experiment

    # You can also use GPUs and a customized experiment name:
    python -m metadrive.examples.train_generalization_experiment \
        --exp-name CUSTOMIZED_EXP_NAME \
        --num-gpus HOW_MANY_GPUS_IN_THIS_MACHINE

In this example, we leave the training hyper-parameter :code:`config["num_envs_per_worker"] = 1` at its default value, so that each process (Ray worker) contains only one MetaDrive instance.
We further set :code:`config["evaluation_num_workers"] = 5`, so that the test-set environments are hosted in separate processes.
By utilizing this feature of RLLib, we avoid the issue of having multiple MetaDrive instances in a single process.
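
The relevant part of the configuration looks roughly like the sketch below; the full set of hyper-parameters lives in :code:`metadrive/examples/train_generalization_experiment.py`, and the explicit :code:`num_envs_per_worker` entry here is written out only for illustration:

.. code-block:: python

    config = dict(
        env=MetaDriveEnv,
        # Keep the RLLib default: one MetaDrive instance per Ray rollout worker.
        num_envs_per_worker=1,
        # Host the test-set (evaluation) environments in 5 separate worker processes.
        evaluation_num_workers=5,
    )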

We welcome more examples of using MetaDrive in different contexts! If you would like to share your code, please show it off by opening a new issue. Thanks!

.. note:: We tested this script with :code:`ray==1.2.0`. If you find that this script is not compatible with a newer version of RLLib, please contact us.
3 changes: 3 additions & 0 deletions metadrive/examples/drive_in_argoverse_env.py
@@ -1,3 +1,6 @@
"""
This script demonstrates how to use the environment where the traffic and road map are loaded from the Argoverse dataset.
"""
from metadrive.envs.argoverse_env import ArgoverseEnv

if __name__ == "__main__":
206 changes: 206 additions & 0 deletions metadrive/examples/train_generalization_experiment.py
@@ -0,0 +1,206 @@
"""
This script demonstrates how to train a set of policies under different number of training scenarios and test them
in the same test set using rllib.
We verified this script with ray==1.2.0. Please report to use if you find newer version of ray is not compatible with
this script.
"""
import argparse
import copy
from typing import Dict

import numpy as np

from metadrive import MetaDriveEnv

try:
    import ray
    from ray import tune

    from ray.tune import CLIReporter
    from ray.rllib.agents.callbacks import DefaultCallbacks
    from ray.rllib.env import BaseEnv
    from ray.rllib.evaluation import MultiAgentEpisode, RolloutWorker
    from ray.rllib.policy import Policy
except ImportError:
    ray = None
    raise ValueError("Please install ray through 'pip install ray'.")


class DrivingCallbacks(DefaultCallbacks):
    """Collect per-step driving statistics and expose them as RLLib custom metrics and training results."""

    def on_episode_start(
        self, *, worker: RolloutWorker, base_env: BaseEnv, policies: Dict[str, Policy], episode: MultiAgentEpisode,
        env_index: int, **kwargs
    ):
        episode.user_data["velocity"] = []
        episode.user_data["steering"] = []
        episode.user_data["step_reward"] = []
        episode.user_data["acceleration"] = []
        episode.user_data["cost"] = []

    def on_episode_step(
        self, *, worker: RolloutWorker, base_env: BaseEnv, episode: MultiAgentEpisode, env_index: int, **kwargs
    ):
        info = episode.last_info_for()
        if info is not None:
            episode.user_data["velocity"].append(info["velocity"])
            episode.user_data["steering"].append(info["steering"])
            episode.user_data["step_reward"].append(info["step_reward"])
            episode.user_data["acceleration"].append(info["acceleration"])
            episode.user_data["cost"].append(info["cost"])

    def on_episode_end(
        self, worker: RolloutWorker, base_env: BaseEnv, policies: Dict[str, Policy], episode: MultiAgentEpisode,
        **kwargs
    ):
        arrive_dest = episode.last_info_for()["arrive_dest"]
        crash = episode.last_info_for()["crash"]
        out_of_road = episode.last_info_for()["out_of_road"]
        max_step_rate = not (arrive_dest or crash or out_of_road)
        episode.custom_metrics["success_rate"] = float(arrive_dest)
        episode.custom_metrics["crash_rate"] = float(crash)
        episode.custom_metrics["out_of_road_rate"] = float(out_of_road)
        episode.custom_metrics["max_step_rate"] = float(max_step_rate)
        episode.custom_metrics["velocity_max"] = float(np.max(episode.user_data["velocity"]))
        episode.custom_metrics["velocity_mean"] = float(np.mean(episode.user_data["velocity"]))
        episode.custom_metrics["velocity_min"] = float(np.min(episode.user_data["velocity"]))
        episode.custom_metrics["steering_max"] = float(np.max(episode.user_data["steering"]))
        episode.custom_metrics["steering_mean"] = float(np.mean(episode.user_data["steering"]))
        episode.custom_metrics["steering_min"] = float(np.min(episode.user_data["steering"]))
        episode.custom_metrics["acceleration_min"] = float(np.min(episode.user_data["acceleration"]))
        episode.custom_metrics["acceleration_mean"] = float(np.mean(episode.user_data["acceleration"]))
        episode.custom_metrics["acceleration_max"] = float(np.max(episode.user_data["acceleration"]))
        episode.custom_metrics["step_reward_max"] = float(np.max(episode.user_data["step_reward"]))
        episode.custom_metrics["step_reward_mean"] = float(np.mean(episode.user_data["step_reward"]))
        episode.custom_metrics["step_reward_min"] = float(np.min(episode.user_data["step_reward"]))
        episode.custom_metrics["cost"] = float(sum(episode.user_data["cost"]))

    def on_train_result(self, *, trainer, result: dict, **kwargs):
        result["success"] = np.nan
        result["crash"] = np.nan
        result["out"] = np.nan
        result["max_step"] = np.nan
        result["length"] = result["episode_len_mean"]
        result["cost"] = np.nan
        if "custom_metrics" not in result:
            return

        if "success_rate_mean" in result["custom_metrics"]:
            result["success"] = result["custom_metrics"]["success_rate_mean"]
            result["crash"] = result["custom_metrics"]["crash_rate_mean"]
            result["out"] = result["custom_metrics"]["out_of_road_rate_mean"]
            result["max_step"] = result["custom_metrics"]["max_step_rate_mean"]
        if "cost_mean" in result["custom_metrics"]:
            result["cost"] = result["custom_metrics"]["cost_mean"]


def train(
    trainer,
    config,
    stop,
    exp_name,
    num_gpus=0,
    test_mode=False,
    checkpoint_freq=10,
    keep_checkpoints_num=None,
    custom_callback=None,
    max_failures=5,
    **kwargs
):
    """Launch a tune.run experiment for the given trainer and config, reporting the custom driving metrics."""
    ray.init(num_gpus=num_gpus)
    used_config = {
        # Must have! DrivingCallbacks injects the custom metrics shown by the progress reporter below.
        "callbacks": custom_callback if custom_callback else DrivingCallbacks,
    }
    used_config.update(config)
    config = copy.deepcopy(used_config)

    if not isinstance(stop, dict) and stop is not None:
        assert np.isscalar(stop)
        stop = {"timesteps_total": int(stop)}

    if keep_checkpoints_num is not None and not test_mode:
        assert isinstance(keep_checkpoints_num, int)
        kwargs["keep_checkpoints_num"] = keep_checkpoints_num
        kwargs["checkpoint_score_attr"] = "episode_reward_mean"

    metric_columns = CLIReporter.DEFAULT_COLUMNS.copy()
    progress_reporter = CLIReporter(metric_columns)
    progress_reporter.add_metric_column("success")
    progress_reporter.add_metric_column("crash")
    progress_reporter.add_metric_column("out")
    progress_reporter.add_metric_column("max_step")
    progress_reporter.add_metric_column("length")
    progress_reporter.add_metric_column("cost")
    kwargs["progress_reporter"] = progress_reporter

    # Start training
    analysis = tune.run(
        trainer,
        name=exp_name,
        checkpoint_freq=checkpoint_freq,
        checkpoint_at_end=True if "checkpoint_at_end" not in kwargs else kwargs.pop("checkpoint_at_end"),
        stop=stop,
        config=config,
        max_failures=max_failures if not test_mode else 0,
        reuse_actors=False,
        local_dir=".",
        **kwargs
    )
    return analysis


def get_train_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--exp-name", type=str, default="generalization_experiment")
    parser.add_argument("--num-gpus", type=int, default=0)
    return parser


if __name__ == '__main__':
    args = get_train_parser().parse_args()
    exp_name = args.exp_name
    stop = int(1000_0000)
    config = dict(

        # ===== Training Environment =====
        # Train the policies on scenario sets with different numbers of scenarios.
        # The two grid searches below yield 8 x 3 = 24 training trials in total.
        env=MetaDriveEnv,
        env_config=dict(
            environment_num=tune.grid_search([1, 5, 10, 20, 50, 100, 300, 1000]),
            start_seed=tune.grid_search([5000, 6000, 7000]),
            random_traffic=False,
        ),

        # ===== Evaluation =====
        # Evaluate the trained policies on 200 unseen scenarios.
        evaluation_interval=2,
        evaluation_num_episodes=40,
        metrics_smoothing_episodes=200,
        evaluation_config=dict(env_config=dict(environment_num=200, start_seed=0)),
        evaluation_num_workers=5,

        # ===== Training =====
        # Hyper-parameters for PPO
        horizon=1000,
        rollout_fragment_length=200,
        sgd_minibatch_size=256,
        train_batch_size=20000,
        num_sgd_iter=10,
        lr=3e-4,
        num_workers=5,
        # "lambda" is a reserved keyword in Python, so the GAE coefficient is passed via dict unpacking.
        **{"lambda": 0.95},

        # ===== Resources Specification =====
        num_gpus=0.25 if args.num_gpus != 0 else 0,
        num_cpus_per_worker=0.2,
        num_cpus_for_driver=0.5,
    )

    train(
        "PPO",
        exp_name=exp_name,
        keep_checkpoints_num=5,
        stop=stop,
        config=config,
        num_gpus=args.num_gpus,
    )
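
# Note (not part of the original script): with local_dir="." above, tune.run() writes each trial's results
# under ./<exp_name>/, so the training curves can typically be inspected with TensorBoard, e.g.:
#   tensorboard --logdir ./generalization_experiment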
3 changes: 1 addition & 2 deletions metadrive/tests/test_env/_test_remote_metadrive_env.py
@@ -1,9 +1,8 @@
from metadrive.envs.generation_envs.remote_metadrive_env import RemoteMetaDrive

# This test is broken for some reason. Remove it from CI temporarily.


def _test_remote_metadrive_env():
    from metadrive.envs.remote_env import RemoteMetaDrive
    # Test
    envs = [RemoteMetaDrive(dict(map=7)) for _ in range(3)]
    ret = [env.reset() for env in envs]
