diff --git a/README.md b/README.md
index 906029dece..ddb167655b 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ developer communities.
 * 10+ sample Unity environments
 * Support for multiple environment configurations and training scenarios
 * Train memory-enhanced agents using deep reinforcement learning
-* Easily definable Curriculum Learning scenarios
+* Easily definable Curriculum Learning and Generalization scenarios
 * Broadcasting of agent behavior for supervised learning
 * Built-in support for Imitation Learning
 * Flexible agent control with On Demand Decision Making
@@ -77,11 +77,11 @@ If you run into any problems using the ML-Agents toolkit,
 [submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
 make sure to include as much detail as possible.
 
-Your opinion matters a great deal to us. Only by hearing your thoughts on the Unity ML-Agents Toolkit can we continue to improve and grow. Please take a few minutes to [let us know about it](https://github.com/Unity-Technologies/ml-agents/issues/1454).
+Your opinion matters a great deal to us. Only by hearing your thoughts on the Unity ML-Agents Toolkit can we continue to improve and grow. Please take a few minutes to [let us know about it](https://github.com/Unity-Technologies/ml-agents/issues/1454).
 
 For any other questions or feedback, connect directly with the ML-Agents
-team at ml-agents@unity3d.com.
+team at ml-agents@unity3d.com.
 
 ## Translations
diff --git a/config/generalize_test.yaml b/config/3dball_generalize.yaml
similarity index 100%
rename from config/generalize_test.yaml
rename to config/3dball_generalize.yaml
diff --git a/docs/ML-Agents-Overview.md b/docs/ML-Agents-Overview.md
index 04224be32c..daeb770745 100644
--- a/docs/ML-Agents-Overview.md
+++ b/docs/ML-Agents-Overview.md
@@ -320,7 +320,8 @@ actions from the human player to learn a policy.
 [Video Link](https://youtu.be/kpb8ZkMBFYs).
 
 ML-Agents provides ways to both learn directly from demonstrations as well as
-use demonstrations to help speed up reward-based training. The
+use demonstrations to help speed up reward-based training, and two algorithms to do
+so (Generative Adversarial Imitation Learning and Behavioral Cloning). The
 [Training with Imitation Learning](Training-Imitation-Learning.md) tutorial
 covers these features in more depth.
 
@@ -421,6 +422,14 @@ training process.
   the broadcasting feature
   [here](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).
 
+- **Training with Environment Parameter Sampling** - To train an agent to be robust
+  to changes in its environment (i.e., generalization), the agent should be exposed
+  to a variety of environment variations. Similarly to Curriculum Learning, which
+  allows environments to get more difficult as the agent learns, we also provide
+  a way to randomly resample aspects of the environment during training. See
+  [Training with Environment Parameter Sampling](Training-Generalization-Learning.md)
+  to learn more about this feature.
+
 - **Docker Set-up (Experimental)** - To facilitate setting up ML-Agents without
   installing Python or TensorFlow directly, we provide a
   [guide](Using-Docker.md) on how to create and run a Docker container.
diff --git a/docs/Training-Generalization-Learning.md b/docs/Training-Generalization-Learning.md
index 9578a625b9..79dea8da9e 100644
--- a/docs/Training-Generalization-Learning.md
+++ b/docs/Training-Generalization-Learning.md
@@ -18,8 +18,9 @@ Ball scale of 0.5 | Ball scale of 4
 _Variations of the 3D Ball environment._
 
 To vary environments, we first decide what parameters to vary in an
-environment. These parameters are known as `Reset Parameters`. In the 3D ball
-environment example displayed in the figure above, the reset parameters are `gravity`, `ball_mass` and `ball_scale`.
+environment. We call these parameters `Reset Parameters`. In the 3D ball
+environment example displayed in the figure above, the reset parameters are
+`gravity`, `ball_mass` and `ball_scale`.
 
 ## How-to
 
@@ -31,17 +32,17 @@ can be done either deterministically or randomly.
 This is done by assigning each reset parameter a sampler, which samples a reset
 parameter value (such as a uniform sampler). If a sampler isn't provided for a
 reset parameter, the parameter maintains the default value throughout the
-training, remaining unchanged. The samplers for all the reset parameters are
-handled by a **Sampler Manager**, which also handles the generation of new
+training procedure, remaining unchanged. The samplers for all the reset parameters
+are handled by a **Sampler Manager**, which also handles the generation of new
 values for the reset parameters when needed.
 
 To setup the Sampler Manager, we setup a YAML file that specifies how we wish
 to generate new samples. In this file, we specify the samplers and the
-`resampling-duration` (number of simulation steps after which reset parameters are
+`resampling-interval` (number of simulation steps after which reset parameters are
 resampled). Below is an example of a sampler file for the 3D ball environment.
 
 ```yaml
-episode-length: 5000
+resampling-interval: 5000
 
 mass:
     sampler-type: "uniform"
@@ -59,7 +60,7 @@ scale:
 
 ```
 
-* `resampling-duration` (int) - Specifies the number of steps for agent to
+* `resampling-interval` (int) - Specifies the number of steps for agent to
 train under a particular environment configuration before resetting the
 environment with a new sample of reset parameters.
 
@@ -77,8 +78,40 @@ environment, then this specification will be ignored.
 key under the `multirange_uniform` sampler for the gravity reset parameter.
 The key name should match the name of the corresponding argument in the sampler
 definition. (Look at defining a new sampler method)
 
+The sampler manager allocates a sampler for a reset parameter by using the *Sampler Factory*, which maintains a dictionary mapping of string keys to sampler objects. The samplers available for reset parameter resampling are those registered with the Sampler Factory.
 
+#### Possible Sampler Types
+
+The currently implemented samplers that can be used with the `sampler-type` arguments are:
+
+* `uniform` - Uniform sampler
+    * Uniformly samples a single float value between defined endpoints.
+      The sub-arguments for this sampler to specify the interval
+      endpoints are as below. The sampling is done in the range of
+      [`min_value`, `max_value`).
+
+    * **sub-arguments** - `min_value`, `max_value`
+
+* `gaussian` - Gaussian sampler
+    * Samples a single float value from the distribution characterized by
+      the mean and standard deviation. The sub-arguments to specify the
+      gaussian distribution to use are as below.
+
+    * **sub-arguments** - `mean`, `st_dev`
+
+* `multirange_uniform` - Multirange Uniform sampler
+    * Uniformly samples a single float value between the specified intervals.
+      Samples by first performing a weighted pick of an interval from the list
+      of intervals (weighted based on interval width) and samples uniformly
+      from the selected interval (half-closed interval, same as the uniform
+      sampler). This sampler can take an arbitrary number of intervals in a
+      list in the following format:
+      [[`interval_1_min`, `interval_1_max`], [`interval_2_min`, `interval_2_max`], ...]
+
+    * **sub-arguments** - `intervals`
+
+The implementation of the samplers can be found at `ml-agents-envs/mlagents/envs/sampler_class.py`.
 
 ### Defining a new sampler method
 
@@ -115,10 +148,10 @@ With the sampler file setup, we can proceed to train our agent as explained in t
 
 ### Training with Generalization Learning
 
-We first begin with setting up the sampler file. After the sampler file is defined and configured, we proceed by launching `mlagents-learn` and specify our configured sampler file with the `--sampler` flag. To demonstrate, if we wanted to train a 3D ball agent with generalization using the `config/generalization-test.yaml` sampling setup, we can run
+We first begin with setting up the sampler file. After the sampler file is defined and configured, we proceed by launching `mlagents-learn` and specify our configured sampler file with the `--sampler` flag. To demonstrate, if we wanted to train a 3D ball agent with generalization using the `config/3dball_generalize.yaml` sampling setup, we can run
 
 ```sh
-mlagents-learn config/trainer_config.yaml --sampler=config/generalize_test.yaml --run-id=3D-Ball-generalization --train
+mlagents-learn config/trainer_config.yaml --sampler=config/3dball_generalize.yaml --run-id=3D-Ball-generalization --train
 ```
 
 We can observe progress and metrics via Tensorboard.
diff --git a/docs/Training-ML-Agents.md b/docs/Training-ML-Agents.md
index 1c12c32956..a36bfcca71 100644
--- a/docs/Training-ML-Agents.md
+++ b/docs/Training-ML-Agents.md
@@ -196,7 +196,7 @@ are conducting, see:
 * [Training with PPO](Training-PPO.md)
 * [Using Recurrent Neural Networks](Feature-Memory.md)
 * [Training with Curriculum Learning](Training-Curriculum-Learning.md)
-* [Training with Generalization](Training-Generalization-Learning.md)
+* [Training with Environment Parameter Sampling](Training-Generalization-Learning.md)
 * [Training with Imitation Learning](Training-Imitation-Learning.md)
 
 You can also compare the
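To make the sampler descriptions in the documentation diff above more concrete, here is a minimal, self-contained Python sketch of the three sampling behaviours (`uniform`, `gaussian`, `multirange_uniform`) plus a toy sampler-factory mapping. It is illustrative only: the function names, the `SAMPLER_FACTORY` dictionary, and the example values are assumptions made for this sketch, not the API of `ml-agents-envs/mlagents/envs/sampler_class.py`.

```python
# Illustrative sketch only -- not the ml-agents implementation; the real
# sampler classes live in ml-agents-envs/mlagents/envs/sampler_class.py.
import random
from typing import List


def uniform_sample(min_value: float, max_value: float) -> float:
    """Uniformly sample a float in the half-closed range [min_value, max_value)."""
    return min_value + random.random() * (max_value - min_value)


def gaussian_sample(mean: float, st_dev: float) -> float:
    """Sample a float from a Gaussian with the given mean and standard deviation."""
    return random.gauss(mean, st_dev)


def multirange_uniform_sample(intervals: List[List[float]]) -> float:
    """Pick an interval with probability proportional to its width, then
    sample uniformly (half-closed, like the uniform sampler) within it."""
    widths = [high - low for low, high in intervals]
    low, high = random.choices(intervals, weights=widths, k=1)[0]
    return low + random.random() * (high - low)


# Toy stand-in for the Sampler Factory: `sampler-type` strings from the YAML
# mapped to sampler callables.
SAMPLER_FACTORY = {
    "uniform": uniform_sample,
    "gaussian": gaussian_sample,
    "multirange_uniform": multirange_uniform_sample,
}

if __name__ == "__main__":
    # Example reset parameters; the ranges here are made up for illustration
    # (the real ranges come from the sampler YAML file).
    print("mass   :", SAMPLER_FACTORY["uniform"](0.5, 10.0))
    print("gravity:", SAMPLER_FACTORY["multirange_uniform"]([[7.0, 10.0], [15.0, 20.0]]))
```

The width-weighted interval pick is what makes a `multirange_uniform`-style sampler behave like a single uniform distribution over a union of disjoint ranges, rather than favouring narrow intervals.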