- Added Generative Adversarial Imitation Learning (GAIL), a new way to do imitation learning. (#2118)
- Unlike Behavioral Cloning, which requires demonstrations that exhaustively cover all the scenarios an agent could encounter, GAIL enables imitation learning from as few as 5-10 demonstrations. This makes GAIL applicable to a wider range of problems than Behavioral Cloning, and reduces the work of recording demonstrations during setup.
- GAIL can also be used with reinforcement learning to guide the behavior of the agent to be similar to the demonstrations. In environments where the reward is sparse, providing demonstrations can speed up training by several times. See imitation learning for more information on how to use GAIL, and a comparison of training times on one of our example environments.
- Enabled pre-training for the PPO trainer. (#2118)
- Pre-training can be used to bootstrap an agent's behavior using human-provided demonstrations, and helps the agent explore in the right direction during training. It can be used in conjunction with GAIL for further training speedup, especially in environments where the agent rarely sees a reward, or gets "stuck" in certain parts. See imitation learning for more information on how to use pre-training.
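As a sketch, pre-training might be enabled with a block like the following in the trainer configuration; the key names and values shown here (`pretraining`, `demo_path`, `strength`, `steps`) are assumptions based on the trainer config format, so verify them against the imitation learning documentation:

```yaml
# Hypothetical trainer_config.yaml excerpt -- key names are assumptions
PyramidsLearning:
  trainer: ppo
  pretraining:
    demo_path: ./demos/ExpertPyramid.demo  # recorded expert demonstrations
    strength: 0.5                          # weight of the cloning loss
    steps: 10000                           # anneal pre-training over these steps
```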
- Introduced training generalized reinforcement learning agents. (#2232)
- Agents trained in the same environment throughout the training process can learn to be really good at solving that particular problem. However, when introduced to variations in the environment (e.g., the terrain changes, the agent's physics changes slightly) these agents will fail.
- This release enables varying the environment during training, so that the trained agent is robust to environment variations. In addition, we've added changeable parameters to our example environments that enable them to train and test these generalized agents. See Training Generalized Agents to learn more about using this feature.
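For illustration, environment variation might be described in a sampler file passed to training; the keys and sampler names below (`resampling-interval`, `sampler-type`, `uniform`) are assumptions, so check Training Generalized Agents for the real format:

```yaml
# Hypothetical sampler file -- key and sampler names are assumptions
resampling-interval: 5000   # resample environment parameters every 5000 steps
mass:
  sampler-type: uniform
  min_value: 0.5
  max_value: 10.0
gravity:
  sampler-type: uniform
  min_value: 7.0
  max_value: 12.0
```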
- Changed stepping of the environments to be done asynchronously when running multiple Unity environments. (#2265)
- Prior to this change, ML-Agents waited for all the parallel environments to complete a step. For environments where some steps (e.g., reset) take much longer than others, this slowed step collection to the pace of the slowest environment. Note that this changes the definition of "step" reported in TensorBoard when using multiple environments.
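This is not ML-Agents code, just a toy sketch of the difference: a synchronous collector is gated by the slowest environment, while an asynchronous collector can consume steps from fast environments as soon as they are ready.

```python
# Toy illustration (not ML-Agents code) of synchronous vs. asynchronous
# step collection across parallel environments.
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def env_step(env_id, duration):
    """Pretend to step one environment; `duration` models its step cost."""
    time.sleep(duration)
    return env_id

# Environment 3 is mid-reset, so its "step" is much slower than the others.
durations = {0: 0.01, 1: 0.01, 2: 0.01, 3: 0.2}

# Synchronous: every collection waits for ALL environments, i.e. the slowest.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda e: env_step(e, durations[e]), durations))
sync_time = time.perf_counter() - start

# Asynchronous: take steps from whichever environments finish first.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(env_step, e, durations[e]) for e in durations}
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    first_ready = time.perf_counter() - start  # fast envs are usable early
    wait(pending)  # let the slow environment finish before shutdown

print(f"synchronous batch: {sync_time:.3f}s, first async step: {first_ready:.3f}s")
```

The fast environments' steps become available in roughly 10 ms, while the synchronous batch is held up for the full 200 ms reset.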
- Added options for Nature and ResNet CNN architectures when using visual observations. These larger networks may help with more complex visual observations. (#2289)
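Selecting an encoder architecture is presumably done per-brain in the trainer configuration; the key and value names below (`vis_encode_type` with `simple`, `nature_cnn`, `resnet`) are assumptions, so verify them against the documentation:

```yaml
# Hypothetical trainer_config.yaml excerpt -- key/value names are assumptions
VisualPyramidsLearning:
  trainer: ppo
  vis_encode_type: resnet   # assumed alternatives: simple (default), nature_cnn
```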
- Added basic profiling in Python (#2180).
Fixes & Improvements
- Upgraded the Unity Inference Engine to 0.2.4, significantly reducing memory usage during inference (#2308).
- Unified the definition of reward sources in `trainer_config.yaml` across Curiosity, Extrinsic, and GAIL. (#2144)
- Added support for gym wrapper and multiple visual observations. (#2192)
- Added Korean documentation and localization. (#2219, #2356)
- Fixed custom reset parameters for
- Fixed spawning bug in VisualBanana example environment. (#2277)
- Fixed memory leak when using visual observations in a Docker container. (#2274)
- Added ability to pass in Unity executable command line parameters while instantiating a UnityEnvironment. (#2243)
- Included other minor bug and doc fixes
Breaking Changes
- Enabling Curiosity (as well as GAIL) is done under a new `reward_signals` parameter in the trainer configuration YAML file.
- When training with multiple environments, the number of steps reported to TensorBoard now corresponds to the number of steps taken per environment, not the total across all environments.
See Migrating for more details on workflow changes.
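As a sketch, the new `reward_signals` section might look like this; the sub-keys shown (`strength`, `gamma`, `demo_path`) are assumptions based on the Curiosity and GAIL descriptions above, so consult the migration guide for the exact schema:

```yaml
# Hypothetical trainer_config.yaml excerpt -- sub-keys are assumptions
PyramidsLearning:
  trainer: ppo
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99
    curiosity:
      strength: 0.02
      gamma: 0.99
    gail:
      strength: 0.01
      demo_path: ./demos/ExpertPyramid.demo
```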
Known Issues
In some rare cases, the model may not be saved when quitting with `Ctrl+C` on Windows. If this occurs, reload the model by running `mlagents-learn` with the `--load` parameter, and attempt saving again.