Binary file modified UnitySDK/Assets/ML-Agents/Examples/Basic/TFModels/BasicLearning.nn
2 changes: 1 addition & 1 deletion docs/Installation.md
@@ -63,7 +63,7 @@ If you installed this correctly, you should be able to run
`mlagents-learn --help`, after which you will see the Unity logo and the command line
parameters you can use with `mlagents-learn`.

By installing the `mlagents` package, its dependencies listed in the [setup.py file](../ml-agents/setup.py) are also installed.
By installing the `mlagents` package, the dependencies listed in the [setup.py file](../ml-agents/setup.py) are also installed.
Some of the primary dependencies include:

- [TensorFlow](Background-TensorFlow.md) (Requires a CPU w/ AVX support)
36 changes: 18 additions & 18 deletions docs/Learning-Environment-Examples.md
@@ -32,7 +32,7 @@ If you would like to contribute environments, please see our
* Vector Observation space: One variable corresponding to current state.
* Vector Action space: (Discrete) Two possible actions (Move left, move
right).
* Visual Observations: None.
* Visual Observations: None
* Reset Parameters: None
* Benchmark Mean Reward: 0.94

@@ -56,7 +56,7 @@ If you would like to contribute environments, please see our
* Vector Action space: (Continuous) Size of 2, with one value corresponding to
X-rotation, and the other to Z-rotation.
* Visual Observations: None.
* Reset Parameters: Three, corresponding to the following:
* Reset Parameters: Three
* scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
* Default: 1
* Recommended Minimum: 0.2
@@ -116,8 +116,8 @@ If you would like to contribute environments, please see our
of ball and racket.
* Vector Action space: (Continuous) Size of 2, corresponding to movement
toward net or away from net, and jumping.
* Visual Observations: None.
* Reset Parameters: Three, corresponding to the following:
* Visual Observations: None
* Reset Parameters: Three
* angle: Angle of the racket from the vertical (Y) axis.
* Default: 55
* Recommended Minimum: 35
@@ -153,7 +153,7 @@ If you would like to contribute environments, please see our
`VisualPushBlock` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Reset Parameters: Four, corresponding to the following:
* Reset Parameters: Four
* block_scale: Scale of the block along the x and z dimensions
* Default: 2
* Recommended Minimum: 0.5
@@ -194,8 +194,8 @@ If you would like to contribute environments, please see our
* Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
* Side Motion (3 possible actions: Left, Right, No Action)
* Jump (2 possible actions: Jump, No Action)
* Visual Observations: None.
* Reset Parameters: 4, corresponding to the height of the possible walls.
* Visual Observations: None
* Reset Parameters: Four
* Benchmark Mean Reward (Big & Small Wall Brain): 0.8

## [Reacher](https://youtu.be/2N9EoF6pQyE)
@@ -213,7 +213,7 @@ If you would like to contribute environments, please see our
* Vector Action space: (Continuous) Size of 4, corresponding to torque
applicable to two joints.
* Visual Observations: None.
* Reset Parameters: Five, corresponding to the following
* Reset Parameters: Five
* goal_size: radius of the goal zone
* Default: 5
* Recommended Minimum: 1
@@ -254,7 +254,7 @@ If you would like to contribute environments, please see our
angular acceleration of the body.
* Vector Action space: (Continuous) Size of 20, corresponding to target
rotations for joints.
* Visual Observations: None.
* Visual Observations: None
* Reset Parameters: None
* Benchmark Mean Reward for `CrawlerStaticTarget`: 2000
* Benchmark Mean Reward for `CrawlerDynamicTarget`: 400
@@ -284,7 +284,7 @@ If you would like to contribute environments, please see our
`VisualBanana` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Reset Parameters: Two, corresponding to the following
* Reset Parameters: Two
* laser_length: Length of the laser used by the agent
* Default: 1
* Recommended Minimum: 0.2
@@ -318,7 +318,7 @@ If you would like to contribute environments, please see our
`VisualHallway` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Reset Parameters: None.
* Reset Parameters: None
* Benchmark Mean Reward: 0.7
* To speed up training, you can enable curiosity by adding `use_curiosity: true` in `config/trainer_config.yaml`
* Optional Imitation Learning scene: `HallwayIL`.
@@ -340,8 +340,8 @@ If you would like to contribute environments, please see our
banana.
* Vector Action space: (Continuous) 3 corresponding to agent force applied for
the jump.
* Visual Observations: None.
* Reset Parameters: Two, corresponding to the following
* Visual Observations: None
* Reset Parameters: Two
* banana_scale: The scale of the banana in the 3 dimensions
* Default: 150
* Recommended Minimum: 50
@@ -375,8 +375,8 @@ If you would like to contribute environments, please see our
* Striker: 6 actions corresponding to forward, backward, sideways movement,
as well as rotation.
* Goalie: 4 actions corresponding to forward, backward, sideways movement.
* Visual Observations: None.
* Reset Parameters: Two, corresponding to the following:
* Visual Observations: None
* Reset Parameters: Two
* ball_scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
* Default: 7.5
* Recommended minimum: 4
@@ -409,8 +409,8 @@ If you would like to contribute environments, please see our
velocity, and angular velocities of each limb, along with goal direction.
* Vector Action space: (Continuous) Size of 39, corresponding to target
rotations applicable to the joints.
* Visual Observations: None.
* Reset Parameters: Four, corresponding to the following
* Visual Observations: None
* Reset Parameters: Four
* gravity: Magnitude of gravity
* Default: 9.81
* Recommended Minimum:
@@ -450,6 +450,6 @@ If you would like to contribute environments, please see our
`VisualPyramids` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Reset Parameters: None.
* Reset Parameters: None
* Optional Imitation Learning scene: `PyramidsIL`.
* Benchmark Mean Reward: 1.75
26 changes: 13 additions & 13 deletions docs/ML-Agents-Overview.md
@@ -319,11 +319,11 @@ imitation learning algorithm will then use these pairs of observations and
actions from the human player to learn a policy. [Video
Link](https://youtu.be/kpb8ZkMBFYs).

ML-Agents provides ways to both learn directly from demonstrations as well as
use demonstrations to help speed up reward-based training, and two algorithms to do
so (Generative Adversarial Imitation Learning and Behavioral Cloning). The
[Training with Imitation Learning](Training-Imitation-Learning.md) tutorial
covers these features in more depth.
The toolkit provides a way to learn directly from demonstrations, as well as use them
to help speed up reward-based training (RL). We include two algorithms called
Behavioral Cloning (BC) and Generative Adversarial Imitation Learning (GAIL). The
[Training with Imitation Learning](Training-Imitation-Learning.md) tutorial covers these
features in more depth.
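
As a rough illustration of how these are enabled in practice, reward signals (including GAIL) are defined per Brain in `config/trainer_config.yaml`. The sketch below is hypothetical: the brain name, demo path, and numeric values are placeholders, and the exact keys are documented in the tutorial linked above.

```yaml
# Hypothetical sketch: brain name, demo path, and values are placeholders.
YourBrainName:
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99
        gail:
            strength: 0.01
            gamma: 0.99
            encoding_size: 128
            demo_path: demos/YourExpertDemo.demo  # recorded expert demonstrations
```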

## Flexible Training Scenarios

@@ -408,6 +408,14 @@ training process.
learn more about adding visual observations to an agent
[here](Learning-Environment-Design-Agents.md#multiple-visual-observations).

- **Training with Reset Parameter Sampling** - To train agents to adapt
to changes in their environment (i.e., to generalize), an agent should be exposed
to several variations of the environment. Similar to Curriculum Learning,
where environments become more difficult as the agent learns, the toolkit provides
a way to randomly sample Reset Parameters of the environment during training. See
[Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
to learn more about this feature.

- **Broadcasting** - As discussed earlier, a Learning Brain sends the
observations for all its Agents to the Python API when dragged into the
Academy's `Broadcast Hub` with the `Control` checkbox checked. This is helpful
@@ -422,14 +430,6 @@ training process.
the broadcasting feature
[here](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).

- **Training with Environment Parameter Sampling** - To train agents to be robust
to changes in its environment (i.e., generalization), the agent should be exposed
to a variety of environment variations. Similarly to Curriculum Learning, which
allows environments to get more difficult as the agent learns, we also provide
a way to randomly resample aspects of the environment during training. See
[Training with Environment Parameter Sampling](Training-Generalization-Learning.md)
to learn more about this feature.

- **Docker Set-up (Experimental)** - To facilitate setting up ML-Agents without
installing Python or TensorFlow directly, we provide a
[guide](Using-Docker.md) on how to create and run a Docker container.
21 changes: 21 additions & 0 deletions docs/Migrating.md
@@ -1,5 +1,26 @@
# Migrating

## Migrating from ML-Agents toolkit v0.8 to v0.9

### Important Changes
* We have changed the way reward signals (including Curiosity) are defined in the
`trainer_config.yaml`.
* When using multiple environments, every "step" is recorded in TensorBoard.
* Each step reported in the command line console corresponds to a single step of a single environment.
Previously, each step corresponded to one step for all environments (i.e., `num_envs` steps).

#### Steps to Migrate
* If you were overriding any of the following parameters in your config file, remove them
from the top-level config and follow the steps below (a sketch of the resulting config appears after this list):
  * `gamma`: Define a new `extrinsic` reward signal and set its `gamma` to your previous `gamma` value.
  * `use_curiosity`, `curiosity_strength`, `curiosity_enc_size`: Define a `curiosity` reward signal
  and set its `strength` to `curiosity_strength`, and `encoding_size` to `curiosity_enc_size`. Give it
  the same `gamma` as your `extrinsic` signal to mimic previous behavior.
  See [Reward Signals](Reward-Signals.md) for more information on defining reward signals.
* TensorBoards generated when running multiple environments in v0.8 are not comparable to those generated in
v0.9 in terms of step count. Multiply your v0.8 step count by `num_envs` for an approximate comparison.
You may need to change `max_steps` in your config as appropriate as well.
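
As a reference point, the reward signal section of a migrated brain entry might look roughly like the sketch below. `YourBrainName` and the numeric values are placeholders; carry over your own `gamma`, `curiosity_strength`, and `curiosity_enc_size` values from your v0.8 config.

```yaml
# Hypothetical sketch only: brain name and values are placeholders.
YourBrainName:
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99        # formerly the top-level `gamma`
        curiosity:
            strength: 0.01     # formerly `curiosity_strength`
            gamma: 0.99        # same as `extrinsic` to mimic previous behavior
            encoding_size: 256 # formerly `curiosity_enc_size`
```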

## Migrating from ML-Agents toolkit v0.7 to v0.8

### Important Changes
7 changes: 3 additions & 4 deletions docs/Profiling.md → docs/Profiling-Python.md
@@ -1,7 +1,7 @@
# Profiling ML-Agents in Python
# Profiling in Python

ML-Agents provides a lightweight profiling system, in order to identity hotspots in the training process and help spot
regressions from changes.
As part of the ML-Agents toolkit, we provide a lightweight profiling system,
in order to identify hotspots in the training process and help spot regressions from changes.

Timers are hierarchical, meaning that the time tracked in a block of code can be further split into other blocks if
desired. This also means that a function that is called from multiple places in the code will appear in multiple
@@ -24,7 +24,6 @@ class TrainerController:

You can also use the `hierarchical_timer` context manager.


``` python
with hierarchical_timer("communicator.exchange"):
    outputs = self.communicator.exchange(step_input)
```
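
To illustrate the hierarchy described above, nested `hierarchical_timer` blocks are reported as children of the enclosing block. The following is a minimal, self-contained sketch; the import path is an assumption (the timers module has lived under different package names across releases), and the timer names and sleeps are placeholders.

```python
import time

# Assumed import path: adjust to match your installed ML-Agents version.
from mlagents.envs.timers import hierarchical_timer

def simulated_training_step():
    with hierarchical_timer("training_step"):
        with hierarchical_timer("decide_action"):
            time.sleep(0.01)  # stand-in for policy inference
        with hierarchical_timer("env_step"):
            time.sleep(0.02)  # stand-in for stepping the Unity environment

for _ in range(5):
    simulated_training_step()
# "decide_action" and "env_step" are recorded as children of "training_step",
# so each call site shows up separately in the resulting timing tree.
```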
1 change: 1 addition & 0 deletions docs/Readme.md
@@ -39,6 +39,7 @@
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
* [Training with LSTM](Feature-Memory.md)
* [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
* [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
* [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
* [Training Using Concurrent Unity Instances](Training-Using-Concurrent-Unity-Instances.md)