Binary file modified UnitySDK/Assets/ML-Agents/Examples/Basic/TFModels/BasicLearning.nn
2 changes: 1 addition & 1 deletion docs/Installation.md
@@ -63,7 +63,7 @@ If you installed this correctly, you should be able to run
`mlagents-learn --help`, after which you will see the Unity logo and the command line
parameters you can use with `mlagents-learn`.

By installing the `mlagents` package, its dependencies listed in the [setup.py file](../ml-agents/setup.py) are also installed.
By installing the `mlagents` package, the dependencies listed in the [setup.py file](../ml-agents/setup.py) are also installed.
Some of the primary dependencies include:

- [TensorFlow](Background-TensorFlow.md) (Requires a CPU w/ AVX support)
36 changes: 18 additions & 18 deletions docs/Learning-Environment-Examples.md
@@ -32,7 +32,7 @@ If you would like to contribute environments, please see our
* Vector Observation space: One variable corresponding to current state.
* Vector Action space: (Discrete) Two possible actions (Move left, move
right).
* Visual Observations: None.
* Visual Observations: None
* Reset Parameters: None
* Benchmark Mean Reward: 0.94

@@ -56,7 +56,7 @@ If you would like to contribute environments, please see our
* Vector Action space: (Continuous) Size of 2, with one value corresponding to
X-rotation, and the other to Z-rotation.
* Visual Observations: None.
* Reset Parameters: Three, corresponding to the following:
* Reset Parameters: Three
* scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
* Default: 1
* Recommended Minimum: 0.2
@@ -116,8 +116,8 @@ If you would like to contribute environments, please see our
of ball and racket.
* Vector Action space: (Continuous) Size of 2, corresponding to movement
toward net or away from net, and jumping.
* Visual Observations: None.
* Reset Parameters: Three, corresponding to the following:
* Visual Observations: None
* Reset Parameters: Three
* angle: Angle of the racket from the vertical (Y) axis.
* Default: 55
* Recommended Minimum: 35
@@ -153,7 +153,7 @@ If you would like to contribute environments, please see our
`VisualPushBlock` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Reset Parameters: Four, corresponding to the following:
* Reset Parameters: Four
* block_scale: Scale of the block along the x and z dimensions
* Default: 2
* Recommended Minimum: 0.5
@@ -194,8 +194,8 @@ If you would like to contribute environments, please see our
* Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
* Side Motion (3 possible actions: Left, Right, No Action)
* Jump (2 possible actions: Jump, No Action)
* Visual Observations: None.
* Reset Parameters: 4, corresponding to the height of the possible walls.
* Visual Observations: None
* Reset Parameters: Four
* Benchmark Mean Reward (Big & Small Wall Brain): 0.8

## [Reacher](https://youtu.be/2N9EoF6pQyE)
@@ -213,7 +213,7 @@ If you would like to contribute environments, please see our
* Vector Action space: (Continuous) Size of 4, corresponding to torque
applicable to two joints.
* Visual Observations: None.
* Reset Parameters: Five, corresponding to the following
* Reset Parameters: Five
* goal_size: radius of the goal zone
* Default: 5
* Recommended Minimum: 1
@@ -254,7 +254,7 @@ If you would like to contribute environments, please see our
angular acceleration of the body.
* Vector Action space: (Continuous) Size of 20, corresponding to target
rotations for joints.
* Visual Observations: None.
* Visual Observations: None
* Reset Parameters: None
* Benchmark Mean Reward for `CrawlerStaticTarget`: 2000
* Benchmark Mean Reward for `CrawlerDynamicTarget`: 400
@@ -284,7 +284,7 @@ If you would like to contribute environments, please see our
`VisualBanana` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Reset Parameters: Two, corresponding to the following
* Reset Parameters: Two
* laser_length: Length of the laser used by the agent
* Default: 1
* Recommended Minimum: 0.2
@@ -318,7 +318,7 @@ If you would like to contribute environments, please see our
`VisualHallway` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Reset Parameters: None.
* Reset Parameters: None
* Benchmark Mean Reward: 0.7
* To speed up training, you can enable curiosity by adding `use_curiosity: true` in `config/trainer_config.yaml`
* Optional Imitation Learning scene: `HallwayIL`.
@@ -340,8 +340,8 @@ If you would like to contribute environments, please see our
banana.
* Vector Action space: (Continuous) 3 corresponding to agent force applied for
the jump.
* Visual Observations: None.
* Reset Parameters: Two, corresponding to the following
* Visual Observations: None
* Reset Parameters: Two
* banana_scale: The scale of the banana in the 3 dimensions
* Default: 150
* Recommended Minimum: 50
@@ -375,8 +375,8 @@ If you would like to contribute environments, please see our
* Striker: 6 actions corresponding to forward, backward, sideways movement,
as well as rotation.
* Goalie: 4 actions corresponding to forward, backward, sideways movement.
* Visual Observations: None.
* Reset Parameters: Two, corresponding to the following:
* Visual Observations: None
* Reset Parameters: Two
* ball_scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
* Default: 7.5
* Recommended minimum: 4
@@ -409,8 +409,8 @@ If you would like to contribute environments, please see our
velocity, and angular velocities of each limb, along with goal direction.
* Vector Action space: (Continuous) Size of 39, corresponding to target
rotations applicable to the joints.
* Visual Observations: None.
* Reset Parameters: Four, corresponding to the following
* Visual Observations: None
* Reset Parameters: Four
* gravity: Magnitude of gravity
* Default: 9.81
* Recommended Minimum:
@@ -450,6 +450,6 @@ If you would like to contribute environments, please see our
`VisualPyramids` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Reset Parameters: None.
* Reset Parameters: None
* Optional Imitation Learning scene: `PyramidsIL`.
* Benchmark Mean Reward: 1.75
26 changes: 13 additions & 13 deletions docs/ML-Agents-Overview.md
@@ -319,11 +319,11 @@ imitation learning algorithm will then use these pairs of observations and
actions from the human player to learn a policy. [Video
Link](https://youtu.be/kpb8ZkMBFYs).

ML-Agents provides ways to both learn directly from demonstrations as well as
use demonstrations to help speed up reward-based training, and two algorithms to do
so (Generative Adversarial Imitation Learning and Behavioral Cloning). The
[Training with Imitation Learning](Training-Imitation-Learning.md) tutorial
covers these features in more depth.
The toolkit provides a way to learn directly from demonstrations, as well as use them
to help speed up reward-based training (RL). We include two algorithms called
Behavioral Cloning (BC) and Generative Adversarial Imitation Learning (GAIL). The
[Training with Imitation Learning](Training-Imitation-Learning.md) tutorial covers these
features in more depth.
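
As a rough illustration of how these are enabled in practice, reward signals (including GAIL) are defined per Brain in `config/trainer_config.yaml`. The sketch below is hypothetical: the brain name, demo path, and numeric values are placeholders, and the exact keys are documented in the tutorial linked above.

```yaml
# Hypothetical sketch: brain name, demo path, and values are placeholders.
YourBrainName:
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99
        gail:
            strength: 0.01
            gamma: 0.99
            encoding_size: 128
            demo_path: demos/YourExpertDemo.demo  # recorded expert demonstrations
```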

## Flexible Training Scenarios

@@ -408,6 +408,14 @@ training process.
learn more about adding visual observations to an agent
[here](Learning-Environment-Design-Agents.md#multiple-visual-observations).

- **Training with Reset Parameter Sampling** - To train agents to adapt
to changes in their environment (i.e., to generalize), an agent should be exposed
to several variations of the environment. Similar to Curriculum Learning,
where environments become more difficult as the agent learns, the toolkit provides
a way to randomly sample Reset Parameters of the environment during training. See
[Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
to learn more about this feature.

- **Broadcasting** - As discussed earlier, a Learning Brain sends the
observations for all its Agents to the Python API when dragged into the
Academy's `Broadcast Hub` with the `Control` checkbox checked. This is helpful
@@ -422,14 +430,6 @@ training process.
the broadcasting feature
[here](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).

- **Training with Environment Parameter Sampling** - To train agents to be robust
to changes in its environment (i.e., generalization), the agent should be exposed
to a variety of environment variations. Similarly to Curriculum Learning, which
allows environments to get more difficult as the agent learns, we also provide
a way to randomly resample aspects of the environment during training. See
[Training with Environment Parameter Sampling](Training-Generalization-Learning.md)
to learn more about this feature.

- **Docker Set-up (Experimental)** - To facilitate setting up ML-Agents without
installing Python or TensorFlow directly, we provide a
[guide](Using-Docker.md) on how to create and run a Docker container.
21 changes: 21 additions & 0 deletions docs/Migrating.md
@@ -1,5 +1,26 @@
# Migrating

## Migrating from ML-Agents toolkit v0.8 to v0.9

### Important Changes
* We have changed the way reward signals (including Curiosity) are defined in the
`trainer_config.yaml`.
* When using multiple environments, every "step" is recorded in TensorBoard.
* Each step reported in the command line console corresponds to a single step of a single environment.
Previously, each step corresponded to one step for all environments (i.e., `num_envs` steps).

#### Steps to Migrate
* If you were overriding any of the following parameters in your config file, remove them
from the top-level config and follow the steps below (a sketch of the resulting config appears after this list):
  * `gamma`: Define a new `extrinsic` reward signal and set its `gamma` to your previous `gamma` value.
  * `use_curiosity`, `curiosity_strength`, `curiosity_enc_size`: Define a `curiosity` reward signal
  and set its `strength` to `curiosity_strength`, and `encoding_size` to `curiosity_enc_size`. Give it
  the same `gamma` as your `extrinsic` signal to mimic previous behavior.
  See [Reward Signals](Reward-Signals.md) for more information on defining reward signals.
* TensorBoards generated when running multiple environments in v0.8 are not comparable to those generated in
v0.9 in terms of step count. Multiply your v0.8 step count by `num_envs` for an approximate comparison.
You may need to change `max_steps` in your config as appropriate as well.
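
As a reference point, the reward signal section of a migrated brain entry might look roughly like the sketch below. `YourBrainName` and the numeric values are placeholders; carry over your own `gamma`, `curiosity_strength`, and `curiosity_enc_size` values from your v0.8 config.

```yaml
# Hypothetical sketch only: brain name and values are placeholders.
YourBrainName:
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99        # formerly the top-level `gamma`
        curiosity:
            strength: 0.01     # formerly `curiosity_strength`
            gamma: 0.99        # same as `extrinsic` to mimic previous behavior
            encoding_size: 256 # formerly `curiosity_enc_size`
```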

## Migrating from ML-Agents toolkit v0.7 to v0.8

### Important Changes
7 changes: 3 additions & 4 deletions docs/Profiling.md → docs/Profiling-Python.md
@@ -1,7 +1,7 @@
# Profiling ML-Agents in Python
# Profiling in Python

ML-Agents provides a lightweight profiling system, in order to identity hotspots in the training process and help spot
regressions from changes.
As part of the ML-Agents toolkit, we provide a lightweight profiling system,
in order to identify hotspots in the training process and help spot regressions from changes.

Timers are hierarchical, meaning that the time tracked in a block of code can be further split into other blocks if
desired. This also means that a function that is called from multiple places in the code will appear in multiple
@@ -24,7 +24,6 @@ class TrainerController:

You can also use the `hierarchical_timer` context manager.


``` python
with hierarchical_timer("communicator.exchange"):
    outputs = self.communicator.exchange(step_input)
```
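
To illustrate the hierarchy described above, nested `hierarchical_timer` blocks are reported as children of the enclosing block. The following is a minimal, self-contained sketch; the import path is an assumption (the timers module has lived under different package names across releases), and the timer names and sleeps are placeholders.

```python
import time

# Assumed import path: adjust to match your installed ML-Agents version.
from mlagents.envs.timers import hierarchical_timer

def simulated_training_step():
    with hierarchical_timer("training_step"):
        with hierarchical_timer("decide_action"):
            time.sleep(0.01)  # stand-in for policy inference
        with hierarchical_timer("env_step"):
            time.sleep(0.02)  # stand-in for stepping the Unity environment

for _ in range(5):
    simulated_training_step()
# "decide_action" and "env_step" are recorded as children of "training_step",
# so each call site shows up separately in the resulting timing tree.
```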
1 change: 1 addition & 0 deletions docs/Readme.md
@@ -39,6 +39,7 @@
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
* [Training with LSTM](Feature-Memory.md)
* [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
* [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
* [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
* [Training Using Concurrent Unity Instances](Training-Using-Concurrent-Unity-Instances.md)