Merged
49 commits
e8b9156
Included explicit version # for ZN
shihzy Jul 30, 2019
0602ba4
added explicit version for KR docs
shihzy Jul 30, 2019
8a8a908
minor fix in installation doc
shihzy Jul 30, 2019
849cf8d
Consistency with numbers for reset parameters
shihzy Jul 30, 2019
e344aa0
Removed extra verbiage. minor consistency
shihzy Jul 30, 2019
5c48bd6
minor consistency
shihzy Jul 30, 2019
4925e08
Cleaned up IL language
shihzy Jul 30, 2019
b902ed8
moved parameter sampling above in list
shihzy Jul 30, 2019
3e93bd8
Cleaned up language in Env Parameter sampling
shihzy Jul 30, 2019
5a8bc9b
Cleaned up migrating content
shihzy Jul 30, 2019
6451d44
updated consistency of Reset Parameter Sampling
shihzy Jul 30, 2019
6a8dc23
Rename Training-Generalization-Learning.md to Training-Generalization…
shihzy Jul 30, 2019
9b54d7c
Updated doc link for generalization
shihzy Jul 30, 2019
0363349
Rename Training-Generalization-Reinforcement-Learning-Agents.md to Tr…
shihzy Jul 30, 2019
b70148e
Re-wrote the intro paragraph for generalization
shihzy Jul 30, 2019
97e3991
add titles, cleaned up language for reset params
shihzy Jul 30, 2019
b31dd28
Update Training-Generalized-Reinforcement-Learning-Agents.md
shihzy Jul 30, 2019
da2dace
cleanup of generalization doc
shihzy Jul 30, 2019
8b44655
More cleanup in generalization
shihzy Jul 30, 2019
d6e870a
Fixed title
shihzy Jul 30, 2019
d701280
Clean up included sampler type section
shihzy Jul 30, 2019
c641a5b
cleaned up defining new sampler type in generalization
shihzy Jul 30, 2019
c79392b
cleaned up training section of generalization
shihzy Jul 30, 2019
26321ed
final cleanup for generalization
shihzy Jul 30, 2019
9bc0860
Clean up of Training w Imitation Learning doc
shihzy Jul 30, 2019
1290bb7
updated link for generalization, reordered
shihzy Jul 30, 2019
4b5d05e
consistency fix
shihzy Jul 30, 2019
05d85a9
cleaned up training ml agents doc
shihzy Jul 30, 2019
a573ef2
Update and rename Profiling.md to Profiling-Python.md
shihzy Jul 30, 2019
106ebdd
Updated Python profiling link
shihzy Jul 30, 2019
0e59547
minor clean up in profiling doc
shihzy Jul 30, 2019
c58873c
Rename Training-BehavioralCloning.md to Training-Behavioral-Cloning.md
shihzy Jul 30, 2019
1eeb52e
Updated link to BC
shihzy Jul 30, 2019
4686070
Rename Training-RewardSignals.md to Reward-Signals.md
shihzy Jul 30, 2019
d842004
fix reward links to new
shihzy Jul 30, 2019
dc9dbb4
cleaned up reward signal language
shihzy Jul 30, 2019
15ed7f1
fixed broken links to reward signals
shihzy Jul 30, 2019
2ca95ae
consistency fix
shihzy Jul 30, 2019
54a55a2
Updated readme with generalization
shihzy Jul 30, 2019
e1f3fae
Added example for GAIL reward signal
shihzy Jul 30, 2019
a3ff2b6
minor fixes and consistency to Reward Signals
shihzy Jul 30, 2019
ac4e629
referencing GAIL in the recording demonstration
shihzy Jul 30, 2019
59b8bc6
consistency
shihzy Jul 30, 2019
824490d
fixed desc of bc and gail
shihzy Jul 30, 2019
9c676c0
comment fix
shihzy Jul 30, 2019
dfaf876
comments fix
shihzy Jul 30, 2019
ae6ca42
Fix broken links
Jul 31, 2019
82871c4
Fix grammar in Overview for IL
Jul 31, 2019
9dd5f18
Add optional params to reward signals
Jul 30, 2019
2 changes: 1 addition & 1 deletion docs/Installation.md
@@ -63,7 +63,7 @@ If you installed this correctly, you should be able to run
`mlagents-learn --help`, after which you will see the Unity logo and the command line
parameters you can use with `mlagents-learn`.

By installing the `mlagents` package, its dependencies listed in the [setup.py file](../ml-agents/setup.py) are also installed.
By installing the `mlagents` package, the dependencies listed in the [setup.py file](../ml-agents/setup.py) are also installed.
Some of the primary dependencies include:

- [TensorFlow](Background-TensorFlow.md) (Requires a CPU w/ AVX support)
36 changes: 18 additions & 18 deletions docs/Learning-Environment-Examples.md
@@ -32,7 +32,7 @@ If you would like to contribute environments, please see our
* Vector Observation space: One variable corresponding to current state.
* Vector Action space: (Discrete) Two possible actions (Move left, move
right).
* Visual Observations: None.
* Visual Observations: None
* Reset Parameters: None
* Benchmark Mean Reward: 0.94

@@ -56,7 +56,7 @@ If you would like to contribute environments, please see our
* Vector Action space: (Continuous) Size of 2, with one value corresponding to
X-rotation, and the other to Z-rotation.
* Visual Observations: None.
* Reset Parameters: Three, corresponding to the following:
* Reset Parameters: Three
* scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
* Default: 1
* Recommended Minimum: 0.2
@@ -116,8 +116,8 @@ If you would like to contribute environments, please see our
of ball and racket.
* Vector Action space: (Continuous) Size of 2, corresponding to movement
toward net or away from net, and jumping.
* Visual Observations: None.
* Reset Parameters: Three, corresponding to the following:
* Visual Observations: None
* Reset Parameters: Three
* angle: Angle of the racket from the vertical (Y) axis.
* Default: 55
* Recommended Minimum: 35
@@ -153,7 +153,7 @@ If you would like to contribute environments, please see our
`VisualPushBlock` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Reset Parameters: Four, corresponding to the following:
* Reset Parameters: Four
* block_scale: Scale of the block along the x and z dimensions
* Default: 2
* Recommended Minimum: 0.5
@@ -194,8 +194,8 @@ If you would like to contribute environments, please see our
* Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
* Side Motion (3 possible actions: Left, Right, No Action)
* Jump (2 possible actions: Jump, No Action)
* Visual Observations: None.
* Reset Parameters: 4, corresponding to the height of the possible walls.
* Visual Observations: None
* Reset Parameters: Four
* Benchmark Mean Reward (Big & Small Wall Brain): 0.8

## [Reacher](https://youtu.be/2N9EoF6pQyE)
@@ -213,7 +213,7 @@ If you would like to contribute environments, please see our
* Vector Action space: (Continuous) Size of 4, corresponding to torque
applicable to two joints.
* Visual Observations: None.
* Reset Parameters: Five, corresponding to the following
* Reset Parameters: Five
* goal_size: radius of the goal zone
* Default: 5
* Recommended Minimum: 1
@@ -254,7 +254,7 @@ If you would like to contribute environments, please see our
angular acceleration of the body.
* Vector Action space: (Continuous) Size of 20, corresponding to target
rotations for joints.
* Visual Observations: None.
* Visual Observations: None
* Reset Parameters: None
* Benchmark Mean Reward for `CrawlerStaticTarget`: 2000
* Benchmark Mean Reward for `CrawlerDynamicTarget`: 400
@@ -284,7 +284,7 @@ If you would like to contribute environments, please see our
`VisualBanana` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Reset Parameters: Two, corresponding to the following
* Reset Parameters: Two
* laser_length: Length of the laser used by the agent
* Default: 1
* Recommended Minimum: 0.2
@@ -318,7 +318,7 @@ If you would like to contribute environments, please see our
`VisualHallway` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Reset Parameters: None.
* Reset Parameters: None
* Benchmark Mean Reward: 0.7
* To speed up training, you can enable curiosity by adding `use_curiosity: true` in `config/trainer_config.yaml`
* Optional Imitation Learning scene: `HallwayIL`.
@@ -340,8 +340,8 @@ If you would like to contribute environments, please see our
banana.
* Vector Action space: (Continuous) 3 corresponding to agent force applied for
the jump.
* Visual Observations: None.
* Reset Parameters: Two, corresponding to the following
* Visual Observations: None
* Reset Parameters: Two
* banana_scale: The scale of the banana in the 3 dimensions
* Default: 150
* Recommended Minimum: 50
@@ -375,8 +375,8 @@ If you would like to contribute environments, please see our
* Striker: 6 actions corresponding to forward, backward, sideways movement,
as well as rotation.
* Goalie: 4 actions corresponding to forward, backward, sideways movement.
* Visual Observations: None.
* Reset Parameters: Two, corresponding to the following:
* Visual Observations: None
* Reset Parameters: Two
* ball_scale: Specifies the scale of the ball in the 3 dimensions (equal across the three dimensions)
* Default: 7.5
* Recommended minimum: 4
@@ -409,8 +409,8 @@ If you would like to contribute environments, please see our
velocity, and angular velocities of each limb, along with goal direction.
* Vector Action space: (Continuous) Size of 39, corresponding to target
rotations applicable to the joints.
* Visual Observations: None.
* Reset Parameters: Four, corresponding to the following
* Visual Observations: None
* Reset Parameters: Four
* gravity: Magnitude of gravity
* Default: 9.81
* Recommended Minimum:
@@ -450,6 +450,6 @@ If you would like to contribute environments, please see our
`VisualPyramids` scene. __The visual observation version of
this environment does not train with the provided default
training parameters.__
* Reset Parameters: None.
* Reset Parameters: None
* Optional Imitation Learning scene: `PyramidsIL`.
* Benchmark Mean Reward: 1.75
26 changes: 13 additions & 13 deletions docs/ML-Agents-Overview.md
@@ -319,11 +319,11 @@ imitation learning algorithm will then use these pairs of observations and
actions from the human player to learn a policy. [Video
Link](https://youtu.be/kpb8ZkMBFYs).

ML-Agents provides ways to both learn directly from demonstrations as well as
use demonstrations to help speed up reward-based training, and two algorithms to do
so (Generative Adversarial Imitation Learning and Behavioral Cloning). The
[Training with Imitation Learning](Training-Imitation-Learning.md) tutorial
covers these features in more depth.
The toolkit provides a way to learn directly from demonstrations, as well as use them
to help speed up reward-based training (RL). We include two algorithms called
Behavioral Cloning (BC) and Generative Adversarial Imitation Learning (GAIL). The
[Training with Imitation Learning](Training-Imitation-Learning.md) tutorial covers these
features in more depth.
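
For a concrete sense of what enabling GAIL looks like, here is a minimal sketch of a `trainer_config.yaml` entry; the brain name, demonstration path, and numeric values are placeholders rather than recommended settings:

```yaml
PyramidsLearning:                        # placeholder brain/behavior name
  reward_signals:
    extrinsic:
      strength: 1.0                      # keep the environment reward
      gamma: 0.99
    gail:
      strength: 0.01                     # weight of the imitation reward
      gamma: 0.99
      encoding_size: 128
      demo_path: demos/ExpertPyramid.demo   # recorded expert demonstration
```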

## Flexible Training Scenarios

@@ -408,6 +408,14 @@ training process.
learn more about adding visual observations to an agent
[here](Learning-Environment-Design-Agents.md#multiple-visual-observations).

- **Training with Reset Parameter Sampling** - To train agents to adapt
to changes in their environment (i.e., generalization), the agent should be exposed
to several variations of the environment. Similar to Curriculum Learning,
where environments become more difficult as the agent learns, the toolkit provides
a way to randomly sample Reset Parameters of the environment during training. See
[Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
to learn more about this feature.
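
As a minimal sketch of how this is configured (the parameter names, values, and the `--sampler` command-line option below are assumptions drawn from the 3DBall example; see the linked page for the authoritative format), the sampling behavior is described in a small YAML file passed to `mlagents-learn`:

```yaml
resampling-interval: 5000          # environment steps between re-sampling

mass:
  sampler-type: "uniform"          # draw uniformly between min and max
  min_value: 0.5
  max_value: 10

gravity:
  sampler-type: "multirange_uniform"
  intervals: [[7, 10], [15, 20]]   # draw from either interval

scale:
  sampler-type: "uniform"
  min_value: 0.75
  max_value: 3
```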

- **Broadcasting** - As discussed earlier, a Learning Brain sends the
observations for all its Agents to the Python API when dragged into the
Academy's `Broadcast Hub` with the `Control` checkbox checked. This is helpful
@@ -422,14 +430,6 @@ training process.
the broadcasting feature
[here](Learning-Environment-Design-Brains.md#using-the-broadcast-feature).

- **Training with Environment Parameter Sampling** - To train agents to be robust
to changes in its environment (i.e., generalization), the agent should be exposed
to a variety of environment variations. Similarly to Curriculum Learning, which
allows environments to get more difficult as the agent learns, we also provide
a way to randomly resample aspects of the environment during training. See
[Training with Environment Parameter Sampling](Training-Generalization-Learning.md)
to learn more about this feature.

- **Docker Set-up (Experimental)** - To facilitate setting up ML-Agents without
installing Python or TensorFlow directly, we provide a
[guide](Using-Docker.md) on how to create and run a Docker container.
10 changes: 5 additions & 5 deletions docs/Migrating.md
@@ -5,18 +5,18 @@
### Important Changes
* We have changed the way reward signals (including Curiosity) are defined in the
`trainer_config.yaml`.
* When using multiple environments, every "step" as recorded in TensorBoard and
printed in the command line now corresponds to a single step of a single environment.
* When using multiple environments, every "step" is recorded in TensorBoard.
* Each step printed in the command line console corresponds to a single step of a single environment.
Previously, each step corresponded to one step for all environments (i.e., `num_envs` steps).

#### Steps to Migrate
* If you were overriding any of the following parameters in your config file, remove them
from the top-level config and follow the steps below:
* `gamma` - Define a new `extrinsic` reward signal and set it's `gamma` to your new gamma.
* `use_curiosity`, `curiosity_strength`, `curiosity_enc_size` - Define a `curiosity` reward signal
* `gamma`: Define a new `extrinsic` reward signal and set its `gamma` to your new gamma.
* `use_curiosity`, `curiosity_strength`, `curiosity_enc_size`: Define a `curiosity` reward signal
and set its `strength` to `curiosity_strength`, and `encoding_size` to `curiosity_enc_size`. Give it
the same `gamma` as your `extrinsic` signal to mimic previous behavior.
See [Reward Signals](Training-RewardSignals.md) for more information on defining reward signals.
See [Reward Signals](Reward-Signals.md) for more information on defining reward signals (a sketch of the migrated layout appears after this list).
* TensorBoards generated when running multiple environments in v0.8 are not comparable to those generated in
v0.9 in terms of step count. Multiply your v0.8 step count by `num_envs` for an approximate comparison.
You may need to change `max_steps` in your config as appropriate as well.
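
A minimal sketch of the migrated layout (the brain name and numeric values below are placeholders; see [Reward Signals](Reward-Signals.md) for the full set of options):

```yaml
PushBlockLearning:                 # placeholder brain name
  # ...other trainer hyperparameters stay at this level...
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99                  # formerly the top-level gamma
    curiosity:
      strength: 0.01               # formerly curiosity_strength
      gamma: 0.99                  # match extrinsic to mimic previous behavior
      encoding_size: 128           # formerly curiosity_enc_size
```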
7 changes: 3 additions & 4 deletions docs/Profiling.md → docs/Profiling-Python.md
@@ -1,7 +1,7 @@
# Profiling ML-Agents in Python
# Profiling in Python

ML-Agents provides a lightweight profiling system, in order to identity hotspots in the training process and help spot
regressions from changes.
As part of the ML-Agents toolkit, we provide a lightweight profiling system,
in order to identify hotspots in the training process and help spot regressions from changes.

Timers are hierarchical, meaning that the time tracked in a block of code can be further split into other blocks if
desired. This also means that a function that is called from multiple places in the code will appear in multiple
@@ -24,7 +24,6 @@ class TrainerController:

You can also use the `hierarchical_timer` context manager.


``` python
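# time spent in this block is recorded under a "communicator.exchange" node in the timer hierarchy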
with hierarchical_timer("communicator.exchange"):
outputs = self.communicator.exchange(step_input)
1 change: 1 addition & 0 deletions docs/Readme.md
@@ -39,6 +39,7 @@
* [Training with Curriculum Learning](Training-Curriculum-Learning.md)
* [Training with Imitation Learning](Training-Imitation-Learning.md)
* [Training with LSTM](Feature-Memory.md)
* [Training Generalized Reinforcement Learning Agents](Training-Generalized-Reinforcement-Learning-Agents.md)
* [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
* [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
* [Training Using Concurrent Unity Instances](Training-Using-Concurrent-Unity-Instances.md)