Merged
2 changes: 2 additions & 0 deletions README.md
Expand Up @@ -34,6 +34,8 @@ developer communities.
* Visualizing network outputs within the environment
* Simplified set-up with Docker
* Wrap learning environments as a gym
* Utilizes the Unity Inference Engine
* Train using concurrent Unity environment instances

## Documentation

67 changes: 34 additions & 33 deletions docs/Custom-Protos.md → docs/Creating-Custom-Protobuf-Messages.md
@@ -1,28 +1,30 @@
# Creating custom protobuf messages
# Creating Custom Protobuf Messages

Unity and Python communicate by sending protobuf messages to and from each other. You can create custom protobuf messages if you want to exchange structured data beyond what is included by default.

Assume the ml-agents repository is checked out to a folder named $MLAGENTS_ROOT. Whenever you change the fields of a custom message, you must run `$MLAGENTS_ROOT/protobuf-definitions/make.bat` to create C# and Python files corresponding to the new message. Follow the directions in [this file](../protobuf-definitions/README.md) for guidance. After running it, reinstall the Python package by running `pip install $MLAGENTS_ROOT/ml-agents` and make sure your Unity project is using the newly-generated version of `$MLAGENTS_ROOT/UnitySDK`.
## Implementing a Custom Message

## Custom message types
Assume the ml-agents repository is checked out to a folder named $MLAGENTS_ROOT. Whenever you change the fields of a custom message, you must run `$MLAGENTS_ROOT/protobuf-definitions/make.bat` to create C# and Python files corresponding to the new message. Follow the directions in [this file](../protobuf-definitions/README.md) for guidance. After running `$MLAGENTS_ROOT/protobuf-definitions/make.bat`, reinstall the Python package by running `pip install $MLAGENTS_ROOT/ml-agents` and make sure your Unity project is using the newly-generated version of `$MLAGENTS_ROOT/UnitySDK`.

There are three custom message types currently supported, described below. In each case, `env` is an instance of a `UnityEnvironment` in Python. `CustomAction` is described most thoroughly; usage of the other custom messages follows a similar template.
## Custom Message Types

### Custom actions
There are three custom message types currently supported - Custom Actions, Custom Reset Parameters, and Custom Observations. In each case, `env` is an instance of a `UnityEnvironment` in Python.

By default, the Python API sends actions to Unity in the form of a floating-point list per agent and an optional string-valued text action.
### Custom Actions

You can define a custom action type to replace or augment this by adding fields to the `CustomAction` message, which you can do by editing the file `protobuf-definitions/proto/mlagents/envs/communicator_objects/custom_action.proto`.
By default, the Python API sends actions to Unity in the form of a floating point list and an optional string-valued text action for each agent.

Instances of custom actions are set via the `custom_action` parameter of `env.step`. An agent receives a custom action by defining a method with the signature
You can define a custom action type, to either replace or augment the default, by adding fields to the `CustomAction` message, which you can do by editing the file `protobuf-definitions/proto/mlagents/envs/communicator_objects/custom_action.proto`.

Instances of custom actions are set via the `custom_action` parameter of `env.step`. An agent receives a custom action by defining a method with the signature:

```csharp
public virtual void AgentAction(float[] vectorAction, string textAction, CommunicatorObjects.CustomAction customAction)
```

Here is an example of creating a custom action that instructs an agent to choose a cardinal direction to walk in and how far to walk.
Below is an example of creating a custom action that instructs an agent to choose a cardinal direction to walk in and how far to walk.

`custom_action.proto` will look like
The `custom_action.proto` file looks like:

```protobuf
syntax = "proto3";
Expand All @@ -42,7 +44,7 @@ message CustomAction {
}
```

In your Python file, create an instance of a custom action:
The Python instance of the custom action looks like:

```python
from mlagents.envs.communicator_objects import CustomAction
Expand All @@ -52,7 +54,7 @@ action = CustomAction(direction=CustomAction.NORTH, walkAmount=2.0)
env.step(custom_action=action)
```

Then in your agent,
And the agent code looks like:

```csharp
...
Expand All @@ -72,17 +74,17 @@ class MyAgent : Agent {
}
```

Note that the protobuffer compiler automatically configures the capitalization scheme of the C# version of the custom field names you defined in the `CustomAction` message to match C# conventions - "NORTH" becomes "North", "walkAmount" becomes "WalkAmount", etc.
Keep in mind that the protobuf compiler automatically changes the capitalization of the custom field names you defined in the `CustomAction` message to match C# conventions in the generated C# code: "NORTH" becomes "North", "walkAmount" becomes "WalkAmount", etc.

### Custom reset parameters
### Custom Reset Parameters

By default, you can configure an environment `env ` in the Python API by specifying a `config` parameter that is a dictionary mapping strings to floats.
By default, you can configure an environment `env` in the Python API by specifying a `config` parameter that is a dictionary mapping strings to floats.

You can also configure an environment using a custom protobuf message. To do so, add fields to the `CustomResetParameters` protobuf message in `custom_reset_parameters.proto`, analogously to `CustomAction` above. Then pass an instance of the message to `env.reset` via the `custom_reset_parameters` keyword parameter.
You can also configure the environment reset using a custom protobuf message. To do this, add fields to the `CustomResetParameters` protobuf message in `custom_reset_parameters.proto`, analogously to `CustomAction` above. Then pass an instance of the message to `env.reset` via the `custom_reset_parameters` keyword parameter.

In Unity, you can then use the `customResetParameters` field of your academy to read the values set in your Python script.

In this example, an academy is setting the initial position of a box based on custom reset parameters that looks like
In this example, the academy is setting the initial position of a box based on custom reset parameters. The `custom_reset_parameters.proto` would look like:

```protobuf
message CustomResetParameters {
Expand All @@ -101,7 +103,18 @@ message CustomResetParameters {
}
```

In your academy, you'd have something like
The Python instance of the custom reset parameters looks like:

```python
from mlagents.envs.communicator_objects import CustomResetParameters
env = ...
pos = CustomResetParameters.Position(x=1, y=1, z=2)
color = CustomResetParameters.Color(r=.5, g=.1, b=1.0)
params = CustomResetParameters(initialPos=pos, color=color)
env.reset(custom_reset_parameters=params)
```

The academy looks like:

```csharp
public class MyAcademy : Academy
Expand All @@ -122,18 +135,7 @@ public class MyAcademy : Academy
}
```

Then in Python, when setting up your scene, you might write

```python
from mlagents.envs.communicator_objects import CustomResetParameters
env = ...
pos = CustomResetParameters.Position(x=1, y=1, z=2)
color = CustomResetParameters.Color(r=.5, g=.1, b=1.0)
params = CustomResetParameters(initialPos=pos, color=color)
env.reset(custom_reset_parameters=params)
```

### Custom observations
### Custom Observations

By default, Unity returns observations to Python in the form of a floating-point vector.

Expand All @@ -143,8 +145,7 @@ Then in your agent, create an instance of a custom observation via `new Communic

In Python, the custom observation can be accessed by calling `env.step` or `env.reset` and accessing the `custom_observations` property of the return value. It will contain a list with one `CustomObservation` instance per agent.

For example, if you have added a field called `customField` to the `CustomObservation` message, you would program your agent like

For example, if you have added a field called `customField` to the `CustomObservation` message, the agent code looks like:

```csharp
class MyAgent : Agent {
Expand All @@ -156,7 +157,7 @@ class MyAgent : Agent {
}
```

Then in Python, the custom field would be accessed like
In Python, the custom field would be accessed like:

```python
...
4 changes: 1 addition & 3 deletions docs/Installation.md
Expand Up @@ -82,8 +82,7 @@ parameters you can use with `mlagents-learn`.

If you intend to make modifications to `ml-agents` or `ml-agents-envs`, you should install
the packages from the cloned repo rather than from PyPi. To do this, you will need to install
`ml-agents` and `ml-agents-envs` separately. Do this by running (starting from the repo's main
directory):
`ml-agents` and `ml-agents-envs` separately. From the repo's root directory, run:

```sh
cd ml-agents-envs
Expand All @@ -98,7 +97,6 @@ reflected when you run `mlagents-learn`. It is important to install these packag
`mlagents` package depends on `mlagents_envs`, and installing it in the other
order will download `mlagents_envs` from PyPi.


## Docker-based Installation

If you'd like to use Docker for ML-Agents, please follow
17 changes: 17 additions & 0 deletions docs/Migrating.md
@@ -1,5 +1,22 @@
# Migrating

## Migrating from ML-Agents toolkit v0.7 to v0.8

### Important Changes
* We have split the Python packages into two separate packages: `ml-agents` and `ml-agents-envs`.

#### Steps to Migrate
* If you are installing via PyPI, there is no change.
* If you intend to make modifications to `ml-agents` or `ml-agents-envs`, see the Installing for Development section of the [Installation documentation](Installation.md).

## Migrating from ML-Agents toolkit v0.6 to v0.7

### Important Changes
* We no longer support TensorFlowSharp (TFS) and now use the [Unity Inference Engine](Unity-Inference-Engine.md).

#### Steps to Migrate
* Make sure to remove the `ENABLE_TENSORFLOW` flag in your Unity Project settings

## Migrating from ML-Agents toolkit v0.5 to v0.6

### Important Changes
2 changes: 2 additions & 0 deletions docs/Readme.md
Expand Up @@ -29,6 +29,7 @@
* [Learning Environment Best Practices](Learning-Environment-Best-Practices.md)
* [Using the Monitor](Feature-Monitor.md)
* [Using an Executable Environment](Learning-Environment-Executable.md)
* [Creating Custom Protobuf Messages](Creating-Custom-Protobuf-Messages.md)

## Training

Expand All @@ -39,6 +40,7 @@
* [Training with LSTM](Feature-Memory.md)
* [Training on the Cloud with Amazon Web Services](Training-on-Amazon-Web-Service.md)
* [Training on the Cloud with Microsoft Azure](Training-on-Microsoft-Azure.md)
* [Training Using Concurrent Unity Instances](Training-Using-Concurrent-Unity-Instances.md)
* [Using TensorBoard to Observe Training](Using-Tensorboard.md)

## Inference
7 changes: 2 additions & 5 deletions docs/Training-ML-Agents.md
Expand Up @@ -134,12 +134,9 @@ environment, you can set the following command line options when invoking
[Academy Properties](Learning-Environment-Design-Academy.md#academy-properties).
* `--train` – Specifies whether to train model or only run in inference mode.
When training, **always** use the `--train` option.
* `--num-envs` - Specifies the number of parallel environments to collect
* `--num-envs=<n>` - Specifies the number of concurrent Unity environment instances to collect
experiences from when training. Defaults to 1.
* `--base-port` - Specifies the starting port for environment workers. Each Unity
environment will use the port `(base_port + worker_id)`, where the worker ID
are sequential IDs given to each environment from 0 to `num_envs - 1`.
Defaults to 5005.
* `--base-port` - Specifies the starting port for the concurrent Unity environment instances. Each instance uses the port `(base_port + worker_id)`, where `worker_id` is a sequential ID assigned to each instance, from 0 to `num_envs - 1`. Defaults to 5005.
* `--docker-target-name=<dt>` – The Docker Volume on which to store curriculum,
executable and model files. See [Using Docker](Using-Docker.md).
* `--no-graphics` - Specify this option to run the Unity executable in
25 changes: 25 additions & 0 deletions docs/Training-Using-Concurrent-Unity-Instances.md
@@ -0,0 +1,25 @@
# Training Using Concurrent Unity Instances

As part of release v0.8, we enabled developers to run concurrent instances of the Unity executable during training. In some scenarios, this can speed up training.

## How to Run Concurrent Unity Instances During Training

Please refer to the general instructions on [Training ML-Agents](Training-ML-Agents.md). In order to run concurrent Unity instances during training, set the number of environment instances using the command line option `--num-envs=<n>` when you invoke `mlagents-learn`. Optionally, you can also set the `--base-port`, which is the starting port used for the concurrent Unity instances.
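
To make the port assignment concrete, here is a minimal sketch of the rule described in [Training ML-Agents](Training-ML-Agents.md): each instance uses the port `base_port + worker_id`, with worker IDs running from 0 to `num_envs - 1`. The helper `instance_ports` is a hypothetical name used only for illustration and is not part of the `mlagents` package.

```python
from typing import List


def instance_ports(base_port: int = 5005, num_envs: int = 1) -> List[int]:
    """Return the port each concurrent Unity instance is expected to use.

    Hypothetical helper for illustration; not part of the mlagents package.
    """
    if num_envs < 1:
        raise ValueError("num_envs must be at least 1")
    # worker_id runs from 0 to num_envs - 1; each port is base_port + worker_id.
    return [base_port + worker_id for worker_id in range(num_envs)]


if __name__ == "__main__":
    # Mirrors invoking mlagents-learn with --num-envs=4 --base-port=5005
    print(instance_ports(base_port=5005, num_envs=4))
```

A listing like this can help you check that the whole port range is free before you start training.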

## Considerations

### Buffer Size

If you are having trouble getting an agent to train, even with multiple concurrent Unity instances, you could increase `buffer_size` in the `config/trainer_config.yaml` file. A common practice is to multiply `buffer_size` by `num-envs`.
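
The scaling rule above can be sketched as follows. This is only an illustration: the function name `scale_buffer_size` and the starting value `10240` are assumptions chosen for the example, and the dictionary stands in for the entry you would edit by hand in `config/trainer_config.yaml`.

```python
def scale_buffer_size(trainer_settings: dict, num_envs: int) -> dict:
    """Return a copy of the settings with buffer_size multiplied by num_envs.

    Hypothetical helper for illustration; not part of the mlagents package.
    """
    if num_envs < 1:
        raise ValueError("num_envs must be at least 1")
    scaled = dict(trainer_settings)  # shallow copy, so the input is left untouched
    scaled["buffer_size"] = trainer_settings["buffer_size"] * num_envs
    return scaled


if __name__ == "__main__":
    settings = {"buffer_size": 10240}  # illustrative value, not a recommendation
    print(scale_buffer_size(settings, num_envs=4)["buffer_size"])  # 40960
```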

### Resource Constraints

The number of concurrent Unity instances you can run is constrained by the resources on the machine. Each instance consumes CPU and memory, so choose a value for `--num-envs=<n>` that the machine can sustain.

### Using num-runs and num-envs

If you set `--num-runs=<n>` greater than 1 and are also invoking concurrent Unity instances using `--num-envs=<n>`, then the number of concurrent Unity instances is equal to `num-runs` times `num-envs`.
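
As a quick check before launching, the total can be computed as in the sketch below. The function name `total_unity_instances` is hypothetical and used only for this example.

```python
def total_unity_instances(num_runs: int = 1, num_envs: int = 1) -> int:
    """Return how many Unity instances run at once for the given options.

    Hypothetical helper for illustration; not part of the mlagents package.
    """
    if num_runs < 1 or num_envs < 1:
        raise ValueError("num_runs and num_envs must both be at least 1")
    # Each of the num_runs training runs starts num_envs Unity instances.
    return num_runs * num_envs


if __name__ == "__main__":
    # Mirrors --num-runs=2 together with --num-envs=4
    print(total_unity_instances(num_runs=2, num_envs=4))  # 8
```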

### Result Variation Using Concurrent Unity Instances

If you keep all the hyperparameters the same but change `--num-envs=<n>`, the training results and the resulting model will likely differ.