Merged
Changes from all commits
30 commits
a4ae6c7
initial commit for LL-API
vincentpierre Dec 4, 2019
7acd227
fixing ml-agents-envs tests
vincentpierre Dec 4, 2019
c49be07
Implementing action masks
vincentpierre Dec 4, 2019
bcce783
training is fixed for 3DBall
vincentpierre Dec 4, 2019
942c0d2
Tests all fixed, gym is broken and missing documentation changes
vincentpierre Dec 4, 2019
7ccf5fc
adding case where no vector obs
vincentpierre Dec 4, 2019
89b1d34
Fixed Gym
vincentpierre Dec 5, 2019
48d675d
fixing tests of float64
vincentpierre Dec 5, 2019
9d2d70c
fixing float64
vincentpierre Dec 5, 2019
099f12b
reverting some of brain.py
vincentpierre Dec 5, 2019
02cae4d
removing old proto apis
vincentpierre Dec 5, 2019
782584b
comment type fixes
vincentpierre Dec 5, 2019
d0b6d7d
added properties to AgentGroupSpec and edited the notebooks.
vincentpierre Dec 5, 2019
6d09c91
clearing the notebook outputs
vincentpierre Dec 5, 2019
82e5e43
Update gym-unity/gym_unity/tests/test_gym.py
vincentpierre Dec 6, 2019
16988d8
Update gym-unity/gym_unity/tests/test_gym.py
vincentpierre Dec 6, 2019
0fbd170
Update ml-agents-envs/mlagents/envs/base_env.py
vincentpierre Dec 6, 2019
8106f74
Update ml-agents-envs/mlagents/envs/base_env.py
vincentpierre Dec 6, 2019
cc4456a
addressing first comments
vincentpierre Dec 6, 2019
5a45d7b
NaN checks for rewards are back
vincentpierre Dec 6, 2019
5b8a354
restoring Union[int, Tuple[int, ...]] for action_shape
vincentpierre Dec 6, 2019
3d0cad7
Made BatchdStepResult an object
vincentpierre Dec 6, 2019
1148f35
Made _agent_id_to_index private
vincentpierre Dec 6, 2019
d25a885
Update ml-agents-envs/mlagents/envs/base_env.py
vincentpierre Dec 6, 2019
329fa34
replacing np.array with np.ndarray in typing
vincentpierre Dec 6, 2019
98e10fe
adding a new type for AgentGroup and AgentId
vincentpierre Dec 6, 2019
6da142d
fixing brain_info when vec_obs == 0
vincentpierre Dec 6, 2019
617a768
Docs ll api (#3047)
vincentpierre Dec 9, 2019
d8b52c2
adding a period
vincentpierre Dec 9, 2019
81acfaf
removing change log
vincentpierre Dec 9, 2019
1 change: 1 addition & 0 deletions docs/Migrating.md
@@ -3,6 +3,7 @@
## Migrating from master to develop

### Important changes
* The low level Python API has changed. See the [Low Level Python API documentation](Python-API.md) for more information. This should only affect you if you are writing a custom trainer; if you use `mlagents-learn` for training, this should be a transparent change.
* `CustomResetParameters` are now removed.
* `reset()` on the Low-Level Python API no longer takes a `train_mode` argument. To modify the performance/speed of the engine, you must use an `EngineConfigurationChannel`
* `reset()` on the Low-Level Python API no longer takes a `config` argument. `UnityEnvironment` no longer has a `reset_parameters` field. To modify float properties in the environment, you must use a `FloatPropertiesChannel`. For more information, refer to the [Low Level Python API documentation](Python-API.md)
220 changes: 144 additions & 76 deletions docs/Python-API.md
@@ -1,19 +1,17 @@
# Unity ML-Agents Python Interface and Trainers

The `mlagents` Python package is part of the [ML-Agents
Toolkit](https://github.com/Unity-Technologies/ml-agents). `mlagents` provides a
Python API that allows direct interaction with the Unity game engine as well as
a collection of trainers and algorithms to train agents in Unity environments.
# Unity ML-Agents Python Low Level API

The `mlagents` Python package contains two components: a low level API which
allows you to interact directly with a Unity Environment (`mlagents.envs`) and
an entry point to train (`mlagents-learn`) which allows you to train agents in
Unity Environments using our implementations of reinforcement learning or
imitation learning.

You can use the Python Low Level API to interact directly with your learning
environment, and use it to develop new learning algorithms.

## mlagents.envs

The ML-Agents Toolkit provides a Python API for controlling the Agent simulation
The ML-Agents Toolkit Low Level API is a Python API for controlling the simulation
loop of an environment or game built with Unity. This API is used by the
training algorithms inside the ML-Agent Toolkit, but you can also write your own
Python programs using this API. Go [here](../notebooks/getting-started.ipynb)
@@ -24,25 +22,31 @@ The key objects in the Python API include:
- **UnityEnvironment** — the main interface between the Unity application and
your code. Use UnityEnvironment to start and control a simulation or training
session.
- **BrainInfo** — contains all the data from Agents in the simulation, such as
observations and rewards.
- **BrainParameters** — describes the data elements in a BrainInfo object. For
example, provides the array length of an observation in BrainInfo.
- **BatchedStepResult** — contains the data from Agents belonging to the same
"AgentGroup" in the simulation, such as observations and rewards.
- **AgentGroupSpec** — describes the shape of the data inside a BatchedStepResult.
For example, provides the dimensions of the observations of a group.

These classes are all defined in the `ml-agents/mlagents/envs` folder of
the ML-Agents SDK.
These classes are all defined in the [base_env](../ml-agents-envs/mlagents/envs/base_env.py)
script.

An Agent Group is a group of Agents identified by a string name that share the same
observations and action types. You can think of an Agent Group as a group of agents
that will share the same policy or behavior. All Agents in a group have the same goal
and reward signals.

To communicate with an Agent in a Unity environment from a Python program, the
Agent must use a LearningBrain.
Your code is expected to return
actions for Agents with LearningBrains.
Agent in the simulation must have `Behavior Parameters` set to communicate. You
must set the `Behavior Type` to `Default` and give it a `Behavior Name`.

__Note__: The `Behavior Name` corresponds to the Agent Group name on the Python side.

_Notice: Currently communication between Unity and Python takes place over an
open socket without authentication. As such, please make sure that the network
where training takes place is secure. This will be addressed in a future
release._

### Loading a Unity Environment
## Loading a Unity Environment

Python-side communication happens through `UnityEnvironment` which is located in
`ml-agents/mlagents/envs`. To load a Unity environment from a built binary
@@ -51,7 +55,7 @@ of your Unity environment is 3DBall.app, in python, run:

```python
from mlagents.envs.environment import UnityEnvironment
env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
env = UnityEnvironment(file_name="3DBall", base_port=5005, seed=1, side_channels=[])
```

- `file_name` is the name of the environment binary (located in the root
@@ -62,6 +66,9 @@ env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
training process. In environments which do not involve physics calculations,
setting the seed enables reproducible experimentation by ensuring that the
environment and trainers utilize the same random seed.
- `side_channels` provides a way to exchange data with the Unity simulation that
  is not related to the reinforcement learning loop, for example configurations
  or properties. More on them in the [Modifying the environment from Python](Python-API.md#modifying-the-environment-from-python) section; a minimal sketch of passing one at construction time is shown below.
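For instance, the lines below are a minimal sketch of passing a side channel at construction time in order to speed up the simulation. The import path of `EngineConfigurationChannel` and its `set_configuration_parameters` helper are assumptions about the current `ml-agents-envs` layout; verify them against your installed version.

```python
from mlagents.envs.environment import UnityEnvironment
# Assumed import path; check where EngineConfigurationChannel lives in your version.
from mlagents.envs.side_channel.engine_configuration_channel import EngineConfigurationChannel

engine_channel = EngineConfigurationChannel()
env = UnityEnvironment(
    file_name="3DBall",
    base_port=5005,
    seed=1,
    side_channels=[engine_channel],  # data unrelated to the RL loop goes through here
)
# Assumed helper: run the simulation 20x faster than real time during training.
engine_channel.set_configuration_parameters(time_scale=20.0)
```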

If you want to directly interact with the Editor, you need to use
`file_name=None`, then press the :arrow_forward: button in the Editor when the
Expand All @@ -70,59 +77,125 @@ displayed on the screen

### Interacting with a Unity Environment

A BrainInfo object contains the following fields:

- **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of
the list corresponds to the n<sup>th</sup> observation of the Brain.
- **`vector_observations`** : A two dimensional numpy array of dimension `(batch
size, vector observation size)`.
- **`rewards`** : A list as long as the number of Agents using the Brain
containing the rewards they each obtained at the previous step.
- **`local_done`** : A list as long as the number of Agents using the Brain
containing `done` flags (whether or not the Agent is done).
- **`max_reached`** : A list as long as the number of Agents using the Brain
containing true if the Agents reached their max steps.
- **`agents`** : A list of the unique ids of the Agents using the Brain.

Once loaded, the UnityEnvironment object, referenced by a variable named `env`
in this example, can be used in the following way:

- **Print : `print(str(env))`**
Prints all parameters relevant to the loaded environment and the
Brains.
- **Reset : `env.reset()`**
Send a reset signal to the environment, and provides a dictionary mapping
Brain names to BrainInfo objects.
- **Step : `env.step(action)`**
Sends a step signal to the environment using the actions. For each Brain :
- `action` can be one dimensional arrays or two dimensional arrays if you have
multiple Agents per Brain.

Returns a dictionary mapping Brain names to BrainInfo objects.

For example, to access the BrainInfo belonging to a Brain called
'brain_name', and the BrainInfo field 'vector_observations':

```python
info = env.step()
brainInfo = info['brain_name']
observations = brainInfo.vector_observations
```

Note that if you have more than one LearningBrain in the scene, you
must provide dictionaries from Brain names to arrays for `action`, `memory`
and `value`. For example: If you have two Learning Brains named `brain1` and
`brain2` each with one Agent taking two continuous actions, then you can
have:

```python
action = {'brain1':[1.0, 2.0], 'brain2':[3.0,4.0]}
```

Returns a dictionary mapping Brain names to BrainInfo objects.
- **Close : `env.close()`**
Sends a shutdown signal to the environment and closes the communication
socket.
#### The BaseEnv interface

A `BaseEnv` has the following methods:

- **Reset : `env.reset()`** Sends a signal to reset the environment. Returns None.
- **Step : `env.step()`** Sends a signal to step the environment. Returns None.
Note that a "step" for Python does not correspond to either Unity `Update` nor
`FixedUpdate`. When `step()` or `reset()` is called, the Unity simulation will
move forward until an Agent in the simulation needs a input from Python to act.
- **Close : `env.close()`** Sends a shutdown signal to the environment and terminates
the communication.
- **Get Agent Group Names : `env.get_agent_groups()`** Returns a list of agent group ids.
Note that the number of groups can change over time if new agent groups are
created during the simulation.
- **Get Agent Group Spec : `env.get_agent_group_spec(agent_group: str)`** Returns
the `AgentGroupSpec` corresponding to the agent_group given as input. An
`AgentGroupSpec` contains information such as the observation shapes, the action
type (multi-discrete or continuous) and the action shape. Note that the `AgentGroupSpec`
for a specific group is fixed throughout the simulation.
- **Get Batched Step Result for Agent Group : `env.get_step_result(agent_group: str)`**
Returns a `BatchedStepResult` corresponding to the agent_group given as input.
A `BatchedStepResult` contains information about the state of the agents in a group
such as the observations, the rewards, the done flags and the agent identifiers. The
data is stored in `np.array`s whose first dimension is always the number of agents that
requested a decision in the simulation since the last call to `env.step()`. Note that the
number of agents is not guaranteed to remain constant during the simulation.
- **Set Actions for Agent Group : `env.set_actions(agent_group: str, action: np.array)`**
Sets the actions for a whole agent group. `action` is a 2D `np.array` of `dtype=np.int32`
in the discrete action case and `dtype=np.float32` in the continuous action case.
The first dimension of `action` is the number of agents that requested a decision
since the last call to `env.step()`. The second dimension is the number of action branches
for the multi-discrete action type and the number of actions for the continuous action type.
- **Set Action for Agent : `env.set_action_for_agent(agent_group: str, agent_id: int, action: np.array)`**
Sets the action for a specific Agent in an agent group. `agent_group` is the name of the
group the Agent belongs to and `agent_id` is the integer identifier of the Agent. `action`
is a 1D array of `dtype=np.int32` with size equal to the number of action branches for the
multi-discrete action type, or a 1D array of `dtype=np.float32` with size equal to the
number of actions for the continuous action type.


__Note:__ If no action is provided for an agent group between two calls to `env.step()`, then
the default action will be all zeros (in either discrete or continuous action space).
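Putting these methods together, the following is a minimal sketch of an interaction loop that sends random actions to a single agent group. It assumes a continuous action space (as in 3DBall) and is only an illustration, not the trainer used by `mlagents-learn`.

```python
import numpy as np
from mlagents.envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="3DBall", base_port=5005, seed=1, side_channels=[])
env.reset()

group_name = env.get_agent_groups()[0]            # first (and here, only) agent group
group_spec = env.get_agent_group_spec(group_name)

for _ in range(500):
    step_result = env.get_step_result(group_name)
    n_agents = step_result.n_agents()              # agents that requested a decision
    # Random continuous actions of shape (number of agents, action size).
    action = np.random.randn(n_agents, group_spec.action_size).astype(np.float32)
    env.set_actions(group_name, action)
    env.step()

env.close()
```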
#### BatchedStepResult and StepResult

A `BatchedStepResult` has the following fields:

- `obs` is a list of numpy arrays containing the observations collected by the group
  of agents. The first dimension of each array corresponds to the batch size of
  the group (the number of agents that requested a decision since the last call to
  `env.step()`).
- `reward` is a float vector of length batch size. Corresponds to the
rewards collected by each agent since the last simulation step.
- `done` is an array of booleans of length batch size. Is true if the
associated Agent was terminated during the last simulation step.
- `max_step` is an array of booleans of length batch size. Is true if the
associated Agent reached its maximum number of steps during the last
simulation step.
- `agent_id` is an int vector of length batch size containing the unique
  identifier of the corresponding Agent. This is used to track Agents
  across simulation steps.
- `action_mask` is an optional list of two-dimensional arrays of booleans,
  only available for the multi-discrete action type.
  Each array corresponds to an action branch. The first dimension of each
  array is the batch size and the second contains a mask for each action of
  the branch. If true, the action is not available to the agent during
  this simulation step.

It also has the following two methods:

- `n_agents()` Returns the number of agents that requested a decision since
  the last call to `env.step()`.
- `get_agent_step_result(agent_id: int)` Returns a `StepResult`
  for the Agent with the `agent_id` unique identifier.

A `StepResult` has the following fields:

- `obs` is a list of numpy arrays containing the observations collected by the
  agent. (Each array has one less dimension than the corresponding array in `BatchedStepResult`.)
- `reward` is a float. Corresponds to the reward collected by the agent
  since the last simulation step.
- `done` is a bool. Is true if the Agent was terminated during the last
simulation step.
- `max_step` is a bool. Is true if the Agent reached its maximum number of
steps during the last simulation step.
- `agent_id` is an int and a unique identifier for the corresponding Agent.
- `action_mask` is an optional list of one-dimensional arrays of booleans,
  only available for the multi-discrete action type.
  Each array corresponds to an action branch and contains a mask
  for each action of the branch. If true, the action is not available to
  the agent during this simulation step.
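As a concrete illustration, the sketch below (assuming `env` and `group_name` were set up as in the loop above) reads the batched result for a group and then the per-agent view of a single Agent:

```python
step_result = env.get_step_result(group_name)

print("agents in batch :", step_result.n_agents())
print("first obs shape :", step_result.obs[0].shape)   # (batch size, *observation dims)
print("rewards         :", step_result.reward)
print("done flags      :", step_result.done)

# Per-agent view: take the first agent id in the batch and get its StepResult.
first_id = step_result.agent_id[0]
agent_result = step_result.get_agent_step_result(first_id)
print("agent reward    :", agent_result.reward)         # a single float
print("agent obs shape :", agent_result.obs[0].shape)   # one less dimension than above
```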

#### AgentGroupSpec

An Agent Group can have either discrete or continuous actions. To check which type
it is, use `spec.is_action_discrete()` or `spec.is_action_continuous()`. If discrete,
the action tensors are expected to be `np.int32`. If continuous, the actions are
expected to be `np.float32`.

An `AgentGroupSpec` has the following fields:

- `observation_shapes` is a List of Tuples of int: each Tuple corresponds
  to an observation's dimensions (without the number of agents dimension).
  The shape tuples have the same ordering as the observation lists of
  BatchedStepResult and StepResult.
- `action_type` is the type of data of the action. It can be discrete or
  continuous. If discrete, the action tensors are expected to be `np.int32`. If
  continuous, the actions are expected to be `np.float32`.
- `action_size` is an `int` corresponding to the expected dimension of the action
array.
- In continuous action space it is the number of floats that constitute the action.
  - In discrete action space (same as multi-discrete) it corresponds to the
    number of branches (the number of independent actions).
- `discrete_action_branches` is a Tuple of int, only present for the discrete action space. Each int
  corresponds to the number of different options for each branch of the action.
  For example: in a game with a direction input (no movement, left, right) and a jump input
  (no jump, jump), there will be two branches (direction and jump), the first one with 3
  options and the second with 2 options (`action_size = 2` and
  `discrete_action_branches = (3,2,)`).
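To make these shapes concrete, here is a small sketch (again assuming `env` and `group_name` from the earlier examples) that inspects a spec and builds an all-zero action batch with the dtype and shape the group expects:

```python
import numpy as np

spec = env.get_agent_group_spec(group_name)
print("observation shapes :", spec.observation_shapes)

n_agents = env.get_step_result(group_name).n_agents()
if spec.is_action_continuous():
    # (batch size, number of floats per action)
    action = np.zeros((n_agents, spec.action_size), dtype=np.float32)
else:
    # (batch size, number of branches); each entry must be a valid option index,
    # i.e. smaller than the matching entry of discrete_action_branches.
    print("branch sizes :", spec.discrete_action_branches)
    action = np.zeros((n_agents, spec.action_size), dtype=np.int32)
env.set_actions(group_name, action)
```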


### Modifying the environment from Python
The Environment can be modified by using side channels to send data to the
@@ -194,8 +267,3 @@ var academy = FindObjectOfType<Academy>();
var sharedProperties = academy.FloatProperties;
float property1 = sharedProperties.GetPropertyWithDefault("parameter_1", 0.0f);
```
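On the Python side, the counterpart to the C# snippet above is a sketch along these lines. The import path and method names of `FloatPropertiesChannel` are assumptions about the current `ml-agents-envs` layout, so check them against your installed version.

```python
from mlagents.envs.environment import UnityEnvironment
# Assumed import path; check where FloatPropertiesChannel lives in your version.
from mlagents.envs.side_channel.float_properties_channel import FloatPropertiesChannel

float_props = FloatPropertiesChannel()
env = UnityEnvironment(file_name="3DBall", side_channels=[float_props])

# Assumed setter: the C# code above should then read 2.0 for "parameter_1".
float_props.set_property("parameter_1", 2.0)
env.reset()
```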

## mlagents-learn

For more detailed documentation on using `mlagents-learn`, check out
[Training ML-Agents](Training-ML-Agents.md)
40 changes: 28 additions & 12 deletions gym-unity/gym_unity/envs/__init__.py
@@ -4,6 +4,10 @@
import numpy as np
from mlagents.envs.environment import UnityEnvironment
from gym import error, spaces
from mlagents.envs.brain_conversion_utils import (
step_result_to_brain_info,
group_spec_to_brain_parameters,
)


class UnityGymException(error.Error):
@@ -53,10 +57,9 @@ def __init__(
)

# Take a single step so that the brain information will be sent over
if not self._env.brains:
if not self._env.get_agent_groups():
self._env.step()

self.name = self._env.academy_name
self.visual_obs = None
self._current_state = None
self._n_agents = None
@@ -67,18 +70,17 @@ def __init__(
self._allow_multiple_visual_obs = allow_multiple_visual_obs

# Check brain configuration
if len(self._env.brains) != 1:
if len(self._env.get_agent_groups()) != 1:
raise UnityGymException(
"There can only be one brain in a UnityEnvironment "
"if it is wrapped in a gym."
)
if len(self._env.external_brain_names) <= 0:
raise UnityGymException(
"There are not any external brain in the UnityEnvironment"
)

self.brain_name = self._env.external_brain_names[0]
brain = self._env.brains[self.brain_name]
self.brain_name = self._env.get_agent_groups()[0]
self.name = self.brain_name
brain = group_spec_to_brain_parameters(
self.brain_name, self._env.get_agent_group_spec(self.brain_name)
)

if use_visual and brain.number_visual_observations == 0:
raise UnityGymException(
@@ -103,7 +105,11 @@ def __init__(
)

# Check for number of agents in scene.
initial_info = self._env.reset()[self.brain_name]
self._env.reset()
initial_info = step_result_to_brain_info(
self._env.get_step_result(self.brain_name),
self._env.get_agent_group_spec(self.brain_name),
)
self._check_agents(len(initial_info.agents))

# Set observation and action spaces
@@ -153,7 +159,11 @@ def reset(self):
Returns: observation (object/list): the initial observation of the
space.
"""
info = self._env.reset()[self.brain_name]
self._env.reset()
info = step_result_to_brain_info(
self._env.get_step_result(self.brain_name),
self._env.get_agent_group_spec(self.brain_name),
)
n_agents = len(info.agents)
self._check_agents(n_agents)
self.game_over = False
@@ -201,7 +211,13 @@ def step(self, action):
# Translate action into list
action = self._flattener.lookup_action(action)

info = self._env.step(action)[self.brain_name]
spec = self._env.get_agent_group_spec(self.brain_name)
action = np.array(action).reshape((self._n_agents, spec.action_size))
self._env.set_actions(self.brain_name, action)
self._env.step()
info = step_result_to_brain_info(
self._env.get_step_result(self.brain_name), spec
)
n_agents = len(info.agents)
self._check_agents(n_agents)
self._current_state = info