Agents

Acme includes a number of pre-built agents listed below. These are all single-process agents. While there is currently no plan to release the distributed variants of these agents, they share the exact same learning and acting code as their single-process counterparts available in this repository.

We've also grouped the agents below into separate sections based on their use cases, although these distinctions are often subtle. For more information on each implementation, see the relevant agent-specific README.
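All of these agents implement the same actor interface, so they plug into Acme's environment loop in the same way. Below is a minimal sketch of that pattern using the TF DQN agent on a Gym task; the network and constructor arguments are illustrative and may differ between agents and Acme versions, so check the quickstart and agent-specific READMEs for the exact signatures.

```python
import acme
from acme import specs
from acme import wrappers
from acme.agents.tf import dqn
import gym
import sonnet as snt

# Wrap a Gym task so it exposes the dm_env interface Acme expects.
environment = wrappers.SinglePrecisionWrapper(
    wrappers.GymWrapper(gym.make('CartPole-v1')))
spec = specs.make_environment_spec(environment)

# A small feed-forward Q-network sized to the environment's action space.
network = snt.Sequential([
    snt.Flatten(),
    snt.nets.MLP([64, 64, spec.actions.num_values]),
])

# A single-process agent: its actor and learner run in the same process.
agent = dqn.DQN(environment_spec=spec, network=network)

# The environment loop interleaves acting and learning.
loop = acme.EnvironmentLoop(environment, agent)
loop.run(num_episodes=100)
```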

Continuous control

Acme has long had a focus on continuous control agents (i.e. settings where the action space is continuous). The following agents focus on this setting; a short construction sketch follows the table.

| Agent | Paper | Code |
| --- | --- | --- |
| Deep Deterministic Policy Gradient (DDPG) | Lillicrap et al., 2015 | TF |
| Distributed Distributional Deterministic Policy Gradients (D4PG) | Barth-Maron et al., 2018 | TF, JAX (coming soon!) |
| Maximum a posteriori Policy Optimisation (MPO) | Abdolmaleki et al., 2018 | TF |
| Distributional Maximum a posteriori Policy Optimisation (DMPO) | - | TF |
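As a rough sketch of how one of these continuous-control agents is assembled, the snippet below builds D4PG's policy and distributional critic networks from the helpers in acme.tf.networks. The layer sizes and value-distribution bounds are illustrative defaults, and constructor arguments can vary across versions, so see the D4PG README for the exact interface.

```python
import gym
import numpy as np
import sonnet as snt
from acme import specs, wrappers
from acme.agents.tf import d4pg
from acme.tf import networks

# A continuous-control task wrapped to the dm_env interface.
environment = wrappers.SinglePrecisionWrapper(
    wrappers.GymWrapper(gym.make('MountainCarContinuous-v0')))
spec = specs.make_environment_spec(environment)
num_dimensions = np.prod(spec.actions.shape, dtype=int)

# Deterministic policy network, squashed into the action spec's bounds.
policy_network = snt.Sequential([
    networks.LayerNormMLP((256, 256, 256)),
    networks.NearZeroInitializedLinear(num_dimensions),
    networks.TanhToSpec(spec.actions),
])

# Distributional critic: concatenate observation and action, then predict a
# categorical value distribution over a fixed support.
critic_network = snt.Sequential([
    networks.CriticMultiplexer(),
    networks.LayerNormMLP((512, 512, 256), activate_final=True),
    networks.DiscreteValuedHead(vmin=-150., vmax=150., num_atoms=51),
])

agent = d4pg.D4PG(
    environment_spec=spec,
    policy_network=policy_network,
    critic_network=critic_network,
)
```

The resulting agent can then be dropped into acme.EnvironmentLoop exactly as in the DQN sketch above.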

Discrete control

We also include a number of agents built with discrete action spaces in mind. Note that the distinction between these agents and the continuous agents listed above can be somewhat arbitrary. For example, IMPALA could also be implemented for continuous action spaces, but here we focus on its discrete-action variant.

| Agent | Paper | Code |
| --- | --- | --- |
| Deep Q-Networks (DQN) | Horgan et al., 2018 | TF, JAX |
| Importance-Weighted Actor-Learner Architectures (IMPALA) | Espeholt et al., 2018 | TF, JAX |
| Recurrent Replay Distributed DQN (R2D2) | Kapturowski et al., 2019 | TF |

Batch RL

The structure of Acme also lends itself quite nicely to "learner-only" algorithms for use in batch RL (with no environment interactions); a sketch of this pattern follows the table below. Implemented algorithms include:

| Agent | Paper | Code |
| --- | --- | --- |
| Behavior Cloning (BC) | - | TF |
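Because these algorithms are learner-only, they can be trained directly from a fixed dataset with no actor or environment loop. The sketch below assumes the TF BC learner takes a policy network, a learning rate and a tf.data.Dataset of batched demonstration transitions, and exposes a step() method; the dataset construction and constructor arguments here are placeholders, so consult the BC agent's README for the exact interface.

```python
import sonnet as snt
from acme.agents.tf import bc

# Placeholder: a tf.data.Dataset of batched demonstration transitions,
# e.g. loaded from disk; no environment interaction is involved.
demonstration_dataset = ...
num_actions = 4  # placeholder: size of the task's discrete action space

# Policy network that maps observations to action logits.
policy_network = snt.nets.MLP([256, 256, num_actions])

# Learner-only training: repeatedly take gradient steps on the fixed dataset.
learner = bc.BCLearner(
    network=policy_network,
    learning_rate=1e-4,
    dataset=demonstration_dataset,
)
for _ in range(10_000):
  learner.step()
```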

Learning from demonstrations

Acme also makes it easy to combine active data acquisition with data from demonstrations. Such algorithms include:

| Agent | Paper | Code |
| --- | --- | --- |
| Deep Q-Learning from Demonstrations (DQfD) | Hester et al., 2017 | TF |
| Recurrent Replay Distributed DQN from Demonstrations (R2D3) | Gulcehre et al., 2020 | TF |

Model-based RL

Finally, Acme includes a variant of MCTS that can be used for model-based RL with either a given or a learned simulator:

| Agent | Paper | Code |
| --- | --- | --- |
| Monte-Carlo Tree Search (MCTS) | Silver et al., 2018 | TF |