  • Install dependencies: pip install docker grpcio grpcio-tools dm_env_rpc
  • Build the docker image from the app/ directory: docker build -t basic_example .
  • Launch a container from the Docker image, then run the Python client:

python dm_env_rpc_example_code/

This Repo Includes

To Do (Pull Requests/Collaboration Accepted)

The picture (using_unity_to_help_solve_intelligence) is an outline from the paper:

Agent/Learning Interface (Outside the Black Box)

  • End-to-end example training a model and solving an env using an existing RL library like Acme (currently just an env that does 100 timesteps).
  • Be able to use a trained model to perform inference, both outside the env and maybe embedded in the env.
  • Add an OpenAI Gym adaptor to expose the env in Gym format.
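As a rough illustration of the Gym adaptor item above, here is a minimal sketch of wrapping a dm_env-style environment so that it returns classic Gym-style `(obs, reward, done, info)` tuples. `TimeStep`, `CountingEnv`, and `GymStyleAdaptor` are hypothetical stand-ins written for this example. They are not the real `dm_env` or `gym` classes, and a real adaptor would also need to translate the observation and action specs.

```python
from typing import Any, NamedTuple, Optional, Tuple


class TimeStep(NamedTuple):
    """Hypothetical stand-in mirroring the fields of a dm_env TimeStep."""
    step_type: str  # "FIRST", "MID", or "LAST"
    reward: Optional[float]
    discount: Optional[float]
    observation: Any


class CountingEnv:
    """Hypothetical dm_env-style env that ends after `max_steps` steps."""

    def __init__(self, max_steps: int = 100):
        self._max_steps = max_steps
        self._t = 0

    def reset(self) -> TimeStep:
        self._t = 0
        return TimeStep("FIRST", None, None, self._t)

    def step(self, action: Any) -> TimeStep:
        self._t += 1
        step_type = "LAST" if self._t >= self._max_steps else "MID"
        return TimeStep(step_type, 1.0, 1.0, self._t)


class GymStyleAdaptor:
    """Wraps a dm_env-style env and exposes classic Gym-style reset/step."""

    def __init__(self, env):
        self._env = env

    def reset(self) -> Any:
        # Gym-style reset returns only the first observation.
        return self._env.reset().observation

    def step(self, action: Any) -> Tuple[Any, float, bool, dict]:
        ts = self._env.step(action)
        reward = 0.0 if ts.reward is None else ts.reward
        done = ts.step_type == "LAST"
        # The discount has no Gym equivalent, so it is passed through info.
        return ts.observation, reward, done, {"discount": ts.discount}


env = GymStyleAdaptor(CountingEnv(max_steps=3))
first_obs = env.reset()
transition = env.step(None)
```

Learners that expect the older four-tuple Gym API could consume `env.step(...)` directly. Newer Gym/Gymnasium versions split `done` into `terminated` and `truncated`, which this sketch does not model.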

The Black Box

  • More black boxes to test algorithms/RL libraries on
  • Better Docker documentation, better Docker practices, and versioning. My Docker is pretty rough.
  • Streamline the headless Unity mode? The DeepMind paper uses a different rendering solution, but I'm not sure which is optimal.

The communication layer: gRPC and dm_env_rpc improvements

  • OpenAI Gym adaptor, similar to the dm_env adaptor, for RL learners that only work with Gym-style outputs
  • Better packing and unpacking of C# code (similar to this dm_env_rpc Python file)
  • Figure out whether the current version of Unity works with gRPC and what steps need to be taken (see this issue: grpc/grpc-dotnet#1309)
  • Debug the dm_env_rpc proto import bug
  • Multi-agent support
  • Confirm I'm handling the gRPC process the correct way
  • Dynamic port settings
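For the dynamic port item above, one possible approach on the Python client side is to ask the operating system for a free port instead of hard-coding one. This is only a sketch: `find_free_port` is a hypothetical helper, and the chosen port would still need to be passed to the container and the Unity server, which is not shown here.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (host, port) for an AF_INET socket.
        return sock.getsockname()[1]


port = find_free_port()
server_address = f"localhost:{port}"
```

Note that the port is released as soon as the helper returns, so another process could claim it before the server binds. For a local development setup this race is usually acceptable.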

The session layer

  • Synchronization
  • Handle multiple connections to the black box, including the synchronization and RequestQueue concerns this raises
  • Have WorldTimeManager manage world time
  • Make RequestQueue more robust and think its design through fully: should it push or pull?
  • Abstract part of the AgentSession class, inherit from base for specific examples
  • Fully implement the request and response fields, e.g. on a ResetRequest with the settings field sent over gRPC, the env currently takes no action.
  • Error handling
  • Exception handling
  • Figure out what the SessionFactory is and does
  • Dynamic configuration of settings
  • Fill out the TensorUtilities.cs file for other requests and responses
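To make the push-or-pull question for RequestQueue concrete, here is a minimal sketch of the pull variant: connection threads enqueue requests, and the world loop drains the queue once per tick on its own thread. It is written in Python for brevity rather than C#. The class and its `submit` and `drain` methods are hypothetical and do not reflect the repo's current RequestQueue.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable


class RequestQueue:
    """Hypothetical pull-style queue: producers enqueue, the world loop drains."""

    def __init__(self) -> None:
        self._pending: queue.Queue = queue.Queue()

    def submit(self, request: Any) -> Future:
        """Called from a connection thread; returns a Future for the response."""
        future: Future = Future()
        self._pending.put((request, future))
        return future

    def drain(self, handler: Callable[[Any], Any]) -> int:
        """Called once per world tick; handles everything queued so far."""
        handled = 0
        while True:
            try:
                request, future = self._pending.get_nowait()
            except queue.Empty:
                return handled
            try:
                future.set_result(handler(request))
            except Exception as exc:
                # Report handler failures to the caller instead of
                # crashing the world loop.
                future.set_exception(exc)
            handled += 1


rq = RequestQueue()
futures = []


def client(i: int) -> None:
    futures.append(rq.submit({"step": i}))


threads = [threading.Thread(target=client, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

handled = rq.drain(lambda req: {"ack": req["step"]})
acks = sorted(f.result()["ack"] for f in futures)
```

Pulling keeps all world mutation on one thread, which sidesteps most synchronization concerns with multiple connections. The trade-off is that each request waits for the next tick before it is answered.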

The interface layer

  • Abstract a base class, then inherit from it for specific Avatar and Task implementations.
  • Figure out the best Avatar and Task setup. The paper has Tasks spawning Avatars, while I instantiate one of each and don't link them explicitly.
  • Abstract and dynamically define obs and action specs, obs, actions, rewards, and episode parameters. Maybe through GameObject editing.
  • Make Sensors more robust so a variety of obs can be returned.
  • Make Avatar scan for Sensors and Actuators.
  • Allow Avatars to handle a variety of actions.
  • Sequence Avatar better so JoinWorld and Reset requests can alter the settings
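The items about an Avatar scanning for Sensors and Actuators and about defining specs dynamically can be sketched as follows. This is a Python illustration of the idea only. The `Sensor`, `Actuator`, and `Avatar` classes here are hypothetical, and in Unity the discovery step would more likely go through component lookup on the GameObject than through an explicit list.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Sensor(ABC):
    name: str

    @abstractmethod
    def read(self) -> Any:
        """Return the current observation for this sensor."""


class Actuator(ABC):
    name: str

    @abstractmethod
    def apply(self, value: Any) -> None:
        """Apply an action value to the world state."""


class PositionSensor(Sensor):
    name = "position"

    def __init__(self, state: Dict[str, float]) -> None:
        self._state = state

    def read(self) -> float:
        return self._state["x"]


class MoveActuator(Actuator):
    name = "move"

    def __init__(self, state: Dict[str, float]) -> None:
        self._state = state

    def apply(self, value: float) -> None:
        self._state["x"] += value


class Avatar:
    """Discovers its sensors and actuators from a list of components."""

    def __init__(self, components: List[object]) -> None:
        self.sensors = {c.name: c for c in components if isinstance(c, Sensor)}
        self.actuators = {c.name: c for c in components if isinstance(c, Actuator)}

    def observation_spec(self) -> List[str]:
        # The spec is derived from whatever sensors were found.
        return sorted(self.sensors)

    def action_spec(self) -> List[str]:
        return sorted(self.actuators)

    def act(self, actions: Dict[str, Any]) -> None:
        for name, value in actions.items():
            self.actuators[name].apply(value)

    def observe(self) -> Dict[str, Any]:
        return {name: sensor.read() for name, sensor in self.sensors.items()}


state = {"x": 0.0}
avatar = Avatar([PositionSensor(state), MoveActuator(state)])
avatar.act({"move": 2.0})
observation = avatar.observe()
```

Because the specs are derived from the discovered components, adding a new Sensor or Actuator subclass extends the observation and action specs without changing the Avatar itself.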