Marathon Environments

A set of high-dimensional continuous control environments for use with Unity ML-Agents Toolkit.

MarathonEnvs

MarathonEnvs enables the reproduction of classic continuous-control benchmarks, such as those from the DeepMind Control Suite and the OpenAI Gym MuJoCo environments, within Unity ML-Agents using Unity's native physics simulator, PhysX. MarathonEnvs may be useful for:

  • Video game researchers interested in applying bleeding-edge robotics research to locomotion and AI for video games.
  • Traditional academic researchers looking to leverage the strengths of Unity and ML-Agents along with the body of existing research and benchmarks provided by projects such as the DeepMind Control Suite or OpenAI MuJoCo environments.

Note: This project is the result of a contribution from Joe Booth (@Sohojo), a member of the Unity community who currently maintains the repository. As such, the contents of this repository are not officially supported by Unity Technologies.


Getting Started

* * * New Tutorial: Getting Started With MarathonEnvs * * *

The tutorial covers:

  • How to set up your development environment (Unity, MarathonEnvs + ML-Agents + TensorFlowSharp)
  • How to run each agent with their pre-trained models.
  • How to retrain the hopper agent and follow training in TensorBoard (the commands are sketched below).
  • How to modify the hopper reward function and train it to jump.
  • See tutorial here
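
For reference, retraining with the ML-Agents 0.6 toolchain follows the standard mlagents-learn workflow; a minimal sketch (the run-id is an arbitrary label you choose):

```sh
# Train using the hyperparameters shipped with this repo, then press Play in the Unity editor
mlagents-learn config/marathon_envs_config.yaml --train --run-id=hopper-01

# In a second terminal: follow the reward curves (summaries/ is ml-agents' default output folder)
tensorboard --logdir=summaries
```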

Requirements

  • Unity 2018.3 (Download here).

Project Organization

  • Marathon Environments specific folders & files:
    • UnitySDK\Assets\MarathonEnvs - The core project directory
    • config\marathon_envs_config.yaml - Config file for use when training with ml-agents
    • README.md - Read me for marathon-envs
  • Marathon Environments now includes ML-Agents Toolkit v0.6 (Learn more here). All other files and folders are for ML-Agents. We do not include the ML-Agents examples and documentation, to keep the repo size down.

Publications & Usage

An early version of this work was presented on March 19, 2018 at the AI Summit, Game Developers Conference 2018: http://schedule.gdconf.com/session/beyond-bots-making-machine-learning-accessible-and-useful/856147

Active Research using ML-Agents + MarathonEnvs


Support and Contributing

Support: Post an issue if you are having problems or need help getting an XML model working.

Contributing: ML-Agents 0.6 supports the Gym interface. It would be of value to the community to reproduce more benchmarks and create a set of sample code for various algorithms. This would be a great way for someone looking to gain some experience with Reinforcement Learning. I would gladly support and / or partner. Please post an issue if you are interested. Here are some ideas:
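
For example, ml-agents' gym wrapper lets standard RL sample code drive a MarathonEnvs build directly. A minimal sketch, assuming an exported Unity binary at ./envs/hopper (a placeholder path):

```python
from gym_unity.envs import UnityEnv  # gym wrapper shipped with ml-agents 0.6

# Wrap an exported Unity build of a MarathonEnvs scene (placeholder path)
env = UnityEnv("./envs/hopper", worker_id=0, use_visual=False)

obs = env.reset()
done, total_reward = False, 0.0
while not done:
    # Random policy as a stand-in for your algorithm of choice
    obs, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
print("episode reward:", total_reward)
env.close()
```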


Included Environments

Humanoid

DeepMindHumanoid
  • Set-up: Complex (DeepMind) Humanoid agent.
  • Goal: The agent must move its body toward the goal as quickly as possible without falling.
  • Agents: The environment contains 16 independent agents linked to a single brain.
  • Agent Reward Function (see the sketch after this section):
    • Reference OpenAI.Roboschool and / or DeepMind
      • -joints at limit penalty
      • -effort penalty (ignores hip_y and knee)
      • +velocity
      • -height penalty if below 1.2m
    • Inspired by Deliberate Practice (currently, only does legs)
      • +facing upright bonus for shoulders, waist, pelvis
      • +facing target bonus for shoulders, waist, pelvis
      • -non-straight thigh penalty
      • +leg phase bonus (for height of knees)
      • +0.01 times body direction alignment with goal direction.
      • -0.01 times head velocity difference from body velocity.
  • Agent Terminate Function:
    • TerminateOnNonFootHitTerrain - Agent terminates when a body part other than foot collides with the terrain.
  • Brains: One brain with the following observation/action space.
    • Vector Observation space: (Continuous) 88 variables
    • Vector Action space: (Continuous) Size of 21 corresponding to target rotations applicable to the joints.
    • Visual Observations: None.
  • Reset Parameters: None.
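
How these terms combine is implemented in the C# agent scripts; purely as an illustration, the shape is roughly the following (the two 0.01 weights come from the list above; every other weight and argument name is a hypothetical placeholder, not the repository's actual code):

```python
import numpy as np

def humanoid_reward(velocity, effort, joints_at_limit_penalty, height,
                    upright_bonus, facing_target_bonus, thigh_penalty,
                    leg_phase_bonus, body_dir, goal_dir, head_vel, body_vel):
    """Illustrative combination of the reward terms listed above."""
    reward = velocity                     # +velocity toward the goal
    reward -= effort                      # -effort penalty (ignores hip_y and knee)
    reward -= joints_at_limit_penalty     # -joints-at-limit penalty
    if height < 1.2:                      # -height penalty if below 1.2m
        reward -= 1.0                     # placeholder magnitude
    reward += upright_bonus + facing_target_bonus + leg_phase_bonus
    reward -= thigh_penalty               # -non-straight thigh penalty
    # +0.01 * alignment of body direction with the goal direction
    reward += 0.01 * float(np.dot(body_dir, goal_dir))
    # -0.01 * head velocity difference from body velocity
    reward -= 0.01 * float(np.linalg.norm(np.asarray(head_vel) - np.asarray(body_vel)))
    return reward
```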

Hopper

DeepMindHopper
  • Set-up: DeepMind Hopper agent.
  • Goal: The agent must move its body toward the goal as quickly as possible without falling.
  • Agents: The environment contains 16 independent agents linked to a single brain.
  • Agent Reward Function:
    • Reference OpenAI.Roboschool and / or DeepMind
      • -effort penalty
      • +velocity
      • +uprightBonus
      • -height penalty if below 0.65m (OpenAI) or 1.1m (DeepMind)
  • Agent Terminate Function (sketched in code after this section):
    • DeepMindHopper: TerminateOnNonFootHitTerrain - Agent terminates when a body part other than foot collides with the terrain.
    • OpenAIHopper
      • TerminateOnNonFootHitTerrain
      • Terminate if height < .3m
      • Terminate if head tilt > 0.4
  • Brains: One brain with the following observation/action space.
    • Vector Observation space: (Continuous) 31 variables
    • Vector Action space: (Continuous) 4 corresponding to target rotations applicable to the joints.
    • Visual Observations: None.
  • Reset Parameters: None.
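
The OpenAIHopper termination conditions, for instance, reduce to a simple predicate; a sketch using the thresholds listed above (argument names are hypothetical):

```python
def openai_hopper_done(non_foot_hit_terrain, height, head_tilt):
    """Episode terminates on non-foot terrain contact, low height, or excessive head tilt."""
    return non_foot_hit_terrain or height < 0.3 or head_tilt > 0.4
```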

Walker

DeepMindWalker
  • Set-up: DeepMind Walker agent.
  • Goal: The agent must move its body toward the goal as quickly as possible without falling.
  • Agents: The environment contains 16 independent agents linked to a single brain.
  • Agent Reward Function:
    • Reference OpenAI.Roboschool and / or DeepMind
      • -effort penalty
      • +velocity
      • +uprightBonus
      • -height penalty if below 0.65m (OpenAI) or 1.1m (DeepMind)
  • Agent Terminate Function:
    • TerminateOnNonFootHitTerrain - Agent terminates when a body part other than foot collides with the terrain.
  • Brains: One brain with the following observation/action space.
    • Vector Observation space: (Continuous) 41 variables
    • Vector Action space: (Continuous) Size of 6 corresponding to target rotations applicable to the joints.
    • Visual Observations: None.
  • Reset Parameters: None.

Ant

OpenAIAnt
  • Set-up: OpenAI Ant agent.
  • Goal: The agent must move its body toward the goal as quickly as possible without falling.
  • Agents: The environment contains 16 independent agents linked to a single brain.
  • Agent Reward Function:
    • Reference OpenAI.Roboschool and / or DeepMind
      • -joints at limit penalty
      • -effort penalty
      • +velocity
  • Agent Terminate Function:
    • Terminate if head body > 0.2
  • Brains: One brain with the following observation/action space.
    • Vector Observation space: (Continuous) 53 variables
    • Vector Action space: (Continuous) Size of 8 corresponding to target rotations applicable to the joints.
    • Visual Observations: None.
  • Reset Parameters: None.

Details

Key Files / Folders

  • MarathonEnvs - parent folder
    • Scripts/MarathonAgent.cs - Base Agent class for Marathon implementations
    • Scripts/MarathonSpawner.cs - Class for creating a Unity game object from an xml file
    • Scripts/MarathonJoint.cs - Model for mapping MuJoCo joints to Unity
    • Scripts/MarathonSensor.cs - Model for mapping MuJoCo sensors to Unity
    • Scripts/MarathonHelper.cs - Helper functions for MarathonSpawner.cs
    • Scripts/HandleOverlap.cs - Helper script for detecting overlapping Marathon elements.
    • Scripts/ProceduralCapsule.cs - Creates a Unity capsule which matches a MuJoCo capsule
    • Scripts/SendOnCollisionTrigger.cs - Class for sending collisions to MarathonAgent.cs
    • Scripts/SensorBehavior.cs - Behavior class for sensors
    • Scripts/SmoothFollow.cs - Camera script
    • Environments - sample environments
      • DeepMindReferenceXml - xml model files used in DeepMind's research (source)
      • DeepMindHopper - Folder for reproducing DeepMindHopper
      • OpenAIAnt - Folder for reproducing OpenAIAnt
      • etc
  • config
    • marathon_envs_config.yaml - Trainer-config file containing the hyperparameters used when training from Python (see the fragment below).
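
The file follows the ml-agents 0.6 trainer-config format: a default section plus one section per brain name. An illustrative fragment (the key names are standard ml-agents 0.6 PPO options, but the brain name and values here are placeholders, not the tuned numbers in the repo):

```yaml
DeepMindHopperBrain:    # section name must match the brain in the scene (placeholder)
  trainer: ppo
  batch_size: 2048      # illustrative values only - see the actual file
  buffer_size: 20480
  max_steps: 1.0e7
  time_horizon: 1000
  normalize: true
  num_layers: 2
  hidden_units: 128
```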

Tuning params / Magic numbers

  • xxNamexx\Prefab\xxNamexx -> MarathonSpawner.Force2D = set to True when implementing a 2d model (hopper, walker)

  • xxNamexx\Prefab\xxNamexx -> MarathonSpawner.DefaultDensity:

    • 1000 = default (= same as MuJoCo)
    • Note: may be overridden within an .xml script
  • xxNamexx\Prefab\xxNamexx -> MarathonSpawner.MotorScale = Magic number for tuning (scalar applied to all motors)

    • 1 = default
    • 1.5 used by DeepMindHopper, DeepMindWalker
  • xxNamexx\Prefab\xxNamexx -> xxAgentScript.MaxStep / DecisionFrequency:

    • 5000,5: OpenAIAnt, DeepMindHumanoid
    • 4000,4: DeepMindHopper, DeepMindWalker
    • Note: all params taken from OpenAI.Gym

Important:

  • This is not a complete implementation of MuJoCo; it is focused on doing just enough to get the locomotion environments working in Unity. See Scripts/MarathonSpawner.cs for which MuJoCo commands are ignored or only partially implemented.
  • PhysX makes many tradeoffs in terms of accuracy when compared with MuJoCo. It may not be the best choice for your research project.
  • Marathon environments run at 300-500 physics simulations per second (a fixed timestep of roughly 0.002-0.0033s). This is significantly higher than Unity's default setting of 50 physics simulations per second (0.02s).
  • Currently, Marathon does not properly simulate how MuJoCo handles joint observations; as such, it may be difficult to do transfer learning (from simulation to real-world robots).
