What does this codebase include?
This codebase implements the BADMM-based and mirror-descent-based guided policy search algorithms, including LQG-based trajectory optimization, fitting of local linear dynamics models with a Gaussian mixture model prior, and the cost functions used in our work.
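As a rough illustration of the local dynamics fitting step, the sketch below fits time-varying linear dynamics x_{t+1} ≈ F_t [x_t; u_t] + f_t to sampled rollouts with ridge-regularized least squares. This is a simplified stand-in: the actual codebase conditions a joint Gaussian on the samples and uses a GMM prior rather than a plain ridge term, and the function name and array layout here are hypothetical, not the codebase's API.

```python
import numpy as np

def fit_linear_dynamics(X, U, reg=1e-6):
    """Fit time-varying linear dynamics x_{t+1} ~= Fm[t] @ [x_t; u_t] + fv[t].

    X: (N, T, dX) states and U: (N, T, dU) controls from N rollouts.
    Returns Fm with shape (T-1, dX, dX+dU) and fv with shape (T-1, dX).
    Illustrative only: the real implementation uses a GMM prior instead
    of the simple ridge regularizer used here.
    """
    N, T, dX = X.shape
    dU = U.shape[2]
    Fm = np.zeros((T - 1, dX, dX + dU))
    fv = np.zeros((T - 1, dX))
    for t in range(T - 1):
        # Design matrix [x_t, u_t, 1] for each rollout at this timestep.
        A = np.hstack([X[:, t, :], U[:, t, :], np.ones((N, 1))])
        Y = X[:, t + 1, :]
        # Ridge-regularized normal equations; W is (dX, dX+dU+1).
        W = np.linalg.solve(A.T @ A + reg * np.eye(dX + dU + 1), A.T @ Y).T
        Fm[t] = W[:, :dX + dU]
        fv[t] = W[:, -1]
    return Fm, fv
```

With enough rollouts relative to the state and control dimension, this recovers the local linearization exactly on noiseless data; the GMM prior in the codebase exists precisely to make this fit sample-efficient when that is not the case.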
To make the method easy to apply, we also provide interfaces to three simulation and robotics platforms: Box2D for its ease of access, MuJoCo for its more accurate and capable 3D physics simulation, and ROS for interfacing with real-world robots.
What does this codebase not include?
The codebase is a work in progress.
It includes the algorithms detailed in (Levine, Finn et al., 2016) and (Montgomery & Levine, 2016), as well as support for images and convolutional networks in the simulated MuJoCo interface.
It does not include the constrained guided policy search algorithm in (Levine et al., 2015; Levine and Abbeel, 2014). For a discussion of the differences between the BADMM and constrained versions of the algorithm, see (Levine, Finn et al., 2015).
Other extensions of the algorithm, including those in (Finn et al., 2016; Zhang et al., 2016; Fu et al., 2015), are not currently implemented, though the former two are in progress.
Why Caffe and TensorFlow?
Caffe and TensorFlow both provide a straightforward Python interface that makes it easy to define, train, and deploy network architectures.
How can I find more details on the algorithms implemented in this codebase?
For the most complete, up-to-date reference, we recommend this paper:
Sergey Levine*, Chelsea Finn*, Trevor Darrell, Pieter Abbeel. End-to-End Training of Deep Visuomotor Policies. 2015. arXiv:1504.00702. [pdf]