
Guided Policy Search

This code is a reimplementation of the guided policy search algorithm and LQG-based trajectory optimization, meant to help others understand, reuse, and build upon existing work. It includes a complete robot controller and sensor interface for the PR2 robot via ROS, and an interface for simulated agents in Box2D and MuJoCo. Source code is available on GitHub.

While the core functionality is fully implemented and tested, the codebase is a work in progress. See the FAQ for information on planned future additions to the code.


Relevant work

Relevant papers which have used guided policy search include:

  • Sergey Levine*, Chelsea Finn*, Trevor Darrell, Pieter Abbeel. End-to-End Training of Deep Visuomotor Policies. JMLR 2016. [pdf]
  • William Montgomery, Sergey Levine. Guided Policy Search as Approximate Mirror Descent. NIPS 2016. [pdf]
  • Marvin Zhang, Zoe McCarthy, Chelsea Finn, Sergey Levine, Pieter Abbeel. Learning Deep Neural Network Policies with Continuous Memory States. ICRA 2016. [pdf]
  • Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel. Deep Spatial Autoencoders for Visuomotor Learning. ICRA 2016. [pdf]
  • Sergey Levine, Nolan Wagener, Pieter Abbeel. Learning Contact-Rich Manipulation Skills with Guided Policy Search. ICRA 2015. [pdf]
  • Sergey Levine, Pieter Abbeel. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics. NIPS 2014. [pdf]

If the codebase is helpful for your research, please cite any relevant paper(s) above and the following:

  • Chelsea Finn, Marvin Zhang, Justin Fu, William Montgomery, Xin Yu Tan, Zoe McCarthy, Bradly Stadie, Emily Scharff, Sergey Levine. Guided Policy Search Code Implementation. 2016. Software available from rll.berkeley.edu/gps.

For bibtex, see this page.


Installation

Dependencies

Several core dependencies are required; in particular, protobuf and boost are installed as part of the Setup steps below.

One or more of the following agent interfaces is required; setup instructions for each are given below.

  • Box2D (simulated agents)
  • MuJoCo (simulated agents)
  • ROS (real or simulated PR2)

One of the following neural network libraries is required for the full guided policy search algorithm:

  • Caffe (master branch as of 11/2015, with pycaffe compiled, python layer enabled, PYTHONPATH configured)
  • TensorFlow (tested on versions 0.5-0.8; works with and without GPU support)
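
As an optional sanity check, you can confirm that whichever library you installed imports cleanly from Python (the module names below assume a standard pycaffe or TensorFlow install):

    python -c "import caffe"        # Caffe (pycaffe must be on your PYTHONPATH)
    python -c "import tensorflow"   # TensorFlow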

Setup

Follow these steps to get set up:

  1. Install necessary dependencies above. To install protobuf and boost:

    sudo apt-get install libprotobuf-dev protobuf-compiler libboost-all-dev
    sudo pip install protobuf
  2. Clone the repo:

    git clone https://github.com/cbfinn/gps.git
  3. Compile the protobuf messages:

    cd gps
    ./compile_proto.sh
  4. Set up one or more agents below.

Box2D Setup (optional)

Here are the instructions for setting up Pybox2D.

  1. Install Swig and Pygame:

    sudo apt-get install build-essential python-dev swig python-pygame git
  2. Check out the pybox2d code from GitHub:

    git clone https://github.com/pybox2d/pybox2d
  3. Build and install the library from inside the cloned directory:

    cd pybox2d
    python setup.py build
    sudo python setup.py install
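
If the install succeeded, importing the module should work (the top-level Python module provided by pybox2d is named Box2D):

    python -c "import Box2D"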

MuJoCo Setup (optional)

In addition to the dependencies listed above, OpenSceneGraph (v3.0.1+) is also needed. It can be installed by running sudo apt-get install openscenegraph libopenscenegraph-dev.

  1. Install MuJoCo (v1.22+) and place the downloaded mjpro directory into gps/src/3rdparty. MuJoCo is a high-quality physics engine and requires a license. Obtain a key, which should be named mjkey.txt, and place the key into the mjpro directory.

  2. Build gps/src/3rdparty by running:

    cd gps/build
    cmake ../src/3rdparty
    make -j
  3. Set up paths by adding the following to your ~/.bashrc file:

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/gps/build/lib
    export PYTHONPATH=$PYTHONPATH:/path/to/gps/build/lib

    Don't forget to run source ~/.bashrc afterward.
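
Once your shell has been re-sourced, a quick import check can confirm the build is on your path. The module name mjcpy is an assumption about what the 3rdparty build places in gps/build/lib; adjust if your build output differs:

    python -c "import mjcpy"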

ROS Setup (optional)

  1. Install ROS, including the standard PR2 packages.

  2. Set up paths by adding the following to your ~/.bashrc file:

    export ROS_PACKAGE_PATH=$ROS_PACKAGE_PATH:/path/to/gps:/path/to/gps/src/gps_agent_pkg

    Don't forget to run source ~/.bashrc afterward.

  3. Compilation:

    cd src/gps_agent_pkg/
    cmake .
    make -j

(Warning image: broken robot)

ROS Setup with Caffe (optional)

This is required if you intend to run neural network policies with the ROS agent.

  1. Run steps 1 and 2 of the previous section.

  2. Check out and build Caffe, including running make -j && make distribute within the caffe directory.

  3. Compilation:

    cd src/gps_agent_pkg/
    cmake . -DUSE_CAFFE=1 -DCAFFE_INCLUDE_PATH=/path/to/caffe/distribute/include -DCAFFE_LIBRARY_PATH=/path/to/caffe/build/lib
    make -j

    To compile with GPU, also include the option -DUSE_CAFFE_GPU=1.
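
    For example, a GPU-enabled build combines the flags above with that option (the Caffe paths are placeholders, as before):

    cmake . -DUSE_CAFFE=1 -DUSE_CAFFE_GPU=1 -DCAFFE_INCLUDE_PATH=/path/to/caffe/distribute/include -DCAFFE_LIBRARY_PATH=/path/to/caffe/build/lib
    make -j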


Examples

Box2D example

There are two examples of running trajectory optimization using a simple 2D agent in Box2D. Before proceeding, be sure to set up Box2D.

Each example starts from a random controller and learns through experience to minimize cost.

The first is a point mass learning to move to a goal position.

(Images: point mass at the start and end of training)

To try it out, run the following from the gps directory:

python python/gps/gps_main.py box2d_pointmass_example

The progress of the algorithm is displayed on the GUI. The point mass should start reaching the visualized goal by around the 4th iteration.

The second example is a 2-link arm learning to move to a goal state.

(Images: 2-link arm at the start and end of training)

To try it out, run this:

python python/gps/gps_main.py box2d_arm_example

The arm should start reaching the visualized goal after around 6 iterations.

All settings for these examples are located in experiments/box2d_[name]_example/hyperparams.py, which can be modified to input different target positions and change various hyperparameters of the algorithm.
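
For instance, to aim the point mass at a different goal you would edit the agent configuration in that file. The excerpt below is only illustrative; check the shipped hyperparams.py for the exact key names and value shapes:

    # experiments/box2d_pointmass_example/hyperparams.py (illustrative excerpt)
    import numpy as np

    agent = {
        # ... other agent settings from the shipped example ...
        'target_state': np.array([5, 20, 0]),  # goal; key name and shape should be verified against the file
    }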

MuJoCo example

To run the mujoco example, be sure to first set up MuJoCo.

The first example uses trajectory optimization for peg insertion. To try it, run the following from the gps directory:

python python/gps/gps_main.py mjc_example

Here the robot starts with a random initial controller and learns to insert the peg into the hole. The progress of the algorithm is displayed on the GUI.

(Images: peg insertion at the start and end of training)

Now let's learn to generalize to different positions of the hole. For this, run the guided policy search algorithm:

python python/gps/gps_main.py mjc_badmm_example

The robot learns a neural network policy for inserting the peg under varying initial conditions.

To tinker with the hyperparameters and input, take a look at experiments/mjc_badmm_example/hyperparams.py. Additionally, the neural network library can be changed through the ALGORITHM_NN_LIBRARY variable, which can be set to caffe or tf.
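
For example, switching that experiment to TensorFlow is a one-line change in hyperparams.py (assuming TensorFlow is installed):

    ALGORITHM_NN_LIBRARY = "tf"   # or "caffe"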

PR2 example

To run the code on a real or simulated PR2, be sure to first follow the instructions above for ROS setup.

1. Start the controller

Real-world PR2

On the PR2 computer, run:

roslaunch gps_agent_pkg pr2_real.launch

This will stop the default arm controllers and spawn the GPSPR2Plugin.

Simulated PR2

Note: If you are running ROS Hydro or later, open the launch file pr2_gazebo_no_controller.launch and change the include line as specified.

Launch gazebo and the GPSPR2Plugin:

roslaunch gps_agent_pkg pr2_gazebo.launch

2. Run the code

Now you're ready to run the examples via gps_main. This can be done on any machine as long as the ROS environment variables are set appropriately.
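
For example, if gps_main runs on a workstation while the controller runs on the robot, the standard ROS networking variables must point at the robot's ROS master (hostnames and IPs below are placeholders):

    export ROS_MASTER_URI=http://<robot-hostname>:11311
    export ROS_IP=<workstation-ip>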

The first example starts from a random initial controller and learns to move the gripper to a specified location.

Run the following from the gps directory:

python python/gps/gps_main.py pr2_example

The PR2 should reach the position shown on the right below, and should achieve a cost of around -600 within 10 iterations.

(Images: PR2 at the start of training and at the target position)

The second example trains a neural network policy to reach a goal pose from different starting positions, using guided policy search:

python python/gps/gps_main.py pr2_badmm_example

To learn how to make your own experiment and/or set your own initial and target positions, see the next section.

Running a new experiment

  1. Set up a new experiment directory by running:

    python python/gps/gps_main.py my_experiment -n

    This will create a new directory called my_experiment/ in the experiments directory, with a blank hyperparams.py file.

  2. Fill in a hyperparams.py file in your experiment. See pr2_example and mjc_example for examples; a rough skeleton is also sketched at the end of this section.

  3. If you wish to set the initial and/or target positions for the pr2 robot agent, run target setup:

    python python/gps/gps_main.py my_experiment -t

    See the GUI documentation for details on using the GUI.

  4. Finally, run your experiment:

    python python/gps/gps_main.py my_experiment

All of the output logs and data will be routed to your experiment directory. For more details, see intended usage.
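
As a rough orientation, hyperparams.py defines the experiment configuration that gps_main loads. The skeleton below is only a sketch loosely modeled on the shipped examples; the actual required fields and values should be copied from pr2_example or mjc_example:

    # experiments/my_experiment/hyperparams.py -- illustrative sketch, not a working config.
    # The dictionaries below mirror the structure of the shipped examples; copy the real
    # contents from experiments/pr2_example or experiments/mjc_example.

    common = {
        'experiment_name': 'my_experiment',
        # log/data/target-file paths, etc.
    }

    agent = {
        # agent type and agent-specific settings (Box2D, MuJoCo, or ROS)
    }

    algorithm = {
        # algorithm type, cost functions, dynamics and policy settings
    }

    config = {
        'common': common,
        'agent': agent,
        'algorithm': algorithm,
        # iteration count, samples per iteration, GUI settings, ...
    }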


Documentation

In addition to the inline docstrings and comments, more detailed documentation is available on separate pages (e.g., the GUI documentation, FAQ, and intended usage pages referenced throughout this document).


Learning with your own robot

The code was written to be modular, to make it easy to hook up your own robot. To do so, either use one of the existing agent interfaces (e.g. AgentROS), or write your own.


Reporting bugs and getting help

You can post questions on gps-help. If you want to contribute, please post on gps-dev. When your contribution is ready, make a pull request on GitHub.


Licensing

This codebase is released under the BSD 2-clause license.

If you plan to use this code for commercial purposes, we ask that you send us a quick email at gps-dev-private@googlegroups.com to let us know that you're using it.