

This repository is the official implementation of the following paper:

End-to-End Robotic Reinforcement Learning without Reward Engineering
Avi Singh, Larry Yang, Kristian Hartikainen, Chelsea Finn, Sergey Levine
Robotics: Science and Systems 2019
Website | Video | Arxiv

Figures: Visual Draping, Visual Pushing, Visual Bookshelf, Visual Door Opening, Visual Pusher, Visual Picker

We propose a method for end-to-end learning of robotic skills in the real world using deep reinforcement learning. Policies are learned directly from pixel observations, without any hand-engineered or task-specific reward functions. Instead, the rewards for each task are learned from a small number of user-provided goal examples (around 80), followed by a modest number of active queries (around 25-75).

This implementation is based on softlearning.

Getting Started


The environment can be run either locally using conda or inside a Docker container. The conda installation requires Conda; the Docker installation requires Docker and Docker Compose. Most of our environments also currently require a MuJoCo license.

Conda Installation

  1. Download and install MuJoCo 1.50 from the MuJoCo website. We assume that the MuJoCo files are extracted to the default location (~/.mujoco/mjpro150).

  2. Copy your MuJoCo license key (mjkey.txt) to ~/.mujoco/mjkey.txt.

  3. Clone reward-learning-rl. In the commands below, ${REWARD_LEARNING_PATH} refers to the location of your clone.

  4. Create and activate the conda environment, then install softlearning to enable the command line interface:
conda env create -f environment.yml
conda activate softlearning
pip install -e ${REWARD_LEARNING_PATH}

The environment should now be ready to run. See the Training an agent section below for an example of how to train the agents.
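Before training, it can help to confirm that the MuJoCo files from steps 1 and 2 are where the installation expects them. The following is a minimal sketch and not part of this repository: check_mujoco_setup is a hypothetical helper name, and the only paths it checks are the default locations mentioned above.

```shell
# Hypothetical pre-flight check (not shipped with the repository).
# Verifies the default MuJoCo layout described in the installation steps.
check_mujoco_setup() {
    # Optional first argument overrides the MuJoCo root directory.
    mujoco_root="${1:-$HOME/.mujoco}"
    status=0

    if [ ! -d "$mujoco_root/mjpro150" ]; then
        echo "Missing MuJoCo 1.50 directory: $mujoco_root/mjpro150" >&2
        status=1
    fi

    if [ ! -f "$mujoco_root/mjkey.txt" ]; then
        echo "Missing license key: $mujoco_root/mjkey.txt" >&2
        status=1
    fi

    return "$status"
}

if check_mujoco_setup; then
    echo "MuJoCo setup looks complete."
else
    echo "Fix the items listed above before training." >&2
fi
```

The function reports every missing item rather than stopping at the first one, so a fresh machine needs only one pass to see what is still to be done.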

Finally, to deactivate and remove the conda environment:

conda deactivate
conda remove --name softlearning --all

Docker Installation


To build the image and run the container:

export MJKEY="$(cat ~/.mujoco/mjkey.txt)" \
    && docker-compose \
        -f ./docker/ \
        up \
        -d \
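
The command above reads the license key with cat, so a missing key file leaves MJKEY empty and the build still starts. A minimal sketch of a guard is shown here; export_mjkey is a hypothetical helper, not part of this repository, and it assumes the key lives at the default path unless another path is passed in.

```shell
# Hypothetical helper (not shipped with the repository).
# Exports MJKEY only if the license file exists and is non-empty.
export_mjkey() {
    # Optional first argument overrides the key file location.
    key_file="${1:-$HOME/.mujoco/mjkey.txt}"

    # -s is true only for an existing file with size greater than zero.
    if [ ! -s "$key_file" ]; then
        echo "MuJoCo license key not found or empty: $key_file" >&2
        return 1
    fi

    MJKEY="$(cat "$key_file")"
    export MJKEY
}
```

Calling export_mjkey in place of the export MJKEY=... line, and chaining the docker-compose invocation after it with &&, stops the build early when the key is absent.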

You can access the container with the usual docker exec command:

docker exec -it softlearning bash

See the Training an agent section below for an example of how to train the agents.

Finally, to clean up the docker setup:

docker-compose \
    -f ./docker/ \
    down \
    --rmi all \


Training an agent

softlearning run_example_local examples.classifier_rl \
--n_goal_examples 10 \
--task=Image48SawyerDoorPullHookEnv-v0 \
--algorithm VICERAQ \
--num-samples 5 \
--n_epochs 300 \
--active_query_frequency 10

The tasks used in the paper were Image48SawyerPushForwardEnv-v0, Image48SawyerDoorPullHookEnv-v0 and Image48SawyerPickAndPlace3DEnv-v0. For the algorithm, you can experiment with VICERAQ, VICE, RAQ, SACClassifier, and SAC. The --num-samples flag specifies the number of random seeds launched. All results in the paper were averaged across five random seeds. The hyperparameters are stored in examples/classifier_rl/
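
To run the same configuration on each of the three paper tasks, the example command can be wrapped in a small loop. The sketch below only prints the commands (a dry run) so they can be inspected first. build_train_command is a hypothetical helper name, and the flag values are simply copied from the example command above rather than taken from the paper's hyperparameters.

```shell
# Dry-run sketch: prints one training command per paper task, runs nothing.
# build_train_command is a hypothetical helper, not part of the repository.
build_train_command() {
    task="$1"
    # Default to the algorithm used in the example command.
    algorithm="${2:-VICERAQ}"

    printf '%s' "softlearning run_example_local examples.classifier_rl"
    # printf reuses the ' %s' format once per remaining argument.
    printf ' %s' \
        "--n_goal_examples 10" \
        "--task=$task" \
        "--algorithm $algorithm" \
        "--num-samples 5" \
        "--n_epochs 300" \
        "--active_query_frequency 10"
    printf '\n'
}

for task in \
    Image48SawyerPushForwardEnv-v0 \
    Image48SawyerDoorPullHookEnv-v0 \
    Image48SawyerPickAndPlace3DEnv-v0
do
    build_train_command "$task" VICERAQ
done
```

Once the printed commands look right, they can be pasted into a shell inside the activated environment. Passing a different second argument (for example VICE or SAC) switches the algorithm.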

examples.classifier_rl.main contains several different environments. For more information about the agents and configurations, run the scripts with the --help flag: python ./examples/classifier_rl/ --help.

Version history


  • This version contains the code to reproduce the results in Singh et al., RSS 2019.


If this codebase helps you in your academic research, you are encouraged to cite our paper. Here is an example BibTeX entry:

@article{singh2019,
  title={End-to-End Robotic Reinforcement Learning without Reward Engineering},
  author={Avi Singh and Larry Yang and Kristian Hartikainen and Chelsea Finn and Sergey Levine},
  journal={Robotics: Science and Systems},
  year={2019}
}

If you mainly use the VICE algorithm implemented here, you should also cite:

@article{fu2018,
  title={Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition},
  author={Justin Fu and Avi Singh and Dibya Ghosh and Larry Yang and Sergey Levine},
  journal={Neural Information Processing Systems},
  year={2018}
}