Companion code to the CoRL 2019 paper:
E Bıyık, M Palan, NC Landolfi, DP Losey, D Sadigh. "Asking Easy Questions: A User-Friendly Approach to Active Reward Learning". 3rd Conference on Robot Learning (CoRL), Osaka, Japan, Oct. 2019.

This code learns reward functions from human preferences in various tasks by actively generating queries to the human user based on maximum information gain. It also simulates maximum volume removal and random querying as baselines.

The code for the physical Fetch robot is excluded; only the simulation version is provided here.

Dependencies

You need to have the following libraries installed with Python 3:

Running

Throughout this demo,

  • [task_name] should be selected as one of the following: LDS, Driver, Tosser, Fetch
  • [criterion] should be selected as one of the following: information, volume, random
  • [query_type] should be selected as one of the following: weak, strict

For the details of these options and the positive integer parameters epsilon, M, and N, we refer to the publication.

You should run the scripts in the following order:

Sampling the input space

This is the preprocessing step, so you need to run it only once per task (subsequent runs overwrite the previous samples for that task). It is not interactive and is necessary only if you will use the discrete query database. If you want to try continuous optimization of queries instead, which may take much more time per query, please see the instructions in the volume and information functions in algos.py. For continuous optimization, you can skip this step.

You simply run

	python input_sampler.py [task_name] D

For quick (but highly suboptimal) results, we recommend D=1000. In the article, we used D=500000.
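
For example, to build the discrete query database for the Driver task with the recommended quick setting:

	python input_sampler.py Driver 1000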

Learning preference reward function

You can simply run

	python run.py [task_name] [criterion] [query_type] epsilon M

where epsilon is the query-independent cost for optimal stopping, and M is the number of Metropolis-Hastings samples. We recommend M=100. Setting epsilon=0 leads to infinitely many queries under the information gain formulation, because information gain is always nonnegative. After each query, the user will be shown the w-vector learned up to that point.
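
For example, a run on the Driver task with the information gain criterion and weak preference queries might look like the following, where epsilon=0.1 is only an illustrative stopping cost (see the paper for the values used in the experiments):

	python run.py Driver information weak 0.1 100

To illustrate what the M Metropolis-Hastings samples represent, below is a minimal, self-contained sketch of drawing reward weights w from a preference-based posterior. This is not the repository's implementation: the logistic preference likelihood, the unit-ball prior, and all function names here are illustrative assumptions.

	import numpy as np
	
	def log_posterior(w, psi, answers):
	    # psi: (num_queries, d) feature differences between the two queried
	    # trajectories; answers: +1/-1 preference labels from the user.
	    if np.linalg.norm(w) > 1:  # uniform prior over the unit ball (assumption)
	        return -np.inf
	    return np.sum(np.log(1.0 / (1.0 + np.exp(-answers * psi.dot(w)))))
	
	def metropolis_hastings(psi, answers, d, M, step=0.1, burn=200, seed=0):
	    rng = np.random.default_rng(seed)
	    w = np.zeros(d)
	    logp = log_posterior(w, psi, answers)
	    samples = []
	    while len(samples) < M:
	        w_new = w + step * rng.standard_normal(d)   # Gaussian random-walk proposal
	        logp_new = log_posterior(w_new, psi, answers)
	        if np.log(rng.random()) < logp_new - logp:  # accept/reject step
	            w, logp = w_new, logp_new
	        if burn > 0:
	            burn -= 1                               # discard burn-in iterations
	        else:
	            samples.append(w.copy())
	    return np.array(samples)                        # shape (M, d): posterior samples of w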

Demonstration of learned parameters

This is just for demonstration purposes.

You simply run

	python run_optimizer.py [task_name] k w

where k is the number of random initial points for the non-convex optimization, and w is the space-separated reward vector (it must have the correct number of dimensions for the environment: 6 for LDS; 4 for Driver, Tosser, and Fetch).
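
For example, to visualize a learned reward on the Driver task (4-dimensional) with 10 random restarts, where the weight values below are arbitrary and only for illustration:

	python run_optimizer.py Driver 10 0.1 -0.4 0.6 -0.2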
