Companion code to CoRL 2019 paper: E Bıyık, M Palan, NC Landolfi, DP Losey, D Sadigh. "Asking Easy Questions: A User-Friendly Approach to Active Reward Learning". 3rd Conference on Robot Learning (CoRL), Osaka, Japan, Oct. 2019.


This code learns reward functions from human preferences in various tasks by actively generating queries to the human user based on maximum information gain. It also simulates maximum volume removal and random querying as baselines.
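As a rough illustration of querying by maximum information gain, the sketch below estimates the mutual information between the user's answer and the reward weights from a set of sampled weight vectors, then picks the best query from a discrete set. It is not taken from the repository: the logistic response model, the function names and the `psi` feature-difference representation are all assumptions made for this example, and only two-option (strict) queries are modeled.

```python
import numpy as np

def preference_probs(w_samples, psi):
    """P(user prefers option A | w) for each sampled w (assumed logistic model).

    w_samples: array of shape (M, d); psi: feature difference phi(A) - phi(B), shape (d,).
    """
    return 1.0 / (1.0 + np.exp(-(w_samples @ psi)))

def information_gain(w_samples, psi):
    """Sample-based estimate (in bits) of the mutual information between
    the answer to a two-option query and the reward weights w."""
    p_a = preference_probs(w_samples, psi)                   # shape (M,)
    probs = np.stack([p_a, 1.0 - p_a], axis=1)               # shape (M, 2)
    marginal = np.clip(probs.mean(axis=0), 1e-12, None)      # P(answer), averaged over w
    ratio = np.clip(probs, 1e-12, None) / marginal
    return float(np.mean(np.sum(probs * np.log2(ratio), axis=1)))

def select_query(w_samples, psi_database):
    """Return the index and gain of the query with the largest estimated gain."""
    gains = [information_gain(w_samples, psi) for psi in psi_database]
    best = int(np.argmax(gains))
    return best, gains[best]
```

A query whose two options have identical features (`psi` equal to zero) yields zero gain under this estimate, while a query that splits the sampled weights evenly and decisively approaches one bit.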

The code for the physical Fetch robot is excluded; only the simulation version is provided here.

Dependencies

You need to have the following libraries with Python3:

Running

Throughout this demo,

  • [task_name] should be selected as one of the following: LDS, Driver, Tosser, Fetch
  • [criterion] should be selected as one of the following: information, volume, random
  • [query_type] should be selected as one of the following: weak, strict

For the details and the positive integer parameters epsilon, M, N, we refer to the publication. Run the scripts in the following order:

Sampling the input space

This is the preprocessing step, so you need to run it only once per task (running it again for a task overwrites that task's earlier samples). It is not interactive, and it is necessary only if you will use the discrete query database. If you want to try continuous optimization of queries instead, which may take too much time per query, please see the instructions in the volume and information functions in algos.py. For continuous optimization, you can skip this step.

You simply run

	python input_sampler.py [task_name] D

For quick (but highly suboptimal) results, we recommend D=1000. In the article, we used D=500000.
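To give a feel for what a discrete query database of size D could contain, here is a minimal sketch that samples D random pairs of inputs and stores the feature difference of each pair. It does not reproduce input_sampler.py: `toy_features`, `sample_query_database`, the uniform input range and the three toy features are hypothetical stand-ins for the task-specific simulation and feature functions.

```python
import numpy as np

def toy_features(inputs):
    """Hypothetical feature map (mean, mean square, max magnitude) per input vector."""
    return np.stack(
        [inputs.mean(axis=1), (inputs ** 2).mean(axis=1), np.abs(inputs).max(axis=1)],
        axis=1,
    )

def sample_query_database(D, input_dim, seed=0):
    """Sample D candidate queries, each a pair of inputs (A, B).

    Returns the two input arrays and psi = features(A) - features(B).
    """
    rng = np.random.default_rng(seed)
    inputs_a = rng.uniform(-1.0, 1.0, size=(D, input_dim))
    inputs_b = rng.uniform(-1.0, 1.0, size=(D, input_dim))
    psi = toy_features(inputs_a) - toy_features(inputs_b)
    return inputs_a, inputs_b, psi
```

A larger D gives the query selection step more candidates to choose from, which is consistent with the recommendation above that D=1000 is quick but highly suboptimal.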

Learning preference reward function

You can simply run

	python run.py [task_name] [criterion] [query_type] epsilon M

where epsilon is the query-independent cost for optimal stopping, and M is the number of samples for Metropolis-Hastings. We recommend M=100. Setting epsilon=0 leads to infinitely many queries for the information gain formulation, because information gain is always nonnegative. After each query, the user is shown the w-vector learned up to that point.
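The roles of M and epsilon can be sketched as follows. This is an illustrative random-walk Metropolis-Hastings sampler under an assumed logistic response model with a uniform prior on the unit ball, plus a stopping check that compares the best available gain with the query cost. The function names, the prior, `burn_in`, `step` and the exact form of the stopping rule are assumptions for this example, not the repository's implementation.

```python
import numpy as np

def log_posterior(w, psis, answers):
    """Unnormalized log-posterior of w (uniform prior on the unit ball, assumed).

    psis: array (n, d) of feature differences; answers: array (n,) of +1 / -1,
    where +1 means the user preferred option A.
    """
    if np.linalg.norm(w) > 1.0:
        return -np.inf
    if len(psis) == 0:
        return 0.0
    margins = answers * (psis @ w)
    return float(-np.sum(np.log1p(np.exp(-margins))))

def metropolis_hastings(psis, answers, dim, M, burn_in=500, step=0.1, seed=0):
    """Draw M samples of w with a Gaussian random-walk proposal."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    logp = log_posterior(w, psis, answers)
    samples = []
    for i in range(burn_in + M):
        proposal = w + step * rng.standard_normal(dim)
        logp_new = log_posterior(proposal, psis, answers)
        # Accept with probability min(1, posterior ratio); 1 - U lies in (0, 1].
        if np.log(1.0 - rng.random()) < logp_new - logp:
            w, logp = proposal, logp_new
        if i >= burn_in:
            samples.append(w.copy())
    return np.array(samples)

def should_stop(best_gain, epsilon):
    """Stop asking once the best query's gain no longer covers its cost."""
    return best_gain - epsilon < 0
```

Under this rule, `should_stop(gain, 0)` is never true for a nonnegative gain, which matches the remark above that epsilon=0 leads to infinitely many queries.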

Demonstration of learned parameters

This is just for demonstration purposes.

You simply run

	python run_optimizer.py [task_name] k w

where k is the number of initial random points for the non-convex optimization, and w is the space-separated reward vector. The vector must have the number of dimensions the environment expects: 6 for LDS; 4 for Driver, Tosser and Fetch.
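Since a reward vector of the wrong length is an easy mistake to make, a small check along these lines could be run before calling the script. The dimensions come from the text above; `REWARD_DIMS` and `parse_reward_vector` are hypothetical names introduced for this example and are not part of the repository.

```python
# Expected reward-vector length per task, as stated in this README.
REWARD_DIMS = {"LDS": 6, "Driver": 4, "Tosser": 4, "Fetch": 4}

def parse_reward_vector(task_name, w_args):
    """Convert space-separated command-line tokens to floats and check the length."""
    if task_name not in REWARD_DIMS:
        raise ValueError(f"unknown task {task_name!r}; expected one of {sorted(REWARD_DIMS)}")
    w = [float(x) for x in w_args]
    expected = REWARD_DIMS[task_name]
    if len(w) != expected:
        raise ValueError(f"{task_name} expects {expected} reward weights, got {len(w)}")
    return w
```

For instance, `parse_reward_vector("Driver", "0.5 -0.2 0.1 0.8".split())` returns a list of four floats, while passing the same four values for LDS raises a `ValueError`.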
