GitHub - Stanford-ILIAD/easy-active-learning: Companion code to CoRL 2019 paper: E Bıyık, M Palan, NC Landolfi, DP Losey, D Sadigh. "Asking Easy Questions: A User-Friendly Approach to Active Reward Learning". 3rd Conference on Robot Learning (CoRL), Osaka, Japan, Oct. 2019.

Companion code to CoRL 2019 paper:
E Bıyık, M Palan, NC Landolfi, DP Losey, D Sadigh. "Asking Easy Questions: A User-Friendly Approach to Active Reward Learning". 3rd Conference on Robot Learning (CoRL), Osaka, Japan, Oct. 2019.

This code learns reward functions from human preferences in various tasks by actively generating queries to the human user based on maximum information gain. It also simulates maximum volume removal and random querying as baselines.
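As an illustration of what "maximum information gain" querying means, here is a minimal sketch, not the repository's implementation. It assumes a logistic choice model over pairwise queries and a set of reward-weight samples. The names `preference_probs`, `information_gain` and `select_query` are hypothetical.

```python
import numpy as np

def preference_probs(w_samples, psi):
    """P(user prefers option A | w) for each sampled w (assumed logistic choice model).

    w_samples: (M, d) array of reward-weight samples.
    psi: (d,) feature difference phi(A) - phi(B) of one candidate query.
    """
    return 1.0 / (1.0 + np.exp(-(w_samples @ psi)))

def information_gain(w_samples, psi):
    """Sample-based estimate of the mutual information (in bits) between answer and w."""
    p_a = preference_probs(w_samples, psi)            # shape (M,)
    probs = np.stack([p_a, 1.0 - p_a])                # shape (2, M), one row per answer
    marginal = probs.mean(axis=1, keepdims=True)      # P(answer), averaged over samples
    # Clip to avoid log(0) when a sample is numerically certain about the answer.
    ratio = np.clip(probs, 1e-12, None) / np.clip(marginal, 1e-12, None)
    return float(np.mean(np.sum(probs * np.log2(ratio), axis=0)))

def select_query(w_samples, candidate_psis):
    """Return the index and estimated gain of the most informative candidate query."""
    gains = [information_gain(w_samples, psi) for psi in candidate_psis]
    best = int(np.argmax(gains))
    return best, gains[best]

# Two equally likely hypotheses about w, and two candidate queries.
w_samples = np.array([[1.0, 0.0], [-1.0, 0.0]])
candidate_psis = np.array([[0.0, 0.0], [20.0, 0.0]])
best_index, best_gain = select_query(w_samples, candidate_psis)
```

In this toy setup the first query cannot distinguish the two hypotheses, while the second separates them almost perfectly, so the second one is selected.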

The code for the physical Fetch robot is excluded; only the simulation version is provided here.

Dependencies

You need to have the following libraries with Python3:

Running

Throughout this demo,

[task_name] should be selected as one of the following: LDS, Driver, Tosser, Fetch
[criterion] should be selected as one of the following: information, volume, random
[query_type] should be selected as one of the following: weak, strict

For the details and the positive integer parameters epsilon, M, N, we refer to the publication. You should run the code in the following order:

Sampling the input space

This is the preprocessing step, so you need to run it only once (subsequent runs will overwrite the previous output for that task). It is not interactive, and it is necessary only if you will use the discrete query database. If you want to try continuous optimization of queries instead, which may take too much time per query, please see the instructions in the volume and information functions in algos.py. For continuous optimization, you can skip this step.

You simply run

	python input_sampler.py [task_name] D

For quick (but highly suboptimal) results, we recommend D=1000. In the article, we used D=500000.
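To give a sense of what a discrete query database of size D could contain, the following is a rough sketch under stated assumptions, not what input_sampler.py actually does. It draws D random pairs of inputs and precomputes their feature differences. `toy_features`, `sample_query_database`, the input dimension and the bounds are all hypothetical.

```python
import numpy as np

def toy_features(controls):
    """Hypothetical stand-in for a task's trajectory features (not the repository's)."""
    return np.array([controls.mean(), np.abs(controls).max(), np.square(controls).sum()])

def sample_query_database(D, input_dim, lower, upper, feature_fn, seed=0):
    """Sample D random query pairs and precompute their feature differences."""
    rng = np.random.default_rng(seed)
    inputs_a = rng.uniform(lower, upper, size=(D, input_dim))
    inputs_b = rng.uniform(lower, upper, size=(D, input_dim))
    # One feature-difference row per query: phi(A) - phi(B).
    psi = np.array([feature_fn(a) - feature_fn(b) for a, b in zip(inputs_a, inputs_b)])
    return inputs_a, inputs_b, psi

inputs_a, inputs_b, psi = sample_query_database(
    D=1000, input_dim=5, lower=-1.0, upper=1.0, feature_fn=toy_features
)
```

A larger D gives the query selection step more candidates to choose from, at the cost of a longer preprocessing run.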

Learning preference reward function

You can simply run

	python run.py [task_name] [criterion] [query_type] epsilon M

where epsilon is the query-independent cost for optimal stopping, and M is the number of samples for Metropolis-Hastings. We recommend M=100. Setting epsilon=0 leads to infinitely many queries for the information gain formulation, as information gain is always nonnegative. After each query, the user will be shown the w-vector learned up to that point.
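The roles of M and epsilon can be sketched as follows. This is an illustrative random-walk Metropolis-Hastings sampler, not the code in sampling.py. It assumes a uniform prior on the unit ball and a logistic likelihood, and the step size, burn-in and thinning values are arbitrary. The `should_stop` helper is a hypothetical reading of the stopping rule described above.

```python
import numpy as np

def log_posterior(w, psis, answers):
    """Unnormalized log posterior of w (assumed: unit-ball prior, logistic likelihood).

    psis: (n, d) feature differences of the queries asked so far.
    answers: (n,) array, +1 if the first option was preferred, -1 otherwise.
    """
    if np.linalg.norm(w) > 1.0:
        return -np.inf
    # log sigmoid(a * w.psi) = -log(1 + exp(-a * w.psi)), computed stably.
    return -np.sum(np.logaddexp(0.0, -answers * (psis @ w)))

def metropolis_hastings(psis, answers, M, dim, step=0.1, burn_in=500, thin=10, seed=0):
    """Draw M samples of w with a Gaussian random-walk proposal."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    logp = log_posterior(w, psis, answers)
    samples = []
    for i in range(burn_in + M * thin):
        proposal = w + step * rng.standard_normal(dim)
        logp_new = log_posterior(proposal, psis, answers)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if np.log(rng.uniform()) < logp_new - logp:
            w, logp = proposal, logp_new
        if i >= burn_in and (i - burn_in) % thin == 0:
            samples.append(w.copy())
    return np.array(samples)

def should_stop(best_information_gain, epsilon):
    """Stop once the best available query is worth less than its cost epsilon."""
    return best_information_gain - epsilon < 0

# Ten answers that all favour a positive first weight component.
psis = np.tile([3.0, 0.0], (10, 1))
answers = np.ones(10)
samples = metropolis_hastings(psis, answers, M=100, dim=2)
```

Because the information gain is never negative, `should_stop` cannot return true when epsilon is 0, which matches the note above about infinitely many queries.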

Demonstration of learned parameters

This is just for demonstration purposes.

You simply run

	python run_optimizer.py [task_name] k w

where k is the number of initial random points for the non-convex optimization, and w is the space-separated reward vector (it must have the proper number of dimensions for the environment: 6 for LDS; 4 for Driver, Tosser and Fetch).
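The dimension requirement on w can be checked before launching the optimizer. The helper below is a hypothetical sketch, not part of run_optimizer.py. It uses the dimensions listed above and treats task names case-insensitively, which is an assumption about how the scripts behave.

```python
# Expected reward-vector sizes, taken from the description above.
EXPECTED_DIMS = {"lds": 6, "driver": 4, "tosser": 4, "fetch": 4}

def parse_reward_vector(task_name, w_args):
    """Convert space-separated command-line tokens into a reward vector of the right size."""
    task = task_name.lower()  # assumption: task names are matched case-insensitively
    if task not in EXPECTED_DIMS:
        raise ValueError(f"unknown task: {task_name}")
    w = [float(token) for token in w_args]
    if len(w) != EXPECTED_DIMS[task]:
        raise ValueError(
            f"{task_name} expects {EXPECTED_DIMS[task]} reward weights, got {len(w)}"
        )
    return w

# Example: tokens as they would arrive after "python run_optimizer.py Driver 5".
w = parse_reward_vector("Driver", ["0.5", "-0.2", "0.1", "0.7"])
```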

Folders and files

fetch_gym
imgs
mujoco_xmls
.gitignore
LICENSE
README.md
algos.py
car.py
demos.py
dynamics.py
feature.py
input_sampler.py
lane.py
models.py
run.py
run_optimizer.py
sampling.py
simulation_utils.py
simulator.py
trajectory.py
utils_driving.py
visualize.py
world.py
