Adding reward ensembles and conservative reward functions #460
Conversation
Codecov Report
@@ Coverage Diff @@
## master #460 +/- ##
==========================================
+ Coverage 96.67% 96.82% +0.14%
==========================================
Files 80 82 +2
Lines 6775 7127 +352
==========================================
+ Hits 6550 6901 +351
- Misses 225 226 +1
Finished initial review now. Seems like a reasonable design. Left some more detailed comments inline. Please request re-review when addressed.
Co-authored-by: Adam Gleave <adam@gleave.me>
… EnsembleRewardNet initializer
…ng before this ...
Force-pushed 9b6b987 to 42378f8
@yawen-d It should be possible to train the reward ensemble now. If you have time, can you test that this most basic version works? You can do this by setting the reward class used by the reward ingredient to be a reward ensemble. I have not quite figured out how to get the conservative wrapper applied during retraining with
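For illustration, here is a minimal numpy sketch of the ensemble idea being discussed: average the member networks' predictions and expose the spread across members as epistemic uncertainty. The class and member functions here are hypothetical stand-ins; the actual PR implements this on top of imitation's torch-based reward networks.

```python
import numpy as np

class RewardEnsemble:
    """Toy stand-in for an ensemble reward net: averages member
    predictions and reports the spread across members."""

    def __init__(self, members):
        self.members = members  # list of callables: obs -> reward

    def predict(self, obs):
        # Stack per-member predictions: shape (n_members, batch).
        preds = np.stack([m(obs) for m in self.members])
        # Mean is the ensemble reward; std is the epistemic uncertainty.
        return preds.mean(axis=0), preds.std(axis=0)

# Two dummy "reward nets" that disagree more on larger observations.
members = [lambda obs: obs * 1.0, lambda obs: obs * 1.2]
ensemble = RewardEnsemble(members)
mean, std = ensemble.predict(np.array([0.0, 1.0, 2.0]))
# mean → [0.0, 1.1, 2.2]; std → [0.0, 0.1, 0.2]
```

Swapping the reward class for an ensemble like this leaves the training loop unchanged, since downstream code only consumes the mean prediction.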
Thanks for the implementations!
Sure. I just started some light benchmarking environments on the current version. In addition, another feature to consider is tracking the reward variance over time.
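Tracking the reward variance over time could be as simple as logging the per-step variance across ensemble members, e.g. to monitor whether the members converge. A hedged sketch (the function name and data layout are assumptions, not part of the PR):

```python
import numpy as np

def track_reward_variance(ensemble_preds_over_time):
    """Given per-step member predictions with shape (steps, members),
    return the per-step variance across ensemble members for logging."""
    preds = np.asarray(ensemble_preds_over_time)
    return preds.var(axis=1)

# Three logged steps, two ensemble members each.
history = [[1.0, 1.2], [0.9, 1.5], [1.1, 1.1]]
per_step_var = track_reward_variance(history)
# per_step_var → [0.01, 0.09, 0.0]
```

A decreasing variance curve would suggest the ensemble members are agreeing; a growing one flags states where the reward model is uncertain.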
a63c0f0
to
af895cb
Compare
LGTM.
Before merging, please:
- Review my changes. I pushed some small changes to restructure one piece of the code, and fix some typos.
- Wait for windows-ci-improvements to get merged (if not already reviewed, you can review it to accelerate that). We should then rebase this PR onto master and merge it.
I think this is good to merge once the conflicts are resolved.
Seems like rerunning the tests fixed things. See #502.
This pull request adds a base class for reward functions that track their epistemic uncertainty, an ensemble-based implementation of that base class, and a conservative reward wrapper.
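The conservative wrapper can be summarized as penalizing the ensemble mean by its epistemic uncertainty. A minimal sketch, assuming a mean-minus-beta-standard-deviations rule (the `beta` parameter name is hypothetical; see the PR diff for the actual interface):

```python
import numpy as np

def conservative_reward(mean, std, beta=1.0):
    """Pessimistic reward: subtract beta standard deviations of
    ensemble disagreement from the mean ensemble prediction."""
    return mean - beta * std

# Two states with equal mean reward; the uncertain one is penalized.
mean = np.array([1.0, 1.0])
std = np.array([0.0, 0.5])
r = conservative_reward(mean, std)
# r → [1.0, 0.5]
```

This biases the learned policy away from states where the ensemble members disagree, which is the usual motivation for conservative reward functions.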
TODO:
- make_reward
- train_preference_comparison.py
- train_preference.py