Fix non-determinism in DAgger (fixes #643) by hacobe · Pull Request #649 · HumanCompatibleAI/imitation

hacobe · 2023-01-03T18:45:18Z

Description

The issue (#643) was that DAgger demonstrations were loaded from disk for training in a different order on each run. This was because the filenames of the saved demonstrations changed on each run, which changed the order in which os.listdir returned them. The filenames changed because they included a timestamp and the first 6 characters of a UUID generated without a fixed random seed.
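To make the failure mode concrete, here is a hypothetical reconstruction of the pre-fix naming scheme. The exact format imitation used is an assumption; the relevant property is that both the timestamp and the unseeded UUID change on every run, so two runs never produce the same set of names.

```python
import datetime
import uuid


def old_style_filename() -> str:
    """Hypothetical pre-fix naming: timestamp plus a short, unseeded UUID."""
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    # uuid.uuid4() draws from os.urandom, so no random seed can pin it down.
    short_id = uuid.uuid4().hex[:6]
    return f"dagger-demo-{timestamp}-{short_id}.npz"


run_a = [old_style_filename() for _ in range(3)]
run_b = [old_style_filename() for _ in range(3)]
# The two "runs" share no filenames, so the directory listing (and with it
# the order in which demonstrations are loaded) is free to differ.
print(set(run_a).isdisjoint(run_b))
```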

This PR fixes the non-determinism by making the filenames identical across runs, as long as the same random seed is used. It does so by removing the timestamp from the filename and generating the UUID from a fixed seed. Because the timestamp is removed, the PR adds a trajectory index to the filename so that a user can tell the order in which trajectories were created. It also includes the entire UUID instead of just the first 6 characters. Finally, it sorts the filenames returned by os.listdir, which returns them in an arbitrary order that depends on the file system implementation (https://stackoverflow.com/questions/31534583/is-os-listdir-deterministic). Sorting ensures the order is consistent across file systems.
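The fix can be sketched roughly as follows. This is not imitation's code: the dagger-demo-{index}-{uuid}.npz format, the zero padding, and drawing the UUID bits from a seeded random.Random are all assumptions chosen for illustration. The sketch shows the three ingredients together: no timestamp, a seeded UUID, and a sorted directory listing.

```python
import os
import random
import tempfile
import uuid
from typing import List


def seeded_uuid(rng: random.Random) -> uuid.UUID:
    # uuid.uuid4() reads os.urandom and ignores seeds; drawing the 128 bits
    # from a seeded RNG makes the UUID reproducible.
    return uuid.UUID(int=rng.getrandbits(128), version=4)


def demo_filename(index: int, rng: random.Random) -> str:
    # Hypothetical format: zero-padded trajectory index, then the full UUID.
    return f"dagger-demo-{index:04d}-{seeded_uuid(rng)}.npz"


def sorted_demo_paths(directory: str) -> List[str]:
    # os.listdir order is filesystem-dependent; sorting makes it stable.
    return [os.path.join(directory, name) for name in sorted(os.listdir(directory))]


def simulate_run(seed: int, n_trajs: int = 3) -> List[str]:
    """Write empty placeholder demo files, then return them in load order."""
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(n_trajs):
            open(os.path.join(tmp, demo_filename(i, rng)), "wb").close()
        return [os.path.basename(p) for p in sorted_demo_paths(tmp)]


print(simulate_run(seed=0) == simulate_run(seed=0))  # True
```

The zero padding in this sketch keeps lexicographic order equal to creation order (up to 10,000 trajectories); without it, "dagger-demo-10" would sort before "dagger-demo-2". Sorting alone already gives a stable order, so padding only matters for readability.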

We discuss a few other design decisions below.

Why include a UUID in the filename at all? Without the UUID, the DAgger trainers would still not overwrite files, because they write to a new directory each round of training. However, if the InteractiveTrajectoryCollector is used independently of those trainers, filenames without a UUID could collide and overwrite existing files.
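A small sketch of the collision scenario, using the same hypothetical naming format as above: two independent collectors write into one directory and each numbers its trajectories from 0. Note the assumption that each collector draws its UUIDs from its own RNG state; two collectors seeded identically would regenerate identical names.

```python
import random
import uuid


def index_only_name(index: int) -> str:
    return f"dagger-demo-{index:04d}.npz"


def index_and_uuid_name(index: int, rng: random.Random) -> str:
    suffix = uuid.UUID(int=rng.getrandbits(128), version=4)
    return f"dagger-demo-{index:04d}-{suffix}.npz"


# Two independent collectors writing into one directory, each counting from 0.
plain_a = {index_only_name(i) for i in range(3)}
plain_b = {index_only_name(i) for i in range(3)}
print(len(plain_a & plain_b))  # 3: every file of collector B would overwrite one of A's

# Same scenario with a UUID suffix drawn from each collector's own RNG.
rng_a, rng_b = random.Random(0), random.Random(1)
uuid_a = {index_and_uuid_name(i, rng_a) for i in range(3)}
uuid_b = {index_and_uuid_name(i, rng_b) for i in range(3)}
print(len(uuid_a & uuid_b))  # 0
```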

Do we need to shuffle the filenames returned by os.listdir after sorting? We could, but the demonstrations loaded from the files are passed to a DataLoader, which shuffles them. The DataLoader seems like the right place to handle shuffling, rather than the utility function that returns the filenames.
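The division of labour can be illustrated with a stdlib-only sketch, where a seeded random.Random shuffle stands in for a shuffling torch DataLoader (the function names are hypothetical). The listing layer only guarantees a sorted order; the loader layer owns shuffling. A seeded shuffle is only reproducible if its input order is fixed, which is exactly what the sort provides.

```python
import random
from typing import List, Sequence


def list_demo_files(names: Sequence[str]) -> List[str]:
    """Utility layer: only guarantees a stable, sorted order."""
    return sorted(names)


def shuffled_batches(items: Sequence[str], batch_size: int, seed: int) -> List[List[str]]:
    """Loader layer: owns shuffling, driven by its own seed.

    Stand-in for a shuffling DataLoader; not imitation's code.
    """
    order = list(items)
    random.Random(seed).shuffle(order)
    return [order[i : i + batch_size] for i in range(0, len(order), batch_size)]


listing = ["demo-0002.npz", "demo-0000.npz", "demo-0001.npz", "demo-0003.npz"]
files = list_demo_files(listing)
print(files)  # ['demo-0000.npz', 'demo-0001.npz', 'demo-0002.npz', 'demo-0003.npz']
print(shuffled_batches(files, 2, seed=0) == shuffled_batches(files, 2, seed=0))  # True
```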

Testing

This PR also adds unit tests for the reproducibility of the InteractiveTrajectoryCollector, DAggerTrainer, and SimpleDAggerTrainer. Each test runs a block of code twice with the same random seeds and checks that both runs produce the same result.
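The test pattern described above can be sketched as a generic helper. Both assert_reproducible and fake_rollout are hypothetical names; fake_rollout stands in for collecting trajectories or training a trainer. A real test would also need to seed every source of randomness involved (environment, policy, NumPy, PyTorch), whereas this sketch has a single RNG.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def assert_reproducible(fn: Callable[[int], T], seed: int = 42) -> T:
    """Run ``fn`` twice with the same seed and check that the results match."""
    first = fn(seed)
    second = fn(seed)
    assert first == second, f"non-deterministic result: {first!r} != {second!r}"
    return first


def fake_rollout(seed: int) -> List[float]:
    # Hypothetical stand-in for the code under test.
    rng = random.Random(seed)
    return [round(rng.random(), 6) for _ in range(5)]


result = assert_reproducible(fake_rollout)
```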


AdamGleave

Thanks for the PR! The fix looks good to me, I've left some minor suggestions on the tests.

Minor point, but in GitHub if you write "fixes #N", issue N is closed automatically when the PR gets merged, which is convenient. So I edited your title to include that rather than "reported by #N".

AdamGleave · 2023-01-04T01:50:22Z

There was a lint error originally, but bizarrely it seems to have fixed itself on rerun. I've never seen a flaky lint error before. My guess is that some upstream dependency broke and a hotfix got released in the meantime?

codecov · 2023-01-04T01:59:41Z

Codecov Report

Merging #649 (5b6b8c3) into master (681cb72) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #649      +/-   ##
==========================================
+ Coverage   97.52%   97.54%   +0.01%     
==========================================
  Files          86       86              
  Lines        8373     8423      +50     
==========================================
+ Hits         8166     8216      +50     
  Misses        207      207

Impacted Files	Coverage Δ
src/imitation/algorithms/dagger.py	`100.00% <100.00%> (ø)`
tests/algorithms/test_dagger.py	`100.00% <100.00%> (ø)`
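The percentages in the report follow from hits divided by lines. A quick check, under the assumption (inferred from these figures, not from Codecov documentation) that values are truncated rather than rounded to two decimals: 8166/8373 is 97.5278%, shown as 97.52%, and the +0.01% change is computed from the unrounded values rather than from 97.54 minus 97.52.

```python
import math


def pct_truncated(hits: int, lines: int) -> float:
    # Truncate to two decimals; an inference from the report, not documented behaviour.
    return math.floor(hits / lines * 10_000) / 100


before = pct_truncated(8166, 8373)
after = pct_truncated(8216, 8423)
delta = math.floor((8216 / 8423 - 8166 / 8373) * 10_000) / 100
new_lines, new_hits = 8423 - 8373, 8216 - 8166
print(before, after, delta, new_hits / new_lines * 100)  # 97.52 97.54 0.01 100.0
```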



This PR makes the test_trainer_reproducible and test_traj_collector_reproducible tests more thorough. test_trainer_reproducible now checks that the trajectories from rolling out the trained policy are the same on each run, instead of only checking that the rewards achieved by the trained policy are the same. test_traj_collector_reproducible now checks that the filenames of the files storing DAgger demonstrations are the same on each run, and that each file from the first run stores the same trajectory as the file with the same name from the second run, instead of only checking that the observations from the trajectories are the same.

This PR also reduces the number of training iterations in test_trainer_reproducible. The previous number was chosen to test that the policy improves with training, which isn't needed to test reproducibility.
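The directory check in test_traj_collector_reproducible can be approximated with the standard library. compare_demo_dirs is a hypothetical helper: the test described above compares the stored trajectories, whereas this sketch compares files byte for byte, which assumes serialization is itself deterministic.

```python
import filecmp
import os
import tempfile
from typing import List


def compare_demo_dirs(dir_a: str, dir_b: str) -> List[str]:
    """Check that two runs wrote identically named files with identical contents."""
    names_a, names_b = sorted(os.listdir(dir_a)), sorted(os.listdir(dir_b))
    assert names_a == names_b, "runs produced different filenames"
    for name in names_a:
        # shallow=False forces a content comparison instead of an os.stat check.
        same = filecmp.cmp(os.path.join(dir_a, name), os.path.join(dir_b, name), shallow=False)
        assert same, f"contents differ for {name}"
    return names_a


with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    for d in (a, b):
        with open(os.path.join(d, "dagger-demo-0000.npz"), "wb") as f:
            f.write(b"same trajectory bytes")
    result = compare_demo_dirs(a, b)

print(result)  # ['dagger-demo-0000.npz']
```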

AdamGleave

LGTM

hacobe requested a review from AdamGleave January 3, 2023 18:45

AdamGleave changed the title from "Fix non-determinism in DAgger reported by #643" to "Fix non-determinism in DAgger (fixes #643)" Jan 4, 2023

AdamGleave reviewed Jan 4, 2023


hacobe added 4 commits January 4, 2023 09:35

Assert that the DAgger demonstration file does not already exist before saving. (57c6ab2)

Minor clean-up: Shorten list comprehension (879b88c)
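The first commit's guard can be sketched as below; save_demo is a hypothetical helper, not imitation's API. It refuses to overwrite an existing demonstration file, which turns a silent name collision into a visible failure.

```python
import os
import tempfile


def save_demo(path: str, payload: bytes) -> None:
    # Refuse to overwrite: a name collision would silently drop a trajectory.
    assert not os.path.exists(path), f"demonstration file already exists: {path}"
    with open(path, "wb") as f:
        f.write(payload)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dagger-demo-0000.npz")
    save_demo(path, b"traj")
    try:
        save_demo(path, b"other traj")
    except AssertionError as err:
        second_save_error = str(err)
```

Two design notes on this sketch: assert statements are skipped under python -O, and the exists-then-open sequence is not atomic. Opening with mode "xb" instead raises FileExistsError atomically if the file is already present.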

AdamGleave approved these changes Jan 5, 2023


AdamGleave merged commit 8c18397 into master Jan 5, 2023

AdamGleave deleted the fix-dagger-nondet branch January 5, 2023 01:31


AdamGleave merged 5 commits into master from fix-dagger-nondet