Skip to content

Fix non-determinism in DAgger (fixes #643)#649

Merged
AdamGleave merged 5 commits into
masterfrom
fix-dagger-nondet
Jan 5, 2023
Merged

Fix non-determinism in DAgger (fixes #643)#649
AdamGleave merged 5 commits into
masterfrom
fix-dagger-nondet

Conversation

@hacobe
Copy link
Copy Markdown
Contributor

@hacobe hacobe commented Jan 3, 2023

Description

The issue (#643) was that DAgger demonstrations were loaded from disk for training in a different order each run. This was because the filenames for the saved demonstrations changed each run and that changed the order in which os.listdir returned the filenames. The filenames changed each run, because they included a timestamp and the first 6 characters of a UUID generated without fixing a random seed.

This PR fixes the non-determinism by making the filenames the same each run as long as the same random seed is used. It does so by removing the timestamp from the filename and fixing the seed of the UUID. Because the timestamp is removed, the PR introduces a trajectory index in the filename, so that a user can tell the order in which trajectories were created. It also includes the entire UUID instead of just the first 6 characters. Finally, it sorts the filenames returned by os.listdir. listdir returns filenames in an arbitrary order that depends on the file system implementation (https://stackoverflow.com/questions/31534583/is-os-listdir-deterministic). We sort the filenames to ensure the order is consistent across file systems.

We discuss a few other design decisions below.

Why include a UUID in the filename at all? If we removed the UUID from the filename, then the DAgger trainers would not overwrite filenames, because they take care to write to a new directory each round of training. However, if the InteractiveTrajectoryCollector is used independently of those trainers, then it can end up overwriting filenames without the UUID.

Do we need to shuffle the filenames returned by os.listdir after sorting? We could, but the demonstrations loaded from the files are passed to a DataLoader, which shuffles them. That seems like the right place to handle the shuffling rather than making it the responsibility of the utility function that returns the filenames.

Testing

This PR also adds unit tests for the reproducibility of the InteractiveTrajectoryCollector, DAggerTrainer, and the SimpleDAggerTrainer. In particular, each unit test consists of running a block of code twice with the same random seeds and checking that the code produces the same result each time.

The issue was that DAgger demonstrations were loaded from disk for training in a different order each run. This was because the filenames for the saved demonstrations changed each run and that changed the order in which os.listdir returned the filenames. The filenames changed each run, because they included a timestamp and the first 6 characters of a UUID generated without fixing a random seed.

This PR fixes the non-determinism by making the filenames the same each run as long as the same random seed is used. It does so by removing the timestamp from the filename and fixing the seed of the UUID. Because the timestamp is removed, the PR introduces a trajectory index in the filename, so that a user can tell the order in which trajectories were created. It also includes the entire UUID instead of just the first 6 characters. Finally, it sorts the filenames returned by os.listdir. listdir returns filenames in an arbitrary order that depends on the file system implementation (https://stackoverflow.com/questions/31534583/is-os-listdir-deterministic). We sort the filenames to ensure the order is consistent across file systems.

Why include a UUID in the filename at all? If we removed the UUID from the filename, then the DAgger trainers would not overwrite filenames, because they take care to write to a new directory each round. However, if the InteractiveTrajectoryCollector is used independently of those trainers, then it can end up overwriting filenames without the UUID.

Do we need to shuffle the filenames returned by os.listdir after sorting? We could, but the demonstrations loaded from the files are passed to a DataLoader, which shuffles them. That seems like the right place to handle the shuffling rather than making it the responsibility of the utility function that returns the filenames.
@hacobe hacobe requested a review from AdamGleave January 3, 2023 18:45
@AdamGleave AdamGleave changed the title Fix non-determinism in DAgger reported by #643 Fix non-determinism in DAgger (fixes #643) Jan 4, 2023
Copy link
Copy Markdown
Member

@AdamGleave AdamGleave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! The fix looks good to me, I've left some minor suggestions on the tests.

Minor point but in GitHub if you write "fixes #N" it'll close issue N when the PR gets merged which is convenient, so I edited your title to include that rather than "reported by #N"

Comment thread src/imitation/algorithms/dagger.py Outdated
Comment thread src/imitation/algorithms/dagger.py
Comment thread src/imitation/algorithms/dagger.py
Comment thread tests/algorithms/test_dagger.py Outdated
Comment thread tests/algorithms/test_dagger.py Outdated
Comment thread tests/algorithms/test_dagger.py Outdated
@AdamGleave
Copy link
Copy Markdown
Member

There was a lint error originally but bizarrely it seems to have fixed itself on rerun. Never seen a flaky lint error before. My guess is some upstream dependency broke and a hotfix got released in the meantime?

@codecov
Copy link
Copy Markdown

codecov Bot commented Jan 4, 2023

Codecov Report

Merging #649 (5b6b8c3) into master (681cb72) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #649      +/-   ##
==========================================
+ Coverage   97.52%   97.54%   +0.01%     
==========================================
  Files          86       86              
  Lines        8373     8423      +50     
==========================================
+ Hits         8166     8216      +50     
  Misses        207      207              
Impacted Files Coverage Δ
src/imitation/algorithms/dagger.py 100.00% <100.00%> (ø)
tests/algorithms/test_dagger.py 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

hacobe added 4 commits January 4, 2023 09:35
This PR makes the test_trainer_reproducible and test_traj_collector_reproducible more thorough. For test_trainer_reproducible, it tests that the trajectories from rolling out the trained policy are the same each run (instead of just testing that the rewards achieved by the trained policy are the same). For test_traj_collector_reproducible, it tests that the filenames for the files storing DAgger demonstrations are the same each run and that each file in the first run stores the same trajectory as the file with the same filename in the second run (instead of just testing that the observations from the trajectories are the same).
This PR reduces the number of training iterations in test_trainer_reproducible, because the previous number of iterations used was for testing that the policy improved with training, but that's not needed to test reproducibility.
Copy link
Copy Markdown
Member

@AdamGleave AdamGleave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AdamGleave AdamGleave merged commit 8c18397 into master Jan 5, 2023
@AdamGleave AdamGleave deleted the fix-dagger-nondet branch January 5, 2023 01:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants