Fix non-determinism in DAgger (fixes #643)#649
Merged
Conversation
The issue was that DAgger demonstrations were loaded from disk for training in a different order each run. This was because the filenames for the saved demonstrations changed each run and that changed the order in which os.listdir returned the filenames. The filenames changed each run, because they included a timestamp and the first 6 characters of a UUID generated without fixing a random seed. This PR fixes the non-determinism by making the filenames the same each run as long as the same random seed is used. It does so by removing the timestamp from the filename and fixing the seed of the UUID. Because the timestamp is removed, the PR introduces a trajectory index in the filename, so that a user can tell the order in which trajectories were created. It also includes the entire UUID instead of just the first 6 characters. Finally, it sorts the filenames returned by os.listdir. listdir returns filenames in an arbitrary order that depends on the file system implementation (https://stackoverflow.com/questions/31534583/is-os-listdir-deterministic). We sort the filenames to ensure the order is consistent across file systems. Why include a UUID in the filename at all? If we removed the UUID from the filename, then the DAgger trainers would not overwrite filenames, because they take care to write to a new directory each round. However, if the InteractiveTrajectoryCollector is used independently of those trainers, then it can end up overwriting filenames without the UUID. Do we need to shuffle the filenames returned by os.listdir after sorting? We could, but the demonstrations loaded from the files are passed to a DataLoader, which shuffles them. That seems like the right place to handle the shuffling rather than making it the responsibility of the utility function that returns the filenames.
AdamGleave
reviewed
Jan 4, 2023
Member
AdamGleave
left a comment
There was a problem hiding this comment.
Thanks for the PR! The fix looks good to me, I've left some minor suggestions on the tests.
Minor point but in GitHub if you write "fixes #N" it'll close issue N when the PR gets merged which is convenient, so I edited your title to include that rather than "reported by #N"
Member
|
There was a lint error originally but bizarrely it seems to have fixed itself on rerun. Never seen a flaky lint error before. My guess is some upstream dependency broke and a hotfix got released in the meantime? |
Codecov Report
@@ Coverage Diff @@
## master #649 +/- ##
==========================================
+ Coverage 97.52% 97.54% +0.01%
==========================================
Files 86 86
Lines 8373 8423 +50
==========================================
+ Hits 8166 8216 +50
Misses 207 207
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
This PR makes the test_trainer_reproducible and test_traj_collector_reproducible more thorough. For test_trainer_reproducible, it tests that the trajectories from rolling out the trained policy are the same each run (instead of just testing that the rewards achieved by the trained policy are the same). For test_traj_collector_reproducible, it tests that the filenames for the files storing DAgger demonstrations are the same each run and that each file in the first run stores the same trajectory as the file with the same filename in the second run (instead of just testing that the observations from the trajectories are the same).
This PR reduces the number of training iterations in test_trainer_reproducible, because the previous number of iterations used was for testing that the policy improved with training, but that's not needed to test reproducibility.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
The issue (#643) was that DAgger demonstrations were loaded from disk for training in a different order each run. This was because the filenames for the saved demonstrations changed each run and that changed the order in which os.listdir returned the filenames. The filenames changed each run, because they included a timestamp and the first 6 characters of a UUID generated without fixing a random seed.
This PR fixes the non-determinism by making the filenames the same each run as long as the same random seed is used. It does so by removing the timestamp from the filename and fixing the seed of the UUID. Because the timestamp is removed, the PR introduces a trajectory index in the filename, so that a user can tell the order in which trajectories were created. It also includes the entire UUID instead of just the first 6 characters. Finally, it sorts the filenames returned by os.listdir. listdir returns filenames in an arbitrary order that depends on the file system implementation (https://stackoverflow.com/questions/31534583/is-os-listdir-deterministic). We sort the filenames to ensure the order is consistent across file systems.
We discuss a few other design decisions below.
Why include a UUID in the filename at all? If we removed the UUID from the filename, then the DAgger trainers would not overwrite filenames, because they take care to write to a new directory each round of training. However, if the InteractiveTrajectoryCollector is used independently of those trainers, then it can end up overwriting filenames without the UUID.
Do we need to shuffle the filenames returned by os.listdir after sorting? We could, but the demonstrations loaded from the files are passed to a DataLoader, which shuffles them. That seems like the right place to handle the shuffling rather than making it the responsibility of the utility function that returns the filenames.
Testing
This PR also adds unit tests for the reproducibility of the InteractiveTrajectoryCollector, DAggerTrainer, and the SimpleDAggerTrainer. In particular, each unit test consists of running a block of code twice with the same random seeds and checking that the code produces the same result each time.