Curriculum learning reward thresholding bug fix #1141
Branch force-pushed from 7efabd7 to bedefab.
```diff
-        if ((progress > self.data['thresholds'][self.lesson_num]) and
-                (self.lesson_length > self.data['min_lesson_length'])):
+        if progress > self.data['thresholds'][self.lesson_num]:
+            print(progress, 'is above the threshold, successfully incrementing lesson')
```
Why do we print here instead of logging?
Whoops, that was a debug print. Let me remove it.
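For reference, had the message been kept, routing it through the standard `logging` module (as the reviewer suggests) might look like the sketch below. The logger name `"curriculum"` and the helper `report_lesson_increment` are hypothetical and not part of this PR; the project's actual logger may differ.

```python
import logging

# Hypothetical logger name; not taken from the project.
logger = logging.getLogger("curriculum")


def report_lesson_increment(progress: float, threshold: float) -> None:
    # Log at INFO level instead of printing to stdout, so whoever runs the
    # trainer controls visibility through the usual logging configuration.
    logger.info(
        "Progress %.3f is above the threshold %.3f; incrementing lesson",
        progress,
        threshold,
    )
```

Using `%`-style arguments rather than an f-string defers formatting until a handler actually emits the record.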
Merging, approved offline.

Actually merging now, approved offline again.
Summary
Curriculum thresholding based on rewards was shown to be broken (#895). This PR redefines `min_lesson_length` as the minimum number of episodes that must be completed in a lesson. Once that many episodes have completed, the curriculum becomes eligible for increment.

Changes

- `reward_buffer`: a fixed-size queue that stores the cumulative reward of each of the most recent episodes completed by the trainer.
- `trainer_controller.py` uses the average cumulative reward in the `reward_buffer` to determine whether the reward threshold has been met.
- `trainer_controller.py` sets the capacity of the `reward_buffer` when constructing the `PPOTrainer`. The buffer must be at least as large as the curriculum's `min_lesson_length`; in this implementation its capacity is set to exactly that value.
- The `lesson_length` field has been removed from `Curriculum`, since it was a vague metric. The burden of ensuring that the minimum number of episodes has completed now falls on `trainer_controller.py`.
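A minimal sketch of the buffer and eligibility check described in the Changes list, written with `collections.deque(maxlen=...)` as the fixed-size queue. The class `RewardBuffer` and its methods `add_episode`, `lesson_eligible`, and `start_lesson` are hypothetical names for illustration only. Clearing the buffer when a new lesson starts is also an assumption: the PR only says that `trainer_controller.py` is responsible for the per-lesson episode count, not how it does this.

```python
from collections import deque
from statistics import mean


class RewardBuffer:
    """Hypothetical stand-in for the trainer's reward_buffer: a fixed-size
    queue of cumulative rewards from the most recently completed episodes."""

    def __init__(self, min_lesson_length: int):
        if min_lesson_length < 1:
            raise ValueError("min_lesson_length must be at least 1")
        self.min_lesson_length = min_lesson_length
        # Capacity is exactly min_lesson_length, as in the PR description.
        self.rewards = deque(maxlen=min_lesson_length)

    def add_episode(self, cumulative_reward: float) -> None:
        # Appending to a full deque silently drops the oldest episode.
        self.rewards.append(cumulative_reward)

    def start_lesson(self) -> None:
        # Assumption: forget the previous lesson so the episode count
        # and the average only cover the current lesson.
        self.rewards.clear()

    def lesson_eligible(self, threshold: float) -> bool:
        # Not eligible until min_lesson_length episodes have completed.
        if len(self.rewards) < self.min_lesson_length:
            return False
        # Strict comparison, matching `progress > threshold` in the diff.
        return mean(self.rewards) > threshold
```

With `min_lesson_length=3`, two episodes are never enough, however high their rewards. After a third episode the average of the three most recent rewards is compared against the threshold, and each further episode evicts the oldest reward from the window.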