Curriculum learning reward thresholding bug fix #1141

pderichai · 2018-08-28T20:18:43Z

Summary

Reward-based curriculum thresholding was shown to be broken in #895. This PR redefines min_lesson_length as the minimum number of episodes that must be completed in a lesson. Once that many episodes have completed, the curriculum becomes eligible to increment.

Changes

The PPOTrainer now holds a reward_buffer. This buffer is a fixed-size queue that stores the cumulative reward given by the most recent episodes completed by the trainer.
trainer_controller.py uses the average cumulative reward in the reward_buffer to determine whether the reward threshold has been met. It sets the capacity of reward_buffer when constructing the PPOTrainer. The buffer must hold at least min_lesson_length episodes; in this implementation its capacity is set to exactly min_lesson_length.
The lesson_length field has been removed from Curriculum because it was a vague metric. Responsibility for ensuring that the minimum number of episodes has completed now lies with trainer_controller.py.
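
For readers unfamiliar with the pattern, the buffer-based check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the RewardBuffer class and the lesson_should_increment helper are hypothetical names, and the sketch assumes the buffer capacity equals min_lesson_length, as stated above.

```python
from collections import deque
from statistics import mean


class RewardBuffer:
    """Hypothetical fixed-size queue of recent cumulative episode rewards."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # deque(maxlen=...) drops the oldest entry once the buffer is full.
        self._rewards = deque(maxlen=capacity)

    def add(self, cumulative_reward):
        self._rewards.append(cumulative_reward)

    def is_full(self):
        return len(self._rewards) == self._rewards.maxlen

    def average(self):
        return mean(self._rewards)


def lesson_should_increment(buffer, threshold):
    # Assumption: capacity == min_lesson_length, so a full buffer means the
    # minimum number of episodes has completed. Only then is the mean
    # cumulative reward compared against the lesson threshold.
    return buffer.is_full() and buffer.average() > threshold


# Usage: min_lesson_length = 3, reward threshold = 0.6.
buffer = RewardBuffer(capacity=3)
for episode_reward in [0.2, 0.9, 1.0]:
    buffer.add(episode_reward)
print(lesson_should_increment(buffer, threshold=0.6))
```

Because the queue has a fixed length, the average always reflects only the most recent episodes, so early low-reward episodes stop dragging the measure down once they fall out of the window.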

awjuliani · 2018-09-04T21:24:05Z

ml-agents/mlagents/trainers/curriculum.py

-            if ((progress > self.data['thresholds'][self.lesson_num]) and
-                    (self.lesson_length > self.data['min_lesson_length'])):
+            if progress > self.data['thresholds'][self.lesson_num]:
+                print(progress, 'is above the threshold, successfully incrementing lesson')


Why do we print here instead of logging?

pderichai · Sep 4, 2018

Whoops, that was a debug print. Let me remove that.
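
For context on the question above: if a message like this were worth keeping rather than removing, the standard library's logging module would be the usual replacement for print. The sketch below is illustrative only; the logger name and the report_increment helper are hypothetical and do not come from this PR.

```python
import logging

# Hypothetical logger name, chosen only for this example.
logger = logging.getLogger("curriculum_example")


def report_increment(measure_val, threshold, lesson_num):
    """Return True and log a message when the measure exceeds the threshold."""
    if measure_val > threshold:
        # Lazy %-style formatting: the string is only built if INFO is enabled.
        logger.info(
            "Lesson %d: measure %.3f is above threshold %.3f, incrementing lesson.",
            lesson_num, measure_val, threshold,
        )
        return True
    return False
```

Unlike print, the message can then be filtered by level or routed to a file through the logging configuration.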

pderichai · 2018-09-04T23:07:26Z

Merging, approved offline.

pderichai · 2018-09-04T23:59:37Z

Actually merging now, approved offline again.

pderichai requested review from awjuliani and vincentpierre August 28, 2018 20:18

Deric Pang added 3 commits August 31, 2018 13:38

Fixing curriculum.py line lengths.

2371f97

Fixing meta_curriculum.py line lengths.

49529c2

Using fixed length buffer to store rewards.

bedefab

pderichai force-pushed the develop-cl-bug-fix branch from 7efabd7 to bedefab August 31, 2018 20:41

pderichai changed the base branch from develop to release-v0.5 August 31, 2018 20:41

pderichai removed the request for review from vincentpierre August 31, 2018 20:44

Changing default config.

41ec882

awjuliani reviewed Sep 4, 2018

Deric Pang added 3 commits September 4, 2018 14:32

Removing print statements.

60ed256

Merge remote-tracking branch 'upstream/release-v0.5' into develop-cl-bug-fix

45ec020

Changing default curriculum values. Updating docs.

8d126fe

Deric Pang added 5 commits September 4, 2018 16:17

Adding more details to documentation.

c2ceb69

Added migration instructions for curriculum learning.

cf8bd6a

Small style cleanup in curriculum.

251975e

Setting WallJump Academy max steps to 0.

5133ee4

Replacing progress with measure_val everywhere.

79d321f

pderichai merged commit a4e7140 into release-v0.5 Sep 5, 2018

pderichai deleted the develop-cl-bug-fix branch September 5, 2018 00:00

pderichai mentioned this pull request Sep 5, 2018

Reward thresholding not working #895

Closed

github-actions bot locked as resolved and limited conversation to collaborators May 19, 2021
