@ervteng ervteng commented Jun 11, 2019

We aren't clearing the list self.cumulative_returns_since_policy_update when we update the policy. This list is used to compute the mean reward written to the CSV file, so it keeps growing throughout training.

This PR clears it when we update the policy.

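Before this fix, the mean rewards in the CSV file lagged far behind the values reported elsewhere in the code, because the buffer was never cleared and each mean was averaged over every episode since the start of training.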
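To illustrate the lag, here is a minimal, self-contained sketch. It is not the actual trainer code: the RewardTracker class, its methods, and the simulate helper are hypothetical names invented for this example. Only the attribute name cumulative_returns_since_policy_update comes from the description above.

```python
from statistics import mean
from typing import List


class RewardTracker:
    """Hypothetical stand-in for the trainer's reward bookkeeping."""

    def __init__(self, clear_on_update: bool) -> None:
        # Toggle between the old behavior (False) and the fixed one (True).
        self.clear_on_update = clear_on_update
        self.cumulative_returns_since_policy_update = []

    def end_episode(self, episode_return: float) -> None:
        # Record the return of a finished episode.
        self.cumulative_returns_since_policy_update.append(episode_return)

    def update_policy(self) -> float:
        # The mean that would be written to the CSV file for this update.
        mean_reward = mean(self.cumulative_returns_since_policy_update)
        if self.clear_on_update:
            # The fix: start a fresh buffer after every policy update.
            self.cumulative_returns_since_policy_update.clear()
        return mean_reward


def simulate(clear_on_update: bool) -> List[float]:
    """Run 3 policy updates with 2 episodes each; returns improve by 10 per update."""
    tracker = RewardTracker(clear_on_update)
    csv_rows = []
    for update in range(3):
        for _ in range(2):
            tracker.end_episode(float(update * 10))
        csv_rows.append(tracker.update_policy())
    return csv_rows


print(simulate(clear_on_update=False))  # [0.0, 5.0, 10.0]
print(simulate(clear_on_update=True))   # [0.0, 10.0, 20.0]
```

Without clearing, the third row reports 10.0 even though the most recent episodes returned 20.0, because older episodes still weigh on the average. With clearing, each row reflects only the episodes collected since the previous policy update.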
@ervteng ervteng changed the base branch from master to develop June 11, 2019 01:37
@ervteng ervteng requested a review from xiaomaogy June 11, 2019 01:38
@xiaomaogy xiaomaogy merged commit c5226f6 into develop Jun 11, 2019
@xiaomaogy xiaomaogy deleted the develop-fix-csvwriting branch June 11, 2019 17:56
sankalp04 pushed a commit that referenced this pull request Jun 21, 2019
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 18, 2021