Multi episode value column calculation #19
Thanks for this PR; it addresses a frequent request. Could you also add a unit test for the new function?
Branch updated from `2bc4e0c` to `798abc7`.
I've added the tests! I'm unsure whether my new fixtures are running properly, though!
- Change signature: `episode_identifier` now defaults to a non-empty string.
- Remove the `(1 - alpha)` term, in line with PR #21.
- Update the test acceptance criteria, and make sure to cover the case where `df` already has values that will be overwritten.
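The behaviour described in this commit can be sketched roughly as follows. This is a minimal, stdlib-only illustration, not the PR's implementation: the function name `add_value_multi_episode`, the row layout (a list of dicts with `state`, `action`, `reward`, `next_state` keys standing in for DataFrame columns), the integer state/action encoding, and the choice to treat the last transition of each episode as terminal are all assumptions. It uses the canonical update without a `(1 - alpha)` factor and overwrites whatever is already stored in `value_col`.

```python
from typing import Dict, List


def add_value_multi_episode(  # hypothetical name; not the PR's actual function
    rows: List[Dict],
    n_states: int,
    n_actions: int,
    alpha: float = 0.1,
    gamma: float = 0.6,
    value_col: str = "value",
    episode_identifier: str = "episode",
    sarsa: bool = False,
) -> List[Dict]:
    """Write a Q-value into rows[i][value_col], one tabular update per transition."""
    q_table = [[0.0] * n_actions for _ in range(n_states)]

    # Collect row indices per episode, keeping the original row order.
    episodes: Dict[object, List[int]] = {}
    for i, row in enumerate(rows):
        episodes.setdefault(row[episode_identifier], []).append(i)

    for indices in episodes.values():
        for pos, i in enumerate(indices):
            row = rows[i]
            s, a = row["state"], row["action"]
            reward, next_state = row["reward"], row["next_state"]
            old_value = q_table[s][a]

            if pos == len(indices) - 1:
                # Assumption: the last transition of an episode is terminal,
                # so nothing is bootstrapped across episode boundaries.
                target = reward
            elif sarsa:
                # SARSA bootstraps from the action actually taken next
                # (assumes the next row starts from next_state).
                next_action = rows[indices[pos + 1]]["action"]
                target = reward + gamma * q_table[next_state][next_action]
            else:
                # Q-learning bootstraps from the greedy next action.
                target = reward + gamma * max(q_table[next_state])

            # Canonical update: no (1 - alpha) factor on old_value.
            new_value = old_value + alpha * (target - old_value)
            q_table[s][a] = new_value
            row[value_col] = new_value  # overwrites any pre-existing value
    return rows
```

Grouping by `episode_identifier` is what keeps the final step of one episode from bootstrapping off the first state of the next one in this sketch.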
```python
alpha: float = 0.1,
gamma: float = 0.6,
value_col: str = "value",
episode_identifier: str = "episode",
```
Changed the `episode_identifier` default to a non-empty string.
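One way an `episode_identifier` column could be used is to split a transition log into contiguous episodes before running the updates. A minimal sketch, assuming rows are dicts already ordered so that each episode's transitions are contiguous; `split_episodes` is a hypothetical helper, not part of the PR. In this sketch an empty column name is rejected up front.

```python
from itertools import groupby
from typing import Dict, List


def split_episodes(rows: List[Dict], episode_identifier: str = "episode") -> List[List[Dict]]:
    """Split consecutive rows into episodes by the value in `episode_identifier`."""
    if not episode_identifier:
        # Mirrors the non-empty default: an empty column name is an error here.
        raise ValueError("episode_identifier must be a non-empty column name")
    # groupby only merges *consecutive* rows with equal keys, so each
    # episode's rows must be contiguous.
    return [list(group) for _, group in groupby(rows, key=lambda r: r[episode_identifier])]
```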
```python
if sarsa:
    next_value = q_table[next_state, np.argmax(q_table[next_state])]
    new_value = old_value + alpha * (reward + gamma * next_value - old_value)
```
Removed the `(1 - alpha)` factor on `old_value`. This aligns with the canonical Q-learning/SARSA update formula and is in line with PR #21, which fixes `add_value()`.
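For reference, a single tabular step with the canonical update might look like the sketch below. It is stdlib-only, with hypothetical names (`td_update`, `sarsa_target`, `q_learning_target`) and a list-of-lists Q-table. One difference from the snippet above is worth noting: textbook SARSA bootstraps from the action actually taken in the next step, whereas `q_table[next_state, np.argmax(q_table[next_state])]` selects the greedy action and therefore equals `np.max(q_table[next_state])`, which is the Q-learning target.

```python
from typing import List

QTable = List[List[float]]


def sarsa_target(q: QTable, reward: float, next_state: int, next_action: int, gamma: float) -> float:
    # On-policy: value of the action actually taken in next_state.
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q: QTable, reward: float, next_state: int, gamma: float) -> float:
    # Off-policy: best action value available in next_state.
    return reward + gamma * max(q[next_state])


def td_update(q: QTable, state: int, action: int, target: float, alpha: float) -> float:
    # Canonical form: old + alpha * (target - old), no (1 - alpha) factor.
    old_value = q[state][action]
    new_value = old_value + alpha * (target - old_value)
    q[state][action] = new_value
    return new_value
```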
```python
else:
    next_max = np.max(q_table[next_state])
    new_value = old_value + alpha * (reward + gamma * next_max - old_value)
```
Removed the `(1 - alpha)` factor on `old_value` in this branch as well. This aligns with the canonical Q-learning/SARSA update formula and is in line with PR #21, which fixes `add_value()`.
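Why dropping the factor gives the canonical rule: the incremental form and the blended form are the same update, so the `(1 - alpha)` factor belongs to one form or the other, never both. Assuming the removed code scaled `old_value` by `(1 - alpha)` while still subtracting `old_value` inside the bracket, the current estimate would be discounted twice, as the last line shows (for SARSA, replace the max term with $Q(s', a')$ for the next action taken):

```latex
\begin{aligned}
Q(s,a) &\leftarrow Q(s,a) + \alpha\big[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\big] \\
       &= (1-\alpha)\,Q(s,a) + \alpha\big[r + \gamma \max_{a'} Q(s',a')\big] \\[4pt]
(1-\alpha)\,Q(s,a) + \alpha\big[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\big]
       &= (1-2\alpha)\,Q(s,a) + \alpha\big[r + \gamma \max_{a'} Q(s',a')\big]
\end{aligned}
```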
@Laurenstc: your changes are approved.
I also proposed some changes; kindly review whether they make sense (and revert them if not). Either way, you're good to hit the merge button :)
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.