Multi episode value column calculation #19

Merged
merged 9 commits into from
Sep 14, 2022

Laurenstc
Contributor

Issue #, if available:

Description of changes:

  • Allow automatic calculation of q-value for WIDataframe for multi process problems

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@github-actions

github-actions bot commented Sep 12, 2022

Coverage

Coverage Report
| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| src/a2rl/__init__.py | 20 | 1 | 95% | 34 |
| src/a2rl/simulator.py | 577 | 56 | 90% | 456–458, 471, 537, 543, 559–594, 600, 912, 1011, 1026, 1036, 1041, 1057, 1075, 1097, 1123, 1144, 1150, 1167, 1171, 1177, 1187, 1194, 1217, 1223, 1302, 1309, 1364, 1371, 1377, 1384, 1387, 1498, 1514–1516, 1525, 1549, 1588, 1591 |
| src/a2rl/tokenizer.py | 116 | 2 | 98% | 64–65 |
| src/a2rl/utils.py | 161 | 22 | 86% | 51, 60–63, 74–76, 108, 147, 164, 179–181, 336–339, 475–484, 544, 558 |
| src/a2rl/experimental/lightgpt/lr_decay.py | 20 | 1 | 95% | 30 |
| src/a2rl/experimental/lightgpt/model.py | 116 | 2 | 98% | 260, 263 |
| src/a2rl/experimental/lightgpt/simulator.py | 35 | 1 | 97% | 162 |
| src/a2rl/mingpt/model.py | 118 | 4 | 97% | 57, 203, 208, 230 |
| src/a2rl/mingpt/trainer.py | 84 | 14 | 83% | 50–51, 55–57, 108–111, 116, 124–126, 134, 140–141 |
| TOTAL | 1549 | 103 | 93% | |

Tests Skipped Failures Errors Time
234 0 💤 0 ❌ 0 🔥 44.764s ⏱️

@verdimrc verdimrc self-requested a review September 13, 2022 02:32
src/a2rl/_dataframe.py Outdated Show resolved Hide resolved
Contributor

@verdimrc verdimrc left a comment

Thanks for this PR to address a frequent ask. Could you also add a unit test for this new function?

@verdimrc verdimrc self-assigned this Sep 13, 2022
@Laurenstc
Contributor Author

> add a unit test

I've attempted to add a test - not entirely sure whether this is as expected! Let me know @verdimrc

@Laurenstc
Contributor Author

> Thanks for this PR to address a frequent ask. Could you also add a unit test for this new function?

I've added the tests! Unsure whether my new fixtures are running properly though!

- Change signature: episode_identifier defaults to non-empty string
- Remove (1-alpha) term, in line with PR #21
- Update test acceptance criteria, and make sure to cover the case
  where df already has a value column which is going to be overwritten.
alpha: float = 0.1,
gamma: float = 0.6,
value_col: str = "value",
episode_identifier: str = "episode",
Contributor

Changed default to non-empty string.
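
As a rough illustration of why an episode identifier matters for a multi-episode value calculation, the sketch below pairs each row with its successor inside the same episode, so the last step of one episode never bootstraps from the first step of the next. This is not the a2rl implementation: the function name `pair_rows_within_episodes`, the row layout, and the use of plain dictionaries instead of a dataframe are all assumptions made for the example; only the default column name `"episode"` comes from the signature above.

```python
from itertools import groupby
from operator import itemgetter


def pair_rows_within_episodes(rows, episode_identifier="episode"):
    """Hypothetical helper: map each episode id to (row, next_row) pairs.

    The last row of every episode is paired with None, so a value update
    never looks across an episode boundary.
    """
    pairs_by_episode = {}
    # groupby only merges adjacent rows, so sort by episode first.
    # sorted() is stable, which preserves the step order inside an episode.
    ordered = sorted(rows, key=itemgetter(episode_identifier))
    for episode_id, group in groupby(ordered, key=itemgetter(episode_identifier)):
        steps = list(group)
        successors = steps[1:] + [None]
        pairs_by_episode[episode_id] = list(zip(steps, successors))
    return pairs_by_episode


# Toy data (made up for illustration): two episodes of two steps each.
rows = [
    {"episode": 0, "state": 0, "action": 1, "reward": 1.0},
    {"episode": 0, "state": 1, "action": 0, "reward": 0.0},
    {"episode": 1, "state": 2, "action": 1, "reward": 2.0},
    {"episode": 1, "state": 0, "action": 0, "reward": 0.5},
]
pairs = pair_rows_within_episodes(rows)
```

Here the second row of episode 0 is paired with `None` rather than with the first row of episode 1, which is the behaviour a per-episode calculation needs.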


if sarsa:
    next_value = q_table[next_state, np.argmax(q_table[next_state])]
    new_value = old_value + alpha * (reward + gamma * next_value - old_value)
Contributor

Removed the (1-alpha) term from old_value. This aligns with the canonical q-value and sarsa formulas, and is in line with PR #21, which fixes add_value().

    new_value = old_value + alpha * (reward + gamma * next_value - old_value)
else:
    next_max = np.max(q_table[next_state])
    new_value = old_value + alpha * (reward + gamma * next_max - old_value)
Contributor

Removed the (1-alpha) term from old_value. This aligns with the canonical q-value and sarsa formulas, and is in line with PR #21, which fixes add_value().
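
For reference, a minimal standalone sketch of the update in the `else` branch above, using nested lists rather than a NumPy array so it runs with the standard library alone. The function name `td_update` and the toy table are assumptions made for illustration; the default `alpha` and `gamma` values are taken from the signature shown earlier in this review.

```python
def td_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.6):
    """Apply one tabular update in place and return the new value.

    Hypothetical helper mirroring the reviewed expression:
    new = old + alpha * (reward + gamma * max(Q[next_state]) - old)
    """
    old_value = q_table[state][action]
    next_max = max(q_table[next_state])
    new_value = old_value + alpha * (reward + gamma * next_max - old_value)
    q_table[state][action] = new_value
    return new_value


# Toy table (made up for illustration): 2 states x 2 actions.
q_table = [[0.0, 0.0], [1.0, 2.0]]
updated = td_update(q_table, state=0, action=1, reward=1.0, next_state=1)
# updated is approximately 0.22: 0.0 + 0.1 * (1.0 + 0.6 * 2.0 - 0.0)
```

This form is algebraically equal to `(1 - alpha) * old_value + alpha * (reward + gamma * next_max)`. Combining a `(1 - alpha)` factor on `old_value` with the `- old_value` inside the parentheses would subtract the old value twice, so only one of the two forms should be used.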

Contributor

@verdimrc verdimrc left a comment

@Laurenstc: your changes approved.

I also proposed some changes; kindly review whether they make sense (and revert if not). Whatever the case, you're good to hit the merge button :)

@Laurenstc Laurenstc merged commit 0ac93f9 into main Sep 14, 2022
@verdimrc verdimrc deleted the feature/multi-epoch-q-value branch September 14, 2022 07:45