Multi episode value column calculation #19

Laurenstc · 2022-09-12T08:55:54Z

Issue #, if available:

Description of changes:

Allow automatic calculation of Q-values for a WiDataFrame in multi-episode problems.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions · 2022-09-12T08:58:24Z

Coverage Report

File	Stmts	Miss	Cover	Missing
src/a2rl
__init__.py	20	1	95%	34
simulator.py	577	56	90%	456–458, 471, 537, 543, 559–594, 600, 912, 1011, 1026, 1036, 1041, 1057, 1075, 1097, 1123, 1144, 1150, 1167, 1171, 1177, 1187, 1194, 1217, 1223, 1302, 1309, 1364, 1371, 1377, 1384, 1387, 1498, 1514–1516, 1525, 1549, 1588, 1591
tokenizer.py	116	2	98%	64–65
utils.py	161	22	86%	51, 60–63, 74–76, 108, 147, 164, 179–181, 336–339, 475–484, 544, 558
src/a2rl/experimental/lightgpt
lr_decay.py	20	1	95%	30
model.py	116	2	98%	260, 263
simulator.py	35	1	97%	162
src/a2rl/mingpt
model.py	118	4	97%	57, 203, 208, 230
trainer.py	84	14	83%	50–51, 55–57, 108–111, 116, 124–126, 134, 140–141
TOTAL	1549	103	93%

Tests	Skipped	Failures	Errors	Time
234	0 💤	0 ❌	0 🔥	44.764s ⏱️

src/a2rl/_dataframe.py

verdimrc

Thanks for this PR to address a frequent ask. Could you also add a unit test for this new function?

Laurenstc · 2022-09-13T08:03:14Z

add a unit test
I've attempted to add a test - not entirely sure whether this is as expected! Let me know @verdimrc

Laurenstc · 2022-09-13T08:24:26Z

Thanks for this PR to address a frequent ask. Could you also add a unit test for this new function?

I've added the tests! Unsure whether my new fixtures are running properly though!

- Change signature: episode_identifier defaults to a non-empty string.
- Remove the (1-alpha) term, in line with PR #21.
- Update the test acceptance criteria, and make sure to cover the case where df already has a value column that is going to be overwritten.

verdimrc · 2022-09-13T15:33:28Z

src/a2rl/_dataframe.py

+        alpha: float = 0.1,
+        gamma: float = 0.6,
+        value_col: str = "value",
+        episode_identifier: str = "episode",


Changed default to non-empty string.
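
To make the role of these parameters concrete, here is a minimal sketch of a multi-episode value calculation. It is not the a2rl implementation: the function name `add_value_per_episode`, the integer-coded `state`, `action` and `reward` column names, and the handling of the last row of each episode (no bootstrap term) are all assumptions made for illustration. The Q-table is shared across episodes, but an update never bootstraps across an episode boundary.

```python
import numpy as np
import pandas as pd


def add_value_per_episode(
    df: pd.DataFrame,
    alpha: float = 0.1,
    gamma: float = 0.6,
    value_col: str = "value",
    episode_identifier: str = "episode",
) -> pd.DataFrame:
    """Hypothetical helper (not the a2rl API): tabular Q-learning per episode.

    Assumes integer-coded "state" and "action" columns and a numeric "reward".
    """
    states = df["state"].to_numpy()
    actions = df["action"].to_numpy()
    rewards = df["reward"].to_numpy(dtype=float)
    episodes = df[episode_identifier].to_numpy()

    # One Q-table shared by all episodes.
    q_table = np.zeros((states.max() + 1, actions.max() + 1))
    values = np.zeros(len(df))

    # Visit episodes in order of first appearance.
    for ep in pd.unique(episodes):
        rows = np.flatnonzero(episodes == ep)  # positional rows of this episode
        for i, row in enumerate(rows):
            s, a = states[row], actions[row]
            is_last = i == len(rows) - 1
            # Assumption: the last row of an episode has no successor state.
            next_max = 0.0 if is_last else np.max(q_table[states[rows[i + 1]]])
            old_value = q_table[s, a]
            q_table[s, a] = old_value + alpha * (rewards[row] + gamma * next_max - old_value)
            values[row] = q_table[s, a]

    out = df.copy()
    out[value_col] = values  # overwrites an existing value column, if any
    return out
```

Returning a copy keeps the input frame untouched, which makes the "value column already exists" case easy to check in a test.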

verdimrc · 2022-09-13T15:34:49Z

src/a2rl/_dataframe.py

+
+                    if sarsa:
+                        next_value = q_table[next_state, np.argmax(q_table[next_state])]
+                        new_value = old_value + alpha * (reward + gamma * next_value - old_value)


Remove the (1-alpha) term from old_value. This aligns with the canonical Q-learning and SARSA update formulas, and is in line with PR #21, which fixes add_value().

verdimrc · 2022-09-13T15:35:21Z

src/a2rl/_dataframe.py

+                        new_value = old_value + alpha * (reward + gamma * next_value - old_value)
+                    else:
+                        next_max = np.max(q_table[next_state])
+                        new_value = old_value + alpha * (reward + gamma * next_max - old_value)


Remove the (1-alpha) term from old_value. This aligns with the canonical Q-learning and SARSA update formulas, and is in line with PR #21, which fixes add_value().
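
For readers following the reasoning, the sketch below compares the two equivalent ways of writing the temporal-difference update with a mixed form that applies both. The pre-change code is not shown in this thread, so `td_update_mixed` is only an assumption about what "the (1-alpha) term" referred to; all three function names are hypothetical. The last check notes that, in the hunks as quoted, indexing with `np.argmax` yields the same number as `np.max`, so both branches compute the same target.

```python
import math

import numpy as np


def td_update(old_value, reward, next_value, alpha=0.1, gamma=0.6):
    """Canonical form: move old_value toward the target by a step of alpha."""
    return old_value + alpha * (reward + gamma * next_value - old_value)


def td_update_weighted(old_value, reward, next_value, alpha=0.1, gamma=0.6):
    """Algebraically identical form: weighted average of old value and target."""
    return (1 - alpha) * old_value + alpha * (reward + gamma * next_value)


def td_update_mixed(old_value, reward, next_value, alpha=0.1, gamma=0.6):
    """Assumed pre-fix form: mixing both forms subtracts alpha * old_value twice."""
    return (1 - alpha) * old_value + alpha * (reward + gamma * next_value - old_value)


canonical = td_update(2.0, 1.0, 3.0)
weighted = td_update_weighted(2.0, 1.0, 3.0)
mixed = td_update_mixed(2.0, 1.0, 3.0)

# The two canonical spellings agree; the mixed one is off by alpha * old_value.
assert math.isclose(canonical, weighted)
assert math.isclose(canonical - mixed, 0.1 * 2.0)

# Picking the value at the argmax is the same as taking the max of the row.
q_row = np.array([0.2, 0.9, 0.5])
assert q_row[np.argmax(q_row)] == np.max(q_row)
```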

verdimrc

@Laurenstc: your changes are approved.

I also proposed some changes, kindly review if they make sense (and revert if not). Whatever the case, you're good to hit the merge button :)

verdimrc self-requested a review September 13, 2022 02:32

verdimrc reviewed Sep 13, 2022

src/a2rl/_dataframe.py (outdated, resolved)

verdimrc suggested changes Sep 13, 2022

verdimrc self-assigned this Sep 13, 2022

Laurenstc added 5 commits September 13, 2022 09:56

multi episode q val

5f53e69

small update on docstring

8b8eb6c

changes to PR for fixing linter

f020e3a

pytest changes for multi-episode process

1ca3bbb

pytest changes for multi-episode process

798abc7

Laurenstc force-pushed the feature/multi-epoch-q-value branch from 2bc4e0c to 798abc7 Compare September 13, 2022 08:02

Laurenstc added 2 commits September 13, 2022 10:17

fixing tests fixtures

dc48e7a

fixture mistake

9cbcfcb

Laurenstc closed this Sep 13, 2022

Laurenstc reopened this Sep 13, 2022

Laurenstc requested a review from verdimrc September 13, 2022 12:07

Update multi-episode add value and its tests

c46c4ce

- Change signature: episode_identifier defaults to a non-empty string.
- Remove the (1-alpha) term, in line with PR #21.
- Update the test acceptance criteria, and make sure to cover the case where df already has a value column that is going to be overwritten.

verdimrc reviewed Sep 13, 2022

Add missing assert to test_add_multi_episode_value_reward_only()

62d7be5

verdimrc approved these changes Sep 13, 2022

Laurenstc merged commit 0ac93f9 into main Sep 14, 2022

verdimrc deleted the feature/multi-epoch-q-value branch September 14, 2022 07:45
