
Enhancement/train batch function #107

Merged

Conversation


@djbyrne djbyrne commented Jul 3, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typo or docs fixes)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

This is in relation to Lightning-AI/pytorch-lightning#2453. Although that is a PL issue, further discussion showed that it would be handled with the current implementation of PL. This PR is a proof of concept outlining a cleaner interface for online batch generation for RL and unsupervised learning.
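To make the idea concrete, here is a minimal sketch of online batch generation: an iterable source that produces experience tuples from live agent-environment interaction instead of a fixed dataset. All names (`ToyEnv`, `OnlineExperienceSource`) are hypothetical illustrations, not the PR's actual API; in the real code the source would subclass `torch.utils.data.IterableDataset` so a `DataLoader` can consume it.

```python
class ToyEnv:
    """Minimal stand-in environment: each episode ends after `horizon` steps."""

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon  # next_state, reward, done


class OnlineExperienceSource:
    """Yields (state, action, reward, done, next_state) tuples as the agent steps.

    In the real code this would subclass torch.utils.data.IterableDataset
    so a DataLoader can batch the stream on the fly."""

    def __init__(self, env, policy, samples_per_epoch: int = 12) -> None:
        self.env = env
        self.policy = policy  # callable: state -> action
        self.samples_per_epoch = samples_per_epoch

    def __iter__(self):
        state = self.env.reset()
        for _ in range(self.samples_per_epoch):
            action = self.policy(state)
            next_state, reward, done = self.env.step(action)
            yield state, action, reward, done, next_state
            # start a fresh episode whenever the current one terminates
            state = self.env.reset() if done else next_state


batch = list(OnlineExperienceSource(ToyEnv(), policy=lambda s: 0))
print(len(batch))    # 12
print(batch[4][3])   # True: the 5th step terminates the first episode
```

Because the data is generated inside `__iter__`, each "epoch" samples fresh interaction rather than replaying a static dataset, which is the core of the interface this PR prototypes.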

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

👍


pep8speaks commented Jul 3, 2020

Hello @djbyrne! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-07-11 13:10:37 UTC

@mergify mergify bot requested a review from Borda July 3, 2020 10:20

codecov-commenter commented Jul 3, 2020

Codecov Report

Merging #107 into master will increase coverage by 0.07%.
The diff coverage is 98.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #107      +/-   ##
==========================================
+ Coverage   91.91%   91.98%   +0.07%     
==========================================
  Files          77       78       +1     
  Lines        3944     4018      +74     
==========================================
+ Hits         3625     3696      +71     
- Misses        319      322       +3     
Flag Coverage Δ
#unittests 91.98% <98.66%> (+0.07%) ⬆️
Impacted Files Coverage Δ
pl_bolts/models/rl/n_step_dqn_model.py 100.00% <ø> (ø)
pl_bolts/datamodules/experience_source.py 97.72% <97.72%> (ø)
pl_bolts/datamodules/__init__.py 100.00% <100.00%> (ø)
pl_bolts/models/rl/common/experience.py 97.08% <100.00%> (+0.14%) ⬆️
pl_bolts/models/rl/double_dqn_model.py 74.19% <100.00%> (ø)
pl_bolts/models/rl/dqn_model.py 82.35% <100.00%> (-0.18%) ⬇️
pl_bolts/models/rl/noisy_dqn_model.py 77.77% <100.00%> (ø)
pl_bolts/models/rl/per_dqn_model.py 80.48% <100.00%> (-0.47%) ⬇️
...l_bolts/models/rl/vanilla_policy_gradient_model.py 96.49% <100.00%> (-1.22%) ⬇️
... and 1 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 024b574...04f02cd. Read the comment docs.

@Borda Borda added the enhancement New feature or request label Jul 3, 2020
@Borda Borda requested a review from williamFalcon July 3, 2020 10:47

@justusschock justusschock left a comment


Thanks a lot! While this already looks really good and clean, I also added some questions.

pl_bolts/datamodules/experience_source.py (two outdated review threads, resolved)

return experience, reward, done

def run_episode(self, device: torch.device) -> float:
Member

is episode a common RL term for this? Intuitively I would have called this sequence...

Contributor Author

@djbyrne djbyrne Jul 6, 2020


It depends on the task. Most tasks are episodic in some form and will have a termination state denoting the end of the episode. This function was originally used for carrying out a validation episode, and it is useful for that.
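For readers unfamiliar with the terminology: an "episode" is a standard RL term for one complete rollout from reset to a terminal state. A validation-style `run_episode` can be sketched as below; this is a generic illustration, not the PR's implementation, whose method lives on the experience source and also takes a `torch.device` argument.

```python
def run_episode(env, policy) -> float:
    """Roll out one full episode (reset to terminal state) and return the
    total reward, e.g. for validation."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward


class CountdownEnv:
    """Toy episodic env: terminates after 3 steps, reward 1.0 per step."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3


print(run_episode(CountdownEnv(), policy=lambda s: 0))  # 3.0
```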

pl_bolts/datamodules/experience_source.py (review thread, resolved)
return reward, final_state, done


class EpisodicExperienceStream(ExperienceSource, IterableDataset):
Member

Same question about wording with episodic
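What distinguishes an "episodic" stream is the unit it yields: whole episodes rather than individual steps. A hypothetical sketch (`episodic_stream` and `ThreeStepEnv` are illustrative names, not the PR's classes):

```python
def episodic_stream(env, policy, max_episodes: int = 2):
    """Yield one *complete* episode (a list of step tuples) per iteration,
    rather than one step at a time."""
    for _ in range(max_episodes):
        episode, state, done = [], env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            episode.append((state, action, reward, done, next_state))
            state = next_state
        yield episode


class ThreeStepEnv:
    """Toy env whose episodes always last exactly 3 steps."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3


episodes = list(episodic_stream(ThreeStepEnv(), policy=lambda s: 0))
print(len(episodes), len(episodes[0]))  # 2 3
```

Yielding full episodes suits algorithms such as REINFORCE that need complete returns, whereas step-wise streams suit replay-buffer methods like DQN.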

@@ -5,4 +5,4 @@
 from pl_bolts.models.rl.noisy_dqn_model import NoisyDQN
 from pl_bolts.models.rl.per_dqn_model import PERDQN
 from pl_bolts.models.rl.reinforce_model import Reinforce
-from pl_bolts.models.rl.vanilla_policy_gradient_model import PolicyGradient
+# from pl_bolts.models.rl.vanilla_policy_gradient_model import PolicyGradient
Member

why did you change this one?

Contributor Author

I meant to raise an issue about this: some of these imports in the `__init__` files are raising errors in my runs. I'll look into specifically why that is happening and update this.

pl_bolts/models/rl/common/experience.py (review thread, resolved)

  self.reward_list = []
  for _ in range(100):
-     self.reward_list.append(0)
+     self.reward_list.append(torch.tensor(0))
Member

maybe add a device here?
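The reviewer's point: creating the placeholder tensors directly on the target device avoids a later host-to-device copy. A sketch of that suggestion, with `make_reward_buffer` as a hypothetical name, not the PR's API:

```python
import torch


def make_reward_buffer(n: int = 100, device: torch.device = torch.device("cpu")):
    """Create zero-reward placeholder tensors directly on `device`,
    so no .to(device) transfer is needed later."""
    return [torch.tensor(0.0, device=device) for _ in range(n)]


rewards = make_reward_buffer()
print(len(rewards), rewards[0].device.type)  # 100 cpu
```

Passing `device=torch.device("cuda")` (when available) would place the buffer on the GPU at creation time.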

pl_bolts/models/rl/vanilla_policy_gradient_model.py (outdated review thread, resolved)
djbyrne and others added 2 commits July 6, 2020 17:23
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

Borda commented Jul 8, 2020

From @djbyrne: there are currently two things blocking the latest PR for the RL bolts.

  1. The package installs in CI are failing and the docs are not building. These issues appear to stem from problems outside of this PR.
  2. Currently, all the models are imported in the rl __init__.py file. For whatever reason, I am unable to run the VPG model when VPG is imported through the init; when I remove it from the init, it works fine. I am sure there is a very obvious fix here, but I can't seem to find it.
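The failure described in point 2 is a classic symptom of a circular import: eagerly importing every model in the package `__init__.py` can trigger a dependency cycle at package-import time. One common, generic workaround (not necessarily the fix applied in this repository) is to defer the import into the function that needs it. The sketch below uses the stdlib `json` module as a stand-in for the model module:

```python
def get_heavy_dependency():
    # Lazy import: the module is imported only when this function is first
    # called, not when the enclosing package is imported. This breaks cycles
    # that would otherwise fail during package initialization.
    import json
    return json


print(get_heavy_dependency().dumps({"ok": True}))  # {"ok": true}
```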

@williamFalcon williamFalcon merged commit ca38ad1 into Lightning-Universe:master Jul 11, 2020
@djbyrne djbyrne mentioned this pull request Jul 12, 2020
Labels: enhancement (New feature or request)

6 participants