
Enhancement/train batch function #107

Merged

Conversation


@djbyrne djbyrne commented Jul 3, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typo or docs fixes)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

This is in relation to Lightning-AI/pytorch-lightning#2453. Although that is a PL issue, further discussion showed that it would be handled with the current implementation of PL. This PR is a proof of concept outlining a cleaner interface for online batch generation for RL and unsupervised learning.
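To make the idea concrete, here is a minimal sketch of online batch generation: an iterable source that produces experience tuples from live agent-environment interaction instead of a fixed dataset. All names (`ToyEnv`, `OnlineExperienceSource`) are hypothetical illustrations, not the PR's actual API; in the real code the source would subclass `torch.utils.data.IterableDataset` so a `DataLoader` can consume it.

```python
class ToyEnv:
    """Minimal stand-in environment: each episode ends after `horizon` steps."""

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon  # next_state, reward, done


class OnlineExperienceSource:
    """Yields (state, action, reward, done, next_state) tuples as the agent steps.

    In the real code this would subclass torch.utils.data.IterableDataset
    so a DataLoader can batch the stream on the fly."""

    def __init__(self, env, policy, samples_per_epoch: int = 12) -> None:
        self.env = env
        self.policy = policy  # callable: state -> action
        self.samples_per_epoch = samples_per_epoch

    def __iter__(self):
        state = self.env.reset()
        for _ in range(self.samples_per_epoch):
            action = self.policy(state)
            next_state, reward, done = self.env.step(action)
            yield state, action, reward, done, next_state
            # start a fresh episode whenever the current one terminates
            state = self.env.reset() if done else next_state


batch = list(OnlineExperienceSource(ToyEnv(), policy=lambda s: 0))
print(len(batch))    # 12
print(batch[4][3])   # True: the 5th step terminates the first episode
```

Because the data is generated inside `__iter__`, each "epoch" samples fresh interaction rather than replaying a static dataset, which is the core of the interface this PR prototypes.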

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

👍


pep8speaks commented Jul 3, 2020

Hello @djbyrne! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-07-11 13:10:37 UTC

@mergify mergify bot requested a review from Borda July 3, 2020 10:20

codecov-commenter commented Jul 3, 2020

Codecov Report

Merging #107 into master will increase coverage by 0.07%.
The diff coverage is 98.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #107      +/-   ##
==========================================
+ Coverage   91.91%   91.98%   +0.07%     
==========================================
  Files          77       78       +1     
  Lines        3944     4018      +74     
==========================================
+ Hits         3625     3696      +71     
- Misses        319      322       +3     
Flag Coverage Δ
#unittests 91.98% <98.66%> (+0.07%) ⬆️
Impacted Files Coverage Δ
pl_bolts/models/rl/n_step_dqn_model.py 100.00% <ø> (ø)
pl_bolts/datamodules/experience_source.py 97.72% <97.72%> (ø)
pl_bolts/datamodules/__init__.py 100.00% <100.00%> (ø)
pl_bolts/models/rl/common/experience.py 97.08% <100.00%> (+0.14%) ⬆️
pl_bolts/models/rl/double_dqn_model.py 74.19% <100.00%> (ø)
pl_bolts/models/rl/dqn_model.py 82.35% <100.00%> (-0.18%) ⬇️
pl_bolts/models/rl/noisy_dqn_model.py 77.77% <100.00%> (ø)
pl_bolts/models/rl/per_dqn_model.py 80.48% <100.00%> (-0.47%) ⬇️
...l_bolts/models/rl/vanilla_policy_gradient_model.py 96.49% <100.00%> (-1.22%) ⬇️
... and 1 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 024b574...04f02cd. Read the comment docs.

@Borda Borda added the enhancement New feature or request label Jul 3, 2020
@Borda Borda requested a review from williamFalcon July 3, 2020 10:47

@justusschock justusschock left a comment


Thanks a lot! While this already looks really good and clean, I also added some questions.

pl_bolts/datamodules/experience_source.py (two outdated review threads, resolved)

return experience, reward, done

def run_episode(self, device: torch.device) -> float:
Member

is episode a common RL term for this? Intuitively I would have called this sequence...

Contributor Author

@djbyrne djbyrne Jul 6, 2020


It depends on the task. Most tasks are episodic in some form and will have a termination state denoting the end of the episode. This function was originally used for carrying out a validation episode, and it is useful for that.
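For readers unfamiliar with the terminology: an "episode" is a standard RL term for one complete rollout from reset to a terminal state. A validation-style `run_episode` can be sketched as below; this is a generic illustration, not the PR's implementation, whose method lives on the experience source and also takes a `torch.device` argument.

```python
def run_episode(env, policy) -> float:
    """Roll out one full episode (reset to terminal state) and return the
    total reward, e.g. for validation."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward


class CountdownEnv:
    """Toy episodic env: terminates after 3 steps, reward 1.0 per step."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3


print(run_episode(CountdownEnv(), policy=lambda s: 0))  # 3.0
```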

pl_bolts/datamodules/experience_source.py (review thread, resolved)
return reward, final_state, done


class EpisodicExperienceStream(ExperienceSource, IterableDataset):
Member

Same question about wording with episodic
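What distinguishes an "episodic" stream is the unit it yields: whole episodes rather than individual steps. A hypothetical sketch (`episodic_stream` and `ThreeStepEnv` are illustrative names, not the PR's classes):

```python
def episodic_stream(env, policy, max_episodes: int = 2):
    """Yield one *complete* episode (a list of step tuples) per iteration,
    rather than one step at a time."""
    for _ in range(max_episodes):
        episode, state, done = [], env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            episode.append((state, action, reward, done, next_state))
            state = next_state
        yield episode


class ThreeStepEnv:
    """Toy env whose episodes always last exactly 3 steps."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3


episodes = list(episodic_stream(ThreeStepEnv(), policy=lambda s: 0))
print(len(episodes), len(episodes[0]))  # 2 3
```

Yielding full episodes suits algorithms such as REINFORCE that need complete returns, whereas step-wise streams suit replay-buffer methods like DQN.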

@@ -5,4 +5,4 @@
 from pl_bolts.models.rl.noisy_dqn_model import NoisyDQN
 from pl_bolts.models.rl.per_dqn_model import PERDQN
 from pl_bolts.models.rl.reinforce_model import Reinforce
-from pl_bolts.models.rl.vanilla_policy_gradient_model import PolicyGradient
+# from pl_bolts.models.rl.vanilla_policy_gradient_model import PolicyGradient
Member

why did you change this one?

Contributor Author

I meant to raise an issue about this: some of these imports in the `__init__` files are raising errors in my runs. I'll look into specifically why that is happening and update this.

pl_bolts/models/rl/common/experience.py (review thread, resolved)

  self.reward_list = []
  for _ in range(100):
-     self.reward_list.append(0)
+     self.reward_list.append(torch.tensor(0))
Member

maybe add a device here?
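The reviewer's point: creating the placeholder tensors directly on the target device avoids a later host-to-device copy. A sketch of that suggestion, with `make_reward_buffer` as a hypothetical name, not the PR's API:

```python
import torch


def make_reward_buffer(n: int = 100, device: torch.device = torch.device("cpu")):
    """Create zero-reward placeholder tensors directly on `device`,
    so no .to(device) transfer is needed later."""
    return [torch.tensor(0.0, device=device) for _ in range(n)]


rewards = make_reward_buffer()
print(len(rewards), rewards[0].device.type)  # 100 cpu
```

Passing `device=torch.device("cuda")` (when available) would place the buffer on the GPU at creation time.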

pl_bolts/models/rl/vanilla_policy_gradient_model.py (outdated review thread, resolved)
djbyrne and others added 2 commits July 6, 2020 17:23
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

Borda commented Jul 8, 2020

From @djbyrne: there are currently two things blocking the latest PR for the RL bolts.

  1. The package installs in CI are failing and the docs are not building. These issues appear to stem from problems outside of this PR.
  2. Currently, all the models are imported in the rl __init__.py file. For whatever reason, I am unable to run the VPG model when VPG is imported through the init; when I remove it from the init, it works fine. I am sure there is a very obvious fix here, but I can't seem to find it.
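The failure described in point 2 is a classic symptom of a circular import: eagerly importing every model in the package `__init__.py` can trigger a dependency cycle at package-import time. One common, generic workaround (not necessarily the fix applied in this repository) is to defer the import into the function that needs it. The sketch below uses the stdlib `json` module as a stand-in for the model module:

```python
def get_heavy_dependency():
    # Lazy import: the module is imported only when this function is first
    # called, not when the enclosing package is imported. This breaks cycles
    # that would otherwise fail during package initialization.
    import json
    return json


print(get_heavy_dependency().dumps({"ok": True}))  # {"ok": true}
```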

@williamFalcon williamFalcon merged commit ca38ad1 into Lightning-Universe:master Jul 11, 2020
@djbyrne djbyrne mentioned this pull request Jul 12, 2020
Labels: enhancement (New feature or request)

6 participants