Add REINFORCE implementation tutorial #155

Merged
merged 25 commits into from Nov 26, 2022
Conversation

@siddarth-c (Contributor) commented Nov 22, 2022

Description

Created a new tutorial depicting the new .step() function of gymnasium v26 using PyTorch. REINFORCE is employed to solve Mujoco's Reacher.
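As context for reviewers (not part of the PR itself): the v26 `.step()` call returns a five-tuple `(obs, reward, terminated, truncated, info)` instead of the old four-tuple with a single `done` flag. A minimal sketch of that loop, using a hypothetical toy environment standing in for Mujoco's Reacher:

```python
# Sketch of the Gymnasium v26 step API the tutorial demonstrates.
# ToyEnv is a hypothetical stand-in; the tutorial uses Mujoco's Reacher.

class ToyEnv:
    """Minimal environment following the v26 interface."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None):
        # v26 reset returns (observation, info)
        self.t = 0
        return 0.0, {}

    def step(self, action):
        # v26 step returns (obs, reward, terminated, truncated, info):
        # 'terminated' marks a true end state, 'truncated' a time-limit cut-off.
        self.t += 1
        obs = float(self.t)
        reward = 1.0
        terminated = False            # this toy task has no terminal state
        truncated = self.t >= self.horizon
        return obs, reward, terminated, truncated, {}


def run_episode(env):
    obs, info = env.reset(seed=0)
    total = 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(action=None)
        total += reward
        done = terminated or truncated  # v26 idiom replacing the old 'done' flag
    return total


print(run_episode(ToyEnv()))  # -> 5.0
```

The split into `terminated` and `truncated` matters for value bootstrapping: only a truncated episode should bootstrap from the final state.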

Type of change

Please delete options that are not relevant.

  • This change requires a documentation update

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@pseudo-rnd-thoughts (Member) left a comment

Thanks for the tutorial; it looks very helpful.
Could you fix the pre-commit issues and address the comments?

Six review comments on docs/tutorials/reinforce_reacher_gym_v26.py (all outdated/resolved)
@pseudo-rnd-thoughts changed the title from "V26 step tutorial" to "Add REINFORCE implementation tutorial" on Nov 23, 2022
@pseudo-rnd-thoughts (Member) left a comment

There are a number of issues when I build the tutorial. Could you build it yourself (see the readme.md) and look at the previous tutorials to fix the issues?

reinforce_reacher_gym_v26.rst:2: WARNING: Field list ends without a blank line; unexpected unindent.
reinforce_reacher_gym_v26.rst:14: ERROR: Unexpected indentation.
reinforce_reacher_gym_v26.rst:25: WARNING: Block quote ends without a blank line; unexpected unindent.
reinforce_reacher_gym_v26.rst:220: ERROR: Unexpected indentation.
reinforce_reacher_gym_v26.rst:224: WARNING: Definition list ends without a blank line; unexpected unindent.

@siddarth-c (Contributor, Author)

I have updated the files as per the readme.md.
Will do better next time, thanks for bearing with me!

@siddarth-c siddarth-c closed this Nov 23, 2022
@siddarth-c siddarth-c reopened this Nov 23, 2022
@pseudo-rnd-thoughts (Member)

No worries. Currently, when I build the tutorial, all of the titles appear in the left-hand sidebar; I believe the way you have structured the sections causes this. Could you fix it?
In addition, the same issues reported during the build still appear, and the final figure doesn't render.

@siddarth-c (Contributor, Author)

I have modified the structure of the titles and code, following the Blackjack tutorial. Running the checks mentioned in the readme.md, I no longer get any warnings.

@pseudo-rnd-thoughts (Member)

@siddarth-c I have made a number of upgrades to the tutorials. There are only a couple more things before we can merge:

  1. Replace the top image of the agent with a gif of the trained agent completing the environment
  2. Could you add an image of the policy network showing the data flow, i.e., input data -> shared network -> split into mean network and std network -> output
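The data flow described in item 2 can be sketched in PyTorch roughly as follows. This is illustrative only: the layer sizes, activation, and class name here are assumptions, not the tutorial's actual code.

```python
import torch
import torch.nn as nn


class GaussianPolicy(nn.Module):
    """Sketch of the described flow:
    input -> shared network -> split into mean head and std head -> output.
    Hidden size and activation are illustrative choices."""

    def __init__(self, obs_dim: int, action_dim: int, hidden: int = 32):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.mean_head = nn.Linear(hidden, action_dim)
        self.log_std_head = nn.Linear(hidden, action_dim)

    def forward(self, obs: torch.Tensor):
        h = self.shared(obs)                    # shared trunk
        mean = self.mean_head(h)                # mean of the action Gaussian
        std = torch.exp(self.log_std_head(h))   # exponentiate so std > 0
        return mean, std


policy = GaussianPolicy(obs_dim=4, action_dim=1)
mean, std = policy(torch.zeros(1, 4))
print(mean.shape, std.shape)  # torch.Size([1, 1]) torch.Size([1, 1])
```

Predicting log-std and exponentiating is a common way to keep the standard deviation positive without clamping.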

@siddarth-c (Contributor, Author)

Done with the mentioned changes.

@pseudo-rnd-thoughts (Member) left a comment

That policy network figure is very nice. To confirm: this is not copied from someone else and is your own creation?
Also, for the top gif, is this from the final agent? The agent doesn't seem to do very well in the environment.

@siddarth-c
Copy link
Contributor Author

The policy learned via REINFORCE is not optimal in Reacher (despite extensive hyperparameter searches).
So I have changed the environment to the Inverted Pendulum, where it is able to learn the optimal policy (achieves max reward of 1000). I hope this change is acceptable.
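For context on the reward figure: InvertedPendulum pays +1 per surviving step, so an agent that lasts the full 1000-step episode scores 1000. REINFORCE weights each action's log-probability by its discounted return; a minimal, framework-free sketch of that return computation (the tutorial's actual implementation uses PyTorch):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} over one episode, back to front.
    In REINFORCE the policy-gradient loss is then -log_prob(a_t) * G_t."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


# With +1 per step and gamma=1.0, a 3-step episode gives returns 3, 2, 1:
print(discounted_returns([1.0] * 3, gamma=1.0))  # -> [3.0, 2.0, 1.0]
```

The backward sweep makes the computation O(n) rather than the O(n^2) of summing each tail separately.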

And yes, the policy network figure was designed by me.

@pseudo-rnd-thoughts (Member) left a comment

Amazing, thank you for the tutorial. We would be interested in any more tutorials that you create, probably more on the Gym environment side than training, though they are always helpful.

@pseudo-rnd-thoughts pseudo-rnd-thoughts merged commit 024c05c into Farama-Foundation:main Nov 26, 2022