
Add MuJoCo Robotics Envs HER+TQC trained agents #71

Merged (5 commits, Mar 15, 2021)

Conversation

@sgillen (Contributor) commented Mar 12, 2021

I did get around to this eventually :P.

Adding trained agents for HER + SAC on the MuJoCo robotics environments.
I left in the best_model too; this only matters for FetchSlide, where the best agent gets around 50% success, compared to 20% for the latest. The other three environments all get to 100%. I think this roughly matches the results from the original HER paper with DDPG.

Description

Updated hyperparameters in the her.yml file, and added trained agents to the rl-trained-agents submodule.

Checklist:

  • [x] I've read the CONTRIBUTION guide (required)
  • [N/A?] I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)

Note: we are using a maximum length of 127 characters per line



@sgillen (Contributor, Author) commented Mar 12, 2021

Looks like the checks are failing because I committed to the rl-trained-agents submodule; let me know if there is another way I should structure this commit.

@araffin (Member) commented Mar 12, 2021

Hello,
thanks for the PR =) and sorry for the lack of clarity; next time it would be better to discuss everything in an issue first (so we don't do the job twice ;)).

To sum up what should be done:

  • please use the master version of the RL zoo (I recently merged most of the models there and updated the hyperparameters for HER)
  • please do not change the hyperparameters, and use TQC (from SB3 contrib) for training. The only hyperparameter you may change is n_timesteps for FetchSlide (probably 2e6 is more appropriate)
  • you need to do a first PR here: https://github.com/DLR-RM/rl-trained-agents (where the agents are stored); it should contain the trained agent, the monitor file and the evaluation file, all created automatically by the zoo. You can delete the best model to save some space too.
  • the changelog is here: https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/CHANGELOG.md

One last thing: once your PR in the rl-trained-agents repo is merged, please run python -m utils.benchmark (cf. the README) to update the benchmark file.
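For reference, a minimal sketch of those two steps using the zoo CLI (only --algo/--env and the benchmark module invocation come from this thread; the env chosen here is just an example):

# Retrain with the zoo's current hyperparameters (her.yml, which selects TQC for the Fetch envs)
python train.py --algo her --env FetchPickAndPlace-v1
# Once the rl-trained-agents PR is merged, regenerate the benchmark file
python -m utils.benchmark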

@sgillen (Contributor, Author) commented Mar 12, 2021

OK, that mostly makes sense.

please use the master version of the RL zoo (I recently merged most of the models there and updated the hyperparameters for HER)

Maybe I'm confused, but this is a PR for the master branch? I cloned the latest (after you merged the models) and reran HER with the params I had (mostly found in the original rl-baselines-zoo). I chose those because when I tried whichever params were in her.yml last week, all tasks except FetchReach failed (as in, they ran fine but never got good reward on the tasks). But I have some time today, so I'll run the current params with the current SB3 and make a new PR with the steps you detailed above if it works.

Just to check, it looks like on the current master of rl-baselines3-zoo the HER hyperparams use TQC for every Fetch* env except FetchReach, which uses SAC. Is this intentional?

@araffin (Member) commented Mar 12, 2021

Maybe I'm confused, but this is a PR for the master branch?

sorry, I read it too quickly. That's fine then ;)

I chose those because when I tried whichever params were in her.yml last week, all tasks except FetchReach failed

The ones that are there (TQC + 3 layers + some additional custom params), I remember testing them in a Google Colab and they worked back then (in around 4e5 timesteps for PickAndPlace), but I don't have a proper MuJoCo license anymore...

Just to check, it looks like on the current master of rl-baselines3-zoo the HER hyperparams use TQC for every Fetch* env except FetchReach, which uses SAC. Is this intentional?

Yes, TQC is SAC + Distributional RL. FetchReach is super simple to solve (in 5 minutes normally), so the algorithm choice does not matter much here.

@araffin (Member) commented Mar 13, 2021

Last thing I forgot to mention (but I think you are already doing it): you should use the master version of SB3 (1.0rc2)

EDIT: it should normally change nothing; it's mostly for consistency

@araffin (Member) commented Mar 13, 2021

More importantly: what versions of Python/gym/MuJoCo are you using? (We should also document that somewhere.)
Please use Python 3.6 if possible (we can use custom objects to load a model trained in Python 3.6 from Python 3.8+, but the other way around there is a pickle incompatibility...)

@sgillen (Contributor, Author) commented Mar 13, 2021

Ok, looks like everything worked well this time, but I want to re-run FetchSlide with more time.

python==3.6.10
gym==0.18.0
mujoco_py==1.50.1.0

Which corresponds to mujoco 1.5 (not 2.0) due to openai/gym#1541. Not sure that's relevant here but I've just been using mujoco 1.5 ever since, never needed any of the new features. gym was downgraded as well to accommodate this.

And stable-baselines3 is at the latest commit from GitHub master (which corresponds to 1.0rc2)
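(For completeness, an install sketch for that setup; only the versions are the ones listed above, the install commands themselves are an assumption:)

# Sketch: reproduce the reported environment
conda create -n zoo python=3.6
pip install gym==0.18.0 mujoco-py==1.50.1.0
pip install git+https://github.com/DLR-RM/stable-baselines3  # master (1.0rc2 at the time)
pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib  # TQC lives in contrib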

Like I said, I am going to re-run at least FetchSlide; let me know soon if you want to change any of the above. I have a mujoco 2.0 install / key that works just fine too, and changing the rest is easy. When that finishes (probably another whole day...) I will open the pull request in rl-trained-agents/. Do you want the PR here to be a new one, or should I just overwrite the commits here and keep this PR?

@araffin (Member) commented Mar 13, 2021

Ok, looks like everything worked well this time, but I want to re-run FetchSlide with more time.

Good to hear =)
Yes, FetchSlide is the hardest task. Don't hesitate to double the budget or go even higher (1e6 steps is pretty small compared to what was used in the paper).
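A quick sketch of bumping the budget from the command line (the -n timestep override is the zoo's standard flag; the value is just the 2e6 suggested earlier):

# Sketch: train FetchSlide with a larger budget
python train.py --algo her --env FetchSlide-v1 -n 2000000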

python==3.6.10
gym==0.18.0

perfect

Which corresponds to mujoco 1.5 (not 2.0) due to openai/gym#1541. Not sure that's relevant here but I've just been using mujoco 1.5 ever since, never needed any of the new features. gym was downgraded as well to accommodate this.

yes, I'm aware of that issue (but I don't think I saw much change when I was using mujoco 2 with the robotics envs).
So the mujoco version does not matter much; what matters is to document which one we used ;)

Do you want the PR here to be a new one, or should I just overwrite the commits here and keep this PR?

Let's keep that one

@araffin (Member) commented Mar 13, 2021

I just realized you will need to temporarily comment out this line when updating the benchmark file: https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/benchmark.py#L47

(this is because we don't have a mujoco license on the CI server).

@araffin changed the title from "Add mujoco robotics her+sac agents and hyper params" to "Add MuJoCo Robotics Envs HER+TQC trained agents" on Mar 13, 2021
@@ -91,7 +95,6 @@ and also allow users to have access to pretrained agents.*
|qrdqn|BeamRiderNoFrameskip-v4 | 17122.941| 10769.997|10M | 596483| 17|
|qrdqn|BreakoutNoFrameskip-v4 | 393.600| 79.828|10M | 579711| 40|
|qrdqn|CartPole-v1 | 500.000| 0.000|50k | 150000| 300|
|qrdqn|EnduroNoFrameskip-v4 | 3231.200| 1311.801|10M | 585728| 5|
Review comment (Member) on the diff above:

it looks like I forgot to push the benchmark log of that one... (I will do that soon)
Please do the same for the robotics envs (files are in logs/benchmark/, you may need git add -f for that)
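Something along these lines (the path is the one mentioned above; the commit message is just an example):

# Sketch: the benchmark logs are normally git-ignored, so force-add them
git add -f logs/benchmark/
git commit -m "Add benchmark logs for the robotics envs"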

@araffin (Member) commented Mar 15, 2021

Apart from the missing entry in the changelog (and the missing benchmark log), LGTM =)
(I will push the QR-DQN logs soon and then also update the README)

@sgillen (Contributor, Author) commented Mar 15, 2021

Ok, probably didn't need to be three commits on my part but there you go...

I took a look at the changelog and wasn't entirely sure where / what to add; figured it's faster for you to add something than for me to ask what.

@araffin (Member) commented Mar 15, 2021

Ok, probably didn't need to be three commits on my part but there you go...

No worries, the commits will be squashed at the end.

I took a look at the changelog and wasn't entirely sure where / what to add; figured it's faster for you to add something than for me to ask what.

I'll do that.

I also plotted the training success rate (which should be higher at test time) using:

python scripts/plot_train.py -a her -e Fetch -y success -f rl-trained-agents/ -w 500

[Attached plots: Training_Success_Rate; FetchReach]

@araffin (Member) left a review:

LGTM, thanks =)

@araffin merged commit bca831b into DLR-RM:master on Mar 15, 2021
@caishanglei commented:
How are the parameters for the PickAndPlace environment set when using TQC+HER? Why does it not work when I train? Are the hyperparameters in the initially downloaded code already optimal?

@ArashVahabpour commented Aug 20, 2021

Can someone please guide me on how I can enjoy the pretrained model?

@Miffyli (Collaborator) commented Aug 20, 2021

I will let @araffin answer these, but he is currently on vacation, so please give him some time :)

@ArashVahabpour commented:
Thanks for your prompt response. I have some urgency in using this... if someone can help in the meantime, I will be very thankful!

@Miffyli (Collaborator) commented Aug 20, 2021

Seems like these instructions should work out of the box: https://github.com/DLR-RM/rl-baselines3-zoo#enjoy-a-trained-agent
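Concretely, that README section's command would look something like this (the --folder and -n values are the README's example defaults; the algo/env here are the robotics agents from this PR):

# Sketch: run a pretrained agent from the rl-trained-agents submodule
python enjoy.py --algo tqc --env FetchPickAndPlace-v1 --folder rl-trained-agents/ -n 5000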

@ArashVahabpour commented Aug 20, 2021

the issue is that HER is not an algorithm from sb2 onwards
but @araffin has stored some trained models for HER solving Fetch problems.

@Miffyli (Collaborator) commented Aug 20, 2021

the issue is that HER is not an algorithm from sb2 onwards

HER is included in SB3 and TQC is in contrib

@ArashVahabpour commented:
@Miffyli thanks so much. I think HER has some problem and @araffin should clarify it in the documentation.
But I got TQC working. For others' reference:
python enjoy.py --algo tqc --env FetchPickAndPlace-v1

@araffin (Member) commented Aug 23, 2021

I think HER has some problem and @araffin should clarify it in the documentation.

Could you elaborate?

the issue is that HER is not an algorithm from sb2 onwards

yes, it is documented both in the SB3 changelog and in the HER hyperparameter file: "# NOTE: STARTING WITH SB3 >= 1.1.0, because HER is now HerReplayBuffer,"

@ArashVahabpour commented Aug 24, 2021

I think HER has some problem and @araffin should clarify it in the documentation.

Could you elaborate?

the issue is that HER is not an algorithm from sb2 onwards

yes, it is documented both in the SB3 changelog and in the HER hyperparameter file: "# NOTE: STARTING WITH SB3 >= 1.1.0, because HER is now HerReplayBuffer,"

Sure, I meant I couldn't find a set of input arguments with which I could load and enjoy the trained weights stored in the "her" directory. Could you clarify, for everyone's reference? Thanks!

@araffin (Member) commented Aug 24, 2021

yes, I probably need to update the README.
But as a general rule, all the available agents are listed in https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/benchmark.md

I'm also thinking about removing the her folder...

@matinmoezzi commented:
Hi,
Could you please share the training command? I am wondering if there is a way to increase the training speed and utilize all CPU cores.
There is a command in the OpenAI baselines repo using MPI:
mpirun -np 19 python -m baselines.run --num_env=2 --alg=her
This command uses 19 physical CPU cores.

@araffin (Member) commented Mar 22, 2023

Could you please share the training command?

https://stable-baselines3.readthedocs.io/en/master/modules/her.html#how-to-replicate-the-results

I am wondering if there is a way to increase the training speed and utilize all CPU cores.

See DLR-RM/stable-baselines3#704 (comment)
We don't offer MPI acceleration, but if you use SBX (see the linked comment) with the latest SB3 master version, you should be able to get up to a 3x speed boost.
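For reference, the replication recipe documented at that link boils down to a zoo training run along these lines (a sketch; with SB3 >= 1.1.0, HER is configured as HerReplayBuffer inside the TQC hyperparameters, so the algo flag is tqc):

# Sketch: train TQC + HerReplayBuffer with the zoo's tuned hyperparameters
python train.py --algo tqc --env FetchPickAndPlace-v1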
