Add enjoy script for Stable Baselines #2627

Merged: 1 commit into bulletphysics:master on Feb 22, 2020

Conversation

@araffin commented Feb 15, 2020

This is a follow-up to #2565:

  • it fixes the call from the module
  • it adds the enjoy script

You can use:

python3 -m pybullet_envs.stable_baselines.train --algo sac --env HalfCheetahBulletEnv-v0
# If the saved model "sac_HalfCheetahBulletEnv-v0.zip" is in the current folder
python3 -m pybullet_envs.stable_baselines.enjoy --algo sac --env HalfCheetahBulletEnv-v0 --n-episodes 5

@erwincoumans

Thanks for the contribution! LGTM

@erwincoumans merged commit e78eb27 into bulletphysics:master on Feb 22, 2020
@erwincoumans commented Feb 26, 2020

The zip file is called sac_HalfCheetahBulletEnv-v0_best.zip and the enjoy script doesn't work unless you rename/copy that file to sac_HalfCheetahBulletEnv-v0.zip.
Why doesn't the enjoy script just work with sac_HalfCheetahBulletEnv-v0_best.zip?

@araffin commented Feb 26, 2020

The zip file is called sac_HalfCheetahBulletEnv-v0_best.zip and the enjoy script doesn't work unless you rename/copy that file to sac_HalfCheetahBulletEnv-v0.zip

That's because you did not wait until the end, no?
There will be two files: one created at the end of training, and one updated each time a new best model is found.

@erwincoumans

Thanks! Indeed, I had no idea there was an automatic 'end'. I assumed terminating the training was the end. Perhaps enjoy should try the _best file if it cannot find the final one (with a warning that the results come from an intermediate policy that was not fully trained).

@araffin commented Feb 27, 2020

Good point.
If you agree, I would do a PR that works like the zoo:

  • it saves the model when you terminate the script
  • it adds a '--load-best' option to load the best model

@erwincoumans commented Mar 1, 2020

I'd like to see a fully automatic fallback to load the best model that doesn't require command-line options, so the script just loads the _best file if it cannot find the final one.
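
A minimal sketch of such a fallback, assuming the file naming from the commands above (the model name and the use of SAC here are only illustrative, not the actual enjoy script logic):

import os
import warnings

from stable_baselines import SAC

# Hypothetical naming, matching the train command shown earlier
model_name = "sac_HalfCheetahBulletEnv-v0"
model_path = model_name + ".zip"

if not os.path.isfile(model_path):
    best_path = model_name + "_best.zip"
    if os.path.isfile(best_path):
        # Warn that the results come from an intermediate, not fully trained policy
        warnings.warn("Final model not found, falling back to the best intermediate model.")
        model_path = best_path

model = SAC.load(model_path)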

Also, it would be nice for the enjoy script to use PyBullet's own visualizer.
Simply call render(mode='human') before the first reset and you get a visualizer GUI window.
It looks better than the OpenCV one, and it actually works.

Also, I didn't 'enjoy' the enjoy script due to the errors below, which would be fixed by just using PyBullet's own visualizer (render(mode='human') before reset).

Traceback (most recent call last):
  File "C:\python-3.5.3.amd64_numpydebug\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\python-3.5.3.amd64_numpydebug\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\python-3.5.3.amd64_numpydebug\lib\pybullet_envs\stable_baselines\enjoy.py", line 69, in <module>
    env.render(mode='human')
  File "C:\python-3.5.3.amd64_numpydebug\lib\site-packages\stable_baselines\common\vec_env\subproc_vec_env.py", line 138, in render
    cv2.imshow('vecenv', bigimg[:, :, ::-1])
cv2.error: OpenCV(4.2.0) C:/projects/opencv-python/opencv/modules/highgui/src/precomp.hpp:137: error: (-215:Assertion failed) src_depth != CV_16F && src_depth != CV_32S in function 'convertToShow'
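
For reference, a minimal sketch of the suggested visualizer usage, calling render(mode='human') before the first reset; the environment id and the random-action loop are placeholders:

import gym
import pybullet_envs  # registers the Bullet environments

env = gym.make("HalfCheetahBulletEnv-v0")
# Calling render() before the first reset opens PyBullet's own GUI window
env.render(mode="human")
obs = env.reset()

for _ in range(1000):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()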

@erwincoumans

Please check this enjoy script from the CL (if it hasn't been applied yet): #2649

@erwincoumans

Also, how does baselines determine that the training has ended?

@araffin commented Mar 4, 2020

I'd like to see a fully automatic fallback to load the best model that doesn't require command-line options, so the script just loads the _best file if it cannot find the final one.

OK, I'll do that.

Also, it would be nice for the enjoy script to use PyBullet's own visualizer.
Simply call render(mode='human') before the first reset and you get a visualizer GUI window.

Oh, I did not know that... In fact, I've been struggling with it (cf. the zoo); using OpenCV was a hack.

Also, how does baselines determine that the training has ended?

Good question.
It depends. For now, you give a budget (a maximum number of interactions with the environment), but you can also use a callback that ends the training once a reward threshold is reached (cf. the new callback collection that is on master but not yet released).
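
A minimal sketch of that kind of early stopping, assuming the callback collection on master (StopTrainingOnRewardThreshold wrapped in an EvalCallback); the threshold and timestep budget are placeholder values:

import gym
import pybullet_envs

from stable_baselines import SAC
from stable_baselines.common.callbacks import EvalCallback, StopTrainingOnRewardThreshold

env = gym.make("HalfCheetahBulletEnv-v0")
eval_env = gym.make("HalfCheetahBulletEnv-v0")

# Stop training once the mean evaluation reward reaches the (placeholder) threshold
callback_on_best = StopTrainingOnRewardThreshold(reward_threshold=2000, verbose=1)
eval_callback = EvalCallback(eval_env, callback_on_new_best=callback_on_best, verbose=1)

model = SAC("MlpPolicy", env, verbose=1)
# The budget (total_timesteps) remains an upper bound on environment interactions
model.learn(total_timesteps=int(1e6), callback=eval_callback)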

@erwincoumans

It would help a lot to save weights during training at regular intervals (and perhaps also whenever the policy exceeds the current best). Is there a way to achieve this with the regular rl_zoo_baselines? Preferably, this interval could be user-specified (in terms of number of steps).

python3 train.py --algo ppo2 --env HumanoidDeepMimicWalkBulletEnv-v1 --tensorboard-log deepmimic

I'm testing a new environment and would like to test various intermediate snapshots.

Also, the reward in Tensorboard doesn't seem to line up with the average episode reward in the console.

Finally, do the actions need to be normalized to the range [-1, 1] for a custom env? How about observations/rewards?

@araffin commented Mar 15, 2020

It would help a lot to save weights during training at regular intervals

I've got good news for you: we just released a callback collection, and checkpoints are included. I will add that feature to the zoo soon.
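
A minimal sketch of periodic checkpointing with that callback collection (CheckpointCallback); the save frequency, environment id, and paths are placeholder values, and with a single environment save_freq counts individual steps:

import gym
import pybullet_envs

from stable_baselines import PPO2
from stable_baselines.common.callbacks import CheckpointCallback

# Save a snapshot every 100000 steps under ./checkpoints/
checkpoint_callback = CheckpointCallback(save_freq=100000, save_path="./checkpoints/",
                                         name_prefix="ppo2_HumanoidDeepMimicWalkBulletEnv-v1")

model = PPO2("MlpPolicy", "HumanoidDeepMimicWalkBulletEnv-v1", verbose=1)
model.learn(total_timesteps=int(2e6), callback=checkpoint_callback)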

Also, the reward in Tensorboard doesn't seem to line up with the average episode reward in the console.

That's normal if you are normalizing the reward. The console should output the original mean reward.

Finally, do the actions need to be normalized to the range [-1, 1] for a custom env? How about observations/rewards?

The answers to those questions are in the Tips and Tricks section of the documentation ;)
In short: yes, and obs/reward normalization (using VecNormalize) is especially important for A2C/PPO2.
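
A minimal sketch of what that looks like for a custom env, assuming a toy environment (the spaces, dynamics, and reward below are placeholders): actions live in a Box bounded by [-1, 1], while observations and rewards are normalized with VecNormalize:

import gym
import numpy as np
from gym import spaces

from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

class CustomEnv(gym.Env):
    """Toy custom env: actions are already rescaled to [-1, 1] by the env designer."""
    def __init__(self):
        super(CustomEnv, self).__init__()
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)

    def reset(self):
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        # Placeholder dynamics and reward
        return np.zeros(4, dtype=np.float32), 0.0, False, {}

# VecNormalize keeps running statistics to normalize observations and rewards
env = VecNormalize(DummyVecEnv([CustomEnv]), norm_obs=True, norm_reward=True)
model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)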

@araffin commented Mar 17, 2020

Checkpoints will be added in araffin/rl-baselines-zoo#69.
