Add enjoy script for Stable Baselines #2627

Merged: 1 commit into bulletphysics:master on Feb 22, 2020

Conversation

@araffin commented Feb 15, 2020

This is a follow-up to #2565:

  • it fixes the call from the module
  • it adds the enjoy script

You can use:

python3 -m pybullet_envs.stable_baselines.train --algo sac --env HalfCheetahBulletEnv-v0
# If the saved model "sac_HalfCheetahBulletEnv-v0.zip" is in the current folder
python3 -m pybullet_envs.stable_baselines.enjoy --algo sac --env HalfCheetahBulletEnv-v0 --n-episodes 5

@erwincoumans

Thanks for the contribution! LGTM

@erwincoumans merged commit e78eb27 into bulletphysics:master on Feb 22, 2020
@erwincoumans commented Feb 26, 2020

The zip file is called sac_HalfCheetahBulletEnv-v0_best.zip and the enjoy script doesn't work unless you rename/copy that file to sac_HalfCheetahBulletEnv-v0.zip.
Why doesn't the enjoy script just work with sac_HalfCheetahBulletEnv-v0_best.zip?

@araffin commented Feb 26, 2020

The zip file is called sac_HalfCheetahBulletEnv-v0_best.zip and the enjoy script doesn't work unless you rename/copy that file to sac_HalfCheetahBulletEnv-v0.zip

That's because you did not wait until the end, no?
There will be two files: one created at the end of training, and one updated each time a new best model is found.

@erwincoumans

Thanks! Indeed, I had no idea there was an automatic 'end'. I assumed terminating the training was the end. Perhaps enjoy should try the _best file if it cannot find the final one (with a warning that the results come from an intermediate policy that was not fully trained).

@araffin commented Feb 27, 2020

Good point.
If you agree, I would do a PR that works like the zoo:

  • it saves the model when you terminate the script
  • it adds a '--load-best' option to load the best model

@erwincoumans commented Mar 1, 2020

I'd like to see a fully automatic fallback to load the best model that doesn't require command-line options, so the script just loads the _best file if it cannot find the final one.
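
A minimal sketch of such a fallback, assuming the file naming from the commands above (the model name and the use of SAC here are only illustrative, not the actual enjoy script logic):

import os
import warnings

from stable_baselines import SAC

# Hypothetical naming, matching the train command shown earlier
model_name = "sac_HalfCheetahBulletEnv-v0"
model_path = model_name + ".zip"

if not os.path.isfile(model_path):
    best_path = model_name + "_best.zip"
    if os.path.isfile(best_path):
        # Warn that the results come from an intermediate, not fully trained policy
        warnings.warn("Final model not found, falling back to the best intermediate model.")
        model_path = best_path

model = SAC.load(model_path)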

Also, it would be nice for the enjoy script to use PyBullet's own visualizer.
Simply call render(mode='human') before the first reset and you get a visualizer GUI window.
It looks better than the OpenCV one, and it actually works.

Also, I didn't 'enjoy' the enjoy script due to the errors below, which would be fixed by just using PyBullet's own visualizer (render(mode='human') before reset).

Traceback (most recent call last):
  File "C:\python-3.5.3.amd64_numpydebug\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\python-3.5.3.amd64_numpydebug\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\python-3.5.3.amd64_numpydebug\lib\pybullet_envs\stable_baselines\enjoy.py", line 69, in <module>
    env.render(mode='human')
  File "C:\python-3.5.3.amd64_numpydebug\lib\site-packages\stable_baselines\common\vec_env\subproc_vec_env.py", line 138, in render
    cv2.imshow('vecenv', bigimg[:, :, ::-1])
cv2.error: OpenCV(4.2.0) C:/projects/opencv-python/opencv/modules/highgui/src/precomp.hpp:137: error: (-215:Assertion failed) src_depth != CV_16F && src_depth != CV_32S in function 'convertToShow'
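
For reference, a minimal sketch of the suggested visualizer usage, calling render(mode='human') before the first reset; the environment id and the random-action loop are placeholders:

import gym
import pybullet_envs  # registers the Bullet environments

env = gym.make("HalfCheetahBulletEnv-v0")
# Calling render() before the first reset opens PyBullet's own GUI window
env.render(mode="human")
obs = env.reset()

for _ in range(1000):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()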

@erwincoumans

Please check this enjoy script from the CL (if it hasn't been applied yet): #2649

@erwincoumans

Also, how does baselines determine that the training has ended?

@araffin commented Mar 4, 2020

I'd like to see a fully automatic fallback to load the best model that doesn't require command-line options, so the script just loads the _best file if it cannot find the final one.

OK, I'll do that.

Also, it would be nice for the enjoy script to use PyBullet's own visualizer.
Simply call render(mode='human') before the first reset and you get a visualizer GUI window.

Oh, I did not know that... In fact, I've been struggling with it (cf. the zoo); using OpenCV was a hack.

Also, how does baselines determine that the training has ended?

Good question.
It depends. For now, you give a budget (a maximum number of interactions with the environment), but you can also use a callback that ends the training once a reward threshold is reached (cf. the new callback collection that is on master but not yet released).
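
A minimal sketch of that kind of early stopping, assuming the callback collection on master (StopTrainingOnRewardThreshold wrapped in an EvalCallback); the threshold and timestep budget are placeholder values:

import gym
import pybullet_envs

from stable_baselines import SAC
from stable_baselines.common.callbacks import EvalCallback, StopTrainingOnRewardThreshold

env = gym.make("HalfCheetahBulletEnv-v0")
eval_env = gym.make("HalfCheetahBulletEnv-v0")

# Stop training once the mean evaluation reward reaches the (placeholder) threshold
callback_on_best = StopTrainingOnRewardThreshold(reward_threshold=2000, verbose=1)
eval_callback = EvalCallback(eval_env, callback_on_new_best=callback_on_best, verbose=1)

model = SAC("MlpPolicy", env, verbose=1)
# The budget (total_timesteps) remains an upper bound on environment interactions
model.learn(total_timesteps=int(1e6), callback=eval_callback)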

@erwincoumans

It would help a lot to save weights during training at regular intervals (and perhaps also whenever the policy exceeds the current best). Is there a way to achieve this with the regular rl_zoo_baselines? Preferably, this interval could be user-specified (in terms of number of steps).

python3 train.py --algo ppo2 --env HumanoidDeepMimicWalkBulletEnv-v1 --tensorboard-log deepmimic

I'm testing a new environment and would like to test various intermediate snapshots.

Also, the reward in Tensorboard doesn't seem to line up with the average episode reward in the console.

Finally, do the actions need to be normalized to the range [-1, 1] for a custom env? How about observations/rewards?

@araffin commented Mar 15, 2020

It would help a lot to save weights during training at regular intervals

I've got good news for you: we just released a callback collection, and checkpoints are included. I will add that feature to the zoo soon.
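
A minimal sketch of periodic checkpointing with that callback collection (CheckpointCallback); the save frequency, environment id, and paths are placeholder values, and with a single environment save_freq counts individual steps:

import gym
import pybullet_envs

from stable_baselines import PPO2
from stable_baselines.common.callbacks import CheckpointCallback

# Save a snapshot every 100000 steps under ./checkpoints/
checkpoint_callback = CheckpointCallback(save_freq=100000, save_path="./checkpoints/",
                                         name_prefix="ppo2_HumanoidDeepMimicWalkBulletEnv-v1")

model = PPO2("MlpPolicy", "HumanoidDeepMimicWalkBulletEnv-v1", verbose=1)
model.learn(total_timesteps=int(2e6), callback=checkpoint_callback)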

Also, the reward in Tensorboard doesn't seem to line up with the average episode reward in the console.

That's normal if you are normalizing the reward. The console should output the original mean reward.

Finally, do the actions need to be normalized to the range [-1, 1] for a custom env? How about observations/rewards?

The answers to those questions are in the Tips and Tricks section of the documentation ;)
In short: yes, and obs/reward normalization (using VecNormalize) is especially important for A2C/PPO2.
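
A minimal sketch of what that looks like for a custom env, assuming a toy environment (the spaces, dynamics, and reward below are placeholders): actions live in a Box bounded by [-1, 1], while observations and rewards are normalized with VecNormalize:

import gym
import numpy as np
from gym import spaces

from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

class CustomEnv(gym.Env):
    """Toy custom env: actions are already rescaled to [-1, 1] by the env designer."""
    def __init__(self):
        super(CustomEnv, self).__init__()
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)

    def reset(self):
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        # Placeholder dynamics and reward
        return np.zeros(4, dtype=np.float32), 0.0, False, {}

# VecNormalize keeps running statistics to normalize observations and rewards
env = VecNormalize(DummyVecEnv([CustomEnv]), norm_obs=True, norm_reward=True)
model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)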

@araffin commented Mar 17, 2020

Checkpoints will be added in araffin/rl-baselines-zoo#69.
