Add enjoy script for Stable Baselines #2627
Conversation
Thanks for the contribution! LGTM
The zip file is called sac_HalfCheetahBulletEnv-v0_best.zip and the enjoy script doesn't work unless you rename/copy that file to sac_HalfCheetahBulletEnv-v0.zip
That's because you did not wait until the end, no?
Thanks! Indeed, I had no idea there was an automatic 'end'; I assumed terminating the training would be the end. Perhaps enjoy should try to use the _best file if it cannot find the other file (with a warning that the results are from an intermediate policy that was not fully trained).
Good point.
I'd like to see a fully automatic fallback to load the best snapshot, one that doesn't require command-line options: if the script cannot find its file, it just loads the _best file. It would also be nice if the enjoy script used PyBullet's own visualizer. I didn't 'enjoy' the enjoy script due to those errors, which would be fixed by just using PyBullet's own visualizer (call render(mode='human') before reset).
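A minimal sketch of what such a fallback plus PyBullet-GUI rendering could look like in an enjoy script; the file names, the SAC choice, and the env id are illustrative assumptions, not the actual code from this PR:

```python
import os
import warnings

import gym
import pybullet_envs  # noqa: F401  (registers the Bullet envs)
from stable_baselines import SAC  # illustrative algorithm choice


def load_model(save_path, algo_cls=SAC):
    """Load the final model, falling back to the '_best' snapshot if needed."""
    best_path = save_path.replace(".zip", "_best.zip")
    if os.path.isfile(save_path):
        return algo_cls.load(save_path)
    if os.path.isfile(best_path):
        warnings.warn("Final model not found; loading the best intermediate "
                      "snapshot, which may not be fully trained.")
        return algo_cls.load(best_path)
    raise FileNotFoundError("No saved model found at {}".format(save_path))


env = gym.make("HalfCheetahBulletEnv-v0")
env.render(mode="human")  # open the PyBullet GUI *before* reset()
obs = env.reset()

model = load_model("sac_HalfCheetahBulletEnv-v0.zip")
while True:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, _info = env.step(action)
    if done:
        obs = env.reset()
```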
Please check this enjoy script from the CL (if it hasn't been applied yet).
Also, how does baselines determine that training has ended?
OK, I'll do that.
Oh, I did not know that... In fact, I've been struggling with it (cf. the zoo). Using OpenCV was a hack.
Good question.
It would help a lot to save weights during training at regular intervals (and perhaps also whenever the policy exceeds the current best). Is there a way to achieve this with the regular rl_zoo_baselines? Preferably the interval would be user-specified (in terms of number of steps).
I'm testing a new environment and would like to try various intermediate snapshots. Also, the reward in TensorBoard doesn't seem to line up with the average episode reward in the console. Finally, do the actions need to be normalized to the range [-1, 1] for a custom env? How about observations and rewards?
I've got good news for you: we just released a callback collection, and checkpoints are included. I will add that feature to the zoo soon.
That's normal if you are normalizing the reward. The console should output the original mean reward.
Answers to those questions are in the tips and tricks section of the documentation ;)
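For context, the relevant tip boils down to giving a custom env a symmetric, normalized action space. A small sketch; the class name, shapes, and physical range are illustrative assumptions:

```python
import gym
import numpy as np
from gym import spaces


class MyCustomEnv(gym.Env):
    """Illustrative custom env with actions normalized to [-1, 1]."""

    def __init__(self):
        super(MyCustomEnv, self).__init__()
        # Symmetric, normalized action space.
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)

    def step(self, action):
        # Rescale the normalized action to the physical range, e.g. [0, 10].
        real_action = 5.0 * (action + 1.0)
        obs = np.zeros(4, dtype=np.float32)
        return obs, 0.0, False, {"real_action": real_action}

    def reset(self):
        return np.zeros(4, dtype=np.float32)
```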
This (checkpoints) will be added in araffin/rl-baselines-zoo#69
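A sketch of how the callback collection can cover both requests (periodic snapshots plus saving on a new best), assuming Stable Baselines >= 2.10; the paths, frequencies, and the SAC/HalfCheetah choice are illustrative:

```python
import gym
import pybullet_envs  # noqa: F401

from stable_baselines import SAC
from stable_baselines.common.callbacks import CheckpointCallback, EvalCallback

env = gym.make("HalfCheetahBulletEnv-v0")
eval_env = gym.make("HalfCheetahBulletEnv-v0")

# Save a snapshot every `save_freq` steps (user-specified interval).
checkpoint_callback = CheckpointCallback(save_freq=10000, save_path="./logs/",
                                         name_prefix="sac_halfcheetah")
# Save the best model whenever an evaluation beats the previous best.
eval_callback = EvalCallback(eval_env, best_model_save_path="./logs/best/",
                             eval_freq=10000, deterministic=True)

model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1000000,
            callback=[checkpoint_callback, eval_callback])
```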
This is a follow-up to #2565:
You can use:
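One plausible invocation, assuming the train/enjoy scripts live under pybullet_envs.stable_baselines; the module path and flags here are an assumption, not confirmed by this PR:

```
# hypothetical command; check the actual script location in the repo
python -m pybullet_envs.stable_baselines.enjoy --algo sac --env HalfCheetahBulletEnv-v0
```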