
Inclusion of baseline results #48

Closed
sytelus opened this issue Nov 1, 2019 · 3 comments · Fixed by #65

Comments


sytelus commented Nov 1, 2019

There should be a way to see the results one should expect when running the training from scratch. At a minimum, there should be information on the number of training steps and the eventual 100-episode average reward one might expect from the baseline, but it would be much better to show the entire training curve. Without this, a baseline is not very meaningful, as one can never know whether they have actually replicated the expected result.

A few good RL baseline frameworks do this; for example, here is how other frameworks display their results: Garage, RLlib, Coach. I love the UX that Garage provides, as well as Coach's approach of making the results part of the repo itself.

Currently, there is a benchmark.zip file in the repo, but it seems monitor.csv and progress.csv are not helpful (for example, for DQN, progress.csv is empty and monitor.csv only has the last few rows). Furthermore, these files are not produced at all if you run the experiment yourself.

araffin (Owner) commented Nov 1, 2019

Hello,

> At a minimum, there should be information on a number of training steps

You have that in the hyperparameters file and the config file associated with each trained agent (at least starting with release 1.0 of the zoo). The final performance can be found in benchmark.md; note that the results correspond to only one seed (it is not meant for quantitative comparison).

Yes, a training curve would be a good addition; even better would be a learning curve from evaluating on a test env periodically (this is planned to be supported with the callback collection), but you would need at least 10 runs per algorithm per environment.
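For illustration, here is a rough sketch of that kind of periodic evaluation using the stable-baselines 2.x function-style callback; the env, evaluation frequency, and episode count below are placeholders, not the zoo's settings:

```python
# Sketch of periodic evaluation on a separate test env (not the zoo's code).
import numpy as np
import gym
from stable_baselines import DQN

EVAL_FREQ = 10_000      # evaluate every N training steps (assumed value)
N_EVAL_EPISODES = 5     # episodes per evaluation (assumed value)

# Separate test env, never used for training (CartPole as a lightweight stand-in).
eval_env = gym.make("CartPole-v1")

def evaluate(model, env, n_episodes):
    """Run n_episodes with the current policy and return the mean episode return."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    return np.mean(returns)

def eval_callback(locals_, globals_):
    """Called by model.learn(); returning True keeps training going."""
    self_ = locals_["self"]
    if self_.num_timesteps % EVAL_FREQ == 0:
        mean_return = evaluate(self_, eval_env, N_EVAL_EPISODES)
        print("step={} eval_mean_return={:.1f}".format(self_.num_timesteps, mean_return))
    return True

model = DQN("MlpPolicy", "CartPole-v1", verbose=0)
model.learn(total_timesteps=100_000, callback=eval_callback)
```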

> it seems monitor.csv

monitor.csv can give you the training learning curve, which is only a proxy for the real performance.
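For example, a rough sketch of turning the monitor.csv files into the 100-episode rolling average discussed above, using the helpers in stable_baselines.results_plotter (the log folder path is a placeholder):

```python
# Sketch: compute the rolling 100-episode mean return from *.monitor.csv files.
import numpy as np
import matplotlib.pyplot as plt
from stable_baselines.results_plotter import load_results, ts2xy

log_dir = "logs/dqn/BreakoutNoFrameskip-v4_1"  # hypothetical log folder

# load_results() reads all *.monitor.csv files in the folder; ts2xy() gives
# (timesteps, per-episode returns).
timesteps, episode_returns = ts2xy(load_results(log_dir), "timesteps")

# Rolling mean over the last 100 episodes (the usual Atari reporting metric).
window = 100
rolling = np.convolve(episode_returns, np.ones(window) / window, mode="valid")

plt.plot(timesteps[window - 1:], rolling)
plt.xlabel("Timesteps")
plt.ylabel("Mean return (last 100 episodes)")
plt.title("Training curve from monitor.csv")
plt.show()
```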

> Furthermore, these files are not produced at all currently if you run the experiment.

If you don't specify a log folder, nothing is produced, yes.

sytelus (Author) commented Nov 2, 2019

I think my comment was probably misunderstood. I'm currently trying to train a model for Breakout and reproduce the results. There is nothing in this repo that tells me what I should expect or how I would know that the training was successful. As it happens, something is possibly broken in OpenAI Baselines as well as stable-baselines, so the training for Breakout isn't producing graphs that are convincingly converging.

sytelus (Author) commented Nov 2, 2019

Also, it looks like in the current codebase there is no call to logger.configure() at all when running training.py. This possibly explains why no monitor.csv and progress.csv are generated even when a log directory is specified.
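For reference, a minimal sketch of the two pieces that would produce those files in stable-baselines 2.x: logger.configure() for progress.csv and the Monitor wrapper for monitor.csv. The folder name and env are placeholders, and this is not the zoo's actual train script:

```python
# Sketch of how progress.csv and monitor.csv get written in stable-baselines 2.x.
import os
import gym
from stable_baselines import DQN, logger
from stable_baselines.bench import Monitor

log_dir = "logs/dqn/BreakoutNoFrameskip-v4_1"  # hypothetical folder
os.makedirs(log_dir, exist_ok=True)

# progress.csv (per-update training statistics) comes from the logger.
logger.configure(folder=log_dir, format_strs=["stdout", "csv"])

# monitor.csv (per-episode reward/length) comes from the Monitor wrapper.
# CartPole is a lightweight stand-in for an Atari env here.
env = Monitor(gym.make("CartPole-v1"), os.path.join(log_dir, "0"))

model = DQN("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50_000)
```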
