Hi. I am a little confused about how many steps the agent should be trained on and then evaluated on. To add a little context, the paper mentions on pg. 2:
Crafter evaluates many different abilities of an agent by training only on a single environment for 5M steps
and this can be seen in the crafter_baselines code as well (e.g. PPO, Rainbow).
But Sec. 3.3 of the paper says:
An agent is granted a budget of 1M environment steps to interact with the environment.
Elsewhere (pg. 6, Sec. 4.1), multiple figures mention a
budget of 1M environment steps
And Table A.1 says:
It is computed across all training episodes within the budget of 1M environment steps
I also see that you commented out evaluation code from the Rainbow code (here).
What I make of this is that I need to run the Crafter agent for 1M steps (I saw the PPO example) and then use the saved stats (JSON file?) and the analysis code to calculate the success rate and score. Precisely, using the existing Crafter code available, how can I go from training to plotting meaningful results? Can you please clarify? Thanks
Thanks for spotting the typo in the paper. It's always 1M steps.
I ran the methods for 5M steps but only used the first 1M steps, because afterwards the training curves started to flatten out and required a lot more steps for relatively small improvements.
If you use the recorder class to wrap the environment, it will write a JSONL file from which you can plot results and generate tables using the scripts in the analysis directory. If you have more concrete questions about those, just let me know (in a new ticket).
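As a rough sketch of that pipeline: after wrapping the environment (e.g. `env = crafter.Recorder(env, './logdir', save_stats=True)`) and training, the stats file can be summarized with plain Python. This assumes each line of the JSONL file is one episode whose `achievement_<name>` fields count how often that achievement was unlocked (the exact field layout is an assumption here; check your own stats file), and it applies the score formula from the paper, `S = exp(mean_i ln(1 + s_i)) - 1` with success rates `s_i` in percent:

```python
import json
import math
from collections import defaultdict

def crafter_score(path):
    """Summarize a stats JSONL file written by the Crafter recorder.

    Returns per-achievement success rates (in percent) and the
    aggregate Crafter score. Assumes one JSON object per line per
    episode, with integer 'achievement_<name>' fields counting how
    often each achievement was unlocked in that episode (assumed
    layout; verify against your own stats file).
    """
    unlocked = defaultdict(int)  # episodes with the achievement unlocked at least once
    episodes = 0
    with open(path) as f:
        for line in f:
            episode = json.loads(line)
            episodes += 1
            for key, value in episode.items():
                if key.startswith('achievement_'):
                    unlocked[key] += int(value >= 1)
    rates = {key: 100.0 * count / episodes for key, count in unlocked.items()}
    # Crafter score (Sec. 3.3): S = exp(mean_i ln(1 + s_i)) - 1, s_i in percent.
    score = math.exp(
        sum(math.log(1 + s) for s in rates.values()) / len(rates)) - 1
    return rates, score
```

The `ln(1 + s_i)` aggregation is what makes the score reward unlocking rare achievements rather than just raising already-high rates; for plots and tables matching the paper, the scripts in the analysis directory remain the reference implementation.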