
Reproducing Atari FPS #280

Open
djbyrne opened this issue Jul 28, 2023 · 5 comments

Comments

@djbyrne

djbyrne commented Jul 28, 2023

Hi guys, really liking the repo and found the paper very insightful! Very excited to see the potential of single node RL experimentation 😄

I am trying to reproduce the throughput shown in the paper (~45k FPS for System 1 and ~130k FPS for System 2), but I am currently plateauing at ~20k on a machine that surpasses System 2.

Would it be possible to share the optimal config for reproducing the max throughput?

Thanks so much,

Donal

@alex-petrenko
Owner

Hi @djbyrne !

First of all, see this section in the documentation: https://www.samplefactory.dev/09-environment-integrations/vizdoom/#reproducing-paper-results

It's for VizDoom, but I bet you can use similar configurations to reach very high throughput.
Specifically, the last one:

python -m sf_examples.vizdoom.train_vizdoom --env=doom_benchmark --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=72 --num_envs_per_worker=24 --num_policies=1 --batch_size=8192 --wide_aspect_ratio=False --experiment=doom_battle_appo_w72_v24 --policy_workers_per_policy=2

Replace Doom-related params with Atari, and you should be good to go.
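As a sketch of that substitution (the module path and env name below are my assumptions based on Sample Factory's Atari examples; check the Atari integration docs for the exact names, and note that Doom-specific flags like `--wide_aspect_ratio` are dropped):

```shell
# Same APPO throughput config as the VizDoom benchmark command,
# retargeted at an Atari env (env name is illustrative)
python -m sf_examples.atari.train_atari --env=atari_breakout --algo=APPO \
    --use_rnn=True --num_workers=72 --num_envs_per_worker=24 --num_policies=1 \
    --batch_size=8192 --experiment=atari_breakout_appo_w72_v24 \
    --policy_workers_per_policy=2
```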

The most important parameters for throughput:

num_workers: ideally the same as the number of logical CPUs on your machine

num_envs_per_worker: usually in the 10-20 range, but if you see CPU utilization below 100%, increase it a bit more

worker_num_splits=2 to enable double buffering
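Putting those three parameters together, a tuned launch on, say, a 72-logical-CPU machine might look like this (values are hypothetical; scale `num_workers` to your own CPU count):

```shell
# num_workers matches the logical CPU count of the machine;
# worker_num_splits=2 turns on double-buffered sampling
python -m sf_examples.atari.train_atari --env=atari_breakout \
    --num_workers=72 --num_envs_per_worker=16 --worker_num_splits=2 \
    --experiment=atari_throughput_test
```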

@alex-petrenko
Owner

You would also need to increase the batch size to accommodate that much data. Start in the 2048-4096 range and go from there.

@alex-petrenko
Owner

That said, there's actually a better way to work with Atari: https://www.samplefactory.dev/09-environment-integrations/envpool/

Envpool is a C++ vectorized env runner that supports Atari and some other envs. It is even faster than running many envs via Python multiprocessing.
You need very different parameters for envpool, because it's essentially one very big vectorized environment rather than hundreds of individual envs.

Here's my guess:

num_workers: 1-4?
num_envs_per_worker: 1 or 2 if you use double buffering
worker_num_splits: 1 or 2 for double buffering
env_agents=64 - how many envs we run in the vector... I'm not sure what it should be; try as many as you have CPU cores and go from there!
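Those guesses combined into one command might look like the following sketch. The launcher module name and flags here are my assumptions (verify against the envpool integration page linked above); only the parameter values come from the list:

```shell
# One small pool of workers driving a single big vectorized envpool env;
# env_agents controls how many envs run inside the vector
python -m sf_examples.envpool.atari.train_envpool_atari --env=atari_breakout \
    --num_workers=1 --num_envs_per_worker=2 --worker_num_splits=2 \
    --env_agents=64 --experiment=atari_envpool_test
```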

@djbyrne
Author

djbyrne commented Jul 30, 2023

Hey @alex-petrenko thank you for the insight! Apologies, I did not think to look at the other environments for this config 🙈

I will run with what you have given above 😄

Yes, I have worked with envpool before; that is what I will try next. Have you done a benchmark comparison between envpool and standard Atari in Sample Factory yet? I would imagine it gets a speedup similar to that seen in the Sebulba PodRacer architecture, since that also uses a C++-based implementation for vectorising the environments.

@alex-petrenko
Owner

I haven't really done comparisons myself, but I know Costa did.
He has some implementations here that you can maybe harvest for parameters: https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_envpool.py

There's also some info in their paper and repo: https://arxiv.org/abs/2206.10558
https://github.com/vwxyzjn/envpool-cleanrl

My guess is that you should be able to get 100+K easily with or without envpool, because you're probably going to be bottlenecked by the convnet backprop.
