
AtariEnv, should repeat_action_probability be greater than 0 for sticky actions? #105

Closed
DanielTakeshi opened this issue Feb 13, 2020 · 2 comments


@DanielTakeshi
Contributor

Hi @astooke

I am working on debugging some issues related to randomness and determinism in the Atari environments. Here is where I think the randomness comes from for a vanilla DQN agent that runs an epsilon-greedy policy:

  • First, there is a random seed for a particular environment, created upon initialization. If we use 10 parallel environments, each gets a different random seed. [I think the only effect of this seed is to drive the randomness in max_start_noops and repeat_action_probability, described below; is that right? Or is the randomness in repeat_action_probability seeded separately somehow?]

  • Second, there is the standard max_start_noops=N setting, where the agent takes no-op actions for between 0 and N time steps at the start of each episode, usually with N=30.

  • Third, there is a repeat_action_probability, which leads to sticky actions. This is 0 by default, but the paper "Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents" recommends using sticky actions, and I think their recommended probability is 0.25.

  • Fourth, when the agent takes steps in the environment, it has an epsilon parameter for its epsilon-greedy policy, which starts at 1.0 and decays to a final value such as 0.1, 0.01, or 0.001, depending on what we choose.

Do these four exhaustively cover all sources of randomness for a DQN-based agent with an epsilon-greedy policy?
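For concreteness, here is a rough sketch of how I picture the per-step randomness from the third and fourth items combining. The helper functions are hypothetical, not rlpyt's or ALE's actual code:

import numpy as np

def sticky_step(rng, prev_action, intended_action, repeat_action_probability=0.25):
    # Sticky actions: with probability repeat_action_probability the environment
    # ignores the newly chosen action and repeats the previous one.
    if rng.random() < repeat_action_probability:
        return prev_action
    return intended_action

def epsilon_greedy(rng, q_values, epsilon):
    # Epsilon-greedy: with probability epsilon take a uniformly random action,
    # otherwise act greedily with respect to the Q-values.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)   # per-environment / per-agent seed
q_values = np.zeros(6)           # dummy Q-values for a 6-action game
action = epsilon_greedy(rng, q_values, epsilon=0.1)
env_action = sticky_step(rng, prev_action=0, intended_action=action)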

I quickly checked some of the values:

def __init__(self,
             game="pong",
             frame_skip=4,  # Frames per step (>=1).
             num_img_obs=4,  # Number of (past) frames in observation (>=1).
             clip_reward=True,
             episodic_lives=True,
             max_start_noops=30,
             repeat_action_probability=0.,
             horizon=27000,
             ):

and it looks like the repeat action probability defaults to 0, so sticky actions are not used. I am wondering whether there is a reason for not enabling this by default. I searched the repository but could not find any code that explicitly changes the repeat action probability. [I am also wondering whether you set the repeat action probability to a higher value for the benchmarks in the white paper.]
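For reference, if I'm reading the constructor above correctly, enabling sticky actions in an experiment should just be a matter of passing the kwarg (the import path here is my assumption from the repo layout):

from rlpyt.envs.atari.atari_env import AtariEnv

# Assumed usage: enable sticky actions with the 0.25 probability recommended
# by Machado et al.; all other kwargs keep the defaults shown above.
env = AtariEnv(game="pong", repeat_action_probability=0.25)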

@astooke
Owner

astooke commented Feb 20, 2020

Hi, good questions! Yes, despite that paper urging everyone to start using sticky actions, basically none of the algorithm benchmarks we tried to reproduce use them (going all the way up to R2D2). But you're right, the repeat_action_probability kwarg is exactly sticky actions, and that paper suggested using 0.25.

As for the random seeds: I think the way it works is that each worker gets its own seed, and each worker might have several environment instances; the environment instances do not get their own random seeds. I think you're right that the places for randomness in the environment are the sticky actions (which happen inside the ALE code) and the random no-ops (which happen in the rlpyt code).
The randomness for the agent is all in sampling for epsilon-greedy. If you use the GPU sampler, this happens inside the master process, according to its seed; if you use the CPU sampler, it happens inside each worker. Also, the agent's initial parameters are drawn using the master process's random seed.
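Just to illustrate the structure I mean (a toy sketch, not the actual seeding code in rlpyt):

import numpy as np

master_seed = 123
master_rng = np.random.default_rng(master_seed)  # agent's initial parameters; epsilon draws with the GPU sampler

n_workers = 4
worker_rngs = [np.random.default_rng(master_seed + rank) for rank in range(n_workers)]
# Each worker's environment instances draw their random start no-ops from that
# worker's RNG; with the CPU sampler, the epsilon-greedy draws happen here too.
# (Sticky actions, when enabled, are sampled inside the ALE code, as noted above.)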

Off the top of my head, I can't think of other randomness, since ALE is otherwise deterministic. Except, I think some convolution procedures might not be deterministic while others are... I remember playing around with that in Theano (settings which get passed to cuDNN), but I haven't done it with PyTorch.
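For what it's worth, in PyTorch the usual knobs for forcing deterministic cuDNN convolutions look like this (analogous to the Theano/cuDNN settings I mentioned, though I haven't checked the speed impact here):

import torch

# Make cuDNN pick deterministic convolution algorithms and disable the
# autotuner, which can otherwise select different algorithms across runs.
torch.manual_seed(123)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False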

@DanielTakeshi
Contributor Author

DanielTakeshi commented Feb 21, 2020

Thanks! OK, I guess we can close this since my questions are resolved; as long as we make it clear whether or not we're using sticky actions, people should know what we mean.
