
Bypass out-of-sync Gym registry in SubprocVecEnv by resolving EnvSpec #160

Merged (10 commits) on Jan 22, 2020
16 changes: 15 additions & 1 deletion src/imitation/util/util.py
@@ -49,15 +49,29 @@ def make_vec_env(env_name: str,
        max_episode_steps: If specified, wraps VecEnv in TimeLimit wrapper with
            this episode length before returning.
Member:

Suggested change:
-            this episode length before returning.
+            this episode length before returning. Otherwise, defaults to `max_episode_steps` for `env_name` in the Gym registry.

@shwang (Member Author) commented on Jan 22, 2020:

I expanded this comment a bit more in 902fe96. Worth noting that applying the registry's `max_episode_steps` is the default behavior of `gym.make`.
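[Editor's note: a minimal sketch of the `gym.make` default behavior mentioned above, not part of the PR. It assumes a Gym version contemporary with this PR (~0.15), where `gym.make` wraps the env in `TimeLimit` and the limit is exposed via the private `_max_episode_steps` attribute.]

```python
import gym

env = gym.make("CartPole-v1")
print(type(env).__name__)      # 'TimeLimit': wrapper added by gym.make
print(env._max_episode_steps)  # 500, taken from the Gym registry entry
```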

"""
# Resolve the spec outside of the subprocess first, so that it is available to
# subprocesses via automatic pickling.
spec = gym.spec(env_name)

def make_env(i, this_seed):
env = gym.make(env_name)
        # Previously, we directly called `gym.make(env_name)`.
        #
        # That direct approach was problematic (especially in combination with Ray)
        # because the forkserver from which subprocesses are forked might have been
Member:

Are we sure this is what's going on? It's fine as a hypothesis, but I don't want to immortalize in a comment something we're not sure about.

Member:

I'm 80% confident the forkserver has very minimal state, since:

  1. The Python multiprocessing docs say "no unnecessary resources are inherited".
  2. The code seems to use spawn to start the forkserver, which starts a new Python process.
  3. This blog states: "Note that children retain a copy of the forkserver state. This state is intended to be relatively simple, but it is possible to adjust this through the multiprocess API through the set_forkserver_preload() method."

That said, there does seem to be some logic to preload modules (by default, I think, the `__main__` module, i.e. the entry point to the script). So IIUC, the forkserver is intended to execute the code of the parent process (i.e. all imports), but should not execute an `if __name__ == '__main__'`-guarded block (the Python docs explicitly state the need for this guard).

It seems plausible that starting the forkserver in a Ray worker messes with the autodetection of what to import.
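[Editor's note: a small standalone experiment, not from this repo, illustrating the `set_forkserver_preload()` mechanism discussed above: children forked from the forkserver inherit whatever the forkserver imported. The expected output is an assumption based on CPython's documented forkserver semantics (Unix only).]

```python
import multiprocessing as mp
import sys

def check(_):
    # Runs in a child forked from the forkserver: was "json" already
    # imported when this child started?
    return "json" in sys.modules

if __name__ == "__main__":
    # Ask the forkserver process to import "json" once; every child forked
    # from it then inherits the module pre-imported.
    mp.set_forkserver_preload(["json"])
    ctx = mp.get_context("forkserver")
    with ctx.Pool(1) as pool:
        print(pool.map(check, [0]))  # expected: [True]
```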

Member:

If we can't get to the bottom of this, it may be better to leave the comment vague and cite this PR.

@shwang (Member Author) commented on Jan 21, 2020:

Agreed that we should be vague about the cause in the comments and note the PR; I've made these changes.

        # spawned before `env_name` was registered in the main process,
        # causing `env_name` to never exist in the Gym registry of the forked
        # subprocess that is running `make_env(env_name)`.
        env = spec.make()

        # Seed each environment with a different, non-sequential seed for diversity
        # (even if the caller passes us sequentially-assigned base seeds). int() is
        # necessary to work around a gym bug where it chokes on numpy int64s.
        env.seed(int(this_seed))

        if max_episode_steps is not None:
            env = TimeLimit(env, max_episode_steps)
        elif (spec.max_episode_steps is not None) and not spec.tags.get('vnc'):
            env = TimeLimit(env, max_episode_steps=spec.max_episode_steps)

        # Use Monitor to record statistics needed for Baselines algorithms logging
        # Optionally, save to disk
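[Editor's note: the core pattern of this PR, as my own minimal standalone sketch rather than the project's code. It assumes a Gym version contemporary with this PR, where `EnvSpec` is picklable and exposes `.make()`: resolve the spec in the parent, ship it to workers, and no registry lookup happens in the child.]

```python
import gym
from multiprocessing import get_context

def _worker(spec):
    # `spec` arrives via pickling; no Gym registry lookup happens in the child,
    # so this works even if the child's registry never saw the registration.
    env = spec.make()
    return type(env).__name__

if __name__ == "__main__":
    spec = gym.spec("CartPole-v1")  # registry lookup happens here, in the parent
    ctx = get_context("forkserver")
    with ctx.Pool(1) as pool:
        print(pool.map(_worker, [spec]))  # e.g. ['CartPoleEnv']
```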
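[Editor's note: on the seeding comment above, one possible way (my assumption, not the project's code) to derive diverse, non-sequential per-worker seeds from a single base seed, including the `int()` cast the comment calls for:]

```python
import numpy as np

base = np.random.SeedSequence(42)
# spawn() yields statistically independent child sequences, so worker seeds
# are decorrelated even though they share one base seed.
worker_seeds = [int(child.generate_state(1)[0]) for child in base.spawn(4)]
# int() matters: some gym versions choke on numpy integer types in env.seed().
print(worker_seeds)
```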
13 changes: 11 additions & 2 deletions tests/test_scripts.py
@@ -144,9 +144,9 @@ def test_transfer_learning(tmpdir):
    dict(
        sacred_ex_name="expert_demos",
        base_named_configs=["cartpole", "fast"],
        search_space={
            "config_updates": {
                "seed": tune.grid_search([0, 1]),
                "init_rl_kwargs": {
                    "learning_rate": tune.grid_search([3e-4, 1e-4]),
                },
@@ -169,9 +169,18 @@
            },
        }},
    ),
    # Test that custom environments are passed to SubprocVecEnv in Ray workers.
    dict(
        sacred_ex_name="train_adversarial",
        base_named_configs=["custom_ant", "fast"],
        base_config_updates=dict(
            init_trainer_kwargs=dict(
                parallel=True,
                num_vec=2,
            )),
    ),
]


PARALLEL_CONFIG_LOW_RESOURCE = {
    # CI server only has 2 cores.
    "init_kwargs": {"num_cpus": 2},
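[Editor's note: for context on the `custom_ant` named config in the new test above, custom envs are typically registered in the parent process at import time, along the lines of this hypothetical sketch (the id and entry point are made up). The PR's spec resolution is what makes such an env reachable from Ray-spawned SubprocVecEnv workers even when their forked registry missed this call.]

```python
from gym.envs.registration import register

register(
    id="CustomAnt-v0",                           # hypothetical id
    entry_point="my_package.envs:CustomAntEnv",  # hypothetical entry point
    max_episode_steps=1000,
)
```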