
Bypass out-of-sync Gym registry in SubprocVecEnv by resolving EnvSpec #160

Merged: 10 commits merged into master on Jan 22, 2020

Conversation

shwang (Member) commented Jan 18, 2020

We ran into an unusual problem when using imitation.scripts.parallel to parallelize training of our custom ant environment. The subprocesses created by imitation.util.make_vec_env reported that our custom environments weren't registered, but only when we ran them via imitation.scripts.parallel. During debugging, we found that it didn't help to forcibly re-register the environments in the scopes on the stack trace leading up to the inner worker function of SubprocVecEnv. (Even forcibly re-registering the environments inside make_vec_env itself did not suffice.)

The most likely explanation is that the Python forkserver used by SubprocVecEnv was somehow prematurely initialized by Ray, so that new processes forked from the forkserver didn't include our custom environments in the Gym registry. (Edit: we are actually quite unsure whether this is the cause of the bug; it is just a hypothesis.)

h/t @AdamGleave for the idea of resolving the custom environment's EnvSpec before initializing the SubprocVecEnv.
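
For concreteness, here is a minimal sketch of that approach (not the code in this PR; it assumes the pre-0.21 Gym API, where gym.spec and EnvSpec.make are available, and stable_baselines' SubprocVecEnv):

```python
# Sketch only: resolve the EnvSpec in the parent process, where the custom
# environment is registered, and build the env from the spec inside each
# worker, so the worker's (possibly out-of-sync) Gym registry is never consulted.
import gym
from stable_baselines.common.vec_env import SubprocVecEnv

def make_vec_env_sketch(env_name: str, n_envs: int = 4) -> SubprocVecEnv:
    spec = gym.spec(env_name)  # raises here, in the parent, if unregistered

    def make_env():
        # EnvSpec.make constructs the env from its entry point directly,
        # without looking up `env_name` in this process's registry.
        return spec.make()

    return SubprocVecEnv([make_env] * n_envs, start_method="forkserver")
```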

shwang (Member Author) commented Jan 18, 2020

Confirmed that the new test fails before this PR.

shwang changed the title from "Bypass out-of-sync Gym registry in SubprocVecEnv by passing EnvSpec" to "Bypass out-of-sync Gym registry in SubprocVecEnv by resolving EnvSpec" on Jan 18, 2020.
shwang force-pushed the fix_ray_subproc_gym_registry branch from 7431fb4 to f527f0f on Jan 18, 2020, 04:42.
AdamGleave (Member) commented Jan 18, 2020

> The most likely explanation is that the Python forkserver used by SubprocVecEnv was somehow prematurely initialized by Ray, so that new processes forked from the forkserver didn't include our custom environments in the Gym registry.

Just to flag that I'm still not sure about this particular explanation, although I agree with the overall conclusion that it's a bad interaction between forkserver and Ray. In particular, my (imperfect) understanding of forkserver is that it's a fresh process, as with the spawn start method, but for efficiency it only has to be created once; fork() inside the forkserver can then cheaply create new (clean) Python processes. Some evidence in favour of this is that using spawn also failed, even though those processes are created just-in-time.

Admittedly this doesn't explain why it works in Ray local mode. My guess is that the key difference is that when using Ray, the worker is executing a pickled function rather than native code. This could plausibly change how the helper method is pickled and sent to the SubprocVecEnv worker, although I've not confirmed this.

(We probably don't need to get to the bottom of this if the EnvSpec thing works, but I wanted to put this in writing in case we need to debug a similar issue in the future.)

Also, if you haven't seen it, https://codewithoutrules.com/2018/09/04/python-multiprocessing/ is a fun discussion of issues with fork (not directly related to this, but it explains why we're using forkserver in the first place).

AdamGleave (Member) left a comment

Not sure about the explanation in the comment. Otherwise LGTM.

# Previously, we directly called `gym.make(env_name)`.
#
# That direct approach was problematic (especially in combination with Ray)
# because the forkserver from which subprocesses are forked might have been
Member commented:
Are we sure this is what's going on? It's fine as a hypothesis but I don't want to immortalize in a comment something we're not sure about.

Member replied:
I'm 80% confident the forkserver has very minimal state, since:

  1. The Python multiprocessing docs say "No unnecessary resources are inherited".
  2. The code seems to use spawn to start the forkserver, which starts a new Python process.
  3. This blog states: "Note that children retain a copy of the forkserver state. This state is intended to be relatively simple, but it is possible to adjust this through the multiprocess API through the set_forkserver_preload() method."

With that said, there does seem to be some logic to preload modules (by default, I think, the __main__ module, i.e. the entrypoint of the script). So IIUC, forkserver is intended to execute the code of the parent process (i.e. all imports), but should not execute an if __name__ == '__main__'-guarded block (the Python docs explicitly state the need for this guard).

Seems plausible that starting forkserver in a Ray worker messes with the autodetection of what to import.
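
To illustrate the mechanics being discussed, here is a hypothetical sketch (not code from this repository; my_project.envs is a made-up module name) of how forkserver preloading and the __main__ guard interact:

```python
# Hypothetical sketch of forkserver preloading and the __main__ guard.
# Module-level imports re-run when the forkserver imports preloaded modules;
# the guarded block below runs only in the original parent process.
import multiprocessing as mp

# import my_project.envs  # (hypothetical) would register custom Gym envs at import time

def worker(name):
    print(f"hello from {name}")

if __name__ == "__main__":
    mp.set_start_method("forkserver")
    # Ask the forkserver to import this (hypothetical) module, so that children
    # forked from it would already have the custom envs registered.
    mp.set_forkserver_preload(["my_project.envs"])
    p = mp.Process(target=worker, args=("child",))
    p.start()
    p.join()
```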

Member:

If we can't get to the bottom of this, it may be better to leave the comment vague and cite this PR.

shwang (Member Author) commented Jan 21, 2020:
Agreed that we should be vague about the cause in the comments and note the PR; I've made these changes.

codecov bot commented Jan 21, 2020
Codecov Report

No coverage uploaded for pull request base (master@365c177).
The diff coverage is 100%.


@@            Coverage Diff            @@
##             master     #160   +/-   ##
=========================================
  Coverage          ?   87.64%           
=========================================
  Files             ?       64           
  Lines             ?     4542           
  Branches          ?        0           
=========================================
  Hits              ?     3981           
  Misses            ?      561           
  Partials          ?        0
Impacted Files                                       Coverage Δ
src/imitation/algorithms/adversarial.py             88.81% <100%> (ø)
src/imitation/util/util.py                           100% <100%> (ø)
src/imitation/scripts/config/train_adversarial.py   71.96% <100%> (ø)
src/imitation/scripts/train_adversarial.py          98% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

AdamGleave (Member) left a comment
Two minor suggestions: expanding documentation and removing unused argument

@@ -49,15 +49,30 @@ def make_vec_env(env_name: str,
max_episode_steps: If specified, wraps VecEnv in TimeLimit wrapper with
this episode length before returning.
Member commented:
Suggested change:
- this episode length before returning.
+ this episode length before returning. Otherwise, defaults to `max_episode_steps` for `env_name` in the Gym registry.

shwang (Member Author) commented Jan 22, 2020:
I expanded this comment a bit more in 902fe96. I wanted to note that taking max_episode_steps from the Gym registry is the default behavior of gym.make.
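
For context, a quick sketch of the default behavior being referenced (assuming a pre-0.21 Gym release):

```python
# gym.make wraps the env in a TimeLimit using the max_episode_steps recorded
# in the Gym registry; make_vec_env's `max_episode_steps` argument overrides
# that registry default.
import gym

env = gym.make("CartPole-v1")
print(env.spec.max_episode_steps)  # 500, from the registry entry for CartPole-v1
```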

src/imitation/util/util.py (review thread outdated, resolved)
AdamGleave (Member) left a comment
LGTM, can merge once CI passes.
