Prevent race condition in trying to collect result from DagFileProcessor #11306

Merged

Commits on Oct 6, 2020

  1. Prevent race condition in trying to collect result from DagFileProcessor

    A rare race condition was noticed in the Scheduler HA PR where the
    "test_dags_with_system_exit" test would occasionally fail with the
    following symptoms:
    
    - The pipe was "readable" as returned by
      `multiprocessing.connection.wait`
    - On reading it yielded an EOFError, meaning the other side had closed
      the connection
    - But the process was still alive/running
    
    Previously, this would cause the Manager process to die with an error.
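The first two symptoms can be reproduced in a single process. The following is a minimal sketch, not code from this PR: `read_after_writer_closed` is a hypothetical helper name, and it only shows that `multiprocessing.connection.wait` reports a pipe as readable once the write end is closed, and that the subsequent `recv()` raises `EOFError`.

```python
from multiprocessing import Pipe
from multiprocessing.connection import wait


def read_after_writer_closed():
    """Hypothetical helper: close the write end, then try to read."""
    reader, writer = Pipe(duplex=False)  # one-way pipe: writer -> reader
    writer.close()                       # the "other side" goes away without sending

    # End-of-file counts as "readable", so wait() returns the connection.
    was_ready = reader in wait([reader], timeout=1)

    try:
        reader.recv()
        outcome = "data"
    except EOFError:
        # Nothing was ever sent and every write end is closed.
        outcome = "eof"
    finally:
        reader.close()
    return was_ready, outcome
```

In the failing test the same sequence happened while the child process was still running, which is what made it a race.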
    
    This PR makes a few changes:
    
    - It ensures that the pipe is simplex, not duplex (we only ever send
      data in one direction), as this is simpler
    - It ensures that the "other" end of the pipe is correctly closed in
      both parent and child processes. Without this the pipe would
      (sometimes) be kept open until the child process had exited anyway.
    - When we get an EOFError on reading and the process is still alive, we
      give it a few seconds to shut down cleanly, and then kill it.
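The three changes listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `_child`, `run_and_collect`, `exit_early` and `grace_seconds` are hypothetical names, and the sketch assumes the Unix-only "fork" start method so that the connections are inherited by the child.

```python
import multiprocessing
import sys
from multiprocessing.connection import wait

# Assumption: "fork" start method (Unix only), so no __main__ guard is needed.
_ctx = multiprocessing.get_context("fork")


def _child(reader, writer, exit_early):
    # The child never reads, so it closes its inherited copy of the read end.
    reader.close()
    try:
        if exit_early:
            sys.exit(1)  # mimics a file that calls sys.exit(): no result is sent
        writer.send("result")
    finally:
        writer.close()


def run_and_collect(exit_early, grace_seconds=5.0):
    # Simplex pipe: data only flows from the child to the parent.
    reader, writer = _ctx.Pipe(duplex=False)
    process = _ctx.Process(target=_child, args=(reader, writer, exit_early))
    process.start()
    # The parent never writes. Closing its copy means EOF is seen as soon as
    # the child closes its end, not only once the child has exited.
    writer.close()

    result = None
    try:
        if wait([reader], timeout=grace_seconds):
            result = reader.recv()
    except EOFError:
        # The other side closed without sending; the process may still be alive.
        pass
    finally:
        reader.close()

    # Give the process a few seconds to shut down cleanly, then kill it.
    process.join(grace_seconds)
    if process.is_alive():
        process.kill()
        process.join()
    return result, process.exitcode
```

Calling `run_and_collect(exit_early=True)` exercises the `EOFError` path: the parent gets no result, but it still reaps the child and can inspect its exit code without dying itself.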
    ashb committed Oct 6, 2020
    afc2d47