Fix `run_process()` and `open_process().__aexit__` leaking an orphan process when cancelled #672

gschaffner · 2024-01-17T08:46:24Z

fixes #669.

underlying AnyIO bug

currently, cancelling `async with await open_process()` (i.e. running `Process.__aexit__()` in a cancelled scope) can leave the `AsyncResource` not fully closed, leaking an orphan process. this is because a cancelled `Process.__aexit__()` does not perform the minimum level of cleanup. Trio's cancellation model states that

Async cleanup operations–like __aexit__ methods or async close methods–are cancellable just like anything else except that if they are cancelled, they still perform a minimum level of cleanup before raising Cancelled. (https://trio.readthedocs.io/en/stable/reference-core.html#cancellation-and-primitive-operations)
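To make "minimum level of cleanup" concrete, here is a minimal sketch in plain asyncio (not AnyIO's actual `Process` implementation). `ManagedProcess` and `demo` are hypothetical names. `__aexit__` normally waits for the child, but if that wait is cancelled it still kills the child *and* waits for it to be reaped before re-raising the cancellation. Note that asyncio cancellation is edge-triggered, unlike Trio/AnyIO cancel scopes, so the second `wait()` is not interrupted unless `cancel()` is called again.

```python
from __future__ import annotations

import asyncio
import sys


class ManagedProcess:
    """Hypothetical async context manager (not AnyIO's Process) that waits for its
    child on exit and, if that wait is cancelled, still kills and reaps the child."""

    def __init__(self, *argv: str) -> None:
        self._argv = argv
        self.proc: asyncio.subprocess.Process | None = None

    async def __aenter__(self) -> asyncio.subprocess.Process:
        self.proc = await asyncio.create_subprocess_exec(*self._argv)
        return self.proc

    async def __aexit__(self, *exc_info: object) -> None:
        assert self.proc is not None
        try:
            # Normal path: wait for the child to exit on its own.
            await self.proc.wait()
        except asyncio.CancelledError:
            # Minimum cleanup before propagating the cancellation: kill() only
            # requests termination, so also wait until the loop has reaped the child.
            # (A second cancel() would interrupt this wait; with AnyIO one would
            # typically shield it with CancelScope(shield=True).)
            self.proc.kill()
            await self.proc.wait()
            raise


async def demo() -> int | None:
    mp = ManagedProcess(sys.executable, "-c", "import time; time.sleep(30)")

    async def body() -> None:
        async with mp:
            pass  # __aexit__ immediately starts waiting for the 30 s child

    task = asyncio.create_task(body())
    # Once mp.proc is set, body() is already suspended inside __aexit__'s wait().
    while mp.proc is None:
        await asyncio.sleep(0.01)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return mp.proc.returncode


returncode = asyncio.run(demo())
print(returncode)  # not None: the child was killed and reaped, not orphaned
```

Without the `except asyncio.CancelledError` branch, the cancellation would propagate out of `__aexit__` while the child is still running, which is the asyncio analogue of the orphan described above.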

how it caused #669

#669 is a special case of this bug: in #669, even though `run_process()` does call `process.kill()` when cancelled, it still leaks an orphan process, if only very briefly. this is because the process is not necessarily dead yet when `process.kill()` returns: `kill(2)`/`TerminateProcess` only requests termination, much like `call_soon` only schedules a callback. so after `process.kill()` we need to wait briefly for the process to actually die and for the event loop to learn about it.
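The "kill only schedules the death" point can be observed directly with plain asyncio (used here as a stand-in for either AnyIO backend; `kill_then_wait` is a hypothetical name). Right after `kill()` returns, and before any `await`, the loop has not yet reaped the child, so `returncode` is still `None`; only awaiting `wait()` makes the process really gone.

```python
from __future__ import annotations

import asyncio
import sys


async def kill_then_wait() -> tuple[int | None, int]:
    # Spawn a child that would otherwise run for 30 seconds.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import time; time.sleep(30)"
    )
    proc.kill()  # only sends SIGKILL / calls TerminateProcess, then returns
    # No await has happened since kill(), so the event loop cannot have
    # observed the exit yet: returncode is still None at this point.
    before = proc.returncode
    # Awaiting wait() gives the OS time to finish the kill and lets the loop
    # reap the child; only now has the exit been observed.
    after = await proc.wait()
    return before, after


before, after = asyncio.run(kill_then_wait())
print(before, after)
```

If the event loop were closed between `kill()` and `wait()`, the child would still be unreaped, which is exactly the window the race below depends on.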

so the problem seen in #669 was that an orphan process was briefly leaked. this started a race between:

- the event loop closing, and
- the OS killing the process and the event loop learning about it.

note that this race is present on both backends. if the event loop closed first (i.e. won the race), then:

- on Trio, it caused `Popen.__del__` to emit a `ResourceWarning`;
- on asyncio, it should have caused `BaseSubprocessTransport.__del__` to emit a `ResourceWarning`, but due to an asyncio bug (BaseSubprocessTransport.__del__ fails if the event loop is already closed, which can leak an orphan process python/cpython#114177) it caused an unraisable exception instead.

the tests currently in this PR should cover the underlying AnyIO bug that caused #669 while being deterministic (not flaky). however, if you also want a test that looks more like the original example in #669 (i.e. one that exercises asyncio/Trio close and `__del__` behavior), here's another test that could be added to this PR:

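import sys
from typing import Any

from anyio import CancelScope, open_process, run
from anyio.abc import Process


def test_process_aexit_cancellation_doesnt_orphan_del(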
    anyio_backend_name: str, anyio_backend_options: dict[str, Any]
) -> None:
    """
    Regression test for #669.

    Ensure that, when cancelled, open_process.__aexit__() doesn't leave behind any
    __del__() finalizers that emit ResourceWarning (or fail due to
    https://github.com/python/cpython/issues/114177) if there is a race where the event
    loop is closed too soon.

    N.B.: This is a test of asyncio and Trio, not AnyIO: it should pass if
    test_process_aexit_cancellation_doesnt_orphan_process() passes and asyncio/Trio are
    not bugged. This test can also have false negatives (passes when it should fail)
    because it relies on run() finishing either before the OS kills the process or
    before the event loop learns the process died.

    """
    # Keep a reference so that process isn't garbage-collected (running its
    # __del__ finalizers) until after run() finishes
    process: Process

    async def main() -> None:
        nonlocal process
        with CancelScope() as scope:
            async with await open_process(
                [sys.executable, "-c", "import time; time.sleep(1)"]
            ) as process:
                scope.cancel()

    run(main, backend=anyio_backend_name, backend_options=anyio_backend_options)


gschaffner · 2024-01-17T09:16:29Z

i should also note that this appears to be how Trio implemented `Process.aclose` before it was removed.

agronholm

Just some formatting changes.

src/anyio/_core/_subprocesses.py

src/anyio/_backends/_asyncio.py

tests/test_subprocesses.py

With `pytest.mark.xfail`, the test is still run and will be reported as XPASS if/when it stops failing, i.e. it will detect when the `mark.xfail` should be removed. In contrast, `pytest.xfail()` immediately stops the test and reports it as xfailed, skipping the XPASS check. The only remaining use of `pytest.xfail` (in `TestBlockingPortal.test_from_async`) needs to remain a `pytest.xfail` until someone gets around to agronholm#524 (comment), because it will deadlock otherwise.
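A small sketch of the difference described above; the test names, the Windows condition and the reasons are hypothetical and only illustrate the two mechanisms. The marker form still executes the body, so an unexpected pass surfaces as XPASS (reported as a failure with `strict=True`); the imperative form raises before the body runs, so there is nothing to XPASS.

```python
import sys

import pytest


# Hypothetical condition and reason, for illustration only.
@pytest.mark.xfail(sys.platform == "win32", reason="known bug on Windows", strict=True)
def test_with_marker() -> None:
    # The body still runs. If it unexpectedly passes where the condition holds,
    # pytest reports XPASS (a failure, because strict=True), signalling that
    # the marker can be removed.
    assert sys.platform != "win32"


def test_with_imperative_xfail() -> None:
    # pytest.xfail() raises immediately, so nothing after it runs and an XPASS
    # can never be detected. This is only worth it when running the body would
    # be harmful, e.g. when it would deadlock.
    pytest.xfail("would deadlock")
    raise AssertionError("never reached")
```

The marker can also be applied at runtime with `request.applymarker(pytest.mark.xfail(...))` when the condition depends on a fixture value, such as the backend name.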

tests/test_subprocesses.py

agronholm · 2024-01-23T23:12:41Z

How is it that the xfail test passes? Please explain.


gschaffner · 2024-01-24T04:23:32Z

oops, i forgot to make the xfail conditional on the backend. fixed.

agronholm · 2024-01-25T11:43:22Z

Thanks!

gschaffner added 4 commits January 12, 2024 00:00

Test that a cancelled Process.__aexit__ doesn't leave an orphan process

a4c70d1

Test a cancelled Process.__aexit__ closes standard streams if the subprocess has not yet closed them

05ba249

Fix a cancelled Process.__aexit__ skipping necessary cleanup

42bf49e

Fix Process.std{in,out,err}.aclose not checkpointing on asyncio

c6f0334

agronholm requested changes Jan 20, 2024


src/anyio/_core/_subprocesses.py (outdated)

src/anyio/_backends/_asyncio.py

tests/test_subprocesses.py (outdated)

tests/test_subprocesses.py (outdated)

agronholm and others added 2 commits January 22, 2024 16:30

Apply suggestions from code review

db1e534

agronholm reviewed Jan 23, 2024


tests/test_subprocesses.py (outdated)

Update tests/test_subprocesses.py

c6682d7

agronholm and others added 2 commits January 24, 2024 01:12

Merge branch 'master' into fix-669

745f47e

Mark Process.std{in,out,err} ClosedResourceError test as xfail on asyncio only

d029262

gschaffner force-pushed the fix-669 branch from 241c927 to 8e18799 January 24, 2024 07:51

agronholm approved these changes Jan 25, 2024


agronholm merged commit 1e60219 into agronholm:master Jan 25, 2024
16 checks passed

gschaffner deleted the fix-669 branch February 1, 2024 08:16
