New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not waiting for connection_lost() when transport closed #1925

Open
omnicognate opened this Issue May 26, 2017 · 19 comments

Comments

Projects
None yet
4 participants
@omnicognate

omnicognate commented May 26, 2017

Long story short

aiohttp calls close() on the transport and then immediately discards it without waiting for connection_lost(). This is a problem for SSL transports, which have to shut down asynchronously, and it appears aiohttp only "works" with the standard library's SSL transport because of a bug in that transport.

Expected behaviour

aiohttp waiting for connection_lost() before discarding transport

Actual behaviour

It not doing so

Steps to reproduce

import aiohttp
import asyncio

async def main():
    conn = aiohttp.TCPConnector(verify_ssl=False)
    async with aiohttp.ClientSession(connector=conn) as session:
        async with session.get('https://httpbin.org/') as resp:
            resp.raise_for_status()

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.close()

Run this program and breakpoint connection_made() and connection_lost() in client_proto.py. The former is called, but not the latter.

The program appears to run successfully, but to convince you that something is severely amiss, add a __del__ method to the class _SelectorSocketTransport in asyncio's selector_events.py and breakpoint it and that transport's close() method. You'll see that close() is never called and __del__ gets called during shutdown, after the loop has been closed and the module finalised, because it still has a reader registered and is therefore leaked (while still waiting on I/O) in a reference cycle with the event loop.

If I'm understanding it right, the overall behaviour appears to be due to two bugs, one in asyncio and one in aiohttp:

  1. When close() is called on the SSLProtocolTransport in asyncio's ssl_proto.py it initiates an asynchronous shutdown process, which should ultimately result in it calling connection_lost(). However, this doesn't progress for reasons I haven't fully figured out, so connection_lost() is never called. The shutdown process is clearly broken because it never results in close() being called on the underlying transport, which is leaked as shown above.
  2. aiohttp doesn't wait for connection_lost(), so the hang that one would expect as a result of this never happens. By luck, the broken shutdown sequence never actually causes any errors.

I'm actually more interested in the latter, the aiohttp behaviour, because I've written my own SSL transport (based on pyopenssl), which does a correct asynchronous shutdown sequence. When aiohttp discards my transport without waiting for connection_lost(), it causes errors, which I don't think are my fault.

Your environment

Tested on Windows and Linux with python 3.5.1 and 3.5.2 and aiohttp 2.0.7 and 2.1.0.

@omnicognate

This comment has been minimized.

omnicognate commented May 26, 2017

I think the waiting for the transport shutdown would have to happen within the aexit coroutine in ClientSession, which currently just calls an ordinary function to perform the close. I don't see any other stage in program execution where the wait could be done.

@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jun 18, 2017

What is the reason to wait until close procedure completes in client code?

BaseConnection accepts enable_cleanup_closed parameter, it adds special wait queue
for ssl connections https://github.com/aio-libs/aiohttp/blob/master/aiohttp/connector.py#L151
but that is work around buggy server which never complete close procedure.

@asvetlov

This comment has been minimized.

Member

asvetlov commented Jun 26, 2017

Well, graceful closing all connections before exit from async with ClientSession() might be good idea.
But I see problems with implementation until ClientSession.close() is actually a function returning a future object.
This behavior should be supported until we drop Python 3.4 along with with ClientSession() etc.
So the issue implementation should be postponed.

@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jun 26, 2017

connection_lost has nothing to do with client session. Also why should we care?

@asvetlov adding graicful close support to session is bad idea.

@asvetlov

This comment has been minimized.

Member

asvetlov commented Jun 26, 2017

Maybe graceful close is bad wording but I want to finish client.close() only after getting connection_lost notifications from all opened connections.

Otherwise people should put await sleep(0) but even this doesn't help for SSL connections -- they need more than one sleep for shut down.

@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jun 26, 2017

So what is so important in close? Is session is alive we can wait for connection_close, but if developer dropped session why aiohttp should wait for connection_lost? That is expected behavior otherwise will get same as with old release method implemention. What explicit behavior then add something like release_all_connections. Another question how often you drop session in normal implication? Once per process run? Sure it makes live of transport developers harder, but who cares?

@omnicognate

This comment has been minimized.

omnicognate commented Jun 27, 2017

Another problem with implementing this at the moment is that the default SSL transport never calls connection_lost() so you'll get a hang. I might get around to submitting an issue to python for that at some point, but I've had to move on from asyncio entirely as a result of this and numerous other bugs, and the fact that there appears to be little clarity on how to correctly implement and use the interfaces.

As to why proper shutdown behaviour is required, it's because without it it is not safe for the client to close the event loop after the session is closed, and the client has no reliable way to wait for a time at which it is safe to discard the loop.

SSL has a shutdown sequence that requires multiple steps, which in asyncio are naturally implemented asynchronously. Calling close() initiates that sequence, but the indication that it has finished is connection_lost(). If you don't wait for that sequence to complete before the aiohttp session's close has finished then the program will proceed without the SSL shutdown sequence having been completed. Worse, the program will proceed while the shutdown is ongoing, with tasks pending on the event loop. There is then no way for the client to wait for those pending tasks to be complete before destroying the event loop, and destroying the event loop without them completing will cause "Task was destroyed but it is pending" warnings and a host of other problems. As @asvetlov comments, even doing an await sleep(0) isn't sufficient in the client as there can be an arbitrary number of such awaits required before the queue is empty (2 in the particular case of SSL) and there is no way to explicitly check for the queue being empty.

The only reason that these problems are not currently occurring is that asyncio's default SSL transport isn't proceeding with its multi-step SSL shutdown sequence, so there are no tasks pending on the queue. This is a bug in that transport, and means that it never calls connection_lost().

If I'm understanding it correctly, this is a very nasty 2-way bug-bug dependency between aiohttp and asyncio. Fixing either of the two bugs will result in breakage: if you fix asyncio's SSL transport to do a correct SSL shutdown sequence without fixing aiohttp to wait for it to complete then existing clients who discard the event loop immediately after the aiohttp session is closed will be destroying the event loop with tasks pending; if you make aiohttp wait for the transport to do its shutdown without fixing the SSL transport to actually proceed with its shutdown, you will cause existing clients to hang.

It's a bit of a Gordian knot, if I'm understanding it correctly.

@omnicognate

This comment has been minimized.

omnicognate commented Jun 27, 2017

Oops, I didn't mean to close the issue!

@omnicognate omnicognate reopened this Jun 27, 2017

@asvetlov asvetlov added this to the 3.0 milestone Jun 27, 2017

@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jun 27, 2017

I dont think we should do anything until asyncio get fixed or changes. I still don't see why we should care about transports if developer dropped session.

@asvetlov asvetlov changed the title from Not waiting for connection_lost() when transport closed to [on hold] Not waiting for connection_lost() when transport closed Jun 27, 2017

@asvetlov

This comment has been minimized.

Member

asvetlov commented Jun 27, 2017

Ok, let just put the issue on hold until async fixes SSL transport.
But please don't close -- I want to keep tracking.

@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jun 27, 2017

I still don't see reason why aiohttp needs to wait for connection_close call. lets just create issue in cpython repo and make reference.

@asvetlov

This comment has been minimized.

Member

asvetlov commented Jun 27, 2017

At least because without it user will get a warning about non-closed resources in debug mode IIRC.

@asvetlov

This comment has been minimized.

Member

asvetlov commented Jun 27, 2017

Or he need put a sleep after closing session but before program finishing.

@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jun 27, 2017

this will complicates logic, and reason only debug message. doesn't sound like a good win for me.

@omnicognate

This comment has been minimized.

omnicognate commented Jun 28, 2017

See https://docs.python.org/3/library/asyncio-dev.html#pending-task-destroyed

The warnings are there for a reason. Closing the queue with tasks pending means that those tasks will never run. The implications of that depend on exactly what the tasks were supposed to do, but could include leaks, incorrect protocol behaviour, crashes, anything really. I only listed the warnings because they're a specific, predictable result, but the warnings are there to alert you to the less predictable and more serious potential consequences.

If you don't think the warnings are serious enough to warrant any effort to avoid them you could raise an issue suggesting they be removed, but I doubt it would get very far.

@omnicognate

This comment has been minimized.

omnicognate commented Jun 28, 2017

Oh, just one detail: These aren't "debug messages" (with the implication that they are a detail that can be ignored in production). They are warnings.

@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jun 29, 2017

But close is not a task, it just bunch of callbacks, it is transpor's responsibility to execute all of them

@smurfix

This comment has been minimized.

smurfix commented Jan 19, 2018

This boils down to programming contracts. When I use async with foo() as bar: I expect foo()'s _aenter__() code to set up the "bar" object so that I can use it, and when the block goes out of scope I expect its __aexit__(…) to return after everything is closed so that I can do whatever I need to do next – including shutting down the program – and without requiring some arcane asyncio.sleep() call to pass the time for connection_lost() and similar callbacks to run.

If that's not possible due to the way asyncio is designed, fine, document that – and urge people to ultimately find a better way of handling async code. Like, well, https://github.com/python-trio/trio for example.

NB: is there an actual issue which this bug is on hold for?

@asvetlov asvetlov modified the milestones: 3.0, 3.1 Feb 9, 2018

@asvetlov asvetlov modified the milestones: 3.1, 3.2 Mar 22, 2018

@asvetlov asvetlov modified the milestones: 3.2, 3.3 May 7, 2018

@asvetlov asvetlov changed the title from [on hold] Not waiting for connection_lost() when transport closed to Not waiting for connection_lost() when transport closed Oct 2, 2018

@asvetlov

This comment has been minimized.

Member

asvetlov commented Oct 2, 2018

We should support it finally.

@asvetlov asvetlov modified the milestones: 3.3, 3.5 Oct 18, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment