
I think Socket.close should be a regular method, not an async method #70

Closed
njsmith opened this issue Jul 5, 2016 · 13 comments

@njsmith
Contributor

njsmith commented Jul 5, 2016

So obviously, you cannot call async methods from __del__ methods. And because of this, close methods generally have to be non-async as well. And in particular, Python enforces that the generator protocol's close method is synchronous and cannot yield:

In [5]: def f():
   ...:     try:
   ...:         yield 1
   ...:     finally:
   ...:         yield 2
   ...:         

In [6]: gen = f()

In [7]: next(gen)
Out[7]: 1

In [8]: gen.close()
RuntimeError: generator ignored GeneratorExit

This doesn't come up much in the context of coroutines, because coroutines are usually iterated to exhaustion. But as soon as you have asynchronous generators then it's a real problem.

For example, imagine that we have an asynchronous WSGI-like API that wants to use an async for loop to collect up the body of an HTTP response. And suppose that we have some code that builds that body by reading from a remote socket:

from async_generator import async_generator, yield_

@async_generator
async def get_html_body_from_socket():
    sock = await do_connect(...)
    try:
        while True:
            got = await sock.recv(1024)
            if not got:
                return
            await yield_(got)
    finally:
        # Problem: this await can run while a GeneratorExit is pending,
        # at which point yielding/awaiting is forbidden.
        await sock.close()

And suppose that this iterator is not exhausted -- for example, because the HTTP client went away, so the WSGI-like server code stopped iterating over it. Then eventually either the server code or the Python runtime will call .close() on the get_html_body_from_socket coroutine; this will inject a GeneratorExit exception, which we are required to handle without yielding. So the above code is... kinda wrong. You'll get away with it in practice because Curio's Socket.close() doesn't actually yield, but the easy general rule is that coroutine finally blocks should never contain await, and this code violates that. And this constraint means that Curio's Socket.close() must never change in the future to yield -- so we might as well make it non-async to remind us of this and enforce it.
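
To see the failure mode directly, here's a small runnable demonstration (the bare yield in _trap stands in for a real scheduler trap):

import types

@types.coroutine
def _trap():
    yield   # stand-in for a real scheduler trap

async def leaky():
    try:
        await _trap()
    finally:
        await _trap()   # an await inside finally

coro = leaky()
coro.send(None)   # run to the first suspension point
coro.close()      # injects GeneratorExit; the finally block then yields
# -> RuntimeError: coroutine ignored GeneratorExit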

For the same reason, I guess Socket should implement the synchronous __enter__/__exit__ protocol instead of __aenter__/__aexit__.

Possibly there are other places in curio with the same issue.

@dabeaz
Owner

dabeaz commented Jul 5, 2016

Ah. I see that you have fallen into the async close() trap. I was wondering how long it was going to take for someone to explode their head on this... ;-).

Let me preface this by saying that I do NOT know the correct answer to the problem, but here are some of the reasons why I defined it as an async method.

  1. Consistency. All of the other I/O methods are async, so for uniformity in the API, close was also defined as async. Likewise, only an async context manager is provided.
  2. I/O buffering. Curio allows stream-like objects to be wrapped with async. Under the covers, streams might be using some kind of I/O buffering. When you close a stream, the contents of the underlying buffer might not yet be written, so there is an implicit flush() operation that takes place. In performing the flush(), it's possible you'll take an exception on non-blocking I/O. To handle that, you'd have to coordinate back with the scheduler, which means close() has to be async (see the sketch after this list). This is related to http://bugs.python.org/issue25476
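
To make the buffering point concrete, here's a minimal sketch of a hypothetical BufferedAsyncStream (not Curio's actual code), assuming a curio-style async socket whose send() returns the number of bytes written:

class BufferedAsyncStream:
    def __init__(self, sock):
        self._sock = sock          # assumed: curio-style async socket
        self._buffer = bytearray()

    async def write(self, data):
        self._buffer += data       # no I/O yet; just buffer

    async def close(self):
        # Implicit flush: each send() may suspend the task until the
        # socket is writable, so close() itself must be awaitable.
        while self._buffer:
            nsent = await self._sock.send(bytes(self._buffer))
            del self._buffer[:nsent]
        await self._sock.close()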

Concerning context managers, I didn't provide sockets with a normal context manager (__enter__/__exit__) primarily because of the same issue, but also to prevent mistakes such as using a synchronous context manager where an asynchronous one should have been used.

Again, I don't know the "answer" on this, but it's a weird corner of the whole API. Perhaps it's part of a bigger discussion about where async is and isn't allowed.

@dabeaz
Owner

dabeaz commented Jul 5, 2016

Would add: if there is going to be a concept of an "asynchronous generator", the issue of how to handle GeneratorExit on close() is going to be a bigger issue than Curio. For example, are you simply not allowed to perform any async operation at all in a finally block? Yikes. I could see that being a big can of worms.

@njsmith
Contributor Author

njsmith commented Jul 6, 2016

> a big can of worms.

Yeah, it's definitely a big can of worms! But it's unavoidable AFAICT. Some facts that are unlikely to change:

  • async functions, and objects built on async functions like async generators, can be garbage collected.
  • garbage collection happens in an environment where no coroutine runner is available.
  • at garbage collection time, we have to perform cleanup.

Conclusion: every async function MUST have the capability to synchronously unwind and clean up after itself.

I guess one could try to change those bullet points somehow, but it seems difficult. And AFAICT, once you have those, then the conclusion follows inevitably.

When I first read through curio I saw your comment with that rationale for async def close, and I too sort of shrugged and said "huh, yeah, that's a tricky one, I'm not sure what the right way is either". But now that I've bumped into this cleanup issue, I think that tips the balance :-). For something like socket close that can and always will be synchronous, I think the right solution is just to make close synchronous and to only support __enter__/__exit__ (not __aenter__/__aexit__), in order to avoid confusion (like you say).
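
Concretely, that might look like this (a sketch, not Curio's actual code; _socket stands in for the wrapped stdlib socket):

class Socket:
    def close(self):
        # Purely synchronous: safe to call from finally blocks and __del__.
        self._socket.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()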

The broader/trickier question is indeed what to do with types like your async file with buffering, or async generators, where there are legitimate reasons why you'd like to be able to do async I/O during cleanup. In these cases you must have some strategy available for doing synchronous cleanup (as per the argument above), but it would kinda suck if we made that the only option available... yet we don't want to trick users into calling the wrong one.

One idea: for these cases, create the convention that you get methods:

  • async def aclose(self): does a clean shutdown
  • def synchronous_close(self): emits a SynchronousCleanupWarning and does a violent shutdown (e.g. discarding buffers and closing sockets, like how file.close apparently works according to your bugs.python.org/issue25476 link)
  • __del__: calls synchronous_close
  • __enter__ / __exit__ / close: undefined, to avoid confusion
  • __aenter__ / __aexit__: defined, and __aexit__ looks like:
async def __aexit__(self, exc_type, exc, tb):
    if exc_type is GeneratorExit:
        # Must be handled synchronously
        self.synchronous_close()
    else:
        await self.aclose()
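
Filled out as a class, the whole convention might look something like this (a sketch; SynchronousCleanupWarning and the cleanup details are hypothetical):

import warnings

class SynchronousCleanupWarning(Warning):
    pass

class AsyncResource:
    async def aclose(self):
        ...   # clean shutdown: flush buffers, then release the resource

    def synchronous_close(self):
        # Violent shutdown: discard buffers, close the underlying fd.
        warnings.warn("async resource cleaned up synchronously",
                      SynchronousCleanupWarning)
        ...

    def __del__(self):
        self.synchronous_close()

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        if exc_type is GeneratorExit:
            # GeneratorExit must be handled without awaiting.
            self.synchronous_close()
        else:
            await self.aclose()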

So the net effect of all of the above would be that user code can just do async with ...:, and if the user code runs to completion it will DTRT. But if users don't use async with ...:, or they do but their stack gets garbage collected before completing (i.e. their caller doesn't clean up properly), then we still do get a clean up, just a sub-optimal one with a warning.

And I guess aclose on async generators would have to inject a new built-in exception type, like

async def aclose(self):
    if not self.gi_running:
        # if we haven't started, nothing to do
        return
    try:
        await self.athrow(AsyncGeneratorExit)
    except AsyncGeneratorExit:
        pass
    else:
        raise RuntimeError('async generator ignored AsyncGeneratorExit')

It mighhht even make sense to do something like this for async functions, not just async generators... not sure.

CC: @1st1

@njsmith
Contributor Author

njsmith commented Jul 6, 2016

...I should probably forward this to async-sig or something, huh

@dabeaz
Owner

dabeaz commented Jul 6, 2016

For the sake of argument, I'm going to take an opposing view ;-).

Coroutines must always execute under the control of some kind of supervisor or manager. In the case of Curio, that's the kernel; in a different environment such as asyncio, it might be an event loop. I think one could reasonably claim that simply letting an unfinished coroutine go out of scope and be garbage collected without some kind of controlled shutdown is an error. Uncontrolled coroutine shutdown makes it impossible to properly use any async context manager, makes it impossible to use await in finally blocks, and has other potentially bad side effects. For example, independently of Curio, if you had a coroutine like this:

async def spam():
    async with s:
        ...

It's understood that the __aexit__() method of that context manager might involve coroutines and potentially blocking/scheduling by the manager. I don't see how you'd ever make this code reliable if it simply died (or someone called the coroutine's .close() method) and it never invoked the __aexit__() method. For example, what happens if that code is using various synchronization primitives like locks and semaphores? And never allowing anyone to use a coroutine in an __aexit__() method sort of defeats the whole purpose of having an async context manager. No, as part of managing a coroutine, the kernel/event loop is not only responsible for scheduling; I think it must manage the coroutine's entire life-cycle.

In Curio, I think the solution to this problem is to properly cancel a task using the Task.cancel() method. That is, if a coroutine must terminate early for some reason, you should explicitly cancel it instead of relying on the whims of garbage collection. This forces the task to abandon whatever operation it's waiting for, but it also causes Curio to properly execute all of the appropriate __aexit__() methods on its way out.
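
A sketch of that style of explicit cancellation (spawn/cancel per Curio's API; open_connection here is hypothetical):

import curio

async def worker():
    async with open_connection(...) as sock:   # hypothetical resource
        while True:
            await sock.recv(1024)

async def main():
    task = await curio.spawn(worker())
    await curio.sleep(5)
    # Orderly early termination: cancel() raises inside the task and
    # waits for it to unwind, so its __aexit__() methods run under the
    # kernel rather than at the whim of the garbage collector.
    await task.cancel()

curio.run(main())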

This is the purpose of using the Curio Kernel.run(shutdown=True) method as well. If you do this, all remaining tasks are cancelled, but through an orderly shutdown.

There are tricky cases that probably require some thought. For example, what happens if a coroutine raises a SystemExit? I'd claim that such an action should probably cause all coroutines to cancel and terminate in an orderly way.

@njsmith
Contributor Author

njsmith commented Jul 6, 2016

Hmm, that's an interesting and plausible argument. I'll buy that, at least tentatively.

I think there are still two questions though.

First, we still need to know what happens if someone breaks your new rule and lets an un-exhausted coroutine be gc'ed. I guess that the coroutine __del__ method should issue a warning if it's called on any un-exhausted coroutine, similarly to how it currently issues a warning if it's called on a never-awaited coroutine? (Rationale: this is a rather serious bug in a coroutine runner that could otherwise lurk unnoticed for quite a while.) And then I guess it should attempt a synchronous GeneratorExit, just like it does now? That's more likely to preserve invariants than doing nothing, and I think at least some library authors will insist on having the option to make their code more mistake-proof by handling this "can't happen" case.
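
(For comparison, here's what the existing never-awaited warning looks like:)

import asyncio

async def unfinished():
    await asyncio.sleep(1)

coro = unfinished()
del coro
# -> RuntimeWarning: coroutine 'unfinished' was never awaited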

Second, not all references to sockets (or whatever) are coroutine locals, so even if we've made sure that coroutine locals always get cleaned up under the coroutine runner's supervision, we still have to figure out what to do in these other cases.

If someone writes code like

async def f():
    stream = await open_async_stream()
    await stream.write(...)
    # stream falls out of scope without being closed

then, well, they should have wrapped all that in an async with, but nonetheless something has to happen. What? I guess the stream object has to print a warning and give up on flushing buffers, but presumably it should still close the underlying socket that it owns? Or I guess you could just assume the garbage collector will take care of that too eventually...

Similar example (using the async yield notation that @1st1 has been talking about):

async def data_chunks():
    async with open_connection(...) as sock:
        while True:
            chunk = await sock.recv(...)
            if not chunk:
                break
            async yield chunk

async def f():
    async for chunk in data_chunks():
        if chunk == b"xxx":
            return # exit early, without exhausting the async iterator

Again, obviously the right solution is to throw in an async with ..., but you do need some fallback.

I think this still ends up leading back to my proposal above: a standard cleanup interface that's async (for async generators, I think that means the close method has to be called aclose, with a regular close being an error), plus a fallback synchronous interface. And, what I didn't say above but probably should have: the synchronous interface should issue a warning when it's hit.

And the synchronous interface should be public, because (a) this is python, (b) it's easier to write a correct last-ditch cleanup method if you can delegate some of the work.

@dabeaz
Owner

dabeaz commented Jul 6, 2016

On the first question, I don't think an unexpected GC would happen in Curio because the kernel maintains a task table of all actively maintained coroutines. Nothing gets removed from the task table until it has fully run to termination. So, in that context, there is always at least one reference to a coroutine sitting around until Curio thinks it's done. I'd agree that in the general case of an asynchronous generator, the possibility of GC is a real concern though.

There's one other subtlety to this as well. Assuming that a coroutine were allowed to perform async operations on shutdown (i.e., during GeneratorExit), how do you guarantee that it actually terminates? Any async operation could block, possibly indefinitely, and the whole shutdown process would stall for some indeterminate period of time (imagine the whole thing getting blocked up in the middle of an __aexit__() method). Curio has this issue in the context of task cancellation. Right now, a cancellation request in Curio raises an exception, but the task is allowed to continue in order to clean up, and the task requesting cancellation blocks until it does. So, if for some reason the cancellation request were ignored, you'd end up with some kind of deadlock. I don't know if that's a good strategy, but it does have the distinct feature that things "will go wrong" if a task ignores a cancellation request.

On the second question regarding cleanup, I originally had some code to explicitly close sockets in the __del__ method, but later decided against it, preferring to just let the normal Python GC deal with it. In Curio, everything is basically a wrapper layer. So, instead of performing any kind of action, I just let the wrapper layer get destroyed (which in turn would GC the underlying socket if there's no other reference to it sitting around). I can't remember the exact context now, but I seem to remember having some problems with putting close() operations in __del__. It might have been a situation where I wanted to take a socket that had already been wrapped by Curio, but then wrap it with a different kind of I/O layer. In doing that, I didn't want the original layer (being discarded) to close the socket by accident.

I'm not sure how I feel about a special aclose() vs close() method at the moment. I suppose that if an explicit close were needed immediately, one could do os.close(sock.fileno()). Or maybe a special synchronous sock.closenow() method could be added. I'd need to think more about that.

@dabeaz
Owner

dabeaz commented Jul 6, 2016

I'm just noting this now in relation to discussion on the async-sig. Curio uses async context managers for much more than just closing sockets. For example, they're used with synchronization primitives and I was planning to use them to do some advanced kinds of task scheduling involving those primitives. This won't be possible if Python devs decide that coroutines can't be used in the __aexit__() method.

@imrn

imrn commented Nov 19, 2016

On its own, a sync close() 'seems' harmless. I've used it in asyncio without any problems. And you've already discussed the mechanics in detail.

However, going down the road of layered generics, combined with other async ops, my 'gut' tells me breaking consistency is not a good thing.

Given that it's not a very common operation, cutting another async call doesn't save much. So I guess keeping it as 'async' is fine.

@imrn

imrn commented Nov 19, 2016

But there are cases where you'll want to close without any delay -- you may not want to wait for an async close() to be scheduled. Hence you'll need another, sync close(). :(

Personally, I don't like the idea of two different close()s. But we are faced with the consequences of async programming...

@imrn

imrn commented Nov 20, 2016

Given that 'nice to have's quickly turn into problematic things most of the time, I'm now thinking in terms of 'absolute necessity':

What are the *really* hard constraints for having an async close()? Bulleted one-liners, please.

@dabeaz
Owner

dabeaz commented Nov 20, 2016

I'm keeping close() as an async method. There are sometimes cleanup actions that need to be performed (e.g., flushing I/O buffers, unregistration, etc.). I'm also going to be doing something in the close() method to address part of issue #104.

@dabeaz dabeaz closed this as completed Dec 10, 2016