Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connection.cancel can hang #491

Closed
thehesiod opened this issue Jul 31, 2018 · 30 comments · Fixed by #570
Closed

connection.cancel can hang #491

thehesiod opened this issue Jul 31, 2018 · 30 comments · Fixed by #570

Comments

@thehesiod
Copy link
Contributor

we noticed this callstack in our prod server which hung our main thread breaking the server:

Jul 30 16:48:45.114            File "/usr/local/lib/python3.6/asyncio/base_events.py", line 422 in run_forever
Jul 30 16:48:45.114            File "/usr/local/lib/python3.6/asyncio/base_events.py", line 1434 in _run_once
Jul 30 16:48:45.114            File "/usr/local/lib/python3.6/asyncio/events.py", line 145 in _run
Jul 30 16:48:45.114            File "/usr/local/lib/python3.6/site-packages/aiopg/connection.py", line 226 in cancel

suggest loop.run_in_executor'ing it as supposedly this method is thread safe. Ideally there would be a timeout that affects this as well. I'm not sure if the DSN connect_timeout applies

@brianmaissy
Copy link
Contributor

brianmaissy commented May 30, 2019

Same here, after the connection was lost to the database, my process was hung, so I connected with gdb and saw that connection.cancel() did not return.

Running aiopg 0.16.0 on python 3.6.7 on Ubuntu 18.04

Traceback (most recent call first):
  <built-in method cancel of psycopg2.extensions.connection object at remote 0x7fe5b517b180>
  File "/opt/medigator/env/lib/python3.6/site-packages/aiopg/connection.py", line 232, in cancel
    self._conn.cancel()
  File "/usr/lib/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/usr/lib/python3.6/asyncio/base_events.py", line 1440, in _run_once
    handle._run()
  File "/usr/lib/python3.6/asyncio/base_events.py", line 427, in run_forever
    self._run_once()
  File "/usr/lib/python3.6/asyncio/base_events.py", line 460, in run_until_complete
    self.run_forever()

EDIT: see this: psycopg/psycopg2#561. The underlying problem is a hung tcp connection, which psycopg doesn't have a good way of detection without setting a timeout on the socket. However, the fact that psycopg2.connection.cancel() hangs doesn't mean that aiopg should let it block the event loop. I agree with @thehesiod's suggestion that maybe aiopg could call it from a thread?

@asvetlov
Copy link
Member

Please clarify: is it a hang in psycopg2.connection.cancel() method or aiopg code?

@brianmaissy
Copy link
Contributor

In psycopg. But aiopg calls it from code running in the event loop.

@asvetlov
Copy link
Member

If .cancel() is blocking call and even hanging -- it should be removed from aiopg API.
More precisely, aiopg should raise psycopg2.ProgrammingError with a message that the cancel is not supported in async mode.

@brianmaissy
Copy link
Contributor

I was mistaken earlier, the hanging call was not in aiopg.connection.cancel(), rather in the internal cancel() function in connection._poll(), on line 232 in connection.py as you see in the stack trace above.

But now that I take a deeper look at the implementation of aiopg, I see that I was also mistaken about how the async is implemented. I had assumed that psycopg2 provides only a synchronous interface, and aiopg ran it in a thread pool behind the scenes. But now that I see that psycopg2 itself is the one implementing the async, It looks like the problem is that psycopg2 doesn't have an asynchronous implementation of cancel().

@brianmaissy
Copy link
Contributor

I just found this: psycopg/psycopg2#113 (comment)

The current best solution is just to close the connection on callback error. PQcancel is blocking as well

@brianmaissy
Copy link
Contributor

And additionally, you're right. It's not the case I ran into, but in any case maybe cancel() should be removed from the API

@asvetlov
Copy link
Member

Would you make a PR for raising psycopg2.ProgrammingError unconditionally if Connection.cancel() is called?
@vir-mir could you track the discussion and review/accept the proposed change?
Honestly, I prefer to don't spend my personal time on this project. I have a lot of other FOSS projects where my involvement is crucial. I hope you understand my position.

@brianmaissy
Copy link
Contributor

brianmaissy commented Jun 1, 2019 via email

@asvetlov
Copy link
Member

asvetlov commented Jun 1, 2019

  1. Are you talking about cancel() or close()?

  2. Seems a shame not
    to provide close() if there's a way to implement it

You can try to make a PR but I don't see how to mix sync and async mode for the same connection. It makes an unreliable mess at least.

  1. this would be a breaking API change

No. Explicit raising an exception for a broken code is not the API breakage but a fix. It deserves a new library version to emphasize the change.

Say again, if the feature is not provided by psycopg2 we should not to emulate it.

@brianmaissy
Copy link
Contributor

brianmaissy commented Jun 1, 2019 via email

@brianmaissy
Copy link
Contributor

Related: #375

@brianmaissy
Copy link
Contributor

Now that I think about it a bit more, starting a thread might not be up to standard, because someone could be using async to solve the C10K problem, and might make large numbers of calls to cancel() concurrently. In that case, starting a large number of threads might not be scalable.

@asvetlov
Copy link
Member

asvetlov commented Jun 2, 2019

This is not about C10K problem but worse.
TCP socket has two mutually exclusive modes: blocking and non-blocking.
Very simple, like Schrodinger's Cat life status :)

psycopg2 in async-mode switches connection sockets to non-blocking, this feature is used by aiopg.
If psycopg2.connection.cancel() blocks it means that the socket is non-blocking for everything except .cancel(). This is not supposed to work by design.

That's why I wrote: if psycopg2 doesn't support async cancel that aiopg should drop this (already not working and buggy) feature.

@brianmaissy
Copy link
Contributor

brianmaissy commented Jun 2, 2019 via email

@asvetlov
Copy link
Member

asvetlov commented Jun 2, 2019

Aaah, the answer is very simple: close the socket and drop the connection.
SQL server should (and does) handle these premature client disconnections as a cancellation signal.

@brianmaissy
Copy link
Contributor

You mean close the socket brutally with os.close()? I tried it, and the query continues running on the server. psycopg2.connection.close() doesn't even cancel the query unless you do connection.cancel() first.

@asvetlov
Copy link
Member

asvetlov commented Jun 3, 2019

Sorry for that.
Nothing to do on aiopg side.
You can go and fix .cancel() for psycopg2 though.

@brianmaissy
Copy link
Contributor

psycopg2 doesn't have any better way of doing it than aiopg, since libpq doesn't expose an asynchronous way to do PQcancel. Maybe the best way to do it is just to call cancel() from a thread.

@asvetlov
Copy link
Member

asvetlov commented Jun 3, 2019

No.
Please re-read my message above: the same socket cannot be used in sync and async mode at the same time. If there is not async API for PQcancel we just should remove the method from our API.

@brianmaissy
Copy link
Contributor

It doesn't use the same socket. PQcancel opens a new TCP connection to the server

@brianmaissy
Copy link
Contributor

And again, removing it from the API doesn't solve the problem. We still need to cancel the query if there was an exception during a query.

@thehesiod
Copy link
Contributor Author

probably should call cancel from a executor

@brianmaissy
Copy link
Contributor

probably should call cancel from a executor

I'm in favor. Happy to modify my PR accordingly

@brianmaissy
Copy link
Contributor

@asvetlov @vir-mir
Do you either want to review my PR or approve the plan to implement the cancel from a ThreadPoolExecutor and I'll re-implement it that way?

@asvetlov
Copy link
Member

Will do. aiopg is not the only FOSS project what I'm working on.

@brianmaissy
Copy link
Contributor

bump?

@brianmaissy
Copy link
Contributor

bump again?

@brianmaissy
Copy link
Contributor

brianmaissy commented Jun 20, 2020

@asvetlov @vir-mir do you intend on returning to this at any point? It's been over a year since I opened the PR...

@calebbaker194
Copy link

Still creates an issue for some of us. I guess there isn't much to do?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants