Queries start failing after some time #67

nerandell · 2015-07-31T03:47:06Z

I am facing an issue where my DB queries start failing after some time. Here is the code that I use to create a pool:

from asyncio import coroutine

from aiopg import Pool, create_pool


class PostgresStore:
    _pool = None
    _connection_params = {}

    @classmethod
    def connect(cls, database:str, user:str, password:str, host:str, port:int):
        """
        Sets connection parameters
        """
        cls._connection_params['database'] = database
        cls._connection_params['user'] = user
        cls._connection_params['password'] = password
        cls._connection_params['host'] = host
        cls._connection_params['port'] = port

    @classmethod
    def use_pool(cls, pool:Pool):
        """
        Sets an existing connection pool instead of using connect() to make one
        """
        cls._pool = pool

    @classmethod
    @coroutine
    def get_pool(cls) -> Pool:
        """
        Yields:
            existing db connection pool
        """
        if len(cls._connection_params) < 5:
            raise ConnectionError('Please call PostgresStore.connect before calling this method')
        if not cls._pool:
            cls._pool = yield from create_pool(**cls._connection_params)
        return cls._pool

I use aiohttp to create a web server, and once the server has been up and running for a few hours, DB queries start failing. All other APIs work perfectly fine. Here are the logs:

2015-07-14 16:19:18,531 ERROR [base_events:698] Fatal read error on socket transport
protocol: 
transport: 
Traceback (most recent call last):
  File "/usr/lib/python3.4/asyncio/selector_events.py", line 459, in _read_ready
    data = self._sock.recv(self.max_size)
TimeoutError: [Errno 110] Connection timed out
2015-07-14 17:26:42,070 ERROR [base_events:698] Fatal error on aiopg connection: bad state in _ready callback
connection: 
2015-07-14 17:26:58,017 ERROR [base_events:698] Fatal error on aiopg connection: bad state in _ready callback
connection: 
2015-07-14 17:27:02,606 ERROR [base_events:698] Fatal error on aiopg connection: bad state in _ready callback
connection: 
2015-07-14 17:27:03,226 ERROR [base_events:698] Fatal error on aiopg connection: bad state in _ready callback
connection: 
2015-07-14 17:27:14,691 ERROR [base_events:698] Fatal error on aiopg connection: bad state in _ready callback
connection: 
2015-07-14 18:47:51,427 ERROR [base_events:698] Fatal read error on socket transport
protocol: 
transport: 
Traceback (most recent call last):
  File "/usr/lib/python3.4/asyncio/selector_events.py", line 459, in _read_ready
    data = self._sock.recv(self.max_size)
TimeoutError: [Errno 110] Connection timed out
2015-07-14 18:50:02,499 ERROR [base_events:698] Fatal read error on socket transport
protocol: 
transport: 
Traceback (most recent call last):
  File "/usr/lib/python3.4/asyncio/selector_events.py", line 459, in _read_ready
    data = self._sock.recv(self.max_size)
TimeoutError: [Errno 110] Connection timed out

asvetlov · 2015-08-13T00:35:04Z

Are you sure that the error is generated by aiopg?
It looks like the error comes from a transport, but aiopg doesn't use asyncio transports at all.

P.S. Please don't use global variables; put the aiopg pool into aiohttp.web.Application:

app = aiohttp.web.Application(...)
app['pgpool'] = yield from aiopg.create_pool(...)

Then, in the request handler:

def handler(request):
    pool = request.app['pgpool']
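
The pattern above can be sketched in a self-contained form. This is only an illustration under stated assumptions: a plain dict stands in for aiohttp.web.Application (which is dict-like), and FakePool, create_fake_pool and FakeRequest are hypothetical stand-ins for the aiopg pool, aiopg.create_pool() and the aiohttp request, so the example runs without either library. It uses async/await rather than the yield from style shown in this thread.

```python
import asyncio


class FakePool:
    """Hypothetical stand-in for the pool returned by aiopg.create_pool()."""

    def __init__(self, dsn):
        self.dsn = dsn
        self.closed = False

    def close(self):
        self.closed = True


async def create_fake_pool(dsn):
    # Assumption: a real application would call aiopg.create_pool(dsn) here.
    await asyncio.sleep(0)
    return FakePool(dsn)


class FakeRequest:
    """Hypothetical stand-in for an aiohttp request; it only exposes .app."""

    def __init__(self, app):
        self.app = app


async def handler(request):
    # The handler reads the pool from the application, not from a global.
    pool = request.app['pgpool']
    return pool.dsn


async def main():
    app = {}  # stand-in for aiohttp.web.Application
    app['pgpool'] = await create_fake_pool('dbname=test')
    try:
        return await handler(FakeRequest(app))
    finally:
        # The pool's lifetime is tied to the application object.
        app['pgpool'].close()


result = asyncio.run(main())
```

Keeping the pool on the application object means each application instance (for example, one per test) gets its own pool, and the pool can be closed when that application shuts down.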

nerandell · 2015-08-13T04:22:27Z

The http requests that do not depend on database work perfectly fine.
I will try your suggestion. However, why would declaring a global variable be an issue in the first place?

asvetlov · 2015-08-13T12:58:28Z

No, global variables are not related to your issue.
But as far as I can see, your exception traceback is not related to aiopg.

bitdancer · 2015-09-24T14:59:24Z

I have a similar situation. The process works great for extended periods of time, but then suddenly the database connection stops working. In my log I see only the following:

2015-09-24 02:04:31,841 asyncio ERROR Fatal error on aiopg connection: bad state in _ready callback

I'm going to try adding more error recovery and logging code, so if I learn anything more I'll post an update.

bitdancer · 2015-09-24T15:10:19Z

By the way, I get 10 copies of that message in my log one right after the other. I presume it is one for each connection in the pool.

jettify · 2015-09-27T17:33:45Z

Is there any way to reproduce this? We had a similar issue with aiomysql caused by a MySQL server connection timeout, which was successfully fixed in aio-libs/aiomysql#27

bitdancer · 2015-09-28T13:37:57Z

I don't have a reproducer yet, but the data I have so far indicates it is probably a result of a timeout on the DB connection. I'm planning to try to create a reproducer once I have more data.

bitdancer · 2015-10-01T13:34:07Z

My case appears to be a consequence of #65. The reproducer in that issue produces the same kind of traceback that I'm seeing in my application, which I now catch and ignore. Then 30 seconds later I get the "bad state in _ready callback" error.

I'll investigate fixing this, but it will probably be a week or more before I can circle back to this problem.

pankajnits · 2015-12-16T17:08:04Z

@asvetlov I was able to reproduce this scenario; it was basically caused by a TimeoutError.
I just ran my test cases against your recent commits and the issue no longer occurs.

But I have a suggestion: at https://github.com/aio-libs/aiopg/blob/master/aiopg/connection.py#L182, could you execute it under asyncio.shield() and change the cancel() method accordingly? At least in this case it would ensure that the future gets executed properly even if a timeout or CancelledError occurs.

asvetlov · 2015-12-16T17:13:18Z

@pankajnits thanks for the good news!

I don't follow your suggestion, sorry.
Cancellation is already executed under the shield.
Waiting on a future object doesn't require shielding, does it?

pankajnits · 2015-12-16T17:24:47Z

@asvetlov It's good that cancel is ensured, but out of curiosity I wanted to ask: what if we could ensure the future is shielded irrespective of timeouts?
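
To make the question concrete, here is a minimal sketch of what shielding changes when a wait times out. It uses only asyncio and says nothing about how aiopg is implemented internally; slow_query is a hypothetical stand-in for waiting on a database reply, and the delays are arbitrary.

```python
import asyncio


async def slow_query(log):
    # Hypothetical stand-in for waiting on a database reply.
    try:
        await asyncio.sleep(0.2)
        log.append('finished')
        return 'row'
    except asyncio.CancelledError:
        log.append('cancelled')
        raise


async def run(shielded):
    log = []
    task = asyncio.ensure_future(slow_query(log))
    # With shield(), a timeout cancels only the outer wrapper future.
    awaited = asyncio.shield(task) if shielded else task
    try:
        await asyncio.wait_for(awaited, timeout=0.05)
    except asyncio.TimeoutError:
        log.append('timeout')
    # Give the inner task time to finish if it is still alive.
    await asyncio.sleep(0.3)
    return log


unshielded_log = asyncio.run(run(shielded=False))
shielded_log = asyncio.run(run(shielded=True))
print(unshielded_log)
print(shielded_log)
```

Without the shield, the timeout cancels the inner task, so the log reads ['cancelled', 'timeout']. With the shield, the caller still sees the timeout, but the inner task runs to completion, giving ['timeout', 'finished']. Shielding therefore keeps the underlying work alive; it does not hide the timeout from the caller.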

pankajnits · 2015-12-16T17:30:28Z

Also, would it help in solving #65?

pankajnits · 2015-12-17T04:39:46Z

When the waiter times out and is scheduled for cancellation, I can see queries in the Postgres DB that are in the "idle" state.

DevSpouk · 2015-12-28T16:17:46Z

Here is a simulation of the error under load. The error manifests itself only when the load increases.
https://gist.github.com/DevSpouk/92a51f2bffb5f6f67352

ludovic-gasc · 2015-12-28T17:48:58Z

I have the same issue under heavy load when I launch a benchmark on any project based on aiopg.

ludovic-gasc · 2016-01-05T23:29:21Z

If somebody has a solution for that, I'm interested: I'm working on round 12 of FrameworkBenchmarks.
If I have a solution before Jan 9, I'll push it for round 12.
Otherwise it will have to wait for the next round, unless they extend the deadline.

steveholden · 2016-05-27T11:15:22Z

Is this issue still present? We've been looking for an asynchronous wrapper for Postgres access, but this would probably be a show-stopper.

jettify · 2016-06-01T01:06:03Z

Looks like this issue was fixed. @asvetlov could you confirm?

asvetlov · 2016-07-16T15:25:10Z

I believe we have solved the issue.

rudyryk mentioned this issue Sep 27, 2015

Connection handling in web application example #76

Closed

asvetlov closed this as completed Jul 16, 2016

aio-libs-bot mentioned this issue Feb 5, 2019

Timed out statements raises poor exception content #537

Closed
