Queries start failing after some time #67

Closed
nerandell opened this issue Jul 31, 2015 · 19 comments

@nerandell

I am facing an issue where my DB queries start failing after some time. Here is the code I use to create a pool:

from asyncio import coroutine

from aiopg import Pool, create_pool


class PostgresStore:
    _pool = None
    _connection_params = {}

    @classmethod
    def connect(cls, database:str, user:str, password:str, host:str, port:int):
        """
        Sets connection parameters
        """
        cls._connection_params['database'] = database
        cls._connection_params['user'] = user
        cls._connection_params['password'] = password
        cls._connection_params['host'] = host
        cls._connection_params['port'] = port

    @classmethod
    def use_pool(cls, pool:Pool):
        """
        Sets an existing connection pool instead of using connect() to make one
        """
        cls._pool = pool

    @classmethod
    @coroutine
    def get_pool(cls) -> Pool:
        """
        Yields:
            existing db connection pool
        """
        if len(cls._connection_params) < 5:
            raise ConnectionError('Please call PostgresStore.connect before calling this method')
        if not cls._pool:
            cls._pool = yield from create_pool(**cls._connection_params)
        return cls._pool

I use aiohttp to create a web server, and once the server has been up and running for a few hours, DB queries start failing. All other APIs work perfectly fine. Here are the logs:

2015-07-14 16:19:18,531 ERROR [base_events:698] Fatal read error on socket transport
protocol: 
transport: 
Traceback (most recent call last):
  File "/usr/lib/python3.4/asyncio/selector_events.py", line 459, in _read_ready
    data = self._sock.recv(self.max_size)
TimeoutError: [Errno 110] Connection timed out
2015-07-14 17:26:42,070 ERROR [base_events:698] Fatal error on aiopg connection: bad state in _ready callback
connection: 
2015-07-14 17:26:58,017 ERROR [base_events:698] Fatal error on aiopg connection: bad state in _ready callback
connection: 
2015-07-14 17:27:02,606 ERROR [base_events:698] Fatal error on aiopg connection: bad state in _ready callback
connection: 
2015-07-14 17:27:03,226 ERROR [base_events:698] Fatal error on aiopg connection: bad state in _ready callback
connection: 
2015-07-14 17:27:14,691 ERROR [base_events:698] Fatal error on aiopg connection: bad state in _ready callback
connection: 
2015-07-14 18:47:51,427 ERROR [base_events:698] Fatal read error on socket transport
protocol: 
transport: 
Traceback (most recent call last):
  File "/usr/lib/python3.4/asyncio/selector_events.py", line 459, in _read_ready
    data = self._sock.recv(self.max_size)
TimeoutError: [Errno 110] Connection timed out
2015-07-14 18:50:02,499 ERROR [base_events:698] Fatal read error on socket transport
protocol: 
transport: 
Traceback (most recent call last):
  File "/usr/lib/python3.4/asyncio/selector_events.py", line 459, in _read_ready
    data = self._sock.recv(self.max_size)
TimeoutError: [Errno 110] Connection timed out
@asvetlov
Member

Are you sure that the error is generated by aiopg?
It looks like the error comes from a transport, but aiopg doesn't use asyncio transports at all.

P.S. Please don't use global variables; store the aiopg pool in the aiohttp.web.Application instead:

app = aiohttp.web.Application(...)
app['pgpool'] = yield from aiopg.create_pool(...)

Then, in the request handler:

def handler(request):
    pool = request.app['pgpool']
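
A minimal, self-contained sketch of the pattern suggested above, written with async/await rather than the yield from style used in this thread. It deliberately avoids importing aiohttp or aiopg so that it runs on its own: a plain dict stands in for aiohttp.web.Application, while FakePool and create_fake_pool are hypothetical placeholders for the pool object and for aiopg.create_pool(). It only illustrates passing the pool through the application object instead of a global.

```python
import asyncio
from types import SimpleNamespace


class FakePool:
    """Hypothetical stand-in for the pool returned by aiopg.create_pool()."""

    def __init__(self, dsn):
        self.dsn = dsn
        self.closed = False

    async def close(self):
        self.closed = True


async def create_fake_pool(dsn):
    # Placeholder for `await aiopg.create_pool(dsn)`; no database is contacted.
    await asyncio.sleep(0)
    return FakePool(dsn)


async def handler(request):
    # The handler reads the pool from the application attached to the request
    # instead of reaching for a class-level or module-level global.
    pool = request.app["pgpool"]
    return pool.dsn


async def main():
    app = {}  # stand-in for aiohttp.web.Application, which is also dict-like
    app["pgpool"] = await create_fake_pool("dbname=test")
    request = SimpleNamespace(app=app)  # stand-in for an incoming request
    try:
        return await handler(request)
    finally:
        # The pool's lifetime is tied to the application, so close it on shutdown.
        await app["pgpool"].close()


result = asyncio.run(main())
```

Keeping the pool on the application object makes its lifetime explicit and lets tests build an application with a different pool.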

@nerandell
Author

The http requests that do not depend on database work perfectly fine.
I will try your suggestion. However, why would declaring a global variable be an issue in the first place?

@asvetlov
Member

No, global variables are not related to your issue.
But as far as I can see, your exception traceback is not related to aiopg.

@bitdancer
Contributor

I have a similar situation. The process works great for extended periods of time, but then suddenly the database connection stops working. In my log I see only the following:

2015-09-24 02:04:31,841 asyncio ERROR Fatal error on aiopg connection: bad state in _ready callback

I'm going to try adding more error recovery and logging code, so if I learn anything more I'll update.
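
One possible shape for such error recovery and logging is sketched below. This is an illustration only, not the code used by anyone in this thread: FakePool, Store, and make_pool are hypothetical names, and the fake pool merely imitates connections that have timed out. The wrapper logs the failure, discards the pool, and retries once with a fresh one.

```python
import asyncio
import logging

logger = logging.getLogger("db")


class FakePool:
    """Hypothetical pool; broken=True imitates connections that have timed out."""

    def __init__(self, broken):
        self.broken = broken

    async def query(self, sql):
        await asyncio.sleep(0)
        if self.broken:
            raise TimeoutError(110, "Connection timed out")
        return "ok: " + sql


class Store:
    """Recreates the pool and retries once when a query fails with an OS-level error."""

    def __init__(self, make_pool):
        self._make_pool = make_pool  # coroutine function returning a new pool
        self._pool = None
        self.failures = 0

    async def query(self, sql):
        for attempt in (1, 2):
            if self._pool is None:
                self._pool = await self._make_pool()
            try:
                return await self._pool.query(sql)
            except OSError as exc:  # the built-in TimeoutError subclasses OSError
                self.failures += 1
                logger.warning("query failed on attempt %d: %r", attempt, exc)
                # Drop the pool so the next attempt builds a fresh one.
                # A real pool should also be closed here.
                self._pool = None
                if attempt == 2:
                    raise


async def main():
    pools = [FakePool(broken=True), FakePool(broken=False)]

    async def make_pool():
        return pools.pop(0)

    store = Store(make_pool)
    result = await store.query("SELECT 1")
    return result, store.failures


result, failures = asyncio.run(main())
```

Retrying blindly is only safe for idempotent statements; a write that timed out may still have been applied by the server.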

@bitdancer
Contributor

By the way, I get 10 copies of that message in my log one right after the other. I presume it is one for each connection in the pool.

@jettify
Member

jettify commented Sep 27, 2015

Is there any way to reproduce this? We had a similar issue with aiomysql caused by the MySQL server connection timeout, which was fixed in aio-libs/aiomysql#27.

@bitdancer
Contributor

I don't have a reproducer yet, but the data I have so far indicates it is probably a result of a timeout on the DB connection. I'm planning to try to create a reproducer once I have more data.

@bitdancer
Contributor

My case appears to be a consequence of #65. The reproducer in that issue produces the same kind of traceback that I'm seeing in my application, which I now catch and ignore. Then, 30 seconds later, I get the "bad state in _ready callback" error.

I'll investigate fixing this, but it will probably be a week or more before I can circle back to this problem.

@pankajnits

@asvetlov I was able to reproduce this scenario; it was basically due to a TimeoutError.
I just ran my test cases against your recent commits and the issue no longer occurs.

I do have a suggestion, though: at https://github.com/aio-libs/aiopg/blob/master/aiopg/connection.py#L182, could you make it execute under asyncio.shield() and change the cancel() method accordingly? At least in this case, that would ensure the future completes properly even if a timeout or CancelledError occurs.

@asvetlov
Member

@pankajnits thanks for the good news!

I don't follow your suggestion, sorry.
Cancellation is already executed under the shield.
Waiting for a future object doesn't require shielding, does it?

@pankajnits

@asvetlov It's good that the cancel is ensured, but out of curiosity: what if we ensured that the future is shielded regardless of timeouts?
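
For readers unfamiliar with the semantics being discussed, here is a small standard-library sketch of what asyncio.shield() does when the caller times out. It is not aiopg's code; run_query is a hypothetical stand-in for a database round trip. Without the shield, asyncio.wait_for() cancels the inner task on timeout; with the shield, the caller still sees the timeout but the inner task runs to completion.

```python
import asyncio


async def run_query(log):
    # Hypothetical stand-in for a database round trip.
    await asyncio.sleep(0.1)
    log.append("query finished")
    return "rows"


async def main():
    log = []

    # Without shield: wait_for cancels the inner task when the timeout expires.
    plain = asyncio.ensure_future(run_query(log))
    try:
        await asyncio.wait_for(plain, timeout=0.01)
    except asyncio.TimeoutError:
        log.append("plain timed out")
    plain_cancelled = plain.cancelled()

    # With shield: the caller still gets the timeout, but only the outer
    # shield future is cancelled; the inner task keeps running.
    shielded = asyncio.ensure_future(run_query(log))
    try:
        await asyncio.wait_for(asyncio.shield(shielded), timeout=0.01)
    except asyncio.TimeoutError:
        log.append("shielded timed out")
    result = await shielded  # the inner task completes normally

    return plain_cancelled, shielded.cancelled(), result, log


plain_cancelled, shielded_cancelled, result, log = asyncio.run(main())
```

Note that shielding only keeps the inner operation alive; the caller that timed out has already moved on, so something else must still consume or discard the result.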

@pankajnits

Also, would it help in solving #65?

@pankajnits

When the waiter has timed out and is scheduled for cancellation, I can see queries in the Postgres DB that are in the "idle" state.

@DevSpouk

Here is a script that simulates the error under load. The error manifests itself only when the load increases.
https://gist.github.com/DevSpouk/92a51f2bffb5f6f67352

@ludovic-gasc

I have the same issue under heavy load when I run a benchmark on any project based on aiopg.

@ludovic-gasc

If somebody has a solution for this, I'm interested: I'm working on round 12 of FrameworkBenchmarks.
If I have a solution before Jan 9, I'll push it for round 12.
After that, it will have to wait for the next round, unless they extend the deadline.

@steveholden

steveholden commented May 27, 2016

Is this issue still present? We've been looking for an asynchronous wrapper for Postgres access, but this would probably be a show-stopper.

@jettify
Member

jettify commented Jun 1, 2016

Looks like this issue was fixed. @asvetlov could you confirm?

@asvetlov
Member

I believe we have solved the issue.
