Allow asynchronous queries #263

Closed
sametmax opened this issue Nov 28, 2013 · 23 comments

@sametmax

Hard one, as peewee has been built on top of synchronous DB drivers up until now, but with Python 3.4 coming and shipping with asyncio + yield from, this could be an interesting feature for Python 3 users.

@coleifer
Owner

This is an interesting request, but it's so big I'm not sure how to address it. Since I intend to keep compatibility with 2.6 for quite some time, I think it's not really possible. Also, my experience with "async" in python has been limited to gevent and I don't feel very comfortable with the new APIs (yield from, Task, coroutine, etc).

Do you have any more thoughts or information to add?

@sametmax
Author

I don't, and I'm not expecting peewee to change any time soon. I am very aware that these things are hard and take time; therefore I'm opening this ticket now so the community can start the process slowly but early.

I'm not yet comfortable with asyncio myself, but since I'm forced to start thinking about it in my own work, if any ideas pop up, I'll come back here with a proposal.

On Fri, 29 Nov 2013 05:42:40 CET, Charles Leifer wrote:

This is an interesting request, but it's so big I'm not sure how to address it. Since I intend to keep compatibility with 2.6 for quite some time, I think it's not really possible. My experience with "async" in python has been limited to gevent and I also don't feel very comfortable with the new APIs (yield from, Task, coroutine, etc).

Do you have any more thoughts or information to add?

@coleifer
Owner

Thanks for the message @sametmax. Based on my understanding of the new asyncio library, I'm going to go out on a limb here and say that I don't think peewee will be adding support.

  • A new "asyncio-aware" driver would need to be written for each database.
  • All code that connects to the db and makes queries in peewee would need to be rewritten, i.e. yield from db.connect(...), yield from db.execute() and yield from cursor.fetchXXX().
  • Any application using peewee would need to be rewritten where it makes queries: yield from Model.select().where(...)

Have a look at the examples to see how the structure of the application changes when you start using the yield from syntax. Using callbacks is no better, IMO.
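To make the second and third bullets concrete, here is a minimal sketch of how the call sites change. It assumes a made-up asyncio-aware driver (`HypotheticalAsyncDB` and `HypotheticalAsyncCursor` are invented names, stubbed with `asyncio.sleep(0)`), and it uses the `async def`/`await` syntax that replaced generator-based `yield from` coroutines (the `@asyncio.coroutine` decorator was removed in Python 3.11).

```python
import asyncio


class HypotheticalAsyncDB:
    """Stand-in for an asyncio-aware driver; these names are invented for
    illustration and are not part of peewee or any real driver."""

    def __init__(self, rows):
        self._rows = rows

    async def connect(self):
        await asyncio.sleep(0)  # yield to the event loop, as real network I/O would
        return self

    async def execute(self, sql, params=()):
        await asyncio.sleep(0)
        return HypotheticalAsyncCursor(self._rows)


class HypotheticalAsyncCursor:
    def __init__(self, rows):
        self._rows = list(rows)

    async def fetchall(self):
        await asyncio.sleep(0)
        return self._rows


async def list_users(db):
    # Every call that touches the database must now be awaited
    # (Python 3.4 spelled this `yield from` inside an @asyncio.coroutine).
    conn = await db.connect()
    cursor = await conn.execute("SELECT name FROM users")
    rows = await cursor.fetchall()
    return [name for (name,) in rows]


# The caller has to be a coroutine too, or hand control to the event loop:
print(asyncio.run(list_users(HypotheticalAsyncDB([("alice",), ("bob",)]))))
```

Note that `list_users` itself has to become a coroutine: the async requirement propagates up to every caller, which is the application-level rewrite the third bullet describes.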

@sametmax
Author

sametmax commented Dec 1, 2013

I understand. The best solution would be to have the common code, such as the query builder, in one module (well, wrapper, since peewee is a one-file lib), the sync API code in another module, and the async API code in a third module.

And allow something like:

from peewee import blabla # sync
from peewee.async import blabla # async

Or make peewee decoupled enough that it's possible to build an async lib on top of it, called async_peewee.
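The split proposed above (a shared query builder plus separate sync and async layers) can be sketched roughly as follows. Everything here is hypothetical: `Query`, `SyncExecutor` and `AsyncExecutor` are invented names, and the async layer simply pushes the blocking stdlib `sqlite3` driver into a worker thread with `asyncio.to_thread` (Python 3.9+) instead of using a real async driver. One practical caveat: since Python 3.7 `async` is a reserved keyword, so a submodule literally named `peewee.async` could not be imported with `from peewee.async import ...`.

```python
import asyncio
import sqlite3


class Query:
    """Shared, I/O-free query builder: it only produces SQL and parameters,
    so both the sync and the async layer can reuse it unchanged.
    Table/column names are assumed to be trusted identifiers."""

    def __init__(self, table):
        self.table, self.conditions, self.params = table, [], []

    def where(self, column, value):
        self.conditions.append(f"{column} = ?")
        self.params.append(value)
        return self

    def sql(self):
        text = f"SELECT * FROM {self.table}"
        if self.conditions:
            text += " WHERE " + " AND ".join(self.conditions)
        return text, tuple(self.params)


class SyncExecutor:
    """Blocking layer (the role of `from peewee import ...`)."""

    def __init__(self, conn):
        self.conn = conn

    def all(self, query):
        sql, params = query.sql()
        return self.conn.execute(sql, params).fetchall()


class AsyncExecutor:
    """Non-blocking layer (the role of the proposed async module). Here it only
    runs the blocking driver in a thread; a real one would use an async driver."""

    def __init__(self, conn):
        self.conn = conn

    async def all(self, query):
        sql, params = query.sql()
        return await asyncio.to_thread(lambda: self.conn.execute(sql, params).fetchall())


conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.executescript(
    "CREATE TABLE shop (id INTEGER, name TEXT);"
    "INSERT INTO shop VALUES (1, 'North'), (2, 'South');"
)

q = Query("shop").where("id", 2)
print(SyncExecutor(conn).all(q))                # [(2, 'South')]
print(asyncio.run(AsyncExecutor(conn).all(q)))  # [(2, 'South')]
```

The design point is that only the executor layer differs; the query-building code, which is most of an ORM, is written once.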

But I get it, it's a lot of work, and when you have already coded the ORM, how do you find the time to do this? I'm not even offering to do it because I'm well aware of how huge it is.

Anyway, thanks for answering.

On Fri, 29 Nov 2013 17:20:19 CET, Charles Leifer wrote:

Thanks for the message @sametmax. Based on my understanding of the new asyncio library, I'm going to go out on a limb here and say that I don't think peewee will be adding support.

  • A new "asyncio-aware" driver would need to be written for each database.
  • All code that connects to the db and makes queries in peewee would need to be rewritten, i.e. yield from db.connect(...), yield from db.execute() and yield from cursor.fetchXXX().
  • Any application using peewee would need to be rewritten where it makes queries: yield from Model.select().where(...)

Have a look at the examples (https://code.google.com/p/tulip/source/browse/examples/fetch1.py) to see how the structure of the application changes when you start using the yield from syntax. Using callbacks is no better, IMO.

@coleifer coleifer closed this as completed Dec 2, 2013
@csytan

csytan commented Apr 4, 2014

Just to add to this discussion, I've had positive experiences with the way Guido's NDB handles async operations.

If in the future you are ever in need of any ideas, here are the docs:
https://developers.google.com/appengine/docs/python/ndb/async

@soasme
Contributor

soasme commented Apr 4, 2014

We can use Trollius for Py2/3 when needed. It is much like asyncio, but it has a syntax that works on Py2 (yield From(do_something()), raise Return(value)).

@coleifer
Owner

coleifer commented Apr 4, 2014

Thanks a bunch for the links - I will read up on them.

@coleifer coleifer reopened this Apr 4, 2014
@coleifer coleifer closed this as completed Apr 5, 2014
@cpbotha

cpbotha commented May 13, 2014

I've just tried a flask+peewee app with uwsgi+gevent+psycogreen.gevent.patch_psycopg -- in theory, this should patch psycopg so that calling code (such as peewee) can use it as if it were still blocking, see https://bitbucket.org/dvarrazzo/psycogreen

When I try this, peewee gives me:

peewee.ProgrammingError: execute cannot be used while an asynchronous query is underway

Am I expecting too much?

@coleifer
Owner

Strange... I've used psycopg2 with gevent and not had any issues. The code I used was similar to the monkeypatch you linked, @cpbotha. One thing you might check is that the gevent monkeypatch needs to be the first thing that happens.

So the entry-point to your application would look like this:

from gevent import monkey
monkey.patch_all()  # must run before anything else is imported

# then green psycopg2, e.g. with psycogreen
from psycogreen.gevent import patch_psycopg
patch_psycopg()

# here begins your actual code...

@cpbotha

cpbotha commented May 13, 2014

Thanks for helping me with this!

I already have in my wsgi.py (entry point for uwsgi):

import gevent
import gevent.monkey
gevent.monkey.patch_all()

import psycogreen.gevent
psycogreen.gevent.patch_psycopg()

import main
main.init()
from cnids.app import fapp

With "normal" DB access (RESTful via browser), I see no issues. However, I do see the ProgrammingError exception when I run ab -c 3 -n 10 URL (3 concurrent requesters).

Anything else I could look at? (this is with peewee 2.2.3)

@coleifer
Owner

Strange... I'm not sure what might be happening.

@cpbotha

cpbotha commented May 13, 2014

Seems it has happened before: https://bitbucket.org/dvarrazzo/psycogreen-hg/issue/1/databaseerror-execute-used-with

The explanation offered is that the issue occurs when two queries are executed through two different cursors on the same database connection. Does that make sense?

The reporter of that bug then wrote a blog post with a solution, http://www.manasupo.com/2012/03/geventpsycopg2-execute-cannot-be-used.html, where they simply do the psycopg2 monkey-patching before the fork.

I don't understand why that fixed the problem in their case. In my case, the monkey patching is done before forking, judging by the uwsgi log file (my "MONKEY PATCHING YEAH!" output appears once, before the three worker processes report for duty):

mapped 4368256 bytes (4265 KB) for 300 cores
*** Operational MODE: preforking+async ***
MONKEY PATCHING YEAH!
Database migration not required.
WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x10035d0 pid: 30745 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 30745)
spawned uWSGI worker 1 (pid: 30751, cores: 100)
spawned uWSGI worker 2 (pid: 30752, cores: 100)
spawned uWSGI worker 3 (pid: 30753, cores: 100)
*** running gevent loop engine [addr:0x485620] ***

I'm putting this here as a log for future travellers. Any tips would be welcome of course!

@cpbotha

cpbotha commented May 13, 2014

In Momoko (Tornado wrapping of psycopg2), they had to build in more explicit handling of busy database connections to work around this issue: Tsumanga-Studios/momoko@e4752c9

This was supposed to be a short experiment to benchmark multi-process uwsgi+flask-peewee against multi-process+async+flask-peewee. I think I should let it go, as they say. :)

Thanks in any case!

@coleifer
Owner

The explanation offered is issue would occur when two queries are done using two different cursors on the same database connection. Does that make sense?

Are you using threadlocals=True with your database connection? e.g.

db = PostgresqlDatabase('foo', threadlocals=True)

@cpbotha

cpbotha commented May 13, 2014

If I could fly over there and buy you a beer I would!!

You even documented it: https://github.com/coleifer/flask-peewee/blob/master/docs/gevent.rst

I don't understand why my searching didn't take me there, but I'm happy that you've solved it! (In my app.py, I added a 'threadlocals' key to the DATABASE configuration dictionary.)

What impact does that setting have in a non-greened environment?

Thanks again,
Charl

@coleifer
Owner

In a non-greened environment, if you were using a multi-threaded WSGI server, your connections would be opened per thread.
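A rough, stdlib-only sketch of what "connections opened per thread" means. `ThreadLocalDatabase` is a hypothetical, simplified stand-in, not peewee's implementation: each thread lazily opens its own sqlite3 connection and reuses it for later queries. Under gevent, `monkey.patch_all()` also replaces `threading.local` with a greenlet-local version, so the same pattern gives every greenlet its own connection, which is presumably why the setting cured the "execute cannot be used while an asynchronous query is underway" error.

```python
import sqlite3
import threading


class ThreadLocalDatabase:
    """Hypothetical, simplified model of a database wrapper with
    threadlocals=True: each thread lazily opens and reuses its own connection."""

    def __init__(self, path):
        self.path = path
        self._state = threading.local()  # separate attribute namespace per thread

    def get_conn(self):
        conn = getattr(self._state, "conn", None)
        if conn is None:
            # check_same_thread=False only so the main thread can close these later
            conn = sqlite3.connect(self.path, check_same_thread=False)
            self._state.conn = conn
        return conn


db = ThreadLocalDatabase(":memory:")
per_thread = []  # (first, second) connection objects seen by each worker
lock = threading.Lock()


def worker():
    first = db.get_conn()
    second = db.get_conn()  # same thread, so the same connection is reused
    with lock:
        per_thread.append((first, second))


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

distinct = {id(first) for first, _ in per_thread}
print(len(distinct))  # 3: one connection per thread

for first, _ in per_thread:
    first.close()
```

With a single shared connection instead, concurrent workers would interleave queries on one connection, which is the situation the psycogreen bug report describes.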

@rudyryk

rudyryk commented Sep 26, 2014

I think we just need a separate package, let's say 'aiopeewee', providing an asyncio interface. We're using peewee with Tornado+asyncio, so I think we'll start porting it shortly. Meanwhile, we use 'run_in_executor' to perform slow requests asynchronously.
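The `run_in_executor` workaround can be sketched like this. `blocking_query` is a hypothetical stand-in for a synchronous peewee call; passing `None` as the executor uses the event loop's default thread pool, so the slow calls run in worker threads while the loop stays free to serve other requests.

```python
import asyncio
import sqlite3
import time


def blocking_query(n):
    """Hypothetical stand-in for a slow, synchronous ORM call. It opens its own
    connection because sqlite3 connections are tied to their creating thread
    by default."""
    time.sleep(0.1)  # simulate network/disk latency
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute("SELECT ? * ?", (n, n)).fetchone()[0]
    finally:
        conn.close()


async def main():
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor
    futures = [loop.run_in_executor(None, blocking_query, n) for n in range(5)]
    return await asyncio.gather(*futures)


results = asyncio.run(main())
print(results)  # [0, 1, 4, 9, 16]
```

This keeps the ORM untouched, at the cost of one thread per in-flight query, which is why it is a workaround rather than native async support.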

@rudyryk

rudyryk commented Sep 26, 2014

And I'm not sure yet whether we actually need a full port or just a couple of wrappers and asyncio-powered database backend classes.

@sametmax
Author

You will actually need more than that, because the API itself is synchronous by nature.

E.g.: when you access an attribute in peewee, it can fire a new query: product.shop.name will make a query to get the shop by default.
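To illustrate why lazy foreign keys clash with async, here is a stdlib-only sketch of the pattern. `LazyForeignKey`, `Product` and `Shop` are hypothetical and only mimic the behaviour (they are not peewee's code): reading `product.shop` runs a blocking query inside a plain attribute access, and there is no place in `product.shop.name` where an `await` could go.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY, shop_id INTEGER);
    INSERT INTO shop VALUES (1, 'Corner Shop');
    INSERT INTO product VALUES (1, 1);
""")
queries = []  # log of SQL actually sent to the database


def run(sql, params):
    queries.append(sql)
    return conn.execute(sql, params).fetchone()


class Shop:
    def __init__(self, id, name):
        self.id, self.name = id, name


class LazyForeignKey:
    """Hypothetical descriptor mimicking lazy FK loading: the related row is
    fetched with a blocking query the first time the attribute is read."""

    def __set_name__(self, owner, name):
        self.cache_name = "_" + name + "_cache"

    def __get__(self, instance, owner):
        if instance is None:
            return self
        if not hasattr(instance, self.cache_name):
            row = run("SELECT id, name FROM shop WHERE id = ?", (instance.shop_id,))
            setattr(instance, self.cache_name, Shop(*row))
        return getattr(instance, self.cache_name)


class Product:
    shop = LazyForeignKey()

    def __init__(self, id, shop_id):
        self.id, self.shop_id = id, shop_id


product = Product(*run("SELECT id, shop_id FROM product WHERE id = ?", (1,)))
print(product.shop.name)  # the attribute read itself issues the second query
print(len(queries))       # 2
```

An async API would have to replace this implicit attribute read with an explicit awaitable call or an up-front prefetch.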

You'll probably need to return a promise for everything that should return an object or queryset.

Now if you do that in a template:

{{ product.shop.name }}

Things get funny, because most template engines don't have a way to handle promises (no callbacks, no yield, etc.).

Plus, there are not just promises, but also deferreds and futures, depending on the framework. So you'll need a different backend for the async event loop AND for the result wrapper.

It's a lot of work, and it's hard. But if you do it, peewee will actually be the ONLY Python ORM dealing with these issues properly, and hence I guarantee it will be used much, much more. Basically, you'll be on the front page of the Python subreddit and Hacker News the day of the release. Everybody is waiting for something like this, because right now everybody uses hacks (deferring to threads, MongoDB's Motor, gevent monkey patching...), and big projects like the Django ORM and SQLAlchemy have stated officially that they don't want to do it in the current context.

Unfortunately, I don't know of any way to do async with most DBs without using compiled drivers for it. The sqlite3 stdlib driver is synchronous, if I recall correctly, so you won't have the "wow" out-of-the-box experience you should have.

I would make it something like peewee.async instead of a separate lib. It would make adoption much easier. But maybe that's just me.

On 26/09/2014 11:18, Alexey wrote:

And I'm not sure yet do we actually need porting or just a couple of wrappers and asyncio powered database backend classes.

@rudyryk

rudyryk commented Sep 28, 2014

Hi everyone :) I've just published a kind-of-working prototype: https://github.com/05bit/peewee-async

I think basic async support may also be useful: we can deal with related objects by sending explicit prefetch queries. And yes, async queries are difficult to support in templates (without rewriting the template engine), so I think it's more suitable for API services, where we generally just need to serialise to JSON.
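The explicit-prefetch approach could look roughly like this: instead of touching `product.shop` lazily, load all products, then fetch every referenced shop with a single `IN (...)` query and join them in memory, which yields plain data ready for JSON. The `fetch_all` helper is hypothetical; it runs the blocking stdlib sqlite3 driver in a thread via `asyncio.to_thread` (Python 3.9+) and is not peewee-async's actual API.

```python
import asyncio
import json
import sqlite3

conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.executescript("""
    CREATE TABLE shop (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, shop_id INTEGER);
    INSERT INTO shop VALUES (1, 'North'), (2, 'South');
    INSERT INTO product VALUES (1, 'Pen', 1), (2, 'Ink', 2), (3, 'Pad', 1);
""")


async def fetch_all(sql, params=()):
    # Hypothetical async helper: the blocking call runs in a worker thread.
    # Calls are awaited one at a time, so the shared connection is never
    # used by two threads at once.
    return await asyncio.to_thread(lambda: conn.execute(sql, params).fetchall())


async def products_with_shops():
    products = await fetch_all("SELECT id, name, shop_id FROM product ORDER BY id")
    shop_ids = sorted({shop_id for _, _, shop_id in products})
    placeholders = ", ".join("?" for _ in shop_ids)
    # One extra query for all related shops instead of one query per product.
    rows = await fetch_all(f"SELECT id, name FROM shop WHERE id IN ({placeholders})", shop_ids)
    shops = dict(rows)
    return [{"product": name, "shop": shops[shop_id]} for _, name, shop_id in products]


print(json.dumps(asyncio.run(products_with_shops())))
```

However many products there are, this issues exactly two queries, and no attribute access ever needs to block.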

@rudyryk

rudyryk commented Oct 11, 2014

Just published alpha v0.0.2 on PyPI, and here are the docs: https://python-aiopeewee.readthedocs.org The interface seems to work and is simple, and I think it's now fairly close to a stable version. But the internals don't really shine; I've opened an issue to discuss better integration with peewee: 05bit/peewee-async#1

@Serkan-devel

Has anything happened since then?

@coleifer
Owner

coleifer commented Aug 1, 2019

Use gevent

Repository owner deleted a comment from lucasgadams Nov 22, 2022

7 participants