Skip to content
This repository has been archived by the owner on Mar 15, 2020. It is now read-only.

Optional Orator ORM for database migrations #4

Closed
florimondmanca opened this issue Nov 20, 2018 · 4 comments
Closed

Optional Orator ORM for database migrations #4

florimondmanca opened this issue Nov 20, 2018 · 4 comments
Labels
discussing This requires discussion and feedback enhancement New feature or improvement of an existing one

Comments

@florimondmanca
Copy link
Member

florimondmanca commented Nov 20, 2018

A lot of web frameworks (except Django) give you complete control over how to deal with the database — most likely you'll resort to SQLAlchemy and Alembic.

But setting up and using a database is one of the most common tasks web developers have to deal with. I've always needed one at some point for every single web project I've worked on.

I've used SQLAlchemy/Alembic, but it was a pain because I had to be extremely careful about integrating them in my project and workflow. Really not straight-forward.

So the idea is to have an officially supported ORM into Bocadillo (although the ORM would most likely be 3rd-party), but to make it optional. We could provide an extension (in the form of a setup.py extra) that would allow to install any dependencies as pip install bocadillo[db]. See this StackOverflow question for more info about extras.

One issue, though, is that most ORMs out there are synchronous, whereas one of the core promises of Bocadillo is the ability to go fully async. Plus, database queries are one area where asynchronous I/O really helps with performance and multi-request handling.

That said, there is hope in the name of Tortoise. It is an async ORM supporting the three major DB backends (sqlite, mysql, postgres). However, it is still in alpha (pre-v1) and most likely won't be tackling migrations before a few months. And I feel like migration management is a key part of what Bocadillo should offer.

I already began working on this on feature/db, experimenting with Orator ORM (a synchronous ORM) to get a feel for how databases could be optionally incorporated within a Bocadillo app. I'm not fond of the result because it seems to expose Orator too much. I even had issues with the DB configuration because of how Orator works.

I'm raising the issue because I'm still not sure which path to go. If you have ideas or suggestions on this topic (like recommendations for ORMs w/ async), feel free to comment! 🎉

@florimondmanca florimondmanca added discussing This requires discussion and feedback Priority: Medium enhancement New feature or improvement of an existing one labels Nov 20, 2018
@florimondmanca
Copy link
Member Author

florimondmanca commented Nov 20, 2018

Hey @alin23, I'm porting our discussion from #3 here.

[…] A sync ORM is almost impossible to use in the context of an async/single-thread framework because every database call would halt every request handling until the call finishes. Of course, the db calls can be made in separate threads but as you probably know, Python doesn't have real parallelization when it comes to threads.

Just interested in the almost impossible here: technically, it is still possible to use a sync ORM, right? The only thing is that, as you said, DB queries would block the thread, basically ruining the purpose of using an async framework like Bocadillo.

Oh man, finally an ORM that uses asyncpg behind! I wish tortoise existed half a year ago when I had to convert all the db interactions in Noiseblend to async.

I suppose a possible (temporary) solution waiting for async ORMs like tortoise to come along, would be to provide guidelines for using an async database client (like asyncpg) in a Bocadillo app. That's a low-level approach (not sure I'd take it myself) that requires more advanced knowledge of the underlying database (e.g. it would require setting up tables directly via Postgres) — but it could do the trick for more experienced users. Do you think that might be useful in the meantime, @alin23?

@alin23
Copy link
Contributor

alin23 commented Nov 20, 2018

Yes, you're right. It's still possible to do synchronous db calls inside an async context, but as you already understood, it blocks the whole server until the db responds. And when the db is on a different machine than the server, this is a big issue.

I think providing guidelines until the db layer is sorted out is a perfect idea! Indeed, it is not an easy start using solutions like asyncpg. 99% of the time you're in SQL land creating the database structure, writing migrations and crafting complex queries. This gives you great power and very good performance, but it's not for inexperienced users trying to create a simple web app with some persistence.

The way I did it:

  • Run a sync ORM to create the database and run the migrations (Orator sounds like a great choice and we could use this PR for this step)
  • Close all the db pools/connections the sync ORM has opened
  • Run the async server
  • Use asyncpg or whatever async lib the user wants for queries

This doesn't spare you from writing SQL, but should dramatically reduce the complexity and the amount of code that would need to be converted when tortoise becomes available.

Another idea would be to use GraphQL. For example, for Postgres there is PostGraphile which is a server that connects to your Postgres DB and using introspection, exposes a GraphQL endpoint for querying/modifying whatever you want in the database.
This gets rid completely of SQL queries and since GraphQL queries are simple POST requests, I'm pretty sure there should already be an async Python GraphQL client.

There are some caveats though:

  1. You have to run another server (the PostGraphile server) and for some people this means another running machine
  2. If the PostGraphile server runs on another machine, the latency for db calls gets even higher
  3. I've only been talking about Postgres, I don't know if there is another solution for MySQL or SQLite.

Anyway, providing guidelines is something that should be done if you want to increase adoption and I'd be glad to help you if we settle on a working idea ^_^

@florimondmanca
Copy link
Member Author

A lot of great ideas!

I'm not comfortable taking the PostGraphile path, though. It does looke like an awesome tool, but as you said it's specific to PostgreSQL, so I don't think we'd be able to provide a consistent Developer Experience across database systems.
Also, GraphQL is not in the scope of Bocadillo just yet. (And if we want to provide GraphQL in the future, I think we'd be able to integrate the database layer with it using Starlette's integration with graphene.)

A sync ORM for migrations + async library for execution sounds like a solid temporary solutions. Still requires to write SQL, but only for queries, so I guess that's manageable.

As you said, we've only talked about Postgres up to now but I think it's important we support all major relational databases, i.e. SQLite (lightweight and portable, good for beginners), MySQL and PostgreSQL.

I've taken a look at async db clients for other database systems, and I've found:

In terms of UX, not providing any utilities whatsoever seems a bit harsh, though. That said, we can't afford writing a whole abstraction layer either — because that'd basically re-implement Tortoise's backends package.

So I suggest, as you said, we keep this PR for the Orator migrations part, and we settle to write backend-specific guides on how to use the async library of choice. 👍 Issue for the guides upcoming!

@florimondmanca florimondmanca changed the title Optional databases and ORM Optional Orator ORM for database migrations Dec 15, 2018
@florimondmanca
Copy link
Member Author

florimondmanca commented Dec 15, 2018

After giving this some thought, I'm closing the issue for now. In the wake of the various options for integrating an async database layer into Bocadillo, I don't think Orator is a very good choice as it seems too heavy to be recommended by default, let aside only for migrations.

Take, for example, Starlette's built-in DatabaseMiddleware: it is based on the familiar SQLAlchemy, and it turns out we might actually have it integrated into Bocadillo already because it is tied to the Starlette Request object (which we're just aliasing to for now).

(Also, they recommend Alembic for migrations, which makes me re-think why I ditched it as an option in the first place.)

Anyway, the point is — we'll need to give this a bit more thought. As a means to gain experience, I suppose we should focus on writing how-to guides for using existing solutions (be it Tortoise, Starlette or plain Python clients) before considering integrating with any solution by default.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
discussing This requires discussion and feedback enhancement New feature or improvement of an existing one
Projects
None yet
Development

No branches or pull requests

2 participants