Async I/O #399

Closed
killercup opened this Issue Aug 9, 2016 · 24 comments

@killercup
Member

killercup commented Aug 9, 2016

With futures-rs and tokio, the Rust ecosystem recently got a whole new chapter in the async I/O story. It would be pretty great for diesel to work with this as well.

From an end-user standpoint, a straightforward API change would be to 'just' use an async PgConnection instead of the regular PgConnection (DB-specific for more efficient implementations?). All traits generic over Connection (e.g. LoadDsl) would get an associated type to determine whether they return e.g. a QueryResult<T> or a Future<T>.
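
Roughly sketching the idea (none of these names exist in diesel today; it's just an illustration of the associated-type approach):

pub struct Error;
pub type QueryResult<T> = Result<T, Error>;

// Hypothetical sketch: each connection type picks, via an associated type,
// what "running a query" returns -- a plain QueryResult for a blocking
// connection, some Future type for an async one.
pub trait Connection {
    /// e.g. QueryResult<usize> for a blocking PgConnection, or
    /// Box<Future<Item = usize, Error = Error>> for an async connection.
    type ExecuteResult;
}

// Query builders stay generic over the connection and just hand back
// whatever result type the connection declared.
pub trait ExecuteDsl<Conn: Connection> {
    fn execute(self, conn: &Conn) -> Conn::ExecuteResult;
}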

@sgrif

Member

sgrif commented Aug 11, 2016

Worth noting that in the async case I'd also like load and get_results to return some sort of stream instead of Future<Vec<T>>, as it frees me up to do cursor-y things (granted, even just a future is fine). This of course will require re-implementing the PostgreSQL adapter to not rely on libpq (which I'm already working on for other reasons; not sure if I've spoken about it outside of the podcast). It also will not be possible with SQLite.

@chpio

chpio commented Aug 11, 2016

@sgrif Maybe something like the "async iterator" in JS, which would return an Iterator<Item = Future<T>>? The consumer would then call await on each future, but for this to work we would need compiler support for await. We could also make a simple helper function that takes an Iterator of Futures and a closure, iterates over the Iterator, and calls the closure whenever a Future is ready.
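
For example (just a sketch on top of futures 0.1's stream::futures_ordered, nothing diesel provides):

use futures::{stream, Future, Stream};

// Take an iterator of futures plus a closure, and call the closure with each
// item as its future resolves (in input order). Purely illustrative.
fn for_each_ready<I, F, T, E>(futures: I, f: F) -> impl Future<Item = (), Error = E>
where
    I: IntoIterator,
    I::Item: Future<Item = T, Error = E>,
    F: FnMut(T) -> Result<(), E>,
{
    stream::futures_ordered(futures).for_each(f)
}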

@sgrif

Member

sgrif commented Aug 11, 2016

That's a possibility as well, but it doesn't match how I'll actually have access to the data, which is in batches. If we're doing true async I/O, I also won't have the length ahead of time, so it'd have to be an actual iterator, not Vec<Future>, at which point why not just do a true stream?

@killercup

Member

killercup commented Aug 11, 2016

Please note that futures-rs also has a Stream trait!

@sgrif

Member

sgrif commented Aug 11, 2016

Stream<T> is to Vec<T> as Future<T> is to Result<T, _> (or I guess Option<T> in this case)
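
In futures 0.1 terms (illustration only, nothing diesel-specific):

use futures::{future, stream, Future, Stream};

fn main() {
    // A Future resolves to a single value, like a Result you get eventually...
    let one = future::ok::<i32, ()>(1).wait();
    assert_eq!(one, Ok(1));

    // ...while a Stream yields many values over time, which you could collect
    // back into a Vec if you really wanted the whole set at once.
    let many = stream::iter_ok::<_, ()>(vec![1, 2, 3]).collect().wait();
    assert_eq!(many, Ok(vec![1, 2, 3]));
}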

@macobo

macobo commented Jan 7, 2017

@sgrif off-topic, but curious as to the reasons for moving off of libpq. Could you also post a link to the podcast?

@beatgammit

beatgammit commented Jan 31, 2017

@macobo - I'm guessing it's The Bike Shed.

@sgrif

Member

sgrif commented Feb 1, 2017

Sorry that I didn't reply -- I had an unexpected baby. Yes, that is the correct link to the podcast.

@pyros2097

pyros2097 commented Mar 1, 2017

Well, there is tokio-postgres now, which you can use for your async stuff:
https://docs.rs/tokio-postgres/0.2.1/tokio_postgres/

@yazaddaruvala

yazaddaruvala commented Sep 27, 2017

Just want to clarify, will async make the 1.0 milestone?

@sgrif

Member

sgrif commented Sep 27, 2017

No.

@sgrif

Member

sgrif commented Jan 12, 2018

So... I've fully implemented a PoC for this twice now. I'm going to close this for the time being.

This is not a statement that async I/O will never happen, but it is a statement that it is not currently on the roadmap, and I do not consider this to be actionable at this time. Neither tokio nor futures have stable APIs at the moment (tokio in particular is supposedly going to have a major overhaul at some point).

At absolute minimum, we need to wait for things to settle down there. However, there are problems that need to be solved in those APIs as well. I found that in order to get to an API that I was happy with, I had to resort to some pretty nasty hacks that had a lot of gotchas and that I'm not comfortable shipping. It seemed that ownership hits you way harder here as well. It's virtually impossible to have anything that isn't 'static. This might seem obvious or NBD, but consider for a moment that you basically never actually own the connection. Everything takes &connection (or more likely a PooledConnection<'a, T>).
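
A contrived sketch of that constraint with futures 0.1 (the names are made up, not diesel's):

use futures::{future, Future};

// Stand-in for a Diesel connection. In practice you only ever hold
// &PgConnection or a PooledConnection<'a, M>, never an owned connection.
struct PgConnection;

// Any future built from a borrowed connection is tied to that borrow.
fn query_name<'a>(conn: &'a PgConnection) -> impl Future<Item = String, Error = ()> + 'a {
    future::lazy(move || {
        let _ = conn; // pretend this drives async I/O over the connection
        Ok::<_, ()>("diesel".to_string())
    })
}

fn main() {
    let conn = PgConnection;
    let fut = query_name(&conn);
    // tokio::spawn(fut); // would not compile: executors want 'static (+ Send)
    //                    // futures, but `fut` still borrows `conn`
    let _ = fut.wait(); // fine only because we block right here
}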

With the state of things as they are today, we would also need to re-implement connection pooling to make this work. This is on top of what would already be an enormous amount of work. This is not just "make it async". When we switch to tokio, we can no longer use the client libraries provided to us by the database creators. We need to learn the wire protocol each database uses, and implement it at the TCP level.

Finally, I hear a lot of people saying that the lack of async support is the big blocker for them using Diesel, but I'm not buying it. PostgreSQL is the only backend likely to get an async driver any time soon (I do not know the MySQL wire protocol, and it is mostly undocumented). The tokio-postgres crate, which people would likely be using instead if async was really the blocker for them, has had 300 downloads in the past 3 months. I'm not saying that async isn't a thing worth doing, but I think people are perhaps overstating how important it is to them.

Anyway with all that said, I'm going to close this issue. Open issues in this repo should represent something that is actionable, and that you could reasonably open a pull request to fix. I do not consider this to be actionable with the state of async in Rust today. I will keep my eye on that situation, and if that changes, I will open a new issue for this. But you should not expect async Diesel in the near future.

@sgrif sgrif closed this Jan 12, 2018

@yazaddaruvala

yazaddaruvala commented Jan 12, 2018

I understand your frustrations with the current API, and understand you don't want to build async support just yet, only to re-do it later. All of that makes perfect sense, and this is your project so absolutely do what you feel is best.

I just want to point out that your rationale for "async isn't a big deal" is full of selection bias (i.e. inaccurate), and you're doing yourself a disservice by believing those stats.

Finally, I hear a lot of people saying that the lack of async support is the big blocker for them using Diesel, but I'm not buying it.

You're using the download stats for tokio-postgres to infer how much demand there is for async, and since it isn't popular, you infer that async is not a blocker.

However, this is inaccurate: people who view async as a blocker are not using sync Rust instead. They are simply not using Rust.

At least for me, I do not use Rust for any web-based applications because, while hyper is pretty good, there is no great ecosystem around async in Rust yet.

@sgrif

Member

sgrif commented Jan 12, 2018

I want to reiterate: I am not trying to say that "async isn't a big deal". I am trying to say it is not currently a priority, and it is not feasible for us to attempt to accomplish it at this point in time.

people who view async as a blocker are not using sync-Rust instead. They are simply not using Rust.

While you're probably right, and it may not have been a great stat to bring up, I'm not sure I agree with "if I can't use Diesel I'm just not going to use Rust" being the norm in this case.

I'm also not trying to use the stats for tokio-postgres to infer the popularity of async; I'm using them to put a number on the only async alternative available for the backend that we are most likely to support. Presumably there are a decent number of people who want all of:

  • Rust
  • PostgreSQL
  • Async I/O

This is what those people would be using.

But again, I'm probably calling too much attention to what was by far the smallest part of my reasoning for closing this (which I should also mention was discussed with the rest of the core team). This is mostly about it not being something we can do right now, which is why I wanted to close this issue. If circumstances change and async diesel becomes feasible in the future, we'll open a new issue. But I want to avoid keeping issues open that cannot reasonably be closed with a pull request (this would of course be a very large pull request).

@batisteo

batisteo commented Jan 16, 2018

Just a heads up on Pleingres, “An asynchronous, Tokio-based, 100% Rust postgres driver”, as mentioned in the Pijul blog.
It might be an inspiration or a base for future work. Or not.

@beatgammit

beatgammit commented Jan 21, 2018

I don't know if this exists yet, but perhaps there could be documentation (e.g. an example on the website) about how to integrate Diesel into an async project without blocking the whole event loop. If that example covered connection pooling, it would be a reasonable solution until the async ecosystem in Rust stabilizes some.

I'm using Diesel on a site that I hope will have high-volume database operations (mostly simple CRUD, but relatively high volume, as in potentially hundreds or even thousands of more-or-less concurrent operations). I'll be figuring out how to do pooling properly, but I honestly wouldn't go through this effort if I weren't dead-set on using Rust, since other ecosystems solve this out of the box (e.g. Go is "good enough").

And honestly, an async interface into a thread pool of synchronous connections is probably better for a high-traffic site anyway, and that's probably the bulk of the "async or bust" crowd. That's not to say that async in Diesel isn't useful, just that we can probably solve the problem for most people with documentation.

@fluxxu

Contributor

fluxxu commented Jan 25, 2018

This method should be considered a workaround.

Just use futures-cpupool (https://github.com/alexcrichton/futures-rs/tree/master/futures-cpupool) to wrap Diesel's operations; then you can use Diesel with futures/tokio nicely. Define something like (this sketch assumes an r2d2 connection pool):

use diesel::pg::PgConnection;
use futures::prelude::*;
use futures_cpupool::CpuPool;
use lazy_static::lazy_static;
use r2d2_diesel::ConnectionManager;

pub type Conn = PgConnection;
// Assumption: connections come from an r2d2 pool (r2d2 + r2d2-diesel crates);
// adjust the manager/pool types to whatever pooling you actually use.
pub type Pool = r2d2::Pool<ConnectionManager<Conn>>;

pub fn exec_async<T, E, F, R>(pool: &Pool, f: F) -> impl Future<Item = T, Error = E>
where
    T: Send + 'static,
    E: From<diesel::result::Error> + From<r2d2::Error> + Send + 'static,
    F: FnOnce(&Conn) -> R + Send + 'static,
    R: IntoFuture<Item = T, Error = E> + Send + 'static,
    <R as IntoFuture>::Future: Send + 'static,
{
    lazy_static! {
        // One process-wide pool of worker threads that runs the blocking queries.
        static ref THREAD_POOL: CpuPool = CpuPool::new_num_cpus();
    }

    // Clone the (cheap, Arc-backed) pool handle so the spawned closure can own it.
    let pool = pool.clone();
    THREAD_POOL.spawn_fn(move || {
        pool.get()
            // convert the pool's checkout error into the caller's error type
            .map_err(|err| err.into())
            // run the caller's blocking Diesel code on this worker thread
            .map(|conn| f(&conn))
            .into_future()
            .and_then(|res| res)
    })
}

Then you can

exec_async(&pool, |conn| {
    diesel::insert_into(??).values(??).execute(conn)
}).and_then(...)

@ivanovaleksey

Contributor

ivanovaleksey commented Mar 26, 2018

@vorot93 @friktor @lnicola is the example above wrong?

@chpio

chpio commented Mar 26, 2018

@ivanovaleksey

  1. It's not real async code; we are talking about handling the whole connection asynchronously, not running blocking code on a thread pool.
  2. You can't run more concurrent queries than there are CPU cores (CpuPool::new_num_cpus()).

I'm not saying it's "wrong".

@lnicola

lnicola commented Mar 26, 2018

@ivanovaleksey I can't say it's correct, as I didn't test it (e.g. I'm not sure about the trait bounds). In theory it should work. However,

  • it won't work with the async database drivers (I know there are a couple of implementations)
  • it's rather verbose
  • CpuPool is fixed-size; in general I'd prefer a more flexible thread pool
  • after the tokio changes that are coming soon, it will look a bit different (but still with a fixed thread pool)

So it works, but I'm not a fan of that approach.

@vorot93

vorot93 commented Mar 27, 2018

@ivanovaleksey Pretty much what @chpio said. This code simply masks the blocking operations behind a thread pool. It doesn't help in the high-load use case, where a large number of incoming DB queries will cause unacceptable delays for each of them.

@hayd

hayd commented Jun 8, 2018

See also the actix-diesel example: https://github.com/actix/examples/blob/f8e3570bd16bcf816070af20f210ce1b4f2fca69/diesel/src/main.rs#L64-L70

Perhaps offering a canonical pool implementation might be useful; that way Diesel could have an async/futures API without using async drivers (yet).

This sync-driver/async-API approach is how Slick (Scala) works (https://stackoverflow.com/a/29597868/1240268) and might suffice to appease the "people who view async as a blocker" (at least some of them).


Aside:

CpuPool is fixed-size; in general I'd prefer a more flexible thread pool

IME it's desirable to have a fixed size (with potentially multiple pools); that way all connections are made on startup... and generally your DB server won't allow an unlimited number of connections anyway.

@chpio

chpio commented Jun 8, 2018

@a3kov

a3kov commented Sep 10, 2018

This article by the SQLAlchemy author is a very good read on the topic (while it's about Python, many of the points there still hold in the Rust world):
http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
