
Stuck in "Timed out in bb8" #67

Closed
m9xiuz opened this issue May 19, 2020 · 31 comments

Comments

m9xiuz commented May 19, 2020

I have a web API application that uses bb8-postgres, tokio-postgres and hyper. When I started running performance tests, I noticed that if there are too many requests, it shows the error "Timed out in bb8" and never recovers; only restarting the application gets rid of it.
I create a single instance of bb8::Pool and clone it for each request. Then I call db_pool.run(|conn| async { Ok((<some_func_that_does_queries_using_conn>.await, conn)) }).await.
I'm not sure whether this is a bug or I'm doing something wrong. Could you help me, please?

djc (Owner) commented May 19, 2020

I'm afraid there are still ways to "leak" connections out of the pool. Can you tell me how you have configured/instantiated the Pool?

m9xiuz (Author) commented May 19, 2020

use bb8_postgres::PostgresConnectionManager;

pub type Pool = bb8::Pool<bb8_postgres::PostgresConnectionManager<tokio_postgres::NoTls>>;
pub type Conn = tokio_postgres::Client;

pub async fn create_pool(settings: &Settings) -> Result<Pool, tokio_postgres::Error> {
    let db_settings = &settings.database;

    let pg_mgr = PostgresConnectionManager::new_from_stringlike(
        format!(
            "postgresql://{}:{}@{}:{}/{}",
            &db_settings.user, &db_settings.pass,
            &db_settings.host, &db_settings.port,
            &db_settings.name,
        ),
        tokio_postgres::NoTls,
    ).map_err(|err| {
        error!("{}", err);
    }).unwrap();

    Pool::builder().build(pg_mgr).await
}

m9xiuz (Author) commented May 19, 2020

and then I pass it to hyper like this:

    let db_pool = create_pool(settings).await?;
    let addr = (host, port).into();

    let make_service = make_service_fn(move |_| {
        let db_pool = db_pool.clone();
        async move {
            Ok::<_, Error>(service_fn(move |req| {
                let db_pool = db_pool.clone();
                
                async move {
                    handle_request(req, db_pool).await
                }
            }))
        }
    });

m9xiuz (Author) commented May 19, 2020

In Cargo.toml:
bb8-postgres = "0.4"
and I use re-exported bb8 and tokio-postgres from https://docs.rs/bb8-postgres/0.4.0/bb8_postgres/#reexports
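
For reference, pulling the types through those re-exports looks roughly like this (a sketch based on the docs linked above):

// bb8 and tokio_postgres are re-exported by bb8-postgres, so only the one
// dependency needs to be declared in Cargo.toml.
use bb8_postgres::{bb8, tokio_postgres, PostgresConnectionManager};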

djc (Owner) commented May 21, 2020

Can you try with #68? I think this should fix the problem.

m9xiuz (Author) commented May 21, 2020

I've changed Cargo.toml:
bb8-postgres = { git = "https://github.com/khuey/bb8", branch = "inner-guard" }
and rebuilt the app, and it didn't help.
Btw, I've already switched to another connection pool library and it works great.

fakeshadow commented May 21, 2020

It could be because of the unfairness of the Mutex: certain threads acquire the lock quickly while others starve. It's only noticeable when the lock is under very heavy load.

djc (Owner) commented May 21, 2020

Interesting. I pushed a change in 8ec0c0a to switch to tokio's Mutex, which is fair.

m9xiuz (Author) commented May 21, 2020

This commit also didn't help solve the problem. With this connection pool https://github.com/bikeshedder/deadpool everything is ok. Maybe you could check how it's implemented and do the same thing, but I suspect its implementation is fundamentally different.

djc (Owner) commented May 21, 2020

Yeah, deadpool uses a very different approach. I have one more idea in progress; I'd be happy if you could test it as soon as I finish it.

m9xiuz (Author) commented May 22, 2020

Ok, I'm still here.

djc (Owner) commented May 22, 2020

I just pushed some more changes to the inner-guard branch. If you have a chance to test those, that would be great!

m9xiuz (Author) commented May 22, 2020

No, it doesn't fix the problem. The error message doesn't appear anymore; requests just hang forever.

djc (Owner) commented May 22, 2020

So how are you stress-testing your project?

m9xiuz (Author) commented May 22, 2020

I used JMeter. But I noticed that the problem appears even if I simply take the URL of the slowest GET request (~100 ms), put it into Firefox's address bar, and press F5 rapidly many times (it takes about 10 seconds for the problem to appear). So I tested your changes this way rather than with the JMeter tests.
After your last change, the first requests work fine, but eventually everything hangs forever and I cannot send any request anymore, neither from the browser nor from curl. The error message no longer appears, though. It behaves as if all timeouts were infinite, which is why it doesn't complain about timeout errors anymore, but the problem is still there.

sync commented May 23, 2020

I was experiencing the same issue when testing pooling with Redis: it would get stuck in "Timed out in bb8" after refreshing my browser a couple of times, using warp and WebSockets.

fakeshadow commented:

> Interesting. I pushed a change in 8ec0c0a to switch to tokio's Mutex, which is fair.

I was just guessing, so I could be wrong. Although tokio's Mutex is backed by a FIFO linked list, which makes it fair, it also means a connection being returned to the pool can wait a very long time for the lock, or even deadlock. I imagine a lock that is fair but also gives returning connections the highest priority in the wait list would be best for bb8.

fggarcia commented:

Hi @djc, have you made any progress on this issue? I know cdrs uses bb8 as its connection pool for Cassandra.

djc (Owner) commented Oct 23, 2020

I have a branch in progress, but haven't been able to spend much time on it recently. I'll see if I can spend some time on it soon.

djc (Owner) commented Nov 5, 2020

I've spent a bunch of time over the past few days improving the internals, and I think issues like this should be fixed. If you're still interested, please give current master a shot and let me know if it improves things for you.

djc mentioned this issue Nov 5, 2020

fggarcia commented Nov 5, 2020

Hi @djc, thank you so much. I read elsewhere that someone has tried the master version and the issue is solved. Do you plan to release a new version as 0.4.x?

djc (Owner) commented Nov 5, 2020

No, in order to provide more robustness I've had to make some API changes. In particular, the run() API has been removed, and ManageConnection::is_valid() now takes a &mut PooledConnection. So these changes will be released as 0.5, though I was hoping to gather some feedback before I release it.
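
For anyone migrating, a rough sketch of the difference in calling code (just a fragment; pool is a bb8::Pool, and do_query stands in for whatever the application runs on the connection, it is not part of bb8's API):

// bb8 0.4: the closure-based run() API, where the connection was handed back
// together with the result (the pattern used earlier in this thread):
//
//     db_pool.run(|conn| async { Ok((do_query(&conn).await, conn)) }).await
//
// bb8 0.5: check a connection out with get(); it goes back to the pool
// automatically when the returned guard is dropped at the end of the scope.
let conn = pool.get().await?;
let result = do_query(&conn).await;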

I'll likely also release the 0.6 version soon after, which will rely on tokio 0.3. That will not come with redis support until the redis crate is also updated to tokio 0.3, though.

This is just my thinking, any feedback on what people are looking for is much appreciated.

djc (Owner) commented Nov 15, 2020

0.5.0 has been released, so I'm going to close this issue for now. If you still see issues, please open a new issue!

djc closed this as completed Nov 15, 2020

lassipulkkinen commented:

This seems to be reproducible (again?) in 0.6.0. To clarify, I'm using bb8-postgres. This consistently happens when I stress test my server using wrk with a large number of connections.

djc (Owner) commented Dec 10, 2020

Can you say a bit more about how you've configured the pool and how you use it? Just pool.get()?

lassipulkkinen commented Dec 10, 2020

Just pool.get(). The pool is configured with a maximum size of 20, and default values for everything else.

I made a wrapper around the future returned by Pool::get that logs when it's polled or dropped. Looks like some of those futures were being dropped before they completed, i.e. Hyper was canceling some request handlers early. Therefore this very likely has to do with bb8 not handling canceling properly. (EDIT: the earlier fixes seem to be related to this as well.)
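
Something along these lines (a generic sketch rather than the exact wrapper I used; the names and logging are made up for illustration):

use std::{
    future::Future,
    pin::Pin,
    task::{Context, Poll},
};

// Wraps a future (e.g. Box::pin(pool.get()), which satisfies Unpin) and logs
// every poll, plus whether it was dropped before it ever completed.
struct Traced<F> {
    inner: F,
    name: &'static str,
    completed: bool,
}

impl<F: Future + Unpin> Future for Traced<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        println!("{}: polled", self.name);
        let result = Pin::new(&mut self.inner).poll(cx);
        if result.is_ready() {
            self.completed = true;
        }
        result
    }
}

impl<F> Drop for Traced<F> {
    fn drop(&mut self) {
        if !self.completed {
            println!("{}: dropped before completion", self.name);
        }
    }
}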

As for why the handlers were being canceled, my best guess is that at the end of the benchmark wrk forcefully closed all the connections that were still waiting for a response, and Hyper decided that there was no reason to wait for their handlers to complete.

lassipulkkinen commented Dec 12, 2020

This minimally reproduces the bug:

use std::{convert::Infallible, future::Future, task::Poll};

use async_trait::async_trait;
use futures::future::poll_fn;

struct ConnectionManager;
struct Connection;

#[async_trait]
impl bb8::ManageConnection for ConnectionManager {
    type Connection = Connection;
    type Error = Infallible;

    async fn connect(&self) -> Result<Self::Connection, Self::Error> {
        Ok(Connection)
    }

    async fn is_valid(&self, _: &mut bb8::PooledConnection<'_, Self>) -> Result<(), Self::Error> {
        Ok(())
    }

    fn has_broken(&self, _: &mut Self::Connection) -> bool {
        false
    }
}

// Works with flavor = "current_thread", too.
#[tokio::main]
async fn main() {
    let pool = bb8::Pool::builder().max_size(20).build(ConnectionManager).await.unwrap();

    let mut connections = Vec::new();

    // With flavor = "current_thread" this issue never occurs if the number of connections acquired
    // here is less then the pool size. With flavor = "multi_thread", however, this still
    // *sometimes* occurs as long as at least one connection is acquired.
    for _ in 0..20u32 {
        connections.push(pool.get().await);
    }

    let mut futures = Vec::new();

    for _ in 0..20u32 {
        futures.push(Box::pin(pool.get()));

        // Poll the future once.
        poll_fn(|context| {
            let _ = futures.last_mut().unwrap().as_mut().poll(context);
            Poll::Ready(())
        }).await;
    }

    // The order in which these are dropped is important, and the futures need to be dropped, not awaited.
    drop(connections);
    drop(futures);

    pool.get().await.unwrap(); // error!
}

djc (Owner) commented Dec 12, 2020

That is awesome, thanks! I'll check it out ASAP.

to266 added a commit to to266/eventually-rs that referenced this issue Dec 21, 2020
djc (Owner) commented Dec 21, 2020

@lassipulkkinen #91 fixes your minimal reproduction. Can you check if it fixes your actual application?

lassipulkkinen commented:

> @lassipulkkinen #91 fixes your minimal reproduction. Can you check if it fixes your actual application?

That fixed it, thanks!

djc (Owner) commented Dec 22, 2020

I released this fix as 0.5.2 (tokio 0.2) and 0.6.2 (tokio 0.3), thanks for the quick feedback!

ar3s3ru added a commit to get-eventually/eventually-rs that referenced this issue Dec 27, 2020