Reentrant deadlock #154

nmldiegues · 2023-03-31T22:51:55Z

We're using bb8 (thanks for doing it!) in a service handling lots of traffic, with bb8 used millions of times per minute.
Today we hit a deadlock that seems to be induced by a single task/thread scenario due to reentrancy:

Thread 3 (Thread 0x7f4d193fe700 (LWP 99758)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x0000563841825ea5 in parking_lot_core::thread_parker::imp::ThreadParker::futex_wait () at /cfsetup_build/vendor/parking_lot_core/src/thread_parker/linux.rs:112
#2  parking_lot_core::thread_parker::imp::{impl#0}::park () at /cfsetup_build/vendor/parking_lot_core/src/thread_parker/linux.rs:66
#3  parking_lot_core::parking_lot::park::{closure#0}<parking_lot::raw_mutex::{impl#3}::lock_slow::{closure_env#0}, parking_lot::raw_mutex::{impl#3}::lock_slow::{closure_env#1}, parking_lot::raw_mutex::{impl#3}::lock_slow::{closure_env#2}> () at /cfsetup_build/vendor/parking_lot_core/src/parking_lot.rs:635
#4  parking_lot_core::parking_lot::with_thread_data<parking_lot_core::parking_lot::ParkResult, parking_lot_core::parking_lot::park::{closure_env#0}<parking_lot::raw_mutex::{impl#3}::lock_slow::{closure_env#0}, parking_lot::raw_mutex::{impl#3}::lock_slow::{closure_env#1}, parking_lot::raw_mutex::{impl#3}::lock_slow::{closure_env#2}>> () at /cfsetup_build/vendor/parking_lot_core/src/parking_lot.rs:207
#5  parking_lot_core::parking_lot::park<parking_lot::raw_mutex::{impl#3}::lock_slow::{closure_env#0}, parking_lot::raw_mutex::{impl#3}::lock_slow::{closure_env#1}, parking_lot::raw_mutex::{impl#3}::lock_slow::{closure_env#2}> () at /cfsetup_build/vendor/parking_lot_core/src/parking_lot.rs:600
#6  parking_lot::raw_mutex::RawMutex::lock_slow () at src/raw_mutex.rs:262
#7  0x0000563841a3ef55 in parking_lot::raw_mutex::{impl#0}::lock () at vendor/parking_lot/src/raw_mutex.rs:72
#8  lock_api::mutex::Mutex::lock<parking_lot::raw_mutex::RawMutex, bb8::internals::PoolInternals<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>>> () at vendor/lock_api/src/mutex.rs:214
#9  bb8::internals::{impl#4}::drop<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>> () at vendor/bb8/src/internals.rs:191
#10 0x000056384185007a in core::ptr::drop_in_place<bb8::internals::InternalsGuard<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>>> () at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/ptr/mod.rs:490
#11 core::ptr::drop_in_place<core::option::Option<bb8::internals::InternalsGuard<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>>>> () at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/ptr/mod.rs:490
#12 core::ptr::drop_in_place<core::cell::UnsafeCell<core::option::Option<bb8::internals::InternalsGuard<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>>>>> () at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/ptr/mod.rs:490
#13 core::ptr::drop_in_place<futures_channel::lock::Lock<core::option::Option<bb8::internals::InternalsGuard<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>>>>> () at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/ptr/mod.rs:490
#14 core::ptr::drop_in_place<futures_channel::oneshot::Inner<bb8::internals::InternalsGuard<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>>>> () at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/ptr/mod.rs:490
#15 alloc::sync::Arc::drop_slow<futures_channel::oneshot::Inner<bb8::internals::InternalsGuard<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>>>> () at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/alloc/src/sync.rs:1123
#16 0x0000563841914acc in alloc::sync::{impl#27}::drop<futures_channel::oneshot::Inner<bb8::internals::InternalsGuard<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>>>> () at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/alloc/src/sync.rs:1745
#17 core::ptr::drop_in_place<alloc::sync::Arc<futures_channel::oneshot::Inner<bb8::internals::InternalsGuard<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>>>>> () at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/ptr/mod.rs:490
#18 core::ptr::drop_in_place<futures_channel::oneshot::Sender<bb8::internals::InternalsGuard<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>>>> () at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/ptr/mod.rs:490
#19 futures_channel::oneshot::Sender::send<bb8::internals::InternalsGuard<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>>> () at vendor/futures-channel/src/oneshot.rs:343
#20 0x0000563841a251d7 in bb8::internals::PoolInternals::put<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>> () at vendor/bb8/src/internals.rs:74
#21 0x0000563841836ac4 in bb8::inner::PoolInner::put_back<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>> () at vendor/bb8/src/inner.rs:166
#22 0x0000563841b139f7 in bb8::api::{impl#10}::drop<quicksilver_client::conn_pool::QuicksilverConnManager<quicksilver_client::conn::MemcachedConn, quicksilver_client::new::{async_fn#0}::{closure_env#0}, quicksilver_client::new::{async_fn#0}::{closure#0}::{async_block_env#0}>> () at vendor/bb8/src/api.rs:364

This is using tag v0.8.0.

If you ignore the boilerplate, you'll see essentially that:

a connection was being returned to the pool
that logic acquires the internals lock
then it calls PoolInternals#PoolInternals
which creates a InternalsGuard
now while the InternalsGuard is in the call stack, everything gets dropped (this is possible in our case because requests are handled in a tokio runtime with a timeout per request, and timeout happened here)
so InternalsGuard#drop is called, which tries to acquire the internals lock <-- and voila, we hit a deadlock since the lock is not reentrant

The text was updated successfully, but these errors were encountered:

djc · 2023-04-01T12:04:35Z

Thanks for digging into this! Would you be able to submit a PR to fix it?

nmldiegues · 2023-04-03T09:59:17Z

Thanks for digging into this! Would you be able to submit a PR to fix it?

Fixing it with parking_lot should be trivial because we can just use https://docs.rs/parking_lot/latest/parking_lot/type.ReentrantMutex.html --- see 5c1fe4a for the example patch on top of v0.8.0 (see below for why I did this)

However I see this repo very recently had this change #152 which makes that trivial fix impossible.

Any other fix is going to require a design change as to how the connection is managed throughout the code, which puts us at a crossroads here.

djc · 2023-05-26T11:06:23Z

Any other fix is going to require a design change as to how the connection is managed throughout the code, which puts us at a crossroads here.

What kind of design changes? Would it introduce a lot of extra complexity or just move the code around?

drdrsh · 2023-05-27T02:15:50Z

@nmldiegues I think I am running into the same issue and I am able to reproduce it. I tried your patch I am a getting panic

panicked at 'already borrowed: BorrowMutError', /usr/local/cargo/git/checkouts/bb8-c4fdc7c4996dd882/5c1fe4a/bb8/src/internals.rs:196

djc · 2024-01-26T11:52:35Z

I've submitted #186 which tries to fix this.

To get djc/bb8#186 and djc/bb8#189 which fix potential deadlocks (djc/bb8#154)

To get djc/bb8#186 and djc/bb8#189 which fix potential deadlocks (djc/bb8#154). Also, this (djc/bb8#225) was needed to prevent a connection leak which was conveniently spotted in our integration tests.

* Update bb8 to 0.8.6 To get djc/bb8#186 and djc/bb8#189 which fix potential deadlocks (djc/bb8#154). Also, this (djc/bb8#225) was needed to prevent a connection leak which was conveniently spotted in our integration tests. * Ignore ./.bundle (created by dev console) --------- Co-authored-by: Jose Fernandez (magec) <joseferper@gmail.com>

nmldiegues added a commit to nmldiegues/bb8 that referenced this issue Apr 3, 2023

Fix djc#154 on top of v0.8.0

5c1fe4a

This was referenced May 27, 2023

Switch to Notify levkk/bb8#2

Closed

Use Notify instead of InternalsGuard #164

Closed

djc mentioned this issue Jun 9, 2023

bb8 deadlocks in rust-pool-benchmark #122

Closed

djc mentioned this issue Jan 26, 2024

Use Notify to coordinate waiters #186

Merged

djc closed this as completed in #186 Jan 29, 2024

nicolasvan added a commit to OneSignal/pgcat that referenced this issue Mar 18, 2024

Update bb8 to 0.8.3

049b93c

To get djc/bb8#186 and djc/bb8#189 which fix potential deadlocks (djc/bb8#154)

nicolasvan mentioned this issue Mar 18, 2024

Bump bb8 from 0.8.1 to 0.8.6 postgresml/pgcat#709

Merged

magec pushed a commit to OneSignal/pgcat that referenced this issue Oct 23, 2024

Update bb8 to 0.8.3

96a09e1

To get djc/bb8#186 and djc/bb8#189 which fix potential deadlocks (djc/bb8#154)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reentrant deadlock #154

Reentrant deadlock #154

nmldiegues commented Mar 31, 2023

djc commented Apr 1, 2023 •

edited

Loading

nmldiegues commented Apr 3, 2023 •

edited

Loading

djc commented May 26, 2023

drdrsh commented May 27, 2023

djc commented Jan 26, 2024

Reentrant deadlock #154

Reentrant deadlock #154

Comments

nmldiegues commented Mar 31, 2023

djc commented Apr 1, 2023 • edited Loading

nmldiegues commented Apr 3, 2023 • edited Loading

djc commented May 26, 2023

drdrsh commented May 27, 2023

djc commented Jan 26, 2024

djc commented Apr 1, 2023 •

edited

Loading

nmldiegues commented Apr 3, 2023 •

edited

Loading