
Use Notify instead of InternalsGuard #164

Closed
wants to merge 10 commits

Conversation

Contributor

levkk commented May 27, 2023

Potential fix for #154

Problem

InternalsGuard is problematic because it can deadlock. Using a ReentrantLock is not possible because both put and drop hold a mutable reference to the internals. Using RefCell won't work either, because it doesn't allow two or more simultaneous mutable borrows, which is what would be required to make this work and would violate Rust's safety guarantees.
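A minimal standalone sketch of that constraint (illustrative only, not bb8 code): RefCell refuses a second mutable borrow while the first one is still alive, so put and drop could not both hold one.

```rust
use std::cell::RefCell;

fn main() {
    let internals = RefCell::new(Vec::<u32>::new());

    // First mutable borrow, standing in for the one held inside `put`.
    let _held_by_put = internals.borrow_mut();

    // A second mutable borrow (standing in for `drop`) while the first is
    // still alive is refused: RefCell allows at most one at a time, and
    // calling `borrow_mut()` here outright would panic at runtime.
    assert!(internals.try_borrow_mut().is_err());
}
```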

Possible Solution

Switch to using Tokio's Notify, which provides a fair queue for waiters. The pool no longer guarantees internally that connections are handed to the tasks that have waited the longest, but this may be fair enough as long as the Tokio scheduler and the kernel scheduler are fair as well. Starvation is possible if the caller retries calls to get indefinitely.
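A minimal sketch of the idea (simplified, hypothetical types rather than bb8's actual internals): idle connections live behind a plain Mutex, and a Notify wakes one waiter whenever a connection is returned.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use tokio::sync::Notify;

struct Pool<C> {
    idle: Mutex<VecDeque<C>>,
    notify: Notify,
}

impl<C> Pool<C> {
    async fn get(&self) -> C {
        loop {
            {
                let mut idle = self.idle.lock().unwrap();
                if let Some(conn) = idle.pop_front() {
                    // Pass the wake-up along if more connections are idle,
                    // so a single stored permit cannot strand other waiters.
                    if !idle.is_empty() {
                        self.notify.notify_one();
                    }
                    return conn;
                }
            }
            // Wait (in FIFO order) until `put` signals that a connection
            // was returned; notify_one wakes only the waiter at the head
            // of the queue, not all of them.
            self.notify.notified().await;
        }
    }

    fn put(&self, conn: C) {
        self.idle.lock().unwrap().push_back(conn);
        self.notify.notify_one();
    }
}

#[tokio::main]
async fn main() {
    let pool = Pool {
        idle: Mutex::new(VecDeque::from([1u32])),
        notify: Notify::new(),
    };
    pool.put(2);
    assert_eq!(pool.get().await, 1);
    assert_eq!(pool.get().await, 2);
}
```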

Additionally, the entire make_pooled function is timed out using the connection_timeout setting. This ensures that there is no internal starvation and that is_valid() is bounded as well; if it weren't, we could starve all tasks and block the caller indefinitely while waiting on a future that never resolves.
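The shape of that timeout is roughly the following (a sketch with a stand-in for make_pooled, not the actual bb8 signatures): a single tokio::time::timeout wrapped around the whole acquisition path bounds waiting, connecting, and is_valid() together.

```rust
use std::time::Duration;
use tokio::time::timeout;

// Stand-in for the whole acquisition path (wait for a slot, connect,
// run the validity check). In bb8 this would be `make_pooled`,
// including the `is_valid()` call.
async fn make_pooled() -> Result<String, &'static str> {
    // Simulate a validity check that hangs far longer than the timeout.
    tokio::time::sleep(Duration::from_secs(60)).await;
    Ok("connection".to_owned())
}

#[tokio::main]
async fn main() {
    let connection_timeout = Duration::from_secs(1);
    // One timeout around the entire future bounds every step inside it,
    // so a hung is_valid() can no longer block the caller indefinitely.
    match timeout(connection_timeout, make_pooled()).await {
        Ok(Ok(conn)) => println!("got {conn}"),
        Ok(Err(err)) => println!("connection error: {err}"),
        Err(_elapsed) => println!("timed out waiting for a connection"),
    }
}
```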

Open Questions

The error forwarding is not clear to me. I removed it, but I'm not entirely sure what it does. May need some help here to understand if I broke something.


codecov bot commented May 27, 2023

Codecov Report

Patch coverage: 96.96% and project coverage change: +0.84% 🎉

Comparison is base (ea5c162) 72.12% compared to head (0dc8eda) 72.96%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #164      +/-   ##
==========================================
+ Coverage   72.12%   72.96%   +0.84%     
==========================================
  Files           6        6              
  Lines         599      603       +4     
==========================================
+ Hits          432      440       +8     
+ Misses        167      163       -4     
| Impacted Files | Coverage Δ |
| --- | --- |
| bb8/src/inner.rs | 87.38% <95.91%> (+1.06%) ⬆️ |
| bb8/src/internals.rs | 95.45% <100.00%> (+2.69%) ⬆️ |



dertin commented Jun 18, 2023

Has there been any progress with the tests? Is this PR blocked waiting on some discussion or support?

Contributor Author

levkk commented Jun 19, 2023

Not really, just waiting for djc to take a look.

Owner

djc commented Sep 26, 2023

Sorry for taking so long to look at this. So if I understand correctly, this fix will make the pool less fair? And also, more tasks have to do work if a number of them are waiting for a connection to become available, whereas previously they would just spend that time sleeping?

Because, while the previous design with waiters meant that earlier requesters would be handed the freed-up connection, in this design all waiters are notified and then make another attempt at getting a connection?

Contributor Author

levkk commented Sep 26, 2023

I don't think so. Notify is fair according to the docs and notify_one notifies only one task, not all of them.

nicolasvan added a commit to OneSignal/pgcat that referenced this pull request Oct 5, 2023
while waiting on djc/bb8#164 to ship
Contributor Author

levkk commented Jan 4, 2024

So if I understand correctly, this fix will make the pool less fair?

I took a look at the implementation of Notify. It uses a FIFO queue [1] [2] for tasks, so the first task to .await will be the first one to be woken up. I think that's good enough to maintain fairness, since the pool will never wake up more tasks than there are available connections. The only way I can think of for a task to skip the queue is a bug in the tokio scheduler.

[1] https://docs.rs/tokio/latest/src/tokio/sync/notify.rs.html#985
[2] https://docs.rs/tokio/latest/src/tokio/sync/notify.rs.html#732
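A small, timing-dependent demo of that FIFO behaviour (illustrative only, separate from the bb8 code itself): three tasks register with a shared Notify in order, and three notify_one calls wake them one at a time in that same order.

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Notify;

#[tokio::main]
async fn main() {
    let notify = Arc::new(Notify::new());
    let mut handles = Vec::new();

    for id in 0..3 {
        let notify = notify.clone();
        handles.push(tokio::spawn(async move {
            notify.notified().await;
            println!("task {id} woken");
        }));
        // Give each task time to register with the Notify before the next
        // one, so the FIFO order of waiters is 0, 1, 2.
        tokio::time::sleep(Duration::from_millis(50)).await;
    }

    // Each notify_one wakes exactly one waiter (the one at the head of
    // the FIFO queue) rather than all of them.
    for _ in 0..3 {
        notify.notify_one();
        tokio::time::sleep(Duration::from_millis(50)).await;
    }

    for handle in handles {
        handle.await.unwrap();
    }
}
```

With the sleeps in place this prints "task 0 woken", "task 1 woken", "task 2 woken" in that order, matching the FIFO queue described above.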

Owner

djc commented Jan 26, 2024

I tried to simplify this a bit, please take a look at #186.
