-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent deadlock when notifying shard of ledger #3683
Conversation
2aeed84
to
52c7e94
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I had one question about who closes the backend_
in one case, and left two nits. Other than that, nice solution to the deadlock issue!
{ | ||
JLOG(j_.error()) << "shard " << index_ | ||
<< " missing acquire SQLite database"; | ||
auto const scopedCount{makeBackendCount()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On line 153 of this file (not in this changeset), we check the backendCount_ and exit early if in use. We also lock mutex_
but not storedMutex_
there. The old code would have closed the backend_
but the new code may not.
Who closes backend_
in such a case? Is it DatabaseShardImp::sweep()
? If nobody does, an easy fix is to lock the storedMutex
there as well. If it's not an issue, the of couse fine as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, DatabaseShardImp::sweep()
will close it or the shard Dtor. I don't see an issue with tryClose
and storedMutex_
as is, but maybe I missed something you saw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I compiled these changes together with Deterministic shards PR and run it on mainnet. It successfully collecting shards.
79839d4
to
561e437
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
561e437
to
c0f64d9
Compare
High Level Overview of Change
This change is primarily about preventing a potential thread deadlock when notifying shards on ledger completions.
Context of Change
Shards are notified by the inbound ledger process when pertaining ledgers have been acquired. Upon notification, the SQLite database is populated with ledger data. If the ledger data resides in the same shard, it is possible to encounter a deadlock when shard fetch calls are made while storing to the SQLite database.
Type of Change