Avoid deadlocks on JOIN Engine tables #29544
Needed for check test under ASAN
Test 01732_race_condition_storage_join_long.sh is interesting with these changes. Now all queries in this test should succeed.
Would some of them fail with timeout error after change?
throw DB::Exception("addJoinedBlock called when HashJoin locked to prevent updates",
    ErrorCodes::LOGICAL_ERROR);
It's not changed in this PR, but I wonder why it's LOGICAL_ERROR. Can it be reached if we perform an INSERT into a JOIN Engine table during a SELECT?
UPD: no, because the lock is already acquired in StorageJoin::insertBlock, but storage_join_lock can be set only from StorageJoin::getJoinLocked, which is used for SELECTing data.
Before the change (master), queries would get stuck forever if a deadlock happened (a read starts and gets the lock, a write starts and asks for the write lock, then the first read tries to get another read lock). With this change, the issue should not appear anymore, as RWLock allows you to jump the lock queue if you already hold a read lock and request another. This is what fixes the bug. The timeouts only happen in the tests because I set the timeout too low for the sanitizers, but if it were large enough the queries should eventually go through. I'm not testing that the timeout occurs for low enough values of
Oops, I see that you were talking about a different test and not the one added in the PR.
Not unless the server is extremely slow for other reasons, and I would say that would be a desirable outcome (respect the settings instead of blocking forever). In the test itself there are only 4 concurrent queries (3 reads and an insert), so I think it's highly unlikely that they would need to wait for 120 to acquire the lock.
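To make the sequence above concrete (read lock held, writer queued, same query asks for a second read lock), here is a minimal sketch of a query-id-aware read/write lock with timeouts. It is a toy written for illustration, not ClickHouse's RWLockImpl: the class QueryAwareRWLock, its methods, the DemoResult struct, and the query ids "q1" and "q2" are all hypothetical names.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>

// Toy writer-preferring RW lock keyed by query id (hypothetical, NOT ClickHouse's RWLockImpl).
class QueryAwareRWLock
{
public:
    // Returns false if the shared lock could not be acquired within `timeout`.
    bool lockShared(const std::string & query_id, std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> guard(mutex);
        // A query that already holds a read lock skips the queue,
        // so a writer waiting behind it cannot deadlock it.
        auto it = readers.find(query_id);
        if (it != readers.end())
        {
            ++it->second;
            return true;
        }
        // New readers queue behind any waiting writer.
        if (!cv.wait_for(guard, timeout, [&] { return !writer_active && waiting_writers == 0; }))
            return false;
        ++readers[query_id];
        return true;
    }

    void unlockShared(const std::string & query_id)
    {
        std::lock_guard<std::mutex> guard(mutex);
        auto it = readers.find(query_id);
        if (it != readers.end() && --it->second == 0)
            readers.erase(it);
        cv.notify_all();
    }

    // Returns false if the exclusive lock could not be acquired within `timeout`.
    bool lockExclusive(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> guard(mutex);
        ++waiting_writers;
        bool ok = cv.wait_for(guard, timeout, [&] { return !writer_active && readers.empty(); });
        --waiting_writers;
        if (ok)
            writer_active = true;
        else
            cv.notify_all(); // readers queued behind this writer may proceed
        return ok;
    }

    void unlockExclusive()
    {
        std::lock_guard<std::mutex> guard(mutex);
        writer_active = false;
        cv.notify_all();
    }

    bool hasWaitingWriters()
    {
        std::lock_guard<std::mutex> guard(mutex);
        return waiting_writers > 0;
    }

private:
    std::mutex mutex;
    std::condition_variable cv;
    std::unordered_map<std::string, int> readers; // query id -> number of read locks held
    int waiting_writers = 0;
    bool writer_active = false;
};

struct DemoResult
{
    bool same_query_reentered = false;
    bool other_query_timed_out = false;
    bool writer_got_lock = false;
};

// Replays the scenario: read lock, queued writer, second read lock from the same query.
DemoResult runDemo()
{
    using namespace std::chrono_literals;
    QueryAwareRWLock lock;
    DemoResult result;
    lock.lockShared("q1", 100ms); // first read lock of query q1
    std::thread writer([&] {
        result.writer_got_lock = lock.lockExclusive(5000ms);
        if (result.writer_got_lock)
            lock.unlockExclusive();
    });
    while (!lock.hasWaitingWriters()) // wait until the writer is queued
        std::this_thread::sleep_for(1ms);
    result.same_query_reentered = lock.lockShared("q1", 100ms);   // jumps the queue
    result.other_query_timed_out = !lock.lockShared("q2", 100ms); // waits behind the writer, then gives up
    lock.unlockShared("q1");
    lock.unlockShared("q1");
    writer.join();
    return result;
}
```

In this sketch the second read from "q1" succeeds even though a writer is queued, a read from an unrelated query times out instead of blocking forever, and the writer proceeds once "q1" releases both of its read locks.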
If I'm not mistaken lock is acquired for the whole
Another question does
Finally, have you researched how difficult it is to solve the problem without a timeout? Like locking the mutex in the correct order, or changing the logic a bit? What is the main issue here?
For SELECT queries the lock is acquired multiple times, the first one I think happens during InterpreterSelectQuery, and then when read() is called. Might be more.
No, the main reason was to avoid the deadlock due to acquiring the same lock multiple times (and somebody else gets in between).
I think it's doable, ideally by just holding the mutex at the start (
So, from what's available I couldn't see a better option, but I'm not saying there isn't one.
I can't access the logs of the failed tests as they point to an internal URL. Are they related to changes in this PR?
Seems that they aren't; the task just timed out (I see a similar issue in other PRs).
What's about |
Thanks!
It only provides exclusive ownership, so you wouldn't be able to have multiple read queries running at the same time.
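As a small illustration of the exclusive versus shared ownership point, the sketch below contrasts std::shared_mutex with std::mutex. The choice of std::mutex as the exclusive-only primitive is an assumption made for the example (the alternative being discussed is not shown in the thread), and both function names are hypothetical.

```cpp
#include <mutex>
#include <shared_mutex>
#include <thread>

// Returns true if a second reader thread gets in while the first reader holds a shared lock.
bool secondReaderAdmittedSharedMutex()
{
    std::shared_mutex m;
    std::shared_lock<std::shared_mutex> first_reader(m);
    bool admitted = false;
    std::thread second_reader([&] {
        // Shared ownership: this does not block on the first reader.
        std::shared_lock<std::shared_mutex> reader(m);
        admitted = true;
    });
    second_reader.join();
    return admitted;
}

// Same scenario with an exclusive-only mutex (assumed here to be std::mutex).
bool secondReaderAdmittedExclusiveMutex()
{
    std::mutex m;
    std::lock_guard<std::mutex> first_reader(m);
    bool admitted = false;
    std::thread second_reader([&] {
        // try_lock from another thread fails while the mutex is held.
        admitted = m.try_lock();
        if (admitted)
            m.unlock();
    });
    second_reader.join();
    return admitted;
}
```

With shared ownership the second reader is admitted; with exclusive ownership it is not, which is why concurrent SELECTs would be serialized.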
Let's mark it as long
Manually backported PRs:
Further backports are possible but they require extra manual work.
Backport #29544 to 21.10: Avoid deadlocks on JOIN Engine tables
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Avoid deadlocks when reading and writing on JOIN Engine tables at the same time
Detailed description / Documentation draft:
The patch ended up requiring way more changes than I wanted. I decided to replace the shared_mutex with a RWLock for several reasons:
Since RWLock requires a query_id to work effectively, I had to add a context parameter in multiple places. The 2 functions where I didn't add it, and instead used RWLockImpl::NO_QUERY, were totalRows and totalBytes, as their declaration is used in 10 other places, but it could be done if we wanted to.
The added test fails all the time on my system pre-change (usually even in the second iteration of the loop), but I had to increase the size of the table until it was reliably deadlocking the database, as it's purely time-based.
Fixes #29485
The bug affects all stable releases so it would be great to have it backported to 21.8+ at least. Let me know if I can help there if the proposed solution is accepted.