Fix RWLock inconsistency after write lock timeout #57454
Conversation
This is an automated comment for commit 179a0a2 with a description of existing statuses. It's updated for the latest CI run.
Force-pushed from 6c21a70 to d5c043d.
@alesapin It says "Bugfix validate check — Changed tests don't reproduce the bug", but that's not true: the bug is reproduced well in the unit tests.
In general it looks great. I've left some comments about minor things.
}

String RWLockImpl::getOwnerQueryIdsDescription() const
I'd recommend fmt::formatter to do things like this. It makes formatting containers much simpler.
Not necessary for the PR, just something to keep in mind.
Here we build our output in a loop; fmt::formatter doesn't seem to be much better in this case.
)

def truncate_tables():
    while time.time() < end_time:
        table_name = f"mydb.tbl{randint(1, num_nodes)}"
        node = nodes[randint(0, num_nodes - 1)]
        node.query(f"TRUNCATE TABLE IF EXISTS {table_name} SYNC")
        # "TRUNCATE TABLE IF EXISTS" can still throw some errors (e.g. "WRITE locking attempt on node0 has timed out!")
I guess this depends on the table type, right? MergeTree should not require an exclusive lock for truncate, as it's going to replace the old parts with an empty range.
OTOH, in the case of DROP it depends on the database type (Atomic is fine).
For MergeTree-based tables TRUNCATE doesn't require an exclusive lock (see); however, this test can create "Log" tables too. I added a comment about that to the test.
@@ -41,8 +41,8 @@ RWLockImpl::LockHolder IStorage::tryLockTimed(
 {
     const String type_str = type == RWLockImpl::Type::Read ? "READ" : "WRITE";
     throw Exception(ErrorCodes::DEADLOCK_AVOIDED,
-        "{} locking attempt on \"{}\" has timed out! ({}ms) Possible deadlock avoided. Client should retry",
-        type_str, getStorageID(), acquire_timeout.count());
+        "{} locking attempt on \"{}\" has timed out! ({}ms) Possible deadlock avoided. Client should retry. Owner query ids: {}",
Although this might be inconsistent (time to check vs time to read), it's an amazing QOL improvement. It'd have been extremely useful to have it in the past.
@@ -169,11 +178,12 @@ RWLockImpl::getLock(RWLockImpl::Type type, const String & query_id, const std::c
     if (rdlock_owner == readers_queue.end() && wrlock_owner == writers_queue.end())
     {
         (type == Read ? rdlock_owner : wrlock_owner) = it_group; /// SM2: nothrow
This ternary conditional seems wrong, as this can only be reached if type == Read. Otherwise writers_queue won't be empty.
readers_queue and writers_queue can't both be empty here; at least one of them must contain it_group.
But we don't check those queues for emptiness here, we assign rdlock_owner or wrlock_owner to it_group.
This condition is not wrong.
{
    if (rdlock_owner != readers_queue.end())
    {
        for (;;)
There should be, at most, one group of readers pending, right?
You can have N readers that are already owners, a writer, and then only one block of readers all together. AFAICS it's not possible to have more than one group of readers that aren't owners, but I might be missing something.
Yes, we append readers to the same reader group if this group doesn't have ownership yet.
We create another reading group only if the last reading group is already an owner.
But there can be multiple reader groups with ownership.
I've added chassert() to check that.
        wrlock_owner = writers_queue.begin();
    }
    else
Is this even possible anymore? If we removed all readers there must be a writer or nothing in the queue, but never another reader group.
There can be an active reader group, then a writer, then another (inactive) reading group.
We remove the active reader group, then we activate the writer.
src/Common/tests/gtest_rw_lock.cpp
Outdated
if (timepoint.length() < 5)
    timepoint.insert(0, 5 - timepoint.length(), ' ');
std::lock_guard lock{mutex};
std::cout << timepoint << " : " << event << std::endl;
We can remove the prints now and leave the asserts
done
Force-pushed from 49415fe to ec5348a.
@vitlibar a test has failed: https://s3.amazonaws.com/clickhouse-test-reports/57712/6b55c16b4ed40864aa0577fa61a9a6a41c12912d/unit_tests__msan_.html I've temporarily reverted it.
Changelog category:
Changelog entry:
Fix RWLock inconsistency after write lock timeout.
This PR provides a correct fix for the issue described in #38864. The fix in #38864 was not complete: after it, an exclusive lock failure could leave RWLock in an inconsistent state, so it was not actually unlocked after all locks were released. See #42719 (comment)