CLICKHOUSE-606: query deduplication based on parts' UUID #17348
Conversation
}
catch (const DB::Exception & ex)
{
    if (ex.code() == ErrorCodes::DUPLICATED_UUIDS)
I have an awful failure case for this where it will never make any progress:
- 3 shard cluster
- shard 1 has part 1, shard 2 has part 2 and shard 3 has both part 1 and part 2 (2 concurrent part moves).
- the packet with part uuids is received first from shard 1
- then from shard 3
- now this query is resent as it contains part duplicates (in flight)
- packet with part uuids is received from shard 2
- packet with part uuids is received from shard 3 which again contains duplicates
One solution is multiple retries with a very awful worst case.
I'm ok with ignoring this case for now but it is worth being mentioned.
> One solution is multiple retries with a very awful worst case.
From the initial proposal #13574
duplicated parts can be found during the query processing only during a short period of time:
- destination shard attaches a recently moved part
X - (DEDUPLICATION HAPPENS HERE) - source shard detaches moving part
In multi-part movement it's a known tradeoff -> better to fail and retry later, when the switching is done.
Surely, the number of retries can be made configurable later if necessary, but I don't think we should proceed with this now.
And max_distributed_connections=1 is a workaround for such a case, right?
auto prev_parts = parts;
parts.clear();

Context & query_context = context.hasQueryContext() ?
    const_cast<Context &>(context).getQueryContext() : const_cast<Context &>(context);
Better to make get*PartUUIDs const than to const_cast the context each time.
but there's initialization inside getter:
auto lock = getLock();
if (!part_uuids)
    part_uuids = std::make_shared<PartUUIDs>();
hence this dirty trick...
}

/// populate UUIDs and exclude ignored parts if enabled
if (query_context.getSettingsRef().allow_experimental_query_deduplication && part->uuid != UUIDHelpers::Nil)
As far as I know, currently, all parts have UUIDs. So we will send all of them each time?
Only when the setting allow_experimental_query_deduplication=1 is enabled. Later, when the part-moving logic is implemented, it will be possible to look at a memtable/ZK to tell which parts are switching and should be deduplicated.
@@ -1749,6 +1749,9 @@ class Client : public Poco::Util::Application

switch (packet.type)
{
    case Protocol::Server::PartUuids:
        return true;
It seems the client should not receive them (if so, LOGICAL_ERROR looks better), or am I missing something?
It can, potentially: the server can have parts with UUIDs, and if the setting is enabled, it will send them.
I'd just ignore this packet for now, rather than change the protocol so that the client explicitly asks for those UUIDs to be sent back (which is currently done via the setting).
committed a changed submodule by mistake, fixed
* add query data deduplication, excluding duplicated parts in MergeTree family engines. Query deduplication is based on parts' UUIDs, which should first be enabled with the merge_tree setting assign_part_uuids=1; the allow_experimental_query_deduplication setting enables part deduplication and defaults to false. A data part UUID is a mechanism for giving a data part a unique identifier. Having UUIDs and a deduplication mechanism provides the potential of moving parts between shards while preserving data consistency on the read path: duplicated UUIDs will cause the root executor to retry the query against one of the replicas, explicitly asking to exclude the encountered duplicated fingerprints during distributed query execution. NOTE: this implementation doesn't provide any knobs to lock a part and hence its UUID. Any mutation/merge will update the part's UUID.

* add _part_uuid virtual column, allowing the use of UUIDs in predicates.

Signed-off-by: Aleksei Semiglazov <asemiglazov@cloudflare.com>

address comments
Ok, the code is quite isolated, so we can merge it with the experimental flag.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Distributed query deduplication is a follow-up to #16033 and partially resolves the proposal #13574.

Query deduplication is based on parts' UUIDs, which should first be enabled with the merge_tree setting assign_part_uuids=1. The allow_experimental_query_deduplication setting enables part deduplication; it defaults to false.

A data part UUID is a mechanism for giving a data part a unique identifier. Having UUIDs and a deduplication mechanism provides the potential of moving parts between shards while preserving data consistency on the read path: duplicated UUIDs will cause the root executor to retry the query against one of the replicas, explicitly asking to exclude the encountered duplicated fingerprints during distributed query execution.

NOTE: this implementation doesn't provide any knobs to lock a part and hence its UUID. Any mutation/merge will update the part's UUID.