Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel reading from replicas #29279

Conversation

nikitamikhaylov
Copy link
Member

@nikitamikhaylov nikitamikhaylov commented Sep 22, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Added an ability to read from all replicas within a shard during distributed query. To enable this, set allow_experimental_parallel_reading_from_replicas=true and max_parallel_replicas to any number. This closes #26748

Detailed description / Documentation draft:
We already have such mechanism, but it works only for tables with SAMPLE BY key. This feature will work for any kind of MergeTree tables. If max_parallel_replicas is set to some value > 1, then the old algorithm is enabled (backward compatible change).

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Sep 22, 2021
@nikitamikhaylov nikitamikhaylov added the force tests The label does nothing, NOOP, None, nil label Sep 22, 2021
@CLAassistant
Copy link

CLAassistant commented Sep 28, 2021

CLA assistant check
All committers have signed the CLA.

/// We are the first who wants to process parts in partition
if (partition_it == partitions.end())
{
auto part_and_projection = request.part_name + "#" + request.projection_name;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we store part name and projection separately?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to underline their strict connectivity and also this concatenation simplifies the code a little bit (There are no additional checks that we want to read from the same projection). Maybe we can move them to std::pair or somewhat similar.

result.intersect(intervals_to_do);

/// Update intervals_to_do
intervals_to_do.intersect(HalfIntervals::initializeFromMarkRanges(std::move(request.mark_ranges)).negate());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should add some comments, from your paper :)

Copy link
Member

@KochetovNicolai KochetovNicolai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally ok

@KochetovNicolai
Copy link
Member

Ok, just need to disable for FINAL and we can merge.

@nikitamikhaylov
Copy link
Member Author

@Mergifyio update

@mergify
Copy link
Contributor

mergify bot commented Dec 8, 2021

update

✅ Branch has been successfully updated

@nikitamikhaylov nikitamikhaylov merged commit dbf5091 into ClickHouse:master Dec 9, 2021
@ka1bi4
Copy link
Contributor

ka1bi4 commented Dec 11, 2021

Internal documentation ticket: DOCSUP-19955

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
force tests The label does nothing, NOOP, None, nil 🎅 🎁 gift🎄 To make people wonder pr-feature Pull request with new product feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallel processing on replicas, reworked.
6 participants