Unify the way how all the modes for parallel replicas are enabled and used #63521

Open
nikitamikhaylov opened this issue May 8, 2024 · 2 comments

Comments

@nikitamikhaylov
Member

nikitamikhaylov commented May 8, 2024

Use case

There are several modes of parallel replicas and a large number of settings that control them. To make the feature usable, we need to do the following:

  • All the modes should work directly on top of MergeTree tables, without an additional Distributed table on top.
  • Introduce a setting parallel_replicas_mode of Enum type with the values read_tasks, key_hash, key_range and sample_offset. Later on we will introduce another mode, auto.
    • This is a somewhat backward-incompatible change: for the sample_offset mode you will need to set use_parallel_replicas=true, parallel_replicas_mode='sample_offset', max_parallel_replicas=X instead of just max_parallel_replicas=X. I think this is OK, because this mode is not popular among our users and we can document the change well.
    • The same applies to the other modes, but all of them are considered experimental.
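
The proposal above can be illustrated with a small model of how the settings would combine. This is only a sketch in Python, not ClickHouse code: `ParallelReplicasMode` and `effective_mode` are hypothetical names, and the behaviour when the mode is left unset is an assumption that this issue does not settle.

```python
from enum import Enum
from typing import Optional


class ParallelReplicasMode(Enum):
    # The four values proposed in this issue; `auto` is planned for later.
    READ_TASKS = "read_tasks"
    KEY_HASH = "key_hash"
    KEY_RANGE = "key_range"
    SAMPLE_OFFSET = "sample_offset"


def effective_mode(settings: dict) -> Optional[ParallelReplicasMode]:
    """Return the mode a query would run in, or None if the feature is off.

    Assumption: parallel replicas are active only when use_parallel_replicas
    is true AND parallel_replicas_mode is set explicitly.
    """
    if not settings.get("use_parallel_replicas", False):
        return None
    mode = settings.get("parallel_replicas_mode")
    if mode is None:
        return None
    # Enum lookup by value; raises ValueError for an unknown mode name.
    return ParallelReplicasMode(mode)


# Settings that select the sample_offset mode today, per the issue text.
old_style = {"max_parallel_replicas": 3}

# Settings the proposal would require for the same mode.
new_style = {
    "use_parallel_replicas": True,
    "parallel_replicas_mode": "sample_offset",
    "max_parallel_replicas": 3,
}
```

Under these assumptions `effective_mode(old_style)` yields `None`, while `effective_mode(new_style)` yields `ParallelReplicasMode.SAMPLE_OFFSET`, which is the backward-incompatible part: max_parallel_replicas alone no longer selects a mode.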

Additional context

Take a look at the comment in this PR: #63151

@devcrafter
Member

parallel_replicas_mode should make use_parallel_replicas unnecessary, since it can have an empty and/or none value.
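
A rough sketch of this single-setting variant, again in Python rather than ClickHouse code. The names `ModeWithNone`, `parse_mode` and `is_enabled` are hypothetical, and treating an empty string the same as none is an assumption made only for illustration.

```python
from enum import Enum


class ModeWithNone(Enum):
    # Assumption: the "disabled" value is spelled "none"; the issue does not fix this.
    NONE = "none"
    READ_TASKS = "read_tasks"
    KEY_HASH = "key_hash"
    KEY_RANGE = "key_range"
    SAMPLE_OFFSET = "sample_offset"


def parse_mode(value: str) -> ModeWithNone:
    # Assumption: an empty string is treated the same as "none".
    return ModeWithNone(value or "none")


def is_enabled(value: str) -> bool:
    # With a "none" value available, no separate on/off switch is needed.
    return parse_mode(value) is not ModeWithNone.NONE
```

In this variant the mode value alone decides whether parallel replicas are used, so a separate use_parallel_replicas flag would carry no extra information.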

@nikitamikhaylov
Member Author

I would love parallel_replicas_mode to be read_tasks by default.

2 participants