Unify how all parallel replicas modes are enabled and used #63521

nikitamikhaylov · 2024-05-08T12:26:17Z

Use case

There are several parallel replicas modes and a large number of settings. To make the feature usable, we need to do the following:

All modes should work directly on top of MergeTree tables, without an additional Distributed table on top.
Introduce a setting parallel_replicas_mode of Enum type with the values read_tasks, key_hash, key_range and sample_offset. Later we will introduce another mode, auto.
- This is a somewhat backward-incompatible change, because for the sample-offset mode you will need to set use_parallel_replicas=true, parallel_replicas_mode='sample_offset', max_parallel_replicas=X instead of just max_parallel_replicas=X. I think this is OK, because this mode is not popular among our users and we can document it well.
- The same applies to the other modes, but all of them are considered experimental.
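
The sample-offset change described above could look roughly like this. It is only a sketch of the proposal: use_parallel_replicas and parallel_replicas_mode are written as proposed in this issue and may differ from what is eventually implemented, and the table name hits_sampled and the replica count 3 are made up for illustration.

```sql
-- Before (as described above): sample-offset mode is selected
-- by setting max_parallel_replicas alone.
SELECT count()
FROM hits_sampled  -- hypothetical table name
SETTINGS max_parallel_replicas = 3;

-- After (proposed in this issue, not an existing interface):
-- the mode has to be named explicitly.
SELECT count()
FROM hits_sampled  -- hypothetical table name
SETTINGS
    use_parallel_replicas = true,
    parallel_replicas_mode = 'sample_offset',
    max_parallel_replicas = 3;
```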

Additional context

Take a look at the comment in this PR: #63151

devcrafter · 2024-05-08T21:25:59Z

parallel_replicas_mode should make use_parallel_replicas unnecessary, since it can take an empty and/or none value.

nikitamikhaylov · 2024-05-09T00:04:57Z

I would love parallel_replicas_mode to be read_tasks by default.

nikitamikhaylov added usability unfinished code labels May 8, 2024

nikitamikhaylov mentioned this issue May 8, 2024

Parallel replicas feature is Beta #63151
