Skip to content

Cannot use projection when parallel replicas is enabled. #82659

@RinChanNOWWW

Description

@RinChanNOWWW

ClickHouse version: the latest codes from master.
I create a test shard with 2 replicas, and create a ReplicatedMergeTree with projection in this test cluster.

CREATE TABLE default.test_proj
(
    `event_date` Date,
    `user_id` UInt32,
    `value` UInt32,
    `category` String,
    PROJECTION p_category_sum
    (
        SELECT
            category,
            sum(value) AS total_value
        GROUP BY category
    )
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test_proj', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id)
SETTINGS index_granularity = 8192;

INSERT INTO test_proj VALUES ('2024-06-01', 1, 100, 'A'), ('2024-06-01', 2, 200, 'B'), ('2024-06-02', 3, 150, 'A'),('2024-06-02', 4, 300, 'C');
INSERT INTO test_proj VALUES ('2024-06-01', 1, 100, 'D');
INSERT INTO test_proj VALUES ('2024-06-01', 1, 100, 'F');

I test several cases to use the projection:

-- 1. disable parallel replicas
EXPLAIN
SELECT
    category,
    sum(value) AS total_value
FROM test_proj
GROUP BY category
SETTINGS allow_experimental_parallel_reading_from_replicas = 0;
-- result:
   ┌─explain─────────────────────────────────────┐
1. │ Expression ((Projection + Before ORDER BY)) │
2. │   Aggregating                               │
3. │     Expression                              │
4. │       ReadFromMergeTree (p_category_sum)    │
   └─────────────────────────────────────────────┘
-- 2. enable parallel replicas and enable analyzer
EXPLAIN
SELECT
    category,
    sum(value) AS total_value
FROM test_proj
GROUP BY category
SETTINGS allow_experimental_parallel_reading_from_replicas = 1, enable_analyzer = 1
-- result:
   ┌─explain─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ Expression ((Project names + Projection))                                                                                                                                                                                           │
2. │   MergingAggregated                                                                                                                                                                                                                 │
3. │     Union4. │       Aggregating                                                                                                                                                                                                                   │
5. │         Expression ((Before GROUP BY + Change column names to column identifiers))                                                                                                                                                  │
6. │           ReadFromMergeTree (default.test_proj)                                                                                                                                                                                     │
7. │       ReadFromRemoteParallelReplicas (Query: SELECT `__table1`.`category` AS `category`, sum(`__table1`.`value`) AS `total_value` FROM `default`.`test_proj` AS `__table1` GROUP BY `__table1`.`category` Replicas: 127.0.0.1:9010) │
   └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
-- 3. enable parallel replicas and disable analyzer
EXPLAIN
SELECT
    category,
    sum(value) AS total_value
FROM test_proj
GROUP BY category
SETTINGS allow_experimental_parallel_reading_from_replicas = 1, enable_analyzer = 0, parallel_replicas_only_with_analyzer = 0
-- result:
   ┌─explain──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ Expression ((Projection + Before ORDER BY))                                                                                                                                          │
2. │   MergingAggregated                                                                                                                                                                  │
3. │     ReadFromRemoteParallelReplicas (Query: SELECT `category`, sum(`value`) AS `total_value` FROM `default`.`test_proj` GROUP BY `category` Replicas: 127.0.0.1:9000, 127.0.0.1:9010) │
   └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Furthermore, I use LLDB to inspect the execution, I found that it cannot reach the line of the second breakpoint in the picture below:

Image

It means the query cannot pass the line of the first breakpoint.

Metadata

Metadata

Assignees

No one assigned

    Labels

    comp-parallel-replicasParallel reading from replicas for query speed (not data replication consistency).

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions