Skip to content

[Feature] refactor colocate #63584

@hqx871

Description

@hqx871

Search before asking

  • I had searched in the issues and found no similar issues.

Description

当前colocate的限制:

  1. 多租户部署时副本冗余:例如租户A需要JOIN租户B的表,租户C需要JOIN租户B的表,但是租户A不需要JOIN租户C的表,理论上我们只需要在租户A/C的机器上增加租户B的表的副本,租户A/C就可以在自己的机器上做ColocateJoin,但是当前的Colocate实现要求租户A/B/C的表必须在A/B/C三个租户的机器上都放置副本,导致副本冗余。
  2. 分桶数限制:当前colocate的实现要求两个表的分桶数必须相同才能做ColocateJoin,当JOIN的表的数量级差别比较大时,难以设置统一的分桶数。

方案:

  1. 支持租户粒度的colcoate,避免副本冗余问题。同时,单个租户的colocate失效不影响其他租户。

例如租户A/B/C上的3个表如下,t1与t2、t3与t2可以支持colocate join,没有副本冗余问题。

CREATE TABLE `t1` (
  `id` bigint NOT NULL,
  `m1` bigint NULL
) ENGINE=OLAP
duplicate KEY(`id`)
COMMENT 'OLAP'
DISTRIBUTED BY hash(id) BUCKETS 2
PROPERTIES (
"replication_allocation" = "tag.location.a: 1",
"colocate_group"="tag.location.a:coloc1"
);


CREATE TABLE `t2` (
  `id` bigint NOT NULL,
  `m2` bigint NULL
) ENGINE=OLAP
duplicate KEY(`id`)
COMMENT 'OLAP'
DISTRIBUTED BY hash(id) BUCKETS 2
PROPERTIES (
"replication_allocation" = "tag.location.a: 1,tag.location.b: 1,tag.location.c: 1",
"colocate_group"="tag.location.a:coloc1,tag.location.c:coloc3"
);


CREATE TABLE `t3` (
  `id` bigint NOT NULL,
  `m3` bigint NULL
) ENGINE=OLAP
duplicate KEY(`id`)
COMMENT 'OLAP'
DISTRIBUTED BY hash(id) BUCKETS 2
PROPERTIES (
"replication_allocation" = "tag.location.c: 1",
"colocate_group"="tag.location.c:coloc3"
);

  1. 支持分桶数为倍数关系的colocate

例如下面的t4和t5的分桶数是t1的倍数,理论上t1/t4/t5也可以支持colocate join。(由于历史原因,我们内部的实现使用了两个表属性)

CREATE TABLE `t4` (
  `id` bigint NOT NULL,
  `m4` bigint NULL
) ENGINE=OLAP
duplicate KEY(`id`)
COMMENT 'OLAP'
DISTRIBUTED BY hash(id) BUCKETS 4
PROPERTIES (
"replication_allocation" = "tag.location.a: 1",
"colocate_slave"="tag.location.a:coloc1"
);

CREATE TABLE `t5` (
  `id` bigint NOT NULL,
  `m5` bigint NULL
) ENGINE=OLAP
duplicate KEY(`id`)
COMMENT 'OLAP'
DISTRIBUTED BY hash(id) BUCKETS 6
PROPERTIES (
"replication_allocation" = "tag.location.a: 1",
"colocate_slave"="tag.location.a:coloc1"
);

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/featureCategorizes issue or PR as related to a new feature.

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions