Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Support Dynamic Partition in Doris #2262

Closed
WingsGo opened this issue Nov 21, 2019 · 6 comments
Closed

[Proposal] Support Dynamic Partition in Doris #2262

WingsGo opened this issue Nov 21, 2019 · 6 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@WingsGo
Copy link
Contributor

WingsGo commented Nov 21, 2019

Background:

In some scenarios, when a user creates an Olap table that is range partition by time, the user needs to periodically add and remove partitions to ensure that the data is valid. As a result, adding and removing partitions dynamically can be very useful for users.

SQL语法

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [database.]table_name

    (column_definition1[, column_definition2, ...])

    [ENGINE = [olap|mysql|broker]]

    [key_desc]

    [COMMENT "table comment"];

    [partition_desc]

    [distribution_desc]

    [PROPERTIES ("key"="value", ...)]

    [BROKER PROPERTIES ("key"="value", ...)]
  1. 如果希望使用Dynamic Partition特性,需要在properties中指定
PROPERTIES (

	"dynamic_partition.time_unit" = "[DAY|WEEK|MONTH]",

	"dynamic_partition.end" = "integer_value",

	"dynamic_partition.prefix" = "string_value",

	"dynamic_partition.buckets" = "integer_value",

	["dynamic_partition.enable" = "false"]

)

dynamic_partition.time_unit: 用于指定动态添加分区的时间单位,可选择为DAY(天),WEEK(周),MONTH(月)
dynamic_partition.end: 用于指定提前创建的分区数量
dynamic_partition.prefix: 用于指定创建的分区名前缀
dynamic_partition.buckets: 用于指定自动创建的分区分桶数量

Example

CREATE TABLE example_db.dynamic_partition
    (
    k1 DATE,
    k2 INT,
    k3 SMALLINT,
    v1 VARCHAR(2048),
    v2 DATETIME DEFAULT "2014-02-04 15:36:00"
    )
    ENGINE=olap
    DUPLICATE KEY(k1, k2, k3)
    PARTITION BY RANGE (k1)
    (
    PARTITION p1 VALUES LESS THAN ("2014-01-01"),
    PARTITION p2 VALUES LESS THAN ("2014-06-01"),
    PARTITION p3 VALUES LESS THAN ("2014-12-01")
    )
    DISTRIBUTED BY HASH(k2) BUCKETS 32
    PROPERTIES(
    "storage_medium" = "SSD",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32"
     );

If the time to create this table is November 23, 2019, the following partitions will be create automatically

*************************** 4. row ***************************
             PartitionId: 13745
           PartitionName: p20191123
          VisibleVersion: 1
      VisibleVersionHash: 0
                   State: NORMAL
            PartitionKey: k1
                   Range: [types: [DATE]; keys: [2014-12-01]; ‥types: [DATE]; keys: [2019-11-24]; )
         DistributionKey: k2
                 Buckets: 32
          ReplicationNum: 3
           StorageMedium: HDD
            CooldownTime: 9999-12-31 23:59:59
LastConsistencyCheckTime: N/A
                DataSize: .000
*************************** 5. row ***************************
             PartitionId: 13874
           PartitionName: p20191124
          VisibleVersion: 1
      VisibleVersionHash: 0
                   State: NORMAL
            PartitionKey: k1
                   Range: [types: [DATE]; keys: [2019-11-24]; ‥types: [DATE]; keys: [2019-11-25]; )
         DistributionKey: k2
                 Buckets: 32
          ReplicationNum: 3
           StorageMedium: HDD
            CooldownTime: 9999-12-31 23:59:59
LastConsistencyCheckTime: N/A
                DataSize: .000
*************************** 6. row ***************************
             PartitionId: 14003
           PartitionName: p20191125
          VisibleVersion: 1
      VisibleVersionHash: 0
                   State: NORMAL
            PartitionKey: k1
                   Range: [types: [DATE]; keys: [2019-11-25]; ‥types: [DATE]; keys: [2019-11-26]; )
         DistributionKey: k2
                 Buckets: 32
          ReplicationNum: 3
           StorageMedium: HDD
            CooldownTime: 9999-12-31 23:59:59
LastConsistencyCheckTime: N/A
                DataSize: .000

DDL

语法:
ALTER TABLE [database.]table
alter_clause1[, alter_clause2, ...];

alter_clause 分为 partition 、rollup、schema change 、rename 和properties五种。

properties 支持如下几种修改方式

  1. 修改动态分区属性:
    语法:
    SET PROPERTIES("key" = "value", ...)
    说明:
    1)除了支持已有的dynamic_partition属性修改,当前还支持修改动态分区的开启或者关闭属性,例如:ALTER TABLE table_name SET PROPERTIES("dynamic_partition.enable" = "false"");

DML

SHOW DYNAMIC PARTITION TABLES

description

该语句用于展示当前db下所有的动态分区table状态
语法:
    SHOW DYANMIC PARTITION TABLES;

keyword

SHOW,DYNAMIC,PARTITION,TABLES
@morningman
Copy link
Contributor

These grammar looks good to me.

@imay imay added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 22, 2019
@imay
Copy link
Contributor

imay commented Nov 22, 2019

@WingsGo This is a wonderful job, and I have some suggestion about this proposal.

  1. when give example, better to list what parititions will be created in some day.
  2. Should specify what column types this will support.
  3. If it supports int type or string type, what's the format it should be. timepstamp? or yyyyMMdd or yyyyMMddHHmmhh.

@WingsGo
Copy link
Contributor Author

WingsGo commented Nov 23, 2019

@WingsGo This is a wonderful job, and I have some suggestion about this proposal.

1. when give example, better to list what parititions will be created in some day.

2. Should specify what column types this will support.

3. If it supports int type or string type, what's the format it should be. timepstamp? or yyyyMMdd or yyyyMMddHHmmhh.

I think because doris only support int type , date type and datetime type as partition column, and the minimize time unit is day, what about unify the format as yyyyMMdd?

@imay
Copy link
Contributor

imay commented Nov 24, 2019

I think because doris only support int type , date type and datetime type as partition column, and the minimize time unit is day, what about unify the format as yyyyMMdd?

What I want to emphasize is that how to parse the content of int type, is there an option to specify this? For example, if users store timestamp value in an int column, it also can be used as a partition column, and the dynamic partition function can be applied to it. And user can also store value like "20190101" in int value. If we intent to support both two types, we should provide an option to user to tell Doris how to parse its value.

@WingsGo
Copy link
Contributor Author

WingsGo commented Nov 25, 2019

I think because doris only support int type , date type and datetime type as partition column, and the minimize time unit is day, what about unify the format as yyyyMMdd?

What I want to emphasize is that how to parse the content of int type, is there an option to specify this? For example, if users store timestamp value in an int column, it also can be used as a partition column, and the dynamic partition function can be applied to it. And user can also store value like "20190101" in int value. If we intent to support both two types, we should provide an option to user to tell Doris how to parse its value.

I think you mean that if user store timestamp such as '1546272000000', we should add partition like ALTER TABLE ADD PARTITION VALUES LESS THAN("1546358400000"). If user store value like "2019010100000000", we should add partition like ALTER TABLE ADD PARTITION VALUES LESS THAN("2019010200000000")?

@imay
Copy link
Contributor

imay commented Nov 25, 2019

I think you mean that if user store timestamp such as '1546272000000', we should add partition like ALTER TABLE ADD PARTITION VALUES LESS THAN("1546358400000"). If user store value like "2019010100000000", we should add partition like ALTER TABLE ADD PARTITION VALUES LESS THAN("2019010200000000")?

Yes, it is what I mean. If you don't intent to support this this time, it's better to reserve option to support it in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

4 participants