Skip to content

[Feature] when using chain table , not all partition key should be used to fallback and filter data #7503

@blackflash997997

Description

@blackflash997997

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

my table detail

 PARTITIONED BY (dt,a,b)
TBLPROPERTIES (
  'chain-table.enabled' = 'true',
  'partition.timestamp-formatter' = 'yyyyMMdd',
  'partition.timestamp-pattern' = '$dt',
  'primary-key' = 'dt,a,b,c,d');

Because it involves hundreds of millions of data row, i wish using PARTITIONED BY (dt,a,b) parition path for a and b column, to query more faster than using PARTITIONED BY (dt),

but i find using select count(1) from table where dt=20250101 result is 7000+
using select count(1) from table$branch_snapshot where dt=20250101 result is 16m+
using select count(1) from table$branch_delta where dt=20250102 result is 170k+

i assumed that chain table filter data by dt , and also by a 、b column

example

two table:
t1 and t1_only_dt

CREATE TABLE default.t1 (
`t1` string ,
`t2` string ,
`t3` string,
`dt` string
) PARTITIONED BY (dt,t1)
TBLPROPERTIES (
'chain-table.enabled' = 'true',
-- props about primary key table  
'primary-key' = 'dt,t1,t3',
'sequence.field' = 't2',
'bucket' = '2',
-- props about partition
'partition.timestamp-pattern' = '$dt', 
'partition.timestamp-formatter' = 'yyyyMMdd'
);

CALL sys.create_branch('default.t1', 'snapshot');

CALL sys.create_branch('default.t1', 'delta');

ALTER TABLE default.t1 SET tblproperties 
('scan.fallback-snapshot-branch' = 'snapshot', 
'scan.fallback-delta-branch' = 'delta');

ALTER TABLE `default`.`t1$branch_snapshot` SET tblproperties
('scan.fallback-snapshot-branch' = 'snapshot',
'scan.fallback-delta-branch' = 'delta');

ALTER TABLE `default`.`t1$branch_delta` SET tblproperties 
('scan.fallback-snapshot-branch' = 'snapshot',
'scan.fallback-delta-branch' = 'delta');

The only difference between t1 and t1_only_dt is that t1_only_dt is: PARTITIONED BY (dt)

insert data sql

INSERT OVERWRITE `default`.`t1$branch_snapshot` PARTITION (dt = '20250810')
VALUES 
    ('aaa', 'seq1', 'x'),
    ('bbb', 'seq2', 'y'),
    ('ccc', 'seq3', 'z1'),
    ('ccc', 'seq4', 'z2'); 


INSERT OVERWRITE `default`.`t1$branch_delta` PARTITION (dt = '20250811')
VALUES 
    ('aaa', 'seq5', 'x_new'),  
    ('ddd', 'seq6', 'w'),      
    ('eee', 'seq7', 'm'),      
    ('eee', 'seq8', 'n');  


INSERT OVERWRITE `default`.`t1_only_dt$branch_snapshot` PARTITION (dt = '20250810')
VALUES 
    ('aaa', 'seq1', 'x'),
    ('bbb', 'seq2', 'y'),
    ('ccc', 'seq3', 'z1'),
    ('ccc', 'seq4', 'z2'); 


INSERT OVERWRITE `default`.`t1_only_dt$branch_delta` PARTITION (dt = '20250811')
VALUES 
    ('aaa', 'seq5', 'x_new'),  
    ('ddd', 'seq6', 'w'),      
    ('eee', 'seq7', 'm'),      
    ('eee', 'seq8', 'n');  

query result:

select *,"t1_only_dt" from `default`.`t1_only_dt`    where dt = '20250811'
union all
select *,"t1" from `default`.`t1`    where dt = '20250811'; 


aaa     seq5    x_new   20250811        t1_only_dt
bbb     seq2    y       20250811        t1_only_dt
aaa     seq1    x       20250811        t1_only_dt
ccc     seq3    z1      20250811        t1_only_dt
ccc     seq4    z2      20250811        t1_only_dt
ddd     seq6    w       20250811        t1_only_dt
eee     seq7    m       20250811        t1_only_dt
eee     seq8    n       20250811        t1_only_dt
aaa     seq3    z1      20250811        t1
aaa     seq4    z2      20250811        t1
ddd     seq3    z1      20250811        t1
ddd     seq4    z2      20250811        t1
eee     seq3    z1      20250811        t1
eee     seq4    z2      20250811        t1

Solution

adding a new table options like
chain-table.fallback-partition-keys , using this options to fallback and filtered data ,
not using partition keys

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions