-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
my table detail
PARTITIONED BY (dt,a,b)
TBLPROPERTIES (
'chain-table.enabled' = 'true',
'partition.timestamp-formatter' = 'yyyyMMdd',
'partition.timestamp-pattern' = '$dt',
'primary-key' = 'dt,a,b,c,d');
Because it involves hundreds of millions of data row, i wish using PARTITIONED BY (dt,a,b) parition path for a and b column, to query more faster than using PARTITIONED BY (dt),
but i find using select count(1) from table where dt=20250101 result is 7000+
using select count(1) from table$branch_snapshot where dt=20250101 result is 16m+
using select count(1) from table$branch_delta where dt=20250102 result is 170k+
i assumed that chain table filter data by dt , and also by a 、b column
example
two table:
t1 and t1_only_dt
CREATE TABLE default.t1 (
`t1` string ,
`t2` string ,
`t3` string,
`dt` string
) PARTITIONED BY (dt,t1)
TBLPROPERTIES (
'chain-table.enabled' = 'true',
-- props about primary key table
'primary-key' = 'dt,t1,t3',
'sequence.field' = 't2',
'bucket' = '2',
-- props about partition
'partition.timestamp-pattern' = '$dt',
'partition.timestamp-formatter' = 'yyyyMMdd'
);
CALL sys.create_branch('default.t1', 'snapshot');
CALL sys.create_branch('default.t1', 'delta');
ALTER TABLE default.t1 SET tblproperties
('scan.fallback-snapshot-branch' = 'snapshot',
'scan.fallback-delta-branch' = 'delta');
ALTER TABLE `default`.`t1$branch_snapshot` SET tblproperties
('scan.fallback-snapshot-branch' = 'snapshot',
'scan.fallback-delta-branch' = 'delta');
ALTER TABLE `default`.`t1$branch_delta` SET tblproperties
('scan.fallback-snapshot-branch' = 'snapshot',
'scan.fallback-delta-branch' = 'delta');
The only difference between t1 and t1_only_dt is that t1_only_dt is: PARTITIONED BY (dt)
insert data sql
INSERT OVERWRITE `default`.`t1$branch_snapshot` PARTITION (dt = '20250810')
VALUES
('aaa', 'seq1', 'x'),
('bbb', 'seq2', 'y'),
('ccc', 'seq3', 'z1'),
('ccc', 'seq4', 'z2');
INSERT OVERWRITE `default`.`t1$branch_delta` PARTITION (dt = '20250811')
VALUES
('aaa', 'seq5', 'x_new'),
('ddd', 'seq6', 'w'),
('eee', 'seq7', 'm'),
('eee', 'seq8', 'n');
INSERT OVERWRITE `default`.`t1_only_dt$branch_snapshot` PARTITION (dt = '20250810')
VALUES
('aaa', 'seq1', 'x'),
('bbb', 'seq2', 'y'),
('ccc', 'seq3', 'z1'),
('ccc', 'seq4', 'z2');
INSERT OVERWRITE `default`.`t1_only_dt$branch_delta` PARTITION (dt = '20250811')
VALUES
('aaa', 'seq5', 'x_new'),
('ddd', 'seq6', 'w'),
('eee', 'seq7', 'm'),
('eee', 'seq8', 'n');
query result:
select *,"t1_only_dt" from `default`.`t1_only_dt` where dt = '20250811'
union all
select *,"t1" from `default`.`t1` where dt = '20250811';
aaa seq5 x_new 20250811 t1_only_dt
bbb seq2 y 20250811 t1_only_dt
aaa seq1 x 20250811 t1_only_dt
ccc seq3 z1 20250811 t1_only_dt
ccc seq4 z2 20250811 t1_only_dt
ddd seq6 w 20250811 t1_only_dt
eee seq7 m 20250811 t1_only_dt
eee seq8 n 20250811 t1_only_dt
aaa seq3 z1 20250811 t1
aaa seq4 z2 20250811 t1
ddd seq3 z1 20250811 t1
ddd seq4 z2 20250811 t1
eee seq3 z1 20250811 t1
eee seq4 z2 20250811 t1
Solution
adding a new table options like
chain-table.fallback-partition-keys , using this options to fallback and filtered data ,
not using partition keys
Anything else?
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!