[Bug] Delete data is not timely #4767
Open
Labels
bug (Something isn't working)
Description
Search before asking
- I searched in the issues and found nothing similar.
Paimon version
0.8
Compute Engine
spark
Minimal reproduce step
In our company's business, users first delete data older than 30 days from a non-partitioned table and then insert new data whose sequence.field value is smaller. However, the new data is never written, because the deleted record's sequence.field is larger than the inserted record's. We must run a full compaction after every delete operation, and full compaction is expensive for large tables.
Is there any way to ensure that the delete operation is timely?
create table test_tb (
`req_id` STRING,
`ad_id` STRING,
`info` STRING,
`dt_seconds_asc` BIGINT
) USING paimon
TBLPROPERTIES(
'bucket' = '1',
'file.compression' = 'ZSTD',
'file.format' = 'PARQUET',
'primary-key' = 'req_id,ad_id',
'sequence.field' = 'dt_seconds_asc');
insert into test_tb values('a', 'b', 'info-1', 100);
delete from test_tb where dt_seconds_asc < 200;
insert into test_tb values('a', 'b', 'info-1', 50);
-- the audit log still shows the -D row kind; the inserted row with seq 50 is dropped
select * from `test_tb$audit_log`;
OK
rowkind  req_id  ad_id  info    dt_seconds_asc
-D       a       b      info-1  100
-- no rows are returned
select * from test_tb;
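The repro above comes down to a sequence-number comparison during merge. The following is a minimal sketch (not Paimon's actual implementation; `Record` and `merge` are hypothetical names) of how keeping the record with the larger sequence.field value causes the delete tombstone (seq 100) to win over the later insert (seq 50):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    row_kind: str         # "+I" insert, "-D" delete
    seq: int              # value of the sequence.field
    value: Optional[str]


def merge(existing: Optional[Record], incoming: Record) -> Record:
    # Keep whichever record has the larger sequence value; on a tie,
    # let the incoming record win (an assumption for this sketch).
    if existing is None or incoming.seq >= existing.seq:
        return incoming
    return existing


# Replay the report: insert seq=100, delete it, insert seq=50.
state = None
for rec in [
    Record("+I", 100, "info-1"),  # insert into test_tb ... 100
    Record("-D", 100, None),      # delete ... dt_seconds_asc < 200
    Record("+I", 50, "info-1"),   # insert ... 50 (smaller sequence)
]:
    state = merge(state, rec)

print(state.row_kind, state.seq)  # prints "-D 100"
```

Under this rule the -D tombstone stays the winning record until a full compaction physically drops it, which matches the empty `select * from test_tb` result above.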
What doesn't meet your expectations?
Deleted records should not participate in the sequence.field comparison for subsequent writes.
Anything else?
No response
Are you willing to submit a PR?
- I'm willing to submit a PR!