
[Bug] Delete data is not timely #4767

@wkang13579

Description


Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

0.8

Compute Engine

spark

Minimal reproduce step

In our company's workload, users first delete data older than 30 days from a non-partitioned primary-key table, and then insert new rows whose sequence.field is smaller than that of the deleted records. The new rows are silently dropped, because the deleted record's sequence.field is larger and still wins the merge. We must perform a full compaction after every delete operation, and full compaction is expensive for large tables.

Is there any way to ensure that the delete operation is timely?

create table test_tb (
`req_id` STRING,
`ad_id` STRING,
`info` STRING,
`dt_seconds_asc` BIGINT
) USING paimon
TBLPROPERTIES(
  'bucket' = '1',
  'file.compression' = 'ZSTD',
  'file.format' = 'PARQUET',
  'primary-key' = 'req_id,ad_id',
  'sequence.field' = 'dt_seconds_asc');

insert into test_tb values('a', 'b', 'info-1', 100);

delete from test_tb where dt_seconds_asc < 200;

insert into test_tb values('a', 'b', 'info-1', 50);

-- the audit log still shows the -D rowkind; the inserted row with sequence 50 is lost
select * from `test_tb$audit_log`;
OK
rowkind req_id  ad_id info  dt_seconds_asc
-D  a b info-1  100

-- the result is empty
select * from test_tb;
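The behavior above can be illustrated with a small sketch (plain Python, not Paimon code): a deduplicate-style merge keeps, per primary key, the record with the largest sequence value, and the -D delete marker retains the deleted row's sequence.field (100), so a later insert with sequence 50 loses the comparison until a full compaction physically drops the merged delete marker. The `merge`/`visible` helpers and the record tuples are hypothetical names for illustration.

```python
def merge(records):
    """Per key, keep the record with the highest seq (last wins on ties)."""
    latest = {}
    for key, seq, rowkind, value in records:
        if key not in latest or seq >= latest[key][0]:
            latest[key] = (seq, rowkind, value)
    return latest

def visible(records):
    """Rows surviving the merge, excluding -D delete markers."""
    return {k: v[2] for k, v in merge(records).items() if v[1] != "-D"}

log = [
    (("a", "b"), 100, "+I", "info-1"),  # insert with sequence 100
    (("a", "b"), 100, "-D", "info-1"),  # delete marker keeps sequence 100
]
reinsert = (("a", "b"), 50, "+I", "info-1")  # re-insert with smaller sequence

# Before compaction: the -D marker with seq 100 beats the new seq-50 insert.
print(visible(log + [reinsert]))  # -> {}

# Simulate full compaction: rewrite files, physically dropping merged
# delete markers so they no longer participate in future comparisons.
compacted = [(k, s, r, v) for k, (s, r, v) in merge(log).items() if r != "-D"]
print(visible(compacted + [reinsert]))  # -> {('a', 'b'): 'info-1'}
```

This is why the seq-50 insert only becomes visible after a full compaction in the reproduce steps above.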

What doesn't meet your expectations?

Expected: deleted data should no longer participate in the sequence.field comparison, so a subsequent insert with a smaller sequence.field should still become visible.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

