icebergS3Cluster adds deleted rows #88287

@PavelkoSemen

Company or project name

No response

Describe what's wrong

With icebergS3, everything works correctly. As soon as I switch to icebergS3Cluster, I see duplicate rows for the same key: the result includes a row that is marked as deleted on the Iceberg side.

Does it reproduce on the most recent release?

Yes

How to reproduce

Which ClickHouse server version to use:
25.8.4.13
Queries to run that lead to unexpected result:
TRINO:

create table test.test.test_ch_cluster(
   dataflow_dttm timestamp,
   id integer
);

insert into test.test.test_ch_cluster values
(current_timestamp, 1);

update test.test.test_ch_cluster set dataflow_dttm = current_timestamp;

ClickHouse:

SELECT * FROM icebergS3Cluster(standard_cluster, NAMED_COLLECTION, url='https://s3a', filename='table')

SELECT * FROM icebergS3(NAMED_COLLECTION, url='https://s3a', filename='table')
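
Since the screenshots are not available as text, one way to show the discrepancy would be to count rows per key through both table functions. This is only a sketch: it reuses the placeholder arguments from the queries above (standard_cluster, NAMED_COLLECTION, the url and filename values) unchanged, and the `source` and `rows_per_id` aliases are hypothetical names chosen for illustration.

```sql
-- Count rows per key as seen through each table function.
-- All connection arguments are the placeholders from the report, not real values.
SELECT 'icebergS3' AS source, id, count() AS rows_per_id
FROM icebergS3(NAMED_COLLECTION, url='https://s3a', filename='table')
GROUP BY id
UNION ALL
SELECT 'icebergS3Cluster' AS source, id, count() AS rows_per_id
FROM icebergS3Cluster(standard_cluster, NAMED_COLLECTION, url='https://s3a', filename='table')
GROUP BY id;
```

After the single insert and update from the Trino steps, each function would be expected to report one row for id = 1. A count above one for icebergS3Cluster would match the duplicates described in this report.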

Expected behavior

No response

Error message and/or stacktrace

No response

Additional context

No response

Labels

bug, comp-datalake, potential bug
