is_deleted for ReplacingMergeTree should delete the row during automatic merge #75100

@otan

Description

Company or project name

No response

Use case

is_deleted rows should be removed from the database automatically for ReplacingMergeTree, to simplify setup.

Describe the solution you'd like

According to the docs, is_deleted rows do not get removed from ClickHouse. Unfortunately, the way the feature works at the moment is not very helpful. It puts the onus back on the developer: an is_deleted row persists forever without manual intervention, and reclaiming that space requires either a manual ALTER TABLE ... DELETE query (easy to forget, and it means either coordinating a periodic async job or attaching the query to every delete operation) or an easy but expensive OPTIMIZE FINAL CLEANUP with experimental flags set.

I still don't fully understand why is_deleted rows can't be removed during the automatic merge process. I can see some complications around making sure all possible PKs are considered during the merge, but it still seems possible at a small extra coordination cost.
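
To make the complication concrete, here is a toy Python model of a merge that keeps the highest version per key. It is only a sketch and not ClickHouse's actual merge code: the `Row` type, the `merge` and `visible` helpers, and the three example parts are all hypothetical. It assumes a background merge may pick up only some of the parts, and shows that dropping a tombstone in such a partial merge could let an older version of the same key become visible again.

```python
from typing import Dict, List, NamedTuple


class Row(NamedTuple):
    key: int         # sorting-key value (the "PK" in the text above)
    version: int     # the version column
    is_deleted: int  # 1 marks a deleted row (tombstone)


Part = List[Row]


def merge(parts: List[Part], drop_deleted: bool) -> Part:
    """Toy merge (assumption, not ClickHouse code): keep the highest version per key."""
    latest: Dict[int, Row] = {}
    for part in parts:
        for row in part:
            current = latest.get(row.key)
            if current is None or row.version >= current.version:
                latest[row.key] = row
    rows = sorted(latest.values())
    if drop_deleted:
        rows = [r for r in rows if not r.is_deleted]
    return rows


def visible(parts: List[Part]) -> List[int]:
    """Keys a reader sees after deduplicating across all given parts."""
    return [r.key for r in merge(parts, drop_deleted=True)]


# Three parts: an insert of key 1, an insert of key 2, then a tombstone for key 1.
part_a = [Row(1, 1, 0)]
part_b = [Row(2, 1, 0)]
part_c = [Row(1, 2, 1)]

# Before any merge, key 1 is hidden by its tombstone.
before = visible([part_a, part_b, part_c])

# A partial merge picks only part_b and part_c and drops the tombstone.
merged_bc = merge([part_b, part_c], drop_deleted=True)

# part_a was not part of that merge, so the old version of key 1 shows up again.
after = visible([part_a, merged_bc])

print(before, after)
```

In this model the tombstone can only be dropped safely once the merge is known to cover every part that might still hold an older version of the key, which is the extra coordination referred to above.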

I find that decisions like this leak the concept everywhere on the developer side, rather than keeping it in the database, which is built by the pros and should know how to handle it. As such, is it possible to handle this in ClickHouse?

Describe alternatives you've considered

OPTIMIZE FINAL CLEANUP is not recommended by you, and manual deletion puts extra onus on the developer, as described above.

Additional context

No response

Metadata

Labels

feature, not planned (Known issue, no plans to fix it currently)
