Consider lightweight deleted rows when selecting parts to merge #57648

Merged: 2 commits merged into ClickHouse:master from refine-lwd-merge on Dec 20, 2023

Conversation

jewelzqiu
Contributor

resolves #56728

Changelog category:

  • Improvement

Changelog entry:

Consider lightweight deleted rows when selecting parts to merge if enabled
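
For readers unfamiliar with the idea, here is a minimal C++ sketch of what "considering lightweight deleted rows" can mean for merge selection. The struct and function names are illustrative assumptions, not the actual ClickHouse implementation.

```cpp
// Illustrative sketch only: merge selection ranks parts by the data that still
// "exists" after lightweight deletes, instead of by the raw on-disk size.
#include <cstdint>

struct PartInfoSketch
{
    uint64_t bytes_on_disk = 0;   // full compressed size of the part on disk
    uint64_t rows_count = 0;      // total rows written, including deleted ones
    uint64_t existing_rows = 0;   // rows whose _row_exists mask is still 1
};

// Hypothetical helper: estimate how many "live" bytes a part would contribute
// to a merge. A part that is mostly lightweight-deleted looks small, so it
// becomes a more attractive merge candidate and its deleted rows get cleaned up.
uint64_t estimateBytesForMerge(const PartInfoSketch & part, bool account_for_lightweight_delete)
{
    if (!account_for_lightweight_delete || part.rows_count == 0)
        return part.bytes_on_disk;

    const double existing_ratio = static_cast<double>(part.existing_rows) / part.rows_count;
    return static_cast<uint64_t>(static_cast<double>(part.bytes_on_disk) * existing_ratio);
}
```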

@CLAassistant commented on Dec 8, 2023

CLA assistant check
All committers have signed the CLA.

@alexey-milovidov added the "can be tested" label (Allows running workflows for external contributors) on Dec 8, 2023
@robot-clickhouse-ci-1 added the "pr-improvement" label (Pull request with some product improvements) on Dec 8, 2023
@robot-clickhouse-ci-1 (Contributor) commented on Dec 8, 2023

This is an automated comment for commit fd46056 with a description of existing statuses. It's updated for the latest CI run.

❌ Click here to open a full report in a separate page

Successful checks

| Check name | Description | Status |
| --- | --- | --- |
| AST fuzzer | Runs randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help | ✅ success |
| ClickBench | Runs [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table | ✅ success |
| ClickHouse build check | Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often have enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log by grepping for cmake. Use these options and follow the general build process | ✅ success |
| Compatibility check | Checks that the clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success |
| Docker image for servers | The check to build and optionally push the mentioned image to Docker Hub | ✅ success |
| Docs check | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| Fast test | Normally this is the first check that is run for a PR. It builds ClickHouse and runs most of the stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here | ✅ success |
| Flaky tests | Checks whether newly added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer and additional randomization of thread scheduling. Integration tests are run up to 10 times. If a new test fails at least once, or runs for too long, this check will be red. We don't allow flaky tests, read the doc | ✅ success |
| Install packages | Checks that the built packages are installable in a clear environment | ✅ success |
| Integration tests | The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests | ✅ success |
| Mergeable Check | Checks if all other necessary checks are successful | ✅ success |
| Performance Comparison | Measures changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests | ✅ success |
| SQLTest | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| SQLancer | Fuzzing tests that detect logical bugs with the SQLancer tool | ✅ success |
| Sqllogic | Runs clickhouse on the sqllogic test set against sqlite and checks that all statements pass | ✅ success |
| Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ✅ success |
| Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | ✅ success |
| Style Check | Runs a set of checks to keep the code style clean. If some of the tests failed, see the related log from the report | ✅ success |
| Unit tests | Runs the unit tests for different release types | ✅ success |

Pending and failed checks

| Check name | Description | Status |
| --- | --- | --- |
| CI running | A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR | ⏳ pending |
| Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ❌ failure |
| Upgrade check | Runs stress tests on the server version from the last release and then tries to upgrade it to the version from the PR. It checks whether the new server can successfully start up without any errors, crashes, or sanitizer asserts | ❌ failure |

@jewelzqiu
Contributor Author

@alexey-milovidov Hi, the CI checks have all passed; could you please assign a reviewer for this PR?

@yakov-olkhovskiy self-assigned this on Dec 15, 2023
@alexey-milovidov merged commit af32b33 into ClickHouse:master on Dec 20, 2023
257 of 265 checks passed
@alexey-milovidov
Member

Sorry, this has been reverted because another reviewer (not me; I didn't read the code) found an obvious race condition in the code. Let's fix this race condition and resubmit.

@alexey-milovidov
Member

Also, we have to add a test that exposes this race condition.

@yakov-olkhovskiy
Member

@alexey-milovidov Could you please specify what race condition we have here? Do we have a note somewhere that I missed?

@davenger
Member

Also, grabbing a mutex in readExistingRowsCount() and reading the whole _row_exists column from disk while holding it seems like a very dangerous approach. For example, it can be called from ReplicatedMergeTreeQueue::shouldExecuteLogEntry() via part->getExistingBytesOnDisk() and can negatively affect replication queue processing.
I think the proper way is to save the count of "existing" rows in the part's metadata when the part is created, similar to the total row count.
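
As a rough illustration of the alternative described above, here is a hedged C++ sketch (hypothetical names, not the real IMergeTreeDataPart API): the count is computed once while the part is written and then served from memory, so readers never take a lock or touch the disk on the hot path.

```cpp
#include <cstdint>
#include <optional>

// Sketch only: the existing-rows count is filled in exactly once, while the part
// is still being built (e.g. at the end of a mutation), before the part becomes
// visible to merges or the replication queue.
class DataPartSketch
{
public:
    void setExistingRowsCount(uint64_t count) { existing_rows_count = count; }

    // Cheap accessor for the hot path (merge selection, replication queue checks):
    // no mutex and no disk reads; a missing value means "nothing was deleted".
    uint64_t getExistingRowsCount(uint64_t total_rows) const
    {
        return existing_rows_count.value_or(total_rows);
    }

private:
    std::optional<uint64_t> existing_rows_count;
};
```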

@yakov-olkhovskiy
Member

@davenger Saving the count is definitely a better way of dealing with this, but it requires a metadata modification one way or another.
BTW, do you know which race condition Alexey is talking about?

@jewelzqiu
Contributor Author

@davenger Actually, my initial thought was to add an existing_count.txt file (just like count.txt) before finishing a mutation.
As for existing parts, existing_count.txt would be created during the loading process.
Do you think that is a better solution?
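
A minimal sketch of how such a file might be written and read, assuming a plain-text format like count.txt (a single integer per file); the helper names are hypothetical and only illustrate the proposal.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <optional>

// Written before the mutation is finalized, next to count.txt in the part directory.
void writeExistingCountFile(const std::filesystem::path & part_dir, uint64_t existing_rows)
{
    std::ofstream out(part_dir / "existing_count.txt");
    out << existing_rows << '\n';
}

// Returns std::nullopt for parts written before the feature existed; the caller
// must then fall back to something else (see the backward-compatibility note below).
std::optional<uint64_t> readExistingCountFile(const std::filesystem::path & part_dir)
{
    std::ifstream in(part_dir / "existing_count.txt");
    if (!in)
        return std::nullopt;
    uint64_t value = 0;
    in >> value;
    return value;
}
```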

@jewelzqiu
Contributor Author

> @davenger Actually, my initial thought was to add an existing_count.txt file (just like count.txt) before finishing a mutation. As for existing parts, existing_count.txt would be created during the loading process. Do you think that is a better solution?

@alexey-milovidov @yakov-olkhovskiy What are your thoughts on this approach?

@yakov-olkhovskiy
Member

I think it's better than lazy calculation, but backward compatibility should be taken into account.
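
One possible backward-compatible loading path, reusing the hypothetical helpers from the sketch above (an assumption about how it could work, not the agreed design): if existing_count.txt is missing, the count is recomputed once by a caller-supplied routine and persisted so the scan is not repeated on every restart.

```cpp
#include <cstdint>
#include <filesystem>
#include <functional>

// count_existing is supplied by the caller, e.g. a routine that scans the part's
// _row_exists mask; it only runs for old parts that lack existing_count.txt.
uint64_t loadOrBackfillExistingCount(
    const std::filesystem::path & part_dir,
    uint64_t total_rows,
    const std::function<uint64_t()> & count_existing)
{
    if (auto stored = readExistingCountFile(part_dir))
        return *stored;

    const uint64_t counted = count_existing ? count_existing() : total_rows;
    writeExistingCountFile(part_dir, counted);   // backfill so the cost is paid once
    return counted;
}
```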

@jewelzqiu
Contributor Author

> Sorry, this has been reverted because another reviewer (not me; I didn't read the code) found an obvious race condition in the code. Let's fix this race condition and resubmit.

@alexey-milovidov Could you please provide more information about the race condition in this PR?

@alexey-milovidov
Member

I don't know what the race condition is, but my colleagues told me that it is there... I will clarify.

@jewelzqiu
Contributor Author

> I don't know what the race condition is, but my colleagues told me that it is there... I will clarify.

@alexey-milovidov thanks

@jewelzqiu deleted the refine-lwd-merge branch on December 26, 2023, 05:33
@alexey-milovidov
Member

They said it was around const bool & is_merge

@jewelzqiu
Contributor Author

> They said it was around const bool & is_merge

I don't see any race conditions here... @yakov-olkhovskiy Do you have any clue?

@alexey-milovidov
Member

@jewelzqiu, my colleagues said it is related to existing_rows_count.

@jewelzqiu
Contributor Author

> @jewelzqiu, my colleagues said it is related to existing_rows_count.

@alexey-milovidov Thanks, I got it; it looks like existing_rows_count can be read before the lock is acquired.
I have created another PR with a different approach, as discussed above in this PR: #58223
In the new PR, existing_rows_count will only be set once, before the part becomes active.
cc @yakov-olkhovskiy @davenger
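
For context, here is a small C++ sketch of the general shape of this kind of race (illustrative names, not the actual PR code): a lazily computed counter that is checked outside the mutex and assigned under it can be read concurrently with the write, which is a data race.

```cpp
#include <cstdint>
#include <mutex>
#include <optional>

struct LazyExistingRows
{
    std::mutex mutex;
    std::optional<uint64_t> existing_rows_count;

    uint64_t get(uint64_t total_rows)
    {
        // RACY: this unlocked read can run concurrently with the assignment
        // inside computeUnderLock(), so the optional may be observed half-written.
        if (existing_rows_count)
            return *existing_rows_count;
        return computeUnderLock(total_rows);
    }

    uint64_t computeUnderLock(uint64_t total_rows)
    {
        std::lock_guard lock(mutex);
        if (!existing_rows_count)
            existing_rows_count = total_rows;   // stand-in for the expensive disk read
        return *existing_rows_count;
    }
};
```

Setting existing_rows_count exactly once, while the part is still private to the thread that creates it (as the follow-up PR #58223 describes), sidesteps the problem: readers only ever see a fully initialized value and need no lock.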

Successfully merging this pull request may close these issues:

  • Merge issue about large data part with lightweight delete (#56728)