Improve performance of SELECTs with active mutations #59531

Merged
merged 4 commits into from
Feb 22, 2024

Conversation

azat
Collaborator

@azat azat commented Feb 2, 2024

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Improve performance of SELECTs with active mutations

getAlterMutationCommandsForPart() can be a hot path for query execution when there are pending mutations.

  • LOG_TEST - it does not check just one bool, but actually a bunch of atomics as well.

  • Return std::vector over std::map (map is not required there) - no changes in performance.

  • Copy only RENAME_COLUMN (since only this mutation is required by AlterConversions).
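
The last point can be illustrated with a minimal sketch. This is not the ClickHouse code: `Command`, `CommandType` and `copyRenamesOnly` are hypothetical, simplified stand-ins for the real mutation command types, used only to show the idea of copying just the renames instead of every pending command.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the real mutation command types.
enum class CommandType { Delete, Update, RenameColumn };

struct Command
{
    CommandType type;
    std::string column_name; // for RenameColumn: the old name
    std::string rename_to;   // for RenameColumn: the new name
};

// Copy only the commands the reader needs (renames) instead of copying
// every pending command for every part on every SELECT.
std::vector<Command> copyRenamesOnly(const std::vector<Command> & pending)
{
    std::vector<Command> result;
    for (const auto & command : pending)
        if (command.type == CommandType::RenameColumn)
            result.push_back(command);
    return result;
}
```

With many pending UPDATE/DELETE mutations and few renames, the copied result stays small, which is presumably where the gain in the last row of the results comes from.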

And here are results:

run|result
-|-
SELECT w/o ALTER|queries: 1565, QPS: 355.259, RPS: 355.259
SELECT w/ ALTER unpatched|queries: 2099, QPS: 220.623, RPS: 220.623
SELECT w/ ALTER and w/o LOG_TEST|queries: 2730, QPS: 235.859, RPS: 235.859
SELECT w/ ALTER and w/o LOG_TEST and w/ RENAME_COLUMN only|queries: 2995, QPS: 290.982, RPS: 290.982
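
To put the QPS numbers in relative terms, here is a small helper (the name `relativeChangePercent` is made up for this illustration) that computes the change against a baseline:

```cpp
#include <cassert>

// Relative change of `value` against `baseline`, in percent.
double relativeChangePercent(double baseline, double value)
{
    assert(baseline > 0);
    return (value / baseline - 1.0) * 100.0;
}
```

Using the figures above: `relativeChangePercent(220.623, 235.859)` is about +6.9% (LOG_TEST removed), `relativeChangePercent(220.623, 290.982)` is about +31.9% (both changes), and `relativeChangePercent(355.259, 290.982)` is about -18.1%, i.e. the patched run is still roughly 18% below the run without ALTER.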

But there is still room for improvement; at least MergeTree engines could implement getStorageSnapshotForQuery().

@azat azat changed the title Improve performance of SELECTs with active mutations [RFC] Improve performance of SELECTs with active mutations Feb 2, 2024
@robot-ch-test-poll robot-ch-test-poll added the pr-performance Pull request with some performance improvements label Feb 2, 2024
@robot-ch-test-poll
Contributor

robot-ch-test-poll commented Feb 2, 2024

This is an automated comment for commit d78f760 with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Successful checks
Check name|Description|Status
-|-|-
AST fuzzer|Runs randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help|✅ success
ClickBench|Runs [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table|✅ success
ClickHouse build check|Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process|✅ success
Compatibility check|Checks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help|✅ success
Docker keeper image|The check to build and optionally push the mentioned image to docker hub|✅ success
Docker server image|The check to build and optionally push the mentioned image to docker hub|✅ success
Docs check|Builds and tests the documentation|✅ success
Fast test|Normally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here|✅ success
Flaky tests|Checks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integrational tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc|✅ success
Install packages|Checks that the built packages are installable in a clear environment|✅ success
Mergeable Check|Checks if all other necessary checks are successful|✅ success
Performance Comparison|Measure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests|✅ success
SQLTest|There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS|✅ success
SQLancer|Fuzzing tests that detect logical bugs with SQLancer tool|✅ success
Sqllogic|Run clickhouse on the sqllogic test set against sqlite and checks that all statements are passed|✅ success
Stateful tests|Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc|✅ success
Stateless tests|Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc|✅ success
Style check|Runs a set of checks to keep the code style clean. If some of tests failed, see the related log from the report|✅ success
Unit tests|Runs the unit tests for different release types|✅ success

Check name|Description|Status
-|-|-
A Sync|There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS|❌ failure
CI running|A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR|⏳ pending
Integration tests|The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests|❌ failure
Stress test|Runs stateless functional tests concurrently from several clients to detect concurrency-related errors|❌ failure
Upgrade check|Runs stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts|❌ failure

@al13n321 al13n321 self-assigned this Feb 9, 2024
Member

@al13n321 al13n321 left a comment

@@ -1378,7 +1378,7 @@ class MergeTreeData : public IStorage, public WithMutableContext
     /// Used to receive AlterConversions for part and apply them on fly. This
     /// method has different implementations for replicated and non replicated
     /// MergeTree because they store mutations in different way.
-    virtual std::map<int64_t, MutationCommands> getAlterMutationCommandsForPart(const DataPartPtr & part) const = 0;
+    virtual std::vector<MutationCommands> getAlterMutationCommandsForPart(const DataPartPtr & part) const = 0;
Member

Consider adding a comment about whether the returned vector is sorted. Looks like it isn't, and the caller doesn't care about the order? Also mention that only RENAME_COLUMN are included.

(I'm confused by this. I would've expected the order to matter: if a column was renamed twice A -> B -> C then it wouldn't work if we tried to do B -> C before A -> B. But I see that AlterConversions doesn't handle that. I guess such chain-rename is just not handled correctly here? It's also weird that getAlterMutationCommandsForPart() doesn't take a metadata snapshot, and instead returns commands to promote the part all the way to the latest schema; I guess that's also incorrect? If it used a metadata snapshot, maybe the whole thing would be correct because at most one ALTER can be in progress at a time, and dependent renames are not allowed within one ALTER. This is all just speculation, I'm not familiar with this code.)
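
To make the ordering concern concrete, here is a minimal sketch. It is not how AlterConversions works; `Rename` and `applyRenames` are hypothetical names used only to show that applying a rename chain in the wrong order gives a different answer.

```cpp
#include <string>
#include <utility>
#include <vector>

using Rename = std::pair<std::string, std::string>; // {from, to}

// Apply renames one after another to the name a part was written with.
std::string applyRenames(std::string name, const std::vector<Rename> & renames)
{
    for (const auto & [from, to] : renames)
        if (name == from)
            name = to;
    return name;
}
```

With the renames in order (A -> B, then B -> C), `applyRenames("A", ...)` returns "C". With the same two renames reversed, B -> C is a no-op for "A" and the result is "B", which is the failure mode described above.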

Collaborator Author

Also mention that only RENAME_COLUMN are included.

Ok, I've added the comment.

I would've expected the order to matter: if a column was renamed twice A -> B -> C then it wouldn't work if we tried to do B -> C before A -> B. But I see that AlterConversions doesn't handle that.

It doesn't need to, since only one ALTER_METADATA can be executed at a time: if you have two ALTERs that modify metadata, the first one must finish completely before the second can proceed, see

if (!alter_sequence.canExecuteMetaAlter(entry.alter_version, state_lock))

It's also weird that getAlterMutationCommandsForPart() doesn't take a metadata snapshot, and instead returns commands to promote the part all the way to the latest schema; I guess that's also incorrect? If it used a metadata snapshot, maybe the whole thing would be correct because at most one ALTER can be in progress at a time, and dependent renames are not allowed within one ALTER

This code is required to handle a RENAME COLUMN that has not finished yet while the metadata has already been updated: when a query uses the new names, they can be translated back to the old ones to find the data on disk. A metadata snapshot has no information about mutations, so it cannot be used here.
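
A rough sketch of that new-name-to-old-name translation, assuming a simple lookup table. `RenameMap`, `addRename` and `nameInPart` are hypothetical names for this illustration, not the AlterConversions interface.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: maps column names from the current schema back to
// the names stored in a part that has not been mutated yet.
class RenameMap
{
public:
    void addRename(const std::string & old_name, const std::string & new_name)
    {
        new_to_old[new_name] = old_name;
    }

    // Name to look for on disk; unchanged if the column was not renamed.
    std::string nameInPart(const std::string & name_in_query) const
    {
        auto it = new_to_old.find(name_in_query);
        return it == new_to_old.end() ? name_in_query : it->second;
    }

private:
    std::unordered_map<std::string, std::string> new_to_old;
};
```

For example, after `addRename("x", "z")`, a query that reads column `z` would look for `x` in a part written before the rename, while `k` maps to itself.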

Member

Does anything prevent the following scenario?:

  1. A SELECT calls StorageMergeTree::read().
  2. StorageMergeTree::read() grabs the current list of active parts. Say, it has one part: all_1_1_1, with one column A.
  3. ALTER RENAME COLUMN A TO B starts and finishes.
  4. ALTER RENAME COLUMN B TO C starts and finishes.
  5. The SELECT from step 1 calls getAlterMutationCommandsForPart() for part all_1_1_1.
  6. getAlterMutationCommandsForPart() returns both renames: A->B, B->C.
  7. AlterConversions gets confused (with or without this PR), and the SELECT fails.
  8. Even if AlterConversions could handle A->B->C correctly, this SELECT would still fail because it expects data to match the schema from the metadata snapshot obtained in step 1, so column C would be unexpected.

Collaborator Author

Non-mutated part (all_1_1_1) should not have any mutations (i.e. getAlterConversionsForPart should return nothing), so it should be OK.

Member

getAlterConversionsForPart() -> getAlterMutationCommandsForPart() takes all mutations that happened in the table (current_mutations_by_version) after this part was created (part_data_version). For all_1_1_1 that would be both of the renames. On the SELECT path, getAlterConversionsForPart() is called from MergeTreeData::getStorageSnapshot().
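
The selection rule described here can be sketched as follows. This is a simplified model, assuming mutations are kept in a map ordered by version; `MutationsByVersion` and `mutationsAfter` are hypothetical names, and the command is reduced to a string.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: mutation version -> textual description of the command.
using MutationsByVersion = std::map<int64_t, std::string>;

// All mutations with a version strictly greater than the part's data version.
std::vector<std::string> mutationsAfter(const MutationsByVersion & mutations, int64_t part_data_version)
{
    std::vector<std::string> result;
    // upper_bound() returns the first entry whose key is > part_data_version.
    for (auto it = mutations.upper_bound(part_data_version); it != mutations.end(); ++it)
        result.push_back(it->second);
    return result;
}
```

In this model, a part with data version 1 and two renames at versions 2 and 3 gets both renames back, regardless of which schema the running SELECT captured.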

Reproduced this scenario by adding a sleep in the middle of MergeTreeData::getStorageSnapshot() and doing ALTER while SELECT is stuck. The SELECT fails even with one ALTER, presumably because the metadata snapshot expects the old column name, while AlterConversions produce the new name:

:) desc (select * from a); select * from a;

DESCRIBE TABLE
(
    SELECT *
    FROM a
)

Query id: 0af8ba13-838c-4a21-9337-c0a1c2800636

┌─name─┬─type──┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ k    │ Int64 │              │                    │         │                  │                │
│ x    │ Int64 │              │                    │         │                  │                │
└──────┴───────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

2 rows in set. Elapsed: 0.001 sec. 


SELECT *
FROM a

Query id: b42a9aca-0d37-43bc-92f8-2c001efd203f


Elapsed: 10.014 sec. 

Received exception from server (version 24.2.1):
Code: 47. DB::Exception: Received from localhost:9000. DB::Exception: Missing columns: 'z' while processing query: 'SELECT k, z FROM a', required columns: 'k' 'z', maybe you meant: 'k' or 'z'. (UNKNOWN_IDENTIFIER)

(Also directly confirmed that the getAlterMutationCommandsForPart() returns 2 commands in this case, if there were 2 ALTERs.)

So I guess my understanding was correct.

This is all mostly unrelated to this PR, I just wanted to figure out if this code is as broken as it looks.

Collaborator Author

Oh, yes, you are right. My comment was wrong, it should be the opposite: all_1_1_1 should have mutations, so yes, the SELECT will fail.

I guess this can be fixed by looking at which metadata version had been used for the SELECT, but this is a completely different story...
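
One possible shape of such a fix, purely as an illustration: nothing like this is part of this PR, and `PendingRename`, its `metadata_version` field and `renamesVisibleTo` are assumptions made up for the sketch. The idea is to ignore renames produced by ALTERs newer than the schema the SELECT captured.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical: a pending rename tagged with the schema (metadata) version
// that the corresponding ALTER produced.
struct PendingRename
{
    int64_t metadata_version;
    std::string from;
    std::string to;
};

// Keep only renames that are already reflected in the schema the SELECT uses.
std::vector<PendingRename> renamesVisibleTo(
    const std::vector<PendingRename> & pending, int64_t snapshot_metadata_version)
{
    std::vector<PendingRename> visible;
    for (const auto & rename : pending)
        if (rename.metadata_version <= snapshot_metadata_version)
            visible.push_back(rename);
    return visible;
}
```

In the scenario above, a SELECT that captured the schema before either ALTER would see no renames and keep reading the old column name, which matches its metadata snapshot.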

@azat azat force-pushed the alter-select-throughtput branch 3 times, most recently from 9531690 to 1253a73 Compare February 11, 2024 14:53
@azat
Collaborator Author

azat commented Feb 11, 2024

AST fuzzer (asan) — Logical error: 'Cannot capture column 2 because it has incompatible type: got UInt8, but Nullable(UInt8) is expected.'.

Stateless tests (debug) [2/5] — fail: 3, passed: 1174, skipped: 10

Timeouts:

  • 02782_uniq_exact_parallel_merging_bug
  • 01459_manual_write_to_replicas
  • 01171_mv_select_insert_isolation_long

@azat azat changed the title [RFC] Improve performance of SELECTs with active mutations Improve performance of SELECTs with active mutations Feb 12, 2024
@azat azat requested a review from al13n321 February 15, 2024 11:59
@al13n321
Member

Oops, turns out the cloud version of CH actually implements other mutation types in AlterConversions, not just RENAME_COLUMN (https://clickhouse.com/docs/en/guides/developer/lightweight-update).

I pushed a commit to this PR that adds AlterConversions::supportsMutationCommandType() and uses it instead of directly checking for RENAME_COLUMN in getAlterMutationCommandsForPart(), to make it less error-prone to maintain. And also flattened vector<MutationCommands> to just MutationCommands along the way. Feel free to review the changes, or revert them and push something else, or whatever.
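
A simplified sketch of the shape of that change, not the actual commit: the types below are stand-ins, `AlterConversionsSketch` and `collectSupported` are hypothetical names, and the set of supported types is an assumption (only renames, as in the open-source code discussed above).

```cpp
#include <vector>

// Simplified stand-ins for the real types.
enum class MutationType { Delete, Update, RenameColumn };
struct MutationCommand { MutationType type; };
using MutationCommands = std::vector<MutationCommand>;

struct AlterConversionsSketch
{
    // Single place that decides which command types the reader can apply
    // on the fly; callers ask here instead of hard-coding RenameColumn.
    static bool supportsMutationCommandType(MutationType type)
    {
        return type == MutationType::RenameColumn;
    }
};

// Flatten per-mutation command lists into one list of supported commands.
MutationCommands collectSupported(const std::vector<MutationCommands> & per_mutation)
{
    MutationCommands result;
    for (const auto & commands : per_mutation)
        for (const auto & command : commands)
            if (AlterConversionsSketch::supportsMutationCommandType(command.type))
                result.push_back(command);
    return result;
}
```

Keeping the check in one function means a build that supports more command types only has to change that function, not every call site.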

@azat
Collaborator Author

azat commented Feb 16, 2024

Thanks, LGTM! Though clang-tidy fails:

Feb 16 03:10:25 /build/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp:1814:41: error: Dereference of undefined pointer value [clang-analyzer-core.NullDereference,-warnings-as-errors]
Feb 16 03:10:25 1814 | for (const auto & command : mutation_status->entry->commands | std::views::reverse)

@al13n321
Member

Huh, that warning makes absolutely no sense to me. Shuffled the code a little to work around it, guess it's a clang issue.

azat and others added 4 commits February 21, 2024 12:54
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
@al13n321 al13n321 merged commit a4f765c into ClickHouse:master Feb 22, 2024
232 of 260 checks passed
@azat azat deleted the alter-select-throughtput branch February 22, 2024 08:52
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Feb 22, 2024
baibaichen added a commit to Kyligence/gluten that referenced this pull request Feb 23, 2024
baibaichen added a commit to Kyligence/gluten that referenced this pull request Feb 24, 2024
baibaichen added a commit to Kyligence/gluten that referenced this pull request Feb 25, 2024
(cherry picked from commit 666208f)
(cherry picked from commit 60f3415)
baibaichen added a commit to Kyligence/gluten that referenced this pull request Feb 25, 2024
(cherry picked from commit 666208f)
(cherry picked from commit 60f3415)
(cherry picked from commit f073c5b)
zzcclp pushed a commit to apache/incubator-gluten that referenced this pull request Feb 25, 2024
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240225)

* Add custom target libch to avoid name conflict(due to ClickHouse/ClickHouse#58609)

(cherry picked from commit 0f6281d)
(cherry picked from commit 46c4f8f)

* Revert workaround for fixing bug introduced by ClickHouse/ClickHouse#58886 since ClickHouse/ClickHouse#59911 reverts #58886

(cherry picked from commit 7708bb3)
(cherry picked from commit 4da4f72)

* Fix array_max and array_min due to ClickHouse/ClickHouse#60188

(cherry picked from commit e576f70)
(cherry picked from commit d032fde)

* Fix build CustomStorageMergeTree.h due to ClickHouse/ClickHouse#59531

(cherry picked from commit 666208f)
(cherry picked from commit 60f3415)
(cherry picked from commit f073c5b)

* Fix build CustomStorageMergeTree.cpp due to ClickHouse/ClickHouse#60159

(cherry picked from commit 7cf074a)
(cherry picked from commit a797ba7)

* Fix ActionDAG bug introduced by ClickHouse/ClickHouse#58554

(cherry picked from commit 3432fda)
(cherry picked from commit 066271703498828fa98b31d75bc2dbc27967f78b)
(cherry picked from commit 8d78221)
(cherry picked from commit 3d34c62)

---------

Co-authored-by: kyligence-git <gluten@kyligence.io>
Co-authored-by: Chang Chen <baibaichen@gmail.com>