Improve CHECK TABLE system query #52745

Merged
merged 7 commits into from Aug 10, 2023

@vdimir vdimir commented Jul 28, 2023

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Query `CHECK TABLE` has better performance and usability (sends progress updates, cancellable)

TODO

  • Document this query
  • Add test for progress and cancellation

@vdimir vdimir force-pushed the vdimir/check_table_improvements branch from a4b2a73 to d515b6f Compare July 28, 2023 16:05
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-improvement Pull request with some product improvements label Jul 28, 2023
@robot-clickhouse-ci-2
Contributor

robot-clickhouse-ci-2 commented Jul 28, 2023

This is an automated comment for commit 1183dac with description of existing statuses. It's updated for the latest CI running
The full report is available here
The overall status of the commit is 🔴 failure

| Check name | Description | Status |
|---|---|---|
| AST fuzzer | Runs randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help | 🟢 success |
| CI running | A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR | 🟢 success |
| ClickHouse build check | Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process | 🟢 success |
| Compatibility check | Checks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | 🟢 success |
| Docker image for servers | The check to build and optionally push the mentioned image to docker hub | 🟢 success |
| Docs Check | Builds and tests the documentation | 🟢 success |
| Fast test | Normally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here | 🟢 success |
| Flaky tests | Checks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integrational tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc | 🟢 success |
| Install packages | Checks that the built packages are installable in a clear environment | 🟢 success |
| Integration tests | The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests | 🟢 success |
| Mergeable Check | Checks if all other necessary checks are successful | 🟢 success |
| Performance Comparison | Measure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests | 🔴 failure |
| Push to Dockerhub | The check for building and pushing the CI related docker images to docker hub | 🟢 success |
| SQLancer | Fuzzing tests that detect logical bugs with SQLancer tool | 🟢 success |
| Sqllogic | Run clickhouse on the sqllogic test set against sqlite and checks that all statements are passed | 🟢 success |
| Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | 🟢 success |
| Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | 🟢 success |
| Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | 🟢 success |
| Style Check | Runs a set of checks to keep the code style clean. If some of tests failed, see the related log from the report | 🟢 success |
| Unit tests | Runs the unit tests for different release types | 🟢 success |
| Upgrade check | Runs stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts | 🟢 success |

@nikitamikhaylov
Member

I'm not sure, but I thought this query was single-threaded. Is that true? Can we parallelize it?

@alesapin alesapin self-assigned this Jul 31, 2023
Member

@alesapin alesapin left a comment

I think we have tests for progress; maybe it makes sense to add one here.

src/Interpreters/InterpreterCheckQuery.cpp (outdated, resolved)
src/Interpreters/InterpreterCheckQuery.cpp (outdated, resolved)
src/Interpreters/InterpreterCheckQuery.cpp (outdated, resolved)

check_results.push_back(check_result);

bool should_continue = check_result.success || !check_query_single_value_result;
Member

Isn't it a semantic change? I think we should process all data parts regardless of check_query_single_value_result.

Member Author

@vdimir vdimir Aug 1, 2023

I assumed an early-exit mode would be useful and thought we could add it, since with a single-value result the user cannot tell how many errors there were anyway. However, that is debatable, so I restored the original behavior.
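
For readers following the thread, the difference between the two behaviors can be sketched as below. This is a minimal illustration, not the code from this PR: `CheckResult`, `processParts`, and the `early_exit_on_failure` flag are hypothetical names, and the per-part results are assumed to be precomputed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-part check result (not the ClickHouse type).
struct CheckResult
{
    std::string part_name;
    bool success = false;
};

// Walks over the per-part results and returns how many parts were examined.
// With early_exit_on_failure == false every part is processed (the original
// behavior that was restored); with true the loop stops at the first failure.
std::size_t processParts(const std::vector<CheckResult> & parts, bool early_exit_on_failure, bool & all_ok)
{
    all_ok = true;
    std::size_t processed = 0;
    for (const auto & part : parts)
    {
        ++processed;
        if (!part.success)
        {
            all_ok = false;
            if (early_exit_on_failure)
                break;
        }
    }
    return processed;
}
```

In both modes the single-value answer (`all_ok`) is the same; only the amount of work done after the first failure differs.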

src/Interpreters/InterpreterCheckQuery.cpp Outdated Show resolved Hide resolved
@vdimir vdimir force-pushed the vdimir/check_table_improvements branch 3 times, most recently from 75948b2 to 5f14590 Compare August 1, 2023 13:08
@vdimir
Member Author

vdimir commented Aug 1, 2023

@alesapin

I updated the implementation to build a more sophisticated pipeline and use its features properly.

This made it possible to eliminate std::async and sleep entirely and to execute everything in the pipeline executor pool. The query is now also naturally parallelized, like any other query.

I split checkData into two methods:
The first obtains a container with the list of work items needed to check the table (parts or files, depending on the engine), and the second executes one entry from that list.
We first retrieve the full set of parts to check and then execute its entries concurrently on several processors.
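
The two-step split described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `CheckTaskList`, `checkOne`, and `checkAll` are hypothetical names, plain `std::thread` workers stand in for the pipeline processors, and the "check" itself is a placeholder that treats any name containing `broken` as a failure.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-entry check result (not the ClickHouse type).
struct CheckResult
{
    std::string name;
    bool success = false;
};

// Step 1 output (hypothetical): the list of parts or files to verify.
using CheckTaskList = std::vector<std::string>;

// Step 2: check exactly one entry from the list.
// Placeholder logic: a name containing "broken" is reported as failed.
CheckResult checkOne(const std::string & name)
{
    return {name, name.find("broken") == std::string::npos};
}

// Runs checkOne over the whole list using several workers.
std::vector<CheckResult> checkAll(const CheckTaskList & tasks, std::size_t num_workers)
{
    std::vector<CheckResult> results(tasks.size());
    std::atomic<std::size_t> next{0};

    auto worker = [&]
    {
        // Each worker claims the next unprocessed index until the list is exhausted.
        // Workers write to distinct slots, so no extra locking is needed.
        for (std::size_t i = next.fetch_add(1); i < tasks.size(); i = next.fetch_add(1))
            results[i] = checkOne(tasks[i]);
    };

    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < num_workers; ++w)
        pool.emplace_back(worker);
    for (auto & t : pool)
        t.join();

    return results;
}
```

Because the task list is fixed before any worker starts, results keep the order of the list regardless of which worker handled which entry.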

I also extended the docs (this query was already documented, so I added more information and examples) and added a test for progress.

@vdimir vdimir force-pushed the vdimir/check_table_improvements branch from 5f14590 to de315d3 Compare August 1, 2023 14:28
@vdimir vdimir requested a review from alesapin August 1, 2023 14:28
@vdimir vdimir force-pushed the vdimir/check_table_improvements branch from de315d3 to bb22394 Compare August 2, 2023 09:37
src/Interpreters/InterpreterCheckQuery.cpp Outdated Show resolved Hide resolved

Chunk generate() override
{
if (is_valuer_emitted.exchange(true))
Member

We return the chunk only once, right?

Member Author

Yes, only one block with a single row is expected in the result.
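
The emit-once pattern discussed here can be sketched in isolation as below. This is a simplified illustration, not the class from this PR: `SingleValueSource` is a hypothetical name, it does not implement ClickHouse's source interface, and `std::optional<bool>` stands in for a chunk holding one row.

```cpp
#include <atomic>
#include <cassert>
#include <optional>

// Hypothetical simplified source that yields a single value exactly once.
class SingleValueSource
{
public:
    explicit SingleValueSource(bool value_) : value(value_) {}

    // Returns the value on the first call and std::nullopt on every later call.
    std::optional<bool> generate()
    {
        // exchange(true) atomically sets the flag and returns its previous state,
        // so only the first caller observes false, even under concurrent calls.
        if (is_value_emitted.exchange(true))
            return std::nullopt;
        return value;
    }

private:
    bool value;
    std::atomic<bool> is_value_emitted{false};
};
```

Using `exchange` rather than a separate load and store keeps the check and the update in one atomic step, so two callers cannot both see the flag as unset.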

src/Storages/StorageReplicatedMergeTree.cpp Outdated Show resolved Hide resolved
@vdimir vdimir force-pushed the vdimir/check_table_improvements branch from bb22394 to ceaee43 Compare August 4, 2023 09:22
@vdimir
Member Author

vdimir commented Aug 7, 2023

Docs Check — Found errors in docs Details


[ERROR] Error: Docusaurus found broken links!

Please check the pages of your site in the list below, and make sure you don't reference any path that does not exist.
Note: it's possible to ignore broken links with the 'onBrokenLinks' Docusaurus configuration, and let the build pass.

Exhaustive list of all broken links found:

- On source page path = /docs/en/sql-reference/statements/check-table:
   -> linking to ../../operations/settings/settings.md#settings-max_threads (resolved as: /docs/en/operations/settings/settings.md)

- On source page path = /docs/zh/sql-reference/statements/check-table:
   -> linking to ../../operations/settings/settings.md#settings-max_threads (resolved as: /docs/zh/operations/settings/settings.md)

The path ../../operations/settings/settings.md#settings-max_threads should be ok

$ cd docs/en/sql-reference/statements
$ cat ../../operations/settings/settings.md | grep -F settings-max_threads
## max_threads {#settings-max_threads}
Parallel `INSERT SELECT` has effect only if the `SELECT` part is executed in parallel, see [max_threads](#settings-max_threads) setting.
$ 

@vdimir vdimir force-pushed the vdimir/check_table_improvements branch from ceaee43 to dc98d33 Compare August 7, 2023 11:57
@vdimir vdimir force-pushed the vdimir/check_table_improvements branch from dc98d33 to 1183dac Compare August 8, 2023 09:45
@alesapin alesapin merged commit 57025ee into master Aug 10, 2023
278 of 279 checks passed
@alesapin alesapin deleted the vdimir/check_table_improvements branch August 10, 2023 09:48
@tavplubix
Copy link
Member

vdimir added a commit that referenced this pull request Aug 14, 2023
vdimir added a commit that referenced this pull request Oct 23, 2023