Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

checksum: do not check inverted index files #47607

Merged
merged 2 commits into from Mar 16, 2023

Conversation

save-my-heart
Copy link
Contributor

MergeTreeDataPartChecksums::checkSize will check file size while loading parts, but inverted index files only have default MergeTreeDataPartChecksum(file_size = 0), these parts will fail to load.

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-not-for-changelog This PR should not be mentioned in the changelog label Mar 15, 2023
@rschu1ze rschu1ze self-assigned this Mar 15, 2023
@rschu1ze rschu1ze added the can be tested Allows running workflows for external contributors label Mar 15, 2023
@@ -46,6 +46,10 @@ void MergeTreeDataPartChecksum::checkEqual(const MergeTreeDataPartChecksum & rhs

void MergeTreeDataPartChecksum::checkSize(const IDataPartStorage & storage, const String & name) const
{
/// Do not check inverted index files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move l. 50+51 down to l. 55? (the exception for inverted indexes is the least important one of the three if's)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kindly add a reason to the comment in l. 49, e.g. /// Skip inverted index files, these have a default MergeTreeDataPartChecksum with file_size == 0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind, I would like to understand how the database can end up with a checksum with file_size == 0 for inverted indexes (e.g. which methods are called in which order). Asking because I never encountered this error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be great if you could add a simple end-to-end stateless test (tests/queries/0_stateless/) which makes sure the situation doesn't occur again.

@save-my-heart
Copy link
Contributor Author

@rschu1ze I just built clickhouse on latest master branch(commit id: 8e0a2a3), then:

  1. create a table with inverted index.
  2. insert some data.
  3. stop clickhouse server and then start it again.

The data can not be loaded, and there are some error in logs as ...gin_dict has unexpected size: 44 instead of 0. (BAD_SIZE_OF_FILE_IN_DATA_PART)...

@save-my-heart
Copy link
Contributor Author

@rschu1ze I should put the if at the top of three, because when I:

  1. create a table with inverted index.
  2. insert some data.
  3. detach the previous parts
  4. stop clickhouse server and then start again

There are some errors in logs as ...gin_dict doesn't exist. (FILE_DOESNT_EXIST)....
Because detach part will generate an empty part, but without .gin_dict, .gin_post, .gin_seg or .gin_sid files. (so we can also create these files while we generating an empty part?)

@rschu1ze
Copy link
Member

Alright, that makes sense. Thanks a ton! I can confirm after local testing that the test indeed turns green with your fix.

@rschu1ze rschu1ze merged commit 02a2c73 into ClickHouse:master Mar 16, 2023
@save-my-heart save-my-heart deleted the inverted_index branch March 17, 2023 02:12
@cangyin
Copy link
Contributor

cangyin commented Mar 19, 2023

inverted index did introduce some special treatmemts.

@cangyin
Copy link
Contributor

cangyin commented Jul 25, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
can be tested Allows running workflows for external contributors pr-not-for-changelog This PR should not be mentioned in the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants