Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A better alternative to #46344 #46921

Merged
merged 2 commits into from
Feb 27, 2023
Merged

A better alternative to #46344 #46921

merged 2 commits into from
Feb 27, 2023

Conversation

rschu1ze
Copy link
Member

@rschu1ze rschu1ze commented Feb 26, 2023

The (experimental) inverted index writes/reads files different from the standard files written by the other skip indexes. The original problem was that with database engine "ordinary", DROP TABLE of a table with inverted index finds unknown files in persistence and complains. The same will happen with engine "atomic" but deferred. As a hotfix, the error was silenced in #46344 by explicitly adding the four files created in a specific test to the deletion code.

This PR tries a cleaner solution where all needed files are provided via the normal checksum mechanism. One drawback remains which is that the affected files were written earlier and we don't have their checksums available. Therefore, the inverted index is currently excluded from CHECK TABLE (but it was like that also before).

Minimal repro:

  SET allow_experimental_inverted_index = 1;
  DROP TABLE IF EXISTS tab;
  CREATE TABLE tab(s String, INDEX af(s) TYPE inverted(2)) ENGINE = MergeTree() ORDER BY s;
  INSERT INTO tab VALUES ('Alick a01');
  CHECK TABLE tab;
  DROP TABLE IF EXISTS tab;

Then run ./clickhouse-test with --db-engine Ordinary

Changelog category (leave one):

  • Not for changelog

The (experimental) inverted index writes/reads files different from the
standard files written by the other skip indexes. The original problem
was that with database engine "ordinary", DROP TABLE of a table with
inverted index finds unknown files in persistence and complains. The
same will happen with engine "atomic" but deferred. As a hotfix, the
error was silenced by explicitly adding the four files created in a
specific test to the deletion code.

This PR tries a cleaner solution where all needed files are provided via
the normal checksum structure. One drawback remains which is that the
affected files were written earlier and we don't have their checksums
available. Therefore, the inverted index is currently excluded from
CHECK TABLE.

Minimal repro:
  SET allow_experimental_inverted_index = 1;
  DROP TABLE IF EXISTS tab;
  CREATE TABLE tab(s String, INDEX af(s) TYPE inverted(2)) ENGINE = MergeTree() ORDER BY s;
  INSERT INTO tab VALUES ('Alick a01');
  CHECK TABLE tab;
  DROP TABLE IF EXISTS tab;

  run ./clickhouse-test with --db-engine Ordinary
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-not-for-changelog This PR should not be mentioned in the changelog label Feb 26, 2023
@UnamedRus
Copy link
Contributor

Therefore, the inverted index is currently excluded from CHECK TABLE (but it was like that also before).

AFAIK, ClickHouse recalculate checksums if some files does not exist in checksum.txt.

So it's completely OK?

@rschu1ze
Copy link
Member Author

AFAIK, ClickHouse recalculate checksums if some files does not exist in checksum.txt.

True.

So it's completely OK?

Well, "Storages/MergeTree/MergeTreeDataPartChecksum.cpp" was changed to suppress that. My local testing was green. The whole situation isn't great right now but better than before and hopefully only an interim solution.

@tavplubix
Copy link
Member

It would be great to add a test with a mutation that hardlinks all files (to check that it keeps/fills the checksums)

@rschu1ze rschu1ze merged commit 6f07090 into master Feb 27, 2023
@rschu1ze rschu1ze deleted the rs/inv-idx-checksum branch February 27, 2023 13:44
@tavplubix tavplubix self-assigned this Feb 27, 2023
@rschu1ze
Copy link
Member Author

Can do that for sure. But I don't really understand how I could make a mutation hardlink the files. Is there a setting for that?

@tavplubix
Copy link
Member

No, it depends on what files have to be modified by a mutation. For example, when you run ALTER ... UPDATE col1 = ..., it rewrites col1 and hardlinks other columns (and indexes that do not depend on col1). And ALTER ... DROP COLUMN col1 hardlinks everything except col1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-not-for-changelog This PR should not be mentioned in the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants