Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bloom filters for String (data skipping indices) #11638

Merged
merged 3 commits into from Jun 15, 2020

Conversation

azat
Copy link
Collaborator

@azat azat commented Jun 12, 2020

Changelog category (leave one):

  • Bug Fix

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix bloom filters for String (data skipping indices)

Detailed description / Documentation draft:
bloom filter was broken for the first element, if all of the following
conditions satisfied:

  • they are created on INSERT (in thie case bloom filter hashing uses
    offsets, in case of OPTIMIZE it does not, since it already has
    granulars).
  • the record is not the first in the block
  • the record is the first per index_granularity (do not confuse this
    with data skipping index GRANULARITY).
  • type of the field for indexing is "String" (not FixedString)

Because in this case there was incorrect length and data for that string.

Fixes: #11634
Cc: @filimonov

bloom filter was broken for the first element, if all of the following
conditions satisfied:
- they are created on INSERT (in thie case bloom filter hashing uses
  offsets, in case of OPTIMIZE it does not, since it already has
  granulars).
- the record is not the first in the block
- the record is the first per index_granularity (do not confuse this
  with data skipping index GRANULARITY).
- type of the field for indexing is "String" (not FixedString)

Because in this case there was incorrect length and *data* for that string.
@blinkov blinkov added the pr-bugfix Pull request with bugfix, not backported by default label Jun 12, 2020
UInt64 city_hash = CityHash_v1_0_2::CityHash64(
reinterpret_cast<const char *>(&data[current_offset]), offsets[index + pos] - current_offset - 1);
reinterpret_cast<const char *>(&data[current_offset]), length);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking about this, there are tons of such places, but there is getDataAt, and asm looks the same (at least after a quick look), maybe complexity does not worth it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this may be because getDataAt is a virtual function

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! But AFAICS the getDataAt may be marked final, and I guess it is always used after casting to some column type so compiler will optimize this (like here)

Although there can be cases when this will not happen I guess, and in this case explicit writing preferable.

Anyway this is just a thought, not related to this bugfix.

The expected values was incorrect, since for strings we have 1 and 10
and there will be at least two index granulas, hence 12 rows.
@azat
Copy link
Collaborator Author

azat commented Jun 12, 2020

Cc: @zhang2014

@azat azat marked this pull request as draft June 12, 2020 23:17
@zhang2014
Copy link
Contributor

LGTM

…g_multi_granulas

This better reflects the covered case.
@azat azat marked this pull request as ready for review June 13, 2020 11:21
@akuzm
Copy link
Contributor

akuzm commented Jun 15, 2020

Build failure and perftest error are CI problem.

@akuzm akuzm merged commit e460d7c into ClickHouse:master Jun 15, 2020
@akuzm akuzm self-assigned this Jun 15, 2020
@azat azat deleted the skip-idx-bloom-filter-fix branch June 15, 2020 17:36
vitlibar pushed a commit that referenced this pull request Jun 19, 2020
Fix bloom filters for String (data skipping indices)

(cherry picked from commit e460d7c)
vitlibar pushed a commit that referenced this pull request Jun 19, 2020
Fix bloom filters for String (data skipping indices)

(cherry picked from commit e460d7c)
vitlibar pushed a commit that referenced this pull request Jun 20, 2020
Fix bloom filters for String (data skipping indices)

(cherry picked from commit e460d7c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-docs-needed pr-bugfix Pull request with bugfix, not backported by default
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bloom filter issue
6 participants