Fix bloom filters for String (data skipping indices) #11638
Conversation
The bloom filter was broken for the first element if all of the following conditions are satisfied:
- it is created on INSERT (in this case bloom filter hashing uses offsets; in case of OPTIMIZE it does not, since it already has granules);
- the record is not the first in the block;
- the record is the first per index_granularity (not to be confused with the data skipping index GRANULARITY);
- the type of the indexed field is String (not FixedString).

In this case both the length and the *data* for that string were incorrect.
      UInt64 city_hash = CityHash_v1_0_2::CityHash64(
-         reinterpret_cast<const char *>(&data[current_offset]), offsets[index + pos] - current_offset - 1);
+         reinterpret_cast<const char *>(&data[current_offset]), length);
Speaking of this, there are tons of such places, but there is getDataAt, and the asm looks the same (at least after a quick look), so maybe the complexity is not worth it?
I understand that this may be because getDataAt is a virtual function.
Good point! But AFAICS getDataAt may be marked final, and I guess it is always used after casting to a concrete column type, so the compiler will optimize this (like here).
Although there may be cases where this does not happen, and in those cases writing it out explicitly is preferable.
Anyway, this is just a thought, not related to this bugfix.
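As a hypothetical illustration of the devirtualization point above (the class and function names here are invented, not the real ClickHouse interfaces): when the most-derived class is marked final and the call site sees that concrete type, the compiler knows exactly which override runs and can devirtualize and inline the call.

```cpp
#include <cstddef>
#include <string_view>

// Invented stand-in for an IColumn-like interface.
struct IColumnLike
{
    virtual std::string_view getDataAt(size_t n) const = 0;
    virtual ~IColumnLike() = default;
};

// `final` tells the compiler no further overrides of getDataAt can exist.
struct StringColumnLike final : IColumnLike
{
    std::string_view data;
    std::string_view getDataAt(size_t) const override { return data; }
};

size_t firstLength(const StringColumnLike & col)
{
    // Static type is the final class, so no vtable dispatch is required here;
    // the compiler can resolve and inline the call.
    return col.getDataAt(0).size();
}
```

This is why the generated asm can look identical to hand-written direct access in the common case, while still falling back to virtual dispatch through an IColumn pointer.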
The expected values were incorrect: for strings we have 1 and 10, and there will be at least two index granules, hence 12 rows.
Cc: @zhang2014
LGTM
…g_multi_granulas. This better reflects the covered case.
Build failure and perftest error are CI problems.
Fix bloom filters for String (data skipping indices) (cherry picked from commit e460d7c)
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix bloom filters for String (data skipping indices)
Detailed description / Documentation draft:
The bloom filter was broken for the first element if all of the following conditions are satisfied:
- it is created on INSERT (in this case bloom filter hashing uses offsets; in case of OPTIMIZE it does not, since it already has granules);
- the record is not the first in the block;
- the record is the first per index_granularity (not to be confused with the data skipping index GRANULARITY);
- the type of the indexed field is String (not FixedString).

Because in this case there was an incorrect length and data for that string.
Fixes: #11634
Cc: @filimonov