Bloom filter indices #4499
Some comparisons:
/// Builds reverse polish notation
template <typename RPNElement>
class RPNBuilder
It could also be used in KeyCondition (RPNBuilder is simply a copy of some of KeyCondition's functions), but doing so fails the performance tests: https://clickhouse-test-reports.s3.yandex.net/4499/fcb82ba901651b73229a0be1bbd71fba308a4d57/performance_test.html
(For BloomFilterIndex, I have not noticed any performance difference between the copy-pasted functions and RPNBuilder.)
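To illustrate what an RPN of index conditions is evaluated for, here is a minimal sketch in the spirit of KeyCondition: elements are pushed onto or combined on a stack, and a granule can be skipped only if the whole condition cannot be true. The names `RPNKind`, `SketchRPNElement` and `mayBeTrueInGranule` are hypothetical and do not correspond to the RPNElement types actually used by RPNBuilder or KeyCondition (C++14 or later).

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, simplified element kinds for a KeyCondition-style RPN.
enum class RPNKind { Unknown, AlwaysFalse, Atom, Not, And, Or };

struct SketchRPNElement {
    RPNKind kind;
    bool atom_may_be_true = true; // for Atom: what the index (e.g. a bloom filter) reported
};

// Result of a (sub)condition on one granule:
// can_be_true  - some row in the granule may satisfy it
// can_be_false - some row in the granule may not satisfy it
struct MaybeBool {
    bool can_be_true;
    bool can_be_false;
};

// Evaluates the RPN with a stack. Returns false only if no row in the granule
// can satisfy the condition, i.e. the granule can be skipped.
bool mayBeTrueInGranule(const std::vector<SketchRPNElement>& rpn) {
    std::vector<MaybeBool> stack;
    for (const auto& e : rpn) {
        switch (e.kind) {
            case RPNKind::Unknown:
                stack.push_back({true, true}); // condition the index cannot use
                break;
            case RPNKind::AlwaysFalse:
                stack.push_back({false, true});
                break;
            case RPNKind::Atom:
                // A bloom filter can prove absence ("false") or say "maybe present".
                stack.push_back({e.atom_may_be_true, true});
                break;
            case RPNKind::Not: {
                if (stack.empty()) throw std::logic_error("malformed RPN");
                auto& top = stack.back();
                std::swap(top.can_be_true, top.can_be_false);
                break;
            }
            case RPNKind::And:
            case RPNKind::Or: {
                if (stack.size() < 2) throw std::logic_error("malformed RPN");
                MaybeBool rhs = stack.back();
                stack.pop_back();
                MaybeBool lhs = stack.back();
                stack.pop_back();
                if (e.kind == RPNKind::And)
                    stack.push_back({lhs.can_be_true && rhs.can_be_true,
                                     lhs.can_be_false || rhs.can_be_false});
                else
                    stack.push_back({lhs.can_be_true || rhs.can_be_true,
                                     lhs.can_be_false && rhs.can_be_false});
                break;
            }
        }
    }
    if (stack.size() != 1) throw std::logic_error("malformed RPN");
    return stack.back().can_be_true;
}
```

For example, the RPN `a b AND` where the bloom filter rules out `b` evaluates to false, so the granule is skipped; `a b OR` with the same inputs must still be read. Wrapping an atom in `NOT` always yields "may be true", because a bloom filter can only prove that a value is absent.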
Comparison between no index and an ngram index (50 runs each)
Comparison for insert (`insert into ... select * from datasets.hits_v1`); every run processed 8.87 million rows, 8.46 GB:

| Indices | Elapsed | Throughput |
|---|---|---|
| no index | 20.463 sec | 433.65 thousand rows/s, 413.45 MB/s |
| 3 x ngrambf(3, 512, 2, 0) (URLDomain, SearchPhrase, Title) | 42.752 sec | 207.57 thousand rows/s, 197.90 MB/s |
| 1 x ngrambf(4, 512, 1, 0) (Title) | 35.820 sec | 247.74 thousand rows/s, 236.19 MB/s |
| 1 x tokenbf(512, 1, 0) (Title) | 27.339 sec | 324.59 thousand rows/s, 309.47 MB/s |

https://gist.github.com/nikvas0/c60ecb9c37d4a61b4cd924090ff6a806
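The index definitions in the insert comparison take the form `ngrambf(n, size, hashes, seed)` and `tokenbf(size, hashes, seed)`. The sketch below assumes these are the n-gram length, the filter size in bytes, the number of hash functions and the random seed (the parameter order used by the later `ngrambf_v1` / `tokenbf_v1` documentation). It shows where the extra insert cost comes from: every n-gram of every indexed value is hashed `hashes` times into a bit array. The class name `NgramBloomFilterSketch` and its hashing scheme are hypothetical, not the ones ClickHouse uses (C++17 for `std::string_view`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string_view>
#include <vector>

// Minimal n-gram bloom filter. Assumes size_bytes > 0.
class NgramBloomFilterSketch {
public:
    NgramBloomFilterSketch(size_t n, size_t size_bytes, size_t hash_count, uint64_t seed)
        : n_(n), bits_(size_bytes * 8, false), hash_count_(hash_count), seed_(seed) {}

    // Adds every n-gram of a value (e.g. one row's Title) to the filter.
    void add(std::string_view value) {
        forEachNgram(value, [this](std::string_view gram) {
            for (size_t i = 0; i < hash_count_; ++i) bits_[bitIndex(gram, i)] = true;
        });
    }

    // For LIKE '%needle%': false means no added value can contain needle;
    // true means "maybe" (false positives are possible, false negatives are not).
    bool mayContainSubstring(std::string_view needle) const {
        if (needle.size() < n_) return true; // no n-gram to check: cannot skip
        bool all_present = true;
        forEachNgram(needle, [&](std::string_view gram) {
            for (size_t i = 0; i < hash_count_; ++i)
                if (!bits_[bitIndex(gram, i)]) all_present = false;
        });
        return all_present;
    }

private:
    template <typename F>
    void forEachNgram(std::string_view s, F&& f) const {
        for (size_t pos = 0; pos + n_ <= s.size(); ++pos) f(s.substr(pos, n_));
    }

    // Derives the i-th hash by mixing std::hash with the seed and index (illustrative only).
    size_t bitIndex(std::string_view gram, size_t i) const {
        uint64_t mixed = std::hash<std::string_view>{}(gram) ^ (seed_ + 0x9e3779b97f4a7c15ULL * (i + 1));
        mixed ^= mixed >> 33;
        mixed *= 0xff51afd7ed558ccdULL;
        mixed ^= mixed >> 33;
        return static_cast<size_t>(mixed % bits_.size());
    }

    size_t n_;
    std::vector<bool> bits_;
    size_t hash_count_;
    uint64_t seed_;
};
```

With `NgramBloomFilterSketch f(3, 512, 2, 0)` (a 4096-bit filter), `f.add("funny cats video")` makes `f.mayContainSubstring("cats")` return true, while an empty filter returns false for the same needle. A needle shorter than `n` produces no n-grams, so such a query can never skip a granule.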
Short description:
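A new type of data skipping indices based on bloom filters (can be used for the `equals`, `in` and `like` functions). For example, with a `tokenbf` index the query `WHERE s LIKE '%cats%'` will use a full scan, because `cats` may be part of a longer token, but the query `WHERE s LIKE '%.cats.%'` will use the `cats` token.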
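The difference between the two `LIKE` patterns can be sketched as follows, assuming tokens are maximal runs of ASCII alphanumeric characters. A token from the pattern is only usable if it is bounded on both sides by a literal separator or by the edge of the pattern; next to `%` or `_` it may be part of a longer token in the data (for example `bobcats`). The function names are hypothetical, escape sequences in the pattern are ignored, and the lookup of the extracted tokens in a bloom filter is left out.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Assumption: a token character is an ASCII letter or digit; anything else separates tokens.
bool isTokenChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Splits a stored value into tokens, e.g. "funny.cats.video" -> {"funny", "cats", "video"}.
std::vector<std::string> splitIntoTokens(const std::string& value) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : value) {
        if (isTokenChar(c)) {
            current += c;
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Returns tokens that every string matching the LIKE pattern must contain as whole tokens.
// '%' and '_' can match token characters, so a token adjacent to them is discarded.
std::vector<std::string> fullTokensFromLikePattern(const std::string& pattern) {
    std::vector<std::string> tokens;
    std::string current;
    bool left_bounded = true; // the start of the pattern is the start of the string
    for (char c : pattern) {
        if (isTokenChar(c)) {
            current += c;
        } else if (c == '%' || c == '_') {
            current.clear();      // right side not bounded: drop the partial token
            left_bounded = false; // the next token's left side is not bounded either
        } else {
            if (!current.empty() && left_bounded) tokens.push_back(current);
            current.clear();
            left_bounded = true;  // a literal separator bounds the next token
        }
    }
    // The end of the pattern is the end of the string, so it bounds the last token.
    if (!current.empty() && left_bounded) tokens.push_back(current);
    return tokens;
}
```

Here `fullTokensFromLikePattern("%cats%")` returns no tokens, so the index cannot help and a full scan is needed, while `"%.cats.%"` yields `{"cats"}`, which can then be checked against each granule's token bloom filter built from `splitIntoTokens` of the stored values.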