Percolate performance difference in 1.0.0 and 1.2.2 #6806
Comments
I can't reproduce the slowdown that you're reporting. I benchmarked with different types of queries (term, range and geo_distance) between 1.0 and 1.2.2. In fact, on 1.2.2 I get slightly better performance. Can you share a more detailed reproduction of the big performance difference that you're experiencing? The indexing slowdown in 1.0.0 was due to [1] and was resolved in 1.0.2.
I've performed additional tests. I used 1.0.2 (as inserting percolate queries was very slow in 1.0.0) and 1.2.2. The difference is much smaller (previously the difference was in the number of queries), but 1.2.2 is still 3 times slower. With 1.0.2 I got an average of 198 ms per percolate with 200k queries, and 312 ms with 300k queries.
@quhar I can confirm the performance regression:
The following change introduced in 1.2.2 is causing this slowdown: On 1.2.1, performance is similar to 1.0.3:
This change disabled caching of both the filter cache and field data at all times in the percolator. In general this is a good improvement, since caching for an in-memory index makes no sense. However, disabling field data caching has a drawback: constructing field data structures for the percolator's in-memory index is relatively expensive, and that is what you're noticing when you percolate across 300k docs with a geo_distance filter that relies on field data. The percolator is creating the same data structure 300k times, which causes the performance regression. Instead, the percolator should temporarily cache field data for the duration of a percolate request; that should fix the regression.
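The fix described above can be sketched as a per-request cache. This is an illustrative Python sketch of the idea, not the actual Elasticsearch code; class and field names are made up:

```python
# Illustrative sketch of per-request field data caching (not Elasticsearch
# source). Without the cache, the expensive build step would run once per
# registered query instead of once per request.

class PercolateRequest:
    def __init__(self):
        # Cache lives only for the duration of one percolate request.
        self._field_data_cache = {}
        self.builds = 0

    def field_data(self, field):
        if field not in self._field_data_cache:
            self.builds += 1  # expensive field data structure built here
            self._field_data_cache[field] = {"field": field}
        return self._field_data_cache[field]

req = PercolateRequest()
for _ in range(300_000):  # evaluate 300k geo_distance queries in one request
    req.field_data("location")
print(req.builds)  # 1: built once, not 300,000 times
```

With the cache scoped to the request, memory is still released as soon as the request finishes, so it avoids the unbounded growth that always-on caching would cause for an in-memory index.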
The performance regression only occurs if percolator queries contain filters that rely on field data (e.g. geo_distance). @quhar If you set:

mappings = {
    'mappings': {
        TYPE: {
            'properties': {
                'name': {
                    'type': 'string',
                },
                'location': {
                    'type': 'geo_point',
                    'doc_values': 'true',
                },
            },
        },
    },
}

By enabling doc_values, the cost of building field data is paid only once instead of 300k times. I think this is the best way to work around this performance regression.
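To illustrate how such a mapping would be exercised, here is a hedged sketch of registering a geo_distance percolator query against it (ES 1.x style, where queries are indexed under the reserved `.percolator` type). The index name, distance, and coordinates are made up for illustration:

```python
import json

# Sketch of a geo_distance percolator query body for the mapping above.
# In ES 1.x this would be indexed with e.g. PUT /my_index/.percolator/1
# (index name "my_index" and all values here are illustrative assumptions).
percolator_query = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "geo_distance": {
                    "distance": "5km",
                    "location": {"lat": 52.23, "lon": 21.01},
                }
            },
        }
    }
}

body = json.dumps(percolator_query)
print(body)
```

Because the geo_distance filter reads the `location` field through field data, this is exactly the kind of query that triggers the repeated field data construction discussed above.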
Thanks. I will take a look at
Before, the index reader used by the percolator didn't allow registering a CoreCloseListener, but now it does, making it safe to cache index field data cache entries. Creating field data structures is relatively expensive, and caching them can save a lot of overhead if many queries are evaluated in a percolate call. Closes elastic#6806 Closes elastic#7081
We found that when we use doc_values, the percolator finds about 30% fewer matches (queries). Execution time was also about 30% lower. We do batch percolation. We made this change to our index:
with:
Number of queries to percolate against: 154,195
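For context on the batch percolation mentioned above: in ES 1.x, batches go through the multi percolate API, whose request body is newline-delimited JSON. A hedged sketch of building such a body (index and type names are assumptions for illustration):

```python
import json

def mpercolate_body(index, doc_type, docs):
    """Build an NDJSON body for POST /_mpercolate (ES 1.x multi percolate).

    Each document contributes two lines: a header line selecting the
    percolator index/type, then a line carrying the document itself.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"percolate": {"index": index, "type": doc_type}}))
        lines.append(json.dumps({"doc": doc}))
    return "\n".join(lines) + "\n"

# A batch of 50 documents, as in the comment above (values are made up).
body = mpercolate_body("my_index", "my_type",
                       [{"location": {"lat": 52.0, "lon": 21.0}}] * 50)
```

Each batched document still evaluates all registered queries, which is why per-request field data construction cost multiplies so quickly.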
@dennisgorelik ouch, does this also occur with a more recent version of ES (1.3.8 / 1.4.3)? If so, can you open a new issue for this, and if possible share a smaller reproduction of the bug?
Martin, |
We tried the same percolate batch (of 50 items) on Elasticsearch 1.4.3. Using doc_values=true on Elasticsearch 1.4.3 did not improve performance (but caused the percolator to miss ~30% of queries that it should have found, the same issue as in Elasticsearch 1.2.2).
Hi,
I was doing some tests with percolate and I noticed a huge difference in percolate query performance between 1.0.0 and 1.2.2. In 1.0.0, indexing of percolate queries takes much longer than in 1.2.2 (around 10 times), but the percolate query itself is much faster. With 400k queries, percolating a document takes tens of ms in 1.0.0 and a couple of seconds in 1.2.2. I performed all tests with default settings, always on a clean ES instance. The percolate document looks like:
and the percolate query:
Has something changed in the default settings of ES between these versions regarding percolate? I couldn't find anything in the release notes.
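The actual document and query bodies were not captured in this thread. As a purely hypothetical reconstruction of the shape such a percolate call takes in ES 1.x (field names borrowed from the mapping discussed later in the thread; all values invented):

```python
import json

# Hypothetical ES 1.x percolate request body (GET /my_index/my_type/_percolate).
# The "doc" is matched against all registered percolator queries.
percolate_request = {
    "doc": {
        "name": "example",
        "location": {"lat": 52.0, "lon": 21.0},
    }
}

payload = json.dumps(percolate_request)
print(payload)
```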