Percolate performance difference in 1.0.0 and 1.2.2 #6806

quhar · 2014-07-09T23:21:13Z

Hi,

I was running some tests with the percolator and noticed a huge performance difference between 1.0.0 and 1.2.2. In 1.0.0, indexing percolator queries takes much longer than in 1.2.2 (around 10 times), but percolating a document is much faster. With 400k registered queries, percolating a document takes tens of ms in 1.0.0 and a couple of seconds in 1.2.2. I performed all tests with default settings, always on a clean ES instance. A registered percolator query looks like:

{
"query": {
   "filtered": {
      "filter": {   
         "and": [      
            {
               "geo_distance": {
                  "distance": "94km",
                  "location": {
                     "lat": "97",
                     "lon": "-76"
                  }       
               }       
            }       
         ]       
      },
      "query": {
         "match": {
            "_all": "note father surprise"
         }       
      }       
   }       
},
"type": "offer"
}

and the percolate request:

GET /test-index/offer/_percolate
{
    "doc": {
        "name": "note",
        "location": [-7,-80]
    }
}

Has something changed in the default ES settings for the percolator between these versions? I couldn't find anything in the release notes.
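
For anyone who wants to reproduce the setup, here is a minimal Python sketch of the two requests described above. It only builds the HTTP method, path, and JSON body and sends nothing. It assumes the Elasticsearch 1.x convention of registering percolator queries under the reserved `.percolator` type; the helper names and the query id "1" are illustrative and not part of the original report.

```python
import json


def registration_request(index, query_id, query, doc_type):
    """Build the request that registers one percolator query (ES 1.x style)."""
    # Assumption: queries are stored under the reserved ".percolator" type.
    path = "/{}/.percolator/{}".format(index, query_id)
    body = {"query": query, "type": doc_type}
    return "PUT", path, json.dumps(body, sort_keys=True)


def percolate_request(index, doc_type, doc):
    """Build the request that percolates one document against the index."""
    path = "/{}/{}/_percolate".format(index, doc_type)
    return "GET", path, json.dumps({"doc": doc}, sort_keys=True)


# Query and document copied from the report above, values unchanged.
query = {
    "filtered": {
        "filter": {
            "and": [
                {
                    "geo_distance": {
                        "distance": "94km",
                        "location": {"lat": "97", "lon": "-76"},
                    }
                }
            ]
        },
        "query": {"match": {"_all": "note father surprise"}},
    }
}
doc = {"name": "note", "location": [-7, -80]}

register = registration_request("test-index", "1", query, "offer")
percolate = percolate_request("test-index", "offer", doc)
print(register[0], register[1])
print(percolate[0], percolate[1])
```

The returned tuples can be handed to whatever HTTP client is at hand, which keeps the sketch independent of any particular client library version.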


martijnvg · 2014-07-10T09:56:44Z

I can't reproduce the slowdown that you're reporting. I benchmarked different types of queries (term, range and geo_distance) on 1.0 and 1.2.2. In fact, on 1.2.2 I get slightly better performance. Can you share a more detailed reproduction of the big performance difference that you're experiencing?

The indexing slowdown in 1.0.0 was due to [1] and has been resolved in 1.0.2.
1: #5339

quhar · 2014-07-10T22:11:33Z

I've performed additional tests. I used 1.0.2 (since inserting percolator queries was very slow on 1.0.0) and 1.2.2. The difference is much smaller (previously the number of queries differed between my runs), but 1.2.2 is still about 3 times slower.
Test script
I ran two tests, one with 200k queries and a second with 300k. In both cases I percolated the same document 100 times and computed the average. Here are the detailed results.

With 1.0.2 I got an average of 198 ms per percolate with 200k queries and 312 ms with 300k queries.
With 1.2.2 the values are 667 ms and 1008 ms respectively.

Edit:
In both cases it's the standard ES deb package without any config or runtime changes.
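
The linked test script is not reproduced in this thread. As a rough sketch of the measurement described above (percolate the same document 100 times and average), something like the following would do. `action` is a hypothetical callable wrapping one percolate request, and both helper names are made up for illustration. The last lines apply the ratio helper to the averages reported above.

```python
import time


def average_latency_ms(action, runs=100):
    """Call `action` `runs` times and return the mean wall-clock time in ms."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        action()
        total += time.perf_counter() - start
    return total / runs * 1000.0


def slowdown(baseline_ms, candidate_ms):
    """Return how many times slower the candidate is than the baseline."""
    return candidate_ms / baseline_ms


# Averages reported above: 1.0.2 versus 1.2.2.
ratio_200k = slowdown(198, 667)
ratio_300k = slowdown(312, 1008)
print("200k queries: {:.2f}x slower".format(ratio_200k))
print("300k queries: {:.2f}x slower".format(ratio_300k))
```

Both ratios come out a little above 3, which is consistent with the "3 times slower" estimate.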

martijnvg · 2014-07-11T14:22:54Z

@quhar I can confirm the performance regression:

1.0.3:
No handlers could be found for logger "elasticsearch"
Indexed 300000 queries into precolator, took 234.444020987 seconds
After 100 tests, average query time: 288.63

1.2.2:
No handlers could be found for logger "elasticsearch"
Indexed 300000 queries into precolator, took 250.046157837 seconds
After 100 tests, average query time: 846.89

The following change introduced in 1.2.2 is causing this slowdown:
#6578

On 1.2.1 the performance is similar to 1.0.3:

1.2.1:
No handlers could be found for logger "elasticsearch"
Indexed 300000 queries into precolator, took 217.544229984 seconds
After 100 tests, average query time: 270.79

This change disabled caching for both the filter cache and field data at all times in the percolator. In general this is a good improvement, since caching for an in-memory index makes no sense. However, disabling field data caching has a drawback: even constructing the data structures for the percolator's in-memory index is relatively expensive, and that is what you're noticing if you percolate across 300k queries with a geo_distance filter that relies on field data.

So the percolator is creating the same data structure 300k times, which is causing the performance regression. Instead, the percolator should temporarily cache field data for the duration of a percolator request, and that should fix the performance regression.
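
To make the cost model concrete, here is a toy Python sketch of the difference between rebuilding field data for every query and caching it for the duration of one percolate request. This is not Elasticsearch code: `FieldDataLoader` and both percolate functions are hypothetical stand-ins, and the "field data" is just a dictionary lookup with a build counter attached.

```python
class FieldDataLoader:
    """Toy stand-in for an expensive field data build; counts how often it runs."""

    def __init__(self):
        self.builds = 0

    def load(self, field, doc):
        self.builds += 1
        return doc[field]


def percolate_uncached(queries, doc, loader):
    # Every query rebuilds field data for the field it filters on.
    return [q["id"] for q in queries if q["matches"](loader.load(q["field"], doc))]


def percolate_cached(queries, doc, loader):
    cache = {}  # lives only for this one percolate request
    matches = []
    for q in queries:
        field = q["field"]
        if field not in cache:
            cache[field] = loader.load(field, doc)
        if q["matches"](cache[field]):
            matches.append(q["id"])
    return matches


doc = {"location": (-7, -80)}
# 1000 toy queries on the same field; even-numbered ones match.
queries = [
    {"id": i, "field": "location", "matches": (lambda value, i=i: i % 2 == 0)}
    for i in range(1000)
]

uncached_loader = FieldDataLoader()
cached_loader = FieldDataLoader()
uncached_matches = percolate_uncached(queries, doc, uncached_loader)
cached_matches = percolate_cached(queries, doc, cached_loader)
print(uncached_loader.builds, cached_loader.builds)
```

Both variants return the same matches; only the number of builds differs, one per query without the cache and one per field with it.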

martijnvg · 2014-07-13T23:10:06Z

The performance regression only occurs if percolator queries contain filters that rely on field data (e.g. the geo_distance filter). The best way to get around the field data loading overhead is to enable doc values for geo fields.

@quhar If you set doc_values to true on the location field the performance is similar to 1.0.3:

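TYPE = 'offer'  # document type used in the percolate requests above

mappings = {
    'mappings': {
        TYPE: {
            'properties': {
                'name': {
                    'type': 'string',
                },
                'location': {
                    'type': 'geo_point',
                    # Use doc values so field data is not rebuilt
                    # in memory for every percolator query.
                    'doc_values': True,
                }
            }
        }
    }
}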

With doc_values enabled, the cost of building field data is paid only once instead of 300k times. I think this is the best way to get around this performance regression.

quhar · 2014-07-14T08:13:17Z

Thanks. I will take a look at doc_values then.

Before the index reader used by the percolator didn't allow to register a CoreCloseListener, but now it does, making it safe to cache index field data cache entries. Creating field data structures is relatively expensive and caching them can save a lot of noise if many queries are evaluated in a percolator call. Closes elastic#6806 Closes elastic#7081

dennisgorelik · 2015-02-16T18:19:15Z

We found that when we use doc_values, the percolator finds about 30% fewer matches (queries).
The percolator (against an index with the doc_values setting on the Location property) does not return matches that it clearly should have found (the query searches for a keyword that is present in the percolated item, and the percolated item is within the location range specified by the query).

Execution time was smaller by about 30% too.

We do batch percolation.
We percolate 50 items per batch.
It took:
28.225s to percolate 50 items without "doc_values" option (found 14787 matching queries).
19.855s to percolate 50 items with "doc_values" option (found 10781 matching queries).

We made this change to our index:
Replaced:

"Location": {
    "type": "geo_point"
}

with:

"Location": {
    "type": "geo_point",
    "doc_values": "true"
}

Number of queries to percolate against: 154,195
Version: ElasticSearch 1.2.2
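
As a quick check of the "about 30%" figures, the relative drops can be computed directly from the numbers reported above. The helper name `relative_drop` is made up for this sketch.

```python
def relative_drop(before, after):
    """Return the decrease from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100.0


# Figures reported above for one batch of 50 items on ElasticSearch 1.2.2.
matches_drop = relative_drop(14787, 10781)   # matching queries found
time_drop = relative_drop(28.225, 19.855)    # seconds per batch

print("matches: {:.1f}% fewer".format(matches_drop))
print("time: {:.1f}% less".format(time_drop))
```

The match count drops by roughly 27% and the batch time by a little under 30%, so the two figures are close but not identical.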

martijnvg · 2015-02-16T21:42:33Z

@dennisgorelik ouch, does this also occur with a more recent version of ES (1.3.8 / 1.4.3)? If so, can you open a new issue for this and, if possible, share a smaller reproduction of the bug?

dennisgorelik · 2015-02-16T23:02:12Z

Martin,
We are going to install ElasticSearch 1.4.3 and test the effect of doc_values on it (probably in a day or two).
If we reproduce the same percolation mismatch on a smaller set of data, we will file a separate issue.

dennisgorelik · 2015-02-17T05:42:37Z

Martin,
We were able to reproduce the doc_values mismatch.
I created a separate issue for it:
#9714

dennisgorelik · 2015-02-17T05:53:03Z

We measured the same percolate batch (of 50 items) on ElasticSearch 1.4.3.
Performance improved by ~25%:
28.225s on ElasticSearch 1.2.2
21.270s on ElasticSearch 1.4.3

Setting doc_values=true on ElasticSearch 1.4.3 did not improve performance (but caused the percolator to miss ~30% of the queries that it should have found, the same issue as in ElasticSearch 1.2.2).

s1monw assigned martijnvg Jul 10, 2014

s1monw added the regression label Jul 10, 2014

martijnvg mentioned this issue Jul 13, 2014

Reuse IndexFieldData instances between percolator queries #6845

Closed

martijnvg mentioned this issue Jul 29, 2014

Percolator should cache index field data instances. #7081

Merged

martijnvg closed this as completed in #7081 Aug 4, 2014

dennisgorelik mentioned this issue Feb 17, 2015

ElasticSearch percolator cannot find local query if doc_value=true #9714

Closed
