Percolate performance difference in 1.0.0 and 1.2.2 #6806

Closed
quhar opened this issue Jul 9, 2014 · 10 comments · Fixed by #7081

@quhar

quhar commented Jul 9, 2014

Hi,

I was running some tests with the percolator and noticed a huge difference in percolate performance between 1.0.0 and 1.2.2. In 1.0.0, indexing percolator queries takes much longer than in 1.2.2 (around 10 times), but percolating a document is much faster. With 400k registered queries, percolating a document takes tens of ms in 1.0.0 and a couple of seconds in 1.2.2. I ran all tests with default settings, always on a clean ES instance. A registered percolator query looks like:

{
"query": {
   "filtered": {
      "filter": {   
         "and": [      
            {
               "geo_distance": {
                  "distance": "94km",
                  "location": {
                     "lat": "97",
                     "lon": "-76"
                  }       
               }       
            }       
         ]       
      },
      "query": {
         "match": {
            "_all": "note father surprise"
         }       
      }       
   }       
},
"type": "offer"
}

and the percolate request:

GET /test-index/offer/_percolate
{
    "doc": {
        "name": "note",
        "location": [-7,-80]
    }
}

Has something changed in the default ES settings for the percolator between these versions? I couldn't find anything in the release notes.
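
For anyone trying to reproduce this, the two request bodies above can be generated programmatically. The following is only a minimal sketch in Python 3: the helper names `build_percolator_query` and `build_percolate_request` are hypothetical, and it builds plain dictionaries without talking to Elasticsearch.

```python
import json


def build_percolator_query(text, lat, lon, distance, doc_type="offer"):
    """Return the body of one registered percolator query (filtered-query syntax as shown above)."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "and": [
                        {
                            "geo_distance": {
                                "distance": distance,
                                "location": {"lat": lat, "lon": lon},
                            }
                        }
                    ]
                },
                "query": {"match": {"_all": text}},
            }
        },
        "type": doc_type,
    }


def build_percolate_request(name, location):
    """Return the body of a percolate request for a single document."""
    return {"doc": {"name": name, "location": location}}


if __name__ == "__main__":
    # Values copied from the example bodies in this comment.
    print(json.dumps(build_percolator_query("note father surprise", "97", "-76", "94km"), indent=2))
    print(json.dumps(build_percolate_request("note", [-7, -80])))
```

Generating many such bodies with varying text and coordinates is one way to fill an index with a few hundred thousand queries for a benchmark.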

@martijnvg
Member

I can't reproduce the slowdown that you're reporting. I benchmarked different types of queries (term, range and geo_distance) on 1.0 and 1.2.2. In fact, on 1.2.2 I get slightly better performance. Can you share a more detailed reproduction of the big performance difference that you're experiencing?

The indexing slowdown in 1.0.0 was due to [1] and has been resolved in 1.0.2.
1: #5339

@quhar
Author

quhar commented Jul 10, 2014

I've performed additional tests. I used 1.0.2 (because inserting percolator queries was very slow in 1.0.0) and 1.2.2. The difference is much smaller (previously the two runs also differed in the number of queries), but 1.2.2 is still 3 times slower.
Test script
I ran two tests, one with 200k queries and one with 300k. In both cases I percolated the same document 100 times and computed the average. Here are the detailed results.

With 1.0.2 I got an average of 198 ms per percolate with 200k queries and 312 ms with 300k queries.
With 1.2.2 the values are 667 ms and 1008 ms respectively.

Edit:
In both cases it's the standard ES installed from the deb package, without any config or runtime changes.
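
The measurement described in this comment (percolate the same document 100 times, then average) can be sketched roughly as follows. This is not the linked test script: `average_ms` and `slowdown` are hypothetical helpers written for Python 3, and the workload is a placeholder where a real run would issue the percolate request.

```python
import time


def average_ms(call, repeats=100):
    """Run `call` `repeats` times and return the mean wall-clock time in milliseconds."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        total += time.perf_counter() - start
    return total / repeats * 1000.0


def slowdown(new_ms, old_ms):
    """Ratio of the newer version's latency to the older version's latency."""
    return new_ms / old_ms


if __name__ == "__main__":
    # Placeholder workload; a real benchmark would send the percolate request here.
    print(average_ms(lambda: sum(range(1000)), repeats=100))
    # Averages reported in this comment, in ms: 1.2.2 versus 1.0.2.
    print(round(slowdown(667, 198), 2))   # 200k queries
    print(round(slowdown(1008, 312), 2))  # 300k queries
```

With the reported averages, the ratio comes out at a little over 3 for both query counts, which matches the "3 times slower" observation.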

@martijnvg
Member

@quhar I can confirm the performance regression:

1.0.3:
No handlers could be found for logger "elasticsearch"
Indexed 300000 queries into precolator, took 234.444020987 seconds
After 100 tests, average query time: 288.63

1.2.2:
No handlers could be found for logger "elasticsearch"
Indexed 300000 queries into precolator, took 250.046157837 seconds
After 100 tests, average query time: 846.89

The following change introduced in 1.2.2 is causing this slowdown:
#6578

On 1.2.1 the performance is similar to 1.0.3:

1.2.1:
No handlers could be found for logger "elasticsearch"
Indexed 300000 queries into precolator, took 217.544229984 seconds
After 100 tests, average query time: 270.79

This change disabled caching for both the filter cache and field data at all times in the percolator. In general this is a good improvement, since caching for an in-memory index makes no sense. However, disabling field data caching has a drawback: even constructing the field data structures for the percolator's in-memory index is relatively expensive, and that is what you're noticing when you percolate across 300k percolator queries with a geo_distance filter that relies on field data.

So the percolator is creating the same data structure 300k times, which is causing the performance regression. Instead, the percolator should temporarily cache field data for the duration of a percolate request, which should fix the performance regression.
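
To illustrate the difference, here is a toy model in Python, not Elasticsearch code. `FieldDataLoader` and `percolate` are hypothetical names; the sketch only counts how often the expensive structure is built with and without a cache that lives for a single percolate request.

```python
class FieldDataLoader:
    """Stand-in for the expensive field data build (hypothetical)."""

    def __init__(self):
        self.builds = 0

    def build(self, field):
        self.builds += 1
        return {"field": field}  # placeholder for the real data structure


def percolate(num_queries, loader, cache_per_request):
    """Evaluate `num_queries` queries that all need field data for 'location'.

    Returns how many times the field data structure was built.
    """
    cache = {}  # lives only for the duration of this one percolate call
    for _ in range(num_queries):
        if cache_per_request:
            if "location" not in cache:
                cache["location"] = loader.build("location")
            field_data = cache["location"]
        else:
            # No caching: every query rebuilds the same structure.
            field_data = loader.build("location")
        assert field_data["field"] == "location"
    return loader.builds


if __name__ == "__main__":
    print(percolate(300000, FieldDataLoader(), cache_per_request=False))
    print(percolate(300000, FieldDataLoader(), cache_per_request=True))
```

Without the cache the build count equals the number of queries; with a request-scoped cache it is built once per percolate call, and nothing outlives the request.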

@martijnvg
Member

The performance regression only occurs if percolator queries contain filters that rely on field data (e.g. the geo_distance filter). The best way to get around the field data loading overhead is to enable doc values for geo fields.

@quhar If you set doc_values to true on the location field the performance is similar to 1.0.3:

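TYPE = 'offer'  # assumed: the 'offer' type used in the requests earlier in this thread

mappings = {
    'mappings': {
        TYPE: {
            'properties': {
                'name': {
                    u'type': u'string',
                },
                'location': {
                    u'type': u'geo_point',
                    # doc values are built at index time and stored on disk
                    u'doc_values': u'true',
                }
            }
        }
    }
}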

With doc_values enabled, the cost of building field data is paid only once instead of 300k times. I think this is the best way to get around this performance regression.

@quhar
Author

quhar commented Jul 14, 2014

Thanks. I will take a look at doc_values then.

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Aug 4, 2014
Before the index reader used by the percolator didn't allow to register a CoreCloseListener, but now it does, making it safe to cache index field data cache entries.
Creating field data structures is relatively expensive and caching them can save a lot of noise if many queries are evaluated in a percolator call.

Closes elastic#6806
Closes elastic#7081
martijnvg added a commit that referenced this issue Aug 4, 2014
Before the index reader used by the percolator didn't allow to register a CoreCloseListener, but now it does, making it safe to cache index field data cache entries.
Creating field data structures is relatively expensive and caching them can save a lot of noise if many queries are evaluated in a percolator call.

Closes #6806
Closes #7081
martijnvg added a commit that referenced this issue Sep 8, 2014
Before the index reader used by the percolator didn't allow to register a CoreCloseListener, but now it does, making it safe to cache index field data cache entries.
Creating field data structures is relatively expensive and caching them can save a lot of noise if many queries are evaluated in a percolator call.

Closes #6806
Closes #7081
@dennisgorelik

We found that when we use doc_values, the percolator finds about 30% fewer matches (queries).
The percolator (against an index with the doc_values setting on the Location property) does not return matches that it clearly should have found (the query searches for a keyword that is present in the percolated item, and the percolated item is within the location range specified by the query).

Execution time was also about 30% lower.

We do batch percolation.
We percolate 50 items per batch.
It took:
28.225s to percolate 50 items without "doc_values" option (found 14787 matching queries).
19.855s to percolate 50 items with "doc_values" option (found 10781 matching queries).

We made this change to our index, replacing:

"Location": {
    "type": "geo_point"
}

with:

"Location": {
    "type": "geo_point",
    "doc_values": "true"
}

Number of queries to percolate against: 154,195
Version: ElasticSearch 1.2.2
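
One way to narrow down such a mismatch is to diff the matched query ids from the two runs and to put the reported numbers side by side. The sketch below is only an illustration in Python: `missing_matches` and `percent_drop` are hypothetical helpers, and the ids in the usage example are made up.

```python
def missing_matches(baseline_ids, doc_values_ids):
    """Query ids matched without doc_values but not matched with doc_values."""
    return sorted(set(baseline_ids) - set(doc_values_ids))


def percent_drop(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    # Made-up ids, only to show the shape of the comparison.
    print(missing_matches(["q1", "q2", "q3"], ["q1", "q3"]))
    # Figures reported above for one batch of 50 items.
    print(round(percent_drop(14787, 10781), 1))    # matching queries
    print(round(percent_drop(28.225, 19.855), 1))  # seconds per batch
```

On the reported figures this gives roughly 27% fewer matches and roughly 30% less time per batch.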

@martijnvg
Member

@dennisgorelik Ouch. Does this also occur with a more recent version of ES (1.3.8 / 1.4.3)? If so, can you open a new issue for this and, if possible, share a smaller reproduction of the bug?

@dennisgorelik

Martin,
We are going to install ElasticSearch 1.4.3 and test the effect of doc_values on it (probably in a day or two).
If we reproduce the same percolation mismatch on a smaller set of data, we will file a separate issue.

@dennisgorelik

Martin,
We were able to reproduce the doc_values mismatch.
I created a separate issue for it:
#9714

@dennisgorelik

We tested the performance of the same percolate batch (of 50 items) on ElasticSearch 1.4.3.
Performance improved by ~25%:
28.225s on ElasticSearch 1.2.2
21.270s on ElasticSearch 1.4.3

Setting doc_values to true on ElasticSearch 1.4.3 did not improve performance, but it caused the percolator to miss ~30% of the queries that it should have found (the same issue as in ElasticSearch 1.2.2).

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
Before the index reader used by the percolator didn't allow to register a CoreCloseListener, but now it does, making it safe to cache index field data cache entries.
Creating field data structures is relatively expensive and caching them can save a lot of noise if many queries are evaluated in a percolator call.

Closes elastic#6806
Closes elastic#7081