Elasticsearch hangs using phrase suggester with collate option #9377

silvestrelosada · 2015-01-21T13:36:53Z

Hi, I have a single-node Elasticsearch installation with 100 documents. Sending simultaneous requests to the phrase suggester with the collate option blocks Elasticsearch completely, and it does not let me do anything else.

Here is my query

{  
   "query":{  
      "bool":{  
         "should":[  
            {  
               "prefix":{  
                  "label":"michael"
               }
            },
            {  
               "prefix":{  
                  "synonym":"michael"
               }
            }
         ]
      }
   },
   "explain":false,
   "suggest":{  
      "spell_ngram":{  
         "text":"michael",
         "phrase":{  
            "field":"spell_ngram",
            "size":1,
            "confidence":0.0,
            "max_errors":0.9,
            "gram_size":5,
            "collate":{  
               "query":{ "match" : {
                   "spell_ngram" : {
                        "query" : "{{suggestion}}", "type" : "phrase"
                            }  
                    }
                }
            }
         }
      }
   }
}

And here are my mappings:

{
"concept":{
    "properties" :{
            "_id":{ "type" : "string", "index" : "not_analyzed"},
            "_type":{ "type" : "string", "index" : "not_analyzed"},
            "label": {
            "type" : "multi_field",
                "fields" : {
                    "label" : {"type" : "string", "index" : "not_analyzed","copy_to" : ["spell", "spell_ngram"],"store" : true},
                    "label_analysis" : {"type" : "string", "index" : "analyzed","store" : true,"term_vector": "with_positions_offsets_payloads"},
                    "phonetic_label" : { "type" : "string", "index" : "analyzed", "index_analyzer":"phonetic_analyzer_index", "search_analyzer":"phonetic_analyzer_search","store" : true,"term_vector": "with_positions_offsets_payloads" }
                }
             },
            "synonym" : {
                "type" : "multi_field",
                "fields" : {
                    "synonym" : {"type" : "string", "index" : "not_analyzed","copy_to" : ["spell", "spell_ngram"],"store" : true,"term_vector": "with_positions_offsets_payloads"},
                    "synonym_analysis" : {"type" : "string", "index" : "analyzed" ,"store" : true},
                    "phonetic_synonym" : { "type" : "string", "index" : "analyzed",  "index_analyzer":"phonetic_analyzer_index", "search_analyzer":"phonetic_analyzer_search","store":true,"term_vector": "with_positions_offsets_payloads"  }
                },
           "spell_ngram" : { "type" : "string", "index" : "analyzed", "analyzer":"shingle_analyzer","store":true, "term_vector": "with_positions_offsets_payloads" }
        }
     }
   }
}

clintongormley · 2015-01-21T13:49:29Z

Hi @silvestrelosada

Please could you provide the simplest complete recreation of the problem, so that we can try it out:

you're only running suggestions on one field, but you've provided the mapping for all fields
you haven't provided the index settings (including the analysis section)
you haven't provided an example document

thanks

silvestrelosada · 2015-01-21T17:11:01Z

Hi, here is what is needed to reproduce the error: the mapping file, elasticsearch.yml, and the data.

https://gist.github.com/silvestrelosada/eb32afdc8c971504e45c

To reproduce it, you have to send several concurrent queries; I'm using JMeter.

Best

silvestrelosada · 2015-01-22T13:33:55Z

If it helps, the issue occurs in a multi-shard environment, in the fetchMatchingDocCountResponses method, when executing the search.

seyart · 2015-01-26T20:21:17Z

Hi,

I think I have the same issue. ES hangs when I send multiple simultaneous suggest requests with the collate option to the search API. I tested versions 1.3.7 and 1.4.2 of ES. ES is running with one node and the default settings (5 shards per index, but 0 replicas). There are about 19 000 documents in the target index.

To make ES hang, I created a script which runs a lot of suggest requests with random text. When I launch 2 or more instances of this script, depending on the hardware configuration of my testing environment, ES hangs and my ES client (elasticsearch-py) raises timeout exceptions. After that, I'm unable to make a curl request to the search API (it keeps waiting for a response). The index API and bulk API still work fine.
Monitoring with BigDesk shows no memory issue. The search thread pools also look fine, but the queue size increases after ES hangs.

The ES log shows nothing.

Here is an example of a request:

{
    "aggregations": {
        "brand": {
            "terms": {
                "field": "brand.untouched",
                "size": 0
            }
        },
        "categories": {
            "terms": {
                "field": "categories_parents_catalog_codes",
                "size": 0
            }
        },
        "offers": {
            "terms": {
                "field": "offers.untouched",
                "size": 0
            }
        },
        "stores": {
            "terms": {
                "field": "categories_parents_catalog_codes",
                "size": 0
            }
        }
    },
    "fields": [
        "full_title"
    ],
    "from": 0,
    "query": {
        "filtered": {
            "filter": {
                "term": {
                    "categories_parents_catalog_codes": "root"
                }
            },
            "query": {
                "match": {
                    "full_title.autocomplete": {
                        "operator": "and",
                        "query": "B"
                    }
                }
            }
        }
    },
    "size": 20,
    "suggest": {
        "suggestion_phrase": {
            "phrase": {
                "collate": {
                    "preference": "_primary",
                    "query": {
                        "filtered": {
                            "filter": {
                                "term": {
                                    "categories_parents_catalog_codes": "root"
                                }
                            },
                            "query": {
                                "match": {
                                    "full_title.autocomplete": {
                                        "operator": "and",
                                        "query": "{{suggestion}}"
                                    }
                                }
                            }
                        }
                    }
                },
                "field": "full_title.suggestion"
            }
        },
        "text": "B"
    }
}

Here is the mapping of the suggest field:

            "full_title": {
                "type": "string",
                "fields": {
                    "french": {
                        "type": "string",
                        "analyzer": "custom_french"
                    },
                    "suggestion": {
                        "type": "string",
                        "analyzer": "custom_suggestion"
                    },
                    "autocomplete": {
                        "type": "string",
                        "index_analyzer": "nGram_analyzer",
                        "search_analyzer": "whitespace_analyzer"
                    }
                }
            }

I remain available for any further information.

Best Regards,

clintongormley · 2015-01-26T20:57:45Z

@areek could you take a look at this please

s1monw · 2015-02-06T11:09:23Z

@areek can you try to write a test to reproduce this?


s1monw · 2015-05-04T07:59:11Z

One idea I have here is that this seems to happen only when we specify the suggest together with a search, which means the suggestion as well as the collation is executed on the search threadpool. @areek, can you try to write a test that does the same and forcefully reduces the number of threads in the search threadpool to a small number, i.e. 1 or 2? I guess that could trigger the issue...

areek · 2015-05-13T20:00:11Z

I have been able to reproduce this issue.
The collate option internally fires off a search request. When it is used from the _search API, the internal request uses the same search threadpool as the outer search request; hence, if all threads are busy with outer requests waiting on their collate requests, the result is a deadlock.
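
The starvation described above can be reproduced in miniature. The following is a minimal Python sketch, not Elasticsearch code: `run_search`, `search`, and `collate` are hypothetical names, and a `ThreadPoolExecutor` stands in for the search threadpool. An outer task holds a worker while waiting on an inner task queued on the same pool; the timeout is an assumption added only so the sketch terminates instead of hanging.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_search(pool_size: int, wait_seconds: float = 0.5) -> str:
    """Run one outer task that waits on an inner task submitted to the same pool."""
    pool = ThreadPoolExecutor(max_workers=pool_size)

    def collate() -> str:
        # Stand-in for the internal collate search request.
        return "collated"

    def search() -> str:
        # Stand-in for the outer _search request: it occupies a worker thread
        # and blocks until the inner task, queued on the same pool, finishes.
        inner = pool.submit(collate)
        try:
            return inner.result(timeout=wait_seconds)
        except FutureTimeout:
            # Without the timeout this wait would never return when no
            # worker is free to run the inner task.
            return "starved"

    try:
        return pool.submit(search).result()
    finally:
        pool.shutdown(wait=True)


if __name__ == "__main__":
    print(run_search(1))  # the only worker is held by search()
    print(run_search(2))  # a second worker is free to run collate()
```

With a single worker the inner task cannot start while the outer task waits, so `run_search(1)` reports `starved`; with two workers `run_search(2)` returns `collated`. Under concurrent load the same thing happens to a larger pool once every worker is held by an outer request.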

One solution to avoid this issue would be to execute the collate query on only the local shard from which the suggestions are generated instead of collating on non-local shards. As suggestions are generated from the terms of the local shard, in most cases a generated suggestion which does not yield a hit for the collate query on the local shard would not yield a hit for collate query on non-local shards. Thoughts? I will create a PR for this.
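
The local-shard idea can be illustrated with a toy sketch in Python. It assumes a shard is just a list of document strings and uses a substring check as a stand-in for the templated collate query with `{{suggestion}}` filled in; `collate_locally` is a hypothetical name, not an Elasticsearch API. The point is that the check runs in the calling thread against local data, so no second task has to be queued on the search threadpool.

```python
from typing import Iterable, List


def collate_locally(suggestions: Iterable[str], local_docs: List[str]) -> List[str]:
    """Keep only the suggestions that match at least one document on this shard."""
    kept = []
    for suggestion in suggestions:
        # Stand-in for running the collate query against the local shard:
        # a plain substring match instead of a real phrase query.
        if any(suggestion in doc for doc in local_docs):
            kept.append(suggestion)
    return kept


if __name__ == "__main__":
    shard_docs = ["michael jackson", "michael jordan"]
    candidates = ["michael jackson", "michael jakson"]
    print(collate_locally(candidates, shard_docs))
```

The tradeoff discussed in the thread still applies to this sketch: a suggestion that only matches documents on another shard would be dropped.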

s1monw · 2015-05-13T20:07:24Z

> One solution to avoid this issue would be to execute the collate query on only the local shard from which the suggestions are generated instead of collating on non-local shards. As suggestions are generated from the terms of the local shard, in most cases a generated suggestion which does not yield a hit for the collate query on the local shard would not yield a hit for collate query on non-local shards. Thoughts? I will create a PR for this.

I like this a lot - I think it's the right tradeoff.

…al shard. Previously, the collate feature would be executed on all shards of an index using the client. This leads to a deadlock when concurrent collate requests are run from the _search API, because both the external request and the internal collate requests use the same search threadpool. As phrase suggestions are generated from the terms of the local shard, in most cases a generated suggestion which does not yield a hit for the collate query on the local shard would not yield a hit for the collate query on non-local shards. This commit removes the ability to specify a `preference` for a collate query, as the collate query is only run on the local shard. closes elastic#9377

…al shard. Previously, the collate feature would be executed on all shards of an index using the client. This leads to a deadlock when concurrent collate requests are run from the _search API, because both the external request and the internal collate requests use the same search threadpool. As phrase suggestions are generated from the terms of the local shard, in most cases a generated suggestion which does not yield a hit for the collate query on the local shard would not yield a hit for the collate query on non-local shards. Instead of using the client for collating suggestions, the collate query is executed against the ContextIndexSearcher. This PR removes the ability to specify a preference for a collate query, as the collate query is only run on the local shard. closes elastic#9377


clintongormley added the feedback_needed label Jan 21, 2015

clintongormley assigned areek Jan 26, 2015

jpountz removed the discuss label Feb 6, 2015

areek added a commit to areek/elasticsearch that referenced this issue Apr 21, 2015

Phrase Suggester Collate enhancements

c086ee8

closes elastic#9377

areek mentioned this issue Apr 21, 2015

Phrase Suggester Collate Enhancements #10710

Closed

kevinkluge added the in progress label Apr 21, 2015

areek mentioned this issue May 13, 2015

Ensure collate option in PhraseSuggester only collates on local shard #11156

Closed

areek closed this as completed in 7efc43d May 14, 2015

kevinkluge removed the in progress label May 14, 2015
