Completion prefix suggestion #3376

spinscale · 2013-07-24T09:29:30Z

Note: This is an experimental feature!

Traditionally, FST suggesters needed to create an in-memory structure upfront, which had to be kept in sync with the data inserted or deleted. This step of creating an FST can be really expensive and long-running on production systems.

So why not create an efficient FST-like structure at index time, load it quickly into memory, and use it for suggestions?

Before diving deep into implementation details, let's start with a small sample.

Sample

Create a simple mapping

curl -X DELETE localhost:9200/music
curl -X PUT localhost:9200/music

curl -X PUT localhost:9200/music/song/_mapping -d '{
    "song" : {
        "properties" : {
            "name" : { "type" : "string" },
            "suggest" : {
                "type" : "completion",
                "index_analyzer" : "stopword",
                "search_analyzer" : "simple",
                "payloads" : true
            }
        }
    }
}'

curl -X PUT 'localhost:9200/music/song/1?refresh=true' -d '{
    "name" : "Nevermind",
    "suggest" : { 
        "input": [ "Nevermind", "Nirvana" ],
        "output": "Nirvana - Nevermind",
        "payload" : { "artistId" : 2321 }
    }
}'

A request looks like this

curl -X POST 'localhost:9200/music/_suggest' -d '{
    "song-suggest" : {
        "text" : "nev",
        "completion" : {
            "field" : "suggest"
        }
    }
}'

This is the response

{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "song-suggest" : [ {
    "text" : "nev",
    "offset" : 0,
    "length" : 10,
    "options" : [ {
      "text" : "Nirvana - Nevermind",
      "score" : 1.0,
      "payload" : {"artistId":2321}
    } ]
  } ]
}

As you can see, the returned text is the output provided at index time. The payload is also included; it can carry a reference ID for the artist, which makes it easy to retrieve further information.
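To make the response shape concrete, here is a minimal Python sketch that pulls the suggestion text and payload out of the response shown above. The helper name extract_options is hypothetical and not part of any client library; it only assumes the suggestion is named song-suggest, as in the sample request.

```python
import json

# Response body copied from the sample above.
RESPONSE = """
{
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "song-suggest" : [ {
    "text" : "nev",
    "offset" : 0,
    "length" : 10,
    "options" : [ {
      "text" : "Nirvana - Nevermind",
      "score" : 1.0,
      "payload" : {"artistId":2321}
    } ]
  } ]
}
"""


def extract_options(response, suggestion_name):
    """Return (text, payload) pairs for every option of the named suggestion.

    Hypothetical helper for illustration; payload is None when the
    mapping did not enable payloads.
    """
    results = []
    for entry in response.get(suggestion_name, []):
        for option in entry.get("options", []):
            results.append((option["text"], option.get("payload")))
    return results


options = extract_options(json.loads(RESPONSE), "song-suggest")
print(options)
```

The artistId from the payload could then be used to look up the full artist document in a second request.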

Mapping options

In order to support prefix suggestion the field has to be marked as type completion.

{
    ...
    "properties" : {
        "suggestField" : {
            "type" : "completion",
            "index_analyzer" : "stopword",
            "search_analyzer" : "simple"
        }
    }
}

While the type field is mandatory, the index_analyzer and search_analyzer fields can be omitted. The simple analyzer is used by default.

Payloads

If you want to return payloads, you have to explicitly enable them by setting payloads: true. A payload can contain arbitrary JSON, but it must be a JSON object with an opening { and closing }; plain strings or arrays are not allowed.
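The payload rule can be checked on the client before indexing. The following is only a sketch of such a check in Python; validate_payload is a hypothetical helper, not something the feature itself provides.

```python
import json


def validate_payload(payload):
    """Return the payload unchanged if it is a JSON object, else raise ValueError.

    Hypothetical client-side check: a JSON object corresponds to a dict in
    Python, so strings, lists, numbers and None are rejected.
    """
    if not isinstance(payload, dict):
        raise ValueError(
            "payload must be a JSON object, got %s" % type(payload).__name__
        )
    # json.dumps raises TypeError if a value cannot be serialized as JSON.
    json.dumps(payload)
    return payload


print(validate_payload({"artistId": 2321}))
```

Calling validate_payload("Nirvana") or validate_payload(["a", "b"]) raises ValueError, mirroring the "no pure strings or arrays" rule.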

Preserve separators

In addition, you can set preserve_separators: false in case you want to return "Foo Fighters" when searching for "foof" (using the correct analyzer, of course).
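A toy model may help to see why this setting matters. The Python sketch below is not how the suggester is implemented; it simply assumes that tokens are lowercased, split on whitespace, and joined with a separator marker (SEP, an invented stand-in) only when separators are preserved.

```python
SEP = "\x1f"  # invented stand-in for a token separator; an assumption of this sketch


def index_key(text, preserve_separators=True):
    """Lowercase, split on whitespace, and join tokens with or without SEP."""
    tokens = text.lower().split()
    return (SEP if preserve_separators else "").join(tokens)


def matches(query, indexed_text, preserve_separators=True):
    """Prefix match of the analyzed query against the analyzed input."""
    key = index_key(indexed_text, preserve_separators)
    return key.startswith(index_key(query, preserve_separators))


print(matches("foof", "Foo Fighters", preserve_separators=True))
print(matches("foof", "Foo Fighters", preserve_separators=False))
```

In this model "foof" only matches once the separator between "foo" and "fighters" is dropped, while "foo f" matches when separators are kept.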

Preserve position increments

You can set preserve_position_increments: false so that position increments are not counted. This is needed if the first word is a stopword and you are using an analyzer that filters out stopwords. It would allow you to suggest for b and get back The Beatles.
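The effect can be illustrated with another toy model in Python. This is a simplification under stated assumptions, not the actual implementation: a removed stopword leaves a gap marker (GAP, invented here) when position increments are preserved, the stopword list contains only "the", and the query is merely lowercased, as a simple search analyzer would do.

```python
STOPWORDS = {"the"}  # assumed stopword list for this sketch
GAP = "\x00"         # invented marker for the hole a removed stopword leaves


def index_key(text, preserve_position_increments=True):
    """Drop stopwords; keep a gap marker for each one if increments are preserved."""
    parts = []
    for token in text.lower().split():
        if token in STOPWORDS:
            if preserve_position_increments:
                parts.append(GAP)
            continue
        parts.append(token)
    return " ".join(parts)


def matches(query, indexed_text, preserve_position_increments=True):
    """Prefix match of the lowercased query against the analyzed input."""
    key = index_key(indexed_text, preserve_position_increments)
    return key.startswith(query.lower())


print(matches("b", "The Beatles", preserve_position_increments=True))
print(matches("b", "The Beatles", preserve_position_increments=False))
```

With the gap kept, the indexed form no longer starts with "b", so the query finds nothing; without it, "b" leads straight to "beatles".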

Indexing

Simple case

The simplest case to index looks like this

"suggestField" : [ "The Prodigy Firestarter", "Firestarter"]

Which prefixes match these inputs depends on the analyzer used.

Outputs

If you define an output, that output is always returned for a matching suggestion, regardless of which input matched.

"suggestField" : {
  "input" : [ "The Prodigy Firestarter", "Firestarter"],
  "output" : "The Prodigy, Firestarter"
}

Weights

You should define custom weights instead of relying on the default one (see the drawbacks section). The weight must be a positive integer (no float) and defines the order of your suggestions.

"suggestField" : { 
  "input" : [ "The Prodigy Firestarter", "Firestarter"],
  "output" : "The Prodigy, Firestarter",
  "weight" : 42
}

Custom weights can also make your suggestions more valuable: using weights, you could rank the most played song or the best rated hotel first in your suggestions.
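As a sketch of how a client might assemble such a field, the Python helper below builds the input/output/weight object and enforces the weight rule described above. build_suggest_field is a hypothetical name, and reading "positive" as strictly greater than zero is an assumption based on the wording here.

```python
import json


def build_suggest_field(inputs, output=None, weight=None):
    """Build the value of a completion field; hypothetical helper for illustration."""
    field = {"input": list(inputs)}
    if output is not None:
        field["output"] = output
    if weight is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(weight, bool) or not isinstance(weight, int) or weight <= 0:
            raise ValueError("weight must be a positive integer, got %r" % (weight,))
        field["weight"] = weight
    return field


field = build_suggest_field(
    ["The Prodigy Firestarter", "Firestarter"],
    output="The Prodigy, Firestarter",
    weight=42,
)
print(json.dumps({"suggestField": field}, indent=2))
```

A weight such as 4.2 or -1 is rejected before the document is sent, so mistakes surface on the client rather than at index time.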

Search

Searches work exactly like the phrase and term suggesters

curl -X POST 'localhost:9200/music/_suggest' -d '{
    "song-suggest" : {
        "text" : "nev",
        "completion" : {
            "field" : "suggest"
        }
    }
}'
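The same request can be issued from a script. The Python sketch below uses only the standard library; build_suggest_body and send_suggest are hypothetical helper names, and the URL assumes the local node and music index from the sample. The network call is defined but not executed here.

```python
import json
import urllib.request


def build_suggest_body(name, text, field):
    """Build the request body for a completion suggestion."""
    return {name: {"text": text, "completion": {"field": field}}}


def send_suggest(body, url="http://localhost:9200/music/_suggest"):
    """POST the body to the _suggest endpoint and return the parsed JSON response.

    Assumes a node is reachable at the given URL; not called in this sketch.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_suggest_body("song-suggest", "nev", "suggest")
print(json.dumps(body, indent=2))
```

With a node running, send_suggest(body) would return a dictionary shaped like the response in the sample section.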

Drawbacks

Using term frequency as default weight

If you do not specify a weight, the term frequency is used. This only makes sense if you optimize to a single segment or have large segments. If you do not, custom weights are more likely to yield the results you expect. Term frequency is therefore not the best weight indicator, and you should set the weight yourself.

This commit introduces near-real-time suggestions. For more information about its usage, refer to GitHub issue elastic#3376. From the implementation point of view, a custom AnalyzingSuggester is used in combination with a custom postings format (which is not exposed to the user anywhere).

ghost assigned spinscale Jul 24, 2013

spinscale mentioned this issue Jul 31, 2013

Added prefix suggestions based on AnalyzingSuggester #3416

Merged

spinscale closed this as completed in 4f4f3a2 Aug 1, 2013
