Completion prefix suggestion #3376

Closed
spinscale opened this issue Jul 24, 2013 · 0 comments · Fixed by #3416

Note: This is an experimental feature!

Traditionally, FST suggesters had to build an in-memory structure upfront, and that structure had to be kept in sync with the data as documents were inserted or deleted. Building the FST can be expensive and slow on production systems.

So why not create an efficient FST-like structure at index time, load it quickly into memory, and use it for suggestions?

Before diving into implementation details, let's start with a small sample.

Sample

Create a simple mapping

curl -X DELETE localhost:9200/music
curl -X PUT localhost:9200/music

curl -X PUT localhost:9200/music/song/_mapping -d '{
    "song" : {
        "properties" : {
            "name" : { "type" : "string" },
            "suggest" : {
                "type" : "completion",
                "index_analyzer" : "stopword",
                "search_analyzer" : "simple",
                "payloads" : true
            }
        }
    }
}'

curl -X PUT 'localhost:9200/music/song/1?refresh=true' -d '{
    "name" : "Nevermind",
    "suggest" : { 
        "input": [ "Nevermind", "Nirvana" ],
        "output": "Nirvana - Nevermind",
        "payload" : { "artistId" : 2321 }
    }
}'

A request looks like this

curl -X POST 'localhost:9200/music/_suggest' -d '{
    "song-suggest" : {
        "text" : "nev",
        "completion" : {
            "field" : "suggest"
        }
    }
}'

This is the response

{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "song-suggest" : [ {
    "text" : "nev",
    "offset" : 0,
    "length" : 10,
    "options" : [ {
      "text" : "Nirvana - Nevermind",
      "score" : 1.0,
      "payload" : {"artistId":2321}
    } ]
  } ]
}

As you can see, the returned text is the output that was provided at index time. The payload is included as well; it might carry a reference ID to the artist, which makes it easy to retrieve further information.
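
To illustrate how a client might consume this response, here is a minimal Python sketch. The helper name `extract_suggestions` is hypothetical and not part of any Elasticsearch client; the sketch only assumes the response has already been parsed into a dictionary shaped like the sample above.

```python
def extract_suggestions(response, name):
    """Return (text, score, payload) tuples for the suggestion called `name`.

    Hypothetical helper: it only walks the response layout shown in the
    sample above and makes no other assumption about the API.
    """
    results = []
    for entry in response.get(name, []):
        for option in entry.get("options", []):
            # "payload" is only present when payloads are enabled in the mapping
            results.append((option["text"], option.get("score"), option.get("payload")))
    return results


# The sample response from above, as a parsed dictionary
sample_response = {
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "song-suggest": [{
        "text": "nev",
        "offset": 0,
        "length": 10,
        "options": [{
            "text": "Nirvana - Nevermind",
            "score": 1.0,
            "payload": {"artistId": 2321},
        }],
    }],
}

for text, score, payload in extract_suggestions(sample_response, "song-suggest"):
    print(text, score, payload)
```

The `artistId` from the payload could then be used for a follow-up lookup of the artist.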

Mapping options

To support prefix suggestions, the field has to be mapped with the type completion.

{
    ...
    "properties" : {
        "suggestField" : {
            "type" : "completion",
            "index_analyzer" : "stopword",
            "search_analyzer" : "simple"
        }
    }
}

Only the type field is mandatory; index_analyzer and search_analyzer can be omitted, in which case the simple analyzer is used.

Payloads

If you want payloads to be returned, you have to enable them explicitly with payloads: true. A payload can contain arbitrary JSON, but it must be a JSON object with an opening { and a closing }; plain strings and arrays are not allowed.

Preserve separators

In addition, you can set preserve_separators: false in case you want to return "Foo Fighters" when searching for "foof" (using a suitable analyzer, of course).
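
The effect can be pictured with a toy model in Python. This is only an illustration, not the actual suggester: the `SEP` marker, the whitespace tokenization, and the function names `analyze` and `matches` are all assumptions made for this sketch.

```python
SEP = "\u001f"  # stand-in marker between tokens; an assumption of this sketch


def analyze(text, preserve_separators=True):
    """Toy analysis: lowercase, split on whitespace, then join the tokens.

    With preserve_separators=True the token boundary is kept as SEP,
    with False the tokens are glued together.
    """
    tokens = text.lower().split()
    joiner = SEP if preserve_separators else ""
    return joiner.join(tokens)


def matches(query, indexed_input, preserve_separators=True):
    """Prefix match of the analyzed query against the analyzed input."""
    analyzed_input = analyze(indexed_input, preserve_separators)
    return analyzed_input.startswith(analyze(query, preserve_separators))


print(matches("foof", "Foo Fighters", preserve_separators=True))   # False
print(matches("foof", "Foo Fighters", preserve_separators=False))  # True
print(matches("foo f", "Foo Fighters", preserve_separators=True))  # True
```

In this model "foof" only matches once the boundary between "foo" and "fighters" is dropped, which mirrors the behaviour described above.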

Preserve position increments

You can set preserve_position_increments: false so that position increments are not counted. This is needed if the first word is a stopword and you are using an analyzer that filters out stopwords. It allows you to suggest for b and get back The Beatles.
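
The mapping options described in this section can be assembled programmatically. The following Python sketch uses a hypothetical helper, `completion_mapping`, that only emits the options passed explicitly, so it makes no assumption about server-side defaults.

```python
import json


def completion_mapping(field, index_analyzer=None, search_analyzer=None,
                       payloads=None, preserve_separators=None,
                       preserve_position_increments=None):
    """Build the "properties" part of a mapping for one completion field.

    Hypothetical helper: the option names are the ones described above;
    options left as None are omitted from the result.
    """
    options = {"type": "completion"}  # the only mandatory setting
    optional = {
        "index_analyzer": index_analyzer,
        "search_analyzer": search_analyzer,
        "payloads": payloads,
        "preserve_separators": preserve_separators,
        "preserve_position_increments": preserve_position_increments,
    }
    for key, value in optional.items():
        if value is not None:
            options[key] = value
    return {"properties": {field: options}}


mapping = completion_mapping(
    "suggest",
    index_analyzer="stopword",
    search_analyzer="simple",
    payloads=True,
    preserve_position_increments=False,
)
print(json.dumps({"song": mapping}, indent=2))
```

The printed JSON could be sent as the body of the mapping request shown in the sample.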

Indexing

Simple case

The simplest case to index looks like this:

"suggestField" : [ "The Prodigy Firestarter", "Firestarter"]

Depending on the analyzer used

Outputs

When an output is defined, it is always returned as the text of a found suggestion.

"suggestField" : {
  "input" : [ "The Prodigy Firestarter", "Firestarter"],
  "output" : "The Prodigy, Firestarter"
}

Weights

You should define custom weights instead of relying on the default one (see the drawbacks section). The weight must be a positive integer (not a float) and defines the order of your suggestions.

"suggestField" : { 
  "input" : [ "The Prodigy Firestarter", "Firestarter"],
  "output" : "The Prodigy, Firestarter",
  "weight" : 42
}
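
The constraints on weight and payload can be checked on the client before indexing. Below is a minimal Python sketch; `suggest_value` is a hypothetical helper, and rejecting 0 is a strict reading of "positive integer" rather than something this proposal states.

```python
def suggest_value(inputs, output=None, weight=None, payload=None):
    """Build the value of a completion field for one document.

    Hypothetical helper that enforces the rules described above:
    the weight must be a positive integer, the payload a JSON object.
    """
    if isinstance(inputs, str):
        inputs = [inputs]  # allow a single input string
    value = {"input": list(inputs)}
    if output is not None:
        value["output"] = output
    if weight is not None:
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(weight, bool) or not isinstance(weight, int) or weight <= 0:
            raise ValueError("weight must be a positive integer")
        value["weight"] = weight
    if payload is not None:
        if not isinstance(payload, dict):
            raise ValueError("payload must be a JSON object")
        value["payload"] = payload
    return value


doc = {
    "suggestField": suggest_value(
        ["The Prodigy Firestarter", "Firestarter"],
        output="The Prodigy, Firestarter",
        weight=42,
    )
}
print(doc)
```

Passing `weight=1.5` or `payload=["a"]` raises a `ValueError` instead of producing a document that violates the rules above.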

Custom weights also make your suggestions more valuable. With weights you could, for example, rank the most played song or the best rated hotel first in your suggestions.

Search

Searches work exactly like those of the phrase and term suggesters:

curl -X POST 'localhost:9200/music/_suggest' -d '{
    "song-suggest" : {
        "text" : "nev",
        "completion" : {
            "field" : "suggest"
        }
    }
}'
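
The same request can be prepared from Python with the standard library. This is a sketch under assumptions: `build_suggest_request` is a hypothetical helper, the node is assumed to listen on http://localhost:9200 as in the curl examples, and the Content-Type header is an addition that the curl command above does not set.

```python
import json
import urllib.request


def build_suggest_request(base_url, index, name, text, field):
    """Prepare (but do not send) a POST request to the _suggest endpoint.

    Hypothetical helper; the body has the same shape as the curl example.
    """
    body = {name: {"text": text, "completion": {"field": field}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_suggest",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, see above
        method="POST",
    )


request = build_suggest_request(
    "http://localhost:9200", "music", "song-suggest", "nev", "suggest"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running node:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it keeps the example runnable without a node.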

Drawbacks

Using term frequency as default weight

If you do not specify a weight, the term frequency is used. This only makes sense if you optimize to a single segment or have large segments. Otherwise, you need custom weights to get the results you expect. Term frequency is therefore not a good weight indicator, and you should set the weight yourself.

@ghost ghost assigned spinscale Jul 24, 2013
spinscale added a commit to spinscale/elasticsearch that referenced this issue Jul 30, 2013
This commit introduces near realtime suggestions. For more information about
its usage refer to github issue elastic#3376

From the implementation point of view, a custom AnalyzingSuggester is used
in combination with a custom postingsformat (which is not exposed to the user
anywhere for him to use).
spinscale added a commit that referenced this issue Aug 2, 2013
This commit introduces near realtime suggestions. For more information about
its usage refer to github issue #3376

From the implementation point of view, a custom AnalyzingSuggester is used
in combination with a custom postingsformat (which is not exposed to the user
anywhere for him to use).

Closes #3376