codere/elasticsearch-suggest-plugin

 
 


Suggester Plugin for Elasticsearch

This little plugin uses the FSTSuggester from lucene to create suggestions from a certain field for a specified term instead of returning index data.

THIS IS NOT PRODUCTION READY! DO NOT USE IT.

This is my first attempt with elasticsearch. I am not too deep into elasticsearch internals, nor do I have deep knowledge of lucene, so please forgive this code.
Feel free to comment, improve and help. I am thankful for any insights, whether they concern elasticsearch, lucene or the other mistakes I have surely made.

Oh and in case you have not read it above:

THIS IS NOT PRODUCTION READY! DO NOT USE IT.

In case you want to contact me, drop me a mail at alexander@reelsen.net

Installation

If you do not want to work on the repository, just use the standard elasticsearch plugin command (found in your elasticsearch/bin directory):

bin/plugin -url https://github.com/downloads/spinscale/elasticsearch-suggest-plugin/elasticsearch-suggest-0.0.4-0.19.0.zip -install suggest

If you want to work on the repository

  • Clone this repo with git clone git://github.com/spinscale/elasticsearch-suggest-plugin.git
  • Run: gradle clean assemble zip. This does not run any unit tests, as they take some time. If you want to run them, run gradle clean build zip instead
  • Install the plugin: /path/to/elasticsearch/bin/plugin -install elasticsearch-suggest -url file:///$PWD/build/distributions/elasticsearch-suggest-$version.zip

Usage

If you have a products index with the right fields, fire up curl like this. If not, read below how to set up a clean elasticsearch installation that supports suggestions.


# curl -X POST 'localhost:9200/products1/product/_suggest?pretty=1' -d '{ "field": "ProductName.suggest", "term": "tischwäsche", "size": "10"  }'
{
  "suggest" : [ "tischwäsche", "tischwäsche 100", 
    "tischwäsche aberdeen", "tischwäsche acryl", "tischwäsche ambiente", 
    "tischwäsche aquarius", "tischwäsche atlanta", "tischwäsche atlas", 
    "tischwäsche augsburg", "tischwäsche aus", "tischwäsche austria" ]
}

As you can see, this queries the products index for the field ProductName.suggest with the specified term and size.
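The same request can be sent from plain Java without the plugin's client classes. The following is a minimal sketch, assuming Java 11 or newer (for java.net.http) and a node with the plugin installed on localhost:9200. The class and method names (SuggestHttpSketch, buildBody, buildRequest) are made up for this illustration and are not part of the plugin.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the plugin.
final class SuggestHttpSketch {

    // Escapes backslashes and double quotes so the value is safe inside a JSON string.
    // Control characters are not handled; this is only a sketch.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds the JSON body with the three parameters shown in the curl example.
    static String buildBody(String field, String term, int size) {
        return "{\"field\":" + quote(field)
                + ",\"term\":" + quote(term)
                + ",\"size\":" + size + "}";
    }

    // Builds a POST request against /{index}/{type}/_suggest.
    static HttpRequest buildRequest(String baseUrl, String index, String type, String body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/" + type + "/_suggest"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // "\u00e4" is the letter ä, written as an escape to avoid source encoding issues.
        String body = buildBody("ProductName.suggest", "tischw\u00e4sche", 10);
        HttpRequest request = buildRequest("http://localhost:9200", "products1", "product", body);

        // Assumes a running node; this call fails if nothing listens on localhost:9200.
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

The body is assembled by hand only to keep the sketch free of dependencies. In a real project a JSON library is the safer choice.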

You might want to check out the included unit tests as well. I use a shingle filter in my examples; take a look at the files in the src/test/resources directory.

Furthermore, the suggest data is not updated whenever you index a new product, but only periodically. The default is to update the suggest data every 10 minutes, but you can change that in your elasticsearch.yml configuration:


suggest:
  refresh_interval: 600s

In this case the suggest indexes are refreshed every 10 minutes. This is also the default. You can use values like “10s”, “10ms” or “10m” as with most other time based configuration settings in elasticsearch.
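To make the units concrete, here is a small sketch that converts such values into a java.time.Duration. It is not the parser elasticsearch uses; the class RefreshIntervalSketch is hypothetical and only handles the three suffixes mentioned above (ms, s, m).

```java
import java.time.Duration;

// Hypothetical illustration of the time value notation, not elasticsearch code.
final class RefreshIntervalSketch {

    static Duration parse(String value) {
        String v = value.trim();
        // "ms" must be checked first, because "10ms" also ends with "s".
        if (v.endsWith("ms")) {
            return Duration.ofMillis(Long.parseLong(v.substring(0, v.length() - 2)));
        }
        if (v.endsWith("s")) {
            return Duration.ofSeconds(Long.parseLong(v.substring(0, v.length() - 1)));
        }
        if (v.endsWith("m")) {
            return Duration.ofMinutes(Long.parseLong(v.substring(0, v.length() - 1)));
        }
        throw new IllegalArgumentException("Unsupported time value: " + value);
    }

    public static void main(String[] args) {
        // Prints each setting together with its length in milliseconds.
        for (String setting : new String[] {"600s", "10m", "10s", "10ms"}) {
            System.out.println(setting + " = " + parse(setting).toMillis() + " ms");
        }
    }
}
```

In this sketch "600s" and "10m" parse to the same duration, which is why the configuration above matches the 10 minute default.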

If you want to refresh your FST suggesters manually instead of waiting for 10 minutes just issue a POST request to the “/_suggestRefresh” URL.


# curl -X POST 'localhost:9200/_suggestRefresh' 

Usage from Java

Inject the NodeClientWithSuggest via Guice and use it


private NodeClientWithSuggest client;

@Inject public ConstructorOfYourClass(NodeClientWithSuggest client) {
    this.client = client;
}

public List<String> getMySuggestions(String term, String field, String index, Integer size, Float similarity) {
    SuggestRequest request = new SuggestRequest(index);
    request.term(term);
    request.field(field);
    request.size(size);
    request.similarity(similarity);

    SuggestResponse response = client.suggest(request).actionGet();

    return response.suggestions();
}

Refresh works like this (the response carries no information):


NodesSuggestRefreshRequest refreshRequest = new NodesSuggestRefreshRequest();
NodesSuggestRefreshResponse response = client.suggestRefresh(refreshRequest).actionGet();

You can also use the included builders


List<String> suggestions = new SuggestRequestBuilder(client)
    .field(field)
    .term(term)
    .size(size)
    .similarity(similarity)
    .execute().actionGet().suggestions();

SuggestRefreshRequestBuilder builder = new SuggestRefreshRequestBuilder(client);
builder.execute().actionGet();

Thanks

  • Shay for giving feedback

TODO

Changelog

  • 2012-03-07: Updated to work with elasticsearch 0.19.0
  • 2012-02-10: Created SuggestRequestBuilder and SuggestRefreshRequestBuilder classes – results in easy to use request classes (check the examples and tests)
  • 2011-12-29: The refresh interval can now be chosen as time based value like any other elasticsearch configuration
  • 2011-12-29: Instead of having all nodes sleeping the same time and updating the suggester asynchronously, the master node now triggers the update for all slaves
  • 2011-12-20: Added transport action (and REST action) to trigger reloading of all FST suggesters
  • 2011-12-11: Fixed the biggest issues: Searchers are released now and do not leak
  • 2011-12-11: Indexing is now done periodically
  • 2011-12-11: Found a way to get the injector from the node, so I can build my tests without using HTTP requests

HOWTO – the long version

This HOWTO will help you to set up a clean elasticsearch installation with the correct index settings and mappings, so you can use the plugin as easily as possible.
We will set up elasticsearch, index some products and query those for suggestions.

  • Get elasticsearch, install it
  • Get this plugin, install it
  • Add a suggest and a lowercase analyzer to your elasticsearch/config/elasticsearch.yml config file
    index:
      analysis:
        analyzer:
          lowercase_analyzer:
            type: custom
            tokenizer: standard
            filter: [standard, lowercase] 
          suggest_analyzer:
            type: custom
            tokenizer: standard
            filter: [standard, lowercase, shingle]
    
  • Start elasticsearch
  • Now a mapping has to be created. You can either create it via configuration in a file or during index creation. We will create an index with a mapping now:
    curl -X PUT localhost:9200/products -d '{
        "mappings" : {
            "product" : {
                "properties" : {
                    "ProductId" : { "type": "string", "index": "not_analyzed" },
                    "ProductName" : {
                        "type" : "multi_field",
                        "fields" : {
                            "ProductName":  { "type": "string", "index": "not_analyzed" },
                            "lowercase":    { "type": "string", "analyzer": "lowercase_analyzer" },
                            "suggest" :     { "type": "string", "analyzer": "suggest_analyzer" }
                        }
                    }
                }
            }
        }
    }'
  • Now let's add some products:
    for i in 1 2 3 4 5 6 7 8 9 10 100 101 1000; do
        json=$(printf '{"ProductId": "%s", "ProductName": "%s"}' "$i" "My Product $i")
        curl -X PUT localhost:9200/products/product/$i -d "$json"
    done

Queries

Time to query and understand the different analyzers:
  • Queries the not analyzed field, returns 10 matches (default), always the full product name:
    curl -X POST localhost:9200/products/product/_suggest -d '{ "field": "ProductName", "term": "My" }'
  • Queries the not analyzed field, returns nothing (because lowercase):
    curl -X POST localhost:9200/products/product/_suggest -d '{ "field": "ProductName", "term": "my" }'
  • Queries the lowercase field, returns only the occurring word (which is pretty bad for suggests):
    curl -X POST localhost:9200/products/product/_suggest -d '{ "field": "ProductName.lowercase", "term": "m" }'
  • Queries the suggest field, returns two words (this is the default length of the shingle filter), in this case “my” and “my product”
    curl -X POST localhost:9200/products/product/_suggest -d '{ "field": "ProductName.suggest", "term": "my" }'
  • Queries the suggest field, returns ten product names as we started with the second word + another one due to the shingle
    curl -X POST localhost:9200/products/product/_suggest -d '{ "field": "ProductName.suggest", "term": "product" }'
  • Queries the suggest field, returns all products with “product 1” in the shingle
    curl -X POST localhost:9200/products/product/_suggest -d '{ "field": "ProductName.suggest", "term": "product 1" }'
  • The same query as above, but limits the result set to two
    curl -X POST localhost:9200/products/product/_suggest -d '{ "field": "ProductName.suggest", "term": "product 1", "size": 2 }'
  • And last but not least, typo finding. The same query without the similarity parameter returns nothing:
    curl -X POST localhost:9200/products/product/_suggest -d '{ "field": "ProductName.suggest", "term": "proudct", "similarity": 0.7 }'

The similarity is a float between 0.0 and 1.0. If it is not specified, 1.0 is used, which means the term must match exactly. I’ve found 0.7 ok for cases where two letters were swapped, but your mileage may vary, as I tested merely on German product names.

With the tests I did, a shingle filter yielded the best results. Please check http://www.elasticsearch.org/guide/reference/index-modules/analysis/shingle-tokenfilter.html for more information about its setup, such as the default shingle size of two terms.
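As a rough mental model of why the queries above return what they do, the sketch below imitates a shingle filter of size two and a prefix lookup in plain Java. It is an assumption-laden simplification: words are split on whitespace instead of using the standard tokenizer, and the lookup is a sorted scan rather than the FST the plugin builds. ShingleSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical illustration, not the plugin's or lucene's implementation.
final class ShingleSketch {

    // Lowercases the text, splits it on whitespace and emits every single word
    // plus every pair of adjacent words (shingles of size two).
    static List<String> shingles(String text) {
        String[] words = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            terms.add(words[i]);
            if (i + 1 < words.length) {
                terms.add(words[i] + " " + words[i + 1]);
            }
        }
        return terms;
    }

    // Collects the distinct terms of all texts that start with the prefix, sorted.
    static List<String> suggest(List<String> texts, String prefix) {
        TreeSet<String> matches = new TreeSet<>();
        for (String text : texts) {
            for (String term : shingles(text)) {
                if (term.startsWith(prefix)) {
                    matches.add(term);
                }
            }
        }
        return new ArrayList<>(matches);
    }

    public static void main(String[] args) {
        // Same product names as in the indexing loop of the HOWTO.
        List<String> names = new ArrayList<>();
        for (String i : "1 2 3 4 5 6 7 8 9 10 100 101 1000".split(" ")) {
            names.add("My Product " + i);
        }
        System.out.println(shingles("My Product 10"));
        System.out.println(suggest(names, "my"));
        System.out.println(suggest(names, "product 1"));
    }
}
```

Under these assumptions the term "my" only matches "my" and "my product", while "product 1" matches the two-word shingles of products 1, 10, 100, 101 and 1000. The order and the size limit of the real plugin's results may differ.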

Now test with your data and come up with improvements to this configuration. I am happy to hear about your specific configuration for successful suggestion queries.
