codelibs/elasticsearch-analysis-synonym

Elasticsearch Analysis Synonym

Overview

The Elasticsearch Analysis Synonym plugin provides NGramSynonymTokenizer. For more details, see LUCENE-5252.

Version

Versions in Maven Repository

Issues/Questions

Please file an issue. (Japanese forum is here.)

Installation

For 5.x

$ $ES_HOME/bin/elasticsearch-plugin install org.codelibs:elasticsearch-analysis-synonym:5.3.0

For 2.x

$ $ES_HOME/bin/plugin install org.codelibs/elasticsearch-analysis-synonym/2.4.0

Getting Started

Create synonym.txt File

First, create a synonym dictionary file, synonym.txt, in $ES_CONF (e.g. /etc/elasticsearch). The following content is only a sample.

$ cat /etc/elasticsearch/synonym.txt
あ,かき,さしす,たちつて,なにぬねの
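
The dictionary can also be written from the shell. The following is a minimal sketch: it writes the sample line to a scratch directory and counts the comma-separated entries as a quick sanity check. WORK_DIR and ENTRY_COUNT are illustrative names, not something the plugin requires.

```shell
# Hypothetical scratch directory; copy the result into $ES_CONF
# (e.g. /etc/elasticsearch) once it looks right.
WORK_DIR="$(mktemp -d)"

# Write the sample synonym line from this README.
printf '%s\n' 'あ,かき,さしす,たちつて,なにぬねの' > "$WORK_DIR/synonym.txt"

# Count the comma-separated entries on the first line.
ENTRY_COUNT="$(head -n 1 "$WORK_DIR/synonym.txt" | tr ',' '\n' | wc -l | tr -d ' ')"
echo "$ENTRY_COUNT entries written to $WORK_DIR/synonym.txt"
```

Writing to a scratch location first avoids touching the live configuration directory until the file has been checked.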

Create Index

NGramSynonymTokenizer is registered as the "ngram_synonym" tokenizer type. The following request creates an index that uses it:

$ curl -XPUT "localhost:9200/sample?pretty" -d '
{
  "settings":{
    "index":{
      "analysis":{
        "tokenizer":{
          "2gram_synonym":{
            "type":"ngram_synonym",
            "n":"2",
            "synonyms_path":"synonym.txt"
          }
        },
        "analyzer":{
          "2gram_synonym_analyzer":{
            "type":"custom",
            "tokenizer":"2gram_synonym"
          }
        }
      }
    }
  },
  "mappings":{
    "item":{
      "properties":{
        "id":{
          "type":"string",
          "index":"not_analyzed"
        },
        "msg":{
          "type":"string",
          "analyzer":"2gram_synonym_analyzer"
        }
      }
    }
  }
}'

Then index a document:

$ curl -XPOST localhost:9200/sample/item/1 -d '
{
  "id":"1",
  "msg":"あいうえお"
}'
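
Before searching, it can help to confirm what the analyzer emits for the indexed text. The sketch below assumes the sample index and the 2gram_synonym_analyzer from the steps above, and a node listening on localhost:9200. It refreshes the index so the new document is searchable, then calls the _analyze API. ES_URL and ANALYZE_BODY are illustrative variable names. The tokens returned depend on the plugin version and the dictionary, so no output is shown here.

```shell
# Assumption: Elasticsearch is reachable at this address; adjust as needed.
ES_URL="${ES_URL:-http://localhost:9200}"

# Request body for the _analyze API: analyzer name and sample text from the steps above.
ANALYZE_BODY='{"analyzer":"2gram_synonym_analyzer","text":"あいうえお"}'

# Make the document indexed above visible to search right away.
curl -s --max-time 5 -XPOST "$ES_URL/sample/_refresh" \
  || echo "refresh failed (is Elasticsearch running?)" >&2

# Show the tokens the analyzer emits for the sample text.
curl -s --max-time 5 -XPOST "$ES_URL/sample/_analyze?pretty" -d "$ANALYZE_BODY" \
  || echo "analyze failed (is Elasticsearch running?)" >&2
```

Comparing the tokens for the document text with the tokens for a query string is usually the quickest way to see why a match_phrase query does or does not match.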

Check Search Results

Run a few match_phrase queries against the msg field:

$ curl -XPOST "http://localhost:9200/sample/_search" -d '
{
   "query": {
      "match_phrase": {
         "msg": "あ"
      }
   }
}'

$ curl -XPOST "http://localhost:9200/sample/_search" -d '
{
   "query": {
      "match_phrase": {
         "msg": "あい"
      }
   }
}'

$ curl -XPOST "http://localhost:9200/sample/_search" -d '
{
   "query": {
      "match_phrase": {
         "msg": "かき"
      }
   }
}'

$ curl -XPOST "http://localhost:9200/sample/_search" -d '
{
   "query": {
      "match_phrase": {
         "msg": "かきい"
      }
   }
}'

Reload synonyms_path File Dynamically

If the "dynamic_reload" property is set to true, NGramSynonymTokenizer reloads the synonyms_path file on the fly (the reload actually happens when the tokenizer's reset() method is called). To change the interval at which the file's timestamp is checked, set "reload_interval".

$ curl -XPUT "localhost:9200/sample?pretty" -d '
{
  "settings":{
    "index":{
      "analysis":{
        "tokenizer":{
          "2gram_synonym":{
            "type":"ngram_synonym",
            "n":"2",
            "synonyms_path":"synonym.txt",
            "dynamic_reload":true,
            "reload_interval":"10s"
          }
        },
...