Synonyms not expanded with WARN message "MorphemeFieldFilter does nothing, it is not the current consumer" #109

Closed
po3rin opened this issue Sep 18, 2023 · 2 comments

po3rin commented Sep 18, 2023

Thanks for this useful plugin. I would like to report some unexpected behavior.
We want to use Sudachi together with a synonym filter, but when the two are combined, synonyms are not expanded.

versions

Elasticsearch: v8.8.1
Sudachi Plugin: elasticsearch-8.8.1-analysis-sudachi-3.1.0

With the combination of Elasticsearch 7.16.1 and analysis-sudachi-7.16.1-2.1.1, synonyms were expanded without any problems.

How to reproduce

I have published a configuration that reproduces this issue on a dedicated branch of my GitHub repository.

https://github.com/po3rin/sudachi-elasticsearch-sample/tree/morpheme-error-reproduce

# in sudachi-elasticsearch-sample directory
# Place system_core.dic in elasticsearch/sudachi directory ...

docker-compose up -d --build
curl -X PUT "localhost:9200/test"  --header "Content-Type: application/json" -d @"index.json"

check settings

cat elasticsearch/sudachi/sudachi_synonyms.txt
サルコイドーシス,サルコイド

cat index.json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "sudachi_search_analyzer_c": {
          "type": "custom",
          "tokenizer": "sudachi_tokenizer_c",
          "discard_punctuation": true,
          "filter": [
            "synonym_filter",
            "sudachi_pos_filter",
            "sudachi_baseform",
            "sudachi_normalizedform"
            ]
        }
      },
      "tokenizer": {
        "sudachi_tokenizer_c": {
          "type": "sudachi_tokenizer",
          "split_mode": "C",
          "resources_path": "/app/config/sudachi/",
          "discard_punctuation": true
        }
      },
      "filter": {
        "synonym_filter" : {
            "type" : "synonym_graph",
            "synonyms_path": "/app/config/sudachi/sudachi_synonyms.txt"
        },
        "sudachi_pos_filter": {
            "type": "sudachi_part_of_speech",
            "stoptags": [
              "代名詞",
              "形状詞-タリ",
              "形状詞-助動詞語幹",
              "連体詞",
              "接続詞",
              "感動詞",
              "助動詞",
              "助詞",
              "補助記号",
              "空白"
            ]
          }
      }
    }
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "id": {
        "type": "long",
        "index": true
      },
      "body": {
        "type": "text"
      }
    }
  }
}

create index

curl -X PUT "localhost:9200/test"  --header "Content-Type: application/json" -d @"index.json"

call analyze API

GET localhost:9200/test/_analyze
{
  "analyzer" : "sudachi_search_analyzer_c",
  "text" : "サルコイド"
}
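
For scripted checks, the same `_analyze` call can be issued from Python and the response inspected for both synonym terms. This is a minimal sketch, not part of the original report: the helper names `build_analyze_request`, `tokens_in`, and `is_expanded` are hypothetical, the host and index come from the curl commands above, and it assumes the standard `_analyze` response shape with a top-level `tokens` list. The exact token text depends on the dictionary and on the filters that run after the tokenizer, so the expected terms may need adjusting.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer="sudachi_search_analyzer_c",
                          base_url="http://localhost:9200", index="test"):
    """Build (but do not send) an _analyze request for the given text."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # _analyze accepts a JSON body via POST as well as GET
    )


def tokens_in(response):
    """Collect the token strings from a parsed _analyze response."""
    return {tok["token"] for tok in response.get("tokens", [])}


def is_expanded(response, expected_terms):
    """True if every expected term appears among the emitted tokens."""
    return set(expected_terms) <= tokens_in(response)


# Usage against a running cluster (not executed here):
# with urllib.request.urlopen(build_analyze_request("サルコイド")) as resp:
#     print(is_expanded(json.load(resp), ["サルコイド", "サルコイドーシス"]))
```

If `is_expanded` returns `False` for the two terms in `sudachi_synonyms.txt`, the synonym filter is not taking effect for that analyzer.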

logs

{"@timestamp":"2023-09-18T15:47:13.301Z", "log.level": "WARN", "message":"MorphemeFieldFilter does nothing, it is not the current consumer", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es01][analyze][T#1]","log.logger":"com.worksap.nlp.lucene.sudachi.ja.MorphemeFieldFilter","elasticsearch.cluster.uuid":"WheyDXkCR0OOWQ8E6DYeSw","elasticsearch.node.id":"Edyp8roBQo6bXutCqhA4VQ","elasticsearch.node.name":"es01","elasticsearch.cluster.name":"docker-cluster"}

I would appreciate it if you could look into this.

@eiennohito
Collaborator

Sorry for the delay in answering this issue.

The warning itself is correct, but its text is a bit misleading; we need to fix that.
The warning means that the pipeline contains a filter that effectively does nothing, in this case sudachi_baseform.
Only the last filter that modifies the morphemes produced by Sudachi has an effect.

Also, although I have not tested this, the synonym filter should be the last one in the filter chain; otherwise it will not have any effect.
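
The two rules above can be sketched as a small check over an analyzer's filter list. This is only an illustration: `check_filter_chain` is a hypothetical helper, not something the plugin or Elasticsearch provides, and the set of morpheme-rewriting filters is an assumption limited to the two named in this thread, so it may need extending.

```python
# Hypothetical lint helper; not part of analysis-sudachi or Elasticsearch.
# Assumption: these filter types rewrite the term from Sudachi morphemes.
MORPHEME_FILTERS = {"sudachi_baseform", "sudachi_normalizedform"}
SYNONYM_TYPES = {"synonym", "synonym_graph"}


def check_filter_chain(chain, filter_defs):
    """Return warnings for an analyzer's filter chain.

    chain: filter names in the order they appear in the analyzer.
    filter_defs: the "filter" section of the analysis settings, used to
    resolve custom names (e.g. "synonym_filter") to their "type".
    """
    # Built-in filters are referenced by type name directly.
    types = [filter_defs.get(name, {}).get("type", name) for name in chain]
    warnings = []

    # Rule 1: only the last morpheme-rewriting filter has an effect.
    positions = [i for i, t in enumerate(types) if t in MORPHEME_FILTERS]
    for i in positions[:-1]:
        warnings.append(
            f"'{chain[i]}' has no effect: '{chain[positions[-1]]}' comes later"
        )

    # Rule 2: a synonym filter should be the last filter in the chain.
    for i, t in enumerate(types):
        if t in SYNONYM_TYPES and i != len(chain) - 1:
            warnings.append(f"'{chain[i]}' should be the last filter in the chain")

    return warnings
```

Applied to the filter list from the report, this flags both `sudachi_baseform` and the position of `synonym_filter`.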

@po3rin
Author

po3rin commented Sep 21, 2023

Thank you for your reply. Following your advice, I made the changes below, and the problem is gone.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "sudachi_search_analyzer_c": {
          "type": "custom",
          "tokenizer": "sudachi_tokenizer_c",
          "discard_punctuation": true,
          "filter": [
-           "synonym_filter",
            "sudachi_pos_filter",
-           "sudachi_baseform",
-           "sudachi_normalizedform"
+           "sudachi_normalizedform",
+           "synonym_filter"
            ]
        }
      },
      "filter": {
        "synonym_filter" : {
            "type": "synonym_graph",
+           "lenient": true,
            "synonyms_path": "/app/config/sudachi/sudachi_synonyms.txt"
        },
