Whitespace problem when querying #7

Closed
zhaohailong opened this issue Jul 11, 2018 · 1 comment

Comments

@zhaohailong

During tokenization, empty values (whitespace) are being indexed as tokens.

@KennFalcon
Owner

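None of HanLP's default tokenizers enable stop-word filtering, so I added a stop-word filtering option to the new plugin code. Pull the latest version and try it; the corresponding release is 6.3.2. If you need a lower version, you can compile and package it yourself. Stop-word filtering is enabled as follows.
First, configure the index analyzer: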

PUT test/
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_hanlp_analyzer" : {
                    "tokenizer" : "my_hanlp"
                }
            },
            "tokenizer" : {
                "my_hanlp" : {
                    "type" : "hanlp",
                    "enable_stop_dictionary" : true
                }
            }
        }
    }
}

Then verify the result:

GET test/_analyze
{
  "text": "美国,|=阿拉斯加州发生8.0级地震",
  "analyzer": "my_hanlp_analyzer"
}

Response:

{
  "tokens": [
    {
      "token": "美国",
      "start_offset": 0,
      "end_offset": 2,
      "type": "nsf",
      "position": 0
    },
    {
      "token": "阿拉斯加州",
      "start_offset": 0,
      "end_offset": 5,
      "type": "nsf",
      "position": 1
    },
    {
      "token": "发生",
      "start_offset": 0,
      "end_offset": 2,
      "type": "v",
      "position": 2
    },
    {
      "token": "地震",
      "start_offset": 0,
      "end_offset": 2,
      "type": "n",
      "position": 3
    }
  ]
}
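
To check a response like the one above for the problem reported in this issue, a small script can scan the `tokens` array for entries whose text is empty or whitespace only. This is a minimal sketch, not part of the plugin: the helper name `blank_tokens` is hypothetical, and the sample dictionary is a trimmed copy of the response shown above.

```python
def blank_tokens(analyze_response):
    """Return the tokens of an _analyze response whose text is empty or whitespace only."""
    # "tokens" and "token" are the field names used in the _analyze response above.
    return [
        t for t in analyze_response.get("tokens", [])
        if not t.get("token", "").strip()
    ]


# Trimmed copy of the response above (offsets omitted for brevity).
response = {
    "tokens": [
        {"token": "美国", "type": "nsf", "position": 0},
        {"token": "阿拉斯加州", "type": "nsf", "position": 1},
        {"token": "发生", "type": "v", "position": 2},
        {"token": "地震", "type": "n", "position": 3},
    ]
}

print(blank_tokens(response))  # prints [] because no token is blank
```

An empty list means stop-word filtering removed the punctuation and whitespace tokens; any entries returned are the blank tokens that would otherwise be indexed.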
