search_analyzer cannot be set to hanlp_nlp, but setting it to hanlp works #38

Closed
zhengyangyong opened this issue Jul 23, 2019 · 7 comments

@zhengyangyong

{
    "properties": {
        "content": {
            "type": "text",
            "analyzer": "hanlp_nlp",
            "search_analyzer": "hanlp_nlp"
        },
        "remark": {
            "type": "text",
            "analyzer": "hanlp_nlp",
            "search_analyzer": "hanlp_nlp"
        }
    }
}
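
For reference, a mapping like the one above can also be generated programmatically. This is only a minimal Python sketch and not part of the original report: the helper names `text_field` and `build_mapping` are hypothetical, and the code only builds the JSON body without sending anything to Elasticsearch.

```python
import json


def text_field(analyzer, search_analyzer=None):
    """Build a text field mapping (hypothetical helper).

    When search_analyzer is omitted, the key is left out and Elasticsearch
    falls back to the index-time analyzer.
    """
    field = {"type": "text", "analyzer": analyzer}
    if search_analyzer is not None:
        field["search_analyzer"] = search_analyzer
    return field


def build_mapping(field_names, analyzer, search_analyzer=None):
    """Apply the same analyzer settings to every listed field (hypothetical helper)."""
    return {
        "properties": {
            name: text_field(analyzer, search_analyzer) for name in field_names
        }
    }


# Reproduces the mapping from the report: both analyzers set to hanlp_nlp.
mapping = build_mapping(["content", "remark"], "hanlp_nlp", "hanlp_nlp")
print(json.dumps(mapping, indent=4, ensure_ascii=False))
```

Passing `"hanlp"` as the third argument instead gives the variant that the report says is accepted.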
@KennFalcon
Owner

This is probably caused by the way an older version configured the NLP tokenizer. You can rebuild the package from the latest code and test again. #35

@zhengyangyong
Author

Great, I'll give it a try.

@zhengyangyong
Author

zhengyangyong commented Jul 25, 2019

It doesn't work. I updated to <elasticsearch.version>6.8.1</elasticsearch.version> and <hanlp.version>portable-1.7.4</hanlp.version> and rebuilt, but it still fails.

@KennFalcon
Owner

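I'll fix this as soon as possible. Note also that CRF and NLP tokenization require HanLP's models. The data package bundled inside the plugin by default is a reduced one, so you need to download the required data package from HanLP yourself.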

@zhengyangyong
Author

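I have downloaded the data package, and indexing documents works fine. The only problem is that searching with hanlp gives somewhat poor results, so I need to be able to set both analyzers to hanlp_nlp.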

@KennFalcon
Owner

KennFalcon commented Jul 27, 2019

The NLP and CRF tokenizer settings have been fixed. Please take a look:

POST _analyze
{
  "text": "我的希望是希望张晚霞的背影被晚霞映红",
  "analyzer": "hanlp_nlp"
}

Result:

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "r",
      "position" : 0
    },
    {
      "token" : "的",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "u",
      "position" : 1
    },
    {
      "token" : "希望",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "vn",
      "position" : 2
    },
    {
      "token" : "是",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "v",
      "position" : 3
    },
    {
      "token" : "希望",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "v",
      "position" : 4
    },
    {
      "token" : "张",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "q",
      "position" : 5
    },
    {
      "token" : "晚霞",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "n",
      "position" : 6
    },
    {
      "token" : "的",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "u",
      "position" : 7
    },
    {
      "token" : "背影",
      "start_offset" : 11,
      "end_offset" : 13,
      "type" : "n",
      "position" : 8
    },
    {
      "token" : "被",
      "start_offset" : 13,
      "end_offset" : 14,
      "type" : "p",
      "position" : 9
    },
    {
      "token" : "晚霞",
      "start_offset" : 14,
      "end_offset" : 16,
      "type" : "n",
      "position" : 10
    },
    {
      "token" : "映红",
      "start_offset" : 16,
      "end_offset" : 18,
      "type" : "v",
      "position" : 11
    }
  ]
}
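
A quick way to sanity-check a response like the one above is to confirm that every token's offsets point at the matching slice of the input and that the tokens cover the text without gaps. This is a minimal Python sketch, not something from the plugin: `check_tokens` is a hypothetical helper, the token tuples are copied by hand from the response, and it assumes all characters are in the Basic Multilingual Plane so that Python string indices line up with the reported offsets.

```python
TEXT = "我的希望是希望张晚霞的背影被晚霞映红"

# (token, start_offset, end_offset) copied from the _analyze response above.
TOKENS = [
    ("我", 0, 1), ("的", 1, 2), ("希望", 2, 4), ("是", 4, 5),
    ("希望", 5, 7), ("张", 7, 8), ("晚霞", 8, 10), ("的", 10, 11),
    ("背影", 11, 13), ("被", 13, 14), ("晚霞", 14, 16), ("映红", 16, 18),
]


def check_tokens(text, tokens):
    """Return a list of problems found; an empty list means the offsets are consistent."""
    problems = []
    expected_start = 0
    for token, start, end in tokens:
        # The slice addressed by the offsets must equal the token text.
        if text[start:end] != token:
            problems.append(
                f"offset mismatch for {token!r}: text[{start}:{end}] is {text[start:end]!r}"
            )
        # Each token should begin where the previous one ended.
        if start != expected_start:
            problems.append(
                f"gap or overlap before {token!r}: expected start {expected_start}, got {start}"
            )
        expected_start = end
    # The last token should end at the end of the text.
    if expected_start != len(text):
        problems.append(f"tokens end at {expected_start}, text length is {len(text)}")
    return problems


problems = check_tokens(TEXT, TOKENS)
print("OK" if not problems else "\n".join(problems))
```

The contiguity check is stricter than necessary for analyzers that drop stop words or punctuation, so treat reported gaps as hints rather than errors in that case.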

KennFalcon added the bug label Jul 30, 2019
@KennFalcon
Owner

If there are still problems, please reopen the issue. I'm closing it for now.
