Searching when Simplified and Traditional Chinese coexist in a document #1043

Open
yuan67-top opened this issue Feb 14, 2024 · 3 comments

Comments

@yuan67-top

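Scenario

I have a document in ES 7.6.2 whose content mixes Simplified and Traditional Chinese characters. For example, the document with id 1 has title=简体字繁體字同時存在. When a user searches for "繁体字", can the document with id 1 be matched? And if the user searches for "时", can that document also be matched? The original value of the document must not be changed, so that the stored content stays accurate.

Approaches I have considered

1. Synonyms. This approach fails to match single-character queries.
2. Convert all user input to Traditional Chinese before searching. This approach fails when the term being searched for is written in Simplified characters in the document.
3. Use a normalizer. This approach does not seem to support tokenization.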

@medcl
Member

medcl commented Feb 19, 2024

Go with option 3: use the stconvert normalizer to convert everything to Simplified Chinese, then tokenize with ik.

@yuan67-top
Author

A normalizer can only be used on keyword fields; it does not support tokenization.

@medcl
Member

medcl commented Feb 19, 2024

The stconvert char filter is all you need.
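For readers who want to try the char filter suggestion, here is a minimal sketch of what the index definition might look like, written as a Python dict so it can be dumped to JSON and sent to Elasticsearch. It assumes the stconvert and ik plugins are both installed. The names `t2s_char_filter` and `ik_t2s` and the choice of `ik_max_word` are illustrative assumptions, not something stated in this thread. Because a char filter only affects the analysis chain, the stored `_source` keeps the original mixed text.

```python
import json

# Hypothetical names chosen for this sketch; any valid names would work.
CHAR_FILTER_NAME = "t2s_char_filter"
ANALYZER_NAME = "ik_t2s"


def build_index_body(field: str = "title") -> dict:
    """Build a create-index body that converts Traditional to Simplified
    characters before the ik tokenizer runs."""
    return {
        "settings": {
            "analysis": {
                "char_filter": {
                    # stconvert char filter: t2s = Traditional to Simplified
                    CHAR_FILTER_NAME: {"type": "stconvert", "convert_type": "t2s"}
                },
                "analyzer": {
                    ANALYZER_NAME: {
                        "type": "custom",
                        "char_filter": [CHAR_FILTER_NAME],
                        # Assumption: ik_max_word; ik_smart is the other ik tokenizer
                        "tokenizer": "ik_max_word",
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                # With only "analyzer" set, the same analyzer is applied at
                # index time and at search time, so queries are converted too.
                field: {"type": "text", "analyzer": ANALYZER_NAME}
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body to use when creating the index.
    print(json.dumps(build_index_body(), ensure_ascii=False, indent=2))
```

With this setup both the indexed text and the query pass through the same conversion, so a query for "繁体字" and the stored "繁體字" should be analyzed to the same form. Whether a single-character query such as "时" matches still depends on how the tokenizer segments the text, which this sketch does not address.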
