seta-hainguyen changed the title (Apr 24, 2021)
from: Tokenizer "vi_tokenizer" doesn't work with character filer "html_strip"
to: Tokenizer "vi_tokenizer" doesn't work with character filter "html_strip"
@seta-hainguyen Version 7.3.1, which uses the old VnTokenizer, has a lot of issues. I switched the plugin to another tokenizer from the CocCoc team, so I no longer maintain the VnTokenizer-based plugin.
The plugin is currently compatible with ES v7.4.0 and later; you can refer to the documentation to build the plugin for the version you need.
@seta-hainguyen The CocCoc tokenizer is written in C++, so you have to build it as a shared library on the Elasticsearch node where you intend to install the plugin.
The ES plugin is compatible with Java 8 and later.
Here are my analyzer settings:
"vn_html_analyzer": {
  "filter": [ "icu_folding" ],
  "char_filter": [ "html_strip" ],
  "type": "custom",
  "tokenizer": "vi_tokenizer"
}
When I tried:
GET localhost:9200/question/_analyze { "analyzer" : "vn_html_analyzer", "text" : "<p>đỗ đại học</p>" }
It throws an error:
{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[7d1c0721c1d6][172.17.0.2:9300][indices:admin/analyze[s]]"}],"type":"string_index_out_of_bounds_exception","reason":"String index out of range: -1"},"status":500}
When I replace the tokenizer "vi_tokenizer" with "standard", the error does not occur.
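For anyone trying to reproduce this, here is a minimal sketch in Python that builds the same analyzer definition and `_analyze` request, and lets you swap the tokenizer to compare behaviour. The helper names (`build_index_settings`, `build_analyze_request`, `send_json`, `reproduce`) are hypothetical and only for illustration. The base URL and the index name `question` are taken from the request above, and the sketch assumes the index does not exist yet and that both plugins are installed on the node.

```python
import json
import urllib.error
import urllib.request


def build_index_settings(tokenizer="vi_tokenizer"):
    """Index-creation body that defines vn_html_analyzer with the given tokenizer."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "vn_html_analyzer": {
                        "type": "custom",
                        "char_filter": ["html_strip"],
                        "tokenizer": tokenizer,
                        "filter": ["icu_folding"],
                    }
                }
            }
        }
    }


def build_analyze_request(text, analyzer="vn_html_analyzer"):
    """Body for the _analyze endpoint."""
    return {"analyzer": analyzer, "text": text}


def send_json(method, url, body):
    """Send a JSON body; return (HTTP status, parsed JSON). Needs a running node."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Elasticsearch returns error details as JSON, e.g. with status 500.
        return err.code, json.loads(err.read().decode("utf-8"))


def reproduce(base_url="http://localhost:9200", index="question",
              tokenizer="vi_tokenizer"):
    """Create the index (assumed not to exist yet) and analyze the sample text."""
    send_json("PUT", f"{base_url}/{index}", build_index_settings(tokenizer))
    return send_json(
        "POST",
        f"{base_url}/{index}/_analyze",
        build_analyze_request("<p>đỗ đại học</p>"),
    )
```

Calling `reproduce()` against a fresh node should exercise the failing path described above, while `reproduce(index="question_std", tokenizer="standard")` exercises the working one (the second index name is arbitrary).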
I'm using Elasticsearch 7.3.1 and elasticsearch-analysis-vietnamese 7.3.1, installed with this Dockerfile:
FROM elasticsearch:7.3.1
COPY elasticsearch-analysis-vietnamese-7.3.1.zip /usr/share/elasticsearch/
RUN cd /usr/share/elasticsearch && \
    bin/elasticsearch-plugin install --batch file:///usr/share/elasticsearch/elasticsearch-analysis-vietnamese-7.3.1.zip && \
    bin/elasticsearch-plugin install analysis-icu