I've found three cases where the unexpected presence or absence of whitespace causes either an offset error or a string index out of range error. I'm reporting all three together because I suspect they are related.
Two spaces between elements that should tokenize together cause an error. Here, "không gian" is normally indexed as a single token, but with two spaces between "không" and "gian" the request fails:
curl -s localhost:9200/wiki_content/_analyze?pretty -d '{"analyzer": "vi", "text" : "không  gian"}'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "remote_transport_exception",
        "reason" : "[K5DTwrD][127.0.0.1:9300][indices:admin/analyze[s]]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=-1,endOffset=9"
  },
  "status" : 400
}
A missing space between elements that should tokenize together also causes an error. Here, "năm 6" is usually tokenized as a single token. Without the space, I think the text is still split into two tokens, but the missing space between them causes an error:
curl -s localhost:9200/wiki_content/_analyze?pretty -d '{"analyzer": "vi", "text" : "năm6"}'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "remote_transport_exception",
        "reason" : "[K5DTwrD][127.0.0.1:9300][indices:admin/analyze[s]]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=-1,endOffset=4"
  },
  "status" : 400
}
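A possible explanation for both errors (an unverified guess, not confirmed against the plugin's source) is that the tokenizer emits a token with normalized spacing and then looks that token up in the original input to compute its offsets. If the lookup fails, a "not found" result of -1 would become the start offset, and the end offset would be -1 plus the token length. The sketch below is a hypothetical Python model of that behavior, not the analyzer's actual code; `naive_offsets` is a made-up name used only for illustration.

```python
import unicodedata


def naive_offsets(text, token):
    """Hypothetical offset lookup: find the token's surface form in the input.

    This is an assumed model of the bug, not code from the "vi" analyzer.
    """
    # Normalize to NFC so each accented letter counts as one character.
    text = unicodedata.normalize("NFC", text)
    token = unicodedata.normalize("NFC", token)
    start = text.find(token)  # str.find returns -1 when the token is absent
    end = start + len(token)  # a failed lookup silently propagates into end
    return start, end


# Case 1: input has two spaces, the assumed token has one.
print(naive_offsets("kh\u00f4ng  gian", "kh\u00f4ng gian"))  # (-1, 9)

# Case 2: input has no space, the assumed token has one.
print(naive_offsets("n\u0103m6", "n\u0103m 6"))  # (-1, 4)
```

Under this assumption, the computed pairs match the offsets in the two error messages above (startOffset=-1 with endOffset=9 and endOffset=4), which is consistent with the guess that the cases are related.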