StackOverflow in completion suggester when using long input strings #3596
Comments
Dude, this is a long string... yet I agree we need to protect users from adding overly long strings. I think we should just cut off at some point. Not sure what a good default is, but maybe 30 chars? Who types 30 chars in a suggest environment? :)
You might want to let us configure the limit and advise on stack issues. We don't use the completion suggester yet, but I can see users copying and pasting longish strings into my suggestion environment and expecting it to work. We limit the field to 255 characters, which seems a bit long to me, but I suspect we would need more than 30. 100, maybe? Also, I bet the data structure becomes a lot less awesome with long strings.
Yeah, the length should be configurable.
I don't think this is needed. Even if you paste a longish string into the search bar, you will still get a suggestion, but only for the top N characters. I am not sure whether we should provide any suggestions if you exceed the limit; your query might be specific enough?
Yeah, I agree this is long. This came up while indexing some user-generated data without doing a length check. We are using a 30-character max now and that works great.
Admittedly this is kind of lame, but what if someone pasted
Just to be clear, this issue occurs during indexing and creation of the FST, not at search time when getting suggestions.
@nik9000 I think there is a misunderstanding here. We would always search for the entire string, but the prefix suggester will not build its prefix dictionary for more than N leading characters. I think this should be perfectly fine, and no other suggester implementation is affected.
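The idea described above can be sketched in a few lines: the prefix dictionary only stores prefixes up to N leading characters, yet queries longer than N still resolve correctly because candidates are filtered against the full query string. This is a toy illustration, not Lucene's actual FST implementation; the function names and the dict-based index are invented for clarity.

```python
def build_prefix_index(suggestions, max_prefix=5):
    """Map each prefix (at most max_prefix chars long) to the full suggestions.

    Only the leading max_prefix characters of each suggestion ever become
    dictionary keys, which bounds the index size regardless of input length.
    """
    index = {}
    for s in suggestions:
        for n in range(1, min(len(s), max_prefix) + 1):
            index.setdefault(s[:n], set()).add(s)
    return index


def suggest(index, query, max_prefix=5):
    """Look up the truncated prefix, then filter candidates by the full query."""
    candidates = index.get(query[:max_prefix], set())
    return sorted(c for c in candidates if c.startswith(query))
```

Even though `"elastics"` is longer than the 4-character prefix limit below, the full-query filter still narrows the candidates to the right suggestion.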
Oh sorry! So you'd still suggest the right thing even though the prefix dictionary doesn't spell it out exactly. Cool. Sorry for the confusion.
Restrict the input length to a reasonable size, otherwise very long strings can cause StackOverflowExceptions deep down in Lucene land. This is simply a safety limit, set to `50` UTF-16 code points by default. The limit applies only at index time, not at query time. If prefix completions longer than 50 UTF-16 code points are expected or desired, this limit should be raised. Critical string sizes are beyond the 1k UTF-16 code point mark. Closes elastic#3596
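Since the limit is counted in UTF-16 code points rather than Python-style code points, characters outside the Basic Multilingual Plane count as two units (a surrogate pair). A minimal sketch of such a truncation check, with the function name and default invented for illustration:

```python
MAX_INPUT_LENGTH = 50  # default safety limit, in UTF-16 code units


def truncate_utf16(text: str, limit: int = MAX_INPUT_LENGTH) -> str:
    """Return the longest prefix of `text` that fits within `limit` UTF-16 code units.

    Code points above U+FFFF are encoded as surrogate pairs in UTF-16,
    so they count as two units each.
    """
    units = 0
    for i, ch in enumerate(text):
        units += 2 if ord(ch) > 0xFFFF else 1
        if units > limit:
            return text[:i]
    return text
```

Note how a string of 30 emoji is cut to 25 characters, because each one occupies two UTF-16 code units.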
See also #5927 and https://issues.apache.org/jira/browse/LUCENE-5628 where we are fixing Lucene's getFiniteStrings to not consume Java stack in proportion to the character length of the index-time or query-time suggestion.
I ran into a StackOverflowError when using the new completion suggestions on a long string. I reproduced this using the latest build of master and a random string:
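For reference, the safeguard discussed in this thread is exposed as a mapping option on the completion field in later Elasticsearch releases (`max_input_length`, defaulting to 50 UTF-16 code points). A hedged sketch of such a mapping; the index and field names are placeholders, and the exact mapping syntax varies between Elasticsearch versions:

```json
{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion",
        "max_input_length": 50
      }
    }
  }
}
```

Inputs longer than this are truncated at index time only; queries are unaffected.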