StackOverflow in completion suggester when using long input strings #3596

mattweber · 2013-08-29T17:17:23Z

I ran into a StackOverflowError when using the new completion suggestions on a long string. I reproduced this using the latest build of master and a random string:

[2013-08-29 10:10:36,691][DEBUG][action.index             ] [Summers, Christopher] [music][4], node[MKSIixv6QQaAu-ws3rs9_g], [P], s[STARTED]: Failed to execute [index {[music][song][10], source[{
    "name" : "long string",
    "suggest" : { 
        "input": [ "GBDGUPXWENKY8HVZDZ0ZB6K0DU7QHAUC68JYRKCIU1GOLNDLRQF693S9UNB0OZL0TXL25LVUGNW03KOBXZK050HSXD37YJAA2JTEFGOG2552C0C0C6Z2G8WCH5Y78KDBY9D0950ZRPSMV3ZC3CPAZGP2D4QJ4FJE1ZQIQUYN6CBMUE2DDB4UV71AH4MR5E2F86NQN67I0STDK7I413DR6N5LSG1OK6YICL09VRIIN2FMKWEP242HYDG3GITQ4DPAMUL5N4S8HB7W7H7C6B7ZFTGO2RRLYB56GE44XD6B42XZVBP8IJROUHTPU7SFI59D1T11XH9FD6KDX7AV84K4KWHK48BQI1VZNZQRN1VC6YM8KLS4PGNGLFHRN302J92U90XM8H2AB94AFPFDTG559FHRX7ANRDUBYLPCR5U9O268HU783MOLP1A8TSU7NJWM6RG7ANPZL9NPQD1TRB3RPJNCT5IJS7QI6MI1HJG86XU8BXRLRU81ITKEXR1X7H8EXAKM1ER9DFMAS7O2WJB4W26Z6ETZ3UZKB55LDZEBW9XEDE7PW6ITA0CHUWXTUHSVID0ID5TTGG6U9J2Z72OV0ROI5KNRGDHYMT8O1S2GWA5EZFJJEGWS0R3SB8WVBL7XPCTD5WX314AEM8ZYW2APIM1GZXV17VLJJ9CF4TMD87X3SH4U1UQVOWCR27N4ZXH0UZ0VC545GPWK9EKJ8HLBRKIBGUHAESMJ930RF6SPMYVH4R02E54NBTBLOQRH92H49TJFT9SG3VU4WGV392DLIGTJUOH80RTIIL86N675GVTNUX7Z89FH8CU2S5XNW0FD6TG3DFU3WCDCJN03D6TYESIAZW6J7CCMMUZTNF6ON07BOGLMLYZRTQZB4O8MYOWQGI1185ALZSTO3HQFZBOSMHXQ7OKTZWHDI6O5SB1NWDF7PVBFNSM11QEGQKI6E52DVG44H3M4OIOT9MDZYOUBFVHV12E8T911YXSZJ74ZOM2JOUMBAM8ZOUFF5X9JAP6IY2E983LWK58BLPPEANQ6KRJZG256XQY2LJJMA7WESVGXMF2RP1XZ5FWCOCFCPHWI2FAYI68JN4N5EEKD3P3U6XALSYJXJWXWOSHT4FYDXDMR7W0VWVLX5C8YSU3SFYA2K0Q1YKVEP1RRWV30ZJMT9B5DDBWN8MHUHX7L3IROSZ6493S9IE8VFOMAYZ0HUMRXO6PS6JO0C54QB0BCUTJ4B1XBHIW6F7DNJMVI539YNERSUBAGGVQW3HR9SI53EODVSHKGI46WJICOCZG8CHD4LQBCWHFTREJ3JHVJGZ8884GGRX456VCGUAIXQL8PPHGW7S34U5KT9OY7O5ZXAFJF4IS9M1PIVU5VTLU44DJLSLQIHMNR5HMWDGIRI0WLHQE0KNRV7OL1JD9PGKUIG8JCI7D0VOT0BN2EQC23GSGL02SZE5314S1GHD1KENX1TD2SXWN272XR6BBL2BDH72XKF9R9482GDOO88ZN0DWTV0QI20SUOMAZQO6JEA7T3YR5FIU8TRX34YER1WPIZ20T7KR8JU71FAEXJYF74PFZF2BCRLN1E93HGN6GFYPTKESI2THT3R4YX7KA0M44G6TZ209SKSFAT2G5PUG0ERGKIHT1XIZTRJOGMG7T8NVCWWQSPIH3JAQ51E0ZXDFXSMJ31I7ABEWWDAF0KTNYRNWDE1R5V89HT9OLHDUZ7IXXRSYXUX2K3956MD0AF1NLGSS5U3D2QFPATN4AAW4IDYULEX67ZC5O1ZPQ9HGEGVFO0FQUFDG92WZIFEMIY0Z0Q66SUZ2N553D3VBLZIPGL2AHP11XEDJ5BU4R9ZJX3UWFCV6YCIXU5SRQEUYQZRQG7EDBQB8M7KIW4GIZQ3KMN8ISQUT5SRODI45O9GC5LG1P522U6092NTVMU7PLGEEGHLXPL1UCL3TCXQ8VS4SDW65MEVMWY9UNRWEHFMOVO7ABIYUUIU90FRIPATGZSGD7XOW5NWW4OYW8WQ91HSLVJUWUNIIDAU08GSZ654PXN5VMLNZ6N" ],
        "output": "long string",
        "payload" : { "artistId" : 2310 }, "weight": 1
    }
}]}]
java.lang.StackOverflowError
    at java.util.HashMap.addEntry(HashMap.java:856)
    at java.util.HashMap.put(HashMap.java:484)
    at java.util.HashSet.add(HashSet.java:217)
    at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:244)
    at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:259)
    at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:259)
    at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:259)
    at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:259)...

The text was updated successfully, but these errors were encountered:

s1monw · 2013-08-29T20:40:57Z

dude this is a long string... yet, I agree we need to protect the users from adding too long strings. I think we should just cut off at some point. Not sure what is a good default but maybe 30 chars? who types 30 chars in a suggest env. :)

nik9000 · 2013-08-29T20:55:52Z

You might want to let us configure the limit and advise on stack issues. We don't use the completion suggester yet but I can see users copying and pasting longish strings into my suggestion environment and expecting it to work. We limit the field to 255 characters which seems a bit long to me but I suspect would need more than 30. 100, maybe?

Also, I bet the data structure become a lot less awesome with long strings.

s1monw · 2013-08-29T21:02:23Z

You might want to let us configure the limit and advise on stack issues

yeah length should be configurable.

We limit the field to 255 characters which seems a bit long to me but I suspect would need more than 30. 100, maybe?

I don't think this is needed, even if you paste a longish string in the search bar you will get a suggestion but only for the top N characters. I am not sure if we should provide any if you exceed the limit. your query might be specific enough?

mattweber · 2013-08-29T21:13:08Z

Yea, I agree this is long. This came up while indexing some user generated data and not doing a length check. We are using 30 character max now and that works great.

nik9000 · 2013-08-29T21:14:03Z

Admittedly this is kind of lame, but what if someone pasted Wikipedia:List of Wikipedians by featured article nominations into the search bar and we limited it to 30 characters? We'd end up searching for Wikipedia:List of Wikipedians which would drive people wild. I'm not sure what the right answer is. Maybe switch to a phrase prefix search if the string is too long.

mattweber · 2013-08-29T21:16:08Z

Just to be clear this issue is during indexing and creation of the FST, not at search time to get suggestions.

s1monw · 2013-08-29T21:17:05Z

@nik9000 I think there is a misunderstanding here. We would always search for the entire string. But the prefix suggester will will not build it's prefix dictionary for more than N leading characters. I think this should be perfectly fine and no other suggester impl is affected.

nik9000 · 2013-08-29T21:20:54Z

Oh sorry! So you'd still suggest the right thing even though the prefix dictionary doesn't spell it out exactly. Cool. Sorry for the confusion.

Restrict the size of the input length to a reasonable size otherwise very long strings can cause StackOverflowExceptions deep down in lucene land. Yet, this is simply a saftly limit set to `50` UTF-16 codepoints by default. This limit is only present at index time and not at query time. If prefix completions > 50 UTF-16 codepoints are expected / desired this limit should be raised. Critical string sizes are beyone the 1k UTF-16 Codepoints limit. Closes elastic#3596

Restrict the size of the input length to a reasonable size otherwise very long strings can cause StackOverflowExceptions deep down in lucene land. Yet, this is simply a saftly limit set to `50` UTF-16 codepoints by default. This limit is only present at index time and not at query time. If prefix completions > 50 UTF-16 codepoints are expected / desired this limit should be raised. Critical string sizes are beyone the 1k UTF-16 Codepoints limit. Closes #3596

mikemccand · 2014-05-04T16:54:32Z

See also #5927 and https://issues.apache.org/jira/browse/LUCENE-5628 where we are fixing Lucene's getFiniteStrings to not consume Java stack in proportion to the character length of the index-time or query-time suggestion.

Restrict the size of the input length to a reasonable size otherwise very long strings can cause StackOverflowExceptions deep down in lucene land. Yet, this is simply a saftly limit set to `50` UTF-16 codepoints by default. This limit is only present at index time and not at query time. If prefix completions > 50 UTF-16 codepoints are expected / desired this limit should be raised. Critical string sizes are beyone the 1k UTF-16 Codepoints limit. Closes elastic#3596

s1monw mentioned this issue Sep 2, 2013

Add 'min_input_len' to completion suggester #3603

Closed

s1monw closed this as completed in eb2fed8 Sep 3, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

StackOverflow in completion suggester when using long input strings #3596

StackOverflow in completion suggester when using long input strings #3596

mattweber commented Aug 29, 2013

s1monw commented Aug 29, 2013

nik9000 commented Aug 29, 2013

s1monw commented Aug 29, 2013

mattweber commented Aug 29, 2013

nik9000 commented Aug 29, 2013

mattweber commented Aug 29, 2013

s1monw commented Aug 29, 2013

nik9000 commented Aug 29, 2013

mikemccand commented May 4, 2014

StackOverflow in completion suggester when using long input strings #3596

StackOverflow in completion suggester when using long input strings #3596

Comments

mattweber commented Aug 29, 2013

s1monw commented Aug 29, 2013

nik9000 commented Aug 29, 2013

s1monw commented Aug 29, 2013

mattweber commented Aug 29, 2013

nik9000 commented Aug 29, 2013

mattweber commented Aug 29, 2013

s1monw commented Aug 29, 2013

nik9000 commented Aug 29, 2013

mikemccand commented May 4, 2014