RawTextFileReader corrupts lines which span more than 32000 bytes (twice the size of the allocated buffer).
When no CR/LF is found within the first 16000 bytes, _readNextSlow is called. This method has a while loop but the loop will ignore any chunk of 16000 bytes which does not contain a CR/LF thus leading to corrupted input.
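For illustration, here is a minimal sketch of a slow-path loop that keeps every chunk that has no line terminator instead of dropping it. This is not the library's code: the class `ChunkedLineReader`, its methods, and the LF-only terminator handling (with a trailing CR stripped) are all assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; names and structure are not taken from the library. */
final class ChunkedLineReader {
    private final InputStream in;
    private final byte[] buffer;
    private int pos = 0; // index of the next unread byte in buffer
    private int end = 0; // one past the last valid byte in buffer

    ChunkedLineReader(InputStream in, int bufferSize) {
        this.in = in;
        this.buffer = new byte[bufferSize];
    }

    /** Returns the next line without its terminator, or null at end of input. */
    byte[] readNext() throws IOException {
        ByteArrayOutputStream pending = null; // bytes of the current line seen so far
        while (true) {
            if (pos >= end) { // buffer exhausted: refill it
                int n = in.read(buffer, 0, buffer.length);
                if (n < 0) { // end of input: return what was accumulated, if anything
                    return (pending == null) ? null : stripTrailingCr(pending.toByteArray());
                }
                pos = 0;
                end = n;
                if (n == 0) continue;
            }
            int start = pos;
            while (pos < end && buffer[pos] != '\n') pos++;
            if (pending == null) pending = new ByteArrayOutputStream();
            // Key step: append the scanned bytes whether or not a terminator was found.
            pending.write(buffer, start, pos - start);
            if (pos < end) { // found LF: skip it and return the full line
                pos++;
                return stripTrailingCr(pending.toByteArray());
            }
            // No terminator in this chunk; loop to refill and keep accumulating.
        }
    }

    private static byte[] stripTrailingCr(byte[] line) {
        int len = line.length;
        return (len > 0 && line[len - 1] == '\r') ? Arrays.copyOf(line, len - 1) : line;
    }

    /** Convenience helper for in-memory data; wraps IOException as unchecked. */
    static List<byte[]> readAllLines(byte[] data, int bufferSize) {
        ChunkedLineReader reader = new ChunkedLineReader(new ByteArrayInputStream(data), bufferSize);
        List<byte[]> lines = new ArrayList<>();
        try {
            for (byte[] line = reader.readNext(); line != null; line = reader.readNext()) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

With a 16000-byte buffer, a line spanning three chunks comes back whole because each terminator-free chunk is appended to `pending` before the buffer is refilled.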
Oh. That's obviously not intentional.
Thank you for reporting this!
I'll be happy to fix that. Do you happen to have a simple reproduction? (I assume you encountered this with some code so in case you have anything that'd help -- I will need a regression test at any rate, to ensure it won't break again).
Reproduction is very simple: create a file with at least one line of length 40000 (or anything > 2 x 16000). example.txt
The attached example file has a single line composed of 16000 '0', 16000 '1' and 15990 '2'.
When you sort this file with TextFileSorter, which calls RawTextLineReader (and _readNextSlow), the result is a file with a single line containing only the '0's and '2's; the '1's have been discarded because they occupy the second 16000-byte block.
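An equivalent input can be generated rather than downloaded. The sketch below builds the same byte layout (16000 '0', 16000 '1', 15990 '2') and writes it to a file. The class `LongLineFixture` and its methods are hypothetical, and the single trailing LF is an assumption, since the thread does not say how the attached file ends. It deliberately does not call the sorter, whose API is not shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical fixture builder for the long-line reproduction. */
final class LongLineFixture {
    static final int ZEROS = 16000;
    static final int ONES = 16000;
    static final int TWOS = 15990;

    /** Builds one line of ZEROS '0', ONES '1', TWOS '2', ending with LF (assumed). */
    static byte[] buildLine() {
        int length = ZEROS + ONES + TWOS;
        byte[] line = new byte[length + 1];
        Arrays.fill(line, 0, ZEROS, (byte) '0');
        Arrays.fill(line, ZEROS, ZEROS + ONES, (byte) '1');
        Arrays.fill(line, ZEROS + ONES, length, (byte) '2');
        line[length] = '\n';
        return line;
    }

    /** Writes the fixture to the given path and returns that path. */
    static Path writeTo(Path file) {
        try {
            return Files.write(file, buildLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts occurrences of a character; useful for checking sorted output. */
    static long count(byte[] data, char c) {
        long n = 0;
        for (byte b : data) {
            if (b == (byte) c) n++;
        }
        return n;
    }
}
```

A regression test could sort the written file and check that `count(output, '1')` is still 16000; with the bug described above it would be 0.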
Thank you for reporting the issue; I added a simple test, fixed the issue, and will release 1.0.1 next.
It should be available via Maven Central within a couple of hours.