Long lines are corrupted when read by _readNextSlow #14

Closed
hbs opened this issue Feb 11, 2017 · 4 comments

@hbs

hbs commented Feb 11, 2017

RawTextLineReader corrupts lines that span more than 32000 bytes (twice the size of the allocated buffer).

When no CR/LF is found within the first 16000 bytes, _readNextSlow is called. This method has a while loop, but the loop ignores any 16000-byte chunk that does not contain a CR/LF, which corrupts the input.
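
For illustration, here is a minimal sketch of a slow path that keeps every chunk instead of dropping the ones without a line terminator. It is not the library's actual code: the class name `ChunkedLineReader`, its fields, and the `readLine` method are hypothetical, and the buffer size is a constructor parameter rather than the fixed 16000 bytes mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical illustration only; not the library's RawTextLineReader. */
public class ChunkedLineReader {
    private final InputStream in;
    private final byte[] buffer;
    private int pos = 0;               // next unread byte in buffer
    private int end = 0;               // number of valid bytes in buffer
    private boolean lastWasCR = false; // true if the previous line ended with '\r'

    public ChunkedLineReader(InputStream in, int bufferSize) {
        this.in = in;
        this.buffer = new byte[bufferSize];
    }

    /** Returns the next line without its terminator, or null at end of input. */
    public byte[] readLine() throws IOException {
        ByteArrayOutputStream pending = null; // used only when a line spans refills
        while (true) {
            if (pos >= end) {
                end = in.read(buffer, 0, buffer.length);
                pos = 0;
                if (end < 0) {
                    end = 0;
                    // End of input: hand back whatever was accumulated, if anything
                    return (pending == null) ? null : pending.toByteArray();
                }
            }
            if (lastWasCR) {
                // Treat "\r\n" as a single terminator, even across a refill
                lastWasCR = false;
                if (buffer[pos] == '\n') {
                    ++pos;
                    continue;
                }
            }
            int start = pos;
            while (pos < end && buffer[pos] != '\n' && buffer[pos] != '\r') {
                ++pos;
            }
            if (pos < end) {
                // Terminator found: finish the line
                byte[] result;
                if (pending == null) {
                    result = Arrays.copyOfRange(buffer, start, pos);
                } else {
                    pending.write(buffer, start, pos - start);
                    result = pending.toByteArray();
                }
                lastWasCR = (buffer[pos] == '\r');
                ++pos;
                return result;
            }
            // No terminator in this chunk: append it rather than discarding it
            if (pending == null) {
                pending = new ByteArrayOutputStream(buffer.length * 2);
            }
            pending.write(buffer, start, pos - start);
        }
    }
}
```

The key point is the last step of the loop: a chunk with no CR/LF is appended to the pending line before the buffer is refilled, so a line of any length comes back intact.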

@cowtowncoder
Owner

Oh. That's obviously not intentional.
Thank you for reporting this!

I'll be happy to fix that. Do you happen to have a simple reproduction? (I assume you encountered this with some code so in case you have anything that'd help -- I will need a regression test at any rate, to ensure it won't break again).

@hbs
Author

hbs commented Feb 12, 2017

Reproduction is very simple: create a file with at least one line of length 40000 (or anything > 2 x 16000).
example.txt

The attached example file has a single line composed of 16000 '0', 16000 '1' and 15990 '2'.

When you sort this file with TextFileSorter, which calls RawTextLineReader (and _readNextSlow), the result is a file with a single line containing only the '0's and '2's. The '1's have been discarded because they occupy the second 16000-byte block.
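
For anyone without the attachment, a file with the same layout can be generated with a few lines of plain JDK code. This is only a sketch: the class name `LongLineExample` and its helper methods are made up for illustration, and the trailing newline is an assumption, since the description above does not say whether the attached file ends with one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper that recreates the layout of the example file. */
public class LongLineExample {

    /** Builds 16000 '0', 16000 '1' and 15990 '2', followed by one '\n' (assumed). */
    public static byte[] buildLine() {
        byte[] data = new byte[16000 + 16000 + 15990 + 1];
        Arrays.fill(data, 0, 16000, (byte) '0');
        Arrays.fill(data, 16000, 32000, (byte) '1');
        Arrays.fill(data, 32000, 47990, (byte) '2');
        data[47990] = '\n';
        return data;
    }

    /** Counts occurrences of c, useful for checking a sorted output file. */
    public static int count(byte[] data, char c) {
        int n = 0;
        for (byte b : data) {
            if (b == c) {
                ++n;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        // Output path defaults to example.txt in the working directory
        Path file = Paths.get(args.length > 0 ? args[0] : "example.txt");
        Files.write(file, buildLine());
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

After sorting the generated file, running `count` over the output for '1' should give 16000; with the bug described here, the '1's would be missing.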

@cowtowncoder cowtowncoder added this to the 1.0 milestone Feb 14, 2017
@cowtowncoder cowtowncoder modified the milestones: 1.0, 1.0.1 Feb 14, 2017
@cowtowncoder
Owner

How embarrassing. :)

Thank you for reporting the issue; I added a simple test, fixed the issue, and will release 1.0.1 next.
Should be available via Maven Central within a couple of hours.

@hbs
Author

hbs commented Feb 14, 2017

Thanks for the quick fix.
