Data loss after sort #13

Closed
sing1ee opened this issue Sep 8, 2014 · 1 comment

sing1ee commented Sep 8, 2014

$ wc -l ngram.data; wc -l ngram_sort.data
 10069731 ngram.data
 10067458 ngram_sort.data

After sorting, 2273 lines are missing (10069731 - 10067458). Why?

@cowtowncoder (Owner)

Hard to say without the data or the code being used (to know which input processor is involved). It should not be due to duplicate lines, which are fine. Theoretically it could have something to do with differences in linefeed detection: java-merge-sort accepts all three line terminators (\n, \r and \r\n; Windows and Mac OS use different ones), and I am not sure whether wc does.
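
One way to check the linefeed hypothesis is to count each terminator style in both files. The sketch below is a hypothetical diagnostic, not part of java-merge-sort. It relies on `wc -l` counting only `\n` characters, so a file containing bare `\r` terminators, or a final line with no terminator, would show a different line count under a reader that accepts all three styles. The function name and the result keys are made up for illustration.

```python
import sys
from pathlib import Path


def count_line_terminators(data: bytes) -> dict:
    """Count each line-terminator style in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # \n that is not part of a \r\n pair
    cr = data.count(b"\r") - crlf  # \r that is not part of a \r\n pair
    # A last line without any terminator is still a line to most readers,
    # but wc -l does not count it.
    unterminated = 1 if data and not data.endswith((b"\n", b"\r")) else 0
    return {
        "lf": lf,
        "cr": cr,
        "crlf": crlf,
        # What wc -l would report: the number of \n bytes.
        "wc_l": lf + crlf,
        # Lines seen by a reader that accepts \n, \r and \r\n.
        "all_terminators": lf + cr + crlf + unterminated,
    }


if __name__ == "__main__":
    # Reads each file fully into memory, which is simple but assumes
    # the file fits in RAM.
    for path in sys.argv[1:]:
        print(path, count_line_terminators(Path(path).read_bytes()))
```

Running it on both `ngram.data` and `ngram_sort.data` and comparing `wc_l` with `all_terminators` would show whether mixed line endings account for any of the difference. If the two numbers match in both files, the cause lies elsewhere.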
