Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multithreaded parsing error should be warning #1095

Closed
adienes opened this issue Jun 9, 2023 · 7 comments · Fixed by #1099
Closed

Multithreaded parsing error should be warning #1095

adienes opened this issue Jun 9, 2023 · 7 comments · Fixed by #1099

Comments

@adienes
Copy link

adienes commented Jun 9, 2023

┌ Error: Multi-threaded parsing failed (are there newlines inside quoted fields?), falling back to single-threaded parsing

This still returns a parsed CSV file just fine and doesn't crash the program. I think this should display as a warning and not an error

@quinnj
Copy link
Member

quinnj commented Jun 13, 2023

cc: @Liozou

@Liozou
Copy link
Contributor

Liozou commented Jun 13, 2023

Thank you for reporting this! Would it be possible for you to share the offending CSV?

I put this as an error precisely to get this kind CSVs reported here. There should indeed be no apparent issue/crash since a fallback was implemented, but it signals a possible mistake in the code and in any case it causes a severe performance drop for CSV parsing (since it falls back to a single thread).
I guess it should be displayed as a warning instead of an error when we can internally check that the cause of this is well-known (i.e. newlines inside quoted fields) and there is nothing more to be done than let the user know of the expected performance loss.

@adienes
Copy link
Author

adienes commented Jun 17, 2023

DKWinners_20230603.csv

sure, here is an example of such a CSV. these are results from sports betting contests. let me know if it helps!

@Liozou
Copy link
Contributor

Liozou commented Jun 17, 2023

The root issue is that starting from line 30 (Gummypecs (2/3)), each line only contains 10 columns instead of 12, i.e. there should be two more commas at the end of each line. But the @error message is quite misleading here I agree, I'll try to fix that.

@adienes
Copy link
Author

adienes commented Jun 17, 2023

cool! thank you for the quick responses

@adienes
Copy link
Author

adienes commented Sep 27, 2023

btw this is not fixed for me :)

maybe it is because I set silencewarnings = true, so it is never true that numwarnings[] > 0 ?

@Liozou
Copy link
Contributor

Liozou commented Oct 1, 2023

I think this is just because the latest released version of CSV.jl does not include the bugfix. You can try to ]dev CSV just to check that the bug is indeed fixed. @quinnj would you mind tagging a new patch version?
Otherwise, in the code, both versions of the warnings are under the condition that !ctx.silencewarnings so nothing should appear, unless I'm missing something...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants