data.table::fread CSV logic fails with complex field #2051

Closed
scarrascoso opened this issue Mar 3, 2017 · 3 comments

@scarrascoso commented Mar 3, 2017

Hi:

I'm trying to load a 1.6 GB CSV file with data.table::fread. The read fails partway through with an error about a specific line:

Read 76.2% of 5288107 rows
Error in fread("2015-03.csv") :
Expecting 50 cols, but line 4128650 contains text after processing all cols.
Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=','
and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed.
If quote='' doesn't help, please file an issue to figure out if the logic could be improved.
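
For readers unfamiliar with the options named in that message, here is a minimal sketch of how they are passed to fread. The sample file is a hypothetical stand-in written to a temporary path, not the attached test.txt, and its column names and values are made up for illustration; it only shows a quoted field with an embedded separator, not the more complex quoting that triggered this report.

```r
library(data.table)

# Hypothetical sample (NOT the reporter's data): the second data row
# contains a quoted field with an embedded comma.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,comment,value",
  "1,plain,10",
  "2,\"embedded, comma\",20"
), path)

# Default call: fread detects the quoted field and keeps it as one column.
dt_default <- fread(path)

# First suggestion from the error message: pad short rows instead of stopping.
dt_fill <- fread(path, fill = TRUE)

print(dt_default)
```

The second suggestion, `quote = ""`, is passed the same way (`fread(path, quote = "")`); it turns off quote handling, so on a file like this one the embedded comma would be treated as a field separator.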

I have checked that the offending line is valid (with csvfix and also https://csvlint.io/). I have included it in the following example file, which contains the header, a non-failing line, and then the failing line:

test.txt

As you can see, it has some non-trivial quoting and escaping.

Do you think the fread CSV logic could be extended to deal with cases like this?

Thanks a lot, best regards!

@jangorecki (Member) commented Mar 3, 2017

Hi, there is ongoing development related to fread and quoting, currently planned for the next release. You can read more at https://github.com/Rdatatable/data.table/wiki/Convenience-features-of-fread#10-automatic-quote-escape-method-detection-including-no-escape, though I'm not sure whether the change will cover your use case.

@scarrascoso (Author) commented Mar 3, 2017

Thank you very much! I'll keep an eye on that.

@jangorecki jangorecki added the fread label Mar 5, 2017
@mattdowle mattdowle added this to the v1.10.6 milestone Mar 13, 2017
@mattdowle mattdowle closed this in 93cb823 Mar 14, 2017

@mattdowle (Member) commented Mar 14, 2017

Thanks for reporting this one. It should work now with the new, improved quote rules. Test added.
Please try again and reopen if there are any problems.
