fread Unable to handle mis-quoted field if it is out-of-sample #2265

Closed
st-pasha opened this issue Jul 9, 2017 · 3 comments

@st-pasha (Contributor) commented Jul 9, 2017

This example:

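library(data.table)
# 10,000 well-formed rows, then one mis-quoted value at row 110
DT = data.table(A=rep("abc", 10000), B="def")
DT[110, A:='"a"b']
# write without quoting so the stray quotes reach the file as-is
fwrite(DT, f<-tempfile(), quote=FALSE)
fread(f)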

produces an error message which is misleading:

Expecting 2 cols but row 0 contains only 1 cols (sep=','). Consider fill=true. <<"a"b,def>>

At least it doesn't crash (which I thought it would given that type[0] gets bumped up from CT_STRING into a non-existent type)...

@st-pasha (Contributor, Author) commented Nov 2, 2017

Possible approach:

  • Introduce new quoting rule QR0 (all other rules become QR1..QR4). This would be the default QR. Under this rule, fields may or may not be quoted, but no internal quotes are allowed. Thus, the following fields are admissible under QR0: 1,foo,"","bar",,"baz,baz", while these are not: "foo""bar","foo\"bar",foo"bar,f"oo,bar".
  • When reading a file, if some field is of STRING type and cannot be read under the current QR, then:
    • If we're currently at QR0 -- bump the QR until the field can be read, then continue scanning the file;
    • Otherwise, bump the QR but then go back and rescan all string fields (since the meaning of quotation marks has changed in the data that was already read).
  • QR bumps have the following hierarchy: QR0 -> {QR1|QR2|QR3} -> QR4.
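
To make the proposal concrete, here is a minimal sketch in Python (not the actual fread C code). It assumes QR0 means: a field is either fully enclosed in one pair of quotes with no quote in between, or contains no quote at all. The names split_qr0, BUMP and bump are hypothetical, and the bump table reads "{QR1|QR2|QR3}" as "any one of these", which is my interpretation of the hierarchy above.

```python
def split_qr0(line, sep=",", quote='"'):
    """Split one line under the proposed QR0 rule (hypothetical helper).

    Returns the list of fields, or None if some field cannot be read
    under QR0 (i.e. it has an internal or unbalanced quote).
    """
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quote:
            # Quoted field: the very next quote must close it...
            end = line.find(quote, i + 1)
            if end == -1:
                return None
            # ...and must be followed by a separator or the end of line.
            if end + 1 < n and line[end + 1] != sep:
                return None
            fields.append(line[i + 1:end])
            i = end + 1
        else:
            # Unquoted field: runs to the next separator, no quotes allowed.
            end = line.find(sep, i)
            if end == -1:
                end = n
            if quote in line[i:end]:
                return None
            fields.append(line[i:end])
            i = end
        if i >= n:
            return fields
        i += 1  # skip the separator


# Assumed reading of the hierarchy QR0 -> {QR1|QR2|QR3} -> QR4.
BUMP = {0: (1, 2, 3), 1: (4,), 2: (4,), 3: (4,), 4: ()}


def bump(current_qr):
    """Return (rules to try next, whether earlier string fields need a rescan)."""
    # Per the proposal, only a bump away from QR0 avoids the rescan.
    return BUMP[current_qr], current_qr != 0


print(split_qr0('1,foo,"","bar",,"baz,baz"'))  # ['1', 'foo', '', 'bar', '', 'baz,baz']
print(split_qr0('"a"b,def'))                   # None: the line from this issue
print(bump(0))                                 # ((1, 2, 3), False)
```

Under this reading, the offending line from the original report fails QR0 at the point where it is encountered, which is what would trigger the bump instead of the misleading column-count error.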

@ben519 commented Nov 3, 2017

I'm running into this too, but getting an error.

[Image: screen shot 2017-11-03 at 1 59 52 pm]

fread("foo.csv", select=c("Date", "Description", "Amount"), header = T)  # error
fread("foo.csv", header = T, verbose = F)  # works

[Image: screen shot 2017-11-03 at 1 59 03 pm]

@st-pasha (Contributor, Author) commented Nov 3, 2017

@ben519 Your dataset contains just 1 row, so your error is definitely not caused by out-of-sample irregularities. I've created a new issue for it (see link above).
