colClasses=integer no longer working in fread #2251

Closed
st-pasha opened this issue Jul 6, 2017 · 7 comments · Fixed by #2345

@st-pasha
Contributor

st-pasha commented Jul 6, 2017

```
> fread('A,B\n"1","2"', colClasses = "integer")
Error in fread("A,B\n\"1\",\"2\"", colClasses = "integer") : 
  Attempt to override column 1 <<A>> of inherent type 'int32' down to 'int32' which will lose accuracy. If this was intended, please coerce to the lower type afterwards. Only overrides to a higher type are permitted.
```
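For readers stuck on an affected development build, the error text itself suggests coercing afterwards. A minimal sketch of that workaround, assuming it is acceptable to let fread detect the types first; the variable names `DT` and `cols` are only illustrative:

```r
library(data.table)

# Read without colClasses and let fread pick the column types itself.
DT <- fread('A,B\n"1","2"')

# Coerce every column to integer afterwards.
# This is a no-op for columns that were already read as integer.
cols <- names(DT)
DT[, (cols) := lapply(.SD, as.integer), .SDcols = cols]
```

On builds that contain the fix, passing `colClasses = "integer"` directly should not need this step.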
@markdanese

I just ran into this today with the Aug 15, 2017 build. FWIW, I am not sure why there are 2 int32 types listed below. The output is from a test I was running to try and replicate the error on a non-proprietary dataset. But since this issue is already here, I will just pass this along.

```
Read 7 rows x 2 columns from 73 bytes file in 00:00.001 wall clock time
Thread buffers were grown 0 times (if all 1 threads each grew once, this figure would be 1)
Final type counts
         0 : drop     
         0 : bool8    
         1 : int32    
         0 : int32    
         0 : int64    
         0 : float64  
         1 : string   
Read 7 rows. Exactly what was estimated and allocated up front
```

@aadler

aadler commented Aug 29, 2017

I got the same issue today.

data.table 1.10.5 IN DEVELOPMENT built 2017-08-22 22:20:41 UTC; travis
  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way

  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")

  Release notes, videos and slides: http://r-datatable.com

Attempt to override column 11 <<ZBC>> of inherent type 'int32' down to 'int32' which will lose accuracy. If this was intended, please coerce to the lower type afterwards. Only overrides to a higher type are permitted.

@aadler

aadler commented Aug 29, 2017

@st-pasha, it seems it is not just integers. I just got this error:

Attempt to override column 17 of inherent type 'string' down to 'float64' which will lose accuracy. If this was intended, please coerce to the lower type afterwards. Only overrides to a higher type are permitted.
Is this the same issue or should it be a new one?

@st-pasha
Contributor Author

@aadler No, what you're describing is a different situation. If a column was detected as string, then it means it contains some values that could not be parsed as floats. So if you really want it to be float, you probably mean to convert all those invalid values into NAs. Currently fread doesn't support that (and afaik there is no plan to add this possibility). So you have 2 options here: (1) if all non-float values belong to a small set of strings (e.g. "NA", "#N/A", or similar), then give those strings explicitly to the na.strings argument; (2) otherwise you can read that column as string, and then do as.numeric() on it afterwards.
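The two options above can be sketched roughly as follows. The input strings, the column names, and the `"#N/A"` placeholder are made up for illustration and are not taken from the reporter's data:

```r
library(data.table)

# Option 1: the non-float values come from a small, known set of strings.
# Hypothetical input where "#N/A" marks a missing value in column B.
txt_known <- 'A,B\n1,2.5\n2,#N/A\n3,4.0\n'
opt1 <- fread(txt_known, na.strings = c("NA", "#N/A"))

# Option 2: the stray values are unpredictable (here a lone letter "x").
# Read the column as character, then convert; unparseable values become NA.
txt_stray <- 'A,B\n1,2.5\n2,x\n3,4.0\n'
opt2 <- fread(txt_stray, colClasses = list(character = "B"))
opt2[, B := suppressWarnings(as.numeric(B))]
```

`suppressWarnings()` only hides the "NAs introduced by coercion" warning from `as.numeric()`; drop it if you would rather be told when values fail to parse.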

@aadler

aadler commented Aug 30, 2017

Yes, @st-pasha, you're right. Buried deep in the hundreds of millions of rows, sometimes the value is captured as a letter (don't ask why).

Also, when I changed my colClasses entry from "integer" to "int32", the dev version of data.table read the file just fine (and in 7 minutes as opposed to 20). I don't think that "int32" is a valid R variable type, though.

@st-pasha
Contributor Author

st-pasha commented Sep 8, 2017

Fixed in e79d63b

@st-pasha st-pasha closed this as completed Sep 8, 2017
@st-pasha st-pasha reopened this Sep 8, 2017
@mattdowle mattdowle added this to the v1.10.6 milestone Sep 8, 2017
@mattdowle
Member

mattdowle commented Sep 8, 2017

@aadler Thanks for your testing and input on this one. Should be fixed now.
Just saw this bit:

> Buried deep in the hundreds of millions of rows, sometimes the value is captured as a letter (don't ask why).

Just checking that you saw that fread in dev now automatically rereads such out-of-sample type exceptions. I've just updated ?fread and the wiki page for fread. You shouldn't need to set colClasses, but you could choose to set it to avoid the auto reread for speed reasons (if verbose=TRUE shows the reread is taking too much time). The reread skips columns that were read fine in the first pass because the guess using the large sample was good, so the reread should be pretty quick.
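A small sketch of the two approaches described here, using a made-up temporary file with hypothetical columns `id` and `value`. The file is far too small for the bad value to fall outside fread's sample, so it will not actually trigger a reread; it only shows the shape of the calls:

```r
library(data.table)

# Hypothetical file: `value` is numeric except for one stray string.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10", "2,20", "3,abc"), tmp)

# Let fread work out the types; verbose = TRUE prints its diagnostics,
# which is where to look for time spent on a reread of a large file.
DT_auto <- fread(tmp, verbose = TRUE)

# Alternatively, declare the awkward column up front so that no
# second pass is needed for it.
DT_fixed <- fread(tmp, colClasses = list(character = "value"))

unlink(tmp)
```

Whether the explicit `colClasses` is worth it depends on how long the reread takes on the real file.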
