> fread('A,B\n"1","2"', colClasses = "integer")
Error in fread("A,B\n\"1\",\"2\"", colClasses = "integer") :
Attempt to override column 1 <<A>> of inherent type 'int32' down to 'int32' which will lose accuracy. If this was intended, please coerce to the lower type afterwards. Only overrides to a higher type are permitted.
I just ran into this today with the Aug 15, 2017 build. FWIW, I am not sure why there are two int32 types listed below. The output is from a test I was running to try to replicate the error on a non-proprietary dataset. But since this issue is already here, I will just pass this along.
Read 7 rows x 2 columns from 73 bytes file in 00:00.001 wall clock time
Thread buffers were grown 0 times (if all 1 threads each grew once, this figure would be 1)
Final type counts
0 : drop
0 : bool8
1 : int32
0 : int32
0 : int64
0 : float64
1 : string
Read 7 rows. Exactly what was estimated and allocated up front
data.table 1.10.5 IN DEVELOPMENT built 2017-08-22 22:20:41 UTC; travis
The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
Release notes, videos and slides: http://r-datatable.com
Attempt to override column 11 <<ZBC>> of inherent type 'int32' down to 'int32' which will lose accuracy. If this was intended, please coerce to the lower type afterwards. Only overrides to a higher type are permitted.
@st-pasha, it seems it is not just integers. I just got this error:
Attempt to override column 17 of inherent type 'string' down to 'float64' which will lose accuracy. If this was intended, please coerce to the lower type afterwards. Only overrides to a higher type are permitted.
Is this the same issue or should it be a new one?
@aadler No, what you're describing is a different situation. If a column was detected as string, then it means it contains some values that could not be parsed as floats. So if you really want it to be float, you probably mean to convert all those invalid values into NAs. Currently fread doesn't support that (and afaik there is no plan to add this possibility). So you have 2 options here: (1) if all non-float values belong to a small set of strings (e.g. "NA", "#N/A", or similar), then give those strings explicitly to the na.strings argument; (2) otherwise you can read that column as string, and then do as.numeric() on it afterwards.
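For illustration, the two options above might look like the following. This is only a sketch: the inline data and the `#N/A` sentinel are made up, and it is written against the `fread` arguments as documented in current data.table, so the dev build discussed in this thread may behave differently.

```r
library(data.table)

# Hypothetical input: column B is numeric except for one "#N/A" sentinel.
txt <- "A,B\n1,2.5\n2,#N/A\n3,4.0\n"

# Option 1: declare the sentinel as missing, so B is detected as numeric.
dt1 <- fread(txt, na.strings = c("NA", "#N/A"))

# Option 2: read B as character, then coerce afterwards.
# Values that cannot be parsed become NA (as.numeric warns about this).
dt2 <- fread(txt, colClasses = list(character = "B"))
dt2[, B := suppressWarnings(as.numeric(B))]
```

Option 1 keeps the conversion inside `fread`, but it only works when the invalid values form a small, known set. Option 2 handles arbitrary junk at the cost of a second pass over that column.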
Yes, @st-pasha, you're right. Buried deep in the hundreds of millions of rows, sometimes the value is captured as a letter (don't ask why).
Also, when I changed my colClasses value from "integer" to "int32", the dev version of data.table read the file just fine (and in 7 minutes as opposed to 20). I don't think "int32" is a valid R type, though.
@aadler Thanks for your testing and input on this one. Should be fixed now.
Just saw this bit:
> Buried deep in the hundreds of millions of rows, sometimes the value is captured as a letter (don't ask why).
Just checking that you saw that fread in dev now automatically rereads such out-of-sample type exceptions. I've just updated ?fread and the wiki page for fread. You shouldn't need to set colClasses. But you could choose to set it to avoid the auto reread for speed reasons (if verbose=TRUE shows the reread is taking too much time). The reread skips columns that were read fine in the first pass because the guess using the large sample was good, so the reread should be pretty quick.
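As a rough sketch of the two call shapes mentioned here: the file below is a tiny made-up stand-in (the column name `ZBC` is borrowed from the error message earlier in the thread). A file this small falls entirely within the type-detection sample, so it will not actually trigger a reread. It only shows how one might inspect the verbose output and then pin a column type up front.

```r
library(data.table)

# Hypothetical stand-in for a large file with a stray letter in column ZBC.
path <- tempfile(fileext = ".csv")
writeLines(c("id,ZBC", "1,10", "2,20", "3,x"), path)

# verbose = TRUE prints the type-detection steps and timings to the console.
dt_auto <- fread(path, verbose = TRUE)

# Declaring the type up front means fread does not have to guess it for ZBC.
dt_fixed <- fread(path, colClasses = list(character = "ZBC"))
```

On a real file with hundreds of millions of rows, the verbose output is where one would check whether a reread happened and how long it took before deciding to set colClasses.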