colClasses=integer no longer working in fread #2251

st-pasha · 2017-07-06T22:31:40Z

> fread('A,B\n"1","2"', colClasses = "integer")
Error in fread("A,B\n\"1\",\"2\"", colClasses = "integer") : 
  Attempt to override column 1 <<A>> of inherent type 'int32' down to 'int32' which will lose accuracy. If this was intended, please coerce to the lower type afterwards. Only overrides to a higher type are permitted.

markdanese · 2017-08-19T22:17:47Z

I just ran into this today with the Aug 15, 2017 build. FWIW, I am not sure why there are 2 int32 types listed below. The output is from a test I was running to try and replicate the error on a non-proprietary dataset. But since this issue is already here, I will just pass this along.

Read 7 rows x 2 columns from 73 bytes file in 00:00.001 wall clock time
Thread buffers were grown 0 times (if all 1 threads each grew once, this figure would be 1)
Final type counts
         0 : drop     
         0 : bool8    
         1 : int32    
         0 : int32    
         0 : int64    
         0 : float64  
         1 : string   
Read 7 rows. Exactly what was estimated and allocated up front

aadler · 2017-08-29T18:09:53Z

I got the same issue today.

data.table 1.10.5 IN DEVELOPMENT built 2017-08-22 22:20:41 UTC; travis
  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way

  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")

  Release notes, videos and slides: http://r-datatable.com

Attempt to override column 11 <<ZBC>> of inherent type 'int32' down to 'int32' which will lose accuracy. If this was intended, please coerce to the lower type afterwards. Only overrides to a higher type are permitted.

aadler · 2017-08-29T18:17:05Z

@st-pasha, it seems it is not just integers. I just got this error:

Attempt to override column 17 of inherent type 'string' down to 'float64' which will lose accuracy. If this was intended, please coerce to the lower type afterwards. Only overrides to a higher type are permitted.
Is this the same issue or should it be a new one?

st-pasha · 2017-08-29T19:00:56Z

@aadler No, what you're describing is a different situation. If a column was detected as string, then it means it contains some values that could not be parsed as floats. So if you really want it to be float, you probably mean to convert all those invalid values into NAs. Currently fread doesn't support that (and afaik there is no plan to add this possibility). So you have 2 options here: (1) if all non-float values belong to a small set of strings (e.g. "NA", "#N/A", or similar), then give those strings explicitly to the na.strings argument; (2) otherwise you can read that column as string, and then do as.numeric() on it afterwards.
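The two options above could be sketched as follows. This is only an illustration under assumptions: the inline CSV, the column names `A` and `B`, and the `#N/A` marker are made up for the example and are not taken from the reporter's data.

```r
library(data.table)

# Hypothetical sample data: column B is numeric except for one "#N/A" marker.
csv <- "A,B\n1,2.5\n2,#N/A\n3,4.0\n"

# Option 1: tell fread which strings stand for missing values,
# so B can still be detected as a numeric column.
dt1 <- fread(csv, na.strings = c("NA", "#N/A"))

# Option 2: read B as character, then coerce afterwards.
# Values that cannot be parsed become NA (the coercion warning is suppressed here).
dt2 <- fread(csv, colClasses = list(character = "B"))
dt2[, B := suppressWarnings(as.numeric(B))]
```

Option 1 only helps when the set of bad values is small and known in advance; option 2 handles arbitrary junk, at the cost of a second pass over the column.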

aadler · 2017-08-30T01:58:41Z

Yes, @st-pasha, you're right. Buried deep in the hundreds of millions of rows, sometimes the value is captured as a letter (don't ask why).

Also, when I changed my colClasses from "integer" to "int32", then the dev version of data.table read the file just fine (and in 7 minutes as opposed to 20). I don't think that "int32" is a valid R variable type, though.

st-pasha · 2017-09-08T17:20:23Z

Fixed in e79d63b

mattdowle · 2017-09-08T19:14:23Z

@aadler Thanks for your testing and input on this one. Should be fixed now.
Just saw this bit:

> Buried deep in the hundreds of millions of rows, sometimes the value is captured as a letter (don't ask why).

Just to check you saw that fread in dev now automatically rereads such out-of-sample type exceptions. I've just updated ?fread and the wiki page for fread. You shouldn't need to set colClasses. But you could choose to set it to avoid the auto reread for speed reasons (if verbose=TRUE shows the reread is taking too much time). The reread skips columns that were read fine in the first pass because the guess using the large sample was good, so the reread should be pretty quick.
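As a rough sketch of that choice, assuming a hypothetical file with columns `id` and `code` (both names invented here): one call lets fread guess the types with `verbose = TRUE` so the timing breakdown can be inspected, and the other declares the known-problematic column up front. Note that a file this small is read entirely within the type-detection sample, so it will not actually trigger a reread; it only shows the two call shapes.

```r
library(data.table)

# Hypothetical file: "code" is mostly digits but contains one stray letter.
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:5, code = c("1", "2", "3", "x", "5")), tmp)

# Let fread guess the types; verbose = TRUE prints the detection and timing details.
dt_auto <- fread(tmp, verbose = TRUE)

# Declare the type of the problematic column up front, so no type bump is needed for it.
dt_fixed <- fread(tmp, colClasses = list(character = "code"))

unlink(tmp)
```

On a large file where the stray value lies outside the sampled rows, the second form is the one that would avoid the automatic reread for that column.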

st-pasha added bug fread labels Jul 6, 2017

st-pasha mentioned this issue Jul 6, 2017

Master task for fread bugs / proposals #2247

Closed

st-pasha closed this as completed Sep 8, 2017

st-pasha reopened this Sep 8, 2017

st-pasha mentioned this issue Sep 8, 2017

Add test cases for issue 2251 #2345

Merged

mattdowle added this to the v1.10.6 milestone Sep 8, 2017

mattdowle closed this as completed in #2345 Sep 8, 2017

st-pasha mentioned this issue Apr 18, 2018

colClasses=logical is no longer working #2766

Closed
