[R-Forge #5384] fread() fail to deal with missing values in integer64 columns #488

Closed
arunsrinivasan opened this Issue Jun 8, 2014 · 5 comments

3 participants
arunsrinivasan commented Jun 8, 2014

Submitted by: Peter Stoyanov; Assigned to: Nobody; R-Forge link

Using fread() to read the data below gives strange results for the missing values in columns that fread() detects as integer64. All other columns are fine:

2012,276,,0,"S1","001",1,,724135215,1590915056,
2012,276,2,8,"S1","001",1, ,,154598,0
2012,276,2,12,"S1","001",1,NA,5118863,21819477,
2012,276,2,0,"S1","011",8,3127133583,3127133583,9003982501,0

The resulting data.table has "9218868437227407266" instead of "NA" in columns 8 and 9. Only str() prints these as NA; everything else I tried (min, max, sum, etc.) treats them as numbers. On the other hand, str() prints the fourth element of column 8 as "1.55e-314" instead of "3127133583".
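Both odd values are consistent with doubles and 64-bit integers being mixed up at the bit level. The bit64 package stores an integer64 vector as a double vector and reads each element's 8 bytes as a signed 64-bit integer. R's NA_real_ has the bit pattern 0x7FF00000000007A2, which read as an int64 is exactly 9218868437227407266, whereas bit64's own NA is the smallest int64. The sketch below, assuming bit64 is installed, reproduces both numbers; it illustrates the symptom and is not fread's actual code path.

```r
# Sketch assuming the bit64 package is installed. An integer64 vector is a
# double vector whose 8 bytes bit64 reads as a signed 64-bit integer.
library(bit64)

# R's NA_real_ relabelled as integer64, i.e. a double NA sitting in an
# integer64 column without any conversion.
leaked <- NA_real_
class(leaked) <- "integer64"
print(leaked)   # prints 9218868437227407266, not NA
is.na(leaked)   # FALSE: bit64's NA is a different bit pattern (the smallest int64)

# A proper integer64 missing value, for comparison.
is.na(as.integer64(NA))   # TRUE

# The reverse mix-up: the integer64 value 3127133583 read as a plain double
# is a tiny subnormal number (about 1.545e-314), which shows as 1.55e-314
# when printed with three significant digits.
raw_double <- unclass(as.integer64("3127133583"))
print(raw_double)
```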

I first posted this here on StackOverflow, but it attracted no interest in two weeks, so I've linked it here as well.

richierocks commented Mar 15, 2015

I've just fallen over this bug too. To reproduce:

fread("x,y\n0,\n", colClasses = list(integer64 = "y"))
##    x                   y
## 1: 0 9218868437227407266

arunsrinivasan added this to the v1.9.6 milestone Mar 15, 2015

arunsrinivasan self-assigned this Mar 15, 2015

pstoyanov commented Mar 15, 2015

Thank you (for this and all the other excellent work).

richierocks commented Mar 17, 2015

Thanks for this. The fix isn't quite complete, though: it works when fread auto-detects the column classes correctly, but not when it has to bump a column to integer64.

To reproduce:

fread(
  "x,y
0,12345678901234
0,
0,
0,
0,
,
,
,
,
,
,
,
,
,
,
,
12345678901234,
0,
0,
0,
0,
0,
")

In this example, the missing values in x still show as 9218868437227407266, while those in y are correctly NA.
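On a data.table version affected by this bug, integer64 columns could in principle be patched after reading, since in the reports here the leaked value is always 9218868437227407266 (the NA_real_ bit pattern). A minimal sketch, assuming bit64 is installed; `repair_na64` is a hypothetical helper, not part of data.table or bit64:

```r
library(bit64)

# Hypothetical helper (not part of data.table or bit64): replace values that
# carry the NA_real_ bit pattern with proper integer64 NAs.
repair_na64 <- function(x) {
  stopifnot(inherits(x, "integer64"))
  leaked <- NA_real_
  class(leaked) <- "integer64"      # reads as 9218868437227407266
  x[which(x == leaked)] <- NA       # which() skips genuine NAs already in x
  x
}

# Example: a column holding 0 followed by a leaked double NA
# (the double 0 has the same bits as the int64 0).
col <- c(0, NA_real_)
class(col) <- "integer64"
fixed <- repair_na64(col)
print(fixed)
```

On a data.table the helper could be applied per column, e.g. `DT[, x := repair_na64(x)]`. The trade-off is that a genuine value of 9218868437227407266 in the file would also become NA, which is unlikely but not impossible.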

arunsrinivasan commented Mar 17, 2015

Thanks, will take a look asap.

arunsrinivasan commented Mar 17, 2015

Please write back if this is still not resolved.
