[R-Forge #2660] Improve fread na.strings handling #504

Closed
arunsrinivasan opened this issue Jun 8, 2014 · 4 comments

Comments

@arunsrinivasan
Member

Submitted by: Matt Dowle; Assigned to: Nobody; R-Forge link

As raised here and here on SO.

@arunsrinivasan
Member Author

:bump:

@ghost

ghost commented Jun 25, 2015

I was going to post this on SO but I found this issue so I'll instead provide a screenshot for the fread vs. read.table results - it's the main reason I've put off learning to use data.table. It's a pretty common data format with a few missing values in numeric columns. I wonder if people have been making sure to check their column classes after using fread:
[Screenshot "fread_bug": fread vs. read.table results]

@dselivanov

@arunsrinivasan, @mattdowle today I had a look into this.
It seems the Strto*() family of functions in fread.c should take the na.strings parameter into account. As of now only one case is handled: the literal string "NA". And the way this case is handled is not very flexible.

if (lch==start && lch<eof-1 && *lch++=='N' && *lch++=='A' && (lch==eof || *lch==sep || *lch==eol)) {
    ch = lch;
    u.d = NA_REAL;
    return(TRUE);
}

If you agree that this is a job for these functions (not just post-processing at the end of the readfile() function), I have ideas for how to fix it.
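
To make the idea concrete, here is a minimal sketch of how the hardcoded "NA" test could be generalized to a list of NA strings. It is not the fix that went into data.table: the helper name match_na_string and its parameters are hypothetical, and it only assumes, as in the snippet above, that a field ends at eof, sep, or eol.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of fread.c.
 * Returns the length of the first NA string that matches the whole
 * field starting at `start`, or 0 if none matches.
 * A field is taken to end at `eof`, `sep`, or `eol`. */
static size_t match_na_string(const char *start, const char *eof,
                              char sep, char eol,
                              const char *const *na_strings, size_t n_na)
{
    for (size_t i = 0; i < n_na; i++) {
        size_t len = strlen(na_strings[i]);

        /* Empty NA strings are not handled in this sketch. */
        if (len == 0) continue;

        /* Not enough input left for this candidate. */
        if ((size_t)(eof - start) < len) continue;

        if (memcmp(start, na_strings[i], len) != 0) continue;

        /* The match only counts if the field ends right after it,
         * so that "NAN" is not mistaken for "NA". */
        const char *end = start + len;
        if (end == eof || *end == sep || *end == eol) return len;
    }
    return 0;
}
```

A Strto*()-style function could call such a helper before attempting numeric conversion, advance ch by the returned length, and store the NA value on a non-zero result. Running the check first matters for values such as "-999", which would otherwise parse as an ordinary number.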

@arunsrinivasan
Member Author

Thanks again for the excellent fix @dselivanov. Opened another issue, #1314, for cases like na.strings="-999".
