fread suggests fill for dropped column #6050

Open
tdhock opened this issue Apr 4, 2024 · 1 comment

@tdhock
Member

tdhock commented Apr 4, 2024

Hi all!
Today I was using fread on some data whose header has one more column name than there are data columns, and I observed the following warning, which I think is a false positive. I would have expected no warning, because I told fread to drop the last column (both via drop=3 and via colClasses=list(NULL=3)). Is this a bug?

```
> fread("x,y,z\n1,2",drop=3)
       x     y
   <int> <int>
1:     1     2
Warning message:
In fread("x,y,z\n1,2", drop = 3) :
  Detected 3 column names but the data has 2 columns. Filling rows automatically. Set fill=TRUE explicitly to avoid this warning.
> fread("x,y,z\n1,2",colClasses=list(NULL=3))
       x     y
   <int> <int>
1:     1     2
Warning message:
In fread("x,y,z\n1,2", colClasses = list(`NULL` = 3)) :
  Detected 3 column names but the data has 2 columns. Filling rows automatically. Set fill=TRUE explicitly to avoid this warning.
```
tdhock added the fread label Apr 4, 2024
@joshhwuu
Member

As far as I can tell, this is because the check for the warning happens before drop is applied internally:

```
fread("x,y,z\n1,2",drop=3,verbose=TRUE)

...
[07] Detect column types, good nrow estimate and whether first row is column names
  Number of sampling jump points = 1 because (3 bytes from row 1 to eof) / (2 * 3 jump0size) == 0
Types in 1st data row match types in 2nd data row but previous row has 3 fields. Taking previous row as column names.
Warning message in fread("x,y,z\n1,2", drop = 3, verbose = TRUE):
Detected 3 column names but the data has 2 columns. Filling rows automatically. Set fill=TRUE explicitly to avoid this warning.
All rows were sampled since file is small so we know nrow=0 exactly
[08] Assign column names
[09] Apply user overrides on column types # where drop happens
...
```

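I agree that this behavior is unintuitive, but it may be hard to fix without refactoring the internals of fread: the warning is emitted during type and header detection (step [07]), while drop is only applied later, when user overrides on column types are processed (step [09]).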
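Until the check is moved after the drop step, one workaround is to follow the warning's own advice and pass fill=TRUE explicitly alongside drop. This is a minimal sketch, assuming (not verified on every data.table version) that an explicit fill=TRUE silences this particular warning even when the padded column is then dropped:

```r
library(data.table)

# Header has 3 names but the single data row has only 2 fields.
# fill=TRUE is passed explicitly, as the warning text recommends;
# the padded third column (z) is then removed by drop=3.
# Assumption: with fill=TRUE set explicitly, fread does not emit the
# "Detected 3 column names but the data has 2 columns" warning.
dt <- fread("x,y,z\n1,2", drop = 3, fill = TRUE)
print(dt)
```

The trade-off is that fill=TRUE also silently pads any genuinely short rows elsewhere in the file, so it hides the symptom rather than fixing the ordering of the check.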
