[fread] broken functionality in 1.9.6 #1267

Closed
ziyadsaeed opened this issue Aug 13, 2015 · 7 comments · Fixed by #2623

@ziyadsaeed commented Aug 13, 2015

fread from v1.9.4 gives correct results, but fread from the dev version 1.9.5 doesn't.
For example:

Create a test.csv file:

cat("V1, V2, V3
1,2,3
V4, V5, V6, V7
4,5,6,7
8,9,10,11",
    file = "test.csv"
)

Notice that the two blocks have different numbers of columns.
fread from 1.9.4:

fread("test.csv", nrows = 1, header = TRUE, skip = 0)
V1 V2 V3
1:  1  2  3

This is what I want.

But with fread from v1.9.5:

     fread("test.csv", nrows = 1, header = TRUE, skip = 0)
    V4 V5 V6 V7
    1:  4  5  6  7
    Warning messages:
    1: In fread("test.csv", nrows = 1, header = TRUE, skip = 0) :
     Starting data input on line 3 and discarded previous non-empty line: 1,2,3
    2: In fread("test.csv", nrows = 1, header = TRUE, skip = 0) :
    Stopped reading at empty line 5 but text exists afterwards (discarded): 8,9,10,11

Not the expected result.

@ziyadsaeed changed the title from "broken fread functionality in dev version" to "broken fread functionality in dev version 1.9.5" on Aug 14, 2015
@ziyadsaeed changed the title from "broken fread functionality in dev version 1.9.5" to "[fread] broken functionality in dev version 1.9.5" on Aug 14, 2015
@ziyadsaeed (Author) commented Aug 14, 2015

A minimal reproducible example has now been added.

@ziyadsaeed (Author) commented Sep 20, 2015

This bug has now been released in v1.9.6.

@ziyadsaeed changed the title from "[fread] broken functionality in dev version 1.9.5" to "[fread] broken functionality in 1.9.6" on Sep 20, 2015
@jangorecki (Member) commented Sep 20, 2015

@ziyadsaeed you can close a resolved issue yourself, OK

@ziyadsaeed (Author) commented Sep 20, 2015

It isn't resolved. My point is that this bug was not fixed in the dev version, and the release version 1.9.6 now has it too.

@arunsrinivasan (Member) commented Sep 22, 2015

The default behaviour is right as it is. However, I'd expect an explicit skip=0 to work as you intend.

@arunsrinivasan added this to the v1.9.8 milestone on Sep 22, 2015
@ziyadsaeed (Author) commented Jan 9, 2016

What is the workaround to ask fread to read just the first line?
When 1.9.6 was released, the release notes should have mentioned that it breaks existing functionality from 1.9.4, along with a workaround.
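
One possible workaround, sketched here with base R only rather than fread: read just the header line plus the wanted number of data lines with readLines(), then parse only those lines. This is a minimal sketch, assuming the header sits on the first line of the file. The helper name read_first_rows() is hypothetical and not part of data.table, and a temporary file stands in for test.csv.

```r
# Recreate the file from the report above in a temporary location.
path <- tempfile(fileext = ".csv")
cat("V1, V2, V3
1,2,3
V4, V5, V6, V7
4,5,6,7
8,9,10,11",
    file = path
)

# Hypothetical helper (not part of data.table): parse only the header line
# and the first `nrows` data lines, so later lines are never inspected.
read_first_rows <- function(path, nrows = 1L) {
  lines <- readLines(path, n = nrows + 1L)  # header + nrows data lines
  read.csv(text = lines, header = TRUE, strip.white = TRUE)
}

first <- read_first_rows(path, nrows = 1L)
print(first)
```

Because only the first lines are ever handed to the parser, the second block with four columns cannot influence the result. The same lines could presumably be passed to fread instead of read.csv, but that is not verified here.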

@arunsrinivasan added this to the v2.0.0 milestone on Mar 13, 2016
@arunsrinivasan removed this from the v1.9.8 milestone on Mar 13, 2016
@everdark commented Jun 5, 2016

In 1.9.6 (and in 1.9.7 as of now), nrows no longer forces fread to read just those rows.
This is frustrating on small files where only the last line has fewer columns.

fread("1,2,3\n1,2", nrows=1) won't work, and there seems to be no workaround...
I'd like to read the entire file except for the unbalanced last line.

And this only happens with small files:

# Each input has i complete rows followed by one short final row.
for (i in 100:1) {
  lines <- paste0(paste(rep("1,2,3", i), collapse = "\n"), "\n1,2")
  fread(lines, nrows = i)
}

The code above stops at i = 4, which means the somewhat "undocumented" type-checking behaviour forces at least 5 lines to be parsed, regardless of how many lines the file actually has.

The impact is this: we fread a batch of files and treat the last line specially, because we already know that the last line, and only the last line, may have an unbalanced number of columns. We read the whole file except the last line with fread and then read the last line separately. Whenever a file has fewer than 6 lines and an unbalanced last line, we need a workaround (i.e., not using fread in these circumstances).

It would be appreciated if we could optionally disable this type-checking behaviour; otherwise the nrows argument does not actually do what it appears to do: read only the first nrows lines. :)
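
For the "read everything except the unbalanced last line" case described above, a minimal base R sketch is to split off the last line before parsing, so the parser never sees it. It deliberately avoids fread, in line with the workaround mentioned above. The helper name split_last_line() is hypothetical, and the sketch assumes the input has at least two lines and no header.

```r
# Hypothetical helper (not part of data.table): parse all lines but the last
# as a table, and return the last line's fields separately as strings.
split_last_line <- function(text) {
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  list(
    body = read.csv(text = head(lines, -1L), header = FALSE),
    last = strsplit(tail(lines, 1L), ",", fixed = TRUE)[[1]]
  )
}

# Two complete rows followed by one short final row.
parts <- split_last_line("1,2,3\n1,2,3\n1,2")
print(parts$body)
print(parts$last)
```

For input stored in a file, readLines() would supply the lines vector directly in place of the strsplit() call.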
