fread's nrow argument could accept -ve values to skip last 'n' rows #1643

Open
arunsrinivasan opened this issue Apr 11, 2016 · 5 comments

@arunsrinivasan
Member

This could be useful for this post: http://stackoverflow.com/q/36558437/559784

fread(file, nrow=-3) # could skip last 3 lines, for example.
@MichaelChirico
Member

Note that this would break the current default of -1L.

Other than that, it seems like it shouldn't be too hard to implement, maybe one or two extra lines in this branch of fread.c:

https://github.com/Rdatatable/data.table/blob/master/src/fread.c#L956-L1018

@franknarf1
Contributor

franknarf1 commented Jun 30, 2016

Yeah, I could use this. I'm currently reading in CSVs that often have an incomplete last line (not enough fields, as inferred from the commas), which reliably causes fread to crash R.

It would be nice to set skip.last=1L to avoid this. Because nrow already allows a negative value, as Michael mentioned, I think it would be cleaner to add a separate arg or to allow the skip arg to have a length of two (with the second component of the vector taking on this role when present).

@jangorecki
Member

jangorecki commented Jun 30, 2016

I would prefer the way mentioned by Arun, as it would be consistent with how Linux head and tail handle negative values. If negative skip is currently being used, and cannot be easily changed, then it makes sense to allow skip of length two, so skip=c(0, 1) would skip just the last line.
Just for completeness, the current workaround: fread("head -n -1 filename.csv")
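
For anyone unsure what the shell side of that workaround does, here is a minimal sketch. It assumes GNU coreutils, where a negative count for head -n is an extension meaning "all but the last N lines"; the sample file contents are made up for illustration. The sed form is a portable alternative for systems whose head does not accept negative counts.

```shell
# Build a small CSV whose last line is incomplete (hypothetical sample data).
tmpfile=$(mktemp)
printf 'a,b,c\n1,2,3\n4,5,6\n7,8\n' > "$tmpfile"

# GNU head: a negative count prints everything except the last N lines.
trimmed_head=$(head -n -1 "$tmpfile")

# Portable alternative: sed deletes the last line ('$' addresses the final line).
trimmed_sed=$(sed '$d' "$tmpfile")

# Show what fread would receive from the command.
printf '%s\n' "$trimmed_head"

rm -f "$tmpfile"
```

Either command could be placed inside the string passed to fread in the same way as the workaround above.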

@lanceculnane

On a related note, it would be nice if we could pass a list of indices (also as part of the 'skip' parameter) to explicitly read in the rows we want, as is possible in Python's pandas. If the list of indices is random, it is a nice way to create a random sample of a data frame that is too large to be read into memory on a local machine, for instance.
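
Until something like that exists, row selection can be approximated in the shell before the data reaches fread. The following is only a sketch under stated assumptions: the file contents and the index list in rows are hypothetical, indices are 1-based and count data rows after the header, and the header line is always kept.

```shell
# Hypothetical sample file: a header plus five data rows.
tmpfile=$(mktemp)
printf 'id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n' > "$tmpfile"

# Data rows to keep (1-based, not counting the header); illustrative only.
rows="2,5"

# Keep line 1 (the header) plus every line whose number is in the list.
# Each index is shifted by one to account for the header line.
selected=$(awk -v rows="$rows" '
  BEGIN { n = split(rows, idx, ","); for (i = 1; i <= n; i++) keep[idx[i] + 1] = 1 }
  NR == 1 || (NR in keep)
' "$tmpfile")

printf '%s\n' "$selected"

rm -f "$tmpfile"
```

The file is still scanned in full by awk, but only the selected rows are handed on, so the resulting table stays small. A random sample would need the index list generated separately.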

@MichaelChirico
Member

@lanceculnane see also #583


5 participants