When subsetting dt with row indices, dt puts NA when a number of row indices is greater than .N #4375

Closed
PetoLau opened this issue Apr 12, 2020 · 4 comments

Comments

@PetoLau

PetoLau commented Apr 12, 2020

Hi,

I have a question: when I subset by row indices and some of the indices are greater than .N, why doesn't data.table throw an error or a warning instead of creating NAs?

library(data.table)

dt <- data.table(Num = rnorm(10))
dt[1:11]

Output is:
           Num
 1: -0.5613457
 2:  1.7967747
 3:  0.3145488
 4:  1.1318019
 5:  1.0750366
 6: -1.5978017
 7: -1.0894657
 8: -1.2805362
 9: -1.7455068
10:  0.1769249
11:         NA

# Output of sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.12.8

loaded via a namespace (and not attached):
[1] compiler_3.5.3 tools_3.5.3 packrat_0.5.0

@jangorecki
Member

jangorecki commented Apr 12, 2020

Hi, thank you for submitting the issue. The behaviour you describe doesn't raise an error or a warning because it is the standard behaviour in R.

data.frame(x=1:2)[1:3,,drop=FALSE]
#    x
#1   1
#2   2
#NA NA

You may be interested in feature request #3109, which gives better control over this. There is already a PR for it, #4353, so it is likely to be available in the next CRAN release.
Then you can just add nomatch=NULL and the extra rows are excluded. I think that addresses your issue well, so I am going to close it. We can always re-open it later if needed.
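
Until a per-call option for row indices is available in your installed version, one way to avoid the NA row is to filter the indices before subsetting. This is only a minimal sketch of a workaround, not the mechanism proposed in #3109. The names idx and valid_idx are made up for illustration.

```r
library(data.table)

dt <- data.table(Num = rnorm(10))
idx <- 1:11  # hypothetical index vector that runs past the last row

# Keep only the indices that point at existing rows.
valid_idx <- idx[idx >= 1L & idx <= nrow(dt)]

res <- dt[valid_idx]
nrow(res)  # 10 rows, no trailing NA row
```

This makes the intent explicit at the call site and does not depend on any global option.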

@Henrik-P

Henrik-P commented Apr 12, 2020

@PetoLau I think the relevant part of the help text is in ?data.table about the i argument:

integer and logical vectors work the same way they do in [.data.frame

The next obvious question is then: how does out-of-bounds (OOB) indexing work for data.frame? I once tried to answer "Why does 'out of bounds' indexing differ between a matrix and a data.frame?", and it seems that OOB indexing for data.frame is not very well documented.
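
For readers who want to see the difference, here is a small base R sketch comparing out-of-bounds row indexing on a vector, a data.frame, and a matrix. The object names are illustrative only.

```r
v  <- 1:2
df <- data.frame(x = 1:2)
m  <- matrix(1:4, nrow = 2)

# Vector: an out-of-bounds index yields NA.
v_oob <- v[3]

# data.frame: an out-of-bounds row index yields a row of NA.
df_oob <- df[1:3, , drop = FALSE]

# Matrix: an out-of-bounds row index is an error ("subscript out of bounds"),
# so it is caught here and the message is kept instead.
m_oob <- tryCatch(m[3, ], error = function(e) conditionMessage(e))

v_oob
df_oob
m_oob
```

data.table follows the data.frame behaviour for integer i, which is why dt[1:11] above returns an 11th row of NA rather than an error.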

@PetoLau
Author

PetoLau commented Apr 12, 2020

Thanks for the fast reply, @jangorecki. Yes, I hope options(datatable.nomatch=NULL) will do the trick (can't wait for it) :)

@jangorecki
Member

Actually, this option is going to be deprecated in the long term, and we advise against using it. It is only safe if you can be sure that none of your code depends on a package that depends on data.table, including recursive dependencies. It is a global option that affects every package that uses data.table, so it is likely to break packages that expect the default behaviour.
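
To illustrate why a per-call argument is safer than the global option, here is a sketch using joins, where nomatch can already be passed per call. The tables X and Y are invented for the example. Whether nomatch=NULL also applies to integer row indices depends on the feature discussed in #3109 being present in your version.

```r
library(data.table)

X <- data.table(id = 1:3, val = c("a", "b", "c"))
Y <- data.table(id = 2:5)

# Default: ids in Y without a match in X are kept, with NA in val.
outer_res <- X[Y, on = "id"]

# Per-call nomatch=NULL: unmatched ids are dropped, for this call only.
inner_res <- X[Y, on = "id", nomatch = NULL]

nrow(outer_res)  # 4
nrow(inner_res)  # 2
```

Because the argument is set at the call site, code in other packages that relies on the default behaviour is not affected.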
