When subsetting dt with row indices, dt puts NA when a number of row indices is greater than .N #4375

PetoLau · 2020-04-12T12:42:31Z

Hi,

I have a question: when I subset by row indices and an index is greater than .N, why doesn't data.table throw an error or a warning instead of creating NAs?

```r
dt <- data.table(Num = rnorm(10))
dt[1:11]
```

Output is:

```
           Num
 1: -0.5613457
 2:  1.7967747
 3:  0.3145488
 4:  1.1318019
 5:  1.0750366
 6: -1.5978017
 7: -1.0894657
 8: -1.2805362
 9: -1.7455068
10:  0.1769249
11:         NA
```

# Output of sessionInfo()
```
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.12.8

loaded via a namespace (and not attached):
[1] compiler_3.5.3 tools_3.5.3 packrat_0.5.0
```


jangorecki · 2020-04-12T12:52:50Z

Hi, thank you for submitting the issue. The behaviour you describe doesn't raise an error or a warning because it is the standard behaviour in R:

```r
data.frame(x=1:2)[1:3,,drop=FALSE]
#    x
#1   1
#2   2
#NA NA
```
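A minimal base R sketch of the same rule, plus one way to guard against it by hand. `keep_in_range()` is a hypothetical helper written for this illustration. It is not part of base R or data.table, and it assumes positive row numbers, so negative indices and NA are simply dropped.

```r
# Base R pads out-of-range positive indices with NA rather than erroring.
x <- c(10, 20)
x[1:3]
#> [1] 10 20 NA

df <- data.frame(x = 1:2)
df[1:3, , drop = FALSE]   # third row is all NA, with row name "NA"

# Hypothetical helper: keep only indices that point at an existing row,
# i.e. values in 1..n. NA, zero and negative indices are dropped as well.
keep_in_range <- function(idx, n) {
  idx[!is.na(idx) & idx >= 1L & idx <= n]
}

df[keep_in_range(1:3, nrow(df)), , drop = FALSE]  # only rows 1 and 2
```

The same filtered index vector can be passed to a data.table, e.g. `dt[keep_in_range(1:11, nrow(dt))]`, which works regardless of the data.table version.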

You may be interested in FR #3109, which gives better control over this. There is already PR #4353 for it, so it is likely to be available in the next CRAN release.
Then you just add `nomatch=NULL` and the extra rows are excluded. I think that addresses your issue well, so I am going to close it; we can always reopen it later if needed.
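A short sketch of the per-call argument described above. It assumes a data.table release that includes PR #4353 (we believe 1.13.0 or later, but check the NEWS for your installed version); on older versions `nomatch=NULL` is ignored for row-number subsets and the NA row is still returned.

```r
library(data.table)

dt <- data.table(Num = rnorm(10))

# Default (nomatch = NA): index 11 has no matching row, so an NA row is padded in.
nrow(dt[1:11])                   # 11

# Assumes a version containing PR #4353: nomatch = NULL drops
# out-of-range row numbers instead of padding them with NA.
nrow(dt[1:11, nomatch = NULL])   # 10
```

Passing `nomatch=NULL` per call keeps the change local to your own code, unlike a global option.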

Henrik-P · 2020-04-12T13:09:46Z

@PetoLau I think the relevant part of the help text is in `?data.table`, under the `i` argument:

> integer and logical vectors work the same way they do in `[.data.frame`

The next obvious question is then: how does out-of-bounds indexing work for a data.frame? I once tried to answer "Why does 'out of bounds' indexing differ between a matrix and a data.frame?", and it seems that OOB indexing for data.frame is not very well documented.
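A small base R comparison of the asymmetry mentioned above: the same out-of-range row index errors on a matrix but returns a row of NAs on a data.frame, while an out-of-range column on a data.frame does error. This is only an illustration of observed behaviour, not a statement of where R documents it.

```r
m  <- matrix(1:4, nrow = 2)
df <- as.data.frame(m)

# Matrix: an out-of-bounds row index is an error.
tryCatch(m[3, ], error = function(e) conditionMessage(e))
#> [1] "subscript out of bounds"

# data.frame: the same row index silently returns a one-row data.frame of NAs.
df[3, ]

# data.frame: an out-of-bounds column, by contrast, is an error.
tryCatch(df[, 3], error = function(e) conditionMessage(e))
#> [1] "undefined columns selected"
```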

PetoLau · 2020-04-12T15:59:37Z

Thanks for the fast reply, @jangorecki. Yes, I hope `options(datatable.nomatch=NULL)` will do the trick (can't wait for it) :)

jangorecki · 2020-04-12T19:05:28Z

Actually, this option is going to be deprecated eventually, and we advise against using it. It is only safe if you can be sure that none of your code depends on a package that depends on data.table, including recursive dependencies. It is a global option that affects every package that uses data.table, so it is likely to break packages that expect the default behaviour.

jangorecki closed this as completed Apr 12, 2020
