I have noticed some inconsistent behavior when using between(). Specifically, whether NAbounds = FALSE must be provided depends on the order of the other filters.
Reprex
I have a dataset similar to the following. Basically some rows are "Active" and will have associated ranges of dates. I need to check if a row's date is in between the specified range for that row, which differs from row to row. However, if a row is not active, no corresponding ranges exist.
# verbose
options(datatable.verbose = TRUE)
# create data.table
x <- data.table::data.table(
Date = seq(Sys.Date(), Sys.Date() + lubridate::years(30), by = "1 day")
)
x[, Active := sample(c(TRUE, FALSE), size = .N, replace = TRUE, prob = c(0.8, 0.2))]
# if active is TRUE, Date will either be inside or outside of generated ranges below
x[Active == TRUE, InRange := sample(c(TRUE, FALSE), size = .N, replace = TRUE)]
x[InRange == TRUE, c("RangeBegin", "RangeEnd") := list(Date - lubridate::days(20), Date + lubridate::days(20))]
x[InRange == FALSE, c("RangeBegin", "RangeEnd") := list(Date + lubridate::days(1), Date + lubridate::days(20))]
I would like to use between() to calculate the InRange column generated above. However, before doing that, even filtering the data is not consistent:
# succeeds
x[Active == TRUE][data.table::between(Date, RangeBegin, RangeEnd)]
Optimized subsetting with index 'Active'
forder.c received 1 rows and 1 columns
forderReuseSorting: opt=-1, took 0.000s
forder took 0.000 sec
x is already ordered by these columns, no need to call reorder
i.Active has same type (logical) as x.Active. No coercion needed.
on= matches existing index, using index
Starting bmerge ...
forderReuseSorting: using key: __Active
forderReuseSorting: opt=1, took 0.000s
bmerge: looping bmerge_r took 0.000s
bmerge: took 0.000s
bmerge done in 0.000s elapsed (0.000s cpu)
Constructing irows for '!byjoin || nqbyjoin' ... 0.000s elapsed (0.000s cpu)
optimised between not available for this data type, fallback to slow R routine
# fails
x[Active == TRUE & data.table::between(Date, RangeBegin, RangeEnd)]
optimised between not available for this data type, fallback to slow R routine
Error in .checkTypos(e, names_x) :
Not yet implemented NAbounds=TRUE for this non-numeric and non-character type
However, passing NAbounds = FALSE makes the combined filter return the same result as the chained version.
# works again
x[Active == TRUE & data.table::between(Date, RangeBegin, RangeEnd, NAbounds = FALSE)]
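Below is a smaller sketch that seems to isolate the difference. It assumes the error is triggered whenever between() is handed Date bounds that contain NA, which is what happens when the inactive rows have not been dropped first. The vectors d, lo and hi are made up for illustration and are not part of my data.

```r
library(data.table)

# Hypothetical vectors: the second element mimics an inactive row with no range
d  <- as.Date("2025-01-10") + 0:2
lo <- as.Date(c("2025-01-01", NA, "2025-01-13"))
hi <- as.Date(c("2025-01-31", NA, "2025-01-20"))

# Default NAbounds = TRUE with NA bounds present: capture the error message
res_default <- tryCatch(
  between(d, lo, hi),
  error = function(e) conditionMessage(e)
)

# NAbounds = FALSE: NA bounds propagate as NA instead of raising an error
res_false <- between(d, lo, hi, NAbounds = FALSE)

# Dropping the NA-bound elements first, as the chained filter does
keep <- !is.na(lo) & !is.na(hi)
res_subset <- between(d[keep], lo[keep], hi[keep])

print(res_default)
print(res_false)
print(res_subset)
```

If this assumption is right, the chained x[Active == TRUE][...] only succeeds because the first subset has already removed every row with NA bounds before between() runs.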
In my actual data I am trying to do this calculation inside an fcase() call, although for this simple example that is more complex than necessary.
# toy calculation
# fails with same error
x[, InRangefcase := data.table::fcase(
Active == FALSE, NA,
Active == TRUE, data.table::between(Date, RangeBegin, RangeEnd)
)]
# succeeds
x[, InRangefcase := data.table::fcase(
Active == FALSE, NA,
Active == TRUE, data.table::between(Date, RangeBegin, RangeEnd, NAbounds = FALSE)
)]
# and is identical to the original column
all(x[, InRange] == x[, InRangefcase], na.rm = TRUE)
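For comparison, here are two possible workarounds that avoid NAbounds altogether, sketched on a tiny made-up table. The table dt and the column names InRangeSub and InRangeCmp are hypothetical. The first restricts the rows in i so that between() only sees active rows; the second uses plain comparisons inside fcase(), which simply return NA where a bound is NA.

```r
library(data.table)

# Hypothetical three-row table: the inactive row has no range
dt <- data.table(
  Date       = as.Date("2025-01-10") + 0:2,
  Active     = c(TRUE, FALSE, TRUE),
  RangeBegin = as.Date(c("2025-01-01", NA, "2025-01-13")),
  RangeEnd   = as.Date(c("2025-01-31", NA, "2025-01-20"))
)

# Option 1: j is evaluated only on the rows selected by i,
# so between() never receives the NA bounds of inactive rows
dt[Active == TRUE, InRangeSub := between(Date, RangeBegin, RangeEnd)]

# Option 2: plain comparisons on the full columns; NA bounds give NA
dt[, InRangeCmp := fcase(
  Active == FALSE, NA,
  Active == TRUE, Date >= RangeBegin & Date <= RangeEnd
)]

print(dt)
```

Both columns leave the inactive row as NA, which matches what the NAbounds = FALSE version produces in my example.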
At any rate, I don't know if this behavior is intended, but I found it pretty confusing in my actual data: it made me think I had NAs in my RangeBegin and RangeEnd columns.
Thanks for the amazing package - I use it daily!
SessionInfo
R version 4.5.1 (2025-06-13)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils
[5] datasets methods base
other attached packages:
[1] fs_1.6.6
loaded via a namespace (and not attached):
[1] compiler_4.5.1 generics_0.1.4
[3] cli_3.6.5 tools_4.5.1
[5] lubridate_1.9.4 data.table_1.17.8
[7] jsonlite_2.0.0 timechange_0.3.0
[9] rlang_1.1.6