Different %between% behavior on character vectors undocumented #3667

Closed
ToeKneeFan opened this issue Jun 27, 2019 · 0 comments · Fixed by #3731

@ToeKneeFan ToeKneeFan commented Jun 27, 2019

At the moment, the help documentation for %between% states that

> between is equivalent to x >= lower & x <= upper when incbounds=TRUE, or x > lower & y < upper when FALSE. With a caveat that NA in lower or upper are taken as a missing bound and return TRUE not NA.

This appears to hold for numeric vectors. For character vectors, however, an NA bound is not treated as a missing bound: the comparison yields NA, as seen below, and the documentation does not mention the difference. Since lexicographic order is (in theory) a total order, one would expect the same or similar behavior, and by not singling out character vectors the documentation implies as much. The behavior on character vectors should either (a) be documented, if it is intended, or (b) be changed to treat NA as a missing bound, as with numeric vectors.

# Minimal reproducible example

```r
library(data.table)

numbers <- 1:26
numbers %between% c(13, NA)
#  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
# [14]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
letters %between% c("m", NA_character_)
#  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE    NA
# [14]    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
```
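
For comparison, option (b) could look roughly like the base R sketch below. It is only an illustration of the "NA means no bound" semantics the documentation describes, not how data.table implements `between`. The helper name `between_na_open` is hypothetical, and the sketch assumes scalar bounds. Character comparison in R also depends on the collation locale, which the sketch ignores.

```r
# Hypothetical helper, not part of data.table: an NA bound is treated as
# "no bound on that side". Assumes lower and upper are scalars.
between_na_open <- function(x, lower, upper, incbounds = TRUE) {
  stopifnot(length(lower) == 1L, length(upper) == 1L)

  # A missing lower bound imposes no constraint from below.
  ok_lower <- if (is.na(lower)) {
    rep(TRUE, length(x))
  } else if (incbounds) {
    x >= lower
  } else {
    x > lower
  }

  # A missing upper bound imposes no constraint from above.
  ok_upper <- if (is.na(upper)) {
    rep(TRUE, length(x))
  } else if (incbounds) {
    x <= upper
  } else {
    x < upper
  }

  # NA elements in x itself still propagate as NA.
  ok_lower & ok_upper
}

num_res <- between_na_open(1:26, 13, NA)
chr_res <- between_na_open(letters, "m", NA_character_)
```

Under these assumptions the numeric and character calls are meant to agree element by element, which is the consistency the report asks for.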

# Output of sessionInfo()

```
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.12.3

loaded via a namespace (and not attached):
[1] compiler_3.6.0 tools_3.6.0
```

The data.table package is the most recent stable development version, pulled today using install.packages().

@ToeKneeFan ToeKneeFan changed the title from "Different %between% behavior on character columns undocumented" to "Different %between% behavior on character vectors undocumented" Jun 27, 2019
@jangorecki jangorecki added this to the 1.12.4 milestone Jun 27, 2019
@jangorecki jangorecki self-assigned this Jul 30, 2019