Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

potential issue with the between operator #3014

Closed
peterlittlejohn opened this issue Aug 28, 2018 · 6 comments
Closed

potential issue with the between operator #3014

peterlittlejohn opened this issue Aug 28, 2018 · 6 comments
Milestone

Comments

@peterlittlejohn
Copy link

@peterlittlejohn peterlittlejohn commented Aug 28, 2018

The between operator seems to not operate as intended:

d <- data.table(A=1:10)
d[,B:=3]
d[A %between% c(B,B+1)]  ##  incorrect result since default setting includes bounds 
#    A B
# 1: 3 3
d[A %between% c(3,3+1)]  ## correct result when using same notation with hard-coded bounds
#    A B
# 1: 3 3
# 2: 4 3
d[between(A,B,B+1)]      ## correct result when using different notation
#    A B
# 1: 3 3
# 2: 4 3

My R session and data.table version:

R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"
Platform: x86_64-redhat-linux-gnu (64-bit)
> library(data.table)
data.table 1.11.2

Also tried:

 R version 3.5.1 (2018-07-02) -- "Feather Spray"
 Platform: x86_64-w64-mingw32/x64 (64-bit)
> library(data.table)
data.table 1.11.5 IN DEVELOPMENT built 2018-08-16 23:02:55 UTC; root  Latest news: r-datatable.com
@franknarf1
Copy link
Contributor

@franknarf1 franknarf1 commented Aug 28, 2018

The syntax in your query is not correct.

> `%between%`
function (x, y) 
between(x, y[[1L]], y[[2L]], incbounds = TRUE)
<bytecode: 0x0000000016286cd8>
<environment: namespace:data.table>

Your query is

`%between%`(1:10, c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4))

which evaluates to between(1:10, 3, 3) according to the function definition. Maybe you meant to use list(B, B + 1) instead of c(B, B + 1).

Maybe %between% should give a warning when y has length greater than 2...?

@MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented Aug 29, 2018

You want d[A %between% list(B, B+1)]. This is well-documented in ?between:

y | A length-2 vector or list, with y[[1]] interpreted as lower and y[[2]] as upper.

And in the Examples:

X[c %between% list(a,b)]

Using c incorrectly as was done here is more of a base R trip-up than something from data.table...

Agree we could add a warning as a reminder...

@peterlittlejohn
Copy link
Author

@peterlittlejohn peterlittlejohn commented Aug 29, 2018

From the May 27 data.table reference manual, page 23, https://cloud.r-project.org/web/packages/data.table/data.table.pdf, emphasis mine:

Usage
    between(x, lower, upper, incbounds=TRUE)
    x %between% y

y    A length-2 **vector** or list, with y[[1]] interpreted as lower and y[[2]] as upper.

You are right, list() does indeed work.

d[A %between% list(B,B+1)]   ## returns correct result
   A B
1: 3 3
2: 4 3

But if vectors, as in c(B,B+1), cannot be used in vectorized form (so that the initial example is not an error, but simply wrong usage), then this needs to be noted in the documentation.

@peterlittlejohn
Copy link
Author

@peterlittlejohn peterlittlejohn commented Aug 29, 2018

Thank you @MichaelChirico. I was responding to @franknarf1.

Thank you both.

@MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented Aug 29, 2018

You used the vector use case yourself in your issue:

d[A %between% c(3,3+1)]

I think maybe you're missing that by the time %between% operates on your expression, c has already done its work. So think carefully about what c(B, B+1) will do (and then try d[ , c(B, B+1)] to see for yourself) -- the result of this is* what will be sent to %between%.

*Lazy evaluation means that we could technically spot this & fix it, but it's not really worth the effort.

@mattdowle
Copy link
Member

@mattdowle mattdowle commented Aug 29, 2018

For those that quickly read to the end of this issue in future and just see Michael's last sentence "could ... but not worth the effort": Michael means auto-fixing isn't worth the effort; i.e. working out what the user really intended and doing it. I'd go even further and say requiring different syntax (list vs c) is clearer and more robust. Michael had already spent the time to make the PR (now merged) that auto-detects the user-error and gives a helpful message which explains the correct syntax, which seems the optimal way to solve the problem that data.table was silent about it.

@mattdowle mattdowle added this to the 1.11.6 milestone Aug 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants