Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect subsetting with operator %in% #99

Closed
lsilvest opened this issue Feb 14, 2022 · 15 comments
Closed

Incorrect subsetting with operator %in% #99

lsilvest opened this issue Feb 14, 2022 · 15 comments

Comments

@lsilvest
Copy link
Collaborator

lsilvest commented Feb 14, 2022

Looks like some interference with bit64. We've seen that previously and have always been able to get around it by defining and exporting S3 methods. For some reason it does not work with %in%. It passes the unit tests because it works when the package is test loaded, but it doesn't work straight out of the box.

So this does not work:

library(nanotime)
one_second <- 1e9
index <- seq(nanotime("2022-12-12 12:12:10+00:00"), length.out=10, by=one_second)
ival <- c(as.nanoival("-2022-12-12 12:12:10+00:00 -> 2022-12-12 12:12:14+00:00-"),
          as.nanoival("+2022-12-12 12:12:18+00:00 -> 2022-12-12 12:12:20+00:00+"))
index %in% ival
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

But this works:

library(bit64)
library(nanotime)
one_second <- 1e9
index <- seq(nanotime("2022-12-12 12:12:10+00:00"), length.out=10, by=one_second)
ival <- c(as.nanoival("-2022-12-12 12:12:10+00:00 -> 2022-12-12 12:12:14+00:00-"),
          as.nanoival("+2022-12-12 12:12:18+00:00 -> 2022-12-12 12:12:20+00:00+"))
index %in% ival
##  [1] FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE

I have tried to fix this, but for the moment I haven't been successful with whatever type of export I've used in NAMESPACE. Will have to dig into this a bit more.

@lsilvest lsilvest changed the title Incorrest subsetting with operator %in% Incorrect subsetting with operator %in% Feb 14, 2022
@eddelbuettel
Copy link
Owner

Hmpf. That is a tough one. I do rely on base R's %in% every now and then as a 'simpler' alternative to match().

I wasn't aware we had dug ourselves a "potential problem" here. Oh well.

@lsilvest
Copy link
Collaborator Author

Yes, note to self: "next time don't base an S4 type on an S3 type". I'm going to try to see what R is actually doing differently when a package imports another package in the DESCRIPTION file compared to when the user does it in the interpreter.

@lsilvest
Copy link
Collaborator Author

It looks like any function that base does not explicitly declare as S3 cannot straightforwardly be declared as an S3 function. Indeed, without a declaration of %in% as an S3 method (with UseMethod) R cannot find that function:

> methods("%in%")
[1] %in%.nanotime
see '?methods' for accessing help and source code
Warning message:
In .S3methods(generic.function, class, envir) :
  function '%in%' appears not to be S3 generic; found functions that look like S3 methods

bit64 has the same issue and defines %in% like this:

if (!exists("%in%.default")){
	"%in%" <- function(x, table) UseMethod("%in%")
	"%in%.default" <- function(x, table) base::"%in%"(x, table)
}

If we put this in the nanotime package, it unfortunately produces a warning on loading the package:

> library(nanotime)

Attaching package: ‘nanotime’

The following object is masked from ‘package:base’:

    %in%

This warning also happens for bit64. When bit64 is explicitly loaded either before or after the current nanotime, things work fine because %in% is now recognized as an S3 method.

@eddelbuettel
Copy link
Owner

Oh, that is gnarly.

I had on occassion been bitten by but64 formatting not being available when I used nanotime data. I am very confused on this: say we decided to depend 'harder' on bit64, could we do it? Is there a way besides an old-school (and I presume, marginally deprecated( Depends: bit64? Is that a road we want to go along?

@eddelbuettel
Copy link
Owner

Just thinking out loud here: would de-emphasizing %in% (over this ambiguity) and using a match() variant help? Or not really as want the %in% opetator (which I also use for character set membership tests) ?

@lsilvest
Copy link
Collaborator Author

We probably want to keep %in%. The idea is very much like R's native use:

1:10 %in% c(3, 5, 7)
 [1] FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE

Then it's semantically intuitive and correct as it is related to the notion of set: %in% is actually a call to our C-level function nanoival_intersect_idx_time_interval_logical_impl...

@lsilvest
Copy link
Collaborator Author

lsilvest commented Feb 16, 2022

The Depends: bit64 works. We do however get the following display that is not very pretty:

> library(nanotime)
Loading required package: bit64
Loading required package: bit

Attaching package:bitThe following object is masked frompackage:base:

    xor

Attaching package bit64
package:bit64 (c) 2011-2017 Jens Oehlschlaegel
creators: integer64 runif64 seq :
coercion: as.integer64 as.vector as.logical as.integer as.double as.character as.bitstring
logical operator: ! & | xor != == < <= >= >
arithmetic operator: + - * / %/% %% ^
math: sign abs sqrt log log2 log10
math: floor ceiling trunc round
querying: is.integer64 is.vector [is.atomic} [length] format print str
values: is.na is.nan is.finite is.infinite
aggregation: any all min max range sum prod
cumulation: diff cummin cummax cumsum cumprod
access: length<- [ [<- [[ [[<-
combine: c rep cbind rbind as.data.frame
WARNING don't use as subscripts
WARNING semantics differ from integer
for more help type ?bit64

Attaching package: ‘bit64’

The following objects are masked from ‘package:base’:

    :, %in%, is.double, match, order, rank

> 

If we do the definition of %in% as S3 in nanotime we just have one warning that we are masking %in% from base, which at the end of the day does seem preferable to the above.

@eddelbuettel
Copy link
Owner

eddelbuettel commented Feb 16, 2022

Yes, still "bad" (I'd prefer no warning) but not as bad as the above.

Have you seen what the tidyverse folks are doing with package generics? That actually seems like a decent idea: a no-depends-just-base-R package collecting whatever generics are needed and then collecting the generics they need in other places there -- and have those packages depend on generics. Would that help us?

PS: https://cran.r-project.org/package=generics and https://github.com/r-lib/generics

@lsilvest
Copy link
Collaborator Author

Just looked at generics; they don't provide %in%, but where they do provide generics they do the same as bit64 and simply declare a method as S3 and its default is to call base::name_of_function, which results in a masking warning. So I think we could go with a simple masking warning for %in% in nanotime. The package is then still safe despite the warning and we fix our bug. Here is the output of generics:

> library(generics)

Attaching package:genericsThe following objects are masked frompackage:nanotime:

    intersect, setdiff, union

The following objects are masked frompackage:base:

    as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
    setequal, union

> 

Another alternative would be to sever the tie with bit64 and implement %in% as an S4 method. Because what prevents us from doing that right now is that since we base nanotime on integer64 and since an S3 method %in%.integer64 is defined, the dispatch ignores the S4 method and matches instead %in%.integer64. This alternative is of course significantly more work. I'm not sure how much yet, but I could test the waters to figure out...

@eddelbuettel
Copy link
Owner

Rock, meet hard place.

Just thinking out aloud: operators of the %foo% variant can be defined by end-user packages, if we get a warning that we clobber %in% from base how about if we ... defined a different one? %nt_in% ? I find all these %foo% constructs ugly as hell but got used to %in% as it useful. We could just pick any other letter combination. Maybe %nm% for nanotime-match?

@lsilvest
Copy link
Collaborator Author

Don't know, in my mind it feels a little worse to define a new operator than to get a warning on package load, because the warning is not dangerous, but a new operator makes nanotime less intuitive to use. I'll try to see how much work we would need to base nanotime (and nanoduration) on a double. I think it's mostly the arithmetic operators we need to rewrite as we depend on +, -, etc. from integer64.

@eddelbuettel
Copy link
Owner

eddelbuettel commented Feb 17, 2022

But int64 <-> double conversion are lossy (as casts).

Or are you thinking "long term" and fully divorcing bit64? That seems to shoulder ... a lot of load!

@eddelbuettel
Copy link
Owner

I am still troubled by

The following object is masked from ‘package:base’:

    %in%

I truly dislike what happened with filter, select, ... and I think "those folks" were too cavalier. Masking means our function will get called instead of the base R. This may well break existing code using %in% to match, say, characters.

@lsilvest
Copy link
Collaborator Author

Yes, when defining an S3 method for a function implemented in base that is not defined as S3 there, the new S3 function is called first, but if done correctly it will not break any code. To summarize, the first step is to declare the function as S3:

"%in%" <- function(x, table) UseMethod("%in%")

This tells R to use the S3 dispatch system instead of a straight call to the original function (hence the masking warning).

The second step is to declare a default that will call the base function

"%in%.default" <- function(x, table) base::"%in%"(x, table)

So the first declaration tells R to dispatch on the type, so if it finds nanotime it will call %in%.nanotime, but if it cannot match any type it will end up defaulting to base::"%in%". So the behavior is safe and correct, but adds the S3 overhead and a function indirection to the original direct call.

I don't think it's a major problem; in the S3 world this is simply unavoidable (and that's why we see so many packages display these masking warnings). For the nanotime package, basing nanotime on integer64 forces us out of S4 and into S3.

eddelbuettel added a commit that referenced this issue Mar 6, 2022
fix %in% and negative 'nanoperiod' parse. closes #96 #99
@eddelbuettel
Copy link
Owner

Fixed in #100

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants