Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fifelse did not work in recursive function #4549

Closed
hongyuanjia opened this issue Jun 15, 2020 · 6 comments · Fixed by #6151
Closed

fifelse did not work in recursive function #4549

hongyuanjia opened this issue Jun 15, 2020 · 6 comments · Fixed by #6151

Comments

@hongyuanjia
Copy link
Contributor

hongyuanjia commented Jun 15, 2020

I am trying to use data.table::fifelse to implement a recursive function to calculate greatest common divisors mentioned in this StackOverflow. However, I found that in this case, data.table::fifelse cannot be a replacement for base::ifelse.

Minimal reproducible

# use base::ifelse
gcd_base <- function(x,y) {
  r <- x%%y;
  return(ifelse(r, gcd_base(y, r), y))
}

gcd_dt <- function(x,y) {
  r <- x%%y;
  return(data.table::fifelse(r, gcd_dt(y, r), y))
}

gcd_base(10, 1)
# [1] 1
gcd_dt(10, 1)
# Error: C stack usage  7971780 is too close to the limit

Any insights? Thanks!

@MichaelChirico
Copy link
Member

Thanks for the MRE! Not sure why it's recursing so much in the first place...

# injecting a print statement to check how deep in recursion we are
n_base = 0 
gcd_base <- function(x,y) {
  cat('Recursion level:', n_base <<- n_base + 1, '\n')
  r <- x%%y;
  return(ifelse(r, gcd_base(y, r), y))
}

n_dt = 0 
gcd_dt <- function(x,y) {
  cat('Recursion level:', n_dt <<- n_dt + 1, '\n')
  r <- x%%y;
  return(data.table::fifelse(r, gcd_dt(y, r), y))
}
gcd_base(10, 1)
# Recursion level: 1 
# [1] 1
gcd_dt(10, 1)
# Recursion level: 1 
# Recursion level: 2 
# Recursion level: 3 
#  ...
# Recursion level: 865 
# Recursion level: 866 
# Recursion level: 867 
# Error: C stack usage  7971728 is too close to the limit

@ColeMiller1
Copy link
Contributor

ColeMiller1 commented Jun 17, 2020

Base ifelse() evaluates yes and no only if either condition is applicable. This is just a snippet of ifelse which is applicable if the length of test is greater than 1:

    ans <- test
    len <- length(ans)
    ypos <- which(test)
    npos <- which(!test)
    if (length(ypos) > 0L) 
        ans[ypos] <- rep(yes, length.out = len)[ypos]
    if (length(npos) > 0L) 
        ans[npos] <- rep(no, length.out = len)[npos]

For fifelse(), the yes or no appear to be evaluated during the call to the C API - I recompiled the fifelse with Rprintf() at the top of the function and it never printed anything. So in this case, we have a perpetual loop because gcd_dt() will keep calling itself.

We could address it by incorporating some of ifelse() into the fifelse() wrapper when the length of test is 1:

fifelse2 = function(test, yes, no, na) {
  if (is.atomic(test)) {
    if (typeof(test) != "logical")
      storage.mode(test) = "logical"
    if (length(test) == 1L) {
      if (is.na(test)) {
        if (missing(na)) 
          return (NA)
        else 
          return(na)
      }
      if (test) 
        return(yes)
      else 
        return(no)
    } else {
      .Call(data.table:::CfifelseR, test, yes, no, na)
    } 
  } else {
    stop("Argument 'test' must be logical")
  }
}

n_dt2 = 0 
gcd_dt2 <- function(x,y) {
  cat('Recursion level:', n_dt <<- n_dt + 1, '\n')
  r <- x%%y
  return(fifelse2(r, gcd_dt(y, r), y))
}

gcd_dt2(10, 1)

##Recursion level: 1
##[1] 1

Note, if this were implemented, this would not check the type of yes, no, and na, so to allow for a recursive function we would likely miss out on that restriction. Also, for recursive functions that take a vector, we would also get similar errors.

@jangorecki
Copy link
Member

Alternatively we would have to substitute input, pass together with parent.frame to C, unevaluated, which complicates a lot. I don't like to use lazy evaluation when it is not really necessary.

@ColeMiller1
Copy link
Contributor

dplyr recently had two similar issues:

tidyverse/dplyr#5341
tidyverse/dplyr#5321

Both issues were closed due to concerns for type stability (e.g., both yes and no need evaluated to make sure they are both integer results). I checked, dplyr::if_else also errors out using this. I did not report up to dplyr due to those recent issues being closed.

Regardless, I believe we have some data.table issues which mention optimizing to automatically use fifelse. This use case would not work if ifelse were optimized within data.table.

@jangorecki
Copy link
Member

I think what should be done in this issue is to document (if not already documented) that fifelse, contrary to base ifelse, evaluates both arguments. Then mention fcase which provides similar functionality (+ ensure examples included). Then add unit test (and ideally examples) for fcase to confirm such usage works fine.

NB. I was just looking for fast vectorized GCD and there doesn't seem anything nice out there. @hongyuanjia you may want to fill FR for that particularly, you will get my upvote, but not guaranteed it to get to in-scope of DT.

@hongyuanjia
Copy link
Contributor Author

@jangorecki, I confirmed that this behavior has not yet been documented. Also, for a fast vectorized GCD, {collapse} has one, i.e. collapse::vgcd() which is implemented in C. However, it did not support interger64 (see SebKrantz/collapse#318).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants