
.length() misses methods in packages #164

Closed
goldingn opened this issue Sep 24, 2017 · 8 comments
@goldingn

goldingn commented Sep 24, 2017

future_lapply() errors when it is called on a user-defined function and an object with a length() method defined in another package is present in the calling environment.

This gives the expected result:

library (future)
f <- function (i) max(i)
future_lapply(1:3, f)

But this errors (x is a greta_array object, and greta exports a length.greta_array() method):

# install.packages("greta")
library (greta)
x <- as_data(1:3)
future_lapply(1:3, f)
Error in .subset2(x, kk) : subscript out of bounds

The reason is that the sizes of all objects in the environment are measured, and when .length() computes the length of x it tests whether an S3 method is available with the call:

exists("length.greta_array", mode = "function", inherits = TRUE)

but that doesn't search the namespaces of attached packages, so the incorrect length is returned, which leads to the .subset2 error.
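A small sketch of the lookup difference (hypothetical class name myclass; registerS3method() here stands in for a package's NAMESPACE S3method() registration, which puts the method in a methods table rather than on the search path):

```r
# Register a length() method the way a package namespace does,
# without putting length.myclass on the search path:
registerS3method("length", "myclass", function(x) 99L)

x <- structure(list(1, 2, 3), class = "myclass")

# Dispatch finds the registered method:
length(x)                                                      # 99

# exists() only searches the attached search path, so it misses it:
exists("length.myclass", mode = "function", inherits = TRUE)   # FALSE

# getS3method() also consults the S3 methods registration table:
is.null(getS3method("length", "myclass", optional = TRUE))     # FALSE
```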

I've pasted a fix below. I'd be happy to submit it as a PR instead, just let me know.


P.S. future is really really nice! I'm working on integrating it into greta to handle parallelism across MCMC chains, and for running models on remote (GPU-enabled) machines.


A fix:

# Replacement for future:::.length(): look methods up with
# getS3method(), which also searches S3 methods registered in package
# namespaces, rather than exists(), which only searches the attached
# search path.
has_length <- function(class) {
  fun <- getS3method("length", class, optional = TRUE)
  !is.null(fun)
}

new_length <- function(x) {
  nx <- length(x)
  classes <- class(x)
  if (length(classes) == 1L && classes == "list")
    return(nx)
  # if any class has a length() method, use the length of the
  # underlying (unclassed) object instead
  keep <- vapply(classes, has_length, logical(1), USE.NAMES = FALSE)
  if (any(keep))
    nx <- length(unclass(x))
  nx
}

# hot-patch the namespace binding to test the fix
future_env <- asNamespace("future")
unlockBinding(".length", future_env)
future_env$.length <- new_length
lockBinding(".length", future_env)

This now works:

future_lapply(1:3, f)
@HenrikBengtsson HenrikBengtsson added this to the Next release milestone Sep 25, 2017
@HenrikBengtsson
Owner

Thanks for the kind words - I'm happy you find it useful, and your use case seems really cool - and thanks for this detailed report and troubleshooting/patching. I can reproduce this, and without diving into the details I'd say it's a bug in the future package. I'll look into this and your proposal in more detail when I have time - I need to create some solid package unit tests etc.

Just a quick comment: I think your x object is picked up and "measured" for its size because it is part of the environment that the global object f carries with it. In other words, if f were defined in a package, then x would not be part of its environment and would not be scanned/measured. For example, the following (unsafe) "hack" illustrates this:

library("future")
library("greta")
x <- as_data(1:3)
f <- function (i) max(i)
environment(f) <- new.env()  ## disconnect from the global environment
future_lapply(1:3, f)

So it is not that all objects are scanned/measured - only those somehow connected to the identified globals, which here are FUN (= argument f) and max (and max is ignored/dropped because it's part of a package). You can see this using options(future.debug = TRUE).

As a quick workaround, you can disable the check for "too large globals" by using options(future.globals.maxSize = +Inf), e.g.

library("future")
library("greta")
x <- as_data(1:3)
f <- function (i) max(i)
future_lapply(1:3, f)
## Error in .subset2(x, kk) : subscript out of bounds

options(future.globals.maxSize = +Inf)
future_lapply(1:3, f)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3

@goldingn
Author

Ah nice, thanks! Yeah, I found this whilst troubleshooting another bug*, and you're right - it's not an issue for my package code.

*my real bug is something cryptic at the intersection of R6, non-standard evaluation, and reticulate, which I can only identify by a forked process hanging :/ It doesn't appear to be a future issue though.

@HenrikBengtsson
Owner

HenrikBengtsson commented Sep 25, 2017

Yeah, there are probably quite a few things that are really hard to parallelize in R because they're designed/implemented to rely on process-specific objects such as external pointers, mutable state, connections to other background processes, etc. For instance, connections cannot be serialized and exported to another process (#79). If you find other examples (e.g. in reticulate), I'd like to know, so that they can at least be documented. The basic test is whether it's possible to save an object to file (saveRDS()), close the R session, start a new one, read the object back in the new session (readRDS()), and then continue from there. If that works, the object can probably be used in parallel processing.
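That round-trip "litmus test" can be sketched in a few lines of base R (the file paths are illustrative temporaries):

```r
# Plain data survives a saveRDS()/readRDS() round trip faithfully,
# so it can likely be exported to a parallel worker:
tf <- tempfile(fileext = ".rds")
obj <- list(a = 1:3, b = "hello")
saveRDS(obj, tf)
identical(readRDS(tf), obj)   # TRUE

# Process-specific state does not survive: a connection object can be
# written out, but what comes back refers to a connection that no
# longer exists in the reading session, so using it fails.
con <- textConnection("some text")
tf2 <- tempfile(fileext = ".rds")
saveRDS(con, tf2)
close(con)
restored <- readRDS(tf2)
inherits(try(readLines(restored), silent = TRUE), "try-error")   # TRUE
```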

@goldingn
Author

Somewhat surprisingly, forked processes seem to call Python just fine, and can use a TensorFlow graph defined in the parent process (gist).

That's the core of the problem, so it should be feasible with greta!

@HenrikBengtsson
Owner

Forked processes are probably the most "forgiving" parallel setup. If you can get it to work also with external (PSOCK) processes / background R sessions (e.g. plan(multisession)), that would be ideal, but that requires serialization of objects. Such a solution might require quite a redesign, with some manual master-server setup.

@goldingn
Author

Cool thanks, I'll give it a try!

HenrikBengtsson added a commit that referenced this issue Sep 26, 2017
…t, for certain types of classes then inferred lengths of these objects were incorrect causing internal subset out-of-range issues [#164]
HenrikBengtsson added a commit that referenced this issue Sep 26, 2017
@HenrikBengtsson
Owner

This has now been fixed in the develop branch (with appropriate redundancy tests) and will be part of the next release. Thanks again for troubleshooting this so carefully.

@HenrikBengtsson
Owner

FYI, future 1.6.2, where this bug is fixed, is now on CRAN.
