Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getInputs(..., recursive = TRUE, envir = parent.frame()) #19

Open
wlandau-lilly opened this issue Jul 31, 2017 · 8 comments
Open

getInputs(..., recursive = TRUE, envir = parent.frame()) #19

wlandau-lilly opened this issue Jul 31, 2017 · 8 comments

Comments

@wlandau-lilly
Copy link
Contributor

wlandau-lilly commented Jul 31, 2017

It would be really handy to dig deeper for dependencies. I think it would be enough to limit recursion to the user's environment. Desired behavior:

f <- function(x) g(x)
g <- function(x) h(x)
h <- function(x) {
  k1(x) + k2(x) + my_var
}
my_var <- 1

# default envir is parent.frame()
# getInputs() would begin with force(envir)
getInputs(body(f), recursive = TRUE)
## An object of class "ScriptNodeInfo"
## Slot "files":
## character(0)
## 
## Slot "strings":
## character(0)
## 
## Slot "libraries":
## character(0)
## 
## Slot "inputs":
## [1] "x"    "my_var"
## 
## Slot "outputs":
## character(0)
## 
## Slot "updates":
## character(0)
## 
## Slot "functions":
##  g    h       k1     k2
## NA    NA      NA     NA
## 
## Slot "removes":
## character(0)
## 
## Slot "nsevalVars":
## character(0)
## 
## Slot "sideEffects":
## character(0)
## 
## Slot "code":
## g(x)
@wlandau-lilly
Copy link
Contributor Author

Oops, I forgot to write the actual desired result the first time. Please see the "inputs" and "functions" slots in the edited version.

@clarkfitzg
Copy link
Contributor

+1 This would be useful for me.

@duncantl
Copy link
Owner

duncantl commented Aug 8, 2017

Well, global variables such as my_var are far from ideal. So not best practice for writing code.
Recursive is easy - use codetools::findGlobals().
But what if there is a conditional dependency, e.g., if my_var is used only if the value of x > 10 in g()?

@wlandau-lilly
Copy link
Contributor Author

I agree that global non-function global variables are bad practice. Unfortunately, though, I have seen it in code.

Personally, I do not mind that some dependencies are conditional, and I would be okay if they mixed with unconditional dependencies. It would seem difficult to detect the distinction in static code analysis.

@clarkfitzg
Copy link
Contributor

In my mind there is a big distinction between static code analysis and looking at 'live' code. codetools::findGlobals() operates on closures, which are live objects that don't exist if we've only parsed the code. As I understand, CodeDepends is designed for static analysis.

@wlandau-lilly wrote:

I think it would be enough to limit recursion to the user's environment.

This could essentially be done statically by only recursing into functions defined in the ScriptNodeInfo objects representing the code under analysis.

Regarding conditional dependencies, the conservative thing is to explore all the branches (perhaps with the exclusion of a literal FALSE). CodeDepends already does this:

g <- function(x) {
    if(x > 10)
        h1(x)
    else
        h2(x)
}

> getInputs(body(g))@functions
 { if  > h1 h2
NA NA NA NA NA

Here is a recursive version based on codetools::findGlobals().

@clarkfitzg
Copy link
Contributor

It would also be useful to dig deeper for dependencies inside a package environment.

@gmbecker
Copy link
Collaborator

Hi, so apologies not weighing in on this at the time. I'm basically in complete agreement with @clarkfitzg here. Static analysis means not requiring the resolution of symbols (which can only be done completely correctly by loading all the correct packages and making all the relevant actual calls).

Now there are corner cases when we already break the static code analysis shouldn't even require an evaluation engine to be present rule. The one that comes to mind is the stuff I put in for calls to data(), in order to determine the outputs of such calls and thus get the dependency chain right we do have to actually perform the data calls. I don't think want to be in this business in the general case though.

That said, I also agree that as a new (quite major) feature, we could add reasoning about nesting of user/script defined functions. We could prototype something like this as a second pass with what we have now:

code = "f2 = function(a, b) {
    rnorm(a) * b
}
f = function(x, y, z) {
    f2(y, x) + x
}
"
scr = readScript(txt = code)
inp = getInputs(scr)

## now do a second pass for f
findFuncDef = function(fname, scrinfo ) {
    ind = which(sapply(scrinfo, function(x) fname %in% c(x@outputs, x@updates)))
   if(length(ind) == 0)
        stop("no function of that name found")
   else if (length(ind) > 1)
        ind = max(ind)
    scrinfo[[ind]]
}

f_el = findFuncDef("f", inp)

## remember the functions slot has a unintuitive/weird on the face of it format
## names(x@functions) is the function names, while x@functions is a logical indicating
## whether the function was defined "locally" ie in the script (TRUE) or not (e.g. package functions) (FALSE)
##
## this is why!
funcsval = f_el@functions
calledlocal = names(funcsval[!is.na(funcsval) & funcsval])

allinputs = f_el@inputs
allfuncs = f_el@functions

for(fi in calledlocal) {
    fi_el = findFuncDef(fi, inp)
   allinputs = c(allinputs, fi_el@inputs)
    allfuncs = c(allfuncs, fi_el@functions)
}

## clean up duplicates, in this case `{`
allfuncs = allfuncs[!duplicated(paste(allfuncs, names(allfuncs)))]

print(allinputs)
character(0)

print(allfuncs)
    +    f2     {     * rnorm 
FALSE  TRUE FALSE FALSE FALSE 

Now there are places where I'm pretty sure this going to get things wrong, ie situations which look like global variables but are actually defined in a higher (or is it lower, I can never remember) call frame so won't need to go all the way to the user environment to resolve.

## a is global in f4, ie an input, but not (fully) global inside a call to f3!
f4 = function(x) x + a 
f3 = function(x) {
   a = 5
   f4(x)
}

Also, we have a bit of a problem defining things as inputs during function definition. It's really the only thing we can do right, but they aren't actually inputs until the function is called, so them not existing when it is defined isn't the problem CodeDepends is likely to think it is when looking at expression dependency. We could get better at handling this but the way I can see for doing that off the top of my head would require a new piece of information, something like a funcBodyGlobals slot. We could then add these as inputs to the function call expression which would be more correct for dependency detection.

That said I have to admit this doesn't seem like a huge priority given what @duncantl pointed out in this thread initially: generally globals-using functions shouldn't occur and so perhaps surfacing such problems, rather than enabling them, is sufficient for now?

@wlandau
Copy link

wlandau commented Feb 22, 2018

Static analysis means not requiring the resolution of symbols (which can only be done completely correctly by loading all the correct packages and making all the relevant actual calls).

Sounds like a reasonable consequence of being static.

That said, I also agree that as a new (quite major) feature, we could add reasoning about nesting of user/script defined functions.

That would cover the overwhelming majority of my own use cases.

That said I have to admit this doesn't seem like a huge priority given what @duncantl pointed out in this thread initially: generally globals-using functions shouldn't occur and so perhaps surfacing such problems, rather than enabling them, is sufficient for now?

Absolutely. There is no rush on any of this, and just understanding the constraints of CodeDepends is extremely helpful. Thank you for such a thorough explanation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants