Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find functions passed as arguments #13

Open
rjake opened this issue Jul 28, 2022 · 5 comments
Open

Find functions passed as arguments #13

rjake opened this issue Jul 28, 2022 · 5 comments

Comments

@rjake
Copy link

rjake commented Jul 28, 2022

This is the use case I mentioned at the RStudio conference. I'd like to find functions called as arguments in apply functions or in something like switch()

file_lines <- "
  sapply(mtcars, min)

  agg <- function(x, fun) {
    fn <- 
      switch(
        fun, 
        avg = mean,
        total = sum
      )
    
    fn(x, na.rm = TRUE)
  }

agg(1:3, 'total')
"

file_output <- tempfile(fileext = ".R")
writeLines(file_lines, file_output)
# A tibble: 4 x 2
  funs   pkgs     
  <chr>  <chr>    
1 sapply base     
2 switch base     
3 fn     (unknown)
4 agg    (unknown)

I am expecting to see min, mean and sum in the mix

@brshallo
Copy link
Owner

brshallo commented Aug 1, 2022

Ahh, yes, I understand now better. Correct, currently it will not identify any functions passed in through apply(), map(), or anything that's passed-in as an argument.

I'm not sure the way to have it capture this, but could look down a few potential paths...

  • Review the output after parsing and see if there is something there that would be easier to filter to these and have an option to keep them...
  • Alternatively have an argument where you can pass in functions or sets of functions and keep the arguments inputted, e.g. c("map*()", "*apply()") and then have some custom parsers to extract the .forFUN` values inputted to these

Feel free to open a PR or give a suggestion if you have an idea on how to handle.
*

@rjake
Copy link
Author

rjake commented Aug 2, 2022

I did some exploring and the functions get flagged as "symbol" when called this way. I tried a few approaches but nothing was robust and I wanted to jot down what I learned.

Altogether this was my best attempt
library(tidyverse)

file_lines <- "
  sapply(mtcars, min)
  min(1:10) -> a
  b = base::min(1:10)
  d <- plot <- print <- NULL
  agg <- function(x, fun) {
    fn <- 
      switch(
        fun, 
        avg = mean,
        total = sum
      )
    
    fn(x, na.rm = TRUE)
  }

  agg(1:3, 'total')
"

exprs <- parse(text = file_lines)

parsed_df <- # standardize call, must be a better way to do this
	exprs |> 
	as.character() |> 
	parse(text = _) |> 
	utils::getParseData(includeText = TRUE) |> 
	as_tibble() |> 
	print()
# github added all the 4-space tabs, I'm a 2-space guy 🚀 

parsed_df |> 
	mutate(expr = cumsum(parent == 0)) |> 
	# not sure this fully drops formals
	mutate(
		is_formal = token %in% c("SYMBOL_FORMALS"),
		is_new = lead(token, 2) %in% c("EQ_ASSIGN", "LEFT_ASSIGN")
	) |> 
	filter(token %in% c("SYMBOL_FUNCTION_CALL", "SYMBOL")) |> 
	# not sure this is robust to drop new items used later
	group_by(text) |> 
	filter(cumsum(is_new) == 0) |> 
	ungroup() |> 
	# shorten
	distinct(text) |> 
	# figure out what things are
	mutate(
		from = map_chr(
			text, 
			~find(.x) |> pluck(1) |> toString()
		),
		is = map_chr(
			text, 
			~get0(.x) |> class() |> toString()
		)
	) |> 
	filter(is == "function" & from != ".GlobalEnv")

#   text   from         is      
#   <chr>  <chr>        <chr>   
# 1 sapply package:base function
# 2 min    package:base function
# 3 switch package:base function
# 4 mean   package:base function
# 5 sum    package:base function
All the caveats & info I found

We could find all symbol objects in addition to symbol_function_call objects then inspect them. This casts a very wide net and I think requires the libraries to be attached to know where everything is coming from.

funspotr/R/spot-funs.R

Lines 11 to 17 in e529b00

list_functions_in_file <- function(file_path, show_each_use = FALSE){
tmp <- utils::getParseData(parse(file_path, keep.source = TRUE))
nms <- tmp$text[which(tmp$token == "SYMBOL_FUNCTION_CALL")]
nms_unique <- unique(nms)
# only do `find()` once for each function, even if `show_each_use = TRUE`
funs <- sapply(nms_unique, utils::find)

purrr::map_chr(
  c("min", "mtcars"), 
  ~get0(.x) |> class()
)
# [1] "function"   "data.frame"

Another problem with using the method above is formals being in the body of a function coming through as symbols. They start as symbol_formals in getParseData() but then become symbol when called later. globals::findGlobals() is good for this use case but it doesn't catch that agg was made in step 2

exprs <- parse(text = file_lines)
map(exprs, globals::findGlobals)
# [[1]]
# [1] "sapply" "mtcars" "min"   
# 
# [[2]]
# [1] "<-"     "{"      "switch" "mean"   "sum"   
# 
# [[3]]
# [1] "agg" ":" 

Another gotcha is an assignment like 1:3 -> min is hard to follow in getParseData() but when treated as an expression, it puts it in the right order

parse(text = "1:3 -> min") |> as.character()
# [1] "min <- 1:3"

For this you could do something weird like below to standardize the call then use getParseData(includeText = TRUE) This argument will mark the expression call as parent = 0 when they occur in the global environment. You can then use cumsum() to find each new time that occurs.

parse(text = "1 -> min; a = 23") |>
  as.character() |> 
  parse(text = _) |> 
  getParseData(includeText = TRUE) |> 
  mutate(expr = cumsum(parent == 0))

I liked the FUN or .f idea but there could be cases where more than one argument is passed (like passing fct_reorder that then uses sum) and there are other ways it gets called .funs .fun, etc.

There is a lot of info here re: Abstract Syntax Trees (AST).

Some of these use cases are pretty niche maybe there is some low-hanging fruit in the mix? I'll try to take a closer look later.

@brshallo
Copy link
Owner

This is great, thanks for putting together. A few questions / comments:


Do you need the whole exprs |> as.character |> parse ... or would it just work using utils::getParseData() directly. I.e.

file_lines <- "
  sapply(mtcars, min)
  min(1:10) -> a
  b = base::min(1:10)
  d <- plot <- print <- NULL
  agg <- function(x, fun) {
    fn <- 
      switch(
        fun, 
        avg = mean,
        total = sum
      )
    
    fn(x, na.rm = TRUE)
  }

  agg(1:3, 'total')
"

file_output <- tempfile(fileext = ".R")
writeLines(file_lines, file_output)
parsed_df <- utils::getParseData(parse(file_output, keep.source = TRUE)

Currently in funspotr I just do the latter and don't see why you'd need the extra set-up in this case even for your later filtering steps...?


I like your use of get0(.x) |> class() though don't see why you'd need toString(), does class() not always return a string?


One thing this makes me think more about is how I want to handle functions that aren't installed locally. Currently if you have madeupfun(x) in your script "madeupfun" will be returned in the funspotr output as a function and then the package it comes from will just show-up as "(unknown)" -- in the case of apply(x, madeupfun) though "madeupfun" would almost certainly have to be dropped (as no easy way to differentiate it from x). This asynchrony is probably OK (a separate issue that should open, but am wondering if should set a warning in the case of "(unknown)" functions or an option to drop these from output).


(Also should consider how much computational overhead get0() adds... though probably not a ton compared to issues with my current method for loading/attaching packages for each file #14 .)

@rjake
Copy link
Author

rjake commented Aug 11, 2022

Do you need the whole exprs |> as.character |> parse ... or would it just work using

Even though it is right-assigned min(1:10) -> a, R interprets it as a <- min(1:10) and using as.character() |> parse() will put it in the right order when I then pass it to getParseData(). I'm not really sure how to interact with the id & parent columns. Maybe there is a simpler way to do this.


I like your use of get0(.x) |> class() though don't see why you'd need toString(), does class() not always return a string?

Ah, you're right, I don't need it for class(). I was trying to deal with null values in find() though and it looks like ~find(.x)[1] works just as well.

@brshallo
Copy link
Owner

@rjake did not get to this before pushing to CRAN.

Given the ambiguity here, one approach I was considering was making it so you could have an argument that allowed you to specify an approach. So say you want to do something like your approach above, you could pass this in as a function into a parser argument (or something like that). funspotr's value-add in this situation would be setting-up the search space but then it gives the user flexibility to identify the symbols to check against the search space. Perhaps funspotr would have some in-built approaches available but it also keeps me from actually having to set this up (or worrying about the ambiguities or partial solutions that may be "good enough" for someone's use-case).

Actually setting this up though would require me to untangle a few steps in the current approach (that I may or may not get to).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants