Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow definition of collector_fun #13

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Allow definition of collector_fun #13

wants to merge 3 commits into from

Conversation

krlmlr
Copy link
Contributor

@krlmlr krlmlr commented Apr 22, 2024

This started as an attempt to filter the results of the collection before saving them to disk. Unclear if this is needed, but it does no harm.

The two subsequent commits add value. With that, a collection that previously took 40 GB now takes under 200 MB. Unfortunately, these commits can't be applied directly to the main branch.

@krlmlr krlmlr changed the title f collector fun Allow definition of collector_fun Apr 22, 2024
@krlmlr krlmlr changed the title Allow definition of collector_fun Allow definition of collector_fun Apr 22, 2024
@moodymudskipper
Copy link
Collaborator

A higher granularity variant would be to be able to provide for every patched functions the SE args that we should force. This specification can be generated automatically by a helper with a rule that every 1st arg is SE, then we can tweak it if desired.
Note that we can technically walk up to check what args were ultimately evaled but it doesn't mean we can force them, because we sometimes need the promise first, and eval later. Unfortunately I don't think there's a way to know if a promise was captured before evaluation.

Copy link
Collaborator

@moodymudskipper moodymudskipper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

globals$collecting is great
collector_fun is good but needs to be documented
forcing the first arg seems brittle to me, I think I'd like to at least opt in, or have a more elegant system.

funs = NULL,
pkg = NULL,
path = "collector",
collector_fun = NULL) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documentation is missing for new arg

@@ -52,8 +57,16 @@ set_collector <- function(funs = NULL, pkg = NULL, path = "collector") {
# Replace bodies a call to collect_and_run(), keep original call as attr
for (nm in names(funs)) {
val <- ns[[nm]]
if (length(formals(val)) > 0 && names(formals(val))[[1]] != "...") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this force every first argument ? I can think of a few functions with a NSE 1st arg, like quote, substitute, ~, expr, quo, unquo, inject.
This is also not "true" collection because we lose the logic on how the data was built.
I get that it is useful but do we need an argument to switch this on ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the results aren't great here either. We need to find a better way to deal with nested calls. The next step would be to create a reprex to better understand the problem.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By nested call do you mean that with something like :

df |>
  mutate(...) |>
  select(...)

We'd like to collect both calls ? if so globals$collecting is indeed preventing it, the older behavior wasn't.
Since we don't collect when a function is called from its namespace (or a child of its namespace) the globals$collecting is really doing just that as far as I understand.

I can propose a force argument, a named list where names are functions and values a vector of argument names.
This list could be pre-computed from static analysis, and then fixed manually.

- Clean srcref attribute from call

- Record sys.calls()

- Fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants