Remove source reference before digesting #200
Comments
Possibly via an option or different entry point, or a wrapper around As you know,
> a <- function(){}
> digest::digest(a)
[1] "b6ce3bd7555610d97bf1b73331c4e5d8"
> digest::digest(a)
[1] "b6ce3bd7555610d97bf1b73331c4e5d8"
> digest::digest(a)
[1] "b6ce3bd7555610d97bf1b73331c4e5d8"
> digest::digest(a)
[1] "b6ce3bd7555610d97bf1b73331c4e5d8"
> |
How about this?
> x <- quote({print(a)})
> digest::digest(x)
[1] "1d39e309d1c74c5f775ba93fb31945fc"
>
> y <- quote({print( a )})
> digest::digest(y)
[1] "cfff128b7af1f7d8e4a96a017c666951"
>
> print(x)
{
print(a)
}
> print(y)
{
print(a)
}
>
> identical(x, y)
[1] FALSE
> identical(removeSource(x), removeSource(y))
[1] TRUE
I think R keeps source references in language objects and functions. For a simple language or function, I can use |
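The comparison above can be reproduced outside an interactive session. The following is a minimal sketch in base R, assuming a recent R version in which `utils::removeSource()` accepts language objects as well as functions. It passes `keep.source = TRUE` to `parse()` explicitly, so the result should not depend on the session's `keep.source` option.

```r
# Parse the same block twice; only the whitespace differs.
# keep.source = TRUE is passed explicitly so the parser records
# source references regardless of the session's options.
x <- parse(text = "{print(a)}", keep.source = TRUE)[[1]]
y <- parse(text = "{print(  a  )}", keep.source = TRUE)[[1]]

# The parser attaches source-reference attributes to the `{` call.
names(attributes(x))

# Strip the source references from both calls.
x_clean <- utils::removeSource(x)
y_clean <- utils::removeSource(y)

# With the attributes gone, only the call structure is compared.
identical(x_clean, y_clean)
```

The cleaned objects carry only the call structure, which is what a whitespace-insensitive hash would need as input.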
Sure but I still do not see this as a problem with the scope and use of The R language is clear about functions and enclosing environments. If you want a 'stripped down' Or maybe I am still missing what you are trying to explain to me... |
I believe when users use > options(keep.source = TRUE)
> a <- function(){}
> digest::digest(a)
[1] "b6ce3bd7555610d97bf1b73331c4e5d8"
> a <- function(){}
> digest::digest(a)
[1] "b6ce3bd7555610d97bf1b73331c4e5d8"
> a <- function(){}
> digest::digest(a)
[1] "7e8ebbd63410116d67e85ebdbd01bec0"
> b <- function(){}
> digest::digest(b)
[1] "00ddb1241c5b90837b64e8656c26a540"
> options(keep.source = FALSE)
> a <- function(){}
> digest::digest(a)
[1] "c82d22ecfd038cae8f95bb2099fe84c7"
> a <- function(){}
> digest::digest(a)
[1] "c82d22ecfd038cae8f95bb2099fe84c7"
> b <- function(){}
> digest::digest(b)
[1] "c82d22ecfd038cae8f95bb2099fe84c7" All If I turn off |
I still see nothing wrong R or What you show seems like a perfect illustration of an available change that can be opted into locally via an existing variable. So I think we are good here. |
PS I could add to the help page that setting the options removes white space. That may be useful for some. |
Adding a help page would be helpful. PS: I was not asking to change the default behavior of a 20+ year package, instead I was just asking if we could add this option to remove |
Gotcha. That is better. And yes, given that In fact, sometimes attributes get added (I add a query status when returning |
That makes sense. I see in
if (serialize && !file) {
    if (!is_streaming_algo) {
        object <- if (.hasNoSharing())
            serialize(object, connection = NULL, ascii = ascii,
                      nosharing = TRUE, version = serializeVersion)
        else serialize(object, connection = NULL, ascii = ascii,
                       version = serializeVersion)
    }
    if (is.character(skip) && skip == "auto")
        skip <- set_skip(object, ascii)
}
It's calling Ideally it could be |
Isn't what you want something like (pseudo-code, not tested)
if (stripAttrArgumentTrue)
    obj <- stripAttribute(obj)
# carry on as before
i.e. by altering the object to what you wish it were (e.g. no source ref) you get your behaviour? No need to involve R and create a different, versioned, dependency methinks. |
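One way to read the suggestion above is as a small user-side wrapper. The sketch below is an assumption-laden illustration, not part of the digest package: `digest_nosrc` is a hypothetical name, and it relies on `utils::removeSource()` handling both closures and language objects, which holds in recent R versions. It only removes source references; it does not address the separate effect reported later in this thread, where the digest of a function changes after the function has been called.

```r
library(digest)

# Hypothetical helper (not part of digest): strip source references from
# closures and language objects, then hash the result as usual.
digest_nosrc <- function(object, ...) {
  strippable <- (is.function(object) && !is.primitive(object)) ||
    is.language(object)
  if (strippable) object <- utils::removeSource(object)
  digest::digest(object, ...)
}

# Two definitions that differ only in whitespace, parsed with source
# references kept so that their srcref attributes differ.
f1 <- eval(parse(text = "function() { print(1) }", keep.source = TRUE))
f2 <- eval(parse(text = "function() {   print(1)   }", keep.source = TRUE))

# Compare the hashes of the stripped functions.
same_hash <- identical(digest_nosrc(f1), digest_nosrc(f2))
same_hash
```

Keeping the stripping outside `digest()` leaves the package's default behaviour untouched, which matches the preference expressed in the thread for altering the object before hashing it.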
I'm implementing the PR and encountered something that is beyond my knowledge. May I ask for your insight? @eddelbuettel The first case is as below:
> library(digest)
>
> options(keep.source = FALSE)
> a <- function(){}
> digest(a)
[1] "c82d22ecfd038cae8f95bb2099fe84c7"
>
> a()
NULL
> digest(a)
[1] "86058e413c7f8b2d6cef3d5a019486f7"
>
> a()
NULL
> digest(a)
[1] "48717494d87c56e418ce1287ad31ec06"
>
> a()
NULL
> digest(a)
[1] "48717494d87c56e418ce1287ad31ec06"
The second case is
> b <- removeSource(a)
> identical(a, b)
[1] TRUE
> digest(b)
[1] "c82d22ecfd038cae8f95bb2099fe84c7"
> I think this has something to do with |
As was touched upon previously, both here and on the mailing list where you chose to also post, what I have learned over the years is that it is much preferable to first discuss what a PR is supposed to do before handing it over. It may save you disappointment if the PR ends up being declined. Last we spoke here, I believe my recommendation to you was to alter the object before handing it to |
See below. Note that the > options(keep.source = FALSE)
> a <- function(){}
> serialize(a, NULL, ascii=TRUE)
[1] 41 0a 33 0a 32 36 32 39 31 34 0a 31 39 37 38 38 38 0a 35 0a 55 54 46 2d 38 0a 31 30 32 37 0a 32 35 33 0a 32 35 34 0a 36 0a 31 0a 32 36 32 31 35 33 0a 31 0a 7b 0a 32 35 34 0a
> a()
NULL
> serialize(a, NULL, ascii=TRUE)
[1] 41 0a 33 0a 32 36 32 39 31 34 0a 31 39 37 38 38 38 0a 35 0a 55 54 46 2d 38 0a 32 36 33 31 37 31 0a 32 35 33 0a 32 35 34 0a 36 0a 31 0a 32 36 32 31 35 33 0a 31 0a 7b 0a 32 35 34 0a
> a()
NULL
> serialize(a, NULL, ascii=TRUE)
[1] 41 0a 33 0a 32 36 32 39 31 34 0a 31 39 37 38 38 38 0a 35 0a 55 54 46 2d 38 0a 31 30 32 37 0a 32 35 33 0a 32 35 34 0a 32 31 0a 31 0a 31 33 0a 33 0a 31 32 0a 31 37 0a 31 0a 33 0a 36 0a 32 35 34 0a
[66] 30 0a 31 0a 32 36 32 31 35 33 0a 31 0a 7b 0a 30 0a 32 35 34 0a 30 0a 32 35 34 0a 31 33 0a 37 38 31 0a 33 0a 4e 41 0a 31 0a 31 0a 31 30 32 36 0a 31 0a 32 36 32 31 35 33 0a 35 0a 63 6c 61 73 73 0a
[131] 31 36 0a 31 0a 32 36 32 31 35 33 0a 31 36 0a 65 78 70 72 65 73 73 69 6f 6e 73 49 6e 64 65 78 0a 32 35 34 0a
> a()
NULL
> serialize(a, NULL, ascii=TRUE)
[1] 41 0a 33 0a 32 36 32 39 31 34 0a 31 39 37 38 38 38 0a 35 0a 55 54 46 2d 38 0a 31 30 32 37 0a 32 35 33 0a 32 35 34 0a 32 31 0a 31 0a 31 33 0a 33 0a 31 32 0a 31 37 0a 31 0a 33 0a 36 0a 32 35 34 0a
[66] 30 0a 31 0a 32 36 32 31 35 33 0a 31 0a 7b 0a 30 0a 32 35 34 0a 30 0a 32 35 34 0a 31 33 0a 37 38 31 0a 33 0a 4e 41 0a 31 0a 31 0a 31 30 32 36 0a 31 0a 32 36 32 31 35 33 0a 35 0a 63 6c 61 73 73 0a
[131] 31 36 0a 31 0a 32 36 32 31 35 33 0a 31 36 0a 65 78 70 72 65 73 73 69 6f 6e 73 49 6e 64 65 78 0a 32 35 34 0a
> There is apparently a difference to R between calling |
Haha it's good to know the scope before going too far. And I really appreciate that you could answer me : ) After digging into the R internal source code, I found that
case CLOSXP:
    ...
    flags = PackFlags(TYPEOF(s), LEVELS(s), OBJECT(s),
                      hasattr, hastag);
    OutInteger(stream, flags);
I sent an email to the r-devel mailing list and they told me that |
It can be annoying when it gets to this detail :-/ Can you briefly describe your use case again and why you seem to need source references? And yes, using |
Currently pipeline workflow packages such as
The framework works under the assumption:
This assumption works great on atomic results. However, when the results are (or contain) functions/expressions, the digest results become unstable. It gives us lots of false positives: identical objects may still result in different digests.
Initially I thought it was because of the source reference. Since R language objects and functions preserve the original code that generated them, the source references will differ depending on how you run the code. Indeed after running
However, it seems the source reference is not the only cause that breaks the assumption. Even for the same function/expression (with the same memory address), the digest changes once it is evaluated.
Also the reason I posted this issue was because I thought if users use
Please feel free to close this issue. I think the cause is |
Sure. As this is however what |
Thanks! I'm so grateful that you could help me on this. I am also attaching the reply from Tomas (r-devel mailing list) just for future reference.
Unfortunately
> memF <- memoise::memoise(function(f){ f() })
> a <- function(){
+ message("a is evaluated")
+ }
> memF(a)
a is evaluated
> memF(a)
a is evaluated
> memF(a)
a is evaluated
> memF(a)
> memF(a)
> a <- function(){}
> rlang::hash(a)
[1] "badd918f7a8d088de7ce5c4e817d8dd2"
> a <- function(){}
> rlang::hash(a)
[1] "e2a98ccbe019303395180640cf6959b7"
> a()
NULL
> rlang::hash(a)
[1] "f6563a8b0972d9d0ddd2e3fc1d091ef6" |
( I am a reader of r-devel and have been for decades so no need to copy from there. |
(And of course I'd wish this were simpler! |
Function and language objects usually have the attribute srcref, which changes when users choose to run the code differently. For example
Changes every time when I use command/ctrl+shift+return/enter in RStudio. If you select the script and command+return, the results could also change even if you add blank lines in the selection.
I wonder if it's possible to remove srcref when digesting objects?