retain attributes while chaining #995

Closed
jangorecki opened this issue Jan 1, 2015 · 10 comments · Fixed by #3250

Comments

@jangorecki
Member

Question / Feature request.
Is there any trick to retain attributes when chaining? Of course, not all of them should be retained ("index", "sorted", "names", etc.).

library(data.table)
DT <- data.table(a = rep(1:5,2), b = rnorm(10))
setattr(DT,"my_custom_attr","attr_value")
str(DT)

# I would like to reuse attribute while chaining and also after chaining
DT[,attr(.SD,"my_custom_attr")] # NULL
attr(DT[,lapply(.SD,sum),by="a"],"my_custom_attr") # NULL

So the feature request would be to retain all user-defined attributes (those not used by data.table itself, such as "sorted", "index", etc.).
Or maybe retain only attributes listed by name, something like

options("datatable.retain.attr"=c("my_custom_attr","my_custom_attr2"))

This would make it possible to store custom metadata together with a data.table, reuse it while processing, or manipulate it when using DT[,f(.SD)].
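Until something like this exists in data.table, the name-based idea can be approximated with a small wrapper. The following is only a sketch: `carry_attrs` is a hypothetical helper, and the option "datatable.retain.attr" is the one proposed in this request, which data.table itself does not read.

```r
library(data.table)

# Hypothetical helper, not part of data.table. It reads the attribute names
# from the option proposed above (data.table itself ignores this option)
# and copies them from `from` onto `to` by reference with setattr().
carry_attrs <- function(from, to, keep = getOption("datatable.retain.attr")) {
  for (nm in keep) {
    val <- attr(from, nm, exact = TRUE)
    # Skip names that are not set on the source table
    if (!is.null(val)) setattr(to, nm, val)
  }
  to
}

options("datatable.retain.attr" = c("my_custom_attr", "my_custom_attr2"))

DT <- data.table(a = rep(1:5, 2), b = rnorm(10))
setattr(DT, "my_custom_attr", "attr_value")

# Re-attach the listed attributes to the result of one chaining step
res <- carry_attrs(DT, DT[, lapply(.SD, sum), by = "a"])
attr(res, "my_custom_attr")
# "attr_value"
```

The wrapper has to be applied after every step of a chain, which is exactly the inconvenience this request is about.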

@arunsrinivasan
Member

Have marked as FR for now.

@assaron

assaron commented Aug 13, 2016

Guys, are there any problems with a solution like the following?

First, adding a function that will copy required attributes:

retainAttributes <- function(x, ans) {
    if ("retain.attributes" %in% names(attributes(x))) {
        for (a in c(attr(x, "retain.attributes"), "retain.attributes")) {
            attr(ans, a) <- attr(x, a)
        }
    }
    ans
}

And then using it as a wrapper where necessary in [.data.table:

diff --git a/R/data.table.R b/R/data.table.R
index d18dd3f..dda4985 100644
--- a/R/data.table.R
+++ b/R/data.table.R
@@ -1343,7 +1343,7 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
             setattr(ans, "class", class(x)) # fix for #5296
             setattr(ans, "row.names", .set_row_names(nrow(ans)))

-            if (!with || missing(j)) return(alloc.col(ans))
+            if (!with || missing(j)) return(alloc.col(retainAttributes(x, ans)))

             SDenv$.SDall = ans
             SDenv$.SD = if (!length(othervars)) SDenv$.SDall else shallow(SDenv$.SDall, setdiff(ansvars, othervars))

With these, the following scenario works:

> t <- data.table(a=1:2, b=3:4)
> attr(t, "asd") <- "qwe"
> attr(t, "retain.attributes") <- "asd"
> attributes(t)
$names
[1] "a" "b"

$row.names
[1] 1 2

$class
[1] "data.table" "data.frame"

$.internal.selfref
<pointer: 0x2cb5018>

$asd
[1] "qwe"

$retain.attributes
[1] "asd"

> attributes(t[1, ])
$names
[1] "a" "b"

$class
[1] "data.table" "data.frame"

$row.names
[1] 1

$asd
[1] "qwe"

$retain.attributes
[1] "asd"

$.internal.selfref
<pointer: 0x2cb5018>

I can't provide a full patch, as [.data.table is long and full of returns... I'm not 100% sure which ones should be modified.

@jangorecki
Member Author

jangorecki commented Aug 13, 2016

Interesting idea, but attr<- isn't the best choice. It makes an in-memory copy; we use setattr in such cases.

d1=data.table::data.table(a=1)
data.table::address(d1)
attr(d1, "asd") <- "qwe"
data.table::address(d1)

@assaron

assaron commented Aug 13, 2016

@jangorecki Could you explain a little bit more? I have the same address printed there:

> d1=data.table::data.table(a=1)
> data.table::address(d1)
[1] "0x6b4f910"
> attr(d1, "asd") <- "qwe"
> data.table::address(d1)
[1] "0x6b4f910"

What gets copied there?

@assaron

assaron commented Aug 13, 2016

But it's just for my education, I have no problems with changing attr<- to setattr, or whatever else works best.
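For reference, a variant of retainAttributes that uses setattr instead of attr<- could look like the sketch below. The name `retainAttributesByRef` is hypothetical, and `sub` is only a stand-in for an intermediate result inside [.data.table; this is not the code that was eventually merged.

```r
library(data.table)

# Hypothetical by-reference variant of retainAttributes(): setattr() sets
# each attribute on `ans` in place instead of going through `attr<-`.
retainAttributesByRef <- function(x, ans) {
  keep <- attr(x, "retain.attributes", exact = TRUE)
  if (!is.null(keep)) {
    for (a in c(keep, "retain.attributes")) {
      setattr(ans, a, attr(x, a, exact = TRUE))
    }
  }
  ans
}

t1 <- data.table(a = 1:2, b = 3:4)
setattr(t1, "asd", "qwe")
setattr(t1, "retain.attributes", "asd")

sub <- data.table(a = 1L, b = 3L)  # stand-in for an intermediate result
sub <- retainAttributesByRef(t1, sub)
attr(sub, "asd")
# "qwe"
```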

@jangorecki
Member Author

That is strange: when I ran this before I got a different address, and now I'm getting the same one. I must have done something wrong then.

@changcw83

I want this feature very much. Is it not available yet?

@dselivanov

+1 for this feature. Any concerns about the solution above?

@manuelbickel

+1 for this feature also from my side.

I am not an expert in data.table or even in programming, so I currently use a clumsy function that stores custom attributes in a separate data.table called DT.attributes (where DT is the name of the table, e.g. myDT.attributes) in .GlobalEnv, using eval(parse()) expressions. This makes accessing/updating these attributes similar to using attr. The code does not touch the source code of data.table, so the object it creates is unaffected by chaining and can always be accessed by its name. The clear drawback is that the function messes around in .GlobalEnv, which is not a very good idea. I know my solution is clumsy rather than clean, but I thought it might still be helpful to somebody and wanted to share it here. Please do not hang me for this attempt...

library(data.table)

attr_DT <- function(DT, j, value = NULL) {
  if (is.null(value)) {
    call_get_attr <- paste0(substitute(DT), ".attributes$", j, "[[1]]")
    return(eval(parse(text = call_get_attr)))
  } else {
    DT.attributes <- paste0(substitute(DT), ".attributes")
    call_make_attr <- paste0(DT.attributes, "[,", j,":= list(list(value))]")
    if (!exists(DT.attributes)) {
      #init
      assign(DT.attributes, data.table(init = integer(1L)),envir = .GlobalEnv)
      eval(parse(text =  call_make_attr))
      #delete init
      eval(parse(text = paste0(DT.attributes, "[,init:= NULL]")))
    } else {
      eval(parse(text =  call_make_attr))
    }
  }
}

myDT <- data.table(x = c(1:3), y = (letters[1:3]))
my_attr1 <- data.table(A1 = c(9:10), A2 = c(letters[9:10]))
my_attr2 <- "attribute2"

attr_DT(myDT, j = "name_myattr1", value = my_attr1)
attr_DT(myDT, j = "name_myattr2", value = my_attr2)
myDT.attributes
#    name_myattr1 name_myattr2
# 1: <data.table>   attribute2
attr_DT(myDT, j = "name_myattr1")
#    A1 A2
# 1:  9  i
# 2: 10  j 
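One way to keep the same "side storage" idea without eval(parse()) and without creating one object per table in .GlobalEnv would be a single private environment keyed by table name. This is only a sketch under that assumption: `.attr_store`, `set_side_attr` and `get_side_attr` are hypothetical names, and the key is a string the caller chooses rather than something derived from the table itself.

```r
library(data.table)

# Hypothetical side store: one environment holding a named list per table key
.attr_store <- new.env(parent = emptyenv())

# Store `value` under `name` for the table identified by the string `key`
set_side_attr <- function(key, name, value) {
  slot <- if (exists(key, envir = .attr_store, inherits = FALSE)) {
    get(key, envir = .attr_store)
  } else {
    list()
  }
  slot[[name]] <- value
  assign(key, slot, envir = .attr_store)
  invisible(value)
}

# Look up `name` for `key`; returns NULL when nothing has been stored
get_side_attr <- function(key, name) {
  if (!exists(key, envir = .attr_store, inherits = FALSE)) return(NULL)
  get(key, envir = .attr_store)[[name]]
}

my_attr1 <- data.table(A1 = 9:10, A2 = letters[9:10])
set_side_attr("myDT", "name_myattr1", my_attr1)
set_side_attr("myDT", "name_myattr2", "attribute2")
get_side_attr("myDT", "name_myattr2")
# "attribute2"
```

As with the function above, the stored values are tied to a name and not to the data.table object, so they survive chaining but do not follow copies or renamed tables.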
