Convert from data.table to data.frame/matrix helper functions #5382

Open
dereckmezquita opened this issue May 12, 2022 · 1 comment

@dereckmezquita
Member

Could I offer some of these functions as helpers which would cut down on some of the verbosity of writing code which uses data.table?

I'll be using this data as an example dataset:

dt = iris

data.table::setDT(dt)
dt[, sample := paste(dt$Species, 1:nrow(dt), sep = " ")]

Setting a matrix by reference

I often work with data.frame and matrix objects. We currently have a setDF function but no "setMatrix"/"setMT" equivalent.

setMT = function(x, rownames = NULL) {
    return(as.matrix(setDF(x = x, rownames = rownames)))
}

setMT(dt, rownames = dt$sample)

to.X family of functions that move a column to rownames

Here I propose a family of functions which would allow one to convert to a certain class, data.frame or matrix, but move one of the columns to its rownames.

This is useful because I work with data.frames a lot when interacting with base R and other packages. Since data.table doesn't allow rownames, I have to keep this information as a column and then move it like this:

data.table::setDF(dt, rownames = dt$sample)

dt$sample = NULL

I propose to simplify this to a single function call which could move the column to the rownames of the resulting object.

Convert to a data.frame

to.data.frame = function(x, id.col = NULL, drop.id.col = TRUE, ...) {
    ans <- data.table::copy(x)

    if(!is.null(id.col)) {
        if(!id.col %in% colnames(ans)) {
            rlang::abort(stringr::str_interp('Column "${id.col}" not found.'))
        }

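        # Convert by reference, using the id column as the row names.
        data.table::setDF(ans, rownames = ans[[id.col]])

        if(drop.id.col) {
            # ans is a data.frame at this point; [[<- removes the column.
            ans[[id.col]] = NULL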
        }
    } else {
        data.table::setDF(ans)
    }

    return(ans)
}

Thus converting to a data.frame with rownames is simplified to:

to.data.frame(dt, id.col = "sample")

Convert to a matrix

to.matrix = function(x, id.col = NULL, drop.id.col = TRUE, ...) {
    return(as.matrix(to.data.frame(x, id.col = id.col, drop.id.col = drop.id.col)))
}
to.matrix(dt, id.col = "sample")
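
For comparison, data.table's existing as.matrix method already accepts a rownames argument that names the column to move into the row names. The sketch below is illustrative only: the `num` table is a hypothetical numeric-only variant of the example data, built so that the resulting matrix stays numeric.

```r
library(data.table)

# Hypothetical numeric-only variant of the example data (not from the post).
num <- as.data.table(iris[, 1:4])
num[, sample := paste(iris$Species, seq_len(.N))]

# as.matrix() on a data.table takes `rownames`: the named column becomes
# the row names and is left out of the matrix body.
m <- as.matrix(num, rownames = "sample")

dim(m)
head(rownames(m), 3)
```

This covers the matrix case without a new function; the data.frame case still needs the two-step setDF call shown earlier.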

sessionInfo()

sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: aarch64-apple-darwin21.3.0 (64-bit)
Running under: macOS Monterey 12.2.1

Matrix products: default
LAPACK: /opt/homebrew/Cellar/r/4.1.3/lib/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] datk_0.0.1

loaded via a namespace (and not attached):
 [1] ComplexHeatmap_2.10.0 compiler_4.1.3        pillar_1.7.0          RColorBrewer_1.1-3    iterators_1.0.14     
 [6] tools_4.1.3           digest_0.6.29         lifecycle_1.0.1       tibble_3.1.7          gtable_0.3.0         
[11] clue_0.3-60           pkgconfig_2.0.3       png_0.1-7             rlang_1.0.2           foreach_1.5.2        
[16] DBI_1.1.2             cli_3.3.0             microbenchmark_1.4.9  parallel_4.1.3        stringr_1.4.0        
[21] dplyr_1.0.9           cluster_2.1.3         generics_0.1.2        vctrs_0.4.1           GlobalOptions_0.1.2  
[26] S4Vectors_0.32.4      IRanges_2.28.0        tidyselect_1.1.2      stats4_4.1.3          grid_4.1.3           
[31] glue_1.6.2            data.table_1.14.2     R6_2.5.1              GetoptLong_1.0.5      fansi_1.0.3          
[36] purrr_0.3.4           ggplot2_3.3.6         magrittr_2.0.3        scales_1.2.0          codetools_0.2-18     
[41] matrixStats_0.62.0    ellipsis_0.3.2        BiocGenerics_0.40.0   assertthat_0.2.1      shape_1.4.6          
[46] circlize_0.4.15       colorspace_2.0-3      utf8_1.2.2            stringi_1.7.6         doParallel_1.0.17    
[51] munsell_0.5.0         crayon_1.5.1          rjson_0.2.21         
@jangorecki
Member

jangorecki commented May 13, 2022

Hi, thank you for the code and proposal.
Converting to or from a matrix by reference is not possible, so the set* prefix should be avoided. A data.frame is a collection of C arrays, where each column is a separate array. A matrix is a single C array whose attributes define its shape.
I think it is better to improve existing methods rather than adding new functions.
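
The difference in memory layout described above can be seen from base R. This is a minimal sketch with a made-up two-column data.frame; it only inspects the objects and does not show data.table internals.

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# A data.frame is a list holding one vector per column.
is.list(df)
length(unclass(df))

m <- as.matrix(df)

# A matrix is a single atomic vector whose "dim" attribute gives its shape.
is.atomic(m)
dim(m)
length(m)

# Without its attributes, the matrix is one flat vector in column-major order.
as.vector(m)
```

Because the columns of a data.frame live in separate vectors, turning them into a matrix means allocating one new vector and copying every column into it, which is why the conversion cannot happen by reference.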
