Skip to content

An R package to decouple gene sets and statistics of functional genomics tools

Notifications You must be signed in to change notification settings

PauBadiaM/decoupleR

 
 

Repository files navigation

decoupleR

Lifecycle: maturing BioC status BioC dev status R build status Codecov test coverage GitHub issues

Overview

Transcriptome profiling followed by differential gene expression analysis often leads to lists of genes that are hard to analyze and interpret. Downstream analysis tools can be used to summarize deregulation events into a smaller set of biologically interpretable features. In particular, methods that estimate the activity of transcription factors (TFs) from gene expression are commonly used. It has been shown that the transcriptional targets of a TF yield a much more robust estimation of the TF activity than observing the expression of the TF itself. Consequently, for the estimation of transcription factor activities, a network of transcriptional regulation is required in combination with a statistical algorithm that summarizes the expression of the target genes into a single activity score. Over the years, many different regulatory networks and statistical algorithms have been developed, mostly in a fixed combination of one network and one algorithm. To systematically evaluate both networks and algorithms, we developed decoupleR , an R package that allows users to apply efficiently any combination provided.

For more information about how this package has been used with real data, please check the following links:

Installation instructions

Get the latest stable R release from CRAN.

Then install decoupleR using from Bioconductor the following code:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("decoupleR")

# Check that you have a valid Bioconductor installation
BiocManager::valid()

Then install development version from GitHub with:

BiocManager::install("saezlab/decoupleR")

Usage

Load packaga and data

library(decoupleR)

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- file.path(inputs_dir, "input-expr_matrix.rds") %>%
    readRDS() %>%
    dplyr::glimpse()
#>  num [1:18490, 1:4] 3.251 0.283 -2.253 0.782 -4.575 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:18490] "A1BG" "A1CF" "A2M" "A2ML1" ...
#>   ..$ : chr [1:4] "GSM2753335" "GSM2753336" "GSM2753337" "GSM2753338"

network <- file.path(inputs_dir, "input-dorothea_genesets.rds") %>%
    readRDS() %>%
    dplyr::glimpse()
#> Rows: 151
#> Columns: 5
#> $ tf         <chr> "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO…
#> $ confidence <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
#> $ target     <chr> "BCL2L11", "BCL6", "CDKN1A", "CDKN1B", "G6PC", "GADD45A", "…
#> $ mor        <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ likelihood <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…

Decouple wrapper

decouple() allows access to all decoupleR available statistics in one place. Statistic functions inside decoupleR always return a tidy tibble that can be easily processed with the tools provide by the tidyverse ecosystem.

decouple(
    mat = mat,
    network = network,
    .source = "tf",
    .target = "target",
    statistics = c("gsva", "mean", "pscira", "scira", "viper", "ora"),
    args = list(
        gsva = list(verbose = FALSE),
        mean = list(.mor = "mor", .likelihood = "likelihood"),
        pscira = list(.mor = "mor"),
        scira = list(.mor = "mor"),
        viper = list(.mor = "mor", .likelihood = "likelihood", verbose = FALSE),
        ora = list()
    )
)
#> # A tibble: 140 x 11
#>    run_id statistic tf    condition    score p_value estimate conf.low conf.high
#>    <chr>  <chr>     <chr> <chr>        <dbl>   <dbl>    <dbl>    <dbl>     <dbl>
#>  1 1      gsva      FOXO4 GSM2753335 -0.380       NA       NA       NA        NA
#>  2 1      gsva      FOXO4 GSM2753336 -0.300       NA       NA       NA        NA
#>  3 1      gsva      FOXO4 GSM2753337  0.239       NA       NA       NA        NA
#>  4 1      gsva      FOXO4 GSM2753338  0.0907      NA       NA       NA        NA
#>  5 1      gsva      NFIC  GSM2753335 -0.0845      NA       NA       NA        NA
#>  6 1      gsva      NFIC  GSM2753336  0.0778      NA       NA       NA        NA
#>  7 1      gsva      NFIC  GSM2753337 -0.260       NA       NA       NA        NA
#>  8 1      gsva      NFIC  GSM2753338  0.281       NA       NA       NA        NA
#>  9 1      gsva      RFXAP GSM2753335 -0.810       NA       NA       NA        NA
#> 10 1      gsva      RFXAP GSM2753336 -0.472       NA       NA       NA        NA
#> # … with 130 more rows, and 2 more variables: method <chr>, alternative <chr>

Individual parts

In turn, we recognize that the use of individual statistics may be of interest. Therefore, these are also exported and ready for use. All statistics follow the same design pattern and arguments, so moving between statistics could be very comfortable.

# viper call is equivalent to the one made by decouple() above.
run_viper(
    mat = mat,
    network = network,
    .source = "tf",
    .target = "target",
    .likelihood = "likelihood",
    verbose = FALSE
)
#> # A tibble: 20 x 4
#>    statistic tf     condition    score
#>    <chr>     <chr>  <chr>        <dbl>
#>  1 viper     FOXO4  GSM2753335  1.34  
#>  2 viper     FOXO4  GSM2753336  1.18  
#>  3 viper     FOXO4  GSM2753337  1.44  
#>  4 viper     FOXO4  GSM2753338  1.21  
#>  5 viper     NFIC   GSM2753335  0.0696
#>  6 viper     NFIC   GSM2753336 -0.0265
#>  7 viper     NFIC   GSM2753337 -0.516 
#>  8 viper     NFIC   GSM2753338 -0.543 
#>  9 viper     RFXAP  GSM2753335  0.488 
#> 10 viper     RFXAP  GSM2753336  1.32  
#> 11 viper     RFXAP  GSM2753337  1.93  
#> 12 viper     RFXAP  GSM2753338  1.93  
#> 13 viper     SMAD3  GSM2753335  0.176 
#> 14 viper     SMAD3  GSM2753336  0.0426
#> 15 viper     SMAD3  GSM2753337  0.219 
#> 16 viper     SMAD3  GSM2753338  0.142 
#> 17 viper     TFAP2A GSM2753335  0.722 
#> 18 viper     TFAP2A GSM2753336  0.582 
#> 19 viper     TFAP2A GSM2753337  0.462 
#> 20 viper     TFAP2A GSM2753338  0.330

Contributing to decoupleR

Are you interested in adding a new statistical method or collaborating in the development of internal tools that allow the extension of the package? Please check out our contribution guide.


Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

About

An R package to decouple gene sets and statistics of functional genomics tools

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • R 100.0%