Network-based Gene Set Analysis
This package carries out Network-based Gene Set Analysis by incorporating external information about interactions among genes, as well as novel interactions learned from data.
You can install it directly from GitHub through devtools
:
library(devtools)
devtools::install_github("mikehellstern/netgsa", build_vignettes=T)
The most recent implementation has optimized the NetGSA computation in the following aspects:
- Variance component estimation: Residuals are needed to estimate the variance components. This is done directly without evaluating the fixed effect coefficients.
- Contrast vector/matrix: This is done more efficiently by leveraging the
lapply
function; products of the contrast vectors are first computed and reused to calculate the degrees of freedom and test statistics. - In the main function
NetGSA
, the default inputA
is a list of adjacency matrices across the tested groups. For each group, we assume that its adjacency matrix is again coded as a list of smaller matrices (or in the extreme case, one matrix of sizep
). We do not assume the adjancency matrices across groups to have the same block diagonal structure. For this particular structure, I removed the check on variable compatibility between the adjacency matrices and the input data, but this should be added later. - Note the use of
adj2inf
should not change if we have block diagonal adj matrix, because the list of eigenvalues remain the same. - The fixed effect coefficients
beta
is currently an output from the main functionNetGSA
, but do we need it? Is there a better way of estimating beta given the block diagonal structure of D? - Although it was mentioned in the notes that we may not need to assemble the entire adjacency matrix to get the test statistic for a given pathway, in practice we may be working with more pathways. Would it be better if we assemble the entire adjacency matrix anyway to avoid repeatedly subsetting variables? In either case, before running
NetGSA
, we should make sure to filter variables in the input data matrix to keep only those that belong to at least one tested pathway.
Ma, Jing, Shojaie, Ali and Michailidis, George. (2016) Network-based pathway enrichment analysis with incomplete network information. Bioinformatics https://doi.org/10.1093/bioinformatics/btw410