BicMix (now integrates both SFAmix and BicMix)

BicMix is a sparse matrix decomposition tool. Given a matrix Y of dimension P by N, BicMix decomposes it into the product of two sparse matrices, LAM and X.
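
To make the shapes concrete, here is a minimal base R sketch of the product structure. It is not BicMix code: the sizes, the number of factors K, the sparsity levels, and the additive noise term are all assumptions chosen for illustration.

```r
set.seed(1)
P <- 20; N <- 10; K <- 3   # hypothetical sizes: P rows, N columns, K factors

# Sparse loading matrix LAM (P x K): most entries are exactly zero
LAM <- matrix(rnorm(P * K) * rbinom(P * K, 1, 0.2), nrow = P, ncol = K)

# Sparse factor matrix X (K x N)
X <- matrix(rnorm(K * N) * rbinom(K * N, 1, 0.5), nrow = K, ncol = N)

# Small noise term (an assumption; used only to make the toy data realistic)
E <- matrix(rnorm(P * N, sd = 0.1), nrow = P, ncol = N)

# Y is P x N, the product of the two sparse matrices plus noise
Y <- LAM %*% X + E
dim(Y)
```

A decomposition tool works in the opposite direction: it starts from Y and estimates LAM and X.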

This is the C++ implementation of BicMixC wrapped in R.

Use devtools to install in R

library(devtools)
install_github("chuangao/BicMix")

I found that install_github does not always work. Please install from source (see below) if this is the case.

Install from source

git clone https://github.com/chuangao/BicMix
R CMD INSTALL BicMix

Usage

BicMixR(y=your_input_matrix, nf=100, a=0.5, b=0.5, c=0.5, d=0.5, e=0.5, f=0.5, itr=5000, out_itr=200, out_dir=directory_to_save_results, rsd=NULL, lam_method="matrix", x_method="dense", tol=1e-10, qnorm = TRUE, nf_min = 5)

The input matrix y should contain only numbers: no headers and no missing values.
Do not correct for confounding beforehand; BicMix handles that in the dense components.
For a gene expression matrix, each gene is a row and each sample is a column
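
The input requirements above can be checked before calling BicMixR. The sketch below uses base R only; the helper name check_input_matrix and the file name in the comment are hypothetical and are not part of the BicMix package.

```r
# Hypothetical helper (not part of BicMix): validate a matrix before decomposition
check_input_matrix <- function(y) {
  y <- as.matrix(y)
  if (!is.numeric(y)) stop("y must contain only numbers")
  if (any(!is.finite(y))) stop("y must not contain missing or non-finite values")
  dimnames(y) <- NULL   # drop any row or column headers
  y
}

# A file without headers could be read like this (the file name is made up):
# y <- check_input_matrix(read.table("expression.txt", header = FALSE))

# Toy matrix: 4 genes (rows) by 3 samples (columns)
toy <- matrix(rnorm(12), nrow = 4,
              dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))
y <- check_input_matrix(toy)
dim(y)
```

If the data arrive with samples in rows, transpose with t() first so that each gene is a row.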

Arguments

y the matrix to be decomposed; no missing values and no headers are allowed; space or tab delimited
nf the number of factors the algorithm starts with; it will be shrunk to a smaller number reflecting the number of factors needed to explain the variance, default to 50
a parameter one for the three parameter beta distribution at local level, default to 0.5 to recapitulate horseshoe
b parameter two for the three parameter beta distribution at local level, default to 0.5 to recapitulate horseshoe
c parameter one for the three parameter beta distribution at component level, default to 0.5 to recapitulate horseshoe
d parameter two for the three parameter beta distribution at component level, default to 0.5 to recapitulate horseshoe
e parameter one for the three parameter beta distribution at global level, default to 0.5 to recapitulate horseshoe
f parameter two for the three parameter beta distribution at global level, default to 0.5 to recapitulate horseshoe
itr the maximum number of iterations the algorithm is allowed to run, default to 5000
out_itr the interval, in iterations, at which temporary output is written to the output directory (see out_dir)
out_dir directory where the algorithm writes temporary results (see out_itr)
rsd random seed for initializing the parameter values; by default the seed is randomly drawn
lam_method the method used to update the loading matrix, either "matrix" or "element". If "matrix", all components are updated simultaneously (slower but more stable; needs fewer iterations to converge); if "element", each component is updated sequentially (faster but less stable; needs more iterations to converge), default to "matrix"
x_method whether to induce sparsity on the X matrix, either "sparse" or "dense", default to "sparse"
tol tolerance threshold for convergence, default to 1e-10
qnorm whether to qq-normalize the gene expression matrix, default to TRUE
nf_min the minimum number of factors to keep (when the signal in the data is weak, the default shrinkage parameters in BicMix can be so aggressive that zero factors are left; nf_min makes sure at least some factors are kept), default to 5
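
The remark that 0.5 recapitulates the horseshoe can be illustrated with base R alone. A known property of the horseshoe prior is that its implied shrinkage weight follows a Beta(0.5, 0.5) distribution, which piles its mass near 0 (no shrinkage) and near 1 (full shrinkage). The sketch below is not BicMix code; it assumes the third (scale) parameter of the three parameter beta distribution is fixed at 1, and the variable names are made up for illustration.

```r
a <- 0.5; b <- 0.5   # the default values of the two shape parameters

# Probability mass of the shrinkage weight in three regions of [0, 1]
p_near_zero <- pbeta(0.1, a, b)                    # weight below 0.1: almost no shrinkage
p_near_one  <- 1 - pbeta(0.9, a, b)                # weight above 0.9: almost full shrinkage
p_middle    <- pbeta(0.6, a, b) - pbeta(0.4, a, b) # weight between 0.4 and 0.6

# Each tail (width 0.1) holds more mass than the middle band (width 0.2)
print(c(near_zero = p_near_zero, middle = p_middle, near_one = p_near_one))
```

The U shape is what lets such a prior leave strong signals nearly untouched while shrinking weak ones close to zero.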

Output

lam the sparse loading matrix
ex the factor matrix
z a vector indicating whether the corresponding loading is sparse (value of 1)
o a vector indicating whether the corresponding factor is sparse (value of 1)
nf the number of factors learned by the model
exx the expected value of the covariance matrix, E(XX^T)
itr the number of iterations for the algorithm to converge
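
The z vector can be used to split the loading matrix into sparse and dense components. The sketch below uses a made-up list in place of a real BicMixR result, so the values are purely illustrative; the 0.9 cutoff is taken from the example in this README.

```r
set.seed(7)
# Stand-in for a BicMixR result (hypothetical values, not real output)
result <- list(
  lam = matrix(rnorm(15), nrow = 5, ncol = 3),
  z   = c(0.99, 0.02, 0.95)   # values near 1 mark sparse loadings
)

is_sparse <- result$z > 0.9

# drop = FALSE keeps a matrix even when only one column is selected
sparse_lam <- result$lam[, is_sparse, drop = FALSE]
dense_lam  <- result$lam[, !is_sparse, drop = FALSE]

dim(sparse_lam)
dim(dense_lam)
```

Without drop = FALSE, selecting a single column returns a plain vector, which can break later code that expects a matrix.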

Example

library(BicMix)

simulate data where the loading is a mixture of sparse and dense components, and factor is dense

data = gen_BicMix_data(std=2)

Visualize the loading matrix

image(t(data$lam),x=1:ncol(data$lam),y=1:nrow(data$lam),xlab="Loadings",ylab="Genes")

Visualize the factor matrix

image(t(data$ex),x=1:ncol(data$ex),y=1:nrow(data$ex),xlab="Samples",ylab="Factors")

run algorithm on the simulated data

result = BicMixR(data$y,nf=100,a=0.5,b=0.5,itr=5000,out_dir="results",tol=1e-10,x_method="sparse",rsd=123)

calculate a correlation matrix between the estimated sparse loadings and the true sparse loadings. Ideally, each row and each column of the correlation matrix should contain one and only one large value

cor.est.real = cor(result$lam[,result$z>0.9],data$lams)

visualize the correlation matrix

image(cor.est.real,x=1:nrow(cor.est.real),y=1:ncol(cor.est.real),xlab="Recovered loadings",ylab="True loadings")
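
The "one and only one big value per row and column" criterion can also be checked numerically. The sketch below uses toy matrices rather than BicMix output: the recovered loadings are simulated as a permuted, sign-flipped, noisy copy of the true ones. Taking absolute correlations and using a 0.8 cutoff are assumptions made for this illustration.

```r
set.seed(42)
P <- 200; K <- 4
true_lam <- matrix(rnorm(P * K), nrow = P, ncol = K)

# Simulated "recovered" loadings: columns permuted, some signs flipped, noise added
perm  <- c(3, 1, 4, 2)
signs <- c(1, -1, 1, -1)
est_lam <- sweep(true_lam[, perm], 2, signs, `*`) +
  matrix(rnorm(P * K, sd = 0.1), nrow = P, ncol = K)

# Rows index recovered loadings, columns index true loadings
cor_mat <- abs(cor(est_lam, true_lam))

best_true  <- apply(cor_mat, 1, which.max)   # best-matching true loading per recovered one
one_to_one <- !any(duplicated(best_true))    # TRUE if no true loading is matched twice
n_strong   <- rowSums(cor_mat > 0.8)         # number of large correlations per row

best_true
one_to_one
n_strong
```

Absolute values are used because a factorization can generally recover a component with its sign flipped.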

calculate similarity score of the recovered sparse loading components and the true sparse loading components

cal_score_sparse(result$lam[,result$z>0.9],data$lams)

calculate similarity score of the recovered dense loading components and the true dense loading components

cal_score_dense(result$lam[,result$z<=0.9],data$lamd)

simulate data where the loading is a mixture of sparse and dense components, and factor is dense

data = gen_BicMix_data(std=2, type.factor="dense", rsd = 123)

perform analysis

result = BicMixR(data$y,nf=100,out_dir="results",tol=1e-10,x_method="dense",rsd=123)

calculate similarity score of the recovered sparse loading components and the true sparse loading components

cal_score_sparse(result$lam[,result$z>0.9],data$lams)

calculate similarity score of the recovered dense loading components and the true dense loading components

cal_score_dense(result$lam[,result$z<=0.9],data$lamd)

Documentation

Refer to BicMix.pdf for more usage details

Reference

Context Specific and Differential Gene Co-expression Networks via Bayesian Biclustering
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004791
