DiseaseNeuroGenomics/crumblr

Count ratio uncertainty modeling based linear regression

The crumblr package enables analysis of count ratio data using precision-weighted linear (mixed) models, PCA and clustering. crumblr's fast normal approximation of transformed count data from a Dirichlet-multinomial model allows the use of standard workflows to analyze count ratio data while modeling heteroskedasticity.

Details

Analysis of count ratio data (i.e. fractions) requires special consideration since the data are non-normal, heteroskedastic, and span a low-rank space. While counts can be modeled directly using Poisson, negative binomial, or Dirichlet-multinomial models for simple regression applications, these models can be problematic since they 1) can be very computationally expensive, 2) can produce poorly calibrated hypothesis tests, and 3) are challenging to extend to other applications. The widely used centered log-ratio (CLR) transform from compositional data analysis makes count ratio data more normal and enables the use of linear models and other standard methods.
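As a rough illustration of the CLR transform mentioned above, here is a minimal base-R sketch. It is not crumblr's implementation: the helper name `clr` and the pseudocount of 0.5 (used to avoid taking the log of zero) are assumptions made for this example.

```r
# Centered log-ratio (CLR) transform of a samples-by-categories count matrix.
# The pseudocount is an assumption of this sketch; it avoids log(0).
clr <- function(counts, pseudocount = 0.5) {
  logx <- log(counts + pseudocount)
  # Subtract each sample's mean log count (recycled along rows)
  logx - rowMeans(logx)
}

# Two toy samples with the same proportions but a 10x difference in depth
counts <- matrix(c(10, 30, 60,
                   100, 300, 600),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), c("A", "B", "C")))

z <- clr(counts)
rowSums(z)  # each row sums to zero, up to floating point error
```

The zero row sums reflect the low-rank structure noted above: with D categories, the CLR values of a sample have only D - 1 degrees of freedom.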

Yet CLR-transformed data is still highly heteroskedastic: the precision of measurements varies widely. This important factor is not considered by existing methods.

crumblr uses a fast asymptotic normal approximation of CLR-transformed counts from a Dirichlet-multinomial distribution to model the sampling variance of the transformed counts. crumblr incorporates this sampling variance as precision weights in linear (mixed) models in order to increase power and control the false positive rate. crumblr also uses a variance stabilizing transform (VST) based on the precision weights to improve the performance of PCA and clustering.
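The idea of precision weights can be sketched in base R under a simplifying assumption: plain multinomial sampling with no overdispersion, rather than the Dirichlet-multinomial model crumblr uses. In that special case the delta method gives an approximate variance for the CLR of category i in a sample with counts c_1, ..., c_D of (1 - 2/D) / c_i + (1/D^2) * sum_k (1 / c_k). The helper names `clr` and `clr_var`, the pseudocount, and the simulated data are all hypothetical and are not part of crumblr's API.

```r
# CLR transform (pseudocount is an assumption of this sketch)
clr <- function(counts, pseudocount = 0.5) {
  logx <- log(counts + pseudocount)
  logx - rowMeans(logx)
}

# Delta-method sampling variance of CLR values under a plain multinomial
# model (no overdispersion). This is a simplification, not crumblr's formula.
clr_var <- function(counts, pseudocount = 0.5) {
  x <- counts + pseudocount
  D <- ncol(x)
  inv <- 1 / x
  (1 - 2 / D) * inv + rowSums(inv) / D^2
}

# Simulate six samples with very different total counts
set.seed(1)
depth <- c(50, 5000, 50, 5000, 50, 5000)
group <- factor(rep(c("ctrl", "case"), each = 3))
counts <- t(sapply(depth, function(n) rmultinom(1, n, prob = c(0.1, 0.3, 0.6))))
colnames(counts) <- c("A", "B", "C")

z <- clr(counts)
v <- clr_var(counts)

# Low-depth samples get larger variances, hence smaller precision weights
fit <- lm(z[, "A"] ~ group, weights = 1 / v[, "A"])
coef(fit)
```

Here the samples with 50 total counts contribute much less to the fit than those with 5000, which is the behavior the precision weights are meant to provide. For real analyses, the package's own functions and vignette should be used instead of this sketch.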

Install

# 1) Make sure Bioconductor is installed
# 2) Install crumblr and dependencies:
devtools::install_github("DiseaseNeurogenomics/crumblr")

Introduction to compositional data analysis
