ccrepe
The CCREPE (Compositionality Corrected by REnormalization and PErmutation) package is designed to assess the significance of general similarity measures in compositional datasets.
CCREPE can be downloaded from this link. CCREPE is also available as a GitHub repository.
If you use this package, please cite as below:
Emma Schwager and Colleagues. Detecting statistically significant associations between sparse and high dimensional compositional data. In Progress
We provide support for CCREPE users. Please join our bioBakery support forum designated specifically for CCREPE users.
CCREPE can be installed using either of the following two options.
- R needs to be installed on your computer. Linux users can install R with apt-get install r-base. For other platforms, please refer to the website above for installation instructions.
- The R package infotheo needs to be installed in R.
Once R is installed, run the following command in R to install the package:
> install.packages('infotheo')
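If you prefer to check whether the prerequisite is already present before installing, a minimal sketch in base R could look like the following. The helper name `missing_packages` is hypothetical and introduced only for this example; the install call is left commented out so that nothing is installed unless you choose to.

```r
# Return the subset of `pkgs` that is not currently installed.
# `missing_packages` is a hypothetical helper, not part of ccrepe.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("infotheo")
to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
} else {
  message("All prerequisites are installed.")
}
```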
- Download: You may download the ccrepe-package from the list.
Once downloaded, please install the package in R by doing one of the following:
- For Linux users, you can just run the following command to install the ccrepe download in R:
$ R CMD INSTALL --build <insert-download-name.tar.gz>
- For users on other platforms, please refer to the R documentation to see how to install external packages.
OR
- Clone the repository: You may clone the repository by running the following command from a Terminal.
$ git clone https://github.com/biobakery/ccrepe.git
Once the package has been downloaded and installed in R, you may run the following command to load the ccrepe-package:
> library(ccrepe)
Once the ccrepe-package is installed, you may proceed to use it.
Please ensure that R is installed on your computer. For instructions on
installing R please refer to their website.
The package contains two functions: (i) ccrepe and (ii) nc.score. For instructions on each, please see below.
ccrepe calculates compositionality-corrected p-values and q-values for compositional data using an arbitrary distance metric.
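As a rough illustration of the idea the package name points to (permutation followed by renormalization), the sketch below builds a null distribution for one feature pair using only base R. It is not the package's implementation: the data are randomly generated, Spearman correlation stands in for the similarity score, and the helper name `permute_renormalize` and the choice of 200 permutations are assumptions made for this example.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical compositional data: 10 samples (rows) x 4 features (columns),
# with each row rescaled to sum to 1
counts <- matrix(rlnorm(40), nrow = 10)
comp <- counts / rowSums(counts)

# Shuffle each feature independently across samples, then rescale every
# sample so that it sums to 1 again
permute_renormalize <- function(m) {
  permuted <- apply(m, 2, sample)
  permuted / rowSums(permuted)
}

# Similarity between features 1 and 2 in the observed data
observed <- cor(comp[, 1], comp[, 2], method = "spearman")

# Similarities between the same two features in 200 permuted,
# renormalized copies of the data
null_scores <- replicate(200, {
  p <- permute_renormalize(comp)
  cor(p[, 1], p[, 2], method = "spearman")
})

observed
range(null_scores)
```

Because the permuted data are renormalized, the null scores still carry the dependence that the sum-to-one constraint induces between features, which is the effect a compositionality correction aims to account for.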
For the purpose of this tutorial we will run ccrepe on two datasets with nc.score as the similarity score.
- Open R.
- Run the following command to import the library ccrepe:
> library(ccrepe)
- The input datasets are shown below for your reference:
test.input
          Feature 1   Feature 2   Feature 3   Feature 4
Sample 1  0.09913084  0.12746072  0.53385029  0.239558154
Sample 2  0.39666736  0.19993817  0.02417398  0.379220490
Sample 3  0.24119443  0.08419378  0.32709373  0.347518058
Sample 4  0.39670572  0.20889021  0.17157276  0.222831316
Sample 5  0.46209528  0.22016053  0.30927015  0.008474046
Sample 6  0.25553284  0.14904298  0.56854622  0.026877963
Sample 7  0.47681832  0.20330031  0.04027400  0.279607363
Sample 8  0.16694612  0.17131849  0.42224798  0.239487416
Sample 9  0.48773148  0.37592572  0.12448270  0.011860096
Sample 10 0.51668975  0.28593023  0.12065695  0.076723068
test.input.2
           Feature 1     Feature 2     Feature 3     Feature 4     Feature 5     Feature 6     Feature 7
Sample 1   0.458561155   0.008092532   0.07722429    0.061862506   0.141716599   0.160429523   0.092113392
Sample 2   0.115176017   0.215269857   0.33960857    0.127598647   0.111312569   0.006027953   0.085006387
Sample 3   0.549371433   0.019962964   0.01227265    0.051829919   0.074611054   0.048762656   0.243189326
Sample 4   0.284740019   0.190046266   0.02880524    0.142821805   0.028813184   0.272138724   0.052634764
Sample 11  0.005447614   0.080074742   0.01086816    0.009454749   0.002404633   0.883554158   0.008195943
Sample 5   0.576470738   0.042814009   0.04274546    0.067392553   0.029867829   0.209886768   0.030822642
Sample 6   0.4530424     0.044092102   0.04207554    0.347114356   0.031553487   0.034537133   0.04758498
Sample 7   -0.088121495  0.114319848   0.38703157    0.107000574   0.24974684    0.204100466   0.025922193
Sample 8   0.146175965   0.517055805   0.13548013    0.119349245   0.020930469   0.030382319   0.030626065
Sample 12  0.004911492   0.414258262   0.07665803    0.008781068   0.026323325   0.396293546   0.072774276
Sample 9   0.220817751   0.054693589   0.11161043    0.229245931   0.153135574   0.108948339   0.121548389
Sample 10  0.179461466   0.148850896   0.07424187    0.28602251    0.048613054   0.091058307   0.171751894
Sample 13  0.24914881    0.171957509   0.13331199    0.043893814   0.027837292   0.243969848   0.129880733
Sample 14  0.619468187   0.175914305   0.01021288    0.050524383   0.018911969   0.109652865   0.015315407
Sample 15  0.024403392   0.118639502   0.1164575     0.196565283   0.299012684   0.02810215    0.216819491
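Both inputs have samples as rows and features as columns, and each row sums to approximately 1. If you want to prepare data of the same shape from your own abundances, a minimal sketch could look like this. The random values, the seed, and the object names `raw` and `example.input` are assumptions made for illustration; this is not how the tutorial datasets were produced.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical raw abundances: 10 samples (rows) x 4 features (columns)
raw <- matrix(
  rlnorm(10 * 4), nrow = 10, ncol = 4,
  dimnames = list(paste("Sample", 1:10), paste("Feature", 1:4))
)

# Divide each row by its total so that every sample sums to 1
example.input <- raw / rowSums(raw)

# Every entry should print as 1
round(rowSums(example.input), 6)
```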
- Run the following command to run CCREPE on the two datasets test.input and test.input.2, with the NC-score as the similarity scoring method (provided by the sim.score argument):
> out <- ccrepe(x = test.input, y = test.input.2, sim.score = nc.score, iterations = 20, min.subj = 10)
- The out variable will contain the following:
- p.values
- z.stat
- q.values
- sim.score
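In the example output printed in this tutorial, the p.values agree numerically with a two-sided normal test applied to z.stat. The sketch below checks this for the first row of both matrices. This is an observation about the printed numbers, not a statement of how the package computes them internally; the vector names are chosen for this example.

```r
# First row ("Feature 1") of $z.stat and $p.values, copied from the
# example output in this tutorial
z <- c(-0.6336527, -0.24337417, -0.24039385, 1.2902741,
       0.4747281, 1.1907944, 0.8976900)
p.reported <- c(0.5263075, 0.8077156, 0.8100249, 0.1969555,
                0.6349808, 0.2337343, 0.3693508)

# Two-sided p-value under a standard normal reference distribution
p.from.z <- 2 * pnorm(-abs(z))

# Largest absolute difference between recomputed and reported p-values
max(abs(p.from.z - p.reported))
```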
The output is shown below for your reference:
$p.values
Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 Feature 6 Feature 7
Feature 1 0.5263075 0.8077156 0.8100249 0.1969555 0.6349808 0.2337343 0.3693508
Feature 2 0.8735528 0.9570482 0.9706203 0.4088109 0.7971789 0.6908775 0.6616025
Feature 3 0.1999377 0.4515583 0.3964722 0.4689658 0.2959280 0.5885919 0.5062330
Feature 4 0.6267964 0.5877425 0.4874545 0.1158420 0.1532040 0.8464301 0.8808116
$z.stat
Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 Feature 6
Feature 1 -0.6336527 -0.24337417 -0.24039385 1.2902741 0.4747281 1.1907944
Feature 2 -0.1591473 -0.05385807 -0.03683033 0.8259880 0.2569998 0.3976645
Feature 3 1.2817292 -0.75281958 -0.84793845 -0.7241628 -1.0452055 -0.5408777
Feature 4 -0.4862408 0.54211040 0.69436309 -1.5724682 1.4283052 -0.1936754
Feature 7
Feature 1 0.8976900
Feature 2 0.4377017
Feature 3 -0.6647146
Feature 4 0.1499404
$q.values
Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 Feature 6 Feature 7
Feature 1 4.115114 4.018890 3.855147 7.186497 3.861521 5.117088 5.775791
Feature 2 3.824895 3.880078 3.794563 4.972220 4.155343 3.781303 3.811658
Feature 3 5.471483 4.942928 5.424918 4.666796 5.398899 4.026842 4.262629
Feature 4 4.035971 4.289100 4.446551 12.680502 8.385145 3.860559 3.708345
$sim.score
Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 Feature 6
Feature 1 -0.28571429 0.1038961 0.1515152 0.1601732 0.1515152 0.3809524
Feature 2 -0.31168831 0.3593074 -0.0952381 0.3593074 -0.0952381 0.3809524
Feature 3 0.58874459 -0.5887446 -0.5887446 -0.3333333 -0.5887446 -0.3116883
Feature 4 -0.07792208 -0.0952381 0.1515152 -0.5324675 0.3809524 -0.0952381
Feature 7
Feature 1 0.20779221
Feature 2 0.16017316
Feature 3 -0.07792208
Feature 4 0.16017316
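To pull individual feature pairs out of a result matrix, one possible approach is to index the matrix with `which(..., arr.ind = TRUE)`. The sketch below re-enters the p.values printed above so that it runs without the package; with a real result you would use `out$p.values` instead. The cutoff of 0.2 is a hypothetical value chosen only so that the example returns a few rows, not a recommended significance level.

```r
# $p.values from the example output (rows: features of x, columns: features of y)
p.values <- matrix(
  c(0.5263075, 0.8077156, 0.8100249, 0.1969555, 0.6349808, 0.2337343, 0.3693508,
    0.8735528, 0.9570482, 0.9706203, 0.4088109, 0.7971789, 0.6908775, 0.6616025,
    0.1999377, 0.4515583, 0.3964722, 0.4689658, 0.2959280, 0.5885919, 0.5062330,
    0.6267964, 0.5877425, 0.4874545, 0.1158420, 0.1532040, 0.8464301, 0.8808116),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste("Feature", 1:4), paste("Feature", 1:7))
)

threshold <- 0.2  # hypothetical cutoff, for illustration only

# Row and column positions of every entry below the cutoff
hits <- which(p.values < threshold, arr.ind = TRUE)

# One row per feature pair, smallest p-value first
pairs <- data.frame(
  x.feature = rownames(p.values)[hits[, "row"]],
  y.feature = colnames(p.values)[hits[, "col"]],
  p.value   = p.values[hits]
)
pairs <- pairs[order(pairs$p.value), ]
pairs
```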
nc.score provides a novel similarity measure (the N-dimensional checkerboard score: NC-score), particularly appropriate to compositions derived from microbial community sequencing data.
For the purpose of this tutorial we will run nc.score on two datasets.
- Open R.
- Run the following command to import the library ccrepe:
> library(ccrepe)
- For your reference the input datasets are below:
test.input
          Feature 1    Feature 2   Feature 3   Feature 4
Sample 1  0.53098625   0.24945178  0.16516569  0.05439628
Sample 2  0.11334774   0.32356694  0.38591054  0.17717477
Sample 3  0.22339983   0.12784189  0.24400540  0.40475287
Sample 4  -0.56292940  0.72177457  0.45731308  0.38384175
Sample 5  0.06740686   0.01687197  0.79829941  0.11742176
Sample 6  -0.39967644  0.11066224  0.38134556  0.90766864
Sample 7  0.52663095   0.29204997  0.03995832  0.14136075
Sample 8  0.63055974   0.31210092  0.03521166  0.02212769
Sample 9  0.08308327   0.05428329  0.84239772  0.02023572
Sample 10 0.55629625   0.30391172  0.11810698  0.02168505
test.input.2
          Feature 1  Feature 2   Feature 3    Feature 4
Sample 1  0.4856505  0.25517410  0.004001302  0.25517410
Sample 2  0.4009346  0.21260883  0.173847734  0.21260883
Sample 3  0.2622234  0.19282519  0.352126182  0.19282519
Sample 4  0.2156559  0.12046793  0.543408230  0.12046793
Sample 5  0.2821932  0.07223021  0.573346348  0.07223021
Sample 6  0.4216509  0.24159265  0.095163833  0.24159265
Sample 7  0.3919592  0.21705939  0.173921984  0.21705939
Sample 8  0.5283681  0.22512167  0.021388590  0.22512167
Sample 9  0.5373268  0.17106386  0.120545436  0.17106386
Sample 10 0.2697604  0.28053863  0.169162368  0.28053863
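Before scoring, it can be useful to confirm that each sample behaves like a composition. The sketch below is one possible check in base R, applied to two rows copied from test.input above. The helper name `check_composition` and the tolerance of 1e-6 are assumptions made for this example.

```r
# Report, per sample, whether the row sums to 1 (within `tol`) and whether
# it contains any negative entries.
# `check_composition` is a hypothetical helper, not part of ccrepe.
check_composition <- function(m, tol = 1e-6) {
  data.frame(
    sums.to.one  = abs(rowSums(m) - 1) < tol,
    has.negative = apply(m, 1, function(row) any(row < 0))
  )
}

# Two rows copied from test.input above
rows <- rbind(
  "Sample 1" = c(0.53098625, 0.24945178, 0.16516569, 0.05439628),
  "Sample 4" = c(-0.56292940, 0.72177457, 0.45731308, 0.38384175)
)

result <- check_composition(rows)
result
```

Both rows sum to 1, and the check flags the negative entry in Sample 4.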
- Run the following command to calculate the NC-score for the two datasets test.input and test.input.2:
> out2 <- nc.score(x = test.input, y = test.input.2)
- The out2 variable will contain the NC-scores for the two datasets.
The output is shown below for your reference:
           Feature 1   Feature 2    Feature 3   Feature 4
Feature 1  NA          0.38095238   -0.7489177  -0.58874459
Feature 2  0.3809524   NA           -0.3809524  -0.07792208
Feature 3  -0.7489177  -0.38095238  NA          0.38095238
Feature 4  -0.5887446  -0.07792208  0.3809524   NA
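The printed matrix is symmetric up to display rounding, with NA on the diagonal, so each feature pair appears twice. A minimal sketch for listing each pair once, ranked by the magnitude of its score, could look like this. The matrix is re-entered from the output above so that the example runs without the package, and the object names are chosen for illustration.

```r
# NC-score matrix copied from the example output above
nc <- matrix(
  c(NA,         0.38095238,  -0.7489177, -0.58874459,
    0.3809524,  NA,          -0.3809524, -0.07792208,
    -0.7489177, -0.38095238, NA,          0.38095238,
    -0.5887446, -0.07792208, 0.3809524,  NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste("Feature", 1:4), paste("Feature", 1:4))
)

# Positions in the upper triangle, so that each pair is listed once
idx <- which(upper.tri(nc), arr.ind = TRUE)

pairs <- data.frame(
  feature.a = rownames(nc)[idx[, "row"]],
  feature.b = colnames(nc)[idx[, "col"]],
  nc.score  = nc[idx]
)

# Largest absolute score first
ranked <- pairs[order(-abs(pairs$nc.score)), ]
ranked
```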
For more information on CCREPE, please refer to the following wiki page: