Vic_ks edited this page May 20, 2020 · 3 revisions

CCREPE Tutorial

The CCREPE (Compositionality Corrected by REnormalization and PErmutation) package is designed to assess the significance of general similarity measures in compositional datasets.

CCREPE can be downloaded from this link. CCREPE is also available as a GitHub repository (https://github.com/biobakery/ccrepe).

If you use this package, please cite as below:

Emma Schwager and Colleagues. Detecting statistically significant associations between sparse and high dimensional compositional data. In Progress

We provide support for CCREPE users. Please join our bioBakery support forum designated specifically for CCREPE users.



Overview

[Figure: CCREPE.png]


1. Installation


CCREPE can be installed using either of the following two options.

1.1 Pre-requisites

  • R needs to be installed on your computer.

On Debian-based Linux distributions, you can install R with apt-get install r-base. On other platforms, please refer to the R website for installation instructions.

  • R package infotheo needs to be installed in R.

Once R is installed, run the following command in R to install the package:

> install.packages('infotheo')

1.2 Installing CCREPE

  • Download: You may download the ccrepe package from the list.

  • Once downloaded, please install the package in R by doing one of the following:

    • For Linux users, run the following command from a terminal to install the downloaded ccrepe archive into R:

    $ R CMD INSTALL --build <insert-download-name.tar.gz>

    • For users on other platforms, please refer to the R documentation on installing a package from a local source archive.

OR

  • Clone the repository: You may clone the repository by running the following command from a Terminal.

$ git clone https://github.com/biobakery/ccrepe.git

Once the package has been downloaded and installed in R, run the following command to load the ccrepe package:

> library(ccrepe)
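
Before moving on, it can help to confirm that both packages are visible to your R session. The following is a minimal sketch using only base R; the variable names needed, available, and missing.pkgs are illustrative and not part of CCREPE.

```r
# Packages this tutorial relies on: infotheo (pre-requisite) and ccrepe itself.
needed <- c("infotheo", "ccrepe")

# requireNamespace() returns TRUE if a package can be loaded and FALSE
# otherwise, without attaching it to the search path.
available <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
print(available)

# Report anything that still has to be installed.
missing.pkgs <- names(available)[!available]
if (length(missing.pkgs) > 0) {
  message("Not yet installed: ", paste(missing.pkgs, collapse = ", "))
}
```

If either entry prints FALSE, revisit the corresponding installation step above before continuing.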

2. Running CCREPE


Once the ccrepe package is installed, you can start using it. Please ensure that R is installed on your computer; for installation instructions, refer to the R website.

The package contains two functions: (i) ccrepe and (ii) nc.score. For instructions on each, please see below.

2.2 ccrepe function

ccrepe calculates compositionality-corrected p-values and q-values for compositional data using an arbitrary distance metric.

For the purpose of this tutorial, we will run ccrepe on two datasets with nc.score as the similarity score.

  • Open R

  • Run the following command to import the library ccrepe:

    > library(ccrepe)
    
  • The input datasets are shown below for your reference:

test.input

           Feature 1  Feature 2  Feature 3   Feature 4
Sample 1  0.09913084 0.12746072 0.53385029 0.239558154
Sample 2  0.39666736 0.19993817 0.02417398 0.379220490
Sample 3  0.24119443 0.08419378 0.32709373 0.347518058
Sample 4  0.39670572 0.20889021 0.17157276 0.222831316
Sample 5  0.46209528 0.22016053 0.30927015 0.008474046
Sample 6  0.25553284 0.14904298 0.56854622 0.026877963
Sample 7  0.47681832 0.20330031 0.04027400 0.279607363
Sample 8  0.16694612 0.17131849 0.42224798 0.239487416
Sample 9  0.48773148 0.37592572 0.12448270 0.011860096
Sample 10 0.51668975 0.28593023 0.12065695 0.076723068

test.input.2

            Feature 1   Feature 2  Feature 3   Feature 4   Feature 5   Feature 6   Feature 7
Sample 1  0.458561155 0.008092532 0.07722429 0.061862506 0.141716599 0.160429523 0.092113392
Sample 2  0.115176017 0.215269857 0.33960857 0.127598647 0.111312569 0.006027953 0.085006387
Sample 3  0.549371433 0.019962964 0.01227265 0.051829919 0.074611054 0.048762656 0.243189326
Sample 4  0.284740019 0.190046266 0.02880524 0.142821805 0.028813184 0.272138724 0.052634764
Sample 11 0.005447614 0.080074742 0.01086816 0.009454749 0.002404633 0.883554158 0.008195943
Sample 5  0.576470738 0.042814009 0.04274546 0.067392553 0.029867829 0.209886768 0.030822642
Sample 6    0.4530424 0.044092102 0.04207554 0.347114356 0.031553487 0.034537133  0.04758498
Sample 7 -0.088121495 0.114319848 0.38703157 0.107000574  0.24974684 0.204100466 0.025922193
Sample 8  0.146175965 0.517055805 0.13548013 0.119349245 0.020930469 0.030382319 0.030626065
Sample 12 0.004911492 0.414258262 0.07665803 0.008781068 0.026323325 0.396293546 0.072774276
Sample 9  0.220817751 0.054693589 0.11161043 0.229245931 0.153135574 0.108948339 0.121548389
Sample 10 0.179461466 0.148850896 0.07424187  0.28602251 0.048613054 0.091058307 0.171751894
Sample 13  0.24914881 0.171957509 0.13331199 0.043893814 0.027837292 0.243969848 0.129880733
Sample 14 0.619468187 0.175914305 0.01021288 0.050524383 0.018911969 0.109652865 0.015315407
Sample 15 0.024403392 0.118639502  0.1164575 0.196565283 0.299012684  0.02810215 0.216819491
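
CCREPE is designed for compositional data, so each sample (row) is expected to hold relative abundances that sum to a constant. A quick sketch of that sanity check, using the first three samples of the first dataset (test.input) copied from its listing above; the object name test.input.head and the tolerance of 1e-6 are assumptions chosen for illustration.

```r
# First three samples of test.input, copied from the listing in this tutorial.
test.input.head <- matrix(
  c(0.09913084, 0.12746072, 0.53385029, 0.239558154,
    0.39666736, 0.19993817, 0.02417398, 0.379220490,
    0.24119443, 0.08419378, 0.32709373, 0.347518058),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste("Sample", 1:3), paste("Feature", 1:4))
)

# Each sample (row) should sum to 1, up to rounding in the printed values.
row.totals <- rowSums(test.input.head)
print(row.totals)

# TRUE when every row total is within the (assumed) tolerance of 1.
all.ok <- all(abs(row.totals - 1) < 1e-6)
print(all.ok)
```

The same check can be applied to a full dataset by passing the whole matrix or data frame to rowSums().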

  • Run the following command to run CCREPE on the two datasets test.input and test.input.2, with the NC-score as the similarity scoring method (provided by the sim.score argument):

    > out <- ccrepe(x = test.input, y = test.input.2, sim.score = nc.score, iterations = 20, min.subj = 10)
    
  • The out variable will contain the following:

    • p.values

    • z.stat

    • q.values

    • sim.score

The output is shown below for your reference:

$p.values
          Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 Feature 6 Feature 7
Feature 1 0.5263075 0.8077156 0.8100249 0.1969555 0.6349808 0.2337343 0.3693508
Feature 2 0.8735528 0.9570482 0.9706203 0.4088109 0.7971789 0.6908775 0.6616025
Feature 3 0.1999377 0.4515583 0.3964722 0.4689658 0.2959280 0.5885919 0.5062330
Feature 4 0.6267964 0.5877425 0.4874545 0.1158420 0.1532040 0.8464301 0.8808116

$z.stat
           Feature 1   Feature 2   Feature 3  Feature 4  Feature 5  Feature 6
Feature 1 -0.6336527 -0.24337417 -0.24039385  1.2902741  0.4747281  1.1907944
Feature 2 -0.1591473 -0.05385807 -0.03683033  0.8259880  0.2569998  0.3976645
Feature 3  1.2817292 -0.75281958 -0.84793845 -0.7241628 -1.0452055 -0.5408777
Feature 4 -0.4862408  0.54211040  0.69436309 -1.5724682  1.4283052 -0.1936754
           Feature 7
Feature 1  0.8976900
Feature 2  0.4377017
Feature 3 -0.6647146
Feature 4  0.1499404

$q.values
          Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 Feature 6 Feature 7
Feature 1  4.115114  4.018890  3.855147  7.186497  3.861521  5.117088  5.775791
Feature 2  3.824895  3.880078  3.794563  4.972220  4.155343  3.781303  3.811658
Feature 3  5.471483  4.942928  5.424918  4.666796  5.398899  4.026842  4.262629
Feature 4  4.035971  4.289100  4.446551 12.680502  8.385145  3.860559  3.708345

$sim.score
            Feature 1  Feature 2  Feature 3  Feature 4  Feature 5  Feature 6
Feature 1 -0.28571429  0.1038961  0.1515152  0.1601732  0.1515152  0.3809524
Feature 2 -0.31168831  0.3593074 -0.0952381  0.3593074 -0.0952381  0.3809524
Feature 3  0.58874459 -0.5887446 -0.5887446 -0.3333333 -0.5887446 -0.3116883
Feature 4 -0.07792208 -0.0952381  0.1515152 -0.5324675  0.3809524 -0.0952381
            Feature 7
Feature 1  0.20779221
Feature 2  0.16017316
Feature 3 -0.07792208
Feature 4  0.16017316
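
To scan the results for the strongest candidate associations, the p-value matrix can be flattened into one row per feature pair and sorted. The sketch below re-enters the $p.values matrix printed above so that it runs on its own; with a real result you would start from out$p.values instead. The names p.values and pair.table are illustrative, and reading rows as features of x and columns as features of y is an interpretation based on the 4 x 7 shape of the output.

```r
# The $p.values matrix from the output above (4 features of x by 7 of y).
p.values <- matrix(
  c(0.5263075, 0.8077156, 0.8100249, 0.1969555, 0.6349808, 0.2337343, 0.3693508,
    0.8735528, 0.9570482, 0.9706203, 0.4088109, 0.7971789, 0.6908775, 0.6616025,
    0.1999377, 0.4515583, 0.3964722, 0.4689658, 0.2959280, 0.5885919, 0.5062330,
    0.6267964, 0.5877425, 0.4874545, 0.1158420, 0.1532040, 0.8464301, 0.8808116),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste("Feature", 1:4), paste("Feature", 1:7))
)

# One row per (x feature, y feature) pair; row() and col() give the matrix
# position of every entry in the same column-major order as as.vector().
pair.table <- data.frame(
  x.feature = rownames(p.values)[row(p.values)],
  y.feature = colnames(p.values)[col(p.values)],
  p.value   = as.vector(p.values),
  stringsAsFactors = FALSE
)

# Sort so that the smallest p-values come first.
pair.table <- pair.table[order(pair.table$p.value), ]
print(head(pair.table, 3))
```

With only 20 iterations, as in the command above, the p-values are coarse, so this ordering is best treated as a rough ranking.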

2.3 nc.score function

nc.score provides a novel similarity measure (the N-dimensional checkerboard score: NC-score), particularly appropriate to compositions derived from microbial community sequencing data.

For the purpose of this tutorial we will run nc.score on two datasets.

  • Open R

  • Run the following command to import the library ccrepe:

    > library(ccrepe)
    
  • For your reference the input datasets are below:

test.input

            Feature 1  Feature 2  Feature 3  Feature 4
Sample 1   0.53098625 0.24945178 0.16516569 0.05439628
Sample 2   0.11334774 0.32356694 0.38591054 0.17717477
Sample 3   0.22339983 0.12784189 0.24400540 0.40475287
Sample 4  -0.56292940 0.72177457 0.45731308 0.38384175
Sample 5   0.06740686 0.01687197 0.79829941 0.11742176
Sample 6  -0.39967644 0.11066224 0.38134556 0.90766864
Sample 7   0.52663095 0.29204997 0.03995832 0.14136075
Sample 8   0.63055974 0.31210092 0.03521166 0.02212769
Sample 9   0.08308327 0.05428329 0.84239772 0.02023572
Sample 10  0.55629625 0.30391172 0.11810698 0.02168505

test.input.2

          Feature 1  Feature 2   Feature 3  Feature 4
Sample 1  0.4856505 0.25517410 0.004001302 0.25517410
Sample 2  0.4009346 0.21260883 0.173847734 0.21260883
Sample 3  0.2622234 0.19282519 0.352126182 0.19282519
Sample 4  0.2156559 0.12046793 0.543408230 0.12046793
Sample 5  0.2821932 0.07223021 0.573346348 0.07223021
Sample 6  0.4216509 0.24159265 0.095163833 0.24159265
Sample 7  0.3919592 0.21705939 0.173921984 0.21705939
Sample 8  0.5283681 0.22512167 0.021388590 0.22512167
Sample 9  0.5373268 0.17106386 0.120545436 0.17106386
Sample 10 0.2697604 0.28053863 0.169162368 0.28053863

  • Run the following command to calculate the NC-score for the two datasets test.input and test.input.2:

    > out2 <- nc.score(x = test.input, y = test.input.2)
    

  • The out2 variable will contain the NC-scores for the two datasets. The output is shown below for your reference:

           Feature 1   Feature 2  Feature 3   Feature 4
Feature 1         NA  0.38095238 -0.7489177 -0.58874459
Feature 2  0.3809524          NA -0.3809524 -0.07792208
Feature 3 -0.7489177 -0.38095238         NA  0.38095238
Feature 4 -0.5887446 -0.07792208  0.3809524          NA
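
To experiment with nc.score without typing in the tables above, you can generate a small toy dataset and renormalize each row so that it sums to 1. This is only a sketch: the data are random, the names raw, toy.input, and toy.scores are hypothetical, and calling nc.score with a single dataset assumes that its y argument is optional, which you may want to confirm with ?nc.score.

```r
set.seed(42)  # make the toy data reproducible

# Hypothetical raw measurements: 10 samples (rows) by 4 features (columns).
raw <- matrix(runif(10 * 4), nrow = 10,
              dimnames = list(paste("Sample", 1:10), paste("Feature", 1:4)))

# Renormalize each row to sum to 1, giving relative abundances.
# rowSums(raw) has one entry per row and is recycled down each column.
toy.input <- raw / rowSums(raw)

# Only call nc.score() if the ccrepe package is actually installed.
if (requireNamespace("ccrepe", quietly = TRUE)) {
  # Assumption: with y omitted, scores are computed between columns of x.
  toy.scores <- ccrepe::nc.score(x = toy.input)
  print(round(toy.scores, 3))
} else {
  message("ccrepe is not installed; skipping the nc.score() call.")
}
```

Because the toy features are independent random draws, the resulting scores carry no biological meaning; the sketch only illustrates the expected input shape.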


Notes

For more information on CCREPE, please refer to the following wiki page: