SIAMCAT - Statistical Inference of Associations between Microbial Communities And host phenoType

Overview

SIAMCAT is a pipeline for Statistical Inference of Associations between Microbial Communities And host phenoTypes.
A primary goal of analyzing microbiome data is to determine changes in community composition that are associated with environmental factors. In particular, linking human microbiome composition to host phenotypes such as diseases has become an area of intense research. This requires robust toolkits for statistical modeling and biomarker extraction.
SIAMCAT provides a full pipeline supporting data preprocessing, statistical association testing, and statistical modeling (LASSO logistic regression), including tools for evaluating and interpreting these models (such as cross-validation, parameter selection, ROC analysis, and diagnostic model plots).
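To illustrate the cross-validation step mentioned above, here is a minimal sketch of a stratified k-fold split, written in Python for illustration (SIAMCAT itself is implemented in R). Stratification, the round-robin assignment, and the helper name `stratified_folds` are assumptions chosen for this example; they do not describe how SIAMCAT actually resamples its data.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that preserve the class ratio.

    Hypothetical helper for illustration; not SIAMCAT's own code.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    start = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal this class round-robin over the folds, continuing where the
        # previous class stopped so that fold sizes stay balanced.
        for j, idx in enumerate(members):
            folds[(start + j) % k].append(idx)
        start = (start + len(members)) % k
    return [sorted(fold) for fold in folds]


# Toy labels: 4 cases (1) and 6 controls (-1).
labels = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
for i, test_idx in enumerate(stratified_folds(labels, k=2)):
    train_idx = [j for j in range(len(labels)) if j not in test_idx]
    # A model would be trained on train_idx and evaluated on test_idx here.
    print(f"fold {i}: test={test_idx} train={train_idx}")
```

Keeping the case/control ratio similar in every fold avoids test folds that contain few or no cases, which would make per-fold ROC estimates unstable.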
SIAMCAT is available in three different flavors:

Galaxy web server
command line tool
R package

Please see the Support Section if you run into problems when using SIAMCAT.

Input data format

The input data should be organized in the same way for every version of SIAMCAT. All files must be in tab-separated column format.

Label data: The first row is expected to be #BINARY:1=[label for cases];-1=[label for controls]
The second row should contain the sample identifiers as a tab-separated list (consistent with the feature matrix and metadata). The third row is expected to contain the actual class labels (tab-separated), e.g. 1 for each case and -1 for each control.
Note: Labels can take other numeric values (but not characters or strings); importantly, the label for cases has to be greater than the one for controls.
Feature matrix: features (in rows) x samples (in columns)
The first row should contain the sample identifiers (consistent with the label data), while the first column should contain the feature labels (e.g. taxonomic identifiers). The remaining entries are expected to be real values >= 0 that quantify the abundance of each feature in each sample.
Metadata (optional): samples (in rows) x metadata (in columns)
Metadata must be converted to numerical values by the user (this is necessary for the heatmap displays).
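The following Python sketch shows one way to read and sanity-check the label file and feature matrix described above before uploading them. The helper names (`read_label_file`, `read_feature_matrix`, `check_samples`), the example class names, and the tolerance for an optional empty first header cell in the feature matrix are assumptions for illustration; SIAMCAT's own parsers (in R) may be more or less strict.

```python
import csv
import io

# Hypothetical helpers for illustration only; they are not part of SIAMCAT.


def read_label_file(handle):
    """Parse a label file: #BINARY header, sample ids, numeric labels."""
    lines = [ln.rstrip("\r\n") for ln in handle if ln.strip()]
    header = lines[0].strip() if lines else ""
    if len(lines) < 3 or not header.startswith("#BINARY:"):
        raise ValueError("expected a '#BINARY:' header followed by two rows")
    # Header "#BINARY:<case value>=<case name>;<control value>=<control name>"
    pairs = [p.partition("=") for p in header[len("#BINARY:"):].split(";")]
    if len(pairs) != 2:
        raise ValueError("the #BINARY header must declare exactly two classes")
    (case_val, _, case_name), (ctrl_val, _, ctrl_name) = pairs
    case_val, ctrl_val = float(case_val), float(ctrl_val)
    if not case_val > ctrl_val:
        raise ValueError("the label for cases must be greater than for controls")
    sample_ids = lines[1].split("\t")
    labels = [float(v) for v in lines[2].split("\t")]  # strings raise ValueError
    if len(sample_ids) != len(labels):
        raise ValueError("number of sample ids and labels differ")
    if not set(labels) <= {case_val, ctrl_val}:
        raise ValueError("labels must match the values declared in the header")
    classes = {case_val: case_name, ctrl_val: ctrl_name}
    return classes, dict(zip(sample_ids, labels))


def read_feature_matrix(handle):
    """Parse a features x samples matrix of non-negative abundances."""
    rows = [r for r in csv.reader(handle, delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    # Assumption: the header may or may not have an empty leading cell above
    # the feature-label column; drop it if present.
    if body and len(header) == len(body[0]):
        header = header[1:]
    matrix = {}
    for row in body:
        name, values = row[0], [float(v) for v in row[1:]]
        if len(values) != len(header):
            raise ValueError(f"{name}: expected {len(header)} values")
        if any(v < 0 for v in values):
            raise ValueError(f"{name}: abundances must be >= 0")
        matrix[name] = dict(zip(header, values))
    return header, matrix


def check_samples(labels, sample_ids):
    """Require the label file and feature matrix to list the same samples."""
    mismatch = set(labels) ^ set(sample_ids)
    if mismatch:
        raise ValueError(f"sample ids not shared by both files: {sorted(mismatch)}")


# Usage with made-up toy files (class names and values are hypothetical).
labels_txt = "#BINARY:1=cancer;-1=healthy\nS1\tS2\tS3\n1\t-1\t-1\n"
features_txt = "\tS1\tS2\tS3\ntaxon_A\t0.5\t0.1\t0.0\ntaxon_B\t0.5\t0.9\t1.0\n"
classes, labels = read_label_file(io.StringIO(labels_txt))
samples, features = read_feature_matrix(io.StringIO(features_txt))
check_samples(labels, samples)
print(classes, labels)
```

Checking that both files list the same sample identifiers catches the most common upload mistake, a feature table and label file exported from different sample sets.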

Galaxy interface:

The Galaxy interface can be found here: http://siamcat.embl.de/

Galaxy in brief

Left panel: TOOLS lists the available analysis modules.
Click a module to choose which ones you'd like to run.
Right panel: HISTORY keeps track of every analysis step you have performed.
Click the "eye" icon to view data, or the "floppy disk" icon to download results.
If something goes wrong, the "i" icon can provide useful details.
The "circular arrows" icon lets you rerun a job.
You can delete analysis steps from your history using the "x" icon.
Central panel: ANALYZE DATA lets you specify input data sets and parameters for each analysis module.

Additional info: https://usegalaxy.org/ (in particular the Help menu) and https://wiki.galaxyproject.org/Learn

Getting started with Galaxy

Start by uploading your data (see above for input data formats) using the DATA IMPORT / Import Data module / Upload File.

Then proceed by executing all SIAMCAT modules in order (from A to I). See the example history / workflow, as well as each module's description, for specific information on input and output data.

Commandline version

The command-line version is a collection of modules implemented in R, which are called via a bash script.

Stable version: https://github.com/gezel/siamcat/
Development version (only available inside the EMBL intranet): beta:/g/bork4/zeller/dev/siamcat

# In the folder into which you'd like to clone the siamcat repository, type:
git clone beta:/g/bork4/zeller/dev/siamcat

R packages required to run SIAMCAT:

install.packages(c('optparse', 'LiblineaR', 'pROC', 'colorRamps', 'RColorBrewer', 'beanplot'))

Using the Commandline version

...COMING SOON...

R package

The SIAMCAT R package ...COMING SOON...

Using the R package

...COMING SOON...

Support

Google user group for support:

https://groups.google.com/d/forum/siamcat-users

Known issues

Examples are weighted differently between classes (a remnant of our colorectal cancer microbiome study). This is fixed in Galaxy and will be pushed to GitHub soon.

Class labels are somehow swapped in the LASSO module, so that prediction scores are 1 - p instead of p (the posterior probability). Consequently, precision-recall curves are incorrect, while ROC curves are unaffected. This appears to occur only with recent versions of R and/or the LiblineaR package and will be fixed with high priority.
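One plausible explanation for why ROC curves survive the 1 - p swap while precision-recall curves do not: if the ROC code chooses the curve direction automatically (pROC, one of SIAMCAT's dependencies, does this by default with `direction = "auto"`, comparing the group medians), flipping every score simply flips the chosen direction. Precision-recall has no such symmetry, because it depends on which samples are ranked first. The Python sketch below assumes this mechanism and uses hypothetical helper names and made-up scores; it has not been checked against SIAMCAT's actual code.

```python
from statistics import median


def roc_auc(scores, labels, cases_higher=True):
    """AUC = P(random case outranks random control); ties count one half."""
    if not cases_higher:
        scores = [-s for s in scores]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_auc_auto(scores, labels):
    """Choose the direction from the group medians, similar to pROC's
    direction = "auto", then compute the AUC."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    return roc_auc(scores, labels, cases_higher=median(neg) <= median(pos))


def average_precision(scores, labels):
    """Mean precision at the rank of each case (cases = positive class)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Made-up posterior probabilities p and true labels (1 = case, -1 = control).
p = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y = [1, 1, -1, 1, -1, 1, -1, -1]
flipped = [1 - s for s in p]  # the buggy scores 1 - p

print(roc_auc(p, y), roc_auc(flipped, y))            # fixed direction: AUC vs 1 - AUC
print(roc_auc_auto(p, y), roc_auc_auto(flipped, y))  # auto direction: identical
print(average_precision(p, y), average_precision(flipped, y))  # these differ
```

With a fixed direction the flipped scores would give 1 - AUC, so an unchanged ROC curve is consistent with automatic direction selection; the average precision, by contrast, changes substantially.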

Contact

Please let me know if you run into any issues (zeller@embl.de).
