Population-based survey linking gut microbiome to economic development and metabolic syndrome
Analysis pipeline for: Population-based survey linking gut microbiome to economic development and metabolic syndrome
Author: Hui-Min Zheng, Pan Li, Xian Wang and Yan He
Last update: 2017-04-18
Email: 328093402@qq.com
1 Environment
1.1 System
1.1.1 System Platform
1.1.2 Hardware
1.2 R
1.2.1 R version
1.2.2 R libraries
1.3 MaAsLin
1.4 Qiime
1.4.1 Qiime version
1.4.2 System information
1.4.3 QIIME default reference information
1.4.4 QIIME config values
1.5 BBMap
2 Data
2.1 original sequences
2.2 metadata
3 Scripts
3.1 Perl Scripts
3.2 R Scripts
4 Supplementary files
4.1 metadata_category.txt
4.2 taxa.list
5 Direction for use
5.1 Configuring the system environment files and variables
5.2 Location of the files
5.3 modify Relate_metadata_with_microbiota.pl
5.4 Run pipeline
6 How to get the data?
6.1 How to get the original sequences?
6.2 How to get the metadata?
Platform: Linux2
Version: Linux version 2.6.32-573.8.1.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16)(GCC))
OS: CentOS release 6.7 (Final)
Cpu(s): >10
thread: >10
RAM: >10G
Hard disk: >2T
Version: R version 3.2.5(Linux)(2016-04-14)(www.r-project.org)
Copyright: (C) 2016 The R Foundation for Statistical Computing
ggplot2
psych
reshape
agricolae
gam
gamlss
gbm
glmnet
inlinedocs
logging
MASS
nlme (version 3.1-127)
optparse
outliers
penalized
pscl
robustbase
(https://bitbucket.org/biobakery/maaslin/)
Package: Maaslin
Type: Package
Title: Maaslin
Version: 0.0.3
Imports: agricolae, gam, gamlss, gbm, glmnet, inlinedocs, logging, MASS, nlme, optparse, outliers, penalized, pscl, robustbase
Date: 2014-12-04
Author: Timothy Tickle<ttickle@hsph.harvard.edu>, Curtis Huttenhower <chuttenh@hsph.harvard.edu>
Maintainer: Timothy Tickle <ttickle@hsph.harvard.edu>,Ayshwarya
Subramanian<subraman@broadinstitute.org>,Lauren
McIver<lauren.j.mciver@gmail.com>,George
Weingart<george.weingart@gmail.com>
Description: MaAsLin is a multivariate statistical framework that finds associations between clinical metadata and microbial community abundance or function. The clinical metadata can be of any type continuous (for example age and weight), boolean (sex, stool/biopsy), or discrete/factor (cohort groupings and phenotypes). MaAsLin is best used in the case when you are associating many metadata with microbial measurements. When this is the case each metadatum can be a diffrent type. For example, you could include age, weight, sex, cohort and phenotype in the same input file to be analyzed in the same MaAsLin run. The microbial measurements are expected to be normalized before using MaAsLin and so are proportional data ranging from 0 to 1.0
License: MIT + file LICENSE
VignetteBuilder: knitr
Suggests: knitr, BiocStyle, BiocGenerics
biocViews: Statistics, Metagenomics, Bioinformatics, Software
Packaged: 2015-04-02 00:28:13 UTC; gweingart
Version: qiime 1.9.1
Platform: Linux2
Python version: 2.7.10 (default, Dec 4 2015, 15:36:19) [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)]
Python executable: /usr/local/bin/python
For details on what files are used as QIIME's default references, see here:
https://github.com/biocore/qiime-default-reference/releases/tag/0.1.3
QIIME library version: 1.9.1
QIIME script version: 1.9.1
qiime-default-reference version: 0.1.3
NumPy version: 1.11.0
SciPy version: 0.17.1
pandas version: 0.17.1
matplotlib version: 1.4.3
biom-format version: 2.1.5
qcli version: 0.1.1
pyqi version: 0.3.2
scikit-bio version: 0.2.3
PyNAST version: 1.2.2
Emperor version: 0.9.51
burrito version: 0.9.1
burrito-fillings version: 0.1.1
sortmerna version: SortMeRNA version 2.0, 29/11/2014
sumaclust version: SUMACLUST Version 1.0.00
swarm version: Swarm 1.2.19 [Dec 5 2015 16:48:11]
gdata: Installed.
For definitions of these settings and to learn how to configure QIIME, see here:
http://qiime.org/install/qiime_config.html
http://qiime.org/tutorials/parallel_qiime.html
QIIME config values
For definitions of these settings and to learn how to configure QIIME, see here:
http://qiime.org/install/qiime_config.html
http://qiime.org/tutorials/parallel_qiime.html
blastmat_dir: None
pick_otus_reference_seqs_fp: /usr/local/lib/python2.7/sitepackages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
python_exe_fp: python
sc_queue: all.q
topiaryexplorer_project_dir: None
pynast_template_alignment_fp: /usr/local/data/core_set_aligned.fasta.imputed
cluster_jobs_fp: None
pynast_template_alignment_blastdb: None
assign_taxonomy_reference_seqs_fp: /usr/local/lib/python2.7/sitepackages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
torque_queue: friendlyq
qiime_test_data_dir: None
template_alignment_lanemask_fp: /usr/local/data/lanemask_in_1s_and_0s.txt
jobs_to_start: 1
slurm_time: None
cloud_environment: False
qiime_scripts_dir: /usr/local/bin
denoiser_min_per_core: 50
working_dir: None
assign_taxonomy_id_to_taxonomy_fp: /usr/local/lib/python2.7/sitepackages/qiime_default_reference/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt
temp_dir: /tmp/
slurm_memory: None
slurm_queue: None
blastall_fp: blastall
seconds_to_sleep: 2
BBTools bioinformatics tools, including BBMap.
Author: Brian Bushnell, Jon Rood
Language: Java
Version 36.32
format of sequences : fastq
Number of fq files: 36
fq_filenames: 1.Clean_FCHVJVMBCXX_L1_wHAXPI034525-109_1.fq
1.Clean_FCHVJVMBCXX_L1_wHAXPI034525-109_2.fq
1.Clean_FCHVJVMBCXX_L2_wHAXPI034525-109_1.fq
1.Clean_FCHVJVMBCXX_L2_wHAXPI034525-109_2.fq
1.Clean_FCHVTWCBCXX_L1_wHAXPI034526-108_1.fq
1.Clean_FCHVTWCBCXX_L1_wHAXPI034526-108_2.fq
1.Clean_FCHVTWCBCXX_L2_wHAXPI034526-108_1.fq
1.Clean_FCHVTWCBCXX_L2_wHAXPI034526-108_2.fq
2.Clean_FCHVJVMBCXX_L1_wHAXPI034525-109_1.fq
2.Clean_FCHVJVMBCXX_L1_wHAXPI034525-109_2.fq
2.Clean_FCHVJVMBCXX_L2_wHAXPI034525-109_1.fq
2.Clean_FCHVJVMBCXX_L2_wHAXPI034525-109_2.fq
2.Clean_FCHVTWCBCXX_L1_wHAXPI034526-108_1.fq
2.Clean_FCHVTWCBCXX_L1_wHAXPI034526-108_2.fq
2.Clean_FCHVTWCBCXX_L2_wHAXPI034526-108_1.fq
2.Clean_FCHVTWCBCXX_L2_wHAXPI034526-108_2.fq
3.Clean_FCHVJVMBCXX_L1_wHAXPI034525-109_1.fq
3.Clean_FCHVJVMBCXX_L1_wHAXPI034525-109_2.fq
3.Clean_FCHVJVMBCXX_L2_wHAXPI034525-109_1.fq
3.Clean_FCHVJVMBCXX_L2_wHAXPI034525-109_2.fq
3.Clean_FCHVTWCBCXX_L1_wHAXPI034526-108_1.fq
3.Clean_FCHVTWCBCXX_L1_wHAXPI034526-108_2.fq
3.Clean_FCHVTWCBCXX_L2_wHAXPI034526-108_1.fq
3.Clean_FCHVTWCBCXX_L2_wHAXPI034526-108_2.fq
4.Clean_FCHVJVMBCXX_L1_wHAXPI034525-109_1.fq
4.Clean_FCHVJVMBCXX_L1_wHAXPI034525-109_2.fq
4.Clean_FCHVJVMBCXX_L2_wHAXPI034525-109_1.fq
4.Clean_FCHVJVMBCXX_L2_wHAXPI034525-109_2.fq
4.Clean_FCHVTWCBCXX_L1_wHAXPI034526-108_1.fq
4.Clean_FCHVTWCBCXX_L1_wHAXPI034526-108_2.fq
4.Clean_FCHVTWCBCXX_L2_wHAXPI034526-108_1.fq
4.Clean_FCHVTWCBCXX_L2_wHAXPI034526-108_2.fq
5.Clean_FCHVJVMBCXX_L1_wHAXPI034525-109_1.fq
5.Clean_FCHVJVMBCXX_L1_wHAXPI034525-109_2.fq
5.Clean_FCHVJVMBCXX_L2_wHAXPI034525-109_1.fq
5.Clean_FCHVJVMBCXX_L2_wHAXPI034525-109_2.fq
Filename: Additional file 2: Table S1
Header: #SampleID county_level_code age gender Bristol_stool_type fam_income_year_avg fam_spend_year_avg
MetS anthrop_waist anthrop_SBP anthrop_DBP biochem_FBG biochem_TG biochem_HDL
Row.names: 6896 SampleIDs
Function: Pipline of preprocessing, this script performs all processing steps through building the OTU table with several pair of fastq file.
Last updata: 2016-09-18
Author: Huimin Zheng
Function: Pipline of geting main results of the paper-Population-based survey linking gut microbiome to economic development and metabolic syndrome.
Last updata: 2017-04-18
Author: Huimin Zheng
Function: This script performs all processing steps through building the OTU table with one pair of fastq file.
Location: Called by the pipeline--Preprocessing.pl
Last updata: 2016-08-08
Author: Yan He
Function: Trim the fastq file to 200bp, this can reduce the computational burden while using enough information to do overlapping
Location: Called by the script--Illumina_pairend_preprocessing.pl
Last updata: 2016-08-08
Author: Hua-Fang Sheng
Function: Do library splitting, as barcodes on both ends is not quite supported by QIIME at the moment (QIIME 1.9.1)
Location: Called by the script--Illumina_pairend_preprocessing.pl
Last updata: 2016-08-08
Author: Yan He
Function: merge the alpha diversity file and metadata file.
Location: Called by the pipeline--Relate_metadata_with_microbiota.pl
Last updata: 2016-08-08
Author: Yan He
Function: Examined the significance using adonis analysis on various subsampling sizes with specific replications at each size.
Location: Called by the pipeline--Relate_metadata_with_microbiota.pl
Last updata: 2017-04-16
Author: Huimin Zheng
Function: Do adonis analysis on all variables supplied.
Location: Called by the script--adonis_dilution_curve.pl
Last updata: 2016-08-08
Author: Yan He
Function: Collect the adonis analysis results of script adonis_dilution_curve.pl and plot.
Location: Called by the script--Illumina_pairend_preprocessing.pl
Last updata: 2017-04-16
Author: Huimin Zheng
Function: Do MaAsLin analysis
Location: Called by the pipeline--Relate_metadata_with_microbiota.pl
Last updata: 2016-08-08
Author: Yan He
Function: Add relative abundance of taxa in taxa.list to metadata file.
Location: Called by the pipeline--Relate_metadata_with_microbiota.pl
Last updata: 2017-04-16
Author: Huimin Zheng
Function: Collapse samples in metadata file. Values in the metadata file are collapsed by means of continuous variables and proportion of categorical variables for each group.
Location: Called by the pipeline--Relate_metadata_with_microbiota.pl
Last updata: 2017-04-16
Author: Huimin Zheng
Function: Calculate MetS incidences between quartiles and plot
Location: Called by the pipeline--Relate_metadata_with_microbiota.pl
Last updata: 2017-04-16
Author: Pan Li
Supplementary file of maaslin_and_cytoscape.otu.pl
Supplementary file of add_taxa_to_map.pl
Configuring the system environment files and variables based on the (1) Environment
put the scripts, bbmap folder and supplementary files in the same path
put all fastq files in the same path
line 39: get the path of 97_otus.fasta, such as /usr/local/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
line 51: get the path of 97_otus.fasta, such as /usr/local/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
perl Preprocessing.pl <fq_dir> <metadata.list> <threads> <output_dir>
<fq_dir>: Path to the folder containing all fastq files.
<metadata.list>: Path to file listing path to metadata file.
<threads>: Specify number of threads.
<output_dir>: The output directory.
# nohup perl Preprocessing.pl <fq_dir> <metadata.list> <threads> <output_dir> > Preprocessing.log 2>&1 &
perl Relate_metadata_with_microbiota.pl <otu_table.biom> <metadata> <output_dir>
<otu_table.biom>: The input otu table filepath in biom format.
<metadata>: Path to the metadata file.
<output_dir>: The output directory.
#nohup perl Relate_metadata_with_microbiota.pl <otu_table.biom> <metadata> <output_dir> > Preprocessing.log 2>&1 &
The 16S gene sequencing reads of GGMP have been deposited in EBI under accession PRJEB18535.
Please contact Prof. Hong-Wei Zhou(biodegradation@gmail.com) to get a application form for access to the GGMP metadata.