GMWI2: Gut Microbiome Wellness Index 2

Description

GMWI2 (Gut Microbiome Wellness Index 2) is a robust and biologically interpretable predictor of health status based on gut microbiome taxonomic profiles.

On a stool metagenome sample, this command-line tool performs four major steps:

1. Quality control
   a. Removal of overrepresented sequences (probable adapter sequences) detected with fastqc
   b. Removal of human DNA contaminants (reads that map to GRCh38/hg38) using Bowtie2
   c. Removal of adapter sequences and low-quality reads using Trimmomatic
2. Taxonomic profiling using MetaPhlAn3 (v3.0.13) with the mpa_v30_CHOCOPhlAn_201901 marker database
3. Transformation of taxonomic relative abundances into a binary presence/absence profile
4. Computation of the GMWI2 score using a Lasso-penalized logistic regression model trained on a meta-dataset of 8,069 health-status-labeled stool shotgun metagenomes
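As a rough illustration of the final step (a generic description of Lasso logistic regression, not taken from the GMWI2 source), a fitted model scores a sample with a linear predictor over its binary features. Assuming the reported GMWI2 value is this log-odds of the "healthy" class, with x_j ∈ {0, 1} the presence of taxon j, β_j its fitted coefficient, β_0 the intercept, and λ the L1 penalty strength:

```latex
\mathrm{GMWI2}(\mathbf{x}) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j,
\qquad
\Pr(\text{healthy} \mid \mathbf{x}) = \sigma\big(\mathrm{GMWI2}(\mathbf{x})\big) = \frac{1}{1 + e^{-\mathrm{GMWI2}(\mathbf{x})}}

(\hat\beta_0, \hat{\boldsymbol\beta}) = \arg\min_{\beta_0,\,\boldsymbol\beta}
  -\sum_{i=1}^{n} \Big[ y_i \log \sigma(\eta_i) + (1 - y_i) \log\big(1 - \sigma(\eta_i)\big) \Big]
  + \lambda \lVert \boldsymbol\beta \rVert_1,
\qquad \eta_i = \beta_0 + \mathbf{x}_i^{\top} \boldsymbol\beta
```

Because the L1 penalty drives many β_j to exactly zero, only taxa with nonzero coefficients can change the score; here n would be the 8,069 training metagenomes.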

If you use GMWI2, please cite:

Gut Microbiome Wellness Index 2 for Enhanced Health Status Prediction from Gut Microbiome Taxonomic Profiles. Chang and Gupta et al. bioRxiv (2023).

System requirements

GMWI2 is supported for macOS and Linux, and has been tested on the following systems:

macOS Big Sur 11.7.10
CentOS Linux 7 (Core)

Installation

To avoid dependency conflicts, create an isolated conda environment and install the GMWI2 package into it. Installing via conda/mamba automatically installs GMWI2 and its dependencies. Make sure to perform the fourth step below (downloading and installing the databases)! Installation should take ~30 minutes.

Create new conda environment and install mamba

conda create --name gmwi2_env -c conda-forge mamba python=3.8

Activate environment

conda activate gmwi2_env

Install GMWI2 package with mamba

mamba install -c bioconda -c conda-forge gmwi2=1.5
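Once the package is installed, a quick way to confirm that the main executables are on PATH is a small shell check like the one below. It is a sketch, not part of GMWI2: the helper name check_tools is hypothetical, and the executable names (bowtie2, metaphlan, trimmomatic, fastqc) are assumed to match the bioconda packages pulled in as dependencies.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GMWI2): report executables missing from PATH.
# Returns 0 if every named tool is found, 1 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" > /dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example, inside the activated gmwi2_env environment:
# check_tools gmwi2 bowtie2 metaphlan trimmomatic fastqc && gmwi2 -v
```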

Download and install the databases (and verify that the package was installed correctly) by running GMWI2 on a tiny simulated stool metagenome. GMWI2 automatically installs its databases during the first run, which should take ~20 minutes. To avoid problems with the database download, run this step before submitting multiple concurrent batch jobs.

# download the tiny stool metagenome
wget https://raw.githubusercontent.com/danielchang2002/GMWI2/main/example/tiny/tiny_1.fastq
wget https://raw.githubusercontent.com/danielchang2002/GMWI2/main/example/tiny/tiny_2.fastq

gmwi2 -f tiny_1.fastq -r tiny_2.fastq -n 16 -o tiny
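After the tiny run finishes, the three output files listed under Usage should exist for the prefix tiny. The sketch below checks for them; the helper name check_gmwi2_outputs is hypothetical, and the file suffixes are taken from the example output listing in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a GMWI2 run produced its three output files.
# Usage: check_gmwi2_outputs PREFIX
check_gmwi2_outputs() {
  local prefix="$1" status=0 suffix
  for suffix in _GMWI2.txt _GMWI2_taxa.txt _metaphlan.txt; do
    if [ -f "${prefix}${suffix}" ]; then
      echo "ok:      ${prefix}${suffix}"
    else
      echo "missing: ${prefix}${suffix}"
      status=1
    fi
  done
  return "$status"
}

# Example, after the tiny run above:
# check_gmwi2_outputs tiny && cat tiny_GMWI2.txt
```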

Usage

Try downloading and running GMWI2 on a real example stool metagenome from the pooled dataset used to develop GMWI2 (should take ~20 minutes).

Input: Two (forward/reverse) raw fastq (or fastq.gz) files generated from paired-end stool metagenome reads
Output: The GMWI2 (Gut Microbiome Wellness Index 2) score

usage: gmwi2 [-h] -n NUM_THREADS -f FORWARD -r REVERSE -o OUTPUT_PREFIX [-v]

* Example usage:

$ ls
.
├── forward.fastq
└── reverse.fastq

$ gmwi2 -f forward.fastq -r reverse.fastq -n 8 -o output_prefix

$ ls
.
├── forward.fastq
├── reverse.fastq
├── output_prefix_GMWI2.txt
├── output_prefix_GMWI2_taxa.txt
└── output_prefix_metaphlan.txt

The three output files are: 
(i) output_prefix_GMWI2.txt: GMWI2 score
(ii) output_prefix_GMWI2_taxa.txt: A list of the taxa present in the sample used to compute GMWI2
(iii) output_prefix_metaphlan.txt: Raw MetaPhlAn3 taxonomic profiling output
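The presence/absence transformation can be approximated from output_prefix_metaphlan.txt with awk. This is a sketch only, not GMWI2's implementation: it assumes the standard MetaPhlAn3 tab-separated layout (comment lines starting with #, clade name in column 1, relative abundance in column 3) and treats any clade whose relative abundance exceeds a hypothetical threshold (default 0) as present. GMWI2's own cutoff and feature set may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "clade<TAB>1" for every clade in a MetaPhlAn3
# profile whose relative abundance is above THRESHOLD (default 0).
# Usage: metaphlan_presence PROFILE [THRESHOLD]
metaphlan_presence() {
  local profile="$1" threshold="${2:-0}"
  awk -F '\t' -v t="$threshold" '
    /^#/ { next }                        # skip MetaPhlAn header/comment lines
    NF >= 3 && ($3 + 0) > (t + 0) {      # column 3 = relative abundance
      print $1 "\t1"                     # column 1 = clade name; 1 = present
    }
  ' "$profile"
}

# Example:
# metaphlan_presence output_prefix_metaphlan.txt > output_prefix_presence.tsv
```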

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

required named arguments:
  -n NUM_THREADS, --num_threads NUM_THREADS
                        number of threads
  -f FORWARD, --forward FORWARD
                        forward-read of metagenome (.fastq/.fastq.gz)
  -r REVERSE, --reverse REVERSE
                        reverse-read of metagenome (.fastq/.fastq.gz)
  -o OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX
                        prefix to designate output file names
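For many samples, a loop over paired files can generate one gmwi2 command per sample. The sketch below only prints the commands (a dry run) so they can be reviewed or handed to a job scheduler. It assumes a hypothetical naming convention of SAMPLE_1.fastq.gz / SAMPLE_2.fastq.gz in a single directory, and the helper name gmwi2_commands is made up for illustration. As noted under Installation, download the databases before launching concurrent jobs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one gmwi2 command per SAMPLE_1/SAMPLE_2 fastq.gz pair.
# Usage: gmwi2_commands DIR [THREADS]
gmwi2_commands() {
  local dir="$1" threads="${2:-8}" fwd rev sample
  for fwd in "$dir"/*_1.fastq.gz; do
    [ -e "$fwd" ] || continue                  # no matches: glob stays literal
    rev="${fwd%_1.fastq.gz}_2.fastq.gz"        # expected reverse-read file
    if [ ! -e "$rev" ]; then
      echo "skipping $fwd: no matching $rev" >&2
      continue
    fi
    sample="$(basename "$fwd" _1.fastq.gz)"    # sample name becomes the output prefix
    # %q shell-quotes paths so the printed commands survive spaces
    printf 'gmwi2 -f %q -r %q -n %s -o %q\n' "$fwd" "$rev" "$threads" "$sample"
  done
}

# Review the commands, then run them one after another:
# gmwi2_commands /path/to/reads 16 > jobs.txt && bash jobs.txt
```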

To merge the GMWI2 score files from multiple samples into a single CSV file, run the following in the directory that contains the *GMWI2.txt files:

echo "Sample,GMWI2" > merged.csv && for file in *GMWI2.txt; do echo "$(basename "$file" | awk -F "_GMWI2.txt" '{print $1}'),$(cat "$file")" >> merged.csv; done
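A more readable variant of the merge one-liner is sketched below, with the rows additionally sorted by score. The helper name merge_gmwi2 is hypothetical; like the one-liner, it assumes each *_GMWI2.txt file holds a single score and that the sample name is the part of the file name before _GMWI2.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge SAMPLE_GMWI2.txt files in DIR into one CSV,
# with rows sorted by GMWI2 score (ascending).
# Usage: merge_gmwi2 DIR OUT.csv
merge_gmwi2() {
  local dir="$1" out="$2" file sample
  echo "Sample,GMWI2" > "$out"
  for file in "$dir"/*_GMWI2.txt; do
    [ -e "$file" ] || continue                 # no score files found
    sample="$(basename "$file" _GMWI2.txt)"    # strip the suffix to get the sample name
    printf '%s,%s\n' "$sample" "$(tr -d '[:space:]' < "$file")"
  done | sort -t, -k2,2g >> "$out"             # general-numeric sort on column 2
}

# Example:
# merge_gmwi2 . merged.csv
```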

Reproducing manuscript results

Please use the colab notebook linked above to reproduce all downstream analyses on the pooled dataset. See the manuscript directory for more details.

Poop on a chip??

The top image was generated via OpenAI DALL·E 2 using the prompt: "3D render of GPU chip in the form of a poop emoji, digital art". The image was then widened using the Runway Infinite Image tool.
