Sung-Bong-Kang/hGMNet

Automatic pipeline of microbiome PheWAS for the discovery of host-microbe interaction networks

hGMNet : host Genetics and Microbe interaction Networks

Assays that measure changes in microbial abundance against a single genotype do not take into account the interactions between bacteria and host genetics.

Therefore, we have created a tool to discover new host genetics and microbe interaction networks using microbiome PheWAS.

1.Required Libraries and Tools

Python 2.7 version from https://www.python.org/download/releases/2.7

1.1.python libraries :

matplotlib : pip install matplotlib

pandas 0.23.4 : pip install pandas==0.23.4

numpy 1.15.4 : pip install numpy==1.15.4

seaborn 0.9.0 : pip install seaborn==0.9.0

networkx 2.2 : pip install networkx==2.2

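metagenomeSeq (R package, for CSS normalization) : in R, run source("http://bioconductor.org/biocLite.R") and then biocLite("metagenomeSeq")

scikit-learn (imported as sklearn) : pip install scikit-learn

math : part of the Python standard library, no installation needed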

1.2.analysis tools :

Plink 1.9 version for genotype analysis from https://www.cog-genomics.org/plink2

After downloading plink, link the executable into your PATH with the following command:

sudo ln -s /absolute/Path/of/plink /usr/local/bin/plink
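
A quick way to confirm that the symbolic link works is to look the executable up on the PATH. The sketch below is only an illustration and is not part of hGMNet; the helper name `find_plink` is hypothetical, and it uses `shutil.which`, which needs Python 3.3 or later (hGMNet itself targets Python 2.7).

```python
import shutil


def find_plink(name="plink"):
    """Return the absolute path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_plink()
    if path is None:
        print("plink not found on PATH; check the symbolic link")
    else:
        print("plink found at", path)
```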

R version 3.4 or higher

db19_20k.gz for Gene mode from https://drive.google.com/open?id=1hEUdViceUQIO-_-zSShxUqW6W4qashXu

2.How to use ?

2.1.Options description

2.1.1.Required Options

--DIR : path of your Plink file format data

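--Input_prefix : prefix of the Plink files (.bed, .bim, .fam)

--OTU_ID : file name of the OTU table

--OTU_DIR : path of the directory containing the OTU file (used in the analysis examples in section 2.2)

--Bacterial_class : bacterial taxonomic level, one of Species(S), Genus(G), Family(F), Order(O), Class(C), Phylum(P)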

--Analysis : analysis mode, either Linear or NMF (non-negative matrix factorization). Logistic is not yet available.
see http://zzz.bwh.harvard.edu/plink/anal.shtml

--P_cut : P-value cutoff for a single SNP, based on the Wald test of the linear quantitative trait association.
see http://zzz.bwh.harvard.edu/plink/anal.shtml (.qassoc output)

--P_count : number of bacteria that must reach the --P_cut significance level for a SNP to be reported. This is to find SNPs that control several bacteria.
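
How --P_cut and --P_count might work together can be sketched as follows. This is an illustration under stated assumptions, not hGMNet's actual code: the function `select_snps`, the tuple layout, and the toy rows are made up, and it assumes a SNP is kept when at least P_count bacteria have a P-value below P_cut.

```python
from collections import defaultdict


def select_snps(assoc_rows, p_cut, p_count):
    """Keep SNPs that are significant (p < p_cut) for at least p_count bacteria.

    assoc_rows: iterable of (snp_id, bacterium, p_value) tuples (assumed layout).
    Returns a dict mapping each kept SNP to its sorted list of bacteria.
    """
    hits = defaultdict(set)
    for snp, bacterium, p_value in assoc_rows:
        if p_value < p_cut:
            hits[snp].add(bacterium)
    return {snp: sorted(names) for snp, names in hits.items()
            if len(names) >= p_count}


# Made-up association results, for illustration only.
rows = [
    ("rs1", "Bacteroides", 1e-7),
    ("rs1", "Prevotella", 3e-6),
    ("rs1", "Roseburia", 0.2),
    ("rs2", "Bacteroides", 4e-6),
    ("rs3", "Prevotella", 0.5),
]
selected = select_snps(rows, p_cut=5e-6, p_count=2)
print(selected)
```

With these toy rows only rs1 is kept, because it is the only SNP below the cutoff for two bacteria.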

2.1.2.Optional Options

--PHEWAS_image_mode : Y or YES draws the PheWAS results as an image like fig.1. default None

--NMF_K : If the --Analysis option is NMF, set the number of NMF components K

--Gene_mode : Y or YES : SNPs within the 20kb gene region are labeled by gene name. default None
requires db19_20k.gz from https://drive.google.com/open?id=1hEUdViceUQIO-_-zSShxUqW6W4qashXu

--Cov : covariate file name. Only the plink covariate format is supported. (requires --Cov_names)

--Cov_names : comma-separated covariate names such as age,sex,bmi, etc. (requires --Cov)

--Norm : normalization of the operational taxonomic unit (OTU) table. Choose TSS or CSS. (default TSS)

--Corr : analysis method for the SNP-SNP beta correlation (default pearson; bray-curtis, etc.) *option not yet available

--Corr_cut : correlation coefficient cutoff *option not yet available
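
For the default --Norm setting, total sum scaling (TSS) divides each sample's counts by that sample's total, giving relative abundances. The sketch below is a minimal illustration, not hGMNet's implementation; the function name `tss_normalize`, the layout (rows are taxa, columns are samples), and the toy table are assumptions.

```python
import numpy as np


def tss_normalize(counts):
    """Total sum scaling: divide each sample's counts by that sample's total.

    counts: 2-D array with taxa as rows and samples as columns (assumed layout).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)
    # Samples with no reads stay all-zero instead of causing a division by zero.
    safe_totals = np.where(totals == 0, 1.0, totals)
    return counts / safe_totals


# Made-up OTU table: 3 taxa x 3 samples (the second sample has no reads).
otu = np.array([[10, 0, 5],
                [30, 0, 5],
                [60, 0, 10]])
rel = tss_normalize(otu)
print(rel)
```

CSS normalization is a different procedure, handled by the metagenomeSeq R package listed in section 1, and is not covered by this sketch.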

2.2.Analysis Examples

2.2.1.Analysis example 1 : make PheWAS figures, no covariates, bacterial class : Family

/downloaded/hGMNet/Path/hGMNet.sh \
--OTU_ID your_OTU.txt \
--Bacterial_class F \
--OTU_DIR /your/OTU/path \
--Input_prefix plink_file_id \
--DIR /your/plink/.bed.bim.fam/path \
--Analysis Linear \
--P_cut 5e-6 \
--P_count 10 \
--PHEWAS_image_mode Y

2.2.2.Analysis example 2 : no PheWAS figures, covariates, bacterial class : Species

/downloaded/hGMNet/Path/hGMNet.sh \
--OTU_ID your_OTU.txt \
--Bacterial_class S \
--OTU_DIR /your/OTU/path \
--Input_prefix plink_file_id \
--DIR /your/plink/.bed.bim.fam/path \
--Analysis Linear \
--P_cut 5e-6 \
--P_count 4 \
--Cov /covariate/path/covariate.txt \
--Cov_names age,sex,bmi,.etc
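
The covariate file passed to --Cov must be in plink covariate format: FID and IID in the first two columns, followed by one column per covariate, with a header row so covariates can be selected by name. The sketch below writes such a file from made-up sample records; the helper name `write_plink_covariates` and the sample values are hypothetical.

```python
import csv
import io


def write_plink_covariates(handle, samples, cov_names):
    """Write a tab-delimited plink covariate table with a header row.

    samples: list of dicts holding FID, IID and one value per covariate name.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["FID", "IID"] + list(cov_names))
    for sample in samples:
        writer.writerow([sample["FID"], sample["IID"]]
                        + [sample[name] for name in cov_names])


# Made-up samples, for illustration only.
samples = [
    {"FID": "fam1", "IID": "ind1", "age": 34, "sex": 1, "bmi": 22.5},
    {"FID": "fam2", "IID": "ind2", "age": 51, "sex": 2, "bmi": 27.1},
]
buf = io.StringIO()
write_plink_covariates(buf, samples, ["age", "sex", "bmi"])
print(buf.getvalue())
```

The header names are the ones to list in --Cov_names, here age,sex,bmi.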

2.2.3.Analysis example 3 : Non-negative matrix factorization (NMF) analysis, no PheWAS image

/downloaded/hGMNet/Path/hGMNet.sh \
--OTU_ID your_OTU.txt \
--Bacterial_class F \
--OTU_DIR /your/OTU/path \
--Input_prefix plink_file_id \
--DIR /your/plink/.bed.bim.fam/path \
--Analysis NMF \
--NMF_K 8 \
--P_cut 5e-6 \
--P_count 1
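
To illustrate what --NMF_K controls, the sketch below factors a non-negative matrix V into W (samples x K) and H (K x taxa) with K = 8. It uses the classic multiplicative updates for the Frobenius loss, which is the beta = 2 special case of the beta-divergence family in reference [4]. This is a swapped-in teaching example, not hGMNet's code: the function name, the matrix orientation (samples as rows), and the random data are assumptions.

```python
import numpy as np


def nmf_multiplicative(V, k, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative V (m x n) into W (m x k) and H (k x n).

    Uses multiplicative updates that minimize the Frobenius norm ||V - WH||.
    """
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Strictly positive random start so the updates never get stuck at zero.
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Made-up abundance matrix: 20 samples x 12 taxa.
rng = np.random.default_rng(1)
V = rng.random((20, 12))
W, H = nmf_multiplicative(V, k=8)
error = np.linalg.norm(V - W @ H)
print(W.shape, H.shape, error)
```

Each of the K columns of W is a per-sample component score; under this reading, such scores would be the quantitative traits tested against SNPs in NMF mode, though the README does not spell this out.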

2.2.4.Analysis example 4 : CSS normalization, NMF analysis, covariates

/downloaded/hGMNet/Path/hGMNet.sh \
--OTU_ID your_OTU.txt \
--Bacterial_class G \
--OTU_DIR /your/OTU/path \
--Input_prefix plink_file_id \
--DIR /your/plink/.bed.bim.fam/path \
--Analysis NMF \
--Norm CSS \
--NMF_K 8 \
--P_cut 5e-6 \
--P_count 1 \
--Cov /covariate/path/covariate.txt \
--Cov_names age,sex,bmi,.etc

3.Download and Example run

./ is the downloaded directory

Program Download :

git clone https://github.com/Sung-Bong-Kang/hGMNet.git

Initial Setup :

bash ./Setup.sh

Example run :

./SetUP_example/Example_run.sh

Results figure

[fig.1 microbiome PheWAS image mode result]

[fig.2 Bacteria and host genotype interaction network]

4.Reference

[1]

[2] Cronin, Robert M.; Field, Julie R.; Bradford, Yuki; Shaffer, Christian M.; Carroll, Robert J.; Mosley, Jonathan D.; Bastarache, Lisa; Edwards, Todd L.; Hebbring, Scott J. (2014). "Phenome-wide association studies demonstrating pleiotropy of genetic variants within FTO with and without adjustment for body mass index". Frontiers in Genetics. 5: 250. doi:10.3389/fgene.2014.00250. ISSN 1664-8021. PMC 4134007. PMID 25177340.

[3] Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ & Sham PC (2007) PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics, 81.

[4] Fevotte, C., & Idier, J. (2011). Algorithms for nonnegative matrix factorization with the beta-divergence. Neural Computation, 23(9).