ppanini
PPANINI (Prioritization and Prediction of functional Annotation for Novel and Important genes via automated data Network Integration) is a computational pipeline that ranks genes using a combination of community parameters such as prevalence and abundance across samples. The resulting prioritized list of gene candidates can then be further analyzed using our visualization tools. PPANINI is available as a GitHub repository.
We provide support for PPANINI users via our bioBakery support forum. Please feel free to send any questions to the group by posting directly in the forum.
The easiest way to install PPANINI is with pip:
$ pip install ppanini
After installation from pip, you may optionally test your local PPANINI environment:
$ ppanini_test
Which yields:
test_annotate_genes (basic_tests_annotate_genes.TestAnnotateGenesBasicFunctions) ... ok
test_read_gene_table (basic_tests_ppanini.TestPPANINIBasicFunctions)
Tests the function read_gene_table ... Gene Table contains 2 metadata lines .
Gene Table contains 998 gene or centroid lines.
ok
test_preppanini (basic_tests_preppanini.TestPrePPANINIBasicFunctions) ... ok
test_quantify_genes (basic_tests_quantify_genes.TestQuanitfyGenesBasicFunctions) ... ok
test_create_folders (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
test_is_present (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
test_is_protein (basic_tests_utilities.TestUtilitiesBasicFunctions) ... /Library/Python/2.7/site-packages/biopython-1.66-py2.7-macosx-10.9-intel.egg/Bio/Seq.py:2041: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BiopythonWarning)
ok
test_pullgenes_fromcontigs (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
test_read_fasta (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
test_read_gff3 (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
test_read_ppanini_imp_genes_table (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
test_write_dict (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
test_write_fasta (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
----------------------------------------------------------------------
Ran 13 tests in 3.123s
OK
PPANINI prioritizes important genes, both characterized (at the protein or function level) and uncharacterized, according to their properties in microbial communities. The input file is a gene families abundance table, passed with the -i or --input-table option.
Such tables can be obtained using:
The input file is a table of annotated gene abundances across samples. You can obtain a copy of the demo input by right-clicking this link and selecting "save link as":
* demo_ppanini_gene_families.txt
This file can also contain metadata rows; metadata row names should start with #.
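To make the input layout concrete, here is a minimal Python sketch that separates metadata rows (names starting with #) from gene family rows. It assumes the table is tab-delimited and that its first line is a header of sample names; neither assumption is stated above. The function name `split_gene_table` and the tiny table are hypothetical, and this is not PPANINI's own reader.

```python
import csv
import io

def split_gene_table(handle):
    """Return (header, metadata_rows, gene_rows) from a tab-delimited table.

    Assumption: the first line is a header; every later row whose name
    starts with '#' is treated as metadata, everything else as a gene family.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    metadata, genes = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if row[0].startswith("#"):
            metadata.append(row)
        else:
            genes.append(row)
    return header, metadata, genes

# Made-up table for illustration only (not the demo file).
demo = io.StringIO(
    "ID\tsample1\tsample2\n"
    "#site\tgut\tskin\n"
    "geneA\t0.5\t0.0\n"
    "geneB\t0.1\t0.3\n"
)
header, metadata, genes = split_gene_table(demo)
print(f"{len(metadata)} metadata lines, {len(genes)} gene families")
```

For a real file, pass an open file handle instead of the in-memory `io.StringIO` object.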
To execute PPANINI, you can use the demo input file described above and run the following:
$ ppanini -i demo_ppanini_gene_families.txt -o ppanini_demo_output
Which yields:
--- Reading the gene table...
--- Gene Table contains 1 metadata lines.
--- Gene Table contains 2006 gene families.
--- Summarize gene families table ...
--- Number of centroids: 2006
--- Normalize gene families table ...
--- Getting prevalence abundance ...
--- Mapping UniRef90 to GO terms!
--- Loading mapping file from: /Library/Python/2.7/site-packages/ppanini-0.7.0-py2.7.egg/ppanini/data/map_uniref90_infogo1000.txt.gz
This is a large file, one moment please...
--- Prioritize gene families ...
--- The PPANINI output is written in ...
--- PPANINI process is successfully completed ...
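If you have several gene family tables, the same command can be scripted. The following Python sketch only builds (and optionally runs) the `ppanini -i ... -o ...` command shown in the demo; the helper names `ppanini_command` and `run_all`, the `results` folder, and the `_output` naming scheme are hypothetical. It assumes `ppanini` is on your PATH when `dry_run` is set to False.

```python
import subprocess
from pathlib import Path

def ppanini_command(input_table, output_dir):
    """Build the argument list for one run, using the -i and -o options from the demo."""
    return ["ppanini", "-i", str(input_table), "-o", str(output_dir)]

def run_all(tables, out_root, dry_run=True):
    """Build one command per table; execute them only when dry_run is False."""
    commands = []
    for table in tables:
        # Hypothetical naming scheme: <out_root>/<table name without extension>_output
        out_dir = Path(out_root) / (Path(table).stem + "_output")
        cmd = ppanini_command(table, out_dir)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands

commands = run_all(["demo_ppanini_gene_families.txt"], "results")
for cmd in commands:
    print(" ".join(cmd))
```

Keeping `dry_run=True` by default lets you inspect the commands before anything is executed.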
The output of PPANINI is a list of important gene families based on prevalence, abundance, and PPANINI score. At the end of the analysis, several files are written to the output folder.
The output:
$ ls ppanini_demo_output/*
Which yields:
ppanini_table.txt
temp:
ppanini_abundance_table.txt ppanini_gene_centroids_norm.txt
$ column -t -s $'\t' ppanini_demo_output/ppanini_table.txt | less -S
Which yields:
alpha_prevalence prevalence_percentile mean_abundance abund_percentile beta_prevalence ppanini_score GO
Cluster 3236 0.6 99.4765702891 0.0796948281516 99.850448654 24.9157897073
Cluster 1954 0.6 99.4765702891 0.0130959466023 97.7567298106 24.6522879233
UniRef90_A4K468 0.8 99.7258225324 0.0100382547968 97.0588235294 24.5935625675
UniRef90_K4HN31 1.0 99.9501495513 0.00852018618773 96.3609172483 24.530680432
UniRef90_K4HN67 1.0 99.9501495513 0.00848592460776 96.3110667996 24.524217543
UniRef90_K4HMX4 1.0 99.9501495513 0.00785893181152 95.8624127617 24.4659034607
UniRef90_A4K475 0.8 99.7258225324 0.00775611952695 95.7128614158 24.4195356794
UniRef90_T1R4E7 0.8 99.7258225324 0.00728619760249 95.4636091725 24.3870450964
UniRef90_K4HNF9 0.8 99.7258225324 0.00627188732095 94.2671984048 24.2299281008
UniRef90_T1R5B4 0.8 99.7258225324 0.00541914213714 93.0707876371 24.0708611077
UniRef90_K4HMW9 0.6 99.4765702891 0.00516704969614 92.5722831505 23.9750799516
UniRef90_F4MIK3 0.8 99.7258225324 0.00422832458672 90.4287138584 23.7124973226
UniRef90_U7MMQ4 0.6 99.4765702891 0.00343814449336 89.0329012961 23.4913598488
Cluster 3001 0.4 64.5812562313 0.0705061973561 99.8005982054 19.6044996171
UniRef90_A4K498 0.4 64.5812562313 0.018181925684 98.8035892323 19.5270861709
UniRef90_K4HN57 0.4 64.5812562313 0.017728406802 98.703888335 19.5192928313
UniRef90_K4HNN3 0.4 64.5812562313 0.0142993385429 98.1056829511 19.4723321991
UniRef90_T1R4K8 0.4 64.5812562313 0.0136896074371 98.0059820538 19.4644718306
UniRef90_K4HMX7 0.4 64.5812562313 0.013630794173 97.9561316052 19.4605380301
UniRef90_A4K488 0.4 64.5812562313 0.0133693314862 97.9062811565 19.456601816
UniRef90_U7LYK0 0.4 64.5812562313 0.0126166167897 97.7068793619 19.4408327774
UniRef90_K4HN47 0.4 64.5812562313 0.0111812692374 97.3080757727 19.4091781622
UniRef90_T1R527 0.4 64.5812562313 0.011116165726 97.258225324 19.4052103661
UniRef90_E4GVC2 0.4 64.5812562313 0.0107279490133 97.2083748754 19.4012401249
UniRef90_T1R4G3 0.4 64.5812562313 0.00964232289559 96.8594217348 19.3733797835
UniRef90_K4HMW4 0.4 64.5812562313 0.0091430930091 96.6600199402 19.3574054464
UniRef90_K4HND8 0.4 64.5812562313 0.00899340491187 96.4606181456 19.3413915505
UniRef90_A4K470 0.4 64.5812562313 0.0083952273161 96.2113659023 19.3213183268
UniRef90_A4K482 0.4 64.5812562313 0.00833622449037 96.1615154536 19.3172962118
UniRef90_F4MIF6 0.4 64.5812562313 0.00702192029107 95.1645064806 19.2363267508
UniRef90_F4MIL7 0.4 64.5812562313 0.00697228373128 95.0648055833 19.2281741816
Cluster 116 0.4 64.5812562313 0.0065989482817 94.666001994 19.1954618224
UniRef90_K4HP14 0.4 64.5812562313 0.00624827679414 94.2173479561 19.1584640209
UniRef90_F4MIH4 0.4 64.5812562313 0.00571372560111 93.4695912263 19.0963342428
UniRef90_T1R508 0.4 64.5812562313 0.00519400191157 92.7218344965 19.0336137839
Cluster 1510 0.4 64.5812562313 0.00447300735404 91.0767696909 18.8935076276
UniRef90_Q45122 0.4 64.5812562313 0.00398876503541 89.6809571286 18.772286308
Cluster 2572 0.4 64.5812562313 0.00193082354688 87.8863409771 18.6131689939
UniRef90_F1WXK5 0.4 64.5812562313 0.00152630714186 87.7367896311 18.5997399717
UniRef90_Q4FPR4 0.4 64.5812562313 0.00120935236567 87.5872382851 18.5862845534 GO:0003735
UniRef90_D5V846 0.4 64.5812562313 0.00119936831464 87.5373878365 18.5817935347 GO:0003735
UniRef90_D5V8C4 0.4 64.5812562313 0.00110299850764 87.4376869392 18.572802661 GO:0003735
UniRef90_D5VA18 0.4 64.5812562313 0.00107770647666 87.3878364905 18.5683028003
...
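As a sketch of how this table could be post-processed, the Python snippet below reads it and lists the top-scoring rows that have no GO term. It assumes the file is tab-delimited (as the `column -s $'\t'` call suggests), that the first column holds the gene family name, and that an empty GO cell can serve as a rough proxy for a gene lacking annotation. The function names are hypothetical, and the placement of the example values into columns (with `beta_prevalence` left empty) is an assumption based on the listing above.

```python
import csv
import io

def read_ppanini_table(handle):
    """Read a tab-delimited PPANINI-style table into a list of dicts."""
    rows = [r for r in csv.reader(handle, delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    if not body:
        return []
    width = max(len(r) for r in body)
    # The header may or may not carry a cell for the name column; handle both.
    columns = header if len(header) == width - 1 else header[1:]
    records = []
    for row in body:
        row = row + [""] * (width - len(row))  # pad rows with missing trailing cells
        record = dict(zip(columns, row[1:]))
        record["gene"] = row[0]
        records.append(record)
    return records

def top_without_go(records, n=10):
    """Return the n highest-scoring records whose GO cell is empty."""
    unannotated = [r for r in records if not r.get("GO", "").strip()]
    return sorted(unannotated, key=lambda r: float(r["ppanini_score"]), reverse=True)[:n]

# Three rows copied from the listing above; column placement is assumed.
demo = io.StringIO(
    "alpha_prevalence\tprevalence_percentile\tmean_abundance\tabund_percentile\tbeta_prevalence\tppanini_score\tGO\n"
    "UniRef90_Q4FPR4\t0.4\t64.5812562313\t0.00120935236567\t87.5872382851\t\t18.5862845534\tGO:0003735\n"
    "UniRef90_K4HN31\t1.0\t99.9501495513\t0.00852018618773\t96.3609172483\t\t24.530680432\t\n"
    "Cluster 3236\t0.6\t99.4765702891\t0.0796948281516\t99.850448654\t\t24.9157897073\t\n"
)
records = read_ppanini_table(demo)
top = top_without_go(records)
for r in top:
    print(r["gene"], r["ppanini_score"])
```

To run it on real output, replace the in-memory `demo` object with `open("ppanini_demo_output/ppanini_table.txt")`.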
To plot a summary of the gene characterization in the sample community:
$ cd ppanini_demo_output/
$ ppanini_barplot -i1 temp/ppanini_abundance_table.txt -i2 ppanini_table.txt
Which yields: