
Home_3_0

Léonard Dubois edited this page Jan 19, 2022 · 9 revisions

Pangenome-based Phylogenomic Analysis

PanPhlAn is a strain-level metagenomic profiling tool that identifies the gene composition of individual strains in metagenomic samples. Because it supports strain tracking and functional analysis of unknown pathogens, PanPhlAn is an efficient tool for culture-free epidemiology of infectious outbreaks and for microbial population studies.

For a detailed tutorial with a small example dataset, see the Tutorial page. For help, use the bioBakery help forum.

The PanPhlAn software team: Léonard Dubois, Matthias Scholz (algorithm design), Moreno Zolfo, Thomas Tolio (programmer), and Nicola Segata (principal investigator).

If you use PanPhlAn 3, please cite:

Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. Francesco Beghini, Lauren J McIver, Aitor Blanco-Miguez, Leonard Dubois, Francesco Asnicar, Sagun Maharjan, Ana Mailyan, Andrew Maltez Thomas, Paolo Manghi, Mireia Valles-Colomer, George Weingart, Yancong Zhang, Moreno Zolfo, Curtis Huttenhower, Eric A Franzosa, Nicola Segata. bioRxiv preprint (2020).


Download the PanPhlAn software

PanPhlAn can be installed manually by cloning the GitHub repository

git clone https://github.com/SegataLab/panphlan.git

or retrieved using pip

pip install panphlan

or Bioconda

conda install -c bioconda panphlan

PanPhlAn runs on Ubuntu/Linux and requires the following software to be installed on your system:

  • Python 3 (version 3.8+ is recommended), including the packages:
    • numpy
    • scipy
    • pandas
  • bowtie2 (version 2.4 is recommended)
  • samtools (version 1.11 is recommended)
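Before running the pipeline, it can help to confirm that these prerequisites are actually available. The following is a minimal sketch in plain POSIX shell; `check_tool` is a hypothetical helper, not part of PanPhlAn. It only reports whether each tool is on the PATH (printing the first line of its `--version` output) and whether the Python packages can be imported. It does not enforce the recommended versions.

```shell
# Sketch only: check_tool is a hypothetical helper, not shipped with PanPhlAn.
# It reports presence of each prerequisite; it does NOT check the recommended versions.

# check_tool NAME: print "OK" plus the first line of `NAME --version`, or "MISSING".
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    # stdin from /dev/null so a tool that ignores --version cannot wait for input
    printf 'OK      %s: %s\n' "$1" "$("$1" --version </dev/null 2>&1 | head -n 1)"
  else
    printf 'MISSING %s\n' "$1"
  fi
}

for tool in python3 bowtie2 samtools; do
  check_tool "$tool"
done

# Python packages listed above
for pkg in numpy scipy pandas; do
  if python3 -c "import $pkg" >/dev/null 2>&1; then
    printf 'OK      python package %s\n' "$pkg"
  else
    printf 'MISSING python package %s\n' "$pkg"
  fi
done
```

Any line starting with `MISSING` points to a tool or package to install before continuing.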

The PanPhlAn steps

  1. Download pangenome
    Download the pangenome file and the bowtie2 index files needed for mapping (here for Eubacterium rectale)

    panphlan_download_pangenome.py -i Eubacterium_rectale
    
  2. PanPhlAn mapping

    Map each metagenomic sample against the species pangenome. Example: screen sample01 and sample02 for E. rectale pangenome genes

    panphlan_map.py -p Eubacterium_rectale/Eubacterium_rectale_pangenome.tsv \
                    --indexes Eubacterium_rectale/Eubacterium_rectale \
                    -i sample01.fastq \
                    -o map_results/sample01_erectale.csv

    panphlan_map.py -p Eubacterium_rectale/Eubacterium_rectale_pangenome.tsv \
                    --indexes Eubacterium_rectale/Eubacterium_rectale \
                    -i sample02.fastq \
                    -o map_results/sample02_erectale.csv
    
  3. PanPhlAn profiling
    Merge and process the mapping results to generate the final gene-family presence/absence profile matrix

    panphlan_profiling.py -i map_results/ \
                          --o_matrix result_profile_erectale.tsv \
                          -p Eubacterium_rectale/Eubacterium_rectale_pangenome.tsv \
                          --add_ref
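With more than a few samples, steps 2 and 3 can be scripted. The sketch below wraps exactly the commands and flags shown in the steps above in shell functions. The function names (`check_pangenome`, `map_all`, `profile_all`), the `DRY_RUN` switch, the `erectale` output tag and the assumption that inputs are uncompressed `.fastq` files are illustrative choices, not PanPhlAn features. `check_pangenome` assumes step 1 produced the pangenome `.tsv` and a bowtie2 index whose prefix is the value passed to `--indexes` (bowtie2 names index files `<prefix>.1.bt2`, or `<prefix>.1.bt2l` for large indexes).

```shell
# Sketch only: helper names, DRY_RUN and TAG are assumptions, not part of PanPhlAn.
SPECIES=Eubacterium_rectale   # folder and file prefix used in the steps above
TAG=erectale                  # short tag used in output file names
OUTDIR=map_results            # folder later passed to panphlan_profiling.py -i

# Print a command; execute it only when DRY_RUN is empty or unset.
run() {
  printf '+ %s\n' "$*"
  if [ -z "${DRY_RUN:-}" ]; then
    "$@"
  fi
}

# Return non-zero if the step-1 download looks incomplete.
check_pangenome() {
  if [ ! -f "$SPECIES/${SPECIES}_pangenome.tsv" ]; then
    echo "missing $SPECIES/${SPECIES}_pangenome.tsv" >&2
    return 1
  fi
  # bowtie2 index files are named <prefix>.1.bt2 (or .1.bt2l for large indexes)
  if [ ! -f "$SPECIES/$SPECIES.1.bt2" ] && [ ! -f "$SPECIES/$SPECIES.1.bt2l" ]; then
    echo "missing bowtie2 index $SPECIES/$SPECIES.*.bt2" >&2
    return 1
  fi
}

# Step 2 for every FASTQ given as argument; stops at the first failing sample.
map_all() {
  run mkdir -p "$OUTDIR" || return 1
  for fq in "$@"; do
    sample=$(basename "$fq" .fastq)   # sample01.fastq -> sample01
    run panphlan_map.py -p "$SPECIES/${SPECIES}_pangenome.tsv" \
                        --indexes "$SPECIES/$SPECIES" \
                        -i "$fq" \
                        -o "$OUTDIR/${sample}_${TAG}.csv" || return 1
  done
}

# Step 3 on all mapping results in OUTDIR.
profile_all() {
  run panphlan_profiling.py -i "$OUTDIR/" \
                            --o_matrix "result_profile_${TAG}.tsv" \
                            -p "$SPECIES/${SPECIES}_pangenome.tsv" \
                            --add_ref
}
```

After sourcing these functions, a full run could look like the line below; setting `DRY_RUN=1` beforehand only prints the commands, which is a cheap way to check file names first.

    check_pangenome && map_all sample01.fastq sample02.fastq && profile_all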
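For a first look at the profile matrix, the number of gene families detected in each column can be counted with awk. This sketch assumes a layout for `result_profile_erectale.tsv` that you should confirm against your own file: tab-separated, a header row with the sample names (and, with `--add_ref`, the reference genome names), gene-family IDs in the first column, and 1 for present / 0 for absent. `count_present` is a hypothetical helper, not a PanPhlAn command.

```shell
# Sketch only: count_present is hypothetical; the matrix layout is an assumption
# (header row = column names, first column = gene-family IDs, 1 = present, 0 = absent).
count_present() {
  awk -F'\t' '
    NR == 1 { ncol = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
            { for (i = 2; i <= NF; i++) if ($i == 1) n[i]++ }
    END     { for (i = 2; i <= ncol; i++) printf "%s\t%d\n", name[i], n[i] }
  ' "$1"
}
```

Each output line holds a column name and its count of present gene families:

    count_present result_profile_erectale.tsv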