
PhyloPhlAn 3: Example 03: Metagenomic application

Katarina Mladenovic edited this page Apr 11, 2024 · 1 revision

Metagenomic analysis of the Ethiopian cohort



Before starting, make sure PhyloPhlAn 3 is installed.

  • Make sure the PhyloPhlAn 3 scripts are executable and available on your PATH
  • The commands in this tutorial assume that you are inside the tutorial folder examples/03_metagenomic
  • All the steps below are reported in the run_03.sh script

By following these steps, you will build two heatmaps of the 20 species-level genome bins (SGBs) most prevalent in 50 Ethiopian metagenomes.

Step 1. Download the Ethiopian genome bins

The Ethiopian metagenomes from which the genome bins for this example were assembled are deposited in NCBI under BioProject accession PRJNA504891.

You can download the archive of Ethiopian genome bins (ethiopian_mags.tar.bz2) with your browser or retrieve it from the console as follows:

wget "https://www.dropbox.com/s/fuafzwj67tguj31/ethiopian_mags.tar.bz2?dl=1" -O ethiopian_mags.tar.bz2
mkdir -p input_metagenomic
tar -xjf ethiopian_mags.tar.bz2 -C input_metagenomic/
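
Steps 2 and 3 pipe their output through tee into a logs folder, and tee does not create missing directories. The following is a minimal sketch, assuming your copy of the example folder may not contain logs/ yet and that the archive unpacks somewhere below input_metagenomic/; the file count is only a sanity check, not part of the original script.

```shell
# tee fails with "No such file or directory" if logs/ is missing,
# so create it up front (-p makes this a no-op when it already exists)
mkdir -p logs

# Optional sanity check: count the files extracted in Step 1
# (assumption: the bins live somewhere below input_metagenomic/)
if [ -d input_metagenomic ]; then
    find input_metagenomic -type f | wc -l
fi
```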

Step 2. Assign a taxonomic label to each bin

The following command uses the SGB release of January 2019 (MetaRefSGB, Pasolli E, et al. Cell 176.3 (2019)) to find the closest SGB for each genome bin.

phylophlan_assign_sgbs \
    -i input_metagenomic \
    -o output_metagenomic \
    --nproc 4 \
    -n 1 \
    -d ethiopia_tutorial \
    --database_folder ethiopia_tutorial_db \
    --verbose 2>&1 | tee logs/phylophlan_metagenomic.log

Here we are interested only in the closest SGB for each genome bin (-n 1), which is reported in the output. If a genome bin has a Mash distance below 5% from the reported SGB, we can consider the bin part of that SGB and assign it the SGB's taxonomic label.
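
The distance check described above can be scripted. The sketch below is not part of PhyloPhlAn: bins_within is a hypothetical helper, and it assumes that each data line of the output table has the form `<bin><TAB><SGB id>:<level>:<taxonomy>:<distance>`, with the distance written as a fraction and comment lines starting with `#`. Inspect your own output_metagenomic.tsv before relying on it.

```shell
# Hypothetical helper: list the bins whose closest SGB lies within a cutoff.
# ASSUMPTION: data lines look like "<bin>\t<id>:<level>:<taxonomy>:<distance>"
# and header/comment lines start with '#'.
bins_within() {
    # $1 = table, $2 = maximum distance as a fraction (e.g. 0.05 for 5%)
    awk -F'\t' -v max="$2" '
        /^#/ { next }                       # skip header/comment lines
        {
            n = split($2, f, ":")           # distance is the last ":" field
            if (f[n] + 0 < max) print $1 "\t" f[1] "\t" f[n]
        }' "$1"
}

# Tiny made-up table, only to show the call; use output_metagenomic.tsv instead
demo=$(mktemp)
printf '#input_bin\tclosest_sgb\n' > "$demo"
printf 'binA\tkSGB_1:Species:k__Bacteria|s__Example_one:0.012\n' >> "$demo"
printf 'binB\tuSGB_2:Species:k__Bacteria|s__Example_two:0.210\n' >> "$demo"
bins_within "$demo" 0.05
rm -f "$demo"
```

With the made-up table, only binA is printed, because binB is farther from its closest SGB than the cutoff passed as the second argument.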

Step 3. Heatmaps of the top 20 SGBs found in the Ethiopian metagenomes

This step visualizes the 20 most prevalent SGBs found in the Ethiopian metagenomes.

This requires a mapping file that links each genome bin to the metagenome it was assembled from. The mapping file is a tab-separated text file with the genome bin in the first column and the corresponding metagenome in the second column.

For this example, the mapping file bin2meta.tsv is provided in the example folder.
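
If you write a mapping file for your own data, a quick format check can catch problems such as spaces used instead of tabs. The sketch below is an illustration only: check_map is a hypothetical helper, and the bin and metagenome names in the demo are made up.

```shell
# Hypothetical check: report lines that do not have exactly two non-empty
# tab-separated columns (genome bin, metagenome); returns non-zero if any do
check_map() {
    awk -F'\t' '
        NF != 2 || $1 == "" || $2 == "" {
            printf "line %d is malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }' "$1"
}

# Made-up three-line example; the last line uses a space instead of a tab
demo=$(mktemp)
printf 'bin_001\tmetagenome_A\nbin_002\tmetagenome_A\nbin_003 metagenome_B\n' > "$demo"
check_map "$demo" || echo "fix the lines listed above"
rm -f "$demo"
```

On your own data the call would be check_map bin2meta.tsv, which prints nothing when every line is well formed.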

phylophlan_draw_metagenomic \
    -i output_metagenomic.tsv \
    -o output_heatmap \
    --map bin2meta.tsv \
    --top 20 \
    --verbose 2>&1 | tee logs/phylophlan_draw_metagenomic.log

This will produce two heatmaps:

  1. The first heatmap shows, for each metagenome, the presence/absence profile of the top 20 SGBs found in the Ethiopian cohort
  2. The second heatmap shows how many unknown SGBs (uSGBs), known SGBs (kSGBs), and unassigned bins are present in each metagenome

PhyloPhlAn 3: Example 03: Metagenomic application: presence / absence heatmap

PhyloPhlAn 3: Example 03: Metagenomic application: counts of uSGBs, kSGBs, and unassigned bins heatmap

What to do after SGB profiling?

The SGB profiles of the Ethiopian cohort can be analyzed further by focusing on specific known and/or unknown SGBs.

For instance, if we focus on the common gut commensal Escherichia coli, we can put into phylogenetic context the 8 Ethiopian bins falling into kSGB 10068, as shown in 4. High-resolution phylogeny of genomes and MAGs of a known species (E. coli).

Moreover, if we focus on the most prevalent unknown SGB in the Ethiopian cohort (uSGB 19436), we can phylogenetically characterize its 13 Ethiopian bins in the context of the reference genomes of the Proteobacteria phylum and of the MAGs from Pasolli E, et al. Cell (2019) that belong to the same uSGB, as shown in 5. Phylogenetically characterization of an unknown SGB from the Proteobacteria phylum.
