Tutorial version: Metagenomic analysis of the Ethiopian cohort
Note: This page runs a reduced-size version of the MAG set and of the corresponding database so that it fits within a normal tutorial window. For the full version of the commands, see the main PhyloPhlAn tutorial, specifically sections 3. Metagenomic analysis of the Ethiopian cohort; 4. High-resolution phylogeny of genomes and MAGs of a known species (E. coli); and 5. Phylogenetic characterization of an unknown SGB from the Proteobacteria phylum.
This tutorial shows how to phylogenetically characterize newly assembled genomes from metagenomes in the context of Species-level Genome Bins (SGBs).
To do this, we use 50 metagenomes from the Ethiopian cohort, from which 369 MAGs were reconstructed (>50% completeness and <5% contamination, as estimated by CheckM).
Disclaimer: Here, the Ethiopian MAGs were reduced to 181 because of the limited tutorial time and to reduce the overhead on the many tutorial VMs.
Note: Before starting, make sure to have PhyloPhlAn 3 installed.
Ingredients you will need to run the PhyloPhlAn metagenomic analysis include:
- A directory with contigs (genome bins / MAGs) from your metagenomic study (for a tutorial on the basics of assembly see here)
- A database of SGBs to pull annotations from (latest release from the Segata lab = SGB.Jan19)
Ingredients you will need to run PhyloPhlAn include:
- Reference genomes
- Genome bins / MAGs assigned to each phylogeny of interest
- Database with annotated marker genes (see Database setup)
- Configuration file (How to make a configuration file)
Download the setup script from GitHub (the script itself downloads the data and databases from Dropbox):
wget https://github.com/biobakery/biobakery/releases/download/1.8/setup.sh
View the setup.sh file
less -S setup.sh
# database and data download
wget https://www.dropbox.com/s/z2v7nmosua9ty19/tutorial_ethiopia__mag2meta.tsv
wget https://www.dropbox.com/s/ktwviuvwmrf0u2l/tutorial_ethiopia__mags.tar.bz2
tar -xjf tutorial_ethiopia__mags.tar.bz2
mkdir -p phylophlan_databases/
cd phylophlan_databases/
wget https://www.dropbox.com/s/tik9yubeerq4t37/tutorial_ethiopia.md5
wget https://www.dropbox.com/s/9oey75prd2v7lfs/tutorial_ethiopia.txt.bz2
mkdir -p s__Escherichia_coli phylophlan_chlamydiae
cd s__Escherichia_coli/
wget https://www.dropbox.com/s/8quyu04fucl3dwj/s__Escherichia_coli.faa
cd ../phylophlan_chlamydiae/
wget https://www.dropbox.com/s/b1ykd7gh98n8fry/phylophlan_chlamydiae.faa
cd ..
cd ..
mkdir phylophlan_configs/
cd phylophlan_configs
wget https://github.com/biobakery/biobakery/releases/download/1.8/d_aa__i_nt.cfg
cd ..
mkdir ecoli chlamydiae
cd ecoli/
wget https://github.com/biobakery/biobakery/releases/download/1.8/ecoli_refgen.tar
tar -xf ecoli_refgen.tar
cd ../chlamydiae/
wget https://github.com/biobakery/biobakery/releases/download/1.8/chlamydiae_refgen.tar
tar -xf chlamydiae_refgen.tar
What does this code do?
Let's run setup.sh to set up our environment to run PhyloPhlAn on metagenomic samples:
sh setup.sh
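To confirm that setup.sh finished, you can check that the folders it creates are present. The function below is a minimal sketch; its name (check_setup) is hypothetical, and it assumes the folders were created under the directory you pass in (for example tutorial_ethiopia/, which matches the paths used in later commands; adjust it if your layout differs).

```shell
#!/usr/bin/env bash
# Sketch: verify that the folders created by setup.sh exist under a root directory.
# ASSUMPTION: setup.sh was run inside the directory passed as the first argument.
check_setup() {
  local root="${1:-.}"
  local missing=0
  local d
  for d in phylophlan_databases/s__Escherichia_coli \
           phylophlan_databases/phylophlan_chlamydiae \
           phylophlan_configs ecoli chlamydiae; do
    if [ ! -d "$root/$d" ]; then
      echo "missing directory: $root/$d" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed (exit status 0) only when every expected folder is present.
  [ "$missing" -eq 0 ]
}
```

For example: check_setup tutorial_ethiopia && echo "setup looks complete"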
With the following command, we will use the SGB release of January 2020 to assign each genome bin to its closest SGB.
Reminder: this is a reduced-size database. To run against the full database, use the latest full-size edition located here.
mkdir -p logs
phylophlan_assign_sgbs \
-i tutorial_ethiopia/ethiopian_mags \
-o tutorial_ethiopia/ethiopian_mags \
--nproc 4 \
-n 1 \
-d ethiopia_tutorial \
--database_folder ethiopia_tutorial_db \
--verbose 2>&1 | tee logs/phylophlan_metagenomic.log
In this case, for each genome bin, we are interested only in the closest SGB (-n 1), which is reported in the output. If the genome bin has a Mash distance <2% from the reported SGB, we can consider that bin as part of it and transfer the SGB's taxonomic label.
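The 2% rule above can be applied to the whole output table at once. This is a sketch under a stated assumption: that the -n 1 output is tab-separated, with the bin name in column 1 and column 2 formatted as SGB ID, taxonomic level, taxonomy and average Mash distance joined by ":" (check the header of your own file first). The function name close_bins is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list genome bins whose closest SGB is within a Mash distance threshold.
# ASSUMPTION: tab-separated -n 1 output; column 1 = bin name, column 2 =
# "SGBid:taxa_level:taxonomy:avg_dist"; header lines start with "#".
close_bins() {
  local tsv="$1"
  local max_dist="${2:-0.02}"   # default: 2% Mash distance, as in the text above
  awk -F'\t' -v max="$max_dist" '
    /^#/ { next }                   # skip header lines
    {
      n = split($2, f, ":")         # split the SGB annotation on ":"
      dist = f[n] + 0               # last field: average Mash distance
      if (dist < max) print $1 "\t" f[1] "\t" dist
    }' "$tsv"
}
```

Running close_bins tutorial_ethiopia/ethiopian_mags/ethiopian_mags.tsv 0.02 prints, for each bin that passes the threshold, its name, its closest SGB ID and the distance.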
less -S tutorial_ethiopia/ethiopian_mags/ethiopian_mags.tsv
This step allows you to visualize the top 21 SGBs found in the Ethiopian metagenomes.
To be able to do this, you need to provide a mapping file that maps each genome bin to the metagenome it was assembled from. The mapping file should be a tab-separated text file where the genome bins / MAGs are listed in the first column and the corresponding metagenome in the second column.
For this example, we provide the mapping file tutorial_ethiopia__mag2meta.tsv inside the example folder. To view this tab-separated file, run the following command, then press q to exit:
column -t -s $'\t' tutorial_ethiopia/tutorial_ethiopia__mag2meta.tsv | less -S
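Because the heatmaps are built from the mapping, a bin missing from tutorial_ethiopia__mag2meta.tsv cannot be placed in any metagenome. A quick way to catch that before plotting is sketched below; the function name unmapped_bins is hypothetical, and it assumes bin names are written identically in the first column of both files and that header lines of the assignment table start with "#".

```shell
#!/usr/bin/env bash
# Sketch: print bins present in the assignment table but absent from the mapping file.
# ASSUMPTION: both files are tab-separated with the bin name in column 1.
unmapped_bins() {
  local assignments="$1" mapping="$2"
  awk -F'\t' '
    FILENAME == ARGV[1] { mapped[$1] = 1; next }   # mapping file: remember bins
    /^#/ { next }                                   # assignment file: skip header
    !($1 in mapped) { print $1 }                    # bin without a metagenome
  ' "$mapping" "$assignments"
}
```

For example: unmapped_bins tutorial_ethiopia/ethiopian_mags/ethiopian_mags.tsv tutorial_ethiopia/tutorial_ethiopia__mag2meta.tsv should print nothing when every bin is mapped.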
phylophlan_draw_metagenomic \
-i tutorial_ethiopia/ethiopian_mags/ethiopian_mags.tsv \
--map tutorial_ethiopia/tutorial_ethiopia__mag2meta.tsv \
-f png \
--verbose 2>&1 | tee phylophlan_draw_metagenomic.log
This will produce two heatmaps:
- The first heatmap shows, for each metagenome, the presence/absence profile of the top 21 SGBs found in the Ethiopian cohort
- The second heatmap shows how many uSGBs, kSGBs, and unassigned bins / MAGs are present in each metagenome
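The tallies shown in the second heatmap can be approximated directly from the two tables, which is handy for checking a specific metagenome. This is only an approximation and not necessarily the logic phylophlan_draw_metagenomic uses: it assumes the same file layouts as above and counts any closest hit whose ID does not start with kSGB_ or uSGB_ as "other". The function name count_sgb_types is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: per metagenome, count bins whose closest hit is a kSGB, a uSGB, or other.
# ASSUMPTION: mapping = "bin<TAB>metagenome"; in the assignment table the SGB ID
# is the first ":"-separated field of column 2.
count_sgb_types() {
  local assignments="$1" mapping="$2"
  awk -F'\t' '
    FILENAME == ARGV[1] { meta[$1] = $2; next }    # mapping: bin -> metagenome
    /^#/ { next }                                   # skip assignment header
    ($1 in meta) {
      split($2, f, ":")
      type = "other"
      if (f[1] ~ /^kSGB_/) type = "kSGB"
      else if (f[1] ~ /^uSGB_/) type = "uSGB"
      count[meta[$1] "\t" type]++
    }
    END { for (k in count) print k "\t" count[k] }
  ' "$mapping" "$assignments" | LC_ALL=C sort
}
```

Each output line reads metagenome, category and number of bins, for example from count_sgb_types tutorial_ethiopia/ethiopian_mags/ethiopian_mags.tsv tutorial_ethiopia/tutorial_ethiopia__mag2meta.tsv.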
The SGB profiles of the Ethiopian cohort can be further analyzed by focusing on specific known and/or unknown SGBs.
For instance, if we focus on the common gut commensal Escherichia coli, we can put into phylogenetic context the 8 Ethiopian MAGs falling into kSGB 10068, as shown in section 4. High-resolution phylogeny of genomes and MAGs of a known species (E. coli) of the main tutorial, or in the reduced version below.
Moreover, if we focus on the most prevalent unknown SGB in the Ethiopian cohort (uSGB 19436), we can further phylogenetically characterize the 13 Ethiopian MAGs in the context of the reference genomes of the Proteobacteria phylum and the MAGs from Pasolli E. et al., Cell (2019) belonging to the same uSGB 19436, as shown in section 5. Phylogenetic characterization of an unknown SGB from the Proteobacteria phylum of the main tutorial, or in the reduced version below.
The configuration file for the following analyses can be easily generated with:
phylophlan_write_config_file \
-d a \
-o tutorial_ethiopia/phylophlan_configs/reference_config.cfg \
--db_aa diamond \
--map_dna diamond \
--map_aa diamond \
--msa mafft \
--trim trimal \
--tree1 fasttree \
--tree2 raxml \
--verbose 2>&1 | tee phylophlan_write_config_file.log
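Before launching the longer runs, you can check that the generated configuration contains a section for each step you requested. The sketch below assumes the configuration file is INI-like, with one [section] header per step named after the corresponding option (db_aa, map_dna, map_aa, msa, trim, tree1, tree2); open the file with less -S to confirm this on your PhyloPhlAn version. The function name check_config_sections is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: confirm that each requested [section] header exists in a config file.
# ASSUMPTION: INI-like layout with section headers on their own lines.
check_config_sections() {
  local cfg="$1"; shift
  local s missing=0
  for s in "$@"; do
    # -F: fixed string, -x: whole line must match, -q: no output
    if ! grep -Fqx "[$s]" "$cfg"; then
      echo "section [$s] not found in $cfg" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_config_sections tutorial_ethiopia/phylophlan_configs/reference_config.cfg db_aa map_dna map_aa msa trim tree1 tree2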
Study the 8 Ethiopian MAGs assigned to the common gut commensal Escherichia coli (kSGB 10068) with 42 E. coli reference genomes
Retrieve the genomes that phylophlan_metagenomic has assigned to the E. coli SGB (ID 10068) and copy them into the input folder tutorial_ethiopia/ecoli/. To copy just those genomes into the E. coli directory, run the following command:
for b in $(grep -w kSGB_10068 tutorial_ethiopia/ethiopian_mags/ethiopian_mags.tsv | cut -f1); do cp "tutorial_ethiopia/ethiopian_mags/${b}.fna" tutorial_ethiopia/ecoli/; done
What does this directory contain?
ls -lthr tutorial_ethiopia/ecoli/
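A plain grep matches the SGB ID anywhere in the line, so a shorter ID (for example kSGB_1006) would also select bins from kSGB_10068. The hypothetical helper below compares the ID exactly and reports how many bins were copied, which you can compare with the 8 MAGs expected here. It assumes the same output layout as in the assignment step (SGB ID as the first ":"-separated field of column 2) and requires bash.

```shell
#!/usr/bin/env bash
# Sketch: copy the .fna files of all bins whose closest SGB ID equals $2.
# ASSUMPTION: column 2 of the table starts with the SGB ID followed by ":".
copy_sgb_bins() {
  local tsv="$1" sgb="$2" src="$3" dest="$4"
  local n=0 b
  mkdir -p "$dest"
  # Exact comparison on the ID field, so kSGB_1006 never selects kSGB_10068.
  while IFS= read -r b; do
    cp "$src/${b}.fna" "$dest/" && n=$((n + 1))
  done < <(awk -F'\t' -v id="$sgb" '!/^#/ { split($2, f, ":"); if (f[1] == id) print $1 }' "$tsv")
  echo "$n"   # number of genomes successfully copied
}
```

For example: copy_sgb_bins tutorial_ethiopia/ethiopian_mags/ethiopian_mags.tsv kSGB_10068 tutorial_ethiopia/ethiopian_mags tutorial_ethiopia/ecoli. The same call with uSGB_19436 and tutorial_ethiopia/chlamydiae prepares the input for the second analysis.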
Run PhyloPhlAn to build a phylogenetic tree and check where the new Ethiopian E. coli MAGs fall within the phylogeny.
phylophlan \
-i tutorial_ethiopia/ecoli/ \
-d tutorial_ethiopia/phylophlan_databases/s__Escherichia_coli \
--diversity low \
--fast \
--force_nucleotides \
-f tutorial_ethiopia/phylophlan_configs/reference_config.cfg \
-t a \
--subsample tenpercent \
--trim greedy \
--nproc 2 \
--verbose 2>&1 | tee phylophlan2_ecoli.log
You can try visualizing the tree using the ggtree script (the same way that we did in StrainPhlAn).
cd ecoli_s__Escherichia_coli
./phylophlan_ggtree.R RAxML_bestTree.ecoli.tre ecoli_concatenated.aln ecoli_tree1.png ecoli_tree2.png
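To check which genomes ended up in the tree without opening the figure, you can read the tip labels straight from the Newick file. This is a sketch: it assumes unquoted labels that contain no parentheses, commas, colons or semicolons, and that the tip labels of the input genomes equal their file names without the .fna extension. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the tip labels of a Newick tree, one per line.
# Tips are the names right after "(" or "," up to the next ":", ",", ")" or ";".
# Internal node labels follow ")" and are therefore skipped.
newick_tips() {
  tr -d '\n' < "$1" | grep -oE '[(,][^(),:;]+' | sed 's/^[(,]//'
}

# Sketch: list .fna genomes in a folder whose names are not tips of the tree.
missing_from_tree() {
  local tree="$1" dir="$2" f name tips
  tips=$(newick_tips "$tree")
  for f in "$dir"/*.fna; do
    [ -e "$f" ] || continue          # no .fna files in the folder
    name=$(basename "$f" .fna)
    grep -Fqx "$name" <<< "$tips" || echo "$name"
  done
}
```

From inside ecoli_s__Escherichia_coli, newick_tips RAxML_bestTree.ecoli.tre | wc -l gives the number of genomes in the tree, and missing_from_tree RAxML_bestTree.ecoli.tre followed by the path to the tutorial_ethiopia/ecoli/ input folder lists input genomes that are not in the tree.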
Study the 10 Ethiopian MAGs assigned to the most prevalent uSGB 19436 to phylogenetically characterize them in the context of all species in the Proteobacteria phylum
for b in $(grep -w uSGB_19436 tutorial_ethiopia/ethiopian_mags/ethiopian_mags.tsv | cut -f1); do cp "tutorial_ethiopia/ethiopian_mags/${b}.fna" tutorial_ethiopia/chlamydiae/; done
phylophlan \
-i tutorial_ethiopia/chlamydiae/ \
-d tutorial_ethiopia/phylophlan_databases/phylophlan_chlamydiae \
--diversity high \
--fast \
--force_nucleotides \
-f tutorial_ethiopia/phylophlan_configs/reference_config.cfg \
-t a \
--subsample tenpercent \
--trim greedy \
--nproc 2 \
--verbose 2>&1 | tee phylophlan_chlamydiae.log
You can try visualizing the tree using the ggtree script (the same way that we did in StrainPhlAn).
cd chlamydiae_phylophlan_chlamydiae
./phylophlan_ggtree.R RAxML_bestTree.chlamydiae.tre chlamydiae_concatenated.aln chlamydiae_tree1.png chlamydiae_tree2.png