PhyloPhlAn 3: Example 05: Proteobacteria
Before starting, make sure that PhyloPhlAn 3 is installed and that you have already completed the first two steps of tutorial 3. Metagenomic analysis of the Ethiopian cohort.
- Make sure PhyloPhlAn 3 scripts are executable and available in your command line
- The commands in this tutorial assume that you are inside the tutorial folder examples/05_proteobacteria
- All the steps below are reported in the run_05.sh script
By following these steps, you will be able to phylogenetically characterize the MAGs of uSGB 19436, including the MAGs from example 3. Metagenomic analysis of the Ethiopian cohort. The SGB considered in this example (uSGB 19436) has been assigned to the Proteobacteria phylum according to the Jan19 SGB release.
Step 1. Retrieve the uSGB 19436 genome bins used in the PhyloPhlAn 3 Metagenomic application example
Create a folder named input_bins and retrieve the MAGs assigned to uSGB 19436 from example 3. Metagenomic analysis of the Ethiopian cohort:
mkdir -p input_bins
for i in $(grep uSGB_19436 ../03_metagenomic/output_metagenomic.tsv | cut -f1); do
    cp -a "../03_metagenomic/input_metagenomic/$i.fna" input_bins/
done
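The loop above relies on the bin name being in the first tab-separated column of output_metagenomic.tsv. A minimal sketch of the same selection, with whole-word matching and a missing-file check, might look as follows. The function name copy_sgb_bins is a hypothetical helper, not part of PhyloPhlAn, and the sketch assumes that the SGB label appears in each row delimited by non-word characters such as tabs or colons.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PhyloPhlAn): copy the bins whose row in a
# tab-separated table mentions a given SGB label.
# Usage: copy_sgb_bins <table.tsv> <sgb_label> <bins_dir> <out_dir>
copy_sgb_bins() {
    local table=$1 label=$2 bins_dir=$3 out_dir=$4
    mkdir -p "$out_dir"
    # -w matches the label as a whole word, so uSGB_19436 does not match uSGB_194360
    grep -w -- "$label" "$table" | cut -f1 | while IFS= read -r bin; do
        if [ -f "$bins_dir/$bin.fna" ]; then
            cp -a "$bins_dir/$bin.fna" "$out_dir/"
        else
            # Report bins listed in the table but missing on disk
            echo "warning: $bins_dir/$bin.fna not found" >&2
        fi
    done
}
```

With this helper defined, `copy_sgb_bins ../03_metagenomic/output_metagenomic.tsv uSGB_19436 ../03_metagenomic/input_metagenomic input_bins` would select the same files as the loop above, provided the assumption about the table layout holds.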
To retrieve the MAGs that belong to uSGB 19436 from the SGB resource, you can follow these steps:
- Go to the SGB database curated in Pasolli E, et al. Cell 176.3 (2019)
- In the search bar, type "SGB id: 19436", quotes included
- Click on Get Data / Run Analysis and select the format sequence from the drop-down box
By following these steps you will download a zip file that contains four files with all the information needed to download the genomes, among them:
- One .txt file with a list of all the URLs of the files to download
- One .sh file, a shell script that downloads all the files
Note: The downloaded genomes will have the .fa extension; change it to .fna to be consistent with the other files.
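The extension change mentioned in the note can be done in a single loop. This is a minimal sketch; rename_fa_to_fna is a hypothetical helper name, and it assumes the downloaded genomes sit directly inside the given folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename every *.fa file in a folder to *.fna.
# Usage: rename_fa_to_fna <dir>
rename_fa_to_fna() {
    local dir=$1 f
    for f in "$dir"/*.fa; do
        # When nothing matches, the glob stays literal; skip that case
        [ -e "$f" ] || continue
        # ${f%.fa} strips the trailing ".fa"; -n avoids overwriting an existing .fna
        mv -n -- "$f" "${f%.fa}.fna"
    done
}
```

In this tutorial it would be called as `rename_fa_to_fna input_bins` after the download has finished.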
You'll find these two files in the example folder. To automatically download the genomes, you can execute the following command:
./download_files.sh sequence_url_opendata_19436.txt input_bins
Now we are going to download up to one reference genome for each species in the Epsilonproteobacteria class with:
phylophlan_get_reference \
    -g c__Epsilonproteobacteria \
    -n -1 \
    -o input_bins \
    --verbose 2>&1 | tee logs/phylophlan_get_reference.log
To expand the phylogenetic context of our phylogeny reconstruction, we are going to download up to one reference genome for the following phyla: Spirochaetes, Chlamydiae, Planctomycetes, Candidatus Omnitrophica, Lentisphaerae, and Verrucomicrobia, which are phylogenetically close to the Epsilonproteobacteria class.
for i in Spirochaetes Chlamydiae Planctomycetes Candidatus_Omnitrophica Lentisphaerae Verrucomicrobia; do
    phylophlan_get_reference \
        -g p__${i} \
        -n 1 \
        -o input_bins \
        --verbose 2>&1 | tee logs/phylophlan_get_reference_${i}.log
done
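The commands in this tutorial pipe their output through tee, which needs the log folder to exist and, by default, makes the pipeline return the exit status of tee rather than that of the PhyloPhlAn command. A minimal sketch of a wrapper that handles both points is given below. It assumes bash (it uses the PIPESTATUS array), and run_logged is a hypothetical helper name, not a PhyloPhlAn script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, copy stdout and stderr to a log file,
# and return the command's own exit status instead of tee's.
# Usage: run_logged <log_file> <command> [args...]
run_logged() {
    local log=$1
    shift
    # Create the log folder if it does not exist yet
    mkdir -p "$(dirname "$log")"
    "$@" 2>&1 | tee "$log"
    # PIPESTATUS[0] holds the exit status of the first command in the pipeline
    return "${PIPESTATUS[0]}"
}
```

For example, `run_logged logs/phylophlan_get_reference.log phylophlan_get_reference -g c__Epsilonproteobacteria -n -1 -o input_bins --verbose` would log the output as above while letting a calling script detect a failed download.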
The configuration file for this analysis can be easily generated with:
phylophlan_write_config_file \
    -d a \
    -o examples/05_proteobacteria/proteobacteria_config.cfg \
    --db_aa diamond \
    --map_dna diamond \
    --map_aa diamond \
    --msa mafft \
    --trim trimal \
    --tree1 fasttree \
    --tree2 raxml \
    --verbose 2>&1 | tee examples/05_proteobacteria/logs/phylophlan_write_config_file.log
Build the phylogenetic tree with:
phylophlan \
    -i examples/05_proteobacteria/input_bins \
    -d phylophlan \
    --diversity medium \
    --accurate \
    -f examples/05_proteobacteria/proteobacteria_config.cfg \
    -o output_proteobacteria \
    --output_folder examples/05_proteobacteria/ \
    --nproc 4 \
    -t a \
    --verbose 2>&1 | tee examples/05_proteobacteria/logs/phylophlan_proteobacteria.log
The output tree is RAxML_bestTree.input_bins_refined.tre.
Note: given that the taxonomic label assigned to uSGB 19436 is at the phylum level, in this phylogenetic analysis we used the set of universal markers from PhyloPhlAn (Segata, N et al. NatComm 4:2304 (2013)), selected with the -d phylophlan parameter.
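As a quick sanity check on the resulting tree, the number of leaves can be compared with the number of genomes in input_bins. A minimal sketch follows, assuming the tree file is in Newick format and that no taxon label contains a comma; count_newick_leaves is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the leaves of a Newick tree.
# In a Newick string every node with k children contributes k-1 commas,
# so the number of leaves equals the number of commas plus one.
# Usage: count_newick_leaves <tree_file>
count_newick_leaves() {
    local commas
    # Keep only the commas and count them
    commas=$(tr -cd ',' < "$1" | wc -c)
    echo $((commas + 1))
}
```

The result can be compared with `ls input_bins | wc -l`; a mismatch suggests that some inputs did not make it into the final tree.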