
PhyloPhlAn 3: Example 05: Proteobacteria

Katarina Mladenovic edited this page Apr 11, 2024 · 1 revision

Phylogenetic characterization of an unknown SGB assigned to the Proteobacteria phylum



Before starting, make sure that PhyloPhlAn 3 is installed and that you have already completed the first two steps of tutorial 3. Metagenomic analysis of the Ethiopian cohort.

  • Make sure PhyloPhlAn 3 scripts are executable and available in your command line
  • The commands in Steps 1 to 4 assume that you are inside the tutorial folder examples/05_proteobacteria; the paths in Steps 5 and 6 are written relative to the repository root
  • All the steps below are reported in the run_05.sh script
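The prerequisites above can be checked with a short shell sketch. It only uses the three command names that appear later in this tutorial, and it creates the logs folder that the tee commands write to, assuming that folder is not already present in the example folder.

```shell
# Report any PhyloPhlAn 3 command used in this tutorial that is not on the PATH
missing=0
for cmd in phylophlan phylophlan_get_reference phylophlan_write_config_file; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "missing: $cmd" >&2
        missing=1
    fi
done

# The commands below pipe their output to log files under logs/
# (assumption: the folder may not exist yet, so create it if needed)
mkdir -p logs
```

If any command is reported as missing, fix the installation before continuing.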

By following these steps, you will phylogenetically characterize the MAGs of uSGB 19436, including the MAGs from example 3. Metagenomic analysis of the Ethiopian cohort. The SGB considered in this example (uSGB 19436) is assigned to the Proteobacteria phylum according to the Jan19 SGB release.

Step 1. Retrieve the uSGB 19436 genome bins used in the PhyloPhlAn 3 Metagenomic application example

Create a folder named input_bins and copy into it the MAGs assigned to uSGB 19436 in example 3. Metagenomic analysis of the Ethiopian cohort.

mkdir -p input_bins
for i in $(grep uSGB_19436 ../03_metagenomic/output_metagenomic.tsv | cut -f1); do
    cp -a ../03_metagenomic/input_metagenomic/$i.fna input_bins/
done

Step 2. Download other genomes of uSGB 19436

To retrieve the MAGs that belong to uSGB 19436 from the SGB resource, follow these steps:

  1. Go to the SGB database curated in Pasolli E, et al. Cell 176.3 (2019)
  2. In the search bar, type "SGB id: 19436", quotes included
  3. Click on Get Data / Run Analysis and select the format sequence from the drop-down box

By following these steps you will download a zip file that contains four files with all the necessary information to download the genomes, among them:

  • One .txt file with a list of all the urls for the files to download
  • One .sh file, a shell script that downloads all the files

Note: The downloaded genomes will have the .fa extension. Change it to .fna to be consistent with the other input files.

The .txt and .sh files are also provided in the example folder. To automatically download the genomes, you can execute the following command:

./download_files.sh sequence_url_opendata_19436.txt input_bins
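Once the download has finished, the extension change described in the note can be done in one pass. The following is a minimal sketch; the function name rename_fa_to_fna is hypothetical and not part of PhyloPhlAn.

```shell
# Rename every *.fa file in the given folder to *.fna
# (rename_fa_to_fna is a hypothetical helper, not a PhyloPhlAn command)
rename_fa_to_fna() {
    for f in "$1"/*.fa; do
        [ -e "$f" ] || continue              # the glob matched nothing
        if [ -e "${f%.fa}.fna" ]; then
            echo "skipping $f: target already exists" >&2
            continue
        fi
        mv -- "$f" "${f%.fa}.fna"
    done
}

rename_fa_to_fna input_bins
```

Files that already end in .fna, such as the bins copied in Step 1, are left untouched.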

Step 3. Download reference genomes for each species in the Epsilonproteobacteria class

Now we are going to download up to one reference genome for each species in the Epsilonproteobacteria class with:

phylophlan_get_reference \
    -g c__Epsilonproteobacteria \
    -n -1 \
    -o input_bins \
    --verbose 2>&1 | tee logs/phylophlan_get_reference.log

Step 4. Download reference genomes from close phyla

To broaden the phylogenetic context of the reconstruction, we are going to download up to one reference genome for the following phyla: Spirochaetes, Chlamydiae, Planctomycetes, Candidatus Omnitrophica, Lentisphaerae, and Verrucomicrobia, which are phylogenetically close to the Epsilonproteobacteria class.

for i in Spirochaetes Chlamydiae Planctomycetes Candidatus_Omnitrophica Lentisphaerae Verrucomicrobia; do
    phylophlan_get_reference \
        -g p__${i} \
        -n 1 \
        -o input_bins \
        --verbose 2>&1 | tee logs/phylophlan_get_reference_${i}.log
done;
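Before generating the configuration, it can help to confirm that input_bins actually contains genomes from Steps 1 to 4. The sketch below counts candidate input files. The helper name count_inputs is hypothetical, and the list of extensions (.fna, .fna.gz, .fa) is an assumption about how the copied and downloaded files are named, so adjust it to what you see in the folder.

```shell
# Count genome files in a folder
# (count_inputs is a hypothetical helper; the extensions are an assumption)
count_inputs() {
    n=0
    for f in "$1"/*.fna "$1"/*.fna.gz "$1"/*.fa; do
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

count_inputs input_bins
```

A count of 0 means that the earlier steps did not populate the folder and should be rerun.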

Step 5. Generating the configuration file

The configuration file for this analysis can be easily generated with:

phylophlan_write_config_file  \
    -d a \
    -o examples/05_proteobacteria/proteobacteria_config.cfg \
    --db_aa diamond \
    --map_dna diamond \
    --map_aa diamond \
    --msa mafft \
    --trim trimal \
    --tree1 fasttree \
    --tree2 raxml \
    --verbose 2>&1 | tee examples/05_proteobacteria/logs/phylophlan_write_config_file.log

Step 6. Build the phylogeny

Build the phylogenetic tree with:

phylophlan \
    -i examples/05_proteobacteria/input_bins \
    -d phylophlan \
    --diversity medium \
    --accurate \
    -f examples/05_proteobacteria/proteobacteria_config.cfg \
    -o output_proteobacteria \
    --output_folder examples/05_proteobacteria/ \
    --nproc 4 \
    -t a \
    --verbose 2>&1 | tee examples/05_proteobacteria/logs/phylophlan_proteobacteria.log

The output tree is RAxML_bestTree.input_bins_refined.tre.
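To locate the tree file without assuming the exact layout of the output folder, a search by file name is enough. This is a minimal sketch; the helper name find_best_tree is hypothetical, and the pattern accepts either capitalization of the RAxML prefix.

```shell
# Print the path of every RAxML best-tree file found under the given folder
# (find_best_tree is a hypothetical helper, not a PhyloPhlAn command)
find_best_tree() {
    find "$1" -type f -name 'R[Aa]xML_bestTree.*'
}

# Example, from the repository root:
#   find_best_tree examples/05_proteobacteria
```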

Note: because the taxonomic label assigned to uSGB 19436 is only at the phylum level, this phylogenetic analysis uses the set of universal markers from PhyloPhlAn (Segata N, et al. NatComm 4:2304 (2013)), selected with the -d phylophlan parameter.
