---
kernelspec:
    name: python3
    display_name: python3
---

# Phylogenetic Analysis of ASFV Strains in the Philippines

:::{admonition} Objective
Construct a phylogenetic tree consisting of 20 ASFV strains sequenced in the Philippines.
:::

Sequencing efforts by @doi:10.1128/mra.00719-22 and Ko _et al_. (unpublished) have made many complete, high-quality assemblies of ASFV strains from the Philippines available. To identify the viral genotypes present in the country, a genome-wide diversity analysis will be conducted using only complete genomes of strains found within the country.

:::{mermaid}

:::

## Sequence Retrieval

Published ASFV genomes from the Philippines were retrieved from the NCBI nucleotide database on 03 December 2024 using `entrez-direct` (v22.4) and filtered to keep only complete genomes. The data retrieval step can be replicated by running [](#fetch-philippine-genomes). The returned results were verified by running the same query on the NCBI website and manually counting the number of complete genomes from the Philippines.

```{code-block} bash
:label: fetch-philippine-genomes
:caption: Fetching complete ASFV genomes from the Philippines.

esearch -db nucleotide -query "African swine fever virus AND Philippines" \
    | esummary \
    | xtract -pattern DocumentSummary -if Completeness -equals complete -element AccessionVersion \
    | efetch -db nucleotide -format fasta > assets/philippine_genomes.fasta
```
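The manual count described above can also be cross-checked locally. The following is a minimal sketch, assuming the FASTA file written by the retrieval step; `count_fasta_records` is a hypothetical helper name and not part of `entrez-direct`.

```bash
# Hypothetical helper: count the records in a FASTA file by counting
# header lines, i.e. lines that start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}
```

Running `count_fasta_records assets/philippine_genomes.fasta` should print the same number obtained from the NCBI website, which is 20 if the retrieved set matches the objective stated at the top of this page.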

## Sequence Alignment

The ASFV genome sequences were aligned using MAFFT (v7.525) [@doi:10.1093/bioinformatics/bty121]. The program was run with the `--auto` flag ([](#mafft-alignment)), which lets MAFFT choose an alignment strategy based on the size of the input. Genome sequences containing unknown bases or unconfirmed large-fragment deletions were identified from the alignment.

```{code-block} bash
:label: mafft-alignment
:caption: Multiple sequence alignment with MAFFT.

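IN=assets/philippine_genomes.fasta
OUT=output/mafft/philippines_mafft_auto.aln
# Create the output directory first; the redirect fails if it is missing.
mkdir -p "$(dirname "${OUT}")"
mafft --auto "${IN}" > "${OUT}"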
```
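The text does not state how sequences with unknown bases or large deletions were identified. One possible screening step is sketched below, assuming the MAFFT output is in FASTA format; `count_unknown_and_gaps` is a hypothetical helper name. It reports, for each aligned sequence, the number of `N` characters and the number of gap characters (`-`).

```bash
# Hypothetical helper: for each sequence in an aligned FASTA file, print
# its ID, the number of unknown bases (N or n), and the number of gap
# characters (-), separated by tabs.
count_unknown_and_gaps() {
    awk '
        /^>/ {
            # New record: flush the previous one, then reset the counters.
            if (id != "") printf "%s\t%d\t%d\n", id, n, g
            id = substr($1, 2); n = 0; g = 0
            next
        }
        {
            # gsub returns the number of replacements it made.
            line = $0
            n += gsub(/[Nn]/, "", line)
            g += gsub(/-/, "", line)
        }
        END { if (id != "") printf "%s\t%d\t%d\n", id, n, g }
    ' "$1"
}
```

For example, `count_unknown_and_gaps output/mafft/philippines_mafft_auto.aln | sort -k3,3nr` would list the sequences with the most gap positions first. A high total gap count only hints at a large deletion; whether the gaps form one contiguous fragment still has to be checked in an alignment viewer.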

## Maximum Likelihood Phylogenetic Analysis

Maximum likelihood (ML) phylogenetic trees were estimated with RAxML (v8.2.12) using the GTR+GAMMA nucleotide substitution model (`GTRGAMMA` in RAxML). ML bootstrapping was performed with 1000 replicates to assess the robustness of the tree topology. The final tree was midpoint-rooted in FigTree.
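The exact RAxML invocation is not recorded here. The sketch below shows one way to run the analysis described above, under several assumptions that should be checked against the actual run: the binary is named `raxmlHPC`, the rapid bootstrap and best-tree search are combined in a single run (`-f a`), and the random seeds are arbitrary. `run_raxml` and the run name are hypothetical.

```bash
# Hypothetical wrapper around RAxML v8.
#   $1  alignment file (FASTA or PHYLIP)
#   $2  run name, used as the suffix of the RAxML output files
#   $3  random seed (optional, arbitrary default)
run_raxml() {
    local aln="$1"
    local run_name="$2"
    local seed="${3:-12345}"
    # -f a : rapid bootstrap plus best-scoring ML tree search in one run
    # -m   : GTR substitution model with GAMMA rate heterogeneity
    # -p   : seed for the parsimony starting trees
    # -x   : seed for rapid bootstrapping
    # -N   : number of bootstrap replicates
    raxmlHPC -f a -m GTRGAMMA -p "$seed" -x "$seed" -N 1000 \
        -s "$aln" -n "$run_name"
}
```

Calling `run_raxml output/mafft/philippines_mafft_auto.aln philippines_ml` would, under these assumptions, write the best-scoring tree annotated with bootstrap support to `RAxML_bipartitions.philippines_ml`, which can then be opened in FigTree for midpoint rooting.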