# Submodule 4: Analyze Phylogenetic Tree

## Learning Objectives
In Submodule 4, we will analyze phylogenetic trees constructed in Submodule 3. This includes:
1. Comparative metagenomics from the tree.
2. Ancestral state reconstruction using advanced tools.

By the end of this submodule, learners will:
- Interpret and visually represent phylogenetic trees.
- Conduct comparative metagenomics analysis.
- Gain insights into ancestral state reconstruction using Bayesian inference methods.

----------------------------------------------------------------------------------------------------------------
# Training Plan 


Submodule #1: Understanding the Basics of Phylogenetics

Submodule #2: Collect and Prepare Sequence Data and Analysis

Submodule #3: Construct Phylogenetic Tree

 
<font color="green"> **Submodule #4: Analyze Phylogenetic Tree** </font>

## 4.1 Interpret and Visually Represent Phylogenetic Trees

### Overview
Visualizing phylogenetic trees helps interpret evolutionary relationships and present findings effectively. In this submodule, we use **IQ-TREE** to build the tree and compute branch support, **Bio.Phylo** (Biopython) to draw it in Python, and **iTOL (Interactive Tree of Life)** as an alternative for customization.

### Why IQ-TREE?
**IQ-TREE** is a comprehensive tool for:
1. **Phylogenetic Analysis:** It supports model testing, bootstrapping, and tree construction.
2. **Tree Output:** It writes the inferred tree in Newick format together with a text report, so the result can be loaded directly into visualization tools.

### Steps to Use IQ-TREE
#### 1. Running IQ-TREE to Build the Tree:
The following command selects a substitution model, constructs a phylogenetic tree, and computes branch support values:

In [None]:
import subprocess

# check=True raises an error if IQ-TREE exits with a failure
subprocess.run(["iqtree", "-s", "aligned_subset_augur.fasta", "-m", "TEST", "-bb", "1000", "-alrt", "1000", "-nt", "AUTO"], check=True)

#### Options Explained:
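- `-s`: Specifies the input alignment file.
- `-m TEST`: Runs standard model selection and uses the best-fitting substitution model for tree inference.
- `-bb 1000`: Runs the ultrafast bootstrap with 1000 replicates to estimate branch support.
- `-alrt 1000`: Runs the SH-like approximate likelihood ratio test (SH-aLRT) with 1000 replicates.
- `-nt AUTO`: Automatically determines the number of CPU threads to use.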
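When both `-alrt` and `-bb` are given, IQ-TREE labels each internal branch in the tree file with two numbers in the form `SH-aLRT/UFBoot` (for example `98.5/100`). The IQ-TREE documentation suggests relying on a clade when SH-aLRT is at least 80% and UFBoot is at least 95%. The following is a minimal sketch of applying that rule of thumb; the function names and the example labels are hypothetical and not part of IQ-TREE.

```python
def parse_support(label):
    """Split an IQ-TREE 'SH-aLRT/UFBoot' branch label into two floats."""
    alrt_text, ufboot_text = label.split("/")
    return float(alrt_text), float(ufboot_text)


def is_well_supported(label, min_alrt=80.0, min_ufboot=95.0):
    """Apply the SH-aLRT >= 80 and UFBoot >= 95 rule of thumb."""
    alrt, ufboot = parse_support(label)
    return alrt >= min_alrt and ufboot >= min_ufboot


# Hypothetical labels, as they might appear on internal branches
for label in ["98.5/100", "72.1/96", "85.0/90"]:
    status = "supported" if is_well_supported(label) else "weak"
    print(label, "->", status)
```

The thresholds are passed as parameters so they can be tightened or relaxed for a particular dataset.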

#### 2. Visualize the Tree in Python:
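By default, IQ-TREE writes its tree to `aligned_subset_augur.fasta.treefile`. This example loads the Newick file `seq_output_subset.nwk` from `data/cov/`, which can be visualized using `Bio.Phylo` and Matplotlib: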

In [None]:
from Bio import Phylo
import matplotlib.pyplot as plt

tree = Phylo.read("data/cov/seq_output_subset.nwk", "newick")

Phylo.draw(tree)
plt.show()

#### 3. Using iTOL (Interactive Tree of Life):
1. Visit the iTOL website.
2. Upload your `seq_output_subset.nwk` file.
3. Customize the tree by adding labels, colors, and annotations.

<div style="padding: 10px; border: 1px solid #b3e5fc; border-radius: 5px; background-color: #e1f5fe;">
    <strong>Tip:</strong>💡 Use iTOL for publication-quality visualizations and advanced customization options.
</div>

## 4.2 Importance of Visual Representation
Visual representation is critical for:

1. **Interpreting Results:** Simplifies understanding of evolutionary relationships.
2. **Communication:** Makes findings accessible to a broader audience.
3. **Highlighting Features:** Emphasizes key evolutionary events and patterns.

<div style="padding: 10px; border: 1px solid #b3e5fc; border-radius: 5px; background-color: #e1f5fe;">
    <strong>Note:</strong> 📝  A well-designed visualization can make complex evolutionary relationships easier to comprehend.
</div>

## 4.3 Conduct Comparative Metagenomics Along Different Branches

### Overview
Comparative metagenomics involves analyzing genetic content along different branches of a phylogenetic tree. This helps identify variations and patterns in the genetic data across species or strains.

### Steps for Comparative Metagenomics
#### 1. BLAST:
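BLAST (Basic Local Alignment Search Tool) finds regions of local similarity between query sequences and the sequences in a database, which makes it a core tool for comparing sequences.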

#### 2. Create a BLAST Database:
Navigate to your data directory and create a database:

In [None]:
!makeblastdb -in sequences_subset.fasta -dbtype nucl -out seq_database

#### 3. Run BLAST for Comparisons:
Create a `query_seq.fasta` file with sequences to compare.
Run BLAST to compare query sequences against the database:

In [None]:
!blastn -query query_seq.fasta -db seq_database -out seq_results.txt -outfmt 6

#### 4. View Results:
Results are saved in `seq_results.txt` as tab-separated text (`-outfmt 6`), with one hit per line.
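The tabular format has twelve default columns: `qseqid`, `sseqid`, `pident`, `length`, `mismatch`, `gapopen`, `qstart`, `qend`, `sstart`, `send`, `evalue`, and `bitscore`. A minimal sketch of reading such a file with the standard library and keeping the best hit per query might look like this. The helper names and the example rows are made up for illustration; in practice you would pass `open("seq_results.txt")` instead of the in-memory text.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6)
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_blast6(handle):
    """Parse tab-separated BLAST hits into dicts, converting numeric fields."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(BLAST6_COLUMNS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        hit["length"] = int(hit["length"])
        hits.append(hit)
    return hits


def best_hit_per_query(hits):
    """Keep the highest-bitscore hit for each query sequence."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up example rows standing in for the contents of seq_results.txt
example = io.StringIO(
    "q1\tsA\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-150\t540\n"
    "q1\tsB\t91.5\t280\t24\t0\t1\t280\t5\t284\t2e-100\t380\n"
    "q2\tsB\t97.2\t250\t7\t0\t1\t250\t10\t259\t3e-120\t430\n"
)
for query, hit in best_hit_per_query(read_blast6(example)).items():
    print(query, hit["sseqid"], hit["pident"], hit["evalue"])
```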

<div style="padding: 10px; border: 1px solid #ffccbc; border-radius: 5px; background-color: #ffebee;">
    <strong>Alert:</strong>⚠️ Ensure your sequences are properly formatted and validated before running BLAST to avoid errors.
</div>

## 4.4 Automate Comparative Metagenomics Analysis using Biopython
Automation can streamline comparative metagenomics analysis, making it more efficient.

**Script for Automation:**

In [None]:
import subprocess
from Bio.Blast import NCBIXML

# Run a local blastn search against the database built with makeblastdb.
# NCBIWWW.qblast is not used here: it queries NCBI's remote servers and
# cannot search a local database such as seq_database.
def run_blast(query_file, db_name, output_file):
    subprocess.run(
        ["blastn", "-query", query_file, "-db", db_name,
         "-outfmt", "5", "-out", output_file],  # -outfmt 5 writes XML for NCBIXML
        check=True,
    )

# Run the BLAST
run_blast("query_seq.fasta", "seq_database", "blast_results.xml")

# Parse the BLAST results
with open("blast_results.xml") as result_handle:
    blast_records = NCBIXML.parse(result_handle)
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                print("****Alignment****")
                print(f"sequence: {alignment.title}")
                print(f"length: {alignment.length}")
                print(f"e value: {hsp.expect}")
                print(f"{hsp.query[0:75]}...")
                print(f"{hsp.match[0:75]}...")
                print(f"{hsp.sbjct[0:75]}...")


<div style="padding: 10px; border: 1px solid #b3e5fc; border-radius: 5px; background-color: #e1f5fe;">
    <strong>Tip:</strong>💡Automating BLAST processes is especially useful when working with large datasets.
</div>
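For many query files, one option is to generate the `blastn` argument lists in a loop and run them one by one. The sketch below only builds the commands and prints them, so it can be inspected before anything is executed. The function name, the query file names, and the `results` output directory are hypothetical.

```python
from pathlib import Path


def build_blastn_commands(query_files, db_name, out_dir):
    """Return one blastn argument list per query file (nothing is executed)."""
    commands = []
    for query in query_files:
        # Name each output after its query file, e.g. sampleA_blast.xml
        out_file = Path(out_dir) / (Path(query).stem + "_blast.xml")
        commands.append([
            "blastn", "-query", str(query), "-db", db_name,
            "-outfmt", "5", "-out", str(out_file),
        ])
    return commands


# Hypothetical query files; each list could be passed to subprocess.run(cmd, check=True)
queries = ["sampleA.fasta", "sampleB.fasta"]
for cmd in build_blastn_commands(queries, "seq_database", "results"):
    print(" ".join(cmd))
```

Keeping command construction separate from execution makes it easy to review or log the exact commands before a long run.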

## 4.5 Discuss Insights from Ancestral State Reconstruction

### Overview
Ancestral state reconstruction provides insights into:
1. **Evolutionary Dynamics:** Tracks how traits or genes evolved over time.
2. **Diversity:** Highlights diversity within and between metagenomic samples.
3. **Evolutionary Pressures:** Identifies selective pressures shaping genetic traits.

<div style="padding: 10px; border: 1px solid #b3e5fc; border-radius: 5px; background-color: #e1f5fe;">
    <strong>Note:</strong> 📝  Use tools like BEAST to infer ancestral states and reconstruct evolutionary pathways.
</div>
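To make the idea concrete, here is a minimal sketch of the bottom-up pass of Fitch parsimony, a simple non-Bayesian approach to ancestral state reconstruction. It is not the method BEAST uses; it only illustrates how observed tip states constrain the states of internal nodes. The tree, taxon names, and host states are made up, and the tree is written as nested tuples for brevity.

```python
def fitch(tree, leaf_states):
    """Return candidate state sets per node and the minimum number of changes."""
    sets = {}
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            # A leaf: its state is observed
            sets[node] = {leaf_states[node]}
        else:
            left, right = node
            common = visit(left) & visit(right)
            if common:
                sets[node] = common
            else:
                # Children share no state: take the union and count one change
                sets[node] = sets[left] | sets[right]
                changes += 1
        return sets[node]

    visit(tree)
    return sets, changes


# Hypothetical five-taxon tree with a made-up "host" trait at the tips
tree = ((("A", "B"), "C"), ("D", "E"))
hosts = {"A": "bat", "B": "bat", "C": "human", "D": "human", "E": "human"}

state_sets, n_changes = fitch(tree, hosts)
print("Candidate root states:", state_sets[tree])
print("Minimum number of state changes:", n_changes)
```

A node whose set holds more than one state is ambiguous under parsimony. Probabilistic methods instead assign each candidate state a probability.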

## 4.6 Utilize Bayesian Inference Methods with BEAST for Ancestral State Reconstruction

### Overview
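Bayesian inference methods allow for probabilistic reconstruction of ancestral states. BEAST (Bayesian Evolutionary Analysis Sampling Trees) does this with Markov chain Monte Carlo (MCMC) sampling, and BEAUti is its companion program for creating the XML configuration file.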
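The core idea can be shown with a toy calculation that applies Bayes' rule to a single ancestral node. This is a deliberately simplified sketch and not what BEAST computes: it assumes a two-state trait, tips attached directly to the ancestor, and a fixed probability `change_prob` that a tip ends up in the other state. All names and numbers are illustrative assumptions.

```python
def root_posterior(leaf_states, states, prior, change_prob):
    """Posterior over the ancestral state for tips attached directly to it."""
    unnormalized = {}
    for root_state in states:
        # Likelihood of the observed tips if the ancestor had this state
        likelihood = 1.0
        for observed in leaf_states:
            likelihood *= (1 - change_prob) if observed == root_state else change_prob
        # Bayes' rule numerator: prior times likelihood
        unnormalized[root_state] = prior[root_state] * likelihood
    total = sum(unnormalized.values())
    return {state: value / total for state, value in unnormalized.items()}


# Three tips observed as A, A, B; uniform prior; 10% chance of change per branch
posterior = root_posterior(
    leaf_states=["A", "A", "B"],
    states=["A", "B"],
    prior={"A": 0.5, "B": 0.5},
    change_prob=0.1,
)
for state, probability in posterior.items():
    print(state, round(probability, 3))
```

BEAST extends this idea to full trees with branch lengths and unknown model parameters, which is why it samples from the posterior with MCMC rather than computing it directly.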

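#### 1. Install BEAST:
Install BEAST using the command in the Installations section at the end of this submodule.

#### 2. Run BEAUti for Configuration: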

1. Find BEAUti’s path:

In [None]:
!find $CONDA_PREFIX -name "beauti"

2. Open BEAUti by running the path returned above, for example `/path/to/bin/beauti`.
3. Import your `aligned_sequences.fasta` file in BEAUti.
4. Configure parameters and save as `seq_config.xml`.

#### 3. Run BEAST:
Run the analysis with the saved configuration file:

In [None]:
!beast seq_config.xml

<div style="padding: 10px; border: 1px solid #ffccbc; border-radius: 5px; background-color: #ffebee;">
    <strong>Alert:</strong>⚠️ BEAST is computationally intensive, so ensure sufficient resources are allocated.
</div>

## Interactive Quiz

Test your understanding of phylogenetic analysis and ancestral state reconstruction:

In [None]:
from jupyterquiz import display_quiz
display_quiz('Quiz/QS4.json')

## Installations

#### Install IQ-TREE

In [None]:
!conda install -c bioconda iqtree -y

In [None]:
!find $CONDA_PREFIX -name "iqtree"

#### Install BLAST

In [None]:
!conda install -c bioconda blast -y

#### Install BEAST

In [None]:
!conda install -c bioconda beast -y
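After installing, it can be useful to confirm that the command-line tools used in this submodule are on the `PATH` of the notebook's environment. The following is a small sketch using only the standard library; the helper name `find_tools` is hypothetical, and the executable names are the ones invoked in the cells above.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Executables used in this submodule
tools = ["iqtree", "makeblastdb", "blastn", "beast", "beauti"]
for name, path in find_tools(tools).items():
    print(f"{name:12s} {path or 'NOT FOUND'}")
```

Any tool reported as `NOT FOUND` should be installed, or the correct conda environment activated, before running the corresponding cells.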