# Submodule 4: Analyze Phylogenetic Tree

# Learning Objectives:
In submodule 4, we analyze the phylogenetic trees constructed in the previous submodule (submodule 3). This includes the following steps:
- Comparative metagenomics from the tree
- Ancestral state reconstruction

----------------------------------------------------------------------------------------------------------------
# Training Plan 


Submodule #1: Understanding the Basics of Phylogenetics

Submodule #2: Collect and Prepare Sequence Data and Analysis

Submodule #3: Construct Phylogenetic Tree

 
<font color="green"> **Submodule #4: Analyze Phylogenetic Tree** </font>

### 4.1 Interpret and Visually Represent Phylogenetic Trees:

#### Overview:
Visualizing phylogenetic trees helps interpret and present evolutionary relationships. This step uses IQ-TREE to infer the tree and Biopython's `Bio.Phylo` module to draw it, with iTOL as an alternative option for tree visualization.

#### Visualization Tools:
* **IQ-TREE:** A tool for maximum-likelihood phylogenetic inference. It writes the resulting tree in Newick format, which can then be drawn with Biopython or iTOL.

    + Installation:

In [None]:
!conda install -c bioconda iqtree -y

In [None]:
!find $CONDA_PREFIX -name "iqtree"

+ **Run IQ-TREE:**

In [None]:
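import subprocess

# -m TEST selects the best-fit substitution model before building the tree;
# -bb and -alrt request 1000 ultrafast bootstrap and SH-aLRT replicates;
# -nt AUTO lets IQ-TREE choose the number of threads.
subprocess.run(["iqtree", "-s", "aligned_subset_augur.fasta", "-m", "TEST", "-bb", "1000", "-alrt", "1000", "-nt", "AUTO"], check=True)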
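
Unless a different prefix is given, IQ-TREE names its output files after the alignment passed to `-s`. The sketch below lists the main files one would expect from the run above and reports which of them exist. The helper name `expected_iqtree_outputs` is hypothetical, and the suffix list covers only the most common outputs (`.contree` is included because ultrafast bootstrap is requested with `-bb`).

```python
from pathlib import Path

def expected_iqtree_outputs(alignment, bootstrap=True):
    """Return the main files IQ-TREE is expected to write for an alignment."""
    suffixes = [".iqtree", ".treefile", ".log"]
    if bootstrap:
        # The consensus tree is written when bootstrap replicates are requested
        suffixes.append(".contree")
    return [Path(str(alignment) + suffix) for suffix in suffixes]

# Report which of the expected files are present in the working directory
for path in expected_iqtree_outputs("aligned_subset_augur.fasta"):
    status = "found" if path.exists() else "missing"
    print(f"{path}: {status}")
```

The `.treefile` output is in Newick format, so it can be read with `Phylo.read(..., "newick")`.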

+ **Visualize the tree with Biopython (`Bio.Phylo`):**

In [None]:
from Bio import Phylo
import matplotlib.pyplot as plt

tree = Phylo.read("data/cov/seq_output_subset.nwk", "newick")

Phylo.draw(tree)
plt.show()

* **iTOL (Interactive Tree of Life):** An online tool for additional customization options.

    + **Usage:**
       1. Visit iTOL:
           - https://itol.embl.de/upload.cgi
       2. Upload your seq_output.nwk file.
       3. Customize and view the tree as desired.

### 4.2 Importance of Visual Representation
**Visual representation of phylogenetic trees aids in:**
- **Interpreting Results:** Makes it easier to understand evolutionary relationships.
- **Communication:** Helps in conveying findings to a broader audience, including those who may not be specialists in phylogenetics.
- **Highlighting Key Features:** Emphasizes important evolutionary events and patterns.

### 4.3 Conduct Comparative Metagenomics along Different Branches
Comparative metagenomics involves comparing the genetic content of different samples to uncover variations.

#### Steps for Comparative Metagenomics:

1. Installing BLAST:
    - Install BLAST using conda:

In [None]:
!conda install -c bioconda blast -y

In [None]:
!conda update -n base -c conda-forge conda

2. Creating a BLAST Database:
- Create a BLAST database from your sequence file. First, check the current working directory:

In [None]:
%pwd

* Navigate to your data directory

In [None]:
%cd data/cov/

In [None]:
%pwd

* Create the database:

In [None]:
!makeblastdb -in sequences_subset.fasta -dbtype nucl -out seq_database

3. **Run BLAST:**
    - Create a new file query_seq.fasta with the sequences you want to compare.
    - Run BLAST to compare your query sequences against the database. The results will be saved in seq_results.txt in tabular format (`-outfmt 6`):

In [None]:
!blastn -query query_seq.fasta -db seq_database -out seq_results.txt -outfmt 6
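
The tabular output written by `-outfmt 6` has twelve tab-separated columns in a fixed default order, so it can be read with the standard library alone. The following is a minimal sketch: the function names `read_blast_tabular` and `filter_hits`, the thresholds, and the two example rows are all made up for illustration.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOAT_FIELDS = ("pident", "evalue", "bitscore")
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")

def read_blast_tabular(handle):
    """Parse -outfmt 6 lines into dictionaries with numeric fields converted."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, row))
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits

def filter_hits(hits, min_identity=90.0, max_evalue=1e-5):
    """Keep hits with high percent identity and a small e-value."""
    return [h for h in hits
            if h["pident"] >= min_identity and h["evalue"] <= max_evalue]

# Made-up example rows standing in for the contents of seq_results.txt
example = io.StringIO(
    "query1\tseqA\t99.50\t200\t1\t0\t1\t200\t1\t200\t1e-100\t365\n"
    "query1\tseqB\t85.00\t180\t27\t0\t1\t180\t5\t184\t2e-40\t150\n"
)
hits = read_blast_tabular(example)
strong = filter_hits(hits)
for h in strong:
    print(h["qseqid"], h["sseqid"], h["pident"], h["evalue"])
```

To use the real results, pass an open file instead of the example, for instance `with open("seq_results.txt") as handle: hits = read_blast_tabular(handle)`.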

### 4.4 Automate Comparative Metagenomics Analysis using Biopython
Automation can streamline comparative metagenomics analysis, making it more efficient.

**Script for Automation:**

In [None]:
!pip install biopython

In [None]:
import subprocess
from Bio.Blast import NCBIXML

# Run a local blastn search against the database built with makeblastdb.
# -outfmt 5 writes XML, the format that NCBIXML can parse.
def run_blast(query_file, db_name, output_file):
    subprocess.run(
        ["blastn", "-query", query_file, "-db", db_name,
         "-outfmt", "5", "-out", output_file],
        check=True,
    )

# Run the BLAST
run_blast("query_seq.fasta", "seq_database", "blast_results.xml")

# Parse the BLAST results and print every high-scoring pair (HSP)
with open("blast_results.xml") as result_handle:
    blast_records = NCBIXML.parse(result_handle)
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                print("****Alignment****")
                print(f"sequence: {alignment.title}")
                print(f"length: {alignment.length}")
                print(f"e value: {hsp.expect}")
                print(f"{hsp.query[0:75]}...")
                print(f"{hsp.match[0:75]}...")
                print(f"{hsp.sbjct[0:75]}...")
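
When several query files need to be searched, the command for each one can be built programmatically. This is a minimal sketch, assuming a local database named `seq_database` as created earlier. The helper names `build_blastn_command` and `run_all`, the output directory `blast_out`, and the query file names are hypothetical. With `dry_run=True` the commands are only printed, not executed.

```python
import subprocess
from pathlib import Path

def build_blastn_command(query_file, db_name, out_dir="blast_out", evalue=1e-5):
    """Build a blastn command that writes XML (-outfmt 5) named after the query."""
    out_file = Path(out_dir) / (Path(query_file).stem + ".xml")
    return ["blastn", "-query", str(query_file), "-db", db_name,
            "-evalue", str(evalue), "-outfmt", "5", "-out", str(out_file)]

def run_all(query_files, db_name, dry_run=True):
    """Print (dry run) or execute one blastn search per query file."""
    commands = [build_blastn_command(q, db_name) for q in query_files]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Make sure the output directory exists before blastn writes to it
            Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands

# Hypothetical query files; nothing is executed in dry-run mode
commands = run_all(["query_a.fasta", "query_b.fasta"], "seq_database")
```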


### 4.5 Discuss Insights from Ancestral State Reconstruction
Ancestral state reconstruction provides insights into:
- Evolutionary Dynamics: Understanding how certain traits or genetic sequences have evolved over time.
- Diversity: Gaining a deeper understanding of the diversity within and between metagenomic samples.
- Evolutionary Pressures: Identifying the evolutionary pressures that have shaped the genetic makeup of organisms.
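
To make the idea concrete, the sketch below reconstructs the state at the root of a toy four-tip tree. It uses the bottom-up pass of Fitch parsimony, a much simpler technique than the Bayesian approach used with BEAST, chosen here only because it fits in a few lines. The tree, the tip names, and the nucleotide states are invented for illustration.

```python
def fitch(tree, tip_states):
    """Return (candidate states at this node, number of changes below it).

    `tree` is a nested tuple of tip names; `tip_states` maps tip name -> state.
    """
    if isinstance(tree, str):
        # A tip: its state is observed directly
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, tip_states)
    right_set, right_changes = fitch(right, tip_states)
    changes = left_changes + right_changes
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no change is needed here
        return shared, changes
    # The children disagree: keep all candidates and count one change
    return left_set | right_set, changes + 1

# Toy data: three tips carry G at some site, one carries T
tree = (("A", "B"), ("C", "D"))
tip_states = {"A": "G", "B": "G", "C": "T", "D": "G"}

root_states, n_changes = fitch(tree, tip_states)
print(sorted(root_states), n_changes)
```

For this toy tree the root is inferred to carry `G`, with a single change needed to explain the `T` in tip `C`.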


### 4.6 Utilize Bayesian Inference Methods with BEAST for Ancestral State Reconstruction
Bayesian inference methods are powerful for reconstructing ancestral states and understanding evolutionary dynamics.

**Using BEAST for Ancestral State Reconstruction:**

1. Installing BEAST:
- Install BEAST using conda:

In [None]:
!conda install -c bioconda beast -y

- You can check the installation and list the available BEAGLE resources with:

In [None]:
!beast -beagle_info

2. Launching BEAUti:
- Find the path to the BEAUti software:

In [None]:
!find $CONDA_PREFIX -name "beauti"

- Open BEAUti by running the found path in your command line.
    - Example:
        - /path/to/bin/beauti

3. Using BEAUti:
- In BEAUti, go to File > Import and load your aligned_sequences.fasta.
- Set up the parameters for your analysis and generate the BEAST XML file.
- Save the configuration as seq_config.xml.

4. Running BEAST:
- Run BEAST with the configuration file:


In [None]:
!beast seq_config.xml
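
BEAST records the sampled parameter values in a tab-delimited log file whose name and columns are set in BEAUti. The sketch below summarizes such a log after discarding the first samples as burn-in. It assumes comment lines start with `#` and that the first column holds the MCMC state number. The function name `summarize_trace`, the column names, and the values are made up, so adjust them to match your own log file.

```python
import csv
import io
from statistics import mean

def summarize_trace(handle, burnin_fraction=0.1):
    """Mean of every logged column after discarding the first samples as burn-in."""
    # Drop blank lines and comment lines before handing the rest to the CSV reader
    lines = [line for line in handle if line.strip() and not line.startswith("#")]
    rows = list(csv.DictReader(lines, delimiter="\t"))
    kept = rows[int(len(rows) * burnin_fraction):]
    state_column = list(rows[0].keys())[0]  # first column: MCMC state number
    return {name: mean([float(r[name]) for r in kept])
            for name in rows[0] if name != state_column}

# Made-up log contents for illustration only
sample_log = io.StringIO(
    "# made-up BEAST log\n"
    "state\tposterior\tlikelihood\n"
    "0\t-5200.0\t-5100.0\n"
    "1000\t-5010.0\t-4950.0\n"
    "2000\t-5000.0\t-4940.0\n"
    "3000\t-4990.0\t-4930.0\n"
    "4000\t-5000.0\t-4940.0\n"
)
summary = summarize_trace(sample_log, burnin_fraction=0.2)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```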

In [1]:
from jupyterquiz import display_quiz
display_quiz('Quiz/QS4.json')
