# Submodule #1: Understanding the Basics of Phylogenetic Trees

## Phylogenetic Analysis of Sequence Data

Phylogenetics explores the evolutionary connections among organisms by analyzing genetic or phenotypic data. This module provides an in-depth tutorial on phylogenetic tree construction and visualization to understand evolutionary relationships across species.

### **Primary Objective:**
Develop an educational and practical workflow to analyze and visualize phylogenetic trees, leveraging real-world biological data. This module will enable students to:
1. Understand the importance of phylogenetics in studying evolutionary biology.
2. Learn methods to construct rooted, unrooted, and other phylogenetic tree types.
3. Visualize phylogenetic relationships using Python.

### **Overview:**
- **What You'll Learn:**
    - Basics of phylogenetic trees and their significance.
    - Steps to create and interpret different tree types (Rooted, Unrooted, Cladograms, Phylograms, Dendrograms).
    - Hands-on examples using Python for visualizations.
- **Tools and Libraries:** 
    - `Biopython` for phylogenetic analysis.
    - `Matplotlib` for visualization.
    - Real-world biological data in Newick format.
- **Why It Matters:**
    - Phylogenetic trees provide insights into evolutionary history, biodiversity, and genetic variation.
    - Understanding these trees is critical for applications in genomics, disease research, and conservation biology.


----------------------------------------------------------------------------------------------------------------
# Training Plan 


<font color="green"> **Submodule #1: Understanding the Basics of Phylogenetic Trees** </font>

 
Submodule #2: Collect and Prepare Sequence Data and Analysis


Submodule #3: Construct Phylogenetic Tree

 
Submodule #4: Analyze Phylogenetic Tree

----------------------------------------------------------------------------------------------------------------

# Learning Objectives:

Phylogenetics explores the evolutionary connections and ancestral relationships among organisms, providing insights into their genetic and evolutionary history. This submodule introduces the foundational concepts of phylogenetic trees, helping learners understand their structure, purpose, and applications in biological research. By the end of this module, learners will be able to define and interpret phylogenetic trees, recognize their importance in mapping genetic changes and understanding biodiversity, and apply practical skills to construct and analyze phylogenetic trees using Python for real-world biological data.

----------------------------------------------------------------------------------------------------------------

## What is a Phylogenetic Tree? 
Imagine tracing your family tree to uncover your ancestry and relationships with relatives. A phylogenetic tree works similarly, but instead of human relatives, it maps the evolutionary history of organisms. It illustrates how species are connected, showing shared ancestry and how closely related they are based on genetic or physical traits.

For example, consider a phylogenetic tree of primates. It reveals how humans, chimpanzees, gorillas, and orangutans share common ancestors and highlights their evolutionary divergence. A phylogenetic tree is essentially a hypothesis that depicts evolutionary pathways and relationships.
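
The primate example can be written down as a small tree. The following is a minimal sketch, assuming the commonly accepted branching order for these four primates and using Biopython's `Phylo` module. The Newick string is written by hand for illustration and carries no branch lengths.

```python
from io import StringIO
from Bio import Phylo

# Hand-written Newick string (illustrative): humans and chimpanzees are
# sister taxa, gorillas branch off next, and orangutans branch off first.
primates_newick = "(((Human,Chimpanzee),Gorilla),Orangutan);"
primate_tree = Phylo.read(StringIO(primates_newick), "newick")

# Print a plain-text drawing of the tree
Phylo.draw_ascii(primate_tree)

# List the tips (terminal nodes) of the tree
tip_names = [tip.name for tip in primate_tree.get_terminals()]
print(tip_names)
```

Each pair of parentheses in the Newick string groups taxa that share a more recent common ancestor with each other than with anything outside the parentheses.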

## Why Are Phylogenetic Trees Important?

### Phylogenetic trees are essential tools in evolutionary biology and related fields, serving several critical functions:

1. **Tracing Evolutionary Pathways:** These trees help scientists understand how different species have evolved over time from common ancestors. They provide insights into the branching patterns and evolutionary relationships between organisms.
2. **Mapping Genetic Changes:** By analyzing genetic sequences, phylogenetic trees enable researchers to track genetic changes that have occurred throughout the evolutionary process. This helps in studying the evolution of specific traits or genes across different species.
3. **Understanding Biodiversity:** The branching patterns in phylogenetic trees shed light on the diversification of species and the development of new characteristics or traits. This knowledge contributes to our understanding of the vast biodiversity on Earth.
4. **Disease Research:** In the context of pathogens, such as viruses or bacteria, phylogenetic trees are invaluable tools for tracking the spread and evolution of diseases. They help identify the origins, transmission pathways, and potential sources of disease outbreaks.

## How Are Phylogenetic Trees Created?
### To construct accurate phylogenetic trees, researchers rely on various data sources:

1. **Genetic Sequences:** The primary data used in phylogenetic analysis are DNA, RNA, or protein sequences obtained from different species or strains. These sequences are compared to identify similarities and differences.
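2. **Public Databases:** Genetic sequence data can be accessed from public repositories such as GenBank (NCBI), the European Nucleotide Archive (ENA, hosted by EMBL-EBI), and DDBJ. These three archives exchange data through the International Nucleotide Sequence Database Collaboration (INSDC) and maintain annotated sequence records for a very large number of organisms.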
3. **Genomic Projects:** Large-scale genomic projects, such as the Human Genome Project or the 1000 Genomes Project, provide extensive datasets that can be utilized for phylogenetic studies.
4. **Sequencing Technologies:** Advances in sequencing technologies, like next-generation sequencing (NGS), have made it easier and more cost-effective to obtain high-quality genetic data for a wide range of organisms.
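
To make "comparing sequences to identify similarities and differences" concrete, here is a minimal sketch that computes the proportion of differing sites (often called the p-distance) between aligned sequences. The sequences and the `p_distance` helper are hypothetical and exist only for illustration. Real analyses start from a proper multiple sequence alignment and usually apply a substitution model.

```python
from itertools import combinations

# Hypothetical, already-aligned toy sequences of equal length (not real data)
aligned = {
    "SpeciesA": "ATGCTAGCTA",
    "SpeciesB": "ATGCTAGCTT",
    "SpeciesC": "ATGTTAGGTT",
}


def p_distance(seq1, seq2):
    """Return the proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("Sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq1, seq2))
    return mismatches / len(seq1)


# Pairwise distances for every pair of sequences
distances = {
    (name1, name2): p_distance(aligned[name1], aligned[name2])
    for name1, name2 in combinations(aligned, 2)
}

for pair, dist in distances.items():
    print(pair, dist)
```

A matrix of pairwise distances like this one is the input to distance-based tree-building methods, which group the least-different sequences first.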


## Types of Phylogenetic Trees 
### Phylogenetic trees can take various forms, each suited to specific research questions and data interpretations. Below are the main types of phylogenetic trees, with examples and visualization code to help you understand their differences.

#### 1. **Rooted Trees**:

A rooted tree has a single node, the root, that represents the most recent common ancestor of all taxa in the tree. Moving from the root toward the tips follows the direction of evolutionary time, so each branching point marks a divergence from a shared ancestor.

In [None]:
from Bio import Phylo
from io import StringIO
import matplotlib.pyplot as plt

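# Newick format representing a rooted tree for mammals.
# Branch lengths are illustrative, not estimated from data. Mouse is grouped
# with the primates and Cat with Dog, matching the accepted mammal phylogeny.
rooted_tree = "(((Human:0.1, Chimpanzee:0.1):0.7, Mouse:0.8):0.2, (Dog:0.5, Cat:0.5):0.5);"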
tree = Phylo.read(StringIO(rooted_tree), "newick")

# Visualization
fig, ax = plt.subplots(figsize=(12, 8))
Phylo.draw(tree, axes=ax)
ax.set_title("Rooted Tree: Evolutionary Relationships Among Mammals", fontsize=14, weight='bold')
plt.show()
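
Because a rooted tree has a direction, it can be queried for common ancestors and for path lengths between tips. The sketch below uses a small hand-written tree with illustrative branch lengths and Biopython's `common_ancestor` and `distance` methods.

```python
from io import StringIO
from Bio import Phylo

# Small illustrative rooted tree; branch lengths are made up for the example
newick = "((Human:0.6,Chimpanzee:0.6):0.4,Dog:1.0);"
tree = Phylo.read(StringIO(newick), "newick")

# Most recent common ancestor of Human and Chimpanzee
mrca = tree.common_ancestor("Human", "Chimpanzee")
mrca_tips = sorted(tip.name for tip in mrca.get_terminals())
print("Tips below the Human/Chimpanzee ancestor:", mrca_tips)

# Path length between two tips = sum of branch lengths along the path
human_chimp = tree.distance("Human", "Chimpanzee")
human_dog = tree.distance("Human", "Dog")
print("Human-Chimpanzee distance:", human_chimp)
print("Human-Dog distance:", human_dog)
```

The shorter path between Human and Chimpanzee reflects their more recent common ancestor compared with Human and Dog.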


#### 2. **Unrooted Trees**:

An unrooted tree shows how taxa are related to one another without specifying which node is the common ancestor. It therefore implies no direction of evolutionary time.

In [None]:
from Bio import Phylo
from io import StringIO
import matplotlib.pyplot as plt

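# Newick format for an unrooted tree of the three domains of life.
# A three-way split at the top level is the usual way to write an unrooted tree.
unrooted_tree = "(Bacteria, Archaea, Eukaryota);"
tree = Phylo.read(StringIO(unrooted_tree), "newick")

# Visualization
# Note: Phylo.draw uses a rectangular layout, so the leftmost node is only a
# drawing artifact and should not be read as a common ancestor.
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(1, 1, 1)
Phylo.draw(tree, axes=ax)
ax.set_title("Unrooted Tree: The Three Domains of Life", fontsize=14, weight='bold')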
plt.show()
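
An unrooted tree can be given a direction by choosing an outgroup, a taxon assumed to have diverged before all the others. The sketch below uses a hand-written four-taxon tree and Biopython's `root_with_outgroup` method. Treating Fish as the outgroup is an assumption made for the example.

```python
from io import StringIO
from Bio import Phylo

# Unrooted tree written with a three-way split at the top level (illustrative)
unrooted_newick = "(Fish,Frog,(Lizard,Bird));"
tree = Phylo.read(StringIO(unrooted_newick), "newick")
names_before = sorted(tip.name for tip in tree.get_terminals())

# Root the tree on the chosen outgroup (assumption: Fish diverged first)
tree.root_with_outgroup("Fish")
names_after = sorted(tip.name for tip in tree.get_terminals())

print("Rooted:", tree.rooted)
Phylo.draw_ascii(tree)
```

Rooting changes only where the tree is anchored. The set of taxa and their relationships to one another stay the same.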

#### 3. **Cladograms**:

Cladograms focus on branching order, showing relationships between species but not providing information about branch lengths or evolutionary distances.

Visualization Example:

In [None]:
from Bio import Phylo
from io import StringIO
import matplotlib.pyplot as plt

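# Newick format for a cladogram (branching order only).
# Lizard and Bird are sister groups, Mammal joins them to form the amniotes,
# Frog is sister to the amniotes, and Fish branches off first.
cladogram = "(Fish, (Frog, (Mammal, (Lizard, Bird))));"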
tree = Phylo.read(StringIO(cladogram), "newick")

# Visualization
fig, ax = plt.subplots(figsize=(10, 8))
Phylo.draw(tree, axes=ax)
ax.set_title("Cladogram: Vertebrate Evolution", fontsize=14, weight='bold')
plt.show()
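
One way to see the difference between a cladogram and a tree with branch lengths is to strip the lengths from an existing tree. This is a minimal sketch with placeholder taxon names (`A`, `B`, `C`) and made-up lengths. It sets each clade's `branch_length` attribute to `None`, leaving only the branching order.

```python
from io import StringIO
from Bio import Phylo

# Illustrative tree with branch lengths; taxon names are placeholders
tree = Phylo.read(StringIO("((A:0.4,B:0.5):0.3,C:0.6);"), "newick")

# Remove every branch length so that only the topology remains
for clade in tree.find_clades():
    clade.branch_length = None

lengths_after = [clade.branch_length for clade in tree.find_clades()]
tip_names = [tip.name for tip in tree.get_terminals()]

# With no lengths available, the text drawing reflects branching order only
Phylo.draw_ascii(tree)
```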

#### 4. **Phylograms**:
    
Phylograms show both the branching order and the branch lengths. Each branch length is proportional to the amount of evolutionary change along that branch, for example the number of substitutions per site.

Visualization Example:

In [None]:
from Bio import Phylo
from io import StringIO
import matplotlib.pyplot as plt

# Newick format for a phylogram with branch lengths representing genetic divergence
phylogram = "((Drosophila_melanogaster:0.4, Drosophila_simulans:0.5):0.3, Drosophila_yakuba:0.6);"
tree = Phylo.read(StringIO(phylogram), "newick")

# Visualization
fig, ax = plt.subplots(figsize=(12, 6))
Phylo.draw(tree, axes=ax)
ax.set_title("Phylogram: Genetic Divergence in Fruit Flies", fontsize=14, weight='bold')
plt.show()
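
Since branch lengths in a phylogram carry meaning, they can be summed. The sketch below reuses the fruit fly Newick string from this section and reads off each tip's distance from the root with `depths()`, plus the total tree length with `total_branch_length()`.

```python
from io import StringIO
from Bio import Phylo

phylogram = "((Drosophila_melanogaster:0.4,Drosophila_simulans:0.5):0.3,Drosophila_yakuba:0.6);"
tree = Phylo.read(StringIO(phylogram), "newick")

# depths() maps every clade to its cumulative branch length from the root
depths = tree.depths()
tip_depths = {tip.name: depths[tip] for tip in tree.get_terminals()}

for name, depth in tip_depths.items():
    print(f"{name}: {depth:.2f}")

# Sum of all branch lengths in the tree
total_length = tree.total_branch_length()
print(f"Total branch length: {total_length:.2f}")
```

Tips at different depths indicate that the lineages accumulated different amounts of change since their common ancestor.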

#### 5. **Dendrograms**:
    
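A dendrogram is the general tree diagram produced by hierarchical clustering. It groups items by overall similarity rather than by inferred ancestry, which makes it useful beyond phylogenetics, in fields such as gene-expression analysis and linguistics.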

Visualization Example:

In [None]:
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial.distance import squareform

# Sample data: pairwise distances between genes based on expression profiles
# (illustrative values; the matrix is symmetric with zeros on the diagonal)
genes = ['GeneA', 'GeneB', 'GeneC', 'GeneD', 'GeneE']
distance_matrix = np.array([
    [0, 2, 4, 6, 8],
    [2, 0, 3, 5, 7],
    [4, 3, 0, 2, 4],
    [6, 5, 2, 0, 3],
    [8, 7, 4, 3, 0]
])

# Convert the square distance matrix to the condensed form that linkage() expects
condensed_distances = squareform(distance_matrix)

# Perform average-linkage (UPGMA) clustering. Ward linkage assumes Euclidean
# distances, which an arbitrary precomputed matrix does not guarantee.
linkage_matrix = linkage(condensed_distances, method='average')

# Plot dendrogram; links that merge below the threshold are colored by cluster
plt.figure(figsize=(10, 7))
dendrogram(
    linkage_matrix,
    labels=genes,
    leaf_rotation=45,
    leaf_font_size=12,
    color_threshold=4
)
plt.title("Dendrogram: Hierarchical Clustering of Genes", fontsize=14, weight='bold')
plt.xlabel("Genes", fontsize=12)
plt.ylabel("Distance", fontsize=12)
plt.tight_layout()
plt.show()
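
A dendrogram can also be cut into flat groups. The following sketch assumes average-linkage clustering on a small illustrative distance matrix and uses SciPy's `fcluster` to request two clusters. The choice of two clusters is arbitrary and made only for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

genes = ['GeneA', 'GeneB', 'GeneC', 'GeneD', 'GeneE']

# Illustrative symmetric distance matrix with zeros on the diagonal
distance_matrix = np.array([
    [0, 2, 4, 6, 8],
    [2, 0, 3, 5, 7],
    [4, 3, 0, 2, 4],
    [6, 5, 2, 0, 3],
    [8, 7, 4, 3, 0]
])

# Average-linkage clustering on the condensed distance vector
linkage_matrix = linkage(squareform(distance_matrix), method='average')

# Cut the hierarchy so that at most two flat clusters remain
labels = fcluster(linkage_matrix, t=2, criterion='maxclust')
cluster_of = dict(zip(genes, labels))

for gene, cluster_id in cluster_of.items():
    print(gene, "-> cluster", cluster_id)
```

With these distances, GeneA and GeneB end up in one cluster and GeneC, GeneD, and GeneE in the other, matching the two main branches of the dendrogram.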

# Summary
Phylogenetic trees are like a detective story unraveling the history of life. They offer insights into how organisms evolved, diversified, and adapted to different environments. From understanding human ancestry to tracking diseases like COVID-19, these trees play a crucial role in modern science.
 
To solidify your understanding, engage with the interactive quiz below and test your knowledge of phylogenetic basics.

# Interactive Quiz
The following quiz will help reinforce your understanding of phylogenetic trees:

In [None]:
from jupyterquiz import display_quiz
display_quiz('Quiz/QS1.json')