# Introduction to the PhyloTree Module for Evolutionary Analyses

Phylogenetic trees are the result of most evolutionary analyses. They represent the evolutionary relationships among a set of species or, in molecular biology, a set of homologous sequences.

## Comparing `Tree()` and `PhyloTree()` in ETE

- `Tree()`

A `Tree` does not need to be a phylogenetic tree: any hierarchical, tree-like structure
can in principle be used, as long as it is in Newick format.
In ETE, a tree is simply a set of interconnected nodes that form a hierarchical
structure.

- `PhyloTree()`

`PhyloTree` is a special class in ETE, developed to work specifically with
phylogenetic trees as an extension of the base `Tree` class.
By default, a `PhyloTree` incorporates information about the species associated with each leaf. Thus, while leaves are considered to represent species (or sequences from a given species' genome), internal nodes are considered ancestral nodes. A direct consequence is, for instance, that every split in the tree represents a speciation or duplication event.


| Feature            | ete4.Tree() | ete4.PhyloTree() |
|--------------------|-------------|------------------|
| Leaf names         | Yes         | Yes              |
| Branch lengths     | Yes         | Yes              |
| Support            | Yes         | Yes              |
| **Species information**| No          | Yes              |



## Overview of `PhyloTree`

### [PhyloTree](https://etetoolkit.github.io/ete/reference/reference_phylo.html#ete4.PhyloTree)

*class* **PhyloTree** *(
            newick=None, children=None, 
            alignment=None, alg_format='fasta', 
            sp_naming_function=`<function _parse_species>`, 
            parser=None)*

Bases: `Tree`

Class to store a phylogenetic tree.

Extends the standard `Tree` class by adding specific properties and methods to work with phylogenetic trees.

**Attributes**
- **newick** – If not None, initializes the tree from a newick, which can be a string or file object containing it.
- **children** – If not None, the children to add to this node.
- **alignment** – File containing a multiple sequence alignment.
- **alg_format** – “fasta”, “phylip” or “iphylip” (interleaved).
- **parser** – Parser to read the newick.
- **sp_naming_function** – Function that gets a node name and returns the species name (see [`PhyloTree.set_species_naming_function()`](https://etetoolkit.github.io/ete/reference/reference_phylo.html#ete4.PhyloTree.set_species_naming_function)). By default, the first three letters of the node name are used as the species identifier.
- **species** – Species identifier of the node.


**Methods**

- **annotate_gtdb_taxa** *(taxid_attr='species', tax2name=None, tax2track=None, tax2rank=None, dbfile=None)*

Add GTDB taxonomy annotation to all descendant nodes. Leaf nodes are expected to contain a feature (name, by default) encoding a valid taxid number.

- **annotate_ncbi_taxa** *(taxid_attr='species', tax2name=None, tax2track=None, tax2rank=None, dbfile=None)*

Add NCBI taxonomy annotation to all descendant nodes. Leaf nodes are expected to contain a feature (name, by default) encoding a valid taxid number.

- **get_species()** 

Returns the set of species covered by its partition.

More detailed information is available in the [PhyloTree documentation](https://etetoolkit.github.io/ete/reference/reference_phylo.html#ete4.PhyloTree.set_species_naming_function).
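
To make "the set of species covered by its partition" more concrete, here is a minimal pure-Python sketch. It does not use ETE and is not how `get_species()` is implemented: the nested-tuple tree, the `species_under` helper and the leaf names are hypothetical stand-ins chosen for illustration, and the species is assumed to be the first three letters of each leaf name, following the default described above.

```python
def species_under(node, naming_function):
    """Return the set of species found among the leaves below `node`.

    `node` is either a leaf name (str) or a tuple of child nodes.
    This toy representation is an assumption, not ETE's data model.
    """
    if isinstance(node, str):
        # A leaf contributes exactly one species, derived from its name.
        return {naming_function(node)}
    species = set()
    for child in node:
        # An internal node covers the union of its children's species.
        species |= species_under(child, naming_function)
    return species


def first_three(name):
    """Species identifier taken as the first three letters of the name."""
    return name[:3]


# Hypothetical gene tree: two primate sequences grouped together, plus mouse.
toy_tree = (("Hsa_001", "Ptr_001"), "Mmu_001")

print(species_under(toy_tree, first_three))     # species under the root
print(species_under(toy_tree[0], first_three))  # species under the inner node
```

The root covers all three species, while the inner node covers only the two sequences grouped beneath it.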


## Assign `species` to nodes

### Adding taxonomic information

`PhyloTree` instances allow leaf names and species names to be handled separately. This is useful when working with molecular phylogenies, in which node names usually represent sequence identifiers.

Species names are stored in the `species` attribute of each leaf node. The method `PhyloTree.get_species()` can be used to obtain the set of species names found under a given internal node (a speciation or duplication event). Sequence names often contain species information as part of the name, and ETE can parse this information automatically.

There are three ways to establish the species of the different tree nodes:

- By using the first three letters of the node's name (default)
- By dynamically calling a function based on the node's name
- By setting it manually for each node

```python
# Load PhyloTree
from ete4 import PhyloTree

# Leaf names here follow a "<taxid>|<protein>" pattern
tree = PhyloTree('((9606|protA, 9598|protA), 10090|protB);')
```
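
The leaf names in this tree do not start with a three-letter species code, so the default rule would give `960`, `959` and `100` as species. The following minimal sketch, in plain Python and independent of ETE, places the default rule next to a custom one. `taxid_before_pipe` is a hypothetical helper name, and it assumes that the species identifier is the text before the first `|`.

```python
def first_three_letters(node_name):
    """Mimic the documented default: the first three characters of the name."""
    return node_name[:3]


def taxid_before_pipe(node_name):
    """Hypothetical custom rule: the species is the text before the first '|'."""
    return node_name.split("|")[0].strip()


leaf_names = ["9606|protA", "9598|protA", "10090|protB"]

for name in leaf_names:
    # Compare what each rule would report as the species of this leaf.
    print(name, first_three_letters(name), taxid_before_pipe(name))
```

A function such as `taxid_before_pipe` is the kind of callable that the `sp_naming_function` parameter and `PhyloTree.set_species_naming_function()`, both listed in the overview above, are described as accepting.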
