
# Evaluating phylogenomics approaches in genome evolution

### Design project - Master Bioinformatics 2018-2019

<img width="40%" style="float: right;" src="https://camo.githubusercontent.com/271a31013c5338407ab939f65b0fee906f0a4f4e/687474703a2f2f696e636f6e76657267656e742e6e65742f696d672f747265655f632e6a7067">

## Genome evolution & whole genome duplication

<img src="img/vdp17.png" width="50%"> 

## Phylogenomics 

Applying methods from **phylogenetics** to **genome-scale** data


- to study patterns of **gene gain** and **loss** across multiple lineages
- to study **gene function** in an evolutionary context
- to infer ancient **whole genome duplication** (WGD) events

Phylogenomics sits at the interface of evolutionary biology and genomics. It is most often understood as the application of phylogenetic methods to genome-scale data sets. It is commonly used to gain insight into gene function, gene family evolution and large-scale evolutionary events (such as whole genome duplications, or WGDs), or to perform species tree inference, species delimitation or phylogeographic analysis.

### The general phylogenomic pipeline

![](img/pipeline.png)

The general phylogenomic pipeline consists of (1) delimiting gene families and establishing their evolutionary relationships through (2) multiple sequence alignment and (3) phylogenetic tree inference. The first two steps are effectively geared towards establishing hypotheses of homology, whereas the tree inference itself tries to pin down an exact evolutionary hypothesis for the gene family in question.

Gene family inference is generally performed using sophisticated graph clustering methods. Multiple sequence alignment is a horrifyingly challenging task (an aspect often underappreciated due to the existence of user-friendly software) that relies on heuristic algorithms. Tree inference has become a challenging statistical and computational task, employing some of the most sophisticated maximum likelihood and Bayesian methodologies developed in the whole of statistics.
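To make the idea of graph clustering concrete, here is a minimal sketch of the crudest possible variant: single-linkage clustering, where genes are nodes, sufficiently strong similarity hits are edges, and every connected component is called a gene family. The gene names, scores and the `min_score` threshold are hypothetical, and real pipelines typically use more refined clustering (for example Markov clustering) on normalized similarity scores.

```python
def cluster_families(hits, min_score):
    """Single-linkage clustering of genes from pairwise similarity hits.

    hits: iterable of (gene_a, gene_b, score) tuples (hypothetical format).
    Returns a sorted list of families, each a sorted list of gene names.
    """
    parent = {}

    def find(gene):
        # Union-find lookup with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b, score in hits:
        root_a, root_b = find(a), find(b)  # registers both genes
        if score >= min_score and root_a != root_b:
            parent[root_a] = root_b  # merge the two families

    families = {}
    for gene in list(parent):
        families.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(family) for family in families.values())


# Toy similarity hits between genes named "<species>_<id>" (made-up values).
toy_hits = [
    ("A_1", "B_1", 250.0),
    ("B_1", "C_1", 180.0),
    ("A_2", "B_2", 300.0),
    ("A_1", "A_2", 20.0),  # weak hit, below the threshold
]
toy_families = cluster_families(toy_hits, min_score=50.0)
```

Single linkage is easy to reason about but tends to merge families through a single spurious hit, which is one of the error sources a simulation study can quantify.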

Depending on the actual problem of interest, one will do different things with the alignments or trees. Here we will focus on gene tree - species tree reconciliation as a means of constructing an explicit hypothesis of gene family evolution in terms of duplication, loss and transfer events. Gene tree - species tree reconciliation is still very much in its infancy, and most people still use simplistic algorithms. Recently, promising and more sophisticated probabilistic approaches have been proposed (a topic on which I am developing new methods for WGD inference).
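As an illustration of what such a simplistic algorithm looks like, the following is a minimal sketch of LCA-based reconciliation: every gene tree node is mapped to the smallest species tree clade containing all its species, and a node is called a duplication when it maps to the same clade as one of its children. The nested-tuple tree encoding and the `<species>_<id>` gene naming are assumptions made for this example; losses and transfers are not inferred here.

```python
def leaves(tree):
    """Set of leaf names below a node (a leaf is a string, a node a 2-tuple)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def all_clades(species_tree):
    """Leaf sets of every node in the species tree."""
    if isinstance(species_tree, str):
        return [frozenset([species_tree])]
    left, right = species_tree
    return all_clades(left) + all_clades(right) + [leaves(species_tree)]


def lca(clades, species):
    """Smallest species tree clade that contains the given set of species."""
    return min((c for c in clades if species <= c), key=len)


def reconcile(gene_tree, clades, duplications=None):
    """Return (clade the node maps to, clades at which duplications are inferred)."""
    if duplications is None:
        duplications = []
    if isinstance(gene_tree, str):
        # Assumed naming convention: "<species>_<gene id>".
        return frozenset([gene_tree.split("_")[0]]), duplications
    left, _ = reconcile(gene_tree[0], clades, duplications)
    right, _ = reconcile(gene_tree[1], clades, duplications)
    here = lca(clades, left | right)
    if here == left or here == right:
        duplications.append(here)  # node maps to the same clade as a child
    return here, duplications


species_tree = (("A", "B"), "C")
gene_tree = ((("A_1", "B_1"), ("A_2", "B_2")), "C_1")
root_clade, dups = reconcile(gene_tree, all_clades(species_tree))
```

In this toy example the single inferred duplication sits on the ancestor of species A and B. Note that the result depends entirely on the gene tree being correct, which is exactly why errors in the earlier pipeline steps matter.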

## Goals

- To **critically evaluate** previous results on **WGD inference** based on phylogenomic reconciliation 
    - Hexapods - Li *et al.* (2018)
    - Seed plants - Jiao *et al.* (2011), Li *et al.* (2015)

<img src="img/barker_hex.svg" width="100%">

- To assess the impact of using **different methods** in phylogenomic reconciliation pipelines
- To evaluate error propagation and biases through **simulation experiments**

The idea is to

1. Perform the full phylogenomics pipeline using various methods for every step:
   - gene family inference
   - multiple sequence alignment
   - tree inference
2. Assess the impact of the different algorithms and error propagation using simulation studies
3. Implement various *gene tree - species tree reconciliation* algorithms
4. Evaluate and compare the performance and reliability of different commonly used approaches for the inference of WGDs
5. Assess the importance of the above for the interpretation of previously published results
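A simulation study starts from a generative model with known parameters. As a minimal sketch, assuming a linear birth-death model in which every gene copy duplicates at rate `dup_rate` and is lost at rate `loss_rate`, the copy number of a gene family along a single branch can be simulated as follows. The function and parameter names are hypothetical, and a full study would simulate gene trees and sequences along a whole species tree rather than just counts.

```python
import random


def simulate_copy_number(n0, dup_rate, loss_rate, t, rng):
    """Simulate the number of gene copies after time t (Gillespie algorithm).

    n0: copies at the start of the branch; rates are per copy per unit time.
    """
    n, elapsed = n0, 0.0
    per_copy_rate = dup_rate + loss_rate
    while n > 0 and per_copy_rate > 0:
        # Waiting time to the next event among n independent copies.
        elapsed += rng.expovariate(n * per_copy_rate)
        if elapsed > t:
            break
        if rng.random() < dup_rate / per_copy_rate:
            n += 1  # duplication
        else:
            n -= 1  # loss
    return n


rng = random.Random(42)
counts = [simulate_copy_number(1, 0.5, 0.5, 1.0, rng) for _ in range(5000)]
mean_count = sum(counts) / len(counts)
print(f"mean copy number: {mean_count:.3f}")
```

Under this model the expected copy number is `n0 * exp((dup_rate - loss_rate) * t)`, so with equal rates the simulated mean should stay close to the starting value. Comparing such known expectations with what a pipeline infers from simulated data is the basic logic of the simulation experiments.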

## What you will learn

- A great deal about evolutionary genomics and phylogenetics - the mother of all bioinformatics! 
- Assessing complex methodologies and pipelines through simulation studies
- Performing computationally intensive pipelines efficiently
- Implementing algorithms on tree structures
- Scrutinizing and critically evaluating published results