# Welcome to the Phylogenetics Jupyter Notebook!

### You have been learning about phylogenetics over the past few weeks: looking at phylogenetic trees and thinking about how we can understand evolutionary relationships. Due to advances in sequencing technology, a lot of current phylogenetics work is characterized by *enormous* datasets, meaning that a lot of work we do in the field is *computational*.

### Along with dealing with large datasets, computational work has allowed us to develop ways to simulate evolution. This notebook is meant to introduce you to very simple evolutionary simulations, so that you can visualize some of the concepts we've been talking about in class. 

<div class="alert alert-success">
<b>Learning objectives</b>
<br />1. Understand what a Jupyter notebook is and how to run code in Jupyter notebook cells
<br />2. Review key terms in phylogenetics
<br />3. Become familiar with toytree and ipcoal as tools for learning phylogenetics
<br />4. Understand the species tree/gene tree problem and incomplete lineage sorting       
</div>

## First, what is a Jupyter notebook?

Jupyter notebooks are an interactive way of running code (typically Python) and visualizing the results in a web browser. They are a tool that can be used to develop and share code, and can be a really nice teaching tool as well! Here is a mini tutorial on how they work.

Each block of text or code is contained within a **cell**, and cells are run individually. Double click on this text, and you'll see it change. This is a **Markdown cell**, which allows me to make this nicely formatted text! Press cmd + enter on Mac or ctrl + enter on PC to run this cell. Below is a **code cell**: instead of text, it contains code that can be run. Run the code cell below to see what happens:

In [5]:
# store value to the variable x
x = 3

# print the variable x
print(x)

3
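One more thing worth knowing: variables you define in one cell stay in memory, so later cells can use them as long as the earlier cell has been run first. Here is a tiny sketch of that idea. The variable `y` is made up for illustration, and `x` is defined again so the cell also works on its own.

```python
# x was stored in the cell above; it is repeated here so this cell runs by itself
x = 3

# reuse x to compute a new value and store it in y
y = x * 2

# print the new variable
print(y)
```

If you restart the notebook, those variables are cleared, so run the cells again from the top.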


## Now that you're acquainted with Jupyter notebooks, let's do some phylogenetics!

First, I'm going to import two packages that we will use to visualize and simulate evolutionary trees. **Toytree** helps us visualize the trees and **ipcoal** uses a coalescent model to simulate data.

In [1]:
import ipcoal
import toytree

Great! Now that **toytree** and **ipcoal** are imported, we can start to do some phylogenetics! Let's start by generating a random tree with 8 tips. Don't worry so much about what the code is doing here, just run the code cell below to see the tree.

In [10]:
# generate a random tree with 8 tips and height of 1M generations
tree1 = toytree.rtree.unittree(8, treeheight=1e6, seed=123)
# draw tree showing idx labels
tree1.draw(tree_style='s');

Now let's review some terminology. Fill in the answers to these questions by double-clicking this markdown cell and replacing the **blank** with the correct term:
1. r0-r7 are called the **blank** on this tree
2. r2 and r3 are **blank** to each other
3. r0, r1, r2, and r3 are considered a **blank** group
4. r0, r1, r4, and r5 are considered a **blank** group
5. the specific arrangement of the tips of the tree is referred to as the **blank** of the tree
6. Node 14 has a special name. It is the **blank**.

## Visualizing the Gene Tree/Species Tree Problem

As you learned in lecture, gene trees don't always match the species tree. What does this look like? Let's use the tree above as our species tree. 

In [3]:
model1 = ipcoal.Model(tree=tree1, Ne=1e5)
model1.sim_trees(nloci=10, nsites=1)

AttributeError: 'ToyTree' object has no attribute 'set_node_data'

In [None]:
# a dictionary of arguments to style the drawings
kwargs = {
    "ts": "c",
    "tip_labels": True,
    "shared_axis": True,
    "width": 600,
    "height": 200,
    "node_sizes": 6,
}

# draw a grid of trees from model 1
toytree.mtree(model1.df.genealogy).draw_tree_grid(**kwargs);

Let's look at that species tree again:

In [16]:
tree1.draw(tree_style='s');

What do you notice about the species tree relative to gene trees you simulated? What can you say about their **topologies**? Double click into this markdown cell to respond below:








## Incomplete Lineage Sorting

You probably observed above that the topologies of the gene trees don't always match the species tree. We call this phenomenon **gene tree discordance**, and one major cause is **incomplete lineage sorting**: going backwards in time, gene copies from two sister species fail to find their common ancestor in the population just before the split, so the evolutionary trajectory of the gene may not match the larger species tree pattern.
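To make "matching topologies" concrete, here is a minimal sketch in plain Python (it does not use toytree). It assumes trees are written as nested tuples of tip names, and it treats two rooted trees as having the same topology when they contain the same set of clades. The function name `clades` and the four-tip trees are made up for illustration and are not related to tree1.

```python
def clades(tree):
    """Return every clade in a nested-tuple tree as a frozenset of tip names."""
    found = set()

    def tips_below(node):
        # a string is a tip; a tuple is an internal node with children
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(tips_below(child) for child in node))
        found.add(tips)
        return tips

    tips_below(tree)
    return found


species_tree = (("A", "B"), ("C", "D"))
gene_tree_1 = (("B", "A"), ("D", "C"))  # same groupings, tips listed in another order
gene_tree_2 = (("A", "C"), ("B", "D"))  # different groupings

print(clades(species_tree) == clades(gene_tree_1))  # True
print(clades(species_tree) == clades(gene_tree_2))  # False
```

Rotating branches around a node changes how a tree is drawn but not its clades, which is why the first comparison still counts as a match.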

**Incomplete lineage sorting (ILS)** can make it difficult to resolve species trees (a major goal of a lot of research in Evolutionary Biology!). Simulations can help us understand 1) how ILS will impact how our statistical models resolve phylogenies and 2) which parameters affect ILS, and to what degree. 

In the following activity you will use simulations to address the second question, looking at how **effective population size (Ne), mutation rate, and admixture** affect ILS & gene tree discordance. Use the same species tree you made above (tree1) to test how differences in each of these parameters affect the degree of gene tree discordance! *Record your observations in the markdown cell at the bottom of the notebook.*

### Effective Population Size (Ne)
Recall that effective population size reflects the number of breeding individuals in a population. I've created two models with different Ne values. Keep the Ne value of model1 constant, but adjust the Ne value of model2. What happens when Ne=1e1? What about 1e10? Play around with different values of Ne and record how gene tree discordance changes. 

In [17]:
#create two models
model1 = ipcoal.Model(tree=tree1, Ne=1e4)
model2 = ipcoal.Model(tree=tree1, Ne=1e6)

# simulate n genealogies for each model
model1.sim_trees(10)
model2.sim_trees(10)

#draw the species tree
tree1.draw(tree_style='s')
# draw a grid of trees from model 1
toytree.mtree(model1.df.genealogy).draw_tree_grid(**kwargs);
# draw a grid of trees from model 2
toytree.mtree(model2.df.genealogy).draw_tree_grid(**kwargs);

AttributeError: 'ToyTree' object has no attribute 'set_node_data'
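If you want a rough check on what you see, here is a minimal sketch in plain Python. It is not how ipcoal works internally. It assumes a three-species tree ((A,B),C), a diploid population with constant Ne, and a hypothetical internal branch of 1e5 generations. Under those assumptions, the chance that a gene tree disagrees with the species tree is (2/3)·exp(-T / 2Ne), where T is the internal branch length in generations. The function names are made up for illustration.

```python
import math
import random


def discordance_probability(branch_generations, ne):
    """Analytic chance that a 3-species gene tree disagrees with the species tree."""
    t = branch_generations / (2 * ne)  # internal branch length in coalescent units
    return (2 / 3) * math.exp(-t)


def simulate_discordance(branch_generations, ne, nloci=10000, seed=123):
    """Estimate the same chance by simulating many independent loci."""
    rng = random.Random(seed)
    discordant = 0
    for _ in range(nloci):
        # generations until the A and B lineages coalesce (mean of 2 * Ne)
        wait = rng.expovariate(1 / (2 * ne))
        if wait > branch_generations:
            # incomplete lineage sorting: all three lineages reach the root
            # population, where each of the 3 pairs is equally likely to join first
            if rng.randrange(3) != 0:
                discordant += 1
    return discordant / nloci


branch = 1e5  # hypothetical internal branch length in generations
for ne in (1e1, 1e4, 1e6, 1e10):
    expected = discordance_probability(branch, ne)
    observed = simulate_discordance(branch, ne)
    print(f"Ne={ne:g}  expected={expected:.3f}  simulated={observed:.3f}")
```

The comparison to make is between the length of a branch and the size of the population living on it: short branches and large populations leave less opportunity for lineages to sort before the next split.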

### Mutation Rate

Now do the same thing for mutation rate. What happens when mutation rate is a lot higher? A lot lower?

I've put Ne here so you can test how mutation rate and Ne work together to determine ILS. Does one seem to play a bigger role? Which? Record your observations below. 

In [None]:
#create two models
model1 = ipcoal.Model(tree=tree1, mut=1e-8, Ne=1e4)
model2 = ipcoal.Model(tree=tree1, mut=1e-10, Ne=1e4)

# simulate n genealogies for each model
model1.sim_trees(10)
model2.sim_trees(10)

#draw the species tree
tree1.draw(tree_style='s')
# draw a grid of trees from model 1
toytree.mtree(model1.df.genealogy).draw_tree_grid(**kwargs);
# draw a grid of trees from model 2
toytree.mtree(model2.df.genealogy).draw_tree_grid(**kwargs);
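As you think about what you see, it may help to separate two things: which lineages coalesce, and how many mutations land on the resulting genealogy. Here is a minimal sketch, assuming the standard neutral coalescent (mutations are placed on a genealogy after it has been drawn) and assuming `mut` is a per-site, per-generation rate. The function name and the locus length of 1000 sites are made up for illustration.

```python
def expected_mutations(mut, generations, nsites):
    """Expected number of mutations along one branch of a genealogy."""
    return mut * generations * nsites


branch_length = 1e6  # generations; same as the treeheight used for tree1
nsites = 1000        # hypothetical locus length in base pairs

for mut in (1e-8, 1e-10):
    count = expected_mutations(mut, branch_length, nsites)
    print(f"mut={mut:g}: about {count:g} expected mutations on the branch")
```

Compare this with the Ne sketch idea: there, the parameter changes when lineages coalesce; here, the parameter changes only how much sequence variation is available to reconstruct the genealogy.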

### Admixture

Admixture (sometimes called **introgression**) occurs when two species hybridize, passing genes between branches of a species tree. First, let's visualize what admixture looks like:

In [20]:
tree1.draw(ts='s', admixture_edges=[(3, 8)]);

That orange line is called an **admixture edge**, and it indicates a hybridization event between the branch leading to node 8 and the branch leading to tip 3. How does this affect gene tree discordance?

First, let's create simulations that include admixture. I'm creating a simple admixture edge with a single migration event that, going backwards in time, is placed at the middle of each of branches 3 and 8. This can be adjusted. If you're interested in how to further customize admixture in an ipcoal simulation, click your cursor inside the ipcoal.Model parentheses, hold down 'shift', and press 'tab' twice. This will allow you to scroll through the documentation of this function.

Once you've run the below simulation, you are welcome to play around with it. Change the admixture edges: make them closer or farther away. Add in Ne or mutation rate! What does admixture do to gene tree discordance? Does this make sense to you? Why or why not? Record your observations in the observation cell below. 

In [None]:
#create a model with an admixture edge
model1 = ipcoal.Model(tree=tree1, mut=1e-8, Ne=1e4, admixture_edges=[(3, 8)])

# simulate n genealogies for each model
model1.sim_trees(10)

#draw the species tree
tree1.draw(tree_style='s')
# draw a grid of trees from model 1
toytree.mtree(model1.df.genealogy).draw_tree_grid(**kwargs);
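For intuition, here is a toy sketch of admixture in plain Python. It does not reproduce ipcoal's model. It assumes a three-species tree ((A,B),C), constant diploid Ne, and a single pulse in which, going backwards in time, the B lineage moves into population C with probability `gamma`. The function name, the times, and `gamma` are all hypothetical.

```python
import random


def simulate_with_admixture(gamma, ne, t_mig=1e5, t_split=2e5, t_root=4e5,
                            nloci=20000, seed=1):
    """Return the fraction of loci supporting each rooted 3-species topology."""
    rng = random.Random(seed)
    counts = {"((A,B),C)": 0, "((B,C),A)": 0, "((A,C),B)": 0}
    for _ in range(nloci):
        if rng.random() < gamma:
            # B's lineage moved into C at t_mig and shares that
            # population with C's lineage until the root
            shared_time = t_root - t_mig
            pair = "((B,C),A)"
        else:
            # no migration: A and B share their ancestor from t_split to the root
            shared_time = t_root - t_split
            pair = "((A,B),C)"
        # generations until the two co-located lineages coalesce (mean 2 * Ne)
        wait = rng.expovariate(1 / (2 * ne))
        if wait < shared_time:
            counts[pair] += 1
        else:
            # all three lineages reach the root; any pair may join first
            counts[rng.choice(list(counts))] += 1
    return {topology: n / nloci for topology, n in counts.items()}


for gamma in (0.0, 0.3):
    print(gamma, simulate_with_admixture(gamma, ne=1e4))
```

A useful thing to look for is whether the two discordant topologies stay about equally common (as expected from ILS alone) or whether one of them becomes favored.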

# My observations:

## Double click into this cell and record your observations from the above activities.


<div class="alert alert-success">
<b>Submitting the assignment</b>
<br /> When you are finished with the notebook and ready to submit it, download an HTML version of the notebook. To do this, go to the 'File' menu, then 'Download as', and select HTML. Upload the HTML notebook to the 'Phylogenetics Jupyter Notebook' assignment in the In-class assignments part of Courseworks to submit your assignment.
</div>