## Getting a dated synthesis tree for an arbitrary set of taxa

## Standardizing query taxon names

One of the key challenges of comparing trees across studies is that taxon names differ because of spelling or taxonomic idiosyncrasies.

One solution is to map taxon names to unique identifiers using the Open Tree Taxonomic Name Resolution Service (TNRS). There are a few ways to use this service, including the API and the browser-based bulk name mapping tool.

### Open Tree TNRS bulk name mapping tool

Access this tool at https://tree.opentreeoflife.org/curator/tnrs/

This tool is a new beta version, so some parts are a bit finicky.

*Try this*
  * Click on "Add names..." (second button at the top of the menu on the left), and upload the names file `drosophila_example`. The "loading file" window will not close by itself, so click the (X) to dismiss it.
  * In the "Mapping options" section (bottom of the menu to the left):
    - select 'Insects' to narrow down the possibilities and speed up mapping
  * Click "Map selected names" (middle of the menu to the left).
  * Exact matches will show up in green, and can be accepted by clicking "accept exact matches".
  * Once you have accepted names for each of the taxa, click "Save nameset...", download it to your laptop, and extract (unzip) the files. You can take a look at the human-readable version of the output at `output/main.csv`. `main.json` contains the same data in a more machine-readable format.
  * Finally, transfer the `main.csv` file to your working folder, so you can use it to get the tree for your taxa.

*Make sure your mappings were saved! If you do not **accept** matches (by clicking buttons), they do not download.*


In [2]:
import requests
import json
import sys
import dendropy
import csv
from opentree import OT


In [3]:
mapped_names = "/home/ejmctavish/Desktop/Committees/grad/Bailey/main.csv"


## Use the csv to build a dictionary with OTT ids (prefixed with 'ott') as keys
## and the labels you input as values
with open(mapped_names) as fp:
    reader = csv.reader(fp, delimiter=",", quotechar='"')
    next(reader, None)  # skip the headers
    label_dict = {'ott'+row[2]:row[0] for row in reader}


## node_ids contains the list of tip ids you want in your final tree
node_ids = [key for key in label_dict]
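Rows whose OTT id column is empty would end up under a bare `'ott'` key in the dictionary above. The following is a minimal sketch of one way to skip such rows and keep track of the unmapped labels. It assumes the same column positions as the cell above (input label in column 0, OTT id in column 2); the header names, labels, and ids in `sample_csv` are invented for illustration and stand in for `main.csv`.

```python
import csv
import io

# Hypothetical stand-in for main.csv: same column positions as the cell above
# (label in column 0, OTT id in column 2). Header names and rows are made up.
sample_csv = io.StringIO(
    "input_label,matched_name,ott_id\n"
    "Label one,Matched one,111\n"
    "Label two,,\n"
    "Label three,Matched three,333\n"
)

reader = csv.reader(sample_csv, delimiter=",", quotechar='"')
next(reader, None)  # skip the header row

label_dict = {}
unmapped = []
for row in reader:
    if not row:
        continue  # ignore completely blank lines
    ott_id = row[2].strip() if len(row) > 2 else ""
    if ott_id:
        label_dict["ott" + ott_id] = row[0]
    else:
        unmapped.append(row[0])  # no OTT id recorded for this label

node_ids = list(label_dict)
print(node_ids)
print(unmapped)
```

Keeping the `unmapped` list makes it easy to go back to the name mapping tool and check whether any matches were left unaccepted.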


## Dated trees
To estimate dates, we will use the Chronosynth API.  
The dates API is a work in progress, so it is not yet as user-friendly as it will be.

A summary of the methods is here: https://github.com/OpenTreeOfLife/chronosynth/wiki/Chronosynth-methods-overview  
There are some API docs here: https://github.com/OpenTreeOfLife/chronosynth/wiki/Draft-API-docs

Using the dates API, we can get the dates that align to individual nodes in the synth tree.  
This is based on the same information you saw in the conflict viewer.  
You can use a curl call to GET the current date information for a node.

To look at the node itself, you can navigate to 
"https://tree.opentreeoflife.org/curator/study/view/{STUDY ID}?tab=home&tree={TREE ID}&node={NODE_ID}"  
e.g. https://tree.opentreeoflife.org/curator/study/view/ot_2018?tab=home&tree=tree9&node=node1412  

This node https://tree.opentreeoflife.org/opentree/argus/opentree13.4@mrcaott1000311ott3643727 in the synthetic tree aligns with this node in this dated tree https://tree.opentreeoflife.org/curator/study/view/ot_1592?tab=home&tree=tree1&node=node22956
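The URL template above can be filled in programmatically. This is a small sketch, not part of any Open Tree library: the helper names `curator_node_url` and `url_for_source` are hypothetical, and the second helper assumes source ids are written as `study@tree` (for example `ot_1592@tree1`).

```python
def curator_node_url(study_id, tree_id, node_id):
    """Fill in the curator URL template for one node of a source tree."""
    return (
        "https://tree.opentreeoflife.org/curator/study/view/"
        f"{study_id}?tab=home&tree={tree_id}&node={node_id}"
    )


def url_for_source(source_id, source_node):
    """Build the same URL from a 'study@tree' source id and a node id."""
    study_id, tree_id = source_id.split("@", 1)
    return curator_node_url(study_id, tree_id, source_node)


# The two example nodes linked above.
print(curator_node_url("ot_2018", "tree9", "node1412"))
print(url_for_source("ot_1592@tree1", "node22956"))
```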

## Summarizing dates for a set of taxa
You can use these dates individually, or use the API to get the dates across nodes in a tree and then smooth the nodes in between.

The approach in these examples uses rate smoothing via bladj (https://phylodiversity.net/phylocom/).  
These approaches are rough, and there is NO information for many nodes, so be cautious!

### To run bladj we need a max age estimate for the root of the tree.

In [4]:
## Get the synth tree MRCA of the taxa of interest, so we know the root of our tree
output = OT.synth_mrca(node_ids=node_ids)
print(json.dumps(output.response_dict['mrca'], indent=4))



Unknown/unrecognized query ids (skipped):
 ott
ott1030729
ott3633223
ott3633631
ott3633632
ott3633633
ott3633637
ott3633645
ott3633691
ott3633724
ott3633758
ott3633785
ott3633786
ott3633796
ott3633798
ott3633810
ott3634633
ott3634648
ott3634790
ott3634840
ott3634842
ott3634844
ott3634910
ott3634952
ott3635449
ott3635492
ott3635506
ott3635521
ott3635523
ott3635547
ott3635563
ott3635589
ott3635599
ott3638536
ott5223553
ott5852341
ott5924886
ott7064009
ott7064032
ott7064035
ott774815 


{
    "node_id": "ott278114",
    "num_tips": 90733,
    "partial_path_of": {
        "pg_2822@tree6569": "node1142002"
    },
    "supported_by": {
        "ot_508@tree2": "node27",
        "ott3.3draft1": "ott278114",
        "pg_1217@tree2455": "node566190"
    },
    "taxon": {
        "name": "Gnathostomata",
        "ott_id": 278114,
        "rank": "superclass",
        "tax_sources": [
            "ncbi:7776",
            "worms:1828"
        ],
        "unique_name": "Gnathostomata (superclass in phylum Chordata)"
    },
    "was_constrained": true,
    "was_uncontested": true
}


In [5]:
!curl -X GET https://dates.opentreeoflife.org/v4/dates/synth_node_age/ott278114 | jq

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   668  100   668    0     0     17      0  0:00:39  0:00:37  0:00:02   174
{
  "query": "ott278114",
  "synth_node_id": "ott278114",
  "ot:source_node_ages": [
    {
      "source_id": "ot_508@tree8",
      "age": 449.9971,
      "source_node": "node669"
    },
    {
      "source_id": "ot_508@tree9",
      "age": 449.9713,
      "source_node": "node776"
    },
    {
      "s

In [6]:
## You can also get the dates for a node in Python, using the 'requests' module
datesurl = 'https://dates.opentreeoflife.org/v4/dates/synth_node_age/{}'.format(output.response_dict['mrca']['node_id'])
resp = requests.get(datesurl)
resp.json()

{'query': 'ott278114',
 'synth_node_id': 'ott278114',
 'ot:source_node_ages': [{'age': 449.9971,
   'source_id': 'ot_508@tree8',
   'source_node': 'node669'},
  {'age': 449.9713, 'source_id': 'ot_508@tree9', 'source_node': 'node776'},
  {'age': 449.8911, 'source_id': 'ot_508@tree3', 'source_node': 'node134'},
  {'age': 450.7471, 'source_id': 'ot_508@tree5', 'source_node': 'node348'},
  {'age': 450.6612, 'source_id': 'ot_508@tree4', 'source_node': 'node241'},
  {'age': 453.0243, 'source_id': 'ot_508@tree6', 'source_node': 'node455'},
  {'age': 453.0575, 'source_id': 'ot_508@tree7', 'source_node': 'node562'},
  {'age': 449.9092, 'source_id': 'ot_508@tree2', 'source_node': 'node27'}]}
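Because bladj needs a single age for the root, the list of source ages has to be reduced to one number somehow. The sketch below summarizes the ages returned above with the minimum, mean, and maximum. Which summary to use is an analysis decision, and this is not necessarily how Chronosynth itself picks an age; the variable names are illustrative.

```python
from statistics import mean

# Response copied from the output above.
node_ages = {
    "query": "ott278114",
    "synth_node_id": "ott278114",
    "ot:source_node_ages": [
        {"age": 449.9971, "source_id": "ot_508@tree8", "source_node": "node669"},
        {"age": 449.9713, "source_id": "ot_508@tree9", "source_node": "node776"},
        {"age": 449.8911, "source_id": "ot_508@tree3", "source_node": "node134"},
        {"age": 450.7471, "source_id": "ot_508@tree5", "source_node": "node348"},
        {"age": 450.6612, "source_id": "ot_508@tree4", "source_node": "node241"},
        {"age": 453.0243, "source_id": "ot_508@tree6", "source_node": "node455"},
        {"age": 453.0575, "source_id": "ot_508@tree7", "source_node": "node562"},
        {"age": 449.9092, "source_id": "ot_508@tree2", "source_node": "node27"},
    ],
}

# Collect the ages and compute a few simple summaries.
ages = [entry["age"] for entry in node_ages["ot:source_node_ages"]]
summary = {
    "n": len(ages),
    "min": min(ages),
    "mean": mean(ages),
    "max": max(ages),
}
print(summary)
```

Note that all eight ages here come from trees in a single study (`ot_508`), so the narrow spread reflects agreement among that study's trees rather than independent estimates.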

In [7]:
## Get the dated synth tree
url = 'https://dates.opentreeoflife.org/v4/dates/dated_tree'

payload = {"node_ids": node_ids}

resp = requests.post(url=url, data=json.dumps(payload))
resp_dict = resp.json()



This returns a response object; `resp.json()` parses its body into a dictionary with the following keys:

In [10]:
resp_dict.keys()

dict_keys(['dated_trees_newick_list', 'topology_sources', 'date_sources', 'tar_file_download'])
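Since the dates API is still changing, it can be worth checking the shape of the response before using it. Here is a minimal sketch of such a check, based only on the four keys listed above. The function name `first_dated_newick` and the toy response are invented for illustration.

```python
# Keys seen in the response above.
EXPECTED_KEYS = {
    "dated_trees_newick_list",
    "topology_sources",
    "date_sources",
    "tar_file_download",
}


def first_dated_newick(resp_dict):
    """Return the first dated tree as a Newick string, or raise if the
    response does not look like the one shown above."""
    missing = EXPECTED_KEYS.difference(resp_dict)
    if missing:
        raise KeyError(f"response is missing keys: {sorted(missing)}")
    trees = resp_dict["dated_trees_newick_list"]
    if not trees:
        raise ValueError("response contains no dated trees")
    return trees[0]


# Toy response with made-up values, only to show the call.
toy_response = {
    "dated_trees_newick_list": ["(A:1,B:1);"],
    "topology_sources": [],
    "date_sources": [],
    "tar_file_download": "",
}
print(first_dated_newick(toy_response))
```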

In [11]:
dated_tree = dendropy.Tree.get(data=resp_dict['dated_trees_newick_list'][0], schema="newick")
dated_tree.write(path="ottid_dated_tree.tre", schema='newick')


In [12]:
resp_dict['tar_file_download']

'dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_02_02_2023_18_50_59.tar.gz'
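The value returned under `tar_file_download` has no `http://` or `https://` prefix, so it needs one before it can be fetched. As an alternative to a shell command, the sketch below adds the scheme and then downloads and unpacks the archive using only the standard library. The function names and the default file and folder names are hypothetical, and the download function is defined but not called here because it needs network access.

```python
import tarfile
import urllib.request


def with_scheme(url, scheme="https"):
    """Prepend a scheme when the URL does not already have one."""
    return url if "://" in url else f"{scheme}://{url}"


def download_and_extract(url, archive_path="chrono_out.tar.gz", dest="chrono_out"):
    """Download a .tar.gz archive and unpack it into `dest`.

    Only unpack archives from a source you trust.
    """
    urllib.request.urlretrieve(with_scheme(url), archive_path)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    return dest


tar_url = with_scheme(
    "dates.opentreeoflife.org/v4/dates/download_dates_tar/"
    "chrono_out_02_02_2023_18_50_59.tar.gz"
)
print(tar_url)
```

In the notebook this would be called as `download_and_extract(resp_dict['tar_file_download'])`. The archive name contains a timestamp, so use the value from your own response rather than the one shown here.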

In [14]:
## To get the full set of ages, download the tar file listed under 'tar_file_download'
! wget http://dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_02_02_2023_18_50_59.tar.gz


--2023-02-02 10:52:24--  http://dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_02_02_2023_18_50_59.tar.gz
Resolving dates.opentreeoflife.org (dates.opentreeoflife.org)... 34.216.116.212
Connecting to dates.opentreeoflife.org (dates.opentreeoflife.org)|34.216.116.212|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_02_02_2023_18_50_59.tar.gz [following]
--2023-02-02 10:52:24--  https://dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_02_02_2023_18_50_59.tar.gz
Connecting to dates.opentreeoflife.org (dates.opentreeoflife.org)|34.216.116.212|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 78357 (77K) [application/gzip]
Saving to: ‘chrono_out_02_02_2023_18_50_59.tar.gz’


2023-02-02 10:52:25 (668 KB/s) - ‘chrono_out_02_02_2023_18_50_59.tar.gz’ saved [78357/78357]



In [17]:
for taxon in dated_tree.taxon_namespace:
    if taxon.label in label_dict:
        taxon.label = label_dict[taxon.label]

In [18]:
dated_tree.write(path="labelled_dated_tree.tre", schema="newick")

In [24]:
print(OT.get_citations(resp_dict['topology_sources']))

https://tree.opentreeoflife.org/curator/study/view/ot_526?tab=trees&tree=Tr91777
Lundberg J.G., Sullivan J., Rodiles-hernandez R., & Hendrickson D. 2007. Discovery of African roots for the Mesoamerican Chiapas catfish, Lacantunia enigmatica, requires an ancient intercontinental passage. Proceedings of the Academy of Natural Sciences of Philadelphia, 156: 39-53.
http://dx.doi.org/10.1635/0097-3157(2007)156[39:doarft]2.0.co;2

https://tree.opentreeoflife.org/curator/study/view/ot_1054?tab=trees&tree=tree1
Arcila, Dahiana, Guillermo Ortí, Richard Vari, Jonathan W. Armbruster, Melanie L. J. Stiassny, Kyung D. Ko, Mark H. Sabaj, John Lundberg, Liam J. Revell, Ricardo Betancur-R., 2017, 'Genome-wide interrogation advances resolution of recalcitrant groups in the tree of life', Nature Ecology & Evolution, vol. 1, p. 0020
http://dx.doi.org/10.1038/s41559-016-0020

https://tree.opentreeoflife.org/curator/study/view/ot_1384?tab=trees&tree=tree1
Jose Tavera, Arturo Acero P., Peter C. Wainwright, 

In [25]:
print(OT.get_citations(resp_dict['date_sources']))

https://tree.opentreeoflife.org/curator/study/view/pg_2576?tab=trees&tree=tree5975
Near, T. J., A. Dornburg, R. I. Eytan, B. P. Keck, W. L. Smith, K. L. Kuhn, J. A. Moore, S. A. Price, F. T. Burbrink, M. Friedman, P. C. Wainwright, 2013, 'Phylogeny and tempo of diversification in the superradiation of spiny-rayed fishes', Proceedings of the National Academy of Sciences, vol. 110, no. 31, pp. 12738-12743
http://dx.doi.org/10.1073/pnas.1304661110

https://tree.opentreeoflife.org/curator/study/view/ot_1091?tab=trees&tree=tree1
Millicent D. Sanciangco, Kent E. Carpenter, Ricardo Betancur-R., 2016, 'Phylogenetic placement of enigmatic percomorph families (Teleostei: Percomorphaceae)', Molecular Phylogenetics and Evolution, vol. 94, pp. 565-576
http://dx.doi.org/10.1016/j.ympev.2015.10.006

https://tree.opentreeoflife.org/curator/study/view/pg_2654?tab=trees&tree=tree6179
Chakrabarty, Prosanta, Matthew P. Davis, W. Leo Smith, Zachary H. Baldwin, John S. Sparks. 2011. Is sexual selection driv

In [27]:
len([leaf for leaf in dated_tree.leaf_node_iter()])

1631
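The leaf count can be smaller than the number of names you submitted, so it is useful to see which labels are absent from the tree. A minimal sketch of that comparison follows; `missing_labels` is a hypothetical helper and the toy inputs are invented for illustration.

```python
def missing_labels(requested_labels, tip_labels):
    """Return the requested labels that do not appear among the tree's tips."""
    return sorted(set(requested_labels) - set(tip_labels))


# Toy inputs, made up for illustration.
requested = ["taxon a", "taxon b", "taxon c"]
tips = ["taxon c", "taxon a"]
print(missing_labels(requested, tips))
```

With the objects from this notebook, the call would be `missing_labels(label_dict.values(), [leaf.taxon.label for leaf in dated_tree.leaf_node_iter()])`, run after the relabelling step so that both sides use your input labels.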