## Getting a dated synthesis tree for an arbitrary set of taxa

## Standardizing query taxon names

One of the key challenges of comparing trees across studies is differences in taxon names because of spelling or taxonomic idiosincracies.

A solution to this, is mapping taxon names to unique identifiers using the Open Tree Taxonomic Name Resolution Service (TNRS). There are a few options to use this service including via the API, or the browser based bulk name mapping.

### Open Tree TNRS bulk name mapping tool.

Access this tool at https://tree.opentreeoflife.org/curator/tnrs/

This is a new beta-version of this functionality, so some parts are a bit finicky.

*Try this*
  * Click on "Add names..." (second button at the top of the menu on the left), and upload the names file `drosophila_example`. The "loading file" window will not close by itself, click the (X).
  * In the "Mapping options" section (bottom of the menu to the left):
    - select 'Insects' to narrow down the possibilities and speed up mapping
  * Click "Map selected names" (middle of the menu to the left).
  * Exact matches will show up in green, and can be accepted by clicking "accept exact matches".
  * Once you have accepted names for each of the taxa, click "Save nameset...", download it to your laptop, and extract (unzip) the files. You can take a look at the human readable version of the output at `output/main.csv`. `main.json` contains the the same data in a more computer readable format.
  * Finally, transfer the `main.csv` file to your working folder, so you can use it to get the tree for your taxa.

*Make sure your mappings were saved! If you do not **accept** matches (by clicking buttons), they do not download.*


In [1]:
import requests
import json
import sys
import dendropy
import csv
from opentree import OT


In [2]:
mapped_names = "../drosophila_example/drosophila_main.csv"


## uses the csv to create a dictionary with OTTids as keys, and the label you input as values
with open(mapped_names) as fp:
    reader = csv.reader(fp, delimiter=",", quotechar='"')
    next(reader, None)  # skip the headers
    label_dict = {'ott'+row[2]:row[0] for row in reader}

    
## Node ids contains the list of tip ids you want in your final tree
node_ids = [key for key in label_dict]


## Dated trees
To estimate dates, we will use the Chronosynth API. 
The dates API is work-in-progress, and so it is not yet as user friendly as it will be. 

A summary of the methods is here: https://github.com/OpenTreeOfLife/chronosynth/wiki/Chronosynth-methods-overview  
There are some API docs here: https://github.com/OpenTreeOfLife/chronosynth/wiki/Draft-API-docs

Using the dates API we can get dates that align to for individual nodes in the synth tree. 
This is based on the same information you saw in the conflict viewer.  
You can you a CURL call to GET the current information for dates for a node.

In [3]:
!curl -X GET https://dates.opentreeoflife.org/v4/dates/synth_node_age/ott109893 | jq

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   712  100   712    0     0    833      0 --:--:-- --:--:-- --:--:--   833
[1;39m{
  [0m[34;1m"query"[0m[1;39m: [0m[0;32m"ott109893"[0m[1;39m,
  [0m[34;1m"synth_node_id"[0m[1;39m: [0m[0;32m"ott109893"[0m[1;39m,
  [0m[34;1m"ot:source_node_ages"[0m[1;39m: [0m[1;39m[
    [1;39m{
      [0m[34;1m"age"[0m[1;39m: [0m[0;39m28.303778[0m[1;39m,
      [0m[34;1m"source_id"[0m[1;39m: [0m[0;32m"ot_409@tree1"[0m[1;39m,
      [0m[34;1m"source_node"[0m[1;39m: [0m[0;32m"node19094"[0m[1;39m
    [1;39m}[0m[1;39m,
    [1;39m{
      [0m[34;1m"age"[0m[1;39m: [0m[0;39m11.822624999999999[0m[1;39m,
      [0m[34;1m"source_id"[0m[1;39m: [0m[0;32m"ot_809@tree2"[0m[1;39m,
      [0m[34;1m"source_node"[0m[1;39m: [0m[0;32m"node13685"[0m[1;39m
    [1;39m}[0m[1;39m,
    [1;39m{
    

To look at the node itself, you can navigate to 
"https://tree.opentreeoflife.org/curator/study/view/{STUDY ID}?tab=home&tree={TREE ID}&node={NODE_ID}"  
e.g. https://tree.opentreeoflife.org/curator/study/view/ot_2018?tab=home&tree=tree9&node=node1412  

In [4]:
## You can also get dates for arbitrary nodes in the synth tree, which are not associated with taxa.
!curl -X GET https://dates.opentreeoflife.org/v4/dates/synth_node_age/mrcaott1000311ott3643727 | jq

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   184  100   184    0     0    213      0 --:--:-- --:--:-- --:--:--   213
[1;39m{
  [0m[34;1m"query"[0m[1;39m: [0m[0;32m"mrcaott1000311ott3643727"[0m[1;39m,
  [0m[34;1m"synth_node_id"[0m[1;39m: [0m[0;32m"mrcaott1000311ott3643727"[0m[1;39m,
  [0m[34;1m"ot:source_node_ages"[0m[1;39m: [0m[1;39m[
    [1;39m{
      [0m[34;1m"age"[0m[1;39m: [0m[0;39m9.325001[0m[1;39m,
      [0m[34;1m"source_id"[0m[1;39m: [0m[0;32m"ot_1592@tree1"[0m[1;39m,
      [0m[34;1m"source_node"[0m[1;39m: [0m[0;32m"node22956"[0m[1;39m
    [1;39m}[0m[1;39m
  [1;39m][0m[1;39m
[1;39m}[0m


This node https://tree.opentreeoflife.org/opentree/argus/opentree13.4@mrcaott1000311ott3643727 in the synthetic tree aligns with this node in this dated tree https://tree.opentreeoflife.org/curator/study/view/ot_1592?tab=home&tree=tree1&node=node22956

## Summarizing dates  for a set of taxa
You can use these dates individually, or use the API to get the dates across nodes in a tree, and then smooth the nodes in between

The approach in these examples uses rate smoothing, via bladj https://phylodiversity.net/phylocom/  
These approaches are rough, and there is NO infomation for many nodes, so be cautious!
### To run bladj we need a max age estimate for the root of the tree.

In [5]:
## Get the synth tree MRCA of the taxa of interest, so we know the root of our tree
output = OT.synth_mrca(node_ids=node_ids)
print(json.dumps(output.response_dict['mrca'], indent=4))


{
    "node_id": "ott34905",
    "num_tips": 4554,
    "supported_by": {
        "ott3.3draft1": "ott34905"
    },
    "taxon": {
        "name": "Drosophilidae",
        "ott_id": 34905,
        "rank": "family",
        "tax_sources": [
            "ncbi:7214",
            "worms:987176",
            "gbif:5547",
            "irmng:100842"
        ],
        "unique_name": "Drosophilidae"
    },
    "terminal": {
        "ot_1046@tree1": "node28",
        "ot_1047@tree1": "node63",
        "pg_1337@tree6167": "node1053390",
        "pg_2594@tree6014": "node1021794",
        "pg_2710@tree6291": "node1094072",
        "pg_2822@tree6569": "node1141999",
        "pg_437@tree6242": "node1081938"
    },
    "was_constrained": true,
    "was_uncontested": true
}



Unknown/unrecognized query ids (skipped):
 ott361374
ott4418979 


In [6]:
!curl -X GET https://dates.opentreeoflife.org/v4/dates/synth_node_age/ott34905 | jq

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    79  100    79    0     0     98      0 --:--:-- --:--:-- --:--:--    98
[1;39m{
  [0m[34;1m"query"[0m[1;39m: [0m[0;32m"ott34905"[0m[1;39m,
  [0m[34;1m"synth_node_id"[0m[1;39m: [0m[0;32m"ott34905"[0m[1;39m,
  [0m[34;1m"ot:source_node_ages"[0m[1;39m: [0m[1;30mnull[0m[1;39m
[1;39m}[0m


In [7]:
## You can also get node dates for a node in python, ucing the 'requests' module
datesurl     = 'https://dates.opentreeoflife.org/v4/dates/synth_node_age/{}'.format(output.response_dict['mrca']['node_id'])
resp = requests.get(datesurl)
resp.json()

{'query': 'ott34905', 'synth_node_id': 'ott34905', 'ot:source_node_ages': None}

We do not have an age for this node in the database, so we need to use external information.  
We will use 50 Million years for the age of drosophilidae, based on https://www.sciencedirect.com/topics/agricultural-and-biological-sciences/drosophilidae

In [8]:
## Get Dated synth tree
url     = 'https://dates.opentreeoflife.org/v4/dates/dated_tree'

payload = { "node_ids" : node_ids,
            "max_age": 50}

resp = requests.post(url=url, data=json.dumps(payload))
resp_dict = resp.json()



This returns a response object, 

In [9]:
resp_dict.keys()

dict_keys(['dated_trees_newick_list', 'topology_sources', 'date_sources', 'tar_file_download'])

In [10]:
dated_tree = dendropy.Tree.get(data=resp_dict['dated_trees_newick_list'][0], schema="newick")
dated_tree.write(path="ottid_dated_tree.tre", schema='newick')


In [11]:
resp_dict['tar_file_download']

'dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_05_18_2023_20_24_21.tar.gz'

In [12]:
## To get the full set of ages, 
! wget http://dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_01_12_2023_07_28_37.tar.gz


--2023-05-18 13:24:21--  http://dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_01_12_2023_07_28_37.tar.gz
Resolving dates.opentreeoflife.org (dates.opentreeoflife.org)... 34.216.116.212
Connecting to dates.opentreeoflife.org (dates.opentreeoflife.org)|34.216.116.212|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_01_12_2023_07_28_37.tar.gz [following]
--2023-05-18 13:24:21--  https://dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_01_12_2023_07_28_37.tar.gz
Connecting to dates.opentreeoflife.org (dates.opentreeoflife.org)|34.216.116.212|:443... connected.
HTTP request sent, awaiting response... 410 Gone
2023-05-18 13:24:22 ERROR 410: Gone.



### Evaluate your confidence in your tree.
How many nodes are in your tree? How many do you have date estimates for?

## We can also compare nodes in our custom synth tree to dated trees, and use them to infer dates for those nodes.

In [13]:
## Read in your custom synth tree
## See https://opentreeoflife.github.io/CustomSynthesis/ for how to generate a custom tree
import dendropy
custom_synth_dir = "snacktavish_dros_34905_tmp5exp6dql"
treepath = "{}/labelled_supertree/labelled_supertree.tre".format(custom_synth_dir)
custom_synth = dendropy.Tree.get_from_path(treepath, schema = "newick")


In [14]:
#Sometimes there are subspecies as tips, when your query is actually species.
collapsed_tax = []
for node in custom_synth:
    if not node.is_leaf():
        if node.label in label_dict:
            print(node.label)
            print(label_dict[node.label])
            collapsed_tax.append(node.label)
            node.clear_child_nodes()
            node.taxon = custom_synth.taxon_namespace.new_taxon(label=node.label)


leaves_A = [tip.taxon.label for tip in custom_synth.leaf_node_iter()]
print("Total number of tips in synth tree after collapsing queries with lower level taxa is {}\n".format(len(leaves_A)))


ott1082279
Drosophila mojavensis
ott138603
Drosophila mercatorum
ott534107
Drosophila americana
ott245830
Drosophila nasuta
ott1082272
Drosophila pseudoobscura
ott616768
Drosophila paulistorum
ott616777
Drosophila tropicalis
ott505710
Scaptodrosophila lebanonensis
Total number of tips in synth tree after collapsing queries with lower level taxa is 5398



In [15]:
taxa_to_retain = list(label_dict.keys())
custom_synth.retain_taxa_with_labels(taxa_to_retain)

In [16]:
## Get Dated synth tree
url     = 'https://dates.opentreeoflife.org/v4/dates/dated_tree'

payload = { "newick" : custom_synth.as_string(schema="newick"),
            "max_age": 50}

resp = requests.post(url=url, data=json.dumps(payload))
resp_dict = resp.json()


dated_tree = dendropy.Tree.get(data=resp_dict['dated_trees_newick_list'][0], schema="newick")
#dated_tree.write(path=output_dir+"/ottid_dated_tree.tre", schema='newick')

In [17]:
for taxon in dated_tree.taxon_namespace:
    if taxon.label in label_dict:
        taxon.label = label_dict[taxon.label]

In [18]:
dated_tree.write(path = "labelled_dated_tree.tre",schema= "newick")

### Assessing uncertainty

Lets choose a part of the synth tree where we have lot of data!  
https://tree.opentreeoflife.org/opentree/opentree13.4@ott109893/Cracidae  
We can estimate a dated tree for Cracidae

In [19]:
## Get Dated synth tree
url     = 'https://dates.opentreeoflife.org/v4/dates/dated_tree'


##Here, we have several date estimates for each node,
##so we can run the summarization several times,
##choosing one at random each time
payload = { "node_id" : 'ott109893',
            "select":'random',
            "reps" : 5}

resp = requests.post(url=url, data=json.dumps(payload))
resp_dict = resp.json()


In [20]:
resp_dict['tar_file_download']

'dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_05_18_2023_20_24_44.tar.gz'

In [21]:
!wget 'dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_05_18_2023_20_17_34.tar.gz'

--2023-05-18 13:24:45--  http://dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_05_18_2023_20_17_34.tar.gz
Resolving dates.opentreeoflife.org (dates.opentreeoflife.org)... 34.216.116.212
Connecting to dates.opentreeoflife.org (dates.opentreeoflife.org)|34.216.116.212|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_05_18_2023_20_17_34.tar.gz [following]
--2023-05-18 13:24:45--  https://dates.opentreeoflife.org/v4/dates/download_dates_tar/chrono_out_05_18_2023_20_17_34.tar.gz
Connecting to dates.opentreeoflife.org (dates.opentreeoflife.org)|34.216.116.212|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4471 (4.4K) [application/gzip]
Saving to: ‘chrono_out_05_18_2023_20_17_34.tar.gz.1’


2023-05-18 13:24:45 (24.9 MB/s) - ‘chrono_out_05_18_2023_20_17_34.tar.gz.1’ saved [4471/4471]



In [22]:
## The file "tmp/bladj.tre" will contain 5 different trees, based on random selections among the input date data