# Creating a custom summary tree for a set of taxa of interest

If you want to get a subtree from the existing synthesis tree instead of creating a new custom tree, see https://github.com/McTavishLab/jupyter_OpenTree_tutorials/blob/master/notebooks/DEMO_OpenTree.ipynb  
Here we will show you how to generate a tree from a set of phylogenies you are interested in.


In [1]:
!pip install -r ../requirements.txt

Defaulting to user installation because normal site-packages is not writeable
Obtaining opentree from git+ssh://****@github.com/OpenTreeOfLife/python-opentree.git@development#egg=opentree (from -r ../requirements.txt (line 2))
  Updating ./src/opentree clone (to revision development)
  Running command git fetch -q --tags
  Running command git reset --hard -q 0efffc04525f45e50eb08cd1addcc25dbec8a268
  Preparing metadata (setup.py) ... done
Installing collected packages: opentree
  Attempting uninstall: opentree
    Found existing installation: opentree 1.0.1
    Uninstalling opentree-1.0.1:
      Successfully uninstalled opentree-1.0.1
  Running setup.py develop for opentree
Successfully installed opentree


## Upload phylogenies 

Upload the trees you want to summarize to Phylesystem: https://tree.opentreeoflife.org/curator  
Map the tip labels to the OpenTree taxonomy using the OTU Mapping tab. Don't forget to save!  
Add your trees to a collection: https://tree.opentreeoflife.org/curator/collection/  
Rank them based on which tree's relationships you want to prioritize in your summary tree.

For this example I will be summarizing some recent Drosophila trees, which I have placed in my collection 'dros': https://tree.opentreeoflife.org/curator/collection/view/snacktavish/dros  
I have ranked them based on how recently they were published.

I have a list of taxa that I need a tree for. It is stored in drosophila_example/DrosophilaSpecies.txt


## Running this example

 
### To run as a jupyter notebook

To run as an interactive notebook, install and set up your system as described in:
http://opentreeoflife.github.io/SSBworkshop/

```
    git clone https://github.com/McTavishLab/jupyter_OpenTree_tutorials.git
    cd  jupyter_OpenTree_tutorials/workbooks
    jupyter notebook
```

### Or run the code directly in python and bash
Python3 is required.  
```
    git clone https://github.com/McTavishLab/jupyter_OpenTree_tutorials.git
    cd jupyter_OpenTree_tutorials
    pip install -r requirements.txt
```
    
You can run the bash commands directly in your terminal, and the python commands in python3.  


The example data for this demo will be in `drosophila_example`.
You should create a working folder for your data and outputs.

## Standardizing query taxon names

One of the key challenges of comparing trees across studies is that taxon names differ because of spelling or taxonomic idiosyncrasies.

One solution is to map taxon names to unique identifiers using the Open Tree Taxonomic Name Resolution Service (TNRS). There are a few ways to use this service, including the API and the browser-based bulk name mapping tool.
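For the API route, here is a minimal sketch of what a request to the TNRS `match_names` endpoint and a reading of its reply could look like. The helper names (`build_tnrs_payload`, `exact_matches`, `query_tnrs`) are made up for illustration, and the sample response is hand-written in the shape I understand the v3 API to return, so check the field names against the current API documentation before relying on it.

```python
import json
import urllib.request

TNRS_URL = "https://api.opentreeoflife.org/v3/tnrs/match_names"


def build_tnrs_payload(names, context_name=None):
    """Build the JSON body for a match_names request (exact matching only)."""
    payload = {"names": list(names), "do_approximate_matching": False}
    if context_name is not None:
        payload["context_name"] = context_name
    return payload


def exact_matches(response):
    """Map each query name to 'ott<id>' of its first non-approximate match."""
    mapped = {}
    for result in response.get("results", []):
        for match in result.get("matches", []):
            if not match.get("is_approximate_match", False):
                mapped[result["name"]] = "ott{}".format(match["taxon"]["ott_id"])
                break
    return mapped


def query_tnrs(names, context_name=None):
    """Send the request. Needs network access, so it is not called here."""
    body = json.dumps(build_tnrs_payload(names, context_name)).encode("utf-8")
    request = urllib.request.Request(
        TNRS_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


# Hand-written sample in the assumed response shape (not a captured reply).
sample_response = {
    "results": [
        {
            "name": "Drosophilidae",
            "matches": [
                {
                    "is_approximate_match": False,
                    "taxon": {"ott_id": 34905, "name": "Drosophilidae"},
                }
            ],
        }
    ]
}
print(exact_matches(sample_response))
```

With network access, `exact_matches(query_tnrs(my_names, context_name="Insects"))` would be the live equivalent. Names with no exact match are simply absent from the result, so compare its keys with your input list.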

### Open Tree TNRS bulk name mapping tool.

Access this tool at https://tree.opentreeoflife.org/curator/tnrs/

This is a new beta-version of this functionality, so some parts are a bit finicky.

*Try this*
  * Click on "Add names..." (second button at the top of the menu on the left), and upload the names file `drosophila_example/DrosophilaSpecies.txt`. The "loading file" window will not close by itself; click the (X) to close it.
  * In the "Mapping options" section (bottom of the menu to the left):
    - select 'Insects' to narrow down the possibilities and speed up mapping
  * Click "Map selected names" (middle of the menu to the left).
  * Exact matches will show up in green, and can be accepted by clicking "accept exact matches".
  * Once you have accepted names for each of the taxa, click "Save nameset...", download it to your laptop, and extract (unzip) the files. You can take a look at the human-readable version of the output at `output/main.csv`. `main.json` contains the same data in a more computer-readable format.
  * Finally, transfer the `main.csv` file to your working folder, so you can use it to get the tree for your taxa.

*Make sure your mappings were saved! If you do not **accept** matches (by clicking buttons), they do not download.*
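One quick way to check that the mappings were saved is to look for rows in the downloaded CSV that have no OTT id. The sketch below assumes the same column layout the notebook uses further down (input label in the first column, OTT id in the third). The function name `unmapped_labels`, the header names, and the unmapped sample row are hypothetical.

```python
import csv
import io


def unmapped_labels(csv_file, label_col=0, ott_col=2):
    """Return the input labels whose OTT id column is empty or missing."""
    reader = csv.reader(csv_file, delimiter=",", quotechar='"')
    next(reader, None)  # skip the header row
    missing = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        ott_id = row[ott_col].strip() if len(row) > ott_col else ""
        if not ott_id:
            missing.append(row[label_col])
    return missing


# Hand-written sample; the header names and the second data row are hypothetical.
sample = io.StringIO(
    "input_label,matched_name,ott_id\n"
    "Drosophila texana,Drosophila texana,12789\n"
    "Unmapped example name,,\n"
)
print(unmapped_labels(sample))
```

On your own file you would call it as `unmapped_labels(open("main.csv"))`. An empty list suggests every name has an accepted match.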


## Get the Most Recent Common Ancestor of your taxa of interest


In [2]:
from opentree import OT
import csv
mapped_names = "../drosophila_example/drosophila_main.csv"


## Use the csv to create a dictionary with OTT ids as keys and the labels you input as values
with open(mapped_names) as fp:
    reader = csv.reader(fp, delimiter=",", quotechar='"')
    next(reader, None)  # skip the headers
    label_dict = {'ott'+row[2]:row[0] for row in reader}

## Drop the 'ott' prefix to get the bare ids
## (str.strip('ott') removes characters from both ends, not a prefix)
ott_id_list = [key[len('ott'):] for key in label_dict]

## Get the taxonomic MRCA of the taxa of interest
output = OT.taxon_mrca(ott_ids=ott_id_list)
print(output.response_dict)


{'mrca': {'flags': [], 'is_suppressed': False, 'is_suppressed_from_synth': False, 'name': 'Drosophilidae', 'ott_id': 34905, 'rank': 'family', 'source': 'ott3.3draft1', 'synonyms': [], 'tax_sources': ['ncbi:7214', 'worms:987176', 'gbif:5547', 'irmng:100842'], 'unique_name': 'Drosophilidae'}}


For my Drosophila example, I will set the root of my custom synth tree to 'Drosophilidae' ('ott_id': 34905).

In [3]:
print("https://tree.opentreeoflife.org/opentree/argus/opentree13.4@ott{}".format(output.response_dict['mrca']['ott_id']))

https://tree.opentreeoflife.org/opentree/argus/opentree13.4@ott34905


## Run custom synth on your trees

In [4]:
!curl -X POST  https://ot38.opentreeoflife.org/v3/tree_of_life/build_tree -d '{"input_collection":"snacktavish/dros", "root_id": "ott34905"}'

{"opentree_home": "/home/deploy/synthesis", "ott_dir": "/home/deploy/synthesis/ott/ott3.3", "root_ott_id": "34905", "synth_id": "snacktavish_dros_34905_tmpb8fkgd_8", "collections": "snacktavish/dros", "cleaning_flags": "major_rank_conflict,major_rank_conflict_inherited,environmental,viral,barren,not_otu,hidden,was_container,inconsistent,hybrid,merged", "additional_regrafting_flags": "extinct_inherited,extinct", "queue_order": 28, "status": "QUEUED"}
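If you would rather assemble the request body in Python than type the JSON by hand, something like the following could work. It only builds the string passed to `curl -d`. The function name `build_tree_payload` is an illustrative assumption, while the two keys are the ones used in the curl command above.

```python
import json


def build_tree_payload(collection, root_ott_id):
    """Return the JSON body for a build_tree request as a string."""
    root = str(root_ott_id)
    if not root.startswith("ott"):
        root = "ott" + root  # the service is given ids with the 'ott' prefix above
    return json.dumps({"input_collection": collection, "root_id": root})


# Same collection and root as the example above.
print(build_tree_payload("snacktavish/dros", 34905))
```

In the notebook you could pass `output.response_dict['mrca']['ott_id']` as the second argument so the root always follows the MRCA lookup.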

In [5]:
!curl -X GET https://ot38.opentreeoflife.org/v3/tree_of_life/list_custom_built_trees | jq 
## !curl -X GET https://ot38.opentreeoflife.org/v3/tree_of_life/list_custom_built_trees | grep -B 4 YOUR_GITHUB_ID | jq 


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16585  100 16585    0     0  59707      0 --:--:-- --:--:-- --:--:-- 59873
{
  "snacktavish_minitestaves_81461_tmp1_091lel": {
    "opentree_home": "/home/deploy/synthesis",
    "ott_dir": "/home/deploy/synthesis/ott/ott3.3",
    "root_ott_id": "81461",
    "synth_id": "snacktavish_minitestaves_81461_tmp1_091lel",
    "collections": "snacktavish/minitestaves",
    "cleaning_flags": "major_rank_conflict,major_rank_conflict_inherited,environmental,viral,barren,not_otu,hidden,was_container,inconsistent,hybrid,merged",
    ...

Find your tree.
The label will start with your GitHub user id.
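If the listing is long, you can filter it in Python instead of with `grep`. This sketch assumes you have already parsed the JSON from `list_custom_built_trees` into a dictionary keyed by synth id, as in the output above. The function name `trees_for_user` and the second sample entry are hypothetical.

```python
def trees_for_user(listing, github_id):
    """Keep only the entries whose synth id starts with the given GitHub id."""
    return {
        synth_id: info
        for synth_id, info in listing.items()
        if synth_id.startswith(github_id)
    }


# Trimmed sample: the first entry is from the output above, the second is made up.
listing = {
    "snacktavish_minitestaves_81461_tmp1_091lel": {
        "root_ott_id": "81461",
        "collections": "snacktavish/minitestaves",
    },
    "someoneelse_example_1_tmp": {
        "root_ott_id": "1",
        "collections": "someoneelse/example",
    },
}
for synth_id in sorted(trees_for_user(listing, "snacktavish")):
    print(synth_id)
```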

You can download it from the listed URL using GET:

```curl -X GET https://ot38.opentreeoflife.org/v3/tree_of_life/custom_built_tree/YOUR_SYNTH_ID.tar.gz --output custom_synth.tar.gz```

I like to rename my tar files to something I can remember, using `--output`.

In [6]:
!wget https://ot38.opentreeoflife.org/v3/tree_of_life/custom_built_tree/snacktavish_dros_34905_tmp5exp6dql.tar.gz

--2023-05-18 13:10:10--  https://ot38.opentreeoflife.org/v3/tree_of_life/custom_built_tree/snacktavish_dros_34905_tmp5exp6dql.tar.gz
Resolving ot38.opentreeoflife.org (ot38.opentreeoflife.org)... 129.237.33.153
Connecting to ot38.opentreeoflife.org (ot38.opentreeoflife.org)|129.237.33.153|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2023-05-18 13:10:10 ERROR 404: Not Found.



In [7]:
## Unpack the downloaded archive
!tar -xzvf snacktavish_dros_34905_tmp5exp6dql.tar.gz

snacktavish_dros_34905_tmp5exp6dql/
snacktavish_dros_34905_tmp5exp6dql/.STATUS.txt
snacktavish_dros_34905_tmp5exp6dql/assessments/
snacktavish_dros_34905_tmp5exp6dql/assessments/lost_taxa.txt
snacktavish_dros_34905_tmp5exp6dql/assessments/taxonomy_degree_distribution.txt
snacktavish_dros_34905_tmp5exp6dql/assessments/README.md
snacktavish_dros_34905_tmp5exp6dql/assessments/index.json
snacktavish_dros_34905_tmp5exp6dql/assessments/supertree_degree_distribution.txt
snacktavish_dros_34905_tmp5exp6dql/assessments/summary.json
snacktavish_dros_34905_tmp5exp6dql/assessments/index.html
snacktavish_dros_34905_tmp5exp6dql/grafted_solution/
snacktavish_dros_34905_tmp5exp6dql/grafted_solution/grafted_solution_ottnames.tre
snacktavish_dros_34905_tmp5exp6dql/grafted_solution/README.md
snacktavish_dros_34905_tmp5exp6dql/grafted_solution/grafted_solution.tre
snacktavish_dros_34905_tmp5exp6dql/grafted_solution/index.json
snacktavish_dros_34905_tmp5exp6dql/grafted_solution/index.html
sn

snacktavish_dros_34905_tmp5exp6dql/subproblems/index.html
snacktavish_dros_34905_tmp5exp6dql/subproblems/ott340392-tree-names.txt
snacktavish_dros_34905_tmp5exp6dql/subproblems/ott688699-tree-names.txt
snacktavish_dros_34905_tmp5exp6dql/subproblem_solutions/
snacktavish_dros_34905_tmp5exp6dql/subproblem_solutions/deg-dist-ott839210.txt
snacktavish_dros_34905_tmp5exp6dql/subproblem_solutions/ott115101.tre
snacktavish_dros_34905_tmp5exp6dql/subproblem_solutions/ott63105.tre
snacktavish_dros_34905_tmp5exp6dql/subproblem_solutions/ott930774.tre
snacktavish_dros_34905_tmp5exp6dql/subproblem_solutions/subproblems-scaffold.tre
snacktavish_dros_34905_tmp5exp6dql/subproblem_solutions/deg-dist-ott812297.txt
snacktavish_dros_34905_tmp5exp6dql/subproblem_solutions/ott73057.tre
snacktavish_dros_34905_tmp5exp6dql/subproblem_solutions/deg-dist-ott930774.txt
snacktavish_dros_34905_tmp5exp6dql/subproblem_solutions/deg-dist-ott237805.txt
snacktavish_dros_34905_tmp5exp6dql/subproblem_solutio

snacktavish_dros_34905_tmp5exp6dql/config.json
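If `tar` is not available, the archive can also be unpacked from Python with the standard library. This is a rough equivalent of the `tar -xzvf` call above, not something the tutorial depends on. The function name `unpack_archive` is an illustrative assumption.

```python
import tarfile


def unpack_archive(archive_path, dest="."):
    """Extract a .tar.gz archive and return its top-level directory names."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest)
    # Every member path starts with the synth id folder, so collect those.
    return sorted({name.split("/")[0] for name in names})


# Example call, commented out because it needs the downloaded file:
# print(unpack_archive("custom_synth.tar.gz"))
```

The returned folder name is the value to use for `custom_synth_dir` in the next step. Only extract archives from a source you trust, since `extractall` writes whatever paths the archive contains.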


## Read in and label your custom synthesis tree

In [8]:
import dendropy
import copy
custom_synth_dir = "snacktavish_dros_34905_tmp5exp6dql"
treepath = "{}/labelled_supertree/labelled_supertree.tre".format(custom_synth_dir)
custom_synth = dendropy.Tree.get_from_path(treepath, schema = "newick")
original_tree = copy.deepcopy(custom_synth)

# Often you want to translate names, or link other information to the tips on your tree

You can use the csv file you downloaded with your name-to-id mappings to add additional information.
We will also use that file to prune your custom synthesis tree to the set of taxa you are interested in.

In [9]:
leaves_start = [tip.taxon.label for tip in custom_synth.leaf_node_iter()]
print("Total number of tips in synth tree is {}\n".format(len(leaves_start)))



Total number of tips in synth tree is 5411



In [19]:
# Sometimes your query is a species, but the tree has its subspecies as tips.
# Collapse those internal nodes so that the queried taxon itself becomes a tip.
collapsed_tax = []
for node in custom_synth:
    if not node.is_leaf():
        if node.label in label_dict:
            print(node.label)
            print(label_dict[node.label])
            collapsed_tax.append(node.label)
            node.clear_child_nodes()
            node.taxon = custom_synth.taxon_namespace.new_taxon(label=node.label)


leaves_A = [tip.taxon.label for tip in custom_synth.leaf_node_iter()]
print("Total number of tips in synth tree after collapsing queries with lower level taxa is {}\n".format(len(leaves_A)))


Total number of tips in synth tree after collapsing queries with lower level taxa is 10



In [20]:
print("Total number of query taxa is {}\n".format(len(label_dict)))

Total number of query taxa is 202



In [21]:
taxa_to_retain = list(label_dict.keys())

In [22]:
len(taxa_to_retain)

202

In [23]:
custom_synth.retain_taxa_with_labels(taxa_to_retain)
leaves_B = [tip.taxon.label for tip in custom_synth.leaf_node_iter()]

In [24]:
print("Total number of tips in synth tree after pruning to queries is {}\n".format(len(leaves_B)))


Total number of tips in synth tree after pruning to queries is 10



In [25]:
custom_synth.as_string(schema="newick")

'(((((ott975460,ott963944)mrcaott3952ott41807,ott534107)mrcaott3952ott12789,((ott607962,ott97932)mrcaott32496ott63100,ott4418979)mrcaott32496ott616209)mrcaott3952ott32496,ott930765)mrcaott3952ott26987,(ott34911,(ott660819,ott34910)ott863004)mrcaott4410ott34895)mrcaott3952ott4410;\n'
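The tips are still labelled with OTT ids. As a rough sketch of translating them back to the names you supplied, the function below swaps ids for names directly in the Newick string using a dictionary shaped like `label_dict`. It is a plain-text shortcut rather than a DendroPy method, the name `relabel_newick` is made up, and the example tree is a toy. Spaces become underscores, and names containing commas, parentheses, or quotes would need proper Newick quoting, which this does not attempt.

```python
import re


def relabel_newick(newick, labels):
    """Replace standalone 'ott<digits>' labels with names from `labels`."""

    def swap(match):
        name = labels.get(match.group(0))
        if name is None:
            return match.group(0)  # leave ids without a mapping untouched
        return name.replace(" ", "_")

    # \b keeps ids embedded in 'mrcaott...ott...' node labels from matching.
    return re.sub(r"\bott\d+\b", swap, newick)


# Toy tree; only the ott12789 mapping comes from this notebook.
toy_tree = "((ott12789,ott2)mrcaott2ott12789,ott3);"
print(relabel_newick(toy_tree, {"ott12789": "Drosophila texana"}))
# prints ((Drosophila_texana,ott2)mrcaott2ott12789,ott3);
```

Internal nodes that carry a plain `ott` label are relabelled too if they appear in the dictionary.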

In [26]:
set(taxa_to_retain).difference(set(leaves_B))

{'ott1008043',
 'ott1022764',
 'ott1022765',
 'ott1024123',
 'ott1024511',
 'ott1031757',
 'ott1033313',
 'ott103919',
 'ott104619',
 'ott1051234',
 'ott1052390',
 'ott1060027',
 'ott1062062',
 'ott1069155',
 'ott1082270',
 'ott1082272',
 'ott1082274',
 'ott1082275',
 'ott1082277',
 'ott1082278',
 'ott1082279',
 'ott1082281',
 'ott1082282',
 'ott1082284',
 'ott1086847',
 'ott12789',
 'ott12791',
 'ott12792',
 'ott138599',
 'ott138600',
 'ott138602',
 'ott138603',
 'ott138612',
 'ott139455',
 'ott141605',
 'ott15302',
 'ott158663',
 'ott158664',
 'ott167982',
 'ott167983',
 'ott167990',
 'ott192066',
 'ott194377',
 'ott218751',
 'ott218754',
 'ott218755',
 'ott218757',
 'ott218759',
 'ott223304',
 'ott223734',
 'ott227752',
 'ott245830',
 'ott247544',
 'ott264594',
 'ott264596',
 'ott264598',
 'ott287229',
 'ott301475',
 'ott301478',
 'ott301480',
 'ott304074',
 'ott314820',
 'ott32496',
 'ott335367',
 'ott340391',
 'ott34895',
 'ott355241',
 'ott355244',
 'ott355249',
 'ott355253',
 'o

In [27]:
label_dict['ott12789'] ## What happened?? (subspp + spp)

'Drosophila texana'
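A list of bare OTT ids is hard to read, so it can help to report the missing queries by name. The sketch below assumes a dictionary shaped like `label_dict` and a list of tip labels like `leaves_B`. The function name `missing_queries` and the placeholder label are hypothetical. The comment above hints at one likely cause: a query that sits on an internal node (a species with subspecies below it) is not a tip, so it shows up as missing unless it was collapsed first.

```python
def missing_queries(label_dict, tip_labels):
    """Return (ott_id, input_label) pairs for queries absent from the tips."""
    tips = set(tip_labels)
    return sorted(
        (ott_id, name) for ott_id, name in label_dict.items() if ott_id not in tips
    )


# Small sample; the second label is a placeholder, not real data.
sample_labels = {
    "ott12789": "Drosophila texana",
    "ott34911": "placeholder label",
}
sample_tips = ["ott34911"]
for ott_id, name in missing_queries(sample_labels, sample_tips):
    print(ott_id, name)
```

In the notebook this would be called as `missing_queries(label_dict, leaves_B)`.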