# How to get an updated tree of all bird families from Open Tree of Life
In order to run these examples, you need to have installed the OpenTree package. Please see instructions at https://github.com/OpenTreeOfLife/python-opentree/blob/ms/INSTALL.md

Currently (Nov 2020) this tutorial requires a newer version of python-opentree than is available on PyPI; please follow the instructions for a local installation.

In [1]:
from opentree import OT, taxonomy_helpers, util

To avoid typographical errors and confusing synonymies, OpenTree relies on unique identifiers to refer to taxa. These can be found by searching on the website, or by looking up a name in Python:

In [2]:
aves = OT.get_ottid_from_name('Aves')

In [3]:
aves

81461

You can see more about this taxon at https://tree.opentreeoflife.org/taxonomy/browse?name=81461

## Fuzzy matching
`get_ottid_from_name` requires an exact string match. If nothing is returned for your taxon of interest, you can try approximate matches.

In [4]:
typo = OT.get_ottid_from_name('Avez')

Exact match to name Avez not found in taxonomy.
Try using `resp = OT.tnrs_match(["Avez"], do_approximate_matching=True)`
resp.response_dict
to find fuzzy matches


In [5]:
res = OT.tnrs_match(['avez'], do_approximate_matching=True)
res.response_dict

{'context': 'All life',
 'governing_code': 'undefined',
 'includes_approximate_matches': True,
 'includes_deprecated_taxa': False,
 'includes_suppressed_names': False,
 'matched_names': ['avez'],
 'results': [{'matches': [{'is_approximate_match': True,
     'is_synonym': False,
     'matched_name': 'Aves',
     'nomenclature_code': 'ICZN',
     'score': 0.75,
     'search_string': 'avez',
     'taxon': {'flags': [],
      'is_suppressed': False,
      'is_suppressed_from_synth': False,
      'name': 'Aves',
      'ott_id': 81461,
      'rank': 'class',
      'source': 'ott3.2draft9',
      'synonyms': ['avian', 'Lophorus', 'Lepturus', 'Phyllomanes'],
      'tax_sources': ['ncbi:8782', 'worms:1836', 'gbif:212', 'irmng:1142'],
      'unique_name': 'Aves'}}],
   'name': 'avez'}],
 'taxonomy': {'author': 'open tree of life project',
  'name': 'ott',
  'source': 'ott3.2draft9',
  'version': '3.2',
  'weburl': 'https://tree.opentreeoflife.org/about/taxonomy-version/ott3.2'},
 'unambiguous_na
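
A response like the one above can be flattened into a short list of candidate matches. The following is a minimal sketch that assumes the dictionary layout printed above; `summarize_matches` is a hypothetical helper (not part of python-opentree), and `sample_response` is a trimmed copy of that output so the snippet runs without a network call.

```python
def summarize_matches(response_dict):
    """Return (search_string, matched_name, ott_id, score) for every match."""
    rows = []
    for result in response_dict.get('results', []):
        for match in result.get('matches', []):
            taxon = match['taxon']
            rows.append((match['search_string'],
                         match['matched_name'],
                         taxon['ott_id'],
                         match['score']))
    # Put the highest-scoring candidates first.
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Trimmed copy of the response shown above (only the fields used here).
sample_response = {
    'results': [{'matches': [{'is_approximate_match': True,
                              'matched_name': 'Aves',
                              'score': 0.75,
                              'search_string': 'avez',
                              'taxon': {'name': 'Aves', 'ott_id': 81461}}],
                 'name': 'avez'}],
}

matches = summarize_matches(sample_response)
print(matches)
```

With a live query, the same helper could be applied to `res.response_dict`, provided the service returns the fields shown above.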

# Getting a list of taxa in a group by rank
While the OpenTree taxonomy is not rank focused, it does track rank information from component taxonomies, which we can use to capture families. The fastest way to access these data is to download the OpenTree taxonomy directly.

In [8]:
# You can download the taxonomy by going to https://tree.opentreeoflife.org/about/taxonomy-version/ott3.2
# or by running these commands:
# You can set 'loc' to wherever you want to store the taxonomy files.
taxonomy_helpers.download_taxonomy_file(version = 3.2, loc = '../..')

'../../ott3.2'

In [9]:
bird_families = taxonomy_helpers.get_ott_ids_group_and_rank(group_ott_id=aves, 
                                                            rank='family', 
                                                            taxonomy_file='../../ott3.2/taxonomy.tsv')
# By default this query prunes taxa that are not included in synth
# (usually because they are extinct and we have no phylogenetic input information).
# To get a list of taxa including those excluded from synth, run the same command with synth_only = False,
# e.g.
# bird_families = taxonomy_helpers.get_ott_ids_group_and_rank(group_ott_id=aves, 
#                                                            rank='family', 
#                                                            synth_only = False,
#                                                            taxonomy_file='../../ott3.2/taxonomy.tsv')

Gathering ott ids from group with ott id 81461.


In [10]:
len(bird_families)

196
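
To make the rank filter less of a black box, here is a rough sketch of how taxa of a given rank could be collected from a taxonomy table by walking down from a group. It is not the library's implementation and does not reproduce the synth pruning. It assumes that `taxonomy.tsv` separates fields with `\t|\t` and has `uid`, `parent_uid`, `name` and `rank` columns. `ids_in_group_with_rank` is a hypothetical name, and the table below is toy data with made-up ids.

```python
import io
from collections import defaultdict

SEP = '\t|\t'  # assumed field separator used by taxonomy.tsv


def ids_in_group_with_rank(handle, group_id, rank):
    """Return ids of taxa of the given rank that descend from group_id."""
    header = handle.readline().rstrip('\n').split(SEP)
    col = {name: i for i, name in enumerate(header)}
    children = defaultdict(list)
    rank_of = {}
    for line in handle:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip('\n').split(SEP)
        uid = int(fields[col['uid']])
        rank_of[uid] = fields[col['rank']]
        parent = fields[col['parent_uid']]
        if parent:  # the root row has an empty parent
            children[int(parent)].append(uid)
    # Walk down from the group, keeping every descendant with the wanted rank.
    found, stack = [], [group_id]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if rank_of[child] == rank:
                found.append(child)
            stack.append(child)
    return found


# Toy table: ids and names are invented for illustration only.
rows = [
    ('uid', 'parent_uid', 'name', 'rank'),
    ('1', '', 'ToyRoot', 'no rank'),
    ('2', '1', 'ToyClass', 'class'),
    ('3', '2', 'ToyOrder', 'order'),
    ('4', '3', 'ToyFamilyA', 'family'),
    ('5', '3', 'ToyFamilyB', 'family'),
    ('6', '4', 'ToyGenus', 'genus'),
    ('7', '1', 'OutsideFamily', 'family'),
]
toy_file = io.StringIO(''.join(SEP.join(r) + SEP + '\n' for r in rows))

families = ids_in_group_with_rank(toy_file, group_id=2, rank='family')
print(sorted(families))
```

The family with id 7 is skipped because it sits outside the requested group, which mirrors the idea of restricting the search to descendants of Aves.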

# Requesting a tree
To get the relationships between these taxa, we request a labelled induced synth tree.

In [12]:
ret = taxonomy_helpers.labelled_induced_synth(ott_ids = bird_families, label_format="name_and_id")


This return value packages together a bunch of information in a dictionary. We can see what the keys are:

In [14]:
ret.keys()

dict_keys(['labelled_tree', 'original_tree', 'unknown_ids', 'non-monophyletic_taxa', 'supporting_studies', 'label_map'])

The labelled tree is the central output. It is a dendropy Tree object.

In [15]:
ret['labelled_tree'].print_plot()


                       /---++ MRCA of taxa in Sittidae_ott603925               
                       |                                                       
                      /+ /--- MRCA of taxa in Turdidae_ott96286                
                      ||/+                                                     
                      |\+\+++ Rhabdornithidae ott814752                        
                 /----+ |                                                      
                 |    | \---- Cinclidae ott496027                              
                 |    |                                                        
                 |    |   /++ Mohoidae ott661149                               
                 |    \---+                                                    
               /-+        \-+ Hypocolidae ott294593                            
               | |                                                             
               | | /--------+ Dicaeidae 

In [12]:
ret['labelled_tree'].write(path="labelled_bird_families.tre", schema="newick")

But let's dig in a bit deeper. Many of these tips are named 'MRCA of taxa in' a family. Those are tips that represent families that are not monophyletic according to our phylogenetic inputs. These are sometimes called 'broken' taxa. Information about them is returned under the key 'non-monophyletic_taxa'.

In [13]:
len(ret['non-monophyletic_taxa'])

65

We can get more info on what is going on with these taxa by looking at that dictionary, e.g. for Sittidae https://tree.opentreeoflife.org/taxonomy/browse?id=603925

In [14]:
ret['non-monophyletic_taxa']['603925']

{'flags': ['sibling_higher'],
 'is_suppressed': False,
 'is_suppressed_from_synth': False,
 'name': 'Sittidae',
 'ott_id': 603925,
 'rank': 'family',
 'source': 'ott3.2draft9',
 'synonyms': ['Sitella'],
 'tax_sources': ['ncbi:50247', 'gbif:5283', 'irmng:105095'],
 'unique_name': 'Sittidae',
 'tax_url': 'https://tree.opentreeoflife.org/taxonomy/browse?id=603925',
 'synth_url': 'https://tree.opentreeoflife.org/opentree/argus/ottol@603925',
 'MRCA_location_in_synth': 'mrcaott25638ott452744',
 'broken_taxa_mapping_to_same_node': ['Sittidae_ott603925'],
 'is_tip': True}

Any taxa that are 'broken' have at least one study in the corpus that states that the members of that taxon are non-monophyletic. We can interactively view which published papers 'broke' Sittidae at https://tree.opentreeoflife.org/opentree/argus/ottol@603925

In [15]:
# We can also access that information directly in Python
resp = OT.synth_subtree('ott603925').response_dict['broken']['contesting_trees'].keys()
cites = OT.get_citations(resp)
print(cites)

https://tree.opentreeoflife.org/curator/study/view/ot_290?tab=trees&tree=tree1
Selvatti, Alexandre Pedro, Luiz Pedreira Gonzaga, Claudia Augusta de Moraes Russo. 2015. A Paleogene origin for crown passerines and the diversification of the Oscines in the New World. Molecular Phylogenetics and Evolution 88: 1-15.
http://dx.doi.org/10.1016/j.ympev.2015.03.018

https://tree.opentreeoflife.org/curator/study/view/ot_521?tab=trees&tree=tree1
Burleigh, J. Gordon, Rebecca T. Kimball, Edward L. Braun. 2015. Building the avian tree of life using a large-scale, sparse supermatrix. Molecular Phylogenetics and Evolution 84: 53-63
http://dx.doi.org/10.1016/j.ympev.2014.12.003

https://tree.opentreeoflife.org/curator/study/view/ot_809?tab=trees&tree=tree2
Jetz, W., G. H. Thomas, J. B. Joy, K. Hartmann, A. O. Mooers. 2012. The global diversity of birds in space and time. Nature 491 (7424): 444-448
http://dx.doi.org/10.1038/nature11631
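
The printed citations can be turned into structured records if you want to tabulate them. This is a minimal sketch that assumes the citation text keeps the three-line layout printed above (study URL, citation, DOI link) with blank lines between entries; `parse_citation_blocks` is a hypothetical helper, and `sample_text` is copied from the output above.

```python
import re
from urllib.parse import urlparse, parse_qs


def parse_citation_blocks(text):
    """Split citation text into dicts with study id, tree id, citation, DOI."""
    records = []
    for block in re.split(r'\n\s*\n', text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 2:
            continue  # not a complete entry
        url = urlparse(lines[0])
        records.append({
            # Last path component of the curator URL, e.g. 'ot_290'.
            'study_id': url.path.rstrip('/').split('/')[-1],
            # Value of the 'tree' query parameter, e.g. 'tree1'.
            'tree_id': parse_qs(url.query).get('tree', [None])[0],
            'citation': lines[1],
            'doi_url': lines[2] if len(lines) > 2 else None,
        })
    return records


sample_text = """https://tree.opentreeoflife.org/curator/study/view/ot_290?tab=trees&tree=tree1
Selvatti, Alexandre Pedro, Luiz Pedreira Gonzaga, Claudia Augusta de Moraes Russo. 2015. A Paleogene origin for crown passerines and the diversification of the Oscines in the New World. Molecular Phylogenetics and Evolution 88: 1-15.
http://dx.doi.org/10.1016/j.ympev.2015.03.018

https://tree.opentreeoflife.org/curator/study/view/ot_809?tab=trees&tree=tree2
Jetz, W., G. H. Thomas, J. B. Joy, K. Hartmann, A. O. Mooers. 2012. The global diversity of birds in space and time. Nature 491 (7424): 444-448
http://dx.doi.org/10.1038/nature11631
"""

records = parse_citation_blocks(sample_text)
for rec in records:
    print(rec['study_id'], rec['tree_id'], rec['doi_url'])
```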



The call also returns a dictionary of any IDs that weren't found in the current tree. However, in this case that dictionary is empty, as all search IDs were found.

In [16]:
ret['unknown_ids']

{}

But we should have the rest of our 196 bird families!

In [17]:
tips = [leaf.taxon.label for leaf in ret['labelled_tree'].leaf_node_iter()]

In [18]:
len(tips) 

150
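
One way to see how the tips break down is to separate labels that carry the 'MRCA of taxa in' prefix from the rest. The sketch below assumes the tip labels look like those in the plot above; `split_tip_labels` is a hypothetical helper, and `sample_tips` holds a handful of labels copied from that plot rather than the full list of tips.

```python
BROKEN_PREFIX = 'MRCA of taxa in '


def split_tip_labels(labels):
    """Return (intact, broken) lists of tip labels based on the label prefix."""
    broken = [label for label in labels if label.startswith(BROKEN_PREFIX)]
    intact = [label for label in labels if not label.startswith(BROKEN_PREFIX)]
    return intact, broken


# A few labels copied from the plot above.
sample_tips = [
    'MRCA of taxa in Sittidae_ott603925',
    'MRCA of taxa in Turdidae_ott96286',
    'Rhabdornithidae ott814752',
    'Cinclidae ott496027',
    'Mohoidae ott661149',
    'Hypocolidae ott294593',
]

intact, broken = split_tip_labels(sample_tips)
print(len(intact), 'tips for monophyletic families')
print(len(broken), 'tips standing in for broken families')
```

Passing the full `tips` list instead of `sample_tips` would give the split for the whole tree.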

# Why are we still missing families?

Some of the non-monophyletic taxa map to internal nodes on our output tree. Input phylogenies are telling us that these 'families' are paraphyletic with respect to other families.

In [19]:
internal_node_fams = [info['name']
                      for info in ret['non-monophyletic_taxa'].values()
                      if not info['is_tip']]

In [20]:
len(internal_node_fams)

45

In [21]:
print(internal_node_fams)

['Passeridae', 'Dicruridae', 'Psittacidae', 'Cotingidae', 'Meliphagidae', 'Muscicapidae', 'Pycnonotidae', 'Zosteropidae', 'Tyrannidae', 'Corvidae', 'Cuculidae', 'Sylviidae', 'Batrachostomatidae', 'Cisticolidae', 'Charadriidae', 'Locustellidae', 'Acanthizidae', 'Mimidae', 'Rallidae', 'Nectariniidae', 'Certhiidae', 'Thraupidae', 'Parulidae', 'Ptilonorhynchidae', 'Rhinocryptidae', 'Formicariidae', 'Ramphastidae', 'Paridae', 'Timaliidae', 'Sturnidae', 'Psittaculidae', 'Furnariidae', 'Bombycillidae', 'Aegithalidae', 'Megapodiidae', 'Eurylaimidae', 'Campephagidae', 'Phasianidae', 'Prionopidae', 'Fringillidae', 'Cinclosomatidae', 'Laridae', 'Alcedinidae', 'Columbidae', 'Glareolidae']
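
The same accounting can be done by id instead of by name: pull the OTT ids out of the tip labels and compare them with the ids that were requested. This is a minimal sketch that assumes every tip label embeds its id as `ott<digits>`, as in the plot above; the helper names are hypothetical, and `HYPOTHETICAL_ID` is a placeholder standing in for a family that maps to an internal node.

```python
import re

OTT_ID = re.compile(r'ott(\d+)')


def ids_in_labels(labels):
    """Collect every OTT id mentioned in a list of tip labels."""
    found = set()
    for label in labels:
        found.update(int(num) for num in OTT_ID.findall(label))
    return found


def families_not_at_tips(requested_ids, tip_labels):
    """Return requested ids that do not appear in any tip label."""
    return sorted(set(requested_ids) - ids_in_labels(tip_labels))


HYPOTHETICAL_ID = 999999999  # placeholder, not a real OTT id

sample_tips = [
    'MRCA of taxa in Sittidae_ott603925',
    'Rhabdornithidae ott814752',
    'Cinclidae ott496027',
]
requested = [603925, 814752, 496027, HYPOTHETICAL_ID]

missing = families_not_at_tips(requested, sample_tips)
print(missing)
```

Run against `bird_families` and the full `tips` list, the ids returned this way should correspond to families that sit at internal nodes, assuming the label format holds for every tip.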


We have a much more accurate tree than taxonomy would have given us, thanks to 64 published studies informing the relationships!

In [22]:
len(ret['supporting_studies'])

64

In [23]:
print(OT.get_citations(ret['supporting_studies']))

https://tree.opentreeoflife.org/curator/study/view/ot_816?tab=trees&tree=tree1
Gibson, Rosemary, Allan Baker. 2012. Multiple gene sequences resolve phylogenetic relationships in the shorebird suborder Scolopaci (Aves: Charadriiformes). Molecular Phylogenetics and Evolution 64 (1): 66-72
http://dx.doi.org/10.1016/j.ympev.2012.03.008

https://tree.opentreeoflife.org/curator/study/view/ot_425?tab=trees&tree=tree1
Stein, R. Will, Joseph W. Brown, Arne Ø. Mooers. 2015. A molecular genetic time scale demonstrates Cretaceous origins and multiple diversification rate shifts within the order Galliformes (Aves). Molecular Phylogenetics and Evolution 92: 155-164
http://dx.doi.org/10.1016/j.ympev.2015.06.005

https://tree.opentreeoflife.org/curator/study/view/ot_809?tab=trees&tree=tree2
Jetz, W., G. H. Thomas, J. B. Joy, K. Hartmann, A. O. Mooers. 2012. The global diversity of birds in space and time. Nature 491 (7424): 444-448
http://dx.doi.org/10.1038/nature11631

https://tree.opentreeoflife.org