# Welcome to HAM Tool Box!

This IPython notebook aims to help you take your first steps with HAM; it can also serve as a quick reference for HAM.

In this tutorial we will explain how to: 
* [set up](#set) a HAM analysis using the different available options.
* use the different [queries](#query) that let you fetch the information you want.
* explore what the HAM [objects](#object) have to offer.
* [compare](#compare) several genomes through their HOGs.
* run [iHam](#hvis) (single HOG visualisation).
* run [treeprofile](#tprofile) (species tree whose nodes are annotated with evolutionary events).



<a id='set'></a>
## SET UP THE HAM

### Required import

In [1]:
#  This is the HAM package
import pyham

#  OPTIONAL: only if you want to have the logger information printed
import logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)-12s %(levelname)-8s %(message)s")

### First, we need a species tree and an orthoXML file!

In this example, we show how to initialise a pyham instance with a newick tree, but you can also do it with a phyloxml tree. pyHam accepts the following tree inputs:
- **newick string (default)**  `ham_analysis = pyham.Ham(tree_str, orthoxml_path, use_internal_name=True)`
- **newick file**  `ham_analysis = pyham.Ham(newick_path, orthoxml_path, use_internal_name=True, tree_format='newick')`
- **phyloxml file**  `ham_analysis = pyham.Ham(phyloxml_path, orthoxml_path, use_internal_name=True, tree_format='phyloxml')`


In [2]:

##### This is just to get the example files from the pyham GitHub repository #########################
import sys
if sys.version_info[0] == 3:
    from urllib.request import urlretrieve
else:
    from urllib import urlretrieve

# download species tree and orthoxml
tree_url = "https://raw.githubusercontent.com/DessimozLab/pyham/master/tests/data/simpleEx.nwk"
urlretrieve(tree_url , "./simpleEx.nwk")

OXML_url = "https://raw.githubusercontent.com/DessimozLab/pyham/master/tests/data/simpleEx.orthoxml"
urlretrieve(OXML_url , "./simpleEx.orthoxml")
#####################################################################################################

#  Select a nwk file as a taxonomy reference
nwk_path = "./simpleEx.nwk"
#  And extract the newick tree as a string
tree_str = pyham.utils.get_newick_string(nwk_path, type="nwk")

# Then you select your favorite orthoXML file
orthoxml_path =  "./simpleEx.orthoxml"


### Let's feed HAM with some HOGs and a tree.

In [3]:
# pyham.Ham is the main object that contains all the information and functionalities.
ham_analysis = pyham.Ham(tree_str, orthoxml_path, use_internal_name=True)


2018-11-02 22:50:51,061 pyham.ham    INFO     Build taxonomy: completed.
2018-11-02 22:50:51,065 pyham.parsers INFO     Species HUMAN created. 
2018-11-02 22:50:51,067 pyham.parsers INFO     Species PANTR created. 
2018-11-02 22:50:51,069 pyham.parsers INFO     Species CANFA created. 
2018-11-02 22:50:51,071 pyham.parsers INFO     Species MOUSE created. 
2018-11-02 22:50:51,073 pyham.parsers INFO     Species RATNO created. 
2018-11-02 22:50:51,075 pyham.parsers INFO     Species XENTR created. 
2018-11-02 22:50:51,082 pyham.ham    INFO     Parse Orthoxml: 3 top level hogs and 19 extant genes extract.
2018-11-02 22:50:51,084 pyham.ham    INFO     Set up Ham analysis: ready to go with 3 hogs founded within 6 species.


### But we can also get the data from a public database!

In [4]:
my_gene_query = 'P53_RAT'
pyham_analysis = pyham.Ham(query_database=my_gene_query, use_data_from='oma')

2018-11-02 22:50:53,571 pyham.ham    INFO     Build taxonomy: completed.
2018-11-02 22:50:53,593 pyham.parsers INFO     Species Ictidomys tridecemlineatus created. 
2018-11-02 22:50:53,615 pyham.parsers INFO     Species Sorex araneus created. 
2018-11-02 22:50:53,637 pyham.parsers INFO     Species Ochotona princeps created. 
2018-11-02 22:50:53,658 pyham.parsers INFO     Species Xiphophorus maculatus created. 
2018-11-02 22:50:53,686 pyham.parsers INFO     Species Ciona savignyi created. 
2018-11-02 22:50:53,712 pyham.parsers INFO     Species Astyanax mexicanus created. 
2018-11-02 22:50:53,747 pyham.parsers INFO     Species Echinops telfairi created. 
2018-11-02 22:50:53,773 pyham.parsers INFO     Species Danio rerio created. 
2018-11-02 22:50:53,795 pyham.parsers INFO     Species Homo sapiens created. 
2018-11-02 22:50:53,820 pyham.parsers INFO     Species Oryctolagus cuniculus created. 
2018-11-02 22:50:53,845 pyham.parsers INFO     Species Poecilia formosa created. 
2018-11-02 22:5

### Alternative options: filtering and taxonomic range naming!

#### Taxonomic range naming
If you don't want to use the internal names defined by the newick species tree (or if it has support values as internal names), you can use HAM's automatic internal node naming (i.e. the concatenation of the children's names).


In [5]:
# the use_internal_name argument does the job
ham_analysis_no_name = pyham.Ham(tree_str, orthoxml_path, use_internal_name=False)

# In the previous section we used the internal names of the newick tree:
print("Ancestral genomes name using newick internal names:")
for ag in ham_analysis.taxonomy.internal_nodes:
    print("\t- {}".format(ag.name))
    

2018-11-02 22:50:55,836 pyham.ham    INFO     Build taxonomy: completed.
2018-11-02 22:50:55,839 pyham.parsers INFO     Species HUMAN created. 
2018-11-02 22:50:55,841 pyham.parsers INFO     Species PANTR created. 
2018-11-02 22:50:55,843 pyham.parsers INFO     Species CANFA created. 
2018-11-02 22:50:55,845 pyham.parsers INFO     Species MOUSE created. 
2018-11-02 22:50:55,847 pyham.parsers INFO     Species RATNO created. 
2018-11-02 22:50:55,850 pyham.parsers INFO     Species XENTR created. 
2018-11-02 22:50:55,855 pyham.ham    INFO     Parse Orthoxml: 3 top level hogs and 19 extant genes extract.
2018-11-02 22:50:55,857 pyham.ham    INFO     Set up Ham analysis: ready to go with 3 hogs founded within 6 species.


Ancestral genomes name using newick internal names:
	- Primates
	- Vertebrata
	- Rodents
	- Mammalia
	- Euarchontoglires


In [6]:
# Here we use the artificial internal names built by pyham:
print("Ancestral genomes name using artificial ham names:")
for ag in ham_analysis_no_name.taxonomy.internal_nodes:
    print("\t- {}".format(ag.name))
    

Ancestral genomes name using artificial ham names:
	- XENTR/HUMAN/PANTR/MOUSE/RATNO/CANFA
	- HUMAN/PANTR
	- HUMAN/PANTR/MOUSE/RATNO/CANFA
	- HUMAN/PANTR/MOUSE/RATNO
	- MOUSE/RATNO


In [7]:
#  If you use use_internal_name=False, the following function returns an ASCII representation
#  of the newick tree with artificial names. It can be used even before instantiating the HAM object,
#  which helps avoid query-by-name errors in scripts.
print(pyham.utils.previsualize_taxonomy(tree_str))



                                   /-XENTR
                                  |
                                  |                                                               /-HUMAN
-XENTR/HUMAN/PANTR/MOUSE/RATNO/CANFA                                                   /HUMAN/PANTR
                                  |                                                   |           \-PANTR
                                  |                             /HUMAN/PANTR/MOUSE/RATNO
                                  |                            |                      |           /-MOUSE
                                   \HUMAN/PANTR/MOUSE/RATNO/CANFA                      \MOUSE/RATNO
                                                               |                                  \-RATNO
                                                               |
                                                                \-CANFA


#### Pre-parsing filtering
If you have a large orthoXML file, or you only want to parse the information of interest, you can use the filter option.

In [8]:
# First you instantiate an empty ParserFilter object
filter_ham = pyham.ParserFilter()

# Then you add the HOGs of interest to the filter using the following functions (these 3 examples select the same HOG)
filter_ham.add_hogs_via_hogId([2]) # by its top-level HOG id
filter_ham.add_hogs_via_GeneIntId([2]) # by the unique id of a gene that belongs to the HOG of interest
filter_ham.add_hogs_via_GeneExtId(["HUMAN2"]) # by the external id of a gene that belongs to the HOG of interest

# Finally, you set up the HAM object as before, passing the filter through the filter_object argument
ham_analysis_filter = pyham.Ham(tree_str, orthoxml_path, use_internal_name=True, filter_object=filter_ham)

# The log output below differs from the one printed previously!

2018-11-02 22:50:55,924 pyham.ham    INFO     Build taxonomy: completed.
2018-11-02 22:50:55,931 pyham.ham    INFO     Filtering Indexing of Orthoxml done: 1 top level hogs and 4 extant genes will be extract.
2018-11-02 22:50:55,934 pyham.parsers INFO     Species HUMAN created. 
2018-11-02 22:50:55,936 pyham.parsers INFO     Species PANTR created. 
2018-11-02 22:50:55,939 pyham.parsers INFO     Species CANFA created. 
2018-11-02 22:50:55,942 pyham.parsers INFO     Species MOUSE created. 
2018-11-02 22:50:55,944 pyham.parsers INFO     Species RATNO created. 
2018-11-02 22:50:55,948 pyham.parsers INFO     Species XENTR created. 
2018-11-02 22:50:55,952 pyham.ham    INFO     Parse Orthoxml: 1 top level hogs and 4 extant genes extract.
2018-11-02 22:50:55,954 pyham.ham    INFO     Set up Ham analysis: ready to go with 1 hogs founded within 6 species.


<a id='query'></a>
## QUERIES: HOW TO TALK WITH HAM ?

We can separate queries into the following categories:
*  [ExtantGene](#xgeneq)
*  [HOG](#hogq)
*  [ExtantGenome](#xgenomeq)
*  [AncestralGenome](#ancgenomeq)
*  [Taxon](#taxq)

<a id='xgeneq'></a>
### ExtantGene queries

ExtantGene objects represent the genes at the leaves.

In [9]:
# Get a gene by its unique (orthoxml) id
gene_human3 = ham_analysis.get_gene_by_id(3)
print(gene_human3)

# Get a list of genes that match an external id
potential_genes_human3 = ham_analysis.get_genes_by_external_id("HUMAN3")
print(potential_genes_human3[0])

## You should see the same gene printed twice below!

# You can also get all genes created as a list
list_genes = ham_analysis.get_list_extant_genes()

# or as a dictionary (key <-> unique id and value <-> gene)
dict_genes = ham_analysis.get_dict_extant_genes()

Gene(3)
Gene(3)


<a id='hogq'></a>
### HOG queries

In [10]:
# Get a HOG by its top level unique id
HOG_3 = ham_analysis.get_hog_by_id(3)
print(HOG_3)

HOG_3 = ham_analysis.get_hog_by_gene(gene_human3)
print(HOG_3)

## You should see the same HOG printed twice below!


<HOG(3)>
<HOG(3)>


In [11]:

# You can also get all top-level HOGs created as a list
list_toplevel_hogs = ham_analysis.get_list_top_level_hogs()

# or as a dictionary (key <-> top-level id and value <-> HOG)
dict_toplevel_hogs = ham_analysis.get_dict_top_level_hogs()

<a id='xgenomeq'></a>
### ExtantGenome queries

In [12]:
# Get an extant genome by its name
genome_human = ham_analysis.get_extant_genome_by_name("HUMAN")
print(genome_human)


HUMAN


In [13]:

# You can also get all extant genomes created as a list
list_genome = ham_analysis.get_list_extant_genomes()
print("List of species:")
for g in list_genome:
    print("\t- {}".format(g.name))

List of species:
	- XENTR
	- HUMAN
	- PANTR
	- MOUSE
	- RATNO
	- CANFA


<a id='ancgenomeq'></a>
### AncestralGenome queries

AncestralGenome objects represent the genomes associated with the internal nodes of the species tree.

In [14]:
# Get an ancestral genome by its name
genome_rodents_1 = ham_analysis.get_ancestral_genome_by_name("Rodents")
print(genome_rodents_1)

# And in case you specified use_internal_name=False:
genome_rodents_2 = ham_analysis_no_name.get_ancestral_genome_by_name("MOUSE/RATNO")
print(genome_rodents_2)

Rodents
MOUSE/RATNO


In [15]:
# Get an ancestral genome by looking at the MRCA of 2+ genomes
# First you get the descendant genomes as seen above
genome_rat = ham_analysis.get_extant_genome_by_name("RATNO")
genome_mouse = ham_analysis.get_extant_genome_by_name("MOUSE")
# Then you get the corresponding MRCA ancestral genome
genome_rodents_3 = ham_analysis.get_ancestral_genome_by_mrca_of_genome_set({genome_rat, genome_mouse})
print(genome_rodents_3)

Rodents
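
To make the MRCA lookup concrete, here is a small pure-Python sketch of the idea. It is not how pyham computes it: the hand-written `PARENT` map and the helpers `lineage` and `mrca` are hypothetical names introduced for this illustration, and only the tree topology comes from the tutorial.

```python
# Child -> parent map for the tutorial's species tree (hand-written for this sketch).
PARENT = {
    "HUMAN": "Primates",
    "PANTR": "Primates",
    "MOUSE": "Rodents",
    "RATNO": "Rodents",
    "Primates": "Euarchontoglires",
    "Rodents": "Euarchontoglires",
    "Euarchontoglires": "Mammalia",
    "CANFA": "Mammalia",
    "Mammalia": "Vertebrata",
    "XENTR": "Vertebrata",
}


def lineage(name):
    """Return the path from `name` up to the root, starting with `name` itself."""
    path = [name]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(names):
    """Return the most recent node shared by the lineages of all `names`."""
    names = list(names)
    common = set(lineage(names[0]))
    for other in names[1:]:
        common &= set(lineage(other))
    # Walking up from a leaf, the first node that every lineage shares is the MRCA.
    for node in lineage(names[0]):
        if node in common:
            return node


print(mrca({"RATNO", "MOUSE"}))
print(mrca({"HUMAN", "RATNO"}))
```

The same walk works for any number of genomes, since the set of shared ancestors only shrinks as more lineages are intersected.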


In [16]:
# You can also get an ancestral genome by its taxonomy node
taxon = ham_analysis.get_taxon_by_name("Rodents") # as seen in the next section
genome_rodents_4 = ham_analysis.get_ancestral_genome_by_taxon(taxon)
print(genome_rodents_4)

Rodents


In [17]:
# You can also get all ancestral genomes created as a list
list_genome = ham_analysis.get_list_ancestral_genomes()
print("\n List of ancestral genomes:")
for g in list_genome:
    print("\t- {}".format(g.name))


 List of ancestral genomes:
	- Primates
	- Vertebrata
	- Rodents
	- Mammalia
	- Euarchontoglires


In [18]:
# The same query on the filtered HAM instance
list_genome = ham_analysis_filter.get_list_ancestral_genomes()
print("\n Here for the filter version of HAM, the list of ancestral genomes:")
for g in list_genome:
    print("\t- {}".format(g.name))

print("'Vertebrata' is not present in the filtered list because it was not required during the parsing")


 Here for the filter version of HAM, the list of ancestral genomes:
	- Mammalia
	- Primates
	- Euarchontoglires
	- Rodents
'Vertebrata' is not present in the filtered list because it was not required during the parsing


<a id='taxq'></a>
### Taxon queries

In [19]:
# You can get a Taxonomy node by its name
taxon = ham_analysis.get_taxon_by_name("Rodents") 
print(taxon.name)

taxon = ham_analysis.get_taxon_by_name("HUMAN") 
print(taxon.name)

Rodents
HUMAN


<a id='object'></a>
## HAM OBJECT: SHOW ME YOUR SECRETS?

Let's have a look at each HAM object !
*  [Gene/HOG](#gdef)
*  [Genome](#genomedef)
*  [Taxonomy](#taxonomydef)


<a id='gdef'></a>
### Gene/HOG

#### AbstractGene (Both ExtantGene and HOG)

In [20]:
# we select one extant gene and one hog for our demo
print("Demo| gene/hog: ")
gene2 = ham_analysis.get_gene_by_id(2)
print("\t - gene 2: ", gene2)
hog1 = ham_analysis.get_hog_by_id(1)
print("\t - hog 1: ", hog1)

Demo| gene/hog: 
('\t - gene 2: ', Gene(2))
('\t - hog 1: ', <HOG(1)>)


###### The cool shared features between both of them are:

In [21]:
# that we can fetch the genome they belong to
print("Demo| gene/hog genome: ")
print("\t - gene 2: ", gene2.genome)
print("\t - hog 1: ", hog1.genome)
print("\n")

Demo| gene/hog genome: 
('\t - gene 2: ', <pyham.genome.ExtantGenome object at 0x11203f110>)
('\t - hog 1: ', <pyham.genome.AncestralGenome object at 0x11203f910>)




In [22]:
# that we can get the related top level hog
print("Demo| gene/hog related top level hog: ")
print("\t - gene 2: ", gene2.get_top_level_hog())
print("\t - hog 1: {} (it returns itself since it is already a top level hog) ".format(hog1.get_top_level_hog()))
print("\n")

Demo| gene/hog related top level hog: 
('\t - gene 2: ', <HOG(2)>)
	 - hog 1: <HOG(1)> (it returns itself since it is already a top level hog) 




In [23]:
# that we can get the related hogs at a specific level 
rodents = ham_analysis.get_ancestral_genome_by_name("Rodents")
print("Demo| gene/hog related hogs at the Rodents level: ")
print("\t - gene 2: ", gene2.get_at_level(rodents))
print("\t - hog 1: ", hog1.get_at_level(rodents))
print(r"/!\ THIS CAN RETURN MORE THAN ONE HOG /!\ ")
hog3 = ham_analysis.get_hog_by_id(3)
print("\t - hog 3: ", hog3.get_at_level(rodents))
print("\n")

Demo| gene/hog related hogs at the Rodents level: 
('\t - gene 2: ', [<HOG()>])
('\t - hog 1: ', [<HOG(1.M.E.R)>])
/!\ THIS CAN RETURN MORE THAN ONE HOG /!\ 
('\t - hog 3: ', [<HOG()>, <HOG()>])




In [24]:
# Singleton -> a gene present in a species of the orthoxml but that doesn't belong to any hog.

# that we can know if the abstractGene is a singleton or not
human_5 = ham_analysis.get_gene_by_id("5")
print("Demo| gene/hog singletons ? : ")
print("\t - gene 2: {}".format(gene2.is_singleton()))
print("\t - gene 5: {}".format(human_5.is_singleton()))
print("\t - HOG 1: {}".format(hog1.is_singleton()))
print("\n")

Demo| gene/hog singletons ? : 
	 - gene 2: False
	 - gene 5: True
	 - HOG 1: False
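
The singleton notion can be checked directly on the input file. The sketch below is an illustration under stated assumptions: it parses a simplified, namespace-free orthoXML-like snippet written for this example (the gene ids and `protId` values are made up), and it does not use pyham at all.

```python
import xml.etree.ElementTree as ET

# Simplified orthoXML-like snippet (no namespace, hypothetical ids).
SNIPPET = """
<orthoXML>
  <species name="HUMAN">
    <database><genes>
      <gene id="1" protId="HUMAN1"/>
      <gene id="5" protId="HUMAN5"/>
    </genes></database>
  </species>
  <species name="MOUSE">
    <database><genes>
      <gene id="31" protId="MOUSE1"/>
    </genes></database>
  </species>
  <groups>
    <orthologGroup id="1">
      <geneRef id="1"/>
      <geneRef id="31"/>
    </orthologGroup>
  </groups>
</orthoXML>
""".strip()

root = ET.fromstring(SNIPPET)

# Every gene declared under a species ...
all_genes = {gene.get("id") for gene in root.iter("gene")}
# ... and every gene referenced from inside a group.
grouped_genes = {ref.get("id") for ref in root.iter("geneRef")}

# A singleton is declared in a species but never referenced by any group.
singletons = sorted(all_genes - grouped_genes)
print(singletons)
```

In this toy input, gene 5 is the only gene that no group references, mirroring the `is_singleton()` result shown above for gene 5.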




In [25]:
# that we can get the ancestral HOG of an abstractGene at a specific level. In addition, a boolean specifies whether a duplication occurred between the two levels.
mamm = ham_analysis.get_ancestral_genome_by_name("Mammalia")
hog3_rodents = hog3.get_at_level(rodents)[0]

print("Demo| gene ancestor hog at Mammalia level: ")
print("\t - gene 2: {}".format(gene2.search_ancestor_hog_in_ancestral_genome(mamm)))
print("\t - hog 3 at rodents: {}".format(hog3_rodents.search_ancestor_hog_in_ancestral_genome(mamm)))

Demo| gene ancestor hog at Mammalia level: 
	 - gene 2: (<HOG(2)>, False)
	 - hog 3 at rodents: (<HOG(3.M)>, True)


#### ExtantGene

In [26]:
# You can get a dictionary with all the cross references for a gene
gene2 = ham_analysis.get_gene_by_id(2)
print(gene2.get_dict_xref())

{u'protId': 'HUMAN2', u'id': '2', u'geneId': 'HUMANg2'}


#### HOG

###### The cool features of HOGs are that you can:

In [27]:
# demo hog
hog1 = ham_analysis.get_hog_by_id(1)
hog3 = ham_analysis.get_hog_by_id(3)

In [28]:
# get all descendant genes
desc_genes = hog1.get_all_descendant_genes()
print(desc_genes)

[Gene(51), Gene(21), Gene(1), Gene(11), Gene(31), Gene(41)]


In [29]:
# get all descendant genes clustered by species
desc_genes_clustered = hog3.get_all_descendant_genes_clustered_by_species()
for species, genes in desc_genes_clustered.items():
    print(species, genes)

(<pyham.genome.ExtantGenome object at 0x11203f110>, [Gene(3)])
(<pyham.genome.ExtantGenome object at 0x11203f750>, [Gene(33), Gene(34)])
(<pyham.genome.ExtantGenome object at 0x11203f850>, [Gene(53)])
(<pyham.genome.ExtantGenome object at 0x11203f3d0>, [Gene(13), Gene(14)])
(<pyham.genome.ExtantGenome object at 0x11203f550>, [Gene(23)])


In [30]:
# get all descendant levels (internal node ancestral genomes)
desc_level = hog1.get_all_descendant_hog_levels()
for genome in desc_level:
    print(genome.name)

Vertebrata
Mammalia
Euarchontoglires
Primates
Rodents


In [31]:
# get all descendant hogs
desc_hog = hog1.get_all_descendant_hogs()
print(desc_hog)

[<HOG(1)>, <HOG(1.M)>, <HOG(1.M.E)>, <HOG(1.M.E.P)>, <HOG(1.M.E.R)>]


In [32]:
# visit the hog with prefix, postfix and leaf callback functions
# (reading the docs at pyham.abstract.HOG.visit() is required here) 

# ---------------------------------------------------------#
# -- This is an example to apply a function to each leaf --#
# ---------------------------------------------------------#
def print_and_append_leaf(current, child, elem):
    elem.append(child)
    print(child)
    return elem

passed_object = []
return_object = hog1.visit(passed_object, function_extant_gene=print_and_append_leaf)

print("The return object (list of leaves): {}".format(return_object))

Gene(51)
Gene(21)
Gene(1)
Gene(11)
Gene(31)
Gene(41)
The return object (list of leaves): [Gene(51), Gene(21), Gene(1), Gene(11), Gene(31), Gene(41)]


In [33]:
# ------------------------------------------------------------------#
# -- This is an example to apply a function to each node (prefix) --#
# ------------------------------------------------------------------#
def print_and_append_node(current, elem):
    elem.append(current)
    print(current)
    return elem

passed_object = []
return_object = hog1.visit(passed_object, function_prefix=print_and_append_node)

print("The return object (list of internal nodes): {}".format(return_object))

<HOG(1)>
<HOG(1.M)>
<HOG(1.M.E)>
<HOG(1.M.E.P)>
<HOG(1.M.E.R)>
The return object (list of internal nodes): [<HOG(1)>, <HOG(1.M)>, <HOG(1.M.E)>, <HOG(1.M.E.P)>, <HOG(1.M.E.R)>]


In [34]:
# --------------------------------------------------------------------------------------------#
# -- This is an example to apply a function to each node after the recursive call (postfix) --#
# --------------------------------------------------------------------------------------------#
def print_and_append_node(current, child, elem):
    elem.append(child)
    print(child)
    return elem

passed_object = []
return_object = hog1.visit(passed_object, function_postfix=print_and_append_node)

print("The return object (list of internal child nodes): {}".format(return_object))

<HOG(1.M.E.P)>
<HOG(1.M.E.R)>
<HOG(1.M.E)>
<HOG(1.M)>
The return object (list of internal child nodes): [<HOG(1.M.E.P)>, <HOG(1.M.E.R)>, <HOG(1.M.E)>, <HOG(1.M)>]
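
To see why the prefix and postfix callbacks produce different orders, the following toy traversal may help. It is a sketch written for this tutorial, not pyham's `visit()` implementation: the dict-based tree, the `node` helper, and the `on_leaf` / `on_prefix` / `on_postfix` parameter names are assumptions. The tree shape is hand-built to resemble HOG 1.

```python
def node(name, *children):
    """Build a toy internal node; leaves are plain strings."""
    return {"name": name, "children": list(children)}


# Hand-built toy tree shaped like HOG 1 of this tutorial (hypothetical data structure).
toy_hog = node(
    "1", "G51",
    node("1.M", "G21",
         node("1.M.E",
              node("1.M.E.P", "G1", "G11"),
              node("1.M.E.R", "G31", "G41"))),
)


def visit(current, elem, on_leaf=None, on_prefix=None, on_postfix=None):
    """Depth-first walk that threads `elem` through the optional callbacks."""
    if on_prefix:
        # Called when a node is entered, before any of its children.
        elem = on_prefix(current, elem)
    for child in current["children"]:
        if isinstance(child, dict):
            elem = visit(child, elem, on_leaf, on_prefix, on_postfix)
            if on_postfix:
                # Called once the recursion into `child` has returned.
                elem = on_postfix(current, child, elem)
        elif on_leaf:
            elem = on_leaf(current, child, elem)
    return elem


prefix_order = visit(toy_hog, [], on_prefix=lambda cur, acc: acc + [cur["name"]])
postfix_order = visit(toy_hog, [], on_postfix=lambda cur, child, acc: acc + [child["name"]])
leaf_order = visit(toy_hog, [], on_leaf=lambda cur, leaf, acc: acc + [leaf])

print(prefix_order)
print(postfix_order)
print(leaf_order)
```

Because the postfix callback receives the child that has just been processed, the root itself never shows up in the postfix list, which matches the output of the cell above.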


<a id='genomedef'></a>
### Genome


In [35]:
human_genome = ham_analysis.get_extant_genome_by_name("HUMAN")
rodents_genome = ham_analysis.get_ancestral_genome_by_name("Rodents")

# You can get the name
print("Get genome name:")
print("\t -{}".format(human_genome.name))
print("\t -{}".format(rodents_genome.name))
print("\n")

Get genome name:
	 -HUMAN
	 -Rodents




In [36]:
# You can get the related node in the taxonomy
print("Get genome taxon:")
print(human_genome.taxon)
print(rodents_genome.taxon)
print("\n")

Get genome taxon:

--HUMAN

   /-MOUSE
--|
   \-RATNO




In [37]:
# You can get the list of genes associated with these genomes
print("Get genome genes:")
print(human_genome.genes)
print(rodents_genome.genes)
print("\n")

Get genome genes:
[Gene(1), Gene(2), Gene(3), Gene(5)]
[<HOG(1.M.E.R)>, <HOG()>, <HOG()>, <HOG()>]




In [38]:
# You can get the number of genes associated with these genomes
print("Get genome genes number:")
print(human_genome.get_number_genes(singleton=True))
print(human_genome.get_number_genes(singleton=False)) # Here we are not counting the singletons as species genes !
print(rodents_genome.get_number_genes())
print("\n")

Get genome genes number:
4
3
4




In [39]:
for h, gs in rodents.get_ancestral_clustering().items():
    print("HOG: {} -> genes: {}".format(h,gs))

HOG: <HOG(1.M.E.R)> -> genes: [Gene(31), Gene(41)]
HOG: <HOG()> -> genes: [Gene(33)]
HOG: <HOG()> -> genes: [Gene(34)]
HOG: <HOG()> -> genes: [Gene(32)]


<a id='taxonomydef'></a>

### Taxonomy

The pyham.taxonomy.Taxonomy object contains all the information about the species tree structure.

In [40]:
# it contains all the leaf nodes:
print(ham_analysis.taxonomy.leaves)
print("\n")

set([Tree node 'XENTR' (0x112031c5), Tree node 'HUMAN' (0x11203f09), Tree node 'PANTR' (0x11203f0d), Tree node 'MOUSE' (0x11203f15), Tree node 'RATNO' (0x11203f19), Tree node 'CANFA' (0x11203f1d)])




In [41]:
# and the internal nodes:
print(ham_analysis.taxonomy.internal_nodes)
print("\n")

set([Tree node 'Primates' (0x112031f1), Tree node 'Vertebrata' (0x105be875), Tree node 'Rodents' (0x112031bd), Tree node 'Mammalia' (0x105be861), Tree node 'Euarchontoglires' (0x112031e9)])




In [42]:
# The main interest of this object is the taxonomy.tree object.
tree = ham_analysis.taxonomy.tree

# Since it is an ete3 Tree, it provides all of ete3's built-in functionality
print("GenomeName leaf? root?")
print("-----------------------")
for node in ham_analysis.taxonomy.tree.traverse(): # .traverse() is an ete3 Tree method.
    print(node.name, node.is_leaf(), node.is_root())

GenomeName leaf? root?
-----------------------
('Vertebrata', False, True)
('XENTR', True, False)
('Mammalia', False, False)
('Euarchontoglires', False, False)
('CANFA', True, False)
('Primates', False, False)
('Rodents', False, False)
('HUMAN', True, False)
('PANTR', True, False)
('MOUSE', True, False)
('RATNO', True, False)


<a id='compare'></a>
## COMPARE SEVERAL GENOMES

In HAM, you can compare genomes based on the evolutionary history of their genes. 

### Vertical comparison
For example, you can compare the human genome with its ancestor at the level of Vertebrates. This means that you investigate how the ancestral genes (HOGs) in the Vertebrates ancestral genome have evolved (did they stay single copy, duplicate or get lost, or did new genes emerge?) to give rise to their descendants in the human genome.

The comparison is not restricted to an extant genome and its ancestral genome; you can also compare 2 ancestral genomes that are on the same lineage (e.g. the rodents with the mammals).

As shown in the following example:

In [43]:
# Get the genome of interest
human = ham_analysis.get_extant_genome_by_name("HUMAN")
vertebrates = ham_analysis.get_ancestral_genome_by_name("Vertebrata")

# Instantiate the gene mapping!
vertical_human_vertebrates = ham_analysis.compare_genomes_vertically(human, vertebrates) # The order doesn't matter!



Ask the mapper about the evolutionary history between the human and the vertebrates genes.

In [44]:
# The identical genes (that stayed single copy) 
print("HOG at vertebrates -> descendant gene in human")
print(vertical_human_vertebrates.get_retained())
print("\n")

HOG at vertebrates -> descendant gene in human
{<HOG(1)>: Gene(1)}




In [45]:
# The duplicated genes (that have duplicated) 
print("HOG at vertebrates -> list of descendants gene in human")
print(vertical_human_vertebrates.get_duplicated())
print("\n")

HOG at vertebrates -> list of descendants gene in human
{<HOG(3)>: [Gene(3)]}




In [46]:
# The gained genes (that emerged in between)
print("List of human gene")
print(vertical_human_vertebrates.get_gained())
print("\n")

List of human gene
[Gene(5), Gene(2)]




In [47]:
# The lost genes (that have been lost in between) 
print("HOG at vertebrates that are lost")
print(vertical_human_vertebrates.get_lost())
print("\n")

HOG at vertebrates that are lost
set([])
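
The four categories can be summarised with a small classification rule. The sketch below is a simplification for illustration, not pyham's algorithm: the input format (a list of `(gene, duplication_on_path)` pairs per ancestral HOG), the function name `classify_vertical`, and the extra entry `HOG(x)` are all hypothetical. The first two entries are transcribed from the outputs above.

```python
def classify_vertical(ancestor_to_descendants, descendant_genome_genes):
    """Sort ancestral HOGs and descendant genes into the four categories.

    `ancestor_to_descendants` maps each ancestral HOG to a list of
    (descendant gene, duplication_on_path) pairs.
    """
    retained, duplicated, lost = {}, {}, []
    seen = set()
    for hog, paths in ancestor_to_descendants.items():
        genes = [gene for gene, _ in paths]
        seen.update(genes)
        if not paths:
            # No descendant left in the descendant genome.
            lost.append(hog)
        elif any(dup for _, dup in paths):
            # At least one duplication happened on the way down.
            duplicated[hog] = genes
        else:
            retained[hog] = genes[0]
    # Genes of the descendant genome that no ancestral HOG leads to.
    gained = sorted(g for g in descendant_genome_genes if g not in seen)
    return retained, duplicated, gained, lost


paths = {
    "HOG(1)": [("Gene(1)", False)],
    "HOG(3)": [("Gene(3)", True)],
    "HOG(x)": [],  # hypothetical HOG, added only to illustrate the 'lost' case
}
human_genes = ["Gene(1)", "Gene(2)", "Gene(3)", "Gene(5)"]

retained, duplicated, gained, lost = classify_vertical(paths, human_genes)
print(retained, duplicated, gained, lost)
```

Note that a HOG can be reported as duplicated even when a single descendant gene remains, as for HOG 3 above: what matters in this sketch is the duplication flag on the path, not the number of surviving copies.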




### Lateral comparison
Another example is to compare the rodents ancestral genome with the primates ancestral genome through their common ancestor at the level of Euarchontoglires. As in the previous section, you investigate how the ancestral Euarchontoglires genes gave rise to their descendants in both primates and rodents. The cell below applies the same comparison to two extant genomes, human and rat.

In [48]:
# Get the genomes of interest
human = ham_analysis.get_extant_genome_by_name("HUMAN")
mouse = ham_analysis.get_extant_genome_by_name("RATNO") # note: despite the variable name, this is the rat genome

# Instantiate the gene mapping!
lateral_human_mouse = ham_analysis.compare_genomes_lateral(human, mouse) # The order doesn't matter!


Ask the mapper about the evolutionary history of the human and rat genes with respect to Euarchontoglires (their MRCA).

In [49]:
# The identical genes (that stayed single copy) 
print("IDENTICAL GENES")
for hogs, dict_genome_gene in lateral_human_mouse.get_retained().items():
    print("\t- HOG at Euarchontoglires {} is the ancestor of: ".format(hogs))
    for g, gene in dict_genome_gene.items():
        print("\t\t-  {} in {}".format(gene, g))
print("\n")

IDENTICAL GENES
	- HOG at Euarchontoglires <HOG(2.E)> is the ancestor of: 
		-  Gene(2) in HUMAN
	- HOG at Euarchontoglires <HOG(1.M.E)> is the ancestor of: 
		-  Gene(41) in RATNO
		-  Gene(1) in HUMAN
	- HOG at Euarchontoglires <HOG(3.E.1)> is the ancestor of: 
		-  Gene(3) in HUMAN




In [50]:
# The duplicated genes (that have duplicated) 
print("DUPLICATED GENES")
for hogs, dict_genome_genes in lateral_human_mouse.get_duplicated().items():
    print("\t- HOG at Euarchontoglires {} is the ancestor of: ".format(hogs))
    for g, genes in dict_genome_genes.items():
        print("\t\t-  {} in {}".format(genes, g))
print("\n")

DUPLICATED GENES




In [51]:
# The gained genes (that emerged in between)
print("GAINED GENES")
for genome, gains in lateral_human_mouse.get_gained().items():
    print("\t- Genome {} have gained:".format(genome))
    for g in gains:
        print("\t\t-  {}".format(g))
print("\n")

GAINED GENES
	- Genome RATNO have gained:
		-  Gene(43)
	- Genome HUMAN have gained:
		-  Gene(5)




In [52]:
# The lost genes (that have been lost in between) 
print("LOST GENES")
for hog, genomes in lateral_human_mouse.get_lost().items():
    print("\t- HOG at Euarchontoglires {} have been lost in ".format(hog))
    for g in genomes:
        print("\t\t- {}".format(g))
print("\n")

LOST GENES
	- HOG at Euarchontoglires <HOG(2.E)> have been lost in 
		- RATNO
	- HOG at Euarchontoglires <HOG(3.E.2)> have been lost in 
		- RATNO
		- HUMAN
	- HOG at Euarchontoglires <HOG(3.E.1)> have been lost in 
		- RATNO
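
One way to read the lateral results is as a per-genome tally against the HOGs of the common ancestor. The sketch below rebuilds the retained and lost outputs of this section from a hand-transcribed table. It is an illustration only: the `DESCENDANTS` table and `lateral_summary` are hypothetical, and counting descendant genes is a simplification of what pyham actually does.

```python
# Descendants of each Euarchontoglires HOG in each genome, transcribed by hand
# from the outputs printed in this section.
DESCENDANTS = {
    "HUMAN": {"1.M.E": ["Gene(1)"], "2.E": ["Gene(2)"], "3.E.1": ["Gene(3)"], "3.E.2": []},
    "RATNO": {"1.M.E": ["Gene(41)"], "2.E": [], "3.E.1": [], "3.E.2": []},
}


def lateral_summary(descendants):
    """Group ancestral HOGs by what happened to them in each genome."""
    retained, duplicated, lost = {}, {}, {}
    for genome, per_hog in descendants.items():
        for hog, genes in per_hog.items():
            if len(genes) == 0:
                lost.setdefault(hog, []).append(genome)
            elif len(genes) == 1:
                retained.setdefault(hog, {})[genome] = genes[0]
            else:
                # Simplification: several copies are treated as a duplication.
                duplicated.setdefault(hog, {})[genome] = genes
    return retained, duplicated, lost


retained_lat, duplicated_lat, lost_lat = lateral_summary(DESCENDANTS)
print(retained_lat)
print(duplicated_lat)
print(lost_lat)
```

The same HOG can appear in several categories at once, for example retained in human and lost in rat, which is why the results are keyed by HOG first and by genome second.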




<a id='hvis'></a>
## iHam

iHam is a tool to visualise how the member genes of a HOG cluster according to the ancestral genes they descend from.

In other words, given a taxonomic range of interest, the extant genes of the HOG are grouped by their ancestral gene at that taxonomic range.

**Run the following code to see an example; its description is given below:**  

In [53]:
hog_3 = ham_analysis.get_hog_by_id(3)
ham_analysis.create_iHam(hog=hog_3, outfile="output/HOG{}.html".format(hog_3.hog_id))

from IPython.display import IFrame
IFrame("output/HOG{}.html".format(hog_3.hog_id), width=700, height=350)

iHam is composed of two parts:
- **Species tree**: This part allows you to select the level of interest (click to freeze the iHam at this level)
- **Genes panel**: Each line represents an extant genome, while each square represents an extant gene.

If you look at the level of Vertebrata (hover over the node), you will see all the genes that descend from one single ancestral gene at this level.

If you now look at the level of Euarchontoglires (again, hover over the node), you can see that the genes are split by a vertical line. This line separates ancestral gene groups. Each column created by these vertical separators represents an ancestral gene at the level of interest and contains all the extant genes that descend from it. 

Here we can observe one column at the level of Mammalia and 2 columns at the level of Euarchontoglires, which means that a duplication occurred between those two levels.
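
The column logic can be sketched in a few lines of plain Python. This is only an illustration of the grouping idea: the `membership` table, its gene and ancestor labels, and the `columns_at` helper are all hypothetical and are not produced by pyham or iHam.

```python
from collections import defaultdict

# Hypothetical membership table: extant gene -> ancestral gene at each level.
membership = {
    "sp1_g1": {"Mammalia": "anc1", "Euarchontoglires": "anc1.a"},
    "sp1_g2": {"Mammalia": "anc1", "Euarchontoglires": "anc1.b"},
    "sp2_g1": {"Mammalia": "anc1", "Euarchontoglires": "anc1.a"},
    "sp3_g1": {"Mammalia": "anc1"},  # species outside Euarchontoglires
}


def columns_at(level):
    """Group extant genes by their ancestral gene at `level` (one group per column)."""
    columns = defaultdict(list)
    for gene, ancestors in sorted(membership.items()):
        if level in ancestors:
            columns[ancestors[level]].append(gene)
    return dict(columns)


mammalia_columns = columns_at("Mammalia")
euarchontoglires_columns = columns_at("Euarchontoglires")
print(len(mammalia_columns), mammalia_columns)
print(len(euarchontoglires_columns), euarchontoglires_columns)
```

Going from one column at the older level to two columns at the younger level is the signature of a duplication on the branch between them.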

### iHam : single HOG

In [54]:
# Select an HOG
hog_2 =  ham_analysis.get_hog_by_id(2)

# create the iHam for it and store it into an html file
output_name = "output/HOG{}.html".format(hog_2.hog_id)
ham_analysis.create_iHam(hog=hog_2,outfile=output_name)

# Display the generated iHam page inline in the notebook
from IPython.display import IFrame
IFrame(output_name, width=700, height=350)

### iHam : single HOG at a specific taxonomic range (ancestral genome)

In [55]:
# select an HOG
hog_3 =  ham_analysis.get_hog_by_id(3)

# then a genome
mammals = ham_analysis.get_ancestral_genome_by_name("Mammalia")

# and you get the related HOG for this genome
hog3_Mammalia = hog_3.get_at_level(mammals)[0] # this function returns a list; we select its first and only element.

# create the iHam for it and save it into an html file
output_name = "output/HOG{}.html".format(hog3_Mammalia.hog_id)
ham_analysis.create_iHam(hog=hog3_Mammalia,outfile=output_name)

# Display the generated iHam page inline in the notebook
from IPython.display import IFrame
IFrame(output_name, width=700, height=350)

### iHam : all HOGs for a specific taxonomic range (ancestral genome)

In [56]:
#  Select a genome
mammals = ham_analysis.get_ancestral_genome_by_name("Mammalia")

# Iterate over all its HOGs
for hog in mammals.genes:
    # and create the iHam for each
    ham_analysis.create_iHam(hog=hog,outfile="output/HOG-mammals{}.html".format(hog.hog_id))

<a id='tprofile'></a>
## TREEPROFILE

TreeProfile is a tool to visualise how genes have evolved along a phylogenetic tree in terms of evolutionary events (duplications, losses, gains).

**Run the following code to see an example; its description is given below:**  

In [57]:
treeprofile = ham_analysis.create_tree_profile(outfile="output/tp.html")

from IPython.display import IFrame
IFrame("output/tp.html", width=680, height=480)

TreeProfile is composed of:
- **Species tree**: This is the reference taxonomy used by HAM.
- **Internal node stack histogram**: Represents the proportion of genes in the genome at the level of interest that have either stayed single copy (**retained**) or have been **duplicated**, **lost** or **gained** on the branch leading to the current genome. The genome size equals the sum of the number of genes in the **retained**, **duplicated** and **gained** categories. You can also display the number of phylogenetic events.
- **Legend**: Provides a description of the stack histogram bars and the scale.
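
As a quick arithmetic check of the genome size rule stated in the list, here is a tiny sketch. The counts are hypothetical and not taken from a pyham run, and the way the proportions are normalised (over all four categories) is an assumption of this example.

```python
# Hypothetical event counts for one genome (not taken from a real pyham run).
events = {"retained": 1, "duplicated": 1, "gained": 2, "lost": 0}

# Lost genes are no longer present, so they do not contribute to the genome size.
genome_size = events["retained"] + events["duplicated"] + events["gained"]

# Share of each category among all the events in the stacked bar
# (normalising over all four categories is an assumption of this sketch).
total_events = sum(events.values())
proportions = {name: count / total_events for name, count in events.items()}

print(genome_size)
print(proportions)
```

With these counts the genome holds four genes, and lost genes would only change the proportions, never the genome size.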

**You can customize the view using the top-right spanner button to rescale the tree horizontally and vertically. Plus, you can click and drag histograms if they overlap!**

##  Jupyter notebooks do not support remote PDF embedding, so the help button (bottom right) does not work. We provide here an IFrame of the help PDF.

In [58]:
from IPython.display import IFrame
IFrame('https://cdn.rawgit.com/DessimozLab/pyham/fc01fb94/help.pdf', width=1000, height=500)

### TreeProfile for a specific HOG
If you pass a single HOG to the 'hog' argument, pyHam builds a treeprofile using only the information of that particular HOG and represents the evolutionary history of this HOG in terms of duplicated, lost and retained genes across each taxonomic range.


In [59]:
# select an HOG
hog_3 =  ham_analysis.get_hog_by_id(3)

# create the treeprofile object
treeprofile_hog_3 = ham_analysis.create_tree_profile(hog=hog_3, outfile="output/tph3.html")

from IPython.display import IFrame
IFrame("output/tph3.html", width=680, height=480)