# Welcome to the HAM Tool Box!

This IPython notebook helps you take your first steps with HAM.

In this tutorial we will cover how to:
    - set up a HAM analysis using the different options available.
    - use the different queries to fetch the information you want.
    - explore what all HAM objects have to offer!
    - compare several genomes through their HOGs.
    - run hogvis (single HOG visualisation)
    - run treeprofile (species tree with nodes annotated with evolutionary events)
    

## SET UP HAM

### Required import

In [1]:
#  This is the HAM package
import ham

#  OPTIONAL: only if you want to have the logger information printed
import logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)-12s %(levelname)-8s %(message)s")
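
If the INFO messages become too verbose, they can be silenced for HAM only with the standard `logging` module. The sketch below is a minimal example that relies on the logger names `ham.ham` and `ham.parsers` visible in the output of this notebook; the warning message is purely illustrative.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)-12s %(levelname)-8s %(message)s")

# "ham" is the parent of the "ham.ham" and "ham.parsers" loggers, so raising
# its level hides their INFO messages while keeping warnings and errors.
logging.getLogger("ham").setLevel(logging.WARNING)

parser_logger = logging.getLogger("ham.parsers")
parser_logger.info("Species HUMAN created.")        # no longer printed
parser_logger.warning("illustrative warning only")  # still printed
```

Other loggers keep the INFO level set by `basicConfig`.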

### First, we need a species tree and an orthoXML file!

In [2]:
#  Select a nwk file as a taxonomy reference
nwk_path = './tests/data/simpleEx.nwk'
#  And extract the newick tree as a string
tree_str = ham.utils.get_newick_string(nwk_path, type="nwk")

# Then you select your favorite orthoXML file
orthoxml_path = './tests/data/simpleEx.orthoxml'
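
To make the role of the species tree more concrete, here is a small sketch of what a Newick string with named internal nodes looks like. The string is an assumption reconstructed from the genome names printed later in this notebook; the actual content of `simpleEx.nwk` is not shown here. The regular expressions only handle this simple case (no branch lengths, no quoted labels).

```python
import re

# Assumed Newick string (reconstructed, not read from simpleEx.nwk).
nwk = "(XENTR,(((HUMAN,PANTR)Primates,(MOUSE,RATNO)Rodents)Euarchontoglires,CANFA)Mammalia)Vertebrata;"

# Leaf labels directly follow "(" or ",".
leaves = re.findall(r"[(,]([A-Za-z_]+)", nwk)
# Internal node labels directly follow ")".
internal_names = re.findall(r"\)([A-Za-z_]+)", nwk)

print("Leaves:", leaves)
print("Internal names:", internal_names)
```

The leaves correspond to extant genomes and the internal names to ancestral genomes.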

### Let's feed HAM with some HOGs and a tree.

In [3]:
# ham.HAM is the main object that contains all information and functionalities.
ham_analysis = ham.HAM(tree_str, orthoxml_path, use_internal_name=True)

2017-05-11 17:04:25,575 ham.ham      INFO     Build taxonomy: completed.
2017-05-11 17:04:25,577 ham.parsers  INFO     Species HUMAN created. 
2017-05-11 17:04:25,578 ham.parsers  INFO     Species PANTR created. 
2017-05-11 17:04:25,579 ham.parsers  INFO     Species CANFA created. 
2017-05-11 17:04:25,581 ham.parsers  INFO     Species MOUSE created. 
2017-05-11 17:04:25,582 ham.parsers  INFO     Species RATNO created. 
2017-05-11 17:04:25,583 ham.parsers  INFO     Species XENTR created. 
2017-05-11 17:04:25,586 ham.ham      INFO     Parse Orthoxml: 3 top level hogs and 19 extant genes extract.
2017-05-11 17:04:25,587 ham.ham      INFO     Set up HAM analysis: ready to go with 3 hogs founded within 6 species.


### Alternative options: filtering and taxonomic range naming!

#### Taxonomic range naming
If you don't want to use the internal names defined by the newick species tree (or if it has support values as internal names), you can use HAM's automatic naming, which concatenates the names of a node's children.
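
The naming by concatenation can be pictured with the following sketch. It is not HAM's implementation: the nested-tuple tree and the helper functions `leaf_names` and `artificial_names` are hypothetical, and the topology is an assumption reconstructed from the names printed in this notebook.

```python
# Toy species tree as nested tuples; leaves are plain strings (assumed topology).
tree = ("XENTR", ((("HUMAN", "PANTR"), ("MOUSE", "RATNO")), "CANFA"))

def leaf_names(node):
    """Return the leaf names below a node, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaf_names(child))
    return names

def artificial_names(node):
    """Return the concatenated name of every internal node, root first."""
    if isinstance(node, str):
        return []
    names = ["/".join(leaf_names(node))]
    for child in node:
        names.extend(artificial_names(child))
    return names

for name in artificial_names(tree):
    print(name)
```

Each internal node ends up named after the extant genomes it covers, joined with `/`.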


In [4]:
# the use_internal_name argument does the job
ham_analysis_no_name = ham.HAM(tree_str, orthoxml_path, use_internal_name=False)

# In the previous section we used the internal names of the newick tree:
print("Ancestral genomes name using internal names:")
for ag in ham_analysis.taxonomy.internal_nodes:
    print("\t- {}".format(ag.name))
    
# Here we use the artificial internal names built by ham:
print("Ancestral genomes name using artificial ham internal names:")
for ag in ham_analysis_no_name.taxonomy.internal_nodes:
    print("\t- {}".format(ag.name))

#  If you are using use_internal_name=False, the following function returns an ASCII representation
#  of the newick tree with artificial names. It can be used even before instantiating the HAM object,
#  which is useful to catch naming mistakes early in a script.

print(ham.utils.previsualize_taxonomy(tree_str))


2017-05-11 17:04:25,597 ham.ham      INFO     Build taxonomy: completed.
2017-05-11 17:04:25,599 ham.parsers  INFO     Species HUMAN created. 
2017-05-11 17:04:25,601 ham.parsers  INFO     Species PANTR created. 
2017-05-11 17:04:25,602 ham.parsers  INFO     Species CANFA created. 
2017-05-11 17:04:25,603 ham.parsers  INFO     Species MOUSE created. 
2017-05-11 17:04:25,604 ham.parsers  INFO     Species RATNO created. 
2017-05-11 17:04:25,605 ham.parsers  INFO     Species XENTR created. 
2017-05-11 17:04:25,607 ham.ham      INFO     Parse Orthoxml: 3 top level hogs and 19 extant genes extract.
2017-05-11 17:04:25,608 ham.ham      INFO     Set up HAM analysis: ready to go with 3 hogs founded within 6 species.


Ancestral genomes name using internal names:
	- Euarchontoglires
	- Mammalia
	- Primates
	- Rodents
	- Vertebrata
Ancestral genomes name using artificial ham internal names:
	- XENTR/HUMAN/PANTR/MOUSE/RATNO/CANFA
	- HUMAN/PANTR/MOUSE/RATNO
	- HUMAN/PANTR/MOUSE/RATNO/CANFA
	- MOUSE/RATNO
	- HUMAN/PANTR

                                   /-XENTR
                                  |
                                  |                                                               /-HUMAN
-XENTR/HUMAN/PANTR/MOUSE/RATNO/CANFA                                                   /HUMAN/PANTR
                                  |                                                   |           \-PANTR
                                  |                             /HUMAN/PANTR/MOUSE/RATNO
                                  |                            |                      |           /-MOUSE
                                   \HUMAN/PANTR/MOUSE/RATNO/CANFA                      \MOUSE/RATNO
          

#### Pre-parsing filtering
If you have a large orthoXML file, or you only want to parse the information of interest, you can use the filter option.

In [5]:
# First you instantiate an empty ParserFilter object
filter_ham = ham.ParserFilter()

# Then you add the HOGs you want to the filter using the following functions (these 3 calls all select the same HOG)
filter_ham.add_hogs_via_hogId([2]) # by its top-level HOG id
filter_ham.add_hogs_via_GeneIntId([2]) # by the unique id of a gene that belongs to the HOG of interest
filter_ham.add_hogs_via_GeneExtId(["HUMAN2"]) # by the external id of a gene that belongs to the HOG of interest

# Finally, you set up the HAM object as you learned before, passing the filter via the filter_object argument
ham_analysis_filter = ham.HAM(tree_str, orthoxml_path, use_internal_name=True, filter_object=filter_ham)

# The log information below differs from the one printed previously!

2017-05-11 17:04:25,620 ham.ham      INFO     Build taxonomy: completed.
2017-05-11 17:04:25,622 ham.ham      INFO     Filtering Indexing of Orthoxml done: 1 top level hogs and 4 extant genes will be extract.
2017-05-11 17:04:25,623 ham.parsers  INFO     Species HUMAN created. 
2017-05-11 17:04:25,624 ham.parsers  INFO     Species PANTR created. 
2017-05-11 17:04:25,625 ham.parsers  INFO     Species CANFA created. 
2017-05-11 17:04:25,627 ham.parsers  INFO     Species MOUSE created. 
2017-05-11 17:04:25,628 ham.parsers  INFO     Species RATNO created. 
2017-05-11 17:04:25,630 ham.parsers  INFO     Species XENTR created. 
2017-05-11 17:04:25,633 ham.ham      INFO     Parse Orthoxml: 1 top level hogs and 4 extant genes extract.
2017-05-11 17:04:25,634 ham.ham      INFO     Set up HAM analysis: ready to go with 1 hogs founded within 6 species.
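
The three `add_hogs_via_*` calls above all resolve to the same top-level HOG. The sketch below illustrates that idea with the standard library only. It is not how HAM indexes the file: the XML is a simplified, namespace-free toy document with hypothetical genes, not `simpleEx.orthoxml`.

```python
import xml.etree.ElementTree as ET

# Toy orthoXML-like document (hypothetical data, no namespace).
TOY_ORTHOXML = """<orthoXML>
  <species name="HUMAN"><database><genes>
    <gene id="1" protId="HUMAN1"/>
    <gene id="2" protId="HUMAN2"/>
  </genes></database></species>
  <species name="MOUSE"><database><genes>
    <gene id="31" protId="MOUSE1"/>
    <gene id="32" protId="MOUSE2"/>
  </genes></database></species>
  <groups>
    <orthologGroup id="1"><geneRef id="1"/><geneRef id="31"/></orthologGroup>
    <orthologGroup id="2">
      <orthologGroup><geneRef id="2"/><geneRef id="32"/></orthologGroup>
    </orthologGroup>
  </groups>
</orthoXML>"""

root = ET.fromstring(TOY_ORTHOXML)

# External id (protId) -> unique gene id.
ext_to_int = {gene.get("protId"): gene.get("id") for gene in root.iter("gene")}

# Unique gene id -> id of the top-level group containing it (nested groups included).
gene_to_top_hog = {}
for top_hog in root.find("groups").findall("orthologGroup"):
    for ref in top_hog.iter("geneRef"):
        gene_to_top_hog[ref.get("id")] = top_hog.get("id")

selected = {
    "2",                                    # by top-level group id
    gene_to_top_hog["2"],                   # by unique gene id
    gene_to_top_hog[ext_to_int["HUMAN2"]],  # by external gene id
}
print(selected)
```

All three routes add the same group id to the set, so only one top-level group would need to be parsed.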


## QUERIES: HOW TO TALK WITH HAM?

We can split the queries into different types:
    - ExtantGene
    - HOG
    - ExtantGenome
    - AncestralGenome
    - Taxon

### ExtantGene queries

In [6]:
# Get a gene by its unique (orthoxml) id
gene_human3 = ham_analysis.get_gene_by_id(3)
print(gene_human3)

# Get a list of genes that match an external id
potential_genes_human3 = ham_analysis.get_genes_by_external_id("HUMAN3")
print(potential_genes_human3[0])

# You should see the same gene printed twice below!

# You can also get all genes created as a list
list_genes = ham_analysis.get_list_extant_genes()

# or as a dictionary (key <-> unique id and value <-> gene)
dict_genes = ham_analysis.get_dict_extant_genes()

Gene(3)
Gene(3)
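
The lookup by unique id returns a single gene, while the lookup by external id returns a list, presumably because external ids are not guaranteed to be unique. A minimal sketch of the two kinds of index, using hypothetical `(unique id, external id)` pairs rather than HAM objects:

```python
from collections import defaultdict

# Hypothetical genes; the duplicated external id is deliberate.
genes = [(1, "HUMAN1"), (2, "HUMAN2"), (3, "HUMAN3"), (33, "HUMAN3")]

by_unique_id = {}                  # one gene per unique id
by_external_id = defaultdict(list) # possibly several genes per external id
for unique_id, external_id in genes:
    by_unique_id[unique_id] = external_id
    by_external_id[external_id].append(unique_id)

print(by_unique_id[3])
print(by_external_id["HUMAN3"])
```

This is why the cell above takes the first element of the list returned for `"HUMAN3"`.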


### HOG queries

In [7]:
# Get a HOG by its top level unique id
HOG_3 = ham_analysis.get_hog_by_id(3)
print(HOG_3)

# Get a HOG from one of its genes
HOG_3 = ham_analysis.get_hog_by_gene(gene_human3)
print(HOG_3)

# You should see the same HOG printed twice below!

# You can also get all top-level HOGs created as a list
list_toplevel_hogs = ham_analysis.get_list_top_level_hogs()

# or as a dictionary (key <-> top-level id and value <-> HOG)
dict_toplevel_hogs = ham_analysis.get_dict_top_level_hogs()

<HOG(3)>
<HOG(3)>


### ExtantGenome queries

In [8]:
# Get an extant genome by its name
genome_human = ham_analysis.get_extant_genome_by_name("HUMAN")
print(genome_human)

# You can also get all extant genomes created as a list
list_genome = ham_analysis.get_list_extant_genomes()
print("List of species:")
for g in list_genome:
    print("\t- {}".format(g.name))

HUMAN
List of species:
	- XENTR
	- HUMAN
	- MOUSE
	- PANTR
	- CANFA
	- RATNO


### AncestralGenome queries

In [9]:
# Get an ancestral genome by its name
genome_rodents_1 = ham_analysis.get_ancestral_genome_by_name("Rodents")
print(genome_rodents_1)

# And in case you specified use_internal_name=False ! 
genome_rodents_2 = ham_analysis_no_name.get_ancestral_genome_by_name("MOUSE/RATNO")
print(genome_rodents_2)

# Get an ancestral genome by looking at the MRCA (most recent common ancestor) of 2+ genomes
# First you get the descendant genomes as we have seen above
genome_rat = ham_analysis.get_extant_genome_by_name("RATNO")
genome_mouse = ham_analysis.get_extant_genome_by_name("MOUSE")
# Then you get the corresponding MRCA ancestral genome
genome_rodents_3 = ham_analysis.get_ancestral_genome_by_mrca_of_genome_set({genome_rat, genome_mouse})
print(genome_rodents_3)

# You can also get an ancestral genome by its taxonomy node
taxon = ham_analysis.get_taxon_by_name("Rodents") # as seen in the next section
genome_rodents_4 = ham_analysis.get_ancestral_genome_by_taxon(taxon)
print(genome_rodents_4)

# You can also get all ancestral genomes created as a list
list_genome = ham_analysis.get_list_ancestral_genomes()
print("\n List of ancestral genomes:")
for g in list_genome:
    print("\t- {}".format(g.name))
    
    
# The same list for the filtered HAM object
list_genome = ham_analysis_filter.get_list_ancestral_genomes()
print("\n Here for the filter version of HAM, the list of ancestral genomes:")
for g in list_genome:
    print("\t- {}".format(g.name))
print("'Vertebrata' is not present in the filtered list because it was not required during parsing")

Rodents
MOUSE/RATNO
Rodents
Rodents

 List of ancestral genomes:
	- Euarchontoglires
	- Mammalia
	- Primates
	- Rodents
	- Vertebrata

 Here for the filter version of HAM, the list of ancestral genomes:
	- Rodents
	- Mammalia
	- Primates
	- Euarchontoglires
'Vertebrata' is not present in the filtered list because it was not required during parsing
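
The MRCA lookup used in the cell above can be pictured with a plain parent map. This is only a sketch, not HAM's implementation: `PARENT`, `lineage` and `mrca` are hypothetical names, and the topology is an assumption reconstructed from the genome names printed in this notebook.

```python
# Child -> parent map of the assumed species tree.
PARENT = {
    "XENTR": "Vertebrata", "Mammalia": "Vertebrata",
    "CANFA": "Mammalia", "Euarchontoglires": "Mammalia",
    "Primates": "Euarchontoglires", "Rodents": "Euarchontoglires",
    "HUMAN": "Primates", "PANTR": "Primates",
    "MOUSE": "Rodents", "RATNO": "Rodents",
}

def lineage(name):
    """Return the path from a node up to the root, the node itself included."""
    path = [name]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def mrca(names):
    """Return the most recent common ancestor of a set of node names."""
    names = list(names)
    common = set(lineage(names[0]))
    for other in names[1:]:
        common &= set(lineage(other))
    # Walking up from any member, the first shared node is the MRCA.
    for node in lineage(names[0]):
        if node in common:
            return node

print(mrca({"RATNO", "MOUSE"}))
print(mrca({"HUMAN", "CANFA"}))
```

Adding more genomes to the set can only move the result towards the root.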


### Taxon queries

In [10]:
# You can get a taxonomy node by its name
taxon = ham_analysis.get_taxon_by_name("Rodents") 
print(taxon.name)

taxon = ham_analysis.get_taxon_by_name("HUMAN") 
print(taxon.name)

Rodents
HUMAN
