In [1]:
%load_ext autoreload
%autoreload 2

import sys
# Only needed when DDOT is not installed: points Python at a local copy of the ddot repository
sys.path.insert(0, '/cellar/users/mikeyu/DeepTranslate/ddot')

from ddot import Ontology





# Tutorial


An ontology is a hierarchical arrangement of two types of nodes: (1)
genes at the leaves of the hierarchy and (2) terms at intermediate
levels of the hierarchy. The hierarchy can be thought of as a directed
acyclic graph (DAG), in which each node can have multiple children or
multiple parent nodes. DAGs are a generalization of trees
(a.k.a. dendrograms), in which each node has at most one parent.
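To make the DAG/tree distinction concrete, here is a small plain-Python sketch, independent of DDOT, that encodes the toy ontology used in this tutorial as (child, parent) pairs and checks both properties. The helpers `parents_of`, `is_acyclic`, and `is_tree` are hypothetical illustrations, not DDOT functions; acyclicity is checked with Kahn's topological-sort algorithm.

```python
from collections import defaultdict, deque

# Toy ontology as (child, parent) pairs: term-term edges followed by gene-term edges
edges = [('S3', 'S1'), ('S4', 'S1'), ('S5', 'S1'), ('S6', 'S2'),
         ('S1', 'S0'), ('S2', 'S0'),
         ('A', 'S3'), ('B', 'S3'), ('C', 'S3'), ('C', 'S4'), ('D', 'S4'),
         ('E', 'S5'), ('F', 'S5'), ('G', 'S6'), ('H', 'S6')]

def parents_of(edges):
    """Map each child node to the set of its parents."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)
    return parents

def is_acyclic(edges):
    """Kahn's algorithm: the graph is a DAG iff every node can be removed in topological order."""
    nodes = {node for edge in edges for node in edge}
    indegree = {node: 0 for node in nodes}
    out = defaultdict(list)
    for child, parent in edges:
        out[child].append(parent)
        indegree[parent] += 1
    queue = deque(node for node in nodes if indegree[node] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in out[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return visited == len(nodes)

def is_tree(edges):
    """A tree is a DAG in which every node has at most one parent."""
    return is_acyclic(edges) and all(len(p) <= 1 for p in parents_of(edges).values())

print(is_acyclic(edges))  # True
print(is_tree(edges))     # False: gene C has two parents, S3 and S4
```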

The DDOT Python library provides many functions for assembling,
analyzing, and visualizing ontologies. The main functionalities are
implemented in an object-oriented manner by an "Ontology" class. This
class can handle both data-driven ontologies and manually curated
ontologies such as the Gene Ontology.


# Creating an Ontology object

An object of the Ontology class can be created in several ways. To demonstrate
this, we will build the following ontology:

<img src="https://github.com/michaelkyu/ontology/blob/master/docs/toy_ontology.png?raw=true" width="500" align="left">

## Through the `__init__` constructor

In [25]:
# Connections from child terms to parent terms
hierarchy = [('S3', 'S1'),
             ('S4', 'S1'),
             ('S5', 'S1'),
             ('S6', 'S2'),
             ('S1', 'S0'),
             ('S2', 'S0')]

# Connections from genes to terms
mapping = [('A', 'S3'),
           ('B', 'S3'),
           ('C', 'S3'),
           ('C', 'S4'),
           ('D', 'S4'),
           ('E', 'S5'),
           ('F', 'S5'),
           ('G', 'S6'),
           ('H', 'S6')]

# Construct ontology
ont = Ontology(hierarchy, mapping)

In [26]:
# Print summary
print(ont)

8 genes, 7 terms, 9 gene-term relations, 6 term-term relations
node_attributes: []
edge_attributes: []
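The counts in this summary follow directly from the two edge lists passed to the constructor. As a plain-Python sketch (not DDOT's internal code), assuming genes are the first element of each `mapping` pair, every other node is a term, and there are no duplicate edges:

```python
hierarchy = [('S3', 'S1'), ('S4', 'S1'), ('S5', 'S1'),
             ('S6', 'S2'), ('S1', 'S0'), ('S2', 'S0')]
mapping = [('A', 'S3'), ('B', 'S3'), ('C', 'S3'), ('C', 'S4'), ('D', 'S4'),
           ('E', 'S5'), ('F', 'S5'), ('G', 'S6'), ('H', 'S6')]

# Genes are the child side of gene-term pairs; every other node is a term
genes = sorted({gene for gene, _ in mapping})
terms = sorted({node for edge in hierarchy for node in edge} | {term for _, term in mapping})

summary = (f"{len(genes)} genes, {len(terms)} terms, "
           f"{len(mapping)} gene-term relations, {len(hierarchy)} term-term relations")
print(summary)  # 8 genes, 7 terms, 9 gene-term relations, 6 term-term relations
```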


## From a tab-delimited table or pandas DataFrame


In [23]:
ont.to_table(output='toy_ontology.txt', term_2_term=True)

   Parent  Child      EdgeType
0      S2     S6  Child-Parent
1      S1     S3  Child-Parent
2      S1     S4  Child-Parent
3      S1     S5  Child-Parent
4      S0     S1  Child-Parent
5      S0     S2  Child-Parent
6      S3      A     Gene-Term
7      S3      C     Gene-Term
8      S4      C     Gene-Term
9      S3      B     Gene-Term


In [24]:
ont = Ontology.from_table('toy_ontology.txt')
print(ont)

8 genes, 7 terms, 9 gene-term relations, 6 term-term relations
node_attributes: []
edge_attributes: [2]
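The `EdgeType` column is what distinguishes the two kinds of relations in a single table. A rough plain-Python sketch of that split, assuming a tab-delimited file with a header row and no index column, and treating rows labeled `Gene-Term` as gene-term mappings and all others as term-term edges (an illustration only, not DDOT's `from_table` code):

```python
import csv
import io

# A few rows in the same Parent / Child / EdgeType layout as the table above
table_text = (
    "Parent\tChild\tEdgeType\n"
    "S2\tS6\tChild-Parent\n"
    "S0\tS1\tChild-Parent\n"
    "S3\tA\tGene-Term\n"
    "S4\tC\tGene-Term\n"
)

hierarchy, mapping = [], []
for row in csv.DictReader(io.StringIO(table_text), delimiter='\t'):
    edge = (row['Child'], row['Parent'])  # (child, parent), the order the constructor expects
    if row['EdgeType'] == 'Gene-Term':
        mapping.append(edge)
    else:
        hierarchy.append(edge)

print(hierarchy)  # [('S6', 'S2'), ('S1', 'S0')]
print(mapping)    # [('A', 'S3'), ('C', 'S4')]
```

The two lists could then be passed to `Ontology(hierarchy, mapping)` as in the constructor example.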

In [None]:
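# Write gene-term and term-term relations to separate files (both kinds are written unless one is disabled)
ont.to_table(output='toy_ontology.gene_2_term.txt', gene_2_term=True, term_2_term=False)
ont.to_table(output='toy_ontology.term_2_term.txt', gene_2_term=False, term_2_term=True)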

In [None]:
ont = Ontology.from_table('toy_ontology.txt', 

## From the Network Data Exchange (NDEx; requires a user account at http://ndexbio.org/)

In [30]:
# Replace with your own NDEx user account
ndex_user, ndex_pass = 'ddot_test', 'ddot_test'

url, _ = ont.to_ndex(ndex_user=ndex_user, ndex_pass=ndex_pass)
print(url)

In [34]:
ont = Ontology.from_ndex('http://dev2.ndexbio.org/v2/network/fccec840-2b9d-11e8-84e4-0660b7976219')




In [35]:
print(ont)

8 genes, 7 terms, 9 gene-term relations, 6 term-term relations
node_attributes: [u'NodeType', 'name', u'x_pos', u'isRoot', u'Vis:Shape', u'y_pos', u'Label', u'Vis:Border Paint', u'Vis:Size', u'Vis:Fill Color', u'Size']
edge_attributes: [u'Is_Tree_Edge', u'Vis:Visible', u'EdgeType']


# Inspecting the structure of an ontology

An Ontology object contains seven attributes:

* ``genes`` : List of gene names
* ``terms`` : List of term names
* ``gene_2_term`` : dictionary mapping a gene name to a list of terms connected to that gene. Terms are represented as their 0-based index in ``terms``.
* ``term_2_gene`` : dictionary mapping a term name to a list of genes connected to that term. Genes are represented as their 0-based index in ``genes``.
* ``child_2_parent`` : dictionary mapping a child term to its parent terms.
* ``parent_2_child`` : dictionary mapping a parent term to its child terms.
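* ``term_sizes`` : A list of each term's size, i.e. the number of unique genes contained within this term and its descendants. The order of this list is the same as ``terms``. ``term_sizes[i]`` equals ``len(self.term_2_gene[self.terms[i]])`` only when gene-term annotations have been propagated to ancestor terms; in the unpropagated toy ontology below, ``term_2_gene['S0']`` is empty even though S0 contains all 8 genes.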
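As a plain-Python sketch of how these attributes relate to the original edge lists (not DDOT's implementation), the following rebuilds them for the toy ontology. It assumes terms and genes are sorted by name, which matches the outputs below; DDOT stores parents as tuples, whereas this sketch uses lists. `genes_under` is a hypothetical helper that propagates genes up from descendant terms to compute sizes.

```python
hierarchy = [('S3', 'S1'), ('S4', 'S1'), ('S5', 'S1'),
             ('S6', 'S2'), ('S1', 'S0'), ('S2', 'S0')]
mapping = [('A', 'S3'), ('B', 'S3'), ('C', 'S3'), ('C', 'S4'), ('D', 'S4'),
           ('E', 'S5'), ('F', 'S5'), ('G', 'S6'), ('H', 'S6')]

genes = sorted({g for g, _ in mapping})
terms = sorted({n for edge in hierarchy for n in edge} | {t for _, t in mapping})
gene_idx = {g: i for i, g in enumerate(genes)}
term_idx = {t: i for i, t in enumerate(terms)}

# Direct (unpropagated) annotations, stored as 0-based indices
gene_2_term = {g: sorted(term_idx[t] for gg, t in mapping if gg == g) for g in genes}
term_2_gene = {t: sorted(gene_idx[g] for g, tt in mapping if tt == t) for t in terms}

child_2_parent = {t: [p for c, p in hierarchy if c == t] for t in terms}
parent_2_child = {t: [c for c, p in hierarchy if p == t] for t in terms}

def genes_under(term):
    """Unique gene indices annotated to `term` or to any of its descendant terms."""
    found = set(term_2_gene[term])
    for child in parent_2_child[term]:
        found |= genes_under(child)
    return found

term_sizes = [len(genes_under(t)) for t in terms]
print(term_sizes)  # [8, 6, 2, 3, 2, 2, 2]
```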

In [38]:
ont.genes

['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']

In [39]:
ont.terms

['S0', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6']

In [45]:
ont.gene_2_term

{'A': [3],
 'B': [3],
 'C': [3, 4],
 'D': [4],
 'E': [5],
 'F': [5],
 'G': [6],
 'H': [6]}

In [42]:
ont.term_2_gene

{'S0': [],
 'S1': [],
 'S2': [],
 'S3': [0, 1, 2],
 'S4': [2, 3],
 'S5': [4, 5],
 'S6': [6, 7]}

In [43]:
ont.child_2_parent

{'S0': [],
 'S1': ('S0',),
 'S2': ('S0',),
 'S3': ('S1',),
 'S4': ('S1',),
 'S5': ('S1',),
 'S6': ('S2',)}

Alternatively, the hierarchical connections can be viewed as a matrix using `connected()`:

In [55]:
ont.connected()

array([[ True, False, False, False, False, False, False, False,  True,
         True, False,  True, False, False, False],
       [False,  True, False, False, False, False, False, False,  True,
         True, False,  True, False, False, False],
       [False, False,  True, False, False, False, False, False,  True,
         True, False,  True,  True, False, False],
       [False, False, False,  True, False, False, False, False,  True,
         True, False, False,  True, False, False],
       [False, False, False, False,  True, False, False, False,  True,
         True, False, False, False,  True, False],
       [False, False, False, False, False,  True, False, False,  True,
         True, False, False, False,  True, False],
       [False, False, False, False, False, False,  True, False,  True,
        False,  True, False, False, False,  True],
       [False, False, False, False, False, False, False,  True,  True,
        False,  True, False, False, False,  True],
       [False, False, Fa
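The visible rows are consistent with this reading: rows and columns follow the node order genes-then-terms (the same order printed by `to_igraph(include_genes=True)` further down), and entry `[i, j]` is `True` when node `i` is node `j` or lies beneath it. A plain-Python sketch under that assumption reproduces the rows shown above; it is an illustration, not DDOT's implementation.

```python
hierarchy = [('S3', 'S1'), ('S4', 'S1'), ('S5', 'S1'),
             ('S6', 'S2'), ('S1', 'S0'), ('S2', 'S0')]
mapping = [('A', 'S3'), ('B', 'S3'), ('C', 'S3'), ('C', 'S4'), ('D', 'S4'),
           ('E', 'S5'), ('F', 'S5'), ('G', 'S6'), ('H', 'S6')]

genes = sorted({g for g, _ in mapping})
terms = sorted({n for edge in hierarchy for n in edge} | {t for _, t in mapping})
nodes = genes + terms  # assumed order: genes first, then terms
index = {node: i for i, node in enumerate(nodes)}

parents = {}
for child, parent in hierarchy + mapping:
    parents.setdefault(child, []).append(parent)

def ancestors_and_self(node):
    """`node` plus every term reachable by following child -> parent edges."""
    seen, stack = {node}, [node]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

connected = [[False] * len(nodes) for _ in nodes]
for node in nodes:
    for anc in ancestors_and_self(node):
        connected[index[node]][index[anc]] = True

def true_columns(name):
    """Column indices that are True in the row for `name`."""
    return [j for j, flag in enumerate(connected[index[name]]) if flag]

print(true_columns('A'))  # [0, 8, 9, 11]
```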

A summary of an Ontology object, i.e. the number of genes, terms, and connections, can be printed with `print(ont)`.

In [54]:
print(ont)

8 genes, 7 terms, 9 gene-term relations, 6 term-term relations
node_attributes: [u'NodeType', 'name', u'x_pos', u'isRoot', u'Vis:Shape', u'y_pos', u'Label', u'Vis:Border Paint', u'Vis:Size', u'Vis:Fill Color', u'Size']
edge_attributes: [u'Is_Tree_Edge', u'Vis:Visible', u'EdgeType']


In [53]:
ont.to_igraph(include_genes=True).vs['name']

['A',
 'B',
 'C',
 'D',
 'E',
 'F',
 'G',
 'H',
 'S0',
 'S1',
 'S2',
 'S3',
 'S4',
 'S5',
 'S6']

# Manipulating the structure of an ontology

DDOT provides several convenience functions for processing Ontologies into a desired structure. Currently, there are no functions for adding genes or terms. If you need to add them, we recommend either creating a new Ontology or manipulating the contents in another library, such as NetworkX or igraph, and converting the result back into an Ontology.
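As a minimal sketch of the "create a new Ontology" route, the edits can be made directly on the `hierarchy` and `mapping` edge lists from the constructor example and then passed back to `Ontology(...)`. The helper `rename_edges`, the new term `S7`, and the new gene `I` are hypothetical and only for illustration.

```python
hierarchy = [('S3', 'S1'), ('S4', 'S1'), ('S5', 'S1'),
             ('S6', 'S2'), ('S1', 'S0'), ('S2', 'S0')]
mapping = [('A', 'S3'), ('B', 'S3'), ('C', 'S3'), ('C', 'S4'), ('D', 'S4'),
           ('E', 'S5'), ('F', 'S5'), ('G', 'S6'), ('H', 'S6')]

def rename_edges(edges, aliases):
    """Return a copy of (child, parent) edges with names replaced via `aliases` (hypothetical helper)."""
    return [(aliases.get(c, c), aliases.get(p, p)) for c, p in edges]

aliases = {'A': 'A_alias', 'S0': 'S0_alias'}
new_hierarchy = rename_edges(hierarchy, aliases)
new_mapping = rename_edges(mapping, aliases)

# Adding a term is just adding edges; 'S7' and gene 'I' are hypothetical
new_hierarchy.append(('S7', 'S2'))
new_mapping.append(('I', 'S7'))

# The edited lists can then be turned back into an Ontology, e.g.:
# ont3 = Ontology(new_hierarchy, new_mapping)
```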

In [49]:
# Renaming genes and terms.
ont2 = ont.rename(genes={'A' : 'A_alias'}, terms={'S0':'S0_alias'})
ont2.to_table()

KeyError: 'S3'

In [None]:
ont.delete

# Inferring a data-driven ontology

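An ontology can also be inferred in a data-driven manner from an input set of gene-gene similarities. Here, the similarities are derived from the toy ontology itself with `flatten()` and then passed to the CLIXO algorithm through `run_clixo()`.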

In [4]:
sim, genes = ont.flatten()
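The similarity measure that `flatten()` uses is not stated in this tutorial. As an assumption for illustration only, the sketch below uses a Resnik-style score, log2(number of genes / size of the smallest term containing both genes), computed in plain Python for the toy ontology: genes that share a small, specific term score high, and genes whose only shared term is the root score 0. `resnik_like` is a hypothetical helper.

```python
import math
from itertools import combinations

hierarchy = [('S3', 'S1'), ('S4', 'S1'), ('S5', 'S1'),
             ('S6', 'S2'), ('S1', 'S0'), ('S2', 'S0')]
mapping = [('A', 'S3'), ('B', 'S3'), ('C', 'S3'), ('C', 'S4'), ('D', 'S4'),
           ('E', 'S5'), ('F', 'S5'), ('G', 'S6'), ('H', 'S6')]

parents = {}
for child, parent in hierarchy + mapping:
    parents.setdefault(child, []).append(parent)

def ancestor_terms(node):
    """All terms reachable from `node` by following child -> parent edges."""
    seen, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

genes = sorted({g for g, _ in mapping})

# Propagated membership: a gene belongs to every term above it
members = {}
for gene in genes:
    for term in ancestor_terms(gene):
        members.setdefault(term, set()).add(gene)

def resnik_like(g1, g2):
    """log2(N / size of the smallest term containing both genes); an assumed stand-in measure."""
    shared = ancestor_terms(g1) & ancestor_terms(g2)
    smallest = min(len(members[t]) for t in shared)
    return math.log2(len(genes) / smallest)

sim = {(a, b): resnik_like(a, b) for a, b in combinations(genes, 2)}
print(sim[('C', 'D')], sim[('A', 'G')])  # 2.0 0.0
```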

In [6]:
ont2 = Ontology.run_clixo(sim, 0.0, 1.0, square=True, square_names=genes)

In [8]:
print(ont2)

8 genes, 6 terms, 9 gene-term relations, 5 term-term relations
node_attributes: []
edge_attributes: ['CLIXO_score']
