In this part of the tutorial, we run two ontology-based methods, Onto2Vec and OPA2Vec, to produce vector representations (embeddings) of biological entities.

## Onto2Vec

Onto2Vec produces vector representations from the logical axioms of an ontology and the known associations between ontology classes and biological entities. In the case study below, we use Onto2Vec to produce vector representations of proteins from their GO annotations and the GO logical axioms.
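
Onto2Vec treats each axiom, including the protein-GO association axioms, as a "sentence" and learns embeddings with Word2Vec, so a protein's vector is the learned vector of its identifier token. The sketch below covers only the corpus side of that idea: it turns a few hypothetical axioms and associations into token sequences and enumerates the skip-gram (target, context) pairs that a skip-gram Word2Vec model would train on. The protein IDs `P1`/`P2`, the `hasFunction` relation name, and the window size are illustrative assumptions, not the tool's actual internals.

```python
from typing import List, Tuple

# Hypothetical toy input: real Onto2Vec input comes from the GO axioms
# (after reasoning) plus the association file.
axioms = [
    "GO_0005515 SubClassOf GO_0005488",  # protein binding is a kind of binding
    "GO_0005488 SubClassOf GO_0003674",  # binding is a molecular_function
]
associations = [
    ("P1", "GO_0005515"),  # hypothetical protein IDs
    ("P2", "GO_0005488"),
]


def build_corpus(axioms: List[str], associations: List[Tuple[str, str]]) -> List[List[str]]:
    """Tokenize axioms and turn each (protein, GO class) pair into a sentence."""
    sentences = [axiom.split() for axiom in axioms]
    # 'hasFunction' is an assumed relation name, used only for illustration
    sentences += [[protein, "hasFunction", go_class] for protein, go_class in associations]
    return sentences


def skipgram_pairs(sentences: List[List[str]], window: int = 2) -> List[Tuple[str, str]]:
    """Enumerate (target, context) pairs within `window` tokens of each target."""
    pairs = []
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            pairs.extend((target, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


corpus = build_corpus(axioms, associations)
pairs = skipgram_pairs(corpus)
print(len(corpus), len(pairs))                 # 4 24
print([p for p in pairs if p[0] == "P1"])      # [('P1', 'hasFunction'), ('P1', 'GO_0005515')]
```

With a real library such as gensim, a corpus like this would be passed to `gensim.models.Word2Vec(sentences=corpus, sg=1, ...)`, which performs the actual training.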

```python
# org_id is an NCBI Taxonomy ID: '4932' for yeast (S. cerevisiae) or '9606' for human
org_id = '4932'
!python onto2vec_opa2vec/runOnto2Vec.py -ontology data/go.owl -associations data/train/{org_id}.OPA_associations.txt -outfile data/{org_id}.onto2vec_vecs -entities data/train/{org_id}.protein_list.txt
```

```text
*********** Onto2Vec Running ... ***********

1.Reasoning over ontology ...

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Loading of Axioms ...
Loading ...
    ... finished
    ... finished
Property Saturation Initialization ...
    ... finished
Reflexive Property Computation ...
    ... finished
Object Property Hierarchy and Composition Computation ...
```
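
The `!python ...` line relies on IPython's shell escape, which also expands `{org_id}` from the notebook's Python namespace. Outside a notebook, the same invocation can be built as an argument list and run with `subprocess`. This is a sketch: the helper names `embedding_command` and `run_embedding` are hypothetical, and the flags are copied from the cells in this tutorial rather than from the tool's own documentation.

```python
import shlex
import subprocess
import sys
from typing import List


def embedding_command(method: str, org_id: str) -> List[str]:
    """Hypothetical helper: rebuild the notebook's runOnto2Vec.py / runOPA2Vec.py call."""
    if method not in ("Onto2Vec", "OPA2Vec"):
        raise ValueError(f"unknown method: {method}")
    return [
        sys.executable, f"onto2vec_opa2vec/run{method}.py",
        "-ontology", "data/go.owl",
        "-associations", f"data/train/{org_id}.OPA_associations.txt",
        "-outfile", f"data/{org_id}.{method.lower()}_vecs",
        "-entities", f"data/train/{org_id}.protein_list.txt",
    ]


def run_embedding(method: str, org_id: str) -> None:
    """Run the tool; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(embedding_command(method, org_id), check=True)


# Only print the command here; running it needs the repository checkout and data files.
print(shlex.join(embedding_command("Onto2Vec", "4932")))
```

Using `sys.executable` instead of a bare `python` keeps the tool on the same interpreter (and installed packages) as the calling script.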

## OPA2Vec

In addition to the ontology axioms and their entity associations, OPA2Vec also uses the ontology metadata (annotation properties such as class labels and definitions) and the literature to represent biological entities. The code below runs OPA2Vec on GO and the protein-GO associations to produce protein vector representations.
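
The key difference from Onto2Vec can be sketched at the corpus level: metadata becomes extra sentences in which class identifiers co-occur with ordinary words, which is what allows word vectors learned from text to interact with the ontology tokens. The triples, the `label` property token, and the toy axiom and association sentences below are hypothetical; the real tool extracts its metadata from the annotation axioms in `go.owl`.

```python
from typing import List, Tuple

# Hypothetical metadata: (class, annotation property, value) triples,
# standing in for annotation axioms such as rdfs:label in GO.
metadata = [
    ("GO_0005515", "label", "protein binding"),
    ("GO_0005488", "label", "binding"),
]


def metadata_sentences(triples: List[Tuple[str, str, str]]) -> List[List[str]]:
    """Turn each annotation triple into tokens mixing the class ID with plain words."""
    return [[cls, prop] + value.lower().split() for cls, prop, value in triples]


formal = [["GO_0005515", "SubClassOf", "GO_0005488"]]   # from logical axioms
associations = [["P1", "hasFunction", "GO_0005515"]]     # hypothetical association sentence
corpus = formal + associations + metadata_sentences(metadata)

for sentence in corpus:
    print(" ".join(sentence))
```

Training Word2Vec on this combined corpus places `GO_0005515` near words such as `protein` and `binding`, in addition to its axiomatic neighbours.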

```python
!python onto2vec_opa2vec/runOPA2Vec.py -ontology data/go.owl -associations data/train/{org_id}.OPA_associations.txt -outfile data/{org_id}.opa2vec_vecs -entities data/train/{org_id}.protein_list.txt
```

```text
*********** OPA2Vec Running ... ***********

1.Ontology Processing ...

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Loading of Axioms ...
Loading ...
    ... finished
    ... finished
Property Saturation Initialization ...
    ... finished
Reflexive Property Computation ...
    ... f
```

## Generate features

Load the Onto2Vec and OPA2Vec output files and map each protein ID to its vector.

```python
# Reuse the org_id set in the Onto2Vec cell ('4932' = yeast, '9606' = human) so that
# the vector files read here are the ones generated above.

def load_vectors(path):
    """Map each protein ID to its vector, kept as the raw space-separated string."""
    vectors = {}
    with open(path, 'r') as f:
        for line in f:
            protein, vector = line.strip().split(" ", maxsplit=1)
            vectors[protein] = vector
    return vectors

onto2vec_map = load_vectors(f'data/{org_id}.onto2vec_vecs')
opa2vec_map = load_vectors(f'data/{org_id}.opa2vec_vecs')
```
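
The maps keep each vector as the raw string that follows the protein ID, which is convenient for writing feature files later. Any numeric use, such as comparing two proteins, first needs the string parsed into floats. The sketch below assumes the file layout implied by the parsing code above (protein ID, then whitespace-separated numbers); `toy_map` is a hypothetical stand-in for `onto2vec_map` or `opa2vec_map`.

```python
import math
from typing import Dict, List


def parse_vector(vector_str: str) -> List[float]:
    """Convert a space-separated vector string into a list of floats."""
    return [float(x) for x in vector_str.split()]


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical stand-in for onto2vec_map / opa2vec_map
toy_map: Dict[str, str] = {"P1": "1.0 0.0 1.0", "P2": "1.0 0.0 0.0"}
sim = cosine_similarity(parse_vector(toy_map["P1"]), parse_vector(toy_map["P2"]))
print(round(sim, 4))  # 0.7071
```

On the real maps, `cosine_similarity(parse_vector(onto2vec_map[a]), parse_vector(onto2vec_map[b]))` gives a quick similarity score for two proteins `a` and `b` that both have vectors.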


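Generate pair features for the training, validation, and test datasets. Pairs listed in `{org_id}.protein.links.v11.0.txt` are labeled 1 (interacting) and pairs in `{org_id}.negative_interactions.txt` are labeled 0. The pairs are shuffled, and every pair whose two proteins both have Onto2Vec and OPA2Vec vectors is written out with its feature vector (the two protein vectors concatenated), its label, and the protein IDs, one pair per line in each output file.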

```python
import random

# Fixed seed (not in the original run) so the shuffled pair order is reproducible
random.seed(42)

for split in ['train', 'valid', 'test']:
    # Collect (protein1, protein2, label) triples: 1 for known interactions, 0 for negatives
    pairs = []
    sources = [(f'data/{split}/{org_id}.protein.links.v11.0.txt', 1),
               (f'data/{split}/{org_id}.negative_interactions.txt', 0)]
    for path, label in sources:
        with open(path, 'r') as f:
            for line in f:
                prot1, prot2 = line.strip().split()
                pairs.append((prot1, prot2, label))
    random.shuffle(pairs)

    # Write four line-aligned files; skip pairs where either protein lacks an Onto2Vec or OPA2Vec vector
    with open(f'data/{split}/{org_id}.onto2vec_features', 'w') as f_onto, \
         open(f'data/{split}/{org_id}.opa2vec_features', 'w') as f_opa, \
         open(f'data/{split}/{org_id}.labels', 'w') as f_labels, \
         open(f'data/{split}/{org_id}.pairs', 'w') as f_pairs:
        for prot1, prot2, label in pairs:
            if all(p in onto2vec_map and p in opa2vec_map for p in (prot1, prot2)):
                f_pairs.write(f'{prot1} {prot2}\n')
                f_labels.write(f'{label}\n')
                # Pair feature = concatenation of the two proteins' vectors
                f_opa.write(f'{opa2vec_map[prot1]} {opa2vec_map[prot2]}\n')
                f_onto.write(f'{onto2vec_map[prot1]} {onto2vec_map[prot2]}\n')
```
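
Each line of a feature file written above holds the first protein's vector followed by the second's, so for d-dimensional embeddings a row has 2d numbers, and row i lines up with line i of the labels file. A classifier needs these read back as numbers; the sketch below does that with the standard library only. `load_split` is a hypothetical helper, and the demo writes two toy rows to a temporary directory instead of using the real data files.

```python
import os
import tempfile
from typing import List, Tuple


def load_split(features_path: str, labels_path: str) -> Tuple[List[List[float]], List[int]]:
    """Read a feature file and its line-aligned labels file."""
    with open(features_path) as f:
        X = [[float(x) for x in line.split()] for line in f if line.strip()]
    with open(labels_path) as f:
        y = [int(line) for line in f if line.strip()]
    if len(X) != len(y):
        raise ValueError(f"{len(X)} feature rows but {len(y)} labels")
    return X, y


# Toy demo: two pairs of 2-dimensional protein vectors -> 4 features per row
with tempfile.TemporaryDirectory() as tmp:
    feats = os.path.join(tmp, "toy.features")
    labels = os.path.join(tmp, "toy.labels")
    with open(feats, "w") as f:
        f.write("0.1 0.2 0.3 0.4\n0.5 0.6 0.7 0.8\n")
    with open(labels, "w") as f:
        f.write("1\n0\n")
    X, y = load_split(feats, labels)

print(len(X), len(X[0]), y)  # 2 4 [1, 0]
```

On the tutorial's data this would be called as `load_split(f'data/train/{org_id}.onto2vec_features', f'data/train/{org_id}.labels')`, and likewise for the OPA2Vec features and the other splits.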
                                    