# OPA2Vec

In this part of the tutorial, we run an ontology-based method, OPA2Vec, to produce vector representations of biological entities.

## Imports

In [None]:
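import mowl
mowl.init_jvm("20g")  # start the JVM before importing other mowl modules
from mowl.datasets.ppi_yeast import PPIYeastSlimDataset, PPIYeastDataset
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE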


## Loading the dataset

In [None]:
ds = PPIYeastSlimDataset()

## OPA2Vec

OPA2Vec builds on Onto2Vec. Onto2Vec produces vector representations based on the logical axioms of an ontology and the known associations between ontology classes and biological entities; OPA2Vec additionally uses the ontology's annotation axioms (for example, labels and descriptions). In the case study below, we use this approach to produce vector representations of proteins based on their GO annotations and the GO axioms.

In [None]:
from mowl.corpus.base import (
    extract_axiom_corpus, extract_and_save_axiom_corpus,
    extract_annotation_corpus, extract_and_save_annotation_corpus,
)

# Extract the ontology's logical axioms as a text corpus, in memory and on disk.
corpus = extract_axiom_corpus(ds.ontology)
extract_and_save_axiom_corpus(ds.ontology, out_file="data/opa2vec_axiom_corpus")



In [None]:
# Extract the annotation axioms and append them to the same corpus file.
annot_corpus = extract_annotation_corpus(ds.ontology)
extract_and_save_annotation_corpus(ds.ontology, "data/opa2vec_axiom_corpus", mode="a")
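
The second call above passes `mode = "a"`, so the annotation sentences are appended to the file that already holds the axiom sentences, giving one combined corpus. The sketch below imitates that write-then-append pattern with plain file I/O. The helper `save_corpus` and the example sentences are invented for illustration and do not reproduce mOWL's actual output format.

```python
import os
import tempfile


def save_corpus(sentences, out_file, mode="w"):
    """Write one sentence per line; mode="a" appends instead of overwriting."""
    with open(out_file, mode) as f:
        for sentence in sentences:
            f.write(sentence + "\n")


# Toy sentences, invented for illustration only.
axiom_sentences = ["ClassA SubClassOf ClassB"]
annotation_sentences = ["ClassA label example class"]

corpus_path = os.path.join(tempfile.mkdtemp(), "toy_corpus")
save_corpus(axiom_sentences, corpus_path)                 # creates the file
save_corpus(annotation_sentences, corpus_path, mode="a")  # appends to it

with open(corpus_path) as f:
    corpus_lines = f.read().splitlines()
print(corpus_lines)
```

Running the first call again with the default `mode="w"` would discard the appended annotation lines, so the order of the two calls matters.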


Train the model

Evaluate PPI prediction performance

In [None]:
mean_rank, rank_1, rank_10, rank_100 = model.evaluate_ppi()
print(f'Mean rank: {mean_rank}, Top 1: {rank_1}, Top 10: {rank_10}, Top 100: {rank_100}')
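
Mean rank and top-k scores summarize where the true interaction partner lands when all candidates are sorted by predicted score. The sketch below shows one common way to compute such metrics from a list of ranks. It is not the implementation of `evaluate_ppi()`: the function name `rank_metrics` and the toy ranks are hypothetical, and it reports top-k values as fractions, which may differ from what `evaluate_ppi()` returns.

```python
def rank_metrics(ranks, ks=(1, 10, 100)):
    """Return the mean rank and, for each k, the fraction of ranks <= k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    mean = sum(ranks) / len(ranks)
    hits = {k: sum(r <= k for r in ranks) / len(ranks) for k in ks}
    return mean, hits


# 1-based rank of the true partner for four test pairs (invented values).
toy_ranks = [1, 3, 12, 250]
toy_mean_rank, toy_hits = rank_metrics(toy_ranks)
print(f"Mean rank: {toy_mean_rank}, Top 1: {toy_hits[1]}, "
      f"Top 10: {toy_hits[10]}, Top 100: {toy_hits[100]}")
```

A lower mean rank and higher top-k fractions both indicate better predictions. The mean is sensitive to a few very poor ranks, which is why the top-k values are usually reported alongside it.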

In [None]:
nodemap = {}
embeddings = model.w2v_model.wv

In [None]:
n = len(embeddings)
# All word vectors as an (n, emb_size) matrix, in index order.
embeds = np.asarray(embeddings.vectors, dtype=np.float32)
# Project to 2D with t-SNE (scikit-learn >= 1.5 renames n_iter to max_iter).
X = TSNE(n_components=2, verbose=1, n_iter=2500).fit_transform(embeds)

In [None]:
# Map each protein ID to its EC number, read from a tab-separated file.
ec_numbers = {}
with open('../../../data/yeast_ec.tab') as f:
    next(f)  # skip the header row
    for line in f:
        it = line.strip().split('\t')
        if len(it) < 5:
            continue
        if it[3]:
            prot_id = it[3].split(';')[0]  # keep the first ID if several are listed
            ec_numbers[prot_id] = it[4]
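# Group the 2D points by top-level EC class (first field of the EC number).
classes = {'0': [[], []]}
for i in range(n):
    v = embeddings.index_to_key[i]
    if not v.startswith('<http://4932'):
        continue  # keep only yeast protein entities (taxon 4932)
    v = v[8:-1]  # strip the leading '<http://' and the trailing '>'
    if v in ec_numbers:
        ec = ec_numbers[v].split('.')[0]
        if ec not in classes:
            classes[ec] = [[], []]
        classes[ec][0].append(X[i, 0])
        classes[ec][1].append(X[i, 1])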
        
colors = iter(plt.cm.rainbow(np.linspace(0, 1, len(classes))))
fig, ax = plt.subplots(figsize=(20, 20))

for ec, items in classes.items():
    if ec == '0':
        continue
    color = next(colors)
    ax.scatter(items[0], items[1], color=color, label=ec)

ax.legend()
ax.grid(True)

plt.show()