# Welcome to the OPA2Vec tutorial session.

## Introduction

OPA2Vec is a tool that produces feature vectors for biological entities from an ontology. To run OPA2Vec alongside the tutorial, please follow the instructions below.

The source code of OPA2Vec is available at https://github.com/bio-ontology-research-group/opa2vec/. Please download all the files in the repository to follow this tutorial.

In this tutorial, we use the Gene Ontology (GO) as a case study. Please download the OWL version of the Gene Ontology from: https://geneontology.org/ontology/go.owl

Please also download the pre-trained Word2Vec model available at http://bio2vec.net/data/pubmed_model/ and save it in the same folder as the OPA2Vec source code. We discuss the role of this model later in the tutorial.

## Dependencies

Now that all the needed files are downloaded, we need to prepare the environment to run OPA2Vec. OPA2Vec is implemented in three programming languages: Python, Groovy and Perl. We use the following versions:
- Python 2.7.5
- Groovy 2.4.10 (JVM 1.8.0_121)
- Perl 5.16.3

OPA2Vec also uses the gensim Python library, which requires SciPy and NumPy. The Python code in this tutorial relies on the pre-4.0 gensim API, so, assuming NumPy and SciPy are already installed, install a gensim release older than 4.0 by running the following in your terminal:

In [None]:
pip install -U "gensim<4.0"

To process the ontology with Groovy and the OWL API, some libraries are required. Run the cell below to download them with the Grape dependency manager.

In [None]:
import groovy.grape.Grape
Grape.grab(group:"org.semanticweb.elk", module:"elk-owlapi", version:"0.4.2")
Grape.grab(group:"net.sourceforge.owlapi", module:"owlapi-api", version:"4.1.0")
Grape.grab(group:"net.sourceforge.owlapi", module:"owlapi-apibinding", version:"4.1.0")
Grape.grab(group:"net.sourceforge.owlapi", module:"owlapi-impl", version:"4.1.0")
Grape.grab(group:"net.sourceforge.owlapi", module:"owlapi-parsers", version:"4.1.0")
Grape.grab(group:"org.codehaus.gpars", module:"gpars", version:"1.1.0")



## Running OPA2Vec 
Now that our environment is ready, we can run OPA2Vec. Open a terminal in the folder where you downloaded the OPA2Vec files and run the following command:

In [None]:
python runOPA2Vec.py go.owl SampleAssociationFile.lst 

This command runs OPA2Vec with the default parameters. The only required inputs are the ontology OWL file and the file of entity-class associations. If the run succeeds, it creates an output file, *AllVectorResults.lst*, containing the vector representations of the entities.
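To use the vectors elsewhere, you first need to read *AllVectorResults.lst* back in. The Python sketch below assumes each record is an entity identifier followed by its vector printed by NumPy inside square brackets (the layout written by the Word2Vec script in this tutorial), possibly wrapped over several lines; other OPA2Vec versions may use a different layout. The helpers `load_vectors` and `cosine` are hypothetical, not part of OPA2Vec.

```python
import re

import numpy as np

# Assumed record layout: "<entity> [v1 v2 ... vn]", where the bracketed part is
# NumPy's default array printout and may wrap across several lines.
RECORD = re.compile(r"(\S+)\s+\[([^\]]*)\]")


def load_vectors(path):
    """Return a dict mapping each entity identifier to a 1-D float array."""
    with open(path) as handle:
        text = handle.read()
    vectors = {}
    for entity, body in RECORD.findall(text):
        # split() absorbs the irregular spacing and line breaks NumPy inserts
        vectors[entity] = np.array(body.split(), dtype=float)
    return vectors


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

For example, `vecs = load_vectors("AllVectorResults.lst")` followed by `cosine(vecs[a], vecs[b])` compares two entities `a` and `b` from your association file.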

OPA2Vec lets you choose different parameters depending on your data and type of application. Let's change the default parameters by providing optional arguments on the command line.
The following optional parameters can be specified on the command line:

    -embedsize [embedding size]
    Size of the obtained vectors

    -windsize [window size]
    Window size for the word2vec model

    -mincount [min count]
    Minimum count for the word2vec model: words that occur fewer times are ignored

    -model [model]
    Preferred word2vec architecture: sg (skip-gram) or cbow (continuous bag-of-words)

    -annotations [metadata annotations]
    Comma-separated list of full URIs of the annotation properties to include in the metadata. Use 'all' for all annotation properties (the default) or 'none' for no annotation property

    -pretrained [pre-trained model]
    Pre-trained word2vec model used as background knowledge. If no pre-trained model is specified, the program assumes you have downloaded the default pre-trained model from http://bio2vec.net/data/pubmed_model/
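The first four flags correspond closely to standard gensim `Word2Vec` settings. The sketch below shows one way such single-dash options could be parsed with Python's `argparse` and translated into gensim keyword arguments. It is not OPA2Vec's own argument handling: `build_parser` and `word2vec_kwargs` are hypothetical names, and no defaults are assumed (flags that are not given are left out so gensim's own defaults apply). Keyword names follow gensim 4 (`vector_size`); gensim releases before 4.0 call the same setting `size`.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the optional flags listed above."""
    parser = argparse.ArgumentParser(prog="runOPA2Vec.py")
    parser.add_argument("ontology", help="ontology OWL file")
    parser.add_argument("associations", help="entity-class association file")
    parser.add_argument("-embedsize", type=int)
    parser.add_argument("-windsize", type=int)
    parser.add_argument("-mincount", type=int)
    parser.add_argument("-model", choices=["sg", "cbow"])
    parser.add_argument("-annotations")  # 'all', 'none' or comma-separated URIs
    parser.add_argument("-pretrained")   # path to a pre-trained word2vec model
    return parser


def word2vec_kwargs(args):
    """Translate parsed flags into gensim 4 Word2Vec keyword arguments.

    Flags that were not given are omitted, so gensim's defaults apply.
    """
    kwargs = {}
    if args.embedsize is not None:
        kwargs["vector_size"] = args.embedsize  # called `size` before gensim 4.0
    if args.windsize is not None:
        kwargs["window"] = args.windsize
    if args.mincount is not None:
        kwargs["min_count"] = args.mincount
    if args.model is not None:
        kwargs["sg"] = 1 if args.model == "sg" else 0  # 1 = skip-gram, 0 = CBOW
    return kwargs
```

For example, parsing `go.owl SampleAssociationFile.lst -embedsize 50 -windsize 10 -mincount 20 -model sg -annotations all` yields `vector_size=50`, `window=10`, `min_count=20` and `sg=1`; `-annotations` and `-pretrained` are not Word2Vec settings, so they are left for other steps of the pipeline.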



In [None]:
python runOPA2Vec.py go.owl SampleAssociationFile.lst -embedsize 50 -windsize 10 -mincount 20 -model sg -annotations all

The command above reruns OPA2Vec with the new parameters and replaces the results in *AllVectorResults.lst* with the new ones.

Let's now take a closer look at some of the steps OPA2Vec performs to obtain vector representations of biological entities.

One of the first steps OPA2Vec performs is to run a reasoner (ELK) over the specified ontology and store all of its axioms, asserted and inferred, as the first part of the corpus.

In [None]:
// Run the ELK reasoner on the ontology and store asserted and inferred axioms
import org.semanticweb.elk.owlapi.ElkReasonerFactory;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.reasoner.*;
import org.semanticweb.owlapi.util.*;
import java.io.*;

OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
OWLOntologyManager outputManager = OWLManager.createOWLOntologyManager();
OWLOntology ont = manager.loadOntologyFromOntologyDocument(new File("go.owl"));

// Classify the ontology with ELK, reporting progress on the console
OWLReasonerConfiguration config = new SimpleConfiguration(new ConsoleProgressMonitor());
OWLReasoner reasoner = new ElkReasonerFactory().createReasoner(ont, config);
reasoner.precomputeInferences(InferenceType.CLASS_HIERARCHY);

// Collect inferred subclass and equivalent-class axioms into a new ontology
List<InferredAxiomGenerator<? extends OWLAxiom>> gens = new ArrayList<InferredAxiomGenerator<? extends OWLAxiom>>();
gens.add(new InferredSubClassAxiomGenerator());
gens.add(new InferredEquivalentClassAxiomGenerator());
OWLOntology infOnt = outputManager.createOntology();
InferredOntologyGenerator iog = new InferredOntologyGenerator(reasoner, gens);
iog.fillOntology(outputManager.getOWLDataFactory(), infOnt);

// Save the inferred ontology
outputManager.saveOntology(infOnt, IRI.create(new File("inferredontologygo2.owl").toURI()));

// Write each inferred class axiom to axiomsinf.lst and its class to classes.lst (one line per axiom)
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("axiomsinf.lst", true)));
PrintWriter out1 = new PrintWriter(new BufferedWriter(new FileWriter("classes.lst", true)));
for (OWLClass class1 : infOnt.getClassesInSignature()) {
    for (OWLClassAxiom claxiom : infOnt.getAxioms(class1)) {
        out1.println(class1);
        out.println(claxiom);
    }
}
out.close();   // closing flushes the buffered output to disk
out1.close();

// Write the asserted class axioms of the original ontology to axiomsorig.lst
PrintWriter outo = new PrintWriter(new BufferedWriter(new FileWriter("axiomsorig.lst", true)));
for (OWLClass classo : ont.getClassesInSignature()) {
    for (OWLClassAxiom claxiom : ont.getAxioms(classo)) {
        outo.println(claxiom);
    }
}
outo.close();
reasoner.dispose();
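Classification means working out every subsumption that follows from the ontology's axioms, including ones that are never asserted. As a toy Python illustration of that idea only, the sketch below derives the subsumptions implied by chains of asserted `SubClassOf` pairs (their transitive closure). ELK also reasons over class expressions such as existential restrictions and equivalences, so this is not how ELK works, and the class names are illustrative rather than real GO identifiers.

```python
def subclass_closure(asserted):
    """Return every (sub, super) pair implied by chaining SubClassOf pairs.

    `asserted` is an iterable of (subclass, superclass) tuples.
    """
    direct = {}
    for sub, sup in asserted:
        direct.setdefault(sub, set()).add(sup)

    closure = set()
    for start in direct:
        # Depth-first walk up the hierarchy, visiting each superclass once
        stack = list(direct[start])
        seen = set()
        while stack:
            cls = stack.pop()
            if cls in seen:
                continue
            seen.add(cls)
            closure.add((start, cls))
            stack.extend(direct.get(cls, ()))
    return closure


asserted = {
    ("apoptotic_process", "programmed_cell_death"),
    ("programmed_cell_death", "cell_death"),
    ("cell_death", "biological_process"),
}

# Subsumptions that hold but were never asserted
inferred = subclass_closure(asserted) - asserted
for sub, sup in sorted(inferred):
    print("%s SubClassOf %s" % (sub, sup))
```

Running it prints the three unstated subsumptions: `apoptotic_process` under both `biological_process` and `cell_death`, and `programmed_cell_death` under `biological_process`.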



The second step of OPA2Vec is to extract relevant metadata from the ontology. The script *getMetadata.groovy* extracts the annotation axioms of the specified annotation properties. For illustration, the cell below extracts the literal values of all annotation properties used in the ontology.

In [None]:
// Extract the metadata: literal-valued annotations of every class
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.search.EntitySearcher;
import java.io.*;

OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
OWLOntology MyOntology = manager.loadOntologyFromOntologyDocument(new File("go.owl"));

// getMetadata.groovy restricts the output to the annotation properties given on
// the command line; here every annotation property is kept
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("metadata.lst", true)));
for (OWLClass cls : MyOntology.getClassesInSignature()) {
    for (OWLAnnotation a : EntitySearcher.getAnnotations(cls, MyOntology)) {
        OWLAnnotationProperty prop = a.getProperty();
        OWLAnnotationValue val = a.getValue();
        // Keep only literal values (e.g. labels and definitions); skip IRI-valued annotations
        if (val instanceof OWLLiteral) {
            out.println(cls.toString() + " " + prop.toString() + " " + ((OWLLiteral) val).getLiteral());
        }
    }
}
out.close();   // closing flushes the buffered output to disk
println("Done");
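The `-annotations` option decides which annotation properties end up in the metadata: a comma-separated list of full property URIs, `all`, or `none`. The Python sketch below applies that selection rule to metadata lines of the form `class property literal`, the layout produced by the cell above. It is not the code of *getMetadata.groovy*; `parse_annotation_option` and `filter_metadata` are hypothetical names, and the sketch assumes the property is a single whitespace-free token, optionally wrapped in angle brackets.

```python
def parse_annotation_option(value):
    """Interpret an -annotations value: 'all', 'none', or comma-separated URIs.

    Returns None for 'all' (keep every property) or a set of property URIs.
    """
    value = value.strip()
    if value == "all":
        return None
    if value == "none":
        return set()
    return {uri.strip() for uri in value.split(",") if uri.strip()}


def filter_metadata(lines, wanted):
    """Yield 'class property literal' lines whose property is in `wanted`.

    Property tokens written as <URI> are compared without the angle brackets.
    """
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip empty or malformed lines
        prop = parts[1].strip("<>")
        if wanted is None or prop in wanted:
            yield line
```

Returning `None` for `all` keeps the "no filtering" case distinct from `none`, which yields an empty set and therefore drops every line.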



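The third main step of OPA2Vec uses Word2Vec to obtain the vector representations. The Python script below loads the pre-trained PubMed model, extends its vocabulary with the ontology corpus, continues training on that corpus and outputs the vectors of the classes listed in *finalclasses.lst*.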

In [None]:
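from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Load the pre-trained PubMed Word2Vec model
mymodel = Word2Vec.load("RepresentationModel_pubmed.txt")

# Each line of the ontology corpus is one sentence of whitespace-separated tokens
sentences = LineSentence('ontology_corpus.lst')

# Keep every corpus token and add the new ones to the existing vocabulary
mymodel.min_count = 0
mymodel.build_vocab(sentences, update=True)

# Continue training on the ontology corpus (pre-4.0 gensim API: `iter`)
mymodel.train(sentences, total_examples=mymodel.corpus_count, epochs=mymodel.iter)
word_vectors = mymodel.wv

# Write the vector of every class in finalclasses.lst that is in the vocabulary
with open('finalclasses.lst') as f, open('AllVectorResults.lst', 'w') as out:
    for line in f:
        myclass1 = line.rstrip()
        if myclass1 in word_vectors:
            out.write(myclass1 + ' ' + str(word_vectors[myclass1]) + '\n')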
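`LineSentence` treats each line of *ontology_corpus.lst* as one sentence of whitespace-separated tokens. Many of those tokens, such as class IRIs, are unlikely to be in the PubMed-trained vocabulary, which is presumably why the script calls `build_vocab(..., update=True)` with `min_count` set to 0 before training. The plain-Python sketch below does not use gensim; it counts corpus tokens the same way and reports which ones such an update would have to add. The function `new_vocabulary`, the toy vocabulary and the `example.org` IRIs are all hypothetical.

```python
from collections import Counter


def new_vocabulary(corpus_lines, known_words, min_count=0):
    """Count whitespace-separated tokens (one sentence per line) and return
    those missing from `known_words` that occur at least `min_count` times."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(line.split())
    return {word: n for word, n in counts.items()
            if word not in known_words and n >= min_count}


# Toy stand-ins for the pre-trained vocabulary and the ontology corpus
pretrained_vocab = {"apoptotic", "process", "SubClassOf"}
corpus = [
    "<http://example.org/C1> SubClassOf <http://example.org/C2>",
    "<http://example.org/C1> apoptotic process",
]
print(new_vocabulary(corpus, pretrained_vocab))
```

Raising `min_count` shows the trade-off: with `min_count=2`, the class that occurs only once would get no vector at all.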
