SemanticindexLubm

Table of Contents Semantic Index LUBM experimentation The ontology The queries The reasoner Query statistics The DAG, Indexes and Ranges of the Semantic Index for LUBM Reproducing the test How to read the LOG files Authors

Note for ISWC reviewers: We have just released a preview of the reasoner that implements the semantic index technique. You can find it in this link. You will also be able to test the performance of the queries using the LUBM-5 universities we prepared here.

Semantic Index LUBM experimentation

This experimentation has as objective to show the effectTBox optimizations in the reformulation process. In particular, we experiment with dependency induction in the LUBM benchmark scenario. To reproduce this test please use the instructions of to the last Section of this site.

The ontology

For this experiments we used the DL-Lite version of LUBM is the attachment to this page named attachment:univ-bench-dllitea.owl To create this file we used the original [attachment:univ-bench-original.owl] and a) removed all equivalence axioms and replaced them for subclass axioms and b) removed all transitive axioms.

Note that this file is not the same ontology as the one identified as [attachment:univ-bench-incomplete] that appears in several papers about evaluation of query rewritings for DL-Lite. U.owl is not only a simplification of LUBM w.r.t. expressivity, but it also removes part of the hierarchies and renames some classes. We decided to not use this version to stay with the more expressive variation that one gets by simplifying the original ontology.

The queries

We used two sets of queries.

We used the original queries from the LUBM benchmark and can be found here. The original queries are expressed in a KIF-like language. Now we present the queries expressed as CQs and SPARQL queries, valid syntaxes for the reasoner we used (Quest).

Note that Quest query syntax is aware of namespaces declared in a SPARQL preamble. In all these queries the following preamble is attached

PREFIX : <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

Original queries from LUBM:

SELECT ?x WHERE { ?x a :GraduateStudent. ?x :takesCourse <http://www.Department0.University0.edu/GraduateCourse0> .}

2. SELECT ?x ?y WHERE { ?y a :University. ?z a :Department. ?x a :GraduateStudent. ?x :memberOf ?z. ?z :subOrganizationOf ?y. ?x :undergraduateDegreeFrom ?y.} 3. SELECT ?x WHERE { ?x a :Publication. ?x :publicationAuthor <http://www.Department0.University0.edu/AssistantProfessor0> . } 4. SELECT ?x ?y1 ?y2 ?y3 WHERE {?x a :Professor. ?x :worksFor <http://www.Department0.University0.edu>. } 5. SELECT ?x WHERE { ?x a :Person. ?x :memberOf <http://www.Department0.University0.edu> . } 6. SELECT ?x WHERE { ?x a :Student . } 7. SELECT ?x ?y WHERE { ?x a :Student. ?y a :Course. <http://www.Department0.University0.edu/AssociateProfessor0> :teacherOf ?y. ?x :takesCourse ?y } 8. SELECT ?x ?y ?z WHERE { ?x a :Student. ?y a :Department. ?x :memberOf ?y. ?y :subOrganizationOf <http://www.University0.edu> . ?x :emailAddress ?z . } 9. SELECT ?x ?y ?z WHERE { ?x a :Student. ?y a :Faculty . ?z a :Course. ?x :advisor ?y. ?x :takesCourse ?z. ?y :teacherOf ?z . } 10. SELECT ?x WHERE { ?x a :Student . ?x :takesCourse <http://www.Department0.University0.edu/GraduateCourse0> } 11. SELECT ?x WHERE { ?x a :ResearchGroup . ?x :subOrganizationOf <http://www.University0.edu> } 12. SELECT ?x ?y WHERE {?x a :Chair . ?y a :Department . ?x :worksFor ?y . ?y :subOrganizationOf <http://www.University0.edu> } 13. SELECT ?x WHERE {?x a :Person . <http://www.University0.edu> :hasAlumnus ?x} 14. SELECT ?x WHERE {?x a :UndergraduateStudent}

We also used the following queries from papers about reformulation techniques (tagged U-X). These queries are more general and less selective. They where created by the authors of papers about reformulations from 2009 and 2010 to show the improvement of their techniques. The difficulty of these queries mainly resides in reasoning with respect to qualified existentials. The queries:

U-1. SELECT ?x WHERE { ?x :worksFor ?y. ?y :affiliatedOrganizationOf ?z}

2. U-2. SELECT ?x ?y WHERE { ?x a :Person. ?x :teacherOf ?y . ?y a :Course } 3. U-3. SELECT ?x ?y ?z WHERE { ?x a :Student. ?x :advisor ?y . ?y a :Faculty. ?x :takesCourse ?z. ?y :teacherOf ?z. ?z a :Course } 4. U-4. SELECT ?x ?y WHERE { ?x a :Person. ?x :worksFor ?y. ?y a :Organization } 5. U-5. SELECT ?x WHERE { ?x a :Person. ?x :worksFor ?y. ?y a :University. ?y :hasAlumnus ?x }

Note: Query 3 had a typo w.r.t. to the vocabulary of LUBM. Instead of :Faculty, the query used :FacultyStaff, here this has been fixed.

The reasoner

The reasoner used for this experimentation is Quest. Quest is a new reasoner for Ontology Based Data Access (OBDA) based on query reformulation and tuned for handling very large volumes of data. Quest is currently under heavy development and will be released as a open-source project in Summer 2011.

Optimizations included in Quest:

Selective unification: We avoid the unification of arbitrary atoms. We only unify two atoms R(x,y), R(m,z) if there exists an assertion B ISA exists R that could be applied to the result of the unification.
Removal of redundant atoms: at any moment we avoid redundant atoms in the body of a query. An atom is redundant if it can be satisfied by the binding of another atom. This is a CQC based optimization.
Syntactic query containment. We remove all redundant queries w.r.t. query containment based on the String of the atoms in the body of the query. This is a special, trivial, case of general Conjunctive Query Containment (CQC).
General CQC. We remove all redundant queries with respect to (CQC).

Query statistics

The following summarizes the data resulting from this experiment.

Columns descriptions: Each column contains the metrics for a single query identified by the value of the first row.

Row descriptions:

 '''r<sub>1</sub> and r'<sub>1</sub>''':: 
   Number of UCQs produced by the reasoner quest using the original TBox and the optimized Tbox w.r.t. dependencies induced by the Semantic Index.
 '''r<sub>2</sub> and r'<sub>2</sub>''':: 
   Number of UCQs produced by Quest after optimizing the initial reformulations with respect to conjunctive query containment.
 '''t''' and '''t' ''':: 
   Time in milliseconds required for Quest to produce the optimal reformulation with respect to homomorphism-based query containment.
 '''r'<sub>3</sub>''':: 
   Number of UCQs resulting from cleaning the reformulations produced with the optimal TBox by means of the set of dependencies Sigma induced by the semantic index. This step was done manually since Quest doesn't implement CQC w.r.t. to dependencies at the moment.

																			Queries
Metric	1	2	3	4	5	6	7	8	9	10	11	12	13	14	U-1	U-2	U-3	U-4	U-5	=
r₁	3	108	17	63	423	3	72	54	936	9	3	14	235	1	7	376	858	9212	9870
r'₁	1	10	1	1	130	1	1	5	1	1	1	1	25	1	1	26	1	208	52
r₂	1	4	1	18	4	3	3	12	3	3	2	2	5	1	2	1	3	2	10
r'₂	1	4	1	1	21	1	1	2	1	1	1	1	1	1	1	1	1	140	1
t,,	340	195	10	53	187	4	30	37	341	4	4	6	35	7	6	80	215	2567	3646
t'	370	21	4	3	105	3	11	9	4	3	5	3	10	2	3	14	5	639	22
r'₃	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1

The following pages contain the raw data used to obtain the following files:

[attachment:final-log-T.log] The full log file produced by Quest for the production of the reformulations of all queries using the original TBox.

2. [attachment:final-log-T-sigma.log] The full log produced by Quest for the production of the reformulations and SQL queries using the optimized TBox and the mappings generated by the semantic index.

The DAG, Indexes and Ranges of the Semantic Index for LUBM

The following are the images of the DAGs used for computing the indexes and ranges for the classes and roles in LUBM. The images also contain the values of the indexes and ranges assigned to them in this run. Note that you might get other values as this can depend on the order in which the axioms are read. However, the number of ranges will be roughly the same always.

NOTE ABOUT ORDER: Note that the order of the edges and nodes in this plot does not correspond to the actual order of the DAG. This is a flaw of our plotting tool. However, the plot is still illustrative of the result.

The Class hierarchy:

The Role hierarchy:

Reproducing the test

First download the ISWC package from this url. To unpack the package do: unzip quest-iswc.zip

To produce the rewriting for all 19 queries in classic reformulation do:

java -Xmx1024M -cp quest-iswc.jar:. it.unibz.krdb.obda.LUBM.LUBMExecutionHelper rewrite classic univ-bench-dllitea.owl univ-bench-dllitea.obda

NOTE TO WINDOWS USERS: If you use windows use you must use the semicolon as classpath separator. That is, you will need to modify the commands shown here to use -cp quest-iswc.jar;. instead of -cp quest-iswc.jar:..

This will read the ontology from univ-bench-dllitea.owl and setup Quest. Once this is done, it will read all the SPARQL queries in univ-bench.dllitea.obda (the 19 queries of this test) and request the reformulation for each of them. You will be able to see the log printed during this process. There you will find all the relevant information.

To request Quest to produce the Log for the reformulations using semantic index mode call:

java -Xmx1024M -cp quest-iswc.jar:. it.unibz.krdb.obda.LUBM.LUBMExecutionHelper rewrite semindex univ-bench-dllitea.owl univ-bench-dllitea.obda

This will produce a log with all the relevant information as before.

If you want the SQL queries for the optimal rewritings use:

java -Xmx1024M -cp quest-iswc.jar:. it.unibz.krdb.obda.LUBM.LUBMExecutionHelper unfold semindex univ-bench-dllitea.owl univ-bench-dllitea-semindex-optimal.obda

If you want to generate the DAG figures on your own do: java -Xmx1024M -cp quest-iswc.jar:. it.unibz.krdb.obda.LUBM.LUBMExecutionHelper rewrite semindex univ-bench-dllitea.owl univ-bench-dllitea.obda graph

However, this requires the file /usr/bin/dot in your system.

How to read the LOG files

Quest produces a log file in which you can see all the debugging messages for the initialization of the reasoner as well as the execution of the reformulation/unfolding for each of the queries. Now we describe how to read these last messages.

The debugging information for one query starts with the header:

11:30:25.296 INFO  - ##################  Rewriting query: query-2

This line corresponds to the query id.

And ends with the lines:

11:30:25.317 DEBUG - Final size of the reformulation: 4
11:30:25.317 INFO  - Time elapsed for reformulation: 0.0s
11:30:25.317 INFO  - Total time for rewriting: 21

The first line corresponds to the value r2. The last line corresponds to the value t.

During the reformulation of the query Quest distinguishes from a Main loop stage, and a CQC optimization stage. The result of the main loop is indicated with the lines:

11:30:25.311 DEBUG - Main loop ended. Queries produced: 10

This value corresponds to the column r1.

In the logs you can also see the input queries as well as the reformulated queries. Note that the name of the predicates in these queries are the full URIs of the concepts or roles. If you need to read these queries, it is better if you process the logs replacing the base URI of these predicates for an empty string. For example, the output query reformulation for query 4 is originally

q(x, y) :- http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#hasAlumnus(y, _), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#Department(z), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#GraduateStudent(x), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#member(z, x), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#subOrganizationOf(z, y), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#undergraduateDegreeFrom(x, y)
q(x, y) :- http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#hasAlumnus(y, _), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#Department(z), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#GraduateStudent(x), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#memberOf(x, z), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#subOrganizationOf(z, y), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#undergraduateDegreeFrom(x, y)
q(x, y) :- http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#University(y), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#Department(z), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#GraduateStudent(x), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#memberOf(x, z), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#subOrganizationOf(z, y), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#undergraduateDegreeFrom(x, y)
q(x, y) :- http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#University(y), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#Department(z), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#GraduateStudent(x), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#member(z, x), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#subOrganizationOf(z, y), http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#undergraduateDegreeFrom(x, y)

However, we can replace the URI http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl# for the empty string and it will become as follows:

q(x, y) :- hasAlumnus(y, _), Department(z), GraduateStudent(x), member(z, x), subOrganizationOf(z, y), undergraduateDegreeFrom(x, y)
q(x, y) :- hasAlumnus(y, _), Department(z), GraduateStudent(x), memberOf(x, z), subOrganizationOf(z, y), undergraduateDegreeFrom(x, y)
q(x, y) :- University(y), Department(z), GraduateStudent(x), memberOf(x, z), subOrganizationOf(z, y), undergraduateDegreeFrom(x, y)
q(x, y) :- University(y), Department(z), GraduateStudent(x), member(z, x), subOrganizationOf(z, y), undergraduateDegreeFrom(x, y)

Authors

Sergejs Pugacs
Mariano Rodriguez Muro

SemanticindexLubm

Table of Contents