ObdalibQuestPerformance

mislusnys edited this page Dec 9, 2013 · 5 revisions

Performance Notes

Large TBoxes (e.g., SNOMED CT and such) (VIRTUAL or CLASSIC)

Version 1.6 of Quest has performance problems with big TBoxes, and it sometimes has problems with the official syntax. We currently use OWLAPI 2.2 for parsing; this is an old version that is not compatible with the current standard. Moreover, it requires large amounts of memory to do the parsing. Medium-sized ontologies (a few thousand axioms) should load fine, but you will see high memory consumption.

We are now preparing version 1.7 and improving this aspect of the system. We are using SNOMED CT as a benchmark for the improvements in the loading mechanism. The release should come on time, in the first quarter of 2012.

In classic OBDA mode

At the moment, Quest's performance excels in classic mode using Semantic Index-based reasoning. In this mode you will be able to deal with very large ontologies (e.g., deep and wide hierarchies) even in the presence of large amounts of data. Do not use other modes of classic reasoning unless you are experimenting with Quest. We have tested this mode with ontologies containing more than 2 million subClass assertions and more than 1 billion data triples (ABox assertions). We do not expect you to have performance issues in classic mode if you stay with the default configuration. There are some "edge" cases that could trigger bad performance, but we don't expect to see these in the wild. If you do find cases with bad performance, please contact us; we will gladly study the case.

In virtual OBDA mode

In virtual mode, Quest will perform very well in many cases. However, the system lacks many optimizations, which puts it at a considerable disadvantage w.r.t. the performance of classic mode. Performance problems will arise under the following conditions:

  • A lot of multiplicity in the mappings, e.g., many mappings for the same Classes or Properties.
  • Ontologies with deep hierarchies.
  • Redundancy in the mappings
This will soon change when we implement the T-mappings technique, the equivalent of the semantic index technique for virtual OBDA.
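
To give a feel for why mapping multiplicity hurts, here is a minimal sketch. It assumes that unfolding a query picks one mapping per query atom, so the number of SQL branches is the product of the mapping counts. The mapping table, the table names, and the `unfold` helper are hypothetical and are not part of Quest.

```python
from itertools import product

# Hypothetical mappings: each class or property has several SQL sources.
mappings = {
    "Person": [
        "SELECT id FROM employees",
        "SELECT id FROM customers",
        "SELECT id FROM students",
    ],
    "hasAge": [
        "SELECT id, age FROM ages_a",
        "SELECT id, age FROM ages_b",
    ],
    "livesIn": [
        "SELECT id, city FROM addr_a",
        "SELECT id, city FROM addr_b",
    ],
}


def unfold(query_atoms, mappings):
    """Return every combination of one mapping per query atom."""
    return list(product(*(mappings[atom] for atom in query_atoms)))


branches = unfold(["Person", "hasAge", "livesIn"], mappings)
print(len(branches))  # 3 * 2 * 2 = 12
```

Each extra atom multiplies the branch count again, so the count grows exponentially with the number of atoms in the query.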

We already have the theory to optimize the system in virtual mode [1]. The key technique is called T-Mappings for OBDA. It is basically a query-preserving transformation of the system's mappings that allows us to simplify the rewriting process and the SQL generated by the system. Once T-Mappings are implemented in Quest, the system will get a dramatic boost in performance in the virtual OBDA setting, similar to the boost you get when you use "Semantic Index" instead of "direct" modes. No more exponential SQL queries! T-Mappings are going to be available in version 1.7, scheduled for the first quarter of 2012.
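
As a rough illustration of the idea only (not Quest's implementation), the sketch below saturates a set of class mappings with a class hierarchy: every class also receives the SQL sources of all its subclasses, so a query over a class no longer needs to be rewritten against the hierarchy. The `t_mappings` function, the class names, and the SQL strings are hypothetical. The optimization that removes redundant mappings is not shown here.

```python
from collections import defaultdict


def t_mappings(subclass_of, mappings):
    """Propagate each class's SQL sources to all of its ancestors.

    subclass_of: dict mapping a class to the set of its direct superclasses.
    mappings:    dict mapping a class to the set of its SQL sources.
    """
    result = defaultdict(set)
    for cls, sqls in mappings.items():
        stack, seen = [cls], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            result[current] |= sqls  # current is cls itself or an ancestor
            stack.extend(subclass_of.get(current, ()))
    return dict(result)


# Hypothetical hierarchy: PhDStudent is a Student, Student is a Person.
hierarchy = {"PhDStudent": {"Student"}, "Student": {"Person"}}
original = {
    "Person": {"SELECT id FROM people"},
    "Student": {"SELECT id FROM students"},
    "PhDStudent": {"SELECT id FROM phd"},
}

saturated = t_mappings(hierarchy, original)
print(sorted(saturated["Person"]))
```

With the saturated mappings, a query for Person can be answered from the three sources listed for Person, without consulting the hierarchy at query time.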

[1] Mariano Rodriguez-Muro and Diego Calvanese. Dependencies: Making ontology based data access work in practice. In Proc. of the 5th Alberto Mendelzon Int. Workshop on Foundations of Data Management (AMW 2011), volume 749 of CEUR Electronic Workshop Proceedings, 2011.

Benchmarking. Please contact the authors if you are going to perform benchmarks with Quest to make sure you are using it in the best possible way. Benchmarking Quest in virtual mode is not advised at the moment since the methods implemented for virtual mode are outdated.

Virtual mode performance

Here we will add information about what to consider when creating the mappings for an OBDA model in order to guarantee performance. Things we will discuss include:

  • NEVER use DISTINCT in a mapping.
  • NEVER use ORDER BY in a mapping.
  • Don't use UNION or UNION ALL in mappings; instead, add multiple mappings.
  • "Tough" SQL queries
  • Avoiding the generation of duplicate virtual data, i.e., do not add a mapping for A/R whose SQL is the same as the SQL of an existing mapping for A/R, or whose answers are contained in the answers to the SQL of an existing mapping for A/R.
  • Avoiding the use of nesting in SQL queries in mappings.
  • Avoiding the generation of values on the fly with SQL in the mappings, e.g., SELECT id, x + y as myvalue FROM t -> q(?myvalue) :- :has-age(:person($id), $myvalue)
  • Use of materialized views for performance
  • Indexes for performance
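
The point about duplicate virtual data can be seen in a small sketch using Python's built-in `sqlite3` module. The `person` table, the two mappings, and the `virtual_instances` helper are hypothetical. The helper simply concatenates the answers of each mapping's SQL, which is an assumption about how contained mappings lead to duplicates, not a description of Quest's internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, kind TEXT);
    INSERT INTO person VALUES (1, 'student'), (2, 'student'), (3, 'staff');
    """
)

# Two hypothetical mappings for the same class.
mapping_all = "SELECT id FROM person"
# The answers of this mapping are contained in those of mapping_all.
mapping_students = "SELECT id FROM person WHERE kind = 'student'"


def virtual_instances(sources):
    """Collect the ids produced by every SQL source, keeping duplicates."""
    rows = []
    for sql in sources:
        rows.extend(row[0] for row in conn.execute(sql))
    return sorted(rows)


redundant = virtual_instances([mapping_all, mapping_students])
minimal = virtual_instances([mapping_all])
print(redundant)  # [1, 1, 2, 2, 3]
print(minimal)    # [1, 2, 3]
```

Dropping the contained mapping gives the same set of individuals without duplicates, so there is no need for DISTINCT in the mapping SQL.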