Skip to content
Benjamin Cogrel edited this page Apr 23, 2015 · 4 revisions

Table of Contents

Mapping, Ontology, and Query Design Tips

Mapping Design Tips

  • Keep the SQL simple:
    • Use the supported SQL language to avoid sub-queries: See SQL Issues
    • High selectivity:
    • Do join over key attributes. If your database don't have them, add them manually. See this
    • Use tables with Primary and Foreign Key:
    • Avoid redundant mappings. Try to use a minimal number of mappings.
    • Do not use UNION in SQL, use multiple mappings instead.
    • Use consistent naming: either use always full table name (e.g. schema.table) or always use the shortcut (e.g. table). Do not mix both.
    • Avoid unnecessary joins in the source query. They are the main performance issue, since they can slow down queries up to 100 times.
    • Avoid complex queries, SELECT-PROJECT-JOIN queries work best.
  • Avoid several mappings for the same concept/property. Try to use UNION queries in a single mapping.
  • Use the same template for the same concept/property: The following scenario is discouraged:
 :Well-1-{x} rdf:type :A;   <-  SELECT x FROM sometable
 :Well-2-{x} rdf:type :A;   <-  SELECT x FROM sometable2
  • If you need data types, add them in the mapping and not in the ontology.
  • Try to build URIs from attributes that do not contain spaces or special characters such as @. In addition, turn off the SQL_GENERATE_REPLACE feature.
  • Use primary keys as part of the URI templates if possible.
  • You can force the creation of a sub-view by adding a DISTINCT. It is something cheaper to join with a SELECT DISTINCT sub-view than with a highly redundant multi-set.



Ontology Design Tips

  • Avoid declaring properties domain and range unless you have good foreign keys on your tables.
  • Try to avoid long concept/properties hierarchies.
  • Try to avoid imports.

SPARQL Query Design Tips

  • Be specific when writing the query. The more specific concepts and properties you use, the faster we will perform.
  • Try to avoid OPTIONALS. In the SQL translation, OPTIONALS are translated into LEFTJOIN.
  • Try not to use variables in the property/class position.

Database Tips

  • Define primary and foreign keys in your database.
  • Memory: set your database to use around 2/3 of available memory.
  • Disks: in most cases they are the bottlenecks of database performance. If possible put your data on a faster disk.
  • Indexes and Keys: create indexes in your SQL tables for columns that are used in JOINs and or WHERE clauses. Quest uses optimizations based mainly on primary keys. Indexes speed queries a lot. Use indexes where possible (when your data is more or less static, there are few inserts, or space is not an issue).

Benchmarking Tips

  • System warmup: for the best performance allow the system to "warm up". First queries are always slower, because of connections to database, initializations, etc...
In particular, when you have a set of queries on which you would like to obtain maximal performance, here are the things that you can do:
  • analyze your queries, check which tables they use, see if the database uses indexes on the selected columns. This you can do by analyzing the query plan. Copy the generated SQL query, prefix it with the keyword "EXPLAIN", and execute it over an SQL querying interface over the database. It will give you information about which indexes and keys it uses, and from here you can get a hint if you need to create a new, possibly combined index on the table columns.
Example: having a database table about train logistics event (iLog scenario)
ituevent ID ITUEventNumber eventTime terminalID trainID ITUID ITUEventTypeID

and the query:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX : <http://www.semanticweb.org/ontologies/2011/10/iLogOntology.owl#>
SELECT DISTINCT ?terminal  ?itu
WHERE {
?event a :ITUEvent;
 :takesPlaceAt ?terminal;
 :happensToITU ?itu;
 :hasEventTime ?time.
FILTER(?time < "2011-11-12T00:00:00.000Z"^^xsd:dateTime)}
ORDER BY ?terminal

Although terminalID, trainID, ITUID and ITUEventTypeID are foreign keys, and therefore are indexed automatically by the database, the use of these indexes does not result in the best performance. A new, combined index is added on the columns: eventTime, terminalID, ITUID.

In order to analyze the query plan, we execute the explain command on the resulting SQL string:

EXPLAIN
SELECT *
FROM (
SELECT DISTINCT 
   1 AS "terminalQuestType", NULL AS "terminalLang", CAST(CONCAT('http://www.semanticweb.org/ontologies/2011/10/iLogOntology.owl#terminal/',(CAST(QVIEW1.<tt>terminalID</tt> AS CHAR))) AS CHAR) AS <tt>terminal</tt>, 
   1 AS "ituQuestType", NULL AS "ituLang", CAST(CONCAT('http://www.semanticweb.org/ontologies/2011/10/iLogOntology.owl#itu/',(CAST(QVIEW1.<tt>ITUID</tt> AS CHAR))) AS CHAR) AS <tt>itu</tt>
 FROM 
<tt>ituevent</tt> QVIEW1
WHERE 
QVIEW1.<tt>ID</tt> IS NOT NULL AND
QVIEW1.<tt>terminalID</tt> IS NOT NULL AND
QVIEW1.<tt>ITUID</tt> IS NOT NULL AND
QVIEW1.<tt>eventTime</tt> IS NOT NULL AND
(QVIEW1.<tt>eventTime</tt> < '2011-11-12 00:00:00+00:00')
) SUB_QVIEW
ORDER BY SUB_QVIEW.<tt>terminal</tt>
  • analyze the generated UCQ and SQL for unneccessary joins. First, the length of the query can make you suspicious that there are unnecessary fragments. Analyze your mappings, to see that for each class, object and data property you have only one mapping as a source of materialization, if possible. Be sure not to use joins inside the mappings.
  • keep a minimal number of mappings
Example: having three tables
Book Author Editor
bid aid eid
title name name
authorID birth_date
edition death_date
editorID nationality

The table Book contains the foreign keys authorID and editorID linking to other tables Author and Editor. When constructing the mappings, the URI-s contain the primary keys, i.e. the id-s of each table. One can make for the table Books two mappings as follows:

mappingId       Books
target		<"&:;book/{$bid}/"> a :Book; :title $title . 
source		select bid, title from Book

mappingId       Book has Author
target		<"&:;book/{$bid}/"> :author <"&:;author/{$authorid}/">
source		select bid, authorID from Book

OR one could compact the two mappings into one as follows:

mappingId       Books
target		<"&:;book/{$bid}/"> a :Book; :title $title; :author <"&:;author/{$authorid}/"> . 
source		select bid, title, authorID from Book
  • when your database is properly set up with primary keys, foreign keys and indexes, and you have the clearest set of mappings, you can do the following: increase database memory buffer to almost maximum, warm up (run the query several times) before you record the performance values.
  • use "DISTINCT" and "ORDER BY" in the SPARQL query only when necessary. It slows down performance significantly with large result sets.
Clone this wiki locally