Skip to content

ObdalibQuestMappinglanguage

mislusnys edited this page Dec 9, 2013 · 4 revisions

Table of Contents

Sources and Mappings (OBDA)

In the Virtual mode data is not given by means of individual assertions (e.g., :individual-mariano rdf:type :Person), but by means of a pair of Data source and a set of Mappings that tell Quest how to use the data from the data source to provide answers to the queries posed to the system.

Data source

Data sources in Quest are relational databases accessed by means of JDBC connections. At the moment, Quest supports only 1 data source definition. Support for multiple sources will be added in the future. In each release of Quest we will add support for sources that do not provide these constructs. The specific roadmap for RDBMS support is as follows:

  • MySQL. available
  • PostgreSQL. available
  • H2. available
  • DB2. available
  • Oracle. available
  • MSSQL. Only if requested by users.

Using DB2

Using DB2 requires the DB2 Universal Driver, which can be downloaded from here. The following is an example of the connection parameters for this driver:

JDBC URL: jdbc:db2://192.168.8.130:50000/SAMPLE:currentSchema=DB2INST1;
Username: user
Password: password
Driver: com.ibm.db2.jcc.DB2Driver

Using Oracle

We have tested Oracle using the thin driver. The following is an example of the connection parameters for this driver:

JDBC URL: jdbc:oracle:thin:@192.168.8.131:1521
Username: user
Password: password
Driver: oracle.jdbc.driver.OracleDriver

Mappings in plain english

Mappings in Quest are a pair of an SQL query over the data source (i.e., the source query) and a query over the ontology (i.e., the target query).

The source query can be any SQL query. All attributes in the query signature must have a proper name, e.g., do not use the symbol "*" and if you use expressions, name these expressions (e.g., SELECT x + y AS column1 FROM ...).

The target query is a conjunctive query with no constants whose arity has the same arity of the source query, and in which all variables appear in the head. More over, all atoms in the query's body are over Classes and Properties of the ontology and the atom in the head of the query is a predicate that is not mentioned in the ontology.

Example. Assume we have two tables in the database, i.e., employee with columns uri, name and project with columns personuri, projecturi. The table employee holds information about employee. The table project holds information about the projects in which employees work. If you ontology contained a Class Person a Object Property works-for and a Data property has-name, the following would be four valid mappings.

m1: SELECT uri, name FROM employee 
        → q($uri, $name) :- :Person($uri)
m2: SELECT uri, name FROM employee 
        → q($uri, $name) :- :has-name($uri,$name)
m3: SELECT uri, name FROM employee 
        → q($uri, $name) :- :Person($uri), :has-name($uri,$name)
m4: SELECT t1.uri as uri1, t2.projecturi as uri2  
            FROM employee t1 JOIN project t2 ON t1.uri = t2.personuri
        → q($uri1, $uri2) :- :Person($uri1), :works-for($uri1,$uri2)

Examples of invalid mappings would be:

m5: SELECT * FROM employee → q($uri, $name) :- :Person($uri), :has-name($uri,$name)
m6: SELECT uri FROM employee → q($uri,$name) :- :Person($uri), :has-name($uri,$name)
m7: SELECT uri FROM employee → q($uri) :- :Person($uri), :has-name($uri,$name)
  • m5 is invalid because the source query uses the symbol *
  • m6 is invalid because the arity of the target query is bigger than the arity of the source query (i.e., 2 > 1, the number of columns in is bigger than the number of columns in SELECT uri FROM employee).
  • m7 is invalid because the body of the target query contains variable, i.e., $name, that does not appear in the head of the query, i.e., q($uri).

Meaning of mappings

In the following we that the default prefix : is defined as <http://ontology.org/example#></http://ontology.org/example#>.

The mappings of an OBDA model define virtual assertions that are used for reasoning, e.g., query answering. These virtual assertions are defined by the answers to the source queries in the mappings and the body of the target queries in the mappings. Unary atoms define virtual Class assertions (e.g., &#58;indvidual1 rdf&#58;type &#58;Person), and binary atoms define virtual Object or Data Property assertions (e.g., &#58;individual1 &#58;has&#45;father &#58;individual2). We will use an example to clarify this idea. Asume that the data for person is as follows:

employee
uri name
person-1
Peter
person-2
Paul
person-3
Mary

and

project
personuri projecturi
person-1
project-10
person-2
project-12

Then the mapping m3 and m4 define the following assertions

:person-1 rdf:type :Person. 
:person-2 rdf:type :Person. 
:person-3 rdf:type :Person. 
:person-1 :has-name "Peter"^^rdfs:Literal.
:person-2 :has-name "Paul"^^rdfs:Literal.
:person-3 :has-name "Mary"^^rdfs:Literal.
:person-1 rdf:type :Person. 
:person-2 rdf:type :Person. 
:person-1 :works-for :project-10.
:person-2 :works-for :project-12.

The first 6 triples are defined by m3, the last 4 triples are defined by m4.

Note that these are virtual triples, they are never really generated and we present them here only to clarify the meaning of the mappings.

URI functions

One of the most powerful features of mappings in Quest is the ability to define mappings for databases that do not hold object URI's. If your database has only data values and not URI, you can indicate Quest that it should construct a URI out of the values in your database using URI functions in the body of your target queries. A URI function is a term of the form

BASEURI($variable1,...,$variablen) 
We can use base URI function in any position where URIs can be used, i.e., in Class atoms, as the first and second terms of Object Property atoms, and as first term of Data Property atoms. Given an assignment &quot;c1&quot;,...,&quot;cn&quot; for $variable1,...,$variablen, the URI constructed by a URI function is exactly:


To clarify the meaning of URI function we will modify the previous example. Assume now that our tables are employee, with column id and name, and project with columns personid, projectid. Let the data be as follows:

employee
id name
1 Peter
2 Paul
3 Mary

and

project
personid projectid
1 10
2 2

Then we can use the following mappings with URI functions to define exactly the same virtual triples as before.

m8&#58; SELECT id, name FROM employee 
        → q($id, $name) &#58;&#45; &#58;Person(&#58;person($id)), &#58;has&#45;name(&#58;person($id),$name)
m9&#58; SELECT t1.id as id1, t2.projectid as id2  
        FROM employee t1 JOIN project t2 ON t1.id &#61; t2.personid
        → q($id1, $id2) &#58;&#45; &#58;Person(&#58;person($id1)), &#58;works&#45;for(&#58;person($id1),&#58;project($id2))

IMPORTANT: If the form of the URI's produced by Quest is not good for you, please contact the authors about this and we will make sure that in future releases of Quest we will enable more flexible ways of constructing URIs.

Creating new data with mappings

In some cases, it might be useful to create data by means of the mappings. For example, you might have a data property &#58;fullName and, in your database, three columns employeeuri firstname and lastname. Obviously, the data of this columns can be used to create &#58;fullName triples, but we need to create data on the fly with SQL. In this case, we can use the string concatenation operation to create a full name out of the first name and last name. For example:

→ q($personuri, $fullname) &#58;&#45; &#58;fullName($personuri, $fullname)

Since we do not restrict the SQL queries in the mappings, you can use any function that you see fit when creating this kind of mappings.

Performance note: Introducing data on-the-fly is ok for performance if no JOIN operations or WHERE clauses are applied to those values. Otherwise, the performance of query answering will degrade, severely if the volumes of data are high. If you absolutely must do this, then you should define FUNCTION INDEXES over your tables. Function indexes are indexes that are based on the result of an operation, most modern DBMS support these kind of indexes (sometimes with different names). The index you create must correspond to the operation that you introduced in your mapping.

Restrictions

  • It is not possible to have mappings with URI functions and mappings without them in the same OBDA model. That is, either your database holds object URIs and none of your mappings use URI functions, or your database holds no object URIs and all your mappings use URI functions. We might relax this restriction in the future on user request.
  • All Classes and Properties mentioned in the target queries must appear in the Ontology.

Typing

At the moment, it is not possible to give fine grained types for the Literals that appear in the virtual triples and every Literal is assumed to be of type rdfs:Literal. In the near future we will extend typing support in the mappings to allow for more types, as well as for language tags.

Data Integration and Non-RDBMS sources

This responds to the question: Can I define multiple data sources in Quest?. It is not possible to do this. However, if you want to define multiple sources to be able to do data integration you can still do it using a database federation tool, e.g., Oracle's db_links. There is some tools that will even allow you to present non-RDBMS sources as if they were normal SQL databases.

What you should do in this case is:

  • Define a virtual database schema with links to the tables of all the databases you want to integrate.
  • Define only one data source in your OBDA model pointing to the virtual database.
  • Create mappings over the virtual database as if it was a normal database.
  • You are done, query as usual.
Performance of this setup depends on the capabilities of the data federation tool, the volumes of data involved in the queries and the performance of the real databases.

Performance considerations

UNDER CONSTRUCTION.

Here we will add information about considerations to take while creating the mappings for a OBDA model to guarantee performance. Things we will discuss include:

  • NEVER use DISTINCT in a mapping.
  • NEVER use ORDER BY in a mapping.
  • Dont use UNION or UNION ALL in mappings, instead, add multiple mappings.
  • 'Tough' SQL queries
  • Avoiding the generation of duplicate virtual data, i.e., do not add mapping for A/R with the same SQL or with SQL that is contained in the answers to the SQL of an existing mapping for A/R.
  • Avoiding the use of nesting in SQL queries in mappings.
  • Avoiding the generation of values on the fly with SQL in the mappings, e.g., SELECT id, x + y as myvalue FROM t &#45;&gt; q(?myvalue) &#58;&#45; &#58;has&#45;age(&#58;person($id), $myvalue)
  • Use of materialized views for performance
  • Indexes for performance
Clone this wiki locally