Skip to content
akinanop edited this page Jun 22, 2015 · 11 revisions

OBS Project Report

Nika Pona
23.04.2015

###Aim of the project

In Ontop system one queries the data using SPARQL query language. SPARQL queries are then optimized through Datalog rewriting and then translated into simpler SQL queries. Currently many SPARQL functions are not supported. The goal of my project was to implement some missing functions and create tests for them. The implementation involved extending classes for Datalog translation and SQL generation. The latter process is quite complex since functions on strings and hash functions are implemented differently in various SQL dialects. I tested the implemented functions on an H2 and PostgreSQL database with simple Books ontology and corresponding mappings. To see the changes I made click here or go to https://github.com/akinanop/ontop.

###Contribution

I implemented SPARQL functions on Strings, Numerics, Date and Time, as well as Hash functions, which are called in SparqlAlgebraToDatalogTranslator. I made changes in the following classes:

OBDAVocabulary.java | OBDADataFactory.java| OBDADataFactoryImpl.java| SparqAlgebraToDatalogTranslator.java | SQLGenerator.java

Since functions on Strings and Hash functions are treated differently in some dialects, I had to change the dialect adapter classes as well:

SQLDialectAdapter.java | SQL99DialectAdapter.java | H2SQlDialectAdapter.java | PostgreSQLDialectAdapter.java | MySQLDialectAdapter.java | TeiidSQLDialectAdapter.java | DB2SQLDialectAdapter.java | SQLServerDialectAdapter.java

One can see the list of available SPARQL functions and their description here. Below I list the functions I implemented and provide a table summarizing the differences in their implementation within various SQL dialects.

####Functions on strings

xsd:integer  STRLEN(string literal str)
string literal  SUBSTR(string literal source, xsd:integer startingLoc)
string literal  SUBSTR(string literal source, xsd:integer startingLoc, xsd:integer length)
string literal  UCASE(string literal str)
string literal  LCASE(string literal str)
literal  STRBEFORE(string literal arg1, string literal arg2)
literal  STRAFTER(string literal arg1, string literal arg2)
simple literal ENCODE_FOR_URI(string literal ltrl)

####Boolean functions on strings

xsd:boolean  STRENDS(string literal arg1, string literal arg2)
xsd:boolean  STRSTARTS(string literal arg1, string literal arg2)
xsd:boolean  CONTAINS(string literal arg1, string literal arg2)

####Functions on RDF tems

simple literal  STRUUID()

####Dialect Adapters

In columns corresponding to SQL dialects I list functions I used in implementation that differ from the default behavior (SQL99). To see the actual implementation consult the corresponding SQL Adapter class.

SPARQL SQL99/SQLServer MySQL H2 PostgreSQL
STRLEN() LENGTH() CHAR_LENGTH() SQL99 CHAR_LENGTH()
SUBSTR() SUBSTR() SQL99 SQL99 SUBSTRING(... FROM ... FOR)
UCASE(), LCASE() UPPER(), LOWER() SQL99 SQL99 SQL99
STRBEFORE() LEFT(),CHARINDEX() INSTR() SQL99 POSITION()
STRAFTER() REGEX_REPLACE() N/A SQL99 SQL99
ENCODE_FOR_URI() REPLACE() SQL99 SQL99 SQL99
UUID() UUID() N/A RANDOM_UUID() SQL99
STRENDS() CHARINDEX(),LENGTH() CHAR_LENGTH(),INSTR() SQL99 CHAR_LENGTH(), POSITION()
STRSTARTS() RIGHT(),LENGTH() CHAR_LENGTH(),RIGHT() SQL99 CHAR_LENGTH()
CONTAINS() CHARINDEX() INSTR() SQL99 POSITION()

SQL Dialect Reference wiki page contains an (incomplete) comparison table of functions on strings for various SQL dialects that can be used for further development of the system.

####Numeric functions

Numeric functions are uniform across SQL dialects, thus their implementation was quite straightforward and didn't require the usage of Dialect Adapter.

numeric  ABS (numeric term)
numeric  ROUND (numeric term)
numeric  CEIL (numeric term)
numeric  FLOOR (numeric term)
xsd:double  RAND ( )

####Hash functions

Most dialects support just one hash algorithm, so the default behavior is to throw an error.

simple literal  MD5 (xsd:string arg)
simple literal  SHA1 (xsd:string arg)
simple literal  SHA256 (xsd:string arg)
simple literal  SHA512 (xsd:string arg)
DB Built-in hash functions:
H2 SHA256
PostgreSQL MD5
SQL Server MD5, SHA1, SHA512, SHA256
Oracle MD5, SHA1
MySQL SHA1, MD5
Teiid None
HSQLDB None
DB2 None

####Functions on Date and Time

Functions on Date and Time are mostly uniform across SQL dialects with the exception for timezone extraction.

DB Built-in functions on date and time: Syntax (timezone)
SQL99 all EXTRACT(TIMEZONE FROM ... )
H2 all except Timezone N/A
Postgress all EXTRACT(TIMEZONE_ABBR FROM ...)
SQL Server all DATEPART(..., ...)
Oracle all EXTRACT(TZoffset FROM ...)
MySQL all except timezone N/A
Teiid all except timezone N/A
HSQLDB all EXTRACT(TIMEZONE_HOUR FROM ... )
DB2 all EXTRACT(TIMEZONE_HOUR FROM ... )

###Test cases

I extended Books ontology and mappings to create new test cases for the implemented functions on H2 database.

This is the database I used:

CREATE TABLE books (
    id int NOT NULL,
    title character varying(100),
    price int,
    discount decimal,
    description character varying(100),
    lang character varying(100),
    publication_date date
);


INSERT INTO books VALUES (1, 'SPARQL Tutorial', 42.50, 0.2, 'good', 'en', '2014-06-05' );
INSERT INTO books VALUES (2, 'The Semantic Web', 23, 0.25, 'bad', 'en', '2011-12-08' );
INSERT INTO books VALUES (3, 'Crime and Punishment', 33.50, 0.2, 'good', 'en', '1866-07-01' );
INSERT INTO books VALUES (4, 'The Logic Book: Introduction, Second Edition', 10, 0.15, 'good', 'en', '1967-11-05' );

ALTER TABLE books
    ADD CONSTRAINT books_pkey PRIMARY KEY (id);

You can see the tests here: BindTestWithFunctions. Moreover, I made the same tests for PostgreSQL database to see how the dialect adapter works: BindTestWithFunctionsPostgreSQL. The tests show that the functions work as expected for H2 and PostgreSQL. ###Summary of the outcome

  • Functions on Strings, Numerics, Date and Time, as well as Hash functions specified are implemented and tested for H2 and PostgreSQL dialect and ready to be used;
  • Dialect adapaters for Date/Time and Hash functions are completed and ready to be tested for all dialects;
  • Dialect adapters for functions on Strings are ready to be tested for SQLServer and MySQL dialects;
Clone this wiki locally