Ontop is a Virtual Knowledge Graph system. It exposes the content of arbitrary relational databases as knowledge graphs. These graphs are virtual, which means that data remains in the data sources instead of being moved to another database.
Ontop translates SPARQL queries expressed over the knowledge graphs into SQL queries executed by the relational data sources. It relies on R2RML mappings and can take advantage of lightweight ontologies.
The OntopSpark extension, developed by Chimera, enables Ontop to perform Ontology-Based Data Access (OBDA) on relational data using Apache Spark as a distributed query processing engine. This opens up new scenarios in which OBDA can be applied to Data Lakes.
The extension consists of 7 Java classes that manage the interaction with the database. The classes have been added to the ontop-rdb package and are the following:
- `SparkSQLDBMetadataProvider.java`: reads the database metadata by interacting with the Apache Spark ThriftServer through JDBC calls. The default schema and the metadata had to be retrieved manually, because the default calls are not supported by the JDBC driver (OntopSpark is developed to interact with HiveJDBC). We have also implemented `SparkSQLQuotedIDFactory.java`.
- `SparkSQLDBFunctionSymbolFactory.java`: manages the translation of SPARQL functions into SQL functions; it is the implementation of `AbstractSQLDBFunctionSymbolFactory`. The main issue here is the timestamp translation from SparkSQL to SPARQL and vice versa. We decided to translate the SparkSQL timestamp datatypes into the standard format “yyyy-MM-ddTHH:mm:ss.SSSxxx”, and implemented the denormalization in `SparkSQLTimestampDenormFunctionSymbol.java`. Another issue concerned `\` characters in the VKG formulations, which SparkSQL interprets as escape characters; this has been fixed in `SparkSQLEncodeURLorIRIFunctionSymbolImpl.java`.
- `SparkSQLDBTypeFactory.java`: defines the datatypes that Ontop can use when performing queries to build the internal VKG representation. We had to adapt the class to the SparkSQL datatypes.
- `SparkSQLSelectFromWhereSerializer.java`: implements some minor features to manage the SPARQL 'OFFSET' clause, which is not supported by SparkSQL, while `AlwaysProjectOrderByTermsNormalizer.java` has not required any adjustment.
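To make the timestamp and backslash issues above more concrete, here is a minimal, standalone Java sketch. It is not the OntopSpark code: the class `SparkSqlLiteralSketch` and its methods are hypothetical names used only for illustration. It assumes that SparkSQL renders timestamps as `yyyy-MM-dd HH:mm:ss` with an optional three-digit fraction, and that doubling each backslash is enough to stop SparkSQL from reading it as an escape character.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only; these names do not exist in Ontop or OntopSpark.
final class SparkSqlLiteralSketch {

    // Assumed SparkSQL textual form: "yyyy-MM-dd HH:mm:ss" with optional milliseconds.
    private static final DateTimeFormatter SPARK_INPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss[.SSS]");

    // Same textual form, but always printing milliseconds.
    private static final DateTimeFormatter SPARK_OUTPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // The standard format named in the text; the literal 'T' must be quoted in the pattern.
    private static final DateTimeFormatter NORMALIZED =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSxxx");

    // SparkSQL timestamp text -> normalized lexical form with an explicit offset.
    static String normalizeTimestamp(String sparkTimestamp, ZoneOffset offset) {
        LocalDateTime local = LocalDateTime.parse(sparkTimestamp, SPARK_INPUT);
        return local.atOffset(offset).format(NORMALIZED);
    }

    // Normalized lexical form -> SparkSQL-style timestamp text (the offset is dropped).
    static String denormalizeTimestamp(String normalized) {
        OffsetDateTime value = OffsetDateTime.parse(normalized, NORMALIZED);
        return value.toLocalDateTime().format(SPARK_OUTPUT);
    }

    // Assumption: doubling each backslash keeps it a literal character in a SparkSQL string.
    static String escapeBackslashes(String value) {
        return value.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String normalized = normalizeTimestamp("2021-03-04 10:15:30.250", ZoneOffset.ofHours(1));
        System.out.println(normalized);
        System.out.println(denormalizeTimestamp(normalized));
        System.out.println(escapeBackslashes("http://example.org/a\\b"));
    }
}
```

In the extension itself these conversions are expressed as SQL function symbols, so that Spark performs them as part of the generated query. The sketch only shows the string formats involved.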
The project is a Maven project. Compiling, running the unit tests, and building the release binaries can all be done using Maven. Currently, we use Maven 3 and Java 8 to build the project.
- Official Website and Documentation
- SourceForge Download
- Docker Hub
- GitHub
- GitHub Issues
- Google Group
- Travis CI
The Ontop framework is available under the Apache License, Version 2.0
Copyright (C) 2009 - 2020 Free University of Bozen-Bolzano
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
All documentation is licensed under the Creative Commons (attribute) license.