OntopSpark is an extension of Ontop that supports Apache Spark as a query processing engine. The Ontop platform allows querying relational databases as Virtual RDF Knowledge Graphs using SPARQL.

chimera-suite/OntopSpark

 
 

Ontop

Ontop is a Virtual Knowledge Graph system. It exposes the content of arbitrary relational databases as knowledge graphs. These graphs are virtual, which means that data remains in the data sources instead of being moved to another database.

Ontop translates SPARQL queries expressed over the knowledge graphs into SQL queries executed by the relational data sources. It relies on R2RML mappings and can take advantage of lightweight ontologies.

OntopSpark extension

The OntopSpark extension, developed by Chimera, enables Ontop to perform Ontology-Based Data Access (OBDA) on relational data using Apache Spark as a distributed query processing engine. This opens up new scenarios in which OBDA can be applied to Data Lakes.

The extension consists of 7 Java classes that manage the interaction with the databases. The classes have been added to the ontop-rdb package and are the following:

  • SparkSQLDBMetadataProvider.java: the class in charge of reading the database metadata by interacting with the Apache Spark ThriftServer through JDBC calls. Here it was necessary to retrieve the default schema and the metadata manually, because the default calls are not supported by the JDBC driver (OntopSpark is developed to interact with HiveJDBC). We have also implemented SparkSQLQuotedIDFactory.java.
  • SparkSQLDBFunctionSymbolFactory.java: the class that manages the translation of SPARQL functions into SQL functions; it is the implementation of AbstractSQLDBFunctionSymbolFactory. The main issue in this implementation is the timestamp translation from SparkSQL to SPARQL and vice versa. We decided to translate the SparkSQL timestamp datatypes into the standard format “yyyy-MM-ddTHH:mm:ss.SSSxxx” and to implement the denormalization in SparkSQLTimestampDenormFunctionSymbol.java. Another issue concerned \ characters in the VKG formulations, which SparkSQL interprets as escape characters; this has been fixed in SparkSQLEncodeURLorIRIFunctionSymbolImpl.java.
  • SparkSQLDBTypeFactory.java: defines the datatypes that Ontop can use when performing queries to build the internal VKG representation. We had to adapt the class to the SparkSQL datatypes.
  • SparkSQLSelectFromWhereSerializer.java: implements some minor features to manage the SPARQL 'OFFSET' clause, which is not supported by SparkSQL, while AlwaysProjectOrderByTermsNormalizer.java did not require any adjustment.
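
To make the timestamp and backslash issues above more concrete, here is a minimal, standalone Java sketch. It is not the code of the OntopSpark classes: the class name SparkSqlLiteralSketch and both helper methods are hypothetical and only illustrate the two ideas using the JDK. The first helper renders a timestamp with the pattern quoted in the list (with the literal T quoted, as Java's DateTimeFormatter requires). The second doubles backslashes so that, under the assumption that the SQL parser treats \ as an escape character, the original character survives inside a string literal.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only; this class is not part of OntopSpark or Ontop.
final class SparkSqlLiteralSketch {

    // Same pattern as quoted in the README; the literal 'T' must be quoted
    // because DateTimeFormatter reserves unquoted ASCII letters.
    static final DateTimeFormatter DATE_TIME_LEXICAL =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSxxx");

    // Renders a timestamp with milliseconds and a numeric zone offset.
    static String toDateTimeLexical(OffsetDateTime timestamp) {
        return timestamp.format(DATE_TIME_LEXICAL);
    }

    // Doubles every backslash so that a parser treating '\' as an escape
    // character reads back the original single backslash.
    static String escapeBackslashes(String value) {
        return value.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        OffsetDateTime ts =
                OffsetDateTime.of(2020, 3, 15, 10, 30, 0, 0, ZoneOffset.ofHours(1));
        System.out.println(toDateTimeLexical(ts));      // 2020-03-15T10:30:00.000+01:00
        System.out.println(escapeBackslashes("a\\b"));  // a\\b
    }
}
```

The sketch only uses java.time, which is available in Java 8, the version the project is built with.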

Compiling, packing, testing, etc.

The project is a Maven project. Compiling, running the unit tests, and building the release binaries can all be done using Maven. Currently, we use Maven 3 and Java 8 to build the project.

License

The Ontop framework is available under the Apache License, Version 2.0.

  Copyright (C) 2009 - 2020 Free University of Bozen-Bolzano

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.

All documentation is licensed under the Creative Commons (attribution) license.
