RDFHive

Using Apache Hive to directly evaluate SPARQL queries.

Overview: SPARQL is the W3C standard query language for data expressed in RDF (Resource Description Framework). The growing volume of available RDF data creates a strong need for, and research interest in, efficient and scalable distributed SPARQL query evaluators.

In this context, we propose and share RDFHive: a simple implementation of a distributed RDF datastore built on Apache Hive. RDFHive is designed to leverage existing Hadoop infrastructures for evaluating SPARQL queries. It relies on a translation of SPARQL queries into SQL queries that Hive is able to evaluate.

Technically, RDFHive evaluates SPARQL queries directly, with no preprocessing step: Hive sees an RDF triple file as a three-column table (subject, predicate, object), and the bash translator simply rewrites each SPARQL query against that table. This method has two advantages: first, creating a database is very fast; second, since the upfront investment is light, RDFHive is well suited to cases where only a few SPARQL queries need to be evaluated.
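To make the three-column scheme concrete, here is a minimal sketch of how a list of triple patterns could be rewritten as a single SQL query with self-joins. It is an illustration only, not the translator shipped in bin/: the function name translate_bgp, the table name triples and the column names s, p, o are all assumptions, prefixes are not expanded, and constants are compared verbatim. It needs bash 4 or later for associative arrays.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: turn triple patterns into one SQL query over a
# three-column table. The names "triples", s, p and o are assumptions.

translate_bgp() {
  local table="$1"; shift
  local -A first_seen=()          # variable name -> first column that binds it
  local -a cols=(s p o) terms=()
  local select="" from="" where=""
  local i=0 k tp term name ref

  for tp in "$@"; do
    read -r -a terms <<< "$tp"    # split one pattern into its three terms
    from+="${from:+, }${table} t${i}"
    for k in 0 1 2; do
      term="${terms[$k]}"
      ref="t${i}.${cols[$k]}"
      case "$term" in
        '?'*|'$'*)                # SPARQL variable
          name="${term:1}"
          if [[ -n "${first_seen[$name]:-}" ]]; then
            # Repeated variable: becomes a join condition.
            where+="${where:+ AND }${ref} = ${first_seen[$name]}"
          else
            first_seen[$name]="$ref"
            select+="${select:+, }${ref} AS ${name}"
          fi
          ;;
        *)                        # constant term: becomes a filter
          where+="${where:+ AND }${ref} = '${term}'"
          ;;
      esac
    done
    i=$((i + 1))
  done

  printf 'SELECT %s FROM %s%s;\n' "${select:-*}" "$from" "${where:+ WHERE $where}"
}

translate_bgp triples '?x rdf:type ub:Student' '?x ub:takesCourse ?y'
```

The final call prints `SELECT t0.s AS x, t1.o AS y FROM triples t0, triples t1 WHERE t0.p = 'rdf:type' AND t0.o = 'ub:Student' AND t1.s = t0.s AND t1.p = 'ub:takesCourse';`. Each triple pattern contributes one alias of the table, constants become equality filters, and a variable shared by two patterns becomes a join condition.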

Version: 1.0

Requirements

Apache Hadoop (+HDFS) version 2.6.0-cdh5.7.0
Apache Hive version 1.1.0

How to use it?

This package provides the sources needed to load and query RDF datasets with RDFHive. It also includes a simple test suite based on the popular RDF/SPARQL benchmark LUBM. For space reasons, the bundled dataset contains only a few hundred thousand RDF triples.

Get the sources.

git clone https://github.com/tyrex-team/rdfhive.git ;
cd rdfhive/ ;

Load an RDF dataset.

RDFHive can only load RDF data written in the N-Triples format. The file must first be uploaded to HDFS.

hadoop fs -copyFromLocal local_file.nt hdfs_file.nt ;
bash bin/load.sh dbName hdfs_file.nt ;
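Since only N-Triples input is accepted, it can help to sanity-check a file locally before uploading it. The sketch below is a rough heuristic, not a full N-Triples parser, and check_ntriples is a hypothetical helper that is not part of RDFHive. It reports every line that is neither blank, a comment, nor shaped like "subject predicate object ." with an IRI or blank node as subject and an IRI as predicate.

```shell
#!/usr/bin/env bash
# Rough N-Triples sanity check (a heuristic, not a full parser).
# check_ntriples is a hypothetical helper, not part of RDFHive.

check_ntriples() {
  local file="$1"
  [ -r "$file" ] || { echo "cannot read: $file" >&2; return 2; }

  local blank_or_comment='^[[:space:]]*(#.*)?$'
  local triple='^[[:space:]]*(<[^>]*>|_:[^[:space:]]+)[[:space:]]+<[^>]*>[[:space:]]+.+[[:space:]]*\.[[:space:]]*$'

  # grep -v keeps the lines matching neither pattern, i.e. the suspicious
  # ones; -n prefixes each reported line with its line number.
  if grep -nEv -e "$blank_or_comment" -e "$triple" "$file" >&2; then
    return 1      # at least one suspicious line was reported
  fi
  return 0
}
```

A possible use is to gate the upload on the check: `check_ntriples local_file.nt && hadoop fs -copyFromLocal local_file.nt hdfs_file.nt`.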

Query Evaluation.

To execute a SPARQL query over a loaded RDF dataset, RDFHive first translates it into SQL and then evaluates the generated query. If --debug is specified, RDFHive will be more verbose.

bash bin/eval.sh dbName LocalQueryFile ;

Remove a Database.

An already created database can also be removed.

bash bin/remove.sh dbName ;

Test Suite.

Finally, a very basic test suite is included in this repository to demonstrate RDFHive.

cd tests/ ;
bash preliminaries.sh ;
bash run-benchmarks.sh ;
bash clean-all.sh ;

Additional Scripts.

Two additional scripts in bin/, lubmqueries.sh and watdivqueries.sh, already contain translations of the LUBM and WatDiv SPARQL queries.

Supported SPARQL Fragment

digit := [0-9]
alphanum := [a-zA-Z0-9]
prefix := PREFIX (alphanum)*: <(alphanum)*>
var := ("?"|"$")(alphanum)*
tp := (var|(alphanum)*) (var|(alphanum)*) (var|(alphanum)*)
selectQuery :=
   (prefix)*
   SELECT (REDUCED|DISTINCT)? ("*"|(var)+)
   WHERE { (tp) (" . "tp)* }
   (LIMIT (digit)*)? (OFFSET (digit)*)?
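The grammar above covers SELECT queries over plain triple patterns, with optional DISTINCT or REDUCED, LIMIT and OFFSET. As a rough pre-flight check, the sketch below flags query files that use common SPARQL keywords outside this fragment. It is a heuristic rather than a parser; check_fragment and its keyword list are hypothetical and not part of RDFHive, and the ex: prefix in the sample query is a placeholder.

```shell
#!/usr/bin/env bash
# Heuristic check that a query file stays within the supported fragment.
# check_fragment and the keyword list are assumptions for illustration.

unsupported_keywords='OPTIONAL|UNION|FILTER|MINUS|GRAPH|BIND|VALUES|ORDER|GROUP|HAVING|ASK|CONSTRUCT|DESCRIBE'

check_fragment() {
  local query_file="$1" stripped
  [ -r "$query_file" ] || { echo "cannot read: $query_file" >&2; return 2; }

  # Drop IRIs (<...>) and quoted literals so keywords inside them are ignored.
  stripped="$(sed -E 's/<[^>]*>//g; s/"[^"]*"//g' "$query_file")"

  if grep -qiwE "$unsupported_keywords" <<< "$stripped"; then
    # List each offending keyword once on stderr.
    grep -oiwE "$unsupported_keywords" <<< "$stripped" | sort -u >&2
    return 1
  fi
  grep -qiw 'SELECT' <<< "$stripped" || { echo "no SELECT clause found" >&2; return 1; }
  return 0
}

# Sample query written to a temporary file (ex: is a placeholder prefix).
query_file="$(mktemp)"
cat > "$query_file" <<'EOF'
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://example.org/>
SELECT DISTINCT ?x ?y
WHERE { ?x rdf:type ex:GraduateStudent . ?x ex:takesCourse ?y }
LIMIT 5
EOF
check_fragment "$query_file" && echo "query looks compatible"
rm -f "$query_file"
```

A query that passes this check is not guaranteed to be accepted by the translator; the check only rules out constructs that the grammar clearly does not list.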

License

This project is under the CeCILL license.

Authors

Damien Graux
damien.graux@inria.fr

Pierre Genevès
Nabil Layaïda

Tyrex Team, Inria (France), 2016
