Faster SPARQL client.
This is a Java library for writing clients to SPARQL endpoints that support
HTTP 1.1 Transfer-Encoding: chunked. When a SPARQL endpoint supports this,
it can send the first result of a query ASAP and a fastersparql client can
also process that result ASAP, without having to wait for the server's engine
to enumerate all rows or for the whole results serialization to be downloaded
before parsing can start.
Although this is a client library, the "faster" in fastersparql mostly depends on the SPARQL server's internal implementation. Most SPARQL endpoints will first enumerate all results, then serialize them into the HTTP connection.
You can use the BOM and include other modules as discussed below, or directly
add the netty module to your Maven pom.xml:
<dependency>
  <groupId>com.github.alexishuf.fastersparql</groupId>
  <artifactId>fastersparql-netty</artifactId>
  <version>1.0.0-SNAPSHOT</version>
</dependency>
Then create a SparqlClient for an endpoint. In this example, we have an
endpoint that only supports the JSON and TSV serializations and the GET query method:
SparqlClient<String[], byte[]> client = FasterSparql.clientFor("json,tsv,get@http://example.org/sparql");
Side notes:
- The json,tsv,get@ prefix can be omitted for endpoints that implement all standard serializations and query methods.
- Each SparqlClient has a type for rows (String[] above) and fragments (byte[] above). Rows are produced by SELECT and ASK queries, while DESCRIBE and CONSTRUCT queries produce graph serialization fragments.
- To change the row and fragment types, provide a RowParser or FragmentParser to FasterSparql.clientFor.
- SparqlClient is an AutoCloseable, so consider wrapping it in a try () { /*...*/ } block (see the sketch after these notes).
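Since SparqlClient is AutoCloseable, a minimal try-with-resources sketch (reusing the placeholder endpoint string from above and an arbitrary sparqlQuery) would be:

try (SparqlClient<String[], byte[]> client =
         FasterSparql.clientFor("json,tsv,get@http://example.org/sparql")) {
    Results<String[]> results = client.query(sparqlQuery);
    // consume the results here, before close() releases the client's resources
} // client.close() runs automatically when this block exits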
With a client, send some queries:
Results<String[]> results = client.query(sparqlQuery);
printVars(results.vars()); //List<String> with result variable names
consumeResults(results.publisher());
Results are delivered ASAP, and this is achieved by a reactive streams
Publisher. Real applications will likely use a higher-level library such as
Reactor or Mutiny, which have nice wrappers for Publisher.
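As a sketch, assuming results.publisher() exposes an org.reactivestreams.Publisher<String[]> as in the example above, wrapping it in a Reactor Flux could look like this (Flux.from, doOnNext and blockLast are standard Reactor operators; the printing is just illustrative):

Flux.from(results.publisher())                                  // reactor.core.publisher.Flux
    .doOnNext(row -> System.out.println(Arrays.toString(row)))  // java.util.Arrays
    .blockLast(); // blocking only for demonstration; real applications stay reactive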
If reactive is not your thing, do this:
IterableAdapter<String[]> iterable = new IterableAdapter<>(results.publisher());
for (String[] row : iterable) {
    // Consume each row of N-Triples RDF terms. The i-th element of a row
    // corresponds to the i-th variable in results.vars()
}
if (iterable.hasError()) { // check for errors, if you care
    throw iterable.error();
}
To avoid dependency conflicts, fastersparql is split into the following modules:
- fastersparql-client
- fastersparql-netty: provides an implementation of SparqlClient over netty
- fastersparql-operators: implementations for SPARQL algebra operators (Join, Filter, Union, etc.). Use this to implement a SPARQL mediator or simply combine the results of two SPARQL queries to one or more endpoints
- fastersparql-operators-jena: If you need filters with boolean expressions (i.e., not FILTER EXISTS/FILTER NOT EXISTS), this wraps the Jena implementation. If you prefer RDF4J, use it as inspiration when sending a PR for fastersparql-operators-rdf4j!
- fastersparql-bom: a Bill Of Materials for keeping versions of the modules in sync
fastersparql uses Java SPI (Service Provider Interface), aka ServiceLoader,
to locate implementations. To add a new SparqlClient implementation, implement
a SparqlClientFactory and expose it to the classpath via META-INF/services.
To provide implementations for SPARQL algebra operators, read this section.
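For reference, registration follows the standard ServiceLoader convention: add a classpath resource named META-INF/services/<fully-qualified name of SparqlClientFactory> containing the fully-qualified name of your implementation class. A minimal sketch of the lookup this enables (illustrative only; fastersparql performs this lookup internally):

ServiceLoader<SparqlClientFactory> factories = ServiceLoader.load(SparqlClientFactory.class); // java.util.ServiceLoader
for (SparqlClientFactory factory : factories) {
    // every implementation registered under META-INF/services is found here
    System.out.println("found factory: " + factory.getClass().getName());
}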
RowParser and FragmentParser are an exception to the SPI, since they are
provided by you via the FasterSparql facade. Simply create new
implementations if the bundled implementations are not enough. PRs for your
implementations will be appreciated.