
REST_API


SPARQL End Point

You need at least one instance of the NanoSparqlServer set up and running to use the REST API.

The NanoSparqlServer responds at the following URL:

http://localhost:9999/bigdata/sparql

The Blazegraph workbench provides a graphical interface for the REST API. It can be accessed at the following URL:

http://localhost:9999/

The baseURI for the NanoSparqlServer is the effective service end point URL.

MIME Types

In general, requests may use any of the known MIME types. Likewise, you can CONNEG for any of these MIME types. However, CONNEG may not be very robust. Therefore, when seeking a specific MIME type for a response, it is best to specify an Accept header which specifies just the desired MIME type.
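For example, a minimal sketch that requests a SPARQL result set as CSV by specifying exactly one MIME type in the Accept header (assuming the default endpoint URL for your deployment):

    curl -X POST http://localhost:9999/bigdata/sparql \
      --data-urlencode 'query=SELECT * { ?s ?p ?o } LIMIT 10' \
      -H 'Accept: text/csv'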

RDF data

RDF data interchange is based on the org.openrdf.rio.RDFFormat declarations. The set of understood formats is extensible: additional declarations may be registered with the openrdf platform and associated with parsers and writers for that RDFFormat. The recommended charset, file name extension, etc. are always as declared by the IANA MIME type registration. Note that a potential for confusion exists with the ".xml" file extension, and its use with this API is not recommended. RDR means that both RDF* and SPARQL* are supported for a given data interchange syntax. See Reification_Done_Right for more details.

| MIME Type | File extension | Charset | Name | RDR? | Comments |
|---|---|---|---|---|---|
| application/rdf+xml | .rdf, .rdfs, .owl, .xml | UTF-8 | RDF/XML | | |
| text/plain | .nt | US-ASCII | N-Triples | | N-Triples defines an escape encoding for non-ASCII characters. |
| application/x-n-triples-RDR | .ntx | US-ASCII | N-Triples-RDR | Yes | This is a bigdata-specific extension of N-Triples that supports RDR. |
| application/x-turtle | .ttl | UTF-8 | Turtle | | |
| application/x-turtle-RDR | .ttlx | UTF-8 | Turtle-RDR | Yes | This is a bigdata-specific extension that supports RDR. |
| text/rdf+n3 | .n3 | UTF-8 | N3 | | |
| application/trix | .trix | UTF-8 | TriX | | |
| application/x-trig | .trig | UTF-8 | TRiG | | |
| text/x-nquads | .nq | US-ASCII | N-Quads | | Parser only before bigdata 1.4.0. |
| application/sparql-results+json, application/json | .srj, .json | UTF-8 | Bigdata JSON interchange for RDF/RDF* | Yes | The bigdata JSON interchange supports RDF RDR data and also SPARQL result sets. |

SPARQL Result Sets

| MIME Type | Name | RDR? | Comments |
|---|---|---|---|
| application/sparql-results+xml | SPARQL Query Results XML Format | | |
| application/sparql-results+json, application/json | SPARQL Query Results JSON Format | Yes | The bigdata extension allows the interchange of RDR data in result sets as well. |
| application/x-binary-rdf-results-table | Binary Query Results Format | | This is a format defined by the openrdf platform. |
| text/tab-separated-values | Tab Separated Values (TSV) | | |
| text/csv | Comma Separated Values (CSV) | | |

Property set data

The Multi-Tenancy API interchanges property set data. The MIME types understood by the API are:

| MIME Type | File extension | Charset |
|---|---|---|
| application/xml | .xml | UTF-8 |
| text/plain | .properties | UTF-8 |

Mutation Result

Operations which cause a mutation will report an XML document having the general structure of:

<data modified="5" milliseconds="112"/>

where modified is the mutation count and

where milliseconds is the elapsed time for the operation.
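For example, POSTing a small file of RDF data (data.ttl here is a hypothetical local file) with -D- prints the response headers followed by a mutation result document like the one above:

    curl -D- -X POST -H 'Content-Type: text/turtle' \
      --data-binary '@data.ttl' http://localhost:9999/bigdata/sparql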

Boolean Result

Blazegraph extension operations which report a truth value use an XML document having the general structure:

<data result="true|false" milliseconds="112"/>

Where result is either "true" or "false".

Where milliseconds is the elapsed time for the operation.

API Atomicity

Queries use snapshot isolation.

Mutation operations are ACID against a standalone database and shard-wise ACID against a bigdata federation.

API Parameters

Some operations accept parameters that MUST be URIs. Others accept parameters that MAY be either Literals or URIs. Where either a literal or a URI value can be used, as in the s, p, o, and c parameters for DELETE or ESTCARD, then angle brackets (for a URI) or quoting (for a Literal) MUST be used. Otherwise, angle brackets and quoting MUST NOT be used.

URI Only Value Parameters

If an operation accepts a parameter that MUST be a URI, then the URI is given without the surrounding angle brackets (< >). This is true for all SPARQL and SPARQL 1.1 query and update URI parameters.

For example, the following command inserts the data from tbox.ttl into the context named <http://example.org/tbox>. The context-uri MUST be a URI. The angle brackets are NOT used.

curl -D- -H 'Content-Type: text/turtle' --upload-file tbox.ttl -X POST 'http://localhost:80/bigdata/sparql?context-uri=http://example.org/tbox'

URI or Literal Valued Parameters

If an operation accepts parameters that MAY be either a URI or a Literal, then the value MUST be specified using angle brackets or quotes as appropriate. For these parameters, the quotation marks and angle brackets are necessary to distinguish between values that are Literals and values that are URIs. Without this, the API could not distinguish between a Literal whose text was a well-formed URI and a URI.

Examples of properly formed URIs and Literals include:

<http://www.bigdata.com/>
"abc"
"abc"@en
"3"^^xsd:int

A number of the bigdata REST API methods can operate on Literals or URIs. The following example will delete all triples in the named graph <http://example.org/graph1>. The angle brackets MUST be used since the DELETE methods allow you to specify the s (subject), p (predicate), o (object), or c (context) for the triple or quad pattern to be deleted. Since the pattern may include both URIs and Literals, Literals MUST be quoted and URIs MUST use angle brackets:

curl -D- -X DELETE 'http://localhost:80/bigdata/sparql?c=<http://example.org/graph1>'

Some REST API methods (e.g., DELETE_BY_ACCESS_PATH) allow multiple bindings for the context position. Such bindings are distinct URL query parameters. For example, the following removes all statements in the named graph <http://example.org/graph1> and the named graph <http://example.org/graph2>.

curl -D- -X DELETE 'http://localhost:80/bigdata/sparql?c=<http://example.org/graph1>&c=<http://example.org/graph2>'

Access Path Operations

The ESTCARD, HASSTMT, GETSTMTS, DELETE with Access Path, etc. methods all accept the following parameters. All of these parameters are optional. Together they define the access path.

| parameter | definition |
|---|---|
| s=(uri;literal) | The Subject position of a triple or quad pattern. |
| p=uri | The Predicate position of a triple or quad pattern. |
| o=(uri;literal) | The Object position of a triple or quad pattern. |
| c=(uri;literal) | The Context (aka Named Graph) position of a triple or quad pattern. This parameter is ignored unless the namespace is in quads mode. Unlike the other arguments, it may appear zero or more times. |

Where uri and literal use the SPARQL syntax for fully specified URIs and literals, as per #URI_or_Literal_Valued_Parameters, e.g., <http://www.bigdata.com/>, "abc", "abc"@en, and "3"^^xsd:int. The quotation marks and angle brackets are necessary to distinguish between values that are Literals and values that are URIs. All statements matching the bound values of the subject (s), predicate (p), object (o), and/or context (c) position will be deleted from the database. Each position may be specified at most once, but more than one position may be specified.

When the namespace is in quads mode, the context parameters are interpreted according to the openrdf API as follows:

| Java API | REST API | Description |
|---|---|---|
| foo(s,p,o) | The c URL query parameter does NOT appear. | All named graphs are addressed. The contexts parameter (in the server) will be a Resource[0] reference. |
| foo(s,p,o,(Resource[])null) | The c URL query parameter appears as &c= with an empty value. (A Java null is represented by c=, NOT by c=null or by omitting the c parameter.) | The openrdf "nullGraph" is addressed. The contexts parameter will be Resource[]{null}. Java will autobox the null Resource reference as a Resource[]{null} array. |
| foo(s,p,o,x,y,z) | The s, p, and o parameters are optional. The c parameter appears three times: c=x&c=y&c=z. | The openrdf named graphs (x,y,z) are addressed. The contexts parameter will be Resource[]{x,y,z}. |
| foo(s,p,o,x,null,z) | The s, p, and o parameters are optional. The c parameter appears three times: c=x&c=&c=z. | The openrdf named graphs (x,nullGraph,z) are addressed. The contexts parameter will be Resource[]{x,null,z}. |

queryId

Since 1.5.2, all REST API methods accept a queryId URL query parameter whose value is a UUID. This UUID may be used to CANCEL the request on the server. (Prior to 1.5.2 this URL query parameter was only available for SPARQL QUERY and SPARQL UPDATE).
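For example, a sketch that tags a query with a UUID (obtained here from the server's uuid service described below) and then cancels it from a second shell using the CANCEL method described below:

    # Obtain a UUID from the server (see Generate UUID below).
    UUID=$(curl -s -X POST 'http://localhost:9999/bigdata/sparql?uuid')

    # Start a (potentially long-running) query tagged with that UUID.
    curl http://localhost:9999/bigdata/sparql \
      --data-urlencode 'query=SELECT * { ?s ?p ?o }' \
      --data-urlencode "queryId=$UUID" &

    # Cancel the request using the same UUID (see CANCEL below).
    curl -X POST http://localhost:9999/bigdata/sparql \
      --data-urlencode 'cancelQuery' \
      --data-urlencode "queryId=$UUID"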

QUERY

GET or POST

GET Request-URI ?query=...

-OR-

POST Request-URI ?query=...

The response body is the result of the query.

The following query parameters are understood:

| parameter | definition |
|---|---|
| timestamp | A timestamp corresponding to a commit time against which the query will read. |
| explain | The query will be run, but the response will be an HTML document containing an "explanation" of the query. The response currently includes the original SPARQL query, the operator tree obtained by parsing that query, and detailed metrics from the evaluation of the query. This information may be used to examine opportunities for query optimization. |
| analytic | This enables the AnalyticQuery mode. |
| default-graph-uri | Specify zero or more graphs whose RDF merge is the default graph for this query (protocol option with the same semantics as FROM). |
| named-graph-uri | Specify zero or more named graphs for this query (protocol option with the same semantics as FROM NAMED). |
| format | Available in versions after 1.4.0. An optional query parameter that sets the result type other than via the Accept headers. Valid values are json, xml, application/sparql-results+json, and application/sparql-results+xml; json and xml are shortcuts for the full MIME type specification. Setting this parameter overrides any Accept header that is present. |
| baseURI | The base URI against which any relative URIs in the query are resolved. If not specified, defaults to the request URL (the SPARQL endpoint), which contains a protocol, server name, port number, and server path, but does not include query string parameters. |
| includeInferred | When true, inferred statements will also be considered. Default: true. |
| timeout | Specifies the maximum time that a query is allowed to run, measured in seconds. To set a timeout in milliseconds, use the HTTP header X-BIGDATA-MAX-QUERY-MILLIS. |
| ${var}=Value | Binds variable ?{var} to Value; for example, $x="abc" binds ?x to "abc". Value must be given in N-Triples representation (see Protocol.encodeValue and RDF 1.1 N-Triples). If no value is specified, the request is rejected by the server with HTTP error code 400. |
| suppressTruthMaintenance | Suppresses incremental truth maintenance. May result in an inconsistent state for the database (in the sense that inferences might not have been added or removed). You can restore the database to a consistent state after applying a series of mutations with truth maintenance suppressed by issuing a "CREATE ENTAILMENTS" UPDATE request afterwards to update the entailments for the KB. This is sufficient to compute any missing entailments if nothing has been deleted from the database. If you have also retracted statements, then you need to issue "DROP ENTAILMENTS; CREATE ENTAILMENTS;" to remove the old entailments before (re-)computing the entailments for the KB. See also Manage truth maintenance in SPARQL UPDATE. |
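For example, a sketch that applies a mutation with truth maintenance suppressed and then recomputes the entailments afterwards (the example.org URIs are hypothetical):

    curl -X POST http://localhost:9999/bigdata/sparql \
      --data-urlencode 'update=INSERT DATA { <http://example.org/a> <http://example.org/p> <http://example.org/b> }' \
      --data-urlencode 'suppressTruthMaintenance=true'

    # Recompute any missing entailments once the mutations are done.
    curl -X POST http://localhost:9999/bigdata/sparql \
      --data-urlencode 'update=CREATE ENTAILMENTS'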

The following HTTP headers are understood:

| parameter | definition |
|---|---|
| X-BIGDATA-MAX-QUERY-MILLIS | The maximum time in milliseconds for the query to execute. |
| X-ECHO-BACK-QUERY | If this header is present (starting in 1.5.3), the REST call will echo back the query in the response. It is off by default. |
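For example, a sketch that limits a potentially expensive query to five seconds using the millisecond header:

    curl -X POST http://localhost:8080/bigdata/sparql \
      --data-urlencode 'query=SELECT (COUNT(*) AS ?n) { ?s ?p ?o }' \
      -H 'X-BIGDATA-MAX-QUERY-MILLIS: 5000' \
      -H 'Accept:application/sparql-results+json'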

For example, the following simple query will return one statement from the default KB instance:

    curl -X POST http://localhost:8080/bigdata/sparql --data-urlencode 'query=SELECT * { ?s ?p ?o } LIMIT 1' -H 'Accept:application/rdf+xml'

If you want the result set in JSON using Accept headers, use:

    curl -X POST http://localhost:8080/bigdata/sparql --data-urlencode 'query=SELECT * { ?s ?p ?o } LIMIT 1' -H 'Accept:application/sparql-results+json'

If you want the result set in JSON using the format query parameter, use:

    curl -X POST http://localhost:8080/bigdata/sparql --data-urlencode 'query=SELECT * { ?s ?p ?o } LIMIT 1' --data-urlencode 'format=json'

If cached results are alright, then you can use HTTP GET instead:

    curl -G http://localhost:8080/bigdata/sparql --data-urlencode 'query=SELECT * { ?s ?p ?o } LIMIT 1' -H 'Accept:application/sparql-results+json'

If you want to run a query with a binding (?s = <:s>) and get only the original triples, without inference:

    curl -G http://localhost:8080/bigdata/sparql --data-urlencode 'query=SELECT * { ?s ?p ?o } LIMIT 1'  --data-urlencode '$s=<:s>' --data-urlencode 'includeInferred=false' -H 'Accept:application/sparql-results+json'

Exporting a Namespace

The NanoSparqlServer REST API provides a means to export a namespace to a file using a construct query.

curl -X POST http://localhost:9999/blazegraph/sparql --data-urlencode \
'query=CONSTRUCT WHERE { hint:Query hint:analytic "true" . hint:Query hint:constructDistinctSPO "false" . ?s ?p ?o }' \
-H 'Accept:application/rdf+xml' | gzip > <filename>

You can change the Accept header to receive the serialization of your choice.

If you are exporting a large namespace in quads mode, you may use the query hint hint:Query hint:constructDistinctSPO false to disable a distinct operation that may otherwise cause memory exceptions. See BLZG-1341.

curl -X POST http://localhost:9999/blazegraph/sparql --data-urlencode \
"query=CONSTRUCT { ?s ?p ?o } WHERE { graph ?g { hint:Query hint:constructDistinctSPO false . ?s ?p ?o } }" -H 'Accept:text/plain'

Query Optimization

There are several ways to get information about running query evaluation plans.

  1. The #STATUS page has a showQueries=(details) option which provides in-depth information about the SPARQL query, Abstract Syntax Tree, Bigdata Operators (BOps), and running statistics on current queries.
  2. The #QUERY ?explain parameter may be used with a query to report essentially the same information as the #STATUS page in an HTML response.

Performance Optimization resources

  1. There is also a good write-up on query performance optimization on the blog [1].
  2. There is a section on performance optimization for bigdata on the wiki: PerformanceOptimization.
  3. Bigdata supports a variety of query hints through both the SAIL and the NanoSparqlServer interfaces. See [2] for more details.
  4. Bigdata supports query hints using magic triples (since 1.1.0). See QueryHints.

FAST RANGE COUNTS

Blazegraph uses fast range counts internally for its query optimizer. Fast range counts on an access path are computed with two key probes against the appropriate index. Fast range counts are appropriate for federated query engines where they provide more information than an "ASK" query for a triple pattern. Fast range counts are also exact range counts under some common deployment configurations.

Fast range counts are fast. They use two key probes to find the ordinal index of the from and to key for the access path and then report (toIndex-fromIndex). This is orders of magnitude faster than you can achieve in SPARQL using a construction like "SELECT COUNT (*) { ?s ?p ?o }" because the corresponding SPARQL query must actually visit each tuple in that key range, rather than just reporting how many tuples there are.

Fast range counts are exact when running against a BigdataSail on a local journal which has been provisioned without full read/write transactions. When full read/write transactions are enabled, the fast range counts will also report the "delete markers" in the index. In scale-out, the fast range counts are also approximate if the key range spans more than one shard (in which case you are talking about a lot of data).

Note: This method is available in releases after version 1.0.2.

    GET Request-URI ?ESTCARD&([s|p|o|c]=(uri|literal))[&exact=(true|false)]

Where uri and literal use the SPARQL syntax for fully specified URI and literals, as per #URI_or_Literal_Valued_Parameters e.g.,

    <http://www.bigdata.com/>
    "abc"
    "abc"@en
    "3"^^xsd:int

The quotation marks and angle brackets are necessary to distinguish between values that are Literals and values that are URIs.

Where exact is an optional boolean query parameter (default false). When true the range count will be exact regardless of whether isolatable indices are in use or not. Exact range counts are fast when isolatable indices are not in use and require a scan of the key range when they are in use. (This feature is available since 1.5.2.)

The response is an XML document having the general structure:

    <data rangeCount="5" milliseconds="12"/>

Where rangeCount is the range count.

Where milliseconds is the elapsed time for the operation.

For example, this will report a fast estimated range count for all triples or quads in the default KB instance:

    curl -G -H 'Accept: application/xml' 'http://localhost:8080/bigdata/sparql' --data-urlencode ESTCARD

While this example will only report the fast range count for all triples having the specified subject URI:

    curl -G -H 'Accept: application/xml' 'http://localhost:8080/bigdata/sparql' --data-urlencode ESTCARD --data-urlencode 's=<http://www.w3.org/People/Berners-Lee/card#i>'
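And a sketch of the same request with exact=true (requires 1.5.2 or later):

    curl -G -H 'Accept: application/xml' 'http://localhost:8080/bigdata/sparql' \
      --data-urlencode ESTCARD --data-urlencode 'exact=true' \
      --data-urlencode 's=<http://www.w3.org/People/Berners-Lee/card#i>'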

HASSTMT

The HASSTMT method will test whether a triple or quad pattern is matched for the given namespace.

Note: This method is available since 1.5.2.

GET Request-URI ?HASSTMT&([s|p|o|c]=(uri|literal))[&includeInferred=(true|false)]

Where uri and literal use the SPARQL syntax for fully specified URI and literals, as per #URI_or_Literal_Valued_Parameters e.g.,

<http://www.bigdata.com/>
"abc"
"abc"@en
"3"^^xsd:int

The quotation marks and angle brackets are necessary to distinguish between values that are Literals and values that are URIs.

Where includeInferred is an optional boolean query parameter (default true). Inferred statements are not checked when this parameter is false.

The response is an XML document having the general structure:

<data result="true" milliseconds="12"/>

Where result is a boolean value and will be either "true" or "false".

Where milliseconds is the elapsed time for the operation.

For example, this will report true iff the namespace contains any matches for triples having the specified subject URI:

curl -G -H 'Accept: application/xml' 'http://localhost:8080/bigdata/sparql' \
--data-urlencode HASSTMT --data-urlencode 's=<http://www.w3.org/People/Berners-Lee/card#i>'

GETSTMTS

GET Request-URI ?GETSTMTS
...
Content-Type
...

-OR-

POST Request-URI ?GETSTMTS
...
Content-Type
...

The following query parameters are understood:

| parameter | definition |
|---|---|
| s | The subject (optional). |
| p | The predicate (optional). |
| o | The object (optional). |
| c | The contexts (optional). |
| includeInferred | When true, inferred statements will also be considered. Default: true. |

For example, the following query will return only original statements from the default KB instance:

curl -X POST 'http://localhost:8080/bigdata/sparql?GETSTMTS&includeInferred=false'

INSERT

INSERT RDF (POST with Body)

POST Request-URI
...
Content-Type:
...
BODY

Perform an HTTP-POST, which corresponds to the basic CRUD operation "create" according to the generic interaction semantics of HTTP REST.

Where BODY is the new RDF content using the representation indicated by the Content-Type.

You can also specify a context-uri request parameter which sets the default context when triples data is loaded into a quads store (available in releases after 1.0.2).

For example, the following command will POST the local file 'data-1.nq' to the default KB:

curl -X POST -H 'Content-Type:text/x-nquads' --data-binary '@data-1.nq' http://localhost:8080/bigdata/sparql

INSERT RDF (POST with URLs)

POST Request-URI ?uri=URI

Where URI identifies a resource whose RDF content will be inserted into the database. The uri query parameter may occur multiple times. All identified resources will be loaded in a single operation. See the MIME Types section above for the MIME types understood by this operation.

You can also specify a context-uri request parameter which sets the default context when triples data is loaded into a quads store (available in releases after 1.0.2).

For example, the following command will load the data from the specified URI into the default KB instance. For this command, the uri parameter must be a resource that can be resolved by the server that will execute the INSERT operation. Typically, this means either a public URL or a URL for a file in the local file system on the server.

curl -X POST --data-binary 'uri=file:///Users/bryan/Documents/workspace/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/resources/data/foaf/data-0.nq' http://localhost:8080/bigdata/sparql

DELETE

DELETE with Query

DELETE Request-URI ?query=...

Where query is a CONSTRUCT or DESCRIBE query.

Note: The QUERY + DELETE operation is ACID.
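For example, a sketch that deletes all statements selected by a CONSTRUCT query (the predicate URI is hypothetical); the --get plus -X DELETE combination is used so that curl URL-encodes the query parameter:

    curl --get -X DELETE 'http://localhost:9999/bigdata/sparql' \
      --data-urlencode 'query=CONSTRUCT WHERE { ?s <http://example.org/p1> ?o }'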

DELETE with Body (using POST)

POST Request-URI ?delete
...
Content-Type
...
BODY

Above is a POST request because many APIs do not allow a BODY with a DELETE verb. The BODY contains RDF statements according to the specified Content-Type. Statements parsed from the BODY are deleted.
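For example, a sketch that deletes the statements contained in a local Turtle file (remove.ttl is a hypothetical file name):

    curl -X POST -H 'Content-Type: text/turtle' \
      --data-binary '@remove.ttl' 'http://localhost:9999/bigdata/sparql?delete'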

DELETE with Access Path

Note: This method is available in releases after version 1.0.2.

DELETE Request-URI ?([s|p|o|c]=(uri|literal))+

Where uri and literal use the SPARQL syntax for fully specified URI and literals, as per #URI_or_Literal_Valued_Parameters e.g.,

<http://www.bigdata.com/>
"abc"
"abc"@en
"3"^^xsd:int

The quotation marks and angle brackets are necessary to distinguish between values that are Literals and values that are URIs.

All statements matching the bound values of the subject (s), predicate (p), object (o), and/or context (c) position will be deleted from the database. Each position may be specified at most once, but more than one position may be specified.

For example, a DELETE of everything for a given context would be:

DELETE Request-URI ?c=<http://example.org/foo>

And a DELETE of everything for some subject and predicate would be:

DELETE Request-URI ?s=<http://example.org/s1>&p=<http://www.example.org/p1>

And to DELETE everything having some object value:

DELETE Request-URI ?o="abc"

or

DELETE Request-URI ?o="5"^^<datatypeUri>

And to delete everything at that end point:

DELETE Request-URI

For example, the following will delete all statements with the specified subject in the default KB instance:

CAUTION: This curl command is tricky. If you specify just -X DELETE without the --get, then it will ignore the ?s parameter and remove EVERYTHING in the default KB instance!

curl --get -X DELETE -H 'Accept: application/xml' 'http://localhost:8080/bigdata/sparql' --data-urlencode 's=<http://www.w3.org/People/Berners-Lee/card#i>'

UPDATE (SPARQL 1.1 UPDATE)

POST Request-URI ?update=...

| parameter | definition |
|---|---|
| using-graph-uri | Specify zero or more graphs whose RDF merge is the default graph for the update request (protocol option with the same semantics as USING). |
| using-named-graph-uri | Specify zero or more named graphs for the update request (protocol option with the same semantics as USING NAMED). |

See SPARQL 1.1 Protocol.

Note: This method is available in releases after version 1.1.0.

For example, the following SPARQL 1.1 UPDATE request would drop all existing statements in the default KB instance and then load data into the default KB from the specified URL:

curl -X POST http://localhost:8080/bigdata/sparql --data-urlencode 'update=DROP ALL; LOAD <file:///Users/bryan/Documents/workspace/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/resources/data/foaf/data-0.nq.gz>;'

UPDATE (DELETE + INSERT)

UPDATE (DELETE statements selected by a QUERY plus INSERT statements from Request Body using PUT)

PUT Request-URI ?query=...
...
Content-Type
...
BODY

Where query is a CONSTRUCT or DESCRIBE query.

Note: The QUERY + DELETE operation is ACID.

Note: You MAY specify a CONSTRUCT query with an empty WHERE clause in order to specify a set of statements to be removed without reference to statements already existing in the database. For example:

CONSTRUCT { bd:Bryan bd:likes bd:RDFS } { }

Note the trailing "{ }" which is the empty WHERE clause. This makes it possible to delete arbitrary statements followed by the insert of arbitrary statements.

| parameter | definition |
|---|---|
| context-uri | Request parameter which sets the default context when triples data is loaded into a quads store (available in releases after 1.0.2). |
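For example, a sketch that atomically removes the statements selected by a CONSTRUCT query and inserts the statements from a local file (insert.ttl is hypothetical; the query parameter value is the URL-encoded form of CONSTRUCT WHERE { ?s ?p "stale" }):

    curl -X PUT -H 'Content-Type: text/turtle' --data-binary '@insert.ttl' \
      'http://localhost:9999/bigdata/sparql?query=CONSTRUCT%20WHERE%20%7B%20%3Fs%20%3Fp%20%22stale%22%20%7D'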

UPDATE (POST with Multi-Part Request Body)

POST Request-URI ?updatePost
...
Content-Type: multipart/form-data; boundary=...
...
form-data; name="remove"
Content-Type: ...
Content-Body
...
form-data; name="add"
Content-Type: ...
Content-Body
...
BODY

You can specify two sets of serialized statements - one to be removed and one to be added. This operation will be ACID on the server.

| parameter | definition |
|---|---|
| context-uri | Request parameter which sets the default context when triples data is loaded into a quads store (available in releases after 1.0.2). |
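For example, a sketch using curl's multipart form support, where remove.ttl and add.ttl are hypothetical local files containing the statements to remove and to add:

    curl -X POST 'http://localhost:9999/bigdata/sparql?updatePost' \
      -F 'remove=@remove.ttl;type=text/turtle' \
      -F 'add=@add.ttl;type=text/turtle'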

STATUS

GET /status

Responds with various information about the SPARQL end point. URL query parameters include:

| parameter | definition |
|---|---|
| showQueries(=details) | Show information on all queries currently executing on the NanoSparqlServer. The queries will be arranged in descending order by their elapsed evaluation time. When the value of this query parameter is "details", the response will include the query evaluation metrics for each BOp (Bigdata Operator) in the query. Otherwise only the query evaluation metrics for the top-level query BOp in the query plan will be included. In either case, the reported metrics are updated each time the page is refreshed, so it is possible to track the progress of a long-running query in this manner. |
| queryId=UUID | Request information only for the specified query or queries. This parameter may appear zero or more times. (Since bigdata 1.1.) |
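For example, assuming the default deployment path, the following shows detailed metrics for all queries currently executing:

    curl 'http://localhost:9999/bigdata/status?showQueries=details'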

CANCEL

For the default namespace:

POST /bigdata/sparql/?cancelQuery&queryId=....

For a caller specified namespace:

POST /bigdata/namespace/sparql/?cancelQuery&queryId=....

Cancel one or more running queries. Queries which are still running when the request is processed will be cancelled. (Since bigdata 1.1. Prior to bigdata 1.2, this method was available at /status. The preferred URI for this method is now the URI of the SPARQL end point; the /status URI is deprecated for this method.)

See the queryId QueryHint.

| parameter | definition |
|---|---|
| queryId=UUID | The UUID of a running query. |

For example, for the default namespace:

curl -X POST http://localhost:8091/bigdata/sparql --data-urlencode 'cancelQuery' --data-urlencode 'queryId=a7a4b8e0-2b14-498c-94ab-9d79caddb0f6'

For a caller specified namespace:

curl -X POST http://localhost:8091/bigdata/namespace/kb/sparql --data-urlencode 'cancelQuery' --data-urlencode 'queryId=a7a4b8e0-2b14-498c-94ab-9d79caddb0f6'

Generate UUID

Generate and send a UUID as a text/plain response entity.

GET /bigdata/sparql?uuid

OR

POST /bigdata/sparql?uuid

This is intended for use by JavaScript clients that want to generate new URLs locally. JavaScript does not provide an easy means to generate UUIDs, so we've added one to the server.
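For example:

    curl -X POST 'http://localhost:9999/bigdata/sparql?uuid'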

Online Backup

Starting in Blazegraph 2.0.2, there is a REST service for initiating an online backup (BLZG-1727). It is available at the serviceURL with the context path /backup appended. The default is shown below.

http://localhost:9999/blazegraph/backup
| parameter | definition |
|---|---|
| file=/path/to/file | The name and full path of the file. Defaults to backup.jnl in the current working directory. |
| compress=false | Boolean to compress the backup. It defaults to false. It is true if the parameter is present without a value. Compression does not append a .gz to the backup file name. |
| block=true | Boolean to block the REST call on creating the snapshot. Defaults to true. |

A quick start example is:

 curl --data-urlencode "file=/path/to/backup.jnl" http://localhost:9999/blazegraph/backup 

This would produce an uncompressed backup at /path/to/backup.jnl and block until it completed.

An example of the full syntax is:

 curl \
          --data-urlencode "file=/path/to/backup.jnl" \
          --data-urlencode "compress=true" \
          --data-urlencode "block=true" \
          http://localhost:9999/blazegraph/backup  

This would produce a compressed backup at /path/to/backup.jnl and block until it was completed.

Bulk Data Load

This is a REST endpoint for the Blazegraph DataLoader utility. It offers some capabilities that are not present in other aspects of the REST API, including support for processing durable queues, compressed files, recursive processing of directories, etc. It allows bulk loading into a running Blazegraph NanoSparqlServer (NSS).

POST /bigdata/dataloader
...
Content-Type
...
BODY

Bulk Load Configuration

An XML or text file containing Java properties must be POSTed. The parameters that may be specified are:

| parameter | definition |
|---|---|
| quiet | Suppress all stdout messages. |
| verbose | Show additional messages detailing the load performance. The value is an integer, zero or greater; higher is more verbose. This is equivalent to passing multiple -verbose arguments to the DataLoader program on the command line. |
| defaultGraph | Specify the default graph. This is required for quads mode. |
| format | The format of the file (optional; when not specified, the format is deduced for each file in turn using the RDFFormat static methods). |
| baseURI | The baseURI (optional; when not specified, the name of each file loaded is converted to a URL and used as the baseURI for that file). |
| closure | Compute the RDF(S)+ closure. |
| durableQueues | Supports restart patterns by renaming files as .good or .fail. All files loaded into a given commit are renamed to .good. Any file that cannot be loaded successfully is renamed to .fail. The files remain in their original directories. |
| namespace | The namespace of the KB instance. |
| propertyFile | The configuration file for the database instance. |
| fileOrDirs | Zero or more files or directories containing the data to be loaded. |

For example, the following will load some files (file1, dir1, file2, dir2; these should be placed on the server's local file system and specified as fully qualified paths) into the 'kb' namespace:

curl -X POST --data-binary @dataloader.xml --header 'Content-Type:application/xml' http://localhost:9999/bigdata/dataloader

The content of the dataloader.xml file:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <!-- RDF Format (Default is rdf/xml) -->
    <entry key="format">rdf/xml</entry>
    <!-- Base URI (Optional) -->
    <entry key="baseURI"></entry>
    <!-- Default Graph URI (Optional - Required for quads mode namespace) -->
    <entry key="defaultGraph"></entry>
    <!-- Suppress all stdout messages (Optional) -->
    <entry key="quiet">false</entry>
    <!-- Show additional messages detailing the load performance. (Optional) -->
    <entry key="verbose">0</entry>
    <!-- Compute the RDF(S)+ closure. (Optional) -->
    <entry key="closure">false</entry>
    <!-- Files will be renamed to either .good or .fail as they are processed.
         The files will remain in the same directory. -->
    <entry key="durableQueues">true</entry>
    <!-- The namespace of the KB instance. Defaults to kb. -->
    <entry key="namespace">kb</entry>
    <!-- The configuration file for the database instance. It must be readable by the web application. -->
    <entry key="propertyFile">/opt/RWStore.properties</entry>
    <!-- Zero or more files or directories containing the data to be loaded.
         This should be a comma delimited list. The files must be readable by the web application. -->
    <entry key="fileOrDirs">file1, dir1, file2, dir2</entry>
</properties>

Context Not Bound Error (Quads mode without defaultGraph)

If you receive an error such as the one below, it means that you are loading into a quads mode namespace without specifying the defaultGraph. You can resolve this by setting the defaultGraph property in your bulk load configuration.

java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: context not bound: < TermId(299U), TermId(297U), TermId(8L) : Explicit >
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at com.bigdata.rdf.sail.webapp.BigdataServlet.submitApiTask(BigdataServlet.java:281)
        at com.bigdata.rdf.sail.webapp.DataLoaderServlet.doBulkLoad(DataLoaderServlet.java:310)
        at com.bigdata.rdf.sail.webapp.DataLoaderServlet.doPost(DataLoaderServlet.java:108)

Via the REST API the option is:

<!-- Default Graph URI (Optional - Required for quads mode namespace) -->
<entry key="defaultGraph"></entry>

Via the command line the option is:

java -cp blazegraph.jar com.bigdata.rdf.store.DataLoader -defaultGraph http://example.org server.properties data.nq

Multi-Tenancy API

The Multi-Tenancy API allows you to administer and access multiple triple or quad store instances in a single backing Journal or Federation. Each triple or quad store instance has a unique namespace and corresponds to the concept of a VoID Dataset. A brief VoID description is used to describe the known data sets. A detailed VoID description is included in the Service Description of a data set. The default data set is associated with the namespace "kb" (unless you override that on the NanoSparqlServer command line). The SPARQL end point for a data set may be used to obtain a detailed Service Description of that data set (including VoID metadata and statistics), to issue SPARQL 1.1 Query and Update requests, etc. That end point is:

/bigdata/namespace/NAMESPACE/sparql

where NAMESPACE is the namespace of the desired data set.

This feature is available in bigdata releases after 1.2.2.
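For example, a sketch that queries the SPARQL end point for a data set whose namespace is myNamespace (a hypothetical name):

    curl -X POST http://localhost:9999/bigdata/namespace/myNamespace/sparql \
      --data-urlencode 'query=SELECT * { ?s ?p ?o } LIMIT 1' \
      -H 'Accept:application/sparql-results+json'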

DESCRIBE DATA SETS

GET /bigdata/namespace

Obtain a brief VoID description of the known data sets. The description includes the namespace of the data set and its SPARQL end point. A more detailed service description is available from the sparql end point. The response to this request MAY be cached.

For example:

curl localhost:8090/bigdata/namespace

LIST PROPERTIES

GET /bigdata/namespace/NAMESPACE/properties

Obtain a list of the effective configuration properties for the data set named NAMESPACE.

For example, retrieve the configuration for a specified KB in either the text/plain or XML format.

curl --header 'Accept: text/plain' http://localhost:8090/bigdata/namespace/kb/properties
curl --header 'Accept: application/xml' http://localhost:8090/bigdata/namespace/kb/properties

CREATE DATA SET

Request:

POST /bigdata/namespace
...
Content-Type
...
BODY

Where BODY contains the properties used to create the namespace. The Content-Type indicates whether the properties are given as XML or plain text (see the Property set data section above). See the Quads and Triples + Inference + Truth Maintenance examples below.

Response:

HTTP/1.1 201 Created
Content-Type: text/plain; charset=ISO-8859-1
Location: http://localhost:8080/bigdata/namespace/NAMESPACE/sparql
Content-Length: ...
CREATED: NAMESPACE

Status codes (since 1.3.2)

| Status Code | Meaning |
|---|---|
| 201 | Created |
| 409 | Conflict (the namespace already exists). |

The Location header in the response provides a URL for the newly created SPARQL end point. This URL may be used to obtain a service description, issue queries, issue updates, etc.

Create a new data set (aka a KB instance). The data set is configured based on the inherited configuration properties as overridden by the properties specified in the request entity (aka the BODY). The Content-Type must be one of those recognized for Java properties (the supported MIME types are listed in the Property set data section above).

You MUST specify at least the following property in order to create a non-default data set:

com.bigdata.rdf.sail.namespace=NAMESPACE

where NAMESPACE is the name of the new data set.

See the javadoc for the BigdataSail and AbstractTripleStore for other configuration options. Also see the sample property files in bigdata-sails/src/samples.

Note: You cannot reconfigure the Journal or Federation using this method. The properties will only be applied to the newly created data set. This method does NOT create a new backing Journal, it just creates a new data set on the same Journal (or on the same Federation when running on a cluster).

For example:

curl -v -X POST --data-binary @tmp.xml --header 'Content-Type:application/xml' http://localhost:8090/bigdata/namespace

where tmp.xml is patterned after one of the examples below. Be sure to replace MY_NAMESPACE with the namespace of the KB instance that you want to create. The new KB instance will inherit any defaults specified when the backing Journal or Federation was created. You can override any inherited properties by specifying a new value for that property with the request.

Quads

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<!-- -->
<!-- NEW KB NAMESPACE (required). -->
<!-- -->
<entry key="com.bigdata.rdf.sail.namespace">MY_NAMESPACE</entry>
<!-- -->
<!-- Specify any KB specific properties here to override defaults for the BigdataSail -->
<!-- AbstractTripleStore, or indices in the namespace of the new KB instance. -->
<!-- -->
<entry key="com.bigdata.rdf.store.AbstractTripleStore.quads">true</entry>
</properties>

Triples + Inference + Truth Maintenance

To setup a KB that supports incremental truth maintenance use the following properties:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<!-- -->
<!-- NEW KB NAMESPACE (required). -->
<!-- -->
<entry key="com.bigdata.rdf.sail.namespace">MY_NAMESPACE</entry>
<!-- -->
<!-- Specify any KB specific properties here to override defaults for the BigdataSail -->
<!-- AbstractTripleStore, or indices in the namespace of the new KB instance. -->
<!-- -->
<entry key="com.bigdata.rdf.store.AbstractTripleStore.quads">false</entry>
<entry key="com.bigdata.rdf.store.AbstractTripleStore.axiomsClass">com.bigdata.rdf.axioms.OwlAxioms</entry>
<entry key="com.bigdata.rdf.sail.truthMaintenance">true</entry>
</properties>

PREPARE PROPERTIES LIST

The method prepares a list of properties based on the inherited configuration properties.

POST /bigdata/namespace/prepareProperties
...
Content-Type
...
BODY

Where BODY contains the properties to be added or overridden. The content type indicates whether the properties are in XML or text.

The response returns a list of properties containing an appropriately prepared combination of the inherited configuration properties and those specified in the request entity (aka the BODY). Properties passed via the BODY take priority: they are added to the inherited list and override matching values in it. Namespace-specific properties (for example, branching factors) will be renamed in accordance with the specified namespace. The prepared list can then be used to create a namespace (see the CREATE DATA SET method).

For example: Prepare a list of properties for the data set identified by NAMESPACE.

curl -X POST --data-binary @props.properties --header 'Content-Type:text/plain' http://localhost:8090/bigdata/namespace/prepareProperties

The content of props.properties:

com.bigdata.rdf.sail.namespace=NAMESPACE

Response:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <entry key="com.bigdata.rdf.sail.namespace">NAMESPACE</entry>
    <entry key="com.bigdata.rdf.store.AbstractTripleStore.axiomsClass">com.bigdata.rdf.axioms.NoAxioms</entry>
    <entry key="com.bigdata.namespace.NAMESPACE.spo.com.bigdata.btree.BTree.branchingFactor">1024</entry>
    <entry key="com.bigdata.rdf.sail.truthMaintenance">false</entry>
    <entry key="com.bigdata.rdf.store.AbstractTripleStore.textIndex">false</entry>
    <entry key="com.bigdata.rdf.store.AbstractTripleStore.quads">true</entry>
    <entry key="com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers">false</entry>
    <entry key="com.bigdata.namespace.NAMESPACE.lex.com.bigdata.btree.BTree.branchingFactor">400</entry>
    <entry key="com.bigdata.journal.Journal.groupCommit">false</entry>
</properties>

REBUILD TEXT INDEX

Rebuild the text index, or create a new one if it does not exist, for the data set identified by NAMESPACE.

POST /blazegraph/namespace/NAMESPACE/textIndex
| parameter | definition |
|---|---|
| force-index-create | When true, the full text index will be created if it does not exist. Default: false. |

Examples:

  • Rebuild an existing text index for a namespace identified by 'someNamespace':

    curl -X POST http://localhost:9999/blazegraph/namespace/someNamespace/textIndex

If the text index does not exist in the namespace, then 'Error 500 Could not rebuild full text index, because it is not enabled' will be returned by the server.

  • Create a text index in an existing namespace:

    curl -X POST 'http://localhost:9999/blazegraph/namespace/someNamespace/textIndex?force-index-create=true'

See also the Rebuild Text Index page.

DESTROY DATA SET

DELETE /bigdata/namespace/NAMESPACE

Destroy the data set identified by NAMESPACE.

For example:

curl -X DELETE http://localhost:8090/bigdata/namespace/kb

Transaction Management (API is under development)

This section is under development. We will be exposing support for sequences of read/write operations that are isolated by a common transaction through the REST API in 1.5.3. See http://jira.blazegraph.com/browse/BLZG-1195 for details.

Choosing the right transaction model

Blazegraph supports two basic transaction models: unisolated and isolated. This choice is made on a namespace-by-namespace basis, but it may be defaulted for the Journal. For isolated operations, specify the following option for a namespace:

com.bigdata.rdf.sail.isolatableIndices=true

- Unisolated operations write on the live index objects. This option provides better scalability and better throughput because the unisolated transaction does not need to buffer its write set. The mutations are simply applied to the indices. If the transaction commits, the indices are checkpointed. If the transaction aborts, then the write set is discarded.

- Isolated operations rely on a fused view of the indices. An isolating index is established by the transaction in front of each index on which it needs to write. Mutations are written onto the isolating index. The writes are initially buffered in memory, but they will spill onto the disk for large transactions. When the transaction prepares, the write set is validated against the then current state of the unisolated indices. If there are no write conflicts, or if all conflicts can be reconciled, then the transaction can commit. Otherwise it must abort. Note that Blazegraph does not support truth maintenance for namespaces that use isolated operations.

Note: Query always uses snapshot isolation regardless of which transaction model you choose. These read-only views are completely non-blocking, which is why Blazegraph has such good performance for concurrent query. Since all queries have snapshot isolation, creating an explicit read-only transaction is only useful when more than one query needs to be run against the same commit point and there are concurrent writes on the database. Further, creating an explicit transaction incurs significant overhead due to the additional messages (CREATE-TX, QUERY, ABORT-TX) vs (QUERY).

Note: Transactions are scoped to the database, not the namespace. Thus a transaction MAY be used to coordinate operations across multiple namespaces.

Note: Open transactions pin the commit point on which the transaction is reading. Thus, long running transactions can prevent recycling.

Group Commit and Transactions

Group commit allows multiple write sets associated with different isolated or unisolated transactions to be melded into a single commit point. Group commit relies on a hierarchical locking scheme to serialize unisolated mutation operations for the same namespace. If isolated operations are being used, then group commit does not come into play until the transaction attempts to commit.

High Availability and Transactions

Each HAJournalServer has a local transaction manager. These transaction managers do not interchange messages as transactions are created and destroyed. This is key to achieving perfect linear scaling in query throughput as a function of the size of the replication cluster. During the 2-phase ACID commit, the nodes in the quorum communicate to identify the new consensus around the release time during the commit protocol. This consensus release time is used to decide which commit points are pinned and which can be recycled.

Transactions created on one node are NOT also registered on the other nodes. Further, the protocol for resynchronization of a node does not consider resynchronization of the transaction manager state since all transaction managers are completely independent. Thus, transactions created on a given HAJournalServer may be used on that HAJournalServer but are not visible on other HAJournalServer instances. However, commit times are the same for all nodes and all nodes will have a consensus about the release time so a commit time that is pinned on the leader (by a transaction) will be visible on the other nodes as well. Thus, the client can create a transaction (CREATE-TX) on the leader and use the readsOnCommitTime reported for that transaction to load balance queries across all nodes in the replication cluster. Those reads will have snapshot isolation in terms of the commit point pinned by the transaction until the transaction is either aborted (ABORT-TX) or committed (COMMIT-TX).

The practical impact is:

- Clients MUST use the leader to coordinate transactions (CREATE-TX, PREPARE-TX, IS-ACTIVE-TX, COMMIT-TX, ABORT-TX).
- Transaction identifiers (txId values) are created and managed by the leader. These txIds are NOT visible to the followers.
- Mutation operations isolated by a transaction MUST be directed to the leader (this is the same when transactions are not used - only the leader accepts writes).
- The readsOnCommitTime (see CREATE-TX) MAY be used to load balance read operations across the leader and followers (see above for details).
- Transactions break if there is a leader failover event or quorum break.

Scale-out and Transactions

Mutation operations in scale-out are shard-wise ACID and use the unisolated connection internally. Mutations are typically applied using an eventually consistent model. If the update fails, it is reapplied.

Scale-out supports snapshot isolation for query. The recommended pattern is to periodically update a global read lock to pin a globally consistent commit point. Queries are then issued against the commit time associated with the read lock. This removes the (significant) overhead of coordination with the transaction service in scale-out on a per-query basis.

Transaction Management API

This API is only required for isolated operations where the client wishes to have the life cycle of a transaction span multiple client operations (for example, more than one query, more than one update, or some combination of queries and updates). In this case, the client follows a pattern:

POST /bigdata/tx => txId

doWork(txId)....

POST /bigdata/tx/txid?COMMIT

Note: GET is not allowed for most transaction management methods to defeat http caching.
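A sketch of that pattern with curl (the txId value 512 is hypothetical; parse the actual value from the CREATE-TX response entity):

    # CREATE-TX: the response entity contains txId="...".
    curl -X POST http://localhost:9999/bigdata/tx

    # Do work against the transaction; per the note below, either the txId or
    # the readsOnCommitTime may be passed as the timestamp parameter.
    curl http://localhost:9999/bigdata/sparql \
      --data-urlencode 'query=SELECT * { ?s ?p ?o } LIMIT 1' \
      --data-urlencode 'timestamp=512'

    # COMMIT-TX: validate and commit the write set.
    curl -X POST 'http://localhost:9999/bigdata/tx/512?COMMIT'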

Response Entity

The general form of the response entity is an XML document having the following structure:

<xml>
  <response elapsed="...">
    <tx txId="..." readsOnCommitTime="..." readOnly="true|false"/>
  </response>
</xml>

The response entity is an XML document. Depending on the operation, there may be one or more tx elements in the response. For example, LIST-TX reports all active transactions.

The attributes of the response element are as follows:

| Attribute | Meaning |
|---|---|
| elapsed | The elapsed time (milliseconds) to process the request on the server. |

The attributes of the tx element are as follows:

| Attribute | Meaning |
|---|---|
| txId | The transaction identifier. This must be used with the transaction API. |
| readsOnCommitTime | The timestamp associated with the commit point on which the transaction is reading. This commit point (and all more recent commit points) are pinned by the transaction until it either aborts or commits. |
| readOnly | "true" if the transaction is read-only and "false" if the transaction allows mutation. |

Note: Either the txId or the readsOnCommitTime may be used for the &timestamp=... parameter on the REST API methods. However, in a Highly Available replication cluster the readsOnCommitTime MAY be used to load balance read operations across the cluster while only the leader will be able to interpret the txId.

LIST-TX

Obtain a list of active transactions.

GET /bigdata/tx

For example:

curl localhost:8090/bigdata/tx

A typical response:

HTTP/1.1 200 Ok
Location: http://localhost:8080/bigdata/tx/txId
Content-Type: application/xml
Content-Length: ...
<xml>
  <tx txId="..." readsOnCommitTime="..." readOnly="true|false"/>
  <tx txId="..." readsOnCommitTime="..." readOnly="true|false"/>
  <tx txId="..." readsOnCommitTime="..." readOnly="true|false"/>
  <tx txId="..." readsOnCommitTime="..." readOnly="true|false"/>
  <tx txId="..." readsOnCommitTime="..." readOnly="true|false"/>
</xml>

CREATE-TX

Return a new transaction identifier.

POST /bigdata/tx(?timestamp=TIMESTAMP)

The timestamp parameter is a long (64-bit) integer. Its meaning is defined as follows, and it defaults to 0 (UNISOLATED). Note that 0 corresponds to ITx.UNISOLATED and -1 corresponds to ITx.READ_COMMITTED. See ITransactionService for more details on the semantics of these symbolic constants.

| value | definition |
|---|---|
| 0 | This requests a new read/write transaction. The transaction will read on the last commit point on the database at the time that the transaction was created. This is the default behavior if the timestamp parameter is not specified. Note: The federation architecture (aka scale-out) does NOT support distributed read/write transactions - all mutations in scale-out are shard-wise ACID. |
| -1 | This requests a new read-only transaction. The transaction will read on the last commit point on the database at the time that the transaction was created. |
| timestamp | This requests a new read-only transaction. The operation will be executed against the most recent committed state whose commit timestamp is less than or equal to timestamp. |
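For example, a sketch that creates a new read-only transaction reading on the last commit point:

    curl -X POST 'http://localhost:9999/bigdata/tx?timestamp=-1'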

A typical response is below. Note that the Location header will include the URI for the transaction while the transaction identifier is given in the response entity. The response entity is an XML document as defined above.

HTTP/1.1 201 Created
Location: http://localhost:8080/bigdata/tx/txId
Content-Type: application/xml
Content-Length: ...
<xml><tx txId="..." readsOnCommitTime="..." readOnly="true|false"/></xml>

Status codes

| Status Code | Meaning |
|---|---|
| 201 | Created |
| 400 | Bad request if the TIMESTAMP is a negative value other than -1 (READ_COMMITTED). |

STATUS-TX

Obtain status about the transaction. Note that committed and aborted transactions may no longer exist on the server.

POST /bigdata/tx/txId?STATUS

Status codes

| Status Code | Meaning |
|---|---|
| 200 | The transaction was found on the server. |
| 404 | The transaction was not found on the server. |

The response entity is an XML document as defined above.

ABORT-TX

This aborts the transaction. The write set of the transaction (if any) is discarded. The transaction is no longer active.

POST /bigdata/tx/txId?ABORT

Status codes

| Status Code | Meaning |
|---|---|
| 200 | The transaction was aborted. |
| 404 | The transaction was not found on the server. |

The response entity is an XML document as defined above.

PREPARE-TX

Returns true if the write set of the transaction passes validation. If it does not pass validation, then a COMMIT-TX message will fail and the transaction must be aborted by the client.

POST /bigdata/tx/txId?PREPARE

Status codes

| Status Code | Meaning |
|---|---|
| 200 | The transaction was validated. |
| 404 | The transaction was not found on the server. |
| 409 | Validation failed for the transaction. |

The response entity is an XML document as defined above.

COMMIT-TX

Prepares and commits the transaction. This message first performs validation. If validation is not successful, then the transaction cannot be committed and a failure message is returned. If the transaction was successfully validated, then it is melded into the next commit group and a success message is returned. Once the transaction commits it is no longer active.

POST /bigdata/tx/txId?COMMIT

Status codes

| Status Code | Meaning |
|---|---|
| 200 | The transaction was validated and committed. |
| 404 | The transaction was not found on the server. |
| 409 | Validation failed for the transaction. |

The response entity is an XML document as defined above.

TODO

TODO Define the post-condition of PREPARE when the transaction fails validation (Has the transaction write set been discarded? Is the transaction aborted?). Update the client IRemoteTx API and implementation to be in compliance with these post-condition definitions.

TODO Document redo patterns for the client when a transaction fails validation (and write tests for those patterns).

TODO Update the "Java Client API" section per the javadoc at BigdataSailRemoteRepository. Do this when merging back to the master.

TODO Update the TxGuide. That page is really about internals and concepts. Have it point to this section?

TODO Scale-out does not report the readsOnCommitTime per trac #266. This makes the internal API a bit cumbersome. Consider reporting as -1L for scale-out until #266 is resolved and documenting this in the API.

TODO Introduce a timeout for transactions? Open transactions pin the commit point on which they read. If a client accidentally leaves open a transaction (e.g., by dying or becoming disconnected from the server) then recycling will be disabled. Transactions without ongoing client activity should probably be timed out. Also, it may make sense to impose a timeout on transactions and to allow the client to request a timeout allocation when it creates the transaction.

TODO Add an age attribute to the XML response entity for the tx element. This would make it possible to identify the longest running transactions. The readsOnCommitTime can identify the earliest pinned commit point, but this is not really the same. For example, all transactions against a read-only end point (and why would you bother ...) could have the same readsOnCommitTime if they read on the same commit point while some could have been open for days and others only milliseconds. The Tx class does not currently record when a transaction is created, but this could be changed easily enough.
