Protobuf (#69)
* Remove Elephas from the 'Learn' dropdown

* Doc for RDF Binary with Protobuf
afs committed Sep 26, 2021
1 parent 238a744 commit 4a04f19e4f3f2b568d21276216913b56cdb131dd
Showing 6 changed files with 195 additions and 45 deletions.
@@ -83,7 +83,6 @@
<li><a href="/documentation/shex/index.html">ShEx</a></li>
<li><a href="/documentation/rdfstar/index.html">RDF-star</a></li>
<li><a href="/documentation/tools/index.html">Command-line tools</a></li>
<li><a href="/documentation/hadoop/index.html">Elephas - tools for RDF on Hadoop</a></li>
<li><a href="/documentation/jdbc/index.html">SPARQL over JDBC</a></li>
<li><a href="/documentation/permissions/index.html">Permissions</a></li>
<li><a href="/documentation/assembler/index.html">Assembler</a></li>
@@ -34,7 +34,7 @@ See "[Reading JSON-LD 1.1](json-ld-11.html)" for additional setup and use for
reading JSON-LD 1.1. JSON-LD 1.0 is the current default in Jena.

RDF Binary is a binary encoding of RDF (graphs and datasets) that can be useful
for fast parsing. See [RDF Binary using Apache Thrift](rdf-binary.html).
for fast parsing. See [RDF Binary](rdf-binary.html).

## Command line tools

@@ -49,18 +49,20 @@ These can be called directly as Java programs:
The file extensions understood are:

| &nbsp;Extension&nbsp; |&nbsp; Language&nbsp; |
|-----------|------------|
| `.ttl` | Turtle |
| `.nt` | N-Triples |
| `.nq` | N-Quads |
| `.trig` | TriG |
| `.rdf` | RDF/XML |
| `.owl` | RDF/XML |
| `.jsonld` | JSON-LD |
| `.trdf` | RDF Thrift |
| `.rt` | RDF Thrift |
| `.rj` | RDF/JSON |
| `.trix` | TriX |
|-----------|--------------|
| `.ttl` | Turtle |
| `.nt` | N-Triples |
| `.nq` | N-Quads |
| `.trig` | TriG |
| `.rdf` | RDF/XML |
| `.owl` | RDF/XML |
| `.jsonld` | JSON-LD |
| `.trdf` | RDF Thrift |
| `.rt` | RDF Thrift |
| `.rpb` | RDF Protobuf |
| `.pbrdf` | RDF Protobuf |
| `.rj` | RDF/JSON |
| `.trix` | TriX |

`.n3` is supported but only as a synonym for Turtle.

@@ -3,7 +3,9 @@ title: RDF Binary using Apache Thrift
---

"RDF Binary" is an efficient format for RDF and RDF-related data using
[Apache Thrift](https://thrift.apache.org/) as the binary encoding.
[Apache Thrift](https://thrift.apache.org/)
or [Google Protocol Buffers](https://developers.google.com/protocol-buffers)
as the binary data encoding.

The W3C standard RDF syntaxes are text or XML based. These incur costs in
parsing; the most human-readable formats also incur high costs to write, and
@@ -16,14 +18,14 @@ terms, then builds data formats for RDF graphs, RDF datasets, and for
SPARQL result sets. This gives a basis for high-performance linked data
systems.

[Apache Thrift](https://thrift.apache.org/) provides an efficient,
wide-used binary encoding layer with a large number of language bindings.
[Thrift](https://thrift.apache.org/) and
[Protobuf](https://developers.google.com/protocol-buffers) provide efficient,
widely used binary encoding layers, each with a large number of language
bindings.

For more details, see [RDF Thrift](http://afs.github.io/rdf-thrift).

This page gives the details of the RDF Binary encoding in [Apache Thrift](http://thrift.apache.org/) and in Protobuf.

## Thrift encoding of RDF Terms {#encoding-terms}
## Thrift encoding of RDF Terms {#encoding-terms-thrift}

RDF Thrift uses the Thrift compact protocol.
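
Both Thrift's compact protocol and Protobuf's `sint32`/`sint64` field types (used for `valInteger` and `RDF_Decimal` in the Protobuf schema on this page) store signed integers by ZigZag-mapping them to unsigned values and then writing a varint, so small negative numbers stay short. A minimal Java sketch of the ZigZag step for 64-bit values; `ZigZag` is an illustrative name, not a Jena, Thrift or Protobuf class, and the real libraries do this internally:

```java
// Illustrative only: the ZigZag mapping applied before varint encoding.
// Interleaves signed values so small magnitudes get small codes:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
final class ZigZag {

    static long encode(long n) {
        // n << 1 makes room for the sign; n >> 63 (arithmetic shift)
        // is all ones for negative n and all zeros otherwise.
        return (n << 1) ^ (n >> 63);
    }

    static long decode(long z) {
        // Undo the shift, then flip all bits if the low (sign) bit was set.
        return (z >>> 1) ^ -(z & 1);
    }
}
```

With this mapping, `-1` costs one byte as a varint instead of the ten bytes a plain two's-complement 64-bit varint would need.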

@@ -84,7 +86,7 @@ Source: [BinaryRDF.thrift](https://github.com/apache/jena/blob/main/jena-arq/Gra
12: RDF_Decimal valDecimal
}

### Thrift encoding of Triples, Quads and rows. {#encoding-tuples}
### Thrift encoding of Triples, Quads and rows. {#encoding-thrift-tuples}

struct RDF_Triple {
1: required RDF_Term S
@@ -104,7 +106,7 @@ Source: [BinaryRDF.thrift](https://github.com/apache/jena/blob/main/jena-arq/Gra
2: required string uri ;
}

### Thrift encoding of RDF Graphs and RDF Datasets {#encoding-graphs-datasets}
### Thrift encoding of RDF Graphs and RDF Datasets {#encoding-thrift-graphs-datasets}

union RDF_StreamRow {
1: RDF_PrefixDecl prefixDecl
@@ -116,7 +118,7 @@ RDF Graphs are encoded as a stream of `RDF_Triple` and `RDF_PrefixDecl`.

RDF Datasets are encoded as a stream of `RDF_Triple`, `RDF_Quad` and `RDF_PrefixDecl`.
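
Because graphs and datasets are streams of rows, a reader has to remember `RDF_PrefixDecl` rows as they arrive so that later `RDF_PrefixName` terms (a prefix plus a local name, an abbreviation for an IRI, as in the Protobuf schema on this page) can be expanded. A minimal Java sketch of that bookkeeping, assuming the IRI is the declared namespace concatenated with the local name and that a prefix is declared before it is used; `PrefixTracker` and its methods are hypothetical, not Jena API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reader-side helper; NOT the classes generated from
// BinaryRDF.thrift or binary-rdf.proto. It only models prefix bookkeeping.
final class PrefixTracker {
    // Prefix declarations seen so far in the stream: prefix -> namespace IRI.
    private final Map<String, String> prefixes = new HashMap<>();

    // Called for each RDF_PrefixDecl row. Assumption of this sketch:
    // a later declaration of the same prefix replaces the earlier one.
    void onPrefixDecl(String prefix, String uri) {
        prefixes.put(prefix, uri);
    }

    // Expand an RDF_PrefixName term (prefix, localName) to a full IRI,
    // using only declarations that have already appeared in the stream.
    String expand(String prefix, String localName) {
        String ns = prefixes.get(prefix);
        if (ns == null) {
            throw new IllegalStateException("Prefix used before declaration: " + prefix);
        }
        return ns + localName;
    }
}
```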

### Thrift encoding of SPARQL Result Sets {#encoding-result-sets}
### Thrift encoding of SPARQL Result Sets {#encoding-thrift-result-sets}

A SPARQL Result Set is encoded as a list of variables (the header), then
a stream of rows (the results).
@@ -128,3 +130,144 @@ a stream of rows (the results).
struct RDF_DataTuple {
1: list<RDF_Term> row
}

## Protobuf encoding of RDF Terms {#encoding-terms-protobuf}

The Protobuf schema is similar to the Thrift one.

Source:
[binary-rdf.proto](https://github.com/apache/jena/blob/main/jena-arq/Grammar/RDF-Protobuf/binary-rdf.proto)

Streaming is used to allow for graphs of arbitrary size. Therefore the stream
items (`RDF_StreamRow` below) are each written with a length prefix
(`writeDelimitedTo` in the Java API).

See
[Protobuf Techniques: Streaming Multiple Messages](https://developers.google.com/protocol-buffers/docs/techniques#streaming).

```
syntax = "proto3";
option java_package = "org.apache.jena.riot.protobuf.wire" ;
// Prefer one file with static inner classes.
option java_outer_classname = "PB_RDF" ;
// Optimize for speed (default)
option optimize_for = SPEED ;
//option java_multiple_files = true;
// ==== RDF Term Definitions
message RDF_IRI {
string iri = 1 ;
}
// A prefix name (abbrev for an IRI)
message RDF_PrefixName {
string prefix = 1 ;
string localName = 2 ;
}
message RDF_BNode {
string label = 1 ;
// 2 * fixed64
}
// Common abbreviations for datatypes and other URIs?
// union with additional values.
message RDF_Literal {
string lex = 1 ;
oneof literalKind {
bool simple = 9 ;
string langtag = 2 ;
string datatype = 3 ;
RDF_PrefixName dtPrefix = 4 ;
}
}
message RDF_Decimal {
sint64 value = 1 ;
sint32 scale = 2 ;
}
message RDF_Var {
string name = 1 ;
}
message RDF_ANY { }
message RDF_UNDEF { }
message RDF_REPEAT { }
message RDF_Term {
oneof term {
RDF_IRI iri = 1 ;
RDF_BNode bnode = 2 ;
RDF_Literal literal = 3 ;
RDF_PrefixName prefixName = 4 ;
RDF_Var variable = 5 ;
RDF_Triple tripleTerm = 6 ;
RDF_ANY any = 7 ;
RDF_UNDEF undefined = 8 ;
RDF_REPEAT repeat = 9 ;
// Value forms of literals.
sint64 valInteger = 20 ;
double valDouble = 21 ;
RDF_Decimal valDecimal = 22 ;
}
}
// === StreamRDF items
message RDF_Triple {
RDF_Term S = 1 ;
RDF_Term P = 2 ;
RDF_Term O = 3 ;
}
message RDF_Quad {
RDF_Term S = 1 ;
RDF_Term P = 2 ;
RDF_Term O = 3 ;
RDF_Term G = 4 ;
}
// Prefix declaration
message RDF_PrefixDecl {
string prefix = 1;
string uri = 2 ;
}
// StreamRDF
message RDF_StreamRow {
oneof row {
RDF_PrefixDecl prefixDecl = 1 ;
RDF_Triple triple = 2 ;
RDF_Quad quad = 3 ;
RDF_IRI base = 4 ;
}
}
message RDF_Stream {
repeated RDF_StreamRow row = 1 ;
}
// ==== SPARQL Result Sets
message RDF_VarTuple {
repeated RDF_Var vars = 1 ;
}
message RDF_DataTuple {
repeated RDF_Term row = 1 ;
}
// ==== RDF Graph
message RDF_Graph {
repeated RDF_Triple triple = 1 ;
}
```
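
The length prefix written by `writeDelimitedTo` is the message's byte size encoded as a base-128 varint, followed by the message bytes. A stdlib-only Java sketch of that framing, with plain byte arrays standing in for serialized `RDF_StreamRow` messages; `DelimitedFraming` is a hypothetical helper, and in real code the generated classes do this for you:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: length-delimited framing with a varint length prefix.
// Payloads are opaque bytes standing in for serialized RDF_StreamRow messages.
final class DelimitedFraming {

    // Write one item: its length as a varint, then the payload bytes.
    static void writeDelimited(OutputStream out, byte[] payload) throws IOException {
        int n = payload.length;
        while ((n & ~0x7F) != 0) {        // more than 7 significant bits remain
            out.write((n & 0x7F) | 0x80); // low 7 bits, continuation bit set
            n >>>= 7;
        }
        out.write(n);                     // last length byte, continuation bit clear
        out.write(payload);
    }

    // Read one item; returns null if the stream ends cleanly before a new item.
    static byte[] readDelimited(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;
        }
        int length = 0;
        int shift = 0;
        while (true) {
            length |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
            if (shift > 28) {             // a 32-bit varint has at most 5 bytes
                throw new IOException("Malformed varint length");
            }
            b = in.read();
            if (b == -1) {
                throw new EOFException("Truncated varint length");
            }
        }
        if (length < 0) {
            throw new IOException("Negative length");
        }
        byte[] payload = new byte[length];
        int off = 0;
        while (off < length) {            // read() may return fewer bytes than asked
            int r = in.read(payload, off, length - off);
            if (r == -1) {
                throw new EOFException("Truncated item");
            }
            off += r;
        }
        return payload;
    }
}
```

With the classes generated from this schema (package `org.apache.jena.riot.protobuf.wire`, outer class `PB_RDF`, per the options above), the corresponding calls would be `row.writeDelimitedTo(out)` on the writer side and `PB_RDF.RDF_StreamRow.parseDelimitedFrom(in)` on the reader side, the latter returning `null` at end of stream.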
@@ -67,18 +67,18 @@ as:

The following is a suggested Apache httpd .htaccess file:

AddType text/turtle .ttl
AddType application/rdf+xml .rdf
AddType application/n-triples .nt
AddType text/turtle .ttl
AddType application/rdf+xml .rdf
AddType application/n-triples .nt

AddType application/ld+json .jsonld
AddType application/owl+xml .owl
AddType application/ld+json .jsonld

AddType text/trig .trig
AddType application/n-quads .nq
AddType text/trig .trig
AddType application/n-quads .nq

AddType application/trix+xml .trix
AddType application/rdf+thrift .trdf
AddType application/trix+xml .trix
AddType application/rdf+thrift .rt
AddType application/rdf+protobuf .rpb

### Example 1 : Using the RDFDataMgr {#using-rdfdatamgr}

@@ -17,7 +17,7 @@ See [Reading RDF](rdf-input.html) for details of the RIOT Reader system.
- [Turtle and Trig format options](#opt-turtle-trig)
- [N-Triples and N-Quads](#n-triples-and-n-quads)
- [JSON-LD](#json-ld)
- [RDF Binary](#rdf-thrift)
- [RDF Binary](#rdf-binary)
- [RDF/XML](#rdfxml)
- [Examples](#examples)
- [Notes](#notes)
@@ -110,9 +110,10 @@ an `RDFFormat` internally. The normal writers are:
| RDFXML | RDF/XML, pretty printed |
| RDFJSON | |
| TRIX | |
| RDFTHRFT | RDF Thrift |
| RDFTHRIFT | RDF Binary Thrift |
| RDFPROTO | RDF Binary Protobuf |

Pretty printed RDF/XML is also known as RDF/XML-ABBREV
Pretty printed RDF/XML is also known as RDF/XML-ABBREV.

### Pretty Printed Languages

@@ -369,21 +370,25 @@ cases.
What can be done, and how it can be, is explained in the
[sample code](https://github.com/apache/jena/tree/main/jena-arq/src-examples/arq/examples/riot/Ex_WriteJsonLD.java).

### RDF Binary {#rdf-thrift}
### RDF Binary {#rdf-binary}

[This is a binary encoding](rdf-binary.html) using
[Apache Thrift](https://thrift.apache.org/) for RDF Graphs
[Apache Thrift](https://thrift.apache.org/) or
[Google Protocol Buffers](https://developers.google.com/protocol-buffers)
for RDF Graphs
and RDF Datasets, as well as SPARQL Result Sets, and it provides faster parsing
compared to the text-based standardised syntaxes such as N-Triples, Turtle or RDF/XML.

| RDFFormat |
|------------------|
| RDFTHRIFT |
| RDFTHRIFT_VALUES |
| RDFFormat |
|-------------------|
| RDF_THRIFT |
| RDF_THRIFT_VALUES |
| RDF_PROTO |
| RDF_PROTO_VALUES |

`RDFTHRIFT_VALUES` is a variant where numeric values are written as values,
`RDF_THRIFT_VALUES` and `RDF_PROTO_VALUES` are variants where numeric values are written as values,
not as lexical format and datatype. See the
[description of RDF Thrift](http://afs.github.io/rdf-thrift)
[description of RDF Binary](rdf-binary.html)
for discussion.
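
With the `_VALUES` variants, an `xsd:decimal` can travel as an `RDF_Decimal` (an integer `value` plus a `scale`) instead of a lexical form and datatype. A Java sketch of that mapping, assuming the decimal denotes value × 10^(-scale), the same convention as `java.math.BigDecimal`; `DecimalValueForm` is a hypothetical name, and the fallback for values that do not fit in 64 bits is this sketch's own assumption:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper: xsd:decimal lexical form <-> (value, scale) pair,
// ASSUMING decimal = value * 10^(-scale), as in java.math.BigDecimal.
final class DecimalValueForm {
    final long value;  // unscaled digits, as in RDF_Decimal.value (sint64)
    final int scale;   // power-of-ten divisor, as in RDF_Decimal.scale (sint32)

    DecimalValueForm(long value, int scale) {
        this.value = value;
        this.scale = scale;
    }

    // Returns null when the unscaled value does not fit in 64 bits;
    // a writer could then fall back to the lexical form plus datatype.
    static DecimalValueForm fromLexical(String lexical) {
        BigDecimal d = new BigDecimal(lexical);
        BigInteger unscaled = d.unscaledValue();
        if (unscaled.bitLength() > 63) {
            return null;
        }
        return new DecimalValueForm(unscaled.longValue(), d.scale());
    }

    BigDecimal toBigDecimal() {
        return BigDecimal.valueOf(value, scale);
    }
}
```

For example, `"12.50"` maps to `value = 1250`, `scale = 2`, and converts back to `12.50`, so trailing zeros in the lexical form survive the round trip.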

### RDF/XML {#rdfxml}
@@ -7,8 +7,8 @@ fashion. Streaming can be used for manipulating RDF at scale. Jena
provides high performance readers and writers for all standard RDF formats,
and it can be extended with custom formats.

The [RDF Binary using Apache Thrift](rdf-binary.html) provides the highest
input parsing performance. N-Triples/N-Quads provide the highest
The [RDF Binary](rdf-binary.html) provides the highest
input parsing performance. N-Triples/N-Quads provide the highest
input parsing performance using W3C Standards.

Files ending in `.gz` are assumed to be gzip-compressed. Input and output
@@ -105,3 +105,4 @@ N-Triples and N-Quads are always written as a stream.
| `RDFFormat.NQUADS_ASCII` | |
| `RDFFormat.TRIX` | `Lang.TRIX` |
| `RDFFormat.RDF_THRIFT` | `Lang.RDFTHRIFT` |
| `RDFFormat.RDF_PROTO` | `Lang.RDFPROTO` |
