Inline IVs
There are two kinds of identifiers in Blazegraph: external identifiers, which correspond to RDF Values (IRIs, blank nodes, and Literals), and Internal Values (IVs). Internal Values are generated by a number of different mechanisms, but the specific mechanisms must be stable across the life cycle of a given triple or quad store instance. The typical mechanisms are:
- Dictionary coding (the TERM2ID, ID2TERM, and BLOBS indices, which encode the RDF Value into a TermId or BlobIV).
- Vocabulary declarations (which encode the RDF Value into 2-3 bytes).
- Inlining of numerical XSD data types (including fixed length types such as xsd:int, xsd:float, xsd:long, etc. as well as variable length types such as xsd:integer and xsd:decimal).
- Inlining of small XSD non-numeric types.
- Inlining of blank nodes.
- Inlining according to application specific logic.
There are several major advantages to inlining:
- Inlined IVs make it possible to recover the external form of the RDF Value without a dictionary lookup against an index. This is a large performance gain when it comes time to externalize the results of a query.
- Inlined IVs allow FILTERs based on comparisons in the value space (other than equality and inequality) to be evaluated directly against the inline IV (rather than doing a JOIN against the dictionary indices).
- Inlined IVs do not need to be stored in the dictionary indices. This reduces the size of those indices on the disk and reduces the IO Wait associated with the update of the dictionary indices.
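The first two advantages can be illustrated with a toy sketch. It is not Blazegraph's actual byte layout: the type code, the class name `InlineIntSketch`, and the 5-byte format are all assumptions made for illustration. The sketch packs an xsd:int directly into the key bytes, so the value can be decoded and range-compared without any dictionary lookup.

```java
import java.nio.ByteBuffer;

// Toy model of an inline IV for xsd:int. Hypothetical layout, NOT Blazegraph's real encoding.
final class InlineIntSketch {

    // Hypothetical type code meaning "inline xsd:int".
    static final byte INLINE_XSD_INT = 0x01;

    // Encode as [type code][4-byte big-endian value with the sign bit flipped].
    // Flipping the sign bit makes unsigned byte order agree with signed numeric order.
    static byte[] encode(int value) {
        return ByteBuffer.allocate(5)
                .put(INLINE_XSD_INT)
                .putInt(value ^ 0x80000000)
                .array();
    }

    // Recover the external value from the key bytes alone (no index access).
    static int decode(byte[] iv) {
        ByteBuffer buf = ByteBuffer.wrap(iv);
        if (buf.get() != INLINE_XSD_INT) {
            throw new IllegalArgumentException("not an inline xsd:int");
        }
        return buf.getInt() ^ 0x80000000;
    }

    // Compare two encoded values byte by byte, treating each byte as unsigned.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] minusFive = encode(-5);
        byte[] three = encode(3);
        System.out.println(decode(minusFive));
        System.out.println(compare(minusFive, three) < 0);
    }
}
```

Because the encoded bytes sort in the same order as the numbers they represent, a comparison such as `?x < 3` can in principle be answered from the key bytes themselves, which is the property the second bullet describes.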
Configuring the Inlining Behavior
For the most part, inlining is enabled by default. The LexiconConfiguration class is responsible for decisions about what can and cannot be inlined. Those decisions are made based on the AbstractTripleStore.Options.
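As a rough illustration, the inlining options can be collected in a `java.util.Properties` object before a triple or quad store is created. The property names below are assumptions based on the constants believed to be declared in AbstractTripleStore.Options; the values are arbitrary examples, not recommendations. Check both against the Javadoc for your release.

```java
import java.util.Properties;

// Sketch only: builds a Properties object with assumed inlining options.
final class InliningOptionsSketch {

    // Prefix shared by the AbstractTripleStore options.
    static final String PREFIX = "com.bigdata.rdf.store.AbstractTripleStore.";

    static Properties inliningProperties() {
        Properties p = new Properties();
        // Assumed option names; verify against AbstractTripleStore.Options.
        p.setProperty(PREFIX + "inlineXSDDatatypeLiterals", "true");
        p.setProperty(PREFIX + "inlineTextLiterals", "true");
        // Illustrative length limit for inlined text literals.
        p.setProperty(PREFIX + "maxInlineTextLength", "32");
        p.setProperty(PREFIX + "inlineBNodes", "true");
        return p;
    }

    public static void main(String[] args) {
        // Print the properties in key=value form.
        Properties p = inliningProperties();
        for (String name : p.stringPropertyNames()) {
            System.out.println(name + "=" + p.getProperty(name));
        }
    }
}
```

Since the encoding must stay stable for the life of a store instance, options like these are meant to be chosen when the triple or quad store is created rather than changed afterwards.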
Specifying a VocabularyClass
Blazegraph is capable of inlining the IRIs for pre-declared vocabulary items. The inlined IRIs occupy 2-3 bytes in the statement indices making them very compact. Further, since these vocabulary items are pre-declared, they can be decoded without reference to the dictionary indices. Thus they have no overhead when externalizing RDF Value objects from inline values.
Blazegraph uses a VocabularyClass by default. If you use other ontologies, you may want to extend an existing bigdata Vocabulary to also specify your own Vocabulary. One good extension point is com.bigdata.rdf.vocab.RDFSVocabulary. This class provides the IRI declarations for RDF, RDFS, OWL, FOAF, SKOS, Dublin Core, XML Schema and openrdf.
Vocabulary classes determine how external IRIs are mapped into internal values. This mapping MUST be stable. Thus you MUST version your Vocabulary class (by creating a new implementation class) each time you modify it. Otherwise you risk having an inconsistent encoding of some IRIs through the dictionary indices and the vocabulary class. This would result in a failure to correctly query the data involving those IRIs.
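The need for versioning can be seen in a toy model. This is not how Blazegraph's vocabulary classes are implemented; the class `VocabularyStabilitySketch`, the position-based ids, and the `ex:name` IRI are assumptions chosen to keep the example short. Here an IRI's internal id is simply its position in the declaration list, so editing the list in place silently changes what already-stored ids mean.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model: an IRI's id is its position in the declaration list.
// Hypothetical; Blazegraph's real vocabulary encoding differs.
final class VocabularyStabilitySketch {

    static Map<String, Integer> declare(String... iris) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (String iri : iris) {
            ids.put(iri, ids.size());
        }
        return ids;
    }

    // Reverse mapping: which IRI does this vocabulary assign to the given id?
    static String decode(Map<String, Integer> vocab, int id) {
        for (Map.Entry<String, Integer> e : vocab.entrySet()) {
            if (e.getValue() == id) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
        String label = "http://www.w3.org/2000/01/rdf-schema#label";
        String name = "http://example.org/name"; // hypothetical application IRI

        // Version 1, used when the data was loaded.
        Map<String, Integer> v1 = declare(type, label);
        int storedId = v1.get(type);

        // Same class edited in place: a new IRI was inserted at the front.
        Map<String, Integer> edited = declare(name, type, label);

        // The id written under version 1 now decodes to a different IRI.
        System.out.println(decode(v1, storedId));
        System.out.println(decode(edited, storedId));
    }
}
```

Keeping version 1 untouched and introducing the change as a new class for new store instances avoids this mismatch, which is the reason for the versioning rule above.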
If you do define your own vocabulary class, then you would specify it as follows when creating a new triple or quad store.
com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=com.bigdata.rdf.vocab.MyVocabularyVersion1
Example Custom Vocabulary
You can see an example of a custom vocabulary for the PubChem data set on github.