AnalyticQuery
Starting with our 1.1 release, bigdata includes an optional analytic query mode. Enabling analytic query turns on support for the MemoryManager and the HTree and allows bigdata to scale to 4TB of data on the native process heap with zero GC overhead. In the future it will also turn on the runtime query optimizer (RTO).
Background
The underlying problem is the Java architecture for managed memory: a query that materializes a lot of data on the JVM object heap puts heavy pressure on the garbage collector. For such queries, turn on the “analytic” mode for bigdata (see below).
Analytic query mode buffers the data on the native process heap rather than on the JVM object heap. This reduces the GC overhead associated with the query to basically zero. It performs this feat entirely within Java by leveraging the java.nio package.
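To make the idea of buffering outside the JVM object heap concrete, here is a minimal sketch using a direct buffer from java.nio. It is not bigdata's MemoryManager or HTree; the class name OffHeapSketch and the method sumViaDirectBuffer are hypothetical and only illustrate the general technique.

```java
import java.nio.ByteBuffer;

public class OffHeapSketch {

    /** Writes the values into a direct (off-heap) buffer, then reads them back and sums them. */
    static long sumViaDirectBuffer(long[] values) {
        // allocateDirect reserves memory outside the JVM object heap, so the
        // stored values are not individual objects for the collector to trace.
        ByteBuffer buf = ByteBuffer.allocateDirect(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        buf.flip(); // switch the buffer from writing to reading
        long sum = 0;
        while (buf.hasRemaining()) {
            sum += buf.getLong();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumViaDirectBuffer(new long[] {1, 2, 3}));
    }
}
```

The data lives in a single native allocation instead of many heap objects, which is why this style of storage adds very little GC work.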
There are analytic and non-analytic versions of all the operators (joins, DISTINCT, etc.). The analytic versions use the MemoryManager and the HTree. The non-analytic versions use Java collection classes. The Java collection classes are somewhat faster as long as you are not materializing a lot of data on the Java object heap. For example, for the BSBM “explore” use case the Java operators are about 10% faster overall. DISTINCT is a special case. The Java version of that operator uses a ConcurrentHashMap under the covers and can give you much higher concurrency in the query. But if you are trying to DISTINCT a large number of solutions, you are going to run into trouble with the garbage collector.
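The trade-off for DISTINCT can be sketched with plain Java collections. This is not bigdata's operator; DistinctSketch and distinct are hypothetical names, and solutions are simplified to strings. It only shows why a heap-based DISTINCT grows with the number of distinct solutions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DistinctSketch {

    /** Returns the solutions with duplicates dropped, keeping first-seen order. */
    static List<String> distinct(List<String> solutions) {
        // A concurrent set backed by ConcurrentHashMap: safe for many threads
        // to call add() on, but every distinct entry stays on the Java heap.
        Set<String> seen = ConcurrentHashMap.newKeySet();
        List<String> out = new ArrayList<>();
        for (String s : solutions) {
            if (seen.add(s)) { // add() returns false for a duplicate
                out.add(s);
            }
        }
        return out;
    }
}
```

With a small result set the set stays cheap. With a very large number of distinct solutions, every entry is a live heap object that the garbage collector has to manage, which is the case the analytic operators are meant for.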
Turning on Analytic Query
There are several ways to turn on the “analytic” mode for bigdata. The easiest way to do this is to check the:
[x] analytic
option on the NanoSparqlServer’s SPARQL query form page. If you are using the NanoSparqlServer you can also specify the URL query parameter
...&analytic=true
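For a client that builds the request itself, the parameter can be appended to the query URL. The sketch below assumes the standard SPARQL protocol `query` parameter; the class name AnalyticUrlSketch, the method analyticQueryUrl, and the endpoint shown in the usage note are hypothetical and should be replaced with your own.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class AnalyticUrlSketch {

    /** Builds a GET URL that runs the query with analytic=true. */
    static String analyticQueryUrl(String endpoint, String sparql) {
        try {
            // URL-encode the query text, then append the analytic parameter.
            return endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8")
                    + "&analytic=true";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `analyticQueryUrl("http://localhost:9999/sparql", "SELECT * WHERE { ?s ?p ?o }")` yields a URL ending in `&analytic=true`. The host, port, and path here are placeholders only.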
Finally, you can enable this with a magic triple directly in the SPARQL query:
SELECT ...
...
hint:Query hint:analytic "true" .
...
Just put that triple somewhere in the WHERE clause of the query and the query will run with the “analytic” options enabled. You do not need to declare the “hint:” prefix, but if you want to, the namespace is “http://www.bigdata.com/queryHints#”.
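A client that assembles queries as strings can add the hint triple when it builds the WHERE clause. The following is a sketch only: AnalyticHintSketch and withAnalyticHint are hypothetical names, and the prefix declaration is included just to show where the namespace goes, since declaring it is optional.

```java
public class AnalyticHintSketch {

    // Namespace for the "hint:" prefix (declaring it is optional).
    static final String HINT_NS = "http://www.bigdata.com/queryHints#";

    /** Wraps the given graph patterns in a WHERE clause that carries the analytic hint. */
    static String withAnalyticHint(String selectClause, String patterns) {
        return "PREFIX hint: <" + HINT_NS + ">\n"
                + selectClause + "\n"
                + "WHERE {\n"
                + "  " + patterns + "\n"
                + "  hint:Query hint:analytic \"true\" .\n"
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(withAnalyticHint("SELECT ?s", "?s ?p ?o ."));
    }
}
```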
Globally for All Queries
You can pass the JVM system property below when starting an instance to globally enable the Analytic Query mode for all queries it runs.
-Dcom.bigdata.rdf.sparql.ast.QueryHints.analytic=true
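A -D option sets an ordinary JVM system property, so you can check from Java whether the flag actually reached the process. The sketch below only reads the property; it does not show how bigdata consumes it internally. AnalyticPropertySketch and analyticRequested are hypothetical names.

```java
public class AnalyticPropertySketch {

    // Property name taken from the -D option above.
    static final String ANALYTIC_PROPERTY =
            "com.bigdata.rdf.sparql.ast.QueryHints.analytic";

    /** Returns true if the JVM was started with the analytic property set to "true". */
    static boolean analyticRequested() {
        // An unset property falls back to "false".
        return Boolean.parseBoolean(System.getProperty(ANALYTIC_PROPERTY, "false"));
    }

    public static void main(String[] args) {
        System.out.println("analytic requested: " + analyticRequested());
    }
}
```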