LUBM

The following instructions will let you run the LUBM benchmark against an embedded bigdata database.

Get the code

The LUBM benchmark can be downloaded from [1]. Directions on its use are available from the project home page. A modified version of the LUBM benchmark is also available, which makes it a bit easier to use with bigdata. The core benchmark is the same. We've added an HTTP SPARQL endpoint, which is used to connect to bigdata, and some new generator options that are useful when you are generating very large data sets for a cluster. Please contact the project maintainers if you have questions about this modified version of the LUBM benchmark.

The rest of this page assumes that you are working with the modified version of the LUBM test harness.

Obtain and unpack the code.

  https://www.blazegraph.com/files/bigdata-lubm.tgz

  tar xvfz bigdata-lubm.tgz
  cd bigdata-lubm

Configure LUBM

Edit build.properties, paying attention to at least:

bigdata.dir - Where to find the bigdata source code distribution.
lubm.univ - The data set size.
lubm.maxMem - The JVM heap used by the NanoSparqlServer in the tests.
lubm.baseDir - Where to put the generated data files, etc.
lubm.journalFile - The bigdata backing store file.
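A quick sanity check before building can catch a property that was left out. The sketch below is not part of the bigdata-lubm distribution: the function name check_lubm_props is hypothetical, and it assumes build.properties uses the usual Java properties syntax (key=value or key: value, one property per line).

```shell
# Hypothetical helper (not part of bigdata-lubm): report any of the
# properties listed above that are not defined in a properties file.
check_lubm_props() {
  file="$1"
  missing=0
  for key in bigdata.dir lubm.univ lubm.maxMem lubm.baseDir lubm.journalFile; do
    # Escape the dots so they match literally in the regular expression.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if ! grep -Eq "^[[:space:]]*${pattern}[[:space:]]*[=:]" "$file"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_lubm_props build.properties` prints one line per missing property and returns a non-zero status if any are missing. Commented-out entries are treated as missing.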

Note: The bigdata-lubm/lib directory includes a version of the Sesame onejar. You may need to replace this jar with the one that works with the version of bigdata that you are testing. For example, use Sesame 2.3.0 with bigdata 1.0.x.

Download the blazegraph.jar

https://github.com/blazegraph/database/releases/latest

Build LUBM

Note: The openrdf dependencies are required in order to build the bigdata-lubm project. You MUST use the correct version of the openrdf dependency for the version of bigdata that you are testing. If you compile the bigdata-lubm project against the wrong openrdf dependency version, you may get run-time dependency errors when you try to load or query the data.

 cd ...
 ant

Generate a data set

Generate the LUBM data set per the build.properties file.

  ant run-generator

Load a data set

Load an LUBM data set into bigdata per the build.properties file.

  ant run-load

Running

The NanoSparqlServer is used to answer SPARQL queries. It is aware of bigdata's MVCC (multi-version concurrency control) semantics and issues queries against a read-only connection that reads from the last commit time on the database, which may give somewhat better performance or concurrency.

You can follow more or less the same instructions to run against a bigdata federation. In that case the federation must already be up and running, and you must use the federation's bulk data loader to get the data into the database.

Start an HTTP SPARQL endpoint for that bigdata database instance.

 ant start-nano-server

Run the LUBM queries (do this in a different terminal window).

 ant run-query
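Before running the full query mix, a single ad hoc query can confirm that the endpoint is answering. The sketch below sends a query as a form-encoded POST, as defined by the SPARQL Protocol, using curl. The function name sparql_query is hypothetical, and the endpoint URL is an assumption supplied by the caller: the actual host, port, and path depend on your configuration and bigdata version.

```shell
# Hypothetical helper: send one SPARQL query to an endpoint and print the
# raw XML result. The endpoint URL is supplied by the caller.
sparql_query() {
  endpoint="$1"
  query="$2"
  # --data-urlencode makes curl issue a form-encoded POST with the
  # query text URL-encoded in the "query" parameter.
  curl --silent --show-error --fail \
       -H 'Accept: application/sparql-results+xml' \
       --data-urlencode "query=${query}" \
       "$endpoint"
}

# Example (the URL is a placeholder, not a documented default):
# sparql_query "http://localhost:8080/sparql" \
#   'SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 1'
```

With `--fail`, curl exits with a non-zero status on an HTTP error, so the function can be used in a script as a simple readiness check.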

Results

Here are some sample results.

LUBM U50 (WORM)

LUBM U50 using the Journal in the WORM mode. The load time was 122 seconds (56,183 triples per second). Closure time was 44 seconds.
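Multiplying the load time by the load rate gives the approximate number of triples loaded. This back-of-the-envelope check assumes the reported rate is an average over the whole 122-second load, and it inherits any rounding in the two figures.

```shell
# Implied number of triples loaded = load time (s) x rate (triples/s).
load_seconds=122
triples_per_second=56183
echo $((load_seconds * triples_per_second))   # prints 6854326
```

That is roughly 6.85 million triples for this U50 load.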

 [java] query       Time    Result#
 [java] query1      40      4
 [java] query3      8       6
 [java] query4      48      34
 [java] query5      59      719
 [java] query7      22      61
 [java] query8      260     6463
 [java] query10     22      0
 [java] query11     20      0
 [java] query12     27      0
 [java] query13     19      0
 [java] query14     3068    393730
 [java] query6      2800    430114
 [java] query9      3590    8627
 [java] query2      999     130
 [java] Total       10982

LUBM U50 (RWStore)

 [java] query       Time    Result#
 [java] query1      28      4
 [java] query3      17      6
 [java] query4      29      34
 [java] query5      39      719
 [java] query7      16      61
 [java] query8      166     6463
 [java] query10     29      0
 [java] query11     29      0
 [java] query12     25      0
 [java] query13     27      0
 [java] query14     2778    393730
 [java] query6      2920    430114
 [java] query2      540     130
 [java] query9      3356    8627
 [java] Total       9999
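
The Total row in each table is the sum of the per-query times, which can be checked mechanically. The awk sketch below assumes the output has been saved to a file and that each result row has the form "[java] queryN time result-count", as in the tables above. The function name sum_query_times is hypothetical.

```shell
# Hypothetical helper: add up the Time column of the per-query rows,
# skipping the header row ("query") and the "Total" row.
sum_query_times() {
  awk '$1 == "[java]" && $2 ~ /^query[0-9]+$/ { sum += $3 }
       END { print sum + 0 }' "$1"
}
```

Applied to the two tables above, this yields 10982 for the WORM run and 9999 for the RWStore run, matching the reported totals.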

Introduction