Brad Bebee edited this page Feb 13, 2020 · 1 revision

LUBM Benchmark using Bulk Data Loader

Generate an LUBM data set

This generates the LUBM U100 data set. You can generate much larger data sets in exactly the same way. The files will be written under ${NAS} in the specified directory. By default this will generate gzipped RDF/XML files. The LUBM generator is single-threaded, so generating a large data set can take quite a while.

mkdir -p ${NAS}/data/U100

# This assumes that you are running as root. If running as a normal user, then set the corresponding group on this directory.
chgrp -R wheel ${NAS}/data

cd ${NAS}/data/U100
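The generator invocation itself is not shown above. A sketch of one way to run it, assuming the stock LUBM UBA 1.7 generator (class `edu.lehigh.swat.bench.uba.Generator`) is on the classpath; the `UBA_CLASSPATH` path below is a placeholder for wherever you unpacked it:

```shell
# Placeholder -- point this at your LUBM UBA 1.7 install.
UBA_CLASSPATH=${NAS}/lubm/uba1.7/classes
UNIV=100

cd ${NAS}/data/U${UNIV}

# -univ, -index, -seed, and -onto are the standard UBA generator flags.
# The generator writes RDF/XML (.owl) files into the current directory.
java -cp "${UBA_CLASSPATH}" edu.lehigh.swat.bench.uba.Generator \
    -univ ${UNIV} -index 0 -seed 0 \
    -onto http://swat.cse.lehigh.edu/onto/univ-bench.owl

# Compress the output so the bulk loader reads gzipped RDF/XML.
gzip *.owl
```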

Bulk load a data set

Edit the main bigdata configuration file and specify the data set to bulk load in the RDFDataLoadMaster configuration section. With the federation running, start the bulk load. The same approach works with any data set: the RDFDataLoadMaster is set up by default to load files from a shared volume, but the behavior is extensible and can be made to load from URLs, HDFS, etc.
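As a sketch of the sequence, where `bigdataCluster.config` and `<load-command>` are placeholders for the configuration file and loader launch script shipped with your federation install:

```shell
# Placeholder file name -- use your federation's main configuration file.
# Point the RDFDataLoadMaster section at ${NAS}/data/U100.
vi ${NAS}/bigdata/bigdataCluster.config

# Placeholder -- substitute the actual loader launch command for your install.
# nohup detaches the load so it survives logout.
nohup <load-command> &
```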

tail -f nohup.out

nohup is used since a large data set load can run for hours. If you have set up the ssh tunnel, then you can watch the progress using the Excel worksheets.
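A minimal, self-contained demonstration of the nohup pattern used above, with `echo` standing in for the real load command. Output is redirected explicitly because nohup only creates `nohup.out` on its own when stdout is a terminal:

```shell
# Work in a scratch directory so we don't clobber a real nohup.out.
cd "$(mktemp -d)"

# Stand-in for the long-running load; output captured in nohup.out.
nohup sh -c 'echo "load complete"' > nohup.out 2>&1 &

wait $!          # in practice you would log out and check back later
tail nohup.out   # prints: load complete
```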

LUBM Query

Run the LUBM queries for the named KB instance.

nohup U100&
tail -f nohup.out