Brad Bebee edited this page Feb 13, 2020 · 1 revision

The DataLoader utility may be used to create and/or load RDF data into a local database instance. Directories are processed recursively. Data files may be compressed with zip or gzip, but the loader does not support multiple data files within a single archive.
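For illustration, a single-file gzip archive of the kind the loader accepts could be prepared like this (the paths and the triple are hypothetical):

```shell
# Hypothetical data directory; the loader recurses into subdirectories.
mkdir -p /tmp/upload/subdir

# One N-Triples file per archive: gzip produces a single-file archive,
# which is what the loader supports (multi-file archives are not).
printf '<http://example.org/s> <http://example.org/p> "o" .\n' \
  > /tmp/upload/subdir/data.nt
gzip -f /tmp/upload/subdir/data.nt   # -> /tmp/upload/subdir/data.nt.gz
```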

Command line:

java -cp *:*.jar com.bigdata.rdf.store.DataLoader [-quiet][-closure][-verbose][-namespace namespace] propertyFile (fileOrDir)*
| Parameter | Definition |
|-----------|------------|
| -quiet | Suppress all stdout messages. |
| -verbose | Show additional messages detailing the load performance. |
| -closure | Compute the RDF(S)+ closure. |
| -namespace | The namespace of the KB instance. |
| propertyFile | The configuration file for the database instance. |
| fileOrDir | Zero or more files or directories containing the data to be loaded. |


1. Load all files from the /opt/data/upload/ directory, using the properties file under /opt/data/upload/:

java -cp *:*.jar com.bigdata.rdf.store.DataLoader /opt/data/upload/ /opt/data/upload/

2. Load the archive /opt/data/data.nt.gz into a specified namespace, using the properties file under /opt/data/upload/:

java -cp *:*.jar com.bigdata.rdf.store.DataLoader -namespace someNameSpace /opt/data/upload/ /opt/data/data.nt.gz

If you are loading data with inferencing enabled, a temporary file is created to compute the delta in entailments. This temporary file can grow extremely large when loading a big data set, which may cause a "no space left on device" error and interrupt the load. To avoid this, it is strongly recommended to set the DataLoader.Options.CLOSURE property to ClosureEnum.None in the properties file:
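A minimal properties-file fragment might look like the following; the fully qualified property name shown is an assumption based on Blazegraph's convention of prefixing DataLoader options with the class name, so verify it against your version:

```properties
# Assumed key for DataLoader.Options.CLOSURE; check your Blazegraph release.
com.bigdata.rdf.store.DataLoader.closure=None
```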

You may need to increase the Java heap size to match the data size. In most cases 6 GB is enough (add the Java parameter -Xmx6g). Also, beware of setting the heap larger than 8 GB, as this increases garbage-collector pressure.

Then load the data using the DataLoader, passing it the -closure option:

java -Xmx6g -cp *:*.jar com.bigdata.rdf.store.DataLoader -closure /opt/data/upload/ /opt/data/upload/

With this configuration the DataLoader will not do incremental truth maintenance during the load. Once the load is complete, it computes all entailments at once. This is the "database-at-once" closure: it does not use a temporary store to compute the delta in entailments, so the temporary store cannot "eat your disk".
