
StandaloneGuide

Brad Bebee edited this page Feb 13, 2020 · 1 revision

There are two distinct persistence store modes for standalone bigdata instances. This page briefly describes these two modes and provides some guidance as to why you would choose one mode or the other. Both the WORM and RW modes support HighAvailability based on replicating writes from a master along a failover chain.

WORM

The WORM (Write-Once, Read-Many) store is the traditional log-structured, append-only journal. It was designed for very fast write rates and is used to buffer writes for scale-out. This is a good choice for immortal databases where people want access to ALL history. It scales to several billion triples.

The WORM mode is selected with the following option.

com.bigdata.journal.AbstractJournal.bufferMode=Disk

RW

The RW (Read-Write) store supports the recycling of allocation slots on the backing file. It may be used as a time-bounded version of an immortal database where history is aged off of the database over time. This is a good choice for standalone workloads where updates are continuously arriving and older database states may be released. The RW store is also less sensitive to data skew because it can reuse B+Tree node and leaf revisions within a commit group during large data set loads. It scales to 50B+ triples or quads.

The RW mode is selected with the following option.

com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
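
For embedded use, the same option can be set programmatically instead of in a properties file. The following is a minimal sketch that only builds a standard java.util.Properties object for each mode. The class name StoreModeConfig and its helper methods are hypothetical names chosen for illustration; the property key and the values Disk and DiskRW are the ones shown above. How the resulting Properties object is handed to the store is not covered on this page and is left out here.

```java
import java.util.Properties;

// Hypothetical helper class, not part of the bigdata code base.
public class StoreModeConfig {

    // Property key as given in the options shown above.
    static final String BUFFER_MODE_KEY =
            "com.bigdata.journal.AbstractJournal.bufferMode";

    // Builds properties that select the WORM (append-only) store.
    static Properties wormProperties() {
        Properties p = new Properties();
        p.setProperty(BUFFER_MODE_KEY, "Disk");
        return p;
    }

    // Builds properties that select the RW (slot-recycling) store.
    static Properties rwProperties() {
        Properties p = new Properties();
        p.setProperty(BUFFER_MODE_KEY, "DiskRW");
        return p;
    }

    public static void main(String[] args) {
        // Print the selected buffer mode for each configuration.
        System.out.println(wormProperties().getProperty(BUFFER_MODE_KEY));
        System.out.println(rwProperties().getProperty(BUFFER_MODE_KEY));
    }
}
```

Keeping the mode in a single property means an application can switch between the two stores by changing one value, although an existing backing file cannot be reused across modes.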

Binary compatibility

The RW and WORM modes DO NOT have binary compatibility. They use very different internal structures to manage their allocations on the persistence store. However, they have very good logical compatibility. For the most part, applications should run over either store without change. The only exceptions are applications that deliberately try to take advantage of the RW store features for aging history.

Embedded use of standalone modes

Both the WORM and RW modes can operate with either a very small memory footprint or a very large heap if you choose the appropriate configuration options. Apart from the memory allocated to the JVM, the configuration options that can influence performance are the size and number of write cache buffers (these are direct memory buffers), the size of the write retention queue, and the capacity of the global LRU.

The number of dependencies can also be pruned depending on your application requirements and deployment goals.

OpenRDF Sesame Server

See Using Blazegraph with the OpenRDF Sesame HTTP Server if you are trying to install Blazegraph with the Sesame Server.
