RetentionHistory

Accessing Historical Data

When running on a local datastore, Bigdata can retain historical data for a specified period.

Background

The RWStore (for Read-Write) provides an updatable store that can efficiently recycle data allocations as it is updated.

By default, no historical data is retained. Long-lived read-only transactions are still supported via an internal mechanism that protects against the immediate re-allocation of any storage accessible to them.

The option to retain historical data has two main uses:

1) To explicitly allow access to historical data for some analytical purpose.

2) To enable the management of overlapping read-only transactions used in continual load/query deployments that would otherwise be unable to recycle storage.

Starting Up

We will use the Bigdata Sail classes for these examples.

// A simple initialisation method setting a retention period (in milliseconds)
public BigdataSail initializeSail(final long retention_ms) throws IOException, SailException {
  final Properties properties = new Properties();

  // create temporary journal file for this application run
  final File journal = File.createTempFile("BIGDATA", ".jnl");

  properties.setProperty(BigdataSail.Options.FILE, journal.getAbsolutePath());

  // Select the RWStore
  properties.setProperty(Options.BUFFER_MODE, BufferMode.DiskRW.toString());

  // Set retention with the minimum release age property (property values must be strings)
  properties.setProperty(AbstractTransactionService.Options.MIN_RELEASE_AGE, Long.toString(retention_ms));

  final BigdataSail sail = new BigdataSail(properties);
  sail.initialize();

  return sail;
}

Using the above method we can easily create a Bigdata Sail with a specified retention period.

Note that we have also set the BufferMode to DiskRW, which selects the RWStore.
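The retention period is supplied as a number of milliseconds, and property values are strings. As a minimal sketch of that conversion, independent of Bigdata itself, the following uses java.time.Duration. The class name RetentionProperties and the key "example.minReleaseAge" are placeholders invented for illustration; real code would use AbstractTransactionService.Options.MIN_RELEASE_AGE as the key.

```java
import java.time.Duration;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Bigdata.
final class RetentionProperties {

    // Placeholder key (an assumption). Real code would pass
    // AbstractTransactionService.Options.MIN_RELEASE_AGE instead.
    static final String PLACEHOLDER_KEY = "example.minReleaseAge";

    // Stores the retention period as a decimal string of milliseconds,
    // because Properties.setProperty accepts only String values.
    static Properties withRetention(final Duration retention) {
        if (retention.isNegative()) {
            throw new IllegalArgumentException("retention must not be negative");
        }
        final Properties properties = new Properties();
        properties.setProperty(PLACEHOLDER_KEY, Long.toString(retention.toMillis()));
        return properties;
    }
}
```

Expressing the period as a Duration keeps the unit explicit at the call site, for example Duration.ofHours(2) rather than a bare number.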

Historical State

The historical retention points are accessible using a "state" value. This is close to the system time returned by System.currentTimeMillis(), but you should not rely on the system time to identify a retention point. Instead, retrieve the commit time from the connection.

Here is a handy method:

// Method to commit a connection and return the commit time
public long commit(final BigdataSailRepositoryConnection cxn) {
  cxn.commit();

  return cxn.getRepository().getDatabase().getIndexManager().getLastCommitTime();
}

Exception

Of course, this all depends on why you want to retain the history. You may wish to retain a few days' history and be able to run queries as of a few hours earlier, without concern for a precise state. In that case, using the system time is quite sensible.
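For this looser use case, the arithmetic can be sketched as below. ApproximateState and its methods are hypothetical names, not part of Bigdata, and the check only compares timestamps against the configured retention period. Whether the store can actually serve a given state is decided by the store, not by this calculation.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of Bigdata.
final class ApproximateState {

    // Timestamp (ms since the epoch) that lies hoursAgo hours before nowMs.
    static long hoursBefore(final long nowMs, final long hoursAgo) {
        return nowMs - TimeUnit.HOURS.toMillis(hoursAgo);
    }

    // True when stateMs is not in the future and is no older than retentionMs.
    static boolean withinRetention(final long nowMs, final long stateMs, final long retentionMs) {
        final long age = nowMs - stateMs;
        return age >= 0 && age <= retentionMs;
    }
}
```

A caller would typically pass System.currentTimeMillis() as nowMs and use the result of hoursBefore as the state for a read-only connection.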

General Pattern

The general approach is to maintain an "update" connection and to run queries against a read-only connection:

  final long TWO_HOURS = 2L * 60 * 60 * 1000;
  final BigdataSail sail = initializeSail(TWO_HOURS);

  final BigdataSailRepository repo = new BigdataSailRepository(sail);

  final BigdataSailRepositoryConnection cxn = repo.getConnection();

The cxn connection is used to update the repository by adding and removing statements, either explicitly or via queries.

The commit method defined above can then be used to commit the current set of updates and return the state, which can later be used to read from this commit point.

  final long rememberedState = commit(cxn);
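If an application needs more than one remembered state, it has to keep the commit times somewhere itself. One possible sketch, using only the Java standard library, is a small sorted log of commit times. The class CommitLog and its methods are hypothetical and not part of Bigdata; the pruning rule is a plain timestamp comparison and is an assumption about how an application might mirror its retention period.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Bigdata.
final class CommitLog {

    // Commit time (as returned by the commit method) mapped to a free-form label.
    private final NavigableMap<Long, String> commits = new TreeMap<>();

    // Remembers a commit time under a label chosen by the application.
    void record(final long commitTime, final String label) {
        commits.put(commitTime, label);
    }

    // Latest recorded commit time at or before the timestamp, or -1 if there is none.
    long stateAsOf(final long timestamp) {
        final Long key = commits.floorKey(timestamp);
        return key == null ? -1L : key;
    }

    // Drops entries strictly older than the retention window ending at nowMs.
    void prune(final long nowMs, final long retentionMs) {
        commits.headMap(nowMs - retentionMs, false).clear();
    }
}
```

A value returned by stateAsOf could then be passed wherever rememberedState is used, after checking for the -1 sentinel.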

The Read-Only Connection

Generally, read-only connections are required for two reasons:

1) To access the currently committed state:

final BigdataSailRepositoryConnection ro_cxn1 = repo.getReadOnlyConnection(ITx.READ_COMMITTED);

or 2) To access a specified state:

final BigdataSailRepositoryConnection ro_cxn2 = repo.getReadOnlyConnection(rememberedState);

Tidy Up

Remember to close read-only connections when you are done with them. An open connection keeps its state pinned, which can result in more history being retained than you planned for.

final BigdataSailRepositoryConnection ro_cxn2 = repo.getReadOnlyConnection(rememberedState);
try {
  // ..do something
} finally {
  ro_cxn2.close();
}

Reference

1) Sample program

Introduction