Tatu Saloranta edited this page Feb 6, 2014 · 14 revisions

Features

Basic storage

Storage implementation consists of two parts:

  • A local database, exposed as StorableStore (and implemented by StorableStoreImpl), which delegates to a backend implementation (StoreBackend). All entry metadata is stored here, along with small inlined data.
    • Two backend implementations currently exist:
      • BDB-JE (see storemate-backend-bdb-je sub-module)
      • LevelDB/Java (see storemate-backend-leveldb sub-module) -- NOTE: it should be simple to plug in JNI-accessible native storage too, but I haven't tested this setup
  • File system, where larger data entries ("blobs") are stored.
    • Directory structure is handled by FileManager. The amount of code is minimal; it mostly arranges files in balanced directories, using timestamps to create new directories as necessary.

Data to store is divided into three parts:

  • Opaque key: no semantics are implied at the StoreMate level, and sorting is based on raw byte collation. Additional semantics are usually defined by higher-level systems.
  • Entry metadata
    • Standard entry metadata contains minimal state information, such as an is-soft-deleted flag, a compression indicator, a checksum for the payload, and the last-modified time.
    • Optional custom metadata is an opaque byte sequence (byte[]) exposed to higher-level systems; StoreMate simply stores it along with the other metadata without using it for anything.
  • Payload: the actual data to store. Small payloads are inlined in the database (the size threshold is configurable, typically around 2 kB); larger ones ("blobs") are stored on disk.
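
As a rough illustration of two points above (raw byte collation of keys, and the inline-versus-blob split), here is a minimal sketch. The class name, the comparator, and the 2048-byte threshold are assumptions made for this example; they are not StoreMate's actual API or defaults.

```java
import java.util.Comparator;

// Hypothetical helper, not part of StoreMate.
final class KeyOrderSketch {
    // Assumed threshold; the text above only says it is configurable, around 2 kB.
    static final int MAX_INLINED_BYTES = 2048;

    // Lexicographic order over raw bytes, treating each byte as unsigned (0..255).
    static final Comparator<byte[]> RAW_BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // When one key is a prefix of the other, the shorter key sorts first.
        return a.length - b.length;
    };

    // Small payloads go into the database, larger ones to the file system.
    static boolean shouldInline(long payloadLength) {
        return payloadLength <= MAX_INLINED_BYTES;
    }

    public static void main(String[] args) {
        byte[] low = {0x01};
        byte[] high = {(byte) 0x80}; // negative as a signed Java byte, 128 unsigned
        System.out.println(RAW_BYTE_ORDER.compare(low, high) < 0); // true
        System.out.println(shouldInline(512));                     // true
        System.out.println(shouldInline(1 << 20));                 // false
    }
}
```

The unsigned masking matters because Java bytes are signed: without it, 0x80 would sort before 0x01.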

Data compression

Payload is automatically compressed when stored, unless it is detected to be already compressed or the client explicitly instructs the store not to compress it. On reads, payload is decompressed unless the client indicates that it accepts compressed data. This is meant to allow convenient but customizable handling of compression over protocols like HTTP.
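
The read-side behavior can be sketched as follows. This is only an illustration under stated assumptions: ReadPathSketch and its serve method are hypothetical names, GZIP from java.util.zip stands in for whatever compression the entry was stored with, and the real library decides this from entry metadata rather than boolean flags.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of StoreMate.
final class ReadPathSketch {
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Pass stored bytes through when they are uncompressed or the client
    // accepts the stored compression; otherwise decompress before returning.
    static byte[] serve(byte[] stored, boolean storedIsGzip, boolean clientAcceptsGzip) {
        if (!storedIsGzip || clientAcceptsGzip) {
            return stored;
        }
        return gunzip(stored);
    }
}
```

Passing compressed bytes through untouched saves CPU on the storing node when the client (for example, an HTTP client that sent a matching Accept-Encoding header) can decompress on its own.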

Two compression formats are currently supported: GZIP and LZF. GZIP is used for smaller entries because of its higher CPU overhead, and LZF for larger entries. LZF also supports efficient content skipping, which is important if Content-Range (partial payload data access) is to be supported at a higher level.
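
The write-side choice described above might look roughly like this. Everything here is a sketch under assumptions: the class, the enum, and the 16 kB cut-off are invented for illustration, and detection is reduced to checking leading magic bytes (0x1F 0x8B for GZIP; 'Z' 'V' plus a chunk-type byte for the chunked LZF framing, as I understand it from the compress-lzf library). No actual LZF compression is performed, since that needs an external library.

```java
// Hypothetical helper, not part of StoreMate.
final class CompressionChoiceSketch {
    enum Compression { NONE, GZIP, LZF }

    // Assumed cut-off between "smaller" and "larger" entries; the real value is not given above.
    static final int GZIP_MAX_BYTES = 16 * 1024;

    // GZIP streams start with the two magic bytes 0x1F 0x8B.
    static boolean looksGzipped(byte[] p) {
        return p.length >= 2 && (p[0] & 0xFF) == 0x1F && (p[1] & 0xFF) == 0x8B;
    }

    // Assumption: LZF chunks start with 'Z', 'V' and a type byte of 0 or 1.
    static boolean looksLzf(byte[] p) {
        return p.length >= 3 && p[0] == 'Z' && p[1] == 'V' && (p[2] == 0 || p[2] == 1);
    }

    static Compression choose(byte[] payload, boolean clientSaysDoNotCompress) {
        if (clientSaysDoNotCompress || looksGzipped(payload) || looksLzf(payload)) {
            return Compression.NONE; // store as-is
        }
        return payload.length <= GZIP_MAX_BYTES ? Compression.GZIP : Compression.LZF;
    }

    public static void main(String[] args) {
        System.out.println(choose(new byte[100], false));                 // GZIP
        System.out.println(choose(new byte[GZIP_MAX_BYTES + 1], false));  // LZF
        System.out.println(choose(new byte[] {0x1F, (byte) 0x8B, 8}, false)); // NONE
    }
}
```

Magic-byte sniffing is a heuristic; uncompressed data that happens to start with the same bytes would be misclassified, which only costs a missed compression opportunity.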

Checksums

Checksums are automatically calculated over the payload and are used to guard against data corruption, both on uploads (assuming the client provides checksums to compare against) and when offering data for synchronization. All checksums are calculated using the 32-bit Murmur3 algorithm (Murmur3/32).
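
For readers unfamiliar with the algorithm, the following is a self-contained sketch of 32-bit Murmur3 over a byte array, plus an upload-style check. It is written for illustration only: the class and method names are hypothetical, the seed of 0 is an assumption, and StoreMate's own implementation (including its seed and any incremental, streaming calculation) may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of StoreMate.
final class ChecksumSketch {
    // Standard MurmurHash3 x86_32 over the whole array.
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int len = data.length;
        int i = 0;
        // Body: process 4-byte little-endian blocks.
        while (i + 4 <= len) {
            int k = (data[i] & 0xFF)
                    | ((data[i + 1] & 0xFF) << 8)
                    | ((data[i + 2] & 0xFF) << 16)
                    | ((data[i + 3] & 0xFF) << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
            i += 4;
        }
        // Tail: remaining 1 to 3 bytes (cases fall through on purpose).
        int k = 0;
        switch (len & 3) {
            case 3: k ^= (data[i + 2] & 0xFF) << 16;
            case 2: k ^= (data[i + 1] & 0xFF) << 8;
            case 1: k ^= (data[i] & 0xFF);
                    k *= c1;
                    k = Integer.rotateLeft(k, 15);
                    k *= c2;
                    h ^= k;
        }
        // Finalization mix.
        h ^= len;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Upload-style check: accept the payload only if the client's checksum matches ours.
    static boolean matchesClientChecksum(byte[] payload, int clientChecksum) {
        return murmur3_32(payload, 0) == clientChecksum; // seed 0 is an assumption
    }

    public static void main(String[] args) {
        byte[] sent = "hello".getBytes(StandardCharsets.UTF_8);
        int clientChecksum = murmur3_32(sent, 0);
        byte[] corrupted = "hellp".getBytes(StandardCharsets.UTF_8);
        System.out.println(matchesClientChecksum(sent, clientChecksum));      // true
        System.out.println(matchesClientChecksum(corrupted, clientChecksum)); // false
    }
}
```

A 32-bit checksum is cheap to compute and store per entry; it guards against accidental corruption, not against deliberate tampering.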

Indexing

A single secondary ("last-modified") index is maintained. It is designed to allow reliable and efficient change-list-style node-to-node synchronization of content. Backends that natively support secondary indexes (BDB-JE) use that support; others (LevelDB) use a second table and keep it in sync with the primary one themselves. Either way, the library presents a unified view of atomic CRUD operations to the application using it.
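
The "second table" approach can be pictured with an in-memory sketch: a primary map from key to last-modified time, and a sorted map from last-modified time to keys, updated together under one lock. All names here are hypothetical, String keys replace the opaque byte keys for brevity, and the real backends persist both tables rather than holding them in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model, not StoreMate's backend code.
final class LastModifiedIndexSketch {
    // Primary table: key -> last-modified timestamp (stand-in for full entry metadata).
    private final Map<String, Long> primary = new HashMap<>();
    // Secondary table: last-modified timestamp -> keys modified at that time.
    private final TreeMap<Long, Set<String>> byLastModified = new TreeMap<>();

    // Insert or update; both tables change within a single synchronized step.
    synchronized void put(String key, long lastModified) {
        Long old = primary.put(key, lastModified);
        if (old != null) {
            removeFromIndex(old, key);
        }
        byLastModified.computeIfAbsent(lastModified, t -> new TreeSet<>()).add(key);
    }

    synchronized boolean delete(String key) {
        Long old = primary.remove(key);
        if (old == null) {
            return false;
        }
        removeFromIndex(old, key);
        return true;
    }

    // Keys in last-modified order, starting from the given timestamp (inclusive).
    synchronized List<String> changedSince(long sinceInclusive) {
        List<String> result = new ArrayList<>();
        for (Set<String> keys : byLastModified.tailMap(sinceInclusive, true).values()) {
            result.addAll(keys);
        }
        return result;
    }

    private void removeFromIndex(long timestamp, String key) {
        Set<String> keys = byLastModified.get(timestamp);
        if (keys != null) {
            keys.remove(key);
            if (keys.isEmpty()) {
                byLastModified.remove(timestamp);
            }
        }
    }
}
```

For example, after put("a", 100), put("b", 200), and put("a", 300), a call to changedSince(0) yields b followed by a, because the update moved a to the end of the change list.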

Operations

Basic CRUD (create, read, update, delete) operations are supported, as is iteration in key order and in last-modified order.

Last-modified order can be used for change list traversal, and key-order iteration for entry-range queries.
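
A key-order range query with early termination could be sketched like this. The class, the scan method, and the use of a Predicate as the visitor are assumptions for illustration; StoreMate's actual iteration API is not shown on this page. String keys are used for brevity in place of opaque byte keys.

```java
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical in-memory model, not StoreMate's iteration API.
final class KeyRangeScanSketch {
    private final TreeMap<String, byte[]> entries = new TreeMap<>();

    void put(String key, byte[] payload) {
        entries.put(key, payload);
    }

    // Visits keys in [fromInclusive, toExclusive) in key order.
    // Stops early when the visitor returns false; returns the number of keys visited.
    int scan(String fromInclusive, String toExclusive, Predicate<String> visitor) {
        int visited = 0;
        for (String key : entries.subMap(fromInclusive, true, toExclusive, false).keySet()) {
            visited++;
            if (!visitor.test(key)) {
                break;
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        KeyRangeScanSketch store = new KeyRangeScanSketch();
        store.put("a/1", new byte[0]);
        store.put("a/2", new byte[0]);
        store.put("b/1", new byte[0]);
        // Prints a/1 and a/2; b/1 falls outside the range.
        store.scan("a/", "b/", key -> {
            System.out.println(key);
            return true;
        });
    }
}
```

Letting the visitor end the scan keeps large range queries bounded, for instance when a caller only wants the first N entries under a prefix.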

Documentation

Javadocs:

Implementation

Format definitions:

Related

Projects that use StoreMate:

  • ClusterMate is a framework for building distributed systems, and it uses StoreMate as its per-node storage layer.