cliffmoon edited this page Sep 12, 2010 · 7 revisions

The Concept

Dynomite currently provides integrated storage and distribution, requiring developers to adopt a simple key/value
data model to get its availability and scalability advantages. Separating these two functions lets developers
take advantage of Dynomite's sophisticated distribution and scaling techniques while keeping great flexibility in
the choice of data model. In this new architecture, Dynomite handles data partitioning, versioning, and read repair,
while user-provided storage engines handle persistence and query processing.

Why?

  • Modern databases must be distributed.
  • Distributed databases are very hard to implement properly.
  • Dynomite solves many of the distributed database problems developers encounter.
  • Splitting out the data distribution components of Dynomite opens them up for use by implementers of many other distributed database systems, even those that do not use Dynomite's key/value storage model.

Handling Arbitrary Updates

Handling arbitrary updates requires distinguishing between categories of updates. Broadly speaking, an update can operate on one row or on many rows at once. It can insert a new row, replace an existing row, or modify part of the data in a row that already exists. For the purposes of distributing the update, Dynomite cares only about whether the update affects one row or many, and how to efficiently determine which partitions the affected rows belong to. The second distinction, whether the update modifies an entire row or only part of one, is a concern of the storage engine and whatever data model it wants to present. As long as Dynomite has a way to hash the result of the update, it can handle arbitrary intra-row updates.
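To make the "one row or many" distinction concrete, here is a minimal sketch of how the keys touched by a multi-row update might be grouped by owning partition, so that each partition receives one request covering its own keys. The module name, the function names, and the use of erlang:phash2/2 as the hash are assumptions made for illustration; this is not Dynomite's actual partitioning code.

```erlang
-module(update_routing_sketch).
-export([partition_for/2, group_by_partition/2]).

%% Hypothetical helper: map a key to one of NumPartitions partitions.
%% erlang:phash2/2 returns an integer in the range 0..NumPartitions-1.
partition_for(Key, NumPartitions) ->
    erlang:phash2(Key, NumPartitions).

%% Group the keys of a multi-row update by the partition that owns them.
%% Returns a map of Partition => [Key], preserving the input key order
%% within each partition.
group_by_partition(Keys, NumPartitions) ->
    lists:foldl(
        fun(Key, Acc) ->
            P = partition_for(Key, NumPartitions),
            maps:update_with(P, fun(Ks) -> Ks ++ [Key] end, [Key], Acc)
        end,
        #{},
        Keys).
```

A single-row update is the degenerate case in which the resulting map has exactly one entry.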

Handling Single Row Updates

Single-row updates should be the easiest to manage. Every single-row update query should include a key parameter that uniquely identifies the row. This allows Dynomite to direct the request to the correct nodes, based on which partition is responsible for that key. This works almost exactly like the way put operations are currently handled in Dynomite.
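As a rough illustration of routing by key, the sketch below hashes a row key to a partition and then picks the nodes that would receive the request. The placement rule (consecutive nodes in an ordered list, starting at the partition index) and all names here are assumptions chosen to keep the example short; Dynomite's real membership and placement logic may differ.

```erlang
-module(single_row_sketch).
-export([nodes_for_key/4]).

%% Key:           the row key carried by the update query.
%% NumPartitions: size of the (assumed) partition space.
%% Nodes:         ordered list of node names.
%% Replicas:      how many nodes should hold each partition.
%% Returns {Partition, TargetNodes}.
nodes_for_key(Key, NumPartitions, Nodes, Replicas) ->
    Partition = erlang:phash2(Key, NumPartitions),
    Len = length(Nodes),
    %% Never ask for more replicas than there are nodes.
    Count = erlang:min(Replicas, Len),
    Targets = [lists:nth(((Partition + I) rem Len) + 1, Nodes)
               || I <- lists:seq(0, Count - 1)],
    {Partition, Targets}.
```

Because the partition depends only on the key, every request for the same row lands on the same set of nodes while the node list stays unchanged.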

Dynomite engine API

open(Directory, Name, Options) → {ok, DBHandle} | {error, Reason}

  Directory = A path on the filesystem which the storage engine should use for persistent data.
  Name = The name of the partition. The storage engine may ignore this.
  Options = A proplist of options taken from the storage-engine-specific section of the config file. Useful for tuning the parameters of a storage engine.

Opens a new instance of the storage engine. Multiple storage engine instances must be able to run independently in the same VM.
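A minimal sketch of an engine that satisfies the open/3 contract, assuming an in-memory ETS table as the backing store. The module name, the put/3 and get/2 helpers, and the read_concurrency option are hypothetical; the proposal itself only specifies the shape of open/3. Directory is ignored here because ETS does not persist to disk.

```erlang
-module(ets_engine_sketch).
-export([open/3, put/3, get/2]).

%% Each call creates a fresh, unnamed ETS table, so several instances
%% can coexist in the same VM without sharing state.
open(_Directory, _Name, Options) ->
    %% Hypothetical tuning option read from the engine's config proplist.
    RC = proplists:get_value(read_concurrency, Options, false),
    try ets:new(ets_engine_sketch, [set, public, {read_concurrency, RC}]) of
        Table -> {ok, Table}
    catch
        error:Reason -> {error, Reason}
    end.

%% Store a value under a key in the given instance.
put(Table, Key, Value) ->
    true = ets:insert(Table, {Key, Value}),
    ok.

%% Look up a key in the given instance.
get(Table, Key) ->
    case ets:lookup(Table, Key) of
        [{Key, Value}] -> {ok, Value};
        [] -> not_found
    end.
```

Opening two instances and writing to one leaves the other untouched, which is the independence property the requirement asks for.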

api_info() → {api_info, HashFun = fun(Keys), MethodSpecs = [api_spec()]}

  api_spec() = {
      ReadWrite = read | write | cas,
      Plurality = single | many,
      Designator = get | put | scan | undefined,
      Name = atom(),
      Arguments = [argument_type_spec()],
      PartitionStrategy = single | many,
      CollationFun = fun(ResultsA, ResultsB) | undefined,
      ResolutionFun = fun(ContextA, ValueA, ContextB, ValueB) | undefined
  }

  argument_type_spec() = {
      Key = true | false,
      Name = atom(),
      DataType = binary | object  %% how will this impact serialization across the wire?
  }

Returns a full spec of the database's API. The API spec gives details about the various functions exported by the storage engine module.
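For illustration, here is what api_info/0 might return for a small key/value engine exposing get, put, and scan. The tuple layout follows the spec above, but the semantics of HashFun, CollationFun, and ResolutionFun are not defined by the proposal, so the choices below (hashing the key list with erlang:phash2/1, merging sorted result lists, and a last-write-wins rule over comparable contexts) are assumptions.

```erlang
-module(kv_api_sketch).
-export([api_info/0]).

api_info() ->
    %% Assumed: the hash fun maps the key list of a call to an integer.
    HashFun = fun(Keys) -> erlang:phash2(Keys) end,
    %% argument_type_spec() = {Key, Name, DataType}
    KeyArg = {true, key, binary},
    ValueArg = {false, value, binary},
    PrefixArg = {false, prefix, binary},
    %% api_spec() = {ReadWrite, Plurality, Designator, Name, Arguments,
    %%               PartitionStrategy, CollationFun, ResolutionFun}
    Get = {read, single, get, get, [KeyArg], single,
           undefined, fun resolve/4},
    Put = {write, single, put, put, [KeyArg, ValueArg], single,
           undefined, undefined},
    %% Assumed: each partition returns a sorted list of results.
    Scan = {read, many, scan, scan, [PrefixArg], many,
            fun lists:umerge/2, undefined},
    {api_info, HashFun, [Get, Put, Scan]}.

%% Assumed resolution rule: contexts are comparable (for example,
%% timestamps) and the value with the larger context wins.
resolve(CtxA, ValA, CtxB, _ValB) when CtxA >= CtxB -> {CtxA, ValA};
resolve(_CtxA, _ValA, CtxB, ValB) -> {CtxB, ValB}.
```

An engine with a richer data model would list its own method names and argument specs in the same structure, which is what lets Dynomite route calls without understanding the data model itself.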