description
High-level concepts that serve as the foundation for understanding and working with RecallGraph. Skip this if you're familiar with graph databases, ArangoDB, and temporal database constructs.

Background

What are Graph Databases?

Graph databases have become widespread over the years thanks to their strong representational and querying capabilities for highly interconnected data. Wherever data has an inherently networked structure, graph databases tend to store and query it better than relational databases and other NoSQL databases, because they persist the underlying connected structure natively. This lets declarative graph query languages offer traversal semantics, and it usually yields better performance than equivalent SQL queries, especially for deep traversals. Additionally, they often help uncover emergent network topologies in legacy data that had not previously been mined for such structures. At the very least, they make the process a lot less tedious.

Versioned Graph Stores

In addition to reaping the benefits of living in graph databases, many real-world applications also stand to take advantage of network evolution models, i.e. a record of changes to a network over time. Examples include analyzing railway track utilization efficiency as a function of signal array timing, or simulating nuclide concentration changes over time in a nuclear fission reactor. However, among the prominent mainstream graph databases that are freely available at the time of this writing, I have not come across any that offers built-in revision tracking (meaning that older versions of data are retained for future retrieval).

For graph databases in particular, the concept of revisions applies not only to individual nodes and edges, but also to the structure of the graph as a whole. It should be relatively easy to store and retrieve not only individual document (node/edge) histories, but also the structural history of the graph or a portion of it. This is a key difference between a hypothetical versioned or historical graph database and a general-purpose event store, which is usually tuned for the former (document histories) but not the latter (structural history).

There is a need for a practical, historical graph database that has the following minimal set of characteristics:

  1. A mechanism for efficiently recording individual document (node/edge) writes (creates/updates/deletes) in such a way that they can be rewound and replayed.
  2. An internal storage architecture that not only maintains the current structure of the graph, but also allows for a quick rebuild and retrieval of its structure at any point of time in the past. This could, optionally, be optimized to retrieve recent structures faster than older ones.
  3. An efficient query engine that can traverse current/past graph structures to retrieve sub-graphs or k-hop neighborhoods of specified nodes. In case of historical traversals, this should be optimized to rebuild only the relevant portions of the graph, where feasible.
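
The three characteristics above can be illustrated with a minimal in-memory sketch. It is not RecallGraph's implementation: the `WriteEvent` type and the `rebuild_at` and `k_hop` functions are hypothetical names chosen for this example, each event stores the full document after the write (rather than a delta) to keep things short, and edges are assumed to follow ArangoDB's convention of carrying `_from` and `_to` attributes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set

Doc = Dict[str, Any]


@dataclass
class WriteEvent:
    """One recorded write. All field names are illustrative assumptions."""
    time: float                 # when the write was recorded
    op: str                     # "create", "update" or "delete"
    doc_id: str                 # identifier of the node or edge
    data: Optional[Doc] = None  # full document after the write (None for deletes)


def rebuild_at(log: List[WriteEvent], when: float) -> Dict[str, Doc]:
    """Replay every event recorded up to and including `when`, in time order."""
    state: Dict[str, Doc] = {}
    for event in sorted(log, key=lambda e: e.time):
        if event.time > when:
            break
        if event.op == "delete":
            state.pop(event.doc_id, None)
        else:
            state[event.doc_id] = dict(event.data or {})
    return state


def k_hop(state: Dict[str, Doc], start: str, k: int) -> Set[str]:
    """Ids of nodes within k hops of `start`, treating edges as undirected."""
    edges = [d for d in state.values() if "_from" in d and "_to" in d]
    seen = {start}
    frontier = {start}
    for _ in range(k):
        reached = set()
        for edge in edges:
            if edge["_from"] in frontier:
                reached.add(edge["_to"])
            if edge["_to"] in frontier:
                reached.add(edge["_from"])
        frontier = reached - seen
        seen |= frontier
    return seen


log = [
    WriteEvent(1.0, "create", "a", {"name": "A"}),
    WriteEvent(2.0, "create", "b", {"name": "B"}),
    WriteEvent(3.0, "create", "a-b", {"_from": "a", "_to": "b"}),
    WriteEvent(4.0, "create", "c", {"name": "C"}),
    WriteEvent(5.0, "create", "b-c", {"_from": "b", "_to": "c"}),
    WriteEvent(6.0, "delete", "a-b"),
]

past = rebuild_at(log, 5.0)     # graph as it stood before the edge was deleted
present = rebuild_at(log, 6.0)  # graph after the deletion
print(sorted(k_hop(past, "a", 2)))
print(sorted(k_hop(present, "a", 2)))
```

This sketch replays the whole log on every query, which is exactly the cost that the second and third characteristics aim to avoid; a practical store would need snapshots or indexes so that only the relevant portion of the graph is rebuilt.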

{% hint style="info" %} For a deep dive into the criticality of versioning your data, and how RecallGraph helps meet your data versioning needs, see https://blog.recallgraph.tech/never-lose-your-old-data-again. {% endhint %}

About ArangoDB

ArangoDB is a free and open-source native multi-model database system developed by ArangoDB GmbH. The database system supports three data models (key/value, documents, graphs) with one database core and a unified query language, AQL (ArangoDB Query Language). The query language is declarative and allows different data access patterns to be combined in a single query. ArangoDB is a NoSQL database system, but AQL is similar in many ways to SQL.

ArangoDB has been referred to as a universal database, but its creators call it a "native multi-model" database to indicate that it was designed specifically to allow key/value, document, and graph data to be stored together and queried with a common language.

Why RecallGraph?

There is a general consensus in the computing and scientific research community on the need for a historical graph database: a database that records entity write operations (creates/updates/deletes) as a series of deltas wrapped in events. Each delta is the difference between the contents of the updated entity and its previous version. It is part of an event payload, where the event represents the particular write operation (create/update/delete) that occurred. Taken together, the deltas encode the entire write history of the entity. RecallGraph was developed to fulfill this need.
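
As a rough illustration of the delta idea, the sketch below computes a shallow difference between two versions of an entity and replays a series of such deltas to recover any version. The delta format (`set`/`unset`) and the function names `make_delta`, `apply_delta` and `replay` are assumptions made for this example only; this section does not specify the format RecallGraph actually uses.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]
Delta = Dict[str, Any]

_MISSING = object()  # sentinel so that a stored None is not confused with "absent"


def make_delta(old: Doc, new: Doc) -> Delta:
    """Shallow diff: fields added or changed in `new`, and fields dropped from `old`."""
    return {
        "set": {k: v for k, v in new.items() if old.get(k, _MISSING) != v},
        "unset": [k for k in old if k not in new],
    }


def apply_delta(doc: Doc, delta: Delta) -> Doc:
    """Return a new document with the delta applied; `doc` is left unchanged."""
    result = {k: v for k, v in doc.items() if k not in delta["unset"]}
    result.update(delta["set"])
    return result


def replay(deltas: List[Delta]) -> Doc:
    """Rebuild a version by applying deltas in order, starting from nothing."""
    doc: Doc = {}
    for delta in deltas:
        doc = apply_delta(doc, delta)
    return doc


# Three successive versions of one entity, preceded by "does not exist yet".
versions = [
    {},
    {"name": "Station A", "platforms": 2},
    {"name": "Station A", "platforms": 3, "status": "open"},
    {"name": "Station A", "platforms": 3},
]

# One event per write; each event wraps the delta from the previous version.
events = [
    {"op": "create" if not old else "update", "delta": make_delta(old, new)}
    for old, new in zip(versions, versions[1:])
]

deltas = [event["delta"] for event in events]
print(replay(deltas[:2]))  # the entity as of its second write
print(replay(deltas))      # the entity as of its latest write
```

Because each delta only holds what changed, the log stays compact, but reading an old version costs one `apply_delta` call per earlier write.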

Built-In Transaction Time Dimension

Transaction time is the record of the actual time in the real world when a fact was recorded in the database. This is often auto-filled by the database itself, using its system clock.

(Planned) Built-In Valid Time Dimension

Valid time marks when a real-world or business-specific event actually occurred, of which a database record is a representation. The event may have occurred at a different time from when it was recorded in the DB. Valid time must therefore be tracked separately from, and independently of, transaction time, in order to distinguish between the two timestamps.
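
The difference between the two dimensions can be seen in a small sketch. The `Fact` type and the `known_as_of` and `occurred_by` helpers are hypothetical names used only for illustration; they do not describe RecallGraph's API, in which valid time is still a planned feature.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Fact:
    value: str
    valid_time: datetime        # when the event happened in the real world
    transaction_time: datetime  # when the database recorded it


def known_as_of(facts: List[Fact], t: datetime) -> List[str]:
    """What the database had recorded by time `t` (transaction time)."""
    return [f.value for f in facts if f.transaction_time <= t]


def occurred_by(facts: List[Fact], t: datetime) -> List[str]:
    """What had actually happened by time `t` (valid time)."""
    return [f.value for f in facts if f.valid_time <= t]


def day(d: int) -> datetime:
    """Midnight UTC on the given day of January 2024 (arbitrary example dates)."""
    return datetime(2024, 1, d, tzinfo=timezone.utc)


facts = [
    # Happened on the 1st, but only entered into the database on the 5th.
    Fact("shipment left warehouse", valid_time=day(1), transaction_time=day(5)),
    # Happened and was recorded on the same day.
    Fact("shipment delivered", valid_time=day(3), transaction_time=day(3)),
]

print(known_as_of(facts, day(4)))  # the late entry is not yet visible
print(occurred_by(facts, day(2)))  # yet it is the only event that had occurred
```

Querying by transaction time answers "what did the database know then?", while querying by valid time answers "what was true then?", which is why the two timestamps cannot be merged into one.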

{% hint style="info" %} For an in-depth explanation of temporal dimensions and temporality as applies to databases, see https://adityamukho.com/exploring-temporality-in-databases. {% endhint %}