The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
PomoDB represents data in a consistent, content addressable data format for Datalog facts. This specification describes the IPLD encoding.
Interplanetary Linked Data (IPLD) is a consistent, highly general data model for content addressed linked data forming any DAG. This data model provides a convenient pivot between many serializations of the same information.
PomoDB was originally designed with IPFS — and by extension IPLD — in mind. Content addresses depend on the codec of some data. As such, the exact layout of the data as IPLD is important. This specification gives the IPLD encoding.
A PomoDB "Fact" is an ordered 4-tuple ("quad") consisting of an Entity ID, Attribute, Value, and Caused By.
type Fact struct {
  entityId EntityID
  attribute Attribute
  value Value
  causedBy [&Fact]
} representation tuple
Note that `representation tuple` flattens the struct into a positional array. For example, the following concrete DAG-JSON representation parses to the IPLD representation below it (given in Rust).
// DAG-JSON
[123, "name/last", "Monroe", [{"/": "bafyreiaajfbxfnbbdbhvxmowe6t63ytsimv4daiitv5gkqetwrpww5zmsy"}]]
// Rust
Fact {
  entity_id: 123,
  attribute: "name/last",
  value: "Monroe",
  caused_by: [Link("bafyreiaajfbxfnbbdbhvxmowe6t63ytsimv4daiitv5gkqetwrpww5zmsy")],
}
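The positional mapping can be sketched in Rust using only the standard library. This is a minimal illustration, not a reference implementation: `Node`, `to_tuple`, and `from_tuple` are hypothetical names, links are kept as CID strings, and the entity ID is modelled as an integer only to mirror the example above.

```rust
/// Hypothetical stand-in for an IPLD data-model node; a real implementation
/// would use the node and link types of an IPLD library instead.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Integer(i64),
    String(String),
    Link(String), // CID kept in its string form, for illustration only
    List(Vec<Node>),
}

/// Simplified Fact for illustration (assumed shape, not normative).
#[derive(Debug, Clone, PartialEq)]
pub struct Fact {
    pub entity_id: Node,
    pub attribute: Node,
    pub value: Node,
    pub caused_by: Vec<String>,
}

impl Fact {
    /// Flatten the struct into positional form:
    /// [entityId, attribute, value, causedBy].
    pub fn to_tuple(&self) -> Node {
        let links = self.caused_by.iter().cloned().map(Node::Link).collect();
        Node::List(vec![
            self.entity_id.clone(),
            self.attribute.clone(),
            self.value.clone(),
            Node::List(links),
        ])
    }

    /// Rebuild a Fact from its positional form. Returns None unless the
    /// node is a 4-element list whose last element is a list of links.
    pub fn from_tuple(node: &Node) -> Option<Fact> {
        let items = match node {
            Node::List(items) if items.len() == 4 => items,
            _ => return None,
        };
        let caused_by = match &items[3] {
            Node::List(links) => links
                .iter()
                .map(|n| match n {
                    Node::Link(cid) => Some(cid.clone()),
                    _ => None,
                })
                .collect::<Option<Vec<String>>>()?,
            _ => return None,
        };
        Some(Fact {
            entity_id: items[0].clone(),
            attribute: items[1].clone(),
            value: items[2].clone(),
            caused_by,
        })
    }
}
```

Because field names never appear in the encoded form, the order of the four positions is the only thing that identifies each field.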
Restricting an entity ID to 128 bits is RECOMMENDED.
type EntityID = Bytes
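As a rough sketch of the width recommendation, an entity ID can be wrapped in a newtype that reports whether it fits in 128 bits. The `EntityId` type and its methods are hypothetical, and reading the recommendation as "at most 16 bytes" is an assumption of this example.

```rust
/// Hypothetical newtype for an entity ID stored as raw bytes.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct EntityId(pub Vec<u8>);

impl EntityId {
    /// Build a 16-byte (128-bit) ID from an integer, big-endian.
    pub fn from_u128(n: u128) -> Self {
        EntityId(n.to_be_bytes().to_vec())
    }

    /// True when the ID fits in 128 bits (assumed here to mean <= 16 bytes).
    pub fn is_recommended_width(&self) -> bool {
        self.0.len() <= 16
    }
}
```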
Attributes MUST be represented as one of the following:
type Attribute
  = Integer -- e.g. Normal indices
  | Float -- e.g. Fractional indices
  | String
  | Bytes
Values MUST be given as one of the following:
type Value union {
  | Boolean
  | Integer
  | Float
  | String
  | Link
  | Bytes
} representation kinded
Note that all floating-point values MUST be representable as 64-bit (double-precision) floats as defined in IEEE 754-2019 when deserialized. NaNs MUST NOT be used.
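One way to enforce the NaN rule is to route every float through a checked constructor. The sketch below is illustrative only: the `Value` enum mirrors the union above with assumed Rust payload types (links as CID strings, integers as `i64`), and `NanError` is a hypothetical error type.

```rust
/// Illustrative mirror of the Value union; payload types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Boolean(bool),
    Integer(i64),
    Float(f64), // f64 is an IEEE 754 double-precision float
    String(String),
    Link(String), // CID kept in its string form, for illustration only
    Bytes(Vec<u8>),
}

/// Hypothetical error returned when a NaN is offered as a value.
#[derive(Debug, PartialEq)]
pub struct NanError;

impl Value {
    /// Checked constructor for floats: rejects NaN, accepts everything else.
    pub fn float(f: f64) -> Result<Value, NanError> {
        if f.is_nan() {
            Err(NanError)
        } else {
            Ok(Value::Float(f))
        }
    }
}
```

Rejecting NaN at construction time also keeps equality on values well behaved, since NaN does not compare equal to itself.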
Links to other facts MUST be placed in an array in the `causedBy` field.
type CausedBy = [&Fact]
A capsule type MAY be used to label the enclosed data as a PomoDB Fact. Its use is OPTIONAL.
type FactCapsule struct {
  c Fact (rename "pomodb/v0.1/fact")
}
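A capsule can be pictured as a single-entry map whose key names the enclosed type. The helpers below are a hypothetical sketch (`encapsulate` and `decapsulate` are not part of any PomoDB API), generic over whatever Fact type an implementation uses.

```rust
use std::collections::BTreeMap;

/// Key taken from the `rename` directive in the schema above.
pub const FACT_CAPSULE_KEY: &str = "pomodb/v0.1/fact";

/// Wrap a fact in a single-entry map keyed by the capsule tag.
pub fn encapsulate<T>(fact: T) -> BTreeMap<String, T> {
    let mut capsule = BTreeMap::new();
    capsule.insert(FACT_CAPSULE_KEY.to_string(), fact);
    capsule
}

/// Unwrap a capsule. Returns None unless the map holds exactly one
/// entry and that entry uses the expected key.
pub fn decapsulate<T>(mut capsule: BTreeMap<String, T>) -> Option<T> {
    if capsule.len() != 1 {
        return None;
    }
    capsule.remove(FACT_CAPSULE_KEY)
}
```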
As PomoDB is intended to operate over many Facts, storing and referencing collections is important. Below are two strategies for representing groups of Facts.
A collection of Facts is called a "Store". This structure is simply a content-addressed blockstore indexing Facts by their CID.
type Store = {CID : Fact}
Stores carry more information than is needed for transmission or encrypted storage. In these cases, a flat set called a Collection MAY be used:
type Collection = [Fact]
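The relationship between the two shapes can be sketched as a pair of conversions. Everything here is illustrative: `Cid` is a plain string alias, `Fact` is simplified to string attributes and values, and `cid_of` is a caller-supplied function standing in for real CID computation, which needs an encoder and a hash function that this example does not include.

```rust
use std::collections::BTreeMap;

/// Hypothetical alias: a CID kept in its string form.
pub type Cid = String;

/// Simplified Fact for illustration (assumed shape, not normative).
#[derive(Debug, Clone, PartialEq)]
pub struct Fact {
    pub entity_id: Vec<u8>,
    pub attribute: String,
    pub value: String,
    pub caused_by: Vec<Cid>,
}

pub type Store = BTreeMap<Cid, Fact>;
pub type Collection = Vec<Fact>;

/// Drop the CID index, keeping only the facts.
pub fn to_collection(store: &Store) -> Collection {
    store.values().cloned().collect()
}

/// Rebuild the index from a flat list of facts, using a caller-supplied
/// CID function. Facts that map to the same CID collapse into one entry.
pub fn to_store<F>(collection: &[Fact], cid_of: F) -> Store
where
    F: Fn(&Fact) -> Cid,
{
    let mut store = Store::new();
    for fact in collection {
        store.insert(cid_of(fact), fact.clone());
    }
    store
}
```

A receiver of a Collection can rebuild the Store locally, provided both sides agree on how CIDs are computed.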
IPLD cleanly canonicalizes data, though differently per codec. However, the same data can have multiple CIDs due to differences in encoding, hash algorithm, and so on. Strictly speaking, this poses no problem for PomoDB: entering the same fact into the store twice is harmless for operations that depend only on the graph structure of the store.
Certain aggregate functions (e.g. counts, sums, averages) and stateful queries (e.g. graph colorings) depend on a node being present no more than once per graph. Deduplication is thus imperative for many use cases. It is RECOMMENDED that all facts added to a store have a canonical CID. The canonical CID MAY use any CID configuration. To reduce the amount of recomputation, using the following parameters is RECOMMENDED:
Parameter | Recommended Setting |
---|---|
CID Version | CIDv1 |
Multicodec | DAG-CBOR |
Multihash | SHA2-256 |
Note that the multibase of a CID is defined by the codec and CID version.
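In binary form, a CID with the recommended parameters is a four-byte prefix followed by the 32-byte digest: version `0x01`, the DAG-CBOR multicodec code `0x71`, the SHA2-256 multihash code `0x12`, and the digest length `0x20`. The sketch below assembles and checks that layout. The function names are hypothetical, and the digest is supplied by the caller because the Rust standard library does not include SHA-256.

```rust
pub const CID_VERSION_1: u8 = 0x01;
pub const DAG_CBOR: u8 = 0x71; // multicodec code for DAG-CBOR
pub const SHA2_256: u8 = 0x12; // multihash code for SHA2-256
pub const SHA2_256_LEN: u8 = 0x20; // digest length: 32 bytes

/// Assemble binary CIDv1 bytes from a precomputed SHA2-256 digest of the
/// DAG-CBOR encoded block. All four prefix values are below 0x80, so each
/// one is a single-byte varint.
pub fn cid_v1_dag_cbor_sha256(digest: [u8; 32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(36);
    out.extend_from_slice(&[CID_VERSION_1, DAG_CBOR, SHA2_256, SHA2_256_LEN]);
    out.extend_from_slice(&digest);
    out
}

/// Check whether binary CID bytes use the recommended parameters.
pub fn uses_recommended_parameters(cid: &[u8]) -> bool {
    cid.len() == 36 && cid[..4] == [CID_VERSION_1, DAG_CBOR, SHA2_256, SHA2_256_LEN]
}
```

A store that accepts facts from several sources could use a check like `uses_recommended_parameters` to decide whether an incoming CID is already canonical or the block needs to be re-encoded and re-hashed.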