Constructive is an experiment in indexed data storage in Golang, specifically focused on storing and querying structs.
- I often have large numbers of structs in a collection that I need to search in various ways.
- I often need to efficiently apply changes to the collection without affecting current readers.
- I often need to convert from one type of struct to another with very similar or identical data.
- I am convinced the datum model, an extension of RDF, is widely applicable and underused.
- I only care about local storage at this time. Durable data storage and interchange are goals, but not immediate ones.
- I care most about correct behavior, then API usability and stability, then performance, then memory efficiency.
- I do not care about being able to go back in history at this time. The data model readily supports it, but it would require a more sophisticated index to be practical.
This library implements many of the ideas and features of the Datomic database, albeit with no durability beyond the lifetime of the process and no transaction history. In this respect it is also significantly inspired by the DataScript library.
Constructive uses slightly different system idents than either, preferring a path hierarchy.
The datum model is an extension of the RDF model. There are four components to a datum:
An entity is a thing with an identity. Its characteristics may change over time, but its identity persists. I am an entity.
An attribute is a property of an entity. Attributes have global identities and are themselves entities. "Person's given name" is an attribute.
A value is an observation, a claim, a fact, a reading. "Donald" is a value which happens to be my current given name.
A transaction is an entity that asserts a set of datums are true at a point in time. When I register an account, my name, credentials, etc. are collectively recorded in a transaction. The transaction necessarily records the time but may also include other audit details like the IP address of the host from which I created the account.
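To make the four components concrete, here is a minimal Go sketch of a datum as a plain struct. The names EntityID, Datum and About, and the use of `any` for values, are hypothetical illustrations rather than constructive's actual types. The entity ids are arbitrary.

```go
package main

import (
	"fmt"
	"time"
)

// EntityID is a hypothetical identifier type. Attributes and transactions
// are entities too, so they share it.
type EntityID uint64

// Datum sketches the four components: entity, attribute, value, and the
// transaction that asserted the claim.
type Datum struct {
	E  EntityID // the entity the claim is about
	A  EntityID // the attribute, itself an entity
	V  any      // the observed value
	Tx EntityID // the transaction entity that asserted it
}

// About returns the datums whose entity is e.
func About(datums []Datum, e EntityID) []Datum {
	var out []Datum
	for _, d := range datums {
		if d.E == e {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	const (
		givenName EntityID = 10 // e.g. the entity named person/name
		txTime    EntityID = 11 // e.g. a hypothetical tx/time attribute
		donald    EntityID = 100
		tx        EntityID = 1000
	)
	datums := []Datum{
		{E: donald, A: givenName, V: "Donald", Tx: tx},
		// The transaction is an entity, so facts about it are datums too.
		{E: tx, A: txTime, V: time.Date(2024, 1, 2, 0, 0, 0, 0, time.UTC), Tx: tx},
	}
	fmt.Println(About(datums, donald))
}
```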
I contend this data model constitutes the simplest possible fundamental data model for general use. Simplifying it further by dropping the transaction component yields a system with no general treatment of data attribution, which is a huge problem for the industry.
This is not a great data model for representing observations about things that lack durable identities, or where time and attribution are not important. It is particularly valuable when working with data combined from diverse sources, where tracking the provenance of data consistently is important. Joining datasets, either by sharing attributes or by asserting a specific relationship between attributes, can be readily expressed and is often straightforward to implement.
Attributes are entities that have at least two system attributes, ident and type, and may carry others that govern their behavior. Once asserted, these system attribute values can be neither replaced with new values nor retracted.
A system ident uniquely identifies an entity by name, e.g. person/name or, indeed, sys/db/ident. Attributes are almost always referred to by their idents. Not all idents are attributes; only those with types are. Idents are also the idiomatic way to represent enumerations, and have many other uses. The system reserves the sys root, rejecting claims for such idents or about the entities to which they refer. Users may use the rest of the namespace as they see fit, though they are encouraged to use paths for consistency.
The sys/attr/type attribute identifies the type of value to which the attribute refers, one of:
sys/attr/type/string
sys/attr/type/inst: a moment in time
sys/attr/type/int
sys/attr/type/float
sys/attr/type/ref: a reference to an entity
sys/attr/type/bool
Nil is not a valid value for any type. The absence of a value is represented by the absence of the datum. If the type's zero value is meaningful in the domain, an explicit affirmation that a value is absent should be represented by a separate attribute.
The sys/attr/cardinality attribute specifies how many values the attribute may have for a given entity. The valid cardinality values are:
sys/attr/cardinality/one
sys/attr/cardinality/many
Cardinality one, a scalar, is assumed in the absence of a cardinality attribute. Cardinality many uses set semantics.
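The difference between the two cardinalities can be sketched as a per-attribute value set. The replacement rule for cardinality one (a newly asserted value displaces the old one) is an assumption, not something stated above, and the values type and its methods are hypothetical.

```go
package main

import "fmt"

// values sketches what one entity holds for one attribute. Values live in a
// set, so asserting the same value twice has no extra effect. Map keys must
// be comparable, which holds for the scalar value types listed earlier.
type values struct {
	many bool // sys/attr/cardinality/many; false means cardinality one
	set  map[any]struct{}
}

// assert records v. Under cardinality one the new value replaces the old
// one; that replacement rule is an assumption of this sketch.
func (a *values) assert(v any) {
	if a.set == nil || !a.many {
		a.set = map[any]struct{}{}
	}
	a.set[v] = struct{}{}
}

// has reports whether v is currently asserted.
func (a *values) has(v any) bool {
	_, ok := a.set[v]
	return ok
}

func main() {
	name := &values{}
	name.assert("Donald")
	name.assert("Don")
	nicknames := &values{many: true}
	nicknames.assert("Don")
	nicknames.assert("Duck")
	nicknames.assert("Don")
	fmt.Println(len(name.set), len(nicknames.set))
}
```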
The sys/attr/unique attribute specifies that the attribute's value is unique in the database: only one entity may assert a given value. It has two values:
sys/attr/unique/identity
sys/attr/unique/value
Both enforce the uniqueness constraint. They differ only when a claim for the attribute uses a tempid and an existing entity already asserts the claimed value: with identity uniqueness, the tempid resolves to that existing entity; with value uniqueness, the claim is rejected.
THESE ARE LIES: this is aspirational, an experiment in documentation-driven development.
The sys/attr/ref/type attribute qualifies the type of reference. Its only value is:
sys/attr/ref/type/component
This specifies that the reference is to a component. A component entity is one whose existence is governed by a parent. A component ref attribute changes the behavior of the system when applying transactions in the following ways:
If a claim about a tempid that resolves via identity uniqueness concerns a scalar component ref, the claimed value is itself a tempid, and the existing entity already has a value for that component ref, then the value's tempid resolves to the existing component.
If a claim retracts a datum for a component ref, all datums about the referent entity are retracted.
Component refs may not form cycles. A claim for a component ref that would complete a cycle is rejected.
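A sketch of the two graph rules above, cycle rejection and retraction of a component's datums, over a hypothetical in-memory map from each entity to its component children. Treating the retraction as cascading through the referent's own component refs is an interpretation of the rule, not a documented guarantee; Components, WouldCycle and Cascade are illustrative names.

```go
package main

import "fmt"

// EntityID is a hypothetical entity identifier type.
type EntityID uint64

// Components maps each entity to the entities it refers to through
// component ref attributes: a hypothetical in-memory view of those datums.
type Components map[EntityID][]EntityID

// reachable returns from and every entity reachable from it via component
// refs, in depth-first order. The seen set keeps the walk finite.
func (g Components) reachable(from EntityID) []EntityID {
	var out []EntityID
	seen := map[EntityID]bool{}
	stack := []EntityID{from}
	for len(stack) > 0 {
		e := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[e] {
			continue
		}
		seen[e] = true
		out = append(out, e)
		stack = append(stack, g[e]...)
	}
	return out
}

// WouldCycle reports whether a new component ref from parent to child would
// complete a cycle: true if parent is child or is reachable from it.
func (g Components) WouldCycle(parent, child EntityID) bool {
	for _, e := range g.reachable(child) {
		if e == parent {
			return true
		}
	}
	return false
}

// Cascade lists the entities whose datums are retracted when a component
// ref to referent is retracted: the referent plus, under this sketch's
// reading, everything reachable through its own component refs.
func (g Components) Cascade(referent EntityID) []EntityID {
	return g.reachable(referent)
}

func main() {
	g := Components{1: {2}, 2: {3}}
	fmt.Println(g.WouldCycle(3, 1), g.Cascade(2))
}
```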
If such a claim, one whose tempid resolves via identity uniqueness, is instead about a set (cardinality many) component ref, the component key attribute governs the transaction behavior.
If the component key is not present, all datums about the existing components are retracted before the claims are considered. Used liberally, this results in unnecessary writes, so it is recommended that the component key be given, identifying the attribute whose value is unique among the components. If the key is present, every existing component whose key value is not claimed is fully retracted.
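The set component ref rule can be sketched by matching claimed component key values against the existing components. Representing key values as strings, and the Reconcile function itself, are assumptions for illustration; claimed keys with no existing component would become new components, which the sketch does not model.

```go
package main

import (
	"fmt"
	"sort"
)

// EntityID is a hypothetical entity identifier type.
type EntityID uint64

// Reconcile sketches the set component ref rule. existing maps the component
// key value of each current component to its entity; claimed lists the key
// values of the components in the new claims. hasKey == false models an
// attribute with no component key. It returns the claimed keys that resolve
// to existing components, and the existing components to fully retract.
func Reconcile(existing map[string]EntityID, claimed []string, hasKey bool) (map[string]EntityID, []EntityID) {
	resolved := map[string]EntityID{}
	kept := map[string]bool{}
	if hasKey {
		for _, k := range claimed {
			if e, ok := existing[k]; ok {
				resolved[k] = e // the claim's tempid resolves to this component
				kept[k] = true
			}
		}
	}
	// Without a key nothing is kept, so every existing component is retracted.
	var retracted []EntityID
	for k, e := range existing {
		if !kept[k] {
			retracted = append(retracted, e)
		}
	}
	sort.Slice(retracted, func(i, j int) bool { return retracted[i] < retracted[j] })
	return resolved, retracted
}

func main() {
	addresses := map[string]EntityID{"home": 1, "work": 2}
	fmt.Println(Reconcile(addresses, []string{"home", "cabin"}, true))
	fmt.Println(Reconcile(addresses, []string{"home", "cabin"}, false))
}
```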
Constructive's primary interface is intended to be structs, the dominant data structure in Golang.
Structs may represent claims, entities, or queries. They declare their role in the constructive database with field tags, for example:
type Person struct {
	ID   ID     `attr:"sys/db/id"`
	Name string `attr:"person/name,identity"`
	Age  int    `attr:"person/age"`
}
Such structs can be given to the database to record as datums. The database first asserts the schema required by the struct tags, then the values in the fields.
In the claims, all such datums refer to the struct's entity by a tempid. If a field for an identity attribute has a value, the tempid resolves to any existing entity asserting that value. Similarly, a sys/db/id pseudo-attribute field is taken to contain the entity id. If any such references exist, they must all resolve to the same referent, or the claims are rejected.
Such structs can be populated by the database in two ways.
Individual entities can be fetched by passing a pointer to a struct whose identity attribute or sys/db/id fields are set, as above. If such an entity exists, the attribute fields are populated from its values in the database.
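Populating a struct from an entity's values can be sketched with reflection. Here the database is replaced by a map from attribute ident to value, Populate is a hypothetical name, and the conversion rules (for example int64 into an int field, but never a number into a string) are assumptions.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// ID is a hypothetical type for the sys/db/id pseudo-attribute.
type ID uint64

type Person struct {
	ID   ID     `attr:"sys/db/id"`
	Name string `attr:"person/name,identity"`
	Age  int    `attr:"person/age"`
}

// Populate sketches filling the tagged fields of *dst from one entity's
// values, given as attribute ident -> value (a stand-in for the database).
// Fields without a value are left as they are: absence means no datum.
func Populate(dst any, values map[string]any) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Pointer || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("populate: want pointer to struct, got %T", dst)
	}
	s := v.Elem()
	for i := 0; i < s.NumField(); i++ {
		tag, ok := s.Type().Field(i).Tag.Lookup("attr")
		field := s.Field(i)
		if !ok || !field.CanSet() {
			continue
		}
		ident, _, _ := strings.Cut(tag, ",")
		val, ok := values[ident]
		if !ok {
			continue
		}
		rv := reflect.ValueOf(val)
		// Convert between compatible types (e.g. int64 -> int), but refuse
		// number -> string, which Go would treat as a rune conversion.
		if !rv.IsValid() || !rv.Type().ConvertibleTo(field.Type()) ||
			(rv.Kind() == reflect.String) != (field.Kind() == reflect.String) {
			return fmt.Errorf("populate: %s: cannot store %T in %s", ident, val, field.Type())
		}
		field.Set(rv.Convert(field.Type()))
	}
	return nil
}

func main() {
	p := Person{Name: "Donald"} // identity value set, as when fetching
	err := Populate(&p, map[string]any{"sys/db/id": uint64(7), "person/name": "Donald", "person/age": int64(40)})
	fmt.Printf("%+v %v\n", p, err)
}
```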
---- THESE ARE LIES
Queries may be expressed on structs similarly:
type PersonQuery struct {
	Names   []string      `attr:"person/name"`
	Type    *Person       `attr:"sys/struct/type"`
	Queries []PersonQuery `attr:"sys/struct/query"`
}
Fields that refer to user attributes constrain the values allowed in the results. The sys/struct/type attribute must contain a reference to the entity struct type to instantiate and populate with each matching entity's values. The sys/struct/query attribute may be used on a field that contains a slice of query structs, often but not necessarily of the root type. The results of such sub-queries are combined with those of the enclosing query by union.
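Since the query semantics are aspirational, the following is only a sketch of one reading: a field constraint keeps entities whose value is in the given set, and sub-queries contribute by union. The in-memory slice standing in for the database, the trimmed Person type, the match and Run methods, and the rule that an unconstrained query with sub-queries adds nothing of its own are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// ID is a hypothetical type for the sys/db/id pseudo-attribute.
type ID uint64

// Person is trimmed to the fields this sketch needs.
type Person struct {
	ID   ID     `attr:"sys/db/id"`
	Name string `attr:"person/name,identity"`
}

type PersonQuery struct {
	Names   []string      `attr:"person/name"`
	Type    *Person       `attr:"sys/struct/type"` // names the result type; Run always builds Person values
	Queries []PersonQuery `attr:"sys/struct/query"`
}

// match returns the ids matched by q. Its own clause keeps people whose name
// is in Names; a clause with no constraints matches everyone only when q has
// no sub-queries (an assumption). Sub-query results are added by union.
func (q PersonQuery) match(people []Person) map[ID]bool {
	out := map[ID]bool{}
	if len(q.Names) > 0 || len(q.Queries) == 0 {
		allowed := map[string]bool{}
		for _, n := range q.Names {
			allowed[n] = true
		}
		for _, p := range people {
			if len(q.Names) == 0 || allowed[p.Name] {
				out[p.ID] = true
			}
		}
	}
	for _, sub := range q.Queries {
		for id := range sub.match(people) {
			out[id] = true
		}
	}
	return out
}

// Run evaluates q over an in-memory slice standing in for the database and
// returns the matching entities as Person values, ordered by id.
func (q PersonQuery) Run(people []Person) []Person {
	ids := q.match(people)
	var res []Person
	for _, p := range people {
		if ids[p.ID] {
			res = append(res, p)
		}
	}
	sort.Slice(res, func(i, j int) bool { return res[i].ID < res[j].ID })
	return res
}

func main() {
	people := []Person{{ID: 1, Name: "Donald"}, {ID: 2, Name: "Ada"}, {ID: 3, Name: "Grace"}}
	q := PersonQuery{Names: []string{"Donald"}, Queries: []PersonQuery{{Names: []string{"Grace"}}}}
	fmt.Println(q.Run(people))
}
```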