A language for defining subsets of the valuable values, suitable for validation, documentation, test data generation, etc. Slightly opinionated, vdl is intended to define sensible subsets, not to be able to represent every oddly defined API on the planet.
Status: deliberately not a full spec yet, but hopefully an interesting read already
Some comparable/related efforts (for other data formats):
VDL allows you to specify certain subsets of the valuable values. Consider, for example, an API that expect an object that maps the key "name"
to a utf-8 string, and the key "age"
to an integer. With VDL, you can define the set of all such values in a compact, unambiguous, and machine-friendly manner.
In the same spirit as the valuable value specification, we start by developing an abstract model of the VDL before giving a specific (and ultimately replaceable) syntax. Throughout this document, a value refers to a valuable value. A type is a set of values.
VDL is concerned with type descriptions, objects that can be mapped to types. An implementation of VDL can take a value and check whether it is of the type that is described by some type description.
VDL does not support naming of type descriptions, hence they are neither recursive nor generic type descriptions. Those are part of VDLM, the valuable description language with modules. VDLM does not exist outside the head of the author yet, but inside that head, it is already fully developed. Feel free to reach out if you are very curious about this. For now, I am focusing on quickly sketching a larger valuable value ecosystem by omitting recursion everywhere (in particular, VVM, the valuable values with modules does not yet exist either outside my head).
Another aspect of useful type systems (in particular when thinking about query languages and/or mutation languages for valuable values) that the specification deliberately omits for simplicity sake are read-only and write-only access to component types. Again, I have plans for those, but keep them out of this very basic specification.
Type descriptions are defined inductively.
- Every value is a type description. The corresponding type is the set singleton set containing exactly that value.
- A range is a type description, based on the subvalue relation. A range consists of one or two values
s
andt
, and is either exclusive, inclusive, or open:- An exclusive range contains all values
v
such thats <= v < t
. We writes..=t
. - An inclusive range contains all values
v
such thats <= v <= t
. We writes..t
. - An open range contains all values
v
such thats <= v
. We writes...
.
- An exclusive range contains all values
F32
is a type description. The corresponding type is the set of all floats that can be exactly represented as IEEE 754 single precision floats (32 bit).Utf8
is a type description. The corresponding type is the set of arrays that contain only integers between zero (inclusive) and 256 (exclusive) that form valid utf-8.
These are technically the only base cases we need, but VDL specifically defines some further type descriptions. All of these could also be represented through some other type descriptions, but it is convenient to have shorthands for these, and implementations would special-case them anyways:
Empty
: maps to the empty type.Value
: maps to the type of all values.Bool
: maps to the type{false, true}
.U8
,U16
,U32
,U63
: map to the set of integers from0
(inclusively) to2^8
,2^16
,2^32
, and2^63
(exclusively) respectively.I8
,I16
,I32
,I64
: map to the set of integers representable in Two's complement with 8, 16, 32, 64 bits respectively.F64
: maps to the set of floats.
The inductive cases allow combining multiple type descriptors into a new type.
- Unions: If
td_0
is a type description that maps to the typeT_0
, andtd_1
is a type description that maps to the typeT_1
, then the union description oftd_0
andtd_1
, written astd_0 || td_1
, maps to the union ofT_0
andT_1
. - Intersections: If
td_0
is a type description that maps to the typeT_0
, andtd_1
is a type description that maps to the typeT_1
, then the intersection description oftd_0
andtd_1
, written astd_0 && td_1
, maps to the intersection ofT_0
andT_1
. - Homogeneous Arrays: If
td
is a type description that maps to the typeT
, then the array description oftd
, written as[td]
, maps to the set of arrays whose items all have typeT
. - Homogeneous Maps: If
k_d
andv_d
are type descriptions that map to the typesK
andV
respectively, then the map description ofk_d
andv_d
, written ask_d=>v_d
, maps to the set of maps that map keys of typeK
to values of typeV
. - Records: If
k_0, k_1, ..., k_k
are values, andtd_0, td_1, ..., td_k
are type descriptions that map to the typesT_0, T_1, ..., T_k
, then their record description, written as{k_0: td_0, k_1: td_1, ..., k_k: td_k}
, maps to the type of maps that contain an entryk_i: v_i
for eachi
from zero tok
(inclusive) such thatv_i
is of typeT_i
. Note that the map may also contain any number of additional entries. - Products: If
td_0, td_1, ..., td_k
are type descriptions that map to the typesT_0, T_1, ..., T_k
, then their product description, written as(td_0, td_1, ... td_k)
, maps to the type of arrays[v_0, v_1, ..., v_k]
such thatv_0
is of typeT_0
,v_1
is of typeT_1
, and so on.- We write
(td; n)
for the product ofn
times the type descriptiontd
.
- We write
- Sums: If
tag_0, tag_1, ..., tag_k
are values, andtd_0, td_1, ..., td_k
are type descriptions that map to the typesT_0, T_1, ..., T_k
, then their sum description, written as(tag_0: td_0) + (tag_1: td_1) + ... + (tag_k: td_k)
, maps to the type of maps of the form{tag_i: v_i}
(exactly one entry) for somei
from zero tok
(inclusive) such thatv_i
is of typeT_i
.
Yes, there will be a syntax eventually. Multiple, in fact: human-readable, compact, and canonic ones. Possibly a way of encoding VDL type descriptions as valuable values. The compact and canonic encodings could then be defined as the typed encoding (see below) obtained from the VDL meta type description for the type of all VV-encoded type descriptions.
If encoder and decoder are both aware of the type of values they need to handle, they can use more efficient encodings than what untyped VV can provide. These encodings can be derived mechanically from a type description. That will likely be two different typed encodings: one that minimizes size (suitable for network transfer) and one more suitable for in-memory representations (something that a programming language with algebraic datatypes would use internally).
Briefly sketching the contents of future specifications that generalize VDL:
- VDLM (valuable description language with modules) generics are parametric polymorphism where all type variables must have the kind of simple types.
- VDLM has modules for organizing named type descriptions, and can optionally make use of a package manager.
- VDLM types can be recursive, and statically tracks whether recursive types represent a hierarchy of containment, acyclic graphs, or arbitrary graphs (essentially corresponding to rust's boxes, reference counted pointers, and garbage collected pointers).
- The extension with access capabilities (read-access and/or write-access) annotates the type definitions with those capabilities. An even more general language that is aware of a hierarchy of roles can define types and annotate them with access for those different roles; such a definition can be compiled into capability-annotated definitions for a particular roles.
Other stuff:
- A query language similar to GraphQL, which is simple with non-recursive types. The query language for VDLM can allow fully recursive queries when the recursion only happens in subvalues that are part of a containment hierarchy or an acyclic hierarchy. GraphQL cannot support recursive queries at all.
- A mutation language based on identical principles.
- A language combining query language and mutation language. Might look into time travel à la Capt'n Proto.
- Orthogonal to recursive extensions, computational extensions that perform computations that are guaranteed to terminate, based on the functions of the vvvm. Combine this with the "time travel" for the combined query and mutation language for a pretty flexible yet efficiently implementable system.