Skip to content

TOGoS/TOGVM-Spec

Repository files navigation

TOGVM-Spec

Hello and welcome to TOGVM-Spec!

The project might be somewhat misnamed. TOGVM is not a virtual machine (though VMs could be, and to varying degrees have been built to evaluate the expressions it defines), but a specification for a functional language whose programs can be described by RDF.

Why RDF? Well, I like the idea of language-agnostic intermediate languages with fully qualified names for the building blocks of a program so that you can know exactly what one should do without having to ask “Which compiler is used? What options? What libraries are provided?”. It also provides a simple escape hatch for adding extensions. Invent whatever additional primitives you need for your own implementation. Just make sure they’re namespaced to avoid confusion.

This project also includes a meta-syntax for languages that compile to the RDF representation. This is useful because people don’t always want to write RDF+XML or JSON-LD directly.

This project consists of a JSON file describing ‘built-in’ functions, a JSON file describing expression classes, a TEF file indicating some other RDF types/attributes, documentation of some XML datatypes, an essay translating between different ways of talking about values, and some test vectors (mostly about parsing based on the meta-language).

This information could probably be organized a bit better, and maybe all be represented as RDF/XML itself instead of a mishmash of org, JSON, and TEF files, some of which is pretty concrete (expression types), and some of which is more theoretical and mostly here to guide how I think about values while crafting interpreters/compilers.

Long names of expression attributes

Are not always spelled out in EXPRESSION-CLASSES.json. They should be assumed to be “http://ns.nuke24.net/TOGVM/Expression/” + the short name unless explicitly indicated otherwise by ‘longName’.

To be backward-compatible with v1.1.x of the spec, the name of the speific expression type + “/” + the short name should be treated as an alias. e.g. http://ns.nuke24.net/TOGVM/Expression/Let/variableValueExpressions is an alias for http://ns.nuke24.net/TOGVM/Expression/variableValueExpressions.

History

This project uses semantic versioning. Non-breaking versions may add new expression / function types, or provide better definitions.

Major version number changes are breaking changes. Names may be changed (e.g. 0.x -> 1.x renamed the entire ‘Expression’ namespace prefix), or new fields added that an implementation must be aware of. I may cheat and call conceptually breaking changes bugfix versions (x.x.+1) if nobody is actually using the feature yet.

Obviously, for ‘single meaning’ to be true, even major version increases should not change the meaning of existing names!

Immutable, hash-identified functions

Closely-related to the concept of universal identifiers for intrinsic functioss is that of universal identifiers for non-intrinsic functions. This can be done easily by serializing the function and then referencing the hash.

This is the exact same idea the Unison language (introduction video) is based around (I’m not sure who came up with the idea first (I’ve been toying with the idea since about 2010 or possibly earlier), but they have a nicer website than I do).

Functions vs expressions

You might ask: why do some operations have dedicated expression types, while others only have a function?

The short answer is that, aside from wanting special expression types to represent primitive operations (like literals, variables, and control structures), the distinction is arbitrary.

But generally, expression classes are defined for things that are expected to be primitive concepts in the language, or to be reasoned about without evaluating them. Any computation that can be defined in terms of those is left to functions, which can be treated as ‘opaque’.

For example, you might want a dedicated expression type for representing rational or complex numbers so that the interpreter/compiler can reason about them (do ‘symbolic manipulation’), but still be able to treat function applications as opaque. ‘concatenation’ / ‘union’ / ‘intersection’ are other operations that may make sense to have dedicated expression classes, for the same reason.

In some systems (e.g. TDAR, an active: URI resolver written in Deno), I use TOGVM expressions to represent both computations and their resulting values. Having dedicated expression types that make it possible to represent those values without having to refer to functions makes it possible to define a ‘simplified expression’ type that is a union of the expression classes that the system knows how to handle, which hopefully excludes function application!

TOGVM-Spec should provide functional alternatives for any expression types that can be represented as functions.

And it may be a good idea to mark expression classes that exist only to simplify representation of certain function applications, with the implication that they can be a standard internal representation for systems that need them, but that ‘in the wild’ programs should use the functional form, instead.

The expected transformation of the program, then, is that an ‘external form’ TOGVM expression, which uses only the most primitive expression types, is ‘canonicalized’ to whatever ‘internal form’ is most useful to the compiler/interpreter/analyzer, probably replacing certain function calls with equivalent expressions, and potentially evaluated to a ‘value form’ where most computation has been done, but some trivial operations such as concatenation or dereferencing of URNs might be left for later.

e.g. if your expression represents a byte sequence to be written to a file or over the network, and the host system ‘knows about’ concatenation and references to a datastore, it can represent the concatenated value without having to actually load all the parts into memory. And since TOGVM-Spec exists, that abstract representation can be serialized in a standard way!

See also

  • OpenMath appears to be a standard with similar goals to TOGVM. Its data structures are not explicitly RDF-based, but they do mention RDF in the standard (e.g. their sin function is https://openmath.org/cd/transc1#sin), and it seems like expressions represented in OpenMath’s model could be mapped pretty easily to/from TOGVM expressions.

About

Specification and test cases for TOGVM

Resources

Stars

Watchers

Forks

Packages

No packages published