Skip to content

orbitaldecay/zoa

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

60 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

zoa: serialized data made easy

This document is written using cxt in README.cxt
zoa is a set of standards related to serialized structured data and it's textual representation and processing. It is part of the Civboot project, specifically fngi .

zoa is named after "protozoa", which is itself a nod to protobuf. Yes, I'm aware protozoa are not fungi, but the name was too good.
zoa encompases the following technologies:
  • zty .zty: specification for structured types similar to protobuf
  • zoab .zoa: a binary structured data composed of only data and array.
  • zoat .zoa: an extensible text format for representing zoab in a human readable way. Zoat intentionally has little functionality on it's own but allows extensibility and a standardized syntax for the different tools built on top of it.
  • zoac .zc: a configuration lanugage built on zoat and and another langauge (i.e. fngi), allowing execution of arbitrary (but stateless) user-defined functions.
  • zoash .zh: a text-based shell built on zoac allowing creating of processes, mutation of files or anything else a computer user might want to do. zoac functions are still callable to process data, but other functions are available as well perform side effects. File extension: .zh
Yes, zoab and zoat have the same file extension. This is intentional and you will see how below!
Both zoab and zoat are extremely simple.

zoab is about the simplest possible general purpose structured data that can exist. Having only two types allows it to be compact and fast. The use of single byte signals alows it to be trivially deserialized by even embedded system.

zoat is simpler than yaml but can achieve the same functionaly but with far less ambiguitity. The only thing it is missing is integer/float/literal-map/none types, which are not actually necessary for serializing or deserializing arbitrary structured data. Zoat also has massively increased functionality over yaml with features like variable storage and concatenation.

zoat

zoat (like zoab) has only two data types: data and array. Data start at the first non-whitespace character (except for | or { , which we will get to) and continue until a pipe. For example:

This is a zoat string. It ends with a pipe|
An array is specified using curly braces and can be nested, like this:
{ first string in array|
second string in array|
{ nested array element 1|
nested array element 2|
}
}
Leading whitespace is always ignored until the first character, after which newlines are treated as spaces and special characters use c-like escapes. Here are the valid escapes in text:
  \n          newline
\t tab
\| a literal '|' character
\{ a literal '{' character
\} a literal '}' character
\s
\ a literal space character
\xHH hex byte of value 0xHH
\<newline> line continuation. When \ is used at the end of a line no
space will be insert when going to the next line.

An example,
there is one space after the comma above. \
\ However there are three spaces after the period since
an escape was used both before and after the newline.|
The following are how to use commands. Commands must be specified immediately following either a pipe or curly brace:

  |X          Where X is a non-whitespace character. Runs a command, some of
which are defined below. Note that multiple '|' together are
ignored, i.e. '|| | | ||' is the same as '|'.

{ Starts an array. Characters following are always treated as text
(never a command), but whitespace is still ignored. Example:
{array item 1| array item 2}

} Ends an array. Characters following are always treated as text
(never a command), but whitespace is still ignored. Example above.

Note: {{}} will create an empty array inside a array. However,
adding '|' does nothing. i.e. '|{| {||} |}|' is exactly the same.
This allows for trailing pipes. If you've ever used JSON you know
the pain of not being able to use trailing commas, this avoids
that pain.

| <data> A set of data until the next '|', '{' or '}'. There _must_ be
whitespace (newline, tab or space) after | otherwise the character
will be treated as a command. Whitespace will be ignored until the
first non-whitespace character.

|\ Starts text immediately using the escape. Example:
|\nThis text started on a new line.|

|* Line comment. Text is ignored until a newline.

{* Block comment {*nesting allowed*}, it must be closed with: *}
After a closing block comment a new command can start immediately.
Example:
{* comment *} |h1 header

|''' Plain-text/code block. Any number of ' can be used, and must be
escaped with the same number of ', which ends the item (but will
not execute a command. As a general rule only '|' executes
commands). The raw text will ignore escapes (i.e. \n) and preserve
newlines, _except_ for the first newline if it immediately follows
the opening ' block or the last newline if it immediately preceeds
the ' block. Also, the text will be de-indented by the indent at
the start of the command. Example:
|'''
This is some raw text.
It will have no indent.
I can use { | } \n etc and they are interpreted literally.
I can even end with a space. '''
This is a new item, as if the previous had ended with pipe|

|+ Concatenate (join) this item to the last item of the same type.
Note that text starts immediately.
Example:
all one |+single item. ==> all one single item.|
{1| 2} |+ {3| 4} ==> {1| 2| 3| 4}

|. Begins the variable compiler, similar to fngi's. When assigning
variables, values are not added until referenced. Examples:

|.foo = fngi | |* set foo equal to "fngi "
{ I think |.foo|is great } |* { I think |fngi |is great }

An array can be accessed by index using @<index>

|.myArray @0 @3| |* C equivalent: myArray[0][3]

If the data is an array of two-length arrays with strings as the
first item, you can access by key with '.'. Whitespace or special
characters in the keys are not supported.

|.myDict.foo.bar|

|.+ Joins the variable to the last item. Example:
{ I think |.+foo|+is great } |* { I think fngi is great }

|[0-9] An integer (big-endian). 0x (hex), 0b (binary) and 0c (char) are
supported. (see Type Specification Dyn)

|-[0-9] A negative integer (big-endian, see Type Specification Dyn)

zoab: structured array and byte data

The binary representation of zoa is called zoab. Like zoat it is composed of only two types: data and array. Both can only have sections of length 63, but a join bit may be used to make longer values.

The start of a zoab item has a single byte which specifies it's type, join status and length. it looks like:

  Bitmap      Description
JTLL LLLL : J=join bit T=type bit L=length bits (0-63)
  • J can be 1, which will cause it to be "joined" to the next item (which must be the same type). This allows the true length to be longer than 63 (any length is possible with multiple joins).
  • T can be 0 for "data" or 1 for "array".
  • L contains a length (0-63) of the data or array.

Binary Runtime Type Selection

There is one byte that is invalid for both zoab and utf8 (and ascii): J=1 T=0 L=0 , which in binary is 1000 0000 or in hex is 0x80.

Instead of making this byte illegal, we will _require_ it in the zoa(...) function, which is intended to deserialize either zoab or zoat from user inputs, data stored in file systems or data transferred over the wire. If 0x80 is the first character then it is assumed this is zoab data. If 0x80 is not the first byte then it should be assumed that the file/etc is human written zoat.

0x80 will still be illegal for pure-zoab related functions, which expect the data to already be deserialzed.
0x80 will signal that the next byte "steam type". The supported ones are:
  1. pure unstructured data with no special meaning.
  2. _mostly_ human readable data. i.e. struct fields are names, integers are in text format, etc. This is typically what should be passed to dbg logs, human readable compressed configs, etc.
  3. protozoa data, which is similar to protobuf. Fields are represented by user-assigned integers, data is in binary format, etc. The deserializer obviously needs access to a backwards-compatible reference (i.e. a struct definition) to know how to deserialize this data.
  4. log event, used for zoa's own core logging.
We also reserve J=1 T=1 L=0 to mean that the next data value is actually a pointer to more data. This allows fngi (or other languages) to use a maximum block size of 4kB while still permitting arbitrary length zoab data.

Type Specification

WARNING: This section is currently being worked on so is not yet polished.
Zoa also has a default type protocol and text specification for generating serializers and deserializers for zoab data. These are typically contained in .zt files.

Zoa types are specified in a language similar to protobufs, with a few key differences. For an overview:

Native types include:

  • Data, Int, Num, Time (integers: seconds, microseconds), Path (array of components)
  • Arr(generic)
  • IntMap(Int key, generic value), DataMap(Data key, generic value)
  • Dyn: dynamic (runtime) type
Users can create their own struct types with the format below. Struct fields can be positional or have an id. Default values can be provided, otherwise all arguments are required.

\ Comments use '\' character
struct Foo [
a: Int; \ positional required argument.
b: Int = 0; \ positional argument with default.
c: Data = ||; \ data argument with default.
d:0 Int = 0; \ indexed argument with default.
d:1 Arr[Int] = 0;
]
The struct protocol is:
  • The first items contains the number of positional args P.
  • The next P items contains the first P specified positional arguments. Any unspecified positional args must have a default value.
  • Indexed args can be specified in any order.
  • Any unspecified args must have a default.
Similarily, enumerated values can be defined. Enums must always specify their id. All ids from [0 - max specified] must be declared (in any order).

struct Cheese [ isCheddar: Int ]
struct Cake [ isChocolate: Int ]
struct Pizza [
cheese: Cheese = {
isCheddar|1|
};
hasPeperoni: Int = 0;
]

enum Food [
none:0; \ note: empty type, no data
someNuts:3 Int; \ note: declared out of order (OK)
pizza:1 Pizza;
cake:2 Cake;
cheeses:3 Arr[Cheese];
]
The enum protocol is:
  • The first item contains the enum variant, i.e. noaccount, user, admin
  • If the variant is not an Arr or *Map, the next item is the value.
  • else, the next N values are the data for the array.

Contributing

To build the README.md and run the tests, simply run make .

When opening a PR to submit code to this repository you must include the following disclaimer in your first commit message:

I <author> assent to license this and all future contributions to this project
under the dual licenses of the UNLICENSE or MIT license listed in the
`UNLICENSE` and `README.md` files of this repository.

LICENSING

This work is part of the Civboot project and therefore primarily exists for educational purposes. Attribution to the authors and project is appreciated but not necessary.

Therefore this body of work is licensed using the UNLICENSE unless otherwise specified at the beginning of the source file.

If for any reason the UNLICENSE is not valid in your jurisdiction or project, this work can be singly or dual licensed at your discression with the MIT license below.

Copyright 2022 Garrett Berg

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

About

serialized structured data and it's textual representation

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 87.4%
  • C 12.1%
  • Makefile 0.5%