schemer

Lightweight and robust data encoding library for Go

Schemer provides an API to construct schemata that describe data structures; a schema is then used to encode and decode values into sequences of bytes to be sent over the network or written to a file.

Schemer seeks to be an alternative to protobuf or Avro, but it can also be used as a substitute for JSON.

Features

Compact binary data format
High-speed encoding and decoding
Forward and backward compatibility
No code generation and no new language to learn
Simple and lightweight library with no external dependencies
Supports custom encoding for user-defined data types
JavaScript library for web browser interoperability (coming soon!)

Why?

Schemer is an attempt to further simplify data encoding. Unlike other encoding libraries that use interface description languages (e.g. protobuf), schemer allows developers to construct schemata programmatically with an API. Rather than generating code from a schema, a schema can be constructed from code. In Go, schemata can be generated from Go types using the reflection library. This adds a surprising amount of flexibility and extensibility to the encoding library.

Here's how schemer stacks up against other encoding formats:

Property	JSON	XML	MessagePack	Protobuf	Thrift	Avro	Gob	Schemer
Human-Readable	✔️	😐	❌	❌	❌	❌	❌	❌
Support for Many Programming Languages	✔️	✔️	✔️	✔️	✔️	✔️	❌	✔️
Widely Adopted	✔️	✔️	❌	✔️	❌	❌	❌	❌
Precise Encoding of Numbers	😐	❌	✔️	✔️	✔️	✔️	✔️	✔️
Binary Strings	❌	❌	✔️	✔️	✔️	✔️	✔️	✔️
Compact Encoded Payload	❌❌	❌❌	❌	✔️	✔️	✔️	✔️	✔️
Fast Encoding / Decoding	❌	❌	✔️	✔️	❔	😐	😐	❔
Backward Compatibility	✔️	✔️	✔️	😐	😐	✔️	😐	✔️
Forward Compatibility	✔️	✔️	✔️	😐	😐	✔️	😐	✔️
No Language To Learn	✔️	✔️	✔️	❌	❌	😐	✔️	✔️
Schema Support	😐	😐	❓	✔️	✔️	✔️	❌	✔️
Supports Fixed-field Objects	❌	❌	❌	✔️	✔️	✔️	✔️	✔️
Works on Web Browser	✔️	✔️	✔️	✔️	😐	✔️	❌	📆 soon…

The table above is intended to guide the reader toward an encoding format based on their requirements, but the evaluations of these encoding formats are, of course, rather subjective. Please feel free to open an issue if you feel something should be adjusted and/or corrected.

Types

Schemer uses type information provided by the schema to encode values. The following types are supported:

Integer
- Can be signed or unsigned
- Fixed-size or variable-size ¹
  - Fixed-size integers can be 8, 16, 32, or 64 bits
Floating-point number (32 or 64-bit)
Complex number (64 or 128-bit)
Boolean
Enumeration
String
- Can support any encoding, including UTF-8 and binary
- Fixed-size or variable-size ²
Array
- Fixed-size or variable-size
Object w/fixed fields (i.e. struct)
Object w/variable fields (i.e. map)
Schema (i.e. a schemer schema)
Dynamically-typed value (i.e. variant)
User-defined types
- A few common types are provided for representing timestamps, time durations, IP addresses, UUIDs, regular expressions, etc.

Schema JSON Specification

Types

Type	JSON Type Name	Additional Options
Fixed-size Integer	int	* `signed` - boolean indicating if integer is signed or unsigned * `bits` - one of the following numbers indicating the size of the integer: 8, 16, 32, 64, 128, 256, 512, 1024 Note: integers larger than 64 bits are not fully supported
Variable-size Integer	int	* `signed` - boolean indicating if integer is signed or unsigned * `bits` - must be `null` or omitted
Floating-point Number	float	* `bits` - one of the following numbers indicating the size of the floating-point: 32, 64
Complex Number	complex	* `bits` - one of the following numbers indicating the size of the complex number: 64, 128
Boolean	bool
Enum	enum	* `values` - an object mapping strings to integer values
Fixed-Length String	string	* `length` - the length of the string in bytes
Variable-Length String	string	* `length` - must be `null` or omitted
Fixed-Length Array	array	* `length` - the number of elements in the array
Variable-Length Array	array	* `length` - must be `null` or omitted
Object w/fixed fields	object	* `fields` - an array of fields. Each field is a type object with keys: `name`³, `type`, and any additional options for the `type`
Object w/variable fields	object	* `fields` - must be `null` or omitted
Variant	variant

Example

Here's a struct with three fields:

firstName (string)
lastName (string)
age (uint8 - unsigned integer requiring a single byte)

{
  "type": "object",
  "fields": [
    {
      "name": "firstName",
      "type": "string"
    }, {
      "name": "lastName",
      "type": "string"
    }, {
      "name": "age",
      "type": "int",
      "signed": false,
      "bits": 8
    }
  ]
}
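
As a rough illustration of constructing a schema from code, the sketch below uses Go's `reflect` package to derive the JSON above from a struct. It does not use schemer's actual API: `schemaOf`, `field`, `objectSchema`, `lowerFirst`, and `person` are hypothetical names, and only string and fixed-size integer fields are handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// field mirrors one entry of the "fields" array in the JSON specification.
// Pointers let "signed" and "bits" be omitted for types that do not use them.
type field struct {
	Name   string `json:"name"`
	Type   string `json:"type"`
	Signed *bool  `json:"signed,omitempty"`
	Bits   *int   `json:"bits,omitempty"`
}

// objectSchema mirrors an object with fixed fields.
type objectSchema struct {
	Type   string  `json:"type"`
	Fields []field `json:"fields"`
}

// person is the example struct: two strings and a single-byte unsigned integer.
type person struct {
	FirstName string
	LastName  string
	Age       uint8
}

// lowerFirst converts an exported Go field name to camelCase.
// Assumption: the first character is ASCII.
func lowerFirst(s string) string {
	return strings.ToLower(s[:1]) + s[1:]
}

// schemaOf is a hypothetical helper, not part of schemer's API.
func schemaOf(v interface{}) (objectSchema, error) {
	t := reflect.TypeOf(v)
	if t == nil || t.Kind() != reflect.Struct {
		return objectSchema{}, fmt.Errorf("expected a struct value")
	}
	s := objectSchema{Type: "object"}
	for i := 0; i < t.NumField(); i++ {
		sf := t.Field(i)
		f := field{Name: lowerFirst(sf.Name)}
		switch k := sf.Type.Kind(); k {
		case reflect.String:
			f.Type = "string"
		case reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
			reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
			signed := k <= reflect.Int64 // signed kinds precede unsigned kinds
			bits := sf.Type.Bits()
			f.Type, f.Signed, f.Bits = "int", &signed, &bits
		default:
			return objectSchema{}, fmt.Errorf("unsupported field type %s", sf.Type)
		}
		s.Fields = append(s.Fields, f)
	}
	return s, nil
}

func main() {
	s, err := schemaOf(person{})
	if err != nil {
		panic(err)
	}
	out, _ := json.MarshalIndent(s, "", "  ")
	fmt.Println(string(out)) // same content as the JSON example; whitespace may differ
}
```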

Type Compatibility

When decoding values from one type to another, schemer employs the following compatibility rules. These rules, while rather opinionated, provide safe defaults when decoding values. Users who want to carefully craft how values are decoded from one type to another can simply create a custom type.

As a general rule, types are only compatible with themselves (e.g. boolean values can only be decoded to boolean values). The table below outlines a few notable exceptions and describes how using "weak" decoding mode can increase type compatibility by sacrificing type safety and by making a few assumptions.

	Destination
Source	int	float	complex	bool	enum	string	array (see #12)	object
int	✔️ #1	✔️ #1	✔️ #1	❕ #6	❕ #7	❕ #9	❌	❌
float	✔️ #1	✔️ #1	✔️ #1	❌	❌	❕ #9	❌	❌
complex	✔️ #1	✔️ #1	✔️ #1	❌	❌	❕ #9	❕ #11	❌
bool	❕ #6	❌	❌	✔️	❌	❕ #10	❌	❌
enum	❕ #7	❌	❌	❌	✔️ #2	✔️ #2	❌	❌
string	❕ #8	❕ #8	❕ #8	❕ #10	✔️ #2	✔️	❌	❌
array (see #12)	❌	❌	❕ #11	❌	❌	❌	✔️ #3	❌
object	❌	❌	❌	❌	❌	❌	❌	✔️ #4

Legend:
✔️ - indicates compatibility according to the specified rule
❕ - indicates compatibility according to the specified rule only if weak decoding is used
❌ - indicates that the source type cannot be decoded to the destination (excepting rule #12)

Compatibility Rules:

1. Any number can be decoded to any other number, provided the decoded value can be stored into the destination without losing any precision. If weak decoding is specified, we loosen this restriction slightly by allowing floating-point and complex number conversions to lose precision.

   For example, if the number 3.14 is decoded, it can be stored as a float or complex number, but it cannot be stored as an integer. Similarly, the number 500 can be stored into a uint16 but not a uint8, since uint8 can only store values between 0 and 255.
2. Enumerations are decoded to other enumerations by performing a case-insensitive match on the named value, not a match on the numeric value. If multiple matches occur, a case-sensitive match is then performed. Decoding fails if the decoded named value does not match a named value in the destination enumeration. Enumerations can also be converted to strings and vice-versa by matching on the enumeration's named value.
3. Arrays can be decoded to arrays if the element type and array length are compatible. Specifically, when the destination array is of fixed-size and does not support null values, the decoded array must match exactly in length.
4. Objects are decoded to other objects by performing a case-insensitive match on the key or field name. If multiple matches occur, a case-sensitive match is then performed. When the destination is an object with fixed fields and the decoded value does not have a matching key or field name, the key / field is simply skipped and will remain unchanged.
5. Null values can only be decoded to destinations that support null values (e.g. pointers), but a non-null value can be decoded even if the destination does not support null values.
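
The lossless-number rule and the two-stage name matching used for enumerations and objects can be sketched in plain Go as follows. This is not schemer's implementation: `toUint8` and `matchName` are hypothetical helpers, and treating an ambiguous case-insensitive match with no exact match as a failure is an assumption, since the rules above do not cover that case.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strings"
)

// toUint8 stores f in a uint8 only if no information is lost.
func toUint8(f float64) (uint8, error) {
	if f != math.Trunc(f) { // also rejects NaN, since NaN != NaN
		return 0, errors.New("value has a fractional part")
	}
	if f < 0 || f > math.MaxUint8 {
		return 0, errors.New("value out of range for uint8")
	}
	return uint8(f), nil
}

// matchName looks for want among names: case-insensitively first, then
// case-sensitively if the case-insensitive match is ambiguous.
func matchName(names []string, want string) (string, bool) {
	var folded []string
	for _, n := range names {
		if strings.EqualFold(n, want) {
			folded = append(folded, n)
		}
	}
	switch len(folded) {
	case 0:
		return "", false
	case 1:
		return folded[0], true
	}
	for _, n := range folded {
		if n == want {
			return n, true
		}
	}
	// Assumption: ambiguous and no exact match is treated as a failure.
	return "", false
}

func main() {
	for _, f := range []float64{200, 500, 3.14} {
		v, err := toUint8(f)
		fmt.Println(f, v, err)
	}
	fmt.Println(matchName([]string{"firstName", "lastName"}, "FIRSTNAME"))
	fmt.Println(matchName([]string{"id", "ID"}, "ID"))
}
```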

The following compatibility rules apply for weak decoding only:

6. The boolean value true can be converted to the integer value 1, and the boolean value false can be converted to the integer value 0. Similarly, the integer 0 will be decoded as false, and all other integers are decoded as true.
7. Enumerations can be converted to integer values and vice-versa, and they are matched on the enumeration's numeric value.
8. Strings can be decoded to numeric values by considering the string format according to the table below. The resulting numeric value is compatible with the destination according to the relevant compatibility rules.
9. Numbers are always encoded to strings in base 10.
10. Boolean values true and false are converted to string values "true" and "false" respectively. Strings "1", "t", "T", "TRUE", "true", and "True" can be converted to the boolean value true. Strings "0", "f", "F", "FALSE", "false", and "False" can be converted to boolean value false.
11. Complex numbers may be converted into 2-element arrays of floating-point numbers and vice-versa. The real part of the complex number will be matched with array element 0, and the imaginary part will be matched with array element 1.
12. Single-element arrays can be decoded to a destination that is compatible with the array element and vice-versa.
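
A few of the weak-decoding conversions map directly onto the Go standard library. The sketch below is illustrative only and does not reflect schemer's internals; all function names are hypothetical. Note that the strings accepted by `strconv.ParseBool` are exactly the twelve strings listed for boolean conversion above.

```go
package main

import (
	"fmt"
	"strconv"
)

// boolToInt and intToBool sketch the boolean/integer conversion.
func boolToInt(b bool) int64 {
	if b {
		return 1
	}
	return 0
}

func intToBool(n int64) bool { return n != 0 }

// intToString sketches number-to-string conversion, always in base 10.
func intToString(n int64) string { return strconv.FormatInt(n, 10) }

// boolToString and stringToBool sketch the boolean/string conversion.
func boolToString(b bool) string { return strconv.FormatBool(b) }

func stringToBool(s string) (bool, error) { return strconv.ParseBool(s) }

// complexToPair and pairToComplex sketch the complex/array conversion:
// element 0 holds the real part, element 1 the imaginary part.
func complexToPair(c complex128) [2]float64 {
	return [2]float64{real(c), imag(c)}
}

func pairToComplex(p [2]float64) complex128 { return complex(p[0], p[1]) }

func main() {
	fmt.Println(boolToInt(true), intToBool(0), intToBool(-7))
	fmt.Println(intToString(255), boolToString(false))
	fmt.Println(stringToBool("T"))
	fmt.Println(stringToBool("yes")) // not one of the accepted strings, so an error
	fmt.Println(complexToPair(complex(2.34, 2)))
}
```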

String to number decoding:

String Example	Decoded As	Regular Expression
`"-3.14"`	Number, base 10	`^[-+]?(0
`"0b1101"`	Integer, base 2	`^[-+]?0[bB][01]+$`
`"0775"`	Integer, base 8	`^[-+]?0[oO]?[0-7]+$`
`"0x2020"`	Number, base 16	`^[-+]?0[xX][0-9A-Fa-f]+(\.[0-9A-Fa-f]*)?([pP][+-]?[0-9A-Fa-f]+)?$`
`"2.34 + 2i"`	Complex number, base 10	You don't want to see it, but here's the link.
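
For the integer and floating-point rows, Go's `strconv` package offers a close approximation: `strconv.ParseInt` with base 0 infers the base from the `0b`, `0o` / `0`, and `0x` prefixes, and `strconv.ParseFloat` handles base-10 decimals. The sketch below is an approximation under those assumptions rather than schemer's implementation; `parseNumber` and `number` are hypothetical names, and the standard library accepts some inputs the regular expressions above do not (for example, underscores as digit separators with base 0, or "inf" and "nan" for floats). Complex numbers are not handled.

```go
package main

import (
	"fmt"
	"strconv"
)

// number holds the result of parseNumber. Float is always set; Int is only
// meaningful when IsInt is true.
type number struct {
	Int   int64
	Float float64
	IsInt bool
}

// parseNumber tries an integer parse with a prefix-inferred base first, then
// falls back to a floating-point parse.
func parseNumber(s string) (number, error) {
	if i, err := strconv.ParseInt(s, 0, 64); err == nil {
		return number{Int: i, Float: float64(i), IsInt: true}, nil
	}
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return number{}, fmt.Errorf("%q is not a recognized number: %w", s, err)
	}
	return number{Float: f}, nil
}

func main() {
	for _, s := range []string{"-3.14", "0b1101", "0775", "0x2020", "abc"} {
		n, err := parseNumber(s)
		fmt.Println(s, n, err)
	}
}
```

The parsed value would then still be subject to the numeric compatibility rules, so "0775" (509 in base 10) could be stored in a uint16 but not in a uint8.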

Credits

This library was created on April 14, 2021, the day of Bernie Madoff's death. What a schemer! May he rest in peace.

Special thanks to Benjamin Pritchard for his significant contributions to this library and for making it a reality.

Footnotes

¹ By default, integer types are encoded as variable integers, as this format will most likely generate the smallest encoded values.
² By default, string types are encoded as variable-size strings. Fixed-size strings are padded with trailing null bytes / zeros.
³ It is strongly encouraged to use camelCase for object field names.
