Module sext

Sortable serialization library. Authors: Ulf Wiger (ulf@wiger.net).

Function Index

decode/1          Decodes a binary generated using the function sext:encode/1.
decode_hex/1      Decodes a binary generated using the function encode_hex/1.
decode_next/1     Decodes a binary stream, returning the next decoded term and the stream remainder.
decode_sb32/1     Decodes a binary generated using the function encode_sb32/1.
encode/1          Encodes any Erlang term into a binary.
encode/2          Encodes an Erlang term using legacy bignum encoding.
encode_hex/1      Encodes any Erlang term into a hex-encoded binary.
encode_sb32/1     Encodes any Erlang term into an sb32-encoded binary.
from_hex/1        Converts from a hex-encoded binary into a 'normal' binary.
from_sb32/1       Converts from an sb32-encoded bitstring into a 'normal' bitstring.
partial_decode/1  Decodes a sext-encoded term or prefix embedded in a byte stream.
prefix/1          Encodes a term into a binary prefix for matching similarly encoded terms.
prefix_hex/1      Generates a hex-encoded binary for prefix matching.
prefix_sb32/1     Generates an sb32-encoded binary for prefix matching.
to_hex/1          Converts a binary into a hex-encoded binary.
to_sb32/1         Converts a bitstring into an sb32-encoded bitstring.

Function Details

decode/1


decode(B::binary()) -> term()

Decodes a binary generated using the function sext:encode/1.
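
A minimal round-trip sketch, assuming the sext application is compiled and on the code path. The sample term is made up for illustration. The expressions can be pasted into an Erlang shell:

```erlang
%% Assumes sext is on the code path; the sample term is illustrative only.
Term = {user, 42, <<"alice">>, [one, two]}.
Bin = sext:encode(Term).
%% decode/1 is the inverse of encode/1, so Decoded should equal Term.
Decoded = sext:decode(Bin).
RoundTripOk = (Decoded =:= Term).
```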

decode_hex/1

decode_hex(Data) -> any()

Decodes a binary generated using the function encode_hex/1.

decode_next/1


decode_next(X1::Bin) -> {N, Rest}

Decodes a binary stream, returning the next decoded term and the remainder of the stream.

This function will raise an exception if the beginning of Bin is not a valid sext-encoded term.
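
The following sketch walks a stream of two encoded terms followed by unrelated bytes. It assumes sext is on the code path; the stream contents and variable names are invented for the example:

```erlang
%% Assumes sext is on the code path.
%% Build a stream: two encoded terms followed by raw trailing bytes.
Stream = <<(sext:encode(first))/binary,
           (sext:encode({second, 2}))/binary,
           "rest">>.
%% Each call consumes exactly one encoded term from the front.
{T1, Rest1} = sext:decode_next(Stream).
{T2, Rest2} = sext:decode_next(Rest1).
%% Rest2 now holds whatever followed the two encoded terms.
```

Since the remaining bytes in Rest2 are not a sext-encoded term, a further call to decode_next/1 on them would be expected to raise, as described above.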

decode_sb32/1

decode_sb32(Data) -> any()

Decodes a binary generated using the function encode_sb32/1.

encode/1


encode(T::term()) -> binary()

Encodes any Erlang term into a binary. The lexical sorting properties of the encoded binary match those of the original Erlang term. That is, encoded terms sort the same way as the original terms would.
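
One way to check the sorting claim is to sort a mixed list of terms directly and compare it with the order obtained by sorting the encoded binaries. This is only a sketch, assuming sext is on the code path; the sample terms are arbitrary:

```erlang
%% Assumes sext is on the code path; the sample terms are arbitrary.
Terms = [{b, 1}, {a, 2}, 17, <<"bin">>, [x, y], {a, 1}, -3, foo].
%% Order according to standard Erlang term comparison.
Sorted = lists:sort(Terms).
%% Order according to byte-wise comparison of the encoded binaries.
ByEncoding = [sext:decode(B)
              || B <- lists:sort([sext:encode(T) || T <- Terms])].
%% If the encoding preserves term order, both lists are identical.
SameOrder = (Sorted =:= ByEncoding).
```

This is the property that makes the encoded binaries usable as keys in ordered key-value stores.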

encode/2


encode(T::term(), Legacy::boolean()) -> binary()

Encodes an Erlang term using legacy bignum encoding. On March 4 2013, Basho noticed that encoded bignums didn't always sort properly. This bug has been fixed, but the encoding of bignums necessarily changed in an incompatible way.

The new decode/1 version can read the old bignum format, but the old version obviously cannot read the new. Using encode(Term, true), the term will be encoded using the old format.

Use only as transition support. This function will be deprecated in time.

encode_hex/1


encode_hex(Term::any()) -> binary()

Encodes any Erlang term into a hex-encoded binary. This is similar to encode/1, but produces an octet string that can be used without escaping in file names (containing only the characters 0..9 and A..F). The sorting properties are preserved.

Note: The encoding used is regular hex-encoding, with the proviso that only capital letters are used (mixing upper- and lowercase characters would break the sorting property).
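
A short sketch of the hex variant, assuming sext is on the code path. The comparison with to_hex/1 rests on the assumption that encode_hex/1 is equivalent to hex-encoding the output of encode/1, which the description above suggests but does not state outright:

```erlang
%% Assumes sext is on the code path.
HexTerm = {a, b, c}.
Hex = sext:encode_hex(HexTerm).
%% Round trip through the hex form.
HexDecoded = sext:decode_hex(Hex).
%% Every byte should be one of 0..9 or A..F, so the result is file-name safe.
OnlyHexChars = lists:all(
                 fun(C) ->
                         (C >= $0 andalso C =< $9) orelse
                         (C >= $A andalso C =< $F)
                 end,
                 binary_to_list(Hex)).
%% Assumption: same bytes as hex-encoding the plain sext encoding.
SameAsToHex = (Hex =:= sext:to_hex(sext:encode(HexTerm))).
```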

encode_sb32/1


encode_sb32(Term::any()) -> binary()

Encodes any Erlang term into an sb32-encoded binary. This is similar to encode/1, but produces an octet string that can be used without escaping in file names (containing only the characters 0..9, A..V and '-'). The sorting properties are preserved.

Note: The encoding used is inspired by the base32 encoding described in RFC3548, but uses a different alphabet in order to preserve the sort order.

from_hex/1


from_hex(Bin::binary()) -> binary()

Converts from a hex-encoded binary into a 'normal' binary.

This function is the reverse of to_hex/1.

from_sb32/1


from_sb32(Bits::bitstring()) -> bitstring()

Converts from an sb32-encoded bitstring into a 'normal' bitstring.

This function is the reverse of to_sb32/1.

partial_decode/1


partial_decode(Other::Bytes) -> {full | partial, DecodedTerm, Rest}

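Decodes a sext-encoded term or prefix embedded in a byte stream. The first element of the result is full when a complete term was decoded and partial when only a prefix was found.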

Example:

  1> T = sext:encode({a,b,c}).
  <<16,0,0,0,3,12,176,128,8,12,177,0,8,12,177,128,8>>
  2> sext:partial_decode(<<T/binary, "tail">>).
  {full,{a,b,c},<<"tail">>}
  3> P = sext:prefix({a,b,'_'}).
  <<16,0,0,0,3,12,176,128,8,12,177,0,8>>
  4> sext:partial_decode(<<P/binary, "tail">>).
  {partial,{a,b,'_'},<<"tail">>}

Note that a decoded prefix may not be exactly like the encoded prefix. For example, ['_'] will be encoded as <<17>>, i.e. only the 'list' opcode. The decoded prefix will be '_', since the encoded prefix would also match the empty list. The decoded prefix will always be a prefix to anything to which the original prefix is a prefix.

For tuples, prefix-encoding {1,'_',3} and then decoding the result gives {1,'_','_'}: the tuple size is kept, but the elements after the first wildcard are replaced with wildcards.

prefix/1


prefix(X::term()) -> binary()

Encodes a term into a binary prefix for matching against similarly encoded terms. Lists and tuples can be prefixed by using the '_' marker, similarly to Erlang match specifications. For example:

  • prefix({1,2,'_','_'}) will result in a binary that is the same as the first part of any encoded 4-tuple with the first two elements being 1 and 2. The prefix algorithm will search for the first '_', and treat all following elements as if they were '_'.

  • prefix([1,2|'_']) will result in a binary that is the same as the first part of any encoded list where the first two elements are 1 and 2. prefix([1,2,'_']) will give the same result, as the prefix pattern is the same for all lists starting with [1,2|...].

  • prefix(Binary) will result in a binary that is the same as the encoded version of Binary, except that, instead of padding and terminating, the encoded binary is truncated to the longest byte-aligned binary. The same is done for bitstrings.

  • prefix({1,[1,2|'_'],'_'}) will prefix-encode the second element, and let it end the resulting binary. This prefix will match any 3-tuple where the first element is 1 and the second element is a list where the first two elements are 1 and 2.

  • prefix([1,[1|'_']|'_']) will result in a prefix that matches all lists where the first element is 1 and the second element is a list where the first element is 1.

  • For all other data types, the prefix is the same as the encoded term.
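
The tuple case above can be sketched as follows, assuming sext is on the code path. IsMatch is a hypothetical helper defined only for this example, and the candidate terms are arbitrary:

```erlang
%% Assumes sext is on the code path.
Prefix = sext:prefix({1, 2, '_', '_'}).
%% Hypothetical helper: a key matches when Prefix is its leading bytes.
IsMatch = fun(Key) ->
                  binary:longest_common_prefix([Prefix, Key])
                      =:= byte_size(Prefix)
          end.
Candidates = [{1, 2, a, b}, {1, 2, 3, 4}, {1, 3, a, b}, {1, 2, a}].
%% Keep the terms whose encoding starts with the prefix. Per the description
%% above, only the 4-tuples beginning with 1 and 2 should remain.
Matches = [T || T <- Candidates, IsMatch(sext:encode(T))].
```

In an ordered store, the same prefix would typically serve as the starting key of a range scan that stops at the first key not beginning with it.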

prefix_hex/1


prefix_hex(X::term()) -> binary()

Generates a hex-encoded binary for prefix matching. This is similar to prefix/1, but generates a prefix for binaries encoded with encode_hex/1, rather than encode/1.

prefix_sb32/1


prefix_sb32(X::term()) -> binary()

Generates an sb32-encoded binary for prefix matching. This is similar to prefix/1, but generates a prefix for binaries encoded with encode_sb32/1, rather than encode/1.

to_hex/1


to_hex(Bin::binary()) -> binary()

Converts a binary into a hex-encoded binary. This is conventional hex encoding, with the proviso that only capital letters are used, i.e. the characters 0..9 and A..F.
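
A small sketch of to_hex/1 and its inverse from_hex/1, assuming sext is on the code path. The input bytes are arbitrary:

```erlang
%% Assumes sext is on the code path.
Raw = <<255, 0, 171>>.
%% Two uppercase hex digits per input byte: FF, 00, AB.
HexBin = sext:to_hex(Raw).
%% from_hex/1 reverses the conversion.
RawAgain = sext:from_hex(HexBin).
%% Sorting the hex forms gives the same order as sorting the raw binaries.
Bins = [<<10>>, <<9, 255>>, <<171>>, <<9>>].
OrderKept = (lists:sort([sext:to_hex(B) || B <- Bins])
             =:= [sext:to_hex(B) || B <- lists:sort(Bins)]).
```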

to_sb32/1


to_sb32(Bits::bitstring()) -> binary()

Converts a bitstring into an sb32-encoded bitstring.

sb32 (Sortable base32) is a variant of the base32 encoding in RFC3548, slightly rearranged to preserve the lexical sorting properties. Base32 was chosen to avoid filename-unfriendly characters. It is also important that the padding character sorts lower than every character in the alphabet.

sb32 alphabet:


  0 0     6 6     12 C     18 I     24 O     30 U
  1 1     7 7     13 D     19 J     25 P     31 V
  2 2     8 8     14 E     20 K     26 Q  (pad) -
  3 3     9 9     15 F     21 L     27 R
  4 4    10 A     16 G     22 M     28 S
  5 5    11 B     17 H     23 N     29 T
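
The table can be reproduced with a few lines of plain Erlang, which also makes the ordering requirement easy to check. This is a self-contained sketch that does not call sext; Sb32Char is a hypothetical helper written for this example, not part of the library:

```erlang
%% Hypothetical helper, not part of the sext API: maps a 5-bit value
%% to its sb32 character according to the table above.
Sb32Char = fun(N) when N >= 0, N =< 9   -> $0 + N;         %% 0..9   -> 0..9
              (N) when N >= 10, N =< 31 -> $A + (N - 10)   %% 10..31 -> A..V
           end.
Alphabet = [Sb32Char(N) || N <- lists:seq(0, 31)].
%% The alphabet is in ascending byte order, so larger 5-bit values
%% always map to larger characters.
Ascending = (Alphabet =:= lists:sort(Alphabet)).
%% The pad character '-' sorts below every character in the alphabet.
PadIsLowest = lists:all(fun(C) -> $- < C end, Alphabet).
```

Because the mapping is monotonic and the pad sorts lowest, comparing two sb32 strings byte by byte gives the same result as comparing the underlying bitstrings.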