This is meant for transmitting vector data in situations where:
- Arbitrary bytes can't be transmitted
- We need to send the vector in as few bytes as possible
- Simply base64-encoding floating-point data -- at 5.33 bytes per entry -- isn't small enough
- Some loss of precision is acceptable
Possible applications include transmitting a vector in a URL or JSON object.
If you wonder why you'd need this, the fact is that you might not. But at Luminoso, we send lots of vectors over the network.
Returns a string b of bytes, representing digits using the URL-safe base64
(A-Z, a-z, 0-9, -, _), as follows:
b contains the power-of-two exponent, biased by 40. That is:
- An exponent of 0 (
A) means to multiply all the integers that follow by 2^-40.
- An exponent of 30 (
e) means to multiply the integers by 2^-10 (that is, divide them by 1024).
- An exponent of 40 (
o) means to leave the integers as is.
- An exponent of 63 (
_) means to multiply the integers by 2^23.
Call this number 2^(b - 40) the increment.
- An exponent of 0 (
The increment is chosen to maximize precision. To choose the increment, find the number "a" such that 2^a is larger than all the magnitudes in the vector. The correct value for the increment is then 2^(a - 17). However, if this gives an increment lower than 2^-40, use 2^-40 instead. For example, the zero-length vector is encoded as
A, and a length-two zero vector is encoded as
b[1:4], b[4:7], etc. contain the values in the vector, as 18-bit, big-endian, twos-complement integers, which will all be multiplied by the increment. That is:
f__represents the highest possible value, (2^17 - 1).
gAArepresents the lowest possible value, -(2^17).
The last value will be found in
b[3*K-2:3*K+1], so the length of the string
overall will be 3*K + 1.
This encoding can represent values in the range [-2^40, 2^40 - 2^23], with granularity potentially as small as 2^-40. It cannot represent inf or nan, and encoders should report an error when those are encountered; you probably don't want to transmit them in a vector anyhow. Encoders should likewise report an error if they encounter values outside the representable range, which may, for speed, include -2^40 itself.
Decodes a vector that has been encoded with pack64, returning an appropriate
data structure. In Python, this returns a NumPy vector of dtype
The largest entry in a pack64 vector is specified with 17 bits of precision. This is approximately 3/4 of the precision of an IEEE 754 single-precision float (24 bits), and corresponds to five significant (decimal) digits.
Importantly, however, every value in the vector is specified with the same absolute precision. The exponent defines 2^18 possible values for each entry, and each entry chooses from these same values. Thus entries of a vector that are much smaller than the largest entry will lose relative precision, possibly even being rounded to zero. However, for transmitting mathematical vectors, the relative precision of smaller entries generally matters less.
(c) 2012-2016 Luminoso Technologies. pack64 is released as free software under the MIT license. See LICENSE for the terms of the MIT license (there aren't many).