λ夫 Supporting Unions with Unibit Encoding

Matt Pouttu-Clarke edited this page Dec 24, 2015 · 1 revision

Supporting unions requires recognizing that a field may hold numeric data, String data, or both. Text-encoded data often contains numbers in a field we assume is a String, and vice versa. If an algorithm must adapt to such changes in the data, we need a uniform and efficient way to measure and quantify union fields. Unibit encoding provides very fast encoding of a String to a double while sacrificing a minimal amount of meaningful information.

Unibit Encoding

Unibit encoding packs an ASCII-compatible String into a double while preserving as much information as possible. If the String represents a Number, it is encoded as that number. If not, the MD64 enum defines a 6-bit encoding scheme used for values of up to literalCutoff characters. Beyond that length, the input String is phonetically encoded, and the MD16 enum is then used for a 4-bit encoding of the phonetic characters. The packed value contains reserved bits so that more complex phonetic encodings (including international charsets) can be added later. Unibit provides utility methods to identify the encoded data type and to decode back to the original String (if it has literalCutoff or fewer chars) or to the phonetic string (if it has more than literalCutoff chars). Numeric input less than or equal to -281474976710656L (that is, -2^48) is treated as negative infinity, in order to reserve flags that allow accurate decoding of String input. If an encoding error occurs for any reason, the result is NaN.
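The literal (short String) path can be illustrated with a minimal sketch. This is not the actual Unibit implementation: the class name UnibitSketch, the 63-symbol ALPHABET, the value 8 for LITERAL_CUTOFF, and the choice to store Strings as integers just below -2^48 are all assumptions made for illustration. The phonetic MD16 path and the reserved bits are left out, so longer Strings simply yield NaN here.

```java
final class UnibitSketch {
    // Hypothetical 6-bit alphabet (63 symbols, codes 1..63); code 0 is never
    // used, so decoding can stop at zero. The real MD64 enum may differ.
    private static final String ALPHABET =
        " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Assumed stand-in for literalCutoff: 8 chars * 6 bits = 48 bits.
    static final int LITERAL_CUTOFF = 8;

    // -2^48, the numeric floor quoted in the text.
    static final long NUMERIC_FLOOR = -281474976710656L;

    static double encode(String s) {
        if (s == null || s.isEmpty()) {
            return Double.NaN;
        }
        // Plain decimal numbers are encoded as themselves.
        if (s.matches("-?\\d+(\\.\\d+)?")) {
            double d = Double.parseDouble(s);
            return d <= NUMERIC_FLOOR ? Double.NEGATIVE_INFINITY : d;
        }
        if (s.length() > LITERAL_CUTOFF) {
            return Double.NaN; // phonetic path omitted in this sketch
        }
        long packed = 0L;
        for (int i = 0; i < s.length(); i++) {
            int code = ALPHABET.indexOf(s.charAt(i)) + 1;
            if (code == 0) {
                return Double.NaN; // character outside the assumed alphabet
            }
            packed = (packed << 6) | code;
        }
        // Assumption: Strings live strictly below the numeric floor. The
        // magnitude stays under 2^49, so the double holds it exactly.
        return (double) (NUMERIC_FLOOR - packed);
    }

    // True when the value falls in the range this sketch reserves for Strings.
    static boolean isString(double v) {
        return !Double.isNaN(v) && !Double.isInfinite(v) && v < NUMERIC_FLOOR;
    }

    static String decode(double v) {
        if (!isString(v)) {
            throw new IllegalArgumentException("not an encoded String: " + v);
        }
        long packed = NUMERIC_FLOOR - (long) v;
        StringBuilder sb = new StringBuilder();
        while (packed != 0) {
            int code = (int) (packed & 0x3F); // lowest 6 bits = last character
            if (code == 0) {
                throw new IllegalArgumentException("corrupt value: " + v);
            }
            sb.append(ALPHABET.charAt(code - 1));
            packed >>>= 6;
        }
        return sb.reverse().toString();
    }
}
```

Under these assumptions, UnibitSketch.encode("42") returns 42.0, while UnibitSketch.decode(UnibitSketch.encode("Hello")) returns the original String. Placing Strings below -2^48 is one way to see why numeric input at or below that value has to collapse to negative infinity: that range is no longer available for numbers.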
