Supporting Unions with Unibit Encoding
Supporting unions requires recognizing that a field may hold numeric data, String data, or both. In most text-encoded data, a field we assume to be a String may contain numbers, and vice versa. If an algorithm must adapt to such changes in the data, we need to measure and quantify union fields in a uniform and efficient way. Unibit encoding provides a highly performant encoding of String to double while sacrificing a minimal amount of meaningful information.
Unibit packs an ASCII-compatible String into a double while preserving as much information as possible. If the String represents a Number, it is encoded as that number. If not, the MD64 enum defines a 6-bit encoding scheme used for values of up to literalCutoff characters. Beyond that length, the input String is phonetically encoded and the MD16 enum is then used for a 4-bit encoding of the phonetic characters. The packed value contains reserved bits so that more complex phonetic encodings (including international charsets) can be added later. Unibit provides utility methods to identify the encoded data type and to decode back to the original String (if it has literalCutoff or fewer characters) or to the phonetic string (if it has more than literalCutoff characters). Numeric input less than or equal to -281474976710656L (-2^48) is treated as negative infinity, in order to reserve flags that allow accurate decoding of String input. If an encoding error occurs for any reason, the result is NaN.
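The general idea of packing a short String into a double can be illustrated with a minimal sketch. It is not the Unibit implementation: the class name UnionPackSketch, the 63-symbol alphabet, the fixed 8-character limit, and the choice to store packed Strings as finite values below -2^48 are all assumptions made for illustration. The sketch covers only the numeric and 6-bit literal cases and omits the phonetic MD16 path and the reserved bits.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of String-to-double packing; not the actual Unibit code. */
final class UnionPackSketch {
    // Assumed alphabet: 63 symbols. Code 0 is reserved as padding, so each
    // character is stored as (index + 1) in 6 bits.
    private static final String ALPHABET =
        " 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final Pattern NUMERIC =
        Pattern.compile("[-+]?(\\d+\\.?\\d*|\\.\\d+)([eE][-+]?\\d+)?");

    static final int MAX_CHARS = 8;           // assumed cutoff: 8 chars * 6 bits = 48 bits
    static final long STRING_BASE = 1L << 48; // 281474976710656

    /** Encodes numeric text as its value, short text as a packed double, else NaN. */
    static double encode(String s) {
        if (s == null) return Double.NaN;
        if (NUMERIC.matcher(s).matches()) {
            double d = Double.parseDouble(s);
            // Numbers at or below -2^48 collapse to negative infinity so that
            // the finite range below -2^48 is free for packed Strings.
            return d <= -STRING_BASE ? Double.NEGATIVE_INFINITY : d;
        }
        if (s.isEmpty() || s.length() > MAX_CHARS) return Double.NaN;
        long payload = 0;
        for (int i = 0; i < MAX_CHARS; i++) {
            int code = 0; // padding for positions past the end of the String
            if (i < s.length()) {
                code = ALPHABET.indexOf(s.charAt(i)) + 1;
                if (code == 0) return Double.NaN; // character outside the alphabet
            }
            payload = (payload << 6) | code;
        }
        // STRING_BASE + payload stays below 2^49, so the double holds it exactly.
        return -(double) (STRING_BASE + payload);
    }

    /** True if the value was produced by the String branch of encode. */
    static boolean isString(double d) {
        return d < -STRING_BASE && !Double.isInfinite(d);
    }

    /** Recovers the original String from a packed double. */
    static String decode(double d) {
        if (!isString(d)) throw new IllegalArgumentException("not a packed String");
        long payload = (long) (-d) - STRING_BASE;
        StringBuilder sb = new StringBuilder();
        for (int shift = 42; shift >= 0; shift -= 6) {
            int code = (int) ((payload >>> shift) & 0x3F);
            if (code == 0) break; // reached padding
            sb.append(ALPHABET.charAt(code - 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "hello", "-281474976710656"}) {
            double d = encode(s);
            System.out.println(s + " -> " + d + (isString(d) ? " (" + decode(d) + ")" : ""));
        }
    }
}
```

In this sketch, a field value such as "42" and a value such as "hello" both become doubles, and isString tells the two cases apart without any side channel. The trade-off is the one described above: very large negative numbers lose their magnitude so that the range can carry String payloads.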