Specs Details

WebReflection edited this page Sep 13, 2010 · 1 revision
Clone this wiki locally
JSON.hunpack :: Decompression
The nice part of JSON.hpack is that it does not matter how you present data, unpack will always be possible,
as long as list structure is the expected one.
JSON.hpack(ed) collection has a reserved index 0 with a mixed collection.
This collection can contain one or more key, as string, or one or more key,[value], as key array enum type.

As sumamry, an header is a collection of strings, eventually followed as next element by a collection of values.

    key         = "JSON string"
    key_enum    = "JSON string",value_enum
    value_enum  = [value(,value)*]
    value       = a generic JSON value (string, number, object, array)
    header      = [key|key_enum(,key|key_enum)*]

    example: ["name","skilled",[false,true]]

Starting from index 1, JSON.hpacked contain each row in the result set

    row_value   = value|index
    index       = header value_enum indexOf original value
    row         = [row_value(,row_value)*]

    example: ["Andrea",1]

As result, this is the valid JSON.hpack structure

    hpack       = [header,row(,row)*]

    example: [["name","skilled",[false,true]],["Andrea",1],["Daniele",0]]

Accordingly, the decompression algo is something like this (not optimized, just a baseline)

    // hpack is the JSON decoded hpacked list.
    // header is in index 0
    list header = (list)hpack[0];

    // loop over header elements
    for(int i = 0; i < header.length; i++)
            // verify that this is not the last element in the header
            i + 1 < header.length


            // verify that the next element is not a string
            header[i + 1].type != "string"

            // replace column indexes with original values

            // retrieve the enum (next header element)
            list key_enum = (list)header[i + 1];

            // loop over each result set column, skipping index 0 (the header)
            for(int j = 1; j < hpack.length; j++)
                // replace row index with original value (from key_enum)

                list row  =&(list)hpack[j];
                int index = (int) row[i];
                row[i] = key_enum[index];


            // skip next element, already used in this if
            i = i + 1;


    // every row column contains now values rather than possible indexes

    // create the original object
    list obj = new list(hpack.length - 1);

    // loop over each row
    for(int j = 1; j < hpack.length; j++)
        // create the row object
        object row = new object();

        // loop over header
        for(int i = 0; i < header.length; i++)
            // assign key and value

            row[(string)header[i]] = hpack[j][i];

            // skip the next element if it is not a key
            if(i + 1 < header.lengthheader[i + 1].type != "string")
                i = i + 1;

        // add the row to obj

    // obj will be now the original one

hpack       = [["name","skilled",[false,true]],["Andrea",1],["Daniele",0]]
obj         = [{"name":"Andrea","skilled":true},{"name":"Daniele","skilled":false}]

JSON.hpack :: Compression
Test Case:

Compression 0 - Remove Keys Redundancy Creating The Header
Create the header in the index 0. Put there every key found in one of the rows.
For each row, loop over header and create an array of values.
This will ensure correct assignment (order does NOT matter, key/value pairs are preserved).
    ["name","age","gender","skilled"],  // header ( row keys, order does not matter )
    ["a",31,"Male",true],               // row 0 (
                                        //      value for respective key at the same index.
                                        //      i.e. key "b" at index 1 in header, value at index 1 in row
                                        // )
    ["b",27,"Female",true],             // row N (the same)
    ["c",26,"Male",false]               // row last (same for each row)
Level 0 will always ensure a good compromise between hpack creation speed and string length, thanks to removed keys redundancy.

Compression 1 - Create Enum Like Collections For Each Column And Replace Values With Indexes
Loop over each result set "columns" EXCEPT when values are numbers (don't swap numbers for numbers).
A column is an array specific for one single key.
For Example, the array for column with key "name" will contain only rows->name values.
Loop over the result set and for each row try to find its value in the key array.
If value is already there, replace the row value with found index.
Otherwise add the value and replace row value with key array length - 1.
i.e. row 0 index 1 value "Male", array will be ["Male"] and row 0 value will be 0
     row 1 index 1 value "Femal". array will be ["Male","Female"] and row 1 value will be 1
     row 2 index 1 value "Male". array does not change and row 2 value will be 0
Column age is ignored since will be redundant to swap numbers with numbers (indexes).
Once every row contains only numbers (indexes or just numbers) put every key array NEXT to the key name.
    // header
        "name", ["a","b","c"],      // all unique names next to "name" key
        "age",                      // column age has no collections
        "gender", ["Male","Female"],// all unique genders next to "gender" key
        "skilled", [true,false]     // all unique skilled next to "skilled" key
    [0,31,0,0], // row 0
    [1,27,1,0], // row N
    [2,26,0,1]  // row last

Compression 2 - Try To Estimate Worthy Replacements
Compression 2 acts in a truly simple way.
If the length of the key array is bigger than ceil(allRows.length / 2) hpack consider the replacement NOT worthy.
In this case, the only column with an enum length greater than ceil(3/2) is the name one.
Loop over each row and replace index 0 with key array index[row[0]] value.
gender and skileld enum collections are considered worthy.
Level 2 does not guarantee better results but in some case could be the best compromise for performances and compression.
To guarantee worthy enum collections it is necessary to use level 3.

Compression 3 - Estimate Worthy Replacements
This level costs (2 * JSON encode) * (header keys).length
For each enumerated column (name, gender, skilled) create two collection:
 - one with just values
 - one with key header enum plus indexes
JSON encode them and compare the length.
For example, column key:
["a","b","c"]               // wins

For column gender
["Male","Female",0,1,0]     // wins

For column skilled
[true,false,true]           // wins
The reason both lists need to be JSON encoded is that some extra ASCII character could produce a long "\uXXXX" string.
In this case the index solution could be better choice

Compression 4 - Compare Lengths Of Each Level And Return Best One
This level is more for tests / debug purposes. It could cost too much but result will be always the best one.
If level 0 and level 4 produce the same length, level 0 will be the choice.
Returned JSON.hpack(ed) string level will be the one from method JSON.hbest().