
Preserve insertion order in objects #53

Closed
wants to merge 5 commits into from
Conversation


@csm csm commented Mar 9, 2012

My application requires that objects preserve the order of elements when parsed. This modifies the hashtable code to also maintain a linked list of struct hashtable_pair pointers, in the order they are added to the hashtable. Iterator operations now use this instead.

It adds a bit more space to hashtables and key/value pairs, and adds a list append/remove operation on putting a new value, or deleting a value.

It does not modify the order on update: it preserves creation order, not update order.

I kind of butchered linked list handling. I couldn't break out of the "NULL terminates the list" mode of thinking, at least for append-only lists.

This means that iterating over an object returns keys and values in the
order they were inserted, which also preserves the ordering of object
keys as they were loaded.

The modifications turn the hashtable into something similar to the Java
class java.util.LinkedHashMap: it remains a hashtable, but an additional
linked list maintains the order in which entries were added.

Note that setting an existing key does *not* modify the insertion order;
to get that behavior, you must delete the key then set it.
Owner

akheron commented Mar 11, 2012

This is an interesting idea. There's already the serial field in struct hashtable_pair that records the insertion order of key-value pairs as a monotonically increasing integer. It's used with the JSON_PRESERVE_ORDER encoding flag to sort the object keys into the insertion order. It cannot be used to iterate through the object in any particular order, though.

This change would make serial unnecessary. It adds two pointers to each key, so after removing serial there would still be the extra overhead of one pointer for each object key. I'm not sure whether it's worth it, as this is the first time the feature of iterating objects in the insertion order has been requested, and Jansson has existed for quite a long time already.

May I ask what's your use case? Why is it important to iterate an object in the insertion order? And if this would be possible, why wouldn't you also want to iterate in alphabetic order of keys, for example?

Author

csm commented Mar 11, 2012

The use case here is code that replicates how CouchDB computes revision numbers on documents. It computes an MD5 hash of the decoded JSON object (actually, the hash of the object in Erlang's native serialization; I documented how this works here), and thus it needs the values in the order they appear in the JSON stream. Sorting the keys first would have been a better design, but this is CouchDB, and it didn't exactly have the best design sense put into it.

Since the parser necessarily adds parsed items in the order they are read, preserving insertion order preserves the order they appear in the JSON document, too.

I know JSON specifically makes no guarantees about object member order, and this change is cheap but not free.

(The idea here was inspired by Java's LinkedHashMap class, which does pretty much the same thing)


fabled commented Mar 22, 2012

+1 from here.

My use case would be config files, where I can have an ordered list of rules (e.g. ACLs or other matching sequences). It would make sense to keep those objects as JSON objects in the structs and just traverse them in insertion order. It would be impractical if the JSON object's children needed sorting on each access.

Alternatively, one could use a singly linked list (the head node holds head and tail pointers; each item has only a next pointer). json_object_del would then need an O(n) lookup, though, which might be undesirable. Perhaps there could be a per-object flag indicating whether ordered-list maintenance is desired.

Owner

akheron commented Mar 22, 2012

I'm unsure about what to do with this.

On the one hand, there seems to be a need to iterate the object items in the order they were added. Jansson already has an option to do this when encoding (the JSON_PRESERVE_ORDER flag).

On the other hand, relying on any particular order is bad design, as hash tables (which are commonly used to implement dictionary types) do not generally support ordering of items. Furthermore, the keys of dictionary types (objects) have many possible orders to choose from: lexical ordering of the keys; lexical/numerical/whatever ordering of the values; deserialization/insertion order of keys/values; last-updated entry first; random order; etc.

Neither JSON nor JavaScript (or ECMAScript) promises any particular order in which object keys should be returned when iterating over them. However, browsers generally return them in insertion order for historical reasons (except for Chrome, which does it differently in some situations).

And JSON already has an ordered type: the array. My personal opinion is that if you need to rely on any particular ordering, use the array.

One way to make it possible to iterate an object in the insertion order would be to add an API to query the serial number of an object key. For example json_object_get_serial(obj, "key") and json_object_iter_serial(iter). Both would return an unsigned integer which, when sorted in ascending order, would give the insertion order of object items. The user would have to store the keys and sort them by serial himself. This is what json_dumps() and the other serialization functions do under the hood when given the JSON_PRESERVE_ORDER encoding flag (or JSON_SORT_KEYS, just with a different comparison function). See https://github.com/akheron/jansson/blob/master/src/dump.c#L293.

This approach would have the advantage of letting the user sort in any way he wants. Even with the current API, any order except for the insertion order is possible if you do it manually. Adding these two functions would just make it possible to also sort by the insertion order.

Author

csm commented Mar 24, 2012

I'm unsure of what the right thing to do is, too :)

I know it's wrong to rely on any kind of ordering in a JSON object, but I'm still stuck with interop with existing software.

I think the solution of sorting by serial number (along with serializing in insertion order) would work for my use, but I don't think I would adopt it, since it's slower; I'd just continue using my forked version.

Contributor

rogerz commented Mar 26, 2012

I think another way to do this is abstracting the hashtable implementation away from the json_object_t struct. For example:

typedef struct {
    json_t json;
    void *hashtable;   /* points to a hashtable_prototype_t */
} json_object_t;

typedef struct {
    hashtable_t *data;
    void *(*get)(hashtable_t *data, const char *key);
    int (*set)(hashtable_t *data, const char *key, void *value);
    /* ... */
} hashtable_prototype_t;

In this way, you could replace the built-in hash table with a private implementation that preserves insertion order.

Owner

akheron commented Mar 26, 2012

In this way, you could replace the built-in hash table with a private implementation that preserves insertion order.

Well, this seems a bit overkill to me :)

Casey Marshall added 4 commits March 31, 2012 18:22
* src/dump.c (do_dump): dump out JSON_BIGNUM.
* src/jansson.h: include tommath.h.
(JSON_BIGNUM): new json_type.
(json_is_number): include bignums too.
(json_is_bignum, json_bignum, json_bignum_value, json_bignum_set): new functions.
* src/jansson-private.h (json_bignum_t): new type.
(json_to_bignum): new macro.
* src/load.c (TOKEN_BIGNUM): new constant.
(lex_t.value.bignum): new union member.
(lex_scan_number): try creating a bignum if a normal integer will overflow.
(parse_value): handle TOKEN_BIGNUM.
* src/pack_unpack.c (type_names): add "bignum".
(unpack_value_starters): add 'N' for bignums.
(pack): pack bignums.
(unpack): unpack bignums.
* src/value.c (json_bignum, json_bignum_value, json_bignum_set, json_delete_bignum, json_bignum_equal, json_bignum_copy): new functions.
(json_number_value): handle bignums too (will truncate the value, I think).
(json_delete): delete bignums.
(json_equal): test bignums.
(json_copy): copy bignums.
(json_deep_copy): copy bignums.
The idea is that you run 'bootstrap_libtommath.sh', which should pull the
submodule, patch it to add the automake/libtool/autoconf support, and
regenerate the auto* stuff.
Owner

akheron commented Apr 13, 2012

Oh, you have added some commits. I didn't notice because GitHub doesn't send any notification. The commits seem unrelated to this particular issue (preserving insertion order in objects), so it would probably be best to open a new issue or pull request.

Some time ago, there was a thread on jansson-users about adding generic bignum hooks that could be used to hook any bignum library to Jansson. IIRC, there were some issues about memory management and the work was never finished, but in general I'm open to it.

Author

csm commented Apr 13, 2012

Yeah, github just updates pull requests when you add new commits; I don't know if it's possible to split it out into different pull requests unless I commit them to separate branches. I don't really care enough to do that, though :)

I don't know how appropriate it is to introduce a dependency into jansson, even if it is one as simple as libtommath. My preferred implementation would just support the basics -- read/write large decimal strings, comparison, anything else the jansson API needed, and a way to grab the byte array representation and/or decimal representation, so users could interop with whatever bignum library (or none) they wanted. Just using libtommath was the quick and lazy way.

Owner

akheron commented Apr 16, 2012

I opened a new issue, see #69. The patch was developed in November last year and it works exactly like your preferred implementation.

akheron closed this Apr 16, 2012