New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve insertion order in objects #53
Conversation
This should mean that iteration over objects will result in it returning keys and values in the order they were inserted. This will also preserve the ordering of object keys loaded. The modifications turn the hashtable into something similar to the Java class java.util.LinkedHashMap; it remains a hashtable, but an additional linked list will maintain order of objects added. Note that setting an existing key does *not* modify the insertion order; to get that behavior, you must delete the key then set it.
This is an interesting idea. There's already the This change would make May I ask what's your use case? Why is it important to iterate an object in the insertion order? And if this would be possible, why wouldn't you also want to iterate in alphabetic order of keys, for example? |
The use case here is code that replicates how CouchDB computes revision numbers on documents — it computes an MD5 hash of the decoded JSON object (it actually computes the hash of the object using Erlang's native serialization, I documented how this works here), and thus it does need the values in the order they appear in the JSON stream. If it sorted the keys first that would be a better solution, but this is CouchDB, and it didn't exactly have the best design sense put into it. Since the parser necessarily adds parsed items in the order they are read, preserving insertion order preserves the order they appear in the JSON document, too. I know JSON specifically makes no guarantees about object member order, and this change is cheap but not free. (The idea here was inspired by Java's LinkedHashMap class, which does pretty much the same thing) |
+1 from here. My use case would be in config files, where I can have an ordered list of rules (e.g. acl, or other matching sequences). It would make sense to keep those objects as json objects in the structs and just traverse them in insert-order. It would be unpractical if the json object children needs sorting on each access. Alternatively, one could just use single linked list (headnode to have head, tail; each item to have only next). Though, json_object_del would need O(n) lookup then which might be undesirable. Perhaps it could JSON object specific flag if ordered list maintenance is desired. |
I'm unsure about what to do with this. One one hand, there seems to be a need to iterate the object items in the order they were added. Jansson already has an option to do this when encoding (the On the other hand, relying on any particular order is bad design, as hash tables (which are commonly used to implement dictionary types) do not generally support ordering of items. Furthermore, the keys of dictionary types (objects) have many possible orders to choose from: lexical ordering of the keys, lexcial/numerical/whatever order of the values, deserialization/insertion order of keys/values, last updated entry first, random order, etc. Neither JSON nor JavaScript (or ECMAScript) promise any particular order in which object keys should be returned when iterating over them. However, browsers generally return them in the insertion order for historical reasons (except for Chrome, that does it differently in some situations). And JSON already has an ordered type: the array. My personal opinion is that if you need to rely on any particular ordering, use the array. One way to make it possible to iterate an object in the insertion order would be to add an API to query the serial number of an object key. For example This approach would have the advantage of letting the user sort in any way he wants. Even with the current API, any order except for the insertion order is possible if you do it manually. Adding these two functions would just make it possible to also sort by the insertion order. |
I'm unsure of what the right thing to do is, too :) I know it's wrong to rely on any kind of ordering in a JSON object, but I'm still stuck with interop with existing software. I think the solution of sorting by serial number (along with serializing in insertion order) would work for my use, but I don't think I would adopt that solution because it's less optimal in speed, and would just continue using my forked version. |
I think another way to do this is abstracting the
In this way, you can replace the built in hash table with your private implementation which could preserve the inserting order. |
Well, this seems a bit overkill to me :) |
* src/dump.c (do_dump): dump out JSON_BIGNUM. * src/jansson.h: include tommath.h. (JSON_BIGNUM): new json_type. (json_is_number): include bignums too. (json_is_bignum, json_bignum, json_bignum_value, json_bignum_set): new functions. * src/jansson-private.h (json_bignum_t): new type. (json_to_bignum): new macro. * src/load.c (TOKEN_BIGNUM): new constant. (lex_t.value.bignum): new union member. (lex_scan_number): try creating a bignum if a normal integer will overflow. (parse_value): handle TOKEN_BIGNUM. * src/pack_unpack.c (type_names): add "bignum". (unpack_value_starters): add 'N' for bignums. (pack): pack bignums. (unpack): unpack bignums. * src/value.c (json_bignum, json_bignum_value, json_bignum_set, json_delete_bignum, json_bignum_equal, json_bignum_copy): new functions. (json_number_value): handle bignums too (will truncate the value, I think). (json_delete): delete bignums. (json_equal): test bignums. (json_copy): copy bignums. (json_deep_copy): copy bignums.
The idea is that you run 'bootstrap_libtommath.sh', which should pull the submodule, patch it to add the automake/libtool/autoconf support, and regenerate the auto* stuff.
Oh, you have added some commits. I didn't notice because github doesn't send any notification. The commits seem unrelated to this particular issue (preserving insertion order in objects), so it would probably be the best to open a new issue or pull request. Some time ago, there was a thread on jansson-users about adding generic bignum hooks that could be used to hook any bignum library to Jansson. IIRC, there were some issues about memory management and the work was never finished, but in general I'm open to it. |
Yeah, github just updates pull requests when you add new commits; I don't know if it's possible to split it out into different pull requests unless I commit them to separate branches. I don't really care enough to do that, though :) I don't know how appropriate it is to introduce a dependency into jansson, even if it is one as simple as libtommath. My preferred implementation would just support the basics -- read/write large decimal strings, comparison, anything else the jansson API needed, and a way to grab the byte array representation and/or decimal representation, so users could interop with whatever bignum library (or none) they wanted. Just using libtommath was the quick and lazy way. |
I opened a new issue, see #69. The patch was developed in November last year and it work exactly like your preferred implementation. |
My application requires that objects preserve the order of elements when parsed. This modifies the hashtable code to also maintain a linked list of
struct hashtable_pair
pointers, in the order they are added to the hashtable. Iterator operations now use this instead.It adds a bit more space to hashtables and key/value pairs, and adds a list append/remove operation on putting a new value, or deleting a value.
It does not modify the order on update: it preserves creation order, not update order.
I kind of butchered linked list handling. I couldn't break out of the "NULL terminates the list" mode of thinking, at least for append-only lists.