Syntactical Ordering of Object Keys #53

fugu13 · 2020-03-03T20:38:06Z

This adds a flag that makes object key ordering based on JSonnet syntactical ordering, by using ordered maps internally and skipping the sort step when the flag is provided.

I'm part of a team reusing SJSonnet to support data transformation (https://datasonnet.com, and we're still very early days). Our main audience is highly technical business analysts who are experts in the data schemas and formats they're manipulating, but infrequent programmers. The reordering behavior of JSonnet is one of the most confusing things to them about JSonnet programs. Additionally, some of the code we work with that consumes JSonnet output is ordering-sensitive (unfortunately).

Beyond our use case, I've noticed that requests for an ordering option have popped up several times in general JSonnet discussions. That does not look likely to happen across all implementations, but I think having it available would likely still be of interest to many SJSonnet users.

While this PR goes beyond that, since ordering flows naturally once ordered maps are used, the only two well-defined ordering guarantees this PR attempts to provide when using the flag are:

keys within a single syntactical JSonnet object keep the order in which they are written in the source, and
when two objects are combined, the keys of the first object come first in the result, in the order given by the first guarantee, followed by the keys that appear only in the second object, in the order they have in the second object
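
A minimal Python sketch of the second guarantee, for illustration only. It models JSonnet objects as insertion-ordered Python dicts and is not SJSonnet's Scala implementation; the function name `combine` and the sample keys are hypothetical.

```python
def combine(first: dict, second: dict) -> dict:
    """Model `first + second` under the ordering guarantees described above."""
    merged = dict(first)    # keys of the first object keep their original order
    merged.update(second)   # overridden keys keep their slot; new keys are appended
    return merged


first = {"name": "a", "id": 1}
second = {"zone": "x", "id": 2, "color": "red"}

result = combine(first, second)
print(list(result))
```

Here `id` is overridden by the second object but stays in the position it had in the first object, and `zone` and `color` follow in the order they have in the second object.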

Since there are many places output can be generated, and in most/all of those cases consistent behavior is important (such as objects rendered during errors vs in output), we've made that a global flag on the evaluator.

We're happy to revise or do other further work as needed if it will help include this capability in SJSonnet.

Thanks for looking,
Russell

lihaoyi-databricks · 2020-03-05T01:40:07Z

This looks reasonable, but could you run benchmarks (I think we have some in the test suite) to see if this has a significant impact on performance? Our Jsonnet compilation at work is moderately slow, so I want to make sure this doesn't negatively impact performance too much.

If using LinkedHashMap affects performance too much, we may need to dynamically swap between mutable.LinkedHashMap and mutable.Map depending on the value of preserveOrder

lihaoyi-databricks · 2020-03-05T01:45:36Z

One more thing is I think we should have a slightly more thorough test suite, e.g. to verify the behavior when:

overriding keys: does it preserve the position of the original key or move it to the end of the key list?
hiding keys: I think you can override keys with key:: value to hide a previously visible key; do we preserve the order of the remaining keys? What if we use key ::: value to re-introduce it?
using standard library methods: does std.mergePatch preserve order? How about std.objectFields and std.objectFieldsAll? std.manifestIni? etc.

There are probably more cases you can come up with, but we should err on the side of thoroughness for this PR, since otherwise I'm sure this functionality will be accidentally broken in the future.

fugu13 · 2020-03-05T16:10:54Z

Thanks for taking a look, I'll work on the performance tests and increased test coverage and hopefully have updates for you soon!

fugu13 · 2020-03-09T15:16:55Z

I've now updated this PR with all the suggested tests and a number of additional tests. It went pretty smoothly, apart from a brief detour when I accidentally left off a comma between two objects and they combined instead, which is syntax I'd forgotten about.

It did uncover one additional code change that was needed: objectFields and objectFieldsAll did their own sorting of keys, so that sorting now depends on the flag as well. In addition to adding tests for all methods that looked plausible, I searched for other places where sorting happens and did not find any more whose behavior this PR should change.
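
As a rough Python model of that change (not the actual Scala code; `object_fields` and `preserve_order` are hypothetical names used only for this sketch), the field listing sorts its keys unless order preservation is requested:

```python
def object_fields(obj: dict, preserve_order: bool) -> list:
    """Return the keys of obj, sorted unless order preservation is on."""
    keys = list(obj)  # dicts iterate in insertion order
    return keys if preserve_order else sorted(keys)


obj = {"b": 1, "a": 2, "c": 3}
print(object_fields(obj, preserve_order=False))
print(object_fields(obj, preserve_order=True))
```

The sketch ignores hidden fields, which the real objectFields and objectFieldsAll treat differently.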

I also discovered one (possible?) bug, around manifestXmlJsonml, where the example in the jsonnet docs errors in SJSonnet due to numerical attribute values. I've filed that as #56. I say possible because, while the example works on jsonnet.org, the function is pretty underspecified. I believe the JsonML spec does allow numbers in that particular place, though.

fugu13 · 2020-03-09T15:17:11Z

I hope to have benchmarking results soon.

fugu13 · 2020-03-10T21:54:32Z

@lihaoyi-databricks We've run the benchmarks, and the results suggest any speed differences due to the changes are very very small, so I'm hopeful we'll be able to merge with this approach.

Two of us ran the benchmarks, on different but basically identical computers. One set of runs was done before a small number of the test-driven changes were in, and the other was done after.

In both cases, we ran the version of SjsonnetTestMain at commit 2108b89, which is one commit behind master. The most recent version is unrunnable due to including several kubernetes configs that do not exist in the repository. We ran the tests with ./mill -i show sjsonnet[2.13.0].jvm.test.run. Both computers were 15” 2017 MacBook Pro, 2.9 GHz Quad-Core Intel Core i7, 16 GB 2133 MHz LPDDR3.

The two codebases were modusintegration:master (first run 6366b0a, next run 06da4df) and databricks:master (83cabbe)

Results are the number of times a set of tests can be run in 20 seconds. Higher numbers are better.

First runs

In the first set of runs we hadn't yet established a formal procedure. No applications were quit, but nothing else was being done on the computer during the tests. No burn-in run was done. Ten runs were recorded for each codebase.

The ten test results, in order, for modusintegration:master (6366b0a)

2936
2811
2797
2829
2628
2590
2927
2740
2806
2795

The ten test results, in order, for databricks:master

2808
3009
2932
2969
2778
2679
3035
2845
2794
2865

The means are very close. The mean without the ordering code is about 85 higher than the mean with it. This is about 3/4 of the standard deviation of each sample, and the standard deviation is low relative to the values, so the gap is not very large. Specifically, a one-sided t-test gives a p-value slightly greater than 5%, so the null hypothesis of no difference cannot be rejected at the 5% level, and we should assume there's no difference.
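
For anyone who wants to re-check these figures, here is a small Python sketch that recomputes the means, standard deviations, and the two-sample t statistic from the first-run numbers listed above. It uses only the standard library, so it stops at the t statistic rather than a p-value. The helper name `two_sample_t` is illustrative; with equal sample sizes the pooled and Welch forms of the statistic coincide, so the choice between them does not matter here.

```python
from math import sqrt
from statistics import mean, stdev

# Benchmark iterations completed in 20 seconds, copied from the first runs.
with_ordering = [2936, 2811, 2797, 2829, 2628, 2590, 2927, 2740, 2806, 2795]
without_ordering = [2808, 3009, 2932, 2969, 2778, 2679, 3035, 2845, 2794, 2865]


def two_sample_t(a, b):
    """t statistic for mean(a) - mean(b), using sample standard deviations."""
    standard_error = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / standard_error


mean_gap = mean(without_ordering) - mean(with_ordering)
t_stat = two_sample_t(without_ordering, with_ordering)

print(f"mean with ordering:    {mean(with_ordering):.1f} (stdev {stdev(with_ordering):.1f})")
print(f"mean without ordering: {mean(without_ordering):.1f} (stdev {stdev(without_ordering):.1f})")
print(f"gap between means:     {mean_gap:.1f}")
print(f"t statistic:           {t_stat:.2f}")
```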

Second runs

In the second set of runs, non-involved applications were quit, and nothing was done on the computer during test runs. For each codebase, the benchmark was first run once without recording the result, followed by ten recorded runs.

The ten run results, in order, for modusintegration:master (06da4df)

2880
3119
2791
2996
3039
3098
2962
3145
3057
3084

The ten run results, in order, for databricks:master

2990
3016
3027
3066
3022
3084
2997
2959
3039
2907

The means were virtually identical. In fact, the mean with ordering is very slightly higher. The standard deviation of each sample is much higher than the difference between the means, though it is about twice as high for the with-ordering runs. In both cases the standard deviation is modest relative to the mean. The t-test p-value is very high, so we should again assume there is no difference.
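
The spread of the second runs can be checked the same way. This Python sketch, with illustrative variable names, prints each sample's mean and standard deviation next to the gap between the two means, using the numbers listed above.

```python
from statistics import mean, stdev

# Benchmark iterations completed in 20 seconds, copied from the second runs.
with_ordering = [2880, 3119, 2791, 2996, 3039, 3098, 2962, 3145, 3057, 3084]
without_ordering = [2990, 3016, 3027, 3066, 3022, 3084, 2997, 2959, 3039, 2907]

# Positive gap means the with-ordering build completed more iterations.
gap = mean(with_ordering) - mean(without_ordering)

for label, runs in (("with ordering", with_ordering),
                    ("without ordering", without_ordering)):
    spread = stdev(runs)
    print(f"{label}: mean={mean(runs):.1f} stdev={spread:.1f} "
          f"({spread / mean(runs):.1%} of mean)")
print(f"gap between means: {gap:.1f}")
```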

lihaoyi-databricks · 2020-03-12T06:13:33Z

@fugu13 this looks good to me! Happy to merge as is.

Going forward I'll be leaning heavily on your test suite to make sure future changes do not violate any ordering-related properties that are important to you. Hopefully it is enough to catch regressions; otherwise we can add further tests when any issues turn up in the future.

fugu13 · 2020-03-12T16:11:59Z

Thanks! And yeah, I feel pretty good about the test suite maintaining ordering everywhere that makes sense, but we're also going to be measured about what we tell people is definitely not going to change.

Would it be possible to have a release pushed to the maven repos soon with the updates?

By the way, thank you so much for creating SJSonnet; it's made integrating JSonnet with the Java codebases common in the enterprise service bus world feasible. We've really enjoyed working with the SJSonnet codebase, too. It's worked well for our needs even though it wasn't written with our somewhat different data transformation scenario in mind.

lihaoyi-databricks · 2020-03-15T03:06:07Z

tagged and uploaded sjsonnet 0.2.4 to maven central

javaduke and others added 6 commits October 28, 2019 06:41

Merge pull request #1 from databricks/master

08ec0d3

Merge master

Merge pull request #3 from databricks/master

1d2f025

Merging the latest version

preserve ordering

8b408de

Preserving order of keys

86341ad

Update Materializer.scala

6d022ee

Merge pull request #5 from modusintegration/orderingpreserved

e71820b

Preserving order of fields: Alternative

fugu13 added 3 commits March 8, 2020 21:48

make ordering for objectFields and objectFieldsAll dependent on preserveOrdering

e2a2cb8

add numerous tests, including two that identify a problem in the current preserveOrdering implementation

5bd3cc9

fix missing comma that was causing issue

b11924c

fugu13 added 7 commits March 9, 2020 08:29

add test verifying toString preserves order

26d0b65

add test verifying member concat string to object preserves order

6feb5a1

add test verifying errors maintain ordering of keys

c20c62e

add test verifying order preserving objects compare when in different orders

e251141

add test checking if preserving order preserves set membership. It currently fails when using a key function

6366b0a

modify set membership with key function to honor same contract as jsonnet == operation

0a96dbc

add tests for all set operations and update type checking and comparison logic in key function branches

06da4df

lihaoyi-databricks merged commit 12fe9ed into databricks:master Mar 12, 2020
