Allow JSON filtering and clarify sort order #1258

glasserc · 2017-06-07T17:59:11Z

Fixes #1215, #1216, #1257, and possibly others.

Instead of coercing all values to strings when doing comparisons in Postgres, let Postgres use its own JSONB type comparison. This gets us 90% of the way to supporting JSON comparisons, including comparisons against null. The remaining part is about handling "missing" fields. I've made the executive decision that a "missing" value is not equal to any value of any type, including the JSON null. (Fortunately, this is easy in Postgres.) We handle these explicitly by adding field IS NOT NULL AND or field IS NULL OR as needed.
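To illustrate the "missing is not equal to anything" rule, here is a minimal sketch in plain Python rather than SQL. The MISSING sentinel and the helpers get_field, matches_eq and matches_not are hypothetical names for this example only, and the mapping of each guard to an operator is one plausible reading of "as needed", not a copy of the code in this PR.

```python
# Hypothetical sentinel standing in for "field is absent from the record".
MISSING = object()


def get_field(record, field):
    """Return the field's value, or MISSING if the record lacks it."""
    return record.get(field, MISSING)


def matches_eq(record, field, value):
    # Mirrors a guard like `field IS NOT NULL AND field = value`:
    # a missing field never equals anything, not even JSON null (None).
    found = get_field(record, field)
    return found is not MISSING and found == value


def matches_not(record, field, value):
    # Mirrors a guard like `field IS NULL OR field != value`:
    # a missing field differs from every value.
    found = get_field(record, field)
    return found is MISSING or found != value


records = [{"id": 1, "author": None}, {"id": 2}, {"id": 3, "author": "A"}]
null_authors = [r["id"] for r in records if matches_eq(r, "author", None)]
not_null_authors = [r["id"] for r in records if matches_not(r, "author", None)]
```

Under these assumptions, only record 1 matches an equality filter against null, while records 2 and 3 both match the negated filter.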

Because we already support sorting on a field with mixed-type values and/or missing values, we already implicitly have an ordering defined for comparisons like LT, GT, MAX, MIN, etc. This ordering puts missing values as "bigger" than other values. While we're messing around with comparisons involving NULL, fix this too.

We don't appear to have any clear recommendation about our ordering in the memory backend. To some extent, the tests assume that the ordering is the same. However, I am skeptical that we can always harmonize on the Postgres ordering. I think a more sensible approach is to allow each backend to define its own sort order, as long as it's consistent. For this reason, I didn't struggle to make the memory backend's comparison function behave exactly like the Postgres one.

Add documentation.
Add tests.
Add a changelog entry.
(n/a) Add your name in the contributors file.
If you changed the HTTP API, update the API_VERSION constant and add an API changelog entry in the docs

r? @leplatrem, @Natim

I'm pretty confident in the approach, but I'm not sure whether this qualifies as a major version and/or API version.

glasserc · 2017-06-07T19:15:23Z

I've added some commits to implement a has_ operator, addressing #344 directly, since the change in how we handle missing fields breaks a test each in test_views_groups and test_views_collections. I know this is kind of outside the scope of this PR, but it touches a lot of the same code and benefits from the new cleaner implementation. If you'd rather, I can separate out the commits for that feature.
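As a rough sketch of what a has_ filter could mean for in-memory records: the helper name has_field and the boolean `expected` argument are assumptions made for illustration, not the signature used in these commits.

```python
def has_field(record, field, expected=True):
    """Return True when the presence of `field` in `record` equals `expected`.

    A field set to JSON null (None) still counts as present; only an
    absent key counts as missing.
    """
    return (field in record) == expected


records = [{"id": 1, "author": None}, {"id": 2}, {"id": 3, "author": "A"}]
with_author = [r["id"] for r in records if has_field(r, "author", True)]
without_author = [r["id"] for r in records if has_field(r, "author", False)]
```

The point of the sketch is that presence is tested separately from value comparison, so a record whose field is null is not confused with a record that lacks the field.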

The storage tests fail hard in Postgres because everything is being cast to a string. Filters mostly work in memory except for None because of special handling that imitates how Postgres works. The view test passes because it works against memory; it's just a sanity check.

json.dumps() will never return a value like '"....."' unless the input value was a string, which was just ruled out by the if statement above.

(Or, you could say we "sorted out" the comparison order...) This isn't necessarily the best possible sort order, but it's the one that Postgres supports natively, so let's capitalize on that. We may have to allow different storage backends to define their own sort order, but we have a couple tests that rely on sort order (test_get_all_can_deal_with_none_values and test_get_all_can_filter_with_numeric_values) so let's just go with this one.

This is explicitly the same as the Postgres backend, at least as a first approximation. The complete comparison order isn't implemented, and I expect it never will be, as I think the only logical way to handle different storage backends is to allow them to enforce their own sort order. However, a couple of tests rely on this sort order for all storage backends, so enforce it.

This will be necessary to handle some test breakages that come from our handling of NULLs.

leplatrem

This is excellent ❤️

I reviewed the whole set of changes thoroughly and couldn't find anything but micronits.

Note: Handling of ?value=null is possible because of #1252

I wish we could really avoid having different API behaviours depending on the storage backend that is used. Maybe we could open a dedicated issue and act later.

Thank you very much for tackling this with brio :)

r+ with docs/changelog :)

leplatrem · 2017-06-08T08:05:45Z

kinto/core/storage/memory.py

        if matches:
            yield record


+def schwartz_transform(value):


cool, did not know about this name :)

btw, on wikipedia it's called schwartzian transform
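For readers unfamiliar with the idiom, here is a minimal decorate-sort-undecorate sketch for sorting values of mixed types. The helper names type_rank and sort_mixed and the particular rank numbers are assumptions for illustration; they are not taken from the function in this diff, and only a few types are covered.

```python
import numbers


def type_rank(value):
    """Hypothetical rank per type; the exact order here is an assumption."""
    if value is None:
        return 0
    if isinstance(value, str):
        return 1
    if isinstance(value, bool):
        # Checked before numbers because bool is a subclass of int.
        return 3
    if isinstance(value, numbers.Number):
        return 2
    raise TypeError("unsupported type: {!r}".format(type(value)))


def sort_mixed(values):
    # Decorate each value with its rank so that values of different types
    # are never compared directly, sort the pairs, then undecorate.
    decorated = [(type_rank(v), v) for v in values]
    decorated.sort()
    return [v for _, v in decorated]


ordered = sort_mixed(["Z", 4, None, "A", "", 0])
```

Without the decoration, Python 3 would raise a TypeError when asked to order a string against an integer; the rank settles every cross-type comparison before the values themselves are looked at.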

leplatrem · 2017-06-08T08:06:23Z

kinto/core/storage/postgresql/__init__.py

@@ -686,6 +686,7 @@ def _format_conditions(self, filters, id_field, modified_field,
        holders = {}
        for i, filtr in enumerate(filters):
            value = filtr.value
+            query_is_like = filtr.operator == COMPARISON.LIKE


µnit: boolean names usually start with is_ or has_... what about is_like_operator?

leplatrem · 2017-06-08T08:13:03Z

kinto/core/storage/postgresql/__init__.py

+                        COMPARISON.IN,
+                        # Nor can they be LIKE anything.
+                        COMPARISON.LIKE,
+                ):


µnit: in Python, multi-line if conditions are not recommended. Would it be better to have two intermediary variables like null_comparable_ops and null_incomparable_ops?
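A sketch of what that suggestion might look like. The Comparison enum below is a hypothetical stand-in for the COMPARISON constants in the diff, and which operators belong in each group is assumed for the sake of the example.

```python
from enum import Enum


class Comparison(Enum):
    # Hypothetical stand-in for the COMPARISON constants in the diff.
    EQ = "="
    NOT = "!="
    IN = "IN"
    EXCLUDE = "NOT IN"
    LIKE = "ILIKE"


# Assumed grouping: operators that can never match a missing field...
null_incomparable_ops = (Comparison.EQ, Comparison.IN, Comparison.LIKE)
# ...and operators that a missing field always satisfies.
null_comparable_ops = (Comparison.NOT, Comparison.EXCLUDE)


def null_guard(operator, field="field"):
    """Return the SQL prefix guarding a comparison against missing fields."""
    if operator in null_incomparable_ops:
        return "{} IS NOT NULL AND".format(field)
    if operator in null_comparable_ops:
        return "{} IS NULL OR".format(field)
    return ""
```

Naming the two groups keeps each if condition on one line and gives the comments in the original tuple a natural home next to the variable they describe.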

leplatrem · 2017-06-08T08:15:31Z

kinto/core/storage/testing.py

+        sorting = [Sort('author', 1)]
+        records, _ = self.storage.get_all(sorting=sorting, **self.storage_kw)
+        # Some interesting values to compare against
+        VALUES = ['A', 'Z', '', 0, 4]


µnit: I don't think it's necessary to have uppercase here

leplatrem · 2017-06-08T13:29:18Z

I'm not sure whether this qualifies as a major version and/or API version.

The way I see it:

You fixed inconsistencies and bugs
Those fixes deserve a mention in the API changelog, so we should bump its version
I don't think it requires a major bump for the API version (Handle querystring parameters as JSON encoded values. Fixes #1217 #1252 didn't :) )
It doesn't break anything on the Python side, so no major bump in the package version either

glasserc · 2017-06-08T14:41:07Z

I only have two concerns about HTTP API compatibility:

We break the technique shown in How to search for records having or not having a property? #344 (but that was never documented, so I'm not super worried about it)
We implicitly discourage the use of fields with the name has_foo

leplatrem · 2017-06-08T15:36:54Z

We break the technique shown in #344

With all due respect, it was very hacky.

We implicitly discourage the use of fields with the name has_foo

Well, I agree, but acceptable IMO since that's also true for other fields as shown in #1004

Natim · 2017-06-12T06:25:59Z

This is awesome, thank you for tackling it!

glasserc added 8 commits June 7, 2017 15:19

Needless strip()

6e0476d

json.dumps() will never return a value like '"....."' unless the input value was a string, which was just ruled out by the if statement above.

Add test to enforce consistent ordering

b99cdf3

Tests for new "has" operator

21cdf3c

This will be necessary to handle some test breakages that come from our handling of NULLs.

Introduce has operator

9f74439

We no longer need is_numeric

944f790

glasserc force-pushed the json-filters branch 2 times, most recently from 042fcfe to 63c3674 Compare June 7, 2017 20:01

Fix coverage and fix a bug found by coverage

17455d5

glasserc force-pushed the json-filters branch from 63c3674 to 17455d5 Compare June 7, 2017 20:09

leplatrem approved these changes Jun 8, 2017

Add tests about implicit casts (fixes Kinto#1217)

952199c

leplatrem mentioned this pull request Jun 8, 2017

WIP experiments around safe casts in postgresql backend (ref #1217) #1220

Closed

Merge pull request #2 from leplatrem/glasserc-json-filters

061f5b6

Additionnal tests about implicit casts

Rename to "schwartzian_transform"

1dba111

This is the name used on Wikipedia.

glasserc added 3 commits June 8, 2017 11:38

@leplatrem review

e653cab

Document new has_ and JSON values.

6299c3b

Update CHANGELOG

5358aff

glasserc force-pushed the json-filters branch from d28c028 to 5358aff Compare June 8, 2017 16:52

glasserc merged commit bad15bb into Kinto:master Jun 8, 2017

glasserc deleted the json-filters branch June 8, 2017 17:52

This was referenced Jun 8, 2017

Filter objects where a particular field contains an empty list #1216

Closed

Filter objects where a particular field is null #1215

Closed

How to search for records having or not having a property? #344

Closed

This was referenced Jun 8, 2017

Crash on dev ota server when requesting /records?min_target.version=53 #1217

Closed

Handling missing fields #1257

Closed

glasserc mentioned this pull request Nov 12, 2018

Add support for JSON format for in_ and exclude_ filters values #1877

Open
