Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow JSON filtering and clarify sort order #1258

Merged
merged 15 commits into from
Jun 8, 2017

Conversation

glasserc
Copy link
Contributor

@glasserc glasserc commented Jun 7, 2017

Fixes #1215, #1216, #1257, and possibly others.

Instead of coercing all values to strings when doing comparisons in Postgres, let Postgres use its own JSONB type comparison. This gets us 90% of the way to supporting JSON comparisons, including comparisons against null. The remaining part is about handling "missing" fields. I've made the executive decision that a "missing" value is not equal to any value of any type, including the JSON null. (Fortunately, this is easy in Postgres.) We handle these explicitly by adding field IS NOT NULL AND or field IS NULL OR as needed.

Because we already support sorting on a field with mixed-type values and/or missing values, we already implicitly have an ordering defined for comparisons like LT, GT, MAX, MIN, etc. This ordering puts missing values as "bigger" than other values. While we're messing around with comparisons involving NULL, fix this too.

We don't appear to have any clear recommendation about our ordering in the memory backend. To some extent, the tests assume that the ordering is the same. However, I am skeptical that we can always harmonize on the Postgres ordering. I think a more sensible approach is to allow each backend to define its own sort order, as long as it's consistent. For this reason, I didn't struggle with making the memory backend comparison function exactly the same as the Postgres one does.

  • Add documentation.
  • Add tests.
  • Add a changelog entry.
  • (n/a) Add your name in the contributors file.
  • If you changed the HTTP API, update the API_VERSION constant and add an API changelog entry in the docs

r? @leplatrem, @Natim

I'm pretty confident in the approach, but I'm not sure whether this qualifies as a major version and/or API version.

@glasserc
Copy link
Contributor Author

glasserc commented Jun 7, 2017

I've added some commits to implement a has_ operator, addressing #344 directly, since the change in how we handle missing fields breaks a test each in test_views_groups and test_views_collections. I know this is kind of outside the scope of this PR, but it touches a lot of the same code and benefits from the new cleaner implementation. If you'd rather, I can separate out the commits for that feature.

The storage tests fail hard in Postgres because everything is being
cast to a string. Filters mostly work in memory except for None
because of special handling that imitates how Postgres works. The view
test passes because it works against memory, it's just a sanity check.
json.dumps() will never return a value like '"....."' unless the input
value was a string, which was just ruled out by the if statement above.
(Or, you could say we "sorted out" the comparison order...)

This isn't necessarily the best possible sort order, but it's the one
that Postgres supports natively, so let's capitalize on that. We may
have to allow different storage backends to define their own sort
order, but we have a couple tests that rely on sort
order (test_get_all_can_deal_with_none_values and
test_get_all_can_filter_with_numeric_values) so let's just go with
this one.
This is explicitly the same as the Postgres backend, at least as a
first approximation. The full complete comparison order isn't
implemented, and I expect probably won't ever be implemented as I
think the only logical way to handle different storage backends is to
allow them to enforce their own sort order. However, a couple of tests
rely on this sort order for all sort backends, so enforce it.
This will be necessary to handle some test breakages that comes from
our handling of NULLs.
@glasserc glasserc force-pushed the json-filters branch 2 times, most recently from 042fcfe to 63c3674 Compare June 7, 2017 20:01
Copy link
Contributor

@leplatrem leplatrem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent ❤️

I reviewed the whole set of changes thoroughly and couldn't find anything but micronits.

Note: Handling of ?value=null is possible because of #1252

I wish we could really avoid having different API behaviours depending on the storage backend that is used. Maybe we could open a dedicated issue and act later.

Thank you very much for tackling this with brio :)

r+ with docs/changelog :)

if matches:
yield record


def schwartz_transform(value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, did not know about this name :)

btw, on wikipedia it's called schwartzian transform

@@ -686,6 +686,7 @@ def _format_conditions(self, filters, id_field, modified_field,
holders = {}
for i, filtr in enumerate(filters):
value = filtr.value
query_is_like = filtr.operator == COMPARISON.LIKE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

µnit: boolean values usually start with is_ or has_... what about is_like_operator ?

COMPARISON.IN,
# Nor can they be LIKE anything.
COMPARISON.LIKE,
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

µnit: in python multi-lines if are not recommended. Would it be better to have two intermediary variables like null_comparable_ops and null_incomparable_ops ?

sorting = [Sort('author', 1)]
records, _ = self.storage.get_all(sorting=sorting, **self.storage_kw)
# Some interesting values to compare against
VALUES = ['A', 'Z', '', 0, 4]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

µnit: I don't think it's necessary to have uppercase here

@leplatrem
Copy link
Contributor

I'm not sure whether this qualifies as a major version and/or API version.

The way I see it:

Additionnal tests about implicit casts
@glasserc
Copy link
Contributor Author

glasserc commented Jun 8, 2017

I only have two concerns about HTTP API compatibility:

This is the name used on Wikipedia.
@leplatrem
Copy link
Contributor

We break the technique shown in #344

With all due respect, it was very hacky.

We implicitly discourage the use of fields with the name has_foo

Well, I agree, but acceptable IMO since that's also true for other fields as shown in #1004

@Natim
Copy link
Member

Natim commented Jun 12, 2017

This is awesome thank you for tackling it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filter objects where a particular field is null
3 participants