Backend filtering refactor #921
Conversation
Also includes implementation for converting to SA filter spec.
Apparently dataclass defaults don't carry over from mixins.
I was wrong. Apparently I just got the order of mixin application wrong.
There's nothing faux about it.
The changes discussed above had some interesting knock-on effects and I ended up rethinking some things:
The [...]
Request for feedback: Currently the [...]
from abc import ABC, abstractmethod

from sqlalchemy import column

# `hints` below refers to the hints module defined further down in this comment.


class DbFunction(ABC):
    id = None
    name = None
    hints = None

    def __init__(self, parameters):
        if self.id is None:
            raise ValueError('DbFunction subclasses must define an ID.')
        if self.name is None:
            raise ValueError('DbFunction subclasses must define a name.')
        self.parameters = parameters

    @property
    def referenced_columns(self):
        """Walks the expression tree, collecting referenced columns.

        Useful when checking if all referenced columns are present in the
        queried relation."""
        columns = set([])
        for parameter in self.parameters:
            if isinstance(parameter, ColumnReference):
                columns.add(parameter.column)
            elif isinstance(parameter, DbFunction):
                columns.update(parameter.referenced_columns)
        return columns

    @staticmethod
    @abstractmethod
    def to_sa_expression():
        return None


class ColumnReference(DbFunction):
    id = 'column_reference'
    name = 'Column Reference'
    hints = tuple([
        hints.parameter_count(1),
        hints.parameter(1, hints.column),
    ])

    @property
    def column(self):
        return self.parameters[0]

    @staticmethod
    def to_sa_expression(p):
        return column(p)


class List(DbFunction):
    id = 'list'
    name = 'List'

    @staticmethod
    def to_sa_expression(*ps):
        return list(ps)


class Empty(DbFunction):
    id = 'empty'
    name = 'Empty'
    hints = tuple([
        hints.returns(hints.boolean),
        hints.parameter_count(1),
    ])

    @staticmethod
    def to_sa_expression(p):
        return p.is_(None)
[...]
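To illustrate the pattern, here is a hedged sketch (not code from this PR) of what another subclass might look like; the Greater function and its SQLAlchemy translation are assumptions, chosen to mirror the hints shown for the lesser entry in the endpoint listing below.

class Greater(DbFunction):
    id = 'greater'
    name = 'Greater'
    hints = tuple([
        hints.returns(hints.boolean),
        hints.parameter_count(2),
        hints.all_parameters(hints.comparable),
    ])

    @staticmethod
    def to_sa_expression(p1, p2):
        # SQLAlchemy column elements support Python comparison operators,
        # so this produces a "greater than" SQL expression.
        return p1 > p2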
[
{
"id": "and",
"name": "And",
"hints": [
{
"id": "returns",
"hints": [
{
"id": "boolean"
}
]
}
]
},
{
"id": "column_reference",
"name": "Column Reference",
"hints": [
{
"id": "parameter_count",
"count": 1
},
{
"id": "parameter",
"index": 1,
"hints": [
{
"id": "column"
}
]
}
]
},
{
"id": "empty",
"name": "Empty",
"hints": [
{
"id": "returns",
"hints": [
{
"id": "boolean"
}
]
},
{
"id": "parameter_count",
"count": 1
}
]
},
{
"id": "equal",
"name": "Equal",
"hints": [
{
"id": "returns",
"hints": [
{
"id": "boolean"
}
]
},
{
"id": "parameter_count",
"count": 2
}
]
},
{
"id": "extract_uri_authority",
"name": "Extract URI Authority",
"hints": [
{
"id": "parameter_count",
"count": 1
},
{
"id": "parameter",
"index": 1,
"hints": [
{
"id": "uri"
}
]
}
]
},
{
"id": "in",
"name": "In",
"hints": [
{
"id": "returns",
"hints": [
{
"id": "boolean"
}
]
},
{
"id": "parameter_count",
"count": 2
},
{
"id": "parameter",
"index": 2,
"hints": [
{
"id": "array"
}
]
}
]
},
{
"id": "lesser",
"name": "Lesser",
"hints": [
{
"id": "returns",
"hints": [
{
"id": "boolean"
}
]
},
{
"id": "parameter_count",
"count": 2
},
{
"id": "all_parameters",
"hints": [
{
"id": "comparable"
}
]
}
]
},
{
"id": "list",
"name": "List",
"hints": null
},
{
"id": "not",
"name": "Not",
"hints": [
{
"id": "returns",
"hints": [
{
"id": "boolean"
}
]
},
{
"id": "parameter_count",
"count": 1
}
]
},
[...]
]

The frontend will look at the available functions via the above endpoint; it will look at the hints defined on those functions and decide how to suggest or constrain the user's choices when constructing a database expression. Suppose the user wants to define a filter. The frontend might make sure that the top-level function used returns a boolean (a filter is an expression that resolves to a boolean), and then that the nested expressions' return types are compatible with the nesting functions' parameter hints.

It's also worth noting that this expression language is maximally simple: everything is a function, even column references, literals and lists. An example filter expression in JSON might look like this:

{"in": [
{"column_reference": ["column1"]},
{"list": [
{"column_reference": ["column2"]},
{"to_lowercase": [
{"column_reference": ["column3"]},
]},
{"literal": ["foo"]},
{"literal": ["bar"]},
]},
]}

This would be equivalent to:

In([
    ColumnReference(["column1"]),
    List([
        ColumnReference(["column2"]),
        ToLowercase([ColumnReference(["column3"])]),
        Literal(["foo"]),
        Literal(["bar"]),
    ]),
])

Which would be equivalent to the following SQL:

column1 IN (column2, LOWER(column3), 'foo', 'bar')

On the backend, database function expressions will receive a minimum of checks. The aim is to constrain the user as little as possible. The frontend will be able to guide the user's choices by using the hint system (hints were previously called suggestions; "hints" is used because it's shorter). The hint system is very simple, but expressive. Here's the definition of hints on the backend:

from frozendict import frozendict
def _make_hint(id, **rest):
    return frozendict({"id": id, **rest})

def parameter_count(count):
    return _make_hint("parameter_count", count=count)

def parameter(index, *hints):
    return _make_hint("parameter", index=index, hints=hints)

def all_parameters(*hints):
    return _make_hint("all_parameters", hints=hints)

def returns(*hints):
    return _make_hint("returns", hints=hints)

boolean = _make_hint("boolean")
comparable = _make_hint("comparable")
column = _make_hint("column")
array = _make_hint("array")
string_like = _make_hint("string_like")
uri = _make_hint("uri")

This allows the backend to specify the number of parameters expected (where appropriate), to define hints for all parameters and/or for individual parameters, and to define hints for the return type. New hints are trivial to add.

The previous major revision of this refactor didn't require the frontend to know anything about the filter constructs. This revision does not require knowing anything about the functions, but it does require knowing about the hint system. That is the price of the simplicity and flexibility offered. For example, it will have to be hardcoded in the frontend that a filter expression is a nesting of database functions where the top-level one is hinted to return a boolean.

When it comes to mixing and matching database functions to data types, we'll define relevant hints on the data types. So the frontend, having looked up database functions and data types, will see that a [...]

@kgodey @mathemancer @silentninja does this make sense?
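For illustration only (the deserialization code isn't shown in this excerpt), turning the JSON form above into DbFunction instances and collapsing them into an SQLAlchemy expression could be done with two small recursive walks; the known_db_functions registry and the function names here are assumptions, not the PR's implementation:

# Hypothetical id -> class registry; the classes are the DbFunction
# subclasses defined earlier in this comment.
known_db_functions = {
    ColumnReference.id: ColumnReference,
    List.id: List,
    Empty.id: Empty,
    # ... and every other DbFunction subclass
}

def from_json(spec):
    # A JSON expression node is a single-key object: {"<function id>": [parameters]}.
    if isinstance(spec, dict):
        [(db_function_id, raw_parameters)] = spec.items()
        db_function_subclass = known_db_functions[db_function_id]
        return db_function_subclass([from_json(raw) for raw in raw_parameters])
    # Anything else (e.g. a column name or a literal's value) is passed through.
    return spec

def convert_to_sa_expression(node):
    # Convert parameters bottom-up, then apply the node's own conversion.
    if isinstance(node, DbFunction):
        sa_parameters = [convert_to_sa_expression(p) for p in node.parameters]
        return node.to_sa_expression(*sa_parameters)
    return node

With something along these lines, the {"in": [...]} example above would round-trip into the In([...]) object tree and then into the SQL shown.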
I think this all makes sense to me; I like how extensible and simple it is. The only thing that I'd like to add is:

[...]
Overall, I think this makes sense. Looking at this, I'm convinced that it does make sense to have the hints, etc. in with the [...]

@kgodey I thought we agreed (in some other spot on GH) that filters aren't (in general) subordinate to types, but rather the other way around. For example, I could filter on whether a string in column A is longer than the integer in column B. The associated function would take a string-like and an integer-like input, and return a boolean, so it wouldn't make sense to put it under either the [...]

The same issues apply to groupings for similar reasons. I think it makes more sense to keep the filters and other functions under some [...]
I agree with @mathemancer's suggestion to have it under [...]

@dmos62 Looks good, I've got a few questions:

[...]
@mathemancer I think this makes sense at the [...] I was suggesting leaving the [...]

This conversation is making me think that we should have separate API paths for our frontend-specific abstractions (Mathesar types + the filters they support + any other abstractions we come up with) and general DB functionality. I think that's out of scope for this issue, though.
Yeah, definitely. Hints fill that role. To say that a function can take strings as input, you give it a hint along the lines of string_like. [...]
[
{
"id": "character varying",
"name": "Character varying",
"hints": [
{
"id": "string_like"
}
]
},
[...]
]

So, when the frontend queries both the [...]
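To make that cross-referencing concrete, here's a hedged sketch (not Mathesar code; the helper names are made up) of checking whether a column's type hints satisfy a function's parameter hints:

def hint_ids(hints):
    return {hint["id"] for hint in (hints or [])}

def parameter_hints(function, index):
    # Collect hints applying to the parameter at `index` (1-based), from both
    # "all_parameters" and parameter-specific "parameter" hints.
    collected = []
    for hint in (function.get("hints") or []):
        if hint["id"] == "all_parameters":
            collected.extend(hint["hints"])
        elif hint["id"] == "parameter" and hint["index"] == index:
            collected.extend(hint["hints"])
    return collected

def type_satisfies_parameter(ma_type, function, index):
    # The type must provide every hint the parameter asks for.
    return hint_ids(parameter_hints(function, index)) <= hint_ids(ma_type.get("hints"))

# Entries mimicking the type listing above and the function listing earlier:
character_varying = {"id": "character varying", "hints": [{"id": "string_like"}]}
extract_uri_authority = {
    "id": "extract_uri_authority",
    "hints": [
        {"id": "parameter_count", "count": 1},
        {"id": "parameter", "index": 1, "hints": [{"id": "uri"}]},
    ],
}

# character varying isn't uri-hinted, so it can't feed extract_uri_authority:
assert not type_satisfies_parameter(character_varying, extract_uri_authority, 1)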
This only supports a single signature per function. If there's a database function with multiple signatures and we'd like to expose them, we'd do best to wrap them in separate DbFunctions (see the sketch below).

Note that this mechanism does not actually enforce following the hints. As far as [...] This is also a pain point of this concept, in my mind, since it might be difficult to fail gracefully or to show nice error messages. But that should only affect advanced users, unless we mess up the hints.
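As a purely hypothetical illustration of wrapping signatures separately (ROUND is just an assumed example of an overloaded database function, and none of this is from the PR; it builds on the DbFunction and hints sketches earlier in the thread):

from sqlalchemy import func

class Round(DbFunction):
    id = 'round'
    name = 'Round'
    hints = tuple([
        hints.parameter_count(1),
    ])

    @staticmethod
    def to_sa_expression(p):
        return func.round(p)

class RoundTo(DbFunction):
    id = 'round_to'
    name = 'Round To'
    hints = tuple([
        hints.parameter_count(2),
    ])

    @staticmethod
    def to_sa_expression(p, n):
        return func.round(p, n)

Each wrapper then has a single, unambiguous signature that the hints can describe.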
When both these hints are provided at the same time, I'd say that the more specific |
@dmos62 Okay, that makes sense. Please open a separate issue for the frontend to switch to the new APIs for filters; we can use it to discuss the API spec with the frontend team and make sure it works for them.
Closing this PR. Splitting it into two PRs. First, there will be a PR that introduces the new function API (borne out of iterating on this PR). Then, there will be a PR that replaces the old filtering API with the new function API.
Fixes #385, fixes #846
Sets up the backend infrastructure for filtering.
Features
Notice that I might use the terms filter specification and predicate interchangeably.
Some of the things this new predicate data structure does are:

- [...] (e.g. LIKE, compatibility with URI-type-specific SQL functions)
- [...] (empty, equal and in take a different number of params), and it can tell it what options it accepts (e.g. should starts_with be case insensitive)
- it can be converted to an sqlalchemy-filters spec

What it does not (currently) do:

- support comparing a column to another column, e.g. {equal: {column: x, parameter: {column: y}}}
Technical details
I organized the relevant pieces in the predicate data structure into a mixin hierarchy and also used this PR as a testbed for frozen dataclasses. Some of my objectives with the basic structure were:
- [...] e.g. a predicate relying on LIKE will directly or indirectly mix in ReliesOnLike, SingleParameter, Leaf and Predicate
- [...] correctness is declared in db/filters/base::assert_predicate_correct; and what the JSON filter specification is, is declared in db/filters/operations/deserialize
- [...] parameter where a singular parameter is expected, and parameters where a sequence is expected

How to extend with a new predicate
- Define the new predicate class, e.g. Greater (a sketch of the result follows this list); this includes defining the new Predicate's properties through mixins: ReliesOnComparability, SingleParameter, Leaf in this case (note that mixin order matters, see Python docs);
- Introduce the predicate type enum: LeafPredicateType.GREATER in this case; it's what the JSON filter spec will use to identify a predicate;
- Tell it how to generate an equivalent SQLAlchemy filter by implementing the abstract Predicate::to_sa_filter (as seen above);
- Update the correctness definition (db/filters/base::assert_predicate_correct), if needed; currently this involves finding the spot in the method's control flow tree that corresponds to this new predicate, adding a new type check, etc.;
- Add the new predicate to the all_predicates set;
- Update mathesar.database.types::is_ma_type_supported_by_predicate, if needed; this would involve declaring what types have the properties the new predicate depends on; in this PR it's: [...]

Notice that relies_on_comparability is an auxiliary function returning true when a predicate is a subclass of ReliesOnComparability.
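Purely as a sketch of the shape these steps describe (the base classes, fields and enum here are simplified assumptions based on the description, not the PR's actual code):

from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Any

from sqlalchemy import column

class LeafPredicateType(Enum):
    # Only the member needed for this example; the real enum has one per predicate.
    GREATER = "greater"

@dataclass(frozen=True)
class Predicate(ABC):
    column: str

    @abstractmethod
    def to_sa_filter(self):
        ...

@dataclass(frozen=True)
class Leaf(Predicate):
    pass

@dataclass(frozen=True)
class SingleParameter:
    parameter: Any  # singular 'parameter', per the naming convention above

class ReliesOnComparability:
    """Marker mixin: the predicate only makes sense for comparable types."""

@dataclass(frozen=True)
class Greater(ReliesOnComparability, SingleParameter, Leaf):
    type: LeafPredicateType = LeafPredicateType.GREATER

    def to_sa_filter(self):
        return column(self.column) > self.parameter

def relies_on_comparability(predicate_subclass) -> bool:
    return issubclass(predicate_subclass, ReliesOnComparability)

all_predicates = frozenset([Greater])  # the real set holds every predicate

# e.g. Greater(column="age", parameter=18).to_sa_filter() yields the SQL expression age > 18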
dataclasses and typing
As Kriti suggested, I'll start a dedicated discussion on the use of dataclasses and typing. But, to summarize:

- dataclasses eliminated custom boilerplate; I got immutability, __post_init__ and defaults without writing a single constructor; I think this is great since it's essentially standardized boilerplate;
- typing got me partial typing, which is great:
  - [...] (pyright) catches a lot of mistakes and conflicts
  - is_ma_type_supported_by_predicate operates on uninstantiated predicate classes, not instances, so to express that I can just say predicate: Type[Predicate] instead of predicate_subclass (see the snippet after this list); [...] Any if he prefers
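A tiny illustration of that Type[...] point, building on the sketch above (the body and the ma_type_is_comparable helper are hypothetical; the real function's logic isn't shown in this excerpt):

from typing import Type

def is_ma_type_supported_by_predicate(ma_type, predicate: Type[Predicate]) -> bool:
    # `predicate` is a class, not an instance; Type[Predicate] expresses exactly that,
    # and a checker like pyright will flag callers that pass an instance by mistake.
    if relies_on_comparability(predicate):
        return ma_type_is_comparable(ma_type)  # hypothetical helper, for illustration
    return True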