Adding grammar for v0.10.0 #66

fekad · 2019-10-27T14:02:40Z

Thanks to the previous work of @waychal, this is a PR for the grammar of the v0.10.0 specification. This grammar maps the original specification of the "Filter Language EBNF Grammar" into lark format. I applied some simplifications (like ignoring white space and using some definitions from the common library), but all of them just make the filters more robust.

Note: It also contains all the optional features, but they can be ignored during the transformation by using the __default__ method of the Transformer classes.

defining the grammar for the filter
fixing grammar version (allowing multiple variants of the same version eg.: v0.10.0.g and v0.10.0.elastic.g)
changing the extension of the grammar file from .g to .lark for allowing syntax highlights
extending the test cases
create a reference implementation for the Transformer class
creating new PRs with the suggestions

codecov · 2019-10-27T14:04:21Z

Codecov Report

Merging #66 into master will increase coverage by 2.07%.
The diff coverage is 89.21%.

@@            Coverage Diff             @@
##           master      #66      +/-   ##
==========================================
+ Coverage   82.24%   84.31%   +2.07%     
==========================================
  Files          33       35       +2     
  Lines        1667     2098     +431     
==========================================
+ Hits         1371     1769     +398     
- Misses        296      329      +33

Impacted Files	Coverage Δ
optimade/filtertransformers/json.py	`0% <ø> (ø)`	⬆️
optimade/filtertransformers/debug.py	`0% <0%> (ø)`
...imade/filtertransformers/tests/test_transformer.py	`100% <100%> (ø)`	⬆️
optimade/filtertransformers/tests/test_django.py	`88.88% <100%> (ø)`	⬆️
optimade/server/entry_collections.py	`83.6% <100%> (+1.1%)`	⬆️
optimade/filterparser/tests/test_filterparser.py	`100% <100%> (ø)`	⬆️
optimade/filterparser/__init__.py	`100% <100%> (ø)`	⬆️
...ade/filtertransformers/tests/test_elasticsearch.py	`80.64% <100%> (ø)`	⬆️
optimade/server/tests/test_server.py	`87.93% <80.68%> (-12.07%)`	⬇️
optimade/filtertransformers/mongo.py	`76.47% <83.72%> (+5.88%)`	⬆️
... and 7 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b13e11c...cc6685b.

fekad · 2019-10-28T10:38:09Z

Questions about the grammar

For the property/IDENTIFIER, should we stick to the spec and allow lowercase only, or should we use the more robust CNAME (which is: ("_" | UCASE_LETTER | LCASE_LETTER) ("_" | UCASE_LETTER | LCASE_LETTER | DIGIT)*) and lower the case in the transformer? For example, what would be the expected behaviour/mongo filter for the NOTICE=val query string:
- raise an error because it is not a valid property (according to the specification this should happen)
- {'NOTICE': {'$eq': 'val'}}
- {'ice': {'$not': {'$eq': 'val'}}}
- {'ICE': {'$not': {'$eq': 'val'}}} (this is the result of the current implementation)
Decision: allow lowercase characters only, as defined in the spec. So NOTICE=val raises an error, NOTice=val is equivalent to {'ice': {'$not': {'$eq': 'val'}}}, but "NOTice"=val is equivalent to {'val': {'$eq': 'NOTice'}}.
Is the space mandatory or optional after a token like NOT or AND? Decision: optional.
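
To make the decision above concrete, here is a minimal tokenizer sketch using only Python's `re` module. It is not the lark grammar from this PR; the token names and patterns are assumptions chosen for illustration. It shows why NOTice=val splits into NOT followed by the identifier ice, while NOTICE=val cannot be tokenized once identifiers are lowercase-only.

```python
import re

# Hypothetical token patterns for illustration; not copied from the PR's .lark file.
TOKEN_RE = re.compile(
    r"""
    \s*(?:
        (?P<KEYWORD>NOT|AND|OR)               # keywords, trailing space optional
      | (?P<IDENTIFIER>[a-z_][a-z_0-9]*)      # lowercase letters, digits and "_" only
      | (?P<OPERATOR><=|>=|!=|<|>|=)
      | (?P<STRING>"[^"]*")
    )
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split a filter string into (token_type, token_text) pairs."""
    text = text.rstrip()
    position, tokens = 0, []
    while position < len(text):
        match = TOKEN_RE.match(text, position)
        if match is None:
            raise ValueError(
                f"unexpected character {text[position]!r} at position {position}"
            )
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        position = match.end()
    return tokens
```

With these assumed patterns, `tokenize("NOTice=val")` yields the keyword NOT followed by the identifier ice, `tokenize('"NOTice"=val')` starts with a string token, and `tokenize("NOTICE=val")` raises a `ValueError` at the uppercase I.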

The effect of the following optional rules is not obvious:

identifier <operator> identifier
constant <operator> constant

ml-evs · 2019-10-28T10:54:27Z

This is great stuff @fekad , please request reviews when you're ready so we can get this in!

fekad · 2019-10-28T11:57:49Z

Suggestions/ideas/questions about the specification

We can also suggest the following tiny modification(s) in the spec:

Handling the '_' character separately from the LowercaseLetter, by removing it from the definition of LowercaseLetter:

LowercaseLetter =
    'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 
    'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' |
    'y' | 'z' 
;

and adding it explicitly where we want to use it:

Identifier = ( '_' | LowercaseLetter ), { '_' | LowercaseLetter | Digit }, [Spaces] ;

Punctuator =
    '!' | '#' | '$' | '%' | '&' | "'" | '(' | ')' | '*' | '+' | ',' |
    '-' | '.' | '/' | ':' | ';' | '<' | '=' | '>' | '?' | '@' | '[' |
    ']' | '^' | '`' | '{' | '|' | '}' | '~' | '_'
;

There are small discrepancies between the names used in spec and the grammar in the appendix. Eg.:
property VS identifier at Substring Comparisons
Constant vs. Value
renaming ConstantFirstComparison to PropertyLastComparison
PredicateComparison = LengthComparison looks redundant.
using repetition instead of recursion:
Expression = ExpressionClause, { OR, ExpressionClause } ;
ExpressionClause = ExpressionPhrase, { AND, ExpressionPhrase } ;

Merging: Comparison into ExpressionPhrase:

ExpressionPhrase = [ NOT ], ( PropertyFirstComparison | PropertyLastComparison | 
LengthComparison | OpeningBrace, Expression, ClosingBrace );

LengthComparison = LENGTH, Property, [ Operator ], Number ; (changes: the Operator becomes optional, Value -> Number)
Operator = ( '<', [ '=' ] | '>', [ '=' ] | [ '!' ], '=' ), [Spaces] ;

Same as "LIKE operators" (OPTIMADE#87): adding REGEXP:

FuzzyStringOpRhs = ( CONTAINS | STARTS, [ WITH ] | ENDS, [ WITH ] | REGEXP ), String ;

As far as I know, the EBNF standard allows terminals to be defined as strings:
```
AND = 'AND', [Spaces] ;
NOT = 'NOT', [Spaces] ;
etc ...
```
Represent all the tokens in all caps in the grammar (like: Property, Operator), just to make it easier for developers to separate the tokens from the rules.
The implementation of ConstantFirstComparison = Constant, ValueOpRhs ; is quite ugly because the "value" in ValueOpRhs is the property and the Constant is the value, so we have to swap them. It works, but it is not elegant.
"5.2.5 Nested property names" and "5.2.6 Filtering on relationships" could be merged into "5.1 Lexical Tokens".
extra '`' character in the line: LENGTH list value`: applies the numeric comparison for the number of items in the list property.
typo at:
Examples:
- :property:_exmpl_formula_sum (a property specific to that database)
- :property:_exmpl_band_gap
- :property:_exmpl_supercell
- :property:_exmpl_trajectory
- :property:_exmpl_workflow_id

the definition of "value" vs "value may equal operator value" is a little bit confusing in the 5.2.4 section

set_op_rhs: HAS ( [ OPERATOR ] value
                 | ALL value_list
                 | ANY value_list
                 | ONLY value_list )
value_list: [ OPERATOR ] value ( "," [ OPERATOR ] value )*
VS 
set_op_rhs: HAS [ ALL | ANY | ONLY] value_list

Examples for "5.2.4 Comparisons of list properties":

list HAS 3:  ...
list HAS ALL 3: ...    
OPTIONAL: list HAS < 3: matches all entries for which list contains at least one element that is less than three.
OPTIONAL: list HAS ALL < 3, > 3: matches only those entries for which list simultaneously contains at least one element less than three and one element greater than three.

typo: (* OperatorComparison operator tokens: *) -> (* Comparison operator tokens: *)
Should we propagate negation down in the tree? (e.g. {"a": {"$not": {"$lt": 3}}} can be simplified to {"a": {"$gte": 3}}) Note: I already combined multiple ORs and ANDs when they are next to each other.
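
The negation question above can be sketched as a small rewrite over a Mongo-style filter dictionary. This is an illustrative sketch, not the transformer from this PR; the function names are hypothetical. One caveat to weigh before adopting it: in MongoDB, `$not` also matches documents where the field is missing, so the rewritten filter is not strictly equivalent for such documents.

```python
# Map each comparison operator to its logical complement.
NEGATED = {
    "$lt": "$gte",
    "$gte": "$lt",
    "$gt": "$lte",
    "$lte": "$gt",
    "$eq": "$ne",
    "$ne": "$eq",
}


def push_down_not(condition):
    """Rewrite {"$not": {op: value}} as {negated_op: value} when it is safe to do so."""
    if len(condition) == 1 and isinstance(condition.get("$not"), dict):
        inner = condition["$not"]
        if len(inner) == 1:
            ((op, value),) = inner.items()
            if op in NEGATED:
                return {NEGATED[op]: value}
    # Anything more complex is left untouched.
    return condition


def simplify(query):
    """Apply push_down_not to every field condition of a flat query (hypothetical helper)."""
    return {
        field: push_down_not(cond) if isinstance(cond, dict) else cond
        for field, cond in query.items()
    }
```

For example, `simplify({"a": {"$not": {"$lt": 3}}})` returns `{"a": {"$gte": 3}}`, while conditions that do not consist of a single negated comparison pass through unchanged.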

(Just an idea) New WIP PR to determine how we should handle/implement the queries for the list properties

Correlated positions in multiple arrays: As far as I know, MongoDB has no way to handle different but "correlated" arrays (that is, there is no way to iterate over them simultaneously and apply the requested operation to each of them separately). "$elemMatch" can be used in the special case when the property is the same (e.g. list:list HAS >=2, <=5, or list HAS ONE >=2, <=5 in the new syntax). A similar problem is filtering an array by a different expression for each column.
Separate the cases of strings from operator + value. (?)
Draft (ANY is optional):
[ ANY | EVERY ] property HAS [ ANY | ONE | ALL ] oper1 value1, oper2 value2, ...

|  | ANY | ONE / BOTH | ALL |
| --- | --- | --- | --- |
| ANY | At least one element of the property matches at least one of the expressions. `list HAS value`, `list HAS ANY values` | At least one element of the property matches all of the expressions. `list:list HAS val1:val2` | Each expression matches at least one element of the property. `list HAS ALL values` |
| EVERY | Each element of the property matches at least one of the expressions. `list HAS ONLY values` | Each element of the property matches all of the expressions. |  |

property HAS oper1 value1, oper2 value2, ...
property HAS ONE oper1 value1, oper2 value2, ...
property HAS ALL oper1 value1, oper2 value2, ...

EVERY property HAS oper1 value1, oper2 value2, ...
EVERY property HAS BOTH oper1 value1, oper2 value2, ...
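
The semantics in the draft table can be written down with Python's `any` and `all`. This is only a sketch of one possible reading of the draft; the function name `has`, its parameters, and the `(operator, value)` pair encoding are assumptions for illustration. The EVERY/ALL cell is empty in the table, so the sketch rejects that combination.

```python
import operator

# Comparison symbols used in the draft syntax, mapped to Python functions.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "!=": operator.ne,
}


def matches(element, expression):
    """Check one list element against one (symbol, value) expression."""
    symbol, value = expression
    return OPERATORS[symbol](element, value)


def has(elements, expressions, quantifier="ANY", mode="ANY"):
    """Evaluate '[ANY | EVERY] property HAS [ANY | ONE | BOTH | ALL] expressions'."""
    if mode == "ALL":
        if quantifier != "ANY":
            raise ValueError("EVERY ... HAS ALL is left undefined in the draft table")
        # Each expression must be matched by at least one element.
        return all(any(matches(el, ex) for el in elements) for ex in expressions)
    # ANY: one matching expression is enough; ONE / BOTH: all must match the same element.
    per_element = any if mode == "ANY" else all
    # ANY property: one qualifying element is enough; EVERY property: all must qualify.
    over_elements = any if quantifier == "ANY" else all
    return over_elements(
        per_element(matches(el, ex) for ex in expressions) for el in elements
    )
```

For the expressions `>=2, <=5`, the list `[1, 7]` satisfies `HAS ALL` (7 is at least 2 and 1 is at most 5) but not `HAS ONE`, because no single element lies in the range.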

Note: Please feel free to modify/extend this list!

fekad · 2019-10-28T12:15:13Z

> This is great stuff @fekad , please request reviews when you're ready so we can get this in!

@ml-evs Please feel free to review/modify/add stuff to this branch anytime (that was the main reason to use a branch instead of my own repo :)). There is also a new DebugTransformer class which could be useful for testing. I'm going to try to create a transformer for pymongo, but I don't have experience with either Django or Elasticsearch.

CasperWA · 2019-10-28T14:20:58Z

> We can also suggest the following tiny modification(s) in the spec:
> * Handling the `'_'` character separately from the `LowercaseLetter`, by removing it from the definition of `LowercaseLetter`:

This seems like a very straight-forward and logical suggestion.
Please make a PR in the spec repo for this! :) - referencing your comment here.

I would probably put @sauliusg, @merkys, and @rartino as reviewers.

CasperWA

A couple of comments more :)

And by the way, for the run.py, I actually do not mind having it in. Indeed, if anyone decides to develop a server in a Windows environment, it may be prudent to not have a bash script to start the server as a standard.
But it should be one or the other. And if it's run.py, we need to update .travis.yml. Thinking more about this, I will make an issue for this and a separate PR, since it's not within the scope of the current one.

CasperWA · 2019-11-14T10:54:07Z

optimade/filterparser/__init__.py

 from .lark_parser import LarkParser, ParserError
+
+__all__ = [LarkParser, ParserError]


To make this more dynamic, you could do a * import and specify the __all__ in lark_parser.py.
This __all__ then becomes lark_parser.__all__.
In this way, if you decide to reveal other classes from lark_parser, you'll add them to that file's __all__ instead of the __init__.py.

See, e.g., here.

fekad Nov 14, 2019

Personally, I don't like the usage of * in imports. There are a lot of articles in favour of and against its usage (this is just the first match on Google). It is similar to the usage of global variables.
Of course, I will modify it as you suggested to keep the repo consistent.

CasperWA Nov 15, 2019

I understand. We should at some point reach a consensus on what we do in this package.
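
As background for the `__all__` discussion in this thread, the sketch below shows how a star import is controlled by the imported module's `__all__`, and that `__all__` is expected to hold names as strings rather than the objects themselves. The module `demo_lark_parser` is a hypothetical in-memory stand-in, not the real `lark_parser.py`.

```python
import sys
import types

# Hypothetical stand-in for lark_parser.py, built in memory so the example is self-contained.
demo_module = types.ModuleType("demo_lark_parser")
exec(
    "class LarkParser: pass\n"
    "class ParserError(Exception): pass\n"
    "class InternalHelper: pass\n"
    "__all__ = ['LarkParser', 'ParserError']\n",
    demo_module.__dict__,
)
sys.modules["demo_lark_parser"] = demo_module

# A star import only brings in the names listed in the module's __all__.
namespace = {}
exec("from demo_lark_parser import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))

# If __all__ holds the objects instead of their names, the star import fails.
demo_module.__all__ = [demo_module.LarkParser, demo_module.ParserError]
star_import_error = None
try:
    exec("from demo_lark_parser import *", {})
except TypeError as error:
    star_import_error = error
```

Here `exported` contains only `LarkParser` and `ParserError`, and the second import raises a `TypeError`. Whichever import style the package settles on, listing the names as strings keeps `__all__` usable.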

CasperWA

The last * import is not crucial for the merging of this PR, only for consistency with the rest of the repository.
So I vote to just get this in now, so that we can start developing our Transformers :)

Thank you immensely for this contribution @fekad 👍

Before merging, I would like the approval of @ml-evs as well - or I will at least leave a time-window of opportunity before merging 😅

ml-evs · 2019-11-15T13:38:05Z

Working on it! I think you've sorted out any requested changes to the code, so I'll just focus on double checking it all works in the example server. Would you mind commenting if you get a chance @fekad? I'm sure most of them are my fault... Current issues:

It seems like this might break the pagination at the moment; should http://localhost:5000/structures?filter=elements%20HAS%20%22Ac\%22?page_limit=5 work?
What am I doing wrong with this one? http://localhost:5000/structures?filter=species_at_sites%20HAS%20ALL%20%22Ba%22,%22F%22,%22H%22,%22Mn%22,%22O%22,%22Re%22,%22Si%22
I'm struggling to get the list grammar LENGTH, HAS ANY and HAS ALL working, are these supposed to work? (I may have missed it in the comments above). I'm just pushing my tests now so you can actually see what is failing (the request is in the error messages)

CasperWA · 2019-11-15T17:08:32Z

> Working on it! I think you've sorted out any requested changes to the code, so I'll just focus on double checking it all works in the example server. Would you mind commenting if you get a chance @fekad? I'm sure most of them are my fault... Current issues:
>
> It seems like this might break the pagination at the moment; should http://localhost:5000/structures?filter=elements%20HAS%20%22Ac\%22?page_limit=5 work?
>
> What am I doing wrong with this one? http://localhost:5000/structures?filter=species_at_sites%20HAS%20ALL%20%22Ba%22,%22F%22,%22H%22,%22Mn%22,%22O%22,%22Re%22,%22Si%22
>
> I'm struggling to get the list grammar LENGTH, HAS ANY and HAS ALL working, are these supposed to work? (I may have missed it in the comments above). I'm just pushing my tests now so you can actually see what is failing (the request is in the error messages)

As far as I can see, these are all valid queries (if I infer some % terminology). The only HAS that is currently implemented is property HAS value. Nothing else. Also LENGTH is not implemented.

ml-evs · 2019-11-15T18:57:13Z

> > Working on it! I think you've sorted out any requested changes to the code, so I'll just focus on double checking it all works in the example server. Would you mind commenting if you get a chance @fekad? I'm sure most of them are my fault... Current issues:
> >
> > It seems like this might break the pagination at the moment; should http://localhost:5000/structures?filter=elements%20HAS%20%22Ac\%22?page_limit=5 work?
> >
> > What am I doing wrong with this one? http://localhost:5000/structures?filter=species_at_sites%20HAS%20ALL%20%22Ba%22,%22F%22,%22H%22,%22Mn%22,%22O%22,%22Re%22,%22Si%22
> >
> > I'm struggling to get the list grammar LENGTH, HAS ANY and HAS ALL working, are these supposed to work? (I may have missed it in the comments above). I'm just pushing my tests now so you can actually see what is failing (the request is in the error messages)
>
> As far as I can see, these are all valid queries (if I infer some % terminology). The only HAS that is currently implemented is property HAS value. Nothing else. Also LENGTH is not implemented.

Ah okay, I was trying to match the grammar tests, but I guess you mean it's not implemented in the transformer? In that case I'm happy to add skips to these tests until we have them in. They do currently raise Lark errors rather than the NotImplementedError I'd expect from the code.

I assume the same thing doesn't apply to the pagination though?

ml-evs · 2019-11-15T19:14:32Z

A slight curiosity is that IS KNOWN seems to fail for aliased fields, e.g. id IS KNOWN and chemical_formula_descriptive IS KNOWN do not work, but lattice_vectors IS KNOWN works fine. I think I've skipped all the tests that we wouldn't expect to work now, so any remaining failures need to be investigated further.

fekad · 2019-11-15T21:55:51Z

> A slight curiosity is that IS KNOWN seems to fail for aliased fields, e.g. id IS KNOWN and chemical_formula_descriptive IS KNOWN do not work, but lattice_vectors IS KNOWN works fine. I think I've skipped all the tests that we wouldn't expect to work now, so any remaining failures need to be investigated further.

Hi, thanks for the test cases and the review @ml-evs @CasperWA. This bug is actually caused by the _alias_filter function and StructureMapper, because they ignore lists when there are AND or OR relations. Of course, lattice_vectors works because it doesn't need to be mapped.

More about this here: #66 (comment)

fekad · 2019-11-15T23:18:47Z

> > > Working on it! I think you've sorted out any requested changes to the code, so I'll just focus on double checking it all works in the example server. Would you mind commenting if you get a chance @fekad? I'm sure most of them are my fault... Current issues:
> > >
> > > It seems like this might break the pagination at the moment; should http://localhost:5000/structures?filter=elements%20HAS%20%22Ac\%22?page_limit=5 work?
> > >
> > > What am I doing wrong with this one? http://localhost:5000/structures?filter=species_at_sites%20HAS%20ALL%20%22Ba%22,%22F%22,%22H%22,%22Mn%22,%22O%22,%22Re%22,%22Si%22
> > >
> > > I'm struggling to get the list grammar LENGTH, HAS ANY and HAS ALL working, are these supposed to work? (I may have missed it in the comments above). I'm just pushing my tests now so you can actually see what is failing (the request is in the error messages)
> >
> > As far as I can see, these are all valid queries (if I infer some % terminology). The only HAS that is currently implemented is property HAS value. Nothing else. Also LENGTH is not implemented.
>
> Ah okay, I was trying to match the grammar tests, but I guess you mean it's not implemented in the transformer? In that case I'm happy to add skips to these tests until we have them in. They do currently raise Lark errors rather than the NotImplementedError I'd expect from the code.

Yeah, only the implementation of the Transformer is missing; the grammar/parser is there.

Unfortunately, the Lark code catches my NotImplementedError and raises its own error.

> I assume the same thing doesn't apply to the pagination though?

This was a tricky one :). You have to use & to combine multiple parameters in the URL, and there cannot be white space around the & character. Finally, in the test, data_returned has to be checked instead of data_available.
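
The point about `&` can be illustrated with the standard library's `urllib.parse`. This is a minimal sketch; the base URL and parameter values are assumptions modelled on the requests quoted in this thread, not taken from the test suite.

```python
from urllib.parse import parse_qs, quote, urlencode

# Hypothetical base URL and parameters, modelled on the requests quoted above.
base_url = "http://localhost:5000/structures"
params = {"filter": 'elements HAS "Ac"', "page_limit": 5}

# urlencode joins the parameters with "&" and percent-encodes each value;
# quote_via=quote encodes spaces as %20 instead of "+".
query_string = urlencode(params, quote_via=quote)
url = f"{base_url}?{query_string}"

# Parsing the query string back recovers the two parameters separately.
parsed = parse_qs(query_string)
```

Building the query string this way avoids a second `?` and stray white space around `&`, so `filter` and `page_limit` arrive at the server as two distinct parameters.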

ml-evs

Thanks @fekad , I was definitely being stupid for a few of those, but the fixes/responses look good. Happy to accept as is, but I'll leave merging to you!

CasperWA

Could you clarify what has happened concerning the predicate_comparison, i.e., should we simply merge here, and if this PR will change (again) before being merged, we "bug-fix" the grammar/transformer? Or should we wait until the spec. PR has been merged?

In my opinion, since we do not have 1 commit per 1 concept, it doesn't matter. More concretely, since the v0.10.0 grammar addition cannot be pinpointed to a specific commit, but rather has been implemented over several commits (and will be merged in like this via this PR), another commit to alter it after this PR has been merged, will not make a difference in the git history.

What do you think @fekad and @ml-evs? If we all agree, I will approve and merge.

CasperWA · 2019-11-19T20:15:48Z

Since there has been no response, I will merge this and take it that you agree; we can "bug-fix" when the spec. PR has been merged, if needed.

Adding grammar for v0.10.0

6126433

fekad added 6 commits October 27, 2019 20:24

Allowing multiple grammar variants for the same version

b1c7e49

using special grammar variant for the parser

e4203ac

using '.lark' extensions for the grammar files

f4ffdb0

fixing warning

04dabd8

add more examples for testing the filter parser

8c48815

cleanup

62a811a

fekad added 3 commits October 28, 2019 11:11

creating a transformer for debugging

296dbc9

force to keep arguments for operator

dd86a45

adding rule for strings and numbers

4a5d135

fekad added 2 commits October 28, 2019 12:06

separate int and float in the rule for numbers

ae1ad6b

cleanup

e0c426c

CasperWA reviewed Oct 28, 2019

optimade/filterparser/lark_parser.py

CasperWA reviewed Oct 28, 2019

optimade/filtertransformers/json.py

fekad added 2 commits October 28, 2019 15:27

combining tokens to reduce complexity

1e2b38e

first working filter example

7644b58

CasperWA reviewed Oct 28, 2019

optimade/grammar/v0.10.0.lark

fekad added 7 commits October 28, 2019 19:51

local tests + combining multiple ANDs and ORs

7d8312a

cleanup

913c626

update version of lark-parser + fix test

f8f077a

adding tests

572b468

fix precedence for the test case

43f6844

fixing precedence by parentheses

bbaed20

installing python package in order

415443a

fekad and others added 5 commits November 14, 2019 01:15

Update optimade/server/entry_collections.py

d1999a4

Co-Authored-By: Casper Welzel Andersen <43357585+CasperWA@users.noreply.github.com>

Update optimade/filterparser/tests/test_filterparser.py

ac9e3df

Co-Authored-By: Casper Welzel Andersen <43357585+CasperWA@users.noreply.github.com>

Update optimade/filterparser/tests/test_filterparser.py

6e9c1ea

Co-Authored-By: Casper Welzel Andersen <43357585+CasperWA@users.noreply.github.com>

cleanup

195a630

optimized test class

17b8e68

CasperWA mentioned this pull request Nov 14, 2019

Package structure #72

Closed

CasperWA reviewed Nov 14, 2019

CasperWA mentioned this pull request Nov 14, 2019

Central place with a list of queries for transformer tests #80

Open

CasperWA previously approved these changes Nov 15, 2019

Added filter integration tests with example server

3f092de

ml-evs dismissed CasperWA’s stale review via 3f092de November 15, 2019 14:46

ml-evs and others added 2 commits November 15, 2019 19:21

Skip some tests for unimplemented features

85e9ba5

fix type mismatch warning

c352196

fekad added 2 commits November 15, 2019 22:48

bugfix for the HAS filter

46af4fa

fixing page_filter issue

78fbc39

fekad added 2 commits November 15, 2019 23:34

quick dirty fix for "deep" queries

b599d80

minimal implementation of LENGTH filter

cc6685b

ml-evs approved these changes Nov 17, 2019

CasperWA reviewed Nov 18, 2019

optimade/filtertransformers/mongo.py (outdated)

optimade/filtertransformers/mongo.py

optimade/filtertransformers/mongo.py

CasperWA merged commit 2401873 into master Nov 19, 2019

Road to optimade-python-tools 1.0 automation moved this from In progress to Done Nov 19, 2019

CasperWA deleted the filter_v0.10.0 branch November 19, 2019 20:19

ml-evs mentioned this pull request Nov 28, 2019

List properties and HAS _ operators missing #98

Closed

4 tasks
