Adding grammar for v0.10.0 #66

Merged: 49 commits, merged on Nov 19, 2019

Commits:
6126433
Adding grammar for v0.10.0
Oct 27, 2019
b1c7e49
Allowing multiple grammar variants for the same version
Oct 27, 2019
e4203ac
using special grammar variant for the parser
Oct 27, 2019
f4ffdb0
using '.lark' extensions for the grammar files
Oct 27, 2019
04dabd8
fixing warning
Oct 27, 2019
8c48815
add more examples for testing the filter parser
Oct 27, 2019
62a811a
cleanup
Oct 27, 2019
296dbc9
creating a transformer for debugging
Oct 28, 2019
dd86a45
force to keep arguments for operator
Oct 28, 2019
4a5d135
adding rule for strings and numbers
Oct 28, 2019
ae1ad6b
separate int and float in the rule for numbers
Oct 28, 2019
e0c426c
cleanup
Oct 28, 2019
1e2b38e
combining tokens to reduce complexity
Oct 28, 2019
7644b58
first working filter example
Oct 28, 2019
7d8312a
local tests + combining multiple ANDs and ORs
Oct 28, 2019
913c626
cleanup
Oct 29, 2019
f8f077a
update version of lark-parser + fix test
Oct 29, 2019
572b468
adding tests
Oct 29, 2019
43f6844
fix precedence for the test case
Oct 29, 2019
bbaed20
fixing precedence by parentheses
Oct 29, 2019
415443a
installing python package in order
Oct 29, 2019
06dd3fe
formal definition of all rules (some of them just raise an NotImpleme…
Oct 29, 2019
5a41bc5
skeleton class for the transformer
Oct 29, 2019
465ec26
define targets for the tests + cleanup
Oct 29, 2019
a37691c
typo
Oct 29, 2019
9cf3e7b
boost code coverage
Oct 29, 2019
70c03b2
allowing lower-case characters only in the definition of identifiers
Oct 29, 2019
483939d
redefining operator as tokens instead of rules
Oct 30, 2019
0e47883
reverse operators when the property is on the right
Oct 31, 2019
7ba7769
Merge branch 'master' into filter_v0.10.0
fekad Nov 4, 2019
22ba26a
Merge branch 'master' into filter_v0.10.0
CasperWA Nov 8, 2019
e3656c7
Merge branch 'master' into filter_v0.10.0
CasperWA Nov 8, 2019
7349643
allowing white characters in strings
Nov 10, 2019
12fd19c
Merge branch 'master' into filter_v0.10.0
ml-evs Nov 11, 2019
fa9f849
Merge branch 'master' into filter_v0.10.0
ml-evs Nov 11, 2019
b9396e4
using repetition instead of recursion
Nov 12, 2019
d15330f
using the new parser and transformer
Nov 12, 2019
d1999a4
Update optimade/server/entry_collections.py
fekad Nov 14, 2019
ac9e3df
Update optimade/filterparser/tests/test_filterparser.py
fekad Nov 14, 2019
6e9c1ea
Update optimade/filterparser/tests/test_filterparser.py
fekad Nov 14, 2019
195a630
cleanup
Nov 14, 2019
17b8e68
optimized test class
Nov 14, 2019
3f092de
Added filter integration tests with example server
ml-evs Nov 15, 2019
85e9ba5
Skip some tests for unimplemented features
ml-evs Nov 15, 2019
c352196
fix type mismatch warning
Nov 15, 2019
46af4fa
bugfix for the HAS filter
Nov 15, 2019
78fbc39
fixing page_filter issue
Nov 15, 2019
b599d80
quick dirty fix for "deep" queries
Nov 15, 2019
cc6685b
minimal implementation of LENGTH filter
Nov 16, 2019
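One of the commits above reverses the comparison operator when the property sits on the right-hand side, so that a filter such as 5 < _exmpl_a can be handled as _exmpl_a > 5. The following is a minimal sketch of that idea, not the transformer code from this PR. The names REVERSED, normalize_comparison and looks_like_property are hypothetical, and telling properties from values with str.isidentifier() is a simplifying assumption.

```python
# Hypothetical mapping: the operator to use after swapping the two operands.
# "=" and "!=" are symmetric, so they map to themselves.
REVERSED = {
    "<": ">",
    ">": "<",
    "<=": ">=",
    ">=": "<=",
    "=": "=",
    "!=": "!=",
}


def normalize_comparison(lhs, op, rhs, is_property):
    """Return (property, operator, value) with the property on the left.

    `is_property` is a caller-supplied predicate telling whether an operand
    is a property name rather than a literal value.
    """
    if is_property(lhs):
        return lhs, op, rhs
    if is_property(rhs):
        # Swapping operands flips the direction of ordering comparisons.
        return rhs, REVERSED[op], lhs
    # Constant-only comparisons such as "5 < 7" are outside this sketch.
    raise ValueError("neither operand is a property")


def looks_like_property(token):
    # Simplified assumption: properties are identifier strings, values are not.
    return isinstance(token, str) and token.isidentifier()


print(normalize_comparison(5, "<", "_exmpl_a", looks_like_property))
# ('_exmpl_a', '>', 5)
```

Normalizing to property-first form lets a backend translate every comparison with a single code path instead of handling both operand orders.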
2 changes: 1 addition & 1 deletion .travis.yml
@@ -8,12 +8,12 @@ python:
   - "3.7"
   - "3.8"
 install:
-  - pip install -e .
   - pip install -r requirements.txt
   - pip install -r requirements/dev_requirements.txt
   - pip install -r requirements/mongo_requirements.txt
   - pip install -r requirements/django_requirements.txt
   - pip install -r requirements/elastic_requirements.txt
+  - pip install -e .
   - docker pull quen2404/openapi-diff
 script:
   - py.test --cov=optimade
4 changes: 2 additions & 2 deletions optimade/filterparser/__init__.py
@@ -1,3 +1,3 @@
+__all__ = ["LarkParser", "ParserError"]
+
 from .lark_parser import LarkParser, ParserError
-
-__all__ = [LarkParser, ParserError]
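Entries in __all__ have to be names (strings), not the objects themselves: a star import over a module whose __all__ lists class objects fails with TypeError. A small self-contained check, using a throwaway in-memory module named demo_pkg (a hypothetical name) and stand-in classes:

```python
import sys
import types


class LarkParser:  # stand-in for the real class
    pass


class ParserError(Exception):  # stand-in for the real exception
    pass


def star_import_works(all_value):
    """Install a throwaway module "demo_pkg" with the given __all__ and
    report whether `from demo_pkg import *` succeeds."""
    mod = types.ModuleType("demo_pkg")  # hypothetical module name
    mod.LarkParser = LarkParser
    mod.ParserError = ParserError
    mod.__all__ = all_value
    sys.modules["demo_pkg"] = mod
    namespace = {}
    try:
        exec("from demo_pkg import *", namespace)
    except TypeError:  # raised for non-str entries in __all__
        return False
    finally:
        del sys.modules["demo_pkg"]  # clean up either way
    return "LarkParser" in namespace and "ParserError" in namespace


print(star_import_works([LarkParser, ParserError]))      # False
print(star_import_works(["LarkParser", "ParserError"]))  # True
```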
Member

To make this more dynamic, you could use a * import and specify the __all__ in lark_parser.py.
The package's __all__ then becomes lark_parser.__all__.
This way, if you decide to expose other classes from lark_parser, you only add them to that module's __all__ instead of editing __init__.py.

See, e.g., here.

Contributor Author

Personally, I don't like the usage of * in imports. There are many articles both for and against it (this is just the first match on Google). It is similar to using global variables.
Of course, I will modify it as you suggested to keep the repo consistent.

Member

I understand. At some point we should reach a consensus on which convention this package follows.
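The reviewer's suggestion can be sketched as follows, assuming a hypothetical package demo_filterparser written to a temporary directory: the submodule declares __all__, and __init__.py re-exports it with a * import, so exposing a new class only requires editing the submodule's __all__. This illustrates the pattern; it is not the layout merged in this PR.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Lay out a hypothetical package "demo_filterparser" in a temporary directory.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_filterparser"
pkg.mkdir()

# The submodule owns the list of public names.
(pkg / "lark_parser.py").write_text(textwrap.dedent('''
    __all__ = ["LarkParser", "ParserError"]


    class ParserError(Exception):
        pass


    class LarkParser:
        pass


    def _load_grammar():  # private: not listed in __all__
        pass
'''))

# __init__.py re-exports whatever the submodule lists.
(pkg / "__init__.py").write_text(textwrap.dedent('''
    from . import lark_parser
    from .lark_parser import *  # noqa: F401,F403

    __all__ = lark_parser.__all__
'''))

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo = importlib.import_module("demo_filterparser")

print(demo.__all__)                    # ['LarkParser', 'ParserError']
print(hasattr(demo, "_load_grammar"))  # False
```

Only the names listed in the submodule's __all__ reach the package namespace, so private helpers such as _load_grammar stay out of it.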

51 changes: 30 additions & 21 deletions optimade/filterparser/lark_parser.py
@@ -1,33 +1,42 @@
-import os
-import re
-from glob import glob
-
+from pathlib import Path
 from lark import Lark, Tree
-
-parser = {}
-for name in glob(os.path.join(os.path.dirname(__file__), "../grammar", "*.g")):
-    with open(name) as f:
-        ver = tuple(
-            int(n)
-            for n in re.findall(r"\d+", str(os.path.basename(name).split(".g")[0]))
-        )
-        parser[ver] = Lark(f.read())
+from collections import defaultdict


 class ParserError(Exception):
     pass


+def get_versions():
+    dct = defaultdict(dict)
+    for filename in Path(__file__).parent.joinpath("../grammar").glob("*.lark"):
+        tags = filename.stem.lstrip("v").split(".")
+        version = tuple(map(int, tags[:3]))
+        variant = "default" if len(tags) == 3 else tags[-1]
+        dct[version][variant] = filename
+    return dict(dct)
+
+
+available_parsers = get_versions()
+
+
 class LarkParser:
-    def __init__(self, version=None):
-        if version is None:
-            self.version = sorted(parser.keys())[-1]
-            self.lark = parser[self.version]
-        elif version in parser:
-            self.lark = parser[version]
-            self.version = version
-        else:
+    def __init__(self, version=None, variant="default"):
+
+        version = version if version else max(available_parsers.keys())
+
+        if version not in available_parsers:
             raise ParserError(f"Unknown parser grammar version: {version}")
+
+        if variant not in available_parsers[version]:
+            raise ParserError(f"Unknown variant of the parser: {variant}")
+
+        self.version = version
+        self.variant = variant
+
+        with open(available_parsers[version][variant]) as file:
+            self.lark = Lark(file)
CasperWA marked this conversation as resolved.
+
         self.tree = None
         self.filter = None

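The new module-level logic can be exercised without Lark or the grammar directory. The sketch below reproduces the filename parsing of get_versions() and the checks in the new LarkParser.__init__ on plain filename strings. index_grammars and select_grammar are hypothetical names, and the file names are made up for illustration (the actual variant name used in the PR is not shown here).

```python
from collections import defaultdict
from pathlib import PurePath


class ParserError(Exception):
    pass


def index_grammars(filenames):
    """Group grammar files by version tuple and variant name.

    Same naming scheme as get_versions(): "v0.10.0.lark" is the default
    variant, "v0.10.0.<variant>.lark" a named variant.
    """
    dct = defaultdict(dict)
    for name in filenames:
        tags = PurePath(name).stem.lstrip("v").split(".")
        version = tuple(map(int, tags[:3]))
        variant = "default" if len(tags) == 3 else tags[-1]
        dct[version][variant] = name
    return dict(dct)


def select_grammar(available, version=None, variant="default"):
    """Same checks as the new LarkParser.__init__, minus building Lark."""
    version = version if version else max(available.keys())
    if version not in available:
        raise ParserError(f"Unknown parser grammar version: {version}")
    if variant not in available[version]:
        raise ParserError(f"Unknown variant of the parser: {variant}")
    return version, variant, available[version][variant]


available = index_grammars(["v0.9.5.lark", "v0.10.0.lark", "v0.10.0.filter.lark"])
print(select_grammar(available))
# ((0, 10, 0), 'default', 'v0.10.0.lark')
print(select_grammar(available, (0, 10, 0), "filter"))
# ((0, 10, 0), 'filter', 'v0.10.0.filter.lark')
```

Parsing the version into a tuple of ints is what makes max() pick v0.10.0 over v0.9.5; comparing the raw strings would not, since "0.10.0" < "0.9.5" lexicographically.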
244 changes: 242 additions & 2 deletions optimade/filterparser/tests/test_filterparser.py
@@ -1,6 +1,6 @@
 import os
 from glob import glob
-from unittest import TestCase
+import unittest

 from lark import Tree

@@ -9,7 +9,7 @@
 testfile_dir = os.path.join(os.path.dirname(__file__), "testfiles")


-class ParserTest(TestCase):
+class ParserTestV0_9_5(unittest.TestCase):
     @classmethod
     def setUpClass(cls):
         cls.test_filters = []
@@ -38,3 +38,243 @@ def test_repr(self):
         self.assertIsNotNone(repr(self.parser))
         self.parser.parse(self.test_filters[0])
         self.assertIsNotNone(repr(self.parser))
+
+
+class ParserTestV0_10_0(unittest.TestCase):
+    version = (0, 10, 0)
+    variant = "default"
+
+    @classmethod
+    def setUpClass(cls):
+        cls.parser = LarkParser(version=cls.version, variant=cls.variant)
+
+    def parse(self, inp):
+        return self.parser.parse(inp)
+
+    def test_empty(self):
+        self.assertIsInstance(self.parse(" "), Tree)
+
+    def test_property_names(self):
+        self.assertIsInstance(self.parse("band_gap = 1"), Tree)
+        self.assertIsInstance(self.parse("cell_length_a = 1"), Tree)
+        self.assertIsInstance(self.parse("cell_volume = 1"), Tree)
+
+        with self.assertRaises(ParserError):
+            self.parse("0_kvak IS KNOWN") # starts with a number
+
+        with self.assertRaises(ParserError):
+            self.parse('"foo bar" IS KNOWN') # contains space; contains quotes
+
+        with self.assertRaises(ParserError):
+            self.parse("BadLuck IS KNOWN") # contains upper-case letters
+
+        # database-provider-specific prefixes
+        self.assertIsInstance(self.parse("_exmpl_formula_sum = 1"), Tree)
+        self.assertIsInstance(self.parse("_exmpl_band_gap = 1"), Tree)
+
+        # Nested property names
+        self.assertIsInstance(self.parse("identifier1.identifierd2 = 42"), Tree)
+
+    def test_string_values(self):
+        self.assertIsInstance(self.parse('author="Sąžininga Žąsis"'), Tree)
+        self.assertIsInstance(
+            self.parse('field = "!#$%&\'() * +, -./:; <= > ? @[] ^ `{|}~ % "'), Tree
+        )
+
+    def test_number_values(self):
+        self.assertIsInstance(self.parse("a = 12345"), Tree)
+        self.assertIsInstance(self.parse("b = +12"), Tree)
+        self.assertIsInstance(self.parse("c = -34"), Tree)
+        self.assertIsInstance(self.parse("d = 1.2"), Tree)
+        self.assertIsInstance(self.parse("e = .2E7"), Tree)
+        self.assertIsInstance(self.parse("f = -.2E+7"), Tree)
+        self.assertIsInstance(self.parse("g = +10.01E-10"), Tree)
+        self.assertIsInstance(self.parse("h = 6.03e23"), Tree)
+        self.assertIsInstance(self.parse("i = .1E1"), Tree)
+        self.assertIsInstance(self.parse("j = -.1e1"), Tree)
+        self.assertIsInstance(self.parse("k = 1.e-12"), Tree)
+        self.assertIsInstance(self.parse("l = -.1e-12"), Tree)
+        self.assertIsInstance(self.parse("m = 1000000000.E1000000000"), Tree)
+
+        with self.assertRaises(ParserError):
+            self.parse("number=1.234D12")
+        with self.assertRaises(ParserError):
+            self.parse("number=.e1")
+        with self.assertRaises(ParserError):
+            self.parse("number= -.E1")
+        with self.assertRaises(ParserError):
+            self.parse("number=+.E2")
+        with self.assertRaises(ParserError):
+            self.parse("number=1.23E+++")
+        with self.assertRaises(ParserError):
+            self.parse("number=+-123")
+        with self.assertRaises(ParserError):
+            self.parse("number=0.0.1")
+
+    def test_operators(self):
+        # Basic boolean operations
+        self.assertIsInstance(
+            self.parse(
+                'NOT ( chemical_formula_hill = "Al" AND chemical_formula_anonymous = "A" OR '
+                'chemical_formula_anonymous = "H2O" AND NOT chemical_formula_hill = "Ti" )'
+            ),
+            Tree,
+        )
+
+        # Numeric and String comparisons
+        self.assertIsInstance(self.parse("nelements > 3"), Tree)
+        self.assertIsInstance(
+            self.parse(
+                'chemical_formula_hill = "H2O" AND chemical_formula_anonymous != "AB"'
+            ),
+            Tree,
+        )
+        self.assertIsInstance(
+            self.parse(
+                "_exmpl_aax <= +.1e8 OR nelements >= 10 AND "
+                'NOT ( _exmpl_x != "Some string" OR NOT _exmpl_a = 7)'
+            ),
+            Tree,
+        )
+        self.assertIsInstance(self.parse('_exmpl_spacegroup="P2"'), Tree)
+        self.assertIsInstance(self.parse("_exmpl_cell_volume<100.0"), Tree)
+        self.assertIsInstance(
+            self.parse("_exmpl_bandgap > 5.0 AND _exmpl_molecular_weight < 350"), Tree
+        )
+        self.assertIsInstance(
+            self.parse('_exmpl_melting_point<300 AND nelements=4 AND elements="Si,O2"'),
+            Tree,
+        )
+        self.assertIsInstance(self.parse("_exmpl_some_string_property = 42"), Tree)
+        self.assertIsInstance(self.parse("5 < _exmpl_a"), Tree)
+
+        # OPTIONAL
+        self.assertIsInstance(
+            self.parse("((NOT (_exmpl_a>_exmpl_b)) AND _exmpl_x>0)"), Tree
+        )
+        self.assertIsInstance(self.parse("5 < 7"), Tree)
+
+    def test_string_operations(self):
+        # Substring comparisons
+        self.assertIsInstance(
+            self.parse(
+                'chemical_formula_anonymous CONTAINS "C2" AND '
+                'chemical_formula_anonymous STARTS WITH "A2"'
+            ),
+            Tree,
+        )
+        self.assertIsInstance(
+            self.parse(
+                'chemical_formula_anonymous STARTS "B2" AND '
+                'chemical_formula_anonymous ENDS WITH "D2"'
+            ),
+            Tree,
+        )
+
+    def test_list_properties(self):
+        # Comparisons of list properties
+        self.assertIsInstance(self.parse("list HAS < 3"), Tree)
+        self.assertIsInstance(self.parse("list HAS ALL < 3, > 3"), Tree)
+        self.assertIsInstance(self.parse("list:list HAS >=2:<=5"), Tree)
+        self.assertIsInstance(
+            self.parse(
+                'elements HAS "H" AND elements HAS ALL "H","He","Ga","Ta" AND elements HAS '
+                'ONLY "H","He","Ga","Ta" AND elements HAS ANY "H", "He", "Ga", "Ta"'
+            ),
+            Tree,
+        )
+
+        # OPTIONAL:
+        self.assertIsInstance(self.parse('elements HAS ONLY "H","He","Ga","Ta"'), Tree)
+        self.assertIsInstance(
+            self.parse(
+                'elements:_exmpl_element_counts HAS "H":6 AND '
+                'elements:_exmpl_element_counts HAS ALL "H":6,"He":7 AND '
+                'elements:_exmpl_element_counts HAS ONLY "H":6 AND '
+                'elements:_exmpl_element_counts HAS ANY "H":6,"He":7 AND '
+                'elements:_exmpl_element_counts HAS ONLY "H":6,"He":7'
+            ),
+            Tree,
+        )
+        self.assertIsInstance(
+            self.parse(
+                "_exmpl_element_counts HAS < 3 AND "
+                "_exmpl_element_counts HAS ANY > 3, = 6, 4, != 8"
+            ),
+            Tree,
+        )
+        self.assertIsInstance(
+            self.parse(
+                "elements:_exmpl_element_counts:_exmpl_element_weights "
+                'HAS ANY > 3:"He":>55.3 , = 6:>"Ti":<37.6 , 8:<"Ga":0'
+            ),
+            Tree,
+        )
+
+    def test_properties(self):
+        # Filtering on Properties with unknown value
+        self.assertIsInstance(
+            self.parse(
+                "chemical_formula_hill IS KNOWN AND "
+                "NOT chemical_formula_anonymous IS UNKNOWN"
+            ),
+            Tree,
+        )
+
+    def test_precedence(self):
+        self.assertIsInstance(self.parse('NOT a > b OR c = 100 AND f = "C2 H6"'), Tree)
+        self.assertIsInstance(
+            self.parse('(NOT (a > b)) OR ( (c = 100) AND (f = "C2 H6") )'), Tree
+        )
+        self.assertIsInstance(self.parse("a >= 0 AND NOT b < c OR c = 0"), Tree)
+        self.assertIsInstance(
+            self.parse("((a >= 0) AND (NOT (b < c))) OR (c = 0)"), Tree
+        )
+
+    def test_special_cases(self):
+        self.assertIsInstance(self.parse("te < st"), Tree)
+        self.assertIsInstance(self.parse('spacegroup="P2"'), Tree)
+        self.assertIsInstance(self.parse("_cod_cell_volume<100.0"), Tree)
+        self.assertIsInstance(
+            self.parse("_mp_bandgap > 5.0 AND _cod_molecular_weight < 350"), Tree
+        )
+        self.assertIsInstance(
+            self.parse('_cod_melting_point<300 AND nelements=4 AND elements="Si,O2"'),
+            Tree,
+        )
+        self.assertIsInstance(self.parse("key=value"), Tree)
+        self.assertIsInstance(self.parse('author=" someone "'), Tree)
+        self.assertIsInstance(self.parse('author=" som\neone "'), Tree)
+        self.assertIsInstance(
+            self.parse(
+                "number=0.ANDnumber=.0ANDnumber=0.0ANDnumber=+0AND_n_u_m_b_e_r_=-0AND"
+                "number=0e1ANDnumber=0e-1ANDnumber=0e+1"
+            ),
+            Tree,
+        )
+
+        self.assertIsInstance(
+            self.parse("NOTice=val"), Tree
+        ) # property (ice) != property (val)
+        self.assertIsInstance(
+            self.parse('NOTice="val"'), Tree
+        ) # property (ice) != value ("val")
+        self.assertIsInstance(
+            self.parse('"NOTice"=val'), Tree
+        ) # value ("NOTice") = property (val)
+
+        with self.assertRaises(ParserError):
+            self.parse("NOTICE=val") # not valid property or value (NOTICE)
+        with self.assertRaises(ParserError):
+            self.parse('"NOTICE"=Val') # not valid property (Val)
+        with self.assertRaises(ParserError):
+            self.parse("NOTICE=val") # not valid property or value (NOTICE)
+
+    def test_parser_version(self):
+        self.assertEqual(self.parser.version, self.version)
+        self.assertEqual(self.parser.variant, self.variant)
+
+    def test_repr(self):
+        self.assertIsNotNone(repr(self.parser))
+        self.parser.parse('key="value"')
+        self.assertIsNotNone(repr(self.parser))
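Two of the token classes exercised by these tests can be approximated with regular expressions: lower-case property names (in line with the commit restricting identifiers to lower-case characters) and numeric literals with an optional sign, optional fraction and optional exponent. The patterns below are simplified assumptions for illustration, not the rules from the PR's .lark grammar, and the helper names are hypothetical.

```python
import re

# Simplified approximations, NOT the rules from the PR's .lark grammar.
# Identifier: lower-case letter or underscore first, then lower-case, "_" or digits.
IDENTIFIER = re.compile(r"[a-z_][a-z_0-9]*")
# Number: [sign] (digits [ "." [digits] ] | "." digits) [exponent]
NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None


def is_number(text):
    return NUMBER.fullmatch(text) is not None


for sample in ["-.2E+7", "1.e-12", "1.234D12", ".e1", "+-123"]:
    print(sample, is_number(sample))
```

Using fullmatch rather than match matters here: a prefix match would accept "0.0.1" because "0.0" alone is a valid number.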