Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor validation #42

Merged
merged 26 commits into from
Mar 24, 2019
Merged

Refactor validation #42

merged 26 commits into from
Mar 24, 2019

Conversation

robertodr
Copy link
Contributor

@robertodr robertodr commented Mar 19, 2019

This is a refactoring of the validation to fix, or offer a way to fix, #24, #26, #31, #33, and #34.
This is still work-in-progress. I need to do step 7 and make sure that we have a good enough test suite.
I am putting it up early because it might be a bit hard to digest "the festival of dict comprehensions". I have tried to document every function I have written in detail. Type hints might be wrong.

The general strategy:

  1. Read the validation template. Here we check that the template is well-formed, that is:
    a. All keywords have:
    * An allowed type.
    * A non-empty docstring.
    * A default callable that is valid Python, if present.
    * Predicates that are valid Python, if present.
    Note that the latter two criteria can only be checked later on.
    b. No sections are nested under keywords.
    c. All sections have a non-empty docstring.
    This step might throw.
  2. Extract views from it. A view has as keys the names of sections and keyword and as values the actual values, or the type or the docstring. So basically it's a way of taking a complex dictionary and extracting subsets of its information content. Note that this makes the generation of documentation trivial, as we can get the view-by-docstrings and use it directly to collate a documentation page.
  3. Read in user input. This is basically an incomplete view-by-defaults.
  4. Merge the view-by-defaults of the template ("THEIRS") and the user input ("OURS") with OURS strategy. As we now allow (almost) arbitrary pieces of code to be defaults, this is not the end of the story. This step might throw
  5. Get the actual defaults. This is a bit tricky, but I think I've solved it correctly. Bonus: no need to teach the YAML reader to do complex numbers right :) This point actually fixes read_yaml_file interprets complex numbers as strings #26 and Get default value from other keyword #31. The code that can be executed can only depend on the full input dictionary, nothing more. This step might throw
  6. Do a pass at type checking and type fixation. This step might throw
  7. Finally run the predicates. This step might throw

The fix for #24 is a byproduct of sorts to this restructuring: while traversing recursively if anything throws, we catch it and store how to get to the offending leaf and an appropriate error message.
So for example, when the exception finally throws one will have something like this:

Error(s) occurred when fixing defaults:
- At user['scf']['another_number']:
    KeyError 'min_num_iterations' in defaulting closure 'user['scf']['min_num_iterations'] / 2'
- At user['scf']['functional']:
    TypeError unsupported operand type(s) for /: 'str' and 'int' in defaulting closure ''B3LYP' / 2'

A drawback of this refactoring is that there's now a lot more functions and each of them does a recursive descent in the tree. There is also quite some code duplication: we might get rid of that using more higher-order functions.

TODO
- [ ] Switch to OrderedDict throughout, to avoid annoyances with ordering of elements. Not needed

  • Run through mypy to check all type hints are correct.
  • Check documentation is comprehensive enough.
  • Create a primitive API and hook it up to CLI. Still unstable and up for debate

Less typing...

Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
@bast
Copy link
Member

bast commented Mar 19, 2019

Few comments without having read everything:

  • I like the dictionary name user.
  • I saw few functions repeated with the same name in different files which surprised me but I guess this is what you meant with "There is also quite some code duplication".
  • Optimize for good error reporting and code readability and I think this is still the case.
  • Let's not do nested comprehensions and let us not sacrifice readability for elegance - but this was not a problem in the code I have browsed - so far it looked readable. But in other words I personally always prefer 4 readable lines over 1 compact line where I have to think for 15 minutes on a white board - but this was not a problem until now.

read_template.py Outdated Show resolved Hide resolved
@bast
Copy link
Member

bast commented Mar 19, 2019

I like the simplification in the template. I was a bit unsure whether docstring is more intuitive than documentation. For Python developers it is.

@bast
Copy link
Member

bast commented Mar 19, 2019

I think the plan looks solid. Are there any portions of the code or any choices you are least sure about and would like input on?

@robertodr
Copy link
Contributor Author

  • Do we need OrderedDict? (I think so, but I might be wrong)
  • Testing is quite ugly and it's actually where most of the code repetition/boilerplate is. I would really like to have one test file per module, but this means writing a lot of reference dictionaries by hand.

@bast
Copy link
Member

bast commented Mar 19, 2019

OrderedDict: For "older" Python yes but 3.6 and later we don't need it. In 3.6 it is an implementation detail, in 3.7 it became a feature. EDIT: by "it" I meant insertion order is preserved.

@bast
Copy link
Member

bast commented Mar 19, 2019

About writing the reference dicts by hand - maybe we can write a script that will generate these and break these in controlled ways so that we can generate many dicts in one go for testing?

@codecov
Copy link

codecov bot commented Mar 19, 2019

Codecov Report

Merging #42 into master will decrease coverage by 5.15%.
The diff coverage is 84.39%.

parselglossy/views.py Outdated Show resolved Hide resolved
parselglossy/views.py Outdated Show resolved Hide resolved
@robertodr
Copy link
Contributor Author

robertodr commented Mar 22, 2019

UPDATE I fixed this 😄
Lingering issues:- Default callables that mix types (like "'B3LYP' / 2") raise a SyntaxError rather than a TypeError. This is because the declared type is str and the way I implemented closure execution treats strings differently (see the run_callable function). This is the case for strings and complex numbers (as they are initially read in as strings). I don't know at the moment how to solve this (mild, I would say) inconsistency.

Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Fix #33

Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Fixes #34 and makes sure that we report all errors in the template.

Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
And uncover a bug...

Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Extend test suite somewhat.

Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
So long! We test against Python 3.8-dev on Travis

Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
@robertodr robertodr marked this pull request as ready for review March 23, 2019 23:24
Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
@robertodr robertodr requested a review from bast March 23, 2019 23:32
-----
Complex numbers are a tad more fastidious, as they *might be* read in as
strings. To avoid false negatives, we have a nasty hack, for which RDR will
forever burn in hell. For complex and List[complex] we will re-run type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-) I would have gone for this same solution so may we burn then together.

Copy link
Member

@bast bast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I am convinced the code is better than before so I recommend to merge.
  • I think that we will be able to refactor some repetitions but I would do that as follow-up steps later.
  • The only change I am unsure about is docstring vs. documentation. Would someone who has never heard of docstrings know what this is? Is documentation more intuitive? Also docstrings are really for documentation extracted out of source code and I am not sure whether the input blueprint counts as source code. But this question of mine should not delay the merge - we can let this question simmer for a while on master until we mint the next release.
  • I really like the other naming changes and the very careful documentation.

Thank you for this super work!

@bast
Copy link
Member

bast commented Mar 24, 2019

There seems to be some trouble on Python 3.5 - let me know if I should look into that.

Fixes behavior of resolve() for Python 3.5

Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
@robertodr
Copy link
Contributor Author

I switched to docstring because I consistently fail to type "documentation" correctly... The Python 3.5 problem should be fixed now. Compatibility is hard!

Signed-off-by: Roberto Di Remigio <roberto.diremigio@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_yaml_file interprets complex numbers as strings
2 participants