Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

495: Default string values with dashes are mistakenly treated as dynamic defaults #578

Merged
merged 3 commits into from
Apr 14, 2022

Conversation

lindsay-stevens
Copy link
Contributor

@lindsay-stevens lindsay-stevens commented Jan 17, 2022

Closes #495

Why is this the best possible solution? Were any other approaches considered?

Approaches the problem by adding a lexer, whose parsing rules are used to identify expression tokens that indicate a dynamic expression. These tokens are: math operators, union operator, xpath predicates, pyxform references, and function calls. There may be others but the lexer provides a framework to add to the parsing rules, or the business logic using them.

As mentioned in #495, a regular regex isn't adequate to identify the various tokens that indicate a dynamic expression. I looked for "simple" lexer libraries that could parse XPath in Python. Although that isn't really what is needed, there were some big hints in euxml for identifying common tokens and especially valid XML/XPath names. Also these searches revealed the presence of a regex scanner in the Python standard library, which is used here. The euxml library uses PLY but it seemed like overkill (if not already) to learn and bring in a dependency for this heuristic check.

Includes tests to reproduce the issues described in 495 and 533, as well as a range of cases which should / should not be considered dynamic. Some of these overlap with existing tests but I didn't want to go as far as refactoring them until I got some feedback on this approach.

What are the regression risks?

There currently aren't tests to benchmark performance for huge forms full of complex default expressions. I found that using a shared / pre-compiled scanner speeds things up signficantly vs. making a new one for each question, but I'm not sure on the remaining overhead for huge forms.

It's possible that some cases where an expression is or is not a default may be missed. I think this PR covers the all known cases so it would not be a regression at least.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Probably not, but it might be worth making a note of how Pyxform decides what is dynamic, to preempt forum questions.

Before submitting this PR, please make sure you have:

  • included test cases for core behavior and edge cases in tests
  • run nosetests and verified all tests pass
  • run black pyxform tests to format code
  • verified that any code or assets from external sources are properly credited in comments

@lindsay-stevens
Copy link
Contributor Author

Hi @lognaturel there may be some more to do on this wrt. tests but I think it is ready for a look. Thanks!

…SForm#495

- select1 choice test passes ODK Validate but generates an XForm with:
  <setvalue event="odk-instance-first-load" ref="/test/s1" value="a-2"/>
- text default test fails ODK Validate with error:
  Invalid XPath in value set action declaration: 'https://my-site.com'
  Problem found at nodeset: ${model}[@xforms-version=1.0.0]/setvalue
  With element <setvalue event="odk-instance-first-load" ref="${t3}"
    value="https://my-site.com"/>
Copy link
Contributor

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely more involved than I would have gone for but more correct, too!

Here are some cases I played around with:

            self.Case(True, "date", """concat('2022-03', '-14')""", ""),

            self.Case(False, "text", """f-4""",""),
            self.Case(False, "text", """./f-4""",""),
            self.Case(True, "text", """./f - 4""",""),
            self.Case(False, "integer", """7-4""",""),
            self.Case(True, "integer", """7 - 4""",""),

The behavior with - is a change so that could possibly break some form updates. I can't imagine these types of cases are common but it might be good to have the tests as explicit documentation. And as you said, also add something to user-facing docs.

tests/test_dynamic_default.py Outdated Show resolved Hide resolved
tests/test_dynamic_default.py Outdated Show resolved Hide resolved
tests/test_dynamic_default.py Outdated Show resolved Hide resolved
@lindsay-stevens
Copy link
Contributor Author

Thanks for the review! I will update the tests as mentioned above to complete this PR.

- deleted tests where covered by case in TestDynamicDefaultSimpleInput
- updated tests using string matchers to use xpath instead
- added pyxform_test_case ability to escape literal pipe in values
- clarified existing translation performance test description
- added util function for coalesce
- added tests suggested by @lognaturel and performance tests
- updated test strategy for cases that initially or after rendering
  contain single quotes, namely to use xpath_exact to compare outside
  of xpath, instead of xpath_match.
@lindsay-stevens
Copy link
Contributor Author

Should be good to go now. New commit is all about tests, per commit message for ba7b196:

  • deleted existing tests where covered by case in TestDynamicDefaultSimpleInput
  • updated tests using string matchers to use xpath instead
  • added pyxform_test_case ability to escape literal pipe in values
  • clarified existing translation performance test description
  • added util function for coalesce
  • added tests suggested by @lognaturel and performance tests
  • updated test strategy for cases that initially or after rendering
    contain single quotes, namely to use xpath_exact to compare outside
    of xpath, instead of xpath_match.

The performance tests are:

  • test_dynamic_default_performance__time: processing time for 500, 1000, 2000, 5000 dynamic default questions shouldn't take much longer than no-op dynamic defaults check (results were approx 0 to 6%).
  • test_dynamic_default_performance__memory: question with 2000 dynamic default questions shouldn't increase memory usage more than x2.

@lognaturel lognaturel merged commit b0ad3a7 into XLSForm:master Apr 14, 2022
@lognaturel
Copy link
Contributor

🥳 🥳 🥳

@lindsay-stevens lindsay-stevens deleted the pyxform-495 branch April 14, 2022 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Default string values with dashes are mistakenly treated as dynamic defaults
2 participants