603: Always generate secondary instance for selects #614

lindsay-stevens · 2022-07-06T08:00:16Z

Interesting git history from 2012: a change to itemsets for everything, then reverted the next day for backwards compatibility.

Why is this the best possible solution? Were any other approaches considered?

Discussed in enketo/enketo#216 and references. Lots of changes required but generally explained in commit messages or new docstrings.

What are the regression risks?

Risk that some XForm consuming apps may rely on selects not always generating itemsets. Risk that since there are a lot of code changes there may be bugs, although tests were added / updated.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

It's not a user-facing so probably not.

Before submitting this PR, please make sure you have:

included test cases for core behavior and edge cases in tests
run nosetests and verified all tests pass
run black pyxform tests to format code
verified that any code or assets from external sources are properly credited in comments

- survey.py: previous code would output position-based and reference-based translations (e.g. mylist-0 and choice:label), so it now skips reference-based translations for select_types. - survey_element.py: related to the above change, test_xform2json.py test_convert_toJSON_multi_language failed due to circular reference. Change traverses translations to remove offending objects. - pyxform_test_case.py: added the test case number to the debug/fail output, to help when there are many XPath expressions. - numerous test fixes to accommodate updated choice output.

- now covered by tests/xpath_helpers/choices.py

- The test was broken by changes to choices processing (XLSForm#603). - The test conditions seem to relate only to the instance_xmlns feature. - The new test asserts the same using the pyxformtestcase pattern.

- OK locally but failing on CI

- builder.py: add type hint, remove unnecessary dict - xform2json.py: - _set_survey_name now handles multiple instances - _set_submission_info now sets itemset details from secondary instance, and attempts to parse choice filters - _set_translations now returns value to be set in __init__() - _get_text_from_translation now does media comparison as a list, since a dictkeysview is not a list - add _get_choices to include choices in internal xform representation - xls2json.py - skip adding choice lists that aren't used in the xform to avoid extra unnecessary data. Maybe worth a warning or maybe not. - add secondary instance for table-list appearance, but reject it if it was being added with a choice filter (not supported). - xform2json_test.py: - expand test_load_from_dump variables for easier debugging - add test cases for choice filters, field-list and table-list - these test cases can be refactored to PyxformTestCase at some stage - xls2xform_tests.py: use equivalent test method instead of keyword - update expected XML from inline choices to secondary itemset for the following (these can be refactored to XPath PyxformTestCase): - test_table_list.py - repeat_date_test.xml - xml_escaping.xml

- reasonably confident that these are the expected types

- section should import SurveyElement from where it is defined. - A NotImplementedError is commonly used to indicate a base method that should be implemented in child classes, and it is used in SurveyElement for xml_control(). The Option and Tag classes don't seem to need this method, so adding an implementation that raises this error is a bit clearer.

- move choices logic off to new function add_choices_info_to_question() - exclude select with appearance=search() from itemset (per comment) - new func for readability and to clarify what info is used to determine choices structures - survey.py: skip adding translations for choices, instead of just for choice_filter. Use type "Option" to detect choice elements which should be more reliable. Move get_xpath lookup under if condition so it's only called if needed. - xls2json.py: fix lookup of select type "select one external" which has an alias "select_one_external". Docs use the underscore alias but the implementation is looking for the spaces alias. - xlsform_test_case/base.py: simplify reporter function - xlsform_spec_test.py: simplify repetitive tests by adding test func. Ideally these should be re-implemented as PyxformTestCase, but they cover a lot of scenarios so for now just simplifying the structure. - test_expected_output/*.xml: update expected XML to match current behaviour, generally: 1) update body to refer to instance, 2) add new instances in model, 3) add new translations in itext, 4) remove old itext from translations. - Misc minor fixes: - utils.py: fix deprecation warning for get_sheet_by_name. - replace in-line regex with pre-compiled regex, and replace match (only looks at start of string) with search (looks at whole string). - add more constants for commonly-used internal strings so that it's easier to find where they are used, added usages to xls2json.py.

- The right number of empty rows would be removed afterwards anyway, but for someone reading the code later it'd be better if that number was as stated.

- yes_or_no_question and xml_tests updated for new itemset output for selects. The default_survey_sheet test expected no default_form_name. - add type hints to workbook_to_json and parse_file_to_json.

lognaturel · 2023-05-08T16:29:56Z

Interesting git history from 2012: a change to itemsets for everything, then reverted the next day for backwards compatibility.

There are three reasons we've discussed:

Enketo's support for secondary instances was limited at the time (Why aren't instances generated for all selects? #203 (comment)). I can't think of a client or server that wouldn't support them fully now.
Some folks have analysis pipelines that depend on internal choices. This might be painful but it's hard to estimate the impact.
Injecting an other choice in a secondary instances to support or_other feels very weird. That said, it won't look any different from an XLSForm user's perspective. The longer-term goal is to support proper open selects (Why aren't instances generated for all selects? #203 (comment))

Of those, I think the second point has the highest risk. We've had a post on the forum announcing this change for a year, though, so hopefully folks building downstream tools have had time to adapt.

Have you looked at or_other yet? I think for now it makes sense to inject an other choice. I haven't looked -- is that what the 2012 implementation did?

tiritea · 2023-05-08T23:59:37Z

sorry, I only just noticed this the other day (I get spammed by so many git updates I tend to tune them out... :-( Would it be possible to add a commandline option to continue to generate 'inline' select option when desired? I do actually currently rely on them because I havent completed parsing secondary instances in my (iOS) XForms parser yet. And given it is such a breaking change, it might still be nice to still support it (as deprecated), even if you have you now explicitly enable it via said option. Otherwise I guess I'll have to fork the current version and use that in the interim.

lindsay-stevens · 2023-05-09T02:53:36Z

@lognaturel there's a few more translations test to update, then the last implementation thing to look at should be or_other. The search() appearance now generates an itemset - unrelated to this PR (before and after), search() works in Collect but not Enketo.

lognaturel · 2023-05-09T05:34:18Z

More conversation with @tiritea on the forum.

The search() appearance now generates an itemset - unrelated to this PR (before and after)

Really. And not an itemset AND static choices, only an itemset? I could have sworn I checked that as part of my comment long ago. Well, that makes that easy at least.

search() works in Collect but not Enketo.

Yes, that's right.

lindsay-stevens · 2023-05-09T15:08:23Z

@lognaturel I meant:

The search() appearance now generates an itemset (due to this PR)

unrelated to this PR (before and after), search() works in Collect but not Enketo.

lognaturel · 2023-05-09T15:11:13Z

Got it, thanks!

tiritea · 2023-05-09T20:49:46Z

FWIW I dont support search() yet, either, so please dont do anything funky here on my account... (I'll deal with all that after I get secondary instances working).

- test_translations.py: - remove xpath helpers now in xpath_helpers/choices.py - update test xpath assertions to match itemset behaviour - choices.py: - improve xpath helpers to not require specification of position

- builder.py: - add or_other choice to survey-level choice dict - add type annotations - constants.py: add new constant for "or specify other" internal suffix - xls2json.py: add error for choice_filter + or_other. Main reason I suppose is that it's unknown how to filter the "other" choice so the user might never see it. - xlsform_spec_test.py: - fix or_other expected XML to include "other" and match sorted element names. - move or_other test into this testcase since it is the same as others there already. - builder_tests.py: update expected output to match or_other output.

lindsay-stevens · 2023-05-12T12:43:06Z

@lognaturel ready for review. It's covering the cases mentioned in enketo/enketo#216, and any tests that failed as a result of code changes. I haven't combed through every test case though, e.g. in case there's some way to avoid generating itemsets.

lognaturel · 2023-07-17T22:46:44Z

Finally spending some focused time on this after speaking to @lindsay-stevens about it at the ODK Summit. We would like to release this by mid-August at the latest.

My review checklist:

Verify impact on some real forms
- WHO VA: 990kb before, 593kb after (va_before_after.zip). Tried on client, no obvious issues. Central doesn't flag any schema change. Skimmed diff which looks as expected.
- All widgets: 44kb before, 35kb after. I quickly skimmed through the form in Collect and Enketo. In Enketo:

Thing	Before	After
Likert
Selects with media

Haven't looked into what's going on yet. Ideally pyxform output could be modified slightly to not get those differences but it might not be possible.

or_other
translations
- has code to generate translation names been removed? No, still relevant for media
labels that use values from form
- An example. I don't think there's much reason to do this anymore now that we have selects from repeats but maybe people do things like get dynamic values from unrolled repeats or things like that. I verified that Collect and Enketo are happy with <output>s in itext.
"skip adding choice lists that aren't used in the xform to avoid extra unnecessary data. Maybe worth a warning or maybe not." in 1238bd6, This is existing behavior. I've filed Choice lists are only included in the output if they are referenced in a select. #647 for consideration. @lindsay-stevens I'm interested in what you think.
media, long/short forms - Lots of tests with image and big-image on choices. No support for long/short forms yet.

lognaturel

This looks really thoughtfully done. There's just one section I really can't wrap my brain around and makes me nervous. I've marked it inline.

There's some Enketo rendering weirdness that I think we should look into. Maybe it's just the spacing in the pyxform output? See above.

I still want to look a little bit more at translations and long/short forms but overall I think this is very close to merge.

lognaturel · 2023-07-17T23:04:23Z

pyxform/survey_element.py

@@ -244,6 +244,13 @@ def to_json_dict(self):
        result["children"] = []
        for child in children:
            result["children"].append(child.to_json_dict())
+        # Translation items with "output_context" have circular references.


output_context is a way to save the node reference for cases where there are output nodes so that ${} references can be expanded? I tried removing this scary bit of code and test_convert_toJSON_multi_language did fail. Then I set all places where output_context is set to None and some tests in test_repeat failed. I don't understand why this works, though. I thought it would remove all output_context values for all survey elements.

This code is reached by a call to survey.to_json so it shouldn't affect form conversion to XML.

The purpose of output_context is to hold a reference to a SurveyElement, so that references like ${} can be expanded. In this case we're going from "internal PyForm" structures into JSON - so I don't know what the intended JSON representation of output_context should be (maybe a copy of the node dict?). Or maybe the JSON output should have the de-referenced output somewhere (e.g. a string containing XML). It seems like it needs a separate issue to design and implement, or document.

Anyway, the problem at hand is that the node reference is a SurveyElement, which has a reference to it's parent Survey, which has references to it's child SurveyElements, which have references to the parent Survey, and so on. To break out of that loop, output_context is deleted so that survey.to_json() can return a document - it just would have the original referencing strings like My fav fruit is ${fruit}, and de-referencing those is left to the caller.

This code is reached by a call to survey.to_json so it shouldn't affect form conversion to XML.

Aha, thank you, that's great to know.

lognaturel · 2023-07-17T23:22:21Z

pyxform/builder.py

+    name_of_main_section: str = None,
+    sections: Dict[str, Dict] = None,
+    main_section: Dict[str, Any] = None,
+    id_string: Optional[str] = None,


id_string seems like it shouldn't be optional.

I sprinkled on this type annotation to help understand data flow. The id_string (and perhaps other params) could be mandatory, but then it seemed to make sense to refactor the body of the function to remove is None checks, etc., which could easily snowball into more refactoring.

lognaturel · 2023-07-18T22:53:26Z

pyxform/xls2json.py

+            # Reference to list name for data dictionary tools (ilri/odktools).
+            question["list_name"] = list_name
+            # Copy choices for data export tools (onaio/onadata).
+            # TODO: could onadata use the list_name to look up the list for


One challenge with not having a documented API for pyxform at all is that it's not clear how to put things on a deprecation path. Did you happen to also look at Kobo for any direct usage of this kind?

Various kobo repos (including kobocat, kpi, formpack) import pyxform. formpack uses workbook_to_json, but I didn't look in to how they use the data structures. formpack seems to be used by kpi, but I'm not sure how that relates to kobocat or other projects. kpi dependency spec has pyxform==1.9.0.

lognaturel · 2023-07-19T16:38:55Z

Ok, @lindsay-stevens, I'm ready to merge once the couple of things above are addressed! The only real blocker for me is that I'm concerned dropping output_context won't work in every case. I haven't come up with a failing example yet but it'd be helpful to at least understand that chunk of code a bit better. I'd also like to at least try to get the Enketo formatting looking ok.

lindsay-stevens · 2023-07-24T10:57:49Z

@lognaturel thanks for the review! I have responded to the review comments.

About the Enketo issue, I don't see where the pyxform output formatting could cause this. There's no spacing added to the output e.g. near the jr://images/ values, or the coded values in the itemsets. Maybe there is a different code path in Enketo for looking up labels via jr:itext(itextId) + itemset vs. directly e.g. jr:itext('/widgets/grid_test/a:label')"? If Collect is not doing the spacing thing then that also points to an Enketo issue.

lognaturel · 2023-07-31T17:35:47Z

I have filed Enketo issues: enketo/enketo#29, enketo/enketo#28

The first is bad but I don't think it's bad enough to block merge. We can communicate the issues and encourage platforms that use pyxform and Enketo to update both at the same time.

We will release this as pyxform 2.0 because there is a compatibility/interface change.

lognaturel · 2023-08-04T21:45:08Z

We've identified a number of Enketo bugs as we have done quality assurance on this change. We're holding release until we can resolve those.

lognaturel · 2023-08-24T22:52:26Z

All Enketo bugs we know about are fixed! 🎉

But now a user has reported this strange error. I can reproduce it on master and can't reproduce on v1.12.1. @lindsay-stevens could you please take a look?

lindsay-stevens · 2023-08-25T13:48:48Z

@lognaturel Great news about Enketo! I have opened a new PR to fix this error, which occurs when using or_other + translations + group or repeat.

lindsay-stevens force-pushed the pyxform-603 branch from 4dd99fb to 3248805 Compare July 15, 2022 07:02

lindsay-stevens added 5 commits April 20, 2023 19:08

wip: initial impl changes and test fixes - 27 tests still broken

faaf52a

fix: error created but not raised

ae54917

fix: lint

ff31a6f

fix: remove unused xpath expressions

66f8838

- now covered by tests/xpath_helpers/choices.py

lindsay-stevens force-pushed the pyxform-603 branch from 783f1f1 to ab992b1 Compare April 20, 2023 09:11

lindsay-stevens added 9 commits May 5, 2023 16:23

fix: upgrade fixture-based test for instance_xmlns setting

df4e01e

- The test was broken by changes to choices processing (XLSForm#603). - The test conditions seem to relate only to the instance_xmlns feature. - The new test asserts the same using the pyxformtestcase pattern.

fix: formatting

0683836

fix: import error due to local name vs. package namespaced name

7807e88

- OK locally but failing on CI

fix: translation test using incorrect test name param

f6df5b2

add: type annotations

c7caf44

- reasonably confident that these are the expected types

fix: remove debug output in some tests

c93d0a1

fix: update comments in _setup_choice_translations for current behaviour

48c6f8e

lindsay-stevens force-pushed the pyxform-603 branch from 8f72e42 to 7015ba7 Compare May 5, 2023 06:25

lindsay-stevens added 3 commits May 5, 2023 17:34

fix: minor correctness / off-by-one error in excel data processing

acd7640

- The right number of empty rows would be removed afterwards anyway, but for someone reading the code later it'd be better if that number was as stated.

fix: tests for new itemset output for selects, add xls2json type hints

2adb257

- yes_or_no_question and xml_tests updated for new itemset output for selects. The default_survey_sheet test expected no default_form_name. - add type hints to workbook_to_json and parse_file_to_json.

lindsay-stevens force-pushed the pyxform-603 branch from 7015ba7 to 2adb257 Compare May 5, 2023 07:37

lindsay-stevens added 2 commits May 11, 2023 20:55

fix: translation tests failing due to itemset changes

b23d704

- test_translations.py: - remove xpath helpers now in xpath_helpers/choices.py - update test xpath assertions to match itemset behaviour - choices.py: - improve xpath helpers to not require specification of position

fix: test failure on Windows due to file encoding not specified in test

c2dd549

lindsay-stevens force-pushed the pyxform-603 branch from 3275b6c to 751e4ea Compare May 12, 2023 12:28

lindsay-stevens marked this pull request as ready for review May 12, 2023 12:32

lindsay-stevens requested a review from lognaturel May 12, 2023 12:32

lognaturel reviewed Jul 18, 2023

View reviewed changes

lognaturel mentioned this pull request Nov 28, 2023

Likert appearance on select from secondary instance is broken enketo/enketo#29

Closed

lognaturel approved these changes Jul 31, 2023

View reviewed changes

lognaturel merged commit 129abe9 into XLSForm:master Jul 31, 2023
10 checks passed

lindsay-stevens deleted the pyxform-603 branch August 1, 2023 02:47

eyelidlessness mentioned this pull request Aug 16, 2023

Fix: multiple "static" itemset datalists in the same repeat enketo/enketo-core#995

Merged

lindsay-stevens mentioned this pull request Aug 25, 2023

603: (secondary instances) fix or_other + translations + group or repeat #650

Merged

4 tasks

lindsay-stevens mentioned this pull request Nov 24, 2023

itext identifiers prevent duplicate choices from displaying distinct labels #585

Closed

lognaturel mentioned this pull request Mar 8, 2024

autocomplete appearance does not work on select_multiple (despite ODK documentation) enketo/enketo#1301

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

603: Always generate secondary instance for selects #614

603: Always generate secondary instance for selects #614

lindsay-stevens commented Jul 6, 2022 •

edited

lognaturel commented May 8, 2023 •

edited

tiritea commented May 8, 2023

lindsay-stevens commented May 9, 2023

lognaturel commented May 9, 2023

lindsay-stevens commented May 9, 2023

lognaturel commented May 9, 2023

tiritea commented May 9, 2023

lindsay-stevens commented May 12, 2023

lognaturel commented Jul 17, 2023 •

edited

lognaturel left a comment •

edited

lognaturel Jul 17, 2023

lindsay-stevens Jul 24, 2023

lognaturel Jul 24, 2023

lognaturel Jul 17, 2023

lindsay-stevens Jul 24, 2023

lognaturel Jul 18, 2023

lindsay-stevens Jul 24, 2023

lognaturel commented Jul 19, 2023 •

edited

lindsay-stevens commented Jul 24, 2023

lognaturel commented Jul 31, 2023 •

edited

lognaturel commented Aug 4, 2023

lognaturel commented Aug 24, 2023 •

edited

lindsay-stevens commented Aug 25, 2023

603: Always generate secondary instance for selects #614

603: Always generate secondary instance for selects #614

Conversation

lindsay-stevens commented Jul 6, 2022 • edited

Why is this the best possible solution? Were any other approaches considered?

What are the regression risks?

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Before submitting this PR, please make sure you have:

lognaturel commented May 8, 2023 • edited

tiritea commented May 8, 2023

lindsay-stevens commented May 9, 2023

lognaturel commented May 9, 2023

lindsay-stevens commented May 9, 2023

lognaturel commented May 9, 2023

tiritea commented May 9, 2023

lindsay-stevens commented May 12, 2023

lognaturel commented Jul 17, 2023 • edited

lognaturel left a comment • edited

Choose a reason for hiding this comment

lognaturel Jul 17, 2023

Choose a reason for hiding this comment

lindsay-stevens Jul 24, 2023

Choose a reason for hiding this comment

lognaturel Jul 24, 2023

Choose a reason for hiding this comment

lognaturel Jul 17, 2023

Choose a reason for hiding this comment

lindsay-stevens Jul 24, 2023

Choose a reason for hiding this comment

lognaturel Jul 18, 2023

Choose a reason for hiding this comment

lindsay-stevens Jul 24, 2023

Choose a reason for hiding this comment

lognaturel commented Jul 19, 2023 • edited

lindsay-stevens commented Jul 24, 2023

lognaturel commented Jul 31, 2023 • edited

lognaturel commented Aug 4, 2023

lognaturel commented Aug 24, 2023 • edited

lindsay-stevens commented Aug 25, 2023

lindsay-stevens commented Jul 6, 2022 •

edited

lognaturel commented May 8, 2023 •

edited

lognaturel commented Jul 17, 2023 •

edited

lognaturel left a comment •

edited

lognaturel commented Jul 19, 2023 •

edited

lognaturel commented Jul 31, 2023 •

edited

lognaturel commented Aug 24, 2023 •

edited