Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

157 missing translations warning #571

Merged
merged 23 commits into from
Feb 10, 2022

Conversation

lindsay-stevens
Copy link
Contributor

@lindsay-stevens lindsay-stevens commented Nov 19, 2021

Closes #157

Adds a warning if it looks like a XLSForm is missing a translation on one or more translatable columns.

The originating issue was a test case concerning only survey label and media::image. This PR covers all translatable columns (per the Reference Table. On the survey sheet: label, hint, guidance_hint, media::image, media::audio, media::video, constraint_message, and required_message. On the choices sheet: label, media::image, media::audio, and media::video.

The warning message specifies the columns and languages which appear to be missing. The (unwrapped) template is:

"Missing translation column(s): there is no default_language set, and a "
"translation column was not found for the {cols_msg}. "
"To avoid unexpected form behaviour, specify a default_language in the "
"settings sheet, or add the missing translation columns."

Where {cols_msg} lists the columns (for either or both of survey and choices) separated by a semicolon and the languages separated by a comma, e.g. 'hint': 'English', 'French'; 'label': 'English'. It can be a bit difficult to read when there are a lot of missing columns / languages but I'm not sure of a better alternative, such as using list notation (square brackets) etc. However, presenting the warning like this seems preferable to emitting a separate warning for each column. Example:

Missing translation column(s): there is no default_language set, and a translation column was not found for the survey column(s) and language(s): 'constraint_message': 'default'; 'guidance_hint': 'default'; 'hint': 'default'; 'media::audio': 'default'; 'media::image': 'default'; 'media::video': 'default'; 'required_message': 'default' and choices column(s) and language(s): 'media::audio': 'default'; 'media::image': 'default'; 'media::video': 'default'. To avoid unexpected form behaviour, specify a default_language in the settings sheet, or add the missing translation columns.

Why is this the best possible solution? Were any other approaches considered?

It would have been a good outcome if there was clearly a way that pyxform could produce an XForm that avoided the issue that the warning is for. However there were not many tests for translations previously. This made it difficult to tell whether aspects of and differences in XLSForm input and XForm output were symptoms of the originating issue, or normal behaviour, or undiscovered bugs.

This PR includes a raft of relatively pedantic XPath tests for translations behaviour. There are probably still more permutations that don't have tests but this should be a good boost to test coverage. In test_translations.py there are some TODO comments which should be reviewed and converted to tickets if they represent genuine bugs, or removed if they note normal behaviour. Once the TODO's are resolved, maybe it would be possible for pyxform to emit an XForm such that the warning is not required.

What are the regression risks?

Some users may expect no warning for certain XLSForms with missing translation columns. Depending on how warnings are handled it may or may not be a problem. The changes in this PR are otherwise additive.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

The guidance_hint column was missing from the XLSForm Reference Table but I've submitted a PR for that already.

Before submitting this PR, please make sure you have:

  • included test cases for core behavior and edge cases in tests
  • run nosetests and verified all tests pass
  • run black pyxform tests to format code
  • verified that any code or assets from external sources are properly credited in comments

- added tests have a couple of todo's to follow up
- will also need to add:
  - test cases for happy path and other translatable columns
  - the same sort of check for the translatable choices sheet columns
- also added some obvious type annotations and moved repeated usages of
  "default" for the default language to the constants module.
- in the tests in the prior commit it was becoming difficult to discern
  the correct / normal behaviour is for various translation scenarios
- includes numerous todo's for suspect behaviour that may be bugs
- added guidance_hint to translatable columns list
- simplified representation of issue 157 source case where label + hint
  are in French, with a default lang image column, and no default set.
- media columns were not being detected correctly since the survey_sheet
  object nests audio, image and video in a "media" dict.
- updated constants list with comment on other possible translatable
  columns in case pyxform supports them in future.
- general tidy up of test xpaths and usages of them, such as to remove
  combination/shortcut xpaths, specify absence of ref or form attributes
  where relevant, condense guidance/audio/image/video xpaths.
- remove duplicate test todo's for the same issue in a similar context.
@codecov-commenter
Copy link

codecov-commenter commented Nov 23, 2021

Codecov Report

Merging #571 (eb6f0fe) into master (bb2497d) will increase coverage by 0.19%.
The diff coverage is 87.32%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #571      +/-   ##
==========================================
+ Coverage   83.59%   83.79%   +0.19%     
==========================================
  Files          25       26       +1     
  Lines        3761     3825      +64     
  Branches      930      952      +22     
==========================================
+ Hits         3144     3205      +61     
  Misses        474      474              
- Partials      143      146       +3     
Impacted Files Coverage Δ
pyxform/xls2json.py 79.82% <80.00%> (+0.35%) ⬆️
...m/validators/pyxform/missing_translations_check.py 86.20% <86.20%> (ø)
pyxform/aliases.py 100.00% <100.00%> (ø)
pyxform/constants.py 100.00% <100.00%> (ø)
pyxform/section.py 96.58% <100.00%> (+0.02%) ⬆️
pyxform/survey.py 94.37% <0.00%> (+0.58%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb2497d...eb6f0fe. Read the comment docs.

- moved TranslationTest.test_missing_media_itext to TestTranslations
- moved new TranslationTest tests to TestTranslationsSurvey
- added TestTranslationsChoices
  - similar to survey tests, added to clarify existing behaviour
  - includes some initial cases, for with/out a lang and choice_filter
  - some overlap with module test_secondary_instance_translations.py
    which should be resolved once these tests are complete.
- added a XPathHelper class to try and make the choices tests more
  legible and easier to document, vs. the constants used for survey
  tests. Could adapt / extend helper for survey tests as well.
- Update TestTranslations and TestTranslationsSurvey that were using
  string constants to use XPathHelper so that the assertions being made
  are easier to follow.
- Add XPathHelper method docstrings to describe what they assert.
- Previously just showing the XForm made it a mystery which of the
  (potentially many) XPath expressions failed to match.
- Add to existing warning for missing translations where the affected
  columns are constraint_message or required_message
- Due to a difference in the internally used name and supported external
  aliases, the TRANSLATABLE_SURVEY_COLUMNS object is now a dict and
  moved to aliases.py so it is closer to the existing aliases dicts to
  help keep them in sync.
- debug=True causes the XForm and other test info to be printed out but
  this shouldn't be present on committed tests.
- If warnings__* parameters are misspelled then they don't do anything.
- checks columns shown in aliases.py
- check done alongside survey check so there's just warning message
  about missing translations
- add choices warning test cases for same set of scenarios as for survey
- update existing survey tests to use warnings__contains or not_contains
  instead of passing in list and doing assertIn / assertNotIn
- the three related functions for this check are parts of a validation,
  and they otherwise clutter up xls2json.py a bit.
- may make sense to move other xls2json checks into the same package
  so it's clearer what validations pyxform does.
- combination of existing tests with same name pattern but in one test
  case that has both warning and no-warning assertions.
@lindsay-stevens lindsay-stevens marked this pull request as ready for review November 30, 2021 10:32
@KeynesYouDigIt
Copy link
Contributor

Pretty clean approach! I am still worried that _add_empty_translations in survey.py must be doing essentially the same work, but this leaves the separation of concerns much cleaner (maybe _add_empty_translations could be done here somehow and removed from survey.py?)

The added testing is pretty helpful!

- to check whether the missing translation check incurs a reasonable
  time cost, considering that it traverses the full XLSForm dict
  available in xls2json rather than earlier in the pipeline to just
  check sheet headers. It seems there is very little difference for a
  giant form with 500 questions and 2 choices per question.
# Conflicts:
#	pyxform/xls2json.py
@lindsay-stevens
Copy link
Contributor Author

Hi @KeynesYouDigIt thanks for your review! Please let me know if any changes or further clarification are needed.

@KeynesYouDigIt
Copy link
Contributor

@lindsay-stevens no im good, this all seems fine! I'm afraid I don't have official review permissions so someone else will have to do the official review.

@lindsay-stevens
Copy link
Contributor Author

Hi @lognaturel would you be able to review?

@lognaturel
Copy link
Contributor

Will do!

I believe dashes are filled in for labels because Validate requires a label for every language.

@lindsay-stevens
Copy link
Contributor Author

Hi @lognaturel just a reminder about this - no worries if it's til the new year. Thanks!

@lindsay-stevens
Copy link
Contributor Author

Hi @lognaturel happy new year! Just a reminder about this PR ready for review.

Copy link
Contributor

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As seems to be the case for things you tackle, @lindsay-stevens, there's some mind-bending stuff here! I'm happy with this and we can always come back if we want to warn more aggressively as I mention. I'll leave this as a comment for now in case any of my inline thoughts spark any ideas, particularly for another way to arrange the warning.

pyxform/xls2json.py Outdated Show resolved Hide resolved
tests/test_translations.py Outdated Show resolved Hide resolved
tests/test_translations.py Outdated Show resolved Hide resolved
tests/test_translations.py Outdated Show resolved Hide resolved
- missing translations check can still work while only looking at first
  row, which is a little faster and still passes the tests.
- while profiling, found an inefficient unique check in section.py. The
  benefit is only apparent with forms >1000 questions, but it is
  significant: q=5000 28s vs. 11s, q=10000 103s vs. 21s.
@lindsay-stevens
Copy link
Contributor Author

Thanks for reviewing @lognaturel! I've replied to each comment, please let me know if any changes are needed at this stage (or via separate tickets).

- default_language setting shouldn't influence the warning since the
  underlying issue is missing languages, and the setting just happens
  to be a way around one of the problems that causes.
- updated the warning format to be one line per lang/sheet, with column
  names listed in the line.
- includes corresponding test changes for the above, and re-configures
  how the check func processes data to better match the warning format.
@lindsay-stevens
Copy link
Contributor Author

@lognaturel thanks for the feedback / discussion above. The outstanding items should all now be addressed, either by new commits or new tickets for follow up.

Copy link
Contributor

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a journey!

tests/test_translations.py Show resolved Hide resolved
tests/test_translations.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sometimes users don't apply consistent language tags across all translatable columns
4 participants