Translation checks only run on columns used by first data row #637

lognaturel · 2023-01-17T22:42:20Z

While taking a look at #601, I noticed the use of sheet_data[0] at

https://github.com/XLSForm/pyxform/blob/master/pyxform/validators/pyxform/missing_translations_check.py#L90

This means translation checks are only run on the first row of a form. Unfortunately, that will not always use all translation columns. For example, the following test fails because no warning is produced:

    def test_translation_checks__work_when_first_row_columns_not_all_populated(self):
        md = """
        | survey |          |          |            |            |
        |        | type     | name     | label::eng | hint       |
        |        | deviceid | deviceid |            |            |
        |        | note     | n1       | hello      | salutation |
        """
        warning = format_missing_translations_msg(
            _in={SURVEY: {"eng": ["hint"], "default": ["label"]}}
        )
        self.assertPyxformXform(
            name="test",
            md=md,
            warnings__contains=[warning],
        )

If the deviceid row of the form is omitted, the check works as intended.

I'm not entirely sure what to do about this. I think the ideal would be to check the actual column headers. I believe that would mean adding header validation to xlsx_to_dict/xls_to_dict/csv_to_dict or passing on header information from there. Those currently don't know about the different expected sheets (which seems like a good thing).

Another option would be to do the translation check while processing each row. That might not be great performance-wise but I'm not sure.

The text was updated successfully, but these errors were encountered:

lindsay-stevens · 2023-01-19T19:48:07Z

translation checks are only run on the first row of a form

There was a performance concern (link) that lead to that. In that thread I noted having checked validating all items and it took "+/- a few ms at 5k questions".

check the actual column headers

I think I looked at getting the headers into workbook_to_json and it was one of those things that seems simple initially but kinda isn't (maybe in the context of the rest of #157).

lognaturel · 2023-01-19T21:14:32Z

Thanks, I had forgotten that conversation and that change.

it took "+/- a few ms at 5k questions".

That seems acceptable after all? Right now I expect most forms are missing out on this check. It's really common to get deviceid, starttime, etc at the start of a form and/or to start with a group or repeat. An alternative might be to check the first ~15 rows or something.

My current preference is to go back to checking every row. Maybe we could add the check to the main loop over all rows? I'm not entirely sure that would make a difference, even 5k is not a big number of list entries to loop over.

I think I looked at getting the headers into workbook_to_json and it was one of those things that seems simple initially but kinda isn't

I agree, that was what I set out to do with #601 and it would really require a significant restructuring.

lindsay-stevens · 2024-01-10T10:20:05Z

Since #666 the translations check uses the column headers, which addresses this bug. If it's desirable to warn users about any missing translation in any row, I think that's a different feature for a new ticket.

lognaturel added bug needs discussion labels Jan 17, 2023

lognaturel mentioned this issue Aug 29, 2023

Detection of multilanguage choices only works if first choice list is multilanguage #653

Closed

lognaturel added this to the Next milestone Sep 15, 2023

lindsay-stevens closed this as completed Jan 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Translation checks only run on columns used by first data row #637

Translation checks only run on columns used by first data row #637

lognaturel commented Jan 17, 2023

lindsay-stevens commented Jan 19, 2023

lognaturel commented Jan 19, 2023

lindsay-stevens commented Jan 10, 2024

Translation checks only run on columns used by first data row #637

Translation checks only run on columns used by first data row #637

Comments

lognaturel commented Jan 17, 2023

lindsay-stevens commented Jan 19, 2023

lognaturel commented Jan 19, 2023

lindsay-stevens commented Jan 10, 2024