Multiple sibling remarks in the same language cause translation issues #1872

Closed
martindholmes opened this issue Apr 8, 2019 · 12 comments

Comments

@martindholmes
Contributor

martindholmes commented Apr 8, 2019

@sydb and I have been looking at a few cases where there are multiple <remarks> elements in English in spec files. There are nine cases of this:

  • att.editLike.xml
  • att.measurement.xml
  • dataRef.xml
  • rendition.xml
  • expan.xml
  • path.xml
  • locus.xml
  • list.xml
  • att.datable.iso.xml

This causes problems for translation because there is no direct connection between each English <remarks> and its other language equivalent, so it's not easy to tell when a given translation is out of date.

It is also apparent that the decision on whether to create multiple remarks versus having multiple paragraphs in a single <remarks> is not particularly consistent.

There are two possible solutions to this:

  1. Link each English <remarks> to its other-language equivalents using @xml:id and @corresp.

  2. Have a rule that only one <remarks> element for each language is allowed, and use multiple paragraphs inside the single <remarks> to distinguish different topics (which is already done in many cases).
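
To make option 1 concrete, here is a minimal Python sketch of how the linking could be checked, assuming each English <remarks> carries an @xml:id and each translation points at it with @corresp. The fragment, the ids (demo-rem-1 and so on), and the function name are invented for illustration and are not taken from any P5 spec file.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

def unlinked_translations(spec, source_lang="en"):
    """Return the languages of <remarks> whose @corresp hits no source @xml:id."""
    remarks = spec.findall(TEI + "remarks")
    source_ids = {r.get(XML_NS + "id") for r in remarks
                  if r.get(XML_NS + "lang") == source_lang}
    unlinked = []
    for r in remarks:
        if r.get(XML_NS + "lang") == source_lang:
            continue
        # @corresp may hold several space-separated pointers such as "#some-id".
        targets = [t.lstrip("#") for t in r.get("corresp", "").split()]
        if not any(t in source_ids for t in targets):
            unlinked.append(r.get(XML_NS + "lang"))
    return unlinked

# Hypothetical fragment: the Japanese <remarks> points at an id that does not exist.
SAMPLE_LINKED = """
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="demo">
  <remarks xml:lang="en" xml:id="demo-rem-1"><p>First topic.</p></remarks>
  <remarks xml:lang="en" xml:id="demo-rem-2"><p>Second topic.</p></remarks>
  <remarks xml:lang="fr" corresp="#demo-rem-1"><p>Premier sujet.</p></remarks>
  <remarks xml:lang="ja" corresp="#demo-rem-9"><p>Placeholder.</p></remarks>
</elementSpec>
"""

print(unlinked_translations(ET.fromstring(SAMPLE_LINKED)))  # prints ['ja']
```

Note that this only confirms the pointers resolve; it says nothing about whether a linked translation is still current.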

@martindholmes
Contributor Author

martindholmes commented Apr 8, 2019

XPath to find instances:

(: Returns the last English <remarks> in each run of same-language siblings. :)
for $r in //remarks return
if ($r/preceding-sibling::remarks[@xml:lang = $r/@xml:lang] and
not($r/following-sibling::remarks/@xml:lang = $r/@xml:lang) and
$r/@xml:lang = 'en')
then $r else ()
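
For anyone without an XPath 2.0 processor to hand, roughly the same check can be sketched in Python. This is an approximation rather than a port: it reports the parent element instead of the last <remarks>, and the sample fragment and function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def repeated_remarks(root, lang="en"):
    """Return every element holding more than one <remarks> child in `lang`."""
    hits = []
    for parent in root.iter():
        count = sum(1 for child in parent.findall(TEI + "remarks")
                    if child.get(XML_LANG) == lang)
        if count > 1:
            hits.append(parent)
    return hits

# Made-up spec fragment, not copied from any real P5 file.
SAMPLE = """
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="demo">
  <remarks xml:lang="en"><p>First topic.</p></remarks>
  <remarks xml:lang="en"><p>Second topic.</p></remarks>
  <remarks xml:lang="fr"><p>Premier sujet.</p></remarks>
</elementSpec>
"""

offenders = repeated_remarks(ET.fromstring(SAMPLE))
print([el.get("ident") for el in offenders])  # prints ['demo']
```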

@martinascholger
Member

I'm in favour of solution 2.

@martindholmes
Contributor Author

martindholmes commented Apr 8, 2019

@martinascholger I agree with you. @sydb favours number 1.

@duncdrum
Contributor

duncdrum commented Apr 8, 2019

I don't see how having multiple paragraphs inside a single <remarks> would solve the problem of correspondence between paragraph 3 of 7 in English and its corresponding paragraph x of 3 in French.
Solution 1 seems cleaner.

@martindholmes
Contributor Author

@duncdrum The sequence of paragraphs is fixed, so the first para in the English should match the first para in the translation. If there are different numbers of paragraphs, then the translation is out of date or wrong and should be fixed. In the case of separate <remarks> elements, the order is not constrained, and in practice they end up all over the place, so the Japanese translation for the first English <remarks> may end up after the translation for the second one; that problem is not solved by the linking approach, whereas it is solved by constraining to a single <remarks> with paragraphs.
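
The paragraph-count argument above can be sketched as a small Python check, assuming option 2 is in force (one <remarks> per language). It only flags translations whose number of <p> children differs from the English; equal counts do not prove a translation is current. The fragment and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def stale_translations(spec, source_lang="en"):
    """List languages whose <remarks> has a different <p> count from the source."""
    counts = {}
    for remarks in spec.findall(TEI + "remarks"):
        lang = remarks.get(XML_LANG)
        if lang is None:
            continue  # no language declared, nothing to compare
        counts[lang] = len(remarks.findall(TEI + "p"))
    expected = counts.get(source_lang)
    return sorted(lang for lang, n in counts.items()
                  if lang != source_lang and n != expected)

# Invented fragment: the Japanese remarks is one paragraph short.
SAMPLE_PARAS = """
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="demo">
  <remarks xml:lang="en"><p>One.</p><p>Two.</p></remarks>
  <remarks xml:lang="fr"><p>Un.</p><p>Deux.</p></remarks>
  <remarks xml:lang="ja"><p>Placeholder.</p></remarks>
</elementSpec>
"""

print(stale_translations(ET.fromstring(SAMPLE_PARAS)))  # prints ['ja']
```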

@duncdrum
Contributor

duncdrum commented Apr 9, 2019

@martindholmes why does sequence matter if we have a pair of @xml:id and @corresp?

@martindholmes
Contributor Author

@duncdrum In some cases where there are multiple English <remarks>, it looks as though they follow a logical reading sequence (and could perfectly well have been encoded as paragraphs in a single <remarks>). I assume that when that is the case, the same reading order ought to be maintained in the translations.

It's going to be a bit tedious to add and maintain all the ids and pointers for the majority of cases when the problem only exists for nine files, so I prefer the simple option mainly for that reason. If we also decide to encourage the use of multiple individual <remarks>, we'll also have to examine all the cases of multi-paragraph remarks to see whether they really should be split out into separate elements; and we'll have to clearly articulate what it means to have two <remarks> elements as opposed to having two paragraphs in a single <remarks> element. I don't believe that any of the existing multi-remarks cases is fundamentally different from other cases where there are multiple paragraphs in the same element, so I don't believe they really need to exist, and life would be simpler if only one were allowed.

But please have a look at the specific cases and see if you can make a clear argument for how they differ from multi-paragraph <remarks>.

@lb42
Member

lb42 commented Aug 8, 2019

Coming a bit late to the party, but solution #2 gets my vote too. The extra baggage of trying to align translation equivalences is just not worth the effort. It's not done anywhere else, so why gild this particular lily?

@ebeshero ebeshero assigned martindholmes and unassigned raffazizzi Sep 15, 2019
@martindholmes
Contributor Author

F2F Graz agrees that there should be one <remarks> per language, with paragraphs inside it. @martindholmes to implement for these few cases, then add a Schematron rule to enforce it.

@martindholmes
Contributor Author

martindholmes commented Sep 15, 2019

This got green for go in the meeting; the commits above, ending with #5d776c0ec, should have completed the collapsing of the remarks. Thinking of the Schematron to catch this, though, I'm pretty sure we only want to apply it to the P5 specs, not to everyone else's ODDs, so I need to check with Council about the best way to write Schematron that applies only to our spec files. Setting this back to Needs Discussion.
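
For illustration only, the kind of collapsing described here can be sketched in Python. This is not how the commits above were produced; it is a hypothetical helper that moves the children of later same-language <remarks> into the first one, run on an invented fragment.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def collapse_remarks(spec):
    """Merge sibling <remarks> that share @xml:lang into the first of each language."""
    first_by_lang = {}
    for remarks in spec.findall(TEI + "remarks"):
        keeper = first_by_lang.setdefault(remarks.get(XML_LANG), remarks)
        if keeper is not remarks:
            keeper.extend(list(remarks))  # append the later paragraphs in order
            spec.remove(remarks)          # drop the now-redundant element
    return spec

# Invented fragment with two English <remarks> and one French.
SAMPLE_SPLIT = """
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="demo">
  <remarks xml:lang="en"><p>First topic.</p></remarks>
  <remarks xml:lang="en"><p>Second topic.</p></remarks>
  <remarks xml:lang="fr"><p>Premier sujet.</p></remarks>
</elementSpec>
"""

spec = collapse_remarks(ET.fromstring(SAMPLE_SPLIT))
english = spec.find(TEI + "remarks")
print([p.text for p in english.findall(TEI + "p")])
# prints ['First topic.', 'Second topic.']
```

A real pass over the spec files would also need a human to check that the merged paragraphs read sensibly in sequence, and that the translations end up in the same order.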

@martindholmes
Contributor Author

Per @sydb, added Schematron to p5odds.odd in commit #54f4d2278. Once the build has run and the rule has been tested, the ticket can be closed.

@peterstadler peterstadler added this to the Guidelines 3.7.0 milestone Sep 16, 2019
@martindholmes
Contributor Author

Tested and working.
