docx creates nested runs (<w:r><w:t><w:r><w:t>), which are then invisible in the opened document #19
Comments
Strangely, the text isn't split in the input to the pipeline. The split point is <w:t xml:space="preserve">». De benytter et webview, i praksis en nettleser som er bygget inn i Office-programmene, for å vise innholdet sitt og utføre oppgavene sine. </w:t> Here's what it looks like for the first step of the pipeline:
Then right after the full pipeline, we have wordbound tags galore:
The second-to-last step, before postgenerator, looks like:
So is the issue here that postgenerator should not be creating these nested word blanks, or that transfuse should somehow know how to deal with nested word blanks?
@mr-martian does your apertium/lttoolbox#144 avoid nested word blanks in postgen?
The pipe may not yield nested structures, nor will Transfuse give it nested structures, so that looks like a bug in postgen.
in.docx
With transfuse, we get this bit:
which Word (and LibreOffice) don't show when opening the document; presumably nested runs aren't allowed in OOXML.
(Note: If I first save in.docx from LibreOffice, transfuse can handle it fine, because LO merges all the runs in the input paragraph on saving (removing the proofErr stuff).)
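For anyone wanting to check an output file for this symptom, here is a minimal sketch that counts <w:r> elements nested inside another <w:r>. It is not part of transfuse; the function names count_nested_runs and count_nested_runs_in_docx and the sample strings are made up for illustration, and it assumes the main document part lives at word/document.xml, as is usual for .docx files.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w:" prefix in document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
RUN = f"{{{W_NS}}}r"


def count_nested_runs(xml_text):
    """Count <w:r> elements that have another <w:r> as an ancestor."""
    root = ET.fromstring(xml_text)
    nested = 0

    def walk(elem, inside_run):
        nonlocal nested
        is_run = elem.tag == RUN
        if is_run and inside_run:
            nested += 1
        for child in elem:
            walk(child, inside_run or is_run)

    walk(root, False)
    return nested


def count_nested_runs_in_docx(path):
    """Hypothetical helper: run the same check on a .docx file on disk."""
    with zipfile.ZipFile(path) as docx:
        return count_nested_runs(docx.read("word/document.xml"))


# Hand-written samples mimicking the shape from the issue title
SAMPLE_NESTED = (
    f'<w:p xmlns:w="{W_NS}">'
    "<w:r><w:t>outer <w:r><w:t>inner</w:t></w:r></w:t></w:r>"
    "</w:p>"
)
SAMPLE_FLAT = (
    f'<w:p xmlns:w="{W_NS}">'
    "<w:r><w:t>outer </w:t></w:r><w:r><w:t>inner</w:t></w:r>"
    "</w:p>"
)

if __name__ == "__main__":
    print(count_nested_runs(SAMPLE_NESTED))
    print(count_nested_runs(SAMPLE_FLAT))
```

A non-zero count on the transfuse output, with zero on in.docx, would point at the pipeline rather than the input as the source of the nesting.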