Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TemplateProcessor::fixBrokenMacros to handle whitespaces #1971

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

chapa
Copy link

@chapa chapa commented Nov 20, 2020

Description

This is a proposal to fix #590 in a cleaner way than a global regex on the whole document, which might change parts of the document that must not change.

Edit: see #1971 (comment)

TemplateProcessor::fixBrokenMacros() is a method that strip all tags between ${, the name of the variable and } (let's call them the 3 parts). If there is a <w:t xml:space="preserve"> in those tags it means that there is a trailing space at the end of the content of the tag, because there is not supposed to be any space between the 3 parts, so the whole must be inside of it in order to keep this trailing space.

Examples:

<w:r><w:t>before ${</w:t></w:r><w:r><w:t xml:space="preserve">variable} </w:t></w:r><w:r><w:t>after</w:t></w:r>
                                                                       |
                                                              the trailing space

<w:r><w:t>before ${</w:t></w:r><w:r><w:t>variable</w:t></w:r><w:r><w:t xml:space="preserve">} </w:t></w:r><w:r><w:t>after</w:t></w:r>
                                                                                             |
                                                                                    the trailing space

I think it's safe to move the opening tag just before the variable, but one thing I'm not sure about because I don't know Open XML very well, is whether it's safe to prepend the variable by </w:t></w:r><w:r><w:t xml:space="preserve">. I think it's ok if a <w:t> is necessarily in a <w:r> and there cannot be multiple <w:t> in a <w:r> (otherwise we could just remove the </w:r><w:r> part).

Expected result of the above examples:

<w:r><w:t>before </w:t></w:r><w:r><w:t xml:space="preserve">${variable} </w:t></w:r><w:r><w:t>after</w:t></w:r>

Checklist:

  • I have run composer run-script check --timeout=0 and no errors were reported
  • The new code is covered by unit tests (check build/coverage for coverage report)
  • I have updated the documentation to describe the changes
    => it's a bug fix

@coveralls
Copy link

coveralls commented Feb 18, 2021

Coverage Status

coverage: 95.538% (+0.006%) from 95.532%
when pulling 838b63e on chapa:develop
into e76b701 on PHPOffice:master.

@4rthem
Copy link

4rthem commented Nov 15, 2021

Thank you for the fix!
Could it be merged?

@chapa
Copy link
Author

chapa commented Nov 16, 2021

Actually I found a case where my fix isn't working: when there is styling tags between the $ and the } of a single variable. Because all tags are stripped we lose styling information.

After several tests, I came to the conclusion that the best way to fix those broken macros without breaking anything else was to move all existing tags inside the variable right after it, for example:

<w:r><w:t>${</w:t></w:r><w:r><w:t>variable1</w:t></w:r><w:r><w:t>}</w:t></w:r>
<!--        └─────────1──────────┘         └─────────2──────────┘          -->

should be transformed to:

<w:r><w:t>${variable1}</w:t></w:r><w:r><w:t></w:t></w:r><w:r><w:t></w:t></w:r>
<!--                  └─────────1──────────┘└─────────2──────────┘         -->

That way we just reassemble all parts of the macro together.

But we also need to add xml:space="preserve" to all <w:t> we move, to handle this situation:

<w:r><w:t>${variable1</w:t></w:r><w:r><w:t>} ${variable2</w:t></w:r><w:r><w:t>}</w:t></w:r>

that would be transformed to:

<w:r><w:t>${variable1}</w:t></w:r><w:r><w:t xml:space="preserve"> ${variable2}</w:t></w:r><w:r><w:t xml:space="preserve"></w:t></w:r>
<!--                                        └──────────────────┘ │                                  └──────────────────┘          -->
<!--                                            we need this     └─ to preserve this space        (this one does not hurt)        -->

I updated the fix to handle this

@will2877
Copy link

Will there be any movement on this PR in the near future?
I'm experiencing isses with missing whitespaces all over my documents.

@chapa
Copy link
Author

chapa commented Jan 21, 2022

@PowerKiKi PowerKiKi changed the base branch from develop to master November 16, 2022 21:15
@Progi1984 Progi1984 force-pushed the master branch 3 times, most recently from 2d9f999 to e458249 Compare August 30, 2023 11:56
Update main regexp so that the full macro is matched (only `{{var}` was matched instead of `{{var}}`, which is a problem because we want to move tags after the whole macro)
@chapa
Copy link
Author

chapa commented Dec 1, 2023

I rebased the branch to be up to date with master, this needed some adaptations to handle custom macro chars that wasn't overridable at the time the PR was made.

@Progi1984 can you take a look ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

PHPWord randomly deletes whitespaces
5 participants