invalid characters break insert tag replacement #349

Closed
fritzmg opened this issue Feb 18, 2019 · 2 comments
@fritzmg
Contributor

fritzmg commented Feb 18, 2019

Affected version(s)
4.4.34, probably 4.7.0 too

Description
If the preg_split call used for insert tag replacement

$tags = preg_split('~{{([\pL\pN][^{}]*)}}~u', $strBuffer, -1, PREG_SPLIT_DELIM_CAPTURE);

is given a $strBuffer that contains an invalid UTF-8 byte sequence anywhere, no insert tag is matched, so potentially none of the insert tags on the page are replaced. Instead, they are output verbatim.

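This is caused by the u flag, which was introduced in 70a1403. With that flag, PCRE treats the subject as UTF-8, and the whole call fails if the subject contains invalid UTF-8.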

@leofeyer is there a specific reason for adding this flag? Otherwise it can simply be removed to fix the issue.
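
The failure described above can be reproduced in isolation. The following is a minimal sketch, not the Contao code path itself: the buffer content and the tag names in it are made up for illustration, and only the pattern and flags are taken from the call quoted in this issue.

```php
<?php
// Hypothetical buffer: two insert tags plus a stray byte (0xC3) that is not valid UTF-8 here.
$strBuffer = "<p>{{date::Y}} erschlie\xC3</p><p>{{env::host}}</p>";

// Same pattern and flags as the call quoted in the issue, including the u flag.
$tags = preg_split('~{{([\pL\pN][^{}]*)}}~u', $strBuffer, -1, PREG_SPLIT_DELIM_CAPTURE);

// The call fails as a whole, so none of the tags are found.
var_dump($tags);                                        // bool(false)
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);    // bool(true)
```

Because the return value is false rather than an array, code that iterates over the result finds nothing to replace, which matches the symptom of all insert tags being shown as they are.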

How to reproduce

This is a little more difficult. The test case involved a site whose database contained characters that were not encoded as UTF-8. This in itself is not a problem. The issue shows up on the search result page of the search module: the module cut off a word in the middle of a non-UTF-8 character, which produced invalid output. So if the original search result context contained the word erschlieüen (erschließen with the ß encoded in a different character set), the search module clipped the content in the middle of that character, resulting in

<p class="context">... erschlieÃ</p>

/cc @ausi
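
How a byte-based cut can leave half a character behind can be sketched as follows. This assumes, for illustration only, a valid UTF-8 input and PHP's substr(); the actual search module code and the actual database encoding from the report are not shown in this thread.

```php
<?php
// Assumption: the source text is valid UTF-8, where ß is the two bytes 0xC3 0x9F.
$context = "erschlie\xC3\x9Fen";

// Byte-based clipping after 9 bytes keeps 0xC3 but drops 0x9F.
$clipped = substr($context, 0, 9);

// An empty pattern with the u flag works as a UTF-8 validity check:
// preg_match() returns 1 for valid input and false for invalid input.
var_dump(preg_match('//u', $context) === 1);   // bool(true)
var_dump(preg_match('//u', $clipped));         // bool(false)
var_dump(bin2hex(substr($clipped, -1)));       // string(2) "c3"
```

The trailing 0xC3 byte is exactly the kind of incomplete sequence that makes a later u-flagged pattern fail on the whole page buffer.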

@ausi
Member

ausi commented Feb 19, 2019

If we want to support Unicode characters at the beginning of insert tags (as we currently do), the regex should be /{{([a-zA-Z0-9\x80-\xFF][^{}]*)}}/ IMO. Otherwise, we could use /{{([a-zA-Z0-9][^{}]*)}}/
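
The difference between the two suggested patterns can be sketched on a small buffer. This is an illustration under assumptions: the buffer and the tag names (including the one starting with ü) are hypothetical, and the snippet says nothing about what commit de82452 actually changed.

```php
<?php
// Hypothetical buffer: an ASCII-led tag, a stray 0xC3 byte, and a tag starting with ü (0xC3 0xBC).
$strBuffer = "{{date::Y}} erschlie\xC3 {{\xC3\xBCber::x}}";

// Both patterns from the comment above, without the u flag (byte mode).
$byteRange = '/{{([a-zA-Z0-9\x80-\xFF][^{}]*)}}/';
$asciiOnly = '/{{([a-zA-Z0-9][^{}]*)}}/';

$a = preg_split($byteRange, $strBuffer, -1, PREG_SPLIT_DELIM_CAPTURE);
$b = preg_split($asciiOnly, $strBuffer, -1, PREG_SPLIT_DELIM_CAPTURE);

// Byte-range pattern: both tags are captured despite the invalid byte.
var_dump($a[1]);                         // string(7) "date::Y"
var_dump($a[3] === "\xC3\xBCber::x");    // bool(true)

// ASCII-only pattern: only the ASCII-led tag is captured.
var_dump(count($b));                     // int(3)
var_dump($b[1]);                         // string(7) "date::Y"
```

Neither pattern validates the subject as UTF-8, so a broken byte elsewhere in the buffer no longer prevents the remaining tags from being matched.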

leofeyer added the bug label Feb 19, 2019
leofeyer added this to the 4.4.35 milestone Feb 19, 2019
leofeyer self-assigned this Feb 19, 2019
@leofeyer
Member

Fixed in de82452. Thank you @ausi.

leofeyer modified the milestones: 4.4.35, 4.4 May 14, 2019
github-actions bot locked as resolved and limited conversation to collaborators Mar 18, 2024