text_summary() should output valid HTML and Unicode text #308
Comments
This is one of those things in Drupal that always left me scratching my head, and I agree that something should be done about it. I'm glad to see that others in the Drupal community thought the same. There are two major patches in that d.o issue. The first (#76) is a simple solution that does a far better job than the current code, but it doesn't handle many corner cases, such as international sentence-ending punctuation. Still, it's simple and it works better. The second (#169) has serious performance issues because it builds a DOM to parse out the body text, and it tries to cover more international punctuation. The tests also fail because the newlines ( I have branches where I've massaged both patches into Backdrop. I'm leaning towards committing #76 before diving down the rabbit hole that is #169. Thoughts? |
And by "committing" I mean "pushing up to my fork and generating a pull request." |
I can reproduce this:
$text = "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis finibus posuere purus, vitae condimentum elit cursus non. Phasellus diam sapien, maximus sit amet ipsum eget, interdum bibendum elit. Sed euismod, neque in elementum cursus, ante velit iaculis leo, a consequat quam dui non augue. Donec dapibus eleifend elementum. Duis scelerisque, nibh et blandit pharetra, libero massa viverra est, at bibendum mauris ante ornare odio. Nulla facilisi. Curabitur at est vestibulum, auctor nibh sit amet, fringilla justo. Donec aliquet aliquam lorem, in facilisis quam posuere vitae. Phasellus vestibulum turpis at mi imperdiet varius.</p>";
dpm(text_summary($text));
This produces:
(i.e. no closing But
// If the htmlcorrector filter is present, apply it to the generated summary.
if (isset($format->filters['filter_htmlcorrector'])) {
  $summary = _filter_htmlcorrector($summary);
}
So I think the easiest solution is to just run Here's a PR: backdrop/backdrop#3072 |
@BWPanda do you know if making a view of teasers is a good way to test? Not sure if Views is using the same function or has its own. |
@herbdool I haven't tested or looked through the code, but this comment seems to suggest that Views has its own trimming function, so it might not work for testing this... |
A view should work as long as you choose the If you choose |
I'm now wondering about this... If you have a text format where HTML corrector is turned off, it still produces proper HTML since The only situation I can think of where this wouldn't be set at all, would be if you used the 'Plain text' processing option for the Body field. But in that case the HTML doesn't need to be valid because it's just escaped and output as actual text... So are there any other scenarios where this would be an issue? I.e. how to reproduce the original issue (on an actual Backdrop site setup, not just calling the individual functions like I originally did)? @jenlampton? |
That doesn't sound right. If somewhere along the way we have intentionally made that change, then we should remove that filter from the text format options, so it doesn't show up in the UI. Either that, or then treat this as a bug and fix things, so that if the filter is disabled, it doesn't run. |
@klonos It's not that the text filter itself is run, it's just that it's used to clean up the code if it's present:
// If the htmlcorrector filter is present, apply it to the generated summary.
if (isset($format->filters['filter_htmlcorrector'])) {
  $summary = _filter_htmlcorrector($summary);
}
The |
I agree. If something can be disabled via the UI, then we should adhere to that setting, even if that means that markup is chopped off. If it should be impossible to disable it (we want the htmlcorrector to always run), then it shouldn't be available for deactivation via the UI. |
It's run on the summary because the summary is automatically chopped at a specific number of characters, and this often results in broken HTML. Is it also run on the full body? If so, that might be an issue.
|
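To illustrate the point above about chopping at a fixed character count, here is a minimal sketch in plain PHP. It is not Backdrop's code: `sketch_balance_html()` is a hypothetical helper that round-trips a fragment through `DOMDocument`, and the sample text and the 50-character cut are assumptions chosen so that the cut lands inside a `<strong>` element.

```php
<?php
/**
 * Hypothetical helper for illustration only; this is NOT Backdrop's
 * _filter_htmlcorrector(). It rebalances tags by letting libxml parse
 * the fragment and then serializing the parsed body back to a string.
 */
function sketch_balance_html(string $fragment): string {
  $doc = new DOMDocument();
  // Collect libxml warnings about malformed markup instead of emitting them.
  $previous = libxml_use_internal_errors(TRUE);
  // Wrap the fragment in a full document and declare UTF-8 explicitly.
  $doc->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>' . $fragment . '</body></html>');
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  // Serialize only the children of <body>, dropping the wrapper.
  $body = $doc->getElementsByTagName('body')->item(0);
  $output = '';
  foreach ($body->childNodes as $child) {
    $output .= $doc->saveHTML($child);
  }
  return $output;
}

// Assumed sample text; the cut falls inside the <strong> element.
$text = '<p>Lorem ipsum dolor sit amet, <strong>consectetur adipiscing</strong> elit.</p>';
$chopped = substr($text, 0, 50);

// $chopped has unclosed <p> and <strong> tags.
echo $chopped, PHP_EOL;
// The round-trip should append the missing closing tags.
echo sketch_balance_html($chopped), PHP_EOL;
```

Note that a byte-based `substr()` is only safe here because the sample is pure ASCII; cutting multibyte text this way can split a character, which is the "valid Unicode" half of this issue's title.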
Tested this with a page using the Raw HTML format (with "Correct faulty and chopped off HTML" turned off). Here was my page body text, ending in an unclosed
Here is the HTML as rendered:
So it looks like the HTML was corrected anyhow, at least according to browser dev tools. (But it's not being done by I put this page into a view, showing the body trimmed to 100 characters, and here's the HTML as rendered:
The HTML is properly corrected—the trimmed-off So it looks like this issue—fixing trimmed summaries—is being properly addressed. |
Ah, with the correction filter turned off, Safari Developer Tools still balances the tags. But putting a breakpoint in Anyhow, to confirm, after the patch:
Ergo, WFM. Code reviewed, LGTM. |
By @BWPanda, @jenlampton, @indigoxela, @klonos, @jlfranklin, and @bugfolder.
Thanks @bugfolder for pushing this one forward! I read over this issue and the PR makes sense to me. A simple fix for an edge-case situation. Thanks @BWPanda for the long-ago PR. I'm glad we got one more ancient issue closed! This was opened before Backdrop 1.0 was even released! Merged backdrop/backdrop#3072 into 1.x and 1.26.x. backdrop/backdrop@b699b85 by @BWPanda, @jenlampton, @indigoxela, @klonos, @jlfranklin, and @bugfolder. |
The current state of the system often generates invalid HTML, such as summaries that end in the middle of a closing HTML tag.
The solution should ensure that auto-generated summaries contain valid HTML, and alter the tests to match. Link on d.o