Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

text_summary() should output valid HTML and Unicode text #308

Closed
jenlampton opened this issue Aug 22, 2014 · 14 comments · Fixed by backdrop/backdrop#3072
Closed

text_summary() should output valid HTML and Unicode text #308

jenlampton opened this issue Aug 22, 2014 · 14 comments · Fixed by backdrop/backdrop#3072

Comments

@jenlampton
Copy link
Member

The current state of the system often generates invalid HTML, such as summaries that end in the middle of a closing HTML tag.

The solution should ensure that auto-generated summaries contain valid HTML, and alters the tests to match. Link on d.o

@sentaidigital
Copy link

sentaidigital commented May 14, 2016

This is one of those things in Drupal that always left me scratching my head, and I agree that something should be done about it. I'm glad to see that others in the Drupal community thought the same.

There are two major patches in that d.o issue. The first (#76) is a simple solution that does a far better job than the current state, but doesn't handle a lot of corner cases or international sentence end punctuation or other stuff. But it's simple and works better.

The second one (#169) has serious performance issues as it builds a DOM to parse out the body text and tries to cover move international punctuation and stuff. The tests also fail because the newlines (\n) that should sometimes be removed from test cases are not.

I have branches where I've massaged both patches into Backdrop. I'm leaning towards committing #76 before diving down the rabbit hole that is #169.

Thoughts?

@sentaidigital
Copy link

And by "committing" I mean "pushing up to my fork and generating a pull request."

@ghost
Copy link

ghost commented Feb 13, 2020

I can reproduce this:

$text = "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis finibus posuere purus, vitae condimentum elit cursus non. Phasellus diam sapien, maximus sit amet ipsum eget, interdum bibendum elit. Sed euismod, neque in elementum cursus, ante velit iaculis leo, a consequat quam dui non augue. Donec dapibus eleifend elementum. Duis scelerisque, nibh et blandit pharetra, libero massa viverra est, at bibendum mauris ante ornare odio. Nulla facilisi. Curabitur at est vestibulum, auctor nibh sit amet, fringilla justo. Donec aliquet aliquam lorem, in facilisis quam posuere vitae. Phasellus vestibulum turpis at mi imperdiet varius.</p>";
dpm(text_summary($text));

This produces:

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis finibus posuere purus, vitae condimentum elit cursus non. Phasellus diam sapien, maximus sit amet ipsum eget, interdum bibendum elit. Sed euismod, neque in elementum cursus, ante velit iaculis leo, a consequat quam dui non augue. Donec dapibus eleifend elementum. Duis scelerisque, nibh et blandit pharetra, libero massa viverra est, at bibendum mauris ante ornare odio. Nulla facilisi. Curabitur at est vestibulum, auctor nibh sit amet, fringilla justo. Donec aliquet aliquam lorem, in facilisis quam posuere vitae.

(i.e. no closing <p> tag)

But dpm(text_summary($text, 'full_html')); works (adds the closing <p> tag).
So the issue seems to only happen when a text format is not provided to text_summary() (the default is NULL). The code confirms this:

// If the htmlcorrector filter is present, apply it to the generated summary.
if (isset($format->filters['filter_htmlcorrector'])) {
  $summary = _filter_htmlcorrector($summary);
}

So I think the easiest solution is to just run _filter_htmlcorrector() regardless (if that function's available).

Here's a PR: backdrop/backdrop#3072

@herbdool
Copy link

herbdool commented Jun 7, 2020

@BWPanda do you know if making a view of teasers is a good way to test? Not sure if Views is using the same function or has its own.

@ghost
Copy link

ghost commented Jul 29, 2020

@herbdool I haven't tested or looked through the code, but this comment seems to suggest that Views has its own trimming function, so it might not work for testing this...

@jenlampton
Copy link
Member Author

jenlampton commented Jul 29, 2020

A view should work as long as you choose the content: teaser option, and the teaser display is set to display summary or trimmed on the body field (which it is by default).

If you choose fields instead of content then you'll have the ability to use the views trimmer.

@ghost
Copy link

ghost commented Aug 17, 2020

I'm now wondering about this... If you have a text format where HTML corrector is turned off, it still produces proper HTML since $format->filters['filter_htmlcorrector'] is set (it just has 'status' => 0). So even though that filter's disabled, it's still run.

The only situation I can think of where this wouldn't be set at all, would be if you used the 'Plain text' processing option for the Body field. But in that case the HTML doesn't need to be valid because it's just escaped and output as actual text...

So are there any other scenarios where this would be an issue? I.e. how to reproduce the original issue (on an actual Backdrop site setup, not just calling the individual functions like I originally did)? @jenlampton?

@klonos
Copy link
Member

klonos commented Oct 4, 2020

So even though that filter's disabled, it's still run.

That doesn't sound right. If somewhere along the way we have intentionally made that change, then we should remove that filter from the text format options, so it doesn't show up in the UI. Either that, or then treat this as a bug and fix things, so that if the filter is disabled, it doesn't run.

@ghost
Copy link

ghost commented Oct 4, 2020

@klonos It's not that the text filter itself is run, it's just that it's used to clean up the code if it's present:

// If the htmlcorrector filter is present, apply it to the generated summary.
if (isset($format->filters['filter_htmlcorrector'])) {
  $summary = _filter_htmlcorrector($summary);
}

The isset() means that even if the filter has a value (being an array where 'status' = 0), it'll still be run in this case. I think that's on purpose; the equivalent of checking if a function exists before running it.

@indigoxela
Copy link
Member

So even though that filter's disabled, it's still run.

That doesn't sound right.

I agree. If something can get disabled via UI, then we should adhere that setting. Even if that means that markup is chopped off.

If it should be impossible to disable it - we want the htmlcorrector to always run - then it shouldn't be available for deactivation via UI.

@jenlampton
Copy link
Member Author

jenlampton commented Oct 5, 2020 via email

@bugfolder
Copy link

Tested this with a page using the Raw HTML format (with "Correct faulty and chopped off HTML" turned off).

Here was my page body text, ending in an unclosed <em>:

Suspendisse placerat vulputate lorem sodales <strong> fermentum maecenas pretium, purus id tempor fermentum, neque ligula blandit dui, sed</strong> tempus nisi lacus ac <em>mauris.

Here is the HTML as rendered:

<div class="field-item even">
<p>
Suspendisse placerat vulputate lorem sodales <strong> fermentum maecenas pretium, purus id tempor fermentum, neque ligula blandit dui, sed</strong> tempus nisi lacus ac <em>mauris.</em>
</p>
<em></em>
</div>

So it looks like the HTML was corrected anyhow, at least according to browser dev tools. (But it's not being done by _filter_htmlcorrector(); putting a breakpoint in that function and rendering the page doesn't break.)

I put this page into a view, showing the body trimmed to 100 characters, and here's the HTML as rendered:

<div class="field-content">
<p>
Suspendisse placerat vulputate lorem sodales <strong> fermentum maecenas pretium, purus id tempor</strong>
</p>
</div>

The HTML is properly corrected—the trimmed-off </strong> is added back.

So it looks like this issue—fixing trimmed summaries—is being properly addressed.

@bugfolder
Copy link

Ah, with the correction filter turned off, Safari Developer Tools still balances the tags. But putting a breakpoint in check_markup() to examine the text after running filters confirms that the page body is being generated without correction when the correction filter is turned off.

Anyhow, to confirm, after the patch:

  • Putting broken HTML into the page body using a format with "correct HTML" turned off results in broken HTML being rendered (as intended).
  • Creating a View that displays a summary of that page with the same text format results in corrected HTML.

Ergo, WFM. Code reviewed, LGTM.

@quicksketch
Copy link
Member

Thanks @bugfolder for pushing this one forward! I read over this issue and the PR makes sense to me. A simple fix for an edge-case situation. Thanks @BWPanda for the long-ago PR. I'm glad we got one more ancient issue closed! This was opened before Backdrop 1.0 was even released!

Merged backdrop/backdrop#3072 into 1.x and 1.26.x.

backdrop/backdrop@b699b85 by @BWPanda, @jenlampton, @indigoxela, @klonos, @jlfranklin, and @bugfolder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants