Prevent TextHelper::truncate() from breaking HTML #387

Merged
merged 2 commits into from Dec 30, 2011

@shama

Fixes #2397

@markstory and 1 other commented on an outdated diff on Dec 25, 2011
lib/Cake/Test/Case/View/Helper/TextHelperTest.php
+ 'html' => true
+ ));
+ $expected = '<p><span style="font-size: medium;">El biógrafo de Steve Jobs, Walter
+Isaacson, explica porqué Jobs le pidió que le hiciera su biografía en
+este artículo de El País.</span></p>
+<p><span style="font-size: medium;"><span style="font-size:
+large;">Por qué Steve era distinto.</span></span></p>
+<p><span style="font-size: medium;"><a href="http://www.elpais.com/
+articulo/primer/plano/Steve/era/distinto/elpepueconeg/
+20111009elpneglse_4/Tes">http://www.elpais.com/articulo/primer/plano/
+Steve/era/distinto/elpepueconeg/20111009elpneglse_4/Tes</a></span></p>
+<p><span style="font-size: medium;">Ya se ha publicado la biografía de
+Steve Jobs escrita por Walter Isaacson "<strong>Steve Jobs by Walter
+Isaacson</strong>", aquí os dejamos la dirección de amazon donde
+podeís adquirirla.</span></p>
+<p><span style="font-size: medium;"><a>... </p></span></a>';
@markstory Dec 25, 2011

Aren't the close tags in the wrong order here? Seems like it should be </a></span></p>

@shama Dec 25, 2011

Ack, sorry, I overlooked that. Nice catch!
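The corrected expectation comes down to closing every element that is still open in last-in, first-out order, so the innermost `<a>` closes first. Here is a minimal sketch of that idea. It uses Python's standard-library `html.parser` instead of PHP, and `OpenTagTracker` and `close_open_tags` are hypothetical names, not part of TextHelper:

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class OpenTagTracker(HTMLParser):
    """Records which elements are still open at the end of a fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack:
                if self.stack.pop() == tag:
                    break


def close_open_tags(fragment):
    """Append closing tags for every element left open, innermost first."""
    tracker = OpenTagTracker()
    tracker.feed(fragment)
    tracker.close()
    return fragment + "".join(f"</{tag}>" for tag in reversed(tracker.stack))


print(close_open_tags('<p><span style="font-size: medium;"><a>... '))
# <p><span style="font-size: medium;"><a>... </a></span></p>
```

Because the tags come off a stack, the output is always `</a></span></p>` and never the `</p></span></a>` order from the original expectation.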

@vindia

We encountered the same bug today and wondered why regular expressions are used to "parse" the HTML.

This patch fixes the bug for this particular instance, but it will still break on the following case:

<a href="#" title="Foo>">Bar</a>
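To see why markup like this defeats a pattern-based approach, here is a hypothetical naive tag pattern (`<[^>]*>`, meaning everything from `<` up to the first `>`) applied to it in Python. It is only an illustration and not necessarily the expression TextHelper uses:

```python
import re

# Hypothetical naive tag pattern: from "<" to the first ">".
# Illustration only; not taken from TextHelper.
NAIVE_TAG = re.compile(r"<[^>]*>")

markup = '<a href="#" title="Foo>">Bar</a>'

tags = NAIVE_TAG.findall(markup)
visible_text = NAIVE_TAG.sub("", markup)

print(tags)          # ['<a href="#" title="Foo>', '</a>']
print(visible_text)  # ">Bar
```

The `>` inside the quoted attribute value ends the "tag" too early. The leftover `">` is then counted as visible text, and a cut made at that point can split the real start tag.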

Why not use a DOM parser to walk the text nodes in order, adding each node's length to a running total as long as it stays within the threshold? Once the threshold is reached, truncate that text node and remove every node that follows it.
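Here is a rough sketch of that text-node approach. It is written in Python and makes one substitution: it uses the standard-library `html.parser` tokenizer instead of a full DOM, so it streams start tags, text, and end tags rather than building a tree. It counts only visible characters. It cuts the text node that crosses the limit, drops everything after that node, and closes any elements left open. `TextTruncator`, `truncate_html`, and the choice not to count the ending toward the limit are all assumptions made for illustration. They are not TextHelper's API.

```python
from html import escape
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class TextTruncator(HTMLParser):
    """Hypothetical helper: keeps markup intact, counts only text characters."""

    def __init__(self, limit, ending="..."):
        super().__init__(convert_charrefs=True)  # text arrives unescaped
        self.remaining = limit
        self.ending = ending
        self.out = []
        self.stack = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        # Re-emit the raw tag, so quoted attributes like title="Foo>" survive.
        self.out.append(self.get_starttag_text())
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        if not self.done:
            self.out.append(self.get_starttag_text())  # e.g. <br/>, not pushed

    def handle_endtag(self, tag):
        if self.done or tag not in self.stack:
            return
        # Close intervening elements too, innermost first.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        if len(data) <= self.remaining:
            self.out.append(escape(data, quote=False))
            self.remaining -= len(data)
        else:
            # This text node crosses the limit: cut it and ignore the rest.
            self.out.append(escape(data[:self.remaining], quote=False) + self.ending)
            self.done = True


def truncate_html(html_text, limit, ending="..."):
    parser = TextTruncator(limit, ending)
    parser.feed(html_text)
    parser.close()
    # Close whatever is still open, innermost element first.
    return "".join(parser.out) + "".join(f"</{t}>" for t in reversed(parser.stack))


print(truncate_html('<a href="#" title="Foo>">Bar</a>', 2))
# <a href="#" title="Foo>">Ba...</a>
```

Comments and doctypes are dropped by this sketch. Because the tokenizer does not infer HTML's implied end tags, the open-element stack is only as accurate as the input's explicit markup.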

@markstory
CakePHP member

I'm not a fan of text munging either, but while a DOM parser would be sufficient for HTML written in an XML dialect (XHTML), I'm not aware of any XML parser that can handle HTML4. For example:

<ul>
<li>first
<li>second
</ul>
<p>paragraph
<p>second paragraph

is valid HTML, but it will make any XML parser fail. I'm not aware of any actual HTML DOM implementations in PHP either, at least not at the language or C-extension level.
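The snippet above can be checked quickly with Python's standard library, used here only as an illustration. It says nothing about what PHP offers. A strict XML parser (`xml.etree.ElementTree`) rejects the snippet because of the omitted `</li>` and `</p>` end tags. A tolerant HTML tokenizer (`html.parser`) reads the snippet without error, but it only reports the raw start and end events. It does not infer the implied closes, so building a correct tree still needs HTML's own parsing rules. `parses_as_xml`, `TagLogger`, and `html_events` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SNIPPET = """<ul>
<li>first
<li>second
</ul>
<p>paragraph
<p>second paragraph"""


def parses_as_xml(markup):
    """True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagLogger(HTMLParser):
    """Collects start/end tag events without building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_events(markup):
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


print(parses_as_xml(SNIPPET))  # False: mismatched tag at </ul>
print(html_events(SNIPPET))    # no ("end", "li") or ("end", "p") events
```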

@markstory merged commit a4e3790 into cakephp:2.0 on Dec 30, 2011