innerHTML: Empty HTML tags are incorrectly rewrite #174

Open
jnlin opened this issue Oct 21, 2015 · 7 comments

Comments

@jnlin

jnlin commented Oct 21, 2015

Hi,

We use QueryPath version 3.0.4 and found a weird behavior: empty HTML tags are incorrectly rewritten as self-closing tags. For example:

$html = '<div><ins></ins></div>';
echo html5qp($html)->innerHTML();

The result of the above code is:

<div>
  <ins/>
</div>

But the correct result should be:

<div>
   <ins>
   </ins>
</div>

Another example:

$html = '<div></div>';
echo htmlqp($html)->innerHTML();

The incorrect result:

<body>
    <div/>
</body>

@technosophos
Owner

If you do just ->html(), does it look right? I'm trying to figure out if the output formatter is using the HTML5 library, or falling back to the built-in HTML4/libxml library.

@jnlin
Author

jnlin commented Oct 27, 2015

The result of ->html() is correct:

$html = '<div><ins></ins></div>';
echo htmlqp($html)->html();

Result:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div><ins></ins></div></body></html>

The result from the HTML5 library is good, too:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><div><ins></ins></div></html> 

@technosophos
Owner

Okay. Sounds like innerHTML() is not using the right encoder. I'll look into it.

@technosophos
Owner

So looking at the code...

  • html() will always send it through the HTML4 serializer
  • html5() will always send it through the HTML5 serializer
  • innerHTML will always send it through... the XML serializer. This was because of an old (still existing) bug in libxml.

I think that the right behavior at this point is to add an innerHTML5() function.
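To see the difference between the serializers listed above, here is a minimal sketch using PHP's built-in DOM extension directly (not QueryPath): the same parsed <div> is written out once with DOMDocument::saveXML() and once with DOMDocument::saveHTML(). Passing a node to saveHTML() needs PHP 5.3.6 or later. This does not show which exact call QueryPath's innerHTML() makes internally; it only illustrates how the XML and HTML serializers treat an empty element.

```php
<?php
// Sketch using PHP's built-in DOM extension (ext-dom), not QueryPath itself.
// It parses one fragment and serializes the same <div> node twice.

libxml_use_internal_errors(true); // keep libxml's HTML parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML('<div><ins></ins></div>'); // libxml wraps this in <html><body>
libxml_clear_errors();

$div = $doc->getElementsByTagName('div')->item(0);

// XML serializer: an element with no children is collapsed to <ins/>.
$asXml = $doc->saveXML($div);

// HTML serializer: a non-void element keeps its closing tag, <ins></ins>.
$asHtml = $doc->saveHTML($div);

echo $asXml, PHP_EOL;
echo $asHtml, PHP_EOL;
```

Browsers parse `<ins/>` in HTML as an opening tag only, which is why the collapsed form changes the document structure.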

technosophos changed the title from "Empty HTML tags are incorrectly rewrite" to "innerHTML: Empty HTML tags are incorrectly rewrite" on Oct 29, 2015
@JoelESvensson

Is there any temporary workaround for this until innerHTML5() is added? My use case is that I want to edit a fragment of HTML and change all the anchor links. Right now some tags get truncated, which messes up the presentation (an empty <em /> tag is not handled well by at least Chrome).

@technosophos
Owner

If you can drop in a bit of code, maybe we can figure something out.

@JoelESvensson

The HTML looks something like this:

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam ac dui at tellus semper sollicitudin.</p>
<img src="/relative-link-to-image.jpg">
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam ac dui at tellus semper sollicitudin.</p>

Let's say the above content is stored in a variable called $content. The variable is then passed to a function, cdnifyImages($content), which replaces certain attributes in certain elements and then returns something like this:

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam ac dui at tellus semper sollicitudin.</p>
<img src="http://identifier.cloudfront.com/relative-link-to-image.jpg">
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam ac dui at tellus semper sollicitudin.</p>

My actual implementation currently looks like this. It works unless empty elements exist:

function cdnifyImages($html)
{
    $qp = html5qp($html);
    foreach ($qp->find('img') as $img) {
        // cdn() is a helper (not shown) that turns the relative path into a CDN URL
        $img->attr('src', cdn($img->attr('src')));
    }

    return $qp->innerHTML();
}
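Until something like innerHTML5() exists, one possible workaround is to do the rewrite with PHP's built-in DOM extension and serialize the result with DOMDocument::saveHTML(), which uses libxml's HTML serializer rather than the XML one. This is a sketch, not QueryPath's API: cdnifyImagesViaDom() and cdnUrl() are hypothetical names, and cdnUrl() stands in for the cdn() helper from the snippet above, using the CloudFront host from the example output. It assumes PHP 7 or later (for the scalar type hints) and a UTF-8 fragment.

```php
<?php
// Sketch of a workaround using ext-dom directly instead of QueryPath's
// innerHTML(). cdnUrl() and the CDN host are hypothetical stand-ins for the
// poster's own cdn() helper.

function cdnUrl(string $path): string
{
    // Hypothetical: prefix root-relative paths ("/x.jpg") with the CDN host,
    // leaving protocol-relative ("//host/x.jpg") and absolute URLs alone.
    if (isset($path[0]) && $path[0] === '/' && substr($path, 0, 2) !== '//') {
        return 'http://identifier.cloudfront.com' . $path;
    }
    return $path;
}

function cdnifyImagesViaDom(string $fragment): string
{
    libxml_use_internal_errors(true); // silence HTML parser warnings
    $doc = new DOMDocument();
    // Wrap the fragment so libxml treats it as body content and reads it as UTF-8.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body>' . $fragment . '</body></html>'
    );
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('img') as $img) {
        $img->setAttribute('src', cdnUrl($img->getAttribute('src')));
    }

    // Serialize each child of <body> with the HTML serializer (saveHTML),
    // so empty non-void elements such as <em></em> keep their closing tag.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo cdnifyImagesViaDom(
    '<p>Lorem <em></em> ipsum.</p><img src="/relative-link-to-image.jpg"><p>Dolor sit amet.</p>'
), PHP_EOL;
```

Serializing the children of <body> one by one returns just the fragment, without the <html>/<body> wrapper or the DOCTYPE that a plain saveHTML() call would add.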
