Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-inline elements being moved outside of inline containers automatically #68

Closed
wimleers opened this issue Dec 4, 2014 · 13 comments · Fixed by #70
Closed

Non-inline elements being moved outside of inline containers automatically #68

wimleers opened this issue Dec 4, 2014 · 13 comments · Fixed by #70

Comments

@wimleers
Copy link

wimleers commented Dec 4, 2014

In the current Drupal 8 test coverage, which still uses PHP's DomDocument (and hence makes assertions based on a XHTML POV), we have the following two assertions:

    $f = Html::normalize('<p>line1<br/><hr/>line2</p>');
    $this->assertEqual($f, '<p>line1<br></p><hr>line2', 'HTML corrector -- Move non-inline elements outside of inline containers.');

    $f = Html::normalize('<p>line1<div>line2</div></p>');
    $this->assertEqual($f, '<p>line1</p><div>line2</div>', 'HTML corrector -- Move non-inline elements outside of inline containers.');

The second still works with HTML5. The first doesn't.

Instead of moving the <hr> outside of the <p>, it keeps it inside:

<p>line1<br><hr>line2</p>

Looking at \MasterMinds\HTML5\Elements, I see:

        "hr" => 73, // NORMAL | VOID_TAG | BLOCK_TAG

So it's definitely marked as a block-level element. Which makes me suspect that HTML5 simply doesn't do this kind of clean-up, and that it's merely by accident (as a side-effect of some other parsing aspect) that the second test case is handled correctly.

Which makes me wonder if this is behavior only required for XHTML parsers and not HTML5 parsers?

@technosophos
Copy link
Member

I started to answer this, but once I got into the HTML5 spec, I became more confused about what the right handling for HTML5's hr is. It looks like hr now stands for "thematic break in content" and not "horizontal rule".

@goetas
Copy link
Member

goetas commented Dec 6, 2014

Looking at firefox and google chrome behaviour:

<p>line1<br/><hr/>line2</p>

becomes:

<p>line1<br></p><hr>line2<p></p>

looking at this, seems that we have to fix it!

@wimleers the problem is that your assertion says:

<p>line1<br></p><hr>line2

this is a third case... so.. which one? I will prefer the drupal one...

@goetas
Copy link
Member

goetas commented Dec 6, 2014

At the moment i'm changing home, so i can get into this just on January....

@technosophos
Copy link
Member

I agree. We should follow Chrome and Firefox. Looking at the parser, there are some basics in there for bad tag structures. I did them for nested paragraphs and stuff. I'll see if I can hook in there and fix this.

@goetas
Copy link
Member

goetas commented Dec 7, 2014

@technosophos thanks!

@twistor
Copy link

twistor commented Dec 8, 2014

I'm not quite sure this is the same problem, but it's close.
Another problem we're seeing in Drupal is:

<span><div></span><span></span></div>

Expected: This is the Firefox/Chrome output.

<span><div><span></span></div></span>

Actual:

<span><div></div></span><span></span>

@joejoseph00
Copy link

according to the latest tests by @babruix of the latest checkout of the html5-php library he's observing this behavior:
<?php Html::normalize('<p>line1<br/><hr/>line2</p>') ?>
becomes
<p>line1<br><hr>line2</p>
and he expects it to be
<p>line1<br/></p><hr/>line2

however after reading this issue queue, I'd expect it to be:
<p>line1<br></p><hr>line2
or
<p>line1<br></p><hr>line2<p></p>

Right now it looks like the non-inline elements aren't being moved outside the inline container

So, it seems to me that @babruix might not be aware of the Chrome/Firefox behaviour that the html5-php library is intending to mimick
AND
that the html5-php library is not correctly dealing with the non-inline elements and inline container in this case

Perhaps if someone can confirm the expected behavior @babruix can update his test case assertion and if the html5-php library is working as expected or not in this case.

See related comment in drupal.org https://www.drupal.org/node/1333730#comment-9491329

also take a look at the test results : https://qa.drupal.org/pifr/test/941363

@technosophos
Copy link
Member

Before we dive into special casing logic to satisfy Drupal unit tests... this parser is supposed to adhere to the HTML5 standard. I can't find any relevant portion of the spec that suggests how the above should be addressed. I cannot find a prohibition against using block-level tags inside of p. (In fact, there are examples of block-level tags inside of p in the spec.)

Here's the line I would like to draw: If the parser is compliant to the spec, it is technically correct. If Chrome or Firefox or IE or whatever chooses a different solution to mis-marked-up content, that doesn't make it the right choice.

I would just ask that when people issue requests to correct the parser, they back it up with a reference to the spec: http://www.w3.org/TR/2014/REC-html5-20141028/

@joejoseph00
Copy link

Hi Technosophos, thanks for the update, so you're saying the drupal unit test in question needs to be verified with the w3.org spec spec http://www.w3.org/TR/2014/REC-html5-20141028/ with regards to the failed unit test case we're observing. Thank you for your assistance

@technosophos
Copy link
Member

Yes. Or at least giving more of an explanation for why one reconstruction of a broken DOM is more "correct" than another.

@mattfarina
Copy link
Member

@joejoseph00 I noticed the expectation of <p>line1<br/></p><hr/>line2 you shared from @babruix is in an xml/xhtml form. html5 is not xml based. For example, instead of <br/> you have <br>. The former is a self closing format in xml. The latter is the html5 format.

Could it be you're dealing with markup as if it's xhtml/xml rather than html?

@DrColossos
Copy link

Could it be you're dealing with markup as if it's xhtml/xml rather than html?

We just discovered the same issue. We want to use just
or without a bracket and are not able to do this because of "errors" reminding us that this is not valid. We provide a proper HTML5 document though

@goetas @technosophos Should I open a new issue with this or is this related to this issue?

@technosophos
Copy link
Member

I think we should track it as a separate issue. Some sample code would be good, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants