Error with parsing when HTML tags uppercased #26

Closed
miso-belica opened this issue Feb 5, 2014 · 4 comments

Comments

@miso-belica
Contributor

Hi,
I discovered some weird behavior on this page: http://rayer.g6.cz/. I also pasted the source HTML here: http://pastebin.com/FQjSEGCK

Everything from the text in html > head > title onward is escaped (even the </TITLE> tag). I found out that if I lowercase the input with strtolower, like this: \HTML5::loadHTML(strtolower($html)), the HTML is parsed correctly. Could you look into this, please?

Thank you for your work - I can finally parse HTML in PHP too :)

@mattfarina
Member

@miso-belica thanks. We'll look into this.

@mattfarina
Member

I think I've found the problem. I'm not yet sure of the fix though. In the tokenizer when processing raw text you have:

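protected function rawText() {
    // Outside raw-text mode, fall back to ordinary text handling.
    if (is_null($this->untilTag)) {
        return $this->text();
    }
    // untilTag is already lowercase here, so $sequence is e.g. '</title>'.
    $sequence = '</' . $this->untilTag . '>';
    $txt = $this->readUntilSequence($sequence);
    $this->events->text($txt);
    $this->setTextMode(0);
    return $this->endTag();
}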

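At this point untilTag has already been converted to lowercase, but the closing tag that follows in the input is still uppercase. The sequence built here (for example </title>) therefore never matches the </TITLE> in the document, which creates the mismatch.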
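To make the mismatch concrete, here is a minimal standalone sketch. The helper `readRawTextUntil` is hypothetical and not part of the library, and this is not necessarily how the tokenizer should be fixed. It only shows that a case-sensitive search for the lowercase sequence misses an uppercase closing tag, while a case-insensitive search with `stripos()` finds it.

```php
<?php
// Hypothetical helper for illustration only; not part of the library.
// Returns [raw text, offset of the closing tag], matching the closing
// tag name case-insensitively.
function readRawTextUntil(string $input, int $offset, string $untilTag): array
{
    $sequence = '</' . $untilTag . '>';
    // stripos() is the case-insensitive counterpart of strpos().
    $pos = stripos($input, $sequence, $offset);
    if ($pos === false) {
        // No closing tag found: treat the rest of the input as raw text.
        return [(string) substr($input, $offset), strlen($input)];
    }
    return [substr($input, $offset, $pos - $offset), $pos];
}

$html = '<TITLE>Rayer</TITLE><BODY>';

// Case-sensitive search for the lowercased sequence fails on this input.
$caseSensitive = strpos($html, '</title>', 7);   // false

// Case-insensitive search stops at the uppercase closing tag.
[$text, $next] = readRawTextUntil($html, 7, 'title');
// $text is 'Rayer', $next is 12 (the offset of '</TITLE>')
```

The alternative is to normalize on the other side, comparing a lowercased copy of the input against the lowercase tag name, which is effectively what the strtolower workaround in the original report does for the whole document.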

mattfarina added a commit that referenced this issue Feb 8, 2014
…nd normalizing tag names to lowercase (per 8.2.4.9) except for SVG foreign tags that are case sensitive.
@mattfarina
Member

@miso-belica I think this fixes it. Please let me know if uppercase tags are still a problem.

@miso-belica
Contributor Author

I think it's OK now. Thanks 👍
