script elements aren't always terminated by a </script> sequence #114
Comments
What exactly is the supposed behaviour here? Both Nokogiri and Oga seem to parse this exactly the same:
I don't see how Chrome and Firefox seem to disagree, but I'm not going to change Oga's lexer rules just to match this behaviour, as it's much easier to just say "Everything except
Actually, Nokogiri parses it as described in the spec:
Nokogiri::HTML.fragment(File.read('file.html'))
=> #<Nokogiri::HTML::DocumentFragment:0x822 name="#document-fragment" children=[#<Nokogiri::XML::Element:0x820 name="script" children=[#<Nokogiri::XML::Text:0x81e "\n var example = 'Consider this string: <!-- <script>';\n console.log(example);\n</script>\n<!-- despite appearances, this is actually part of the script still! -->\n<script>\n ... // this is the same script block still...\n">]>]>
Looking at the spec, I'm not really convinced it makes sense to alter Oga's behaviour. In fact, I'd argue that unless somebody knows the HTML spec by heart, they'd actually expect Oga's behaviour, not what the spec states or what Nokogiri does. I'm also not really a fan of altering Oga to match badly explained legacy behaviour, e.g. as per this paragraph:
Having thought about this, I'm going to leave things as is for the time being. If this is deemed important enough in the future I'll look into it again.
http://www.w3.org/TR/html51/semantics.html item 4.12.1.2 contains interesting (aka weird) rules for closing script elements. In particular, take a look at the following code example from that section:
The states used in the W3C document to handle this situation are, however, quite complex. The spec lists 18 states related to parsing script tag contents:
8.2.4.6 Script data state
8.2.4.17 Script data less-than sign state
8.2.4.18 Script data end tag open state
8.2.4.19 Script data end tag name state
8.2.4.20 Script data escape start state
8.2.4.21 Script data escape start dash state
8.2.4.22 Script data escaped state
8.2.4.23 Script data escaped dash state
8.2.4.24 Script data escaped dash dash state
8.2.4.25 Script data escaped less-than sign state
8.2.4.26 Script data escaped end tag open state
8.2.4.27 Script data escaped end tag name state
8.2.4.28 Script data double escape start state
8.2.4.29 Script data double escaped state
8.2.4.30 Script data double escaped dash state
8.2.4.31 Script data double escaped dash dash state
8.2.4.32 Script data double escaped less-than sign state
8.2.4.33 Script data double escape end state
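For illustration, here is a rough Ruby sketch of what those states boil down to when the only question is where the script element ends. It collapses the 18 states into three modes (plain, escaped after `<!--`, and double escaped after a nested `<script` inside the comment), which is my own simplification and not the spec's algorithm or Oga's lexer. The names `naive_script_end`, `script_end` and `SAMPLE` are made up for this example, and the sample is the script text from the Nokogiri output above with a final `</script>` assumed.

```ruby
require 'strscan'

# A tag name must be followed by whitespace, "/" or ">" to count.
OPEN_TAG  = %r{<script(?=[\t\n\f\r />])}i
CLOSE_TAG = %r{</script(?=[\t\n\f\r />])}i

# Naive rule: the script ends at the first "</script".
def naive_script_end(source)
  source.index(CLOSE_TAG)
end

# Simplified three-mode scan (an assumption, not the full spec state machine).
# Returns the character offset of the closing tag, or nil if there is none.
def script_end(source)
  scanner = StringScanner.new(source)
  mode = :data

  until scanner.eos?
    case mode
    when :data
      return scanner.charpos if scanner.check(CLOSE_TAG)
      if scanner.check(/<!--/)
        # Consume only "<!" so that "<!-->" can still leave escaped mode.
        scanner.skip(/<!/)
        mode = :escaped
        next
      end
    when :escaped
      return scanner.charpos if scanner.check(CLOSE_TAG)
      if scanner.skip(OPEN_TAG)
        mode = :double_escaped
        next
      end
    when :double_escaped
      # Here "</script" only undoes the nested "<script".
      if scanner.skip(CLOSE_TAG)
        mode = :escaped
        next
      end
    end

    # "-->" leaves both escaped modes.
    if mode != :data && scanner.skip(/-->/)
      mode = :data
      next
    end

    scanner.getch
  end

  nil
end

# Script contents from the example above; the last "</script>" is assumed.
SAMPLE = <<~JS
  var example = 'Consider this string: <!-- <script>';
  console.log(example);
  </script>
  <!-- despite appearances, this is actually part of the script still! -->
  <script>
   ... // this is the same script block still...
  </script>
JS

puts "naive end offset: #{naive_script_end(SAMPLE)}"
puts "escape-aware end offset: #{script_end(SAMPLE)}"
```

The naive version stops at the first `</script>`, while the escape-aware version only stops at the last one. It skips details of the real tokenizer (attributes on the end tag, EOF handling, parse errors), so treat it as a way to see the shape of the rules rather than a drop-in lexer change.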
Not sure if Oga should support it or just say that it doesn't support such cases.