HTML5 Syntax Checking doesn't work on some html5 elements #944

Closed
Naatan opened this Issue Dec 10, 2015 · 16 comments

4 participants
@Naatan
Member

Naatan commented Dec 10, 2015

Reference: http://forum.komodoide.com/t/komodo-xhtml5-doesnt-recognize-aside-figure-etc/2353

Problem is likely that our version of HTML Tidy is outdated. Or we need to default to using html5lib for syntax checking.

@Naatan Naatan added the Type: Bug label Dec 10, 2015

@Naatan Naatan added this to the 9.3.3 milestone Dec 10, 2015

@babobski

Contributor

babobski commented Dec 15, 2015

As a workaround you can add the missing HTML5 elements through a config file; I created a gist with the config file. Save it as tidy.conf, then go to _Preferences > Syntax Checking_, select HTML (or HTML5), browse for the file, and save your preferences.
[screenshot: snap 2015-12-15 at 22 39 02]
If you now check the syntax, the lint errors are gone.
(More information about the config file can be found here.)
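
For readers without access to the gist, here is a minimal sketch of how such a config file could be generated. The tag lists and the helper name `build_tidy_config` are assumptions for illustration and are not the contents of the gist; `new-blocklevel-tags` and `new-inline-tags` are existing HTML Tidy options for declaring tags the bundled Tidy does not know.

```python
# Hypothetical sketch: build the text of a Tidy config file that declares
# a few HTML5 elements as custom tags. The tag lists below are assumptions,
# not the contents of the gist mentioned above.
NEW_BLOCKLEVEL_TAGS = ["article", "aside", "figure", "figcaption",
                       "footer", "header", "nav", "section"]
NEW_INLINE_TAGS = ["mark", "time"]


def build_tidy_config(blocklevel, inline):
    """Return Tidy config text declaring the given extra tags."""
    lines = [
        "new-blocklevel-tags: " + ", ".join(blocklevel),
        "new-inline-tags: " + ", ".join(inline),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the config; redirect the output to tidy.conf to use it.
    print(build_tidy_config(NEW_BLOCKLEVEL_TAGS, NEW_INLINE_TAGS), end="")
```

Saving the printed text as tidy.conf and selecting it in the preferences as described above should have the same effect as any other config file that declares these tags.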

@Naatan

Member

Naatan commented Dec 16, 2015

That's awesome, thanks for sharing @babobski !

@garretwilson

garretwilson commented Dec 16, 2015

The lint errors are gone, yes---thank you @babobski . But I had already got the lint errors to go away by temporarily switching to html5lib. I had wanted to go back to HTML Tidy to get the warnings back (see @babobski 's comment elsewhere), but unfortunately this config file doesn't update the warnings and gives me all sorts of inappropriate messages like <aside> not yet being approved. (I cannot fathom how this Tidy was allowed to get so out of date. Does no one use HTML5 with KomodoEdit?)

In other news, @babobski pointed out that I can use the input-xml: true option to check the well-formedness of the file, so in the meantime I will be using @babobski 's config file with the addition of:

input-xml: true
lower-literals: true
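
To illustrate what a well-formedness check means here, the following is a rough sketch using Python's standard XML parser. It is not how Tidy implements `input-xml`; the function name `is_well_formed` and the sample markup are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: bytes) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# An unclosed <br> is acceptable in HTML5 but is not well-formed XML.
VOID_STYLE = b"<p>line one<br>line two</p>"
# The self-closed form is well-formed XML.
SELF_CLOSED = b"<p>line one<br/>line two</p>"

if __name__ == "__main__":
    print(is_well_formed(VOID_STYLE))   # False
    print(is_well_formed(SELF_CLOSED))  # True
```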

Things should go much smoother once we get the new version of HTML Tidy. Good luck, @Naatan !

So really, @Naatan , the only real option is for you to update Tidy. The config file, although nice, doesn't allow me to turn on warnings. And switching to html5lib doesn't allow me to turn on warnings.

@babobski

Contributor

babobski commented Dec 16, 2015

Does no one use HTML5 with KomodoEdit?

Yes, i do 😄 and linting is working just fine; only when you add an XML declaration at the top does the linting go wrong (without the config file). You can test this by creating a new file from a template and choosing HTML (the default is HTML5). You also get autocompletion for HTML5 elements if your file language is HTML5.

@garretwilson

garretwilson commented Dec 16, 2015

Yes, i do...

My point is that no one discovered until now that the HTML Tidy version is so out of date that it doesn't recognize <figure> and thinks <aside> hasn't been finalized. So my question should have been, "Does no one use the new HTML5 elements with KomodoEdit?" I'm guessing you don't use <figure> or <aside>---or you were just content to live with erroneous error messages. 😉

@babobski

Contributor

babobski commented Dec 16, 2015

You can test this

Please do.

The linting on HTML5 is not going off on <figure> or <aside>; see the screenshot:
[screenshot: snap 2015-12-16 at 19 38 55]

When using the same code in an XHTML (1.0) file, it shows the lint errors on the HTML5 elements, as expected.
[screenshot: snap 2015-12-16 at 19 42 36]

And the same for HTML 4.0:
[screenshot: snap 2015-12-16 at 19 44 12]

Only when you add an XML declaration at the top of the already correct HTML5 file does it stop recognizing HTML5; it looks like it's falling back to HTML 4.
[screenshot: snap 2015-12-16 at 19 52 20]

The problem looks like it has more to do with XHTML5, especially with the XML declaration, than it does with HTML5. If I take the XHTML, make it XHTML5, set the file language to HTML5, and remove the XML declaration, I get the correct linting result for HTML5 and XHTML.
[screenshot: snap 2015-12-16 at 20 03 45]

@mitchell-as mitchell-as modified the milestones: 10.0, 9.3.3 Jan 25, 2016

@mitchell-as

Member

mitchell-as commented Jan 25, 2016

Bumping to 10.0 since updating a core component (Tidy) is not appropriate for a bugfix series. The update may adversely affect our build process since it's a 3rd party tool.

@babobski

Contributor

babobski commented Jan 28, 2016

@mitchell-as Personally I think this isn't a bug but just a case of using the wrong syntax.

I had a discussion with @garretwilson about what the correct syntax is; I asked how to properly write XHTML5 files. Garret didn't come up with a solid reference for how to write XHTML5 files; he gave me this explanation:

XHTML5 is HTML5 that doesn't cut corners. Lint it like you would HTML5, throw in the even-more-strict rules of XML (i.e. don't allow <br> but require <br/>, both of which would be valid HTML5), and don't freak out when you see an XML declaration like <?xml version="1.0" encoding="utf-8"?>. That's pretty much it.

Because I wanted to be sure how to correctly write XHTML5 files, I did some research.
Garret did refer to the HTML5 spec and its section 1.6, HTML vs XHTML; in this section you will find a link to the XHTML5 syntax.
However, that section doesn't describe how to properly write XHTML5 syntax:

This specification does not define any syntax-level requirements beyond those defined for XML proper.

When you look at the references section of HTML5, you will see a reference called polyglot.
The name polyglot had popped up once before I did my research: I had referred to an article about XHTML5, "Thinking XML: The XML flavor of HTML5", and in its resources section you will find:

"Polyglot Markup: HTML-Compatible XHTML Documents"(W3C HTML Working Group, June 2010): Read this recently published Working Draft with a more rigorous basis for XHTML5.

So what is polyglot:

It is sometimes valuable to be able to serve HTML5 documents that are also well formed XML documents. An author may, for example, use XML tools to generate a document, and they and others may process the document using XML tools. The language used to create documents that can be parsed by both HTML and XML parsers is called polyglot markup. Polyglot markup is the overlap language of documents that are both HTML5 documents and XML documents. It is recommended that these documents be served as either text/html (if the content is transmitted to an HTML-aware user agent) or application/xhtml+xml (if the content is transmitted to an XHTML-aware user agent). Other permissible MIME types are text/xml, application/xml, and any MIME type whose subtype ends with the four characters "+xml". [XML-MT]
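
The MIME type rule in the passage quoted above can be sketched as a small check. This is only an illustration of the quoted wording; the function name `is_permitted_polyglot_mime` is made up for the example and does not come from any spec or library.

```python
# MIME types named explicitly in the quoted passage.
NAMED_TYPES = {
    "text/html",
    "application/xhtml+xml",
    "text/xml",
    "application/xml",
}


def is_permitted_polyglot_mime(mime_type: str) -> bool:
    """Check a Content-Type value against the rule quoted above."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    essence = mime_type.split(";", 1)[0].strip().lower()
    if essence in NAMED_TYPES:
        return True
    # Any other type is allowed when its subtype ends with "+xml".
    _, _, subtype = essence.partition("/")
    return subtype.endswith("+xml")


if __name__ == "__main__":
    for value in ("text/html; charset=utf-8", "image/svg+xml", "text/plain"):
        print(value, is_permitted_polyglot_mime(value))
```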

This document describes how to write XHTML5 markup that results in valid HTML5 and valid XML.
In polyglot markup an XML declaration is forbidden:

4.1 Processing instructions and the XML declaration
Processing instructions and the XML declaration are both forbidden in polyglot markup.

As I described above, Komodo is linting correctly for HTML5 and XML if you don't put an XML declaration in the document.
So my first conclusion is Komodo is linting correct for the polyglot markup(xhtml5)

The only thing about the polyglot markup is that it is currently obsolete:
[screenshot: snap 2016-01-09 at 18 42 19]

My research didn't stop there; I went through the W3C specifications to find any reference to XHTML5.
[screenshot: snap 2016-01-28 at 18 51 48]
Only the now obsolete polyglot document describes how to write XHTML5 files. I only found XHTML5 examples in the "Internationalization Tag Set (ITS) Version 2.0"; there you can find XHTML5 examples (Example 61, Example 76) using the polyglot markup (not using an XML declaration).

So from my research I can conclude that there are no current specs (completed works) describing the XHTML5 syntax. The references to XHTML5 are shown without the XML declaration, and in the polyglot markup this is forbidden.
So my final conclusion is Komodo is linting correct for xhtml5, dropping a XML declaration just isn't correct xhtml5, when you have a XML declaration in you're xhtml5 file it will fall back to html4, witch i think is correct, a xhtml5 document shouldn't start with a XML declaration. This is only the case with xhtml 1.0 (only supports html 4) and xhtml 1.1 (also doesn't support the full html5 features)

@garretwilson

garretwilson commented Jan 28, 2016

Tell me again @babobski where you found something that said XHTML5 and "polyglot" are the same thing? Because they are not. In fact the reason it is called "polyglot" is because the document would be both acceptable XHTML5 and non-XML HTML5 (which is where the "poly" part comes in).

Since you are starting from the flawed premise that they are the same thing, then that throws off your conclusions:

So my first conclusion is Komodo is linting correct for the polyglot markup(xhtml5).

Perhaps you are trying to say that Komodo is linting correct[ly] for polyglot markup. Perhaps it is; I haven't checked. But that is only a subset of all XHTML5 documents.

So your first conclusion is wrong from the start, as you are conflating the two concepts.

So my final conclusion is Komodo is linting correct for xhtml5, dropping a XML declaration just isn't correct xhtml5, when you have a XML declaration in you're xhtml5 file it will fall back to html4, witch i think is correct, a xhtml5 document shouldn't start with a XML declaration.

I frankly don't even know what that sentence says semantically; it seems to be several sentences together, and I'm not clever enough to parse them all out.

The long and short of it: XHTML5 allows the XML declaration (but does not require it), even though polyglot documents prohibit it. If Komodo doesn't correctly lint a file containing an XML declaration, there are at least some XHTML5 documents Komodo does not correctly lint. (Note that I have made no assertions about whether Komodo correctly lints polyglot documents; I haven't tested this.)
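
A minimal sketch of that distinction, using Python's standard XML parser: a document that starts with an XML declaration still parses as XML with an XHTML-namespaced root, while the quoted polyglot rule would reject it. The helper names are hypothetical, and the polyglot check covers only the XML declaration rule, not processing instructions or the rest of the polyglot requirements.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

DOC_WITH_DECLARATION = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<!DOCTYPE html>\n'
    b'<html xmlns="http://www.w3.org/1999/xhtml">\n'
    b'  <head><title>Example</title></head>\n'
    b'  <body><aside><p>Note</p></aside></body>\n'
    b'</html>\n'
)


def has_xml_declaration(markup: bytes) -> bool:
    """True if the document begins with an XML declaration."""
    return markup.startswith(b"<?xml ")


def parses_as_xhtml(markup: bytes) -> bool:
    """True if the markup is well-formed XML with an XHTML <html> root."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return False
    return root.tag == "{%s}html" % XHTML_NS


if __name__ == "__main__":
    print(parses_as_xhtml(DOC_WITH_DECLARATION))      # True
    print(has_xml_declaration(DOC_WITH_DECLARATION))  # True
```

So the same file can be acceptable to an XML parser and still fall outside the polyglot subset because of its first line.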

@babobski

Contributor

babobski commented Jan 28, 2016

Tell me again @babobski where you found something that said XHTML5 and "polyglot" are the same thing?

I'm not saying they are the same; I'm saying there are currently no specs describing that the XML declaration is allowed in XHTML5, and the only spec I found describing how to write valid XHTML5 is the polyglot markup.
I didn't find any HTML or XHTML specs that describe the XHTML5 syntax other than the now obsolete polyglot syntax.
I'm only discussing the use of an XML declaration in XHTML5.

The long and short of it: XHTML5 allows the XML declaration (but does not require it)

I didn't find a spec that says that; I only found the polyglot syntax, which doesn't allow the XML declaration (if you can prove me wrong, you're welcome to).
In the other resources on the web, I never see an XML declaration in any of the examples.

If you look at the answer in the discussion "Is XHTML5 dead or is it just an synonym of HTML5?", it is confirming what I'm saying:

In 2012 at the moment of writing, it was clear that W3C decided to abandon XHTML for HTML 5. This decision was motivated by several reasons:

  • Only few people were really interested in XHTML.
  • Most of the websites were written in plain HTML.
  • Even fewer really understood what XHTML is about and how to use it.
  • Too many websites which pretended to serve XHTML used wrong headers, instead of Content-Type: application/xhtml+xml.
  • Even when you fully understand what XHTML is and what must be the headers, the thing is really tricky with some crappy browsers not accepting/supporting application/xhtml+xml content type. This meant that you had to change the header according to the browser.
  • The XML part of XHTML also caused some weird situations the developers had to solve. One is INVALID_STATE_ERR: DOM Exception 11 message appearing when you assign the text containing HTML characters (like é) to an element within the XHTML page. When you encounter this error with its very helpful message in a large web application after doing an AJAX request, you have really no idea if it's the fault of JQuery, AJAX, or something else.
  • Writing HTML 5 code doesn't mean mixing up tags all around. If you're passionate about XML and XHTML, you can still write HTML 5 code which will look very close to XML.
  • In the early days of mobile phones, XHTML was interesting for the mobile devices which were not very powerful. Parsing XML is much easier than HTML. Now, with dual-core mobile devices, it really doesn't matter if they have to parse clean valid XML or dirty HTML full of hacks and mixed tags.

The spec of October 2014 mentions XHTML syntax. For the moment, it is unclear whether there is such a thing as the new XHTML language (not syntax), and if there is, what will be the position of XHTML, nor the adoption of the new XHTML standard by the mainstream browsers.

There are currently no specs for an XHTML5 language, only the obsolete polyglot language.

If you use the (x)html5 validator and use this template:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="https://www.w3.org/1999/xhtml">
  <head>
    <meta charset="utf-8" />
    <title>Example XHTML5 document</title>
  </head>
  <body>

  </body>
</html>

You will get the following errors and warnings:

[screenshot: snap 2016-01-28 at 22 34 46]
Whereas if you omit the XML declaration, it will lint correctly.

Garret didn't come up with a solid reference

And you're still not doing that.

I'm sharing this because I researched XHTML5 during a previous discussion with you. I wanted to share my findings on XHTML5 because I still think the XML declaration doesn't belong there if you're validating for (X)HTML5, and Komodo is linting correctly for XHTML5 if you omit the XML declaration.

I frankly don't even know what that sentence says semantically; it seems to be several sentences together, and I'm not clever enough to parse them all out.

Yep I'm Dutch, and my English is not always that great; also my Dutch is terrible because I'm dyslexic (that means I'm more creative than pernickety). But I don't think you're so stupid that you don't understand what I'm trying to say.
Also, I find your tone has a negative impact that I can't appreciate.

@Naatan

Member

Naatan commented Jan 29, 2016

At the end of the day, what you CAN and CANNOT do with a given spec is up to the linter. All we seek to facilitate is that the linter Komodo uses and supports allows you to configure it in the way that is relevant to the spec that you are using.

Put another way: we don't really care what the differences between two specs are, as long as our linter supports them. Now if a given spec really is deprecated and barely gets any use anymore, then that may change things.

Either way, the todo for this ticket remains the same: Update HTML tidy to get better support for HTML specs.

@garretwilson I know you mean well, but the fact that your tone is too direct for a Dutchman is quite telling ;) I'd appreciate if you could try to formulate your messages with a friendlier tone.

@mitchell-as

Member

mitchell-as commented Mar 21, 2016

Re-filing as an Enhancement instead of a bug. Newer versions of Tidy use CMake to build, and our build bots do not have CMake installed. At our current schedule this may not make it into 10.0.

@garretwilson

garretwilson commented Mar 21, 2016

Yep I'm Dutch, and my English is not always that great; also my Dutch is terrible because I'm dyslexic (that means I'm more creative than pernickety).

Hey, @babobski . I confess I didn't even see your sentence above until just now. I just wanted to say that I in no way intended to disrespect your being Dutch or your being dyslexic---I wasn't aware of either when I wrote that stuff.

You say that you are more creative than pernickety. (Pernickety is a good word---I have never used it before!) I on the other hand am way more on the "pernickety" side---that's just how my brain is wired. So that means I excel at picking through and comprehending specifications, and I've implemented XML, HTML, HTTP, and other parsers from scratch. That's simply the sort of thing I'm good at.

But being "pernickety" if I'm not careful also makes me grumpy when I see so many tools that apparently ignore the specifications and just throw things together "good enough" so they work sometimes, even if they don't match the specifications. I'll try to work on that "negative tone", but I just wanted to let you know that nothing was personal and it just came from my being pernickety.

P.S. Thanks again for the great English word. I think I'll use it a lot now. 😉

@Naatan

Member

Naatan commented Mar 21, 2016

@mitchell-as would using the last possible version of Tidy that wasn't on CMake be feasible?

@mitchell-as

Member

mitchell-as commented Mar 21, 2016

Nope, because the last version was circa 2009 and I cannot even find it. Tidy has received new stewardship in the last few years.

@Naatan Naatan modified the milestones: 10.0, 10.1 Apr 28, 2016

@mitchell-as

Member

mitchell-as commented Jun 23, 2016

I finally discovered this was a bug in our HTML linter. It turns out we were forcing XHTML files to be linted as HTML, even if they are XHTML5 or set to HTML5 in Komodo. There's no need to update HTML Tidy.
