DOCTYPE lost when the template is a complete document #10

papandreou · 2014-07-09T09:57:50Z

$ node -p -e 'new (require("htmlizer"))("<!DOCTYPE html>\n<html><head></head><body></body></html>").toString()'

Actual output:

<html><head></head><body></body></html>

Expected output:

<!DOCTYPE html>
<html><head></head><body></body></html>

Munawwar · 2014-07-09T11:39:12Z

Ah, jQuery.parseHTML("<!DOCTYPE html>\n<html><head></head><body></body></html>", document, true) is stripping the doctype. Hmm...

papandreou · 2014-07-09T11:58:41Z

Well, then at least it's consistent in the browser vs. in node :)

papandreou · 2014-11-24T12:41:30Z

Would be really nice to get this fixed for when a complete document is rendered on the server. I added a failing test here: https://github.com/papandreou/htmlizer/tree/doctype

Munawwar · 2014-11-24T20:25:03Z

jQuery strips the doctype, html, head and body tags.
I think that to fix this I might need an HTML parser written from scratch, or some way to specifically detect these cases. I have started some work on a parser with Pure-JavaScript-HTML5-Parser, but it is far from perfect, and unfortunately I don't get time to work on it these days.
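For illustration, "specifically detecting these cases" could look roughly like the sketch below. It is a regex heuristic, not a parser and not htmlizer's actual code; the function name detectDocumentParts and the shape of its result are hypothetical.

```javascript
// Hypothetical helper: reports which document-level pieces the template
// source contains, so they could be restored after a parser strips them.
// This is a heuristic; it can be fooled by tags inside scripts or attributes.
function detectDocumentParts(markup) {
    // Drop comments so that commented-out tags are not counted.
    var stripped = markup.replace(/<!--[\s\S]*?-->/g, '');
    return {
        doctype: /^\s*<!DOCTYPE\b/i.test(stripped),
        // [\s>] after the tag name keeps <header> from matching <head>.
        html: /<html[\s>]/i.test(stripped),
        head: /<head[\s>]/i.test(stripped),
        body: /<body[\s>]/i.test(stripped)
    };
}

var full = detectDocumentParts('<!DOCTYPE html>\n<html><head></head><body></body></html>');
var fragment = detectDocumentParts('<div><header>Title</header></div>');
console.log(full, fragment);
```

For the complete document from this issue every flag is true, while for the fragment every flag is false, so the wrappers would only be re-added when the template actually had them.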

papandreou · 2014-11-24T21:43:05Z

Why would you need one written from scratch? There are plenty of good existing ones, such as https://github.com/fb55/htmlparser2/ and https://github.com/inikulin/parse5.

Munawwar · 2014-11-24T22:23:47Z

Those run only on node. The same parser should support browsers as well. node-htmlparser is an alternative, but it has the same issue.

papandreou · 2014-11-24T22:28:35Z

Oh... Maybe they work with browserify?

Munawwar · 2014-12-11T17:49:03Z

Hmm... I took a step back. Detecting the doctype is mostly a server-side use case, so I'll try to solve this only for Node.js, using jsdom instead of jQuery.parseHTML.

The solution I had in mind is to use document.write(markup). Unfortunately htmlparser2 (which jsdom uses internally) also has the same issue of just ignoring the doctype. I created an issue with htmlparser2.
Meanwhile I'll work around it using a regex and document.implementation.createDocumentType.
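A minimal sketch of what such a regex workaround might look like, assuming the doctype is the first thing in the template. The names DOCTYPE_RE, splitDoctype and doctypeToString are hypothetical and this is not the code that ended up in htmlizer. The three extracted fields are the arguments that document.implementation.createDocumentType(name, publicId, systemId) takes.

```javascript
// Hypothetical pattern: matches a leading doctype with optional
// PUBLIC/SYSTEM identifiers. Groups: 1 = name, 3 = public id,
// 5 = system id after PUBLIC, 7 = system id after SYSTEM.
var DOCTYPE_RE = /^\s*<!DOCTYPE\s+([^\s>]+)(?:\s+PUBLIC\s+(["'])(.*?)\2(?:\s+(["'])(.*?)\4)?|\s+SYSTEM\s+(["'])(.*?)\6)?\s*>/i;

// Splits the doctype off the markup so the rest can go through a parser
// that would otherwise drop it.
function splitDoctype(markup) {
    var match = DOCTYPE_RE.exec(markup);
    if (!match) {
        return { doctype: null, rest: markup };
    }
    return {
        doctype: {
            name: match[1],
            publicId: match[3] || '',
            systemId: match[5] || match[7] || ''
        },
        rest: markup.slice(match[0].length)
    };
}

// Re-serializes the extracted doctype, e.g. for prepending in toString().
function doctypeToString(doctype) {
    var out = '<!DOCTYPE ' + doctype.name;
    if (doctype.publicId) {
        out += ' PUBLIC "' + doctype.publicId + '"';
        if (doctype.systemId) {
            out += ' "' + doctype.systemId + '"';
        }
    } else if (doctype.systemId) {
        out += ' SYSTEM "' + doctype.systemId + '"';
    }
    return out + '>';
}

var parts = splitDoctype('<!DOCTYPE html>\n<html><head></head><body></body></html>');
console.log(doctypeToString(parts.doctype) + parts.rest);
```

With the template from this issue, the last line prints the expected output from the original report. In a DOM environment, the same three fields could be passed to createDocumentType instead of being turned back into a string.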

papandreou · 2014-12-11T18:15:37Z

I'm pretty sure jsdom saves the doctype as document.doctype, so the parser must support it. I also think I recall that newer (1.0.0+?) versions of jsdom include it when reserializing a document, so maybe a jsdom upgrade could shut me up.

Munawwar · 2014-12-11T18:47:44Z

Ah, but jsdom always creates html, head and body tags, even for HTML fragments (which makes sense, because the return type of its APIs is always a Document). So we have jQuery removing these elements and jsdom forcefully adding them. Damn it.
So the easiest way out is to use htmlparser2 and create the DocumentFragment manually.

Munawwar · 2015-04-16T12:44:56Z

Summarizing this:
With jsdom 3.1.2, jQuery.parseHTML removes the doctype and the html, head and body tags.
With jsdom 0.10.3, jQuery.parseHTML removes the doctype.
The jsdom.jsdom() function only returns a Document, so the result always has html, head and body tags, even for HTML fragments.
DOMParser is available in modern browsers but not in jsdom.

So I am going to experiment with neutron-html5parser on a branch.

Munawwar · 2015-10-14T11:07:56Z

This one will be fixed with Htmlizer v2.

papandreou · 2015-10-14T11:10:23Z

Fantastic, thanks for keeping it in mind :)

Munawwar · 2015-10-20T07:14:18Z

Fixed with v2.

papandreou added a commit to papandreou/htmlizer that referenced this issue Nov 24, 2014

Added failing test for Munawwar#10.

f44dbab

Munawwar self-assigned this Dec 11, 2014

Munawwar added a commit that referenced this issue Apr 16, 2015

Switched HTML parser to neutron-html5parser. Removed jQuery. #10.

39f179b

Munawwar added a commit that referenced this issue Apr 16, 2015

toString DOCTYPE. #10.

ba8d6fa

Munawwar mentioned this issue Jul 21, 2015

data-i18n in development #15

Closed

Munawwar closed this as completed Oct 20, 2015

Munawwar mentioned this issue Jun 18, 2016

Update jsdom in htmlizer 0.x #21

Closed
