Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOCTYPE lost when the template is a complete document #10

Closed
papandreou opened this issue Jul 9, 2014 · 14 comments
Closed

DOCTYPE lost when the template is a complete document #10

papandreou opened this issue Jul 9, 2014 · 14 comments
Assignees

Comments

@papandreou
Copy link
Contributor

$ node -p -e 'new (require("htmlizer"))("<!DOCTYPE html>\n<html><head></head><body></body></html>").toString()'

<html><head></head><body></body></html>

Expected output:

<!DOCTYPE html>
<html><head></head><body></body></html>
@Munawwar
Copy link
Owner

Munawwar commented Jul 9, 2014

Ah, jQuery.parseHTML("<!DOCTYPE html>\n<html><head></head><body></body></html>", document, true) is stripping the doctype. Hmm...

@papandreou
Copy link
Contributor Author

Well, then at least it's consistent in the browser vs. in node :)

papandreou added a commit to papandreou/htmlizer that referenced this issue Nov 24, 2014
@papandreou
Copy link
Contributor Author

Would be really nice to get this fixed for when a complete document is rendered on the server. I added a failing test here: https://github.com/papandreou/htmlizer/tree/doctype

@Munawwar
Copy link
Owner

jQuery strips the doctype, html, head and body tags.
I think to fix this I might need a HTML parser written from scratch (I have started some work on it with Pure-JavaScript-HTML5-Parser. The HTML parser is far from perfect. Unfortunately I don't get time to work on it these days.). Or somehow specifically detect these cases.

@papandreou
Copy link
Contributor Author

Why would you need one written from scratch? There's plenty of good, existing ones such as https://github.com/fb55/htmlparser2/ and https://github.com/inikulin/parse5.

@Munawwar
Copy link
Owner

Those run only on node. The same parser should support browsers as well. node-htmlparser is an alternative, but it has the same issue.

@papandreou
Copy link
Contributor Author

Oh... Maybe they work with browserify?

@Munawwar
Copy link
Owner

Hmm...I took a step back. Detecting doctype is mostly as server-side use case. So I'll try to solve this only for nodejs using jsdom instead of jquery.parseHTML.

The solution I had in mind is to use document.write(markup). Unfortunately htmlparser2 (which jsdom uses internally) also has the same issue of just ignoring the doctype. I created an issue with htmlparser2.
Meanwhile I'll workaround it using a regex and document.implementation.createDocumentType.

@Munawwar Munawwar self-assigned this Dec 11, 2014
@papandreou
Copy link
Contributor Author

I'm pretty sure jsdom saves the doctype as document.doctype, so the parser must support it. I also think I recall that newer (1.0.0+?) versions of jsdom include it when reserializing a document, so maybe a jsdom upgrade could shut me up.

@Munawwar
Copy link
Owner

Ah, but jsdom always creates html, head and body tag, even for HTML fragments (it makes sense because the return type of their APIs are always a Document). So we have jquery removing these elements and jsom forcefully adding them. Damnit.
So the easiest way out is to use htmlparser2 and create the DocumentFragment manually.

@Munawwar
Copy link
Owner

Summarizing this:
Upgrading to jsdom 3.1.2 makes jQuery.parseHTML removes doctype,html,body and head tags.
On jsdom 0.10.3 jQuery.parseHTML removes doctype.
jsdom.jsdom() function only returns a Document type, hence it will always have html,head,body tag even for html fragments.
DOMParser is available on modern browsers but not available with jsdom.

So I am going to experiment with neutron-html5parser on a branch.

@Munawwar
Copy link
Owner

This one will be fixed with Htmlizer v2.

@papandreou
Copy link
Contributor Author

Fantastic, thanks for keeping it in mind :)

@Munawwar
Copy link
Owner

Fixed with v2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants