Heading node should "eat" the following whitespace and newline #55
Comments
I see what you're saying. Good suggestion; I'll get on it at some point soon. |
The real problem here is that, while |
I think this brings up the problem of comments as whitespace as well - from what I can tell MediaWiki parses |
Yes, that seems correct. MediaWiki works by first substituting templates and removing HTML comments before it converts headings into real Since mwparser works in a fundamentally different way and we can never determine what the parse tree should truly look like, I think it's better to be safe than sorry with regards to determining whether something is a real heading or not. To that effect, I think |
That sounds like a good strategy (though |
This also causes problems when encountering things like:
in the middle of a paragraph. This is interpreted as a level-1 heading when it should be a continuation of the text. That's from https://en.wikipedia.org/w/index.php?title=Wikipedia:Teahouse/Questions/Archive_296&action=edit. The solution proposed in #55 (comment) should cover this.
Also note that the amount of blank lines after a heading does not matter, e.g.
is parsed by MediaWiki as
|
When headings are parsed, as far as I know, the wiki software requires that nothing except whitespace follow the heading on the same line. So
Will be parsed as a heading by the wiki, but
is interpreted as raw text.
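To make the rule concrete, here is a minimal sketch in plain Python. It is only an approximation of the behaviour described above, not MediaWiki's actual code and not part of mwparserfromhell's API; the name `heading_level` and the regular expression are assumptions made for illustration.

```python
import re

# Simplified approximation of the rule described in this issue (an assumption,
# not MediaWiki's real implementation): a line is a heading only if it starts
# with a run of 1-6 "=" and ends with another such run, with nothing but
# spaces or tabs after the closing run.
HEADING_LINE = re.compile(r"^(={1,6})(.+?)(={1,6})[ \t]*$")


def heading_level(line):
    """Return the heading level for one line of wikitext, or None."""
    match = HEADING_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    # When the runs differ in length, use the shorter one as the level.
    return min(len(match.group(1)), len(match.group(3)))


print(heading_level("== Foo ==  \n"))    # trailing whitespace only: a heading
print(heading_level("== Foo == bar\n"))  # trailing text: not a heading
```

Under this sketch, a line with trailing text after the closing equals signs is classified as ordinary text, which is the behaviour this issue asks for.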
mwparserfromhell will emit a separate text node containing the following newline and any preceding spaces. But it's possible to remove this node, which then results in a parse tree that can't actually exist in the wiki: a Heading node without a following Text node beginning with a newline. The following newline and the preceding whitespace should really be implicit in the heading, so they should be "eaten" by the Heading node rather than being converted into a separate text node. Maybe any whitespace should be preserved, but if so, it should be possible to strip it from the Heading node.
The node following the heading should be the first node on the next line, not the newline. If any non-whitespace intervenes between the heading and the newline, mwparserfromhell should not emit a heading at all but should parse it as inline text (possibly containing templates and such), just like the wiki software does.
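As a rough illustration of the proposal, the sketch below splits wikitext into nodes so that a heading absorbs its trailing whitespace and newline, and a line with anything else after the closing equals signs stays in the surrounding text. The node shape, the `split_nodes` name, and the regular expression are all hypothetical; none of this is mwparserfromhell's real tokenizer or data model, and it ignores templates, comments, and other markup.

```python
import re
from typing import List, Tuple

# Hypothetical node shape: (kind, value, trailing). "trailing" holds the
# whitespace and newline absorbed by a heading; it is empty for text nodes.
Node = Tuple[str, str, str]

# Assumed heading pattern: opening "=" run, title, closing "=" run, then only
# spaces/tabs and an optional newline.
HEADING = re.compile(r"^(={1,6})(.+?)={1,6}([ \t]*\n?)$")


def split_nodes(wikitext: str) -> List[Node]:
    nodes: List[Node] = []
    pending = ""  # text accumulated since the last heading
    for line in wikitext.splitlines(keepends=True):
        match = HEADING.match(line)
        if match:
            if pending:
                nodes.append(("text", pending, ""))
                pending = ""
            # The heading keeps its trailing whitespace and newline itself.
            nodes.append(("heading", match.group(2).strip(), match.group(3)))
        else:
            # Includes lines such as "== A == x", which stay inline text.
            pending += line
    if pending:
        nodes.append(("text", pending, ""))
    return nodes


for node in split_nodes("== Foo ==  \nbar\n"):
    print(node)
```

With this layout the node after a heading is the first content of the next line, and the absorbed whitespace stays available on the heading for callers who want to preserve or strip it.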