New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sanitize-html incorrectly recognizes (less than)(equals) as a starting tag. #161
Comments
Interesting. The HTML5 spec suggests you're right that <= should be something a valid parser can tolerate as text (although it's bad syntax in the strictest sense): https://www.w3.org/TR/html5/syntax.html#tag-open-state However this is at the level of the htmlparser2 module that sanitize-html is built upon, so I would recommend reporting it there and linking that ticket here. |
Maybe it's time to switch to parser5. htmlparser2 appears to be dead. Very little activity in the last year and a lot of open tickets. |
Not a bad idea, how about opening a ticket to evaluate parser5 as a more
actively maintained replacement?
…On Tue, Sep 12, 2017 at 3:47 PM, ELadner ***@***.***> wrote:
Maybe it's time to switch to parser5. htmlparser2 appears to be dead. Very
little activity in the last year and a lot of open tickets.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#161 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAB9fUq10t3kYonW6PweACxdcigAHdgXks5sht_KgaJpZM4PSDq_>
.
--
*THOMAS BOUTELL, *SUPPORT LEAD
P'UNK AVENUE | (215) 755-1330 | punkave.com
|
Also experiencing this and not even closing the end tag. Happens on version Example: input: "<meh some text" // used as:
sanitizeHtml(value, {
allowedTags: ['a'],
allowedAttributes: {
'a': ['href', 'target', 'onclick']
}
}); |
We'll have to either get it fixed upstream in htmlparser2, or build out
parser5 support, including legacy support for htmlparser2 options if we can
for bc, or else bump to 2.x. This is a significant undertaking.
…On Thu, Mar 29, 2018 at 10:13 AM, Sabin Tudor ***@***.***> wrote:
Also experiencing this and not even closing the end tag. Happens on
version 1.16.1 and up, occurs on 1.18.2 also.
Example:
input: "<meh some text"
output: "" (empty string)
// used as:sanitizeHtml(value, {
allowedTags: ['a'],
allowedAttributes: {
'a': ['href', 'target', 'onclick']
}
});
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#161 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAB9fRLQAq0VG44pl5RMJd5T-1Ka2sblks5tjOvvgaJpZM4PSDq_>
.
--
*THOMAS BOUTELL, CHIEF SOFTWARE ARCHITECT*
P'UNK AVENUE | (215) 755-1330 | punkave.com
|
The upstream ticket got no traction. The last comment from the developer was "Please refer to inikulin/parse5 if you need a spec-compliant parser." |
Another option would be to write a parallel plugin (e.g. sanitize-html5) that's parse5 based but has the same basic structure has sanitize-html. This would remove backward compatible requirements from the existing plugin and give users a migration path (i.e. uninstall the old plugin, install the new plugin). Future versions of the existing plugin could nudge users to upgrade to the new one, then, finally, the old one can be dropped. |
I think we could accomplish the same thing by following semantic versioning
- 2.x is parse5, 1.x is htmlparser2.
…On Mon, Apr 2, 2018 at 2:40 PM, ELadner ***@***.***> wrote:
Another option would be to write a parallel plugin (e.g. sanitize-html5)
that's parse5 based but has the same basic structure has sanitize-html.
This would remove backward compatible requirements from the existing plugin
and give users a migration path (i.e. uninstall the old plugin, install the
new plugin). Future versions of the existing plugin could nudge users to
upgrade to the new one, then, finally, the old once can be dropped.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#161 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAB9ffOnKTm-9HUrrNshwXlWqw-8L-9rks5tknCtgaJpZM4PSDq_>
.
--
*THOMAS BOUTELL, CHIEF SOFTWARE ARCHITECT*
P'UNK AVENUE | (215) 755-1330 | punkave.com
|
I wondered if any progress had been made on this, or if there was a known workaround? Thanks |
Unfortunately not so far. |
I could pursue it as Apostrophe enterprise support, otherwise it's a bit
far out on my roadmap right now unfortunately. Of course it doesn't have to
be done here necessarily - would love to see a PR for a 2.x version on
parse5 that maintains bc except, of course, for passing on options directly
to the parser.
…On Wed, Nov 14, 2018 at 5:17 AM TJPHopkins ***@***.***> wrote:
I wondered if any progress had been made on this, or if there was a known
workaround? Thanks
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#161 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAB9ffLA7GiGMzAMeP4W_pbo4wM1u7v8ks5uu-2igaJpZM4PSDq_>
.
--
*Thomas Boutell, Chief Software Architect*
P'unk Avenue | (215) 755-1330 | punkave.com
|
This is something that has to be supported by the underlying parser module, which we did not write. I would recommend opening an issue on htmlparser2 regarding offering options to tolerate this kind of input, after first checking to see if the option already exists; it is possible to pass options to htmlparser2 with this module. It just can't be done here. |
(Now that htmlparser2 is receiving updates again it is unlikely we'll switch to an entirely different parser.) |
I'm facing the same problem guys. Parser recognizes 'less than' symbol as start of a tag. Is there any possible way to fix this? |
Explore the options of |
Closing this because it is really an |
Late to the conversation but can we use this approach as a workaround? I got idea from npm package page. Will it work replacing <= to %lte; explicitely and then would it sanitize the data as expected?
|
Probably not because you're "downstream" from htmlparser2 having already
made this decision.
…On Mon, Nov 9, 2020 at 7:12 AM Rushabh Shah ***@***.***> wrote:
Closing this because it is really an htmlparser2 question; if the parser
we're relying upon supported it, then we would support it without
modification. (I'm not casting any shade here or suggesting that
htmlparser2 should or should not support it.)
Late to the conversation but can we use this approach as a workaround? I
got idea from npm package page. Will it work replacing *<=* to *%lte;*
explicitely and then would it sanitize the data as expected?
@boutell <https://github.com/boutell>
textFilter: function(text, tagName) {
return text.replace('<=', '<e;');
}
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#161 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAH27MZ22DA6N6576KRS7DSO7MDZANCNFSM4D2IHK7Q>
.
--
THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER
APOSTROPHECMS | apostrophecms.com | he/him/his
|
In vanilla NodeBB, any combination of <, >, <=, >= can be entered in a post and the results are rendered correctly.
With sanitize-html installed, < and > are handled correctly, but using <= in a post will treat it as an HTML start tag and not include anything beyond the symbol combination.
This, for example: "this <= is a >= test" renders as "this = test"
The text was updated successfully, but these errors were encountered: