Sanitize-html incorrectly recognizes (less than)(equals) as a starting tag. #161

ELadner · 2017-09-09T17:16:50Z

In vanilla NodeBB, any combination of <, >, <=, >= can be entered in a post and the results are rendered correctly.

With sanitize-html installed, < and > are handled correctly, but using <= in a post will treat it as an HTML start tag and not include anything beyond the symbol combination.

This, for example: "this <= is a >= test" renders as "this = test"

boutell · 2017-09-11T13:50:06Z

Interesting. The HTML5 spec suggests you're right that <= should be something a valid parser can tolerate as text (although it's bad syntax in the strictest sense):

https://www.w3.org/TR/html5/syntax.html#tag-open-state

However this is at the level of the htmlparser2 module that sanitize-html is built upon, so I would recommend reporting it there and linking that ticket here.

ELadner · 2017-09-12T19:47:22Z

Maybe it's time to switch to parser5. htmlparser2 appears to be dead. Very little activity in the last year and a lot of open tickets.

boutell · 2017-09-12T21:15:39Z

Not a bad idea, how about opening a ticket to evaluate parser5 as a more actively maintained replacement?

…

On Tue, Sep 12, 2017 at 3:47 PM, ELadner ***@***.***> wrote: Maybe it's time to switch to parser5. htmlparser2 appears to be dead. Very little activity in the last year and a lot of open tickets. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#161 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAB9fUq10t3kYonW6PweACxdcigAHdgXks5sht_KgaJpZM4PSDq_> .

-- *THOMAS BOUTELL, *SUPPORT LEAD P'UNK AVENUE | (215) 755-1330 | punkave.com

dimorphic · 2018-03-29T14:13:03Z

Also experiencing this and not even closing the end tag. Happens on version 1.16.1 and up, occurs on 1.18.2 also.

Example:

input: "<meh some text"
output: "" (empty string)

// used as:
sanitizeHtml(value, {
  allowedTags: ['a'],
  allowedAttributes: {
    'a': ['href', 'target', 'onclick']
  }
});

boutell · 2018-04-02T14:45:29Z

We'll have to either get it fixed upstream in htmlparser2, or build out parser5 support, including legacy support for htmlparser2 options if we can for bc, or else bump to 2.x. This is a significant undertaking.

…

On Thu, Mar 29, 2018 at 10:13 AM, Sabin Tudor ***@***.***> wrote: Also experiencing this and not even closing the end tag. Happens on version 1.16.1 and up, occurs on 1.18.2 also. Example: input: "<meh some text" output: "" (empty string) // used as:sanitizeHtml(value, { allowedTags: ['a'], allowedAttributes: { 'a': ['href', 'target', 'onclick'] } }); — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#161 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAB9fRLQAq0VG44pl5RMJd5T-1Ka2sblks5tjOvvgaJpZM4PSDq_> .

-- *THOMAS BOUTELL, CHIEF SOFTWARE ARCHITECT* P'UNK AVENUE | (215) 755-1330 | punkave.com

ELadner · 2018-04-02T18:36:28Z

The upstream ticket got no traction. The last comment from the developer was "Please refer to inikulin/parse5 if you need a spec-compliant parser."

ELadner · 2018-04-02T18:40:45Z

Another option would be to write a parallel plugin (e.g. sanitize-html5) that's parse5 based but has the same basic structure has sanitize-html. This would remove backward compatible requirements from the existing plugin and give users a migration path (i.e. uninstall the old plugin, install the new plugin). Future versions of the existing plugin could nudge users to upgrade to the new one, then, finally, the old one can be dropped.

boutell · 2018-04-02T19:46:19Z

I think we could accomplish the same thing by following semantic versioning - 2.x is parse5, 1.x is htmlparser2.

…

On Mon, Apr 2, 2018 at 2:40 PM, ELadner ***@***.***> wrote: Another option would be to write a parallel plugin (e.g. sanitize-html5) that's parse5 based but has the same basic structure has sanitize-html. This would remove backward compatible requirements from the existing plugin and give users a migration path (i.e. uninstall the old plugin, install the new plugin). Future versions of the existing plugin could nudge users to upgrade to the new one, then, finally, the old once can be dropped. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#161 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAB9ffOnKTm-9HUrrNshwXlWqw-8L-9rks5tknCtgaJpZM4PSDq_> .

-- *THOMAS BOUTELL, CHIEF SOFTWARE ARCHITECT* P'UNK AVENUE | (215) 755-1330 | punkave.com

tjphopkins · 2018-11-14T10:17:06Z

I wondered if any progress had been made on this, or if there was a known workaround? Thanks

boutell · 2018-11-15T13:10:52Z

Unfortunately not so far.

boutell · 2018-11-15T13:12:43Z

I could pursue it as Apostrophe enterprise support, otherwise it's a bit far out on my roadmap right now unfortunately. Of course it doesn't have to be done here necessarily - would love to see a PR for a 2.x version on parse5 that maintains bc except, of course, for passing on options directly to the parser.

…

On Wed, Nov 14, 2018 at 5:17 AM TJPHopkins ***@***.***> wrote: I wondered if any progress had been made on this, or if there was a known workaround? Thanks — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#161 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAB9ffLA7GiGMzAMeP4W_pbo4wM1u7v8ks5uu-2igaJpZM4PSDq_> .

-- *Thomas Boutell, Chief Software Architect* P'unk Avenue | (215) 755-1330 | punkave.com

boutell · 2020-03-19T17:00:07Z

This is something that has to be supported by the underlying parser module, which we did not write. I would recommend opening an issue on htmlparser2 regarding offering options to tolerate this kind of input, after first checking to see if the option already exists; it is possible to pass options to htmlparser2 with this module. It just can't be done here.

boutell · 2020-03-19T17:00:44Z

(Now that htmlparser2 is receiving updates again it is unlikely we'll switch to an entirely different parser.)

Karthickbg · 2020-04-17T12:40:15Z

I'm facing the same problem guys. Parser recognizes 'less than' symbol as start of a tag. Is there any possible way to fix this?

boutell · 2020-04-20T13:12:16Z

Explore the options of htmlparser2; there is a way to pass on options to it when configuring sanitize-html. See if it offers something appropriate. If not, consider contributing a PR to that module.

boutell · 2020-04-20T13:12:57Z

Closing this because it is really an htmlparser2 question; if the parser we're relying upon supported it, then we would support it without modification. (I'm not casting any shade here or suggesting that htmlparser2 should or should not support it.)

shah20 · 2020-11-09T12:12:30Z

Closing this because it is really an htmlparser2 question; if the parser we're relying upon supported it, then we would support it without modification. (I'm not casting any shade here or suggesting that htmlparser2 should or should not support it.)

Late to the conversation but can we use this approach as a workaround? I got idea from npm package page. Will it work replacing <= to %lte; explicitely and then would it sanitize the data as expected?
@boutell

textFilter: function(text, tagName) {
      return text.replace('<=', '&lte;');
    }

boutell · 2020-11-09T13:41:33Z

Probably not because you're "downstream" from htmlparser2 having already made this decision.

…

On Mon, Nov 9, 2020 at 7:12 AM Rushabh Shah ***@***.***> wrote: Closing this because it is really an htmlparser2 question; if the parser we're relying upon supported it, then we would support it without modification. (I'm not casting any shade here or suggesting that htmlparser2 should or should not support it.) Late to the conversation but can we use this approach as a workaround? I got idea from npm package page. Will it work replacing *<=* to *%lte;* explicitely and then would it sanitize the data as expected? @boutell <https://github.com/boutell> textFilter: function(text, tagName) { return text.replace('<=', '&lte;'); } — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#161 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAH27MZ22DA6N6576KRS7DSO7MDZANCNFSM4D2IHK7Q> .

-- THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his

ELadner changed the title ~~Sanitize-html incorrectly recognizes (greater than)(equals) as a starting tag.~~ Sanitize-html incorrectly recognizes (less than)(equals) as a starting tag. Sep 10, 2017

ELadner mentioned this issue Sep 11, 2017

Parser thinks that <= is a valid open tag. fb55/htmlparser2#220

Closed

ittybittykitty mentioned this issue Mar 19, 2020

Difficulty parsing < characters with no space between words #339

Closed

boutell closed this as completed Apr 20, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sanitize-html incorrectly recognizes (less than)(equals) as a starting tag. #161

Sanitize-html incorrectly recognizes (less than)(equals) as a starting tag. #161

ELadner commented Sep 9, 2017

boutell commented Sep 11, 2017

ELadner commented Sep 12, 2017

boutell commented Sep 12, 2017 via email

dimorphic commented Mar 29, 2018

boutell commented Apr 2, 2018 via email

ELadner commented Apr 2, 2018

ELadner commented Apr 2, 2018 •

edited

boutell commented Apr 2, 2018 via email

tjphopkins commented Nov 14, 2018

boutell commented Nov 15, 2018

boutell commented Nov 15, 2018 via email

boutell commented Mar 19, 2020

boutell commented Mar 19, 2020

Karthickbg commented Apr 17, 2020 •

edited

boutell commented Apr 20, 2020

boutell commented Apr 20, 2020

shah20 commented Nov 9, 2020

boutell commented Nov 9, 2020 via email

Sanitize-html incorrectly recognizes (less than)(equals) as a starting tag. #161

Sanitize-html incorrectly recognizes (less than)(equals) as a starting tag. #161

Comments

ELadner commented Sep 9, 2017

boutell commented Sep 11, 2017

ELadner commented Sep 12, 2017

boutell commented Sep 12, 2017 via email

dimorphic commented Mar 29, 2018

boutell commented Apr 2, 2018 via email

ELadner commented Apr 2, 2018

ELadner commented Apr 2, 2018 • edited

boutell commented Apr 2, 2018 via email

tjphopkins commented Nov 14, 2018

boutell commented Nov 15, 2018

boutell commented Nov 15, 2018 via email

boutell commented Mar 19, 2020

boutell commented Mar 19, 2020

Karthickbg commented Apr 17, 2020 • edited

boutell commented Apr 20, 2020

boutell commented Apr 20, 2020

shah20 commented Nov 9, 2020

boutell commented Nov 9, 2020 via email

ELadner commented Apr 2, 2018 •

edited

Karthickbg commented Apr 17, 2020 •

edited