Sanitize-html incorrectly recognizes (less than)(equals) as a starting tag. #161

Closed
ELadner opened this issue Sep 9, 2017 · 18 comments

@ELadner

ELadner commented Sep 9, 2017

In vanilla NodeBB, any combination of <, >, <=, >= can be entered in a post and the results are rendered correctly.

With sanitize-html installed, < and > are handled correctly, but using <= in a post will treat it as an HTML start tag and not include anything beyond the symbol combination.

This, for example: "this <= is a >= test" renders as "this = test"

@ELadner changed the title from "Sanitize-html incorrectly recognizes (greater than)(equals) as a starting tag." to "Sanitize-html incorrectly recognizes (less than)(equals) as a starting tag." on Sep 10, 2017
@boutell
Member

boutell commented Sep 11, 2017

Interesting. The HTML5 spec suggests you're right that <= should be something a valid parser can tolerate as text (although it's bad syntax in the strictest sense):

https://www.w3.org/TR/html5/syntax.html#tag-open-state

However this is at the level of the htmlparser2 module that sanitize-html is built upon, so I would recommend reporting it there and linking that ticket here.
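As a rough illustration of the rule in that section of the spec, the sketch below classifies what a spec-following tokenizer does with the character right after a `<`. The function name and the returned labels are made up for this example; they are not part of htmlparser2 or sanitize-html.

```javascript
// Sketch of the HTML5 "tag open" decision: given the character that
// immediately follows "<", say how a spec-following tokenizer proceeds.
// The function name and return labels are illustrative, not a library API.
function classifyAfterLessThan(nextChar) {
  if (nextChar === '!') return 'markup-declaration'; // comment, doctype, CDATA
  if (nextChar === '/') return 'end-tag-open';
  if (nextChar === '?') return 'bogus-comment'; // parse error, read as a comment
  if (typeof nextChar === 'string' && /^[A-Za-z]$/.test(nextChar)) {
    return 'start-tag';
  }
  // Anything else, including end of input: parse error, "<" is emitted as text.
  return 'text';
}

console.log(classifyAfterLessThan('=')); // text
console.log(classifyAfterLessThan('a')); // start-tag
```

Under this reading, the `<` in `<=` should come through as a plain character, while `<` followed by a letter opens a tag.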

@ELadner
Author

ELadner commented Sep 12, 2017

Maybe it's time to switch to parse5. htmlparser2 appears to be dead: very little activity in the last year and a lot of open tickets.

@boutell
Member

boutell commented Sep 12, 2017 via email

@dimorphic

I'm also experiencing this, even when the tag is never closed. It happens on version 1.16.1 and up, including 1.18.2.

Example:

// used as:
const sanitizeHtml = require('sanitize-html');

const value = '<meh some text'; // input
const output = sanitizeHtml(value, {
  allowedTags: ['a'],
  allowedAttributes: {
    'a': ['href', 'target', 'onclick']
  }
});
// output: "" (empty string)

@boutell
Member

boutell commented Apr 2, 2018 via email

@ELadner
Author

ELadner commented Apr 2, 2018

The upstream ticket got no traction. The last comment from the developer was "Please refer to inikulin/parse5 if you need a spec-compliant parser."

@ELadner
Author

ELadner commented Apr 2, 2018

Another option would be to write a parallel plugin (e.g. sanitize-html5) that's based on parse5 but has the same basic structure as sanitize-html. This would remove backward-compatibility requirements from the existing plugin and give users a migration path (i.e. uninstall the old plugin, install the new one). Future versions of the existing plugin could nudge users to upgrade to the new one, and finally the old one could be dropped.

@boutell
Member

boutell commented Apr 2, 2018 via email

@tjphopkins

I wondered if any progress had been made on this, or if there was a known workaround? Thanks

@boutell
Member

boutell commented Nov 15, 2018

Unfortunately not so far.

@boutell
Member

boutell commented Nov 15, 2018 via email

@boutell
Member

boutell commented Mar 19, 2020

This is something that has to be supported by the underlying parser module, which we did not write. I would recommend opening an issue on htmlparser2 regarding offering options to tolerate this kind of input, after first checking to see if the option already exists; it is possible to pass options to htmlparser2 with this module. It just can't be done here.

@boutell
Member

boutell commented Mar 19, 2020

(Now that htmlparser2 is receiving updates again it is unlikely we'll switch to an entirely different parser.)

@Karthickbg

Karthickbg commented Apr 17, 2020

I'm facing the same problem, guys. The parser recognizes the 'less than' symbol as the start of a tag. Is there any way to fix this?

@boutell
Member

boutell commented Apr 20, 2020

Explore the options of htmlparser2; there is a way to pass on options to it when configuring sanitize-html. See if it offers something appropriate. If not, consider contributing a PR to that module.
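For anyone looking for where those options go: a minimal sketch, assuming the `parser` key of the sanitize-html options object is what gets forwarded to htmlparser2. The two parser options shown are only there to illustrate the shape; nothing here is claimed to change how `<=` is handled, and `dirty` in the usage comment is a hypothetical input variable.

```javascript
// Sketch only: an options object meant to be passed as the second argument
// to sanitizeHtml(). The `parser` key is assumed to be forwarded to htmlparser2.
const sanitizeOptions = {
  allowedTags: ['a'],
  allowedAttributes: { a: ['href', 'target'] },
  parser: {
    // htmlparser2 options, shown to illustrate the shape of the config.
    // Neither is claimed to make the parser tolerate "<=" as text.
    decodeEntities: true,
    lowerCaseTags: true
  }
};

// Usage (requires the sanitize-html package; `dirty` is a hypothetical input):
//   const sanitizeHtml = require('sanitize-html');
//   const clean = sanitizeHtml(dirty, sanitizeOptions);
console.log(Object.keys(sanitizeOptions.parser));
```

Whether htmlparser2 has an option that helps with this particular input would still need to be checked against its documentation.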

@boutell
Member

boutell commented Apr 20, 2020

Closing this because it is really an htmlparser2 question; if the parser we're relying upon supported it, then we would support it without modification. (I'm not casting any shade here or suggesting that htmlparser2 should or should not support it.)

@boutell closed this as completed on Apr 20, 2020
@shah20

shah20 commented Nov 9, 2020

> Closing this because it is really an htmlparser2 question; if the parser we're relying upon supported it, then we would support it without modification. (I'm not casting any shade here or suggesting that htmlparser2 should or should not support it.)

@boutell Late to the conversation, but can we use this approach as a workaround? I got the idea from the npm package page. Would it work to explicitly replace <= with &lt;= and then sanitize the data as expected?

textFilter: function(text, tagName) {
  // Replace every occurrence of "<=", not only the first one.
  return text.replace(/<=/g, '&lt;=');
}
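
One caveat with the textFilter snippet above: as far as I can tell, textFilter only sees text the parser has already emitted, so it may run too late if the parser has already swallowed `<=` as a tag. A possible alternative, sketched below under that assumption, is to escape stray `<` characters before the string reaches the sanitizer. The helper name `escapeStrayLessThan` is hypothetical, and this has not been verified against sanitize-html itself.

```javascript
// Sketch: escape any "<" that cannot start a tag, comment, or declaration
// (i.e. one not followed by a letter, "/", "!", or "?"), so that a parser
// sees it as text. Hypothetical helper, not part of sanitize-html.
function escapeStrayLessThan(input) {
  return input.replace(/<(?![A-Za-z\/!?])/g, '&lt;');
}

console.log(escapeStrayLessThan('this <= is a >= test'));
// this &lt;= is a >= test
console.log(escapeStrayLessThan('<a href="#">ok</a>'));
// <a href="#">ok</a>
```

The result would then be passed to sanitizeHtml() as usual. Note that input such as `<meh some text` is left alone by this helper, because `<` followed by a letter is a legitimate tag start.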

@boutell
Member

boutell commented Nov 9, 2020 via email

6 participants