Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escaping input with unclosed, invalid tags returns it with unneeded escaped closing tags appended to end #464

Closed
lostfictions opened this issue Feb 13, 2021 · 11 comments · Fixed by #607

Comments

@lostfictions
Copy link

To Reproduce

From the Node REPL:

> const s = require('sanitize-html')
undefined
> s("here's a string with a <wacky> tag.", {disallowedTagsMode: "escape"})
"here's a string with a &lt;wacky&gt; tag.&lt;/wacky&gt;"

Expected behavior

The offending tag is escaped, but no addition markup is created. For example, given the input here's a string with a <wacky> tag, I would expect the output to be: here's a string with a &lt;wacky&gt; tag

Describe the bug

When sanitizing a string with disallowedTagsMode: "escape", extra, unneeded (escaped) closing tags are appended to the end of the string.

Details

Version of Node.js: 15.5.0
Server Operating System: Ubuntu 20.04

@lostfictions lostfictions added the bug Something isn't working label Feb 13, 2021
@boutell boutell closed this as completed Feb 13, 2021
@boutell
Copy link
Member

boutell commented Feb 13, 2021 via email

@lostfictions
Copy link
Author

lostfictions commented Feb 13, 2021

Thanks a lot for the reply! In the specific case where we're performing escaping rather than outright removal of tags that aren't allowed, I still believe this behaviour is incorrect -- closing these tags isn't necessary, and in effect the result is "garbage" in the form of escaped closing tags being appended to the output and visible to the user. Here's another motivating example: imagine I have a fragment like this I want to sanitize by escaping:

Here are some <b>operators</b>: =equals<lessthan>greaterthan-minus+plus

Obviously this is badly or erroneously formatted input on the part of a user, but you might imagine situations like this might arise. Assuming I'm preserving tags like b, I'd hope for output like this:

Here are some <b>operators</b>: =equals&lt;lessthan&gt;greaterthan-minus+plus

That is, recognized tags are preserved, and unrecognized tags are simply escaped with no regard for whether they are balanced (since once they're escaped they're not really in the stack and have no semantic significance anymore anyway). Does that make sense?

@boutell
Copy link
Member

boutell commented Feb 14, 2021 via email

@boutell boutell reopened this Feb 15, 2021
@dev-boku
Copy link

dev-boku commented Apr 6, 2021

@boutell Hello, I am curious about the current progress on this issue.

@abea
Copy link
Contributor

abea commented Apr 6, 2021

I don't think there's anything to report right now, @potter7050. Our team continues to be focused on the next major version of ApostropheCMS.

I did look at this for a bit, though. It is complicated by the fact that htmlparser2, which sanitize-html depends on, automatically tries to close any opened tags (as long as they aren't self-closing). So our onCloseTag handler is given the non-existent tag (e.g., <wacky>) even though it's a non-standard tag with no closing tag.

A solution would need to either tell htmlparser2 to skip the closing tag (I don't think there's an option for this) or we track that the tag should be skipped and simply return at the beginning of that onCloseTag handler.

@dev-boku
Copy link

dev-boku commented Apr 7, 2021

@abea I suggest this pr. please check this.
#469

@abea
Copy link
Contributor

abea commented Apr 13, 2021

Thanks, @potter7050. We'll review it soon.

@sunnymsf
Copy link

Hi @abea

Any plans to work on this soon?

@abea abea assigned abea and unassigned abea Nov 29, 2021
@abea abea added seeking contributions bug and removed bug Something isn't working labels Nov 29, 2021
@abea
Copy link
Contributor

abea commented Nov 29, 2021

@sunnymsf No active plans. I provided feedback on the PR #469 that could help someone move this forward. Based on my previous review, htmlparser2 might make this hard to resolve fully. We'd welcome a community contribution for this.

@sunnymsf
Copy link

@abea thanks for the update.

@Iwark
Copy link

Iwark commented Jun 22, 2022

I met the same problem, and here is my workaround:

sanitizeHtml(text, {disallowedTagsMode: 'escape'}).replace(/&lt;\/.*?&gt;/g, '')

It's ugly but works for my case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants