-
-
Notifications
You must be signed in to change notification settings - Fork 723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Leave text content? #15
Comments
Currently, the entire element is being removed. Do you have an example at hands where your request would make sense? I'd have a look then. |
@codylindley Do you have a specific example on where and how you would need this feature? If not I would be tempted to close this ticket - cannot see a use-case right now. |
I'm actually not aware of a XXS tool that does not permit the configuration of leaving the text node while removing the element node. For example, I might not want to allow text wrapped with a span. I would think since DOMPurify can remove a span I should have the option to leave the text node inside of it, while removing the span. Make sense? |
@codylindley Kinda does ;) Now I am wondering: how would a config API look like? How would you expect to configure it from a developer's perceptive? if you give me a specific example, i can start thinking about an implementation - so far it's still a bit too vague. |
Have you seen js-xss? Here is a look at how they filter out element nodes, but leave text nodes. And it's optional. I would think this would be as simple as something like...
|
Ah, that's a good example, Thx! And I didn't know that library yet. Your API suggestion is neat, sufficiently clean although I am not a big fan of the I'll have a look. |
@codylindley I had a look and I am tempted to reject this feature request. So far I see only one way to do it safely, and that would involve iterating over the nodes twice instead of once. We'd first have to flag elements for deletion and then go over the tree again and remove them while having their harmless children and text nodes survive. And even then I believe we'd have to go from the deepest node upwards to the "thicker branches" which might end up in a performance killer. I am however open for alternative ways to solve it. If you have thoughts on that: appreciated! However your request helped spotting a bug; DOMPurify was too intolerant about content starting with |
@cure53 I've been presented with this problem in a SO question where the user wanted to remove anchor elements, but keeping their text content. You can see my answer there with a couple ways to implement that, though I'm not sure if those are applicable to how DOM Purify works. About performance: this option could be disabled by default, so there should be nearly no performance penalty, right? I was actually going to recommend DOM Purify for the task in the question, then I got quite stumped that DOM Purify does not have an option to keep text nodes from removed elements yet. Also, HTML Purifier keeps text nodes when removing not allowed tag names. Github and Stack Overflow's markdown editors also keep text nodes when removing unsafe tags. |
To get the described effect, the user can allow This feature would be easy to implement for a flat DOM - e.g. a link with no child elements. For a working implementation one would have to iterate recursively over all child elements and check their type. The given example on SO does not do that sufficiently. It also juggles with I believe in a scenario where a feature like that would not be under attack and confronted with a simple and flat DOM it's not a problem. I our case however I think it will enable several attacks and is not feasible. Thus I need to reject it. Just for the record, knowing it's not a security tool; the function you linked is vulnerable against XSS too:
|
I created #18 for discussion. Maybe an API would help here rather than adding more features to the core library. |
@cure53 I'm aware that throwing arbitrary markup into As I said, that code is far from optimal for this library, it was just an example use case which would be much better covered by a tested library than some hack'ish algorithm/regex for those that need to do it securely. |
Can you clarify? Should the KEEP_CONTENT: true ... keep the text node? Does not work for me if that is what it's suppose to do: http://jsfiddle.net/codylindley/9SxCb/ |
Haha yes, I can ;) Right now we classify text nodes as actual elements. Which might not be the most intuitive. This would work just fine (note the extra var foo = DOMPurify.sanitize('123<h2>test</h2><p>test</p>',
{
ALLOWED_TAGS: ['#text','strong','a','table','td','tr','tbody','thead','tfoot','th','li','ul','ol','br','p'],
ALLOWED_ATTR: ['href,title,style,rows,rowspan,width,height,rel,align,cite,cols,colspan,target'],
KEEP_CONTENT: true
});
console.log(foo); I'll deploy a fix later today, it's fine when people don't pass in their own tags but fairly counter-intuitive in your and probably many other scenarios. Nice spot! |
@codylindley Done! |
Thanks. Yeah. After I sent the fiddle notice that in the source and sure enough #text was the key. Nice work all around here! |
Thx :) |
Can I remove an element node and not remove the text node inside of it?
The text was updated successfully, but these errors were encountered: