Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize html #3708

Merged
merged 6 commits into from Jul 26, 2023
Merged

Sanitize html #3708

merged 6 commits into from Jul 26, 2023

Conversation

Nutomic
Copy link
Member

@Nutomic Nutomic commented Jul 24, 2023

Use ammonia crate to sanitize malicious HTML from all string inputs, both via api and federation. This will make cross-site scripting attacks completely impossible.

@Nutomic Nutomic marked this pull request as ready for review July 24, 2023 12:54
@solid-snail
Copy link

I'm not very familiar with ammonia, but it seems like it doesn't completely remove html.
The context of the rendered html matters a lot.
Additionally, It might be possible to break down a payload to several inputs, only turning malicious in the context of the fully rendered html.

Just trying say that "make cross-site scripting attacks completely impossible" is a big statement, which might give a false sense of security.

@phiresky
Copy link
Collaborator

I agree with solid-snail. This PR doesn't perfectly solve the problem. There's places where the user should be able to pass (sane) HTML/ markdown like CreateCommentReport.reason, and there's places where the user should only be able to pass plain text, like lang.password_reset_body(username).

In the first case, ammonia could be used, in the second case, all HTML should be escaped, like @solid-snail added in #3720. Otherwise, you can still probably inject non-script HTML like phishing links into the email body.

@dessalines
Copy link
Member

This is a good start: if we find cases where HTML shouldn't be sanitized for some reason, or places where ammonia doesn't properly sanitize where it should, then we can do more PRs in the future.

Seems to be failing cargo fmt.

@Nutomic
Copy link
Member Author

Nutomic commented Jul 26, 2023

CI is passing now.

Maybe I was too optimistic, but even if its not perfect this is still a major improvement. We can improve security further in future PRs.

Edit: added commit to forbit a, img tags which could be used to bypass markdown (#3720).

@dessalines dessalines enabled auto-merge (squash) July 26, 2023 17:35
@dessalines dessalines merged commit 3471f35 into main Jul 26, 2023
1 of 2 checks passed
Nutomic added a commit that referenced this pull request Jul 27, 2023
* HTML sanitization in apub code

* Sanitize API inputs

* fmt

* Dont allow html a, img tags

---------

Co-authored-by: Dessalines <dessalines@users.noreply.github.com>
@Phoenix591
Copy link

Phoenix591 commented Jul 28, 2023

looks like this needs some refining: its eating > quote tags. https://lemmy.ml/post/2527005 . they're getting stored as &gt; in the database which makes me think its this

@solid-snail
Copy link

Yes, I ran across it while reading the test cases in Ammonia's source code. When there are unmatched < or >, they're being escaped.
I presume it's to avoid what I described in my first comment. If a tag is broken down like so: <, script>, each piece on its own won't be picked up as html, but once the complete web page is created it might be.

@solid-snail
Copy link

markdown comments didn't cross my mind at the time, sorry about that.

Nutomic added a commit that referenced this pull request Jul 28, 2023
@Nutomic
Copy link
Member Author

Nutomic commented Jul 28, 2023

Thanks for reporting this, I opened #3749 to fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants