Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow HTML tags/attributes to be whitelisted during Markdown processing #843

Closed
Sherlouk opened this issue Dec 6, 2020 · 7 comments · Fixed by #1090
Closed

Allow HTML tags/attributes to be whitelisted during Markdown processing #843

Sherlouk opened this issue Dec 6, 2020 · 7 comments · Fixed by #1090
Assignees
Labels
enhancement New feature or request

Comments

@Sherlouk
Copy link
Sponsor Collaborator

Sherlouk commented Dec 6, 2020

Example: https://staging.swiftpackageindex.com/guykogus/CodableJSON (See 'Installation' section)

A relatively common feature on READMEs is the use of the 'details' HTML tag in order to make collapsible sections of documentation.

These are currently completely ignored by our rendering of READMEs.

Title Contents with a predetermined title
No Title defaults to 'Details'

Follow-on from #410

@Sherlouk Sherlouk added the bug Something isn't working label Dec 6, 2020
@daveverwer
Copy link
Member

Great catch. I hadn't seen this in my testing (and haven't seen it on GitHub either, which is strange as I look at a lot of package READMEs!)

Markdown processors should leave raw HTML unprocessed and just take it wholesale into the output, but it looks like the one we're using is replacing it and stripping raw HTML.

Screenshot 2020-12-06 at 17 49 32@2x

We should look if that's an option we can set, or a patch we can make to that parser as I'm not sure there's much we can do to fix this if it's not an option.

@Sherlouk
Copy link
Sponsor Collaborator Author

Sherlouk commented Dec 6, 2020

@Sherlouk
Copy link
Sponsor Collaborator Author

Sherlouk commented Dec 6, 2020

Aye according to cmark docs:

/** Render raw HTML and unsafe links (`javascript:`, `vbscript:`,
 * `file:`, and `data:`, except for `image/png`, `image/gif`,
 * `image/jpeg`, or `image/webp` mime types).  By default,
 * raw HTML is replaced by a placeholder HTML comment. Unsafe
 * links are replaced by empty strings.
 */

this is what we currently have enabled, as it's the default.

https://github.com/KristopherGBaker/libcmark_gfm/blob/2abfe8c7d47557590aef8545d720c1297b359972/Sources/libcmark_gfm/include/cmark-gfm.h

@Sherlouk
Copy link
Sponsor Collaborator Author

Sherlouk commented Dec 6, 2020

So I suppose it's a question of whether or not we're happy to drop it into unsafe mode - and if we do that whether or not we want to strip <script> tags etc ourselves?

@daveverwer
Copy link
Member

Apologues for the delay responding to this, and thanks for investigating @Sherlouk!

Is there a tag/attribute whitelist option for unsafe mode in the GFM library? If there is, that's ideal and we'll just whitelist our standard tags and any raw HTML exceptions we find.

If not, we're in a relatively tricky situation as stripping unsafe HTML tags/attributes is almost impossible to get right. There's so much to do! 😬 You've got the obvious tags like script, iframe, embed, object but then you've also got to make sure you remove javascript:, data:, file:, references from everywhere they can be used img and a tags are the obvious ones, but I bet that's not a comprehensive list. Finally, you've got to remove all of the JavaScript-based attributes like onclick, onload, onfocus/onblur. It's effectively impossible to be comprehensive.

I have my fingers crossed for a whitelist 🤞

@Sherlouk
Copy link
Sponsor Collaborator Author

From what I understand, it's not a whitelist.

You can either support no HTML, or all HTML. (or we need to fork the GFM library and add support for all the tags we want...)

@daveverwer
Copy link
Member

Sven and I chatted about this today and we're not going to let this problem stop the README rollout, in fact, it's done (#855) 🚀

I'll retitle this bug as relating to the HTML whitelist, but the amount of READMEs that this breaks is not worth the trouble of all the potential security issues that unsafe mode brings.

@daveverwer daveverwer changed the title Collapsible Details HTML not supported in README Allow HTML tags/attributes to be whitelisted during Markdown processing Dec 16, 2020
@daveverwer daveverwer added enhancement New feature or request and removed bug Something isn't working labels Dec 16, 2020
@daveverwer daveverwer self-assigned this May 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants