fix: strip HTML entities like `&gt;` #27
This can (and should) be solved on your side, by first decoding HTML entities: I personally maintain … This library focuses on slug generation, not on Markdown or HTML parsing, so I’d say it’s out of scope.
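To make the "decode on your side" suggestion concrete, here is a minimal TypeScript sketch that decodes `;`-terminated character references before the text reaches a slugger. The helper `decodeEntities` and its five-entry `NAMED` table are hypothetical, not the decoder the comment refers to; a real decoder needs the full HTML list of named references.

```typescript
// Hypothetical, deliberately tiny table of named references; a real decoder
// needs the full HTML named-character-reference list. A Map avoids picking
// up inherited keys such as "constructor".
const NAMED = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
]);

// Decode `;`-terminated named, decimal, and hexadecimal character references.
function decodeEntities(text: string): string {
  return text.replace(
    /&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi,
    (match: string, body: string) => {
      if (body.startsWith("#")) {
        const hex = body[1] === "x" || body[1] === "X";
        const code = parseInt(body.slice(hex ? 2 : 1), hex ? 16 : 10);
        // Leave out-of-range code points untouched instead of throwing.
        return code <= 0x10ffff ? String.fromCodePoint(code) : match;
      }
      // Named references are case-sensitive; unknown names stay as-is.
      return NAMED.get(body) ?? match;
    },
  );
}

console.log(decodeEntities("&lt;div&gt; test")); // "<div> test"
```

The decoded string (`<div> test`) is what would then be passed to the slugger. The sketch deliberately ignores references written without a trailing `;`, which HTML also accepts for some legacy names.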
Doesn't make sense...
This implies that one should be able to pass the exact text of a Markdown heading to this lib.
That said, I was unaware that there could be entities that don't end with a semicolon.
I walk my comment back, I guess. But I'm a bit confused about what would count as "parsed" input to pass to this library. For instance, if there is a heading with a link and backticks like `` # Usage with [`react`](https://reactjs.org) ``, what should the "parsed" input passed to this library look like?
Parsed input in that case is the heading's text content. If you want to parse Markdown, see for example remark, or other libraries.
I would expect the output of that parser to be a syntax tree; even within the heading node, I would expect the link to be a child node, not just a simple string with all the control characters stripped away. Hence my confusion.
Wait, I don’t get it: which parser should output a syntax tree but doesn’t? Where are links stripped?
No, I mean the parser does output a syntax tree, as expected (https://github.com/syntax-tree/mdast#heading). It's really not obvious how I would get a simple string like the one you describe from that tree.
Look, sorry, but I have no clue what you want to do or how to help you. a) This project takes text and turns it into a slug; b) remark is an advanced system of hundreds of projects for working with Markdown. remark may be too advanced for your use case; I don’t know what your use case is.
No worries... My use case is just that I'm autogenerating Markdown and was looking into using this lib to generate slugs instead of my own custom code. I had also used … I guess, basically, the text we pass to this lib should be the same text we would get if we selected the heading in the compiled HTML and copied it? Out of curiosity, do you know if there's anything within remark that actually spits out that text?
@wooorm okay, I discovered that Facebook's … I added this heading to one of their test fixtures: `# &lt;div&gt; test`. Here in the failing test you can see that the entities don't get removed:
If even a major Facebook project has the same misconceptions I did about how to use this library …
First, find a node, then serialize it with …
The whole unified/remark/mdast/unist ecosystem is really good at that.
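As a rough illustration of "find a node, then serialize it", the sketch below walks a hand-built heading tree shaped like mdast (`type`, `value`, `children`) and concatenates every literal `value`. The interfaces and `nodeToString` are hypothetical stand-ins, not the API of any unified package, and the tree is written by hand rather than produced by a parser.

```typescript
// Hypothetical, minimal subset of mdast node shapes (not the full spec):
// literals carry a `value`, parents carry `children`.
interface MdLiteral {
  type: string;
  value: string;
}

interface MdParent {
  type: string;
  children: MdNode[];
  depth?: number; // used by headings
  url?: string; // used by links
}

type MdNode = MdLiteral | MdParent;

// Concatenate the text of a node and all of its descendants. Markdown syntax
// (the `#` marker, link brackets, backticks) is not in the tree, so it never
// appears in the result.
function nodeToString(node: MdNode): string {
  if ("value" in node) return node.value;
  return node.children.map(nodeToString).join("");
}

// Hand-written tree approximating: # Usage with [`react`](https://reactjs.org)
const heading: MdNode = {
  type: "heading",
  depth: 1,
  children: [
    { type: "text", value: "Usage with " },
    {
      type: "link",
      url: "https://reactjs.org",
      children: [{ type: "inlineCode", value: "react" }],
    },
  ],
};

console.log(nodeToString(heading)); // "Usage with react"
```

The link survives as a child node in the tree, yet serializing the heading still yields one plain string that could be handed to a slugger.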
To clarify: no, it does not do that; as the docs do not mention it, I would assume it does not exist.
That’s not an accident; that’s the exact behavior of GitHub’s slugging mechanism and the goal of this project.
It’s closer to
Could you clarify what you want to do?
Please raise an issue over there. It may already be fixed in docusaurus@2. Btw, Docusaurus maintainers and unified maintainers do talk about stuff like this.
Convert the raw Markdown heading into what string will guarantee …
I'm just trying to figure out how we can document this clearly so that other people won't have the same confusion I did. It's clear to you where …
And also, when I realized that HTML entities cause problems in downstream libraries (Docusaurus) and also in unrelated libraries (markdown-toc), I got the impression that there isn't enough awareness that tools need to deal with HTML entities. For me the most practical solution was to just write some quick and dirty regex processing in my build script, but I figured it would be helpful to raise awareness of this across the ecosystem so that other people don't waste time trying to figure out how to generate the correct slugs. So I'm thinking the README should say something like …
But I imagine specific instructions might be necessary to fully process GitHub Markdown?
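The "quick and dirty regex" approach mentioned above might look something like the following sketch, which deletes character references outright instead of decoding them. `stripEntities`, `ENTITY_REFERENCE`, and `approximateSlug` are hypothetical; `approximateSlug` is a simplified GitHub-style slugger (lowercase, drop punctuation, spaces to hyphens), not github-slugger's exact character table, and it skips duplicate-suffix handling.

```typescript
// Hypothetical build-script helper: delete character references outright
// instead of decoding them. Only `;`-terminated references are matched.
const ENTITY_REFERENCE = /&(?:#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi;

function stripEntities(text: string): string {
  return text.replace(ENTITY_REFERENCE, "");
}

// Simplified approximation of GitHub-style slugging: lowercase, keep letters,
// marks, numbers, underscores, spaces, and hyphens, then turn spaces into
// hyphens. NOT github-slugger's exact rules; duplicates are not suffixed.
function approximateSlug(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{M}\p{N}_ -]/gu, "")
    .replace(/ /g, "-");
}

// The raw heading text from the PR description.
const raw = "&lt;div&gt;";
console.log(approximateSlug(raw)); // "ltdivgt"
console.log(approximateSlug(stripEntities(raw))); // "div"
```

Stripping reproduces the `div` slug from the PR description, but it silently drops characters that were written as numeric references: `&#65;BC` becomes `BC`, whereas decoding first would keep the `A`.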
I discovered in my repo https://github.com/vscodeshift/material-ui-snippets that HTML entities like `&gt;` get stripped out of slugs. Right now, in this and other libs, `&lt;div&gt;` gets converted to `ltdivgt`, but the actual GitHub slug is just `div`.