Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plugins: auto-anchor: use ASCII-only slug #349

Merged
merged 1 commit into from
Feb 20, 2020

Conversation

harding
Copy link
Contributor

@harding harding commented Feb 19, 2020

Closes #348

For some reason, our Markdown interpreter (Kramdown) won't create anchor tags containing non-ASCII text. I don't know if that its fault or there's something in the HTML5 spec. This affects both our Japanese and Spanish translations (the later only occasionally when one of the words in the anchor text has an accent character).

The fix implemented in this PR is to (1) remove all non-ASCII accents (converting those characters to their nearest ASCII representation, e.g. õ to o) and (2) further remove all non-ASCII characters (e.g. all Japanese characters) before generating the anchor id ("slug"). This works, but it has two implications for non-Latin character sets:

  1. If all of the text in the bolded, italicized, or linkified part of a bullet is non-Latin, no anchor will be generated (because there's no text left after the non-ASCII is removed). This doesn't seem to happen often in the Japanese translations because there's usually technical terms like "LN" or "bech32" that are used unchanged in the translation.

  2. Once all of the non-Latin text has been stripped, there's sometimes more than one bullet in an article that has the same anchor as another bullet. E.g. after stripping non-ASCII, these produce the same anchor:

    • LNオニオン形式でのプライバシー漏洩の可能性:
    • LNの前払い:

    Our tests catch cases where two anchors have the same id, so this is easy to catch before publication. I found that it was easy to fix by just inserting a single-character comment in the bolded text to distinguish the duplicates (see PR for a few cases where I had to do this).

Overall, I think this is an adequate solution, although it's still unfortunate that we can't have localized anchors.

Testing: I tested by building the site with and without this PR's commit and then diffing the HTML. It looks to me like this change is purely additive---in only adds anchors (and HTML comments) in places where there were no anchors before. No existing anchors are changed.

@bitschmidty
Copy link
Contributor

bitschmidty commented Feb 20, 2020

tACK 1b66c6a

Built, ran, tested duplicates provided, smoke tested some other pages in order to ensure nothing was aversely affected. Also diffed and spot checked the outputs

@bitschmidty bitschmidty merged commit 4e450d1 into bitcoinops:master Feb 20, 2020
@jnewbery
Copy link
Contributor

Thanks @harding . Looks like this fixes the immediate issue, although I agree that it's a shame that we can't have localized anchors. Did you take a look at the transliterated_header_ids option in
https://kramdown.gettalong.org/converter/html.html?

@jnewbery
Copy link
Contributor

I think we might also want to add a note to https://github.com/bitcoinops/bitcoinops.github.io/blob/master/CONTRIBUTING.md#translations explaining how to modify the anchor tags in case of duplicate anchors for non-ascii languages. @bitschmidty - can you do that?

@harding
Copy link
Contributor Author

harding commented Feb 23, 2020

@jnewbery

Did you take a look at the transliterated_header_ids option in
https://kramdown.gettalong.org/converter/html.html?

No (though I have now). Three notes:

  1. This doesn't help the auto-anchor plugin, as that only adds ID tags to bullets that don't get auto-id'd by Kramdown

  2. It looks like header ids are working just fine for Spanish (e.g.) and for Japanese (e.g.) they're using English headings, so they work too.

  3. Before proposing this solution, I did try URL encoding the the text using a Liquid filter, but I couldn't get it to work as well as the current solution.

@jnewbery
Copy link
Contributor

ok, thanks for checking, Dave!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
website General website changes and maintenance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automatic Anchors not populating for ja
3 participants