HTML ids are generated without diacritics and links do not work #158

Closed
adambartyzal opened this issue Jul 13, 2023 · 9 comments

@adambartyzal

adambartyzal commented Jul 13, 2023

For example markdown link:

[2.4. Úprava plastové krabičky pro elektroniku](#24-úprava-plastové-krabičky-pro-elektroniku)

leading to a heading within the same webpage:

## 2.4. Úprava plastové krabičky pro elektroniku

correctly generates the link:

<li><a href="#24-%C3%BAprava-plastov%C3%A9-krabi%C4%8Dky-pro-elektroniku">2.4. Úprava plastové krabičky pro elektroniku</a></li>

but in the heading id diacritics are missing:

<h2><a id='24-prava-plastov-krabiky-pro-elektroniku'></a>2.4. Úprava plastové krabičky pro elektroniku</h2>

therefore the link from above leads to nowhere.
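The mismatch can be checked by hand. The sketch below assumes that browsers percent-decode the URL fragment before looking for a matching id; the variable names are only illustrative. It decodes the fragment from the generated link and compares it with the generated heading id.

```ruby
require 'uri'

# Fragment taken from the generated link (percent-encoded UTF-8).
href_fragment = '24-%C3%BAprava-plastov%C3%A9-krabi%C4%8Dky-pro-elektroniku'
# id attribute taken from the generated heading.
heading_id = '24-prava-plastov-krabiky-pro-elektroniku'

# Decode the fragment back into readable UTF-8 text.
decoded = URI.decode_www_form_component(href_fragment)

puts decoded                 # the fragment with its diacritics restored
puts decoded == heading_id   # false: the heading id has lost its diacritics
```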

GitHub generates an id that looks like this:

<h2><a id="24-úprava-plastové-krabičky-pro-elektroniku"></a>2.4. Úprava plastové krabičky pro elektroniku</h2>

which seems to work fine.
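For comparison, here is a minimal sketch of a slug function that keeps diacritics. It is an approximation that happens to reproduce the id shown above for this heading, not GitHub's actual algorithm, and `unicode_slug` is a hypothetical name.

```ruby
# Hypothetical helper: keeps Unicode letters and digits, drops other
# punctuation, and turns whitespace into hyphens.
def unicode_slug(title)
  title.downcase
       .gsub(/[^\p{L}\p{N}\s\-_]/, '') # remove punctuation such as dots
       .strip
       .gsub(/\s/, '-')                # one hyphen per whitespace character
end

puts unicode_slug('2.4. Úprava plastové krabičky pro elektroniku')
# prints 24-úprava-plastové-krabičky-pro-elektroniku
```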

@DannyBen
Owner

Thanks for reporting. I will look into it.

@DannyBen
Owner

I think it should work now in the edge version.
Can you verify it works for your case?

@adambartyzal
Author

I built it from the origin/fix/diacritics-in-ids branch (hopefully that is what I was supposed to do), and now none of my links work. I do see a change in the generated page, though.

<h2><a id='24-prava-plastov-krabiky-pro-elektroniku'></a>2.4. Úprava plastové krabičky pro elektroniku</h2>

has changed to:

<h2 id="2-4-prava-plastov-krabi-ky-pro-elektroniku">2.4. Úprava plastové krabičky pro elektroniku</h2>

The text is in Czech, if that helps.

@DannyBen
Owner

Well. Let's take a step back.

  1. As it turns out, the current (released) version is working as expected.
  2. IDs are added to headers automatically, with diacritics removed. This is done by RedCarpet and is not under my control.
  3. The <!-- TOC --> magic comment adds the same type of slugs, with diacritics removed completely.
  4. So - in essence - the released version is working as expected.

I was using this markdown document to test:

<!-- TOC -->

## 2.4. Úprava plastové krabičky pro elektroniku

which resulted in this HTML:

<ul>
  <li><a href="#2-4-prava-plastov-krabi-ky-pro-elektroniku">2.4. Úprava plastové krabičky pro elektroniku</a></li>
</ul>
      
<h2 id="2-4-prava-plastov-krabi-ky-pro-elektroniku">2.4. Úprava plastové krabičky pro elektroniku</h2>

with identical IDs, as expected.


Now - I totally agree that the IDs should contain diacritics - especially if this is how GitHub does it:

2.4. Úprava => 2-4-úprava   # good
2.4. Úprava => 2-4-prava    # bad

but, unless this is changed in RedCarpet, I don't think there is much I can do about it.
I have opened an issue: vmg/redcarpet#739

So to recap:

  1. If you were referring to the links generated by the <!-- TOC --> marker - they should work in the released version. If they do not, I need you to provide me with a minimal markdown text and/or filename to reproduce.
  2. If you are linking to header IDs manually - you will have to use the same IDs as the ones that are generated (i.e. with diacritic letters erased altogether).
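For manual links, the generated id can be predicted ahead of time. The sketch below mimics the ids seen in the HTML output above (lowercase, every run of non-ASCII-alphanumeric characters collapsed into one hyphen). It is inferred from that output only and is not RedCarpet's actual code; `ascii_slug` is a hypothetical name.

```ruby
# Hypothetical helper that reproduces the observed ids:
# anything outside a-z and 0-9 becomes a hyphen, runs are collapsed.
def ascii_slug(title)
  title.downcase
       .gsub(/[^a-z0-9]+/, '-')  # diacritic letters are dropped here too
       .gsub(/\A-+|-+\z/, '')    # trim leading and trailing hyphens
end

title = '2.4. Úprava plastové krabičky pro elektroniku'
slug  = ascii_slug(title)

puts slug                      # prints 2-4-prava-plastov-krabi-ky-pro-elektroniku
puts "[#{title}](##{slug})"    # a markdown link that targets the generated id
```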

As a side note - I see that filenames with diacritics are not currently supported.

@adambartyzal
Author

Thank you for looking into this and for escalating it.

@DannyBen
Owner

DannyBen commented Apr 18, 2024

Version 1.2.0 is now released, with support for pandoc as an alternative markdown renderer. When using renderer: pandoc in .madness.yml settings, headers with diacritics should work properly in Table of Contents.

Note that this requires having pandoc installed, but it should be as simple as brew install pandoc or any other OS package manager (apt / apk etc).

If anyone can confirm this works, or report that it doesn't, it will be appreciated.

@xorguy

xorguy commented Apr 18, 2024

Is the docker image updated with support for pandoc renderer?

@DannyBen
Owner

Yup.

RUN apk add --no-cache pandoc

@DannyBen DannyBen reopened this Apr 18, 2024
@DannyBen
Owner

I also had to remove leading numbers and dots from Table of Contents links, since pandoc removes them from header IDs.

# pandoc removes leading numbers and dots from header slugs, we do the same
slug = slug.remove(/^[\d\-]+/) if config.renderer == 'pandoc'

When using pandoc, a header that looks like ## 2.4 Hello World is given the id hello-world by pandoc, so the ToC does the same.
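The effect of that regex can be tried in isolation. The snippet above calls `remove`, which is not a core Ruby String method and presumably comes from an extension, so this sketch uses the core `sub` instead. `strip_leading_numbers` is a hypothetical name.

```ruby
# Hypothetical stand-alone version of the ToC adjustment:
# drop any leading digits and hyphens from an already-built slug.
def strip_leading_numbers(slug)
  slug.sub(/\A[\d\-]+/, '')
end

puts strip_leading_numbers('2-4-hello-world')  # prints hello-world
puts strip_leading_numbers('hello-world')      # prints hello-world (unchanged)
```

Note that a slug made up only of digits and hyphens becomes an empty string under this rule.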
