TOC extension: Add option to generate nested anchor IDs #1364


amp-matthew-wallace

This PR adds a new boolean to the TOC extension configuration.

The name I've chosen for this config setting is nested_anchor_ids, although I'm open to other suggestions.
Perhaps one of the following?

  • hierarchical_anchor_ids
  • concatenate_anchor_ids
  • hierarchical_anchors

This feature can be useful when linking to specific subsections of the resultant document, as the anchor ID will be more specific to the header and the headers above it. Unlike the default anchor ID scheme, these more specific links are less likely to break as additions are made to the markdown document.

For example, consider the following markdown:

 # Header A
 ## Header A
 ## Header B
 ### Header A
 # Header B
 ## Header A

Without the nested_anchor_ids setting, the resultant HTML would be:

<h1 id="header-a">Header A</h1>
<h2 id="header-a_1">Header A</h2>
<h2 id="header-b">Header B</h2>
<h3 id="header-a_2">Header A</h3>
<h1 id="header-b_1">Header B</h1>
<h2 id="header-a_3">Header A</h2>

With the nested_anchor_ids setting, the resultant HTML would be:

<h1 id="header-a">Header A</h1>
<h2 id="header-a-header-a">Header A</h2>
<h2 id="header-a-header-b">Header B</h2>
<h3 id="header-a-header-b-header-a">Header A</h3>
<h1 id="header-b">Header B</h1>
<h2 id="header-b-header-a">Header A</h2>
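
To make the nesting rule concrete, here is a minimal sketch of how such IDs could be derived. It is not the code from this PR: `nested_anchor_ids` and the simplified `slugify` below are hypothetical stand-ins written for illustration, and the sketch assumes headers arrive as `(level, text)` pairs. It also does not de-duplicate sibling headers that share the same text.

```python
import re


def slugify(text):
    """Simplified, hypothetical stand-in for a slug function: lowercase words joined by hyphens."""
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    return re.sub(r"[\s-]+", "-", text)


def nested_anchor_ids(headers):
    """Return one ID per (level, text) header, prefixed with the slugs of its ancestors."""
    stack = []  # (level, slug) pairs for the currently open ancestor headers
    ids = []
    for level, text in headers:
        # Drop any header at the same or a deeper level; it is not an ancestor.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, slugify(text)))
        ids.append("-".join(slug for _, slug in stack))
    return ids


headers = [
    (1, "Header A"),
    (2, "Header A"),
    (2, "Header B"),
    (3, "Header A"),
    (1, "Header B"),
    (2, "Header A"),
]
for anchor_id in nested_anchor_ids(headers):
    print(anchor_id)
```

Run against the six headers from the example, the sketch yields the same six IDs shown in the HTML above.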

This feature could certainly be its own separate extension (in fact that's how I currently have it implemented in my project). But I figure that it is also a reasonably logical addition for the TOC extension.

I've tried to follow the contribution guide, and used #1339 as an example, but I'm happy to make any further necessary changes.

@facelessuser
Collaborator

My personal feeling is that this feature makes the header IDs cumbersome. I realize though that there are likely others who'd disagree. This is just my opinion, and I don't have the final say on this either. My personal vote is that this would do better as a third party extension. I'm certain there are other like minded people who'd be interested and happy to see such an extension.

@amp-matthew-wallace
Author

My personal feeling is that this feature makes the header IDs cumbersome. I realize though that there are likely others who'd disagree. This is just my opinion, and I don't have the final say on this either. My personal vote is that this would do better as a third party extension. I'm certain there are other like minded people who'd be interested and happy to see such an extension.

I can definitely see where you're coming from, these IDs can get pretty long depending on how the headers are named. How do you personally work with the ID fields that would suffer from these very verbose values?

I figure that when interacting with the table of contents, the actual values don't matter to the user, as the linking works automatically. The only other time I figure you'd want to interact with the anchor IDs is when hand-writing links to other sections of the document. I think that it's when writing links that this feature is most useful because the ID string for each header would be named very specifically and wouldn't change if another subsection with the same name is added somewhere.

From some light googling, it seems that the id field can hold up to 1M characters, so there's little risk that super long IDs would cause issues in the browser.

@facelessuser
Collaborator

From some light googling, it seems that the id field can hold up to 1M characters, so there's little risk that super long IDs would cause issues in the browser.

For me, it isn't a matter of limitations. Just because you can do something, doesn't mean you should.

I figure that when interacting with the table of contents, the actual values don't matter to the user, as the linking works automatically.

My counter argument would be, then why do we need nested IDs? It just feels a bit much to me. Again, as a member of the Python Markdown team, I'm just voicing my personal opinions. I'm personally not interested in supporting the increased complexity for such a feature. If others feel strongly in favor, then so be it 🙂.

@amp-matthew-wallace
Author

amp-matthew-wallace commented Jul 26, 2023

I figure that when interacting with the table of contents, the actual values don't matter to the user, as the linking works automatically.

My counter argument would be, then why do we need nested IDs? It just feels a bit much to me.

That's a terrific point. The core value of this feature is really only applicable when users want to use both a table of contents and lots of in-doc links. So maybe the TOC extension isn't the best place for the feature.

Because of the way that the TOC extension preserves pre-existing anchor IDs, a separate extension with this functionality could be used alongside the TOC extension without issue. One of the reasons why I chose to incorporate the feature into the TOC extension was that I would have needed to import or re-implement stashedHTML2text, get_name, unescape, unique, and slugify to make my feature work as an independent extension.

However, a separate extension with just this functionality would feel somewhat insubstantial. Would such an extension be welcome within this repo? Or would it be better off somewhere like pymdown-extensions?

Either way, I'd be interested in hearing how other maintainers feel about this change as-is.

@facelessuser
Collaborator

However, a separate extension with just this functionality would feel somewhat insubstantial. Would such an extension be welcome within this repo? Or would it be better off somewhere like pymdown-extensions?

I'm actually the maintainer of that extension set. It is definitely an opinionated set of extensions, so unless I'm fully behind the idea of a new extension, I don't usually accept them. My intention is not to support all extensions for Python Markdown, but mainly the ones I had an explicit desire for.

I'm not completely against the idea, or at least some aspects of it. Recently I was approached with the idea of giving tab IDs (from my supported tab container extension) context from the parent header. In document pages, you can sometimes have many tab containers with the same tab names, like for Windows examples vs macOS examples, etc. So I caved and allowed scoping those IDs with the parent header as an optional feature, but I rejected applying context from all levels of nested headers as it was originally pitched, which is similar to what you are requesting.

Since you are not the first to pose some idea in this vein, I am certain there are more like-minded people out there, but I personally find it too much, and not that helpful. I would probably not accept such an extension into Pymdown Extensions. It's just not something I can get behind enough to agree to the maintenance burden. It would be better to have someone invested in such an extension maintain and support it.

@waylan
Member

waylan commented Jul 26, 2023

I tend to agree with @facelessuser.

At first read, I mistakenly assumed the purpose was to avoid weird ids when multiple headers all contained the same text and was about to object to that document structure. However, after taking a closer look, I now realize that that interpretation was based on your examples only, and your proposed documentation makes the intent clearer. Just as a URL shows the nesting structure of multiple documents (grandparent/parent/child), so does your proposed id for multiple sections of a single document (#grandparent-parent-child). While I can see the utility in this, I think in most cases it would actually create more confusion than help, especially if headers are full sentences. In those cases where headers are single words, then perhaps it would make sense as a separate third-party add-on.

Whether that third-party extension worked in conjunction with the TOC extension or replaced it entirely would be up to you as its creator. And I would suggest that you just create it on its own, not as part of a collection. In fact, I might be more inclined to use some of the pymdown extensions if @facelessuser would actually make them available as separate packages. I'm not interested in installing a whole collection for just one or two. IMO, it is better for each to be in its own package alone.

In fact, if I had to start over, the Markdown library would not contain any extensions, just the API for them. Each extension would be its own separate standalone installable package. IMO there is no such thing as an extension which is too insubstantial to stand alone.

@waylan added the labels needs-decision (A decision needs to be made regarding request.) and needs-review (Needs to be reviewed and/or approved.) on Jul 26, 2023
@amp-matthew-wallace
Author

I would probably not accept such an extension into Pymdown Extensions. It's just not something I can get behind enough to agree to the maintenance burden.

I noticed that you were the maintainer shortly after sending that message. I totally understand not wanting to take on a new extension that you aren't invested in.

I'm not interested in installing a whole collection for just one or two. IMO, it is better for each to be in its own package alone.

I agree that from a user's perspective having a separate package for each extension is a lot nicer. But I also see the benefit to the developer of only having to manage one repo and publish one package.


I would understand if this is determined to be too niche for inclusion into the TOC extension, although it'd definitely be my preference to get this merged rather than create a separate extension and new package.

Example

Here's a concrete example of the scenario that motivated me to make this change:

Before nested_anchor_ids

Initial state:

# Dashboard Page
## Feature A
This feature can be managed on the [management page](#feature-a_1)
# Management Page
## Feature A
Blah blah

New page added:

# Dashboard Page
## Feature A
This feature can be managed on the [management page](#feature-a_1)
# Settings Page
## Feature A
# Management Page
## Feature A
Blah blah

The #feature-a_1 link is now incorrect, and points to the settings page.
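
The shift can be reproduced with a small sketch of the default numbering seen in the examples above, where a repeated slug gets a `_1`, `_2`, ... suffix in document order. `flat_anchor_ids` and `slugify` are hypothetical helpers written for this illustration, not the TOC extension's own functions, and they only approximate its behavior for simple cases like this one.

```python
import re


def slugify(text):
    """Simplified, hypothetical slug function: lowercase words joined by hyphens."""
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    return re.sub(r"[\s-]+", "-", text)


def flat_anchor_ids(titles):
    """Assign IDs in document order; a repeated slug gets a _1, _2, ... suffix."""
    seen = {}  # slug -> number of times it has appeared so far
    ids = []
    for title in titles:
        slug = slugify(title)
        count = seen.get(slug, 0)
        ids.append(slug if count == 0 else f"{slug}_{count}")
        seen[slug] = count + 1
    return ids


before = ["Dashboard Page", "Feature A", "Management Page", "Feature A"]
after = ["Dashboard Page", "Feature A", "Settings Page", "Feature A",
         "Management Page", "Feature A"]

# Pair each header with its generated ID to see which header owns feature-a_1.
for title, anchor_id in zip(before, flat_anchor_ids(before)):
    print("before:", title, "->", anchor_id)
for title, anchor_id in zip(after, flat_anchor_ids(after)):
    print("after: ", title, "->", anchor_id)
```

In the first listing `feature-a_1` belongs to the header under Management Page; in the second it belongs to the header under Settings Page, and the Management Page header has moved to `feature-a_2`.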

After nested_anchor_ids

Initial state:

# Dashboard Page
## Feature A
This feature can be managed on the [management page](#management-page-feature-a)
# Management Page
## Feature A
Blah blah

New page added:

# Dashboard Page
## Feature A
This feature can be managed on the [management page](#management-page-feature-a)
# Settings Page
## Feature A
# Management Page
## Feature A
Blah blah

The #management-page-feature-a link continues to work as intended, and is in my opinion more descriptive than the old #feature-a_1 link.

@waylan
Member

waylan commented Jul 27, 2023

Thanks for the real-world example. That helps to illustrate the utility. However, I would argue that that type of scenario is rare and in the few occasions where it happens, I would just define my own custom IDs with the Attribute List Extension.

As you likely know, headers come in 6 levels. If you were to utilize all 6 levels, your proposed feature could generate an ID that is quite long and awkward. A custom defined ID would be preferable.

In fact, I would expect my project to include various headers that I would not want to have your proposed behavior applied to. Therefore, I would need granular control to only have it applied to some, but not all headers. In the end, the easiest way to do that is to use attribute lists, which is already possible without adding any new features.

Of course, we only need to define custom IDs where the auto-generated ones don't make sense. So, with your example, we might do this:

# Dashboard Page
## Feature A { #dashboard-feature-a }
This feature can be managed on the [management page](#management-feature-a)
# Settings Page 
## Feature A { #settings-feature-a }
# Management Page
## Feature A { #management-feature-a }
Blah blah

Notice that I also shortened the custom IDs by not including the word page. Of course, that is a style choice which you may or may not want to follow. Personally, I think the shorter IDs are preferable, but then maybe the word page has significance in your context which would require it to remain. I am simply illustrating how defining custom IDs can be better as you get total control.
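
For anyone who wants to try this, the following is a small sketch of rendering that document with Python-Markdown, assuming the `markdown` package is installed and using its built-in `attr_list` and `toc` extensions. The document text is the example from this comment; the checks at the end are only an illustration of what to look for in the output.

```python
import markdown  # third-party: the Python-Markdown package

text = """\
# Dashboard Page
## Feature A { #dashboard-feature-a }
This feature can be managed on the [management page](#management-feature-a)
# Settings Page
## Feature A { #settings-feature-a }
# Management Page
## Feature A { #management-feature-a }
Blah blah
"""

# attr_list applies the custom IDs; toc fills in IDs only for headers without one.
html = markdown.markdown(text, extensions=["attr_list", "toc"])
print(html)

# The hand-written link target should exist as a header ID in the output.
print('id="management-feature-a"' in html)
```

Because the IDs are written by hand, inserting another section with a `Feature A` header elsewhere in the document should leave the existing link target unchanged.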

@amp-matthew-wallace
Author

Ah I agree that using the attribute list extension would solve this problem, perhaps better than my solution would. The IDs set with attribute lists do not automatically change as header names do, so they're safer to link to.

I had found a reference to this custom ID syntax on markdownsyntax.org, but was unable to find an extension supporting the feature when searching the terms from that page.

I'll close this PR then, given that there's a better solution to the problem that's already supported.

Thank you both for reviewing this work so quickly!

Labels
needs-decision A decision needs to be made regarding request. needs-review Needs to be reviewed and/or approved.

3 participants