Footnote ref number in TOC #660

Closed
mustafa0x opened this issue May 10, 2018 · 1 comment · Fixed by #1441
Labels
bug Bug report. extension Related to one or more of the included extensions. someday-maybe Approved low priority request.

Comments

mustafa0x commented May 10, 2018

In [13]: t = '''
    ...: # Header with footnote[^1]
    ...: 
    ...: Lorem Ipsum
    ...: 
    ...: [^1]: footnote text
    ...: '''

In [14]: print(markdown.markdown('[TOC]\n\n' + t, extensions=['markdown.extensions.toc', 'markdown.extensions.footnotes']))
<div class="toc">
<ul>
<li><a href="#header-with-footnote1">Header with footnote1</a></li> <!-- Note the '1' in the header id and more importantly the link text -->
</ul>
</div>
<h1 id="header-with-footnote1">Header with footnote<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" rel="footnote">1</a></sup></h1>
<p>Lorem Ipsum</p>
<div class="footnote">
<hr />
<ol>
<li id="fn:1">
<p>footnote text&#160;<a class="footnote-backref" href="#fnref:1" rev="footnote" title="Jump back to footnote 1 in the text">&#8617;</a></p>
</li>
</ol>
</div>
@waylan waylan added bug Bug report. extension Related to one or more of the included extensions. labels May 10, 2018
Member

waylan commented May 10, 2018

So the code which sanitizes the text for use in the TOC is pretty simple: it just pulls the text from the HTML elements. Excluding footnote refs could make it significantly more complex. And I find it odd that we would need to do this only for a non-standard add-on syntax. Additionally, the fact that this is only being reported now suggests that this is an unusual edge case that not many users will encounter.

That said, it is clearly not what one would expect and should probably be fixed. Of course, pull requests are welcome.
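
For illustration, here is a minimal sketch of one way footnote references could be excluded while pulling the text from a heading. It is not the code used in the `toc` extension. The helper name `heading_text_without_fnrefs` is hypothetical, and the sketch assumes footnote references are rendered as `<sup>` elements whose `id` starts with `fnref`, as in the output above.

```python
import copy
import xml.etree.ElementTree as etree


def heading_text_without_fnrefs(heading):
    """Return the heading's text, skipping <sup id="fnref..."> elements.

    Hypothetical helper, not part of Python-Markdown.
    """
    root = copy.deepcopy(heading)  # leave the caller's tree untouched
    # ElementTree has no parent pointers, so visit each element as a
    # potential parent and inspect its direct children.
    for parent in list(root.iter()):
        prev = None
        for child in list(parent):
            is_fnref = child.tag == "sup" and child.get("id", "").startswith("fnref")
            if is_fnref:
                # Text that follows the reference must survive its removal.
                if child.tail:
                    if prev is None:
                        parent.text = (parent.text or "") + child.tail
                    else:
                        prev.tail = (prev.tail or "") + child.tail
                parent.remove(child)
            else:
                prev = child
    return "".join(root.itertext()).strip()


h1 = etree.fromstring(
    '<h1 id="header-with-footnote1">Header with footnote'
    '<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></h1>'
)
print("".join(h1.itertext()))           # naive extraction: Header with footnote1
print(heading_text_without_fnrefs(h1))  # Header with footnote
```

The extra bookkeeping around `tail` is the added complexity mentioned above: removing an element from an ElementTree also drops the text that follows it unless that text is moved first.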

@waylan waylan added the someday-maybe Approved low priority request. label Oct 23, 2018
waylan added a commit to waylan/markdown that referenced this issue Feb 9, 2024
- All postprocessors are run on heading content (not just
  `RawHtmlPostprocessor`).
- Footnote references are stripped from heading content. Fixes Python-Markdown#660.
- A more robust `striptags` is provided to convert headings to plain text.
  Unlike markupsafe's implementation, HTML entities are not unescaped.
- Both the plain text `name` and rich `html` are saved to `toc_tokens`,
  which means users can now access the full rich text content of the
  headings directly from the `toc_tokens`.
- `data-toc-label` is sanitized separately from heading content.
- An `html.unescape` call is added to `slugify` and `slugify_unicode`, which
  ensures `slugify` operates on Unicode characters, rather than HTML
  entities. Because the call lives in the functions themselves, users can
  override it with their own slugify functions if they desire.

Note that this first commit includes minimal changes to the tests to show
very little change in behavior (mostly the new `html` attribute of the
`toc_tokens` was added). A refactoring of the tests will be in a separate
commit.
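
As a rough illustration of the `striptags` item above, the following sketch strips comments and tags while leaving HTML entities escaped. It is an assumption about the general technique, not the implementation in the commit, and the name `strip_tags_sketch` is hypothetical.

```python
import re


def strip_tags_sketch(text):
    """Remove HTML comments and tags but leave entities such as &amp; as-is.

    Hypothetical stand-in for illustration only.
    """
    # A comment may itself contain tags, so drop comments first.
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)
    # Then drop anything that looks like a tag.
    text = re.sub(r"<[^>]*>", "", text)
    # Collapse the whitespace left behind by removed markup.
    return " ".join(text.split())


print(strip_tags_sketch("Header &amp; <em>more</em> <!-- <b>x</b> -->text"))
# Header &amp; more text
```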
waylan added a commit that referenced this issue Mar 8, 2024
* All postprocessors are run on heading content.
* Footnote references are stripped from heading content. Fixes #660.
* A more robust `striptags` is provided to convert headings to plain text.
  Unlike the `markupsafe` implementation, HTML entities are not unescaped.
* The plain text `name`, rich `html` and unescaped raw `data-toc-label` are
  saved to `toc_tokens`, allowing users to access the full rich text content of
  the headings directly from `toc_tokens`.
* `data-toc-label` is sanitized separately from heading content.
* An `html.unescape` call is made just prior to calling `slugify` so that
  `slugify` only operates on Unicode characters. Note that `html.unescape` is
  not run on the `name` or `html`.
* The `get_name` and `stashedHTML2text` functions defined in the `toc` extension
  are both **deprecated**. Instead, use some combination of `run_postprocessors`,
  `render_inner_html` and `striptags`.

Co-authored-by: Oleh Prypin <oleh@pryp.in>
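
To show why unescaping before slugifying matters, here is a small sketch using only the standard library. `simple_slugify` is a simplified, hypothetical stand-in rather than the extension's own `slugify`, and `slug_for_heading` is likewise an illustrative name.

```python
import html
import re
import unicodedata


def simple_slugify(value, separator="-"):
    """Simplified, hypothetical ASCII slugify used only for this example."""
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(r"[\s-]+", separator, value)


def slug_for_heading(name):
    """Unescape entities first so the slugifier sees real characters."""
    return simple_slugify(html.unescape(name))


name = "Q&amp;A caf&eacute;"      # plain-text heading with entities still escaped
print(simple_slugify(name))       # qampa-cafeacute (entity names leak into the id)
print(slug_for_heading(name))     # qa-cafe
```

Keeping the `html.unescape` call outside the slugify function means a user-supplied slugify receives Unicode text without having to handle entities itself.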