Support Unicode superscripts for HTML note markers #9437

Open: silby wants to merge 1 commit into main

@silby silby commented Feb 8, 2024

Since HTML doesn't have semantic "footnote" elements, Pandoc has
historically used the <sup> tag to mark the numeric reference to
footnotes. In some fonts, depending on line-spacing, the common default
<sup> style of "font-size: smaller; vertical-align: super;" doesn't look
very good, spilling beyond the font's cap height and making browsers
add extra space at the top of the text line.

Many fonts include characters from the Unicode superscripts and
subscripts block (https://unicode.org/charts/nameslist/n_2070.html)
which are designed to function as footnote markers. Using these
characters to render note marks, instead of a <sup> tag, yields better
typographical results in these cases without additional CSS. The <sup>
tag is purely typographical so losing it from the output doesn't cost
anything semantically.

This diff adds a --note-style option to pandoc, taking the values
"sup-tag" (the default and hitherto only method) and
"unicode-superscript" (print marks using superscript chars, no
surrounding tag).

Due to the nature of Note output in the HTML writer, a Lua filter cannot
really customize how footnote marks are printed, justifying a writer
option here. An alternative to adding this feature to Pandoc would be
for authors to use CSS like 'a.footnote-ref sup { font-size: inherit;
vertical-align: inherit; font-feature-settings: "sups"; }' which would
work for fonts where the "sups" OpenType feature replaces digits with
their superscript forms. That solution only works for fonts that encode
that feature, though; Times New Roman on my system has the superscript
characters but does not support the "sups" OpenType feature.

Future work could extend support for this writer option to plain output
and possibly other formats where note marks are emitted by Pandoc rather
than the renderer of the output document. (The present author has not
studied whether there are such writer formats.)

@silby (Author) commented Feb 8, 2024

PR is missing changes to MANUAL.txt but I can contribute that once any preliminary acceptance/bikeshedding is complete.

@jgm (Owner) commented Feb 9, 2024

Interesting idea! The only part I have reservations about is the new option; I really like to avoid adding options if at all possible. Hence I'm wondering how widely supported the superscripted characters are in fonts. E.g. are they found in the standard "web fonts"? If they are very widely supported, perhaps we could get away with just making this the standard behavior?

@silby (Author) commented Feb 9, 2024

> If they are very widely supported, perhaps we could get away with just making this the standard behavior?

My comment in the commit message that "Times New Roman on my system has the superscript characters" was misleading…I didn't check all the numbers. Here's a test page styling these characters with the classic "web-safe fonts", plus serif and sans-serif just in case. On my Mac (screenshot below) it seems like only Georgia supports superscripts 0 through 9. Verdana supports 1-5, 7, and 8, suggesting these are actually glyphs for fraction numerators that are also present at the superscript code point. The remainder only have superscript 1-3, which were present in Latin-1. The lowest common denominator weighs against using Unicode superscripts as a default; the author will always have to provide a font that has them.

[Screenshot: the test page rendered on macOS, Feb 8, 2024]
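
For reference, the split described above follows from where the ten superscript digits live in Unicode. The following is a small sketch, not part of the PR; the names `SUPERSCRIPT_DIGITS` and `in_latin1` are made up for illustration. It lists each superscript digit with its code point and block, showing that only 1, 2, and 3 fall inside Latin-1 while the rest sit in the Superscripts and Subscripts block starting at U+2070.

```python
import unicodedata

# Superscript forms of the digits 0-9, indexed by digit (illustrative name).
SUPERSCRIPT_DIGITS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"


def in_latin1(ch: str) -> bool:
    """Return True if the character's code point is within Latin-1 (<= U+00FF)."""
    return ord(ch) <= 0xFF


# Print one line per digit: code point, Unicode name, and containing block.
for digit, ch in enumerate(SUPERSCRIPT_DIGITS):
    block = "Latin-1 Supplement" if in_latin1(ch) else "Superscripts and Subscripts"
    print(f"{digit}: U+{ord(ch):04X} {unicodedata.name(ch)} ({block})")

# Digits whose superscript form already existed in Latin-1.
latin1_digits = [d for d, ch in enumerate(SUPERSCRIPT_DIGITS) if in_latin1(ch)]
```

This only shows how the characters are encoded; whether a given font actually ships glyphs for them still has to be checked per font, as the test page does.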

I agree that adding a command-line option for this subtle thing that only affects one output format is undesirable. One can imagine a filter-like callback in Lua like NoteMark(n) that you can define to return the desired mark for note number n; someone could use this to switch out numbers for *, **, †, ‡, §, ‖, ¶, or the like. But the hypothetical NoteMark(n) function doesn't really have a place where it belongs in Pandoc's scripting capabilities: it's not a filter, because the note mark is never in the AST, and custom writers have to cover every node of the AST, and I really don't want to do that. I can fake this with filters: I'd have to write a Note filter that does basically all the work that the Note handler in the HTML writer does, including getting the notes emitted at the end of the document. I didn't really feel like doing that. I don't think this idea of a writer callback exposed in Lua is actionable under this PR other than maybe making a fresh issue out of it if you like it enough.

One other non-actionable idea after thinking about all this is that a future breaking release could axe the dedicated command-line flags for a bunch of Pandoc's less common knobs in favor of something like ssh -o, which just lets you set any option you can put in ssh_config. This could reduce (or just disguise?) the pain of periodically adding more knobs, and might make the man page easier to maintain.

The least intrusive thing I can do for myself is write something tiny to pipe Pandoc's HTML output into and just replace the <sup>s with the Unicode version, which I don't really mind doing.
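
A minimal sketch of what such a post-processing step might look like, in Python. It assumes footnote references have the shape `<a ... class="footnote-ref" ...><sup>N</sup></a>`, which is consistent with the `a.footnote-ref sup` selector quoted in the commit message but is otherwise an assumption about the HTML; the function name `replace_note_marks` and the regular expression are illustrative and not taken from this PR.

```python
import re

# Translation table from ASCII digits to their Unicode superscript forms.
SUPERSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)

# Assumed markup: an <a> whose class list contains "footnote-ref" and whose
# only content is <sup>N</sup>. Other <sup> elements are left untouched.
NOTE_REF = re.compile(
    r'(<a\b[^>]*\bclass="[^"]*\bfootnote-ref\b[^"]*"[^>]*>)<sup>(\d+)</sup>(</a>)'
)


def replace_note_marks(html: str) -> str:
    """Replace <sup>N</sup> inside footnote-reference links with superscript digits."""

    def repl(match: re.Match) -> str:
        opening, number, closing = match.groups()
        return opening + number.translate(SUPERSCRIPT_DIGITS) + closing

    return NOTE_REF.sub(repl, html)
```

To use it as a pipe filter, one could wrap it as `sys.stdout.write(replace_note_marks(sys.stdin.read()))` and feed Pandoc's HTML output into the script. A regular expression is a blunt tool for HTML, so this would only be reasonable as long as the writer's footnote markup stays in the assumed shape.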

Hope this is all food for thought; I completely understand and can't really disagree if you just want to close this as too niche to support with an option and not immediately tractable in any other way.

@jgm (Owner) commented Feb 9, 2024

Ah, too bad.

One option could be a custom writer that just calls the normal writer and then does a pattern substitution on the formatted footnote references. This would avoid the need for piping into an external script, so it might be just slightly nicer.
