Kerning is ignored between tags #648

Open
ghost opened this issue Sep 19, 2022 · 8 comments

@ghost

ghost commented Sep 19, 2022

With Kerning: yes under [Script Info], kerning normally renders as expected, but in karaoke mode it seems to be ignored between syllables, or at least extra space is added (tested with VLC and ffmpeg). I think that's a bug. Or is there some extra setting, like KerningKaraoke: yes, that can activate this?

The same applies to ligatures, but there it's probably expected, because the color boundary inside the ligature would not be clear. With kerning it is clear, I think.

Edit: it seems to be ignored between tags in general. The parser should probably understand that a word is still a whole word when there is a tag inside it.
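A rough illustration of the effect, assuming the renderer shapes each chunk of text between override blocks on its own. The helper names `split_at_tags` and `kerning_pairs` are made up for this sketch and are not part of libass:

```python
import re

# Matches one ASS override block such as {\k100} or {\i1}.
OVERRIDE_BLOCK = re.compile(r"\{[^}]*\}")


def split_at_tags(line: str) -> list[str]:
    """Return the non-empty text chunks of a line, split at override blocks."""
    return [chunk for chunk in OVERRIDE_BLOCK.split(line) if chunk]


def kerning_pairs(chunks: list[str]) -> list[str]:
    """List adjacent letter pairs that fall inside a single chunk.

    If every chunk is shaped separately, only these pairs can be kerned.
    """
    return [chunk[i:i + 2] for chunk in chunks for i in range(len(chunk) - 1)]


tagged = split_at_tags(r"{\k100}{\k100}ein{\k100}jährig")
plain = split_at_tags("einjährig")

print(tagged)  # ['ein', 'jährig']
print("nj" in kerning_pairs(tagged), "nj" in kerning_pairs(plain))  # False True
```

The pair "nj" straddles the tag in the first word, so it never reaches the shaper as a pair, while the untagged word keeps it.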

Missing kerning between syllables

kerning_ignored_with_karaoke_2

Only the second word has the expected kerning, because it has no tags in it.

Source code:

{\k100}{\k100}ein{\k100}jährig\Neinjährig

Missing kerning between tags (here italics)

kerning2

Here, both words have ignored the kerning.

Source code:

{\k1}{\k100}ein{\k100}jährig\Nein{\i1}jährig{\i0}

I also tried the underscore tag, but VLC crashed immediately there.

ghost changed the title from "Kerning is ignored in karaoke mode" to "Kerning is ignored between tags" on Sep 19, 2022
@astiob
Member

astiob commented Sep 19, 2022

I also tried the underscore tag, but VLC crashed immediately there.

Use VLC 3.0.18-rc or 3.0.16 (or a non-Windows VLC 3.0.17, or another player such as mpv).


Generally, everything in ASS is terminated at override tags. What is your use case: are you making a subtitle file for public distribution, or is this an internal on-the-fly representation in a karaoke program of some sort?

If it’s the former (and the fact you’re testing in VLC suggests it is), it’s impossible to achieve portable kerning across karaoke (or other) tags anyway, so your only option is to explicitly position the word parts, as horrible as that sounds. And you also need to add an invisible “complex-layout” character such as U+200B ZERO WIDTH SPACE to each (!) chunk to trigger kerning in non-libass renderers within those chunks.
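The zero-width-space part of that workaround could be automated along these lines. This is only a sketch: the function name is invented, appending the character at the end of each chunk is an assumption (the position inside the chunk may matter for some renderers), and `\N` line breaks inside a chunk are not treated specially. It does not attempt the explicit positioning, which needs font metrics.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Splits a line into override blocks ({...}) and the text between them.
# The capture group keeps the override blocks in the result.
TOKENS = re.compile(r"(\{[^}]*\})")


def add_zwsp_to_chunks(line: str) -> str:
    """Append U+200B to every text chunk that does not already contain one."""
    out = []
    for token in TOKENS.split(line):
        is_text = token and not token.startswith("{")
        if is_text and ZWSP not in token:
            token += ZWSP
        out.append(token)
    return "".join(out)


print(repr(add_zwsp_to_chunks(r"{\k100}ein{\k100}jährig")))
```

Running the function on its own output leaves the line unchanged, so it can be applied repeatedly to the same script.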

But if it’s the latter, we could in theory kern across tags when ASS_FEATURE_WHOLE_TEXT_LAYOUT is enabled via the libass API or Encoding is set to -1. I believe this would require us to shape text across tags, because HarfBuzz intentionally avoids providing a dedicated kerning API as kerning is a small part of shaping. I think this entails (and may be as simple as) making all three of these lines conditional on whole_text_layout:

libass/libass/ass_shaper.c

Lines 901 to 903 in 5b0dba4

info->starts_new_run ||
(!shaper->whole_text_layout && info->hspacing) ||
last->flags != info->flags))

except DECO_ROTATE in flags (until we implement proper vertical layout via HarfBuzz).
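To make the proposal concrete, here is a small Python model of how the three quoted sub-conditions might behave after such a change. It is not libass code: the `Glyph` class is a stand-in for the real glyph info struct, the numeric value of `DECO_ROTATE` is a placeholder, and the rest of the real condition in ass_shaper.c (not quoted above) is left out.

```python
from dataclasses import dataclass

DECO_ROTATE = 0x4  # placeholder bit; the real value is defined in libass


@dataclass
class Glyph:
    """Stand-in for the fields of the glyph info used in the quoted lines."""
    starts_new_run: bool = False
    hspacing: float = 0.0
    flags: int = 0


def breaks_shape_run(last: Glyph, info: Glyph, whole_text_layout: bool) -> bool:
    """Model of the three sub-conditions once they depend on whole_text_layout."""
    # A change in the rotation flag always starts a new shaping run.
    if (last.flags ^ info.flags) & DECO_ROTATE:
        return True
    # With whole-text layout, none of the three quoted conditions applies.
    if whole_text_layout:
        return False
    # Otherwise the conditions apply as they do today.
    return info.starts_new_run or info.hspacing != 0 or last.flags != info.flags


print(breaks_shape_run(Glyph(), Glyph(starts_new_run=True), whole_text_layout=True))
```

With whole-text layout on, a karaoke tag (modelled here as `starts_new_run`) no longer splits the shaping run, which is what would let kerning apply across it.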

However, this would also break ligatures: consider

{\k100}af{\k100}fine

In a font with an “ffi” ligature, which are common, shaping across the karaoke tag would cause the whole “ffi” sequence to be a single glyph. We couldn’t paint the first “f” in one colour and the remaining “fi” in a different colour. So we’d have to display this as if it were this:

{\k100}affi{\k100}ne

which is considerably different. This is, in fact, what happens in Web browsers: see for yourself. But I’m not sure this is a very good example to follow.
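The boundary shift described above can be sketched as follows. This works on plain strings with a given list of ligature sequences, whereas a real renderer would learn about ligatures from the shaper's cluster output; the function name is invented for the example.

```python
def snap_to_ligatures(syllables: list[str], ligatures: list[str]) -> list[str]:
    """Move syllable boundaries that fall inside a ligature to its end."""
    text = "".join(syllables)

    # Character offsets where a new syllable starts (the offset 0 is implied).
    bounds = []
    pos = 0
    for syllable in syllables[:-1]:
        pos += len(syllable)
        bounds.append(pos)

    for lig in ligatures:
        start = text.find(lig)
        while start != -1:
            end = start + len(lig)
            # A boundary strictly inside the ligature is pushed to its end.
            bounds = [end if start < b < end else b for b in bounds]
            start = text.find(lig, end)

    cuts = [0] + sorted(set(bounds)) + [len(text)]
    return [text[a:b] for a, b in zip(cuts, cuts[1:]) if a < b]


print(snap_to_ligatures(["af", "fine"], ["ffi"]))  # ['affi', 'ne']
```

Two boundaries that land on the same offset merge into one, so a syllable lying entirely inside a ligature disappears; that loss of timing is part of why the result differs so much from the input.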

@TheOneric
Member

This [, i.e. colouring the whole ligature like the first letter,] is, in fact, what happens in Web browsers: see for yourself. But I’m not sure this is a very good example to follow.

Minor correction: that only appears to apply to Blink-based browsers like Chromium (and for full transparency, I tested this with Carlito as an alias for Calibri, but it too has an ffi ligature).
Gecko (Firefox) uses the ffi ligature and splits the colour where it thinks the first letter should end. If you look closely you'll notice some parts of the first f already use the second colour.
WebKit doesn't do ligatures across colouring bounds, so the first f is not part of a ligature and uses the default colour, while the second f and i get combined into a fi ligature coloured silver.
webbrowsers

@astiob
Member

astiob commented Sep 20, 2022

Thanks!

Worth noting that WebKit (based on a Safari 13 test on Catalina… the newest I could easily get my hands on via an online testing service) also doesn’t apply kerning in that case, just like libass.

@ghost
Author

ghost commented Sep 20, 2022

Thank you for the detailed answer!

My use case is similar to the latter: I render the subtitles using ffmpeg, but I saw that VLC often renders similarly, so I test with VLC first to get a quick impression.

I think the best solution would be to support kerning between tags when the user enables it through a property, because kerning between tags may not be wanted for all languages or all fonts. Ideally, it would be possible to specify a list of tags across which kerning should be applied, or conversely a list across which it should not. For example, a word containing an italic tag would not always work well with the kerning defined for a font that does not have an italic variant. But I see that this becomes complex, so it would be nice to have a property like KerningBetweenTagsWithinWords: yes to support this in general first; I assume this would cover 99% of all cases. Later, another property could be introduced to kern only across tags in a given list, if someone wants to implement this.

Regarding ligatures: for some languages or fonts they are very important, and ligatures between tags are wanted. For now, I have worked around this in karaoke mode by splitting syllables at incorrect positions to keep the ligatures. For example, in German many words have a syllable break between s and t, and some fonts have an st ligature. To keep it, I split before the st.

st_lig

I understand that supporting ligatures between tags is even more difficult, but again a property like LigaturesBetweenTagsWithinWords: yes could make this user-dependent.

An easy algorithm that should be accurate to within about ±10% most of the time would be to look at the number of glyphs that were mapped to the ligature (f f i -> ffi, so 3 input glyphs) and then split the ligature into 3 parts of equal size. We approximate that the first part is the first glyph, and so on.

Another option would be to take the widths of the individual glyphs from the input sequence, normalize them so that they sum to 1, and use those relative widths for the ligature instead of fixed ones. For example, if f is wider than i in a given font, we would expect a distribution like [0.4, 0.4, 0.2] for the ffi ligature to match the letters f, f and i, compared to the fixed one, which would be [0.33, 0.33, 0.33].

It could also be possible to supply user-defined lists, where each list sums to 1.0, like:
LigatureSplittings: [["ffi", [0.35, 0.35, 0.3]], ["st", [0.55, 0.45]]]
The best would probably be to support all three and number them, so that the user could select the most appropriate method with something like LigatureSplittingMethod: 1/2/3
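The three methods could look roughly like this. It is only a sketch of the idea: the function names are invented, the glyph widths in the usage line are made-up numbers rather than values from a real font, and the user table mirrors the proposed (not existing) LigatureSplittings property.

```python
import math


def equal_split(n_components: int) -> list[float]:
    """Method 1: every component gets the same share of the ligature."""
    return [1.0 / n_components] * n_components


def proportional_split(glyph_widths: list[float]) -> list[float]:
    """Method 2: shares follow the widths of the individual input glyphs."""
    total = sum(glyph_widths)
    return [width / total for width in glyph_widths]


def user_split(ligature: str, table: dict[str, list[float]]) -> list[float]:
    """Method 3: look up user-supplied shares, which must sum to 1.0."""
    shares = table[ligature]
    if len(shares) != len(ligature) or not math.isclose(sum(shares), 1.0):
        raise ValueError(f"bad split list for {ligature!r}")
    return shares


def colour_boundaries(advance: float, shares: list[float]) -> list[float]:
    """Turn shares into x offsets at which the fill colour may change."""
    offsets, x = [], 0.0
    for share in shares[:-1]:
        x += share * advance
        offsets.append(x)
    return offsets


# Made-up widths for f, f and i, in arbitrary font units.
print(proportional_split([4.0, 4.0, 2.0]))
print(user_split("st", {"st": [0.55, 0.45]}))
```

`colour_boundaries` is the step a renderer would need in the end: given the advance width of the ligature glyph, it returns the positions where the karaoke colour could switch.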

But this again looks like a complex task, and the easiest method would probably be enough. My guess is that the first, most trivial method gives the best result, because none of them can match with 100% accuracy. But in my opinion that's still better than replacing the ligatures. The method shown here also looks good enough, and is still better than ignoring the ligatures, for my fonts.

I think supporting ligatures between tags and then coloring the whole ligature at once in karaoke mode is not ideal. So such an approximation algorithm would be the best automatic way, I think.

@astiob
Member

astiob commented Jan 21, 2023

By the way… You showed an example with italics, but how do you even expect it to work? The italic font and the upright font are two different font faces. There literally is no kerning defined between an upright n and an italic j.

@ghost
Author

ghost commented Jan 21, 2023

No, they are the same font. It's faked italics by angle (and thus there is kerning defined). But this was obviously just an example to show that tags break kerning in general. Maybe it does not make sense for some tags, but for the \k tags and so on it's crucial.

@astiob
Member

astiob commented Feb 9, 2023

that only appears to apply to Blink-based browsers like Chromium

Hmm, I’ve just noticed another curious thing. I have no idea what exactly is going on here, but Chromium (Edge) is letting me select “f” and “i” separately when selecting text that displays the “fi” ligature, even if it’s <span>f</span>i, until I apply a different style (I tried color) to the “f”. Then it displays the whole ligature with this new style, as we saw before, and stops letting me select the letters separately. Separate selection stays unavailable even after I remove the style and even after I remove the <span> in the inspector. At this point, the inspector shows the “f” and “i…” as distinct text nodes, but clearly they’re rendered rather as “fi” and “…”. When I remove the “f” node and re-add “f” to the “i…” text node, I can select the letters individually again, even though there are no visual changes in the text at any point.

@astiob
Member

astiob commented Mar 21, 2024

Still no plans to implement this soon, but for posterity: it turns out some fonts have a Ligature Caret List table that HarfBuzz exposes via hb_ot_layout_get_ligature_carets. This may be how Chromium let me select the individual letters, and this may be how Gecko decided where to change the colour.

When the table is absent (or presumably when we can’t make sense of it because the number of caret points differs from the number of clusters we got), back in 2012 Behdad recommended splitting the width into equal parts (and opposed the idea of trying to look up the individual glyphs and making the split proportional to their widths).
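A possible shape for that fallback logic, as a sketch only. It takes the caret positions as a plain list instead of calling HarfBuzz, assumes left-to-right text, and assumes that a ligature of n components needs n - 1 interior carets; the function name is hypothetical.

```python
def component_boundaries(
    advance: float, n_components: int, carets: list[float]
) -> list[float]:
    """Return the x offsets that separate the components of one ligature."""
    usable = len(carets) == n_components - 1 and all(
        0 < caret < advance for caret in carets
    )
    if usable:
        # The font supplied a caret for every interior boundary.
        return sorted(carets)
    # Table absent or inconsistent: split the advance into equal parts.
    return [advance * i / n_components for i in range(1, n_components)]


print(component_boundaries(900, 3, [280, 560]))  # [280, 560]
print(component_boundaries(900, 3, []))          # [300.0, 600.0]
```

The equal-parts branch is the behaviour recommended for the case without a usable table; the caret branch simply trusts the font.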

(I’m confused by the comment about kerning in HarfBuzz’s doc, though. It was first mentioned here but was never explained in detail. I don’t understand how kerning is relevant.)
