Diacritics with Cyrillic letters #72

Closed
codepointless opened this issue Apr 2, 2020 · 10 comments
Labels
addressed in source files completed in dev versions, but not yet released as fonts

Comments

@codepointless

In Russian and several other languages that use the Cyrillic script, accented syllables can be shown with a stress mark above the vowel. In Source Serif Pro, some vowels with diacritics are rendered incorrectly:
Буквы
(I have no idea what’s wrong with the capital letters, I tried both TTF and OTF with the same result.)

For all Cyrillic letters, diacritics should be positioned just above the letter. I suppose ѣ is the least obvious, so here are some examples of ѣ with acute and diaeresis from an old dictionary (when this letter was in use):
Медвѣдъ

Вѣсить

I also have a question: is it possible to make all Cyrillic letters look decent with combining marks above and below? I understand that you don’t want to do it manually, but maybe there’s a way to make something passable automatically. It’s true that we don’t use diacritics in usual texts, but sometimes it may come in handy.

@frankrolf
Member

Thank you.
To make sure we are on the same page:

  • Which diacritics may be added on top of Cyrillic base glyphs? I know of the acute, but I also see a diaeresis in your example. Grave?
  • Is it also possible to place diacritics below Cyrillic letters?
  • It would be possible to make all Cyrillic letters receive combining accents; this is just a matter of preparing the data. However, do you have a list of the letters which may receive accents at all?

Notwithstanding all of the above, in the screenshot you provide it looks like the combining acute accent is not used. Note that the combining acute (U+0301) is a non-spacing glyph. Here is an example using the combining acute with the currently shipping fonts, as seen on https://adobe-fonts.github.io/source-serif-pro/ .
image
I am not saying this is perfect, but slightly better than the example you’re showing.
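
Since the question here is whether the text really contains the non-spacing combining acute (U+0301) rather than a spacing accent, a quick check of the code points can settle it. Below is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for this illustration.

```python
import unicodedata

def describe(text):
    """Return one (code point, name, combining class) tuple per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch))
        for ch in text
    ]

# Cyrillic "а" followed by the combining acute: two code points.
# A non-zero combining class marks the second one as a combining mark.
for row in describe("\u0430\u0301"):
    print(row)
```

A spacing accent such as U+00B4 would show a combining class of 0 here, so the two cases are easy to tell apart.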

@codepointless
Author

codepointless commented Apr 3, 2020

Wow, that’s interesting. All small letters in your picture look fine, although I would use the ‘slanted’ acute for ‘high’ letters like ѣ and ё. Of course I used the combining acute (U+0301), and the spaces you see are real spaces (U+0020). And this is what I see in four different browsers:
Small letters
Can it be some Windows-related issue? Can you explain what internal mechanism is used to raise the acute and place above the letter б in your picture? Maybe it could help me understand what exactly goes wrong on my computer. The capital letters are the most mysterious cos I get this:
Capital italics

Now, about combining marks in Cyrillic script.

The letter ѣ with two dots above is more of a typographical experiment than a conventional letter, and it’s not the strangest thing I’ve seen in Russian books from the beginning of the last century. These experiments were soon stopped by the communists, who removed Ѣ from the Russian alphabet.

As far as I know, diacritics are not in active everyday use in modern languages with Cyrillic alphabets (except where they are part of letters like й and ќ, of course). Some Slavic languages use combining marks to show word accent. In Russian, we use the acute for primary stress and the grave for secondary stress; they can be put above vowels only. I’m quite sure the same is true for Ukrainian, Belarusian and Rusyn, and here’s the list of our vowels:
Аа Ее Ёё Єє Ии Іі Її Оо Уу Ыы (Ѣѣ) Ээ Юю Яя (Ѵѵ)
(Ѣ and Ѵ are not currently in use).
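
To check a font against this list, it can help to generate every vowel with each stress mark and paste the result into a test page. The following is a rough sketch, assuming the acute (U+0301) and grave (U+0300) described above; the names `VOWELS` and `stressed_samples` are invented for the example.

```python
import unicodedata

ACUTE = "\u0301"  # primary stress
GRAVE = "\u0300"  # secondary stress

# Vowel list from the comment above, including the historical Ѣ and Ѵ.
# NFC keeps Ё, ё, Ї, ї as single precomposed code points.
VOWELS = unicodedata.normalize("NFC", "АаЕеЁёЄєИиІіЇїОоУуЫыѢѣЭэЮюЯяѴѵ")

def stressed_samples(vowels=VOWELS, marks=(ACUTE, GRAVE)):
    """Return each vowel followed by each combining stress mark."""
    return [vowel + mark for vowel in vowels for mark in marks]

# One line of test text: every vowel with acute, then with grave.
print(" ".join(stressed_samples()))
```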

In Bulgarian, accented vowels are denoted by grave (or sometimes by acute), and the letter
Ъъ
is also a vowel, so it also can be stressed:
жъртва дъно
Perhaps Macedonian uses grave in the same way, but I’m not quite sure.

Serbs have much more fun with their pitch accent. Acute, grave, double grave, inverted breve and macron above vowels and syllabic consonants are used to show tones in Serbian (I’ve seen accented р, but probably there are other consonants):
братан брз
(Well, probably these marks can only be found in dictionaries.) There’s also Montenegrin language... Let’s hope they didn’t invent something else.

Besides Slavic languages, there are many non-Slavic languages that use Cyrillic script. They have plenty of interesting letters like Є̈, И̃ and Ю̆ that aren’t present in Unicode as precomposed characters and have to be composed from letters with diacritic marks. A list of those letters can be found on Wikipedia; it also includes some letters with diacritics below (such as Г with cedilla), but I can’t guarantee this information is correct.

That was about ordinary texts. But when it comes to linguistics, I would expect anything. Russian letters are used in Russian phonetic notation. This is what can happen to vowels:
Vowels
And this is what can happen to consonants:
Consonants
This is a transcription of Czech words (in italics):
чешский
And this is a transcribed passage in a Rusyn dialect I’ve just found on Wikipedia:

Мốї ма́мi. Нно, та в’ĭн мно́го чếляде так шчо.. обы̊́ хво́рi бы̊́ли.

By the way, this ы̊́ also looks wrong in my browser (in Arial), so maybe something is wrong with my computer...

@frankrolf
Member

Thank you so much for all this! To answer your questions:

> Can you explain what internal mechanism is used to raise the acute and place above the letter б in your picture?

  • The internal mechanism is the OpenType mark feature, which defines a group of base glyphs and the combining accents they may be joined with. Inside the font source, base and accent are matched up through the assignment of anchors.
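
The anchor idea can be sketched in a few lines: the base glyph and the mark each carry an anchor point, and the mark is shifted so that the two points coincide. The example below is only an illustration; the glyph names and coordinates are hypothetical and are not taken from the Source Serif UFO files.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Anchor:
    x: int  # font units
    y: int  # font units

def mark_offset(base_anchor, mark_anchor):
    """Shift to apply to the mark so its anchor lands on the base's anchor."""
    return (base_anchor.x - mark_anchor.x, base_anchor.y - mark_anchor.y)

# Hypothetical "top" anchors: a tall letter (б) and a short one (а).
base_top = {"be-cy": Anchor(270, 720), "a-cy": Anchor(250, 480)}
# Hypothetical attachment anchor on the combining acute.
mark_top = {"acutecomb": Anchor(0, 480)}

for name, anchor in base_top.items():
    print(name, mark_offset(anchor, mark_top["acutecomb"]))
```

With these made-up numbers the acute is raised by 240 units over the tall letter and not at all over the short one, which is the kind of per-glyph adjustment a base glyph without anchors cannot get.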

> Can it be some Windows-related issue?

  • I don’t know. Possibly? But AFAIK Windows should support the mark feature – depending on the application, of course.

> The capital letters are the most mysterious cos I get this:

All that said, I am not sure why all lowercase letters show semi-accurate positioning of the combining mark. It may well be there’s some macOS magic going on (I am using Safari 13.1). When I make a similar test in InDesign, the outcome is more in line with what I expect, based on anchors in the UFO files:
image

One more remark:
I don’t expect any use of combining accents above already-accented glyphs such as ёїйќ, but will prepare anchors for non-accented base glyphs.

@codepointless
Author

Yes, I’ve installed the latest version of the font. Moreover, the test page on GitHub uses its own web font and doesn’t give me a choice, right? Did you try accented italic capital letters in InDesign? It looks like a bug in the font, because accented Latin letters (even ones that are not precomposed) look fine.

> I don’t expect any use of combining accents above already-accented glyphs such as ёїйќ

These are separate letters, so they can also be used with diacritics above (especially ё and ї, since they are vowels):
четырёхвёсельный
In the last picture in my previous comment, you can see й́ in phonetic notation.

@throwaway571

Related to this, the Montenegrin spelling standard, sometimes considered a separate language for political reasons, may use the letters С́ с́ and З́ з́ for palatalized sibilants spelled сј and зј in other varieties of Serbo-Croatian.

frankrolf added a commit that referenced this issue Aug 21, 2020
- add anchors to З Р Ъ Ѣ Ѵ з р ъ ѣ ѵ and related glyphs
- add Cyrillic pre-composed glyphs to ccmp decomposition lookup

This completes anchor support for Cyrillic vowels as discussed in #72 (Аа Ее Ёё Єє Ии І Оо Уу Ыы Ээ Юю Яя already had anchors), and adds support for using combining accents in Bulgarian and Montenegrin.
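
The decomposition step mentioned in this commit has a rough analogue at the Unicode level: a precomposed letter such as ё or й can be split into a base letter plus a combining mark, after which anchors on the base can do the positioning. The sketch below lists those canonical decompositions for the basic Cyrillic block with Python's `unicodedata`. It is not the font's ccmp lookup, which works on glyph names in the source files, and the function name is invented for the example.

```python
import unicodedata

def cyrillic_decompositions(start=0x0400, end=0x04FF):
    """Map each precomposed Cyrillic letter to its base + combining mark(s)."""
    table = {}
    for cp in range(start, end + 1):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        # Skip characters without a decomposition, and compatibility
        # decompositions (those are tagged like "<compat> ...").
        if not decomp or decomp.startswith("<"):
            continue
        table[ch] = "".join(chr(int(part, 16)) for part in decomp.split())
    return table

for letter, parts in cyrillic_decompositions().items():
    print(letter, " ".join(f"U+{ord(c):04X}" for c in parts))
```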
@frankrolf frankrolf added the addressed in source files completed in dev versions, but not yet released as fonts label Aug 21, 2020
@frankrolf
Member

The latest release of Source Serif has better support for adding combining marks to Cyrillic base characters:
https://github.com/adobe-fonts/source-serif/releases/tag/4.004R

@codepointless
Author

Now diacritics above work correctly with Cyrillic vowels, but there’s still at least one bug.

If ё is combined with diacritics, it looks like ӥ. For example, the sequence ё́ (U+0451 U+0301) gives this:
ё́
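
For reference, the two strings are quite different at the code point level, so the resemblance to ӥ has to come from the font or the shaping rather than from the text. Here is a small sketch that prints the fully decomposed (NFD) form of each; the helper name is made up, and it says nothing about where exactly the bug sits in the font.

```python
import unicodedata

def nfd_codepoints(text):
    """Return the code points of the canonically decomposed (NFD) text."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", text)]

# ё + combining acute: base е, then diaeresis, then acute.
print(nfd_codepoints("\u0451\u0301"))
# ӥ (U+04E5): base и plus diaeresis.
print(nfd_codepoints("\u04E5"))
```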

@frankrolf
Member

Wow, that’s something! Thanks for the report!
I am opening a new bug for this.

@codepointless
Author

Thanks. By the way, are there absolutely no plans to add anchors to consonants and anchors for combining marks below?

@frankrolf
Member

When I worked on this addition, I considered adding anchors to the consonants too – but I did not know where I’d stop with that, so I kept it at vowels.
Source Serif does not even support linguistics (IPA) for Latin yet, so a focus on linguistic use in Cyrillic could be part of a future update.
