Word level mapping of phonemized output #96

mmmaat · 2021-12-03T14:54:29Z

Suggested by @CorentinJ; see his implementation here.

At some point we got interested in being able to map from characters in the input text of our TTS system to its audio output. That required being able to map from an orthographic input to its phonemized output. Since your library does not provide such mappings, and since espeak doesn't seem to either, I wrote an algorithm to figure them out. It operates at the word level, and we verified that it is correct even for complex edge cases.

It remains to decide how to implement this:

1. As a custom word separator: juːɾuːbɚz [Youtubers] noʊ [no] lɑːŋɡɚ [longer] bᵻlɔŋ [belong] ɔnðɪ [on the] ɪntɚnɛt [internet].

2. As an extension of the prepend-text option, with a tree-like structure as output:

    [('Youtubers no longer belong on the internet', [
        ('Youtubers', ['juː', 'ɾuːbɚz']),
        ('no', ['noʊ']),
        ...
        ('internet', ['ɪntɚnɛt'])
    ])]

3. As a completely new option.
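If option 1 were chosen, its annotated output could still be turned into the nested shape of option 2 on the user side. Below is a minimal sketch, assuming the "phonemes [word(s)]" format proposed in option 1 (a proposal here, not an existing phonemizer option). parse_annotated and to_tree are hypothetical names, and each group's phonemes are kept as a single string rather than split into syllables:

```python
import re

# Hypothetical option-1 format: a phonemized chunk followed by the
# orthographic word(s) it covers in square brackets, e.g. "ɔnðɪ [on the]".
_CHUNK = re.compile(r"(\S+)\s*\[([^\]]+)\]")


def parse_annotated(output: str) -> list[tuple[str, str]]:
    """Return (orthographic words, phonemes) pairs from an annotated string."""
    return [(words, phones) for phones, words in _CHUNK.findall(output)]


def to_tree(text: str, output: str) -> list[tuple[str, list[tuple[str, str]]]]:
    """Wrap the pairs in the option-2 shape: one (input text, pairs) entry."""
    return [(text, parse_annotated(output))]
```

On the option-1 example, the merged group comes back as a single pair, ('on the', 'ɔnðɪ'), so a word-to-audio mapping can treat it as one unit.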

This feature seems to be incompatible with phone/syllable separators. What about punctuation preservation?


trenslow · 2023-05-09T08:20:31Z

First off, thanks for the awesome tool. It has made my life so much easier in a lot of respects.

Has there been any progress made on this topic? It's something that could be really useful.

One simple solution I tried was to split the original text with the same word separator I passed to phonemize(). However, eSpeak NG sometimes merges words, so the number of 'words' returned by phonemize() doesn't match the number of words in the split text (see e.g. the "That's it, words are merged." example in the documentation).
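The naive approach can be sketched as follows. naive_word_pairs is a hypothetical helper, and phonemized is assumed to be the output of phonemize() on text with word_sep as the word separator. With the example from the first post, "Youtubers no longer belong on the internet" has 7 words while "juːɾuːbɚz noʊ lɑːŋɡɚ bᵻlɔŋ ɔnðɪ ɪntɚnɛt" has only 6, so positional pairing is impossible and the check fires:

```python
def naive_word_pairs(text: str, phonemized: str, word_sep: str = " ") -> list[tuple[str, str]]:
    """Pair input words with phonemized words by position.

    Raises ValueError when the counts differ, which is what happens when
    eSpeak merges words (positional pairing would silently go wrong).
    """
    words = text.split()  # split the input on whitespace
    # Drop empty strings left by a trailing or doubled separator.
    phon_words = [w for w in phonemized.split(word_sep) if w]
    if len(words) != len(phon_words):
        raise ValueError(
            f"{len(words)} input words but {len(phon_words)} phonemized words; "
            "some words were probably merged"
        )
    return list(zip(words, phon_words))
```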

I don't think the merging is configurable on the eSpeak side, as the merging comes from the underlying pronunciation dictionaries. So what remains is sending the split words one-by-one. This also isn't perfect, as you lose information about e.g. sandhi effects.
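One way to keep the sentence-level output (and its sandhi effects) while still recovering word groups is to use the one-by-one phonemizations only as a guide: group consecutive input words so that each group's concatenated per-word phonemes are as close as possible (by edit distance) to one word of the sentence-level output. This is a swapped-in dynamic-programming alignment, not the algorithm @CorentinJ shared. It assumes eSpeak only merges adjacent words and never splits one, and that a merge spans at most max_group words; align_words and levenshtein are hypothetical names, and producing per_word and merged (e.g. by calling phonemize() per word and on the full sentence) is left out:

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance between two phoneme strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def align_words(words: list[str], per_word: list[str], merged: list[str],
                max_group: int = 3) -> list[tuple[str, str]]:
    """Group consecutive input words so each group matches one merged word.

    words:    orthographic words, e.g. text.split()
    per_word: phonemization of each word on its own (same length as words)
    merged:   phonemized words of the whole sentence (eSpeak may merge words)
    Returns (words in group, merged phonemes) pairs in order.
    """
    n, m = len(words), len(merged)
    if len(per_word) != n or m > n:
        raise ValueError("need one phonemization per word and at most n output words")
    inf = float("inf")
    # cost[i][j]: best total distance aligning words[:i] with merged[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, min(i, m) + 1):
            # Last group is words[k:i]: non-empty, at most max_group words.
            for k in range(max(j - 1, i - max_group), i):
                if cost[k][j - 1] == inf:
                    continue
                c = cost[k][j - 1] + levenshtein("".join(per_word[k:i]), merged[j - 1])
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, k
    if cost[n][m] == inf:
        raise ValueError("no alignment with at most max_group words per output word")
    # Follow the back-pointers from the end to recover the groups.
    groups, i = [], n
    for j in range(m, 0, -1):
        k = back[i][j]
        groups.append((" ".join(words[k:i]), merged[j - 1]))
        i = k
    return groups[::-1]
```

Because the score is a distance rather than an exact match, a merged word whose pronunciation changed (such as ɔnðɪ for "on the") can still be attributed to the right group; the cubic-ish cost is bounded by max_group, which keeps it cheap for sentence-length inputs.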

Maybe eSpeak exposes some information about which words it merges, and that could be passed on to phonemizer users? That would at least let people choose what to do with that information.

mmmaat · 2023-05-09T10:18:31Z

No progress so far... Did you try the code by @CorentinJ here?

trenslow · 2023-05-24T13:20:07Z

I didn't, as my use case is for languages other than English.

CorentinJ · 2023-05-24T14:32:59Z

Word-level mappings should work for all languages with the algo I provided.

trenslow · 2023-05-25T12:26:04Z

Ah ok! I will explore it ASAP.

mmmaat · 2023-05-25T12:42:58Z

If someone wants to open a PR for this, that would be great; I have no time for this project in the next few months...

mmmaat added the feature request label Dec 3, 2021

mmmaat mentioned this issue Oct 11, 2023 in #156, "Do this phonemizer support mixed language?" (closed)
