[Bug] Letter-by-letter pronunciation not possible #2619

eginhard · 2023-05-16T15:27:10Z

Describe the bug

I'd like to force the TTS model to pronounce a word letter by letter, e.g. "ARD" should be pronounced "A R D" (/ˌeɪˌɑːɹdˈiː/). In systems with SSML support (#752) you could use <speak><say-as interpret-as="verbatim">ard</say-as></speak>, but another way would be fine as well.

Espeak supports this even for words not in its dictionary by adding periods between the characters: espeak-ng --ipa -v en-us "A.R.D." is read /ˌeɪˌɑːɹdˈiː/.

This doesn't work in Coqui because the input for Espeak is split at punctuation characters and each chunk ["A", "R", "D"] is phonemized separately:

TTS/TTS/tts/utils/text/phonemizers/base.py

Line 129 in bc0a532

text, punctuations = self._phonemize_preprocess(text)

As a result, the word pronunciation of "a" is chosen rather than the letter pronunciation (ɐ instead of eɪ). I could change _phonemize_preprocess() to pass the input to Espeak with punctuation included, but I'm not sure about the side effects. Is there a specific reason to do it this way?
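To make the splitting step concrete, here is a minimal sketch of what happens to "A.R.D." before it reaches Espeak. It is not Coqui's actual code: `split_at_punctuation` and the `PUNCTUATION` set are hypothetical stand-ins for the punctuation-stripping step, and the real default punctuation set may differ.

```python
import re

# Assumed punctuation set for illustration only; Coqui's default may differ.
PUNCTUATION = ".,;:!?"


def split_at_punctuation(text):
    """Hypothetical stand-in for the punctuation-stripping step.

    Returns the text chunks between punctuation marks and the marks
    themselves, so they could be restored after phonemization.
    """
    pattern = f"[{re.escape(PUNCTUATION)}]+"
    chunks = [c.strip() for c in re.split(pattern, text) if c.strip()]
    marks = re.findall(pattern, text)
    return chunks, marks


chunks, marks = split_at_punctuation("A.R.D.")
print(chunks)  # ['A', 'R', 'D']
print(marks)   # ['.', '.', '.']
```

Each chunk is then phonemized without the surrounding periods, so Espeak never sees the "A.R.D." form that triggers letter-by-letter reading.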

To Reproduce

from TTS.api import TTS
p = TTS(model_name="tts_models/en/ljspeech/vits", gpu=False).synthesizer.tts_model.tokenizer.phonemizer
p.phonemize("A.R.D.")

Output: 'ˈɐ.ˈɑːɹ.d|ˈiː.'

Expected behavior

Expected output: ˌeɪˌɑːɹdˈiː

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": "11.7"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.0.0+cu117",
        "TTS": "0.10.2",
        "numpy": "1.22.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.8",
        "version": "#42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2"
    }
}

Additional context

No response


erogol · 2023-05-16T16:22:39Z

It is possible, but it needs a model that is capable of doing that. I'm closing this since it is not a dev issue.

eginhard · 2023-05-16T20:37:17Z

The output is fine with the tts_models/en/ljspeech/vits model. E.g. "ABC CNN ESPN" is correctly read letter by letter because all words are marked as abbreviations in the Espeak dictionary.

Espeak doesn't know about other letter sequences, like "ARD" or "ABCDEFGHIJKLMNOPQRSTUVWXYZ", and tries to read them as a word. I can force it to phonemize them as letter sequences by adding periods between each letter, but Coqui strips all punctuation before calling Espeak. Changing

TTS/TTS/tts/utils/text/phonemizers/base.py

Line 104 in bc0a532

return self._punctuator.strip_to_restore(text)

to return [text], [] fixes this and results in correct letter-by-letter output for "A.R.D" and "A.B.C.D.E.F.G.H.I.J.K.L.M.N.O.P.Q.R.S.T.U.V.W.X.Y.Z". However, I'm not sure whether there is a specific reason Coqui strips the punctuation there, or whether changing it would cause other issues.
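The difference between the current behaviour and the proposed change can be sketched without the library. The two functions below are hypothetical stand-ins, not Coqui code: `split_preprocess` only handles periods and mimics the current chunking, while `passthrough_preprocess` mirrors the proposed `return [text], []`. `backend_inputs` lists the strings that would each be sent to Espeak in a separate call.

```python
def split_preprocess(text):
    """Stand-in for the current behaviour: drop periods, keep the chunks."""
    chunks = [c for c in text.split(".") if c]
    marks = ["."] * text.count(".")
    return chunks, marks


def passthrough_preprocess(text):
    """Proposed change: hand the text to the backend untouched."""
    return [text], []


def backend_inputs(text, preprocess):
    """Return the strings that would each be phonemized in a separate call."""
    chunks, _marks = preprocess(text)
    return chunks


print(backend_inputs("A.R.D.", split_preprocess))        # ['A', 'R', 'D']
print(backend_inputs("A.R.D.", passthrough_preprocess))  # ['A.R.D.']
```

With the passthrough variant the backend receives the periods, which is what lets Espeak choose the letter names. The open question about side effects remains, since punctuation would no longer be restored by the caller.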

Previously, the text was wrapped in an additional set of quotes that was passed to Espeak. This could result in different phonemization in certain edge cases and caused the insertion of an initial separator "_" that had to be removed. Compare:

$ espeak-ng -q -b 1 -v en-us --ipa=1 '"A"'
_ˈɐ
$ espeak-ng -q -b 1 -v en-us --ipa=1 'A'
ˈeɪ

Fixes coqui-ai#2619
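The comparison above can also be reproduced from Python. This is a rough sketch assuming espeak-ng is on the PATH; the flags are copied from the commands quoted in the commit message, and `espeak_args` and `phonemize` are hypothetical helper names, not part of Coqui. Passing the arguments as a list avoids shell quoting, so '"A"' reaches espeak-ng with its literal double quotes.

```python
import shutil
import subprocess


def espeak_args(text, voice="en-us"):
    """Build the espeak-ng command line used in the comparison."""
    return ["espeak-ng", "-q", "-b", "1", "-v", voice, "--ipa=1", text]


def phonemize(text, voice="en-us"):
    """Return espeak-ng's IPA output, or None if espeak-ng is not installed."""
    if shutil.which("espeak-ng") is None:
        return None
    result = subprocess.run(
        espeak_args(text, voice), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Compare the quoted and unquoted input side by side.
for candidate in ('"A"', "A"):
    print(repr(candidate), "->", phonemize(candidate))
```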


eginhard added the bug label May 16, 2023

erogol closed this as completed May 16, 2023

eginhard mentioned this issue Nov 22, 2023

Don't pass quotes to espeak #3286

Merged
