How to fix pronunciation issues #5

fquirin · 2021-12-19T23:27:38Z

I've been experimenting with txt2pho and MBROLA and noticed something odd:

The sentence "Heute ist der 19.12.2021" stops after "19." (pronounced "neunzehnte") :-/.

I'm using this command:

echo "Heute ist der 19.12.2021" | iconv -cs -f UTF-8 -t ISO-8859-1 | ./txt2pho -m | mbrola /usr/share/mbrola/de2/de2 - test.wav

I was wondering where one can edit the rule that is responsible for this behavior.

Besides that, there is a slight problem with numbers at the end of a sentence, because they are always spoken as ordinals:
"Er wurde heute 40." Without context it is indeed unclear whether "Er wurde heute vierzig" or "Er wurde heute Vierzigster" is the right version ^^.

Btw I'm avoiding 'preproc' because it has its own set of issues 😅 🙈


GHPS · 2021-12-20T22:09:20Z

> I was wondering where one can edit the rule that is responsible for this behavior.

Well - this is not a bug in txt2pho...

The programs in this repo serve very different purposes. txt2pho is responsible
for converting text to phonemes: just plain text, no complex numbers, fractions, times, dates
or currencies, since that is a completely different kind of problem. The latter problem
is tackled by the preprocessor preproc.

At first glance both problems seem to be quite easy to solve. As always with natural
languages both become much harder the closer one looks at them.

Take your example

echo "Heute ist der 19.12.2021" | ./preproc data/PPRules/rules.lst data/hadifix.abk

gets translated to

Heute ist der 19 12 zwei tausend einundzwanzig

which means that the ordinals are not spoken correctly. OK, let's fix rules.lst so we get

Heute ist der 19n 12n zwei tausend einundzwanzig

which is better but still incorrect, since in German ordinals are inflected according to their
role in the sentence. So we need an algorithm that understands the parts of speech in a sentence
and translates "19." into "19n", "19te" or "19ter" accordingly.
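A minimal sketch of the simplest possible version of that declension step, as a POSIX shell function. It only looks at the single word directly before the number; the word list and the name `ordinal_ending` are hypothetical illustrations, not part of preproc or txt2pho. It also assumes txt2pho reads forms like "19te"/"19ten" the way the "19te" example in this thread suggests.

```sh
# Hypothetical helper: choose an ordinal ending from the word before the number.
# Only covers a few common date phrases. "der" is ambiguous in real German
# (dative feminine: "in der 19ten Minute"), so this table is deliberately naive.
ordinal_ending() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    der|die|das) printf 'te\n' ;;                  # "der 19te" -> neunzehnte
    am|im|vom|zum|beim|dem|den) printf 'ten\n' ;;  # "am 19ten" -> neunzehnten
    ein) printf 'ter\n' ;;                         # "ein 19ter Platz" -> neunzehnter
    *) printf 'te\n' ;;                            # fallback: plain "-te"
  esac
}
```

For example, `printf 'am %s%s\n' 19 "$(ordinal_ending am)"` prints "am 19ten". A real solution would need the sentence-level analysis described above; this only shows where the ending would be decided.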

> Besides that there is a light problem with numbers at the end of a sentence, because they will always be spoken as ordinals:

Yes, that is basically a similar problem. The preprocessor must be able to understand
the meaning of "40." at the end of the sentence.

At the moment no one has volunteered to write such an elaborate version of preproc...

PS: If your text comes from another program, it's rather easy to
ensure the correctness of the spoken output:

Heute ist der 19te 12te 2021.
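Following that suggestion, the calling program can rewrite numeric dates itself before piping the text on. A minimal sketch, assuming GNU sed (for `\b`) and assuming txt2pho reads "5te" or "12te" the same way as the "19te" form above; `dates_to_ordinals` is a hypothetical name. Leading zeros are dropped because a zero-padded month like "05" is otherwise read digit by digit, as reported further down in this thread.

```sh
# Hypothetical helper (GNU sed assumed for \b): rewrite DD.MM.YYYY as
# "DDte MMte YYYY" so txt2pho never sees the dotted date.
dates_to_ordinals() {
  sed -E \
    -e 's/\b([0-9]{1,2})\.([0-9]{1,2})\.([0-9]{4})\b/\1te \2te \3/g' \
    -e 's/\b0([1-9]te)\b/\1/g'    # drop leading zeros: 05te -> 5te
}
```

Usage in the pipeline from the first post: `echo "Heute ist der 19.12.2021" | dates_to_ordinals | iconv -cs -f UTF-8 -t ISO-8859-1 | ./txt2pho -m | mbrola /usr/share/mbrola/de2/de2 - test.wav`. The declension is not handled ("am 19te" would still be wrong), so this only covers the "der <date>" pattern.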

fquirin · 2021-12-20T22:34:20Z

I'm aware of the complexity, but I'm confused why 19.12.2021 becomes "neunzehnte" and that's it. The rest is removed completely! And I'm not even using preproc (see example above).
I already catch some edge cases in the assistant's TTS preprocessor (e.g. "10:30 Uhr" -> "10 Uhr 30"), but as you know, dates are extremely messy in German 😬 ... so for now I was hoping txt2pho would produce "neunzehn Punkt zwölf Punkt zweitausendeinundzwanzig", as espeak does, for example.
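A rule like "10:30 Uhr" -> "10 Uhr 30" could look roughly like this in a shell pipeline. This is a sketch, not SEPIA's actual preprocessor; it assumes GNU sed, and the name `times_to_words` is hypothetical. Full hours drop the ":00", and leading zeros are removed so that "09:05 Uhr" becomes "9 Uhr 5".

```sh
# Hypothetical helper (GNU sed assumed for \b): "HH:MM Uhr" -> "HH Uhr MM".
times_to_words() {
  sed -E \
    -e 's/\b([0-9]{1,2}):00 Uhr\b/\1 Uhr/g' \
    -e 's/\b([0-9]{1,2}):0([1-9]) Uhr\b/\1 Uhr \2/g' \
    -e 's/\b([0-9]{1,2}):([1-5][0-9]) Uhr\b/\1 Uhr \2/g' \
    -e 's/\b0([0-9] Uhr)/\1/g'    # drop a leading zero on the hour: 09 Uhr -> 9 Uhr
}
```

For example, `echo "Um 10:30 Uhr" | times_to_words` prints "Um 10 Uhr 30". Separate rules for ":00" and ":0X" keep the behavior explicit instead of relying on how the regex engine splits an optional zero.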

GHPS · 2021-12-20T23:13:20Z

> I'm aware of the complexity but I'm confused why 19.12.2021 becomes neunzehnte and that's it.

OK, let's focus on txt2pho...

echo "Heute ist der 19.12.2021"|./txt2pho | mbrola -e /usr/share/mbrola/de2/de2 - test.wav

becomes

Heute ist der 19 Punkt 12 Punkt 2021

or

_ 10   0  86 
h 81  23  88  48  89  73  91  98  92 
OY 121  15  94  31  96  48  98  64 100  81 101  98 102 
t 83  14 104  39 104 
@ 58  24 104  59 104  93 104 
_ 41  39 103  88 103 
I 46  33 103  76 102 
s 69  13 102  42 101  71 101 100 100 
t 70  29  98 
d 48 
e: 57   4  96  39  95  74  95 
6 70   7  94  36  96  64  98  93 100 
n 53  28 101  66 102 
OY 109   2 103  20 103  39 103  57 103  75 103  94 103 
n 56  23 102  59 102  95 102 
t 66   5 101  35 100 
s 63  17 100  49  99  81  99 
e: 56  14  98  50  97  86  97 
n 57  21  96  56  95  91  94 
p 92  15  92  37  91 
U 66   5  91  35  93  65  95  95  97 
N 60  28  98  62 100  95 101 
k 63   3 102  35 103 
t 52  17 102 
s 57  16 101  51 101  86 100 
v 37  32 100  86  99 
9 64  23  98  55  98  86  97 
l 61  18  97  51  99  84 100 
f 57  18 101  53 102  88 102 
p 98   7 102  28 102  48 101 
U 66  23 101  53 101  83 100 
N 61  15 100  48  99  80 100 
k 64  20 102 
t 54  33 102 
s 58  29 100  64 100  98  99 
v 33  58  99 
aI 119   5  98  22  97  39  97  55  96  72  95  89  95 
t 68  15  98  44  99 
aU 84  23  98  46  98  70  98  94  98 
z 34  44  97 
E 57   2  97  37  96  72  96 
n 49   8  95  49  95  90  94 
d 49   2  92 
aI 100   6  92  26  91  46  90  66  90  86  89 
n 20  30  88 
U 51  12  88  51  87  90  87 
n 52  29  86  67  86 
t 46  37  85 
s 46  13  85  57  85 100  85 
v 15 
a 75   7  85  33  85  60  84  87  84 
n 48  21  83  62  83 
t 39   5  81 
s 49   6  81  47  80  88  79 
I 43  33  79  79  78 
C 63  17  77  49  77  81  76 
s 60  13  76  47  76  80  76 
_ 483   2  85   6  85  10  85  14  85  18  85  22  85  27  85  31  85  35  85  39  85
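Output like the above is in MBROLA's .pho format: each line holds a phoneme, its duration in milliseconds, and optional pairs of (position in % of the duration, pitch in Hz); lines starting with ";" are comments. To check which words survive, for example with and without the trailing dot, it can help to reduce it to the bare phoneme sequence. A small awk sketch; `pho_summary` is a hypothetical name, and it assumes txt2pho writes the .pho data to stdout when not piped into mbrola, as in the listing above.

```sh
# Hypothetical helper: print the phoneme sequence and total duration of .pho input.
pho_summary() {
  awk '
    /^[[:space:]]*;/ { next }    # skip .pho comment lines
    NF < 2 { next }              # skip blank or malformed lines
    {
      phones = phones (phones == "" ? "" : " ") $1   # append the phoneme symbol
      total += $2                                     # add its duration in ms
    }
    END {
      print phones
      printf "total: %d ms\n", total
    }
  '
}
```

Usage: `echo "Heute ist der 19.12.2021." | ./txt2pho | pho_summary`. In the listing above, "n OY n t s e: n" (neunzehn) and "p U N k t" (Punkt) are easy to spot, so missing words show up as gaps in the sequence.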

fquirin · 2021-12-20T23:32:37Z

Ah sorry, there was a dot missing. Try this:
echo "Heute ist der 19.12.2021."|./txt2pho | mbrola -e /usr/share/mbrola/de2/de2 - test.wav

GHPS · 2021-12-22T12:55:10Z

Yes, I can see the problem now.

I'm wondering what the cause of this strange behaviour is.

fquirin · 2021-12-23T10:30:34Z

I thought it was a rule defined somewhere, since it actually transforms "19." into an ordinal. Maybe it fails to handle "2021.", but then I'd expect to hear at least "12.".
Are there any files I could check for the ordinal transformation?

fquirin · 2022-04-19T07:58:24Z

> Yes - can see the problem now.
> I'm wondering what the cause for this strange behaviour is.

Did you find out anything new about this? I've seen there were some recent commits related to dates 🙂

fquirin · 2022-05-12T15:40:20Z

@GHPS I've integrated txt2pho in the latest SEPIA-Home release 🙂 . Here are instructions to install it.

I really like the voices but from time to time I find some strange artifacts (that don't appear in espeak or default MBROLA). For example if you ask SEPIA for the date you will get the answer "Heute ist der 12.05.2022" but what you hear is really weird: "Heute ist der zwölft punkt null fünf null zwei null zwei zwei punkt zweitausendzweiundzwanzig" 😅 🙈 .

My "speak" script looks like this (arguments: gender, voice, text):
echo "$3" | iconv -cs -f UTF-8 -t ISO-8859-1 | ./txt2pho "-$1" | mbrola /usr/share/mbrola/"$2"

GHPS · 2022-05-12T20:26:32Z

> @GHPS I've integrated txt2pho in the latest SEPIA-Home release 🙂 . Here are instructions to install it.

Great - SEPIA is a very promising project. I'll link to the instructions in the readme of this project.

Concerning the pronunciation issue: the log files should give some insight into what is going wrong.

I'll take a deeper look into the code next week...

fquirin · 2022-05-13T07:59:35Z

Great, thanks! 🙂
I'll try to fix the problem with dates in SEPIA's own TTS pre-processor in the meantime. It seems German dates have been a pain for TTS since the dawn of time =)

fquirin · 2022-10-05T07:31:20Z

Hi @GHPS

I found another issue with the pronunciation, again related to "." after numbers 😢.

echo "Licht steht auf 70." | iconv -cs -f UTF-8 -t ISO-8859-1 | ./txt2pho -m | mbrola /usr/share/mbrola/de3/de3 - test.wav

The "70" will not be spoken at all. It works when I remove the "." at the end.
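Until the cause in txt2pho is found, this observation can be turned into a workaround: drop a sentence-final period that directly follows a digit. A minimal sketch with a hypothetical function name (plain POSIX sed, no GNU extensions). Note that it always picks the cardinal reading ("Er wurde heute 40." should then be spoken as "vierzig"), which is exactly the ambiguity raised at the start of this thread. It also covers the date case: without the trailing dot, txt2pho read "19.12.2021" as "19 Punkt 12 Punkt 2021" earlier in the thread.

```sh
# Hypothetical workaround: remove a final "." that directly follows a digit,
# so "70." reaches txt2pho as "70" instead of being dropped.
strip_number_final_period() {
  sed -E 's/([0-9])\.[[:space:]]*$/\1/'
}
```

Usage: `echo "Licht steht auf 70." | strip_number_final_period | iconv -cs -f UTF-8 -t ISO-8859-1 | ./txt2pho -m | mbrola /usr/share/mbrola/de3/de3 - test.wav`.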

GHPS · 2022-10-07T11:55:36Z

Thanks for the information.

> The "70" will not be spoken at all. It works when I remove the "." at the end.

That is in principle the same problem as discussed above: txt2pho converts a
stream of text to phonemes, but has no concept of parts of speech or even of
complete sentences. In this context the character string "70." has no meaning,
since it is neither a word nor a valid German number. It is therefore ignored.

That is why the preprocessor is necessary. It uses a number of heuristics to
decide whether "70." at the end of a sentence means "siebzigster" or "siebzig".
It even understands constructs like "70.000".
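The "70.000" case is the easiest of these heuristics to sketch: in German the dot groups thousands, so it can be removed whenever exactly three digits follow it. This is not preproc's actual rule set; it assumes GNU sed (for `\b` and the `:a` / `ta` loop), and `strip_thousands_separators` is a hypothetical name.

```sh
# Hypothetical helper (GNU sed): "70.000" -> "70000", "1.234.567" -> "1234567".
# The loop re-runs the substitution until no separator is left, because each
# pass consumes the digit the next match would need.
strip_thousands_separators() {
  sed -E -e ':a' -e 's/([0-9])\.([0-9]{3})\b/\1\2/' -e 'ta'
}
```

Dates such as "19.12.2021" are left alone, because no dot in them is followed by exactly three digits and a word boundary.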

In short: Use preproc to convert numbers or whole sentences before sending
the stream to txt2pho.

fquirin · 2022-10-07T14:00:57Z

> In short: Use preproc to convert numbers or whole sentences before sending the stream to txt2pho

Unfortunately, the preprocessor has some weird behavior as well :-/ For example:

echo "Der 70. Geburtstag ist am 01.01.2023" | iconv -cs -f UTF-8 -t ISO-8859-1 | ./preproc -r data/preproc.rls -a data/preproc.abk

gets translated to

Der siebzigste Geburtstag ist am 01n 01n zwei tausend dreiundzwanzig

I initially removed the preprocessor because I'm doing my own processing first, but it may still be the better option compared to losing numbers completely 😅
