Does this phonemizer support mixed-language input? #156

Closed
JohnHerry opened this issue Oct 11, 2023 · 4 comments

@JohnHerry

Is your feature request related to a problem? Please describe.
Does this phonemizer support mixed-language input? E.g. "我想买一部iphone。" ("I want to buy an iPhone.")

Describe the solution you'd like
The desired output is the IPA phonemes of this sentence, with a guarantee that there are no syllable conflicts.

Describe alternatives you've considered

Additional context
We would also like a mapping from each input character (or word) to its IPA phonemes,
e.g.: {"我": [IPA list of 我], "iphone": [IPA list of iphone]}
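A minimal sketch of a pre-processing step such a mapping would need, assuming the input is split by script before phonemizing: each Han character becomes its own unit and each run of Latin letters (such as "iphone") becomes one unit. `segment` and `runs` are hypothetical helpers, not part of phonemizer, and only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is treated as Han. Merging units into runs would let each run be phonemized with its own language setting instead of relying on espeak's automatic language switching.

```python
import re
from itertools import groupby

# Assumption: only the basic CJK Unified Ideographs block (U+4E00..U+9FFF)
# counts as Han; runs of ASCII letters, digits and apostrophes count as Latin.
# Anything else (punctuation such as "。", spaces) is dropped.
UNIT_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def segment(text):
    """Split text into (unit, script) pairs: one unit per Han character,
    one unit per run of Latin letters/digits."""
    units = []
    for match in UNIT_RE.finditer(text):
        unit = match.group()
        script = "han" if "\u4e00" <= unit[0] <= "\u9fff" else "latin"
        units.append((unit, script))
    return units


def runs(units):
    """Merge consecutive units of the same script into (text, script) runs,
    e.g. so that each run can be phonemized with a different language."""
    merged = []
    for script, group in groupby(units, key=lambda pair: pair[1]):
        pieces = [unit for unit, _ in group]
        joiner = "" if script == "han" else " "
        merged.append((joiner.join(pieces), script))
    return merged


units = segment("我想买一部iphone。")
print(units)
# [('我', 'han'), ('想', 'han'), ('买', 'han'), ('一', 'han'), ('部', 'han'), ('iphone', 'latin')]
print(runs(units))
# [('我想买一部', 'han'), ('iphone', 'latin')]
```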

@mmmaat
Collaborator

mmmaat commented Oct 11, 2023

Hi, phonemizer (with the espeak backend) can detect language switches, mostly to English. But this is quite limited, as you cannot specify which languages are involved, or which part of the text is in which language. See https://bootphon.github.io/phonemizer/api_reference.html, language_switch option.

```
$ echo '我想买一部iphone。' | phonemize -l cmn -b espeak -w '; '
[WARNING] 1 utterances containing language switches on lines 1
[WARNING] extra phones may appear in the "cmn" phoneset
[WARNING] language switch flags have been kept (applying "keep-flags" policy)
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wo2; ɕiɑ2ŋ; mai2; ji5; pu5; (en)aɪfəʊn(zh);
```
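This output was produced under the default "keep-flags" policy (see the warning), so the English word carries espeak's `(en)`...`(zh)` switch flags; per the API reference linked above, `language_switch` can also be set to "remove-flags" or "remove-utterance". A minimal sketch for post-processing a "keep-flags" line, assuming `;` as the word separator (as with `-w '; '`) and assuming any digit inside a word is a tone number. `parse_output` and its `default_lang` parameter are hypothetical, not phonemizer API.

```python
import re

# Language-switch flags as they appear in the espeak output, e.g. "(en)" or "(zh)".
FLAG_RE = re.compile(r"\(([a-z-]+)\)")


def parse_output(line, default_lang="zh", sep=";"):
    """Parse one line of phonemized output into per-word records.

    Assumptions: words are separated by `sep`; a word that starts with a flag
    such as "(en)" is in that language (a trailing "(zh)" only switches back);
    any digit inside a word is a tone number.
    """
    words = []
    for raw in line.split(sep):
        raw = raw.strip()
        if not raw:
            continue  # skip the empty piece after the trailing separator
        flag = FLAG_RE.match(raw)
        lang = flag.group(1) if flag else default_lang
        phones = FLAG_RE.sub("", raw)      # drop all flags from the word
        tones = re.findall(r"\d", phones)  # e.g. "ɕiɑ2ŋ" -> ["2"]
        words.append({"phones": phones, "lang": lang, "tones": tones})
    return words


for word in parse_output("wo2; ɕiɑ2ŋ; mai2; ji5; pu5; (en)aɪfəʊn(zh);"):
    print(word)
```

Note that the tone digit is not always word-final ("ɕiɑ2ŋ"), which is why the sketch searches the whole word rather than only its last character.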

The word -> IPA mapping is not implemented yet, but it is already a feature request; see #96.
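Until that exists, a naive workaround is to pair input units with output words by position. A minimal sketch, assuming the input was already split into one unit per Han character and per English word, and that the output uses `;` as the word separator; `naive_word_map` is a hypothetical helper. In this example the six output words happen to line up with six units, but in general espeak may group several characters into one word, so the helper raises an error instead of guessing when the counts differ. (The "words count mismatch" warning in the log is presumably because the unspaced Mandarin input counts as a single word.)

```python
def naive_word_map(units, output_line, sep=";"):
    """Pair input units with phonemized words, position by position.

    Hypothetical helper: it only works when there is exactly one output word
    per input unit, so it refuses to guess when the counts differ.
    """
    phonemized = [w.strip() for w in output_line.split(sep) if w.strip()]
    if len(units) != len(phonemized):
        raise ValueError(
            f"{len(units)} input units but {len(phonemized)} phonemized words"
        )
    # A list of pairs (not a dict) keeps order and repeated characters.
    return list(zip(units, phonemized))


pairs = naive_word_map(
    ["我", "想", "买", "一", "部", "iphone"],
    "wo2; ɕiɑ2ŋ; mai2; ji5; pu5; (en)aɪfəʊn(zh);",
)
for unit, phones in pairs:
    print(unit, phones)
```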

mmmaat closed this as completed Oct 11, 2023
@JohnHerry
Author

Thanks for the help. By the way, I guess the IPA output in the example contains tone marks, but it looks strange: the two characters 一部 (ji5; pu5;) get the same tone, "5", but as a native Mandarin speaker I don't think they should. Is there a bug in the relevant module?
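For reference, the expected reading of 一部 is yí bù: 一 (citation tone yī) changes to the 2nd tone before a 4th-tone syllable such as 部 (bù), and to the 4th tone before tones 1 to 3. A minimal sketch of just that one rule, on Pinyin with tone digits; `apply_yi_sandhi` is a hypothetical helper, it does not operate on espeak's phone notation, and it is not a description of what espeak-ng implements.

```python
def apply_yi_sandhi(syllables):
    """Apply the tone-sandhi rule for 一 to citation-tone Pinyin.

    `syllables` is a list of (hanzi, pinyin) pairs whose Pinyin ends in a tone
    digit. 一 (yi1) becomes yi2 before a 4th-tone syllable and yi4 before a
    1st, 2nd or 3rd tone. Exceptions (dates such as 一月, counting, etc.)
    are deliberately not handled in this sketch.
    """
    result = list(syllables)
    for i, (char, _) in enumerate(syllables):
        if char != "一" or i + 1 >= len(syllables):
            continue  # not 一, or 一 at the end: keep the citation tone
        next_tone = syllables[i + 1][1][-1]
        if next_tone == "4":
            result[i] = (char, "yi2")
        elif next_tone in ("1", "2", "3"):
            result[i] = (char, "yi4")
    return result


print(apply_yi_sandhi([("一", "yi1"), ("部", "bu4")]))
# [('一', 'yi2'), ('部', 'bu4')]
```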

Another question: is there any way to get the full IPA alphabet? We would like an IPA alphabet design that supports multilingual transcription.

Third question: how does phonemizer handle the polyphone problem? Many Mandarin characters have more than one Pinyin reading, and the correct one is decided by the context the character appears in. For example, the character "着" next to "走" ("walk") is read "zhe", but next to "火" ("fire") it is read "zhao". I think their IPA transcriptions should differ too. How does phonemizer handle this? With LM-based prediction?
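One common way to resolve such readings is dictionary lookup of multi-character words, with a per-character default reading as the fallback; statistical or LM-based prediction is another. A minimal sketch of the dictionary idea only, with a hypothetical two-word lexicon and tone digits (5 for the neutral tone); it says nothing about how espeak-ng actually handles 着. Greedy longest match has no notion of wider context, which is where a model-based approach would help.

```python
# Hypothetical mini-lexicon, for illustration only (not espeak-ng's data).
WORD_READINGS = {
    "着火": ["zhao2", "huo3"],  # "to catch fire"
    "走着": ["zou3", "zhe5"],   # "walking"; 着 as a neutral-tone particle
}
CHAR_READINGS = {"着": "zhe5", "走": "zou3", "火": "huo3"}
MAX_WORD_LEN = max(len(word) for word in WORD_READINGS)


def to_pinyin(text):
    """Greedy longest-match lookup: use a listed multi-character word when one
    starts at the current position, else the character's default reading
    ("?" for characters missing from the toy lexicon)."""
    result = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 1, -1):
            word = text[i:i + length]
            if word in WORD_READINGS:
                result.extend(WORD_READINGS[word])
                i += length
                break
        else:  # no listed word starts here: fall back to the single character
            result.append(CHAR_READINGS.get(text[i], "?"))
            i += 1
    return result


print(to_pinyin("着火"))  # ['zhao2', 'huo3']
print(to_pinyin("走着"))  # ['zou3', 'zhe5']
```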

@mmmaat
Collaborator

mmmaat commented Oct 12, 2023

Your questions all relate to the espeak-ng backend, not to phonemizer itself, which is a "simple" wrapper. Please look for answers there, for example https://github.com/espeak-ng/espeak-ng/issues?q=mandarin and https://github.com/espeak-ng/espeak-ng/blob/master/dictsource/cmn_list.
Best.

@JohnHerry
Author

Thank you very much
