espeak-ng #3

Closed
cainesap opened this issue Oct 24, 2017 · 7 comments

@cainesap

Hi,
I heard about espeak-ng and am considering installing it: https://github.com/espeak-ng/espeak-ng
Do you know if phonemizer will work with espeak-ng?
regards, Andrew

@mmmaat
Collaborator

mmmaat commented Oct 24, 2017

According to this I hope so...
If not, it may be interesting (and simple) to add an espeak-ng backend to the phonemizer.

But I'm a bit busy these days...

@cainesap
Author

I tried installing espeak-ng, and it doesn't seem to work with phonemizer as it is. I know you're busy, so don't worry about it: espeak works fine as it is. Thanks for your time in answering my questions!

(phonemize) MML5030:espeak-ng apc38$ echo "hello world" | phonemize -l en-us
Traceback (most recent call last):
  File ".../phonemize/bin/phonemize", line 11, in <module>
    load_entry_point('phonemizer==0.3', 'console_scripts', 'phonemize')()
  File "build/bdist.macosx-10.6-intel/egg/phonemizer/main.py", line 131, in main
  File "build/bdist.macosx-10.6-intel/egg/phonemizer/main.py", line 73, in parse_args
  File "build/bdist.macosx-10.6-intel/egg/phonemizer/main.py", line 72, in <genexpr>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0101' in position 1: ordinal not in range(128)

@mmmaat
Collaborator

mmmaat commented Oct 25, 2017

Ok, thank you for the idea! If you or someone on your team knows Python, we would welcome a pull request with espeak-ng integration.

@cainesap
Author

hi Mathieu,
I can try, but I'm not sure what the problem is. The error seems to relate to espeak.supported_languages(), which calls espeak --voices, and that output looks ok to me (nothing that looks non-ASCII), unless I've misunderstood the error:

Pty Language       Age/Gender VoiceName          File                 Other Languages
 5  af              --/M      Afrikaans          gmw/af               
 5  am              --/M      Amharic            sem/am               
 5  an              --/M      Aragonese          roa/an               
 5  ar              --/M      Arabic             sem/ar               
 5  as              --/M      Assamese           inc/as               
 5  az              --/M      Azerbaijani        trk/az               
 5  bg              --/M      Bulgarian          zls/bg               
 5  bn              --/M      Bengali            inc/bn               
 5  bpy             --/M      Bishnupriya_Manipuri inc/bpy              
 5  bs              --/M      Bosnian            zls/bs               
 5  ca              --/M      Catalan            roa/ca               
 5  cmn             --/M      Chinese_(Mandarin) sit/cmn              (zh-cmn 5)(zh 5)
 5  cs              --/M      Czech              zlw/cs               
 5  cy              --/M      Welsh              cel/cy               
 5  da              --/M      Danish             gmq/da               
 5  de              --/M      German             gmw/de               
 5  el              --/M      Greek              grk/el               
 5  en-029          --/M      English_(Caribbean) gmw/en-029           (en 10)
 2  en-gb           --/M      English_(Great_Britain) gmw/en               (en 2)
 5  en-gb-scotland  --/M      English_(Scotland) gmw/en-GB-scotland   (en 4)
 5  en-gb-x-gbclan  --/M      English_(Lancaster) gmw/en-GB-x-gbclan   (en-gb 3)(en 5)
 5  en-gb-x-gbcwmd  --/M      English_(West_Midlands) gmw/en-GB-x-gbcwmd   (en-gb 9)(en 9)
 5  en-gb-x-rp      --/M      English_(Received_Pronunciation) gmw/en-GB-x-rp       (en-gb 4)(en 5)
 2  en-us           --/M      English_(America)  gmw/en-US            (en 3)
 5  eo              --/M      Esperanto          art/eo               
 5  es              --/M      Spanish_(Spain)    roa/es               
 5  es-419          --/M      Spanish_(Latin_America) roa/es-419           (es-mx 6)(es 6)
 5  et              --/M      Estonian           urj/et               
 5  eu              --/M      Basque             eu                   
 5  fa              --/M      Persian            ira/fa               
 5  fa-Latn         --/M      Persian_(Pinglish) ira/fa-Latn          
 5  fi              --/M      Finnish            urj/fi               
 5  fr-be           --/M      French_(Belgium)   roa/fr-BE            (fr 8)
 5  fr-ch           --/M      French_(Switzerland) roa/fr-CH            (fr 8)
 5  fr-fr           --/M      French_(France)    roa/fr               (fr 5)
 5  ga              --/M      Gaelic_(Irish)     cel/ga               
 5  gd              --/M      Gaelic_(Scottish)  cel/gd               
 5  gn              --/M      Guarani            sai/gn               
 5  grc             --/M      Greek_(Ancient)    grk/grc              
 5  gu              --/M      Gujarati           inc/gu               
 5  hi              --/M      Hindi              inc/hi               
 5  hr              --/M      Croatian           zls/hr               (hbs 5)
 5  hu              --/M      Hungarian          urj/hu               
 5  hy              --/M      Armenian_(East_Armenia) ine/hy               (hy-arevela 5)
 5  hy-arevmda      --/M      Armenian_(West_Armenia) ine/hy-arevmda       (hy 8)
 5  ia              --/M      Interlingua        art/ia               
 5  id              --/M      Indonesian         poz/id               
 5  is              --/M      Icelandic          gmq/is               
 5  it              --/M      Italian            roa/it               
 5  ja              --/M      Japanese           jpx/ja               
 5  jbo             --/M      Lojban             art/jbo              
 5  ka              --/M      Georgian           ccs/ka               
 5  kl              --/M      Greenlandic        esx/kl               
 5  kn              --/M      Kannada            dra/kn               
 5  ko              --/M      Korean             ko                   
 5  kok             --/M      Konkani            inc/kok              
 5  ku              --/M      Kurdish            ira/ku               
 5  ky              --/M      Kyrgyz             trk/ky               
 5  la              --/M      Latin              itc/la               
 5  lfn             --/M      Lingua_Franca_Nova art/lfn              
 5  lt              --/M      Lithuanian         bat/lt               
 5  lv              --/M      Latvian            bat/lv               
 5  mi              --/M      Māori             poz/mi               
 5  mk              --/M      Macedonian         zls/mk               
 5  ml              --/M      Malayalam          dra/ml               
 5  mr              --/M      Marathi            inc/mr               
 5  ms              --/M      Malay              poz/ms               
 5  mt              --/M      Maltese            sem/mt               
 5  my              --/M      Burmese            sit/my               
 5  nb              --/M      Norwegian_Bokmål  gmq/nb               (no 5)
 5  nci             --/M      Nahuatl_(Classical) azc/nci              
 5  ne              --/M      Nepali             inc/ne               
 5  nl              --/M      Dutch              gmw/nl               
 5  om              --/M      Oromo              cus/om               
 5  or              --/M      Oriya              inc/or               
 5  pa              --/M      Punjabi            inc/pa               
 5  pap             --/M      Papiamento         roa/pap              
 5  pl              --/M      Polish             zlw/pl               
 5  pt              --/M      Portuguese_(Portugal) roa/pt               (pt-pt 5)
 5  pt-br           --/M      Portuguese_(Brazil) roa/pt-BR            (pt 6)
 5  ro              --/M      Romanian           roa/ro               
 5  ru              --/M      Russian            zle/ru               
 5  sd              --/M      Sindhi             inc/sd               
 5  si              --/M      Sinhala            inc/si               
 5  sk              --/M      Slovak             zlw/sk               
 5  sl              --/M      Slovenian          zls/sl               
 5  sq              --/M      Albanian           ine/sq               
 5  sr              --/M      Serbian            zls/sr               
 5  sv              --/M      Swedish            gmq/sv               
 5  sw              --/M      Swahili            bnt/sw               
 5  ta              --/M      Tamil              dra/ta               
 5  te              --/M      Telugu             dra/te               
 5  tn              --/M      Setswana           bnt/tn               
 5  tr              --/M      Turkish            trk/tr               
 5  tt              --/M      Tatar              trk/tt               
 5  ur              --/M      Urdu               inc/ur               
 5  vi              --/M      Vietnamese_(Northern) aav/vi               
 5  vi-vn-x-central --/M      Vietnamese_(Central) aav/vi-VN-x-central  
 5  vi-vn-x-south   --/M      Vietnamese_(Southern) aav/vi-VN-x-south    
 5  yue             --/M      Chinese_(Cantonese) sit/yue              (zh-yue 5)(zh 8)
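
One way to check a listing like the one above for non-ASCII characters is to parse it and try to encode each voice name as ASCII. This is only a minimal sketch, assuming whitespace-separated columns in the order shown (priority, language, age/gender, voice name, file). `parse_voices` and `non_ascii_names` are hypothetical helper names, not phonemizer functions, and `SAMPLE` is a handful of rows copied from the table rather than live `espeak --voices` output.

```python
def parse_voices(output):
    """Map language code -> voice name from `espeak --voices`-style text."""
    voices = {}
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 4:
            # fields: priority, language, age/gender, voice name, file, ...
            voices[fields[1]] = fields[3]
    return voices


def non_ascii_names(voices):
    """Return the language codes whose voice name is not pure ASCII."""
    flagged = []
    for code, name in sorted(voices.items()):
        try:
            name.encode("ascii")
        except UnicodeEncodeError:
            flagged.append(code)
    return flagged


# A few rows copied from the listing above (assumption: same column layout).
SAMPLE = """\
Pty Language       Age/Gender VoiceName          File                 Other Languages
 5  lv              --/M      Latvian            bat/lv
 5  mi              --/M      Māori             poz/mi
 5  mk              --/M      Macedonian         zls/mk
 5  nb              --/M      Norwegian_Bokmål  gmq/nb               (no 5)
"""

voices = parse_voices(SAMPLE)
print(non_ascii_names(voices))  # prints ['mi', 'nb']
```

On these sample rows the check flags `mi` and `nb`, whose voice names contain `ā` and `å`.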

@mmmaat
Collaborator

mmmaat commented Oct 26, 2017

Hello Andrew,

I tried espeak-ng and it works out of the box for me with python3.

mathieu@deaftone:~/dev/espeak-ng$ python --version
Python 3.6.2 :: Continuum Analytics, Inc.
mathieu@deaftone:~/dev/espeak-ng$ phonemize --version
phonemizer: 0.3.1
festival: Festival Speech Synthesis System: 2.4:release December 2014
eSpeak NG text-to-speech: 1.49.3-dev  Data at: /usr/share/espeak-ng-data
mathieu@deaftone:~/dev/espeak-ng$ echo 'hello world' | phonemize -l en-us
həloʊ wɜːld 

But with python2 I can reproduce your bug:

mathieu@deaftone:~/dev/phonemizer$ phonemize --version
Traceback (most recent call last):
  File "/home/mathieu/.miniconda3/envs/wordseg-py2/bin/phonemize", line 11, in <module>
    load_entry_point('phonemizer==0.3.1', 'console_scripts', 'phonemize')()
  File "build/bdist.linux-x86_64/egg/phonemizer/main.py", line 145, in main
  File "build/bdist.linux-x86_64/egg/phonemizer/main.py", line 73, in parse_args
  File "build/bdist.linux-x86_64/egg/phonemizer/main.py", line 72, in <genexpr>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0101' in position 1: ordinal not in range(128)

Python2 has weird unicode support, whereas it is native in python3... I'll try to find a bugfix for that.

@mmmaat
Collaborator

mmmaat commented Oct 26, 2017

Ok, this is done, I just pushed a fix! The issue was the non-ASCII character in the voice name for mi -> Māori.
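
To illustrate why that one voice name triggers the traceback, here is a small sketch that reproduces the same class of error by encoding the name with the ASCII codec. It is not the fix that was pushed to phonemizer (the thread does not show it). `ascii_error` is a hypothetical helper introduced only for this example.

```python
# "Māori", the voice name listed for language code mi; \u0101 is the letter ā.
name = u"M\u0101ori"


def ascii_error(text):
    """Return the UnicodeEncodeError raised by ASCII-encoding text, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as err:
        return err
    return None


err = ascii_error(name)
# The failing character sits at index 1, matching "in position 1" in the traceback.
print(err.start)  # prints 1

# Encoding with UTF-8 instead of ASCII round-trips the name without error.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes.decode("utf-8") == name)  # prints True
```

The error surfaces under Python 2 but not Python 3 because Python 2 implicitly falls back to the ASCII codec when unicode text is converted to a byte string, while Python 3 strings are unicode throughout.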

@mmmaat closed this as completed Oct 26, 2017

@cainesap
Author

Great, thanks Mathieu!
I'll switch to Python 3. In fact I prefer it, and I can't remember why we're using Py2 for this project... must be historical, but there's no need afaik :)
