Double punctuation breaks phonemization #54

Closed
cfrancesco opened this issue Sep 23, 2020 · 5 comments

Comments

@cfrancesco

I don't have an exhaustive list, but many double-punctuation patterns break phonemization. One example is !'
This is with phonemizer version 2.2, installed from pip.

~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/phonemize.py in phonemize(text, language, backend, separator, strip, preserve_punctuation, punctuation_marks, with_stress, language_switch, njobs, logger)
    172     # phonemize the input text
    173     return phonemizer.phonemize(
--> 174         text, separator=separator, strip=strip, njobs=njobs)

~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/backend/espeak.py in phonemize(self, text, separator, strip, njobs)
    233         # finally restore the punctuation
    234         return self._phonemize_postprocess(
--> 235             text, text_type, punctuation_marks)
    236 
    237     def _command(self, fname):

~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/backend/base.py in _phonemize_postprocess(self, text, text_type, punctuation_marks)
    138         # restore the punctuation is asked for
    139         if self.preserve_punctuation:
--> 140             text = self._punctuator.restore(text, punctuation_marks)
    141 
    142         # output the result formatted as a string or a list of strings

~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/punctuation.py in restore(cls, text, marks)
    147 
    148         """
--> 149         return cls._restore_aux(str2list(text), marks, 0)
    150 
    151     @classmethod

~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/punctuation.py in _restore_aux(cls, text, marks, num)
    162             if current.position == 'E':
    163                 return [text[0] + current.mark] + cls._restore_aux(
--> 164                     text[1:], marks[1:], num + 1)
    165             if current.position == 'A':
    166                 return [current.mark] + cls._restore_aux(

~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/punctuation.py in _restore_aux(cls, text, marks, num)
    175                 restored = cls._restore_aux(
    176                     [text[0] + current.mark + text[1]] + text[2:],
--> 177                     marks[1:], num)
    178             return restored
    179         else:

~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/punctuation.py in _restore_aux(cls, text, marks, num)
    178             return restored
    179         else:
--> 180             return [text[0]] + cls._restore_aux(text[1:], marks, num + 1)

~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/punctuation.py in _restore_aux(cls, text, marks, num)
    162             if current.position == 'E':
    163                 return [text[0] + current.mark] + cls._restore_aux(
--> 164                     text[1:], marks[1:], num + 1)
    165             if current.position == 'A':
    166                 return [current.mark] + cls._restore_aux(

~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/punctuation.py in _restore_aux(cls, text, marks, num)
    162             if current.position == 'E':
    163                 return [text[0] + current.mark] + cls._restore_aux(
--> 164                     text[1:], marks[1:], num + 1)
    165             if current.position == 'A':
    166                 return [current.mark] + cls._restore_aux(

~/anaconda3/envs/ttsTF/lib/python3.6/site-packages/phonemizer/punctuation.py in _restore_aux(cls, text, marks, num)
    161                     [current.mark + text[0]] + text[1:], marks[1:], num)
    162             if current.position == 'E':
--> 163                 return [text[0] + current.mark] + cls._restore_aux(
    164                     text[1:], marks[1:], num + 1)
    165             if current.position == 'A':

IndexError: list index out of range
@mmmaat
Collaborator

mmmaat commented Sep 23, 2020

Hi, can I have a complete example of a failing command please, with input text and options?

@mmmaat
Collaborator

mmmaat commented Sep 23, 2020

OK, I understand the bug: it occurs when trying to restore punctuation on an empty text. I'll publish a fix soon. Thanks for reporting.
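
To illustrate the failure mode described here, below is a minimal, self-contained sketch. It is not phonemizer's code and not the fix in ee591ed; `restore_naive` and `restore_guarded` are hypothetical names, and the data model (a list of text chunks plus a list of marks to re-append) is a simplifying assumption. It only shows how a recursive restore that indexes `text[0]` raises IndexError once the text is empty while marks remain, and one possible guard.

```python
from typing import List


def restore_naive(text: List[str], marks: List[str]) -> List[str]:
    """Re-append one mark per text chunk, with no check for empty text."""
    if not marks:
        return text
    # Assumes at least one chunk remains; fails when text == [].
    return [text[0] + marks[0]] + restore_naive(text[1:], marks[1:])


def restore_guarded(text: List[str], marks: List[str]) -> List[str]:
    """Same recursion, but handles running out of text before marks."""
    if not marks:
        return text
    if not text:
        # No text left: emit the remaining marks as a chunk of their own.
        return ["".join(marks)]
    return [text[0] + marks[0]] + restore_guarded(text[1:], marks[1:])


if __name__ == "__main__":
    print(restore_guarded([], ["!'"]))
    try:
        restore_naive([], ["!'"])
    except IndexError as err:
        print("IndexError:", err)
```

Emitting the leftover marks as their own chunk is just one design choice for the guard; dropping them or returning the input unchanged would be equally valid in a sketch like this.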

@mmmaat
Collaborator

mmmaat commented Sep 23, 2020

Fixed in ee591ed.

@michael-conrad

Don't know if this is related or not, but:

000004280: Hélas! . ni l'un ni l'autre ne ressemblait au sien.
Traceback (most recent call last):
  File "/home/muksihs/git/Cherokee-TTS/data/comvoi_ipa/generateTrainingData.py", line 59, in <module>
    use_sampa=False)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/phonemize.py", line 172, in phonemize
    text, separator=separator, strip=strip, njobs=njobs)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/backend/base.py", line 126, in phonemize
    text = self._punctuator.restore(text, punctuation_marks)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/punctuation.py", line 146, in restore
    return cls._restore_aux(str2list(text), marks, 0)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/punctuation.py", line 166, in _restore_aux
    [text[0] + m.mark + text[1]] + text[2:], marks[1:], n)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/punctuation.py", line 166, in _restore_aux
    [text[0] + m.mark + text[1]] + text[2:], marks[1:], n)
IndexError: list index out of range
$ pip show phonemizer
Name: phonemizer
Version: 2.1
Summary: Simple text to phones converter for multiple languages
Home-page: https://github.com/bootphon/phonemizer
Author: Mathieu Bernard
Author-email: mathieu.a.bernard@inria.fr
License: GPL3
Location: /home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages
Requires: segments, attrs, joblib
Required-by: 

@mmmaat
Collaborator

mmmaat commented Mar 29, 2021

Hi, indeed you should upgrade your phonemizer version:

>>> from phonemizer import phonemize
>>> utt = "Hélas! . ni l'un ni l'autre ne ressemblait au sien." 
>>> phonemize(utt, backend='espeak', language='fr-fr', preserve_punctuation=True)
'elas ! . ni lœ̃ ni lotʁ nə ʁəsɑ̃blɛt o sjɛ̃ .'

I'm using this version:

$ phonemize --version
phonemizer-2.2.2
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.1.3
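
For anyone who wants to check from Python whether an upgrade is needed, here is a rough sketch. The helper names (`parse_version`, `needs_upgrade`, `installed_phonemizer_version`) are hypothetical, and the 2.2.2 threshold is an assumption: it is simply the version shown working above, and the thread does not say which release first contains the fix. It relies on `importlib.metadata`, so it assumes Python 3.8 or later.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.2.2' into (2, 2, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_upgrade(installed: str, minimum: str = "2.2.2") -> bool:
    """True if the installed version is older than the assumed minimum."""
    return parse_version(installed) < parse_version(minimum)


def installed_phonemizer_version() -> Optional[str]:
    """Return the installed phonemizer version, or None if not installed."""
    try:
        return metadata.version("phonemizer")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_phonemizer_version()
    if found is None:
        print("phonemizer is not installed")
    elif needs_upgrade(found):
        print(f"phonemizer {found} is older than 2.2.2; consider upgrading")
    else:
        print(f"phonemizer {found} looks recent enough")
```

With this comparison, both versions reported as failing in this thread (2.1 and 2.2) are flagged as needing an upgrade.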
