TextGrids cannot be read if they contain special/IPA characters #52

Open
mfaytak opened this issue Sep 19, 2023 · 1 comment

mfaytak commented Sep 19, 2023

Expected behaviour
Read in a textgrid (long format) using: tg = pympi.Praat.TextGrid(path_to_textgrid)

Actual behaviour
Throws an AttributeError (included below) and halts if the contents of any interval tier contain non-ASCII characters such as ɪ or ŋ or ɛ. All other TextGrids are imported without issues as expected.

System information

  • python version: 3.x (Jupyter Notebook kernel)
  • os: Mac OS 13.4.1 (Ventura)
  • are you up to date with the latest master?: Yes

Offending notebook cell (which imports any TGs not containing ɛ or ɪ just fine):

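import os
import pympi

# `corpus` is the path to the corpus root, with one subdirectory per subject.
for subj in os.listdir(corpus):
    for file in os.listdir(os.path.join(corpus, subj)):
        # Only process Praat TextGrid files.
        if not file.endswith(".TextGrid"):
            continue
        print(file)
        tg = pympi.Praat.TextGrid(os.path.join(corpus, subj, file))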

Full traceback of the issue I am encountering is included below.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[40], line 11
      9     continue
     10 print(file)
---> 11 tg = pympi.Praat.TextGrid(os.path.join(corpus,subj,file))
     12 for tier in tg.get_tiers():
     13     print(tier.name)

File ~/miniconda3/envs/cameroon/lib/python3.11/site-packages/pympi/Praat.py:44, in TextGrid.__init__(self, file_path, xmin, xmax, codec)
     42 else:
     43     with open(file_path, 'rb') as f:
---> 44         self.from_file(f, codec)

File ~/miniconda3/envs/cameroon/lib/python3.11/site-packages/pympi/Praat.py:101, in TextGrid.from_file(self, ifile, codec)
     99 # Skip the Headers and empty line
    100 next(ifile), next(ifile), next(ifile)
--> 101 self.xmin = float(nn(ifile, regfloat))
    102 self.xmax = float(nn(ifile, regfloat))
    103 # Skip <exists>

File ~/miniconda3/envs/cameroon/lib/python3.11/site-packages/pympi/Praat.py:94, in TextGrid.from_file.<locals>.nn(ifile, pat)
     92 def nn(ifile, pat):
     93     line = next(ifile).decode(codec)
---> 94     return pat.search(line).group(1)

AttributeError: 'NoneType' object has no attribute 'group'
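
The traceback shows the file being opened in binary mode, each line decoded separately, and a regex search on the decoded line returning None. A minimal sketch of how that can happen, assuming the file is stored in a two-byte encoding such as UTF-16 while the line is decoded as ASCII. The pattern below is a hypothetical stand-in for the `regfloat` named in the traceback (the real pattern in pympi may differ), and `first_float` is an illustrative helper, not part of pympi.

```python
import io
import re

# Hypothetical stand-in for the `regfloat` pattern named in the traceback.
regfloat = re.compile(r'([\d.]+)\s*$')


def first_float(raw_line: bytes, codec: str):
    """Decode one raw line and return the trailing number, or None if absent."""
    match = regfloat.search(raw_line.decode(codec))
    return match.group(1) if match else None


text = 'xmin = 0 \nxmax = 2.5 \n'

# Iterating a binary stream splits on the single byte b'\n'.
ascii_lines = list(io.BytesIO(text.encode('ascii')))
utf16_lines = list(io.BytesIO(text.encode('utf-16-be')))

# ASCII bytes decode to the original line, so the number is found.
ascii_result = first_float(ascii_lines[0], 'ascii')

# UTF-16 bytes decoded as ASCII have a NUL between every character,
# so the digits are never followed by whitespace or end of line.
utf16_result = first_float(utf16_lines[0], 'ascii')

print(ascii_result, utf16_result)
```

In this sketch the second call returns None, and calling `.group(1)` on that result would raise the same kind of AttributeError as above.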

mfaytak commented Sep 20, 2023

As a small update: this occurs regardless of whether the file's encoding is correctly specified in the codec parameter of pympi.Praat.TextGrid(). The files with IPA characters turn out to be in UTF-16 for some reason, whereas all the others are in ASCII. Specifying the correct codec does not resolve the error, though.
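
One possible workaround, sketched under the assumption that the UTF-16 files start with a byte-order mark (BOM): re-encode each file to UTF-8 before handing it to pympi. The helper names `sniff_encoding` and `transcode_to_utf8` are hypothetical and use only the standard library. Whether pympi then parses the result correctly has not been verified here.

```python
import codecs
from pathlib import Path

# BOMs checked in order; each maps to a codec that strips the BOM on decode.
# UTF-32 is deliberately not handled in this sketch.
_BOMS = (
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
)


def sniff_encoding(raw: bytes, default: str = 'utf-8') -> str:
    """Guess the encoding from a leading BOM; fall back to `default`."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


def transcode_to_utf8(src, dst) -> str:
    """Write a BOM-free UTF-8 copy of `src` to `dst`; return the source codec."""
    raw = Path(src).read_bytes()
    encoding = sniff_encoding(raw)
    text = raw.decode(encoding)
    Path(dst).write_bytes(text.encode('utf-8'))
    return encoding
```

The converted copy could then be passed as `pympi.Praat.TextGrid(dst, codec='utf-8')`, using the `codec` parameter that appears in the traceback's `__init__` signature. ASCII files pass through unchanged, since ASCII is a subset of UTF-8.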
