dialect-specific allophones #9

Open
agonzalezd opened this issue Jan 4, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@agonzalezd

Hello again.

I found that no allophones are being loaded, even though they are enabled by default in the __init__ of PHOIBLE (allophone_column='Allophones').

I checked it in all phonemes:

>>> from phones import PhoneCollection
>>> pc = PhoneCollection()
>>> {tuple(sorted(p.allophones)) for p in pc.langs(pc.lang_list).values}
{()}
>>> 

I might be missing something, though...

I am using version 0.0.4.

I am also getting this warning, by the way:

python3.8/site-packages/phones/__init__.py:219: FutureWarning: The default value of numeric_only 
in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. 
Either specify numeric_only or select only columns which should be valid for the function.
  self.data.groupby(

Thanks in advance!

@MiniXC
Owner

MiniXC commented Jan 4, 2023

Hi, thanks for pointing this out! As far as I can remember, a quirk with PHOIBLE is that one language can have the same phone several times with different allophones, which messes with other functionality in this library.

For example, for "eng", the output with allophones is this:

[a (eng), a (eng), a (eng), aɪ (eng), aɪ (eng), aɪ (eng), aɪ (eng), aɪ (eng), aʊ (eng), aʊ (eng), aʊ (eng), aʊ (eng), aʊ (eng), aː (eng), b (eng), b (eng), b (eng), b (eng), b (eng), b (eng), b (eng), d (eng), d (eng), d (eng), d (eng), d (eng), d (eng), d (eng), d̠ʒ (eng), d̠ʒ (eng), d̠ʒ (eng), d̠ʒ (eng), d̠ʒ (eng), d̠ʒ (eng), d̠ʒ (eng), e (eng), e (eng), ei (eng), eə (eng), eə (eng), eɪ (eng), eɪ (eng), eɪ (eng), eɪ̯ (eng), eː (eng), eː (eng), eː (eng), f (eng), f (eng), f (eng), f (eng), f (eng), f (eng), f (eng), h (eng), h (eng), h (eng), h (eng), h (eng), h (eng), h (eng), i (eng), i (eng), iə (eng), iɛ (eng), iɪ (eng), iː (eng), iː (eng), iː (eng), iː (eng), iː (eng), j (eng), j (eng), j (eng), j (eng), j (eng), j (eng), j (eng), k (eng), k (eng), kx (eng), kʰ (eng), kʰ (eng), kʰ (eng), kʰ (eng), l (eng), l (eng), l (eng), l (eng), l (eng), l (eng), l (eng), m (eng), m (eng), m (eng), m (eng), m (eng), m (eng), m (eng), n (eng), n (eng), n (eng), n (eng), n (eng), n (eng), n (eng), oe (eng), oʊ (eng), oʊ (eng), oː (eng), oː (eng), p (eng), pʰ (eng), pʰ (eng), pʰ (eng), pʰ (eng), pʰ (eng), pʰ (eng), r (eng), s (eng), s (eng), s (eng), s (eng), s (eng), s (eng), s (eng), t (eng), ts (eng), tʰ (eng), tʰ (eng), tʰ (eng), tʰ (eng), tʰ (eng), t̠ʃ (eng), t̠ʃ (eng), t̠ʃ (eng), t̠ʃ (eng), t̠ʃ (eng), t̠ʃ (eng), t̠ʃ (eng), u (eng), uː (eng), uː (eng), uː (eng), v (eng), v (eng), v (eng), v (eng), v (eng), v (eng), v (eng), w (eng), w (eng), w (eng), w (eng), w (eng), w (eng), w (eng), z (eng), z (eng), z (eng), z (eng), z (eng), z (eng), z (eng), æ (eng), æ (eng), æ (eng), æe (eng), æo (eng), ð (eng), ð (eng), ð (eng), ð (eng), ð (eng), ð (eng), ð (eng), øː (eng), ŋ (eng), ŋ (eng), ŋ (eng), ŋ (eng), ŋ (eng), ŋ (eng), ŋ (eng), ɐ (eng), ɐʉ (eng), ɐː (eng), ɑ (eng), ɑ (eng), ɑe (eng), ɑː (eng), ɑː (eng), ɑː (eng), ɒ (eng), ɒ (eng), ɒ (eng), ɒ (eng), ɒ (eng), ɒɯ (eng), ɒː (eng), ɔ (eng), ɔɪ (eng), ɔɪ (eng), ɔɪ (eng), ɔɪ (eng), ɔɪ (eng), ɔː (eng), ɔː (eng), ɔː (eng), ɘ 
(eng), ə (eng), ə (eng), ə (eng), ə (eng), ə (eng), əʊ (eng), ɚ (eng), ɚː (eng), ɛ (eng), ɛ (eng), ɛ (eng), ɛ (eng), ɛ (eng), ɛ (eng), ɛʉ (eng), ɛʉ (eng), ɛː (eng), ɜː (eng), ɡ (eng), ɡ (eng), ɡ (eng), ɡ (eng), ɡ (eng), ɡ (eng), ɡ (eng), ɪ (eng), ɪ (eng), ɪ (eng), ɪ (eng), ɪ (eng), ɪ (eng), ɪə (eng), ɵː (eng), ɹ (eng), ɹ (eng), ɹ (eng), ɹ (eng), ɹ (eng), ɹ (eng), ʃ (eng), ʃ (eng), ʃ (eng), ʃ (eng), ʃ (eng), ʃ (eng), ʃ (eng), ʉə (eng), ʉː (eng), ʉː (eng), ʉː (eng), ʊ (eng), ʊ (eng), ʊ (eng), ʊ (eng), ʊ (eng), ʊ (eng), ʊ (eng), ʊə (eng), ʌ (eng), ʌ (eng), ʌ (eng), ʒ (eng), ʒ (eng), ʒ (eng), ʒ (eng), ʒ (eng), ʒ (eng), ʒ (eng), θ (eng), θ (eng), θ (eng), θ (eng), θ (eng), θ (eng), θ (eng)]

While it might be a bit messy, do you think .values_with_allophones would be a good solution for this?

Otherwise I might turn .values into a function with an allophones flag that defaults to true (but can be turned off for internal library functionality).

I'm leaning towards the first, since most users probably don't expect the same phone to appear several times for a given language.

Merging all the allophone lists would be another possibility, but this obscures the original data and could lead to unexpected results. As far as I know, the different lists are simply compiled by different linguists.
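
To make the trade-off between these options concrete, here is a minimal sketch in plain Python. It does not use this library's classes: `Entry`, `unique_phones`, and `merged_allophones` are hypothetical names, and the rows are made up for illustration rather than taken from PHOIBLE.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for one inventory row: a phone as listed by one source."""
    phone: str
    lang: str
    source: str
    allophones: tuple


# Made-up sample rows, not real PHOIBLE data.
ENTRIES = [
    Entry("t", "eng", "source_a", ("t", "ɾ")),
    Entry("t", "eng", "source_b", ("t", "ʔ")),
    Entry("k", "eng", "source_a", ("k",)),
]


def unique_phones(entries):
    """One item per (phone, lang); allophone information is dropped."""
    return sorted({(e.phone, e.lang) for e in entries})


def merged_allophones(entries):
    """Union of all allophone lists per (phone, lang).

    This loses the information about which source listed which allophone.
    """
    merged = defaultdict(set)
    for e in entries:
        merged[(e.phone, e.lang)].update(e.allophones)
    return {key: sorted(vals) for key, vals in merged.items()}
```

Keeping `ENTRIES` as-is corresponds to the duplicated output shown above (the same phone once per source), `unique_phones` corresponds to the deduplicated view, and `merged_allophones` corresponds to the merging option, which is where the per-source detail disappears.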

@MiniXC
Owner

MiniXC commented Jan 4, 2023

Also, thanks for pointing out the warning. This will be fixed in 0.0.5 (see #11).
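
For reference, this pandas FutureWarning usually goes away once numeric_only is passed explicitly to the grouped mean. The sketch below uses a made-up DataFrame with hypothetical column names; it is not the library's code, and the actual fix in #11 may look different.

```python
import pandas as pd

# Made-up frame with one non-numeric column ("phone") and one numeric column ("voiced").
df = pd.DataFrame(
    {
        "lang": ["eng", "eng", "deu"],
        "phone": ["a", "b", "a"],
        "voiced": [1.0, 0.0, 1.0],
    }
)

# Stating numeric_only explicitly avoids relying on the deprecated default;
# the non-numeric "phone" column is left out of the result.
means = df.groupby("lang").mean(numeric_only=True)
```

The alternative mentioned in the warning text is to select only the numeric columns before calling mean.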

@agonzalezd
Author

agonzalezd commented Jan 4, 2023

Hm, I see. It is probably caused by merging multiple dialects into one language, since each dialect can have different allophones for the same phoneme. Another solution could be to merge all the allophones into a single phone, and to keep them separate only when the load_dialects flag is True. Do you think that would be more feasible?
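
A rough sketch of what I mean, in plain Python. The `collect_allophones` function and the row layout are hypothetical and only illustrate the idea; they are not the library's API, and the rows are made up rather than real PHOIBLE data.

```python
from collections import defaultdict

# Made-up rows: (phone, language, dialect, allophones). Not real PHOIBLE data.
ROWS = [
    ("t", "eng", "dialect_1", ["t", "ɾ"]),
    ("t", "eng", "dialect_2", ["t", "ʔ"]),
]


def collect_allophones(rows, load_dialects=False):
    """Group allophones per phone.

    With load_dialects=True the keys are (phone, lang, dialect), so each
    dialect keeps its own list. Otherwise the keys are (phone, lang) and the
    lists of all dialects are merged into one.
    """
    grouped = defaultdict(set)
    for phone, lang, dialect, allophones in rows:
        key = (phone, lang, dialect) if load_dialects else (phone, lang)
        grouped[key].update(allophones)
    return {key: sorted(vals) for key, vals in grouped.items()}
```

With the default, "t" in "eng" appears once with the merged list; with load_dialects=True it appears once per dialect, each with its own list.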

MiniXC added a commit that referenced this issue Jan 4, 2023
@MiniXC
Owner

MiniXC commented Jan 4, 2023

For now I have pushed a hotfix with values_with_allophones. I won't have time to look into this further in the next few weeks, but I'm happy to come back to it after that. If you want to tackle it, you're welcome to open a pull request.

MiniXC added the enhancement (New feature or request) label Jan 4, 2023
MiniXC changed the title from "Allophones are not loaded" to "dialect-specific allophones" Jan 4, 2023