Latin dialects #146
Note that this is not working as expected: the Classical and Ecclesiastical files are byte-for-byte identical and the former contains clear Ecclesiastical pronunciations (e.g., with affricates). Closes #143, or it will when/if it works.
This is not working as expected, once again. The Classical and Ecclesiastical files are bit-for-bit identical, and the former contains affricates (which are Ecclesiastical only). But when it does work, it'll close #143.
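The symptom described above can be checked mechanically. This is a minimal sketch, assuming each dialect's output is a two-column `word<TAB>pron` TSV. The helper names and the affricate list are illustrative assumptions, not part of WikiPron.

```python
import filecmp

# Illustrative assumption: affricate symbols that should only appear in the
# Ecclesiastical transcriptions (tied and untied variants).
AFFRICATES = ("t͡ʃ", "d͡ʒ", "t͡s", "tʃ", "dʒ")


def files_identical(path_a, path_b):
    """Return True if the two files contain exactly the same bytes."""
    # shallow=False forces a content comparison instead of an os.stat() check.
    return filecmp.cmp(path_a, path_b, shallow=False)


def rows_with_affricates(tsv_path):
    """Return (word, pron) rows whose pronunciation contains an affricate."""
    hits = []
    with open(tsv_path, encoding="utf-8") as source:
        for line in source:
            if not line.strip():
                continue  # skip blank lines
            # Assumes a well-formed two-column TSV: word<TAB>pron.
            word, pron = line.rstrip("\n").split("\t", 1)
            if any(affricate in pron for affricate in AFFRICATES):
                hits.append((word, pron))
    return hits
```

For example, `files_identical` returning True for the Classical and Ecclesiastical files reproduces the bug, and `rows_with_affricates` on the Classical file lists the offending entries.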
I'm probably missing something --
- "This is not working" (in the pull request description) -- what is not working?
- How does the whitelist work?
Do you mean that you scraped Latin after adding these dialects to …
Sorry, this is a terribly nondescript PR! Yes, when I run with a (truncated for speed) …
Also, a bunch of unrelated changes have ended up in this PR... I have too many PRs open at once. But the relevant one is to … The whitelist file is for a separate project we're working on: developing whitelists for languages/dialects where there are a lot of non-native pronunciations. We're really just focusing on English for now, but I did a Latin one as an example. Once we have a few of them, we'll add them to the post-processing procedure and generate "filtered" files as part of the big scrape.
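The whitelist idea can be sketched as a post-processing filter. This assumes a whitelist is a set of permitted phones (one per line, with `#` comments allowed) and that each scraped row is a `(word, pron)` pair with space-separated phones. The function names and file layout are hypothetical, not WikiPron's actual post-processing code.

```python
def load_whitelist(lines):
    """Build a set of allowed phones, skipping blank lines and # comments."""
    phones = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            phones.add(line)
    return phones


def filter_rows(rows, whitelist):
    """Keep only (word, pron) rows whose phones all appear in the whitelist."""
    kept = []
    for word, pron in rows:
        # Assumes pronunciations are segmented into space-separated phones.
        if all(phone in whitelist for phone in pron.split()):
            kept.append((word, pron))
    return kept
```

Because `load_whitelist` accepts any iterable of lines, it works equally on an open file or a list of strings. A row with a single non-whitelisted phone (say, an affricate in a Classical Latin file) is dropped entirely rather than repaired.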
I think when we added the Latin extraction function (and our other extraction functions that build their own pron selectors and don't rely on …
Okay, so if there's a language customization in …
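As far as I can tell, languages that do not interact with extract_word_pron_default (and in particular _yield_phn and its use of config.pron_xpath_selector) in extract/default.py do not have any dialect support, meaning all languages for which we have extraction functions just ignore dialects. I'm not sure what the optimal solution to this might be, but whatever it is will also help me refine the Vietnamese extraction function I put together, which handled dialects for Vietnamese by basically rebuilding the pron and dialect selectors in the extraction function.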
At the very least we should document (or log loudly?) this.
And while it's not super important, I'm not sure how useful the Latin data is without dialect specifications.
I feel like I'm out of my depth figuring out how that works, given how elaborate the Latin extraction function is.
On Sat, Apr 18, 2020 at 12:15 PM Lucas Ashby wrote:
> As far as I can tell, languages that do not interact with extract_word_pron_default (and in particular _yield_phn and its use of config.pron_xpath_selector) in extract/default.py do not have any dialect support, meaning all languages for which we have extraction functions just ignore dialects.
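The "log loudly" suggestion could look something like this sketch: before scraping, warn when a dialect is requested for a language whose extraction function is custom and may ignore it. `CUSTOM_EXTRACTORS` and `check_dialect_support` are hypothetical names for illustration. WikiPron's real registry of extraction functions may be organized differently.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical: languages that use a custom extraction function instead of
# the default extract_word_pron_default path (Latin and Vietnamese per this thread).
CUSTOM_EXTRACTORS = {"Latin", "Vietnamese"}


def check_dialect_support(language, dialect=None):
    """Log a warning and return False if a dialect filter will probably be ignored."""
    if dialect and language in CUSTOM_EXTRACTORS:
        logger.warning(
            "Dialect %r requested for %s, but %s uses a custom extraction "
            "function that may ignore dialect labels.",
            dialect,
            language,
            language,
        )
        return False
    return True
```

Returning a boolean as well as logging lets a caller decide whether to continue anyway or abort the scrape.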
I created an issue to track this.
Lucas is correct that languages which require a non-default extraction treatment ignore the dialect labels unless they are specifically used in the respective extraction function. We should figure out how the extraction module for Latin can make use of …
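The behavior described above can be modeled in a few lines. In this simplified sketch, each pronunciation carries the dialect labels found next to it. The default path filters on the requested dialect, while a custom extractor that builds its own selection never looks at the labels, so every dialect run yields the same output. The data model and function names are hypothetical stand-ins, not WikiPron's XPath-based code (`config.pron_xpath_selector`, `_yield_phn`).

```python
from dataclasses import dataclass, field


@dataclass
class PronEntry:
    """One pronunciation scraped from a page, with its dialect labels."""
    pron: str
    dialects: set = field(default_factory=set)


def restrict_to_dialect(entries, dialect=None):
    """Shared filter: keep entries labelled with `dialect` (all entries if None)."""
    if dialect is None:
        return list(entries)
    return [entry for entry in entries if dialect in entry.dialects]


def default_extract(entries, dialect=None):
    """Like the default path: applies the dialect restriction."""
    return [entry.pron for entry in restrict_to_dialect(entries, dialect)]


def custom_extract(entries, dialect=None):
    """Like a custom extractor with its own selectors: `dialect` is ignored."""
    return [entry.pron for entry in entries]
```

One possible direction, under these assumptions, is for a custom extractor to call the shared `restrict_to_dialect` filter before its language-specific processing, so that runs for different dialects actually diverge.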
This is quite out of date, but ongoing work is happening on #143.
This is not working (the dialect files are identical), but I am sharing it in case I have missed something.