Added Thai pron extraction #90
Yay for Thai!
If we close #71 after this PR is merged, should we open another ticket for adding the `--no-tone` flag in the future? (I agree with Kyle that we don't need to work on this feature now.)
I suppose I could import IPA_XPATH from default.py into tha.py if that would make more sense?
+1 for using `IPA_XPATH` from `default.py` instead. (We can think about how to refactor things further later.)
Sorry I forgot this -- please add an entry to CHANGELOG.md for handling Thai.
+1 for having a general
The changelog entry is basically the same as Khmer's, I couldn't think of anything more creative.
LGTM! Please use the "squash and merge" option for merging.
Just open #91 for the "no tone" flag.
Added `tha.py` to extract directory. It is almost identical to `khm.py`, the only difference is in the `_IPA_XPATH` variable, which oddly enough, can be set to the same value as `IPA_XPATH` in `default.py`. The difference between `tha.py` and `default.py` is that we don't want to use `_yield_phn` for `tha.py` as we want to skip the `for pron_xpath in request.html.xpath(config.pron_xpath_selector):` step in `_yield_phn` and just go straight to calling the core `yield_pron` function.

I suppose I could import `IPA_XPATH` from `default.py` into `tha.py` if that would make more sense?

I'm checking in the Thai data as well, it's our first data collected using the new 'segments' package parsing.
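
To make the difference between the default flow and the Thai flow concrete, here is a rough, self-contained sketch. It is not the code in this PR: the XPath values, the toy page, and the function names `extract_default` and `extract_thai` are all made up for illustration, and `yield_pron` here is a simplified stand-in built on the standard library rather than the project's real function.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical stand-in for IPA_XPATH in default.py; the real value may differ.
IPA_XPATH = ".//span[@class='IPA']"
# Hypothetical stand-in for config.pron_xpath_selector.
PRON_XPATH_SELECTOR = ".//li[@class='pron']"


def yield_pron(element, ipa_xpath):
    """Simplified stand-in: yield the text of every IPA span under `element`."""
    for span in element.findall(ipa_xpath):
        text = "".join(span.itertext()).strip()
        if text:
            yield text


def extract_default(word, root):
    """Default-style flow: loop over pron containers first, then yield prons."""
    for container in root.findall(PRON_XPATH_SELECTOR):
        for pron in yield_pron(container, IPA_XPATH):
            yield word, pron


def extract_thai(word, root):
    """Thai-style flow: skip the container loop and call yield_pron directly."""
    words = itertools.repeat(word)
    yield from zip(words, yield_pron(root, IPA_XPATH))


# Toy page (invented): one IPA span sits inside a pron container, one does not.
PAGE = ET.fromstring(
    "<div>"
    "<li class='pron'><span class='IPA'>/a/</span></li>"
    "<table><tr><td><span class='IPA'>/b/</span></td></tr></table>"
    "</div>"
)

default_pairs = list(extract_default("w", PAGE))
thai_pairs = list(extract_thai("w", PAGE))
```

In this toy page the default-style flow only sees the span inside the container, while the Thai-style flow picks up every IPA span on the page, which is the effect of skipping the `pron_xpath_selector` loop.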
I also changed logging in scrape.py to just log messages from within scrape.py (as well as a few changes I noticed should be made running the big scrape).
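
One way to restrict output to messages from a single module is a name-based filter on the handler. This is only a sketch of that general technique with the standard `logging` module, assuming a logger named `"scrape"`; it is not necessarily how `scrape.py` does it.

```python
import io
import logging

# Capture output in memory so the effect of the filter is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
# logging.Filter(name) passes only records from that logger or its children.
# The logger name "scrape" is an assumption for this example.
handler.addFilter(logging.Filter("scrape"))

root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)

logging.getLogger("scrape").info("kept")
logging.getLogger("some_library").info("dropped")

logged = stream.getvalue()
```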