Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Thai pron extraction #90

Merged
merged 5 commits into from
Nov 8, 2019
Merged

Added Thai pron extraction #90

merged 5 commits into from
Nov 8, 2019

Conversation

lfashby
Copy link
Collaborator

@lfashby lfashby commented Nov 8, 2019

Added tha.py to extract directory. It is almost identical to khm.py, the only difference is in the _IPA_XPATH variable, which oddly enough, can be set to the same value as IPA_XPATH in default.py. The difference between tha.py and default.py is that we don't want to use _yield_phn for tha.py as we want to skip the for pron_xpath in request.html.xpath(config.pron_xpath_selector): step in _yield_phn and just go straight to calling the core yield_pron function.
I suppose I could import IPA_XPATH from default.py into tha.py if that would make more sense?

I'm checking in the Thai data as well, it's our first data collected using the new 'segments' package parsing.

I also changed logging in scrape.py to just log messages from within scrape.py (as well as a few changes I noticed should be made running the big scrape).

Copy link
Collaborator

@jacksonllee jacksonllee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay for Thai!

If we close #71 after this PR is merged, should we open another ticket for adding the --no-tone flag in the future? (I agree with Kyle that we don't need to work on this feature now.)

I suppose I could import IPA_XPATH from default.py into tha.py if that would make more sense?

+1 for using IPA_XPATH from default.py instead. (We can think about how to refactor things further later.)

@jacksonllee
Copy link
Collaborator

Sorry I forgot this -- please add an entry to CHANGELOG.md for handling Thai.

@lfashby
Copy link
Collaborator Author

lfashby commented Nov 8, 2019

+1 for having a general --no-tone ticket.

The changelog entry is basically the same as Khmer's, I couldn't think of anything more creative.

@jacksonllee jacksonllee mentioned this pull request Nov 8, 2019
Copy link
Collaborator

@jacksonllee jacksonllee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Please use the "squash and merge" option for merging.

Just open #91 for the "no tone" flag.

@lfashby lfashby merged commit 853e9fa into CUNY-CL:master Nov 8, 2019
@lfashby lfashby deleted the thai branch November 8, 2019 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants