Added Thai pron extraction #90

lfashby · 2019-11-08T16:58:47Z

Added tha.py to the extract directory. It is almost identical to khm.py; the only difference is the _IPA_XPATH variable, which, oddly enough, can be set to the same value as IPA_XPATH in default.py. The difference between tha.py and default.py is that tha.py should not use _yield_phn: we want to skip the for pron_xpath in request.html.xpath(config.pron_xpath_selector): step in _yield_phn and go straight to calling the core yield_pron function.
I suppose I could import IPA_XPATH from default.py into tha.py if that would make more sense?
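
To make the difference concrete, here is a minimal sketch of the two strategies. It is not the project's code: it uses the standard library's ElementTree instead of request.html, the selectors are simplified stand-ins, the function names yield_via_containers and yield_direct are hypothetical, and the page is invented with placeholder transcriptions.

```python
import xml.etree.ElementTree as ET
from typing import Iterator

# Hypothetical selectors, simplified to the XPath subset ElementTree supports.
PRON_XPATH_SELECTOR = ".//li[@class='pron']"
IPA_XPATH = ".//span[@class='IPA']"

# Invented page: one transcription inside a container, one outside it.
PAGE = """<html><body>
<ul><li class="pron">IPA: <span class="IPA">/pa/</span></li></ul>
<table><tr><td><span class="IPA">/ta/</span></td></tr></table>
</body></html>"""


def yield_via_containers(root: ET.Element) -> Iterator[str]:
    """Default-style: select pron containers first, then search inside each."""
    for container in root.iterfind(PRON_XPATH_SELECTOR):
        for span in container.iterfind(IPA_XPATH):
            yield span.text or ""


def yield_direct(root: ET.Element) -> Iterator[str]:
    """tha.py-style: skip the container loop and search the whole page."""
    for span in root.iterfind(IPA_XPATH):
        yield span.text or ""


root = ET.fromstring(PAGE)
container_prons = list(yield_via_containers(root))
direct_prons = list(yield_direct(root))
print(container_prons)
print(direct_prons)
```

In this toy page the container-first loop misses the transcription that sits outside the list item, while the direct search finds both. Whether real Thai entries are laid out this way is an assumption of the sketch.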

I'm checking in the Thai data as well; it's the first data we have collected using the new 'segments' package parsing.

I also changed logging in scrape.py to log only messages from within scrape.py, and made a few other changes that I noticed were needed while running the big scrape.
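
One common way to restrict output to a single script's messages is to attach the handler to a named logger rather than to the root logger. The sketch below shows that pattern only; it is not necessarily what scrape.py does, and the logger names and the ListHandler class are made up for illustration.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages in memory so the effect is easy to inspect."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(f"{record.name}: {record.getMessage()}")


handler = ListHandler()

# Hypothetical name standing in for the script's own logger.
script_logger = logging.getLogger("scrape_sketch")
script_logger.setLevel(logging.INFO)
script_logger.addHandler(handler)  # handler lives on this logger, not on root
script_logger.propagate = False    # do not pass records up to root handlers

# Hypothetical dependency with its own logger.
library_logger = logging.getLogger("some_library")

library_logger.warning("noisy message from a dependency")  # not recorded by our handler
script_logger.info("scraping started")                     # recorded

print(handler.messages)
```

Because the handler is attached to the named logger, records from other loggers never reach it.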

jacksonllee

Yay for Thai!

If we close #71 after this PR is merged, should we open another ticket for adding the --no-tone flag in the future? (I agree with Kyle that we don't need to work on this feature now.)

> I suppose I could import IPA_XPATH from default.py into tha.py if that would make more sense?

+1 for using IPA_XPATH from default.py instead. (We can think about how to refactor things further later.)

jacksonllee · 2019-11-08T17:16:13Z

Sorry I forgot this -- please add an entry to CHANGELOG.md for handling Thai.

lfashby · 2019-11-08T17:23:39Z

+1 for having a general --no-tone ticket.

The changelog entry is basically the same as Khmer's; I couldn't think of anything more creative.

jacksonllee

LGTM! Please use the "squash and merge" option for merging.

Just opened #91 for the "no tone" flag.

lfashby added 3 commits November 7, 2019 23:49

Added Thai pron extraction, ran Thai, removed duplicates, ran generate_summary.py

a94d2ef

Reworked logging in scrape.py

0d0b4bf

Merge branch 'master' into thai

d783f28

lfashby requested a review from jacksonllee November 8, 2019 16:59

Added thai to _SMOKE_TEST_LANGUAGES

abd47bc

jacksonllee reviewed Nov 8, 2019

Updated changelog, import IPA_PATH in tha.py from default.py

6d1b29d

jacksonllee mentioned this pull request Nov 8, 2019

Add a --no-tone flag #91

Closed

jacksonllee approved these changes Nov 8, 2019

lfashby merged commit 853e9fa into CUNY-CL:master Nov 8, 2019

lfashby deleted the thai branch November 8, 2019 17:31
