Reduced eng_us_phonemic.phones to eng branch #336

Merged
merged 110 commits into from
Jan 26, 2021

Conversation

cgibson6279
Collaborator

Reduced phonemic inventory in eng_us_phonemic.phones

lfashby and others added 30 commits June 17, 2020 16:05
* minor change to latin extraction function, rescraped Latin

* potential fix to lat scraping issue

* raw scrape of latin

* postprocessing of new latin data

* updated changelog, fixed line length error

* rescrape of latin

* postprocessing of updated latin data
* [pox] Scraped Polabian.

Note: The ISO 639-3 code is `pox`, the older ISO 639-2 code is `sla`.

* Updated CHANGELOG.
* [mnc] Scraped Manchu.

* Updated CHANGELOG.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
#184)

* Merged Whitelist functionality with src/scrape.py. Now checks for presence of whitelist and writes separate tsv as {original file name}_filtered.tsv. Update generate_summary to reflect if file is filtered through a whitelist. CHANGELOG and README update accordingly.

* Style tweaks and cleanup.

* Updated generalized_split and postprocess to reflect automatic whitelist processing in scrape. Fixed dialect issue in generate_summary.

* Previous edits didn't carry.

* Cleanup typo mistakes. Added error handling to scrape.py.

* Style clean-up.

* Fixed style issues.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [arc] Listing the correct scripts for Imperial Aramaic:

1.  The original Aramaic script (`armi`).
2.  The square script as in Biblical Aramaic (`hebr`).
3.  Classical Syriac/Assyrian Neo-Aramaic (`syrc`) descended from (1).

This correctly assigns the entries to their respective lexicons. Most
of the pronunciations are available for (2), with a very small number of
entries for (1) and (3).

* [arc] Listing the correct scripts for Imperial Aramaic (continuing the
previous commit which was partial):

1.  The original Aramaic script (`armi`).
2.  The square script as in Biblical Aramaic (`hebr`).
3.  Classical Syriac/Assyrian Neo-Aramaic (`syrc`) descended from (1).

This correctly assigns the entries to their respective lexicons. Most
of the pronunciations are available for (2), with a very small number of
entries for (1) and (3).

* Updated CHANGELOG with #186.
* tentative solution for tone removal

* updates changelog, ran white on test_config.py

* remove print statement from test_config.py

* partial replace of codepoints with chars, adds nfd/nfc conversion

* reworks import statements

* updates _TONES_REGEX

* ran white on config.py

* updates to conversions and adds comments

* fixes to scrape.py comment length

* converted test_config.py no_tone tests to nfd strings

* modifies no_tone process not to skip removing superscript parentheses around non-tone superscript chars
* [geo] Rescrape post-bot.

Closes #138.

* Add changelog

* Rename.

* Update CHANGELOG

* Revert "[geo] Rescrape post-bot."

This reverts commit 4a151b1.
* Flattens directory structure for data.

The non-wiki data is moved to the new `wikipron-extras` (https://github.com/kylebgorman/wikipron-extras) repository.

Closes #193.

* Add PR number to changelog.

* "Imperial"
* [geo] Rescrape post-bot.

Closes #138.

* Add changelog

* Update changelog

* [geo] Add whitelist and re-scrape.

* Renames for merge.

* Add link to guidelines
* Enforces consistent style in logging using %r.

* Updates CHANGELOG

* Fixes a double-quoted logging var.
* [rum] Add whitelist and rescrape.

* [eng] Adds English rescrape.

* [dut] Adds Dutch rescrape.

* [gre] Adds Greek rescrape.

* [gre] Adds Greek rescrape.

* Updates scrape path for phonetic filtering.

Closes #195.

* [rum] Adds Romanian rescrape.

* [arm] Adds Armenian rescrape.

* [gre] Adds Greek rescrape (second try).

* [arm] Adds Armenian dialects + rescrapes.

Closes #197.

* Adds CHANGELOG changes.

* [spa] Adds Spanish rescrape.

* Postprocess and regenerate summaries.
* adds tuvan to languagecodes.py

* updates changelog
* [aar, bdq, jje, lsi] discovers new languages and scrapes them.

* Fall scrape.
Fills out the bibliography entry for the WikiPron paper.
* updated languages.json and json files for translating between Wiktionary code and iso code

* updates codes.py and languagecodes.py

* modifies test_languagecodes.py to reduce redundancy with codes.py

* small formatting fixes

* updates changelog

* logging statement formatting
Fixes formatting issue in table. Not sure why this had to be done manually...
* Uses %r everywhere in `data/src`.

* [nep] Adds Nepali data.

Closes #209.

* Update changelog
* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones
* [izh] Scrape and add Ingrian.

* Updated CHANGELOG.
* [ban] Splitting Balinese into Latin and Balinese scripts.

* Updated CHANGELOG.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [kir] Split Kyrgyz into Cyrillic and Arabic scripts.

* Updated.
* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist
* [khb] Adding customized extractor for Lü.

* [khb] Re-scraping and updating the data and summaries.

* Updated CHANGELOG.

* Reordered imports.

* [khb] Adding scrape smoke test.

* Resorted.
kylebgorman and others added 25 commits December 31, 2020 11:17
* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3be.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3be.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
…295)

* Support some Moksha pronunciations that reside under "p", rather than
"li".

* Scrape.

* Attempt to fix the test.

* Updated.

* Split the PR into two items.
* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3be.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* updates segments version and adds test for vietnamese tones

* updates changelog
* Create German Phonelist

* Updated CHANGELOG.md

* incorporate updates in README.md, and added missing ger_phone* files
* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3be.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3be.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

* Adds Turkish phones, rescraped Turkish data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* adds afr phone list and rescrapes

* Updated CHANGELOG.md
* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3be.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

* Adds Turkish phones, rescraped Turkish data

* Updates changelog

* Adds Maltese phones, updated data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Frequency code tire-kick:

1. Increases typing.
2. No longer overwrites the .tsv files: adds `_freq.tsv` suffix instead.
3. Adds Khmer to JSON config. file.
4. Adds `shared_tasks` subdirectory for targeted config files.
5. Updates README.
* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3be.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

* Adds Turkish phones, rescraped Turkish data

* Updates changelog

* Adds Maltese phones, updated data

* Updates changelog

* Adds Latvian phones, updated Latvian data

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3be.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

* Adds Turkish phones, rescraped Turkish data

* Updates changelog

* Adds Maltese phones, updated data

* Updates changelog

* Adds Latvian phones, updated Latvian data

* Updates changelog

* Adds Khmer phones and updated TSV data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3be.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

* Adds Turkish phones, rescraped Turkish data

* Updates changelog

* Adds Maltese phones, updated data

* Updates changelog

* Adds Latvian phones, updated Latvian data

* Updates changelog

* Adds Khmer phones and updated TSV data

* Updates changelog

* Adds Østnorsk (Bokmål) phones and updated TSV data

* Updates changelog

* Fixes typo

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* scrape up to cantonese

* raw partial scrape - excludes yue, rus, cmn

* post-processing on partial scrape, src README fix

* re-ran generate_summary.py after resolving conflicts

* revert comment in scrape.py

* updates changelog, resolves formatting error
* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3be.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

* Adds Turkish phones, rescraped Turkish data

* Updates changelog

* Adds Maltese phones, updated data

* Updates changelog

* Adds Latvian phones, updated Latvian data

* Updates changelog

* Adds Khmer phones and updated TSV data

* Updates changelog

* Adds Østnorsk (Bokmål) phones and updated TSV data

* Updates changelog

* Fixes typo

* Update data/phones/README.md

* Update changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* cleaned up armenian phones

* cleaned up armenian phones (with more tidying up)

* cleaning up armenian (fixed changelog)

I had written the update on the wrong spot on the changelog + I added the issue number

* uncommented accidental gaps

* uncommented accidental gaps

* added voiceless allophones

* added missing geminate affricates
# Conflicts:
#	data/phones/eng_us_phonemic.phones
@kylebgorman kylebgorman self-requested a review January 26, 2021 21:52
Collaborator

@kylebgorman kylebgorman left a comment

LGTM.

@kylebgorman
Collaborator

This rather scary-looking PR just brings the branch up to date and adds the new, more restrictive eng_us_phonemic.phones. Approving.
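
To illustrate what a more restrictive `.phones` file implies for the filtered data, here is a minimal sketch. It assumes that a `.phones` file lists one phone per line with optional `#` comments, and that each pronunciation is a space-separated string of segments. The names `read_phones` and `filter_entries` are hypothetical and are not taken from the repository's scripts.

```python
import unicodedata
from typing import Iterable, Iterator, Set, Tuple


def read_phones(lines: Iterable[str]) -> Set[str]:
    """Collects allowed phones, one per line, ignoring `#` comments.

    The one-phone-per-line layout is an assumption made for this sketch.
    """
    phones: Set[str] = set()
    for line in lines:
        # Drops anything after a `#` and surrounding whitespace.
        phone = line.split("#", 1)[0].strip()
        if phone:
            phones.add(unicodedata.normalize("NFC", phone))
    return phones


def filter_entries(
    rows: Iterable[Tuple[str, str]], phones: Set[str]
) -> Iterator[Tuple[str, str]]:
    """Yields (word, pronunciation) pairs whose segments are all allowed."""
    for word, pron in rows:
        # Normalizes before comparing so equivalent encodings match.
        segments = unicodedata.normalize("NFC", pron).split()
        if all(segment in phones for segment in segments):
            yield word, pron
```

Under these assumptions, removing a phone from the inventory drops every entry whose pronunciation contains that phone, which is why a reduced inventory yields a smaller filtered word list.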

@kylebgorman kylebgorman merged commit f5c05d0 into CUNY-CL:eng Jan 26, 2021