Reduced eng_us_phonemic.phones to eng branch (#336)
* Update to Latin pron selector (#183)

* minor change to latin extraction function, rescraped Latin

* potential fix to lat scraping issue

* raw scrape of latin

* postprocessing of new latin data

* updated changelog, fixed line length error

* rescrape of latin

* postprocessing of updated latin data

* [pox] Scraped Polabian. (#186)

* [pox] Scraped Polabian.

Note: The ISO 639-3 code is `pox`, the older ISO 639-2 code is `sla`.

* Updated CHANGELOG.

* [mnc] Scraped Manchu. (#185)

* [mnc] Scraped Manchu.

* Updated CHANGELOG.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* Merged Whitelist functionality with src/scrape.py. Now checks for pre… (#184)

* Merged Whitelist functionality with src/scrape.py. Now checks for presence of whitelist and writes separate tsv as {original file name}_filtered.tsv. Update generate_summary to reflect if file is filtered through a whitelist. CHANGELOG and README update accordingly.

* Style tweaks and cleanup.

* Updated generalized_split and postprocess to reflect automatic whitelist processing in scrape. Fixed dialect issue in generate_summary.

* Previous edits didn't carry.

* Cleanup typo mistakes. Added error handling to scrape.py.

* Style clean-up.

* Fixed style issues.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
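
The whitelist step described in #184 can be sketched roughly as below. This is a minimal illustration, not the code in `src/scrape.py`: the function names are hypothetical, and it assumes each TSV row is a (word, pronunciation) pair whose pronunciation is a space-separated sequence of segments.

```python
import os
from typing import Iterable, List, Set, Tuple


def filtered_path(tsv_path: str) -> str:
    """Derives the output name: {original file name}_filtered.tsv."""
    root, ext = os.path.splitext(tsv_path)
    return f"{root}_filtered{ext}"


def filter_rows(
    rows: Iterable[Tuple[str, str]], whitelist: Set[str]
) -> List[Tuple[str, str]]:
    """Keeps rows whose pronunciation uses only whitelisted segments."""
    kept = []
    for word, pron in rows:
        # Assumption: segments are separated by single spaces.
        if all(segment in whitelist for segment in pron.split()):
            kept.append((word, pron))
    return kept
```

Rows containing any segment outside the whitelist are dropped whole rather than repaired, which keeps the filtered file a strict subset of the original.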

* Imperial aramaic (#187)

* [arc] Listing the correct scripts for Imperial Aramaic:

1.  The original Aramaic script (`armi`).
2.  The square script as in Biblical Aramaic (`hebr`).
3.  Classical Syriac/Assyrian Neo-Aramaic (`syrc`) descended from (1).

This correctly assigns the entries to their respective lexicons. Most
of the pronunciations are available for (2), with a very small number of
entries for (1) and (3).

* [arc] Listing the correct scripts for Imperial Aramaic (continuing the
previous commit which was partial):

1.  The original Aramaic script (`armi`).
2.  The square script as in Biblical Aramaic (`hebr`).
3.  Classical Syriac/Assyrian Neo-Aramaic (`syrc`) descended from (1).

This correctly assigns the entries to their respective lexicons. Most
of the pronunciations are available for (2), with a very small number of
entries for (1) and (3).

* Updated CHANGELOG with #186.
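
One way to picture the per-script assignment for `arc` is the sketch below. It is an assumption-laden illustration, not the project's mechanism: `guess_script` is a hypothetical helper that inspects Unicode character names, whereas the real configuration may well use script-property patterns instead.

```python
import unicodedata
from typing import Optional

# Unicode character-name prefixes mapped to the script codes listed above.
_PREFIXES = {
    "IMPERIAL ARAMAIC": "armi",
    "HEBREW": "hebr",
    "SYRIAC": "syrc",
}


def guess_script(word: str) -> Optional[str]:
    """Returns the script code of the first recognizable character."""
    for char in word:
        name = unicodedata.name(char, "")
        for prefix, code in _PREFIXES.items():
            if name.startswith(prefix):
                return code
    return None
```

Entries that return `None` would belong to none of the three lexicons.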

* Add --no-tone flag (#188)

* tentative solution for tone removal

* updates changelog, ran white on test_config.py

* remove print statement from test_config.py

* partial replace of codepoints with chars, adds nfd/nfc conversion

* reworks import statements

* updates _TONES_REGEX

* ran white on config.py

* updates to conversions and adds comments

* fixes to scrape.py comment length

* converted test_config.py no_tone tests to nfd strings

* modifies no_tone process not to skip removing superscript parentheses around non-tone superscript chars
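
The tone-removal approach in #188 (decompose, strip, recompose) might look something like the following. The character class here is an assumption for illustration only, covering Chao tone letters and superscript digits; the project's actual `_TONES_REGEX` is not shown in this log, and `strip_tones` is a hypothetical name.

```python
import re
import unicodedata

# Assumed inventory: Chao tone letters (U+02E5-U+02E9) and superscript
# digits. This is NOT the project's _TONES_REGEX.
_TONES = re.compile("[\u02e5-\u02e9\u2070\u00b9\u00b2\u00b3\u2074-\u2079]+")


def strip_tones(pron: str) -> str:
    """Decomposes to NFD, removes tone marks, then recomposes to NFC."""
    decomposed = unicodedata.normalize("NFD", pron)
    return unicodedata.normalize("NFC", _TONES.sub("", decomposed))
```

Working on the NFD form means a pattern never has to account for precomposed characters; the final NFC step restores the usual composed form for everything that remains.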

* Rename (#192)

* [geo] Rescrape post-bot.

Closes #138.

* Add changelog

* Rename.

* Update CHANGELOG

* Revert "[geo] Rescrape post-bot."

This reverts commit 4a151b13e0e03e7a4aecb7dad29c1de9c2230f10.

* Flattens directory structure for data. (#194)

* Flattens directory structure for data.

The non-wiki data is moved to the new `wikipron-extras` (https://github.com/kylebgorman/wikipron-extras) repository.

Closes #193.

* Add PR number to changelog.

* "Imperial"

* [geo] Rescrape post-bot. (#191)

* [geo] Rescrape post-bot.

Closes #138.

* Add changelog

* Update changelog

* [geo] Add whitelist and re-scrape.

* Renames for merge.

* Add link to guidelines

* [hun] Adds whitelist.

* Simplify postprocess

* Enforces consistent style in logging using %r. (#196)

* Enforces consistent style in logging using %r.

* Updates CHANGELOG

* Fixes a double-quoted logging var.
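
For reference, the `%r` logging style looks like this in practice. The logger name and the `report` function are hypothetical; only the use of `%r` with lazy interpolation reflects what #196 describes.

```python
import logging

logger = logging.getLogger("wikipron_example")  # Hypothetical logger name.


def report(path: str) -> None:
    """Logs a path using %r so that its repr appears in the message."""
    # The logging module interpolates the argument lazily, and %r already
    # adds quotes around strings, so none are written in the template.
    logger.info("Writing to %r", path)
```

Because `%r` supplies its own quotes, wrapping it in quotes in the template would quote the value twice, which is presumably the kind of slip the last commit in this group fixes.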

* Filtering (#199)

* [rum] Add whitelist and rescrape.

* [eng] Adds English rescrape.

* [dut] Adds Dutch rescrape.

* [gre] Adds Greek rescrape.

* [gre] Adds Greek rescrape.

* Updates scrape path for phonetic filtering.

Closes #195.

* [rum] Adds Romanian rescrape.

* [arm] Adds Armenian rescrape.

* [gre] Adds Greek rescrape (second try).

* [arm] Adds Armenian dialects + rescrapes.

Closes #197.

* Adds CHANGELOG changes.

* [spa] Adds Spanish rescrape.

* Postprocess and regenerate summaries.

* [aar, bdq, jje, lsi] discovers new languages and scrapes them. (#202)

* Added tyv to languagecodes.py (#203)

* adds tuvan to languagecodes.py

* updates changelog

* Fall scrape (#204)

* [aar, bdq, jje, lsi] discovers new languages and scrapes them.

* Fall scrape.

* Fuller bib information

Fills out the bibliography entry for the WikiPron paper.

* Updates to codes.py (#205)

* updated languages.json and json files for translating between Wiktionary code and iso code

* updates codes.py and languagecodes.py

* modifies test_languagecodes.py to reduce redundancy with codes.py

* small formatting fixes

* updates changelog

* logging statement formatting

* Update README.md

Fixes formatting issue in table. Not sure why this had to be done manually...

* ENH rename '.whitelist' as '.phones' (#207)

* Uses %r everywhere in `data/src`. (#210)

* Nepali support (#211)

* Uses %r everywhere in `data/src`.

* [nep] Adds Nepali data.

Closes #209.

* Update changelog

* [fre] Adds phoneme list (#213)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* [izh] Scrape and add Ingrian. (#215)

* [izh] Scrape and add Ingrian.

* Updated CHANGELOG.

* [ban] Splitting Balinese into Latin and Balinese scripts. (#214)

* [ban] Splitting Balinese into Latin and Balinese scripts.

* Updated CHANGELOG.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [kir] Split Kyrgyz into Cyrillic and Arabic scripts. (#216)

* [kir] Split Kyrgyz into Cyrillic and Arabic scripts.

* Updated.

* Added fre_phonemic_filtered.tsv (#217)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Refresh the database size counter. (#220)

* [khb] Customized extractor and re-scraping of Lü. (#219)

* [khb] Adding customized extractor for Lü.

* [khb] Re-scraping and updating the data and summaries.

* Updated CHANGELOG.

* Reordered imports.

* [khb] Adding scrape smoke test.

* Resorted.

* FIX specify UTF-8 in handling text files (#221)

It looks like Windows users have encountered encoding errors:
they hit `UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in
position 2882: character maps to <undefined>` when pip-installing
wikipron, with the error triggered in setup.py.
While we're at it, we specify UTF-8 encoding for all open() calls
for text processing as well.

Co-authored-by: jacksonllee <jacksonlunlee@gmail.com>
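
The fix in #221 amounts to passing `encoding="utf-8"` explicitly. A minimal sketch with hypothetical helper names:

```python
def write_text(path: str, text: str) -> None:
    """Writes text as UTF-8 regardless of the platform default."""
    with open(path, "w", encoding="utf-8") as sink:
        sink.write(text)


def read_text(path: str) -> str:
    """Reads a UTF-8 text file."""
    # Without encoding=..., open() falls back to the locale's preferred
    # encoding, which on many Windows setups is a legacy code page.
    with open(path, encoding="utf-8") as source:
        return source.read()
```

Omitting the argument works on most Linux and macOS machines only because their default already happens to be UTF-8.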

* [mga] New scrape: Middle Irish. (#224)

* [mga] New scrape: Middle Irish.

* Updated CHANGELOG.

* [cos] New scrape: Corsican. (#222)

* [cos] Add Corsican to the language code registry.

* [cos] Scraped Corsican and updated the language descriptions.

* Updated CHANGELOG.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [okm] New scrape: Middle Korean (#223)

* [okm] Adding ISO 639-3-only Middle Korean: Korean, Middle (10th–16th centuries).

* [okm] New scrape of Middle Korean and update of indices and descriptions.

* Updated CHANGELOG.

* Fixed typo.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [opt] New scrape: Old Portuguese (aka Galician-Portuguese). (#225)

* Adding Old Portuguese (aka Galician-Portuguese) codes.

* [opt] New scrape.

* [opt] Updated summaries.

* Updated CHANGELOG.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* Added Serbo-Croatian phonemes and filtered TSV files. (#227)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [shn] Custom extractor and new scrape for Shan (#229)

* [shn] Adding customized extractor for Shan.

* [shn] Adding smoke test.

* [shn] New scrape for Shan.

* [shn] Updated descriptions.

* Updated CHANGELOG.

* [tyv] New scrape: Tuvan (#228)

* [tyv] Tuvan scrape.

* [tyv] Updated descriptions.

This also fixes a buggy previous merge of `okm`.

* [tyv] Filtering Tuvan to use Cyrillic script only.

* Updated CHANGELOG.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* Reorganizes tests and adds a few initial tests for the data side (#226)

* moves wikipron module tests into subdirectory

* reformatting of test_version.py

* adds outline of test for data naming conventions, removes nonsense from src/scrape.py

* basic framework for testing file creation involved in big scrape

* renamed file naming test and added comments

* reorganizes tests directory, adds test for generate_summary.py

* fix formatting in test_version.py

* revises and renames file for testing scrape

* fixes pathing issue in init

* adds some typing to new tests

* changes open statements to use proper encoding

* potential solution to circleci module error

* approaching a circleci import solution?

* updates changelog

* [hbs] Fix file naming.

* Update README.md

* Update README.md

* [gre] Takes advantage of upstream consistency fix.

Closes #198.

* [lat] Split Latin into its dialects (#233)

* ENH handle Latin dialects

* RM remove unwanted Latin TSVs

* ENH add Latin dialect TSVs

* ENH postprocess Latin dialect TSVs

* ENH update data summary readme

* MAINT update changelog

* MAINT update changelog

* Add HTTP User-Agent header to API calls (#234)

* add http headers for get requests

* add http headers for get requests in tests/

* change wikipron/scrape.py code to avoid circular imports

* updated requirements.txt to have the latest dependencies (#238) (#239)

* Update requirements.txt

* Update CHANGELOG.md

* Update CHANGELOG.md

* Added support for Python-3.9 (#236) (#240)

* Update requirements.txt

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update config.yml

* Update setup.py

* Update CHANGELOG.md

* Add black formatting (#242)

* add black formatting (fix #237)

* update changelog

* [kmr] New scrape: Northern Kurdish (#243)

* [kmr] Adding an entry for Northern Kurdish.

* [kmr] Adding an ISO mapping for Northern Kurdish.

* [kmr] Fresh scrape.

* [kmr] Updated description and summaries.

* [kmr] Updated CHANGELOG.

* [kmr] Lower-cased version.

* [kmr] Silly. Source should be lower-case.

* Update CHANGELOG.md

Minor style fixes to CHANGELOG

* MAINT reorganize changelog (#244)

* Add logging for dialect support for languages requiring custom extraction logic (#245)

* ENH alert the use of custom logic when dialect is specified

* MAINT update changelog

* Add a script to facilitate the creation of .phones files (#246)

* ENH add script to tally phones/phonemes in a TSV

* DOC update readme for the .phones files

* MAINT update changelog

* DOC comments in list_phones.py

* MAINT update changelog

* DOC update docstrings and readme
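
A tally of the kind #246 describes could be as small as the sketch below. It is not the contents of `list_phones.py`; `tally_phones` is a hypothetical name, and the sketch assumes two tab-separated columns with space-separated segments in the second.

```python
import collections
from typing import Counter, Iterable


def tally_phones(lines: Iterable[str]) -> Counter[str]:
    """Counts how often each segment occurs in the pronunciation column."""
    counts: Counter[str] = collections.Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumption: column 1 is the word, column 2 the pronunciation.
        _, pron = line.split("\t", 1)
        counts.update(pron.split())
    return counts
```

Sorting the result with `counts.most_common()` puts rare segments at the end, where likely transcription errors tend to show up when drafting a `.phones` file.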

* Use mypy for type checking (#247)

* ISSUE-241: Ignoring 'env' and '.idea' directories

* ISSUE-241: Added 'mypy' to 'requirements.txt'

* ISSUE-241: Added 'Type checking' step to CircleCI

* ISSUE-241: Fixed mypy issues

* ISSUE-241: Updated documentation

* ISSUE-241: Added mypy to the correct 'requirements.txt'

* ISSUE-241: Ran Black formatter

Also updated the contribution guidelines to include this as a step

* ISSUE-241: Markups

ISSUE-241: Markup - Alphabetised 'requirements.txt'
ISSUE-241: Markup - Log invalid page title
ISSUE-241: Markup - Alphabetised 'test_scrape.py' imports
ISSUE-241: Markup - Added explanatory comment
ISSUE-241: Markup - Improved 'config_dict' typing
ISSUE-241: Markup - Improved 'scrape.py' typing

* ISSUE-241: Markup - Using logger interpolation

* ISSUE-241: Markups

* ISSUE-241: Markup - Added working dir to Circle CI config

* split tildes; resort (#250)

* split tildes; resort

* update CHANGELOG.md

* Improve CircleCI workflow with orbs (#249)

* Convert to matrix CircleCI workflow

* Fix typo in parameter

* Add missing job name

* Add CircleCI test storage

* Add Python orb and caching

* Fix orb command

* Set Python deps install to global scope

* Bump up Python orb version

* Fix command nesting

* Add package manager to orb command

* Fix pyenv cache failure

* Fix pyenv cache

* Add workspace cache for pip packages

* Fix username typo

* Fix permission error

* Test pre-built CircleCI Docker image

* Test missing site-packages

* Test missing Python dir

* Add verbose pip list

* Add pre test jobs

* Fix parameter substitution in description

* Fix extraneous run

* Add parametrized flake8 and black jobs

* Fix parameter passing

* Fix unreferenced parameter

* Fix pre-test Docker image tag

* Show xml coverage

* Add pre-test Python cache

* Create tsv directory

* Chown /home to circleci

* Fix store_results path

* Rename pre-test jobs

* Improve CircleCI configuration

Add Python orb, matrix jobs and rework workflow structure

* Improve CircleCI configuration

Add Python orb, matrix jobs and rework workflow structure

* Bump up pre-build Python version to 3.9

* Add mypy to pre-build jobs

* Add mypy to build required jobs

* Change pip3 to pip

* Add PR to CHANGELOG.md

* Disable circleci user chowning /home

* Revert "Disable circleci user chowning /home"

This reverts commit eed32d6f3ab9c2094a642cc23967c536ad5bddb5.

* Disable pyenv creation

* Revert "Disable pyenv creation"

This reverts commit 68297c21c1c2f4dc67e2bc9bd7972adbeea3878b.

* Disable pyenv creation

* Test pip cache renewal

* Revert "Test pip cache renewal"

This reverts commit b4772307ded407da0fedfc4320b3594f66d366fa.

Cache works as intended, references
https://github.com/kylebgorman/wikipron/pull/249#discussion_r511582495.

Co-authored-by: Jackson L. Lee <jacksonlunlee@gmail.com>

* Small path changes on the data side, rework of test_scrape.py (#251)

* rework some paths on data side, simplify test_scrape.py

* revert changes to test_summary.py

* updates changelog

* Adding a sanity check for valid IPA (#248)

* Check that the phones/phonemes are valid IPA.

* Only print the bad characters.

* Updated CHANGELOG.

* Reformatted the file using black.

* Reran black with line length limit.

* Phonemes, rather than phones.

* Sorted the packages alphabetically.

* Re-arranged imports.

* Moved ipapy into data-specific requirements file.

* Adding dependency on absl-py (for logging) and factoring out the phoneme
checking functionality into its own function.

* Added a link to IPA chart.

* Removed absl-py.

* Use internal logger.

* Check the logging level.

* Moving to global logger.

Thanks Kyle!

* reformatted.

* Cosmetic: fixed warning message.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* Style fixes for list_phones (#254)

* Style fixes for list_phones

* Ran black formatter.

* Remove `<5`.

* Negative flags are renamed to positive statements (#141) (#255)

* Negative flags in cli.py are renamed to positive statements. In order to accommodate this change, Wikipron/config.py and tests/test_wikipron/test_config.py are also edited accordingly.

* positive flags are added and negative flags are renamed to positive ones.

* changelog is updated.

* style edit

* fix fix redundancy

Co-authored-by: unknown <Yeonju@NYCMAXASIKKAW10.ad.insidemedia.net>

* Clean up flag help and eliminate remaining double negatives (#257)

* Work on flags:

1. Flag help should be short, because people don't read it very
carefully and it's not formatted for multi-sentence input. This shortens
all the flags to a single, consistent name. Because dialect and
segmentation require more information, these details have been moved
into a prominent position in the README instead.
2. The tone and space flags are given negative versions, cf. what Yeonju
did earlier.

* Eliminates double-negative in skip-spaces.

* Updates changelog.

* Updates tests, config, core.

* Fixed missing test_scrape change.
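
Paired positive and negative flags of the kind discussed in #255 and #257 can be expressed in `argparse` by pointing two options at one destination. The sketch below is illustrative only: `build_parser`, the flag pair, and the help strings are assumptions, not a copy of `cli.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Builds a parser with a positive flag and its negative counterpart."""
    parser = argparse.ArgumentParser()
    # Both options write to the same attribute, so downstream code tests
    # a positive statement (args.tone) and never a double negative.
    parser.add_argument(
        "--tone",
        dest="tone",
        action="store_true",
        default=True,
        help="keeps tone marks (default)",
    )
    parser.add_argument(
        "--no-tone",
        dest="tone",
        action="store_false",
        help="removes tone marks",
    )
    return parser
```

Short, single-clause help strings like these match the guidance in point 1 above.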

* Adds test for TSV splitting (#256)

* fixes to split.py and postprocess before adding tests

* cleanup of test_split

* updated a few comments in test_split

* revert needless changes to postprocess and split

* minor comment update in test_split

* updates changelog

* Updates data side to use new flags (#258)

* quick fix to small oversight in test_extract.py

* data side uses new flags

* updates changelog, removes config_factory from test_extract.py

* [ita] Adds phoneme list, filtered phonemic TSV file (#261)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [ady] Adds phone list, filtered Adyghe data. (#263)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* Moves `list_phones.py`. (#266)

* Moves `list_phones.py`.

Closes #265.

* Add changelog

* [bul] Adds phone list, filtered Bulgarian data (#267)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* Adds Icelandic phone list, filtered Icelandic data. (#270)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [slv] Adds Slovenian phoneme list, filtered TSV data. (#273)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* Adds normalization to `list_phones.py`, corrects bugs relating to `ipapy` (#275)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* Adds Welsh .phones lists, filtered TSV data (#276)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [yue] Handle Cantonese for scraping (#277)

* ENH handle Cantonese for scraping

* MAINT update changelog

* DOC explain Cantonese pron XPath template

* Updates `data/phones/README.md` with instructions to re-scrape (#281)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates `data/phones/README.md` with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [vie] Adds Vietnamese `.phones` files, `.tsv` files  (#283)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates `data/phones/README.md` with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [hin] Adds `phones` file, updated/new TSV files. (#284)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates `data/phones/README.md` with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi `phones` file, new/updated TSV files

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [hbs] Fixes Serbo-Croatian phoneme lists. Re-scrapes data. (#288)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates `data/phones/README.md` with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi `phones` file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi `phones` file, new/updated TSV files"

This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [ofs] Scraped Old Frisian. (#294)

* Add Old Frisian to the configuration.

* Mark "ofs" as ISO639-3 language code.

* Fixed language name.

* Added phonemic pronunciations.

* Updated.

* [aar] Rescraped Afar. (#291)

* [aar] Rescraped.

* Updated.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [dng] Scraped Dungan. (#293)

* [dng] Scraped Dungan.

* Updated.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [bre] Rescraped Breton. (#292)

* [bre] Rescraped.

* Updated.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* Covering grammar script (#297)

* Adds covering grammar generator, for future QA work.

Also moves `list_phones.py` to the src directory, which makes sense to
me.

* Changelog update.

* Updates CHANGELOG with issue number.

* Update README.md

Fix syntax highlighting hints.

* [ltg] Scraped Latgalian. (#296)

* [ltg] Scraped Latgalian.

* Forgot to include the actual data.

* Updated.

* Removes reconstructions (#302)

* Adds covering grammar generator, for future QA work.

Also moves `list_phones.py` to the src directory, which makes sense to
me.

* Changelog update.

* Updates CHANGELOG with issue number.

* Skips reconstructions during scraping.

Then, rescrapes Latin to take advantage of this.

* Adds number to changelog.

* Updates CHANGELOG for >>> junk.

* Rescrapes Armenian. (#303)

Closes #301.

* [por] Adds phones files, rescraped TSV files. (#304)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [bur] Adds Burmese phone list, re-scraped Burmese data. (#305)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [mdf] Scrape Moksha + slightly more flexible default pron selector. (#295)

* Support some Moksha pronunciations that reside under "p", rather than
"li".

* Scrape.

* Attempt to fix the test.

* Updated.

* Split the PR into two items.

* [jpn] Adds Japanese .phones file and updated TSV files (#307)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* Updates segments version (#308)

* updates segments version and adds test for vietnamese tones

* updates changelog

* [ger] Adds German Phone list, filtered TSV file (#309)

* Create German Phonelist

* Updated CHANGELOG.md

* incorporate updates in README.md, and added missing ger_phone* files

* Adds some whitespace to German phone list comments. (#310)

* [aze] Adds Azerbaijani phone lists and updated TSV data (#312)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [tur] Adds Turkish phone list and updated TSV data (#314)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

* Adds Turkish phones, rescraped Turkish data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [afr] adds phone list for Afrikaans and updated TSV files (#316)

* adds afr phone list and rescrapes

* Updated CHANGELOG.md

* [mlt] Adds Maltese phones file and updates TSV data (#318)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

* Adds Turkish phones, rescraped Turkish data

* Updates changelog

* Adds Maltese phones, updated data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* Frequency code tire-kick (#320)

* Frequency code tire-kick:

1. Increases typing.
2. No longer overwrites the .tsv files: adds `_freq.tsv` suffix instead.
3. Adds Khmer to JSON config. file.
4. Adds `shared_tasks` subdirectory for targeted config files.
5. Updates README.
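
As a rough illustration of item 2, the output name can be derived from the input name rather than reusing it. This is a minimal sketch only: `freq_path` is a hypothetical helper, not a function from the repository, and the example path is illustrative.

```python
from pathlib import Path


def freq_path(tsv_path: str) -> Path:
    """Returns a sibling path ending in `_freq.tsv` (hypothetical helper)."""
    path = Path(tsv_path)
    # Keep the directory and stem; swap the `.tsv` ending for `_freq.tsv`
    # so the original file is never overwritten.
    return path.with_name(f"{path.stem}_freq.tsv")


# Illustrative input path, assumed for this example only.
print(freq_path("data/tsv/khm_phonemic.tsv").name)
```

Writing to a separate file leaves the scraped TSV untouched, so the frequency step can be re-run without re-scraping.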

* [lav] Adds Latvian phone list and updated TSV data (#322)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

* Adds Turkish phones, rescraped Turkish data

* Updates changelog

* Adds Maltese phones, updated data

* Updates changelog

* Adds Latvian phones, updated Latvian data

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [khm] Adds Khmer phones and updated TSV data (#327)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

* Adds Turkish phones, rescraped Turkish data

* Updates changelog

* Adds Maltese phones, updated data

* Updates changelog

* Adds Latvian phones, updated Latvian data

* Updates changelog

* Adds Khmer phones and updated TSV data

* Updates changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* Moves Latin phonelist to Classical Latin. (#326)

Also undertook a light reorg.

* [nob] Adds Østnorsk (Bokmål) phones and updated TSV data (#330)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

* Adds Turkish phones, rescraped Turkish data

* Updates changelog

* Adds Maltese phones, updated data

* Updates changelog

* Adds Latvian phones, updated Latvian data

* Updates changelog

* Adds Khmer phones and updated TSV data

* Updates changelog

* Adds Østnorsk (Bokmål) phones and updated TSV data

* Updates changelog

* Fixes typo

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* Add English link to language list for frequencies. (#332)

* Partial scrape (#334)

* scrape up to cantonese

* raw partial scrape - excludes yue, rus, cmn

* post-processing on partial scrape, src README fix

* re-ran generate_summary.py after resolving conflicts

* revert comment in scrape.py

* updates changelog, resolves formatting error

* Updates `data/phones/README.md` (#333)

* Added French phonemic phones list. Added filter French phonemic tsv.

* Added French phonemic phones.

* Updated Changelog.

* Added phones

* Added filtered phonemic wordlist

* Added Serbo-Croatian phonemes and filtered TSV files.

* Updated summaries for Serbo-Croatian phones.

* Updated CHANGELOG.

* Fixed formatting of Serbo-Croat phones file and CHANGELOG.

* Updated fork to match upstream.

* Updated fork to match upstream

* Delete .DS_Store

I don't know where this file came from...

* Delete .DS_Store

* Delete hbs_phonemic_phones.txt

* Delete .DS_Store

* [ita] Adds phoneme list, filtered phonemic TSV file

* Updates CHANGELOG

* Adds updated README and language summary

* Updates CHANGELOG with issue number for Italian phone list

* Adds Adyghe phones, filtered Adyghe data

* Updated CHANGELOG

* Adds Bulgarian phone list, filtered Bulgarian data

* Postprocesses with filtered Bulgarian data

* Updates changelog

* Adds Icelandic phones, filtered TSV file

* Updates changelog

* Adds Slovenian phones, filtered Slovenian data

* Updates changelog

* Add normalization to list_phones.py

* Updates changelog

* Reformats list_phones.py

* Adds Welsh phoneme lists, filtered Welsh TSV data

* Updates changelog

* Updates  with instructions to re-scrape

* Updates changelog

* Updates

* Updates data/phones/README.md

* Adds Vietnamese phones, Vietnamese TSV files

* Updates changelog

* Adds Hindi  file, new/updated TSV files

* Updates changelog

* Fixes Serbo-Croatian phones

* Updates CHANGELOG

* Revert "Adds Hindi  file, new/updated TSV files"

This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e.

* Adds Portuguese .phones files, re-scraped TSV data

* Rescrapes Portuguese data

* Updates changelog

* Adds Burmese phones, updated Burmese data

* Updates changelog

* Adds Japanese phone list. Rescrapes Japanese data

* Updates changelog

* Removes data/tsv/jpn_hira_phonemic.tsv

* Adds Azerbaijani phones, updated TSV data

* Updates changelog

* Adds Turkish phones, rescraped Turkish data

* Updates changelog

* Adds Maltese phones, updated data

* Updates changelog

* Adds Latvian phones, updated Latvian data

* Updates changelog

* Adds Khmer phones and updated TSV data

* Updates changelog

* Adds Østnorsk (Bokmål) phones and updated TSV data

* Updates changelog

* Fixes typo

* Update data/phones/README.md

* Update changelog

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>

* [arm] cleaned up armenian phones (#331)

* cleaned up armenian phones

* cleaned up armenian phones (with more tidying up)

* cleaning up armenian (fixed changelog)

I had written the update on the wrong spot on the changelog + I added the issue number

* uncommented accidental gaps

* uncommented accidental gaps

* added voiceless allophones

* added missing geminate affricates

* reduced branch

* reduced branch

* final changes for commit to original branch

Co-authored-by: Lucas Ashby <lfeashby@gmail.com>
Co-authored-by: Alexander Gutkin <35786058+agutkin@users.noreply.github.com>
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
Co-authored-by: Travis Bartley <Travismbartley@gmail.com>
Co-authored-by: Jackson L. Lee <jacksonlunlee@gmail.com>
Co-authored-by: ajmalanoski <71616036+ajmalanoski@users.noreply.github.com>
Co-authored-by: Alireza <Alirezasampoor@gmail.com>
Co-authored-by: Biswaroop Bhattacharjee <biswaroop08@gmail.com>
Co-authored-by: Muhammad Fakhri Putra Supriyadi <fakhriputra123s@gmail.com>
Co-authored-by: Ben Fernandes <dev.benfernandes@gmail.com>
Co-authored-by: Jim Regan <jaoregan@tcd.ie>
Co-authored-by: platipo <enrico.paganin@mail.com>
Co-authored-by: yeonju123 <yeonju123@gmail.com>
Co-authored-by: unknown <Yeonju@NYCMAXASIKKAW10.ad.insidemedia.net>
Co-authored-by: Hossep Dolatian <hovdeov@gmail.com>
16 people committed Jan 26, 2021
1 parent 88829cf commit f5c05d0
Showing 562 changed files with 2,476,761 additions and 1,716,276 deletions.
113 changes: 76 additions & 37 deletions .circleci/config.yml
@@ -1,52 +1,91 @@
version: 2
version: 2.1

orbs:
python: circleci/python@1.2.0

workflows:
version: 2
test:
jobs:
- build-python-3.6
- build-python-3.7
- build-python-3.8

jobs:
build-python-3.6: &template
pre-build:
description: Install and run a Python standalone package
parameters:
command-name:
type: string
command-run:
type: string
docker:
- image: python:3.6
- image: cimg/python:3.9
#auth:
# username: $DOCKERHUB_USERNAME
# password: $DOCKERHUB_PASSWORD
steps:
- checkout
- run:
name: Build source distribution and install package from it
working_directory: ~/project/
# Ensure we can build a source distribution that can correctly install.
# "python setup.py sdist" creates dist/wikipron-x.y.z.tar.gz
command: |
pip install --progress-bar off --upgrade pip setuptools
python setup.py sdist
pip install dist/`ls dist/ | grep .tar.gz`
name: Create custom requirements
command: grep << parameters.command-name >> requirements.txt > << parameters.command-name >>_requirements.txt
- python/install-packages:
pkg-manager: pip
pip-dependency-file: << parameters.command-name >>_requirements.txt
cache-version: << parameters.command-name >>-v1
- run:
name: Install the full development requirements
working_directory: ~/project/
command: pip install --progress-bar off -r requirements.txt
working_directory: ~/
command: << parameters.command-run >>

build-python:
parameters:
python-version:
type: string
docker:
- image: cimg/python:<< parameters.python-version >>
#auth:
# username: $DOCKERHUB_USERNAME
# password: $DOCKERHUB_PASSWORD
steps:
- checkout
- python/install-packages:
pkg-manager: pip
pre-install-steps:
- run:
name: Build source distribution and install package from it
command: |
pip install --progress-bar off --upgrade pip setuptools
python setup.py sdist
pip install dist/*.tar.gz
- run:
name: Show installed Python packages
command: pip list
- run:
name: Lint
working_directory: ~/
# Avoid being able to import wikipron by relative import.
# Test code by importing the *installed* wikipron in site-packages.
command: flake8 project/setup.py project/wikipron project/tests
command: pip list -v
- run:
name: Run python tests
working_directory: ~/
# Avoid being able to import wikipron by relative import.
# Test code by importing the *installed* wikipron in site-packages.
command: pytest -vv project/tests
build-python-3.7:
<<: *template
docker:
- image: python:3.7
build-python-3.8:
<<: *template
docker:
- image: python:3.8
command: |
sudo chown circleci:circleci /home
pytest -vv project/tests --junitxml /tmp/testxml/report.xml
- store_test_results:
path: /tmp/testxml/

workflows:
version: 2
build-and-test:
jobs:
- pre-build:
name: flake8
command-name: flake8
command-run: flake8 project/setup.py project/wikipron project/tests
- pre-build:
name: black
command-name: black
command-run: black --line-length=79 --check project/setup.py project/wikipron project/tests project/data
- pre-build:
name: mypy
command-name: mypy
command-run: mypy --ignore-missing-imports project/wikipron project/tests project/data
- build-python:
requires:
- flake8
- black
- mypy
matrix:
parameters:
python-version: ["3.6", "3.7", "3.8", "3.9"]
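
The "Create custom requirements" step in the `pre-build` job greps a single tool name out of `requirements.txt`, so each lint job installs only the tool it runs. A rough Python equivalent is sketched below. `select_requirements` is a hypothetical helper, the sample requirement lines are made up, and plain substring matching stands in for `grep`, which treats its pattern as a regular expression.

```python
from typing import Iterable, List


def select_requirements(lines: Iterable[str], name: str) -> List[str]:
    """Keeps only the requirement lines that mention `name`."""
    # Mirrors `grep NAME requirements.txt` for a plain tool name.
    return [line for line in lines if name in line]


# Made-up requirement lines, for illustration only.
sample = ["black", "flake8", "mypy", "pytest"]
for tool in ("flake8", "black", "mypy"):
    print(tool, select_requirements(sample, tool))
```

One consequence of matching by substring is that a name such as `flake8` would also select plugin lines like `flake8-something`, which is usually what a lint job wants.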

5 changes: 4 additions & 1 deletion .gitignore
@@ -5,4 +5,7 @@ __pycache__/
*.egg-info/
*.log
**/tars
**/freq_tsvs
**/freq_tsvs
env/

.idea/
172 changes: 147 additions & 25 deletions CHANGELOG.md
@@ -10,56 +10,178 @@ Versioning](http://semver.org/spec/v2.0.0.html).
Unreleased
----------

### Added
- Adds two Vietnamese dialects to `languages.json`. (\#139)
- Adds whitelisting capabilities to `postprocess`. (\#152)
- Adds whitelists for Dutch, English, Greek, Latin, Korean, and Spanish.
### Under `data/`

#### Added

- Added Afrikaans whitelists, filtered TSV file, rescraped phonemic and phonetic TSV files. (\#311)
- Added German whitelists and filtered TSV file. (\#285)
- Added whitelisting capabilities to `postprocess`. (\#152)
- Added whitelists for Dutch, English, Greek, Latin, Korean, and Spanish.
(\#158, etc.)
- Improves printing in the README table. (\#145)
- Renames data directory `data`. (\#147)
- Logged dialect configuration if specified. (\#133)
- Handled additional language codes. (\#132, \#148)
- Added `--no-skip-spaces-word` and `--no-skip-spaces-pron` flag. (\#135)
- Added typing to big scrape code. (\#140)
- Added argparse to allow limiting 'big scrape' to individual languages
with `--restriction` flag. (\#154)
- Added Manchu (`mnc`). (\#185)
- Added Polabian (`pox`). (\#186)
- Added `aar`, `bdq`, `jje`, and `lsi`. (\#202)
- Added `tyv` to `languagecodes.py` (\#203, \#205)
- Added `bcl`, `egl`, `izh`, `ltg`, `azg`, `kir` and `mga` to `languagecodes.py`. (\#205)
- Added `nep` to `languagecodes.py`. (\#206)
- Added Ingrian (`izh`). (\#215)
- Added French phoneme list and filtered TSV file. (\#213, \#217)
- Added Corsican (`cos`). (\#222)
- Added Middle Korean (`okm`). (\#223)
- Added Middle Irish (`mga`). (\#224)
- Added Old Portuguese (`opt`). (\#225)
- Added Serbo-Croatian phoneme list and filtered TSV files. (\#227)
- Added Tuvan (`tyv`). (\#228)
- Added Shan (`shn`) with custom extraction. (\#229)
- Added Northern Kurdish (`kmr`). (\#243)
- Added a script to facilitate the creation of a `.phones` file. (\#246)
- Added IPA validity checks for phonemes. (\#248)
- Split multiple pronunciations joined by tilde in `eng_us_phonetic`.
- Added Italian phoneme list and filtered TSV file. (\#260, \#261)
- Added Adyghe phone list and filtered TSV file. (\#262, \#263)
- Added Bulgarian phoneme list and filtered TSV file. (\#264, \#267)
- Added Icelandic phoneme list and filtered TSV file. (\#269, \#270)
- Added Slovenian phoneme list and filtered TSV file. (\#271, \#273)
- Added normalization to `list_phones.py`. Corrected errors relating to
`ipapy` (\#275)
- Added Welsh .phones lists and filtered TSV files. (\#274, \#276)
- Added draft of covering grammar script. (\#297)
- Updated `data/phones/README.md` with instructions to re-scrape. (\#279, \#281)
- Added Vietnamese `.phones` files and re-scraped and filtered `.tsv` files.
(\#278, \#283)
- Added Hindi `.phones` files and the re-scraped and filtered `.tsv` files.
(\#282, \#284)
- Added Old Frisian (`ofs`). (\#294)
- Added Dungan (`dng`). (\#293)
- Added Latgalian (`ltg`). (\#296)
- Added Portuguese `.phones` files and re-scraped data. (\#290, \#304)
- Added Japanese `.phones` files and re-scraped data. (\#230, \#307)
- Added Moksha (`mdf`). (\#295)
- Added Azerbaijani `.phones` files and re-scraped data. (\#306, \#312)
- Added Turkish `.phones` file and re-scraped data. (\#313, \#314)
- Added Maltese `.phones` file and re-scraped data. (\#317, \#318)
- Added Latvian `.phones` file and re-scraped data. (\#321, \#322)
- Added Khmer `.phones` file and re-scraped data. (\#324, \#327)
- Added Østnorsk (Bokmål) `.phones` file and re-scraped data. (\#324, \#327)
- Several languages added to `languagecodes.py`. (\#334)
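
The tilde-splitting entry above can be illustrated with a minimal sketch. It assumes each row pairs a word with a pronunciation string in which alternatives are joined by an ASCII tilde (`~`). The function name `split_tilde` and the sample row are hypothetical and are not taken from the repository.

```python
from typing import Iterator, Tuple


def split_tilde(word: str, pron: str) -> Iterator[Tuple[str, str]]:
    """Yields one (word, pron) pair per tilde-separated alternative."""
    # Only the ASCII tilde (U+007E) is a separator here; the combining
    # tilde (U+0303) used for nasalization is a different character and
    # is left untouched.
    for alternative in pron.split("~"):
        alternative = alternative.strip()
        if alternative:  # Skips empty pieces left by stray tildes.
            yield word, alternative


# Hypothetical sample row, for illustration only.
rows = list(split_tilde("either", "ˈi ð ɚ ~ ˈaɪ ð ɚ"))
```

Each alternative becomes its own row, so downstream tools see one pronunciation per line.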

#### Changed

- Edited the `arm_e_phonetic.phones` and `arm_w_phonetic.phones` files. (\#298)
- Improved printing in the README table. (\#145)
- Renamed data directory `data`. (\#147)
- Split `may` into Latin and Arabic files. (\#164)
- Split `pan` into Gurmukhi and Shahmukhī. (\#169)
- Split `uig` into Perso-Arabic and Cyrillic. (\#173)
- Allowing ASCII apostrophes (0x27) in spellings. (\#172).
- Only allowing Latin spellings in Maltese lexicon. (\#166).
- Only allowed Latin spellings in Maltese lexicon. (\#166).
- Split `mon` into Cyrillic and Mongol Bichig (\#179).
- Added Vietnamese extraction function. (\#181)

### Deprecated
### Removed
### Fixed
### Security
- Merged `whitelist.py` into the 'big scrape' script. `src/scrape.py` now
checks for the existence of a whitelist file during the scrape and, if one is
present, writes a second, filtered TSV to `tsv/\*\_filtered.tsv`. (\#154).
- Updated `generate_summary.py` to reflect presence of 'filtered' tsv. (\#154)
- Imperial Aramaic (`arc`) split into three scripts properly. (\#187)
- Flattened data directory structure. (\#194)
- Updated Georgian (`geo`) to take advantage of upstream bot-based
consistency fixes. (\#138)
- Split `arm` into Eastern and Western dialects. (\#197)
- Rescraped files with new whitelists. (\#199)
- Updated logging statements for consistency. (\#196)
- Renamed `.whitelist` file extension name as `.phones`. (\#207)
- Split `ban` into Latin and Balinese scripts. (\#214)
- Split `kir` into Cyrillic and Arabic. (\#216)
- Split Latin (`lat`) into its dialects. (\#233)
- Added MyPy coverage for `wikipron`, `tests` and `data` directories. (\#247)
- Modified paths in `codes.py`, `scrape.py`, and `split.py`. (\#251, \#256)
- Modified config flags in `languages.json` and `scrape.py`. (\#258)
- Edited Serbo-Croatian `.phones` file to list all vowel/pitch accent
combinations. Re-scraped Serbo-Croatian data. (\#288)
- Moved `list_phones.py` to parent directory. (\#265, \#266)
- Moved `list_phones.py` to `src` directory. (\#297)
- Frequencies code no longer overwrites TSV files. (\#320)
- Updated `data/phones/README.md` to specify that `.phones` files should be
in NFC normalization form. (\#333)
- Kurdish (`kur`) and Opata (`opt`) removed from `languages.json`. (\#334)
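
The whitelist filtering and NFC requirement described in the entries above might look roughly like the following sketch. It is not the code in `scrape.py`. It assumes a `.phones` file lists one phone per line with optional `#` comments, and that each TSV row is a word, a tab, and a space-separated pronunciation. The names `read_phones` and `filter_rows` are hypothetical.

```python
import unicodedata
from typing import Iterable, Iterator, Set


def read_phones(lines: Iterable[str]) -> Set[str]:
    """Collects the allowed phones, one per line, normalized to NFC."""
    phones = set()
    for line in lines:
        # Assumption: anything after "#" on a line is a comment.
        phone = line.split("#", 1)[0].strip()
        if phone:
            phones.add(unicodedata.normalize("NFC", phone))
    return phones


def filter_rows(rows: Iterable[str], phones: Set[str]) -> Iterator[str]:
    """Yields only the rows whose every segment is an allowed phone."""
    for row in rows:
        row = row.rstrip("\n")
        _word, pron = row.split("\t", 1)
        # Normalizes the pronunciation the same way as the phone list so
        # that canonically equivalent strings compare equal.
        segments = unicodedata.normalize("NFC", pron).split()
        if segments and all(segment in phones for segment in segments):
            yield row


# Hypothetical sample data, for illustration only.
phones = read_phones(["k  # voiceless velar plosive", "a", "t"])
kept = list(filter_rows(["cat\tk a t", "dog\td o g"], phones))
```

Normalizing both sides to NFC matters because a precomposed character and its decomposed sequence are distinct strings in Python even when they render identically.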

#### Fixed

- Fixed path issue with phonetic whitelisted files. (\#195)

### Under `wikipron/` and Elsewhere

#### Added

- Added positive flags for stress, syllable boundaries, tones, segment to `cli.py`. (\#141)
- Added positive flags for space skipping to `cli.py`. (\#257)
- Added two Vietnamese dialects to `languages.json`. (\#139)
- Handled additional language codes. (\#132, \#148)
- Added `--no-skip-spaces-word` and `--no-skip-spaces-pron` flags. (\#135)
- Allowed ASCII apostrophes (0x27) in spellings. (\#172).
- Added Vietnamese extraction function. (\#181).
- Modified pron selector in Latin extraction function. (\#183).
- Added `--no-tone` flag. (\#188)
- Customized extractor and new scraped prons for `khb`. (\#219)
- Added `tests/test_data` directory containing two tests. (\#226, \#251)
- Added HTTP User-Agent header to API calls to Wiktionary. (\#234)
- Added support for Python 3.9. (\#240)
- Added black style formatting to `.circleci/config.yml`. (\#242)
- Added logging for scraping a language with `--dialect` specified
that requires its custom extraction logic. (\#245)
- Improved CircleCI workflow with orbs. (\#249)
- Added `test_split.py` to `tests/test_data`. (\#256)
- Handled Cantonese for scraping. (\#277)
- Added exclusion for reconstructions. (\#302)
- Added Vietnamese contour tone grouping test in `tests/test_config.py` (\#308)
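
As a rough illustration of the User-Agent entry above, the sketch below attaches an identifying header to a request for the Wiktionary API. It uses the standard library's `urllib` only to stay self-contained; the project may well use a different HTTP client. The `USER_AGENT` value and the `build_request` helper are hypothetical. No request is actually sent.

```python
import urllib.parse
import urllib.request
from typing import Dict

API_URL = "https://en.wiktionary.org/w/api.php"
# Hypothetical identifier; a real one would name the tool and a contact.
USER_AGENT = "ExampleScraper/0.1 (https://example.org/contact)"


def build_request(params: Dict[str, str]) -> urllib.request.Request:
    """Builds (but does not send) an API request with a User-Agent header."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        f"{API_URL}?{query}", headers={"User-Agent": USER_AGENT}
    )


request = build_request({"action": "query", "format": "json"})
```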

#### Changed

- Renamed arguments to positive statements in `wikipron/config.py` and edited `_get_process_pron` function accordingly. (\#141, \#257)
- Changed testing values used in `tests/test_config.py` in order to accommodate the addition of positive flags. (\#141)
- Specified UTF-8 encoding in handling text files. (\#221)
- Moved previous contents of `tests` into `tests/test_wikipron` (\#226)
- Updated the package version numbers in `requirements.txt` to the latest versions on PyPI. (\#239)
- Updated the default pron selector to also look for IPA strings under paragraphs in addition to list items. (\#295)
- Updated segments package version to 2.2.0 (\#308)

#### Deprecated

#### Removed

- Moved Wiktionary querying functions from `test_languagecodes.py` to `codes.py` (\#205)

#### Fixed

#### Security

[1.1.0] - 2020-03-03
--------------------

### Added
#### Added

- Added the extraction function for Mandarin Chinese and its scraped data. (\#124)
- Integrated Wortschatz frequencies. (\#122)

### Changed
#### Changed

- Updated the Japanese extraction function and Japanese data. (\#129)
- Updated all scraped Wiktionary data and frequency data. (\#127, \#128)
- Generalized the splitting script in the big scrape. (\#123)
- Moved small file removal to `generate_summary.py`. (\#119)
- Updated Russian data. (\#115)

### Fixed
#### Fixed

- Avoided and logged error in case of pron processing failure. (\#130)

[1.0.0] - 2019-11-29
----------------------

### Added
#### Added

- Handled Japanese. (\#109, \#114)
- Handled Latin, for which the actual graphemes cannot be the Wiktionary
@@ -73,21 +195,21 @@ Unreleased
- Resolved Wiktionary language names for languages with at least 100
pronunciation entries. (\#52, \#55)

### Changed
#### Changed

- Removed duplicate <word, pronunciation> pairs in the persisted data. (\#85, \#111, \#116)
- Split Welsh into Northern Wales and Southern dialects in the persisted data. (\#110)
- Factored out casefolding. (\#102)
- Split Serbo-Croatian into Cyrillic and Latin TSVs. (\#96)
- Generalized word and pronunciation extraction. (\#88)

### Removed
#### Removed

- Removed the timeout in smoke tests. (\#107)
- Removed the `output` option. (\#82)
- Removed the `require_dialect_label` option. (\#77)

### Fixed
#### Fixed

- Skipped pronunciations with a dash. (\#106)
- Skipped empty pronunciations in scraping. (\#59)
@@ -97,15 +219,15 @@ Unreleased
`title="wikipedia:<language> phonology"` to cover previously unhandled
languages (e.g., Estonian and Slovak). (\#49)

### Security
#### Security

- Avoided using `exec` to retrieve the version string. Used `pkg_resources`
instead. (\#63)

[0.1.1] - 2019-08-14
----------------------

### Fixed
#### Fixed

- Fixed import bug. (\#45)

