Fall scrape #204
Looks good! I believe this is our fourth 'big scrape'. It appears that they've removed all phonemic entries for Macedonian and Ukrainian. I'll have to look into how this 'Ghuoto' entry got into languages.json. It looks as though it has been there for a while, and it definitely shouldn't be. Hopefully my long-awaited revamp of codes.py will fix things like that!
* Update to Latin pron selector (#183) * minor change to latin extraction function, rescraped Latin * potential fix to lat scraping issue * raw scrape of latin * postprocessing of new latin data * updated changelog, fixed line length error * rescrape of latin * postprocessing of updated latin data * [pox] Scraped Polabian. (#186) * [pox] Scraped Polabian. Note: The ISO 639-3 code is `pox`, the older ISO 639-2 code is `sla`. * Updated CHANGELOG. * [mnc] Scraped Manchu. (#185) * [mnc] Scraped Manchu. * Updated CHANGELOG. Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * Merged Whitelist functionality with src/scrape.py. Now checks for pre… (#184) * Merged Whitelist functionality with src/scrape.py. Now checks for presence of whitelist and writes separate tsv as {original file name}_filtered.tsv. Update generate_summary to reflect if file is filtered through a whitelist. CHANGELOG and README update accordingly. * Style tweaks and cleanup. * Updated generalized_split and postprocess to reflect automatic whitelist processing in scrape. Fixed dialect issue in generate_summary. * Previous edits didn't cary. * Cleanup typo mistakes. Added error handling to scrape.py. * Style clean-up. * Fixed style issues. Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * Imperial aramaic (#187) * [arc] Listing the correct scripts for Imperial Aramaic: 1. The original Aramaic script (`armi`). 2. The square script as in Biblical Aramaic (`hebr`). 3. Classical Syriac/Assyrian Neo-Aramaic (`syrc`) descended from (1). This correctly assigns the entries to their respective lexicons. Most of pronunciations are available for (2), with very minor number of entries for (1) and (3). * [arc] Listing the correct scripts for Imperial Aramaic (continuing the previous commit which was partial): 1. The original Aramaic script (`armi`). 2. The square script as in Biblical Aramaic (`hebr`). 3. Classical Syriac/Assyrian Neo-Aramaic (`syrc`) descended from (1). 
This correctly assigns the entries to their respective lexicons. Most of pronunciations are available for (2), with very minor number of entries for (1) and (3). * Updated CHANGELOG with #186. * Add --no-tone flag (#188) * tentative solution for tone removal * updates changelog, ran white on test_config.py * remove print statement from test_config.py * partial replace of codepoints with chars, adds nfd/nfc conversion * reworks import statements * updates _TONES_REGEX * ran white on config.py * updates to conversions and adds comments * fixes to scrape.py comment length * converted test_config.py no_tone tests to nfd strings * modifies no_tone process not to skip removing superscript parentheses around non-tone superscript chars * Rename (#192) * [geo] Rescrape post-bot. Closes #138. * Add changelog * Rename. * Update CHANGELOG * Revert "[geo] Rescrape post-bot." This reverts commit 4a151b13e0e03e7a4aecb7dad29c1de9c2230f10. * Flattens directory structure for data. (#194) * Flattens directory structure for data. The non-wiki data is moved to the new `wikipron-extras` (https://github.com/kylebgorman/wikipron-extras) repository. Closes #193. * Add PR number to changelog. * "Imperial" * [geo] Rescrape post-bot. (#191) * [geo] Rescrape post-bot. Closes #138. * Add changelog * Update changelog * [geo] Add whitelist and re-scrape. * Renames for merge. * Add link to guidelines * [hun] Adds whitelist. * Simplify postprocess * Enforces consistent style in logging using %r. (#196) * Enforces consistent style in logging using %r. * Updates CHANELOG * Fixes a double-quoted logging var. * Filtering (#199) * [rum] Add whitelist and rescrape. * [eng] Adds English rescrape. * [dut] Adds Dutch rescrape. * [gre] Adds Greek rescrape. * [gre] Adds Greek rescrape. * Updates scrape path for phonetic filtering. Closes #195. * [rum] Adds Romanian rescrape. * [arm] Adds Armenian rescrape. * [gre] Adds Greek rescrape (second try). * [arm] Adds Armenian dialects + rescrapes. Closes #197. 
* Adds CHANGELOG changes. * [spa] Adds Spanish rescrape. * Postprocess and regenerate summaries. * [aar, bdq, jje, lsi] discovers new languages and scrapes them. (#202) * Added tyv to languagecodes.py (#203) * adds tuvan to languagecodes.py * updates changelog * Fall scrape (#204) * [aar, bdq, jje, lsi] discovers new languages and scrapes them. * Fall scrape. * Fuller bib information Fills out the bibliography entry for the WikiPron paper. * Updates to codes.py (#205) * updated languages.json and json files for translating between wikitionary code and iso code * updates codes.py and languagecodes.py * modifies test_languagecodes.py to reduce redundancy with codes.py * small formatting fixes * updates changelog * logging statement formatting * Update README.md Fixes formatting issue in table. Not sure why this had to be done manually... * ENH rename '.whitelist' as '.phones' (#207) * Uses %r everywhere in `data/src`. (#210) * Nepali support (#211) * Uses %r everywhere in `data/src`. * [nep] Adds Nepali data. Closes #209. * Update changelog * [fre] Adds phoneme list (#213) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * [izh] Scrape and add Ingrian. (#215) * [izh] Scrape and add Ingrian. * Updated CHANGELOG. * [ban] Splitting Balinese into Latin and Balinese scripts. (#214) * [ban] Splitting Balinese into Latin and Balinese scripts. * Updated CHANGELOG. Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [kir] Split Kyrgyz into Cyrillic and Arabic scripts. (#216) * [kir] Split Kyrgyz into Cyrillic and Arabic scripts. * Updated. * Added fre_phonemic_filtered.tsv (#217) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Refresh the database size counter. (#220) * [khb] Customized extractor and re-scraping of Lü. 
(#219) * [khb] Adding customized extractor for Lü. * [khb] Re-scraping and updating the data and summaries. * Updated CHANGELOG. * Reordered imports. * [khb] Adding scrape smoke test. * Resorted. * FIX specify UTF-8 in handling text files (#221) It looks like Windows users have encountered encodings -- they hit `UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2882: character maps to <undefined>` when pip installing wikipron, the error triggered at setup.py. While we're at it, we specify UTF-8 encoding for all open() calls for text processing as well. Co-authored-by: jacksonllee <jacksonlunlee@gmail.com> * [mga] New scrape: Middle Irish. (#224) * [mga] New scrape: Middle Irish. * Updated CHANGELOG. * [cos] New scrape: Corsican. (#222) * [cos] Add Corsican to the language code registry. * [cos] Scraped Corsican and updated the language descriptions. * Updated CHANGELOG. Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [okm] New scrape: Middle Korean (#223) * [okm] Adding ISO 639-3-only Middle Korean: Korean, Middle (10th–16th centuries). * [okm] New scrape of Middle Korean and update of indices and descriptions. * Updated CHANGELOG. * Fixed typo. Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [opt] New scrape: Old Portuguese (aka Galician-Portuguese). (#225) * Adding Old Portuegese (aka Galician-Portuguese) codes. * [opt] New scrape. * [opt] Updated summaries. * Updated CHANGELOG. Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * Added Serbo-Croatian phonemes and filtered TSV files. (#227) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. 
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [shn] Custom extractor and new scrape for Shan (#229) * [shn] Adding customized extractor for Shan. * [shn] Adding smoke test. * [shn] New scrape for Shan. * [shn] Updated descriptions. * Updated CHANGELOG. * [tyv] New scrape: Tuvan (#228) * [tyv] Tuvan scrape. * [tyv] Updated descriptions. This also fixes a buggy previous merge of `okm`. * [tyv] Filtering Tuvan to use Cyrillic script only. * Updated CHANGELOG. Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * Reorganizes tests and adds a few initial tests for the data side (#226) * moves wikipron module tests into subdirectory * reformating of test_version.py * adds outline of test for data naming conventions, removes nonsense from src/scrape.py * basic framework for testing file creation involved in big scrape * renamed file naming test and added comments * reorganizes tests directory, adds test for generate_summary.py * fix formating in test_version.py * revises and renames file for testing scrape * fixes pathing issue in init * adds some typing to new tests * changes open statements to use proper encoding * potential solution to circleci module error * approaching a circleci import solution? * updates changelog * [hbs] Fix file naming. * Update README.md * Update README.md * [gre] Takes advantage of upstream consistency fix. Closes #198. 
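PR #188 above describes its `--no-tone` processing in three steps: convert to NFD, strip tone characters with a regular expression (`_TONES_REGEX`), and convert back to NFC. Below is a minimal Python sketch of that idea. The `remove_tones` name and the exact set of tone characters are assumptions for illustration, not WikiPron's actual implementation.

```python
import re
import unicodedata

# Hypothetical tone inventory for illustration only; WikiPron's own
# _TONES_REGEX may cover a different set of characters.
_TONE_CHARS = re.compile(
    "["
    "\u0300\u0301\u0302\u0304\u030b\u030c\u030f"  # combining tone diacritics
    "\u02e5-\u02e9"  # Chao tone letters (extra high to extra low)
    "\u00b2\u00b3\u00b9\u2070\u2074-\u2079"  # superscript tone numbers
    "]"
)


def remove_tones(pron: str) -> str:
    """Strips (assumed) tone marks from an IPA string."""
    # NFD splits precomposed letters such as "ǎ" into base + combining mark,
    # so the tone diacritic can be removed on its own.
    decomposed = unicodedata.normalize("NFD", pron)
    stripped = _TONE_CHARS.sub("", decomposed)
    # NFC recomposes any remaining non-tone diacritics (e.g., the nasal tilde).
    return unicodedata.normalize("NFC", stripped)


print(remove_tones("mǎ"))  # ma
print(remove_tones("ni³⁵"))  # ni
```

The round trip through NFD matters. Without it, a precomposed letter such as "ǎ" would never match a pattern written in terms of combining marks.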
* [lat] Split Latin into its dialects (#233) * ENH handle Latin dialects * RM remove unwanted Latin TSVs * ENH add Latin dialect TSVs * ENH postprocess Latin dialect TSVs * ENH update data summary readme * MAINT update changelog * MAINT update changelog * Add HTTP User-Agent header to API calls (#234) * add http headers for get requests * add http headers for get requests in tests/ * change wikipron/scrape.py code to avoid circular imports * updated requirements.txt to have the latest dependencies (#238) (#239) * Update requirements.txt * Update CHANGELOG.md * Update CHANGELOG.md * Added support for Python-3.9 (#236) (#240) * Update requirements.txt * Update CHANGELOG.md * Update CHANGELOG.md * Update config.yml * Update setup.py * Update CHANGELOG.md * Add black formatting (#242) * add black formatting (fix #237) * update changelog * [kmr] New scrape: Northern Kurdish (#243) * [kmr] Adding an entry for Northern Kurdish. * [kmr] Adding an ISO mapping for Northern Kurdish. * [kmr] Fresh scrape. * [kmr] Updated description and summaries. * [kmr] Updated CHANGELOG. * [kmr] Lower-cased version. * [kmr] Silly. Source should be lower-case. 
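Several entries above (#184, #199, #207) describe filtering a scraped TSV against a list of allowed phones and writing the result as `{original file name}_filtered.tsv`. One way to do that is sketched below. The functions `load_phones`, `filter_rows` and `filtered_path` are hypothetical, as is the one-phone-per-line layout assumed for the `.phones` file.

```python
import csv
import io
import os
from typing import Iterable, Iterator, List, Set


def load_phones(lines: Iterable[str]) -> Set[str]:
    """Reads a phone inventory, assumed to list one phone per line."""
    return {line.split()[0] for line in lines if line.strip()}


def filter_rows(
    rows: Iterable[List[str]], phones: Set[str]
) -> Iterator[List[str]]:
    """Keeps only rows whose pronunciation uses listed phones exclusively."""
    # Assumes two-column rows: word, then space-separated pronunciation.
    for word, pron in rows:
        if all(phone in phones for phone in pron.split()):
            yield [word, pron]


def filtered_path(path: str) -> str:
    """Maps e.g. fre_phonemic.tsv to fre_phonemic_filtered.tsv."""
    root, ext = os.path.splitext(path)
    return f"{root}_filtered{ext}"


if __name__ == "__main__":
    phones = load_phones(["k\n", "æ\n", "t\n"])
    tsv = io.StringIO("cat\tk æ t\ndog\td ɒ ɡ\n")
    rows = csv.reader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in filter_rows(rows, phones):
        print("\t".join(row))  # only the "cat" row survives
    print(filtered_path("fre_phonemic.tsv"))
```

Writing the filtered rows to a separate path leaves the unfiltered scrape available for comparison.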
* Update CHANGELOG.md Minor style fixes to CHANGELOG * MAINT reorganize changelog (#244) * Add logging for dialect support for languages requiring custom extraction logic (#245) * ENH alert the use of custom logic when dialect is specific * MAINT update changelog * Add a script to facilitate the creation of .phones files (#246) * ENH add script to tally phones/phonemes in a TSV * DOC update readme for the .phones files * MAINT update changelog * DOC comments in list_phones.py * MAINT update changelog * DOC update docstrings and readme * Use mypy for type checking (#247) * ISSUE-241: Ignoring 'env' and '.idea' directories * ISSUE-241: Added 'mypy' to 'requirements.txt' * ISSUE-241: Added 'Type checking' step to CircleCI * ISSUE-241: Fixed mypy issues * ISSUE-241: Updated documentation * ISSUE-241: Added mypy to the correct 'requirements.txt' * ISSUE-241: Ran Black formatter Also updated the contribution guidelines to include this as a step * ISSUE-241: Markups ISSUE-241: Markup - Alphabetised 'requirements.txt' ISSUE-241: Markup - Log invalid page title ISSUE-241: Markup - Alphabetised 'test_scrape.py' imports ISSUE-241: Markup - Added explanatory comment ISSUE-241: Markup - Improved 'config_dict' typing ISSUE-241: Markup - Improved 'scrape.py' typing * ISSUE-241: Markup - Using logger interpolation * ISSUE-241: Markups * ISSUE-241: Markup - Added working dir to Circle CI config * split tildes; resort (#250) * split tildes; resort * update CHANGELOG.md * Improve CircleCI workflow with orbs (#249) * Convert to matrix CircleCI workflow * Fix typo in parameter * Add missing job name * Add CircleCI test storage * Add Python orb and caching * Fix orb command * Set Python deps install to global scope * Bump up Python orb version * Fix command nesting * Add package manager to orb command * Fix pyenv cache failure * Fix pyenv cache * Add workspace cache for pip packages * Fix username typo * Fix permission error * Test pre-built CircleCI Docker image * Test missing 
site-packages * Test missing Python dir * Add verbose pip list * Add pre test jobs * Fix parameter substitution in description * Fix extraneus run * Add parametrized flake8 and black jobs * Fix parameter passing * Fix unreferenced parameter * Fix pre-test Docker image tag * Show xml coverage * Add pre-test Python cache * Create tsv directory * Chown /home to circleci * Fix store_results path * Rename pre-test jobs * Improve CircleCI configuration Add Python orb, matrix jobs and rework workflow structure * Improve CircleCI configuration Add Python orb, matrix jobs and rework workflow structure * Bump up pre-build Python version to 3.9 * Add mypy to pre-build jobs * Add mypy to build required jobs * Change pip3 to pip * Add PR to CHANGELOG.md * Disable circleci user chowning /home * Revert "Disable circleci user chowning /home" This reverts commit eed32d6f3ab9c2094a642cc23967c536ad5bddb5. * Disable pyenv creation * Revert "Disable pyenv creation" This reverts commit 68297c21c1c2f4dc67e2bc9bd7972adbeea3878b. * Disable pyenv creation * Test pip cache renewal * Revert "Test pip cache renewal" This reverts commit b4772307ded407da0fedfc4320b3594f66d366fa. Cache works as intended, references https://github.com/kylebgorman/wikipron/pull/249#discussion_r511582495. Co-authored-by: Jackson L. Lee <jacksonlunlee@gmail.com> * Small path changes on the data side, rework of test_scrape.py (#251) * rework some paths on data side, simplify test_scrape.py * revert changes to test_summary.py * updates changelog * Adding a sanity check for valid IPA (#248) * Check that the phones/phonemes are valid IPA. * Only print the bad characters. * Updated CHANGELOG. * Reformatted the file using black. * Reran black with line length limit. * Phonemes, rather than phones. * Sorted the packages alphabetically. * Re-arranged imports. * Moved ipapy into data-specific requirements file. 
* Adding dependency on absl-py (for logging) and factoring out the phoneme checking functionality into its own function. * Added a link to IPA chart. * Removed absl-py. * Use internal logger. * Check the logging level. * Moving to global logger. Thanks Kyle! * reformatted. * Cosmetic: fixed warning message. Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * Style fixes for list_phones (#254) * Style fixes for list_phones * Ran black formatter. * Remove `<5`. * Negative flags are renamed to positive statements (#141) (#255) * Negative flags in cli.py are renamed to positive statements. In order to accomodate this change, Wikipron/config.py and tests/test_wikipron/test_config.py are also edited accordingly. * positive flags are added and negative flags are renamed to positive ones. * changelog is updated. * style edit * fix fix redundancy Co-authored-by: unknown <Yeonju@NYCMAXASIKKAW10.ad.insidemedia.net> * Clean up flag help and eliminate remaining double negatives (#257) * Work on flags: 1. Flag help should be short, because people don't read it very carefully and it's not formatted for multi-sentence input. This shortens all the flags to a single, consistent name. Because dialect and segmentation require more information, these details have been moved into a prominent position in the README instead. 2. The tone and space flags are given negative versions, cf. what Yeonju did earlier. * Eliminates double-negative in skip-spaces. * Updates changelog. * Updates tests, config, core. * Fixed missing test_scrape change. 
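Entries #246 and #254 above add and tidy `list_phones.py`, a script that tallies the phones or phonemes in a TSV. The rough sketch below does such a tally under two assumptions: the TSV has two columns, and pronunciations are space-separated. The names `tally_phones` and `tally_file` are hypothetical. The explicit UTF-8 encoding follows #221.

```python
import collections
import csv
import io
from typing import Counter, TextIO


def tally_phones(source: TextIO) -> Counter[str]:
    """Counts each space-separated phone in the pronunciation column."""
    counts: Counter[str] = collections.Counter()
    # QUOTE_NONE so that quote characters in headwords are read literally.
    for row in csv.reader(source, delimiter="\t", quoting=csv.QUOTE_NONE):
        counts.update(row[1].split())
    return counts


def tally_file(path: str) -> Counter[str]:
    # Explicit UTF-8 so results don't depend on the platform's default codec.
    with open(path, encoding="utf-8", newline="") as source:
        return tally_phones(source)


if __name__ == "__main__":
    sample = io.StringIO("cat\tk æ t\ntack\tt æ k\nat\tæ t\n")
    for phone, count in tally_phones(sample).most_common():
        print(f"{phone}\t{count}")
```

Sorting with `most_common()` places rare phones at the bottom of the listing. Those are often the ones worth checking before adding them to a `.phones` file.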
* Adds test for TSV splitting (#256) * fixes to split.py and postprocess before adding tests * cleanup of test_split * updated a few comments in test_split * revert needless changes to postprocess and split * minor comment update in test_split * updates changelog * Updates data side to use new flags (#258) * quick fix to small oversight in test_extract.py * data side uses new flags * updates changelog, removes config_factory from text_extract.py * [ita] Adds phoneme list, filtered phonemic TSV file (#261) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... * Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [ady] Adds phone list, filtered Adyghe data. (#263) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... 
* Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * Moves `list_phones.py`. (#266) * Moves `list_phones.py`. Closes #265. * Add changelog * [bul] Adds phone list, filtered Bulgarian data (#267) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... * Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * Adds Icelandic phone list, filtered Icelandic data. (#270) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. 
* Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... * Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [slv] Adds Slovenian phoneme list, filtered TSV data. (#273) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... * Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * Adds normalization to `list_phones.py`, corrects bugs relating to `ipapy` (#275) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. 
* Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... * Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog * Add normalization to list_phones.py * Updates changelog * Reformats list_phones.py Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * Adds Welsh .phones lists, filtered TSV data (#276) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... 
* Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog * Add normalization to list_phones.py * Updates changelog * Reformats list_phones.py * Adds Welsh phoneme lists, filtered Welsh TSV data * Updates changelog Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [yue] Handle Cantonese for scraping (#277) * ENH handle Cantonese for scraping * MAINT update changelog * DOC explain Cantonese pron XPath template * Updates `data/phones/README.md` with instructions to re-scrape (#281) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... 
* Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog * Add normalization to list_phones.py * Updates changelog * Reformats list_phones.py * Adds Welsh phoneme lists, filtered Welsh TSV data * Updates changelog * Updates with instructions to re-scrape * Updates changelog * Updates * Updates data/phones/README.md Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [vie] Adds Vietnamese `.phones` files, `.tsv` files (#283) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... 
* Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog * Add normalization to list_phones.py * Updates changelog * Reformats list_phones.py * Adds Welsh phoneme lists, filtered Welsh TSV data * Updates changelog * Updates with instructions to re-scrape * Updates changelog * Updates * Updates data/phones/README.md * Adds Vietnamese phones, Vietnamese TSV files * Updates changelog Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [hin] Adds `phones` file, updated/new TSV files. (#284) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... 
* Delete .DS_Store
* Delete hbs_phonemic_phones.txt
* Delete .DS_Store
* [ita] Adds phoneme list, filtered phonemic TSV file
* Updates CHANGELOG
* Adds updated README and language summary
* Updates CHANGELOG with issue number for Italian phone list
* Adds Adyghe phones, filtered Adyghe data
* Updated CHANGELOG
* Adds Bulgarian phone list, filtered Bulgarian data
* Postprocesses with filtered Bulgarian data
* Updates changelog
* Adds Icelandic phones, filtered TSV file
* Updates changelog
* Adds Slovenian phones, filtered Slovenian data
* Updates changelog
* Add normalization to list_phones.py
* Updates changelog
* Reformats list_phones.py
* Adds Welsh phoneme lists, filtered Welsh TSV data
* Updates changelog
* Updates with instructions to re-scrape
* Updates changelog
* Updates
* Updates data/phones/README.md
* Adds Vietnamese phones, Vietnamese TSV files
* Updates changelog
* Adds Hindi file, new/updated TSV files
* Updates changelog
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [hbs] Fixes Serbo-Croatian phoneme lists. Re-scrapes data. (#288)
* Fixes Serbo-Croatian phones
* Updates CHANGELOG
* Revert "Adds Hindi file, new/updated TSV files"
  This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [ofs] Scraped Old Frisian. (#294)
* Add Old Frisian to the configuration.
* Mark "ofs" as ISO639-3 language code.
* Fixed language name.
* Added phonemic pronunciations.
* Updated.
* [aar] Rescraped Afar. (#291)
* [aar] Rescraped.
* Updated.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [dng] Scraped Dungan. (#293)
* [dng] Scraped Dungan.
* Updated.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [bre] Rescraped Breton. (#292)
* [bre] Rescraped.
* Updated.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Covering grammar script (#297)
* Adds covering grammar generator, for future QA work.
  Also moves `list_phones.py` to the src directory, which makes sense to me.
* Changelog update.
* Updates CHANGELOG with issue number.
* Update README.md
  Fix syntax highlighting hints.
* [ltg] Scraped Latgalian. (#296)
* [ltg] Scraped Latgalian.
* Forgot to include the actual data.
* Updated.
* Removes reconstructions (#302)
* Skips reconstructions during scraping.
  Then, rescrapes Latin to take advantage of this.
* Adds number to changelog.
* Updates CHANGELOG for >>> junk.
* Rescrapes Armenian. (#303)
  Closes #301.
* [por] Adds phones files, rescraped TSV files. (#304)
* Adds Portuguese .phones files, re-scraped TSV data
* Rescrapes Portuguese data
* Updates changelog
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [bur] Adds Burmese phone list, re-scraped Burmese data. (#305)
* Adds Burmese phones, updated Burmese data
* Updates changelog
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [mdf] Scrape Moksha + slightly more flexible default pron selector. (#295)
* Support some Moksha pronunciations that reside under "p", rather than "li".
* Scrape.
* Attempt to fix the test.
* Updated.
* Split the PR into two items.
* [jpn] Adds Japanese .phones file and updated TSV files (#307)
* Adds Japanese phone list. Rescrapes Japanese data
* Updates changelog
* Removes data/tsv/jpn_hira_phonemic.tsv
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Updates segments version (#308)
* updates segments version and adds test for vietnamese tones
* updates changelog
* [ger] Adds German Phone list, filtered TSV file (#309)
* Create German Phonelist
* Updated CHANGELOG.md
* incorporate updates in README.md, and added missing ger_phone* files
* Adds some whitespace to German phone list comments. (#310)
* [aze] Adds Azerbaijani phone lists and updated TSV data (#312)
* Adds Azerbaijani phones, updated TSV data
* Updates changelog
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [tur] Adds Turkish phone list and updated TSV data (#314)
* Adds Turkish phones, rescraped Turkish data
* Updates changelog
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [afr] adds phone list for Afrikaans and updated TSV files (#316)
* adds afr phone list and rescrapes
* Updated CHANGELOG.md
* [mlt] Adds Maltese phones file and updates TSV data (#318)
* Adds Maltese phones, updated data
* Updates changelog
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Frequency code tire-kick (#320)
* Frequency code tire-kick:
  1. Increases typing.
  2. No longer overwrites the .tsv files: adds `_freq.tsv` suffix instead.
  3. Adds Khmer to JSON config. file.
  4. Adds `shared_tasks` subdirectory for targeted config files.
  5. Updates README.
* [lav] Adds Latvian phone list and updated TSV data (#322)
* Adds Latvian phones, updated Latvian data
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [khm] Adds Khmer phones and updated TSV data (#327)
* Updates changelog
* Adds Khmer phones and updated TSV data
* Updates changelog
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Moves Latin phonelist to Classical Latin. (#326)
  Also undertook a light reorg.
* [nob] Adds Østnorsk (Bokmål) phones and updated TSV data (#330)
* Adds Østnorsk (Bokmål) phones and updated TSV data
* Updates changelog
* Fixes typo
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Add English link to language list for frequencies. (#332)
* Partial scrape (#334)
* scrape up to cantonese
* raw partial scrape - excludes yue, rus, cmn
* post-processing on partial scrape, src README fix
* re-ran generate_summary.py after resolving conflicts
* revert comment in scrape.py
* updates changelog, resolves formatting error
* Updates `data/phones/README.md` (#333)
* Update data/phones/README.md
* Update changelog
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [arm] cleaned up armenian phones (#331)
* cleaned up armenian phones
* cleaned up armenian phones (with more tidying up)
* cleaning up armenian (fixed changelog)
  I had written the update on the wrong spot on the changelog + I added the issue number
* uncommented accidental gaps
* uncommented accidental gaps
* added voiceless allophones
* added missing geminate affricates
* reduced branch
* reduced branch
* final changes for commit to original branch

Co-authored-by: Lucas Ashby <lfeashby@gmail.com>
Co-authored-by: Alexander Gutkin <35786058+agutkin@users.noreply.github.com>
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
Co-authored-by: Travis Bartley <Travismbartley@gmail.com>
Co-authored-by: Jackson L. Lee <jacksonlunlee@gmail.com>
Co-authored-by: ajmalanoski <71616036+ajmalanoski@users.noreply.github.com>
Co-authored-by: Alireza <Alirezasampoor@gmail.com>
Co-authored-by: Biswaroop Bhattacharjee <biswaroop08@gmail.com>
Co-authored-by: Muhammad Fakhri Putra Supriyadi <fakhriputra123s@gmail.com>
Co-authored-by: Ben Fernandes <dev.benfernandes@gmail.com>
Co-authored-by: Jim Regan <jaoregan@tcd.ie>
Co-authored-by: platipo <enrico.paganin@mail.com>
Co-authored-by: yeonju123 <yeonju123@gmail.com>
Co-authored-by: unknown <Yeonju@NYCMAXASIKKAW10.ad.insidemedia.net>
Co-authored-by: Hossep Dolatian <hovdeov@gmail.com>
Updated `Unreleased` in `CHANGELOG.md` to reflect the changes in code or data.

Closes #200 (which appears to have been transitory).