Reduced eng_us_phonemic.phones to eng branch (#336)
* Update to Latin pron selector (#183)
* minor change to latin extraction function, rescraped Latin
* potential fix to lat scraping issue
* raw scrape of latin
* postprocessing of new latin data
* updated changelog, fixed line length error
* rescrape of latin
* postprocessing of updated latin data
* [pox] Scraped Polabian. (#186)
* [pox] Scraped Polabian. Note: The ISO 639-3 code is `pox`, the older ISO 639-2 code is `sla`.
* Updated CHANGELOG.
* [mnc] Scraped Manchu. (#185)
* [mnc] Scraped Manchu.
* Updated CHANGELOG.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Merged Whitelist functionality with src/scrape.py. Now checks for pre… (#184)
* Merged Whitelist functionality with src/scrape.py. Now checks for presence of whitelist and writes separate tsv as {original file name}_filtered.tsv. Update generate_summary to reflect if file is filtered through a whitelist. CHANGELOG and README update accordingly.
* Style tweaks and cleanup.
* Updated generalized_split and postprocess to reflect automatic whitelist processing in scrape. Fixed dialect issue in generate_summary.
* Previous edits didn't carry.
* Cleanup typo mistakes. Added error handling to scrape.py.
* Style clean-up.
* Fixed style issues.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Imperial aramaic (#187)
* [arc] Listing the correct scripts for Imperial Aramaic:
  1. The original Aramaic script (`armi`).
  2. The square script as in Biblical Aramaic (`hebr`).
  3. Classical Syriac/Assyrian Neo-Aramaic (`syrc`) descended from (1).
  This correctly assigns the entries to their respective lexicons. Most pronunciations are available for (2), with a very small number of entries for (1) and (3).
* [arc] Listing the correct scripts for Imperial Aramaic (continuing the previous commit which was partial):
  1. The original Aramaic script (`armi`).
  2. The square script as in Biblical Aramaic (`hebr`).
  3. Classical Syriac/Assyrian Neo-Aramaic (`syrc`) descended from (1).
  This correctly assigns the entries to their respective lexicons. Most pronunciations are available for (2), with a very small number of entries for (1) and (3).
* Updated CHANGELOG with #186.
* Add --no-tone flag (#188)
* tentative solution for tone removal
* updates changelog, ran white on test_config.py
* remove print statement from test_config.py
* partial replace of codepoints with chars, adds nfd/nfc conversion
* reworks import statements
* updates _TONES_REGEX
* ran white on config.py
* updates to conversions and adds comments
* fixes to scrape.py comment length
* converted test_config.py no_tone tests to nfd strings
* modifies no_tone process not to skip removing superscript parentheses around non-tone superscript chars
* Rename (#192)
* [geo] Rescrape post-bot. Closes #138.
* Add changelog
* Rename.
* Update CHANGELOG
* Revert "[geo] Rescrape post-bot."
  This reverts commit 4a151b13e0e03e7a4aecb7dad29c1de9c2230f10.
* Flattens directory structure for data. (#194)
* Flattens directory structure for data. The non-wiki data is moved to the new `wikipron-extras` (https://github.com/kylebgorman/wikipron-extras) repository. Closes #193.
* Add PR number to changelog.
* "Imperial"
* [geo] Rescrape post-bot. (#191)
* [geo] Rescrape post-bot. Closes #138.
* Add changelog
* Update changelog
* [geo] Add whitelist and re-scrape.
* Renames for merge.
* Add link to guidelines
* [hun] Adds whitelist.
* Simplify postprocess
* Enforces consistent style in logging using %r. (#196)
* Enforces consistent style in logging using %r.
* Updates CHANGELOG
* Fixes a double-quoted logging var.
* Filtering (#199)
* [rum] Add whitelist and rescrape.
* [eng] Adds English rescrape.
* [dut] Adds Dutch rescrape.
* [gre] Adds Greek rescrape.
* [gre] Adds Greek rescrape.
* Updates scrape path for phonetic filtering. Closes #195.
* [rum] Adds Romanian rescrape.
* [arm] Adds Armenian rescrape.
* [gre] Adds Greek rescrape (second try).
* [arm] Adds Armenian dialects + rescrapes. Closes #197.
* Adds CHANGELOG changes.
* [spa] Adds Spanish rescrape.
* Postprocess and regenerate summaries.
* [aar, bdq, jje, lsi] discovers new languages and scrapes them. (#202)
* Added tyv to languagecodes.py (#203)
* adds tuvan to languagecodes.py
* updates changelog
* Fall scrape (#204)
* [aar, bdq, jje, lsi] discovers new languages and scrapes them.
* Fall scrape.
* Fuller bib information
  Fills out the bibliography entry for the WikiPron paper.
* Updates to codes.py (#205)
* updated languages.json and json files for translating between Wiktionary code and iso code
* updates codes.py and languagecodes.py
* modifies test_languagecodes.py to reduce redundancy with codes.py
* small formatting fixes
* updates changelog
* logging statement formatting
* Update README.md
  Fixes formatting issue in table. Not sure why this had to be done manually...
* ENH rename '.whitelist' as '.phones' (#207)
* Uses %r everywhere in `data/src`. (#210)
* Nepali support (#211)
* Uses %r everywhere in `data/src`.
* [nep] Adds Nepali data. Closes #209.
* Update changelog
* [fre] Adds phoneme list (#213)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* [izh] Scrape and add Ingrian. (#215)
* [izh] Scrape and add Ingrian.
* Updated CHANGELOG.
* [ban] Splitting Balinese into Latin and Balinese scripts. (#214)
* [ban] Splitting Balinese into Latin and Balinese scripts.
* Updated CHANGELOG.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [kir] Split Kyrgyz into Cyrillic and Arabic scripts. (#216)
* [kir] Split Kyrgyz into Cyrillic and Arabic scripts.
* Updated.
* Added fre_phonemic_filtered.tsv (#217)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* Added filtered phonemic wordlist
* Refresh the database size counter. (#220)
* [khb] Customized extractor and re-scraping of Lü. (#219)
* [khb] Adding customized extractor for Lü.
* [khb] Re-scraping and updating the data and summaries.
* Updated CHANGELOG.
* Reordered imports.
* [khb] Adding scrape smoke test.
* Resorted.
* FIX specify UTF-8 in handling text files (#221)
  It looks like Windows users have encountered encoding issues -- they hit `UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2882: character maps to <undefined>` when pip installing wikipron, with the error triggered at setup.py. While we're at it, we specify UTF-8 encoding for all open() calls for text processing as well.
Co-authored-by: jacksonllee <jacksonlunlee@gmail.com>
* [mga] New scrape: Middle Irish. (#224)
* [mga] New scrape: Middle Irish.
* Updated CHANGELOG.
* [cos] New scrape: Corsican. (#222)
* [cos] Add Corsican to the language code registry.
* [cos] Scraped Corsican and updated the language descriptions.
* Updated CHANGELOG.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [okm] New scrape: Middle Korean (#223)
* [okm] Adding ISO 639-3-only Middle Korean: Korean, Middle (10th–16th centuries).
* [okm] New scrape of Middle Korean and update of indices and descriptions.
* Updated CHANGELOG.
* Fixed typo.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [opt] New scrape: Old Portuguese (aka Galician-Portuguese). (#225)
* Adding Old Portuguese (aka Galician-Portuguese) codes.
* [opt] New scrape.
* [opt] Updated summaries.
* Updated CHANGELOG.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Added Serbo-Croatian phonemes and filtered TSV files. (#227)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* Added filtered phonemic wordlist
* Added Serbo-Croatian phonemes and filtered TSV files.
* Updated summaries for Serbo-Croatian phones.
* Updated CHANGELOG.
* Fixed formatting of Serbo-Croat phones file and CHANGELOG.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [shn] Custom extractor and new scrape for Shan (#229)
* [shn] Adding customized extractor for Shan.
* [shn] Adding smoke test.
* [shn] New scrape for Shan.
* [shn] Updated descriptions.
* Updated CHANGELOG.
* [tyv] New scrape: Tuvan (#228)
* [tyv] Tuvan scrape.
* [tyv] Updated descriptions. This also fixes a buggy previous merge of `okm`.
* [tyv] Filtering Tuvan to use Cyrillic script only.
* Updated CHANGELOG.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Reorganizes tests and adds a few initial tests for the data side (#226)
* moves wikipron module tests into subdirectory
* reformatting of test_version.py
* adds outline of test for data naming conventions, removes nonsense from src/scrape.py
* basic framework for testing file creation involved in big scrape
* renamed file naming test and added comments
* reorganizes tests directory, adds test for generate_summary.py
* fix formatting in test_version.py
* revises and renames file for testing scrape
* fixes pathing issue in init
* adds some typing to new tests
* changes open statements to use proper encoding
* potential solution to circleci module error
* approaching a circleci import solution?
* updates changelog
* [hbs] Fix file naming.
* Update README.md
* Update README.md
* [gre] Takes advantage of upstream consistency fix. Closes #198.
* [lat] Split Latin into its dialects (#233)
* ENH handle Latin dialects
* RM remove unwanted Latin TSVs
* ENH add Latin dialect TSVs
* ENH postprocess Latin dialect TSVs
* ENH update data summary readme
* MAINT update changelog
* MAINT update changelog
* Add HTTP User-Agent header to API calls (#234)
* add http headers for get requests
* add http headers for get requests in tests/
* change wikipron/scrape.py code to avoid circular imports
* updated requirements.txt to have the latest dependencies (#238) (#239)
* Update requirements.txt
* Update CHANGELOG.md
* Update CHANGELOG.md
* Added support for Python-3.9 (#236) (#240)
* Update requirements.txt
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update config.yml
* Update setup.py
* Update CHANGELOG.md
* Add black formatting (#242)
* add black formatting (fix #237)
* update changelog
* [kmr] New scrape: Northern Kurdish (#243)
* [kmr] Adding an entry for Northern Kurdish.
* [kmr] Adding an ISO mapping for Northern Kurdish.
* [kmr] Fresh scrape.
* [kmr] Updated description and summaries.
* [kmr] Updated CHANGELOG.
* [kmr] Lower-cased version.
* [kmr] Silly. Source should be lower-case.
* Update CHANGELOG.md
  Minor style fixes to CHANGELOG
* MAINT reorganize changelog (#244)
* Add logging for dialect support for languages requiring custom extraction logic (#245)
* ENH alert the use of custom logic when dialect is specific
* MAINT update changelog
* Add a script to facilitate the creation of .phones files (#246)
* ENH add script to tally phones/phonemes in a TSV
* DOC update readme for the .phones files
* MAINT update changelog
* DOC comments in list_phones.py
* MAINT update changelog
* DOC update docstrings and readme
* Use mypy for type checking (#247)
* ISSUE-241: Ignoring 'env' and '.idea' directories
* ISSUE-241: Added 'mypy' to 'requirements.txt'
* ISSUE-241: Added 'Type checking' step to CircleCI
* ISSUE-241: Fixed mypy issues
* ISSUE-241: Updated documentation
* ISSUE-241: Added mypy to the correct 'requirements.txt'
* ISSUE-241: Ran Black formatter
  Also updated the contribution guidelines to include this as a step
* ISSUE-241: Markups
  ISSUE-241: Markup - Alphabetised 'requirements.txt'
  ISSUE-241: Markup - Log invalid page title
  ISSUE-241: Markup - Alphabetised 'test_scrape.py' imports
  ISSUE-241: Markup - Added explanatory comment
  ISSUE-241: Markup - Improved 'config_dict' typing
  ISSUE-241: Markup - Improved 'scrape.py' typing
* ISSUE-241: Markup - Using logger interpolation
* ISSUE-241: Markups
* ISSUE-241: Markup - Added working dir to Circle CI config
* split tildes; resort (#250)
* split tildes; resort
* update CHANGELOG.md
* Improve CircleCI workflow with orbs (#249)
* Convert to matrix CircleCI workflow
* Fix typo in parameter
* Add missing job name
* Add CircleCI test storage
* Add Python orb and caching
* Fix orb command
* Set Python deps install to global scope
* Bump up Python orb version
* Fix command nesting
* Add package manager to orb command
* Fix pyenv cache failure
* Fix pyenv cache
* Add workspace cache for pip packages
* Fix username typo
* Fix permission error
* Test pre-built CircleCI Docker image
* Test missing site-packages
* Test missing Python dir
* Add verbose pip list
* Add pre test jobs
* Fix parameter substitution in description
* Fix extraneous run
* Add parametrized flake8 and black jobs
* Fix parameter passing
* Fix unreferenced parameter
* Fix pre-test Docker image tag
* Show xml coverage
* Add pre-test Python cache
* Create tsv directory
* Chown /home to circleci
* Fix store_results path
* Rename pre-test jobs
* Improve CircleCI configuration
  Add Python orb, matrix jobs and rework workflow structure
* Improve CircleCI configuration
  Add Python orb, matrix jobs and rework workflow structure
* Bump up pre-build Python version to 3.9
* Add mypy to pre-build jobs
* Add mypy to build required jobs
* Change pip3 to pip
* Add PR to CHANGELOG.md
* Disable circleci user chowning /home
* Revert "Disable circleci user chowning /home"
  This reverts commit eed32d6f3ab9c2094a642cc23967c536ad5bddb5.
* Disable pyenv creation
* Revert "Disable pyenv creation"
  This reverts commit 68297c21c1c2f4dc67e2bc9bd7972adbeea3878b.
* Disable pyenv creation
* Test pip cache renewal
* Revert "Test pip cache renewal"
  This reverts commit b4772307ded407da0fedfc4320b3594f66d366fa. Cache works as intended, references https://github.com/kylebgorman/wikipron/pull/249#discussion_r511582495.
Co-authored-by: Jackson L. Lee <jacksonlunlee@gmail.com>
* Small path changes on the data side, rework of test_scrape.py (#251)
* rework some paths on data side, simplify test_scrape.py
* revert changes to test_summary.py
* updates changelog
* Adding a sanity check for valid IPA (#248)
* Check that the phones/phonemes are valid IPA.
* Only print the bad characters.
* Updated CHANGELOG.
* Reformatted the file using black.
* Reran black with line length limit.
* Phonemes, rather than phones.
* Sorted the packages alphabetically.
* Re-arranged imports.
* Moved ipapy into data-specific requirements file.
* Adding dependency on absl-py (for logging) and factoring out the phoneme checking functionality into its own function.
* Added a link to IPA chart.
* Removed absl-py.
* Use internal logger.
* Check the logging level.
* Moving to global logger. Thanks Kyle!
* reformatted.
* Cosmetic: fixed warning message.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Style fixes for list_phones (#254)
* Style fixes for list_phones
* Ran black formatter.
* Remove `<5`.
* Negative flags are renamed to positive statements (#141) (#255)
* Negative flags in cli.py are renamed to positive statements. In order to accommodate this change, Wikipron/config.py and tests/test_wikipron/test_config.py are also edited accordingly.
* positive flags are added and negative flags are renamed to positive ones.
* changelog is updated.
* style edit
* fix fix redundancy
Co-authored-by: unknown <Yeonju@NYCMAXASIKKAW10.ad.insidemedia.net>
* Clean up flag help and eliminate remaining double negatives (#257)
* Work on flags:
  1. Flag help should be short, because people don't read it very carefully and it's not formatted for multi-sentence input. This shortens all the flags to a single, consistent name. Because dialect and segmentation require more information, these details have been moved into a prominent position in the README instead.
  2. The tone and space flags are given negative versions, cf. what Yeonju did earlier.
* Eliminates double-negative in skip-spaces.
* Updates changelog.
* Updates tests, config, core.
* Fixed missing test_scrape change.
* Adds test for TSV splitting (#256)
* fixes to split.py and postprocess before adding tests
* cleanup of test_split
* updated a few comments in test_split
* revert needless changes to postprocess and split
* minor comment update in test_split
* updates changelog
* Updates data side to use new flags (#258)
* quick fix to small oversight in test_extract.py
* data side uses new flags
* updates changelog, removes config_factory from text_extract.py
* [ita] Adds phoneme list, filtered phonemic TSV file (#261)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* Added filtered phonemic wordlist
* Added Serbo-Croatian phonemes and filtered TSV files.
* Updated summaries for Serbo-Croatian phones.
* Updated CHANGELOG.
* Fixed formatting of Serbo-Croat phones file and CHANGELOG.
* Updated fork to match upstream.
* Updated fork to match upstream
* Delete .DS_Store
  I don't know where this file came from...
* Delete .DS_Store
* Delete hbs_phonemic_phones.txt
* Delete .DS_Store
* [ita] Adds phoneme list, filtered phonemic TSV file
* Updates CHANGELOG
* Adds updated README and language summary
* Updates CHANGELOG with issue number for Italian phone list
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [ady] Adds phone list, filtered Adyghe data. (#263)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* Added filtered phonemic wordlist
* Added Serbo-Croatian phonemes and filtered TSV files.
* Updated summaries for Serbo-Croatian phones.
* Updated CHANGELOG.
* Fixed formatting of Serbo-Croat phones file and CHANGELOG.
* Updated fork to match upstream.
* Updated fork to match upstream
* Delete .DS_Store
  I don't know where this file came from...
* Delete .DS_Store
* Delete hbs_phonemic_phones.txt
* Delete .DS_Store
* [ita] Adds phoneme list, filtered phonemic TSV file
* Updates CHANGELOG
* Adds updated README and language summary
* Updates CHANGELOG with issue number for Italian phone list
* Adds Adyghe phones, filtered Adyghe data
* Updated CHANGELOG
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Moves `list_phones.py`. (#266)
* Moves `list_phones.py`. Closes #265.
* Add changelog
* [bul] Adds phone list, filtered Bulgarian data (#267)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* Added filtered phonemic wordlist
* Added Serbo-Croatian phonemes and filtered TSV files.
* Updated summaries for Serbo-Croatian phones.
* Updated CHANGELOG.
* Fixed formatting of Serbo-Croat phones file and CHANGELOG.
* Updated fork to match upstream.
* Updated fork to match upstream
* Delete .DS_Store
  I don't know where this file came from...
* Delete .DS_Store
* Delete hbs_phonemic_phones.txt
* Delete .DS_Store
* [ita] Adds phoneme list, filtered phonemic TSV file
* Updates CHANGELOG
* Adds updated README and language summary
* Updates CHANGELOG with issue number for Italian phone list
* Adds Adyghe phones, filtered Adyghe data
* Updated CHANGELOG
* Adds Bulgarian phone list, filtered Bulgarian data
* Postprocesses with filtered Bulgarian data
* Updates changelog
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Adds Icelandic phone list, filtered Icelandic data. (#270)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* Added filtered phonemic wordlist
* Added Serbo-Croatian phonemes and filtered TSV files.
* Updated summaries for Serbo-Croatian phones.
* Updated CHANGELOG.
* Fixed formatting of Serbo-Croat phones file and CHANGELOG.
* Updated fork to match upstream.
* Updated fork to match upstream
* Delete .DS_Store
  I don't know where this file came from...
* Delete .DS_Store
* Delete hbs_phonemic_phones.txt
* Delete .DS_Store
* [ita] Adds phoneme list, filtered phonemic TSV file
* Updates CHANGELOG
* Adds updated README and language summary
* Updates CHANGELOG with issue number for Italian phone list
* Adds Adyghe phones, filtered Adyghe data
* Updated CHANGELOG
* Adds Bulgarian phone list, filtered Bulgarian data
* Postprocesses with filtered Bulgarian data
* Updates changelog
* Adds Icelandic phones, filtered TSV file
* Updates changelog
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [slv] Adds Slovenian phoneme list, filtered TSV data. (#273)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* Added filtered phonemic wordlist
* Added Serbo-Croatian phonemes and filtered TSV files.
* Updated summaries for Serbo-Croatian phones.
* Updated CHANGELOG.
* Fixed formatting of Serbo-Croat phones file and CHANGELOG.
* Updated fork to match upstream.
* Updated fork to match upstream
* Delete .DS_Store
  I don't know where this file came from...
* Delete .DS_Store
* Delete hbs_phonemic_phones.txt
* Delete .DS_Store
* [ita] Adds phoneme list, filtered phonemic TSV file
* Updates CHANGELOG
* Adds updated README and language summary
* Updates CHANGELOG with issue number for Italian phone list
* Adds Adyghe phones, filtered Adyghe data
* Updated CHANGELOG
* Adds Bulgarian phone list, filtered Bulgarian data
* Postprocesses with filtered Bulgarian data
* Updates changelog
* Adds Icelandic phones, filtered TSV file
* Updates changelog
* Adds Slovenian phones, filtered Slovenian data
* Updates changelog
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Adds normalization to `list_phones.py`, corrects bugs relating to `ipapy` (#275)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* Added filtered phonemic wordlist
* Added Serbo-Croatian phonemes and filtered TSV files.
* Updated summaries for Serbo-Croatian phones.
* Updated CHANGELOG.
* Fixed formatting of Serbo-Croat phones file and CHANGELOG.
* Updated fork to match upstream.
* Updated fork to match upstream
* Delete .DS_Store
  I don't know where this file came from...
* Delete .DS_Store
* Delete hbs_phonemic_phones.txt
* Delete .DS_Store
* [ita] Adds phoneme list, filtered phonemic TSV file
* Updates CHANGELOG
* Adds updated README and language summary
* Updates CHANGELOG with issue number for Italian phone list
* Adds Adyghe phones, filtered Adyghe data
* Updated CHANGELOG
* Adds Bulgarian phone list, filtered Bulgarian data
* Postprocesses with filtered Bulgarian data
* Updates changelog
* Adds Icelandic phones, filtered TSV file
* Updates changelog
* Adds Slovenian phones, filtered Slovenian data
* Updates changelog
* Add normalization to list_phones.py
* Updates changelog
* Reformats list_phones.py
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Adds Welsh .phones lists, filtered TSV data (#276)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* Added filtered phonemic wordlist
* Added Serbo-Croatian phonemes and filtered TSV files.
* Updated summaries for Serbo-Croatian phones.
* Updated CHANGELOG.
* Fixed formatting of Serbo-Croat phones file and CHANGELOG.
* Updated fork to match upstream.
* Updated fork to match upstream
* Delete .DS_Store
  I don't know where this file came from...
* Delete .DS_Store
* Delete hbs_phonemic_phones.txt
* Delete .DS_Store
* [ita] Adds phoneme list, filtered phonemic TSV file
* Updates CHANGELOG
* Adds updated README and language summary
* Updates CHANGELOG with issue number for Italian phone list
* Adds Adyghe phones, filtered Adyghe data
* Updated CHANGELOG
* Adds Bulgarian phone list, filtered Bulgarian data
* Postprocesses with filtered Bulgarian data
* Updates changelog
* Adds Icelandic phones, filtered TSV file
* Updates changelog
* Adds Slovenian phones, filtered Slovenian data
* Updates changelog
* Add normalization to list_phones.py
* Updates changelog
* Reformats list_phones.py
* Adds Welsh phoneme lists, filtered Welsh TSV data
* Updates changelog
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [yue] Handle Cantonese for scraping (#277)
* ENH handle Cantonese for scraping
* MAINT update changelog
* DOC explain Cantonese pron XPath template
* Updates `data/phones/README.md` with instructions to re-scrape (#281)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* Added filtered phonemic wordlist
* Added Serbo-Croatian phonemes and filtered TSV files.
* Updated summaries for Serbo-Croatian phones.
* Updated CHANGELOG.
* Fixed formatting of Serbo-Croat phones file and CHANGELOG.
* Updated fork to match upstream.
* Updated fork to match upstream
* Delete .DS_Store
  I don't know where this file came from...
* Delete .DS_Store
* Delete hbs_phonemic_phones.txt
* Delete .DS_Store
* [ita] Adds phoneme list, filtered phonemic TSV file
* Updates CHANGELOG
* Adds updated README and language summary
* Updates CHANGELOG with issue number for Italian phone list
* Adds Adyghe phones, filtered Adyghe data
* Updated CHANGELOG
* Adds Bulgarian phone list, filtered Bulgarian data
* Postprocesses with filtered Bulgarian data
* Updates changelog
* Adds Icelandic phones, filtered TSV file
* Updates changelog
* Adds Slovenian phones, filtered Slovenian data
* Updates changelog
* Add normalization to list_phones.py
* Updates changelog
* Reformats list_phones.py
* Adds Welsh phoneme lists, filtered Welsh TSV data
* Updates changelog
* Updates with instructions to re-scrape
* Updates changelog
* Updates
* Updates data/phones/README.md
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [vie] Adds Vietnamese `.phones` files, `.tsv` files (#283)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* Added filtered phonemic wordlist
* Added Serbo-Croatian phonemes and filtered TSV files.
* Updated summaries for Serbo-Croatian phones.
* Updated CHANGELOG.
* Fixed formatting of Serbo-Croat phones file and CHANGELOG.
* Updated fork to match upstream.
* Updated fork to match upstream
* Delete .DS_Store
  I don't know where this file came from...
* Delete .DS_Store
* Delete hbs_phonemic_phones.txt
* Delete .DS_Store
* [ita] Adds phoneme list, filtered phonemic TSV file
* Updates CHANGELOG
* Adds updated README and language summary
* Updates CHANGELOG with issue number for Italian phone list
* Adds Adyghe phones, filtered Adyghe data
* Updated CHANGELOG
* Adds Bulgarian phone list, filtered Bulgarian data
* Postprocesses with filtered Bulgarian data
* Updates changelog
* Adds Icelandic phones, filtered TSV file
* Updates changelog
* Adds Slovenian phones, filtered Slovenian data
* Updates changelog
* Add normalization to list_phones.py
* Updates changelog
* Reformats list_phones.py
* Adds Welsh phoneme lists, filtered Welsh TSV data
* Updates changelog
* Updates with instructions to re-scrape
* Updates changelog
* Updates
* Updates data/phones/README.md
* Adds Vietnamese phones, Vietnamese TSV files
* Updates changelog
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [hin] Adds `phones` file, updated/new TSV files. (#284)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* Added filtered phonemic wordlist
* Added Serbo-Croatian phonemes and filtered TSV files.
* Updated summaries for Serbo-Croatian phones.
* Updated CHANGELOG.
* Fixed formatting of Serbo-Croat phones file and CHANGELOG.
* Updated fork to match upstream.
* Updated fork to match upstream
* Delete .DS_Store
  I don't know where this file came from...
* Delete .DS_Store
* Delete hbs_phonemic_phones.txt
* Delete .DS_Store
* [ita] Adds phoneme list, filtered phonemic TSV file
* Updates CHANGELOG
* Adds updated README and language summary
* Updates CHANGELOG with issue number for Italian phone list
* Adds Adyghe phones, filtered Adyghe data
* Updated CHANGELOG
* Adds Bulgarian phone list, filtered Bulgarian data
* Postprocesses with filtered Bulgarian data
* Updates changelog
* Adds Icelandic phones, filtered TSV file
* Updates changelog
* Adds Slovenian phones, filtered Slovenian data
* Updates changelog
* Add normalization to list_phones.py
* Updates changelog
* Reformats list_phones.py
* Adds Welsh phoneme lists, filtered Welsh TSV data
* Updates changelog
* Updates with instructions to re-scrape
* Updates changelog
* Updates
* Updates data/phones/README.md
* Adds Vietnamese phones, Vietnamese TSV files
* Updates changelog
* Adds Hindi file, new/updated TSV files
* Updates changelog
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [hbs] Fixes Serbo-Croatian phoneme lists. Re-scrapes data. (#288)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* Added filtered phonemic wordlist
* Added Serbo-Croatian phonemes and filtered TSV files.
* Updated summaries for Serbo-Croatian phones.
* Updated CHANGELOG.
* Fixed formatting of Serbo-Croat phones file and CHANGELOG.
* Updated fork to match upstream.
* Updated fork to match upstream
* Delete .DS_Store
  I don't know where this file came from...
* Delete .DS_Store
* Delete hbs_phonemic_phones.txt
* Delete .DS_Store
* [ita] Adds phoneme list, filtered phonemic TSV file
* Updates CHANGELOG
* Adds updated README and language summary
* Updates CHANGELOG with issue number for Italian phone list
* Adds Adyghe phones, filtered Adyghe data
* Updated CHANGELOG
* Adds Bulgarian phone list, filtered Bulgarian data
* Postprocesses with filtered Bulgarian data
* Updates changelog
* Adds Icelandic phones, filtered TSV file
* Updates changelog
* Adds Slovenian phones, filtered Slovenian data
* Updates changelog
* Add normalization to list_phones.py
* Updates changelog
* Reformats list_phones.py
* Adds Welsh phoneme lists, filtered Welsh TSV data
* Updates changelog
* Updates with instructions to re-scrape
* Updates changelog
* Updates
* Updates data/phones/README.md
* Adds Vietnamese phones, Vietnamese TSV files
* Updates changelog
* Adds Hindi file, new/updated TSV files
* Updates changelog
* Fixes Serbo-Croatian phones
* Updates CHANGELOG
* Revert "Adds Hindi file, new/updated TSV files"
  This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [ofs] Scraped Old Frisian. (#294)
* Add Old Frisian to the configuration.
* Mark "ofs" as ISO639-3 language code.
* Fixed language name.
* Added phonemic pronunciations.
* Updated.
* [aar] Rescraped Afar. (#291)
* [aar] Rescraped.
* Updated.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [dng] Scraped Dungan. (#293)
* [dng] Scraped Dungan.
* Updated.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* [bre] Rescraped Breton. (#292)
* [bre] Rescraped.
* Updated.
Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
* Covering grammar script (#297)
* Adds covering grammar generator, for future QA work. Also moves `list_phones.py` to the src directory, which makes sense to me.
* Changelog update.
* Updates CHANGELOG with issue number.
* Update README.md
  Fix syntax highlighting hints.
* [ltg] Scraped Latgalian. (#296)
* [ltg] Scraped Latgalian.
* Forgot to include the actual data.
* Updated.
* Removes reconstructions (#302)
* Adds covering grammar generator, for future QA work. Also moves `list_phones.py` to the src directory, which makes sense to me.
* Changelog update.
* Updates CHANGELOG with issue number.
* Skips reconstructions during scraping. Then, rescrapes Latin to take advantage of this.
* Adds number to changelog.
* Updates CHANGELOG for >>> junk.
* Rescrapes Armenian. (#303)
  Closes #301.
* [por] Adds phones files, rescraped TSV files. (#304)
* Added French phonemic phones list. Added filter French phonemic tsv.
* Added French phonemic phones.
* Updated Changelog.
* Added phones
* Added filtered phonemic wordlist
* Added Serbo-Croatian phonemes and filtered TSV files.
* Updated summaries for Serbo-Croatian phones.
* Updated CHANGELOG.
* Fixed formatting of Serbo-Croat phones file and CHANGELOG.
* Updated fork to match upstream.
* Updated fork to match upstream
* Delete .DS_Store
  I don't know where this file came from...
* Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog * Add normalization to list_phones.py * Updates changelog * Reformats list_phones.py * Adds Welsh phoneme lists, filtered Welsh TSV data * Updates changelog * Updates with instructions to re-scrape * Updates changelog * Updates * Updates data/phones/README.md * Adds Vietnamese phones, Vietnamese TSV files * Updates changelog * Adds Hindi file, new/updated TSV files * Updates changelog * Fixes Serbo-Croatian phones * Updates CHANGELOG * Revert "Adds Hindi file, new/updated TSV files" This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e. * Adds Portuguese .phones files, re-scraped TSV data * Rescrapes Portuguese data * Updates changelog Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [bur] Adds Burmese phone list, re-scraped Burmese data. (#305) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... 
* Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog * Add normalization to list_phones.py * Updates changelog * Reformats list_phones.py * Adds Welsh phoneme lists, filtered Welsh TSV data * Updates changelog * Updates with instructions to re-scrape * Updates changelog * Updates * Updates data/phones/README.md * Adds Vietnamese phones, Vietnamese TSV files * Updates changelog * Adds Hindi file, new/updated TSV files * Updates changelog * Fixes Serbo-Croatian phones * Updates CHANGELOG * Revert "Adds Hindi file, new/updated TSV files" This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e. * Adds Portuguese .phones files, re-scraped TSV data * Rescrapes Portuguese data * Updates changelog * Adds Burmese phones, updated Burmese data * Updates changelog Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [mdf] Scrape Moksha + slightly more flexible default pron selector. (#295) * Support some Moksha pronunciations that reside under "p", rather than "li". * Scrape. * Attempt to fix the test. * Updated. * Split the PR into two items. * [jpn] Adds Japanese .phones file and updated TSV files (#307) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. 
* Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... * Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog * Add normalization to list_phones.py * Updates changelog * Reformats list_phones.py * Adds Welsh phoneme lists, filtered Welsh TSV data * Updates changelog * Updates with instructions to re-scrape * Updates changelog * Updates * Updates data/phones/README.md * Adds Vietnamese phones, Vietnamese TSV files * Updates changelog * Adds Hindi file, new/updated TSV files * Updates changelog * Fixes Serbo-Croatian phones * Updates CHANGELOG * Revert "Adds Hindi file, new/updated TSV files" This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e. * Adds Portuguese .phones files, re-scraped TSV data * Rescrapes Portuguese data * Updates changelog * Adds Burmese phones, updated Burmese data * Updates changelog * Adds Japanese phone list. Rescrapes Japanese data * Updates changelog * Removes data/tsv/jpn_hira_phonemic.tsv Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * Updates segments version (#308) * updates segments version and adds test for vietnamese tones * updates changelog * [ger] Adds German Phone list, filtered TSV file (#309) * Create German Phonelist * Updated CHANGELOG.md * incorporate updates in README.md, and added missing ger_phone* files * Adds some whitespace to German phone list comments. 
(#310) * [aze] Adds Azerbaijani phone lists and updated TSV data (#312) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... * Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog * Add normalization to list_phones.py * Updates changelog * Reformats list_phones.py * Adds Welsh phoneme lists, filtered Welsh TSV data * Updates changelog * Updates with instructions to re-scrape * Updates changelog * Updates * Updates data/phones/README.md * Adds Vietnamese phones, Vietnamese TSV files * Updates changelog * Adds Hindi file, new/updated TSV files * Updates changelog * Fixes Serbo-Croatian phones * Updates CHANGELOG * Revert "Adds Hindi file, new/updated TSV files" This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e. * Adds Portuguese .phones files, re-scraped TSV data * Rescrapes Portuguese data * Updates changelog * Adds Burmese phones, updated Burmese data * Updates changelog * Adds Japanese phone list. 
Rescrapes Japanese data * Updates changelog * Removes data/tsv/jpn_hira_phonemic.tsv * Adds Azerbaijani phones, updated TSV data * Updates changelog Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [tur] Adds Turkish phone list and updated TSV data (#314) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... * Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog * Add normalization to list_phones.py * Updates changelog * Reformats list_phones.py * Adds Welsh phoneme lists, filtered Welsh TSV data * Updates changelog * Updates with instructions to re-scrape * Updates changelog * Updates * Updates data/phones/README.md * Adds Vietnamese phones, Vietnamese TSV files * Updates changelog * Adds Hindi file, new/updated TSV files * Updates changelog * Fixes Serbo-Croatian phones * Updates CHANGELOG * Revert "Adds Hindi file, new/updated TSV files" This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e. 
* Adds Portuguese .phones files, re-scraped TSV data * Rescrapes Portuguese data * Updates changelog * Adds Burmese phones, updated Burmese data * Updates changelog * Adds Japanese phone list. Rescrapes Japanese data * Updates changelog * Removes data/tsv/jpn_hira_phonemic.tsv * Adds Azerbaijani phones, updated TSV data * Updates changelog * Adds Turkish phones, rescraped Turkish data * Updates changelog Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [afr] adds phone list for Afrikaans and updated TSV files (#316) * adds afr phone list and rescrapes * Updated CHANGELOG.md * [mlt] Adds Maltese phones file and updates TSV data (#318) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... 
* Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog * Add normalization to list_phones.py * Updates changelog * Reformats list_phones.py * Adds Welsh phoneme lists, filtered Welsh TSV data * Updates changelog * Updates with instructions to re-scrape * Updates changelog * Updates * Updates data/phones/README.md * Adds Vietnamese phones, Vietnamese TSV files * Updates changelog * Adds Hindi file, new/updated TSV files * Updates changelog * Fixes Serbo-Croatian phones * Updates CHANGELOG * Revert "Adds Hindi file, new/updated TSV files" This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e. * Adds Portuguese .phones files, re-scraped TSV data * Rescrapes Portuguese data * Updates changelog * Adds Burmese phones, updated Burmese data * Updates changelog * Adds Japanese phone list. Rescrapes Japanese data * Updates changelog * Removes data/tsv/jpn_hira_phonemic.tsv * Adds Azerbaijani phones, updated TSV data * Updates changelog * Adds Turkish phones, rescraped Turkish data * Updates changelog * Adds Maltese phones, updated data * Updates changelog Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * Frequency code tire-kick (#320) * Frequency code tire-kick: 1. Increases typing. 2. No longer overwrites the .tsv files: adds `_freq.tsv` suffix instead. 3. Adds Khmer to JSON config. file. 4. Adds `shared_tasks` subdirectory for targeted config files. 5. Updates README. * [lav] Adds Latvian phone list and updated TSV data (#322) * Added French phonemic phones list. 
Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... * Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog * Add normalization to list_phones.py * Updates changelog * Reformats list_phones.py * Adds Welsh phoneme lists, filtered Welsh TSV data * Updates changelog * Updates with instructions to re-scrape * Updates changelog * Updates * Updates data/phones/README.md * Adds Vietnamese phones, Vietnamese TSV files * Updates changelog * Adds Hindi file, new/updated TSV files * Updates changelog * Fixes Serbo-Croatian phones * Updates CHANGELOG * Revert "Adds Hindi file, new/updated TSV files" This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e. * Adds Portuguese .phones files, re-scraped TSV data * Rescrapes Portuguese data * Updates changelog * Adds Burmese phones, updated Burmese data * Updates changelog * Adds Japanese phone list. 
Rescrapes Japanese data * Updates changelog * Removes data/tsv/jpn_hira_phonemic.tsv * Adds Azerbaijani phones, updated TSV data * Updates changelog * Adds Turkish phones, rescraped Turkish data * Updates changelog * Adds Maltese phones, updated data * Updates changelog * Adds Latvian phones, updated Latvian data Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [khm] Adds Khmer phones and updated TSV data (#327) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... * Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog * Add normalization to list_phones.py * Updates changelog * Reformats list_phones.py * Adds Welsh phoneme lists, filtered Welsh TSV data * Updates changelog * Updates with instructions to re-scrape * Updates changelog * Updates * Updates data/phones/README.md * Adds Vietnamese phones, Vietnamese TSV files * Updates changelog * Adds Hindi file, new/updated TSV files * Updates changelog * Fixes Serbo-Croatian phones * Updates CHANGELOG * Revert "Adds Hindi file, new/updated TSV files" This reverts commit 
964c3becdbf8a4ec35285ffbfbe9a419fad5123e. * Adds Portuguese .phones files, re-scraped TSV data * Rescrapes Portuguese data * Updates changelog * Adds Burmese phones, updated Burmese data * Updates changelog * Adds Japanese phone list. Rescrapes Japanese data * Updates changelog * Removes data/tsv/jpn_hira_phonemic.tsv * Adds Azerbaijani phones, updated TSV data * Updates changelog * Adds Turkish phones, rescraped Turkish data * Updates changelog * Adds Maltese phones, updated data * Updates changelog * Adds Latvian phones, updated Latvian data * Updates changelog * Adds Khmer phones and updated TSV data * Updates changelog Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * Moves Latin phonelist to Classical Latin. (#326) Also undertook a light reorg. * [nob] Adds Østnorsk (Bokmål) phones and updated TSV data (#330) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... 
* Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog * Add normalization to list_phones.py * Updates changelog * Reformats list_phones.py * Adds Welsh phoneme lists, filtered Welsh TSV data * Updates changelog * Updates with instructions to re-scrape * Updates changelog * Updates * Updates data/phones/README.md * Adds Vietnamese phones, Vietnamese TSV files * Updates changelog * Adds Hindi file, new/updated TSV files * Updates changelog * Fixes Serbo-Croatian phones * Updates CHANGELOG * Revert "Adds Hindi file, new/updated TSV files" This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e. * Adds Portuguese .phones files, re-scraped TSV data * Rescrapes Portuguese data * Updates changelog * Adds Burmese phones, updated Burmese data * Updates changelog * Adds Japanese phone list. Rescrapes Japanese data * Updates changelog * Removes data/tsv/jpn_hira_phonemic.tsv * Adds Azerbaijani phones, updated TSV data * Updates changelog * Adds Turkish phones, rescraped Turkish data * Updates changelog * Adds Maltese phones, updated data * Updates changelog * Adds Latvian phones, updated Latvian data * Updates changelog * Adds Khmer phones and updated TSV data * Updates changelog * Adds Østnorsk (Bokmål) phones and updated TSV data * Updates changelog * Fixes typo Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * Add English link to language list for frequencies. 
(#332) * Partial scrape (#334) * scrape up to cantonese * raw partial scrape - excludes yue, rus, cmn * post-processing on partial scrape, src README fix * re-ran generate_summary.py after resolving conflicts * revert comment in scrape.py * updates changelog, resolves formatting error * Updates `data/phones/README.md` (#333) * Added French phonemic phones list. Added filter French phonemic tsv. * Added French phonemic phones. * Updated Changelog. * Added phones * Added filtered phonemic wordlist * Added Serbo-Croatian phonemes and filtered TSV files. * Updated summaries for Serbo-Croatian phones. * Updated CHANGELOG. * Fixed formatting of Serbo-Croat phones file and CHANGELOG. * Updated fork to match upstream. * Updated fork to match upstream * Delete .DS_Store I don't know where this file came from... * Delete .DS_Store * Delete hbs_phonemic_phones.txt * Delete .DS_Store * [ita] Adds phoneme list, filtered phonemic TSV file * Updates CHANGELOG * Adds updated README and language summary * Updates CHANGELOG with issue number for Italian phone list * Adds Adyghe phones, filtered Adyghe data * Updated CHANGELOG * Adds Bulgarian phone list, filtered Bulgarian data * Postprocesses with filtered Bulgarian data * Updates changelog * Adds Icelandic phones, filtered TSV file * Updates changelog * Adds Slovenian phones, filtered Slovenian data * Updates changelog * Add normalization to list_phones.py * Updates changelog * Reformats list_phones.py * Adds Welsh phoneme lists, filtered Welsh TSV data * Updates changelog * Updates with instructions to re-scrape * Updates changelog * Updates * Updates data/phones/README.md * Adds Vietnamese phones, Vietnamese TSV files * Updates changelog * Adds Hindi file, new/updated TSV files * Updates changelog * Fixes Serbo-Croatian phones * Updates CHANGELOG * Revert "Adds Hindi file, new/updated TSV files" This reverts commit 964c3becdbf8a4ec35285ffbfbe9a419fad5123e. 
* Adds Portuguese .phones files, re-scraped TSV data * Rescrapes Portuguese data * Updates changelog * Adds Burmese phones, updated Burmese data * Updates changelog * Adds Japanese phone list. Rescrapes Japanese data * Updates changelog * Removes data/tsv/jpn_hira_phonemic.tsv * Adds Azerbaijani phones, updated TSV data * Updates changelog * Adds Turkish phones, rescraped Turkish data * Updates changelog * Adds Maltese phones, updated data * Updates changelog * Adds Latvian phones, updated Latvian data * Updates changelog * Adds Khmer phones and updated TSV data * Updates changelog * Adds Østnorsk (Bokmål) phones and updated TSV data * Updates changelog * Fixes typo * Update data/phones/README.md * Update changelog Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> * [arm] cleaned up armenian phones (#331) * cleaned up armenian phones * cleaned up armenian phones (with more tidying up) * cleaning up armenian (fixed changelog) I had written the update on the wrong spot on the changelog + I added the issue number * uncommented accidental gaps * uncommented accidental gaps * added voiceless allophones * added missing geminate affricates * reduced branch * reduced branch * final changes for commit to original branch Co-authored-by: Lucas Ashby <lfeashby@gmail.com> Co-authored-by: Alexander Gutkin <35786058+agutkin@users.noreply.github.com> Co-authored-by: Kyle Gorman <kylebgorman@gmail.com> Co-authored-by: Travis Bartley <Travismbartley@gmail.com> Co-authored-by: Jackson L. 
Lee <jacksonlunlee@gmail.com> Co-authored-by: ajmalanoski <71616036+ajmalanoski@users.noreply.github.com> Co-authored-by: Alireza <Alirezasampoor@gmail.com> Co-authored-by: Biswaroop Bhattacharjee <biswaroop08@gmail.com> Co-authored-by: Muhammad Fakhri Putra Supriyadi <fakhriputra123s@gmail.com> Co-authored-by: Ben Fernandes <dev.benfernandes@gmail.com> Co-authored-by: Jim Regan <jaoregan@tcd.ie> Co-authored-by: platipo <enrico.paganin@mail.com> Co-authored-by: yeonju123 <yeonju123@gmail.com> Co-authored-by: unknown <Yeonju@NYCMAXASIKKAW10.ad.insidemedia.net> Co-authored-by: Hossep Dolatian <hovdeov@gmail.com>