A repository containing links to useful phonological software. The goal is to provide a one-stop shop for phonology-related software.
This repository is maintained by Hossep Dolatian and Connor Mayer.
- Software described in published papers or conference proceedings is preferred.
- Please provide a link to an implementation. This could be a GitHub repo or some other website hosting the code.
- Please place your resource in the appropriate section. If there is no topic you feel adequately describes your software, you can add a new topic or subtopic.
- Please indicate which programming language the software is written in.
The best way to add a new resource is to create a pull request via the GitHub user interface:
- Click the edit button in the top right corner of this file.
- This will let you edit the Markdown of the file. See here for information about Markdown syntax. Add your entry.
- Click the "Preview changes" tab at the top of the page to verify that your entry is formatted correctly.
- Once you're happy with your changes, scroll to the bottom of the page to the box labeled "Commit changes". Add a name for your proposed change and, if you wish, a description. Select "Create a new branch for this commit and start a pull request" and then click "Propose file change".
If you are uncomfortable using GitHub, you can email cjmayer@uci.edu or hossep.dolatian@alumni.stonybrook.edu with your proposed changes.
- PhonoApps: A suite of phonological tools for finding natural classes, deriving surface forms from underlying forms given a list of rules, and other functionality.
- Phonomaton: A tool for deriving surface forms from underlying forms. Supports morphological rules and autosegmental tiers.
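To illustrate what "deriving surface forms from underlying forms given a list of rules" involves, here is a minimal sketch of ordered rule application. It is not the code or input format of PhonoApps or Phonomaton; the two rules, the `#` boundary convention, and the `derive` function are hypothetical, and each rule is written as a regular expression whose lookarounds encode the context of an `A -> B / C _ D` rule.

```python
import re

# Hypothetical toy rules for illustration: (name, pattern, replacement).
# Lookarounds encode the rule's context so that only the target segment is rewritten.
RULES = [
    ("intervocalic voicing", r"(?<=[aeiou])t(?=[aeiou])", "d"),  # t -> d / V _ V
    ("final devoicing", r"d(?=#)", "t"),                        # d -> t / _ #
]

def derive(underlying, rules=RULES):
    """Apply ordered rewrite rules to an underlying form and return the surface form."""
    form = "#" + underlying + "#"  # mark word boundaries
    for _name, pattern, replacement in rules:
        # Each rule applies simultaneously to every matching position in the form.
        form = re.sub(pattern, replacement, form)
    return form.strip("#")

print(derive("pata"), derive("pad"), derive("patad"))
```

Under these assumptions, `/pata/` surfaces as `[pada]` and `/pad/` as `[pat]`. The listed tools support much richer machinery (for example, morphological rules and autosegmental tiers in Phonomaton).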
- Hidden Structure Suite: A suite of constraint-based learning algorithms for phonological hidden structure (feet and underlying representations).
- OTSoft: A Windows program that implements several constraint ranking/weighting procedures, as well as other useful procedures.
- OT-Help: A platform-independent downloadable program for finding constraint weightings and constraint rankings, and for calculating typological predictions in both serial and parallel versions of Optimality Theory and Harmonic Grammar.
- Hayes and Wilson learner: A Java program that learns MaxEnt phonotactic grammars from positive data.
- Lexically-Scaled-MaxEnt: A Python implementation for learning lexically-scaled MaxEnt grammars.
- maxent.ot: An R package for fitting and evaluating MaxEnt OT grammars.
- Maxent Grammar Tool: A Java tool for fitting MaxEnt grammars.
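Several of the tools above fit or evaluate MaxEnt grammars. As a rough sketch of what such a grammar computes (not the interface of any listed tool), each candidate's probability is proportional to `exp(-penalty)`, where the penalty is the weighted sum of its constraint violations. The toy tableau, constraint names, and weights below are hypothetical.

```python
import math

def maxent_probabilities(violations, weights):
    """Compute candidate probabilities for one input under a MaxEnt grammar.

    violations: dict mapping each candidate to its list of violation counts,
                one count per constraint (same order as `weights`).
    weights:    list of non-negative constraint weights.
    """
    # Penalty (weighted sum of violations) for each candidate.
    penalties = {
        cand: sum(w * v for w, v in zip(weights, counts))
        for cand, counts in violations.items()
    }
    # P(cand) = exp(-penalty) / Z, where Z sums exp(-penalty) over all candidates.
    z = sum(math.exp(-p) for p in penalties.values())
    return {cand: math.exp(-p) / z for cand, p in penalties.items()}

# Hypothetical tableau for /pat/ with constraints [NoCoda, Max] and made-up weights.
tableau = {"pat": [1, 0], "pa": [0, 1]}
probs = maxent_probabilities(tableau, weights=[2.0, 1.0])
print(probs)
```

Fitting tools work in the opposite direction: they estimate the weights from observed output frequencies.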
This includes resources that are designed to learn or model specific aspects of phonology, such as features, natural classes, or phonotactics.
- Distributional learner: A Python program that learns phonological classes from distributional information.
- Featurizer: A Python program that learns phonological feature systems from a set of input classes.
- BUFIA: A non-stochastic learning algorithm which returns phonotactic constraints over a representation, using phonological features and logical abduction.
- Phonotactic Language Model: A Python program that learns phonotactics using recurrent neural networks.
This includes resources that are focused on learning or modeling general formal grammars (FSAs, FSTs, etc.). These grammars are not designed for any individual phonological phenomenon (such as feature learning or phonotactic learning). They can be freely adapted or used for these more specific tasks.
- BMRS: A Python implementation of Boolean Monadic Recursive Schemes, essentially a logic-based transducer.
- Language Toolkit: A Haskell library and DSL for constructing, factoring, and learning subregular stringsets.
- Pyfoma: A Python implementation for creating and learning finite-state machines.
- Pynini: A Python implementation for creating and using finite-state machines (weighted and unweighted), with support for rewrite rules.
- SigmaPie: A Python library for subregular (SL, TSL, MTSL, SP) and subsequential (OSTIA) learning algorithms. It supports scanning, sample generation, and conversion between negative and positive grammars.
- 2IMTSL: A Python implementation for learning MITSL(2,2) grammars.
- OSTIA: A Python implementation of the OSTIA (Onward Subsequential Transducer Inference Algorithm) learning algorithm.
- pTSL: A Python program for implementing and fitting probabilistic tier-based strictly local grammars.
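Several of the libraries above learn strictly local (SL) grammars, scan strings against them, and convert between positive and negative grammars. The following is a minimal, self-contained sketch of those three operations for SL-k stringsets. It does not reproduce the API of SigmaPie or any other listed tool; the `#` boundary symbol and all function names are assumptions made for illustration.

```python
from itertools import product

BOUNDARY = "#"

def factors(word, k=2):
    """Return the k-factors of a word padded with k-1 boundary symbols on each side."""
    padded = BOUNDARY * (k - 1) + word + BOUNDARY * (k - 1)
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}

def learn_positive(sample, k=2):
    """Positive SL-k grammar: every k-factor attested in the sample."""
    grammar = set()
    for word in sample:
        grammar |= factors(word, k)
    return grammar

def to_negative(positive, alphabet, k=2):
    """Negative grammar: every possible k-factor that was not attested."""
    symbols = sorted(set(alphabet) | {BOUNDARY})
    return {"".join(f) for f in product(symbols, repeat=k)} - positive

def scan(word, negative, k=2):
    """A word is well-formed iff it contains no banned k-factor."""
    return not (factors(word, k) & negative)

neg = to_negative(learn_positive(["ab", "ba", "aab"]), {"a", "b"})
print(sorted(neg), scan("aaba", neg), scan("abba", neg))
```

With this toy sample, the only banned 2-factors are `bb` and `##`, so `aaba` is accepted and `abba` is rejected.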
- Phonology Problem Set Generator: A tool for converting CSVs containing phonological data to problem set PDFs.
- PhonoGenesis: A tool for generating toy phonology data sets with specific properties.
This includes resources that collect cross-linguistic phenomena alongside either a) formal grammars or b) enough annotation to facilitate in-depth phonological analysis. The data can be from real languages or toy languages.
- DoReCo: A cross-linguistic speech corpus that is word-aligned and phone-aligned. It can be used as a general dataset for corpus phonetics.
- OpenSLR: A cross-linguistic directory of open-access speech corpora.
- Open Speech Corpora: A GitHub repository that serves as a directory of open-access speech corpora.
- RedTyp: A cross-linguistic database of reduplication patterns, along with a finite-state implementation (2-way FSTs), comparable to the Graz Database on Reduplication.
- StressTyp: A cross-linguistic database of stress patterns, along with a finite-state implementation (FSAs).
- Talking Dictionaries: A collection of online dictionaries of endangered languages, with audio files.
- Turkish Electronic Living Lexicon or TELL: A Turkish dictionary with audio files and morphological segmentation.
- UCLA Phonetics Lab Archive: A cross-linguistic database of recorded word lists and other material.
- WikiPron: A cross-linguistic database of IPA transcriptions extracted from Wiktionary.
- Datasets from the Lexically-Scaled-MaxEnt learner.
This includes resources that you can use to work on a written or speech corpus.
These are tools that can quantify or analyze phonological aspects of a corpus.
- Vowel Harmony Calculator: A tool for quantifying vowel co-occurrence (harmony and disharmony) in a given corpus.
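As a rough illustration of quantifying vowel co-occurrence in a corpus, the sketch below projects each word's vowels onto a tier, classifies each pair of tier-adjacent vowels as harmonic or disharmonic for one feature ([back]), and reports the proportion of harmonic pairs. This is not necessarily the measure the Vowel Harmony Calculator uses. The Turkish-like backness table, the toy word list, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical backness classification (Turkish-like inventory, for illustration only).
BACKNESS = {
    "i": "front", "e": "front", "ü": "front", "ö": "front",
    "ı": "back", "a": "back", "u": "back", "o": "back",
}

def vowel_tier(word):
    """Project the vowels of a word, skipping all other segments."""
    return [seg for seg in word if seg in BACKNESS]

def harmony_counts(corpus):
    """Count tier-adjacent vowel pairs that agree or disagree in backness."""
    counts = Counter()
    for word in corpus:
        tier = vowel_tier(word)
        for v1, v2 in zip(tier, tier[1:]):
            key = "harmonic" if BACKNESS[v1] == BACKNESS[v2] else "disharmonic"
            counts[key] += 1
    return counts

def harmony_index(corpus):
    """Proportion of vowel pairs that are harmonic (0.0 if there are no pairs)."""
    counts = harmony_counts(corpus)
    total = counts["harmonic"] + counts["disharmonic"]
    return counts["harmonic"] / total if total else 0.0

corpus = ["kitap", "evler", "okul"]
print(harmony_counts(corpus), harmony_index(corpus))
```

A practical measure would usually compare observed counts against what vowel frequencies alone would predict. The raw proportion computed here is only the simplest starting point.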
Forced alignment tools can substantially reduce annotation time. Different aligners offer pretrained alignment models for certain languages, and some let you train your own model on a new language. To learn more about a specific aligner, refer to its repository.