pytwolc

Experimental two-level rule compilation using Python HFST. For more information, see https://github.com/hfst/python

Rule compiler: twol.py

The Python program twol.py is a rule compiler and tester for rules of the simplified two-level model. See https://pytwolc.readthedocs.io/en/latest/formalism.html for more information on the rule formalism and the compiler. The HFST package can be installed using the command:

$ python3 -m pip install hfst

The program twol.py depends on the TatSu Python parser generator by Juancarlo Añez; see http://tatsu.readthedocs.io/en/stable/index.html for detailed documentation. You can install TatSu with the command:

$ python3 -m pip install tatsu

The program is prepared to handle input in Unicode, including user-perceived graphemes that are composed of two or more Unicode code points. In order to recognize such graphemes, an additional package has to be installed:

$ python3 -m pip install grapheme
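
To illustrate why code points and user-perceived graphemes differ, here is a minimal sketch using only the standard library. The function naive_graphemes is a hypothetical helper written for this illustration; it merely attaches combining marks to the preceding base character and is a simplified approximation, not the segmentation that the grapheme package or twol.py actually performs.

```python
import unicodedata

def naive_graphemes(text):
    """Group each base character with the combining marks that follow it.

    Simplified approximation for illustration only (hypothetical helper);
    full Unicode grapheme segmentation has more rules than this.
    """
    clusters = []
    for ch in text:
        # unicodedata.combining() returns a nonzero class for combining marks
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# 'a' followed by U+0308 COMBINING DIAERESIS is perceived as one letter
word = "ka\u0308si"
print(len(word))                   # number of code points
print(len(naive_graphemes(word)))  # number of perceived graphemes
```

Here the string has five code points but only four perceived graphemes, which is why plain len() is not enough when symbols may be combined characters.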

The compiler needs two files: (1) examples as an FST and (2) a rule file. The human-readable examples must be converted into an FST using the twexamp.py program.

The compiler is normally executed as follows:

$ python3 twol.py examples.fst rules.twolc

One can get more information by using the --help parameter. More documentation on twol.py can be found at https://pytwolc.readthedocs.io/en/latest/compiletest.html

Converting examples from pair string format into an FST: twexamp.py

The module twexamp.py handles various tasks for the compiler during the compilation process. It is also needed for converting human-readable examples into an FST, so that it is not necessary to recompile them at every step of testing rules. A recompilation is only needed when the examples are changed. To convert examples from the pair string format into an FST, you can run, e.g.:

$ python3 twexamp.py examples.pstr examples.fst
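
Since the conversion is only needed when the examples change, one possible way to automate it is a small timestamp check, much like a Makefile rule. The sketch below is not part of pytwolc: the helper names needs_rebuild and convert_if_changed are hypothetical, and it assumes that twexamp.py is in the current directory and is invoked exactly as in the command above.

```python
import os
import subprocess
import sys

def needs_rebuild(source, target):
    """Return True if target is missing or older than source."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)

def convert_if_changed(source="examples.pstr", target="examples.fst"):
    """Re-run twexamp.py only when the pair string file has changed.

    Hypothetical wrapper; assumes twexamp.py is in the current directory.
    """
    if needs_rebuild(source, target):
        subprocess.run([sys.executable, "twexamp.py", source, target],
                       check=True)
        return True
    return False
```

Calling convert_if_changed() before each run of twol.py would then skip the conversion whenever examples.fst is already up to date.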

Morphophonemic representations

The sequence of programs parad2words.py, words2zerofilled.py, zerofilled2raw.py and raw2named.py is intended for determining the underlying or morphophonemic representations for word stems. It starts from a table of word forms or paradigms where morphs are separated from each other e.g. by a period (.). See https://pytwolc.readthedocs.io/en/latest/morphophon.html for more information on their use. Each program is run from the command line, and one can get detailed information on the parameters by running the command with a --help argument, e.g.

$ python3 words2zerofilled.py --help

Some of the programs in this sequence need the package orderedset, which can be installed with:

$ python3 -m pip install orderedset

In particular, the zero-filling program needs the same package for handling combined graphemes that twol.py uses:

$ python3 -m pip install grapheme

The subdirectory parad contains a Makefile and examples which may help in testing and using the programs.

Discovering raw rules: twdiscov.py

This program builds tentative or raw rules out of a set of examples. The examples must be given one example per line as a space-separated list of symbol pairs. See https://pytwolc.readthedocs.io/en/latest/twdiscov.html for more information.
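
As a rough illustration of the input format, the following sketch reads one example line into a list of symbol pairs. The notation is an assumption made for this example and is not confirmed by the text above: a pair is taken to be written as input:output, and a bare symbol is taken to stand for an identical pair. The function parse_pair_line is a hypothetical helper, not part of twdiscov.py; consult the linked documentation for the actual format.

```python
def parse_pair_line(line):
    """Split one example line into (input, output) symbol pairs.

    Assumed notation (not confirmed by the README): 'x:y' is a pair,
    and a bare 'x' is shorthand for the identical pair 'x:x'.
    """
    pairs = []
    for token in line.split():
        insym, sep, outsym = token.partition(":")
        # Without a colon the symbol is paired with itself
        pairs.append((insym, outsym if sep else insym))
    return pairs

# Made-up example line, for illustration only
print(parse_pair_line("k a:ä t"))
```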
