lexical taggers (language of origin, lemmatizer) for Sahidic Coptic
Repository contents: language-tagger/, README.md, _enrich.pl, _enrich_no_encode.pl

lexical taggers for Sahidic Coptic

Includes a tagger for language of origin. (The lemmatizer is now integrated into the part-of-speech tagger at https://github.com/CopticScriptorium/tagger-part-of-speech.)

_enrich.pl: a script to be used with the lexicon file in each subdirectory (e.g., lexicon.txt in the language-tagger subdirectory for language-of-origin tagging).

Usage: _enrich.pl [optional args] <IN_FILE>

Options and arguments:

-h  print this [h]elp message and quit
-l  [l]exicon file (required). Defaults to lexicon.txt in the same directory.

<IN_FILE>  A text file with one category per line; only the text up to the first tab is used for lexicon lookup.

Example: _enrich.pl -l language-tagger/lexicon.txt my_file.txt
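The lookup described above (the key is the text up to the first tab, matched against the lexicon) can be sketched in Python for readers who want to see the idea outside Perl. This is not a port of _enrich.pl. The lexicon format (form, tab, value), the choice to append the match as a new tab-separated column, the "_" placeholder for unmatched lines, and the function names load_lexicon, enrich_lines and enrich_file are all hypothetical assumptions.

```python
from typing import Dict, Iterable, List


def load_lexicon(lines: Iterable[str]) -> Dict[str, str]:
    """Build a lookup table from lexicon lines.

    ASSUMPTION: each entry looks like "form<TAB>value"; lines without a tab
    (blank or malformed) are skipped.
    """
    lexicon: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # skip blank or malformed entries
        form, value = line.split("\t", 1)
        lexicon.setdefault(form, value)  # first entry wins on duplicate forms
    return lexicon


def enrich_lines(lines: Iterable[str], lexicon: Dict[str, str],
                 missing: str = "_") -> List[str]:
    """Append the lexicon value for each input line as a new tab column.

    Only the text up to the first tab is used as the lookup key.
    ASSUMPTION: unmatched lines receive the hypothetical placeholder `missing`.
    """
    enriched = []
    for line in lines:
        line = line.rstrip("\r\n")
        key = line.split("\t", 1)[0]  # text up to the first tab
        enriched.append(f"{line}\t{lexicon.get(key, missing)}")
    return enriched


def enrich_file(in_path: str, lexicon_path: str = "lexicon.txt") -> List[str]:
    """File-based wrapper, loosely analogous to `_enrich.pl -l <lexicon> <IN_FILE>`.

    Note: the default lexicon path here is relative to the current working
    directory, not the script's directory.
    """
    with open(lexicon_path, encoding="utf-8") as lex_file:
        lexicon = load_lexicon(lex_file)
    with open(in_path, encoding="utf-8") as in_file:
        return enrich_lines(in_file, lexicon)
```

For the command-line example above, the rough equivalent would be print("\n".join(enrich_file("my_file.txt", "language-tagger/lexicon.txt"))). Keeping the lookup logic in functions that accept any iterable of lines makes it easy to test without touching the file system.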

The language tagger now includes lexical entries provided by the Database and Dictionary of Greek Loanwords in Coptic (DDGLC). We thank the DDGLC and its director, Dr. Tonio Sebastian Richter, for this collaboration.

Perl script copyright 2013-16 Amir Zeldes, Caroline T. Schroeder. The Perl program is free software. You may copy or redistribute the script under the same terms as Perl itself.

Additional material copyright 2013-16 Amir Zeldes, Caroline T. Schroeder, Elizabeth Davidson: this is free software distributed under the GNU General Public License v3 (http://www.gnu.org/licenses/gpl.html). You are welcome to distribute it under the conditions outlined in the license.