Bangor Autoglosser
PHP PLSQL Shell TeX Perl
autoglosser
clauses
cognates
combiwords
dbs
docs
enlist
florian
grammar
gt
histcorpus
includes
insertions
lookups
mc
tex
tiers
unknowns
utils
GNU_Affero_GPL.txt
GNU_GPL.txt
README.md
anonymise_audio.php
anonymise_audio_nowww.php
anonymise_audio_wav.php
append_or.php
apply_cg.php
apply_prepub.php
apply_traced_cg.php
autogloss_only.php
cgimport.php
copy_header.php
corrections.txt
create_cgfinished.php
create_cgutterances.php
create_cgwords.php
create_prepub.php
create_sampleclauses.php
diff_my_files.php
do_everything.php
do_mor.php
fix_um-uh.php
gather_fixes.php
import_and_convert.php
import_only.php
join_tags.php
newlangid.php
osfixes.php
owfixes.php
prepare_file.php
rewrite_utterances.php
tidy_or.php
write_cgautogloss.php
write_cgfinished.php
write_cohorts.php
write_compare_glosses.php
write_dataset.php
write_mysyntax.php
write_tagless.php
writeout_only.php

README.md

## Bangor autoglosser

The code here was produced to POS-tag the conversational corpora assembled by the ESRC Centre for Research on Bilingualism in Theory & Practice at University of Wales Bangor.

The data consisted of bilingual conversational running text. The autoglosser tags it in a single pass, using a separate set of constraint grammar (CG) rules for each language.
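For readers unfamiliar with constraint grammar, the core idea is this: every token starts with a *cohort* of all its possible readings from the dictionary lookup, and contextual rules then discard (REMOVE) or keep (SELECT) readings until, ideally, one remains. The toy sketch below illustrates a single REMOVE-style rule. It is not code from this repository, and the class names, the `remove_if_prev` helper, and the tags (`DET`, `N`, `V`, `en`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    lemma: str
    pos: str   # hypothetical part-of-speech tag, e.g. "DET", "N", "V"
    lang: str  # hypothetical language tag; a bilingual cohort may mix languages


@dataclass
class Cohort:
    word: str
    readings: list = field(default_factory=list)  # candidate Reading objects


def remove_if_prev(cohorts, target_pos, prev_pos):
    """Toy REMOVE rule: drop readings tagged target_pos when the previous
    word is unambiguously prev_pos. As in CG, the last remaining reading
    of a cohort is never removed. Applied in one left-to-right pass."""
    for i in range(1, len(cohorts)):
        prev = cohorts[i - 1].readings
        if len(prev) == 1 and prev[0].pos == prev_pos:
            kept = [r for r in cohorts[i].readings if r.pos != target_pos]
            if kept:  # only apply if at least one reading survives
                cohorts[i].readings = kept


# "the can": "can" is ambiguous between a noun and a verb reading.
sentence = [
    Cohort("the", [Reading("the", "DET", "en")]),
    Cohort("can", [Reading("can", "N", "en"), Reading("can", "V", "en")]),
]
remove_if_prev(sentence, target_pos="V", prev_pos="DET")
print([(r.lemma, r.pos) for r in sentence[1].readings])
```

A real CG engine applies many such rules repeatedly until no more readings change, and its rules can inspect wider contexts and any tag, including the language tag, which is what makes the approach usable on mixed-language text.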

Note that this code is not packaged properly, because much of the work was done ad hoc. (For a smaller, cleaner implementation, try the Gàidhlig autoglosser.) This will hopefully be remedied, at least for Welsh, as part of the work on the new CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes, the National Corpus of Contemporary Welsh).