Gaelic-Algorithmic-Research-Group/ARCOSG-S

ARCOSG-S

Annotated Corpus of Scottish Gaelic - Simplified

ARCOSG-S is a representative, tagged corpus of Scottish Gaelic, divided into eight registers (four spoken, four written) of approximately 10k words each. The corpus is presented as individual .txt files. It differs from ARCOSG in that it uses less complex tags. For instance, common nouns are tagged in ARCOSG-S simply as 'Nc', rather than with information about number, gender and case (e.g. 'Ncsmn'), as in ARCOSG. The tags were converted automatically from the ARCOSG tags using a mapping file in Python. The ARCOSG tagset has 246 tags; the ARCOSG-S tagset has 41.
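The tag conversion described above can be pictured with a minimal sketch. The actual mapping file and conversion script are not reproduced here, so everything in this example is an assumption: `TAG_MAP` contains only the single 'Ncsmn' to 'Nc' pair mentioned in the text, the function names are hypothetical, the choice to leave unmapped tags unchanged is illustrative, and the sample token is made up rather than taken from the corpus.

```python
# Hypothetical mapping: only the 'Ncsmn' -> 'Nc' pair comes from the README.
# The real mapping file covers all 246 ARCOSG tags.
TAG_MAP = {"Ncsmn": "Nc"}


def simplify_tag(tag, tag_map=TAG_MAP):
    """Return the simplified tag, or the tag unchanged if it is not mapped.

    Falling back to the original tag is an assumption made for this sketch.
    """
    return tag_map.get(tag, tag)


def simplify_line(line, tag_map=TAG_MAP):
    """Rewrite every 'word/TAG' token in a line with its simplified tag."""
    out = []
    for token in line.split():
        # Split on the last '/' so a word containing '/' stays intact.
        word, sep, tag = token.rpartition("/")
        if sep:
            out.append(f"{word}/{simplify_tag(tag, tag_map)}")
        else:
            # Token has no tag separator: pass it through untouched.
            out.append(token)
    return " ".join(out)


# Illustrative input only, not a line from the corpus.
print(simplify_line("facal/Ncsmn agus/Cc"))  # prints: facal/Nc agus/Cc
```

A dictionary lookup keeps the conversion deterministic and easy to audit against the mapping file, which suits a many-to-one reduction such as 246 tags to 41.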

ARCOSG was hand-tagged by Lamb, Arbuthnot and Naismith and separately verified by them. It uses Brown-format tag separators ('/': e.g. 'agus/Cc') and an annotation scheme derived from the Irish PAROLE tagset (Uí Dhonnchadha, E. and van Genabith, J. 2006. A Part-of-Speech tagger for Irish using finite state morphology and constraint grammar disambiguation. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), 2241-2244.).
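As a rough illustration of the Brown-format separator, the sketch below splits 'word/TAG' tokens into (word, tag) pairs. It assumes tokens are separated by whitespace and that the tag follows the last '/' in each token; the function name `parse_brown` is hypothetical and not part of the corpus distribution.

```python
def parse_brown(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST '/', so a word that itself
        # contains '/' keeps its full form.
        word, sep, tag = token.rpartition("/")
        if sep:
            pairs.append((word, tag))
        else:
            # No separator found: keep the token and mark the tag as missing.
            pairs.append((token, None))
    return pairs


print(parse_brown("agus/Cc"))  # prints: [('agus', 'Cc')]
```

To read a whole corpus file, the same function could be applied to the file's contents after opening it with `encoding="utf-8"`, so that accented Gaelic characters are preserved.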

The annotation scheme is described in a PDF included with the data: Lamb, W. and Naismith, S. (2020) Scottish Gaelic Part-of-Speech Annotation Guidelines.

Work towards ARCOSG was funded by Bòrd na Gàidhlig and the Carnegie Trust for the Universities of Scotland.

CITATION: Lamb, William; Arbuthnot, Sharon; Naismith, Susanna; Danso, Samuel (2020). Annotated Reference Corpus of Scottish Gaelic - Simplified (ARCOSG-S), 1997-2020 [dataset]. University of Edinburgh. School of Literatures, Languages and Cultures. Celtic and Scottish Studies.
