

A treebank of Scottish Gaelic based on the Annotated Reference Corpus Of Scottish Gaelic (ARCOSG).


The Scottish Gaelic treebank takes its data from ARCOSG, the Annotated Reference Corpus of Scottish Gaelic (Lamb et al. 2016). The annotation scheme is based on that of the Irish UD treebank. Full bibliographic details are available there.

It contains eight subcorpora, each made up of a varying number of original files of approximately 1000 tokens apiece. Not all of the files are included in release 2.5: the test and dev sets are complete, and the training set will be filled out, hopefully before release 2.6. All files listed below are in the training set unless explicitly marked as test or dev. The ARCOSG documentation gives contributors' names in Gaelic; I have kept these and glossed them with the English forms where those will be familiar to non-Gaelic speakers.

  • Conversation. c01 is in test, c03 in dev and the rest in train. These are transcripts of interviews in the Western Isles from 1998 to 2000. In c03 and c04 speakers 2, 4 and 5 are children.
  • Public interview. p04 is in test, p05 in dev and the rest in train.
  • Sport. s06 is in test, s08 in dev and the rest in train. s01 to s05 are Radio nan Gàidheal commentary on a match between Scotland and Australia; s06 to s10 are commentary on Scotland vs. Yugoslavia.
  • Oral narrative.
    • n01: Na Trì Leinntean Canaich (test)
    • n02: Conall Gulban (dev)
    • n03: Na Fiantaichean
    • n04: Gille an Fheadain Duibh
    • n05: Bodach Ròcabarraigh
    • n06: Iain Beag MacAnndra
    • n07: Fear a' Churracain Ghlais
    • n08: Boban Saor
    • n09: Bean 'ic Odrum
    • n10: Blàr Chàirinis
  • News scripts from Radio nan Gàidheal in the early 1990s.
    • ns01: Màiri Anna NicUalraig (Mary Ann Kennedy)
    • ns02: Dòmhnall Moireasdan
    • ns03: Iseabail NicIllinnein
    • ns04: Innes Rothach
    • ns05: Innes Rothach (test)
    • ns06: Pàdraig MacAmhlaigh (dev)
    • ns07: Dòmhnall Moireasdan (test)
    • ns08: Màiri Anna NicUalraig (dev)
    • ns09: Seumas Domhnallach
    • ns10: Seumas Domhnallach
  • Fiction
    • f01: Am Fainne by Eilidh Watt
    • f02: from Cùmhnantan by Tormod MacGill-Eain
    • f03: Droch Àm by Pòl MacAonghais (test)
    • f04: Spàl Tìm by Cailean T. MacCoinneach
    • f05: Teine a Loisgeas by Eilidh Watt
    • f06: Beul na h-Oidhche by Somhairle MacGill-Eain (Sorley Maclean)
    • f07: from An t-Aonaran by Iain Mac a' Ghobhainn (Iain Crichton Smith)
    • f08: Briseadh na Cloiche by Iain Moireach (dev)
  • Formal prose:
    • fp01: Trì Ginealaichean by D. E. Dòmhnallach
    • fp02: Nua-Bhàrdachd Ghàidhlig by Dòmhnall MacAmhlaigh (Donald MacAulay)
    • fp03: Mairead N. Lachlainn by Somhairle MacGill-Eain (test)
    • fp04: from Bith-eòlas ('Biology') by Ruairidh MacThòmais (Derick Thomson)
    • fp05: Aramach am Bearnaraidh
    • fp06: Blàr a' Chumhaing by Iain A. MacDonald
    • fp07: Na Marbhrannan by Coinneach D. MacDhòmhnaill
    • fp08: Cainnt is Cànan by J. MacInnes
    • fp09: from Dòmhnall Uilleam Stiùbhart (Donald William Stewart)'s unpublished PhD thesis (dev)
  • Popular writing: columns from The Scotsman:
    • pw01: An Cuir am Papa... by Aileig O Hianlaidh (Alex O'Henley)
    • pw02: A bith mar Chorra... by Joina NicDhomnaill (test)
    • pw03: Pàdraig Sellar by Ùisdean MacIllinnein
    • pw04: A' Cur Às Dhuinn Fhìn by Aonghas Mac-a-Phì
    • pw05: Aon Dùthaich by Murchadh MacLeòid
    • pw06: Blas a' Ghuga by Coinneach MacLeòid (dev)
    • pw07: Luchd-ciùil by Criosaidh Dick
    • pw08: Na Gàidheil Ùra by Criosaidh Dick
    • pw09: A' Siubhail gu Rèidh by Tormod Domhnallach (dev)
    • pw10: Poileaticeans by Niall M. Brownlie
    • pw11: Oifigeir Gàidhlig by Aileig O Hianlaidh (test)
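The split assignments listed above can be captured in a small lookup table. The following is a minimal sketch, not part of the treebank tooling: the names `TEST_FILES`, `DEV_FILES` and `split_of` are hypothetical, and the file identifiers are assumed to be written exactly as in the list above. How these identifiers appear inside the released CoNLL-U files is not stated here, so any mapping from sentence IDs to file identifiers is left to the reader.

```python
# Files explicitly marked as test or dev in the list above.
# Every other file is intended for the training set (not all of
# which are included in release 2.5).
TEST_FILES = {"c01", "p04", "s06", "n01", "ns05", "ns07",
              "f03", "fp03", "pw02", "pw11"}
DEV_FILES = {"c03", "p05", "s08", "n02", "ns06", "ns08",
             "f08", "fp09", "pw06", "pw09"}


def split_of(file_id: str) -> str:
    """Return the intended split for an ARCOSG file identifier.

    Hypothetical helper: it does not check that file_id is a real
    ARCOSG file, and simply falls back to "train" for anything that
    is not marked as test or dev.
    """
    if file_id in TEST_FILES:
        return "test"
    if file_id in DEV_FILES:
        return "dev"
    return "train"
```

For example, `split_of("n01")` gives `"test"` and `split_of("f06")` gives `"train"`. Note that a `"train"` result only records the intended split; it does not mean the file is already present in a given release.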


We wish to thank all of the contributors to ARCOSG and fellow Celtic language UD developers Teresa Lynn, Johannes Heinecke and Fran Tyers.


  • Batchelor, Colin. 2019. Universal dependencies for Scottish Gaelic: syntax. In Proceedings of CLTW 2019 at Machine Translation Summit XVII, Dublin, August 2019.
  • Lamb, William, Sharon Arbuthnot, Susanna Naismith, and Samuel Danso. 2016. Annotated Reference Corpus of Scottish Gaelic (ARCOSG), 1997–2016 [dataset]. Technical report, University of Edinburgh; School of Literatures, Languages and Cultures; Celtic and Scottish Studies.
  • Lynn, Teresa and Jennifer Foster. 2016. Universal Dependencies for Irish. CLTW 2016, Paris, France, July 2016.


  • 2019-11-15 v2.5
    • Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.5
License: CC BY-SA 4.0
Includes text: yes
Genre: nonfiction fiction news spoken 
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: converted from manual
Contributors: Batchelor, Colin
Contributing: here