
7 Using Treebanks

Polina Yordanova edited this page Jan 8, 2021 · 27 revisions

Sunoikisis Digital Classics, Fall 2020

Session 7. Using Treebanks

Thursday Nov 19, 15:30 UK = 16:30 CET

Convenors: Francesca Dell’Oro (University of Lausanne), Vanessa Gorman (University of Nebraska-Lincoln), Marja Vierros (Helsinki), Polina Yordanova (Helsinki)

YouTube link:

Slides: Part 1 (Yordanova); Part 2 (Vierros); Part 3 (Dell’Oro)

Session outline

  1. Welcome and intro to treebanking (PY)
  2. Treebanks to study specific linguistic phenomena in Greek papyri (MV & PY)
  3. Stylometric study and authorship attribution on Xenophon (VG)
  4. Automatic annotation of Latin and Greek texts: lemmatisation, morphological analysis and dependency parsing (FDO)
  5. Presentation of exercise (PY)

Seminar readings

  • Francesca Dell'Oro, Helena Bermúdez Sabel & Paola Marongiu. 2020. “Implemented to Be Shared: the WoPoss Annotation of Semantic Modality in a Latin Diachronic Corpus.” Sharing the Experience: Workflows for the Digital Humanities. Proceedings of the DARIAH-CH Workshop 2019. Available:
  • Marja Vierros. 2018. “Linguistic Annotation of the Digital Papyrological Corpus: Sematia.” In Nicola Reggiani (ed.), Digital Papyrology II: Case Studies on the Digital Edition of Ancient Greek Papyri. Berlin, Boston: De Gruyter. Pp. 105–118. Available:

[For discussion in this forum thread]

Further reading

  • Giuseppe Celano. 2014. “A Computational Study on Preverbal and Postverbal Accusative Object Nouns and Pronouns in Ancient Greek.” The Prague Bulletin of Mathematical Linguistics 101, pp. 97–110. Available:
  • Vanessa B. Gorman & Robert J. Gorman. 2016. “Approaching Questions of Text Reuse in Ancient Greek Using Computational Syntactic Stylometry.” Open Linguistics 2, pp. 500–510. Available:
  • Robert J. Gorman. 2019. “Author Identification of Short Texts Using Dependency Treebanks without Vocabulary.” Digital Scholarship in the Humanities. Available:
  • Dag Haug. 2015. “Treebanks in historical linguistic research.” In Carlotta Viti (ed.), Perspectives on Historical Syntax. Benjamins, pp. 188–202.
  • Alek Keersmaekers et al. 2019. “Creating, Enriching and Valorising Treebanks of Ancient Greek: the ongoing Pedalion-project.” Paris. Available:
  • Francesco Mambrini. 2019. “Nominal vs Copular Clauses in a Diachronic Corpus of Ancient Greek Historians.” Journal of Greek Linguistics 19, pp. 90–113. Available:
  • Francesco Mambrini & Marco Passarotti. 2016. “Subject-Verb Agreement with Coordinated Subjects in Ancient Greek. A Treebank-Based Study.” Journal of Greek Linguistics 16 (2016:1), pp. 87–116. Available:
  • Marco Passarotti. 2019. “The Project of the Index Thomisticus Treebank.” In Monica Berti (ed.), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution. De Gruyter, pp. 299–320. Available:


In groups, attempt one of the following exercises:

  1. Familiarise yourself with the Paratypa Variations tool.

    • Writers of papyrus texts occasionally wrote /υ/ in place of the diphthong /οι/. This spelling reflects how the diphthong was pronounced. Use the Paratypa tool to find how many instances there are where /οι/ has been written as /υ/…
      1. anywhere within a word
      2. preceded by /ν/ and followed by /κ/? Which word is the most common, and is there a common factor in these cases?
      3. at the beginning of the word? How many of these are the masculine plural article οἱ?
      4. at the end of the word?
      5. how many of the instances in each of the previous queries are dated BCE?
      6. (extra) try the regex mode to search simultaneously for this variation preceded by any stop (κ, π, τ) or by its aspirated (χ, φ, θ) or voiced (γ, β, δ) counterpart.
  2. Second option:

    • Use one or more of the following resources to learn about the Iliados Structural Search tool:
    • In Ancient Greek, neuter plural subjects trigger either singular or plural agreement on the verb. Singular agreement is thought to be a relic of an old Indo-European collective number. How often does it occur in Homer? And in Aeschylus? Which agreement pattern is more frequent? Use Iliados to answer these questions. Remember to:
      • formulate the problem as clearly as possible
      • define the features of the treebank tokens that you want to consider (number, gender, syntactic relation...)
      • find out how to build a query that selects these features using the syntax of the query language implemented by Iliados
  3. Optional advanced exercise (automatic annotation):

    • Visit the WoPoss GitHub repository and click the Launch Binder button to launch the code environment in your browser. It may take a few minutes to load; you can then select either the Greek.ipynb or the Latin.ipynb notebook. Follow the in-line instructions, run the code snippets, and see if you can work out how the annotations are being generated. How good are the results? How do they compare to the manual annotation of morphology and syntax in Arethusa?
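For exercise 2, the logic of the query can be sketched outside Iliados. The Python sketch below does not use the Iliados query language.

It assumes treebank data in the XML format of the Ancient Greek Dependency Treebank (AGDT). In that format each `word` element carries `id`, `head`, `relation` and a 9-character `postag`. The nine positions are part of speech, person, number, tense, mood, voice, gender, case and degree.

The helper `count_neuter_plural_agreement` and the two toy sentences are hypothetical and illustrative only. They are not drawn from Homer or Aeschylus.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two toy sentences in AGDT-style XML (illustrative only, not treebank data).
# postag positions: 1 part of speech, 2 person, 3 number, 4 tense,
# 5 mood, 6 voice, 7 gender, 8 case, 9 degree.
SAMPLE = """
<treebank>
  <sentence id="1">
    <word id="1" form="τὰ" lemma="ὁ" postag="l-p---nn-" head="2" relation="ATR"/>
    <word id="2" form="ζῷα" lemma="ζῷον" postag="n-p---nn-" head="3" relation="SBJ"/>
    <word id="3" form="τρέχει" lemma="τρέχω" postag="v3spia---" head="0" relation="PRED"/>
  </sentence>
  <sentence id="2">
    <word id="1" form="τὰ" lemma="ὁ" postag="l-p---nn-" head="2" relation="ATR"/>
    <word id="2" form="ζῷα" lemma="ζῷον" postag="n-p---nn-" head="3" relation="SBJ"/>
    <word id="3" form="τρέχουσι" lemma="τρέχω" postag="v3ppia---" head="0" relation="PRED"/>
  </sentence>
</treebank>
"""


def count_neuter_plural_agreement(xml_text: str) -> Counter:
    """Tally the number of the verb governing each neuter plural subject (hypothetical helper)."""
    counts: Counter = Counter()
    root = ET.fromstring(xml_text.strip())
    for sentence in root.iter("sentence"):
        # Index the sentence's tokens by id so that heads can be looked up.
        words = {w.get("id"): w for w in sentence.iter("word")}
        for word in words.values():
            tag = word.get("postag", "")
            is_subject = word.get("relation") == "SBJ"
            # Index 2 = number ('p' plural), index 6 = gender ('n' neuter).
            if not (is_subject and len(tag) == 9 and tag[2] == "p" and tag[6] == "n"):
                continue
            head = words.get(word.get("head"))
            head_tag = head.get("postag", "") if head is not None else ""
            # Only count subjects whose head is a verb; read the verb's number.
            if len(head_tag) == 9 and head_tag[0] == "v":
                number = {"s": "singular", "p": "plural"}.get(head_tag[2], "other")
                counts[number] += 1
    return counts


counts = count_neuter_plural_agreement(SAMPLE)
print(counts)
```

The sketch only matches tokens labelled exactly `SBJ`. It would therefore miss subjects whose label carries a suffix (for example coordinated subjects marked with `_CO`). It also skips subjects whose head is not itself a verb. A real query, in Iliados or elsewhere, would need an explicit decision on how to treat such cases.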

[If you have any technical or practical problems with any of these exercises, you can ask questions in this forum thread, where the tutors or your peers may be able to help]
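For the optional exercise 3, one common way to judge how good the automatic results are is to compare them token by token with a manual annotation, such as one made in Arethusa.

Dependency parsers are usually evaluated with two scores:
- UAS: the share of tokens assigned the correct head.
- LAS: the share of tokens assigned both the correct head and the correct relation label.

The sketch below is a minimal illustration. It assumes both annotations share the same tokenisation, tagset and label scheme. If one uses AGDT labels and the other Universal Dependencies, the labels would need mapping first. The `Token` class, the `evaluate` helper and the Latin toy sentence are hypothetical and are not part of the WoPoss notebooks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    """One token of a (hypothetical) annotated sentence."""
    form: str
    lemma: str
    pos: str
    head: int       # position of the syntactic head (0 = sentence root)
    relation: str   # dependency label, here AGDT-style (SBJ, OBJ, PRED)


def evaluate(gold: list[Token], predicted: list[Token]) -> dict[str, float]:
    """Compare an automatic annotation with a manual (gold) one, token by token."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("this sketch assumes identical, non-empty tokenisation")
    pairs = list(zip(gold, predicted))
    n = len(pairs)
    return {
        "lemma_accuracy": sum(g.lemma == p.lemma for g, p in pairs) / n,
        "pos_accuracy": sum(g.pos == p.pos for g, p in pairs) / n,
        # UAS: correct head, regardless of the label
        "UAS": sum(g.head == p.head for g, p in pairs) / n,
        # LAS: correct head and correct label
        "LAS": sum(g.head == p.head and g.relation == p.relation for g, p in pairs) / n,
    }


# Toy Latin sentence "puella rosam amat" (illustrative, not real parser output).
GOLD = [
    Token("puella", "puella", "noun", 3, "SBJ"),
    Token("rosam", "rosa", "noun", 3, "OBJ"),
    Token("amat", "amo", "verb", 0, "PRED"),
]
PREDICTED = [
    Token("puella", "puella", "noun", 3, "SBJ"),
    Token("rosam", "rosam", "noun", 3, "SBJ"),  # wrong lemma and wrong label
    Token("amat", "amo", "verb", 0, "PRED"),
]

scores = evaluate(GOLD, PREDICTED)
print(scores)
```

On the toy data, lemma accuracy and LAS are 2/3, while POS accuracy and UAS are 1.0. The parser found the right head for *rosam* but gave it the wrong label.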