1. Open Texts

Free and open primary texts

SunoikisisDC Digital Classics and Byzantine Studies: Session 1

Date: Monday April 8, 2024. 16:00-17:30 BST = 17:00-18:30 CEST.

Convenors: Monica Berti (Universität Leipzig), Gabriel Bodard (University of London), Martina Filosa (Universität zu Köln)

YouTube link: youtu.be/TXl8Ap5KwW8

Slides: Combined slides (PDF)

Outline

The goal of this introductory session is to present free and open resources that collect data and metadata about primary texts in ancient Greek and Latin.

The session will focus on:

Sources of texts (Wikisource, Gutenberg, Google Books, LACE, Loebulus, Scaife Viewer, Open Greek & Latin (OGL), Latin Library, etc.)
Brief intro to Regular Expressions for cleaning up texts (e.g. stripping XML tags and other editorial artefacts).
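As a minimal sketch of tag stripping, the Python snippet below works on a short, made-up TEI-style line. The <l> and <note> elements and the note text "vel canto" are illustrative assumptions, not taken from a real file. It first deletes notes together with their content, then removes every remaining tag while keeping the text between tags. Regular expressions are not a full XML parser, so nested or unusual markup may need a proper parser instead.

```python
import re

# Hypothetical TEI-style fragment; real files (e.g. from Open Greek & Latin)
# contain many more elements and attributes.
sample = '<l n="1">Arma virumque cano, <note>vel canto</note> Troiae qui primus ab oris</l>'

# Step 1: drop editorial notes together with their content.
# The non-greedy .*? stops at the first </note>; DOTALL lets a note span lines.
no_notes = re.sub(r"<note\b[^>]*>.*?</note>", "", sample, flags=re.DOTALL)

# Step 2: strip every remaining tag but keep the text between tags.
plain = re.sub(r"<[^>]+>", "", no_notes)

# Step 3: collapse runs of spaces or tabs left behind by the deletions.
plain = re.sub(r"[ \t]{2,}", " ", plain).strip()

print(plain)  # → Arma virumque cano, Troiae qui primus ab oris
```

The order matters: if all tags were stripped first, the note text would remain in the output with nothing left to mark it as a note.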

Required readings

Alison Babeu. 2019. "The Perseus Catalog: of FRBR, Finding Aids, Linked Data, and Open Greek and Latin." In M. Berti (ed.), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution. De Gruyter Saur. Pp. 53–72. DOI: https://doi.org/10.1515/9783110599572-005
Samuel J. Huskey. 2019. "The Digital Latin Library: Cataloging and Publishing Critical Editions of Latin Texts." In M. Berti (ed.), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution. De Gruyter Saur. Pp. 19–33. DOI: https://doi.org/10.1515/9783110599572-003

Resources

PerseusDL
The Perseus Catalog
Scaife Viewer Library
Digital Latin Library
Vicifons (Latin Wikisource); Βικιθήκη (Ancient Greek texts in Wikisource)
Gutenberg (Latin) and Gutenberg (Ancient Greek)
Latin texts in HathiTrust
Latin e-books in Internet Archive
Diorisis Ancient Greek Corpus
LACE (OCRed Greek texts)
Regex tutorial from RegexOne
Understanding Regular Expressions (at Programming Historian)
Regex cheatsheet or Quick start

Exercise

For this exercise, you will need a text editor that handles Regular Expressions: for example, the free Atom or Visual Studio Code editors, or Sublime Text with a free trial period.

Work through a Regular Expressions tutorial (e.g. RegexOne; Programming Historian) until you understand the basic syntax.
Find a Greek or Latin text that interests you in Vicifons, Βικιθήκη, Gutenberg (Latin) or Gutenberg (Ancient Greek). Copy the plain text into a blank window in your text editor.
Use Regex to remove any non-text artefacts (line or chapter numbers, notes, annotations, etc.) from the text.
At each step, make a careful note of what you have done, and think about what features you may inadvertently lose through this process. (E.g. numbers that are part of the text; bracketed passages that are not annotations.)
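The cleaning steps above can be sketched in Python as well as in an editor's find-and-replace box. This is a minimal example under stated assumptions: the excerpt is hypothetical, with a line number at the start of some lines and a digit-only footnote marker such as [1]; real pages will use other layouts. Each pattern is deliberately narrow, which illustrates the caution in the last step: anchoring numbers to the start of a line leaves numbers elsewhere alone, and matching only digits inside brackets leaves bracketed words untouched.

```python
import re

# Hypothetical excerpt copied from a web page (layout assumed for illustration).
raw = """1 Arma virumque cano, Troiae qui primus ab oris
Italiam, fato profugus, Laviniaque venit [1]
litora, multum ille et terris iactatus et alto"""

# Line numbers: digits at the very start of a line, followed by spaces/tabs.
# [ \t]+ (not \s+) so that the match can never swallow a line break.
LINE_NUMBER = re.compile(r"^\d+[ \t]+", flags=re.MULTILINE)

# Footnote markers: digits only, inside square brackets, plus any space before.
NOTE_MARKER = re.compile(r"[ \t]*\[\d+\]")

cleaned = LINE_NUMBER.sub("", raw)
cleaned = NOTE_MARKER.sub("", cleaned)
print(cleaned)

# Only digit-only brackets are removed; a bracketed word (illustrative) survives.
print(NOTE_MARKER.sub("", "venit [1] et [hic]"))  # → venit et [hic]
```

A broader pattern such as \d+ or \[.*?\] would be quicker to type, but it would also delete any numbers or bracketed passages that belong to the text itself.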

Optional exercise

Download a linguistically annotated text, either from the Diorisis corpus, or from the Perseus Ancient Greek and Latin Treebank (e.g. Aesop 1–8 (direct link; right-click to download)).
Using the advanced Regex methods you have learnt, (a) create a list of only the lemmas in this text, and save as a new file; (b) create a list of only the original forms in this text, and save as a new file.
Is there any other information you might have lost from these files? Can you think of a way to retain punctuation, for example? Could you get rid of those multiple line breaks between each word? (Or only some of them, as otherwise your text has become one long paragraph!)
Keep a careful note of all processes, and be prepared to report your results back to class.
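One way to approach the optional exercise with regular expressions is sketched below. It assumes a file laid out like the Perseus Treebank XML, with one <word> element per token and the form and lemma stored as attributes; the excerpt itself is hypothetical and simplified (real files carry further attributes, and the Diorisis files may use a different layout, so inspect a file and adapt the patterns). The output file names forms.txt and lemmas.txt are illustrative choices. The last step rebuilds running text from the forms, keeping punctuation attached to the preceding word instead of leaving one token per line.

```python
import re
from pathlib import Path

# Hypothetical, simplified excerpt; attribute names/order assumed from the
# Perseus Treebank layout and may differ in the files you download.
sample = """<sentence id="1">
  <word id="1" form="Ἀετὸς" lemma="ἀετός"/>
  <word id="2" form="καὶ" lemma="καί"/>
  <word id="3" form="ἀλώπηξ" lemma="ἀλώπηξ"/>
  <word id="4" form="," lemma=","/>
</sentence>"""

# Match each <word ...> tag; [^>]* keeps the match inside a single tag.
WORD_TAG = re.compile(r"<word\b[^>]*>")
# Capture one attribute value wherever it appears inside the tag.
FORM = re.compile(r'\bform="([^"]*)"')
LEMMA = re.compile(r'\blemma="([^"]*)"')


def extract(xml: str, attr: re.Pattern) -> list[str]:
    """Return the chosen attribute of every <word> tag, in document order."""
    values = []
    for tag in WORD_TAG.findall(xml):
        match = attr.search(tag)
        if match:
            values.append(match.group(1))
    return values


forms = extract(sample, FORM)
lemmas = extract(sample, LEMMA)

# (a) and (b): one token per line, saved as new files (names are illustrative).
Path("lemmas.txt").write_text("\n".join(lemmas) + "\n", encoding="utf-8")
Path("forms.txt").write_text("\n".join(forms) + "\n", encoding="utf-8")

# Running text: join with spaces, then drop the space before punctuation
# (comma, full stop, semicolon/Greek question mark, middle dot/ano teleia).
running = re.sub(r" ([,.;·\u0387\u037e])", r"\1", " ".join(forms))
print(running)
```

Processing one <sentence> element at a time, and joining the sentences with a single line break, would give one line per sentence rather than one long paragraph. For anything beyond a quick exercise, Python's built-in xml.etree.ElementTree parser is a more robust alternative to regular expressions.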
