# CombinedReader Demo

Combine the Latin and Greek Tesserae corpora behind a single `CombinedReader`, then search both for Caesar.

In [1]:
from latincyreaders import TesseraeReader, GreekTesseraeReader, CombinedReader, AnnotationLevel
from itertools import islice

In [2]:
# Set up individual readers (no NLP model needed for search())
lat = TesseraeReader()
grk = GreekTesseraeReader()

# Combine into a single reader
combined = CombinedReader(lat, grk)

print(combined)
print(f"Total files: {len(combined.fileids())}")

CombinedReader(tesserae=TesseraeReader, greektesserae=GreekTesseraeReader)
Total files: 1720
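The printed repr suggests that `CombinedReader` registers each reader under a name (`tesserae`, `greektesserae`). The fileid filters later in this notebook show those names used as path prefixes. Below is a toy sketch of that idea. It assumes namespacing is a plain `"<name>/<fileid>"` prefix. `ToyCombinedReader` and `ListReader` are hypothetical and are not the latincyreaders implementation.

```python
from typing import Dict, List, Protocol


class HasFileids(Protocol):
    """Anything with a fileids() method, like the readers above."""

    def fileids(self) -> List[str]: ...


class ListReader:
    """Hypothetical stand-in reader that just holds a list of fileids."""

    def __init__(self, fids: List[str]) -> None:
        self._fids = list(fids)

    def fileids(self) -> List[str]:
        return list(self._fids)


class ToyCombinedReader:
    """Hypothetical sketch: merge several named readers into one namespace."""

    def __init__(self, **readers: HasFileids) -> None:
        self.readers: Dict[str, HasFileids] = readers

    def fileids(self) -> List[str]:
        # Prefix every fileid with its reader's name: "<name>/<fileid>"
        return [
            f"{name}/{fid}"
            for name, reader in self.readers.items()
            for fid in reader.fileids()
        ]


toy = ToyCombinedReader(
    tesserae=ListReader(["a.tess", "b.tess"]),
    greektesserae=ListReader(["c.tess"]),
)
print(toy.fileids())
print(f"Total files: {len(toy.fileids())}")
```

Each fileid keeps its own reader's prefix, so the combined count is simply the sum of the per-reader counts.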


## Cross-lingual search: Caesar

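Each reader's `search()` method runs a regular expression over the raw text, so it is fast and loads no NLP model. The Latin and Greek readers are queried separately below. Each hit unpacks as `(fileid, citation, text, matched forms)`.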
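The following is a minimal sketch of what regex search over raw text involves. It assumes passages are available as a mapping from citation to text. `regex_search` and the sample passages are hypothetical and are not how latincyreaders implements `search()`. The yielded tuple mirrors the `(fid, cit, text, matches)` shape unpacked in the next cell.

```python
import re
from itertools import islice
from typing import Dict, Iterator, List, Tuple

# Hypothetical sample corpus: fileid -> {citation: passage text}
SAMPLE: Dict[str, Dict[str, str]] = {
    "sample/a.tess": {
        "<a 1>": "Caesar in Galliam venit.",
        "<a 2>": "Nihil hic de duce dicitur.",
        "<a 3>": "Legati ad Caesarem veniunt et cum Caesare loquuntur.",
    },
}


def regex_search(
    corpus: Dict[str, Dict[str, str]], pattern: str
) -> Iterator[Tuple[str, str, str, List[str]]]:
    """Yield (fileid, citation, text, matched forms) for each matching passage."""
    regex = re.compile(pattern)
    for fid, passages in corpus.items():
        for cit, text in passages.items():
            # group(0) is the whole match, even if the pattern has groups
            matches = [m.group(0) for m in regex.finditer(text)]
            if matches:
                yield fid, cit, text, matches


# As a generator, regex_search is only advanced as far as islice pulls it
for fid, cit, text, matches in islice(regex_search(SAMPLE, r"Caesare?[ms]?\b"), 5):
    print(f"{cit}: {matches}")
```

Only plain string matching is involved here, which is why no spaCy pipeline is needed for this kind of query.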

In [3]:
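# Fast regex search over raw text; no spaCy pipeline needed

print("=== Latin ===")
print()
# Matches Caesar, Caesarem, Caesare (and Caesares); forms such as
# Caesaris or Caesari are not matched by this pattern
for fid, cit, text, matches in islice(lat.search(r"Caesare?[ms]?\b"), 5):
    print(f"{cit}: {text[:100]}...")
    print(f"  Matched forms: {matches}")
    print()

print("=== Greek ===")
print()
# [ιί] covers unaccented and acute forms (Καίσαρ-); the circumflex
# nominative Καῖσαρ is not matched by this pattern
for fid, cit, text, matches in islice(grk.search(r"Κα[ιί]σαρ"), 5):
    print(f"{cit}: {text[:100]}...")
    print(f"  Matched forms: {matches}")
    print()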

=== Latin ===

<amm. 14.1.9>: Novo denique perniciosoque exemplo, idem Gallus ausus est inire flagitium grave, quod Romae cum ulti...
  Matched forms: ['Caesare']

<amm. 14.1.10>: Thalassius vero ea tempestate praefectus praetorio praesens, ipse quoque arrogantis ingenii, conside...
  Matched forms: ['Caesar']

<amm. 14.2.20>: Haec ubi latius fama vulgasset, missaeque relationes assiduae Gallum Caesarem permovissent, quoniam ...
  Matched forms: ['Caesarem']

<amm. 14.7.1>: Latius iam disseminata licentia, onerosus bonis omnibus Caesar, nullum post haec adhibens modum, ori...
  Matched forms: ['Caesar']

<amm. 14.7.9>: Haec subinde Constantius audiens, et quaedam referente Thalassio doctus, quem obisse iam compererat ...
  Matched forms: ['Caesarem']

=== Greek ===

<Ael. Ar. Orat. 7 41.512>: Αὐτοκράτορι Καίσαρι Μάρκῳ Αὐρηλίῳ Ἀντωνίνῳ σεβαστῷ καὶ αὐτοκράτορι Καίσαρι Λουκίῳ Αὐρηλίῳ Κομόδῳ σεβ...
  Matched forms: ['Καίσαρ', 'Καίσαρ']

<app. bc. 1.0.5>: αἱ δὲ στάσεις ἐπὶ τῷδε μάλιστα αὖθις ἐπανελθοῦσαί τε καὶ αὐξηθεῖσαι δυνατώτατα ἐς μέγα προῆλθον, καὶ...
  Matched forms: ['Καίσαρ']

<app. bc. 1.0.6>: ὧδε μὲν ἐκ στάσεων ποικίλων ἡ πολιτεία Ῥωμαίοις ἐς ὁμόνοιαν καὶ μοναρχίαν περιέστη: ταῦτα δʼ ὅπως ἐγ...
  Matched forms: ['Καίσαρ', 'Καίσαρ']

<app. bc. 1.5.40>: ἡγοῦντο δὲ Ῥωμαίων μὲν ὕπατοι Σέξστος τε Ἰούλιος Καῖσαρ καὶ Πόπλιος Ῥουτίλιος Λοῦπος: ἄμφω γὰρ ὡς ἐς...
  Matched forms: ['Καίσαρ', 'Καίσαρ']

<app. bc. 1.5.42>: Γάιος δὲ Πάπιος Νῶλάν τε εἷλεν ἐκ προδοσίας καὶ τοῖς ἐν αὐτῇ Ῥωμαίοις, δισχιλίοις οὖσιν, ἐκήρυξεν, ε...
  Matched forms: ['Καίσαρ', 'Καίσαρ']
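The two patterns above are deliberately narrow. The Latin one skips endings such as `-is` and `-i`. The Greek one skips the circumflex form Καῖσαρ, which appears in the `app. bc. 1.5.40` snippet. The sketch below shows two possible widenings: a Latin pattern listing every case ending, and an accent-insensitive Greek word match that strips diacritics with `unicodedata`. `LATIN_CAESAR`, `fold`, and `greek_word_hits` are hypothetical helpers, not part of latincyreaders.

```python
import re
import unicodedata
from typing import List

# Latin: every case ending of Caesar (sg. and pl.), bounded on both sides
LATIN_CAESAR = re.compile(r"\bCaesar(?:is|i|em|e|es|um|ibus)?\b")


def fold(s: str) -> str:
    """Strip accents, breathings, and case so Καῖσαρ and Καίσαρ compare equal."""
    decomposed = unicodedata.normalize("NFD", s)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare).casefold()


def greek_word_hits(text: str, stem: str) -> List[str]:
    """Return the original words in `text` whose folded form starts with `stem`."""
    target = fold(stem)
    words = re.findall(r"\w+", unicodedata.normalize("NFC", text))
    return [w for w in words if fold(w).startswith(target)]


latin = "Caesaris milites Caesari parent."
print(LATIN_CAESAR.findall(latin))             # widened pattern
print(re.findall(r"Caesare?[ms]?\b", latin))   # pattern used in the cell above

greek = "ὕπατοι Σέξστος τε Ἰούλιος Καῖσαρ καὶ Πόπλιος"
print(greek_word_hits(greek, "Καισαρ"))
```

Prefix matching on a folded stem will also catch other words that share the stem. To stay within the readers' own `search()`, which takes a raw regex in the cell above, the smallest change is to add the circumflex to the class, as in `Κα[ιίῖ]σαρ`. Whether that catches every Unicode variant of the acute depends on how the corpus text is normalized.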

In [4]:
# Namespaced fileids let you filter by corpus

all_fids = combined.fileids()
lat_fids = [f for f in all_fids if f.startswith("tesserae/")]
grk_fids = [f for f in all_fids if f.startswith("greektesserae/")]

print(f"Latin files:  {len(lat_fids)}")
print(f"Greek files:  {len(grk_fids)}")
print(f"Total:        {len(all_fids)}")

Latin files:  900
Greek files:  820
Total:        1720
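The two `startswith` filters require knowing each corpus prefix in advance. An alternative is to group fileids by whatever precedes the first `/`. The sketch below assumes every fileid has the `"<corpus>/<rest>"` form seen above. `group_by_corpus` and the sample fileids are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_corpus(fileids: Iterable[str]) -> Dict[str, List[str]]:
    """Split '<corpus>/<rest>' fileids into {corpus: [rest, ...]}."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for fid in fileids:
        # partition splits on the first "/" only, so nested paths stay intact
        corpus, _, rest = fid.partition("/")
        groups[corpus].append(rest)
    return dict(groups)


# Hypothetical fileids in the namespaced form shown above
fids = ["tesserae/a.tess", "tesserae/sub/b.tess", "greektesserae/c.tess"]
groups = group_by_corpus(fids)
for corpus, members in groups.items():
    print(f"{corpus}: {len(members)}")
```

Applied to `combined.fileids()`, this would yield one key per prefix, with counts that should match the per-corpus totals printed above if every fileid carries one of the two prefixes.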
