# CombinedReader Demo

Search for Caesar across Latin and Greek Tesserae corpora through a single unified interface.

In [1]:
from latincyreaders import TesseraeReader, GreekTesseraeReader, CombinedReader, AnnotationLevel
from itertools import islice

In [2]:
# Set up individual readers (no NLP model needed for search())
lat = TesseraeReader()
grk = GreekTesseraeReader()

# Combine into a single reader
combined = CombinedReader(lat, grk)

print(combined)
print(f"Total files: {len(combined.fileids())}")

CombinedReader(tesserae=TesseraeReader, greektesserae=GreekTesseraeReader)
Total files: 1720


## Cross-lingual search: Caesar

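A single regex matches inflected forms of Latin *Caesar* and Greek *Καῖσαρ* across both corpora in one `combined.search()` call. The search runs over raw text, so no NLP models are loaded. As written, the pattern covers *Caesar*, *Caesare*, *Caesarem*, and *Caesares* on the Latin side and Greek forms beginning Καισαρ- or Καίσαρ-; it does not match *Caesaris*, *Caesari*, or the circumflex nominative Καῖσαρ. Results are sorted by the local part of the file ID (after the corpus prefix) so that the two corpora interleave.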
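To check which inflected forms the pattern in the following cell covers, it can be run against a short list of candidate forms using only the standard `re` module. This is a minimal sketch: the `broader` pattern and the `matched_forms` helper are hypothetical and not part of `latincyreaders`. Using the broader pattern would change the hit count reported in this notebook, and it would also match derived words such as *Caesarea*.

```python
import re

# Pattern as written in the notebook cell.
original = re.compile(r"Caesare?[ms]?\b|Κα[ιί]σαρ\w*")

# Hypothetical broader pattern (an assumption for illustration):
# any Latin word starting with "Caesar", and a Greek iota that is
# unaccented, acute (tonos U+03AF or oxia U+1F77), or circumflex (U+1FD6).
broader = re.compile(r"\bCaesar\w*|Κα[ι\u03af\u1f77\u1fd6]σαρ\w*")

forms = [
    "Caesar",
    "Caesarem",
    "Caesaris",
    "Caesari",
    "Κα\u03afσαρι",  # Καίσαρι, acute accent
    "Κα\u1fd6σαρ",   # Καῖσαρ, circumflex accent
]


def matched_forms(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [form for form in candidates if pattern.fullmatch(form)]


for name, pattern in [("original", original), ("broader", broader)]:
    print(name, matched_forms(pattern, forms))
```

The accented characters are written as explicit escapes because precomposed Greek letters exist under more than one code point, and a raw-text search only matches the code points that the character class lists.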

In [3]:
# Single multilingual regex: Latin Caesar/Caesare/Caesarem/Caesares
# plus Greek forms with an unaccented or acute iota (Καισαρ-, Καίσαρ-)
# Searches raw text across both corpora, so no spaCy pipeline is needed
pattern = r"Caesare?[ms]?\b|Κα[ιί]σαρ\w*"

# Sort by local fileid (after namespace prefix) so corpora interleave
results = sorted(combined.search(pattern), key=lambda r: r[0].split("/", 1)[1])

print(f"Total hits: {len(results)}")
print()
for fid, cit, text, matches in results[:10]:
    corpus = fid.split("/")[0]
    print(f"[{corpus}] {cit}: {text[:90]}...")
    print(f"  Matched: {matches}")
    print()

Total hits: 3912

[greektesserae] <Ael. Ar. Orat. 7 41.512>: Αὐτοκράτορι Καίσαρι Μάρκῳ Αὐρηλίῳ Ἀντωνίνῳ σεβαστῷ καὶ αὐτοκράτορι Καίσαρι Λουκίῳ Αὐρηλίῳ ...
  Matched: ['Καίσαρι', 'Καίσαρι']

[tesserae] <amm. 14.1.9>: Novo denique perniciosoque exemplo, idem Gallus ausus est inire flagitium grave, quod Roma...
  Matched: ['Caesare']

[tesserae] <amm. 14.1.10>: Thalassius vero ea tempestate praefectus praetorio praesens, ipse quoque arrogantis ingeni...
  Matched: ['Caesar']

[tesserae] <amm. 14.2.20>: Haec ubi latius fama vulgasset, missaeque relationes assiduae Gallum Caesarem permovissent...
  Matched: ['Caesarem']

[tesserae] <amm. 14.7.1>: Latius iam disseminata licentia, onerosus bonis omnibus Caesar, nullum post haec adhibens ...
  Matched: ['Caesar']

[tesserae] <amm. 14.7.9>: Haec subinde Constantius audiens, et quaedam referente Thalassio doctus, quem obisse iam c...
  Matched: ['Caesarem']

[tesserae] <amm. 14.7.10>: Qui cum venisset ob haec festinatis itineribus Antiochiam, pr
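
Because every file ID carries its corpus prefix, the hits can be tallied per corpus. The sketch below assumes only the `(fileid, citation, text, matches)` tuple shape used in the loop above. The sample rows and the `hits_per_corpus` helper are hypothetical stand-ins, so it runs without the corpora installed.

```python
from collections import Counter

# Hypothetical rows shaped like the tuples unpacked in the loop above.
sample_results = [
    ("greektesserae/example_greek.tess", "<Ex. Gr. 1.1>", "...", ["Καίσαρι", "Καίσαρι"]),
    ("tesserae/example_latin.tess", "<ex. lat. 1.1>", "...", ["Caesare"]),
    ("tesserae/example_latin.tess", "<ex. lat. 1.2>", "...", ["Caesar"]),
]


def hits_per_corpus(results):
    """Count matching passages and matched tokens per corpus prefix."""
    passages = Counter()
    tokens = Counter()
    for fid, _cit, _text, matches in results:
        corpus = fid.split("/", 1)[0]
        passages[corpus] += 1
        tokens[corpus] += len(matches)
    return passages, tokens


passages, tokens = hits_per_corpus(sample_results)
print(dict(passages))
print(dict(tokens))
```

Passing the real `results` list in place of `sample_results` should give the per-corpus breakdown of the total hit count.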

In [4]:
# Namespaced fileids let you filter by corpus

all_fids = combined.fileids()
lat_fids = [f for f in all_fids if f.startswith("tesserae/")]
grk_fids = [f for f in all_fids if f.startswith("greektesserae/")]

print(f"Latin files:  {len(lat_fids)}")
print(f"Greek files:  {len(grk_fids)}")
print(f"Total:        {len(all_fids)}")

Latin files:  900
Greek files:  820
Total:        1720
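
The prefix filter above hard-codes the two corpus names. A more general grouping can be sketched by splitting each file ID at the first `/`. The `group_by_namespace` helper and the sample file IDs are hypothetical and assume only the `namespace/local_id` shape shown in this notebook.

```python
from collections import defaultdict


def group_by_namespace(fileids):
    """Group file IDs by the text before the first '/'."""
    groups = defaultdict(list)
    for fid in fileids:
        namespace, _, _local = fid.partition("/")
        groups[namespace].append(fid)
    return dict(groups)


# Hypothetical file IDs standing in for combined.fileids().
sample_fids = [
    "tesserae/example_latin_a.tess",
    "greektesserae/example_greek_a.tess",
    "tesserae/example_latin_b.tess",
]

for namespace, fids in group_by_namespace(sample_fids).items():
    print(f"{namespace}: {len(fids)} files")
```

With this approach, a third reader added to the `CombinedReader` would get its own group without any change to the filtering code.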
