Python parser for EpiDoc (epigraphic documents in TEI XML).
For example, idp.data-sheet uses the parser to generate a single CSV sheet from the Papyri.info Integrating Digital Papyrology data.
Install the package
pip install git+https://github.com/Xennis/epidoc-parser
Load a document from a file
import epidoc
with open("my-epidoc.xml") as f:
    doc = epidoc.load(f)
Load a document from a string
import epidoc
my_epidoc = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.stoa.org/epidoc/schema/8.13/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="hgv74005">
[...]
</TEI>
"""
doc = epidoc.loads(my_epidoc)
Access the attributes of the parsed document, for example:
>>> doc.title
"Ordre de paiement"
>>> doc.material
"ostrakon"
>>> doc.languages
{"en": "Englisch", "la": "Latein", "el": "Griechisch"}
>>> [t.get("text") for t in doc.terms]
["Anweisung", "Zahlung", "Getreide"]
>>> doc.origin_place.get("text")
"Kysis (Oasis Magna)"
>>> doc.origin_dates[0]
{"notbefore": "0301", "notafter": "0425", "precision": "low", "text": "IV - Anfang V"}
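
Since these attributes are plain strings, dicts, and lists, they can be flattened into a CSV row, which is roughly the kind of task idp.data-sheet is described as doing. The following is only a sketch: `to_row` and the column names are hypothetical, and a `SimpleNamespace` holding the example values above stands in for the object returned by `epidoc.load` so that the snippet runs without the package installed.

```python
import csv
import io
from types import SimpleNamespace

# Stand-in for a parsed document, filled with the example values shown above.
# In practice this would be the object returned by epidoc.load(f).
doc = SimpleNamespace(
    title="Ordre de paiement",
    material="ostrakon",
    languages={"en": "Englisch", "la": "Latein", "el": "Griechisch"},
    terms=[{"text": "Anweisung"}, {"text": "Zahlung"}, {"text": "Getreide"}],
    origin_place={"text": "Kysis (Oasis Magna)"},
    origin_dates=[
        {"notbefore": "0301", "notafter": "0425", "precision": "low", "text": "IV - Anfang V"}
    ],
)


def to_row(doc):
    """Flatten selected attributes into a dict of plain strings (hypothetical helper)."""
    first_date = doc.origin_dates[0] if doc.origin_dates else {}
    return {
        "title": doc.title or "",
        "material": doc.material or "",
        # Keep only the language codes, sorted for a stable column value.
        "languages": ";".join(sorted(doc.languages)),
        "terms": ";".join(t.get("text", "") for t in doc.terms),
        "origin_place": doc.origin_place.get("text", ""),
        "not_before": first_date.get("notbefore", ""),
        "not_after": first_date.get("notafter", ""),
    }


row = to_row(doc)

# Write a header line and one data line to an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```

Multi-valued fields are joined with `;` here purely for illustration; a real sheet may choose a different separator or one column per value.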
Field | EpiDoc source element (XPath) |
---|---|
commentary | //body/div[@type='commentary' and @subtype='general'] |
edition_foreign_languages | //body/div[@type='edition']//foreign/@xml:lang |
edition_language | //body/div[@type='edition']/@xml:lang |
idno | //teiHeader/fileDesc/publicationStmt/idno |
authority | //teiHeader/fileDesc/publicationStmt/authority |
availability | //teiHeader/fileDesc/publicationStmt/availability |
languages | //teiHeader/profileDesc/langUsage/language |
material | //teiHeader/fileDesc/sourceDesc/msDesc/physDesc/objectDesc//support/material |
origin_dates | //teiHeader/fileDesc/sourceDesc/msDesc/history/origin/origDate |
origin_place | //teiHeader/fileDesc/sourceDesc/msDesc/history/origin/origPlace |
provenances | //teiHeader/fileDesc/sourceDesc/msDesc/history/provenance |
reprint_from | //body/ref[@type='reprint-from'] |
reprint_in | //body/ref[@type='reprint-in'] |
terms | //teiHeader/profileDesc/textClass//term |
title | //teiHeader/fileDesc/titleStmt/title |
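
To make the mapping above concrete, here is a minimal sketch that resolves the `title` and `material` rows with the standard library's `xml.etree.ElementTree`. It is not the parser's actual implementation: the sample document is a heavily trimmed, hypothetical EpiDoc file, `first_text` is an illustrative helper, and every path step carries the TEI namespace prefix because ElementTree does not apply a default namespace to path expressions.

```python
import xml.etree.ElementTree as ET

# TEI namespace declared on the root <TEI> element of EpiDoc files.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily trimmed EpiDoc document for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Ordre de paiement</title></titleStmt>
      <sourceDesc><msDesc>
        <physDesc><objectDesc><supportDesc><support>
          <material>ostrakon</material>
        </support></supportDesc></objectDesc></physDesc>
      </msDesc></sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>
"""


def first_text(root, path):
    """Return the stripped text of the first element matching path, or None."""
    element = root.find(path, NS)
    if element is None or element.text is None:
        return None
    return element.text.strip()


root = ET.fromstring(SAMPLE)

# //teiHeader/fileDesc/titleStmt/title
title = first_text(root, ".//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title")

# //teiHeader/fileDesc/sourceDesc/msDesc/physDesc/objectDesc//support/material
material = first_text(
    root,
    ".//tei:teiHeader/tei:fileDesc/tei:sourceDesc/tei:msDesc"
    "/tei:physDesc/tei:objectDesc//tei:support/tei:material",
)

print(title, material)
```
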
Create a virtual environment, activate it, and install the dependencies:
python3 -m venv venv
. venv/bin/activate
pip install --requirement requirements.txt
Run the unit tests:
make unittest
See LICENSE.
The test data in this project comes from the idp.data project by Papyri.info. This data is made available under a Creative Commons Attribution 3.0 License, with copyright and attribution to the respective projects.