EpiDoc Parser

Python parser for EpiDoc (epigraphic documents in TEI XML).

For example, idp.data-sheet uses the parser to generate a single CSV sheet from the Papyri.info Integrating Digital Papyrology (IDP) data.
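
As a rough illustration of that kind of use, the sketch below writes a few document fields to CSV with the standard library. It is not the idp.data-sheet implementation: the rows are plain dicts standing in for values read from parsed documents, the second row is a made-up placeholder, and the column selection is an assumption.

```python
import csv
import io

# Plain dicts standing in for fields read from parsed documents.
# The second row is a made-up placeholder with no origin place.
rows = [
    {"title": "Ordre de paiement", "material": "ostrakon", "origin_place": "Kysis (Oasis Magna)"},
    {"title": "(placeholder title)", "material": "papyrus"},
]

# Assumed column selection; a real sheet would likely carry more fields.
fieldnames = ["title", "material", "origin_place"]

buffer = io.StringIO()
# restval fills columns that are missing from a row with an empty string.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)

sheet = buffer.getvalue()
print(sheet)
```

Writing to an in-memory buffer keeps the sketch free of file paths; passing an open file to `csv.DictWriter` works the same way.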

Usage

Installation

Install the package directly from GitHub:

pip install git+https://github.com/Xennis/epidoc-parser

Load a document

Load a document from a file

import epidoc

with open("my-epidoc.xml") as f:
    doc = epidoc.load(f)

Load a document from a string

import epidoc

my_epidoc = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.stoa.org/epidoc/schema/8.13/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="hgv74005">
   [...]
</TEI>
"""

doc = epidoc.loads(my_epidoc)

Get data from a document

Read the parsed fields as attributes of the document, for example:

>>> doc.title
"Ordre de paiement"
>>> doc.material
"ostrakon"
>>> doc.languages
{"en": "Englisch", "la": "Latein", "el": "Griechisch"}
>>> [t.get("text") for t in doc.terms]
["Anweisung", "Zahlung", "Getreide"]
>>> doc.origin_place.get("text")
"Kysis (Oasis Magna)"
>>> doc.origin_dates[0]
{"notbefore": "0301", "notafter": "0425", "precision": "low", "text": "IV - Anfang V"}
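
The date bounds in the output above are zero-padded strings, so they usually need converting before they can be compared or sorted. The following is a minimal sketch of such a conversion, not part of the parser: `year_range` is a hypothetical helper, and it assumes that the bounds are either a plain year (`"0301"`) or an ISO-style date starting with the year, with a leading minus sign for years BCE.

```python
def year_range(origin_date):
    """Return (not_before, not_after) as ints; a missing bound becomes None."""

    def to_year(value):
        if not value:
            return None
        # Assumption: a leading "-" marks a year BCE.
        sign = -1 if value.startswith("-") else 1
        # Keep only the year part of values such as "0301-05-12".
        digits = value.lstrip("-").split("-")[0]
        return sign * int(digits)

    return to_year(origin_date.get("notbefore")), to_year(origin_date.get("notafter"))


# The dict shown in the example output above.
origin_date = {"notbefore": "0301", "notafter": "0425", "precision": "low", "text": "IV - Anfang V"}
print(year_range(origin_date))  # (301, 425)
```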

Documentation

Field	EpiDoc source element (XPath)
commentary	`//body/div[@type='commentary' and @subtype='general']`
edition_foreign_languages	`//body/div[@type='edition']//foreign/@xml:lang`
edition_language	`//body/div[@type='edition']/@xml:lang`
idno	`//teiHeader/fileDesc/publicationStmt/idno`
authority	`//teiHeader/fileDesc/publicationStmt/authority`
availability	`//teiHeader/fileDesc/publicationStmt/availability`
languages	`//teiHeader/profileDesc/langUsage/language`
material	`//teiHeader/fileDesc/sourceDesc/msDesc/physDesc/objectDesc//support/material`
origin_dates	`//teiHeader/fileDesc/sourceDesc/msDesc/history/origin/origDate`
origin_place	`//teiHeader/fileDesc/sourceDesc/msDesc/history/origin/origPlace`
provenances	`//teiHeader/fileDesc/sourceDesc/msDesc/history/provenance`
reprint_from	`//body/ref[@type='reprint-from']`
reprint_in	`//body/ref[@type='reprint-in']`
terms	`//teiHeader/profileDesc/textClass//term`
title	`//teiHeader/fileDesc/titleStmt/title`
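
To show how the XPath expressions in the table relate to an EpiDoc file, the sketch below evaluates two of them with the standard library's `xml.etree.ElementTree`. It is an illustration only and does not claim to mirror the parser's internals. The sample document is a trimmed, hand-written fragment, and reading the language code from the `ident` attribute is an assumption based on the usual TEI encoding.

```python
import xml.etree.ElementTree as ET

# EpiDoc elements live in the TEI namespace, so every path step needs the prefix.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written fragment containing only the elements used below.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Ordre de paiement</title></titleStmt>
    </fileDesc>
    <profileDesc>
      <langUsage>
        <language ident="la">Latein</language>
        <language ident="el">Griechisch</language>
      </langUsage>
    </profileDesc>
  </teiHeader>
</TEI>"""

root = ET.fromstring(SAMPLE)

# title: //teiHeader/fileDesc/titleStmt/title
title = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
print(title.text)  # Ordre de paiement

# languages: //teiHeader/profileDesc/langUsage/language
languages = {
    lang.get("ident"): lang.text
    for lang in root.findall(
        "tei:teiHeader/tei:profileDesc/tei:langUsage/tei:language", TEI_NS
    )
}
print(languages)  # {'la': 'Latein', 'el': 'Griechisch'}
```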

Development

Create a virtual environment, activate it, and install the dependencies:

python3 -m venv venv
. venv/bin/activate
pip install --requirement requirements.txt

Run the tests:

make unittest

LICENSE

Code

See the LICENSE file.

Test data

The test data in this project is from the project idp.data by Papyri.info. This data is made available under a Creative Commons Attribution 3.0 License, with copyright and attribution to the respective projects.
