Ed2TEI
is a Python package designed to convert critical editions into structured XML-TEI format following the TEI P5 Guidelines. The package allows users to create a TEI XML file from a txt file, and to automate the insertion of variant readings, rejected readings, and notes into a TEI file using structured txt files.
You can install Ed2TEI
using pip:
pip install git+https://github.com/DelphDem/Ed2TEI.git
- Create TEI Base File: Generate a basic TEI XML file from a plain text file.
- Automated Variant Insertion: Insert complex variant readings from several sources into your TEI XML file.
- Rejected Readings Tagging: Automatically tag rejected readings using
<choice>
,<sic>
, and<corr>
elements. - Editorial Notes Tagging: Insert editorial notes using the
<note>
element based on text locations. - Flexible File Formats: Supports various type of text with customizable options.
The Ed2TEI
package offers several command-line tools to handle different types of critical apparatus. Below is an overview of the main commands and their expected file formats. The model is based on the first paragraphs of Tony Hunt, Sermons on Joshua, vol1, ANTS Plain Texts Series 12-13, London, 1998. Some variants, notes, and rejected readings have been added to demonstrate the package's possibilities.
The create_tei
command is used to generate a base TEI file from a structured text file. This is typically the first step before adding variants, rejected readings, or notes.
create_tei [options] --input_file <plain_text_file> --output_file <output_tei_file>
--input_file
(required): Path to the txt file containing the main text to convert.--output_file
(required): Path to the generated TEI XML file.--is_verse
(optional): Process the text as verse (paragraph are tagged as stanza using the<lg>
tag instead of<p>
).--number_lines_every
(optional): Number lines every N lines (default: number_lines_every=4).--number_stanzas_paragraphs
(True/False): Stanzas or paragraphs are numbered (default=True).--use_roman_numerals
(True/False): Use Roman numerals for numbering stanza or paragraph (default=False).--reset_counts_on_page_break
(True/False): Reset line and paragraph counts at every page break (default=False).
The add_variants
command is used to insert variant readings into a TEI XML file.
add_variants --tei_file <input_tei_file> --variants_file <variants_text_file> --output_file <output_tei_file>
--tei_file
(required): Path to the input TEI XML file.--variants_file
(required): Path to the text file containing variant readings.--output_file
(required): Path to the output TEI XML file.
The add_rejected
command is used to insert rejected readings into a TEI XML file using the <choice>
, <sic>
, and <corr>
tags.
add_rejected --tei_file <input_tei_file> --variants_file <variants_text_file> --output_file <output_tei_file>
--tei_file
(required): Path to the input TEI XML file.--variants_file
(required): Path to the text file containing rejected readings.--output_file
(required): Path to the output TEI XML file.
The add_notes
command is used to add editorial notes to specified lines in a TEI XML file.
--tei_file
(required): Path to the input TEI XML file.--variants_file
(required): Path to the text file containing editorial readings.--output_file
(required): Path to the output TEI XML file.
To ensure the proper functioning of the package, the txt files need to be structured in a specific format:
The txt text file includes the text, the folio breaks and page breaks (where applicable). Each line break in the file will be consided as a a line using the <l></l>
tags. Page breaks should be specified with <p +volume number-page number>
(e.g.: <p i-7
>, <p. 8>
). Folio numbers will be indicated with square bracket: [f.+folio number/r,v/column]
(e.g.: [f.1ra]
). A blank line indicates a paragraph or stanza. See the Txt_models folder for an example.
The variants txt file should contain variant readings for each line in the TEI file. The format is:
line_number (lemma) variant [witness]
- Line Number: The line number in the TEI file, indicated with
xml:id="L+number"
, where the variant should be applied, without the 'L' prefix. - Lemma: The main text found in the TEI file corresponding to the variant.It should be specified with round brackets.
- Variants with Witnesses: Each variant reading followed by its witness enclosed in square brackets; in case of several witness for one reading, witness should be separated by a comma. In case of multiple readings for one lemma, the lemma should be repated before each variant.
For a unique variant:
6 (meisun) mansiun [C,P]
For multiple variants for the same lemma:
26 (Deu) om [C] (Deu) D. devine [D]
The rejected readings file should specify corrections to be marked in the TEI file. The format is:
line_number (corrected_text) rejected_text
- Line Number: The line number in the TEI file, indicated with
xml:id="L+number"
, where the variant should be applied, without the 'L' prefix. - Corrected Text: Text that should replace the rejected reading, enclosed in round brackets ().
- Rejected Text: The text that is currently in the òriginal text but should be marked as rejected.
This project is licensed under the MIT License - see the LICENSE file for details.