# Exploring TEI with lxml's etree

### Notebook for Encoding Music
### Haverford College (Haverford, PA USA)

### by Richard Freedman (Haverford College), Daniel Russo-Batterham (Melbourne University), and Oleh Shostak (Haverford College)

### Last Updated January 2026

In this assignment, we will explore XML files encoded in the Text Encoding Initiative (TEI) format. TEI is a widely used standard for representing texts in digital form, particularly in the humanities. Because TEI files are XML files that carry rich metadata and structural information about texts, they lend themselves to many kinds of analysis.

Our work falls into three areas:

* **access**: downloading, storing, reading, processing XML files
* **analysis**: performing basic quantitative and qualitative analysis of XML files
* **interpretation**: exploring the meaning and utility of XML files from both analytical and creative perspectives 

To work with these files, we will use `lxml.etree` to parse the XML.


XML files are guided by **markup rules**, which you can read more about [here](https://www.w3schools.com/xml/xml_syntax.asp) and consist of **elements**, which you can dive into [here](https://www.w3schools.com/xml/xml_elements.asp).
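
To see why the markup rules matter in practice, here is a minimal sketch of what happens when a file breaks them. The snippet `broken` is a made-up example (not taken from any real TEI file) in which the second `<l>` element is never closed. A default `lxml` parser refuses it, while a parser created with `recover=True`, the option used in the cells later in this notebook, tries to repair the tree and carry on.

```python
from lxml import etree

# Hypothetical snippet: the second <l> is never closed, so this is not well-formed XML.
broken = b"<lg><l>first line</l><l>second line</lg>"

# A default (strict) parser rejects the snippet with an XMLSyntaxError.
try:
    etree.fromstring(broken)
    strict_failed = False
except etree.XMLSyntaxError as err:
    strict_failed = True
    print("Strict parser error:", err)

# recover=True asks the parser to repair what it can and return a tree anyway.
lenient = etree.XMLParser(recover=True)
recovered = etree.fromstring(broken, parser=lenient)
print("Recovered root tag:", recovered.tag)
print(etree.tostring(recovered).decode())
```

Recovery is convenient for exploring imperfect files, but it can silently drop or rearrange content, so it is worth comparing the recovered tree against the source when accuracy matters.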

TEI files are XML files with a prescribed set of sections, elements, and structure. You can learn more about TEI [here](https://tei-c.org/).
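
As a rough illustration of that structure, the sketch below parses a tiny, invented TEI-like document. The content of `sample` is hypothetical; it only assumes the two top-level sections (`teiHeader` and `text`) and the un-namespaced `TEI.2` root that the ballad file used in this notebook reports. Files encoded in the newer TEI P5 format place their elements in the namespace `http://www.tei-c.org/ns/1.0`, so the plain paths used here would need that namespace added.

```python
from lxml import etree

# Hypothetical, heavily simplified TEI-style document (not a real EBBA file).
sample = b"""<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A Sample Ballad</title>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg type="stanza">
        <l>A first line,</l>
        <l>and a second.</l>
      </lg>
    </body>
  </text>
</TEI.2>"""

root = etree.fromstring(sample)

# The root's direct children are the major sections of the file.
top_level = [child.tag for child in root]
print("Top-level sections:", top_level)

# findtext() follows a path of child elements and returns the text it finds.
title = root.findtext("teiHeader/fileDesc/titleStmt/title")
print("Title:", title)

# './/lg' searches all descendants; get() reads an attribute; len() counts children.
stanza = root.find(".//lg")
print("Line group type:", stanza.get("type"), "| lines:", len(stanza))
```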


----

## Import TEI File

What elements are present in the header?


In [1]:
# Print the tree structure of all elements in the TEI header using lxml
import requests
from lxml import etree

url = 'https://ebba.english.ucsb.edu/ballad/37021/ebba-xml-37021'

# Load the XML from the web; stop early if the request failed
response = requests.get(url)
response.raise_for_status()

# Parse the raw bytes with lxml so the file's own encoding declaration is honored;
# recover=True lets the parser continue past entity issues
parser = etree.XMLParser(recover=True)
root = etree.fromstring(response.content, parser=parser)

# Debug: Check parsing
print('Root tag:', root.tag)  # Should be 'TEI.2'

# Function to print the tree structure
def print_tree(element, indent=0):
    print('  ' * indent + element.tag)
    for child in element:
        print_tree(child, indent + 1)

# Find the header directly - no namespace for TEI P4
header = root.find('teiHeader')

if header is not None:
    print("\n--- Header Structure ---")
    print_tree(header)
    
    # Now extract the content we want
    print("\n--- Titles ---")
    for title in header.findall('.//title'):
        print(''.join(title.itertext()).strip())
    
    print("\n--- Authors ---")
    for author in header.findall('.//author'):
        text = ''.join(author.itertext()).strip()
        print(text if text else "(empty)")
    
    print("\n--- Sponsors ---")
    for sponsor in header.findall('.//sponsor'):
        print(''.join(sponsor.itertext()).strip())
else:
    print("teiHeader not found")


Root tag: TEI.2

--- Header Structure ---
teiHeader
  fileDesc
    titleStmt
      title
      author
      sponsor
      sponsor
      sponsor
      respStmt
        resp
        name
      respStmt
        resp
        name
    editionStmt
      edition
        date
    publicationStmt
      publisher
      pubPlace
      date
      idno
      availability
        p
          address
            addrLine
            addrLine
            addrLine
            addrLine
            addrLine
            addrLine
      idno
    notesStmt
      note
      note
      note
      note
      note
    sourceDesc
      listBibl
        bibl
          note
          biblScope
          title
          title
          author
          respStmt
            resp
            name
            certainty
          imprint
            date
            publisher
              orig
  encodingDesc
    editorialDecl
      p
      p
      p
      p
      p
      p
      p
    classDecl
      taxonomy
        b
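
A printed tree like the one above shows where each element sits; for a quick quantitative view, we can also count how often each tag occurs. The following is a minimal sketch on an invented header fragment (`sample_header` is hypothetical and much smaller than the real header). It relies on `Element.iter()`, which walks the element itself and all of its descendants.

```python
from collections import Counter
from lxml import etree

# Hypothetical header fragment, loosely modeled on the structure printed above.
sample_header = etree.fromstring(b"""<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>A Sample Ballad</title>
      <sponsor>First Sponsor</sponsor>
      <sponsor>Second Sponsor</sponsor>
    </titleStmt>
    <notesStmt>
      <note>one</note>
      <note>two</note>
      <note>three</note>
    </notesStmt>
  </fileDesc>
</teiHeader>""")

# iter() yields the element and every descendant; comments and processing
# instructions have non-string tags, so we skip them.
tag_counts = Counter(
    el.tag for el in sample_header.iter() if isinstance(el.tag, str)
)

for tag, count in tag_counts.most_common():
    print(f"{tag}: {count}")
```

To run the same count on the real file, replace `sample_header` with the `header` element found in the cell above.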

## Get the Header Information

Next, we report the title, author, sponsors, editors (responsibility statements), and descriptors (taxonomy categories).

In [2]:
# Extract title, author, sponsor, taxonomy, and responsibility statements

url = 'https://ebba.english.ucsb.edu/ballad/37021/ebba-xml-37021'

# Load and parse
response = requests.get(url)
parser = etree.XMLParser(recover=True)
root = etree.fromstring(response.content, parser=parser)

# Find the header
header = root.find('teiHeader')

if header is not None:
    print("=== TITLES ===")
    for title in header.findall('.//title'):
        text = ''.join(title.itertext()).strip()
        if text:
            print(text)
    
    print("\n=== AUTHORS ===")
    for author in header.findall('.//author'):
        text = ''.join(author.itertext()).strip()
        print(text if text else "(no author listed)")
    
    print("\n=== SPONSORS ===")
    for sponsor in header.findall('.//sponsor'):
        text = ''.join(sponsor.itertext()).strip()
        if text:
            print(text)
    
    print("\n=== RESPONSIBILITY STATEMENTS ===")
    for respStmt in header.findall('.//respStmt'):
        resp = respStmt.find('resp')
        name = respStmt.find('name')
        resp_text = ''.join(resp.itertext()).strip() if resp is not None else ""
        name_text = ''.join(name.itertext()).strip() if name is not None else ""
        print(f"  {resp_text}: {name_text}")
    
    print("\n=== TAXONOMY ===")
    for taxonomy in header.findall('.//taxonomy'):
        print(f"\nTaxonomy ID: {taxonomy.get('id', 'none')}")
        for category in taxonomy.findall('.//category'):
            cat_id = category.get('id', '')
            cat_desc = category.find('catDesc')
            if cat_desc is not None:
                desc_text = ''.join(cat_desc.itertext()).strip()
                print(f"  {cat_id}: {desc_text}")
else:
    print("teiHeader not found")


=== TITLES ===
THE / True Lovers Knot Untied: / Being the right PATH whereby to advise Princely Virgins how to / Behave themselves, by the Example of the Renowned Princess, the Lady / ARABELLA, and the Second SON of the Lord Seymor, late Earl of / Hartfort.
THE / True Lovers Knot Untied: / Being the right PATH whereby to advise Princely Virgins how to / Behave themselves, by the Example of the Renowned Princess, the Lady / ARABELLA, and the Second SON of the Lord Seymor, late Earl of / Hartfort.
THE True Lover's Knot Untied: Being the right PATH whereby to advise Princely Virgins how to Behave themselves, by the Example of the Renowned Princess, the Lady ARABELLA, and the Second SON of the Lord Seymour, late Earl of Hartford.

=== AUTHORS ===
(no author listed)
(no author listed)

=== SPONSORS ===
University of California - Santa Barbara
The Early Modern Center
English Broadside Ballad Archive (EBBA)

=== RESPONSIBILITY STATEMENTS ===
  Director: Patricia Fumerton
  Associate Director:
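
The cell above repeats the same `''.join(element.itertext()).strip()` pattern for every field and checks for missing elements by hand. One possible way to tidy this up is a small helper. The function `element_text` below is our own illustrative name, not part of `lxml`, and unlike the cell above it also collapses internal runs of whitespace, which is an assumption about how we want the text displayed.

```python
from lxml import etree

def element_text(element, default=""):
    """Return an element's full text with whitespace collapsed.

    Falls back to `default` when the element is missing (None) or empty.
    """
    if element is None:
        return default
    # itertext() yields the text of the element and of all nested elements.
    text = " ".join("".join(element.itertext()).split())
    return text or default

# Hypothetical fragment: a title with nested markup and a line break, plus an empty author.
sample = etree.fromstring(
    b"<titleStmt><title>THE <hi>True</hi> Lovers\n  Knot</title><author/></titleStmt>"
)

title_text = element_text(sample.find("title"))
author_text = element_text(sample.find("author"), default="(no author listed)")
sponsor_text = element_text(sample.find("sponsor"), default="(no sponsor listed)")

print(title_text)
print(author_text)
print(sponsor_text)
```

Because `find()` returns `None` when nothing matches, passing its result straight to the helper covers both the "element missing" and the "element present but empty" cases.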

## Get the Body, plus each Line Group and Line
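
Before working with the full ballad, here is a small sketch of how line groups (`<lg>`) and lines (`<l>`) can be summarized. The fragment reuses a few lines of the ballad, but its markup is our own simplified assumption rather than a copy of the EBBA encoding. As in the cell that follows, a line group without a `type` attribute is treated as a stanza.

```python
from lxml import etree

# Simplified, hand-written fragment; the real file's markup may differ.
sample_text = etree.fromstring(b"""<text><body>
  <lg type="stanza">
    <l>AS I to Ireland did pass,</l>
    <l>I saw a Ship at Anchor lay,</l>
  </lg>
  <lg>
    <l>This Ship that sail'd from fair England,</l>
  </lg>
</body></text>""")

summary = []
for number, lg in enumerate(sample_text.findall(".//lg"), start=1):
    lg_type = lg.get("type", "stanza")   # default when the attribute is absent
    lines = lg.findall("l")              # 'l' matches direct children only
    word_count = sum(len("".join(line.itertext()).split()) for line in lines)
    summary.append((number, lg_type, len(lines), word_count))

for number, lg_type, n_lines, n_words in summary:
    print(f"Line group {number} ({lg_type}): {n_lines} lines, {n_words} words")
```

Note the difference between the two searches: `.//lg` looks through all descendants of the text element, while `l` on its own only matches lines that are immediate children of a given line group.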

In [3]:
url = 'https://ebba.english.ucsb.edu/ballad/37021/ebba-xml-37021'

# Load and parse
response = requests.get(url)
parser = etree.XMLParser(recover=True)
root = etree.fromstring(response.content, parser=parser)

# Find the text body for line groups and lines
text_elem = root.find('text')

if text_elem is not None:
    print("\n=== LINE GROUPS AND LINES ===")
    for i, lg in enumerate(text_elem.findall('.//lg'), 1):
        lg_type = lg.get('type', 'stanza')
        print(f"\n[Line Group {i} - {lg_type}]")
        for line in lg.findall('l'):
            line_text = ''.join(line.itertext()).strip()
            print(f"  {line_text}")
else:
    print("\nText element not found")


=== LINE GROUPS AND LINES ===

[Line Group 1 - stanza]
  AS I to Ireland did pass,
  I saw a Ship at Anchor lay,
  Another Ship likewise there was,
  which from fair England took her way.

[Line Group 2 - stanza]
  This Ship that sa[i]l'd from fair England,
  unknown unto our Gracious King,
  The Lord Chief Justice did command,
  that they to London should her bring.

[Line Group 3 - stanza]
  I drew more near and saw more plain,
  Lady Arabella, in Distress,
  She wrung her hands and wept amain,
  bewailing of her Heaviness.

[Line Group 4 - stanza]
  When near fair London Tower she came,
  whereas her Landing [p]lace should be,
  The King and Queen with all their Train,
  did meet this Lady gallantly:

[Line Group 5 - stanza]
  How now, Arabella, said our King,
  unto this Lady straight did say,
  Who hath first ty'd ye to this thing,
  that you from England took your way?

[Line Group 6 - stanza]
  None but myself, my Gracious Liege,
  these ten long years i've been in love
  With 