Reading and Parsing Mixed Content XML with ElementTree

What Is Mixed Content?

Mixed content in XML means an element contains both plain text and nested child elements.
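To make the definition concrete, here is a minimal sketch that parses a hypothetical one-line snippet. It is modelled on the article text printed further down, not taken from the real mixed-content.xml. ElementTree stores the text before an element's first child in that element's .text, and the text that follows a child's closing tag in the child's .tail:

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content snippet (not the original mixed-content.xml)
p = ET.fromstring("<p>This article explains <emphasis>mixed content</emphasis> in XML.</p>")
emphasis = p.find("emphasis")

print(repr(p.text))         # 'This article explains '  -> text before the first child
print(repr(emphasis.text))  # 'mixed content'           -> text inside the child
print(repr(emphasis.tail))  # ' in XML.'                -> text after the child, still inside <p>
```

Every piece of text therefore lives in either a .text or a .tail attribute, and any code that walks mixed content has to read both.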


---
xml.etree.ElementTree

The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data.
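As a quick orientation to the calls used below, this sketch parses a small hypothetical <article> from a string with ET.fromstring(). The notebook code instead reads a file with ET.parse(). The sketch also shows find(), findtext() and itertext(). When only the plain text of mixed content is needed and the tags can be discarded, itertext() is the simplest option:

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped like the <article> parsed below (not the real file)
xml_text = """<article>
  <title>Understanding XML Mixed Content</title>
  <author>Jane Doe</author>
  <content>Mixed content mixes <emphasis>text</emphasis> and tags.</content>
</article>"""

root = ET.fromstring(xml_text)  # parse a string; ET.parse() reads a file instead
print(root.tag)                                  # article
print(root.findtext("title"))                    # Understanding XML Mixed Content
print(root.findtext("missing", default="n/a"))   # n/a (no such child)

content = root.find("content")
# itertext() yields every text fragment in document order, with the tags dropped
print("".join(content.itertext()))               # Mixed content mixes text and tags.
```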


---

Python Code to Parse mixed-content.xml

import xml.etree.ElementTree as ET

# Parse the XML file
# Read "mixed-content.xml" and build an element tree from it
tree = ET.parse('mixed-content.xml')
root = tree.getroot()  # The root element of the file, which is <article>

# Extract basic elements
# findtext() returns the child's text, or None if there is no such child
title = root.findtext('title')
author = root.findtext('author')
content = root.find('content')


print(f'Title: {title}')
print(f'Author: {author}')
print('\n--- content ---')

# Function to extract mixed content (text + tags) one level deep:
# only direct children are shown; their own children are not visited
def extract_mixed_content(element):
    result = []

    # Add the text that appears before the first child
    if element.text and element.text.strip():
        result.append(element.text.strip())

    # Loop through the direct child elements
    for child in element:
        # Add the tag name and the child's own text
        # (child.text is None when the child is empty or starts with another element)
        child_text = (child.text or '').strip()
        result.append(f"<{child.tag}>{child_text}</{child.tag}>")

        # Add the tail text (text after the child's closing tag)
        if child.tail and child.tail.strip():
            result.append(child.tail.strip())

    return " ".join(result)  # Join all parts into one string

# Apply the function to the <content> element
full_content = extract_mixed_content(content)
print(full_content)

Title: Understanding XML Mixed Content
Author: Jane Doe

--- content ---
This article explains <emphasis>mixed content</emphasis> in XML. 
    Mixed content allows elements to contain both text and child elements. <example>Here's an example of</example> It's commonly used in document-centric XML applications. <list>Benefits of mixed content:</list> However, it can be more challenging to process than element-only content.
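The function above only looks one level deep. If a child such as <list> contains its own child elements, their text is lost, and in the worst case only an empty <list></list> remains. The sketch below is a recursive variant, not the author's code. The name extract_mixed_content_recursive and the sample input are hypothetical. It drops attributes and escapes text with xml.sax.saxutils.escape so the result stays well-formed:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def extract_mixed_content_recursive(element):
    """Return the inner content of `element` as markup, keeping nested tags at any depth.

    Sketch only: attributes are dropped and whitespace is kept exactly as in the source.
    """
    parts = []
    # Text before the first child
    if element.text:
        parts.append(escape(element.text))
    for child in element:
        # Recurse so grandchildren (e.g. <item> inside <list>) are kept
        inner = extract_mixed_content_recursive(child)
        parts.append(f"<{child.tag}>{inner}</{child.tag}>")
        # Text after the child's closing tag, still inside `element`
        if child.tail:
            parts.append(escape(child.tail))
    return "".join(parts)


# Hypothetical input: the <item> elements are grandchildren of <content>
content = ET.fromstring(
    "<content>Benefits: <list><item>flexible</item><item>readable</item></list> done.</content>"
)
print(extract_mixed_content_recursive(content))
# Benefits: <list><item>flexible</item><item>readable</item></list> done.
```

Because whitespace is preserved, indentation from the source file shows up in the result. If you want a single line, one option is to collapse it afterwards with " ".join(s.split()).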
