# 🔧 Xenon v1.0.0: XML Repair for LLM-Generated XML

Welcome to the interactive Xenon demo! This notebook shows you how to repair malformed XML commonly generated by Large Language Models.

**What is Xenon?**
- **Secure-by-default**: Explicit trust levels for input sources.
- **Zero Dependencies**: Uses only the Python standard library.
- **LLM-Focused**: Specifically designed to handle common LLM XML generation failures.
- **Accurate Reporting**: Get a detailed report of every repair made.

**Open in Colab**: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MarsZDF/xenon/blob/main/xenon_demo.ipynb)

## 📦 Installation

Install Xenon from PyPI:

In [None]:
# The PyPI distribution is named elemental-xenon; the import name is xenon
!pip install -q elemental-xenon

from xenon import (
    repair_xml_safe,
    repair_xml_with_report,
    parse_xml_safe,
    format_xml,
    TrustLevel,  # New in v1.0.0
    RepairType,
)

print("✅ Xenon installed successfully!")

---

# 🎯 The 4 Major LLM XML Failure Modes

Xenon handles the most common ways LLMs break XML. Every example below passes the mandatory `trust` parameter.

## 1️⃣ Truncation / Cut-off

LLMs can run out of tokens mid-document and leave elements unclosed:

In [None]:
# LLM output that got cut off
truncated = '<root><user name="alice"><address><city>San Francisco'

print("❌ BROKEN XML:")
print(truncated)
print()

# Xenon repairs it
repaired = repair_xml_safe(truncated, trust=TrustLevel.UNTRUSTED)
print("✅ REPAIRED XML:")
print(repaired)
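
To make the idea behind truncation repair concrete, here is a minimal standard-library sketch: track open elements on a stack and append the missing closing tags. It is an illustration only, not Xenon's implementation. `close_open_tags` and `TAG_RE` are hypothetical names, and the regex is a toy tokenizer that ignores comments, CDATA, and input cut off in the middle of a tag.

```python
import re
import xml.etree.ElementTree as ET

# Toy pattern for opening, closing, and self-closing tags (not a full XML tokenizer).
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^<>]*?(/?)>")


def close_open_tags(fragment: str) -> str:
    """Append a closing tag for every element still open at the end of `fragment`."""
    stack = []
    for match in TAG_RE.finditer(fragment):
        is_close, name, self_closing = match.groups()
        if self_closing:
            continue  # <br/> opens and closes in one tag
        if is_close:
            if stack and stack[-1] == name:
                stack.pop()
        else:
            stack.append(name)
    # Close the innermost element first.
    return fragment + "".join(f"</{name}>" for name in reversed(stack))


truncated = '<root><user name="alice"><address><city>San Francisco'
closed = close_open_tags(truncated)
ET.fromstring(closed)  # raises ParseError if the result is still not well-formed
print(closed)
```

Parsing the result with `xml.etree.ElementTree` is a cheap way to confirm that a repaired string is well-formed.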

## 2️⃣ Conversational Fluff

LLMs wrap XML in conversational text:

In [None]:
# LLM adds unnecessary commentary
fluff = '''
Sure! Here's the XML you requested:

<root>
    <message>Hello World</message>
    <status>success</status>
</root>

Hope this helps! Let me know if you need anything else.
'''

print("❌ MESSY OUTPUT:")
print(fluff)
print()

# Xenon extracts just the XML
repaired = repair_xml_safe(fluff, trust=TrustLevel.UNTRUSTED)
print("✅ CLEAN XML:")
print(repaired)
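
For intuition, stripping conversational text can be approximated by slicing from the first opening tag to the last matching closing tag. The sketch below uses only the standard library and is not how Xenon does it. `extract_xml` is a hypothetical helper, and it assumes the first tag in the text is the root element.

```python
import re
import xml.etree.ElementTree as ET


def extract_xml(text: str) -> str:
    """Return the span from the first opening tag to its last closing tag."""
    first_tag = re.search(r"<([A-Za-z_][\w.\-]*)", text)
    if first_tag is None:
        raise ValueError("no XML element found in text")
    closing = f"</{first_tag.group(1)}>"
    end = text.rfind(closing)
    if end == -1:
        # No closing tag (for example, truncated output): keep everything after the first tag.
        return text[first_tag.start():].strip()
    return text[first_tag.start():end + len(closing)]


llm_output = """
Sure! Here's the XML you requested:

<root>
    <message>Hello World</message>
</root>

Hope this helps!
"""
extracted = extract_xml(llm_output)
print(extracted)
```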

## 3️⃣ Malformed Attributes

LLMs forget to quote attribute values:

In [None]:
# Missing quotes around attributes
unquoted = '<product id=12345 category=electronics price=299.99>Laptop</product>'

print("❌ INVALID ATTRIBUTES:")
print(unquoted)
print()

# Xenon adds quotes
repaired = repair_xml_safe(unquoted, trust=TrustLevel.UNTRUSTED)
print("✅ VALID ATTRIBUTES:")
print(repaired)
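
As a rough picture of what quoting attributes involves, the following standard-library sketch wraps bare `name=value` pairs in double quotes inside opening tags. It is a simplified illustration under stated assumptions, not Xenon's algorithm. `quote_attributes` and both patterns are hypothetical, and the regex can misfire on `name=value` text that appears inside an already quoted value.

```python
import re
import xml.etree.ElementTree as ET

# An opening (or self-closing) tag, without nested angle brackets.
OPEN_TAG_RE = re.compile(r"<[A-Za-z_][^<>]*>")
# name=value where the value does not start with a quote; a "/" is allowed unless it ends the tag.
UNQUOTED_ATTR_RE = re.compile(r"""(\s[A-Za-z_][\w.:\-]*)=((?:[^\s"'<>/]|/(?!>))+)""")


def quote_attributes(xml_text: str) -> str:
    """Wrap unquoted attribute values in double quotes."""
    def fix_tag(match):
        return UNQUOTED_ATTR_RE.sub(r'\1="\2"', match.group(0))

    return OPEN_TAG_RE.sub(fix_tag, xml_text)


unquoted = "<product id=12345 category=electronics price=299.99>Laptop</product>"
quoted = quote_attributes(unquoted)
print(quoted)
```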

## 4️⃣ Unescaped Entities

LLMs forget to escape special characters like `&` and `<`:

In [None]:
# Special characters not escaped
unescaped = '<description>Price: $5 < $10 & shipping included</description>'

print("❌ UNESCAPED ENTITIES:")
print(unescaped)
print()

# Xenon escapes them
repaired = repair_xml_safe(unescaped, trust=TrustLevel.UNTRUSTED)
print("✅ PROPERLY ESCAPED:")
print(repaired)
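
The escaping step can be sketched with two regular expressions: one for `&` characters that do not start an entity, and one for `<` characters that do not start a tag. This is a standard-library illustration, not Xenon's code, and `escape_stray_entities` is a hypothetical name. It assumes a `<` followed by whitespace, a digit, or punctuation is text rather than markup.

```python
import re
import xml.etree.ElementTree as ET

# "&" not followed by a named, decimal, or hexadecimal entity reference.
BARE_AMP_RE = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")
# "<" not followed by a character that can start a tag, comment, or processing instruction.
STRAY_LT_RE = re.compile(r"<(?![A-Za-z_/!?])")


def escape_stray_entities(xml_text: str) -> str:
    """Escape bare ampersands and stray less-than signs, leaving real markup alone."""
    escaped_amp = BARE_AMP_RE.sub("&amp;", xml_text)
    return STRAY_LT_RE.sub("&lt;", escaped_amp)


unescaped = "<description>Price: $5 < $10 & shipping included</description>"
escaped = escape_stray_entities(unescaped)
print(escaped)
print(ET.fromstring(escaped).text)  # the parser restores the original characters
```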

---

# ✨ NEW in v1.0.0: Accurate Repair Analysis

See exactly what Xenon fixed with detailed, accurate repair reports. No more guesswork!

In [None]:
# Malformed XML with multiple issues
messy = '''
Here's your data:

<users>
    <user id=1001 role=admin>
        <name>John Smith & Associates</name>
        <email>john@example.com</email>
        <status>Active & Verified
'''

# Get detailed report of what was fixed
repaired, report = repair_xml_with_report(messy, trust=TrustLevel.UNTRUSTED)

print(f"📊 Applied {len(report.actions)} repairs:")
for action in report.actions:
    print(f"  - Type: {action.repair_type.value}")
    print(f"    Description: {action.description}")
    if action.before:
        print(f"    Before: {action.before}")
        print(f"    After:  {action.after}")
print("\n" + "-"*20 + "\n")
print("✅ REPAIRED XML:")
print(repaired)
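
When a report holds many actions, a per-type tally is often easier to scan than the full list. The sketch below relies only on the `repair_type.value` attribute used in the cell above. The records are made-up stand-ins so the snippet runs without Xenon, and `summarize_by_type` is a hypothetical helper.

```python
from collections import Counter
from types import SimpleNamespace

# Stand-in records shaped like the entries of report.actions used above.
# The type names and descriptions are invented for illustration only.
actions = [
    SimpleNamespace(repair_type=SimpleNamespace(value="example_type_a"), description="first fix"),
    SimpleNamespace(repair_type=SimpleNamespace(value="example_type_b"), description="second fix"),
    SimpleNamespace(repair_type=SimpleNamespace(value="example_type_a"), description="third fix"),
]


def summarize_by_type(actions) -> Counter:
    """Count repair actions per repair type."""
    return Counter(action.repair_type.value for action in actions)


summary = summarize_by_type(actions)
for repair_type, count in summary.most_common():
    print(f"{repair_type}: {count}")
```

In the notebook, pass `report.actions` instead of the stand-in list.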

## View Changes as Diff

See before/after comparison:

In [None]:
# Show unified diff
print("📝 UNIFIED DIFF:")
print(report.to_unified_diff())
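
If you only have the original and repaired strings, the standard library's `difflib` can produce a comparable unified diff. This sketch is independent of Xenon's report object. `unified_diff_text` is a hypothetical helper, and the sample strings are illustrative.

```python
import difflib


def unified_diff_text(before: str, after: str) -> str:
    """Return a unified diff between two multi-line strings."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="original",
        tofile="repaired",
        lineterm="",  # lines come from splitlines(), so they carry no newline
    )
    return "\n".join(diff)


before = "<user id=1001>\n  <name>A & B</name>"
after = '<user id="1001">\n  <name>A &amp; B</name>\n</user>'
diff_text = unified_diff_text(before, after)
print(diff_text)
```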

---

# 🎨 XML Formatting

Format XML for readability or storage:

In [None]:
# Compact XML
compact = '<root><item>test</item><another>data</another></root>'

print("📝 ORIGINAL (compact):")
print(compact)
print()

# Pretty-print for readability
pretty = format_xml(compact, style='pretty')
print("✨ PRETTY-PRINTED:")
print(pretty)
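
For comparison, well-formed XML can also be pretty-printed with the standard library's `xml.dom.minidom`. The sketch below is a stand-alone illustration and makes no claim about how `format_xml` works internally. `pretty_print` is a hypothetical helper.

```python
import xml.dom.minidom


def pretty_print(xml_text: str, indent: str = "  ") -> str:
    """Re-indent well-formed XML using minidom."""
    dom = xml.dom.minidom.parseString(xml_text)
    pretty = dom.documentElement.toprettyxml(indent=indent)
    # Drop blank lines that minidom can emit around whitespace-only text nodes.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


compact = "<root><item>test</item><another>data</another></root>"
pretty = pretty_print(compact)
print(pretty)
```

Unlike a repair library, `minidom` raises an error on malformed input, so this only works after the XML has been repaired.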

## Repair + Format in One Step

In [None]:
# Broken XML that needs formatting
broken = '<root><item>test</item><another>data'

# Repair AND format in one call
result = repair_xml_safe(broken, trust=TrustLevel.TRUSTED, format_output='pretty')

print("📥 INPUT:")
print(broken)
print()
print("📤 OUTPUT (repaired + formatted):")
print(result)

---

# 📊 Parse to Dictionary

Convert repaired XML to Python dictionaries:

In [None]:
import json

# Malformed XML from LLM
malformed = '<response status=success><data count=3><item>Apple</item><item>Banana</item><item>Orange'

print("📝 Input:")
print(malformed)
print()

# Parse directly (repairs automatically)
data = parse_xml_safe(malformed, trust=TrustLevel.UNTRUSTED)

print("📦 Parsed Dictionary:")
print(json.dumps(data, indent=2))
print()

# Access data easily
print("🎯 Extracted Values:")
print(f"  Status: {data['response']['@attributes']['status']}")
print(f"  Count: {data['response']['data']['@attributes']['count']}")
print(f"  Items: {data['response']['data']['item']}")
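
The dictionary layout accessed above (an `@attributes` key per element, and repeated child tags collected together) can be reproduced for already well-formed XML with `xml.etree.ElementTree`. This is a minimal sketch, not Xenon's parser. `element_to_dict` is hypothetical, and both the `#text` key for mixed content and the use of lists for repeated tags are assumptions of this example.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem: ET.Element):
    """Convert an element to nested dicts; leaf elements without attributes become strings."""
    children = list(elem)
    text = (elem.text or "").strip()
    if not children and not elem.attrib:
        return text
    node = {}
    if elem.attrib:
        node["@attributes"] = dict(elem.attrib)
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tag: collect the values in a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    if text:
        node["#text"] = text  # assumed key name for text next to attributes or children
    return node


well_formed = (
    '<response status="success"><data count="3">'
    "<item>Apple</item><item>Banana</item><item>Orange</item>"
    "</data></response>"
)
root = ET.fromstring(well_formed)
parsed = {root.tag: element_to_dict(root)}
print(json.dumps(parsed, indent=2))
```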

---

# 🛡️ Security Audit Logging

For SOC 2 compliance and security monitoring, log all repair actions and threats:

In [None]:
from xenon.audit import AuditLogger

# 1. Configure Logger
logger = AuditLogger()

# 2. Process potentially malicious input
risky_input = '<?php system("rm -rf /"); ?> <script>alert(1)</script> <user>Alice</user>'

print("📥 RISKY INPUT:")
print(risky_input)
print()

# 3. Repair with logging enabled
safe_output = repair_xml_safe(
    risky_input,
    trust=TrustLevel.UNTRUSTED,
    audit_logger=logger
)

print("📤 SANITIZED OUTPUT:")
print(safe_output)
print()

# 4. View Audit Log (JSON format)
print("📋 AUDIT LOG ENTRY:")
import json
print(json.dumps(logger.to_json(), indent=2))
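
To show what a structured audit record might contain, here is a standard-library sketch that stores a timestamp, a hash of the raw input, and the suspicious patterns found. The record shape, `THREAT_PATTERNS`, and `build_audit_entry` are all hypothetical. This is not the format produced by Xenon's `AuditLogger`, and the two patterns are examples rather than a complete threat list.

```python
import hashlib
import json
import re
from datetime import datetime, timezone

# Example patterns only; a real deployment would need a broader, reviewed list.
THREAT_PATTERNS = {
    "processing_instruction": re.compile(r"<\?(?!xml\s)"),  # e.g. <?php ... ?>, but not <?xml ...?>
    "script_element": re.compile(r"<script\b", re.IGNORECASE),
}


def build_audit_entry(raw_input: str) -> dict:
    """Build a JSON-serializable audit record for one input string."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Store a digest rather than the payload, so the log never holds hostile content.
        "input_sha256": hashlib.sha256(raw_input.encode("utf-8")).hexdigest(),
        "threats": sorted(
            name for name, pattern in THREAT_PATTERNS.items() if pattern.search(raw_input)
        ),
    }


risky_input = '<?php system("rm -rf /"); ?> <script>alert(1)</script> <user>Alice</user>'
entry = build_audit_entry(risky_input)
print(json.dumps(entry, indent=2))
```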

---

# ü¶úüîó LangChain Integration

Use Xenon as a drop-in OutputParser for LangChain:

In [None]:
try:
    from xenon.integrations.langchain import XenonXMLOutputParser
    
    # Initialize parser
    parser = XenonXMLOutputParser(
        trust=TrustLevel.UNTRUSTED,
        return_dict=True  # Set False to get XML string
    )
    
    # Simulate LLM output
    llm_output = "Here is the data: <user id=123>Bob</user>"
    
    # Parse
    result = parser.parse(llm_output)
    
    print("✅ LangChain Parser Result:")
    print(json.dumps(result, indent=2))

except ImportError:
    print("⚠️ LangChain not installed in this environment")

---

# 🎮 Interactive Playground

Try your own malformed XML below!

In [None]:
# ✏️ Edit this XML and run the cell to see how Xenon repairs it!

your_xml = '''
Sure, here's the XML:

<config>
    <database host=localhost port=5432>
        <credentials user=admin password=secret123
'''

print("📥 YOUR INPUT:")
print(your_xml)
print()
print("=" * 60)
print()

# Repair with detailed report
repaired, report = repair_xml_with_report(your_xml, trust=TrustLevel.UNTRUSTED, format_output='pretty')

print("🔧 REPAIRS MADE:")
for action in report.actions:
    print(f"  • {action.description} (Type: {action.repair_type.value})")
print()
print("=" * 60)
print()

print("📤 XENON OUTPUT:")
print(repaired)