Feature Request: Add Document Serialization APIs #24

@redvers

Recommended libxml2 API Functionality to Add Next

Executive Summary

Recommendation: Add Document Serialization APIs

Document serialization (saving XML to file/string) is the highest-value addition because:

  1. Foundation for your goals: Required for both modifying existing XML and creating new XML from scratch
  2. Immediate utility: Enables save-modify-save workflows right away
  3. Low complexity: Straightforward implementation (~2.5 hours)
  4. Unblocks future work: Prerequisite for document creation APIs

Current API Coverage Analysis

What's Implemented:

  • ✅ XML parsing (from file and string)
  • ✅ XPath evaluation with namespaces
  • ✅ Tree navigation (root, children, attributes)
  • ✅ Attribute operations (get/set/unset)
  • ✅ Content retrieval
  • ✅ Error handling via Xml2Error

Critical Gaps Identified:

  • ❌ Document serialization - Cannot save documents to file or string
  • ❌ Document creation - No API to build XML from scratch
  • ❌ Node creation/insertion - Cannot add new elements
  • ❌ Node removal - Cannot delete elements
  • ❌ HTML parsing - Not exposed at high level
  • ❌ Schema validation (XSD/DTD)

Why Serialization First:
Without serialization, modifying or creating a document is pointless because the result cannot be saved. Serialization is therefore the logical first step: it enables every write-oriented workflow.


Implementation Plan: Document Serialization

API Design

Add two methods to the Xml2Doc class:

fun serialize(
  format: Bool = true,
  encoding: String = "UTF-8")
  : String ?
  """
  Serialize document to String with optional formatting.
  Returns pretty-printed or compact XML as String val.
  """

fun saveToFile(
  auth: FileAuth,
  filename: String,
  format: Bool = true,
  encoding: String = "UTF-8")
  : None ?
  """
  Save document to file with optional formatting and encoding.
  Requires FileAuth capability for safe file access.
  """

Parameters:

  • format: Bool - Pretty-print with indentation (true) or compact (false)
  • encoding: String - Character encoding (default "UTF-8", also supports "ISO-8859-1", "UTF-16", etc.)

Error Handling:

  • Both methods are partial functions (declared with ?), consistent with the existing API
  • Raises error on memory allocation failure or file write failure
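For callers, a partial method means wrapping the call in try ... else. The sketch below only illustrates that calling pattern: StubDoc is a hypothetical stand-in for Xml2Doc (so the snippet compiles without libxml2), and the error message is invented for the example.

```pony
// Hypothetical stand-in for Xml2Doc; only the shape of the partial
// serialize() method matters here.
class StubDoc
  let _xml: String
  let _fail: Bool

  new create(xml: String, fail: Bool = false) =>
    _xml = xml
    _fail = fail

  fun serialize(format: Bool = true, encoding: String = "UTF-8"): String ? =>
    // Simulate an allocation failure when asked to.
    if _fail then error end
    _xml

actor Main
  new create(env: Env) =>
    let good = StubDoc("<root/>")
    let bad = StubDoc("<root/>", true)

    // Success path: the ? after the call propagates errors to the try block.
    try
      env.out.print(good.serialize(where format = false)?)
    else
      env.err.print("serialize failed")
    end

    // Failure path: the else branch runs instead of the print.
    try
      env.out.print(bad.serialize()?)
    else
      env.err.print("serialize failed")
    end
```

The same try ... else shape would apply to saveToFile(), whose failure case is a file write error rather than an allocation error.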

Implementation Steps

Phase 1: Add xmlFree to Raw API (required for memory management)

  • File: libxml2/raw/uses.pony
    • Add: use @xmlFree[None](ptr: Pointer[U8] tag)
  • File: libxml2/raw/functions.pony
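    • Add wrapper function xmlFree(ptr: Pointer[U8] tag): None
  • Caveat: the libxml2 headers declare xmlFree as a global function-pointer variable (type xmlFreeFunc), not as an ordinary function. A direct FFI call to that symbol may not behave as intended, so confirm against the installed headers and fall back to a small C shim if needed.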

Phase 2: Implement Serialization Methods

  • File: libxml2/xml2doc.pony
    • Add serialize() method using xmlDocDumpFormatMemoryEnc()
    • Add saveToFile() method using xmlSaveFormatFileEnc()

Key Implementation Details:

  • serialize() calls xmlDocDumpFormatMemoryEnc(), copies the returned buffer into a Pony String (using the length libxml2 reports), then calls xmlFree() to release the C memory
  • saveToFile() calls xmlSaveFormatFileEnc() and treats a return value of -1 as failure (on success it returns the number of bytes written)
  • Both methods convert Bool format parameter to I32 (1 or 0) for C API

Phase 3: Add Comprehensive Tests

  • File: libxml2/_tests/coverage_tests.pony
    • Round-trip test: parse → serialize → parse → verify structure
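    • Formatting test: verify compact output has no newlines or indentation between elements (libxml2 still emits a newline after the XML declaration and at the end of the document), and formatted output is indented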
    • File save/load test: save → load → verify content
    • Encoding test: verify UTF-8 and ISO-8859-1 output
    • Modified document test: setProp → serialize → verify changes persist
    • Error handling test: invalid file paths raise errors correctly
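One possible shape for the formatting test is sketched below. It assumes the `pony_test` package name used by recent ponyc releases (older compilers ship it as `ponytest`), and it uses hard-coded strings as stand-ins for the values serialize() would return, so the test class name and sample documents are illustrative rather than taken from the repository.

```pony
use "pony_test"

actor Main is TestList
  new create(env: Env) => PonyTest(env, this)
  new make() => None

  fun tag tests(test: PonyTest) =>
    test(_TestCompactVsFormatted)

class iso _TestCompactVsFormatted is UnitTest
  fun name(): String => "serialize/compact-vs-formatted"

  fun apply(h: TestHelper) =>
    // Stand-ins for doc.serialize(false)? and doc.serialize(true)?.
    // Both keep the newline that follows the XML declaration.
    let compact =
      "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<a><b/></a>\n"
    let pretty =
      "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<a>\n  <b/>\n</a>\n"

    // Compact: elements are adjacent, with no whitespace between them.
    h.assert_true(compact.contains("<a><b/></a>"))
    // Formatted: the child sits on its own indented line.
    h.assert_true(pretty.contains("<a>\n  <b/>\n</a>"))
    h.assert_false(pretty.contains("<a><b/></a>"))
```

In the real test the two strings would come from a parsed document, and checking for element adjacency avoids a false failure caused by the declaration's trailing newline.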

Phase 4: Documentation

  • Update docstrings with examples
  • Update CHANGELOG with new features

Critical Files

  1. libxml2/xml2doc.pony - Add serialize() and saveToFile() methods
  2. libxml2/raw/uses.pony - Add @xmlFree FFI declaration
  3. libxml2/raw/functions.pony - Add xmlFree() wrapper
  4. libxml2/_tests/coverage_tests.pony - Add 6 new test cases
  5. libxml2/xml2node.pony (reference) - Existing nodeDump() shows similar pattern

Memory Management Approach

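// In serialize():
// 1. Call xmlDocDumpFormatMemoryEnc() - libxml2 allocates the buffer and reports its length
// 2. Check for a null pointer (allocation or encoding failure)
// 3. Copy into a Pony String with String.copy_cpointer(ptr, len)
//    (String.from_cstring() wraps the pointer without copying, so the String would dangle after step 4)
// 4. Free the buffer with xmlFree() - critical to avoid a leak
// 5. Return the Pony-owned String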
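The steps above can be sketched as a standalone program that talks to libxml2 directly. This is a rough illustration, not the project's implementation: the FFI declarations are transcribed by hand from the C prototypes, DumpDoc is a hypothetical name, and the buffer is released with libc `free()` on the assumption that libxml2 is using its default allocator (the plan's xmlFree() wrapper would replace that call).

```pony
use "lib:xml2"

// FFI declarations transcribed from the libxml2 C prototypes.
use @xmlReadMemory[Pointer[None] tag](buffer: Pointer[U8] tag, size: I32,
  url: Pointer[U8] tag, encoding: Pointer[U8] tag, options: I32)
use @xmlDocDumpFormatMemoryEnc[None](doc: Pointer[None] tag,
  mem: Pointer[Pointer[U8]], size: Pointer[I32],
  encoding: Pointer[U8] tag, format: I32)
use @xmlFreeDoc[None](doc: Pointer[None] tag)
// ASSUMPTION: default libxml2 allocator, so libc free() releases the buffer.
use @free[None](ptr: Pointer[U8] tag)

primitive DumpDoc
  fun apply(doc: Pointer[None] tag, format: Bool, encoding: String)
    : String ?
  =>
    // The C API takes an int flag rather than a Bool.
    let flag: I32 = if format then 1 else 0 end
    recover val
      var buf = Pointer[U8]  // step 1: receives the libxml2 buffer
      var len: I32 = 0       // step 1: receives its length in bytes
      @xmlDocDumpFormatMemoryEnc(doc, addressof buf, addressof len,
        encoding.cstring(), flag)
      if buf.is_null() then error end                   // step 2
      let out = String.copy_cpointer(buf, len.usize())  // step 3: real copy
      @free(buf)                                        // step 4
      out                                               // step 5
    end

actor Main
  new create(env: Env) =>
    let src = "<a><b/></a>"
    // Null URL and encoding, no parser options.
    let doc = @xmlReadMemory(src.cpointer(), src.size().i32(),
      Pointer[U8], Pointer[U8], 0)
    if doc.is_null() then
      env.err.print("parse failed")
    else
      try
        env.out.print(DumpDoc(doc, false, "UTF-8")?)
      else
        env.err.print("serialize failed")
      end
      @xmlFreeDoc(doc)
    end
```

Copying by length rather than scanning for a terminator matters for encodings such as UTF-16, whose output contains zero bytes that would cut a C-string copy short.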

Verification Plan

Manual Testing:

# Run unit tests
make unit-tests

# Test with examples
cd examples
# Modify an example to parse, modify, serialize, and save

Test Coverage Required:

  1. Parse document → serialize → verify XML structure
  2. Parse → modify attributes → serialize → verify changes
  3. Serialize with format=true → verify indentation present
  4. Serialize with format=false → verify compact output (no newlines or indentation between elements; the newline after the XML declaration remains)
  5. saveToFile → parseFile → verify round-trip works
  6. Test different encodings (UTF-8, ISO-8859-1)
  7. Error cases: invalid file paths, allocation failures

Success Criteria:

  • All new tests pass
  • No memory leaks (valgrind clean if needed)
  • Round-trip preservation: parse → serialize → parse yields same structure
  • Both UTF-8 and other encodings work correctly

Roadmap: Recommended Order for Future APIs

After document serialization is complete, implement in this order:

  1. Document Creation (next logical step)

    • Xml2Doc.create(version, encoding) - Create empty document
    • Enables building XML from scratch
  2. Node Creation and Insertion

    • Xml2Node.createElement(name, content?)
    • Xml2Node.appendChild(child)
    • Xml2Doc.setRootElement(node)
    • Enables dynamic XML construction
  3. Node Removal

    • Xml2Node.remove() or Xml2Node.unlink()
    • Completes CRUD operations on trees
  4. Text Node Access

    • Expose text nodes (currently skipped by getChildren())
    • Xml2Node.getTextContent(), Xml2Node.setTextContent()
  5. HTML Parsing (separate use case)

    • Xml2Doc.parseHtmlFile(), Xml2Doc.parseHtml()
    • Leverage libxml2's HTML parser
  6. Schema Validation (advanced feature)

    • XSD/DTD validation APIs
    • Lower priority, more complex

Why Not Other Features First?

Node Creation Without Serialization: Useless - can't save results
HTML Parsing: Separate use case, doesn't support your stated goals (XML modification/creation)
Schema Validation: Advanced feature, less commonly needed
XSLT: Complex, lower demand than basic CRUD operations

Serialization is the linchpin - it unblocks everything else you want to do.


Estimated Effort

Document Serialization Implementation:

  • Add xmlFree to raw API: 15 minutes
  • Implement serialize(): 30 minutes
  • Implement saveToFile(): 20 minutes
  • Write 6 test cases: 45 minutes
  • Debug and polish: 30 minutes
  • Total: 140 minutes itemized, ~2.5 hours with slack

Difficulty: Low-Medium

  • Straightforward C API usage
  • Clear memory management pattern
  • Good test coverage possible
  • Main risk: pointer handling (mitigated by following existing patterns)

Generated by Claude Code analysis
