
enhancements w3cdom

DagSverreSeljebotn edited this page · 6 revisions

W3C DOM access to trees


This facility exposes a subset of the W3C DOM (originally created for XML documents) on trees consisting of Python objects. The subset is:

  • Read-only. Changes to the tree must be made through "normal" Python access to the objects: ifnode.body.append(something) may be fine because it happens outside of the DOM, but DOM mutators such as ifnode.insertAfter are not supported
  • Only a Document, Elements, and Attributes are exported, so other DOM node types such as comments and processing instructions do not come into it

The DOM layer is very efficient: it only adds an API on top of the existing tree. For the same reason the DOM view is "live", so changes made to the tree are reflected immediately.


  • Generic XPath implementations that can work on any DOM can also work on this one
  • xmldiff can be used for unit-testing purposes (i.e. one can use tools designed for DOMs and XML to check whether two trees correspond in the right way)
  • For debugging purposes, any DOM serializer can be used to serialize the tree and have a peek at the contents

Also, if write support is added it means that any XML transformation can be used - but that will probably not happen.
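As a rough illustration of the serializer use case, the sketch below writes a tiny serializer that touches only read-only DOM members (nodeName, attributes, childNodes, nodeType). It is demonstrated on an xml.dom.minidom document because the proposed layer is not available here; the assumption is that an object tree exposing the same members could be passed in unchanged. The function name serialize is made up for this example, and no escaping of attribute values is attempted.

```python
from xml.dom import minidom, Node


def serialize(node, indent=0):
    """Serialize an element using only read-only DOM members."""
    pad = "  " * indent
    # attributes is a NamedNodeMap; items() yields (name, value) pairs.
    attrs = "".join(f" {name}='{value}'" for name, value in node.attributes.items())
    # Only element children are considered, matching the exported subset.
    children = [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]
    if not children:
        return f"{pad}<{node.nodeName}{attrs}/>\n"
    inner = "".join(serialize(child, indent + 1) for child in children)
    return f"{pad}<{node.nodeName}{attrs}>\n{inner}{pad}</{node.nodeName}>\n"


# Stand-in input: a real DOM built by minidom rather than a Python object tree.
doc = minidom.parseString("<ModuleNode><NameNode name='a'/></ModuleNode>")
print(serialize(doc.documentElement), end="")
```

Because nothing in serialize writes to the tree, a read-only DOM view is sufficient for this kind of tool.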

Adding DOM support to a tree

In order to provide DOM access to your tree, these steps must be followed:

  • Inherit from PyObjDOMNode (or, within Cython, CythonNode)
  • Somehow declare a localName attribute (though it will default to the class name)
  • Somehow declare a namespaceUri attribute (though it will default to the empty string)
  • Somehow declare a prefix attribute (or set to None if no prefix is wanted)
  • Do not use any DOM-reserved names. Currently provided automatically (must not collide): nodeName, tagName, isSameNode, parentNode, ownerDocument, childNodes
  • Define __domchildattrs__ as a list containing the names of the attributes that hold child nodes. Every attribute listed here must hold either a PyObjDOMNode or a list of them (a single node is treated like a list with one item). Any object assigned to such an attribute, or inserted into a list assigned to it, automatically has its parentNode and ownerDocument updated accordingly. __domchildattrs__ is read by a metaclass, so it must be set on the class, not on the object instance.
  • Define __domvalueattrs__. These are attributes that should be exported as DOM attributes on the node. They will be run through str conversion (to have custom formatting for the DOM, simply define a readable property that does the string conversion). A more complex conversion facility may be added...
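The naming and value-attribute steps above might look roughly like the following. This is only a sketch: SketchDOMNode is a hypothetical stand-in for PyObjDOMNode (which is not available here), dom_attributes is a made-up helper name, and the defaults simply follow the list above (class name for localName, empty string for namespaceUri, None for prefix, str conversion for exported values).

```python
class SketchDOMNode:
    """Hypothetical stand-in for PyObjDOMNode; not the real base class."""

    namespaceUri = ""        # default: empty string
    prefix = None            # default: no prefix
    __domvalueattrs__ = []   # names of attributes exported as DOM attributes

    @property
    def localName(self):
        # Default to the class name; a subclass may override this
        # with a plain class attribute.
        return type(self).__name__

    @property
    def nodeName(self):
        # Provided automatically, so subclasses must not define it.
        if self.prefix is None:
            return self.localName
        return f"{self.prefix}:{self.localName}"

    def dom_attributes(self):
        # Every exported value is run through str().
        return {name: str(getattr(self, name)) for name in self.__domvalueattrs__}


class NameNode(SketchDOMNode):
    prefix = "pyr"
    namespaceUri = "cython:pyrextree"
    __domvalueattrs__ = ["name"]

    def __init__(self, name):
        self.name = name


class BoolNode(SketchDOMNode):
    prefix = "pyr"
    namespaceUri = "cython:pyrextree"
    __domvalueattrs__ = ["value"]

    def __init__(self, value):
        self.value = value


node = BoolNode(True)
print(node.nodeName, node.dom_attributes())  # pyr:BoolNode {'value': 'True'}
```

For custom formatting, the exported name could point at a read-only property that returns the desired string, as the last bullet suggests.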

The children can then be accessed as normal attributes. The assigned object's parentNode is updated automatically on assignment and set to None when something else is assigned instead. Lists may also be assigned to these attributes; in that case the parentNode of each item in the list (all assumed to be nodes) is updated as the item is added to or removed from the list.



class IfNode(PyObjDOMNode):
    __domchildattrs__ = ["condition", "body"]

ifnode = IfNode()
# let a, b, c and d be simple statements...

ifnode.body = a
assert a.parentNode is ifnode
ifnode.body = b
assert a.parentNode is None
ifnode.body = [a, b, c]
assert b.parentNode is ifnode and c.parentNode is ifnode
del ifnode.body[2]
assert c.parentNode is None
ifnode.body = c
assert b.parentNode is None

BTW, this is all easily implemented using a metaclass creating property accessors...
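One way such a metaclass could work is sketched below, in Python 3 syntax. It is an assumption about the approach, not the actual implementation: DOMNodeMeta, SketchNode, TrackedList and _child_property are hypothetical names, only append and del are intercepted on lists, and ownerDocument is left out.

```python
class TrackedList(list):
    """List that keeps parentNode of its items in sync (append/del only)."""

    def __init__(self, owner, items=()):
        super().__init__(items)
        self._owner = owner
        for item in self:
            item.parentNode = owner

    def append(self, item):
        super().append(item)
        item.parentNode = self._owner

    def __delitem__(self, index):
        removed = self[index]
        super().__delitem__(index)
        for item in (removed if isinstance(index, slice) else [removed]):
            item.parentNode = None


def _child_property(name):
    slot = "_" + name  # the value itself lives in a private slot

    def getter(self):
        return getattr(self, slot, None)

    def setter(self, value):
        old = getattr(self, slot, None)
        # Detach whatever was stored before.
        for node in (old if isinstance(old, list) else [old]):
            if node is not None:
                node.parentNode = None
        # Attach the new value (a single node or a list of nodes).
        if isinstance(value, list):
            value = TrackedList(self, value)
        elif value is not None:
            value.parentNode = self
        setattr(self, slot, value)

    return property(getter, setter)


class DOMNodeMeta(type):
    def __new__(mcls, name, bases, namespace):
        # Read on the class, which is why it cannot be set per instance.
        for attr in namespace.get("__domchildattrs__", ()):
            namespace[attr] = _child_property(attr)
        return super().__new__(mcls, name, bases, namespace)


class SketchNode(metaclass=DOMNodeMeta):
    parentNode = None


class IfNode(SketchNode):
    __domchildattrs__ = ["condition", "body"]
```

With these definitions, the assertions from the example above hold when a, b and c are plain SketchNode instances.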

Example/current progress

I currently have this running:

A = parse_string_to_pyrex_tree("""
a = True
if a:
    print "Hello"
""")


using the standard Cython parser, yielding

<?xml version='1.0' encoding='UTF-8'?>
<pyr:ModuleNode xmlns:pyr='cython:pyrextree'>
            <pyr:NameNode name='a'/>
            <pyr:BoolNode value='True'/>
                <pyr:NameNode name='a'/>
                    <pyr:StringNode value='Hello'/>