xmljson converts XML into Python dictionary structures (trees, like in JSON) and vice-versa.
XML can be converted to a data structure (such as JSON) and back. For example:
    <employees>
      <person>
        <name value="Alice"/>
      </person>
      <person>
        <name value="Bob"/>
      </person>
    </employees>
can be converted into this data structure (which is also a valid JSON object):

    {
      "employees": [
        {"person": {"name": {"@value": "Alice"}}},
        {"person": {"name": {"@value": "Bob"}}}
      ]
    }
This uses the BadgerFish convention, which prefixes attributes with @.
The conventions supported by this library are:
- BadgerFish: Use "$" for text content, @ to prefix attributes
- GData: Use "$t" for text content, attributes added as-is
- Yahoo: Use "content" for text content, attributes added as-is
- Parker: Use tail nodes for text content, ignore attributes
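As a rough illustration of how the four conventions differ, the sketch below converts a single leaf element by hand using only the standard library. It is not xmljson's implementation: `leaf_to_dict` and `TEXT_KEYS` are hypothetical names, and the function handles only an element with text and attributes, no children. The Parker branch keeps the element's tag so the four results are easy to compare side by side.

```python
from xml.etree.ElementTree import fromstring

# Text key and attribute prefix per convention (Parker is handled separately).
TEXT_KEYS = {
    'badgerfish': ('$', '@'),
    'gdata': ('$t', ''),
    'yahoo': ('content', ''),
}


def leaf_to_dict(elem, convention):
    """Convert a childless element by hand; an illustration, not xmljson's code."""
    if convention == 'parker':
        # Parker keeps only the text and ignores attributes.
        return {elem.tag: elem.text}
    text_key, prefix = TEXT_KEYS[convention]
    value = dict((prefix + name, val) for name, val in elem.attrib.items())
    if elem.text:
        value[text_key] = elem.text
    return {elem.tag: value}


elem = fromstring('<p id="main">Hello</p>')
for name in ('badgerfish', 'gdata', 'yahoo', 'parker'):
    print(name, leaf_to_dict(elem, name))
```

Only the key used for text and the attribute prefix change between BadgerFish, GData and Yahoo, which is why a small lookup table is enough for this sketch.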
To convert from a data structure to XML using the BadgerFish convention:
    >>> from xmljson import badgerfish as bf
    >>> bf.etree({'p': {'@id': 'main', '$': 'Hello', 'b': 'bold'}})
This returns a list of etree.Element structures. In this case, the result is identical to:
    >>> from xml.etree.ElementTree import fromstring
    >>> [fromstring('<p id="main">Hello<b>bold</b></p>')]
The result can be inserted into any existing root etree.Element:
    >>> from xml.etree.ElementTree import Element, tostring
    >>> result = bf.etree({'p': {'@id': 'main'}}, root=Element('root'))
    >>> tostring(result)
    '<root><p id="main"/></root>'
This includes lxml.html as well:
    >>> from lxml.html import Element, tostring
    >>> result = bf.etree({'p': {'@id': 'main'}}, root=Element('html'))
    >>> tostring(result, doctype='<!DOCTYPE html>')
    '<!DOCTYPE html>\n<html><p id="main"></p></html>'
For ease of use, strings are treated as node text. For example, both the following are the same:
    >>> bf.etree({'p': {'$': 'paragraph text'}})
    >>> bf.etree({'p': 'paragraph text'})
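The mapping rules described above ("@" keys become attributes, "$" becomes text, a bare string is shorthand for text) can be sketched with the standard library alone. This is a simplified illustration, not xmljson's code: `build` is a hypothetical helper, it assumes Python 3 (for `encoding='unicode'`), and it does not handle lists or the boolean conversion.

```python
from xml.etree.ElementTree import Element, tostring


def build(tag, value):
    """Build one element from a BadgerFish-style value (hypothetical helper)."""
    elem = Element(tag)
    if not isinstance(value, dict):
        # Shorthand: a bare value is treated as the node's text.
        elem.text = str(value)
        return elem
    for key, val in value.items():
        if key.startswith('@'):
            elem.set(key[1:], str(val))    # '@id' becomes the attribute id
        elif key == '$':
            elem.text = str(val)           # '$' holds the text content
        else:
            elem.append(build(key, val))   # any other key is a child element
    return elem


p = build('p', {'@id': 'main', '$': 'Hello', 'b': 'bold'})
xml_text = tostring(p, encoding='unicode')

# The explicit '$' form and the string shorthand serialize identically.
explicit = tostring(build('p', {'$': 'paragraph text'}), encoding='unicode')
shorthand = tostring(build('p', 'paragraph text'), encoding='unicode')
print(xml_text, explicit == shorthand)
```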
By default, non-string values are converted to strings using Python's str, except for booleans, which are converted into true and false (lower case). Override this behaviour using xml_tostring:
    >>> tostring(bf.etree({'x': 1.23, 'y': True}, root=Element('root')))
    '<root><y>true</y><x>1.23</x></root>'
    >>> from xmljson import BadgerFish              # import the class
    >>> bf_str = BadgerFish(xml_tostring=str)       # convert using str()
    >>> tostring(bf_str.etree({'x': 1.23, 'y': True}, root=Element('root')))
    '<root><y>True</y><x>1.23</x></root>'
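The default value-to-text rule described above can be written as a small function. This is a minimal sketch of the documented behaviour, not the library's source; `to_xml_text` is a hypothetical name.

```python
def to_xml_text(value):
    """Render a Python value as XML text, lower-casing booleans (sketch only)."""
    if isinstance(value, bool):
        # Checked before str(): str(True) would give 'True', not 'true'.
        return 'true' if value else 'false'
    return str(value)


for sample in (1.23, True, False, 'text'):
    print(repr(sample), '->', to_xml_text(sample))
```

A function with this shape (one value in, one string out) is the kind of callable the override is described as accepting, which is why passing plain `str` restores Python's own `True`/`False` spelling.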
To convert from XML to a data structure using the BadgerFish convention:
    >>> bf.data(fromstring('<p id="main">Hello<b>bold</b></p>'))
    {"p": {"$": "Hello", "@id": "main", "b": {"$": "bold"}}}
To convert this to JSON, use:
    >>> from json import dumps
    >>> dumps(bf.data(fromstring('<p id="main">Hello<b>bold</b></p>')))
    '{"p": {"b": {"$": "bold"}, "@id": "main", "$": "Hello"}}'
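To make the XML-to-data direction concrete, here is a hand-written sketch of a BadgerFish-style walk over an element tree, using only the standard library. It is an approximation for illustration, not xmljson's implementation: `to_badgerfish` is a hypothetical name, it assumes every child tag appears at most once under its parent, and it leaves all values as strings.

```python
from json import dumps
from xml.etree.ElementTree import fromstring


def to_badgerfish(elem):
    """Convert an element to a BadgerFish-style dict (simplified sketch)."""
    value = {}
    for name, val in elem.attrib.items():
        value['@' + name] = val            # attributes get the '@' prefix
    text = (elem.text or '').strip()
    if text:
        value['$'] = text                  # text content goes under '$'
    for child in elem:
        # Assumes unique child tags; a repeated tag would overwrite here.
        value[child.tag] = to_badgerfish(child)[child.tag]
    return {elem.tag: value}


data = to_badgerfish(fromstring('<p id="main">Hello<b>bold</b></p>'))
# sort_keys makes the JSON text deterministic regardless of dict ordering.
json_text = dumps(data, sort_keys=True)
print(json_text)
```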
To preserve the order of attributes and children, specify the dict_type as OrderedDict (or any other dictionary-like type) in the constructor:
    >>> from collections import OrderedDict
    >>> from xmljson import BadgerFish              # import the class
    >>> bf = BadgerFish(dict_type=OrderedDict)      # pick dict class
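The effect of an ordered mapping on the serialized JSON can be seen without the library. The sketch below builds the structure by hand (the data is written out manually, not produced by xmljson) and shows that `json.dumps` emits keys in insertion order.

```python
from collections import OrderedDict
from json import dumps

# Keys are inserted in document order: attribute, then text, then child.
inner = OrderedDict([
    ('@id', 'main'),
    ('$', 'Hello'),
    ('b', OrderedDict([('$', 'bold')])),
])
ordered = OrderedDict([('p', inner)])

json_text = dumps(ordered)
print(json_text)
```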
By default, values are parsed into boolean, int or float where possible (except in the Yahoo method). Override this behaviour using xml_fromstring:
    >>> dumps(bf.data(fromstring('<x>1</x>')))
    '{"x": {"$": 1}}'
    >>> bf_str = BadgerFish(xml_fromstring=False)   # Keep XML values as strings
    >>> dumps(bf_str.data(fromstring('<x>1</x>')))
    '{"x": {"$": "1"}}'
    >>> bf_str = BadgerFish(xml_fromstring=repr)    # Custom string parser
    >>> dumps(bf_str.data(fromstring('<x>1</x>')))
    '{"x": {"$": "\'1\'"}}'
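A custom parser is simply a callable that takes the XML text and returns a Python value. The sketch below approximates the default behaviour described above (boolean, then int, then float, else the original string). It is an assumption-laden illustration rather than the library's code: `from_xml_text` is a hypothetical name, and the case-insensitive boolean match is a choice made for this example.

```python
def from_xml_text(text):
    """Parse XML text into bool, int or float where possible (sketch only)."""
    lowered = text.strip().lower()
    if lowered in ('true', 'false'):
        return lowered == 'true'
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue                       # try the next, looser type
    return text                            # nothing matched: keep the string


for sample in ('1', '1.23', 'true', 'Hello'):
    print(repr(sample), '->', repr(from_xml_text(sample)))
```

Trying `int` before `float` keeps whole numbers as integers instead of turning "1" into 1.0.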
To use a different conversion method, replace BadgerFish with one of the other classes. Currently, these are supported:
    >>> from xmljson import badgerfish      # == xmljson.BadgerFish()
    >>> from xmljson import gdata           # == xmljson.GData()
    >>> from xmljson import parker          # == xmljson.Parker()
    >>> from xmljson import yahoo           # == xmljson.Yahoo()
This is a pure-Python package built for Python 2.6+ and Python 3.0+. To set up:
pip install xmljson
- Test cases for Unicode
- Support for namespaces and namespace prefixes