_HDS5210 - Programming for Health Data Scientists_

# Week 9 - Data Structures - XML

XML is the abbreviation for eXtensible Markup Language.

In this part of the lecture, we'll be reading, processing, and writing XML.  You can see a sample file like the ones we'll be working with here: https://www.hl7.org/fhir/patient-example-f201-roel.xml.html

The Python manual for the xml module can be found here: https://docs.python.org/3.6/library/xml.html

An element with no content can be written in self-closing form:

    <id value="f201"/>

which is the same as:

    <id value="f201"></id>
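
As a quick check of that equivalence, here is a minimal sketch that parses both spellings and compares the results. The variable names `short_form` and `long_form` are only for illustration.

```python
import xml.etree.ElementTree as xml

# Parse the self-closing form and the explicit open/close form.
short_form = xml.fromstring('<id value="f201"/>')
long_form = xml.fromstring('<id value="f201"></id>')

# Compare tag, attributes, and text (treating "no text" and "empty text" alike).
same = (
    short_form.tag == long_form.tag
    and short_form.attrib == long_form.attrib
    and (short_form.text or '') == (long_form.text or '')
)
print(same)
```

Both spellings should parse to an element with the same tag and attributes and no text, so the parser treats them as interchangeable.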

In [1]:
import xml.etree.ElementTree as xml

In [2]:
x = """<?xml version="1.0"?>
<start a="1" b="2">My Value</start>
"""

In [3]:
x

'<?xml version="1.0"?>\n<start a="1" b="2">My Value</start>\n'

In [4]:
root = xml.fromstring(x)

In [5]:
root.tag

'start'

In [6]:
root.attrib

{'a': '1', 'b': '2'}

In [7]:
root.text

'My Value'

In [35]:
hds5210 = """<?xml version="1.0"?>
<class name="hds5210" >
This class is about programming in Python.
    <instructor>Paul Boal</instructor>
    <instructor>Eric Westhus</instructor>
</class>
"""

In [36]:
c = xml.fromstring(hds5210)

In [37]:
c.text

'\nThis class is about programming in Python.\n    '
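
Notice that `c.text` only holds the text that comes before the first child element. Text that follows a child's closing tag is stored on that child as `tail`. The sketch below uses a small made-up document (not one of the lecture files) to show where each piece of text ends up.

```python
import xml.etree.ElementTree as xml

# Made-up document with text before, inside, and after a child element.
doc = xml.fromstring(
    "<class>intro<instructor>Paul Boal</instructor>after</class>"
)
first = doc[0]

print(repr(doc.text))    # text before the first child
print(repr(first.text))  # text inside the child
print(repr(first.tail))  # text after the child's closing tag

# itertext() walks the element and yields all of these pieces in document order.
all_text = ''.join(doc.itertext())
print(repr(all_text))
```

If you need all of the text under an element, `itertext()` is usually simpler than stitching `text` and `tail` together by hand.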

In [11]:
c.tag

'class'

In [12]:
c.attrib

{'name': 'hds5210'}

In [15]:
for child in c:
    print(child.tag, child.text)

instructor Paul Boal
instructor Eric Westhus
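
Looping over an element visits every direct child. When you only want children with a particular tag, `find()` and `findall()` are an alternative. A minimal sketch using the same class document:

```python
import xml.etree.ElementTree as xml

hds5210 = """<?xml version="1.0"?>
<class name="hds5210" >
This class is about programming in Python.
    <instructor>Paul Boal</instructor>
    <instructor>Eric Westhus</instructor>
</class>
"""
c = xml.fromstring(hds5210)

# findall() returns a list of every direct child with the given tag.
instructors = [child.text for child in c.findall('instructor')]
print(instructors)

# find() returns only the first match, or None if there is no match.
first = c.find('instructor')
missing = c.find('room')
print(first.text, missing)
```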


## Parsing an XML file

In [16]:
tree = xml.parse('/samples/patient-example-f001-pieter.xml')
root = tree.getroot()

In [17]:
root.tag

'{http://hl7.org/fhir}Patient'

In [28]:
root.attrib

{}

In [31]:
for child in root:
    print(child.tag, child.attrib.get('value'), len(child))

{http://hl7.org/fhir}id f001 0
{http://hl7.org/fhir}text None 2
{http://hl7.org/fhir}identifier None 3
{http://hl7.org/fhir}identifier None 2
{http://hl7.org/fhir}active true 0
{http://hl7.org/fhir}name None 4
{http://hl7.org/fhir}telecom None 3
{http://hl7.org/fhir}telecom None 3
{http://hl7.org/fhir}gender male 0
{http://hl7.org/fhir}birthDate 1944-11-17 0
{http://hl7.org/fhir}deceasedBoolean false 0
{http://hl7.org/fhir}address None 5
{http://hl7.org/fhir}maritalStatus None 2
{http://hl7.org/fhir}multipleBirthBoolean true 0
{http://hl7.org/fhir}contact None 3
{http://hl7.org/fhir}communication None 2
{http://hl7.org/fhir}managingOrganization None 2
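
Every tag above is printed with its namespace in curly braces, because the file declares `http://hl7.org/fhir` as its default namespace. If you only want the short name for display, one option is to split it off yourself. The sketch below assumes a hypothetical helper named `local_name` and a tiny hand-written stand-in for a FHIR Patient, not the lecture's sample file.

```python
import xml.etree.ElementTree as xml

# Hand-written stand-in for a FHIR Patient resource (values are made up).
sample = """<Patient xmlns="http://hl7.org/fhir">
  <id value="example"/>
  <gender value="male"/>
</Patient>"""


def local_name(tag):
    """Return the tag without its '{namespace}' prefix (hypothetical helper)."""
    return tag.split('}', 1)[1] if tag.startswith('{') else tag


patient = xml.fromstring(sample)
names = [local_name(child.tag) for child in patient]
for child in patient:
    print(local_name(child.tag), child.attrib.get('value'))
```

This is only for printing. When searching with `find()` or `findall()`, you still need the full namespaced name or a prefix mapping.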


In [21]:
ns = { 'fhir': 'http://hl7.org/fhir'}
xml.register_namespace('fhir','http://hl7.org/fhir')

In [24]:
for tel in root.findall('fhir:telecom', ns):
    print(tel.attrib, len(tel))

{} 3
{} 3
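
The `telecom` elements have no attributes of their own; their data lives in child elements. A sketch of how you might read those children with the same `ns` mapping is below. The XML is a hand-written stand-in with made-up values, and it assumes each `telecom` contains `system` and `value` children.

```python
import xml.etree.ElementTree as xml

ns = {'fhir': 'http://hl7.org/fhir'}

# Hand-written stand-in (made-up values), not the lecture's sample file.
sample = """<Patient xmlns="http://hl7.org/fhir">
  <telecom>
    <system value="phone"/>
    <value value="555-0100"/>
    <use value="home"/>
  </telecom>
</Patient>"""

patient = xml.fromstring(sample)
contacts = []
for tel in patient.findall('fhir:telecom', ns):
    system = tel.find('fhir:system', ns)
    value = tel.find('fhir:value', ns)
    # find() returns None when a child is missing, so guard before reading it.
    contacts.append((
        system.get('value') if system is not None else None,
        value.get('value') if value is not None else None,
    ))
print(contacts)
```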


In [25]:
for nm in root.findall('fhir:name', ns):
    for a in nm:
        print("{:s} --> {:s}".format(str(a.tag), str(a.attrib["value"])))

{http://hl7.org/fhir}use --> usual
{http://hl7.org/fhir}family --> van de Heuvel
{http://hl7.org/fhir}given --> Pieter
{http://hl7.org/fhir}suffix --> MSc


In [26]:
for nm in root.findall('{http://hl7.org/fhir}name'):
    print(xml.tostring(nm))

b'<fhir:name xmlns:fhir="http://hl7.org/fhir">\n    <fhir:use value="usual" />\n    <fhir:family value="van de Heuvel" />\n    <fhir:given value="Pieter" />\n    <fhir:suffix value="MSc" />\n  </fhir:name>\n  '
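
Two things to notice in that output: `tostring()` returns bytes by default, and the trailing whitespace after `</fhir:name>` is the element's `tail`. For the writing side, here is a minimal sketch that changes an attribute and serializes the result as a regular string. It uses a hand-written stand-in document, and the new value is a made-up edit.

```python
import xml.etree.ElementTree as xml

# Use the 'fhir' prefix when serializing elements in this namespace.
xml.register_namespace('fhir', 'http://hl7.org/fhir')
ns = {'fhir': 'http://hl7.org/fhir'}

# Hand-written stand-in, not the lecture's sample file.
sample = (
    '<Patient xmlns="http://hl7.org/fhir">'
    '<name><given value="Pieter"/></name>'
    '</Patient>'
)
patient = xml.fromstring(sample)

# Paths in find() can step through several levels at once.
given = patient.find('fhir:name/fhir:given', ns)
given.set('value', 'Piet')  # made-up edit for illustration

# encoding='unicode' makes tostring() return str instead of bytes.
text = xml.tostring(patient, encoding='unicode')
print(text)
```

To write a whole document to disk, `xml.ElementTree(patient).write(path)` does the same serialization into a file, where `path` is whatever output filename you choose.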
