_HDS5210 - Programming for Health Data Scientists_

# Week 9 - Data Structures - XML

XML is the abbreviation for eXtensible Markup Language.

In this part of the lecture, we'll work on reading, processing, and writing XML. A sample FHIR patient file like the one we'll be working with can be viewed here: https://www.hl7.org/fhir/patient-example-f201-roel.xml.html

The Python manual for the xml module can be found here: https://docs.python.org/3.6/library/xml.html

In [1]:
import xml.etree.ElementTree as xml

In [2]:
x = """<?xml version="1.0"?>
<start a="1" b="2">My Value</start>
"""

In [3]:
root = xml.fromstring(x)

In [4]:
root.tag

'start'

In [5]:
root.attrib

{'a': '1', 'b': '2'}

In [6]:
root.text

'My Value'
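
The three properties above (`tag`, `attrib`, `text`) cover most of what a single element exposes. As a minimal sketch that reuses the same string, the cell below shows two details that are easy to miss: attribute values always come back as strings, and `.get()` can supply a default for an attribute that is absent. The names `doc`, `elem`, and `summary` are only illustrative.

```python
import xml.etree.ElementTree as xml

doc = """<?xml version="1.0"?>
<start a="1" b="2">My Value</start>
"""

elem = xml.fromstring(doc)

# Attribute values are always strings; convert them explicitly when needed.
a_value = int(elem.get("a"))

# .get() returns a default instead of raising KeyError for a missing attribute.
c_value = elem.get("c", "not set")

summary = (elem.tag, a_value, c_value, elem.text)
print(summary)
```

Using `elem.attrib["c"]` here would raise a `KeyError`, so `.get()` is usually the safer choice when an attribute is optional.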

## Parsing an XML file

In [7]:
tree = xml.parse('/samples/patient-example-f001-pieter.xml')
root = tree.getroot()

In [8]:
root.tag

'{http://hl7.org/fhir}Patient'
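
The tag comes back in `{namespace}localname` form because the `Patient` element declares a default namespace. The sketch below illustrates this on a small hand-written fragment (an assumption, not the full sample file) and uses a hypothetical helper, `split_tag`, to separate the two parts.

```python
import xml.etree.ElementTree as xml

# Hand-written fragment for illustration only; not the full sample file.
fragment = '<Patient xmlns="http://hl7.org/fhir"><id value="f001"/></Patient>'
frag_root = xml.fromstring(fragment)

def split_tag(tag):
    """Split '{namespace}local' into (namespace, local). Hypothetical helper."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    # Tags without a namespace are returned unchanged.
    return None, tag

print(frag_root.tag)
print(split_tag(frag_root.tag))
```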

In [9]:
root.attrib

{}

In [19]:
for child in root:
    print(child.tag, child.attrib)

{http://hl7.org/fhir}id {'value': 'f001'}
{http://hl7.org/fhir}text {}
{http://hl7.org/fhir}identifier {}
{http://hl7.org/fhir}identifier {}
{http://hl7.org/fhir}active {'value': 'true'}
{http://hl7.org/fhir}name {}
{http://hl7.org/fhir}telecom {}
{http://hl7.org/fhir}telecom {}
{http://hl7.org/fhir}gender {'value': 'male'}
{http://hl7.org/fhir}birthDate {'value': '1944-11-17'}
{http://hl7.org/fhir}deceasedBoolean {'value': 'false'}
{http://hl7.org/fhir}address {}
{http://hl7.org/fhir}maritalStatus {}
{http://hl7.org/fhir}multipleBirthBoolean {'value': 'true'}
{http://hl7.org/fhir}contact {}
{http://hl7.org/fhir}communication {}
{http://hl7.org/fhir}managingOrganization {}
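
Iterating over an element directly, as above, visits only its direct children. As a rough sketch on a hand-written fragment (an assumption, much smaller than the real file), the cell below contrasts that with `iter()`, which walks every descendant, and counts repeated tags with `collections.Counter`.

```python
import xml.etree.ElementTree as xml
from collections import Counter

# Hand-written fragment for illustration only; not the full sample file.
fragment = """<Patient xmlns="http://hl7.org/fhir">
  <id value="f001"/>
  <identifier/>
  <identifier/>
  <name><given value="Pieter"/></name>
</Patient>"""
patient = xml.fromstring(fragment)

def local_name(tag):
    """Drop the '{namespace}' prefix from a tag. Hypothetical helper."""
    return tag.split("}", 1)[-1]

# Direct children only.
direct = [local_name(child.tag) for child in patient]

# iter() yields the element itself and then every descendant, in document order.
everything = [local_name(e.tag) for e in patient.iter()]

counts = Counter(direct)
print(direct)
print(everything)
print(counts["identifier"])
```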


In [11]:
# Map a short prefix to the FHIR namespace URI so searches can use 'hl7:tag'
ns = {'hl7': 'http://hl7.org/fhir'}

In [12]:
# Use a name other than `id` to avoid shadowing Python's built-in id()
for id_elem in root.findall('hl7:id', ns):
    print(id_elem.attrib)

{'value': 'f001'}
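
`findall()` returns a list of every match, while `find()` returns only the first match, or `None` when nothing matches. The sketch below, again on a hand-written fragment (an assumption), shows both cases and what happens when the namespace prefix is left out of the search.

```python
import xml.etree.ElementTree as xml

# Hand-written fragment for illustration only; not the full sample file.
fragment = """<Patient xmlns="http://hl7.org/fhir">
  <id value="f001"/>
  <gender value="male"/>
  <birthDate value="1944-11-17"/>
</Patient>"""
patient = xml.fromstring(fragment)
ns = {"hl7": "http://hl7.org/fhir"}

# find() returns the first matching child, or None if there is no match.
gender = patient.find("hl7:gender", ns)
missing = patient.find("hl7:photo", ns)

# Check for None before reading attributes.
gender_value = gender.get("value") if gender is not None else None
missing_value = missing.get("value") if missing is not None else None

# Without the namespace prefix the search matches nothing.
unqualified = patient.findall("gender")

print(gender_value, missing_value, unqualified)
```

Test the result with `is not None` rather than plain truthiness, because an element with no children has length zero and can evaluate as false.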


In [25]:
for nm in root.findall('hl7:name', ns):
    for a in nm:
        print("{:s} --> {:s}".format(str(a.tag), str(a.attrib["value"])))

{http://hl7.org/fhir}use --> usual
{http://hl7.org/fhir}family --> van de Heuvel
{http://hl7.org/fhir}given --> Pieter
{http://hl7.org/fhir}suffix --> MSc
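
Printing is fine for exploring, but in practice the values are usually collected into a Python data structure. The following is a minimal sketch under the assumption that each child of `name` carries a `value` attribute. It uses a hand-written fragment modeled on the output above and gathers the name parts into a dictionary keyed by local tag name.

```python
import xml.etree.ElementTree as xml

# Hand-written fragment modeled on the output above; not the full sample file.
fragment = """<Patient xmlns="http://hl7.org/fhir">
  <name>
    <use value="usual"/>
    <family value="van de Heuvel"/>
    <given value="Pieter"/>
    <suffix value="MSc"/>
  </name>
</Patient>"""
patient = xml.fromstring(fragment)
ns = {"hl7": "http://hl7.org/fhir"}

name_parts = {}
for name in patient.findall("hl7:name", ns):
    for part in name:
        # Drop the '{namespace}' prefix to keep only the local tag name.
        local = part.tag.split("}", 1)[-1]
        # .get() returns None instead of raising if 'value' is absent.
        name_parts[local] = part.get("value")

# A path expression can also reach a nested element directly.
family = patient.find("hl7:name/hl7:family", ns).get("value")

print(name_parts)
print(family)
```

A plain dictionary keeps only the last value per tag, so a patient with several `given` names or several `name` elements would need a list of values instead.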
