# Python code to illustrate parsing of XML files

This example is based on `https://www.datacamp.com/community/tutorials/python-xml-elementtree`

### Importing the required modules 

In [1]:
import csv 
import requests 
import xml.etree.ElementTree as ET 

The `ElementTree` module provides functions to read and manipulate XML files (and other similarly structured files).

## Get the XML data

### Request XML data from an API endpoint and save it to a file on disk

In [8]:
# url of API endpoint
url = 'https://www.ilo.org/ilostat/sdmx/ws/rest/codelist/ILO/CL_ECO'

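# creating HTTP response object from given url 
resp = requests.get(url) 
# stop here if the server returned an error status (4xx or 5xx),
# so that an error page is not saved as if it were the XML data
resp.raise_for_status()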

# saving the xml message into an xml file 
with open('test.xml', 'wb') as f: 
    f.write(resp.content) 

### Read the XML file with `ElementTree`

In [9]:
xmlfile = 'test.xml'

# create element tree object 
tree = ET.parse(xmlfile) 
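
If the XML is already in memory (for example, `resp.content` from the request above), it can also be parsed without writing a file first. The following is a minimal sketch, assuming a small made-up document stored in the hypothetical variable `sample_xml`. Note that `ET.fromstring()` returns the root element directly rather than an `ElementTree` object.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for XML received from an API (not the ILO data)
sample_xml = b"<Structure><Header><ID>demo</ID></Header></Structure>"

# fromstring() parses bytes or str and returns the root Element directly
sample_root = ET.fromstring(sample_xml)

print(sample_root.tag)                     # tag of the root element
print(sample_root[0].tag)                  # tag of the first child of the root
print(sample_root.find("Header/ID").text)  # text of a nested element
```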


## Inspect the XML and print values to understand how the tree is structured

Every part of a tree (root included) has:
- a tag that describes the element
- (optional) attributes 
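
To make these parts concrete, here is a small sketch on a made-up element (the `Code` element and its `value` attribute are illustrative, not taken from the downloaded file). Besides a tag and attributes, an element can also carry text and child elements.

```python
import xml.etree.ElementTree as ET

# Made-up XML fragment used only for illustration
demo = ET.fromstring('<Code value="A"><Description>Agriculture</Description></Code>')

print(demo.tag)       # element name: Code
print(demo.attrib)    # attributes as a dict: {'value': 'A'}
print(len(demo))      # number of direct children: 1
print(demo[0].text)   # text inside the first child: Agriculture
```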

### Get the root element of the XML tree

In [11]:
root = tree.getroot()
root

<Element '{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}Structure' at 0x0000021B6F40E7C8>

In [14]:
root.tag

'{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}Structure'

In [5]:
root.attrib

{}

### Iterate over children in the root using a `for` loop

In [6]:
for child in root:
    print(child.tag, child.attrib)

{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}Header {}
{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}CodeLists {}
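
The part in curly braces in the tags printed above is the XML namespace URI, which `ElementTree` prepends to each element name. A minimal sketch of separating the two, assuming a hypothetical helper `split_tag` (it is not part of `ElementTree`):

```python
def split_tag(tag):
    """Split '{namespace}local' into (namespace, local); namespace is '' if absent."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag

ns_uri, name = split_tag(
    "{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}Header"
)
print(name)    # Header
print(ns_uri)  # http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message
```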


### List all the elements in the entire tree using `root.iter()`

This gives a general idea of how many elements are in the XML file, but it does not show the attributes or the levels of the tree.

In [12]:
elements = [elem.tag for elem in root.iter()]
elements

['{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}Structure',
 '{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}Header',
 '{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}ID',
 '{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}Test',
 '{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}Prepared',
 '{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}Sender',
 '{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}Receiver',
 '{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}CodeLists',
 '{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure}CodeList',
 '{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure}Name',
 '{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure}Name',
 '{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure}Name',
 '{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure}Description',
 '{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure}Description'
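
To also see the levels and a count per tag, one option is a short recursive walk. This is a sketch on a made-up document: `walk` is a hypothetical helper, and `collections.Counter` is used only to tally the tags.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up document standing in for the downloaded file
doc = ET.fromstring(
    "<CodeLists><CodeList id='X'><Name>a</Name><Name>b</Name></CodeList></CodeLists>"
)

def walk(elem, depth=0):
    """Yield (depth, element) pairs for elem and all of its descendants."""
    yield depth, elem
    for child in elem:
        yield from walk(child, depth + 1)

# Indent each tag by its level and show its attributes
for depth, elem in walk(doc):
    print("  " * depth + elem.tag, elem.attrib)

# Count how often each tag occurs in the whole tree
tag_counts = Counter(elem.tag for elem in doc.iter())
print(tag_counts)
```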

### See the whole document by passing the root to `ET.tostring()`

`ET.tostring()` serializes an element and everything below it. When given a byte encoding such as `'utf8'`, it returns a `bytes` object, so the result must be decoded with the same encoding to display it as a string (UTF-8 is the typical document encoding for XML files).

In [18]:
print(ET.tostring(root, encoding='utf8').decode('utf8'))

<?xml version='1.0' encoding='utf8'?>
<ns0:Structure xmlns:ns0="http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message" xmlns:ns1="http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure" xmlns:ns2="http://www.SDMX.org/resources/SDMXML/schemas/v2_0/common">
  <ns0:Header>
    <ns0:ID>IDREF3877</ns0:ID>
    <ns0:Test>false</ns0:Test>
    <ns0:Prepared>2019-08-09T13:04:25.694Z</ns0:Prepared>
    <ns0:Sender id="Unknown" />
    <ns0:Receiver id="Unknown" />
  </ns0:Header>
  <ns0:CodeLists>
    <ns1:CodeList agencyID="ILO" id="CL_ECO" isFinal="true" urn="urn:sdmx:org.sdmx.infomodel.codelist.Codelist=ILO:CL_ECO(1.0)" version="1.0">
      <ns1:Name xml:lang="es">Actividad económica</ns1:Name>
      <ns1:Name xml:lang="en">Economic activity</ns1:Name>
      <ns1:Name xml:lang="fr">Activité économique</ns1:Name>
      <ns1:Description xml:lang="es">&lt;p&gt;Este tipo de clasificaci&amp;oacute;n hace referencia a la actividad principal del establecimiento en el que la persona trabaj&amp
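
As an alternative to encoding and then decoding, `ET.tostring()` also accepts `encoding='unicode'`, which returns a `str` directly and leaves out the XML declaration line. A small sketch on a made-up element:

```python
import xml.etree.ElementTree as ET

# Made-up element used only for illustration
elem = ET.fromstring("<Name lang='en'>Economic activity</Name>")

as_bytes = ET.tostring(elem, encoding="utf8")    # bytes, starts with an XML declaration
as_text = ET.tostring(elem, encoding="unicode")  # str, no XML declaration

print(type(as_bytes).__name__, type(as_text).__name__)
print(as_text)
```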

### Use the `iter()` method to find elements of interest

`root.iter(tag)` iterates over every element in the tree, at any depth, whose tag matches the one specified.

#### Make a list of dictionaries with all the attributes of the `{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure}Code` elements in the tree:

In [24]:
codeAttributes = []

for c in root.iter('{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure}Code'):
    codeAttributes.append(c.attrib)

codeAttributes

[{'urn': 'urn:sdmx:org.sdmx.infomodel.codelist.Code=ILO:CL_ECO(1.0).ECO_ISIC2_6',
  'value': 'ECO_ISIC2_6'},
 {'urn': 'urn:sdmx:org.sdmx.infomodel.codelist.Code=ILO:CL_ECO(1.0).ECO_ISIC2_7',
  'value': 'ECO_ISIC2_7'},
 {'urn': 'urn:sdmx:org.sdmx.infomodel.codelist.Code=ILO:CL_ECO(1.0).ECO_ISIC2_8',
  'value': 'ECO_ISIC2_8'},
 {'urn': 'urn:sdmx:org.sdmx.infomodel.codelist.Code=ILO:CL_ECO(1.0).ECO_ISIC2_9',
  'value': 'ECO_ISIC2_9'},
 {'urn': 'urn:sdmx:org.sdmx.infomodel.codelist.Code=ILO:CL_ECO(1.0).ECO_ISIC2_2',
  'value': 'ECO_ISIC2_2'},
 {'urn': 'urn:sdmx:org.sdmx.infomodel.codelist.Code=ILO:CL_ECO(1.0).ECO_ISIC2_3',
  'value': 'ECO_ISIC2_3'},
 {'urn': 'urn:sdmx:org.sdmx.infomodel.codelist.Code=ILO:CL_ECO(1.0).ECO_ISIC2_4',
  'value': 'ECO_ISIC2_4'},
 {'urn': 'urn:sdmx:org.sdmx.infomodel.codelist.Code=ILO:CL_ECO(1.0).ECO_ISIC2_TOTAL',
  'value': 'ECO_ISIC2_TOTAL'},
 {'urn': 'urn:sdmx:org.sdmx.infomodel.codelist.Code=ILO:CL_ECO(1.0).ECO_ISIC2_5',
  'value': 'ECO_ISIC2_5'},
 {'urn': 'u
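
Writing the full namespace URI in every call gets verbose. `iter()` is given the fully qualified tag, whereas `find()` and `findall()` accept a prefix-to-URI mapping as a second argument. The following is a sketch on a made-up fragment that mimics the structure above; the prefix `structure` and the sample code values are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

STRUCTURE_NS = "http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure"

# Made-up fragment with two Code elements in the structure namespace
fragment = ET.fromstring(
    f'<CodeList xmlns="{STRUCTURE_NS}">'
    '<Code value="ECO_DEMO_1"/><Code value="ECO_DEMO_2"/>'
    "</CodeList>"
)

# Map a prefix of our choosing to the namespace URI
ns = {"structure": STRUCTURE_NS}

# './/' searches all descendants; the prefix is expanded using the mapping
code_values = [c.get("value") for c in fragment.findall(".//structure:Code", ns)]
print(code_values)  # ['ECO_DEMO_1', 'ECO_DEMO_2']
```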