# Handling XML files
XML is a widespread, human-readable format for representing information, commonly used for storing structured data and for exchanging data in an open format.
Python supports XML processing through several modules; the built-in xml package provides this functionality in the standard library.

## Creating XML data
The ElementTree module provides an object-oriented representation of XML data, allowing the creation of elements and subelements along with their attributes and text.

In [1]:
# import the necessary modules
import xml.etree.ElementTree as ElementTree
import xml.dom.minidom as minidom

In [2]:
# we would like to specify a basic robot configuration using XML data
# the robot may have a series of specific attributes such as
# serial number, usage purpose and so forth

# creating a basic XML data tree needs to start with creation
# of the root node
root_node = ElementTree.Element(
    "robot-specification", # the tag of the root node
    attrib = {  # attributes associated with the root node
        "version": "1.0",
        "creation-date" : "01/12/2023"
    }
)

# add the serial number information to XML data 
serial_number_node = ElementTree.SubElement(
    root_node,
    "serial-number"
)
serial_number_node.text = "09372937-3bf1-454b-857b-ecca1dbe87a0"

# add the usage information element, with its attributes, to the XML data
usage_information_element = ElementTree.SubElement(
    root_node,
    "usage-information",
    attrib = {
        "restricted-usage": "false",
        "standard-compliance": "high"
    } 
)

# add multiple usage purpose child nodes
usage_purposes = ["manufacturing", "high precision operations", "micro mechanics"]
for usage_purpose in usage_purposes:
    usage_purpose_node = ElementTree.SubElement(
        usage_information_element,
        "usage-purpose",
    )
    usage_purpose_node.text = usage_purpose
    
# add a comment node noting that more elements will be added later
# the comment is appended as the last child of the root node
comment_node = ElementTree.Comment("More robot specification elements to be added later")
root_node.append(comment_node)
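Attribute names such as restricted-usage contain hyphens, so they cannot be passed as Python keyword arguments; that is why the cell above uses an attrib dictionary. A minimal sketch of the alternatives: keyword arguments for hyphen-free names and `Element.set()` after creation (the `details` tag and `model` attribute are hypothetical, used only for illustration):

```python
import xml.etree.ElementTree as ElementTree

# keyword arguments work only for attribute names that are valid Python identifiers
root = ElementTree.Element("robot-specification", version="1.0")

# hyphenated attribute names must go through a dict or Element.set()
root.set("creation-date", "01/12/2023")

# SubElement accepts an attrib dict and extra keyword attributes together
# ("details" and "model" are hypothetical names for this sketch)
details = ElementTree.SubElement(root, "details", {"restricted-usage": "false"}, model="X1")

print(ElementTree.tostring(root, encoding="unicode"))
```

Both forms end up in the same attrib dictionary, so the choice is purely about which names Python syntax allows.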

## Printing XML data
XML data can be pretty-printed by serializing it with ElementTree and reformatting the result with the xml.dom.minidom module.

In [3]:
# define an XML formatting function
# used for printing XML data
def format_xml_data(element):
    element_string = ElementTree.tostring(element, encoding = "utf-8", short_empty_elements = False) # generate basic XML data string
    parsed_string  = minidom.parseString(element_string) # use minidom to parse the generated string for reformatting
    return parsed_string.toprettyxml() # get the formatted XML data

In [4]:
# display the formatted data
print(format_xml_data(root_node))

<?xml version="1.0" ?>
<robot-specification version="1.0" creation-date="01/12/2023">
	<serial-number>09372937-3bf1-454b-857b-ecca1dbe87a0</serial-number>
	<usage-information restricted-usage="false" standard-compliance="high">
		<usage-purpose>manufacturing</usage-purpose>
		<usage-purpose>high precision operations</usage-purpose>
		<usage-purpose>micro mechanics</usage-purpose>
	</usage-information>
	<!--More robot specification elements to be added later-->
</robot-specification>
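Since Python 3.9, `ElementTree.indent()` can pretty-print a tree without the minidom round trip. A minimal sketch, assuming Python 3.9 or later; `format_xml_with_indent` is a hypothetical helper name. Because `indent()` modifies the tree in place (it adds whitespace as text and tail values), the sketch indents a deep copy:

```python
import copy
import xml.etree.ElementTree as ElementTree

def format_xml_with_indent(element, space="\t"):
    # work on a deep copy, because indent() adds whitespace to the tree in place
    element_copy = copy.deepcopy(element)
    ElementTree.indent(element_copy, space=space)
    # encoding="unicode" returns a str instead of bytes
    return ElementTree.tostring(element_copy, encoding="unicode")

root = ElementTree.Element("robot-specification", attrib={"version": "1.0"})
ElementTree.SubElement(root, "serial-number").text = "09372937-3bf1-454b-857b-ecca1dbe87a0"
print(format_xml_with_indent(root))
```

Unlike the minidom version, this output carries no XML declaration; one can be added when writing to a file.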



## Saving and Loading XML data
XML data, being text data, can be saved in a text document and afterwards loaded and parsed.

In [5]:
# open a text file and write the formatted XML data
with open("robot_specification.xml", "w", encoding="utf-8") as xml_file:
    xml_file.write(format_xml_data(root_node))
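As an alternative to writing the minidom string by hand, wrapping the root element in an `ElementTree` object gives a `write()` method that serializes straight to a file with an explicit encoding and XML declaration. A minimal sketch; the temporary directory is only there to keep the example self-contained:

```python
import os
import tempfile
import xml.etree.ElementTree as ElementTree

root = ElementTree.Element("robot-specification", attrib={"version": "1.0"})
ElementTree.SubElement(root, "serial-number").text = "09372937-3bf1-454b-857b-ecca1dbe87a0"

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "robot_specification.xml")
    # ElementTree(root) wraps the element so it can be written to a file
    ElementTree.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    with open(path, encoding="utf-8") as xml_file:
        saved_text = xml_file.read()

print(saved_text)
```

This writes compact XML; combine it with `ElementTree.indent()` (Python 3.9+) if a human-readable file is needed.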

In [6]:
# load the XML data as text from the file
# and parse it as an element tree
with open("robot_specification.xml", "r", encoding="utf-8") as xml_file:
    xml_string = xml_file.read()

parsed_root_element = ElementTree.fromstring(
    # convert the string to canonical form (C14N 2.0, Python 3.8+): strip_text=True trims
    # the whitespace added by pretty-printing; attributes are sorted and comments dropped
    ElementTree.canonicalize(xml_string, strip_text=True)
)
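`ElementTree.parse()` can also read a file (or file-like object) directly; it returns an `ElementTree`, whose root element is obtained with `getroot()`. Without the canonicalize step, the whitespace added by pretty-printing is kept as text and tail values. A minimal sketch, where an in-memory `io.StringIO` stands in for the saved file:

```python
import io
import xml.etree.ElementTree as ElementTree

# stand-in for the saved file; ElementTree.parse also accepts a path string
xml_file = io.StringIO(
    '<?xml version="1.0" ?>\n'
    '<robot-specification version="1.0">\n'
    '\t<serial-number>09372937-3bf1-454b-857b-ecca1dbe87a0</serial-number>\n'
    '</robot-specification>\n'
)
tree = ElementTree.parse(xml_file)
root = tree.getroot()

# the pretty-printing indentation survives as the root's text
print(repr(root.text))
print(root[0].tag, root[0].text)
```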

In [7]:
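# print the loaded and parsed data
# the attributes now appear sorted (as required by the canonical form)
# and the XML comment is missing: canonicalize() drops comments by default
# (with_comments=False), and the default parser used by fromstring() discards them too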
print(format_xml_data(parsed_root_element)) 

<?xml version="1.0" ?>
<robot-specification creation-date="01/12/2023" version="1.0">
	<serial-number>09372937-3bf1-454b-857b-ecca1dbe87a0</serial-number>
	<usage-information restricted-usage="false" standard-compliance="high">
		<usage-purpose>manufacturing</usage-purpose>
		<usage-purpose>high precision operations</usage-purpose>
		<usage-purpose>micro mechanics</usage-purpose>
	</usage-information>
</robot-specification>
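Comments are dropped by default rather than because ElementTree cannot represent them. Since Python 3.8, a `TreeBuilder` created with `insert_comments=True` keeps comments found inside the root element, and `canonicalize()` keeps them when passed `with_comments=True`. A minimal sketch on a shortened version of the document:

```python
import xml.etree.ElementTree as ElementTree

xml_string = (
    "<robot-specification>"
    "<serial-number>09372937-3bf1-454b-857b-ecca1dbe87a0</serial-number>"
    "<!--More robot specification elements to be added later-->"
    "</robot-specification>"
)

# a tree builder that inserts comment nodes instead of discarding them
parser = ElementTree.XMLParser(target=ElementTree.TreeBuilder(insert_comments=True))
root = ElementTree.fromstring(xml_string, parser=parser)

# comment nodes use the Comment factory function as their tag
comments = [node.text for node in root if node.tag is ElementTree.Comment]
print(comments)

# canonicalize() keeps comments only when asked to
print(ElementTree.canonicalize(xml_string, with_comments=True))
```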



## Accessing XML data
XML data can be accessed programmatically: by iteration, by direct indexing, or by querying.

In [8]:
# it is possible to iterate over the immediate children
# of an XML node
for child_node in parsed_root_element:
    print("Found element with tag {0} and text {1}".format(child_node.tag, child_node.text))

Found element with tag serial-number and text 09372937-3bf1-454b-857b-ecca1dbe87a0
Found element with tag usage-information and text None
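Iteration yields Element objects, so each child's attributes are available as well, either through the attrib dictionary or through `get()`, which returns a default instead of failing when the attribute is missing. A minimal sketch on a small stand-in tree (the `owner` attribute is hypothetical and deliberately absent):

```python
import xml.etree.ElementTree as ElementTree

root = ElementTree.fromstring(
    '<robot-specification version="1.0">'
    '<serial-number>09372937-3bf1-454b-857b-ecca1dbe87a0</serial-number>'
    '<usage-information restricted-usage="false" standard-compliance="high"/>'
    '</robot-specification>'
)

for child_node in root:
    # attrib is a plain dict mapping attribute names to string values
    print(child_node.tag, child_node.attrib)

usage_information = root[1]
print(usage_information.get("restricted-usage"))        # the string "false", not a bool
print(usage_information.get("owner", "not specified"))  # missing attribute: default returned
print(len(root))  # number of immediate children
```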


In [9]:
# the iter() method walks the whole subtree in document order,
# yielding the node itself followed by all of its descendants,
# regardless of their depth level
for child_node in parsed_root_element.iter():
    print("Found element with tag {0} and text {1}".format(child_node.tag, child_node.text))

Found element with tag robot-specification and text None
Found element with tag serial-number and text 09372937-3bf1-454b-857b-ecca1dbe87a0
Found element with tag usage-information and text None
Found element with tag usage-purpose and text manufacturing
Found element with tag usage-purpose and text high precision operations
Found element with tag usage-purpose and text micro mechanics
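`iter()` also accepts a tag name, in which case only matching elements are yielded, at any depth. A minimal sketch on a reduced stand-in tree:

```python
import xml.etree.ElementTree as ElementTree

root = ElementTree.fromstring(
    "<robot-specification>"
    "<usage-information>"
    "<usage-purpose>manufacturing</usage-purpose>"
    "<usage-purpose>micro mechanics</usage-purpose>"
    "</usage-information>"
    "</robot-specification>"
)

# only usage-purpose elements are yielded, however deeply they are nested
purposes = [node.text for node in root.iter("usage-purpose")]
print(purposes)  # ['manufacturing', 'micro mechanics']
```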


In [10]:
# the child elements of an XML node can also be
# accessed directly by index
element_index = 0
node_at_index = parsed_root_element[element_index]
print("Found element at index {0} with tag {1} and text {2}".format(element_index, node_at_index.tag, node_at_index.text))

Found element at index 0 with tag serial-number and text 09372937-3bf1-454b-857b-ecca1dbe87a0
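Indexing can be chained to reach nested elements, and an Element also supports negative indexes and slicing, much like a list of its children. A minimal sketch on a stand-in tree:

```python
import xml.etree.ElementTree as ElementTree

root = ElementTree.fromstring(
    "<robot-specification>"
    "<serial-number>09372937-3bf1-454b-857b-ecca1dbe87a0</serial-number>"
    "<usage-information>"
    "<usage-purpose>manufacturing</usage-purpose>"
    "<usage-purpose>micro mechanics</usage-purpose>"
    "</usage-information>"
    "</robot-specification>"
)

# root[1] is usage-information; root[1][0] is its first usage-purpose child
print(root[1][0].text)  # manufacturing

# negative indexes count from the end
print(root[1][-1].text)  # micro mechanics

# slicing returns a plain list of child elements
print([node.tag for node in root[0:2]])  # ['serial-number', 'usage-information']
```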


In [11]:
# it is also possible to search for nodes by tag name
# or by XPath expressions (ElementTree supports a limited XPath subset)

# for example, retrieve all children of the usage information node
# with an XPath expression
usage_purpose_nodes = parsed_root_element.findall(".//usage-information/*")
for usage_purpose_node in usage_purpose_nodes:
    print("Found element with tag {0} and text {1}".format(usage_purpose_node.tag, usage_purpose_node.text))

Found element with tag usage-purpose and text manufacturing
Found element with tag usage-purpose and text high precision operations
Found element with tag usage-purpose and text micro mechanics
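Within its limited XPath subset, ElementTree also accepts predicates on attribute values and on positions. `find()` returns the first match (or `None`), and `findtext()` returns the text of the first match directly. A minimal sketch; the `maintenance-schedule` tag is hypothetical and deliberately absent:

```python
import xml.etree.ElementTree as ElementTree

root = ElementTree.fromstring(
    '<robot-specification>'
    '<usage-information restricted-usage="false" standard-compliance="high">'
    '<usage-purpose>manufacturing</usage-purpose>'
    '<usage-purpose>high precision operations</usage-purpose>'
    '<usage-purpose>micro mechanics</usage-purpose>'
    '</usage-information>'
    '</robot-specification>'
)

# attribute predicate: usage-information elements whose standard-compliance is "high"
high_compliance = root.findall(".//usage-information[@standard-compliance='high']")
print(len(high_compliance))  # 1

# position predicates are 1-based and count siblings with the same tag
print(root.findtext(".//usage-purpose[2]"))       # high precision operations
print(root.findtext(".//usage-purpose[last()]"))  # micro mechanics

# find() returns None when nothing matches (hypothetical tag)
print(root.find(".//maintenance-schedule"))  # None
```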


## Modifying XML data
Once accessed, XML data can be modified by removing or adding nodes and by changing attributes.

In [12]:
# accessing the usage information node
usage_information_element = parsed_root_element.find(".//usage-information")

In [13]:
# remove the last two children; removing the higher index first keeps the lower index valid
usage_information_element.remove(usage_information_element[2])
usage_information_element.remove(usage_information_element[1])
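When children are removed by condition rather than by index, iterating over the element while removing from it can skip entries, just as with a Python list. A minimal sketch that iterates over a snapshot (`list(parent)`) instead; the filter condition is hypothetical:

```python
import xml.etree.ElementTree as ElementTree

usage_information = ElementTree.fromstring(
    "<usage-information>"
    "<usage-purpose>manufacturing</usage-purpose>"
    "<usage-purpose>high precision operations</usage-purpose>"
    "<usage-purpose>micro mechanics</usage-purpose>"
    "</usage-information>"
)

# iterate over a snapshot so removals do not disturb the iteration
for usage_purpose in list(usage_information):
    # hypothetical condition: drop purposes mentioning "micro" or "precision"
    if "micro" in usage_purpose.text or "precision" in usage_purpose.text:
        usage_information.remove(usage_purpose)

print([node.text for node in usage_information])  # ['manufacturing']
```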

In [14]:
# elements can be inserted at a given index
health_usage_purpose_element = ElementTree.Element("usage-purpose")
health_usage_purpose_element.text = "health"
usage_information_element.insert(0, health_usage_purpose_element)

# alternatively, elements can be created as subelements appended after the existing children
robotics_usage_purpose_element = ElementTree.SubElement(
    usage_information_element,
    "usage-purpose"
)
robotics_usage_purpose_element.text = "robotics"
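Several children can also be appended in one call with `extend()`, which adds them in order after any existing children. A minimal sketch; the purpose names are hypothetical:

```python
import xml.etree.ElementTree as ElementTree

usage_information = ElementTree.Element("usage-information")

new_purposes = []
for purpose in ["inspection", "assembly"]:  # hypothetical purpose names
    node = ElementTree.Element("usage-purpose")
    node.text = purpose
    new_purposes.append(node)

# extend() appends all the new elements at the end, preserving their order
usage_information.extend(new_purposes)
print(ElementTree.tostring(usage_information, encoding="unicode"))
# <usage-information><usage-purpose>inspection</usage-purpose><usage-purpose>assembly</usage-purpose></usage-information>
```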

In [15]:
# changing the attributes of the usage information node
# by directly changing the value in the attrib dictionary
usage_information_element.attrib["taxonomy"] = "simplified"
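Besides writing into attrib, the `set()` and `get()` methods read and write attributes, and an attribute is removed by deleting its key (or with `pop()`). Values are stored as strings, so booleans or numbers need explicit conversion. A minimal sketch:

```python
import xml.etree.ElementTree as ElementTree

usage_information = ElementTree.Element(
    "usage-information",
    attrib={"restricted-usage": "false", "standard-compliance": "high"},
)

usage_information.set("taxonomy", "simplified")  # same effect as attrib["taxonomy"] = ...
del usage_information.attrib["standard-compliance"]  # remove an attribute
removed = usage_information.attrib.pop("restricted-usage", None)  # remove and return its value

# attribute values are strings; convert explicitly when a Python type is needed
is_restricted = removed == "true"

print(usage_information.attrib)  # {'taxonomy': 'simplified'}
print(removed, is_restricted)    # false False
```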

In [16]:
# display the modified XML data
print(format_xml_data(parsed_root_element))

<?xml version="1.0" ?>
<robot-specification creation-date="01/12/2023" version="1.0">
	<serial-number>09372937-3bf1-454b-857b-ecca1dbe87a0</serial-number>
	<usage-information restricted-usage="false" standard-compliance="high" taxonomy="simplified">
		<usage-purpose>health</usage-purpose>
		<usage-purpose>manufacturing</usage-purpose>
		<usage-purpose>robotics</usage-purpose>
	</usage-information>
</robot-specification>

