# Introduction

In this program, we parse an XML file with Python's ElementTree XML API, extract some interesting data, and present it in a human-readable form. For this exercise, I have taken the XML data for the Formula 1 Grand Prix held in Melbourne, Australia, which is available at this [link](http://www.enetpulse.com/documentation/). After the file is parsed, the data is printed to the console and also written to a file.

In [11]:
#!/usr/bin/python
# Python program to parse an XML file

"""
Usage: Python-XML.py

Processes an XML file and generates human-readable content

Options
_______

-h or help		Displays this message
"""

import xml.etree.ElementTree as ET
from sys import argv, exit

First, we define the module docstring, which doubles as the help message, and then import the ElementTree XML API as `ET` along with `argv` and `exit` from the `sys` module.

In [12]:
#if len(argv) > 1:
#	print(__doc__)
#	exit(0)

The check above looks at the number of command-line arguments: if any argument is passed, the script prints the help message and exits. These lines are commented out in the IPython Notebook because the notebook raises an exception when the code tries to exit.
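One way to keep this check usable inside a notebook is to return the message instead of calling `exit`. The following is only a sketch: the helper `help_text` and the shortened `USAGE` string are hypothetical and stand in for the script's `__doc__`.

```python
# Hypothetical stand-in for the script's __doc__ string
USAGE = "Usage: Python-XML.py\n\n-h or help\tDisplays this message"


def help_text(args):
    """Return the usage message if any extra argument was passed, else None."""
    # args[0] is the script name, so anything beyond it triggers the help text
    if len(args) > 1:
        return USAGE
    return None


print(help_text(["Python-XML.py", "-h"]))
print(help_text(["Python-XML.py"]))
```

Because nothing exits, the cell can run in the notebook, and the calling script can still decide to call `exit(0)` when a message is returned.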

In [13]:
with open('F1-Grand-Prix-Australia.xml', 'r') as xmlFile:
	xmlTree = ET.parse(xmlFile)

print "\n"

datawrite = open('XMLResults.txt', 'w+')





We open the XML file as `xmlFile` and parse it with `ET.parse`, so that the entire file is represented as an XML tree. We then open an output file, to which the parsed results are written as we traverse the tree.
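To see what `ET.parse` returns without the race file at hand, here is a minimal sketch that parses a small in-memory document. The sample XML, including the `data` root tag, is an assumption made for illustration and is not the real file's layout. It uses `print()` calls so that it also runs on Python 3.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature document; the real file is much larger
sample = u"""<data>
  <event>
    <properties>
      <property name="Laps" value="58"/>
    </properties>
  </event>
</data>"""

# ET.parse accepts a filename or a file-like object and returns an ElementTree
tree = ET.parse(io.StringIO(sample))
root = tree.getroot()

print(root.tag)                        # tag of the top-level element
print([child.tag for child in root])   # tags of its direct children
```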

In [14]:
for treeProperties in xmlTree.findall('.//'):
	for prop in treeProperties.findall('properties/property'):
		propName = prop.get('name')
		propValue = prop.get('value')
		if propName == 'startnumber' or propName == "team":
			break
		else:
			print "%s: %s" % (propName, propValue)
			datawrite.write('%s: %s' % (propName, propValue) + '\n')
		fastest = prop.findall('.//participant')
		if fastest:
			for f in fastest:
				fastestPerson = f.get('name')
				print "Fastest participant name:", fastestPerson
				datawrite.write('Fastest participant name: %s' % (fastestPerson) + '\n')

Trackname: Melbourne
Kilometers: 307.6
Laps: 58
Live: yes
laps: 58
weather: Cloudy
track_condition: Dry
current_lap: 58
fastest_lap_participantFK: 62454
Fastest participant name: Kimi Raikkonen
fastest_lap_time: 1:29.274


First, we find all the subelements in `xmlTree` using `findall('.//')` and loop through each of them, searching for `property` children of a `properties` element. For each one, we extract the `name` and `value` attributes using the `.get` method. We are interested in only a few attributes, so the `break` statement stops the inner loop as soon as a `startnumber` or `team` property appears, which drops the non-essential information. One `property` element also contains a nested element with the fastest participant's data, which we search for using `findall('.//participant')`. As before, we extract the name using the `.get` method.
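The traversal can be tried on a small document. This sketch assumes a hypothetical layout (the `data` root and the `startnumber` value are invented) and collects the lines in a list instead of writing them to a file. It searches from the root with `'.//*'` rather than calling `findall('.//')` on the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element nesting is assumed for illustration
sample = """<data>
  <event>
    <properties>
      <property name="Trackname" value="Melbourne"/>
      <property name="fastest_lap_participantFK" value="62454">
        <participant name="Kimi Raikkonen"/>
      </property>
      <property name="startnumber" value="7"/>
      <property name="Laps" value="58"/>
    </properties>
  </event>
</data>"""

root = ET.fromstring(sample)
lines = []
for element in root.findall('.//*'):
    for prop in element.findall('properties/property'):
        name = prop.get('name')
        if name == 'startnumber' or name == 'team':
            break  # leaves the inner loop, so later properties are skipped too
        lines.append('%s: %s' % (name, prop.get('value')))
        # A property may carry a nested participant element
        for person in prop.findall('.//participant'):
            lines.append('Fastest participant name: %s' % person.get('name'))

print('\n'.join(lines))
```

Note that `Laps` never appears in the result here: `break` ends the loop at `startnumber`, so it only filters cleanly when the unwanted properties come last.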

In [15]:
print "\n**Event participants**\n"
datawrite.write('\n**Event participants**\n')

for treeEvent in xmlTree.findall('.//'):
	for event in treeEvent.findall('event/event_participant'):
		participantNumber = event.get('number')
		participantName = event.find('participant').get('name')
		participantGender = event.find('participant').get('gender')

		print "Participant number: %s" % participantNumber
		datawrite.write('Participant number: %s' % (participantNumber) + '\n')		
		print "Name: %s, Gender: %s" % (participantName, participantGender)
		datawrite.write('Name: %s, Gender: %s' % (participantName.encode('utf8'), participantGender) + '\n')
		
		
		results = event.findall('.//result')
		if results:
			for result in results:
				resultName = result.get('result_code')
				resultValue = result.get('value')
				print "%s: %s" % (resultName, resultValue)
				datawrite.write('%s: %s' % (resultName, resultValue) + '\n')
				
		print "******************************************"
		datawrite.write('******************************************' + '\n')
        
datawrite.close()
xmlFile.close()


**Event participants**

Participant number: 1
Name: Charles Pic, Gender: male
rank: 16
duration: 
laps_behind: 2
pitstops: 2
laps: 56
points: 0
******************************************
Participant number: 2
Name: Giedo van der Garde, Gender: male
rank: 18
duration: 
laps_behind: 2
pitstops: 3
laps: 56
points: 0
******************************************
Participant number: 3
Name: Felipe Massa, Gender: male
rank: 4
duration: +33.500
pitstops: 3
laps: 58
points: 12
******************************************
Participant number: 4
Name: Fernando Alonso, Gender: male
rank: 2
duration: +12.400
pitstops: 3
laps: 58
points: 18
******************************************
Participant number: 5
Name: Paul di Resta, Gender: male
rank: 8
duration: +1:08.400
pitstops: 2
laps: 58
points: 4
******************************************
Participant number: 6
Name: Adrian Sutil, Gender: male
rank: 7
duration: +1:05.000
pitstops: 2
laps: 58
points: 6
******************************************
Participant

After obtaining an initial summary of the F1 Grand Prix race, we can proceed to extract other interesting data. Like the `properties` element seen previously, `event` is another element, and we need to traverse the `event_participant` elements within it to find data such as the participant's name and gender. We obtain these by looping over the main tree and then over each `event_participant` subtree. The relevant data is read using the `find` and `get` methods. Each participant also has results associated with it, which we obtain by looping over its `result` elements.
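As a variation on the nested loops, the participants can be gathered into dictionaries. This sketch uses `Element.iter` in place of the two-level `findall`, and the sample document is a hypothetical reduction of the real structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with a single participant
sample = """<data>
  <event>
    <event_participant number="3">
      <participant name="Felipe Massa" gender="male"/>
      <result result_code="rank" value="4"/>
      <result result_code="points" value="12"/>
    </event_participant>
  </event>
</data>"""

root = ET.fromstring(sample)
records = []
# iter() walks the whole tree and yields every matching element
for entry in root.iter('event_participant'):
    person = entry.find('participant')
    record = {
        'number': entry.get('number'),
        'name': person.get('name'),
        'gender': person.get('gender'),
    }
    # Each result contributes one key/value pair to the record
    for result in entry.findall('.//result'):
        record[result.get('result_code')] = result.get('value')
    records.append(record)

print(records)
```

Keeping the data in dictionaries separates extraction from presentation, so the same records can be printed and written to a file without repeating the format strings.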

Finally, we close the output file. The `with` block has already closed the XML file, so the `xmlFile.close()` call is a harmless no-op.
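Opening the output file in a `with` block as well would make the explicit `close()` unnecessary. A minimal sketch, assuming a temporary directory in place of the working directory and a single sample line in place of the parsed results:

```python
import io
import os
import tempfile

# Assumption: write into a temporary directory rather than the working directory
out_path = os.path.join(tempfile.mkdtemp(), 'XMLResults.txt')

# The file is closed automatically when the block ends, even on an error
with io.open(out_path, 'w', encoding='utf8') as datawrite:
    datawrite.write(u'Laps: 58\n')

print(datawrite.closed)

# Read the file back to confirm what was written
with io.open(out_path, encoding='utf8') as results:
    content = results.read()
print(content)
```

Passing `encoding='utf8'` to `io.open` also removes the need to call `.encode('utf8')` on individual names before writing them.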