# **Extensible Markup Language (XML) processing with Python**

Python libraries for working with *XML*:
- **xml.etree.ElementTree** is an API for *parsing and creating* XML data 
- **xml.dom.minidom** uses a *Document Object Model* approach for XML where each node of the tree structure is an object
- **xml.sax** implements SAX (Simple API for XML), an event-driven approach to parsing XML documents
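
As a rough comparison, the sketch below reads the same small document with each of the three modules. The `SAMPLE` string and the `TagCollector` handler class are made up for illustration; only the library calls themselves come from the standard library.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET
import xml.sax

# Hypothetical sample document used by all three approaches
SAMPLE = '<data><book title="Hamlet"><year>1603</year></book></data>'

# ElementTree: elements behave like lightweight Python objects
et_root = ET.fromstring(SAMPLE)
et_title = et_root.find('book').get('title')

# minidom: every node of the tree is a DOM object
dom = xml.dom.minidom.parseString(SAMPLE)
dom_title = dom.getElementsByTagName('book')[0].getAttribute('title')


# SAX: the parser calls handler methods as it meets each tag
class TagCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records tag names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


collector = TagCollector()
xml.sax.parseString(SAMPLE.encode('utf-8'), collector)

print(et_title, dom_title, collector.tags)
```

The rest of these notes use ElementTree only.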

 XML is a *markup language* intended for storing and transporting data
 
An XML document is made up of these parts:
- **prolog** (optional) declares the XML version and, optionally, the character encoding, e.g. `<?xml version="1.0" encoding="ISO-8859-2"?>` 
- **root element** is the single top-level element that contains all other elements 
- **elements** consist of an opening and a closing tag and may contain text, attributes, and child elements 
- **attributes** are key-value pairs placed inside an element's opening tag

```xml
<?xml version="1.0"?>
<data>
    <book title="The Little Prince">
        <author>Antoine de Saint-Exupéry</author>
        <year>1943</year>
    </book>
    <book title="Hamlet">
        <author>William Shakespeare</author>
        <year>1603</year>
    </book>
</data>
```
- `<?xml version="1.0"?>` is the **prolog**
- `data` is the **root element**
- `<book title="The Little Prince">` is an **element** with an **attribute** named `title`
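
To connect these terms to ElementTree, here is a minimal sketch that loads the sample above from a string (instead of a file) and reads each part back. The names `BOOKS_XML` and `books` are just illustrative.

```python
import xml.etree.ElementTree as ET

# The sample document from above, held in a string
BOOKS_XML = """<?xml version="1.0"?>
<data>
    <book title="The Little Prince">
        <author>Antoine de Saint-Exupéry</author>
        <year>1943</year>
    </book>
    <book title="Hamlet">
        <author>William Shakespeare</author>
        <year>1603</year>
    </book>
</data>"""

root = ET.fromstring(BOOKS_XML)   # fromstring() returns the root element
print(root.tag)                   # name of the root element

# Collect (title attribute, author text, year as int) for every book
books = [
    (book.get('title'), book.find('author').text, int(book.find('year').text))
    for book in root
]
for title, author, year in books:
    print(f"{title} by {author} ({year})")
```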

In [4]:
# Importing ElementTree with an Alias 
import xml.etree.ElementTree as ET 

# Creating a tree from an existing XML document using the parse() method
tree = ET.parse('breakfast.xml')
# getroot() returns the root element, from which we can reach any element in the document
root = tree.getroot()   

# Another way is to use fromstring(), which parses XML held in a string and returns the root element 
# root = ET.fromstring(your_xml_as_string)

print("Root tag is: ", root.tag)
for child in root:
    print("Child tag is: ", child.tag)
    print("Attributes: ", child.attrib)

Root tag is:  breakfast_menu
Child tag is:  food
Attributes:  {}
Child tag is:  food
Attributes:  {}
Child tag is:  food
Attributes:  {}
Child tag is:  food
Attributes:  {}
Child tag is:  food
Attributes:  {}


In [25]:
# Nested loops walk the tree level by level (indexes such as root[0][0] would also work)
for child in root:  # inside breakfast_menu
    for inner_child in child:   # inside each food element 
        print(inner_child.text)

Berry-Berry Belgian Waffles
$8.95

Light Belgian waffles covered with an assortment of fresh berries and whipped cream

900
French Toast
$4.50

Thick slices made from our homemade sourdough bread

600
Homestyle Breakfast
$6.95

Two eggs, bacon or sausage, toast, and our ever-popular hash browns

950


**IF** a child element had attributes...

You could access them using `child.attrib['attrib_name']`, alongside the `.text` and `.tag` properties 
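
Since the `food` elements in breakfast.xml have no attributes, here is a small sketch on a made-up `<book>` element showing the three properties side by side. It also shows one thing worth knowing: `attrib` is a plain dictionary, so a missing key raises `KeyError`, while `get()` returns `None` or a default.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with one attribute and one child
book = ET.fromstring('<book title="Hamlet"><year>1603</year></book>')

print(book.tag)               # element name
print(book.attrib)            # all attributes as a dictionary
print(book.attrib['title'])   # one attribute by key
print(book[0].text)           # text of the first child (<year>)

# get() avoids a KeyError when the attribute might be absent
missing = book.get('language')
fallback = book.get('language', 'unknown')
print(missing, fallback)
```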

In [31]:
# iter() searches the whole subtree (children and nested elements) for the requested tag
for breakfast in root.iter('name'): 
    print(breakfast.text)   # .text holds the text content of each matching Element
    
# This loop prints nothing: findall('name') only matches DIRECT children of root,
# and every <name> element is nested one level deeper, inside a <food> element
for breakfast in root.findall('name'):  
    print(breakfast.text)

Berry-Berry Belgian Waffles
French Toast
Homestyle Breakfast
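
`findall()` can still reach nested elements if it is given a path instead of a bare tag name. The sketch below uses a cut-down stand-in for breakfast.xml; its structure is assumed from the output above, not copied from the real file.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified version of the breakfast menu
MENU = """<breakfast_menu>
    <food><name>French Toast</name><price>$4.50</price></food>
    <food><name>Homestyle Breakfast</name><price>$6.95</price></food>
</breakfast_menu>"""
root = ET.fromstring(MENU)

direct = root.findall('name')        # direct children only: no match
nested = root.findall('food/name')   # explicit path from root to <name>
anywhere = root.findall('.//name')   # any depth, like iter('name')
via_iter = list(root.iter('name'))

names = [element.text for element in anywhere]
print(len(direct), names)
```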


In [35]:
# Using the find() method to locate a single element
print(root.find('food').tag)    # find() returns the FIRST direct child with the "food" tag

food
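
One detail to watch with `find()`: it returns `None` when nothing matches, so chaining `.text` onto it can raise `AttributeError`. A minimal sketch on a made-up `<food>` element, using `findtext()` with a default as one way around that:

```python
import xml.etree.ElementTree as ET

# Hypothetical element that has a <name> child but no <calories> child
food = ET.fromstring('<food><name>French Toast</name></food>')

name = food.find('name')
calories = food.find('calories')   # no such child, so this is None

print(name.text)
print(calories)

# findtext() returns the matched element's text, or the default if absent
name_text = food.findtext('name')
calories_text = food.findtext('calories', default='n/a')
print(name_text, calories_text)
```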


In [41]:
# Exercise creating a class to convert cel to fahr
import xml.etree.ElementTree as ET

class TemperatureConverter:
    
    def __init__(self, temp_c):
        self.temp_c = temp_c
        self.temp_f = None
    
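    def convert_celsius_to_fahrenheit(self):
        # F = 9/5 * C + 32; the input arrives as text, so convert it to float first
        temp_f = 9/5*float(self.temp_c) + 32
        self.temp_f = round(temp_f, 1)
        # Return the rounded value so it matches what is stored on the instance
        return self.temp_f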


class ForecastXmlParser:
    
    def __init__(self):
        pass 
    
    def parse(self, file_name):
        # Create the XML tree 
        tree = ET.parse(file_name)
        # Creating the root based off the tree (grabs root element)
        root = tree.getroot()
        # Looping through all the child elements of root 
        for item in root:
            # Let's convert our Cel to Fahr
            converter = TemperatureConverter(item.find('temperature_in_celsius').text)
            converter.convert_celsius_to_fahrenheit()
            # We could use an index for each child element (0-day, 1-temp in cel) but we could also work with .find()
            # Our requested format is: "Day: C Celsius, F.0 Fahrenheit" where Fahrenheit is rounded to the first decimal
            # Double quotes outside, single quotes inside: reusing the same quote type in an f-string fails before Python 3.12
            print(f"{item[0].text}: {item.find('temperature_in_celsius').text} Celsius, {converter.temp_f} Fahrenheit")
            

forecast = ForecastXmlParser()
forecast.parse('forecast.xml')
        

Monday: 28 Celsius, 82.4 Fahrenheit
Tuesday: 27 Celsius, 80.6 Fahrenheit
Wednesday: 28 Celsius, 82.4 Fahrenheit
Thursday: 29 Celsius, 84.2 Fahrenheit
Friday: 29 Celsius, 84.2 Fahrenheit
Saturday: 31 Celsius, 87.8 Fahrenheit
Sunday: 32 Celsius, 89.6 Fahrenheit


For the exact expected result, check Lab_1.PNG

---

# Modifying an XML document 

*Parsing XML files is easy with ElementTree*, but now we need to *modify* the element tree and *create* XML files based on certain data 

*Renaming* an element only requires assigning a *new value* to its `tag` property; child elements are deleted with `remove()`. 

```python
for child in root:
    # Changing a child's tag is as easy as setting the property to a new value
    child.tag = 'Movie'
    # Remove the nested elements that no longer belong inside the renamed element (Movie)
    child.remove(child.find('author'))
    child.remove(child.find('year'))
```

*Setting* an attribute requires the `set()` method, and `get()` reads it back 
```python
for child in root:
    # Setting the attribute "rate" to the value "5" 
    child.set('rate', '5')
    child.get('rate')   # Retrieves the value of the "rate" attribute
```
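
The value is passed as the string `'5'` rather than the number `5` for a reason. The sketch below, on a made-up `<movie>` element, shows that `set()` overwrites an existing attribute and that a non-string value is only rejected later, when the element is serialized.

```python
import xml.etree.ElementTree as ET

movie = ET.fromstring('<movie title="Hamlet" />')
movie.set('rate', '4')
movie.set('rate', '5')    # setting the same name again overwrites the value
rate = movie.get('rate')
serialized = ET.tostring(movie, encoding='unicode')
print(rate, serialized)

# A non-string value is accepted by set() but fails at serialization time
bad = ET.Element('movie')
bad.set('rate', 5)
try:
    ET.tostring(bad)
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)
```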

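These changes are **NOT SAVED** to the XML document; so far they only exist in memory. To save them we must call the tree's `write(file_name.xml, encoding, boolean_prolog_option)` method, where the boolean controls whether the prolog is written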
```python
for child in root:
    child.tag = 'movie'
    child.remove(child.find('author'))
    child.remove(child.find('year'))
    child.set('rate', '5')

tree.write('movies.xml', 'UTF-8', True)
``` 
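
To check that the changes really reach the file, a minimal round-trip sketch follows. It assumes a one-book version of the sample data and writes into a temporary directory so nothing is left behind; the keyword names `encoding` and `xml_declaration` are the same two arguments passed positionally above.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Assumed, shortened version of the book data used earlier
root = ET.fromstring(
    '<data><book title="Hamlet"><author>William Shakespeare</author>'
    '<year>1603</year></book></data>'
)
for child in root:
    child.tag = 'movie'
    child.remove(child.find('author'))
    child.remove(child.find('year'))
    child.set('rate', '5')

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'movies.xml')
    ET.ElementTree(root).write(path, encoding='UTF-8', xml_declaration=True)
    # Read the file back both as raw text and as a parsed tree
    with open(path, encoding='utf-8') as handle:
        saved_text = handle.read()
    reloaded = ET.parse(path).getroot()

print(saved_text)
```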

---

## Building an XML document 

Everything stems from the `Element` class, whose constructor takes two arguments: the *name* of the tag and an *optional attribute* dictionary 

```python
import xml.etree.ElementTree as ET 

root = ET.Element('data')
ET.dump(root)
```
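
A small sketch of the optional attribute dictionary, using `ET.tostring()` so the result can be captured in a variable (`ET.dump()` only prints). The `source` attribute is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Root element with an (invented) attribute passed as a dictionary
root = ET.Element('data', {'source': 'library'})
empty = ET.tostring(root, encoding='unicode')

# Text content is assigned through the .text property
root.text = 'hello'
filled = ET.tostring(root, encoding='unicode')

print(empty)
print(filled)
```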

*Creating child elements* uses the function `SubElement()`, which takes *three arguments*: the *parent element* (here, the root), the *name* of the child tag, and an *optional attribute* dictionary 

```python
# Building off the code above...
# We are going to create child elements using SubElement()
movie_1 = ET.SubElement(root, 'movie', {'title': 'The Little Prince', 'rate': '5'})
movie_2 = ET.SubElement(root, 'movie', {'title': 'Hamlet', 'rate': '5'})

# dump() is a function used to debug either the whole tree or a single element
ET.dump(root)

# To ACTUALLY save this... we're going to use the write method
# write() belongs to ElementTree, so we first wrap the root (an instance of the Element class) in a tree 
tree = ET.ElementTree(root) # ElementTree object 
tree.write('file.xml', 'UTF-8', True)
``` 
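
A tree built this way is serialized on a single line. If a readable layout matters, one option is `ET.indent()`, which is available from Python 3.9 onward. The sketch below assumes that version and adds a `year` child purely to show nesting.

```python
import xml.etree.ElementTree as ET

root = ET.Element('data')
movie = ET.SubElement(root, 'movie', {'title': 'Hamlet', 'rate': '5'})
year = ET.SubElement(movie, 'year')   # illustrative nested child
year.text = '1603'

compact = ET.tostring(root, encoding='unicode')

# indent() modifies the tree in place by adding whitespace (Python 3.9+)
ET.indent(root, space='    ')
pretty = ET.tostring(root, encoding='unicode')

print(compact)
print(pretty)
```

Calling `ET.indent()` before `tree.write()` gives the same layout in the saved file.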

# Summary and Recap of XML files

**Extensible Markup Language** (XML) can be processed in Python with several libraries, including:
- **xml.etree.ElementTree** is an API for *parsing and creating* XML data 
- **xml.dom.minidom** uses a *Document Object Model* approach for XML where each node of the tree structure is an object
- **xml.sax** implements SAX (Simple API for XML), an event-driven approach to parsing XML documents

We need to familiarize ourselves with the parts of an XML document:
- **prolog** (optional) declares the XML version and, optionally, the character encoding, e.g. `<?xml version="1.0" encoding="ISO-8859-2"?>` 
- **root element** is the single top-level element that contains all other elements 
- **elements** consist of an opening and a closing tag and may contain text, attributes, and child elements 
- **attributes** are key-value pairs placed inside an element's opening tag

This **markup language** is intended for *storing and transporting data*

To **parse** XML files we...
- Start by Importing the ElementTree module
    - `import xml.etree.ElementTree as ET` 
- Create a tree that parses an xml file 
    - `tree = ET.parse('file_name.xml')` 
- Build a root to access child elements based on our tree 
    -  `root = tree.getroot()`
- Read XML data (start by looping through the root with `for child in root:`)
    - Each element exposes these properties:
        - `tag` holds the element's name 
        - `attrib` holds the element's attributes as a dictionary 
            - We could use `child.attrib['attrib_name']` to get a specific attribute
        - `text` holds the text content **WITHIN** the element 
    - Finding specific elements 
        - `root.iter('requested_tag')` finds all matching elements at any depth (children and nested elements) 
        - `root.findall('requested_tag')` finds ONLY matching direct children of the root element 
        - `root.find('requested_tag').text` gives the text of the FIRST direct child with the requested tag (use this when we're looping INSIDE a *child element*)

**Modifying XML documents** is as simple as assigning new values to properties...
- Start with looping through the root `for child in root:`
- Assign a new value to the child's properties or delete a certain element
    - `child.tag = 'Movie'`
    - `child.remove(child.find('author'))`
- Set an attribute using the `set()` method
    - `child.set('attribute_name', 'attribute_value')`
    - `child.get('attribute')   # Retrieves the attribute's value`
- Understand that you haven't **SAVED** these to the **ACTUAL** file
    - to save we use `write(file_name.xml, encoding, boolean_prolog)`
        - `tree.write('file_name.xml', 'UTF-8', True)`

**Building an XML document** 
- requires us to build a root based on the `Element` class
    - ```python
      root = ET.Element('data') # root element 
      ET.dump(root) # Debugging tree or single element
      ```
- child elements can be built using the `SubElement()` function
    - ```python
      m1 = ET.SubElement(root, 'movie', {'title': 'The Little Prince', 'rate': '5'})
      ``` 
- Similar to modifying... this isn't saved to an XML document yet, so wrap the root with `tree = ET.ElementTree(root)` and call...
    - `tree.write('file_name.xml', encoding, boolean_prolog)` 
      

# Dumb it down for me Justin... I won't remember all this 

**Extensible Markup Language (XML)** is intended for storing and transporting data

We just need to remember:
- Import the module with `import xml.etree.ElementTree as ET` and build a tree with `ET.parse('file_name.xml')` 
- Get the root with `tree.getroot()`; from there you can access information with several methods and by looping through child elements 
- Modifying is different because you can set attributes with `set()`, change tags by assigning to `.tag`, and remove child elements with the `.remove()` method **BUT** the changes aren't saved to the file unless you use the `write()` method 
- This is similar to building an XML file by creating a root using `ET.Element('root_tag')` then creating child elements using `ET.SubElement(root, 'name', {'attr':'val'})`
    - It's similar to modifying because we **MUST** call the `write()` method on the tree in order to save it 
    - We have to build the tree using `tree = ET.ElementTree(root)` because `write()` is a method of the ElementTree object, not of the root element  