# XML (Extensible Markup Language)

XML is a markup language, and it's also a **universal and transparent carrier** of any type of data.


```xml
<?xml version = "1.0" encoding = "utf-8"?>
<!-- cars.xml - List of cars ready to sell -->
<!DOCTYPE cars_for_sale SYSTEM "cars.dtd">
<cars_for_sale>
   <car>
      <id>1</id>
      <brand>Ford</brand>
      <model>Mustang</model>
      <production_year>1972</production_year>
      <price currency="USD">35900</price>
   </car>
   <car>
      <id>2</id>
      <brand>Aston Martin</brand>
      <model>Rapide</model>
      <production_year>2010</production_year>
      <price currency="GBP">32000</price>
   </car>
</cars_for_sale>
```

**The first line** is the *XML declaration*. It is written as `attribute=value` pairs: `version="1.0"` tells us the version of XML and `encoding="utf-8"` tells us how the text is encoded.

**Second line** is just a *comment*, written as `<!-- comment -->` (the parser ignores it completely).

**Third line** is `<!DOCTYPE cars_for_sale SYSTEM "cars.dtd">`. Without a DOCTYPE, a document is **self-defining**: its content **isn't defined at all**, so the parser is not able to check if the document contains all the needed data.
- With a DOCTYPE we have a "meta-document" which *describes the desired document content*, so a validating parser is able to fully check the document's correctness
- A *Document Type Definition* (DTD, usually a `.dtd` file) lets us associate our document with its external definition (which can live at any internet location)
    - *DOCTYPE* has the **name** of the document's root element (*cars_for_sale*)
        - then a **keyword** (`SYSTEM` or `PUBLIC`): `PUBLIC` means the DTD is public, identified by an **FPI** (Formal Public Identifier) followed by a URI for its location; `SYSTEM` means the DTD is private and we just need the URI
        - and finally the **URI** `"cars.dtd"`
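
One caveat for the Python section further down: the standard-library `xml.etree.ElementTree` parser is non-validating, so it accepts the DOCTYPE line but does not check the document against `cars.dtd`. A minimal sketch, assuming an in-memory document with a `<boat>` element (a hypothetical element that a cars DTD would presumably not allow):

```python
import xml.etree.ElementTree as ET

# In-memory stand-in for cars.xml; cars.dtd does not need to exist here.
document = """<!DOCTYPE cars_for_sale SYSTEM "cars.dtd">
<cars_for_sale>
   <boat>not a car</boat>
</cars_for_sale>"""

# Parsing succeeds: ElementTree checks well-formedness only, not DTD validity.
root = ET.fromstring(document)
child_tags = [child.tag for child in root]
print(root.tag, child_tags)
```

If validation against a DTD matters, a validating parser from a third-party library is needed; the standard module alone will not report the mismatch.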

**Next** we have the XML document itself, which consists of **elements** (each one a **pair of tags**). It can also contain **empty elements**, written as `<empty></empty>` or `<empty/>`.
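
To see that the two spellings of an empty element mean the same thing, here is a small sketch using Python's `xml.etree.ElementTree`. The `<root>` wrapper is a hypothetical container added only so both forms fit in one document:

```python
import xml.etree.ElementTree as ET

# Both spellings of an empty element, inside a hypothetical <root>.
root = ET.fromstring('<root><empty></empty><empty/></root>')
long_form, short_form = root.findall('empty')

# Neither element has text or children, so they are equivalent after parsing.
for element in (long_form, short_form):
    print(element.tag, element.text, len(element))
```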

**Inside the XML document** the root element `cars_for_sale` contains *two `car` elements*, which means there are two cars for sale.

**Within each car** element, child elements (`id`, `brand`, `model`, `production_year` and `price`) hold the data for that car.

We need to take a look at `<price currency="USD">35900</price>`, where `currency` is an **attribute** of the `price` element. A tag can carry **as many attributes as we need**, as long as each name appears only once in that tag.
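
A short sketch of how attributes look once parsed with Python's `xml.etree.ElementTree`. The `negotiable` attribute is hypothetical and is added only to show several attributes on one tag:

```python
import xml.etree.ElementTree as ET

# The price element from the listing, plus a hypothetical second attribute.
price = ET.fromstring('<price currency="USD" negotiable="yes">35900</price>')

print(price.attrib)             # all attributes as a dictionary
print(price.get('currency'))    # a single attribute by name
print(price.get('tax', 'n/a'))  # fallback value for a missing attribute
print(price.text)               # element content is always a string
```

Note that both attribute values and element text come back as strings, so `35900` has to be converted with `int()` or `float()` before doing arithmetic on it.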

---

## How do we process XML documents in Python

From our other course we remember that we *process* XML using a **tree graph**

```python
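import xml.etree.ElementTree

# Parse the file and grab the root element (<cars_for_sale>).
cars_for_sale = xml.etree.ElementTree.parse('cars.xml').getroot()
print(cars_for_sale.tag)
# Visit every <car> that is a direct child of the root.
for car in cars_for_sale.findall('car'):
    print('\t', car.tag)
    # Each child of <car> is one property (id, brand, model, ...).
    for prop in car:
        print('\t\t', prop.tag, end='')
        # Only <price> carries an attribute (currency).
        if prop.tag == 'price':
            print(prop.attrib, end='')
        # Finish the line with the property's text, once per property.
        print(' =', prop.text)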
```

We remember that the module is usually imported as `import xml.etree.ElementTree as ET`. In the code above we
- Parse the XML file, then grab the root element (`cars_for_sale`): `cars_for_sale = xml.etree.ElementTree.parse('cars.xml').getroot()`
- Iterate through all cars using `cars_for_sale.findall('car')`
- Read (or alter) each element's properties: `tag`, `attrib` and `text`
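
The same three steps can also collect the data instead of printing it. This is a minimal sketch, assuming the document is held in a string rather than in `cars.xml`; `CARS_XML` and `car_to_dict` are hypothetical names introduced for illustration:

```python
import xml.etree.ElementTree as ET

# In-memory copy of the cars.xml content (without prolog and DOCTYPE).
CARS_XML = """<cars_for_sale>
   <car>
      <id>1</id>
      <brand>Ford</brand>
      <model>Mustang</model>
      <production_year>1972</production_year>
      <price currency="USD">35900</price>
   </car>
   <car>
      <id>2</id>
      <brand>Aston Martin</brand>
      <model>Rapide</model>
      <production_year>2010</production_year>
      <price currency="GBP">32000</price>
   </car>
</cars_for_sale>"""


def car_to_dict(car):
    """Map each child tag to its text and keep the price currency as well."""
    record = {prop.tag: prop.text for prop in car}
    record['currency'] = car.find('price').get('currency')
    return record


root = ET.fromstring(CARS_XML)
cars = [car_to_dict(car) for car in root.findall('car')]
for record in cars:
    print(record)
```

Plain dictionaries are used here only because they are easy to inspect; every value stays a string, exactly as `text` and `attrib` deliver it.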

**Modifying XML files**: we remove an element with `cars_for_sale.remove(car)`, and we add one by first creating it with `new_car = ET.Element('car')` and then giving it children, for example `ET.SubElement(new_car, 'id').text = '4'`
- `SubElement(parent, tag)` takes the *parent element object* and the *sub-element's name*. An optional third argument sets the *attributes* from a dictionary, like: `ET.SubElement(new_car, 'price', {'currency': 'EUR'}).text = '61800'`

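After we have all the sub-elements, we need to add the new car to our root with `cars_for_sale.append(new_car)`. Finally, the tree's `write()` method creates a new file and fills it with the modified XML document: `tree.write('newcars.xml', method='')`. An empty `method` falls back to the default `'xml'` output; `'html'` and `'text'` are the other documented choices.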

```python
import xml.etree.ElementTree as ET

# Keep the tree object: we need it later to write the file.
tree = ET.parse('cars.xml')
cars_for_sale = tree.getroot()

# Remove the first Ford Mustang found among the root's <car> children.
for car in cars_for_sale.findall('car'):
    if car.find('brand').text == 'Ford' and car.find('model').text == 'Mustang':
        cars_for_sale.remove(car)
        break

# Build a new <car> element with its properties.
new_car = ET.Element('car')
ET.SubElement(new_car, 'id').text = '4'
ET.SubElement(new_car, 'brand').text = 'Maserati'
ET.SubElement(new_car, 'model').text = 'Mexico'
ET.SubElement(new_car, 'production_year').text = '1970'
# The third argument is a dictionary of attributes.
ET.SubElement(new_car, 'price', {'currency': 'EUR'}).text = '61800'

# Attach the new car to the root and save the result to a new file.
cars_for_sale.append(new_car)
tree.write('newcars.xml', method='')  # empty method means the default 'xml'
```
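
To check the effect of `remove()` and `append()` without writing a file, the same edits can be run on an in-memory document and serialized with `ET.tostring()`. This is a sketch under that assumption; the shortened car records are for illustration only:

```python
import xml.etree.ElementTree as ET

# Shortened in-memory version of the cars document.
root = ET.fromstring(
    '<cars_for_sale>'
    '<car><id>1</id><brand>Ford</brand><model>Mustang</model></car>'
    '<car><id>2</id><brand>Aston Martin</brand><model>Rapide</model></car>'
    '</cars_for_sale>'
)

# Remove the Ford, as in the script above.
for car in root.findall('car'):
    if car.find('brand').text == 'Ford':
        root.remove(car)
        break

# Build and append a new car with an attribute on its price.
new_car = ET.Element('car')
ET.SubElement(new_car, 'id').text = '4'
ET.SubElement(new_car, 'brand').text = 'Maserati'
ET.SubElement(new_car, 'price', {'currency': 'EUR'}).text = '61800'
root.append(new_car)

# encoding='unicode' returns a str instead of bytes.
result = ET.tostring(root, encoding='unicode')
brands = [car.find('brand').text for car in root.findall('car')]
print(brands)
print(result)
```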

