## Working with XML using DOM in Python

### Introduction to XML and DOM

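XML (eXtensible Markup Language) is a markup language used to store and transport data. It is both human-readable and machine-readable. An XML document is a tree of elements with a single root element. Each element has a start tag, content, and an end tag (an element with no content can be written as one self-closing tag, such as `<country/>`), and elements can also carry attributes.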

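DOM (Document Object Model) is a language-independent programming interface for XML documents, standardized by the W3C. It represents the document as a tree of nodes, where each node is an object representing one part of the document: an element, a run of text, a comment, and so on. Python's standard library ships a lightweight implementation in `xml.dom.minidom`.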
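To make the tree idea concrete, here is a minimal sketch that parses a tiny document with `xml.dom.minidom.parseString()` and prints every node indented by its depth. The sample XML and the `walk()` helper are illustrative assumptions, not part of any library.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Parse a small document held in a string (parse() would read from a file instead)
tree_doc = parseString("<person><name>John Doe</name><age>42</age></person>")

def walk(node, depth=0):
    """Yield (depth, label) for node and all its descendants, in document order."""
    if node.nodeType == Node.TEXT_NODE:
        label = f"#text {node.data!r}"
    else:
        label = node.nodeName  # for an element, nodeName is its tag name
    yield depth, label
    for child in node.childNodes:
        yield from walk(child, depth + 1)

tree_lines = ["  " * depth + label for depth, label in walk(tree_doc.documentElement)]
print("\n".join(tree_lines))
```

Note that the text inside `<name>` is not a property of the element but a separate child node of type Text.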

### Parts of an XML Document

| **Component**       | **Description**                                                                                                   | **Example**                                   |
|---------------------|-------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|
| **Prolog**          | Optional; precedes the root element and holds the XML declaration, optionally followed by processing instructions, comments, or a document type declaration | `<?xml version="1.0" encoding="UTF-8"?>` |
| **Element**         | Basic building block of XML; a start tag, content, and an end tag, or a single self-closing tag when empty        | `<name>John Doe</name>`                       |
| **Attributes**      | Name-value pairs inside a start tag that provide additional information about the element                        | `<country name="USA">`                        |
| **Tag**             | The markup that opens or closes an element, with the element's name inside angle brackets                       | `<name> ... </name>`                          |
| **Text String**     | The character content between an element's start and end tags                                                    | `<name>John Doe</name>` (`John Doe` is the text) |
| **Tail String**     | Text that follows an element's end tag inside the same parent (ElementTree exposes it as `.tail`; in DOM it is simply the next sibling Text node) | `<element>Text</element> tail text` |
| **Nested Elements** | Elements can contain other elements, to any depth                                                                 | `<person><name>John</name></person>`          |
| **Child Elements**  | The elements directly inside a given parent (here `<name>` is a child of `<person>`)                             | `<person><name>John</name></person>`          |
| **CDATA**           | A section whose contents are treated as plain character data, so characters such as `<` and `&` are not interpreted as markup | `<![CDATA[Some unparsed data]]>`   |
| **Comments**        | Notes for human readers; not part of the element's text content                                                  | `<!-- This is a comment -->`                  |
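The sketch below, assuming a small hand-written sample document, shows how `minidom` exposes several of these components: an attribute, a text string, a tail string, a CDATA section, and a comment. DOM has no `tail` property; the tail text is just the Text node that follows the element.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Illustrative sample covering the components from the table above
parts_xml = """<?xml version="1.0"?>
<person id="p1">
  <!-- This is a comment -->
  <name>John Doe</name> tail text
  <notes><![CDATA[Some <unparsed> data]]></notes>
</person>"""

parts_doc = parseString(parts_xml)
person = parts_doc.documentElement
name_el = person.getElementsByTagName("name")[0]
notes_el = person.getElementsByTagName("notes")[0]
comment = next(n for n in person.childNodes if n.nodeType == Node.COMMENT_NODE)

print("Element tag:  ", person.tagName)
print("Attribute id: ", person.getAttribute("id"))
print("Text string:  ", name_el.firstChild.data)
# The "tail" is the sibling Text node right after </name>
print("Tail string:  ", name_el.nextSibling.data.strip())
print("CDATA node?   ", notes_el.firstChild.nodeType == Node.CDATA_SECTION_NODE)
print("CDATA content:", notes_el.firstChild.data)  # '<' is kept as plain text
print("Comment:      ", comment.data.strip())
```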


### ElementTree vs. DOM

ElementTree (`xml.etree.ElementTree`) and DOM (`xml.dom.minidom`) are two standard-library APIs in Python for parsing and manipulating XML documents.

#### ElementTree:

- It is a simpler and more Pythonic way of working with XML data.
- It provides an easy-to-use tree structure for XML data.
- ElementTree uses less memory than `minidom`, and its `iterparse()` function can process large XML files incrementally.
- Best suited for simpler XML manipulation tasks.

#### DOM (Document Object Model):

- It is a more complex and comprehensive API.
- Provides a detailed representation of the XML document as a tree of nodes.
- Allows for more extensive manipulation of the XML document.
- Suitable for applications that need to perform complex XML manipulations.

| **Aspect**               | **ElementTree** (`xml.etree.ElementTree`)                    | **DOM** (`xml.dom.minidom`)                                          |
|--------------------------|--------------------------------------------------------------|----------------------------------------------------------------------|
| **Complexity**           | Simple and easy to use                                       | More verbose and comprehensive                                       |
| **Memory Usage**         | Lower; `iterparse()` can process files incrementally         | Higher; the whole document is held as a tree of node objects         |
| **Performance**          | Generally faster, especially on large documents              | Generally slower on large documents                                  |
| **API Style**            | Pythonic; uses `Element` and `SubElement`                    | Standardized W3C-style interface; uses `Node`, `Element`, `Text`, etc. |
| **Manipulation**         | Covers common read, modify, and write tasks                  | Fine-grained access to every node type (text, comments, CDATA, ...)  |
| **Use Case**             | Suitable for simpler tasks                                   | Suitable for complex and detailed XML manipulations                  |
| **Learning Curve**       | Easier to learn and use                                      | Steeper learning curve                                               |
| **Standards Compliance** | Python-specific API, not based on a W3C specification        | Follows the W3C DOM interface (`minidom` implements a minimal subset) |
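As a rough illustration of the difference in style, the sketch below extracts the same attribute with both APIs. The inline XML string is an illustrative assumption.

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import parseString

XML = '<countries><country name="USA"/><country name="China"/></countries>'

# ElementTree: lightweight Element objects; iter() walks matching descendants
et_root = ET.fromstring(XML)
et_names = [c.get("name") for c in et_root.iter("country")]

# minidom (DOM): everything is a Node; W3C-style getElementsByTagName()/getAttribute()
dom_root = parseString(XML).documentElement
dom_names = [c.getAttribute("name") for c in dom_root.getElementsByTagName("country")]

print(et_names)
print(dom_names)
```

Both lists come out identical; the difference is mostly in naming and in how much of the document structure each API exposes.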


### Using `xml.dom.minidom`

#### Parsing an XML File

We will start by parsing an XML file. Suppose we have the following XML file named `countries.xml`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<countries>
    <country name="USA" capital="Washington, D.C." population="331002651" continent="North America"/>
    <country name="China" capital="Beijing" population="1439323776" continent="Asia"/>
    <country name="Brazil" capital="Brasília" population="212559417" continent="South America"/>
</countries>
```
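If `countries.xml` does not exist yet, one way to create it is to write the sample to disk first. This is a minimal sketch; the XML declaration must be the very first characters in the file, so the string starts immediately after the opening quotes. The sanity check at the end is an optional addition.

```python
from pathlib import Path
from xml.dom.minidom import parse

# Sample data from above; no newline before the <?xml ...?> declaration
COUNTRIES_XML = """<?xml version="1.0" encoding="UTF-8"?>
<countries>
    <country name="USA" capital="Washington, D.C." population="331002651" continent="North America"/>
    <country name="China" capital="Beijing" population="1439323776" continent="Asia"/>
    <country name="Brazil" capital="Brasília" population="212559417" continent="South America"/>
</countries>
"""

# Encode as UTF-8 to match the encoding named in the declaration
Path("countries.xml").write_text(COUNTRIES_XML, encoding="utf-8")

# Sanity check: the file parses and contains the expected elements
check = parse("countries.xml")
print(len(check.getElementsByTagName("country")))
```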

```python
from xml.dom.minidom import parse

# Parse the XML file
dom_tree = parse("countries.xml")
root = dom_tree.documentElement

# Print the root element's tag name
print("Root element:", root.tagName)
```

```text
Root element: countries
```
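`parse()` reads from a file; for XML that is already in memory, `parseString()` does the same job. Both raise `xml.parsers.expat.ExpatError` on malformed input, so defensive parsing could look like the sketch below (`try_parse` is a hypothetical helper name).

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

def try_parse(text):
    """Return the root element's tag name, or None if text is not well-formed XML."""
    try:
        return parseString(text).documentElement.tagName
    except ExpatError as err:
        print(f"Malformed XML: {err}")
        return None

good = try_parse("<countries><country/></countries>")
bad = try_parse("<countries><country></countries>")  # mismatched tag
print(good, bad)
```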


#### Accessing Elements

```python
# Get all the country elements
countries = root.getElementsByTagName("country")

# Loop through the country elements and print their details
for country in countries:
    name = country.getAttribute("name")
    capital = country.getAttribute("capital")
    population = country.getAttribute("population")
    continent = country.getAttribute("continent")
    print(f"Country: {name}, Capital: {capital}, Population: {population}, Continent: {continent}")
```

```text
Country: USA, Capital: Washington, D.C., Population: 331002651, Continent: North America
Country: China, Capital: Beijing, Population: 1439323776, Continent: Asia
Country: Brazil, Capital: Brasília, Population: 212559417, Continent: South America
```
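`getAttribute()` always returns a string (an empty string if the attribute is missing), so numeric fields need explicit conversion. A minimal sketch, assuming an inline copy of two of the sample records, that collects each element's attributes into a dictionary through its `attributes` map:

```python
from xml.dom.minidom import parseString

sample_xml = """<countries>
  <country name="USA" capital="Washington, D.C." population="331002651" continent="North America"/>
  <country name="China" capital="Beijing" population="1439323776" continent="Asia"/>
</countries>"""

sample_root = parseString(sample_xml).documentElement
records = []
for country_el in sample_root.getElementsByTagName("country"):
    # attributes is a NamedNodeMap; items() returns (name, value) string pairs
    record = dict(country_el.attributes.items())
    # Attribute values are strings, so convert numeric fields explicitly
    record["population"] = int(record["population"])
    records.append(record)

largest = max(records, key=lambda r: r["population"])
print(largest["name"], largest["population"])
```

Distinct variable names (`sample_root`, `country_el`) keep this cell from overwriting `root` and `countries` used elsewhere in the notebook.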


#### Modifying XML
##### Adding a New Country

```python
# Function to add a new country
def add_country(doc, root, name, capital, population, continent):
    new_country = doc.createElement("country")
    new_country.setAttribute("name", name)
    new_country.setAttribute("capital", capital)
    new_country.setAttribute("population", str(population))  # minidom expects string values
    new_country.setAttribute("continent", continent)
    root.appendChild(new_country)

# Add a new country
add_country(dom_tree, root, "India", "New Delhi", 1380004385, "Asia")

# Save the changes to a new file. Write UTF-8 and declare it in the XML
# declaration so non-ASCII names such as "Brasília" round-trip correctly.
# Note: whitespace text nodes kept from countries.xml are re-indented too,
# so the output may contain extra blank lines.
with open("modified_countries.xml", "w", encoding="utf-8") as f:
    dom_tree.writexml(f, indent="", addindent="  ", newl="\n", encoding="utf-8")

print("New country added and saved to 'modified_countries.xml'.")
```

```text
New country added and saved to 'modified_countries.xml'.
```
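Because `countries.xml` was indented by hand, `minidom` keeps that indentation as whitespace-only Text nodes, and `writexml()` with `addindent`/`newl` indents those as well, producing blank lines. One common workaround, sketched below with a hypothetical `strip_whitespace_nodes()` helper and an inline demo document, is to drop whitespace-only text nodes before pretty-printing. It assumes data-oriented XML where such whitespace carries no meaning; in the notebook you could call `strip_whitespace_nodes(root)` before saving.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def strip_whitespace_nodes(node):
    """Recursively remove Text nodes that contain only whitespace."""
    for child in list(node.childNodes):  # copy: childNodes changes as we remove
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)

# Demo document with hand-made indentation, plus one appended element
demo_doc = parseString('<countries>\n  <country name="USA"/>\n</countries>')
demo_root = demo_doc.documentElement
india = demo_doc.createElement("country")
india.setAttribute("name", "India")
demo_root.appendChild(india)

strip_whitespace_nodes(demo_root)
pretty = demo_doc.toprettyxml(indent="  ")
print(pretty)
```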


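##### Removing a Country

```python
# Function to delete a country
def delete_country(root, name):
    # getElementsByTagName() returns a list built at call time,
    # so removing a node inside the loop is safe; stop at the first match
    countries = root.getElementsByTagName("country")
    for country in countries:
        if country.getAttribute("name") == name:
            # Remove via the actual parent, which also works for nested elements
            country.parentNode.removeChild(country)
            break

# Delete the country 'Brazil'
delete_country(root, "Brazil")

# Save the changes to a new file (UTF-8, declared in the XML declaration)
with open("modified_countries.xml", "w", encoding="utf-8") as f:
    dom_tree.writexml(f, indent="", addindent="  ", newl="\n", encoding="utf-8")

print("Country 'Brazil' removed and saved to 'modified_countries.xml'.")
```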
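Modifying an existing element follows the same pattern: find it, then call `setAttribute()`, which overwrites an attribute that is already present. A minimal sketch with a hypothetical `update_population()` helper and an inline demo document; the new value `123` is a placeholder, not real data.

```python
from xml.dom.minidom import parseString

def update_population(root_element, name, population):
    """Set population on the first <country> with a matching name.
    Returns True if a match was found, False otherwise."""
    for country_el in root_element.getElementsByTagName("country"):
        if country_el.getAttribute("name") == name:
            # setAttribute() replaces the existing value
            country_el.setAttribute("population", str(population))
            return True
    return False

demo_countries = parseString('<countries><country name="USA" population="331002651"/></countries>')
found = update_population(demo_countries.documentElement, "USA", 123)  # placeholder value
print(found, demo_countries.documentElement.toxml())
```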


#### Creating a New XML File from Scratch

```python
from xml.dom.minidom import Document

# Create a new XML document
new_doc = Document()

# Create the root element
countries_root = new_doc.createElement("countries")
new_doc.appendChild(countries_root)

# Function to add a country element (same helper as above, redefined so this cell runs on its own)
def add_country(doc, root, name, capital, population, continent):
    country = doc.createElement("country")
    country.setAttribute("name", name)
    country.setAttribute("capital", capital)
    country.setAttribute("population", str(population))
    country.setAttribute("continent", continent)
    root.appendChild(country)

# Add countries
add_country(new_doc, countries_root, "Canada", "Ottawa", 37742154, "North America")
add_country(new_doc, countries_root, "Germany", "Berlin", 83783942, "Europe")
add_country(new_doc, countries_root, "Australia", "Canberra", 25499884, "Australia")

# Save the new XML to a file (UTF-8, declared in the XML declaration)
with open("new_countries.xml", "w", encoding="utf-8") as f:
    new_doc.writexml(f, indent="", addindent="  ", newl="\n", encoding="utf-8")

print("New XML file 'new_countries.xml' created successfully.")
```

```text
New XML file 'new_countries.xml' created successfully.
```
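Elements can hold text as well as attributes; text content is added as a separate node created with `createTextNode()`. As an alternative to `writexml()`, `toprettyxml()` returns the serialized document directly, and when an `encoding` is given it returns bytes and names that encoding in the declaration. A minimal sketch; the `<capital>` child element is an illustrative variation on the attribute-only layout used above.

```python
from xml.dom.minidom import Document

pretty_doc = Document()
root_el = pretty_doc.createElement("countries")
pretty_doc.appendChild(root_el)

canada = pretty_doc.createElement("country")
canada.setAttribute("name", "Canada")

# Text content lives in its own Text node inside the element
capital_el = pretty_doc.createElement("capital")
capital_el.appendChild(pretty_doc.createTextNode("Ottawa"))
canada.appendChild(capital_el)
root_el.appendChild(canada)

# With encoding set, toprettyxml() returns bytes and declares the encoding
xml_bytes = pretty_doc.toprettyxml(indent="  ", encoding="utf-8")
print(xml_bytes.decode("utf-8"))
```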
