### Introduction to XML

XML (eXtensible Markup Language) is a markup language used for encoding documents in a format that is both human-readable and machine-readable. XML is designed to store and transport data. Here are some key parts of XML:

#### Elements of an XML File

| Element          | Description |
|------------------|-------------|
| **Prolog**       | The optional beginning of the XML document. Contains the XML declaration and processing instructions. Example: <br> ```xml <?xml version="1.0" encoding="UTF-8"?> ``` |
| **Comments**     | Provide comments within the XML code. Example: <br> ```xml <!-- This is a comment --> ``` |
| **Element**      | The basic building block of XML. It consists of a start tag, content, and an end tag. Example: <br> ```xml <name>John Doe</name> ``` |
| **Attributes**   | Provide additional information about elements. Example: <br> ```xml <person id="123"> ``` |
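| **Tag**          | The markup that delimits an element: a start tag such as `<name>` and an end tag such as `</name>`. The name inside the tag identifies the element. Example: In `<name>John Doe</name>`, `name` is the tag name. |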
| **Text String**  | The text content within an element. Example: In `<name>John Doe</name>`, `John Doe` is the text string. |
| **Child Elements / Nested Elements** | Elements that are contained within another element. Example: <br> ```xml <person> <name>John Doe</name> <age>30</age> </person> ``` |
| **Tail String**  | The text content that follows an element's end tag and is not part of another element. Example: In `<name>John Doe</name>extra text`, `extra text` is the tail string. |
| **CDATA**        | A character data section, used to include text that the parser should not interpret as markup (for example, text containing `<` or `&`). Example: <br> ```xml <![CDATA[Some unparsed data]]> ``` |
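
The parts listed in the table map onto attributes of elements in Python's standard `xml.etree.ElementTree` module, which is covered later in this document. The following is a minimal sketch, assuming a small made-up `<person>` snippet; the snippet and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: an element with an attribute, one child, and tail text.
snippet = '<person id="123"><name>John Doe</name>extra text</person>'
person = ET.fromstring(snippet)
name = person[0]  # first child element

print(person.tag)     # tag name of the outer element
print(person.attrib)  # attributes as a dictionary
print(name.text)      # text string inside <name>
print(name.tail)      # tail string that follows </name>
```

Note that the tail string is stored on the element it follows (`name`), not on the parent.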


#### Sample XML file

In [None]:
'''
<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book id="1">
        <title>Learning Python</title>
        <author>Mark Lutz</author>
        <year>2013</year>
        <price>39.95</price>
    </book>
    <book id="2">
        <title>Programming Python</title>
        <author>Mark Lutz</author>
        <year>2011</year>
        <price>49.95</price>
    </book>
    <book id="3">
        <title>Fluent Python</title>
        <author>Luciano Ramalho</author>
        <year>2015</year>
        <price>59.99</price>
    </book>
</library>
'''

- **Prolog**: Optional, contains XML declaration and processing instructions. Example: 
    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    ```
- **Element**: The basic building block of XML. It consists of a start tag, content, and an end tag. Example:
    ```xml
    <name>John Doe</name>
    ```
- **Attributes**: Provide additional information about elements. Example:
    ```xml
    <person id="123">
    ```
- **Nested Elements**: Elements can contain other elements. Example:
    ```xml
    <person>
        <name>John Doe</name>
        <age>30</age>
    </person>
    ```
- **CDATA**: A character data section, used to include text that the parser should not interpret as markup. Example:
    ```xml
    <![CDATA[Some unparsed data]]>
    ```
- **Comments**: Provide comments within the XML code. Example:
    ```xml
    <!-- This is a comment -->
    ```
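
To see how CDATA sections and comments behave once a document is parsed, here is a small sketch using Python's `xml.etree.ElementTree` with its default parser settings. The `<note>` snippet is a made-up example.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: CDATA content followed by a comment.
snippet = '<note><![CDATA[a < b & c]]><!-- ignored --></note>'
note = ET.fromstring(snippet)

print(note.text)  # CDATA content arrives as ordinary text
print(len(note))  # the default parser does not keep the comment as a child

# On output, special characters are escaped rather than wrapped in CDATA again.
serialized = ET.tostring(note, encoding='unicode')
print(serialized)
```

In other words, CDATA is a convenience for writing the document by hand; after parsing, the content is plain text like any other.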


### Using xml.etree.ElementTree Module
The xml.etree.ElementTree module in Python provides a simple way to parse and create XML data.

#### 1. Parsing XML

To parse an XML document from a file, you can use the `parse()` function of the module (written `ET.parse()` with the import alias used here):

In [None]:
# Import the module
import xml.etree.ElementTree as ET

In [2]:
import xml.etree.ElementTree as ET

# Parse the XML document
tree = ET.parse('books.xml')
root = tree.getroot()

# Print the root element
print(root.tag)


library


This parses the books.xml file and gets the root element of the XML tree.
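
When the XML is already in memory rather than in a file, `ET.fromstring()` can be used instead of `ET.parse()`. The sketch below assumes a shortened inline copy of the sample library; `xml_text` and `inline_root` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the sample library (no books.xml needed on disk).
xml_text = """<library>
    <book id="1"><title>Learning Python</title></book>
    <book id="2"><title>Programming Python</title></book>
</library>"""

# fromstring() returns the root Element directly, not an ElementTree.
inline_root = ET.fromstring(xml_text)

print(inline_root.tag)   # name of the root element
print(len(inline_root))  # number of direct child elements
```

Because `fromstring()` returns an `Element`, there is no `getroot()` step.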

In [9]:
# Print the tag of the first child element of root
print(root[0].tag)

# Print the attributes of the first child element of root
print(root[0].attrib)

# Print the tag and attributes of each child of the first book
for child in root[0]:
    print(child.tag, child.attrib)

# Print the text of each child of the first book
for child in root[0]:
    print(child.text)

book
{'id': '1'}
title {}
author {}
year {}
price {}
Learning Python
Mark Lutz
2013
39.95


In [12]:
# Print the text of each child element of the first book
for child in root[0]:
    print(child.text)

Learning Python
Mark Lutz
2013
39.95
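
Indexing such as `root[0]` only reaches direct children. To visit matching elements at any depth, `Element.iter()` can be used. The following is a minimal sketch on an inline copy of part of the sample data; `sample` and `titles` are illustrative names.

```python
import xml.etree.ElementTree as ET

sample = ET.fromstring(
    '<library>'
    '<book id="1"><title>Learning Python</title><year>2013</year></book>'
    '<book id="3"><title>Fluent Python</title><year>2015</year></book>'
    '</library>'
)

# iter('title') walks the whole subtree and yields every <title> element.
titles = [title.text for title in sample.iter('title')]
print(titles)
```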


#### 2. Accessing Elements

You can access elements and their attributes using methods like find(), findall(), and get():

In [3]:
# Find all book elements
for book in root.findall('book'):
    title = book.find('title').text
    author = book.find('author').text
    year = book.find('year').text
    price = book.find('price').text
    book_id = book.get('id')
    
    print(f"Book ID: {book_id}")
    print(f"Title: {title}")
    print(f"Author: {author}")
    print(f"Year: {year}")
    print(f"Price: {price}\n")

Book ID: 1
Title: Learning Python
Author: Mark Lutz
Year: 2013
Price: 39.95

Book ID: 2
Title: Programming Python
Author: Mark Lutz
Year: 2011
Price: 49.95

Book ID: 3
Title: Fluent Python
Author: Luciano Ramalho
Year: 2015
Price: 59.99
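
`find()` returns `None` when a child is missing, so chaining `.text` onto it raises an `AttributeError`. One way to guard against that is `findtext()` with a default. The sketch below assumes a hypothetical record that lacks a `<price>` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical record in which <price> is missing.
book = ET.fromstring(
    '<book id="9"><title>Untitled Draft</title><year>2020</year></book>'
)

# findtext() returns the default instead of failing when the child is absent.
price = float(book.findtext('price', default='0.0'))

# Element text is always a string; convert explicitly when a number is needed.
year = int(book.findtext('year'))

# get() also accepts a default for missing attributes.
isbn = book.get('isbn', 'n/a')

print(price, year, isbn)
```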



#### 3. Modifying XML

You can modify the XML tree by changing text or attributes, and then save the changes back to an XML file:

In [11]:
# Change the title of the first book
first_book = root.find('book')
first_book.find('title').text = "Learning Python, 5th Edition"

# Save the modified XML back to a file
tree.write('modified_books.xml')
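
Attributes can be changed in a similar way with `Element.set()`. Here is a small sketch on an inline element; the `language` attribute is a hypothetical name used only for illustration.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring('<book id="1"><title>Learning Python</title></book>')

book.set('id', '101')       # overwrite an existing attribute
book.set('language', 'en')  # add a new attribute (hypothetical name)

print(book.attrib)
```

Attribute values are strings, so numbers need to be converted with `str()` before being passed to `set()`.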


##### Modifying a Child Element:

In [14]:
second_book = root.find('book[@id="2"]')
second_book.find('price').text = "44.95"

# Save the modified XML back to a file
tree.write('modified_books.xml')

# This modifies the text of the price element of the book with id="2".
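
The `book[@id="2"]` expression above uses the limited XPath subset that ElementTree supports. A few other forms from that subset are sketched below on an inline copy of the sample data; `shelf` and the result names are illustrative.

```python
import xml.etree.ElementTree as ET

shelf = ET.fromstring(
    '<library>'
    '<book id="1"><title>Learning Python</title><author>Mark Lutz</author></book>'
    '<book id="2"><title>Programming Python</title><author>Mark Lutz</author></book>'
    '<book id="3"><title>Fluent Python</title><author>Luciano Ramalho</author></book>'
    '</library>'
)

# [@id="2"] matches on an attribute value.
by_id = shelf.find('book[@id="2"]')

# [author="Mark Lutz"] matches books whose <author> child has exactly that text.
by_author = shelf.findall('book[author="Mark Lutz"]')

# .//title searches for <title> elements at any depth below the current element.
all_titles = [title.text for title in shelf.findall('.//title')]

print(by_id.findtext('title'), len(by_author), all_titles)
```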

##### Adding a New Child Element:

In [None]:
third_book = root.find('book[@id="3"]')
new_element = ET.SubElement(third_book, 'publisher')
new_element.text = "O'Reilly Media"

# Save the modified XML back to a file
tree.write('modified_books.xml')

# This adds a new publisher child element to the book with id="3".

##### Removing a Child Element:

In [None]:
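# find() returns None if there is no publisher element, so check before removing
publisher = third_book.find('publisher')
if publisher is not None:
    third_book.remove(publisher)

# This removes the publisher child element from the book with id="3".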
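
Removing several elements in a loop needs a little care: `remove()` is called on the parent, and the loop should iterate over a separate list rather than the parent itself. The following is a minimal sketch, assuming an inline copy of the sample data and an arbitrary cutoff year of 2012.

```python
import xml.etree.ElementTree as ET

shelf = ET.fromstring(
    '<library>'
    '<book id="1"><year>2013</year></book>'
    '<book id="2"><year>2011</year></book>'
    '<book id="3"><year>2015</year></book>'
    '</library>'
)

# findall() returns a list, so removing from the parent inside the loop is safe.
for book in shelf.findall('book'):
    if int(book.findtext('year')) < 2012:
        shelf.remove(book)

remaining_ids = [book.get('id') for book in shelf]
print(remaining_ids)
```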

##### Saving the Modified XML:

In [None]:
tree.write('modified_books.xml')

# This saves the modified XML tree back to a file named modified_books.xml.
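
By default, `tree.write()` with no extra arguments does not emit an XML declaration. If the prolog should be included, the `encoding` and `xml_declaration` parameters can be passed. The sketch below writes to an in-memory buffer instead of a file so that the result can be inspected; `doc` and `buffer` are illustrative names.

```python
import io
import xml.etree.ElementTree as ET

doc = ET.ElementTree(ET.fromstring('<library><book id="1"/></library>'))

# Write to an in-memory buffer here; a filename works the same way.
buffer = io.BytesIO()
doc.write(buffer, encoding='utf-8', xml_declaration=True)

output = buffer.getvalue()
print(output.decode('utf-8'))
```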

#### 4. Creating XML

You can also create new XML documents from scratch:

In [15]:
# Create the root element
library = ET.Element('library')

# Create a new book element
new_book = ET.SubElement(library, 'book', id='4')
ET.SubElement(new_book, 'title').text = 'Python Cookbook'
ET.SubElement(new_book, 'author').text = 'David Beazley'
ET.SubElement(new_book, 'year').text = '2013'
ET.SubElement(new_book, 'price').text = '49.99'

# Wrap the root element in an ElementTree and write it to a file
new_tree = ET.ElementTree(library)
new_tree.write('new_library.xml')


This creates a new XML document with a root element library, adds a new book element with child elements, and saves it to a file named new_library.xml.
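
A tree built this way is written on a single line with no indentation. For more readable output, `ET.indent()` can be applied before serializing. Note that this function is only available in Python 3.9 and later; the sketch below assumes such a version and rebuilds a smaller copy of the tree.

```python
import xml.etree.ElementTree as ET

library = ET.Element('library')
book = ET.SubElement(library, 'book', id='4')
ET.SubElement(book, 'title').text = 'Python Cookbook'

# indent() inserts whitespace in place (requires Python 3.9 or newer).
ET.indent(library, space='  ')

pretty = ET.tostring(library, encoding='unicode')
print(pretty)
```

Because `indent()` modifies the tree itself, the added whitespace also appears in the `text` and `tail` values of the affected elements.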

### Exercise: Creating an XML File
#### Objective:
Create an XML file that represents a catalog of movies. Each movie should have an `id` attribute and several nested elements.
#### Requirements:
1. The XML file should have a root element named `catalog`.
2. Each movie should be represented by a `movie` element with an `id` attribute (a unique identifier).
3. Each `movie` element should contain the following nested elements:
   - `title` (the title of the movie)
   - `genre` (the genre of the movie)
   - `director` (the director of the movie)
   - `year` (the release year of the movie)
   - `rating` (the rating of the movie)
4. Include at least three movies in the catalog.
5. Ensure that the XML is well-formed and follows proper XML syntax.


In [None]:
import xml.etree.ElementTree as ET

# Create the root element
catalog = ET.Element('catalog')

# Function to add a movie element (movie_id avoids shadowing the built-in id)
def add_movie(catalog, movie_id, title, genre, director, year, rating):
    movie = ET.SubElement(catalog, 'movie', id=str(movie_id))
    ET.SubElement(movie, 'title').text = title
    ET.SubElement(movie, 'genre').text = genre
    ET.SubElement(movie, 'director').text = director
    ET.SubElement(movie, 'year').text = str(year)
    ET.SubElement(movie, 'rating').text = str(rating)

# Add movies to the catalog
add_movie(catalog, 1, 'The Shawshank Redemption', 'Drama', 'Frank Darabont', 1994, 9.3)
add_movie(catalog, 2, 'The Godfather', 'Crime', 'Francis Ford Coppola', 1972, 9.2)
add_movie(catalog, 3, 'The Dark Knight', 'Action', 'Christopher Nolan', 2008, 9.0)

# Serialize the root Element to bytes
xml_str = ET.tostring(catalog)

# Write the serialized bytes to a file (binary mode, since tostring() returns bytes)
with open('movies_catalog.xml', 'wb') as f:
    f.write(xml_str)
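
One way to check a solution against the requirements is to parse the result and inspect it. The sketch below is only an illustration: `check_catalog` and `REQUIRED_CHILDREN` are hypothetical helpers, not part of `xml.etree.ElementTree`, and the one-movie catalog is made up to show a failing case.

```python
import xml.etree.ElementTree as ET

REQUIRED_CHILDREN = ('title', 'genre', 'director', 'year', 'rating')

def check_catalog(root):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if root.tag != 'catalog':
        problems.append('root element is not <catalog>')
    movies = root.findall('movie')
    if len(movies) < 3:
        problems.append('fewer than three movies')
    ids = [movie.get('id') for movie in movies]
    if None in ids or len(set(ids)) != len(ids):
        problems.append('missing or duplicate id attribute')
    for movie in movies:
        for name in REQUIRED_CHILDREN:
            if movie.find(name) is None:
                problems.append(f"movie {movie.get('id')} lacks <{name}>")
    return problems

# Hypothetical one-movie catalog, used only to show the checker reporting a problem.
small = ET.fromstring(
    '<catalog><movie id="1"><title>T</title><genre>G</genre>'
    '<director>D</director><year>2000</year><rating>8.0</rating></movie></catalog>'
)
problems = check_catalog(small)
print(problems)
```

To check the exercise output, the same function could be applied to `ET.parse('movies_catalog.xml').getroot()`. Parsing the file at all already confirms that it is well-formed, since the parser raises `ET.ParseError` otherwise.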
