# Chapter 3: Working with XML: writing

## 1. From XML to strings and files

`lxml` offers many ways to interact with XML in Python, including serializing XML objects to strings. This is convenient when you want to retrieve, for example, only part of an XML document. The mechanism is simple: we use the function `etree.tostring()`.

In [None]:
from lxml import etree

with open("data/phi1294.phi002.perseus-lat2.xml") as f:
    xml = etree.parse(f)

# Let's get the third div
div = xml.xpath("//tei:div/tei:div", namespaces={"tei": "http://www.tei-c.org/ns/1.0"})[2]

# And now we export it to a string
print(etree.tostring(div))

As you can see, the result is simple, but it is not pretty. More importantly, the result is a `bytes` object, which in Python 3 is a distinct type from `str`. Fortunately, `etree.tostring()` takes some options:

- `encoding`: accepts either the `str` type itself (to get a Python string back) or the name of an encoding such as "utf-8" or "iso-xxx"
- `xml_declaration`: if set to `True`, adds the familiar `<?xml version='1.0' encoding='***'?>` at the beginning
- `with_comments`: if set to `False`, removes the comments

See more details on the [official documentation](lxml.de/api/lxml.etree-module.html#tostring)

In [None]:
# Let's see how encoding works:
print(etree.tostring(div, encoding=str))
# And with utf-8?
print(etree.tostring(div, encoding="utf-8"))

**Be careful**: as you can see, with `encoding="utf-8"` the result is still bytes. In Python 3, `str` is the Unicode text type. So when you need an actual string, pass `encoding=str` whenever possible!
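The difference between the two return types can be checked directly. The following is a small sketch on a made-up element (an assumption, so that the cell is self-contained); it also shows that decoding the bytes yields the same text as asking for `str` in the first place.

```python
from lxml import etree

# A hypothetical one-line element standing in for a real verse
line = etree.fromstring("<l>Hic est quem legis</l>")

as_bytes = etree.tostring(line, encoding="utf-8")  # a named encoding gives bytes
as_text = etree.tostring(line, encoding=str)       # the str type gives a Python string

print(type(as_bytes))
print(type(as_text))

# Decoding the bytes by hand gives the same text
print(as_bytes.decode("utf-8") == as_text)
```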

**Files**

There are two options for writing a file. One is simply to export to a string with `tostring` and then write that string to a file. The other is to use `etree.xmlfile`. To do that properly, you need the `with` statement, which you have seen before. `with FileFunction() as var:` ensures that the file is opened at the beginning of the block and closed at the end:

In [None]:

with open("data/phi1294.phi002.perseus-lat2.xml") as f:
    xml = etree.parse(f)
    print(f) # f exists
print(f.read()) # Raises a ValueError because the file is closed.
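Before moving on to `etree.xmlfile`, the first option mentioned above (serialize with `tostring`, then write the string) can be sketched as follows. The element and the output path are assumptions made for the example; a temporary directory is used so that nothing in `data/` is overwritten.

```python
import os
import tempfile

from lxml import etree

# A hypothetical element standing in for the div extracted earlier
div = etree.fromstring("<div n='1'><l>Hic est quem legis</l></div>")

# Serialize to a Python string first
text = etree.tostring(div, encoding=str)

# Write the string to a file; the path is only an illustration
path = os.path.join(tempfile.mkdtemp(), "div.xml")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back to check what was written
with open(path, encoding="utf-8") as f:
    written = f.read()
print(written)
```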

`etree.xmlfile()` does the same thing: it opens the file when the `with` block starts and closes it at the end. The object returned by `etree.xmlfile()` supports several methods:

- `write(xml)`: writes some XML to the document
- `write_doctype(doctype)`: adds a doctype verbatim, such as `write_doctype('<!DOCTYPE root SYSTEM "some.dtd">')`
- `write_declaration(standalone=None)`: writes an XML declaration; if `standalone` is `True`, the document is declared standalone
- `element(tag, attrib={}, nsmap={})`: used with a `with` statement, opens a contextual node that takes attributes and a namespace map
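As a minimal sketch of how these methods fit together, the example below writes into an in-memory buffer instead of a file on disk. The buffer, the element names and the attribute are assumptions chosen only to keep the example self-contained.

```python
from io import BytesIO

from lxml import etree

# A hypothetical element to write inside the new root
line = etree.fromstring("<l n='1'>Hic est quem legis</l>")

# An in-memory buffer stands in for a real file
buffer = BytesIO()
with etree.xmlfile(buffer, encoding="utf-8") as xf:
    xf.write_declaration(standalone=True)             # XML declaration first
    with xf.element("div", attrib={"type": "poem"}):  # opens <div type="poem">
        xf.write(line)                                # written inside the div
    # leaving the inner with block closes the div

result = buffer.getvalue()
print(result.decode("utf-8"))
```

Because the root element is opened with a `with` statement, its closing tag is written automatically when the block ends.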

In [None]:
with etree.xmlfile("somefile.xml", encoding='utf-8') as xf:
    with xf.element("tei:TEI", nsmap={"tei":"{http://www.tei-c.org/ns/1.0}"}): # No variable stored here
        xf.write(div)
    
with open("somefile.xml") as f:
    print(f.read())

**DIY**

Using the parsing techniques we have seen previously, can you clean up the namespace mess created with the `tei:` prefix?

In [None]:
# Write your code here

## 2. Adding and removing elements

## 3. Changing attributes

## Going further


## Exercises

### 1. XML to String

Using the first poem, identified by 1.1 in Martial's Epigrammata, replace all lines with paragraphs.

### 2. XML from XPath

Create a function which, given an XPath, creates and returns the whole tree.

Expected outcome:

    xml = fromXpath("/TEI/body/text/div")
    print(etree.tostring(xml))
> `<TEI><body><text><div></div></text></body></TEI>`

In [None]:
# Write your code here

### 3. XML from XPath with attributes

Taking a single-node XPath as a parameter, write a function which creates a node with its attributes.

Expected outcome:

    xml = fromXpath("div[@n='1' and @subtype='chapter']")
    print(etree.tostring(xml))
> `<div n="1" subtype="chapter"/>`

### 4. Full XML from complex XPath

Using the outcome of the previous two exercises, write a function which can create a complex tree from an XPath:

    xml = fromXpath("/TEI/body/text/div[@type='edition' and @n='0']/div[@n='1' and @subtype='chapter']")
    print(etree.tostring(xml))
> `<TEI><body><text><div n="0" type="edition"><div n="1" subtype="chapter" /></div></text></body></TEI>`

-----

In [None]:
# You can ignore this cell; it is only here to make the page nicer.

from IPython.core.display import HTML
def css_styling():
    styles = open("styles/custom.css", "r").read()
    return HTML(styles)
css_styling()

---

<p><small><a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">Python Programming for the Humanities</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="http://fbkarsdorp.github.io/python-course" property="cc:attributionName" rel="cc:attributionURL">http://fbkarsdorp.github.io/python-course</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>. Based on a work at <a xmlns:dct="http://purl.org/dc/terms/" href="https://github.com/fbkarsdorp/python-course" rel="dct:source">https://github.com/fbkarsdorp/python-course</a>.</small></p>