<h1>
    Learn Python
</h1>

<h2 style="color: yellow">Manipulating XML</h2>

Sometimes when you're processing markup like XML or HTML, you don't want a parser that just runs through the document one piece at a time. Instead, you want the entire document in memory so you can manipulate it. In other words, you'll operate on the Document Object Model, or DOM. In this example, we'll see how to use the xml.dom.minidom module that Python provides to load an XML file and then operate on the document while it's in memory.
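To make the in-memory idea concrete, here is a minimal sketch that parses a small XML string with xml.dom.minidom.parseString(), changes an attribute while the whole tree is in memory, and serializes the result back to text with toxml(). The XML string is a hypothetical document used only for illustration, not the tutorial's sample file.

```python
import xml.dom.minidom

# Hypothetical XML, used only for illustration
source = '<person><city name="Seattle"/></person>'

# parseString() builds the complete DOM tree in memory
doc = xml.dom.minidom.parseString(source)

# Because the whole tree is available, any node can be reached and changed
city = doc.getElementsByTagName("city")[0]
city.setAttribute("name", "Portland")

# Serialize the modified root element back to a string
print(doc.documentElement.toxml())
# prints <person><city name="Portland"/></person>
```

Calling toxml() on documentElement rather than on doc leaves out the XML declaration that doc.toxml() would prepend.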

The first thing I'm going to do is import the module that lets me operate on an XML DOM, so I'm going to import xml.dom.minidom.

import xml.dom.minidom

This is the sample XML file (samplexml.xml). If you open it up and look at it, you can see it's a pretty standard XML file with some basic information about a person in it: my name, where I live, and some skills that I have. Again, it's just a simple XML file for demonstrating parsing.
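The contents of samplexml.xml aren't reproduced here. If you want to follow along without it, one option is a hypothetical stand-in with the same general shape (a person element with a name and a few skill tags; every value below is an invented placeholder). This sketch writes it to a temporary directory, so nothing in your working directory is overwritten, and loads it with xml.dom.minidom.parse(), which accepts a file path just as it accepts "samplexml.xml".

```python
import os
import tempfile
import xml.dom.minidom

# Hypothetical stand-in for samplexml.xml; the real file's contents may differ
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<person>
  <name>Jane Doe</name>
  <skill name="Python"/>
  <skill name="HTML"/>
</person>
"""

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "samplexml.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE_XML)

    # parse() reads and parses the whole file into an in-memory DOM
    doc = xml.dom.minidom.parse(path)

# The temporary file is gone now, but the DOM is still fully usable in memory
print(doc.documentElement.tagName)
print(len(doc.getElementsByTagName("skill")))
```

That the DOM survives the deletion of the file is a direct consequence of the whole document being held in memory.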

I'm going to use the parse function in xml.dom.minidom to load and parse the file. So I'll make a variable called doc, call xml.dom.minidom.parse, and pass it the file name samplexml.xml. This parses the XML file and creates an in-memory DOM object that I can manipulate. Because the file I want to parse happens to be in the current working directory, which for a notebook is usually the folder the notebook lives in, I don't have to do any fancy path manipulation. Once we've parsed the document, let's print out the nodeName of the document node itself, along with the tag name of the document's first child.

That's going to be doc.firstChild.tagName. Now, if these property names don't look familiar to you, they are standard names used in the Document Object Model: nodeName, firstChild, and tagName are all standard properties of DOM nodes.

# use the parse() function to load and parse an XML file
doc = xml.dom.minidom.parse("samplexml.xml")

# print out the document node and the name of the first child tag
print(doc.nodeName)
print(doc.firstChild.tagName)


You can see that the node name of the document is #document, which is exactly what the W3C DOM specification says it should be. The tag name of the first child in the document is person, and sure enough, if you look at the file, that's the first tag in the document.
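Here firstChild returns the root element only because nothing precedes the person tag in the sample file. If a file began with a comment (or a DOCTYPE), firstChild would be that node instead, and a comment node has no tagName. A minimal sketch with a hypothetical one-line document shows the difference; the standard documentElement property always returns the root element.

```python
import xml.dom.minidom
from xml.dom import Node

# Hypothetical document with a comment before the root element
doc = xml.dom.minidom.parseString('<?xml version="1.0"?><!-- sample --><person/>')

print(doc.nodeName)                                # #document
print(doc.firstChild.nodeName)                     # #comment, not person
print(doc.firstChild.nodeType == Node.COMMENT_NODE)  # True
print(doc.documentElement.tagName)                 # person
```

For real-world files, documentElement is the safer way to reach the root, since it doesn't depend on what comes before it.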

Now we're going to get a list of XML tags from the document and print each one. For that I'll use the standard DOM function getElementsByTagName. I'll name the variable skills and call doc.getElementsByTagName to get all of the skill tags; if you look at the XML, those are the skill tags right here. Then I'll print out how many skills are listed using skills.length, followed by each skill: for skill in skills, I'll print skill.getAttribute("name"). If you go back to the XML, you can see that each skill tag has a name attribute with a value, so the loop goes over every skill tag, gets the attribute called name, and prints out its value.

# get a list of XML tags from the document and print each one
skills = doc.getElementsByTagName("skill")
print("%d skills:" % skills.length)
for skill in skills:
    print(skill.getAttribute("name"))

So there are four skills listed.
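Two details of these calls are worth knowing: getElementsByTagName() searches every descendant, not just direct children, and getAttribute() returns an empty string when the attribute is missing. A small sketch with a hypothetical document (skill tags at two nesting levels, plus one skill without a name) illustrates both; len() works on the returned list just like .length does.

```python
import xml.dom.minidom

# Hypothetical document: skills at two nesting levels, one without a name attribute
doc = xml.dom.minidom.parseString(
    "<person>"
    '<skill name="Python"/>'
    '<group><skill name="HTML"/></group>'
    "<skill/>"
    "</person>"
)

skills = doc.getElementsByTagName("skill")
print("%d skills:" % len(skills))  # the nested skill tag is counted too
for skill in skills:
    # getAttribute() returns "" when the attribute is absent
    name = skill.getAttribute("name")
    print(name if name else "(no name)")
```

The results come back in document order, so this prints 3 skills followed by Python, HTML, and (no name).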

Let's create a new XML tag and add it to the document. I'll store the new tag in a variable called newSkill, which I get by calling the createElement function on the document with the tag name skill. On that new skill, I'm going to call setAttribute to set the name attribute to another skill that I have; let's do jQuery.

Then I'm going to tell the document's first child element, which is the person tag, to append the new skill tag inside itself. To recap: createElement is a standard W3C DOM function that creates a new tag (it isn't part of the tree until it's appended), setAttribute sets its name attribute to jQuery, and appendChild inserts the new skill tag into the first child of the document, which remember is the person tag. Because appendChild adds the node as the last child of person, the new skill tag will appear after the existing content inside person.
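As a minimal sketch of where appendChild() puts the new node, here is the same create, set, and append sequence on a hypothetical one-skill document. It reaches the root with documentElement rather than firstChild; in this document they are the same element.

```python
import xml.dom.minidom

# Hypothetical document with a single existing skill
doc = xml.dom.minidom.parseString('<person><skill name="Python"/></person>')
person = doc.documentElement

# Create the element, set its attribute, then attach it to the tree
new_skill = doc.createElement("skill")
new_skill.setAttribute("name", "jQuery")
person.appendChild(new_skill)

print(person.lastChild is new_skill)  # True
print(person.toxml())
# <person><skill name="Python"/><skill name="jQuery"/></person>
```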

Then we'll print out a listing of the skills before the new one is added, and again after it's added, to make sure everything worked.

print("Before adding new skill")
skills = doc.getElementsByTagName("skill")
print("%d skills:" % skills.length)
for skill in skills:
    print(skill.getAttribute("name"))

print("--------------------------------")

# create a new XML tag and add it into the document
newSkill = doc.createElement("skill")
newSkill.setAttribute("name", "jQuery")
doc.firstChild.appendChild(newSkill)

print("After adding new skill")

# fetch the skill tags again so the list includes the new element
skills = doc.getElementsByTagName("skill")
print("%d skills:" % skills.length)
for skill in skills:
    print(skill.getAttribute("name"))
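The listing loop appears twice in the cell above. One way to avoid the repetition is a small helper; skill_names() below is a hypothetical name (it is not part of minidom), and the document is an inline stand-in rather than samplexml.xml.

```python
import xml.dom.minidom


def skill_names(doc):
    """Return the name attribute of every <skill> element, in document order."""
    return [skill.getAttribute("name") for skill in doc.getElementsByTagName("skill")]


# Hypothetical stand-in document
doc = xml.dom.minidom.parseString('<person><skill name="Python"/></person>')

before = skill_names(doc)

new_skill = doc.createElement("skill")
new_skill.setAttribute("name", "jQuery")
doc.documentElement.appendChild(new_skill)

after = skill_names(doc)

print("Before:", before)  # Before: ['Python']
print("After:", after)    # After: ['Python', 'jQuery']
```

Returning a list instead of printing inside the helper also makes the before/after comparison easy to check in code.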