
https://docs.python.org/zh-cn/3.7/library/markup.html

The `parse()` function accepts either a filename or an open file object:


```
xml.dom.minidom.parse(filename_or_file[, parser[, bufsize]])
```
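A minimal sketch of both call styles. The XML content and the file name `contact.xml` are made up for illustration. An in-memory `io.StringIO` stands in for the open file object, and a temporary directory holds the file for the filename case.

```python
import io
import os
import tempfile
from xml.dom.minidom import parse

# Hypothetical sample XML used only for this illustration
xml_text = "<Contact><Name>Alice</Name></Contact>"

# 1) Parse from an already-open file object (an in-memory text stream here)
doc_from_file_obj = parse(io.StringIO(xml_text))

# 2) Parse from a filename: write the XML to a temporary file first
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "contact.xml")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(xml_text)
    doc_from_path = parse(path)  # parse() opens and closes the file itself

print(doc_from_file_obj.documentElement.tagName)  # Contact
print(doc_from_path.documentElement.tagName)      # Contact
```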

If the XML is already held in a string, use the `parseString()` function instead:

```
xml.dom.minidom.parseString(string[, parser])
```

Returns a `Document` that represents the string. This method creates an `io.StringIO` object for the string and passes it on to `parse()`.
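Following the description above, calling `parseString(s)` should give the same tree as wrapping `s` in an `io.StringIO` and handing it to `parse()`. The sketch below checks this by comparing the serialized output of both routes; it also passes UTF-8 encoded bytes, which `parseString()` accepts as well. The sample XML is illustrative.

```python
import io
from xml.dom.minidom import parse, parseString

xml_text = "<Contact><Name>胡一刀</Name></Contact>"

doc_a = parseString(xml_text)                    # parse directly from the string
doc_b = parse(io.StringIO(xml_text))             # wrap in StringIO, then parse()
doc_c = parseString(xml_text.encode("utf-8"))    # bytes input also works

# All three routes should serialize to the same XML
print(doc_a.toxml() == doc_b.toxml() == doc_c.toxml())  # True
```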

Both functions return a `Document` object that represents the content of the document.
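A short sketch of what the returned object looks like: the `Document` node itself is not the root element; the root is available as `documentElement`. The XML string is a made-up example.

```python
from xml.dom.minidom import Document, parseString

doc = parseString("<Contact><Name>Alice</Name></Contact>")

print(isinstance(doc, Document))            # True
print(doc.nodeType == doc.DOCUMENT_NODE)    # True

# The root element hangs off the Document as documentElement
root = doc.documentElement
print(root.tagName)                         # Contact
print(root.nodeType == root.ELEMENT_NODE)   # True
```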

What `parse()` and `parseString()` actually do is connect an XML parser to a "DOM builder" that can accept parse events from any SAX parser and convert them into a DOM tree.

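The names of these functions may be misleading, but they are easy to understand once you learn the interfaces. Parsing is complete before the functions return; they simply do not provide a parser implementation of their own.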
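The optional `parser` argument is where the SAX side becomes visible: it must be a SAX2 parser object, and minidom then builds the DOM from that parser's events. The sketch below compares the default path with an explicit parser obtained from `xml.sax.make_parser()`; the sample XML is illustrative, and both trees are expected to expose the same root and text.

```python
import xml.sax
from xml.dom.minidom import parseString

xml_text = "<Contact><Name>Alice</Name></Contact>"

# Default path: minidom uses its own built-in builder
doc_default = parseString(xml_text)

# Explicit path: hand in a SAX2 parser; the DOM is built from its parse events
sax_parser = xml.sax.make_parser()
doc_sax = parseString(xml_text, sax_parser)

print(doc_default.documentElement.tagName)                  # Contact
print(doc_sax.documentElement.tagName)                      # Contact
print(doc_sax.getElementsByTagName("Name")[0].firstChild.data)  # Alice
```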


```python
from xml.dom.minidom import parse,parseString
import xml.dom.minidom
documents="""<Contact>
        <Name>胡一刀</Name>
        <Starred>0</Starred>
        <PhoneList>
            <Phone Type="2">+86 199 9876 1807</Phone>
        </PhoneList>
        <PhoneList>
            <Phone Type="2">+86 199 9875 1807</Phone>
        </PhoneList>
        <EmailList>
            <Email Type="2">huyidao@xiaomi.com</Email>
        </EmailList>
        <Account value="0">
            <Name>215037230</Name>
            <Type>com.xiaomi</Type>
        </Account>
        <GroupList>
            <GroupName>CRM</GroupName>
        </GroupList>
    </Contact>"""

# Parse the string into a Document, then take its root element (<Contact>)
DOMTree = parseString(documents)
collection = DOMTree.documentElement
```


```python
print(collection.tagName)
```

```
Contact
```


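```python
# Get a list of nodes by tag name. Both the Document and an Element support
# this; here both calls return the same two <Name> elements (the notebook
# only displays the result of the last expression).
DOMTree.getElementsByTagName('Name')
collection.getElementsByTagName('Name')
```

```
[<DOM Element: Name at 0x1fe1eb48c28>, <DOM Element: Name at 0x1fe1ebc9210>]
```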
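The search scope depends on where `getElementsByTagName()` is called: from the Document (or the root element) it scans the whole tree, while from a nested element it only scans that element's descendants. A sketch with a trimmed-down version of the contact XML:

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<Contact><Name>Alice</Name>"
    "<Account><Name>215037230</Name></Account></Contact>"
)

# Searching from the Document finds every <Name> in the tree
all_names = doc.getElementsByTagName("Name")

# Searching from <Account> only looks inside that element's subtree
account = doc.getElementsByTagName("Account")[0]
account_names = account.getElementsByTagName("Name")

print(len(all_names))                    # 2
print(len(account_names))                # 1
print(account_names[0].firstChild.data)  # 215037230
```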

```python
# Count how many times the tag occurs
DOMTree.getElementsByTagName('Name').length
```

```
2
```

```python
# Get a specific item from the node list
DOMTree.getElementsByTagName('Name').item(0)
```

```
<DOM Element: Name at 0x1fe1eb48c28>
```
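`length` and `item()` are the DOM-style accessors. In CPython's minidom the returned `NodeList` is also a subclass of Python's `list`, so `len()`, indexing and iteration work too; one difference is that `item()` returns `None` for an out-of-range index instead of raising. A sketch with illustrative data:

```python
from xml.dom.minidom import parseString

doc = parseString("<Contact><Name>Alice</Name><Name>Bob</Name></Contact>")
names = doc.getElementsByTagName("Name")

# DOM-style accessors
print(names.length)               # 2
print(names.item(0) is names[0])  # True
print(names.item(5))              # None (out of range, no IndexError)

# Ordinary list behaviour
print(isinstance(names, list))                 # True
print([n.firstChild.data for n in names])      # ['Alice', 'Bob']
```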

```python
# Get an attribute node of the element
collection.getElementsByTagName('Email').item(0).getAttributeNode("Type")
```

```
<xml.dom.minidom.Attr at 0x1fe1ebb7390>
```

```python
# Get the attribute's name
collection.getElementsByTagName('Email').item(0).getAttributeNode("Type").name
```

```
'Type'
```

```python
# Get the attribute's value
collection.getElementsByTagName('Email').item(0).getAttributeNode("Type").value
```

```
'2'
```
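When only the value is needed, `getAttribute()` skips the intermediate `Attr` node, and the element's `attributes` map lists every attribute. A sketch using the `<Email>` element from the sample; the `Label` attribute is a hypothetical name that does not exist, used to show the missing-attribute behaviour.

```python
from xml.dom.minidom import parseString

doc = parseString('<EmailList><Email Type="2">huyidao@xiaomi.com</Email></EmailList>')
email = doc.getElementsByTagName("Email")[0]

# Read the value directly as a string
print(email.getAttribute("Type"))          # 2
# A missing attribute gives '' from getAttribute() and None from getAttributeNode()
print(repr(email.getAttribute("Label")))   # ''
print(email.getAttributeNode("Label"))     # None
print(email.hasAttribute("Type"))          # True

# Iterate over all attributes as (name, value) pairs
for name, value in email.attributes.items():
    print(name, value)                     # Type 2
```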

```python
# Get the tag name
print(DOMTree.getElementsByTagName('PhoneList')[0].nodeName)
```

```
PhoneList
```
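`nodeName` is defined for every node type, not just elements: for an element it equals `tagName`, for an attribute it is the attribute name, and for the document and text nodes it is a fixed string. A quick sketch:

```python
from xml.dom.minidom import parseString

doc = parseString('<PhoneList><Phone Type="2">+86 199 9876 1807</Phone></PhoneList>')
phone = doc.getElementsByTagName("Phone")[0]
text = phone.firstChild
attr = phone.getAttributeNode("Type")

print(doc.nodeName)                   # #document
print(phone.nodeName, phone.tagName)  # Phone Phone
print(text.nodeName)                  # #text
print(attr.nodeName)                  # Type
# nodeType tells the kinds apart numerically
print(phone.nodeType == phone.ELEMENT_NODE, text.nodeType == text.TEXT_NODE)  # True True
```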


```python
# The child nodes of each <Name> element: a single Text node each
[t.childNodes for t in DOMTree.getElementsByTagName('Name')]
```

```
[[<DOM Text node "'胡一刀'">], [<DOM Text node "'215037230'">]]
```

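```python
# Get the text value under a node. childNodes[1] is the <Phone> element
# (childNodes[0] is the whitespace text before it); its first child is the
# Text node, read via .data or .nodeValue.
print(DOMTree.getElementsByTagName('PhoneList')[0].childNodes[1].childNodes[0].data)
print(DOMTree.getElementsByTagName('PhoneList')[1].childNodes[1].childNodes[0].nodeValue)
```

```
+86 199 9876 1807
+86 199 9875 1807
```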
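Indexing `childNodes[1]` works only because the indentation in the source creates whitespace-only Text nodes around `<Phone>`; reformatting the XML would shift the indexes. A sketch of two less fragile alternatives; `element_children` is a hypothetical helper, not part of minidom.

```python
from xml.dom.minidom import parseString

doc = parseString("""<Contact>
    <PhoneList>
        <Phone Type="2">+86 199 9876 1807</Phone>
    </PhoneList>
    <PhoneList>
        <Phone Type="2">+86 199 9875 1807</Phone>
    </PhoneList>
</Contact>""")

first_list = doc.getElementsByTagName("PhoneList")[0]
# Whitespace Text, <Phone> element, whitespace Text
print([n.nodeType == n.TEXT_NODE for n in first_list.childNodes])  # [True, False, True]

def element_children(node):
    """Hypothetical helper: child nodes that are elements, skipping text."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]

print(element_children(first_list)[0].firstChild.data)  # +86 199 9876 1807

# Or search for <Phone> directly and ignore the surrounding structure
phones = [p.firstChild.data for p in doc.getElementsByTagName("Phone")]
print(phones)  # ['+86 199 9876 1807', '+86 199 9875 1807']
```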


```python
import xml.dom.minidom

document = """\
<slideshow>
<title>Demo slideshow</title>
<slide><title>Slide title</title>
<point>This is a demo</point>
<point>Of a program for processing slides</point>
</slide>

<slide><title>Another demo slide</title>
<point>It is important</point>
<point>To have more than</point>
<point>one slide</point>
</slide>
</slideshow>
"""

dom = xml.dom.minidom.parseString(document)

def getText(nodelist):
    # Join the data of the Text nodes in nodelist (nested elements are skipped)
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

def handleSlideshow(slideshow):
    # Emit the page: slideshow title, table of contents, then every slide
    print("<html>")
    handleSlideshowTitle(slideshow.getElementsByTagName("title")[0])
    slides = slideshow.getElementsByTagName("slide")
    handleToc(slides)
    handleSlides(slides)
    print("</html>")

def handleSlides(slides):
    for slide in slides:
        handleSlide(slide)

def handleSlide(slide):
    # Each slide becomes an <h2> heading followed by a bullet list
    handleSlideTitle(slide.getElementsByTagName("title")[0])
    handlePoints(slide.getElementsByTagName("point"))

def handleSlideshowTitle(title):
    print("<title>%s</title>" % getText(title.childNodes))

def handleSlideTitle(title):
    print("<h2>%s</h2>" % getText(title.childNodes))

def handlePoints(points):
    print("<ul>")
    for point in points:
        handlePoint(point)
    print("</ul>")

def handlePoint(point):
    print("<li>%s</li>" % getText(point.childNodes))

def handleToc(slides):
    # Table of contents: one <p> per slide title
    for slide in slides:
        title = slide.getElementsByTagName("title")[0]
        print("<p>%s</p>" % getText(title.childNodes))

handleSlideshow(dom)
```

```html
<html>
<title>Demo slideshow</title>
<p>Slide title</p>
<p>Another demo slide</p>
<h2>Slide title</h2>
<ul>
<li>This is a demo</li>
<li>Of a program for processing slides</li>
</ul>
<h2>Another demo slide</h2>
<ul>
<li>It is important</li>
<li>To have more than</li>
<li>one slide</li>
</ul>
</html>
```
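`getText()` in the example above only looks at the direct Text children of the nodes it receives, so text inside nested markup is silently dropped. The sketch below shows this with a made-up `<point>` containing a `<b>` element, next to `get_all_text`, a hypothetical recursive variant that walks the whole subtree.

```python
from xml.dom.minidom import parseString

doc = parseString("<point>It is <b>really</b> important</point>")
point = doc.documentElement

def getText(nodelist):
    # Same logic as the slideshow example: direct Text children only
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

def get_all_text(node):
    """Hypothetical helper: concatenate text from the whole subtree."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return ''.join(get_all_text(child) for child in node.childNodes)

print(repr(getText(point.childNodes)))  # 'It is  important'
print(repr(get_all_text(point)))        # 'It is really important'
```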
