# PyQuery

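Contents of this chapter:

- [Initialization](#initialization)
    - From a string
    - From a URL
    - From a file
- [Basic CSS selectors](#basic-css-selectors)
- [Finding elements](#finding-elements)
    - Child elements
    - Parent elements
    - Sibling elements
- [Traversal](#traversal)
    - Single element
- [Getting information](#getting-information)
    - Attributes
    - Text
    - HTML
- [DOM operations](#dom-operations)

## Initialization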

### Initializing from a string

In [5]:
html = """
<div>
    <ul>
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"></a>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>
"""
from pyquery import PyQuery as pq
doc = pq(html)
print(type(doc))
print(doc('li'))

<class 'pyquery.pyquery.PyQuery'>
<li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"/>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    


### Initializing from a URL

In [20]:
from pyquery import PyQuery as pq
# import requests
# html = requests.get("https://www.baidu.com")
# html.encoding='utf-8'
# doc = pq(html.text)
doc = pq(url="https://www.baidu.com", encoding="utf-8")  # Chinese pages are prone to garbled characters
print(doc('head'))

<head><meta http-equiv="content-type" content="text/html;charset=utf-8"/><meta http-equiv="X-UA-Compatible" content="IE=Edge"/><meta content="always" name="referrer"/><link rel="stylesheet" type="text/css" href="https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/bdorz/baidu.min.css"/><title>百度一下，你就知道</title></head> 


### Initializing from a file

In [25]:
from pyquery import PyQuery as pq

# with open("demo_file/books_douban.txt", encoding='utf-8') as f:
#     html = f.read()
#     doc = pq(html)
#     print(doc('img'))

doc = pq(filename="demo_file/demo.txt",encoding="utf-8")
print(doc("li"))

<li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"/>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0 active"><a href="link5.html">fifth item</a></li>
    


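Summary: PyQuery's built-in initialization does not handle Chinese text very well; the content has to be converted to UTF-8 by other means before it is recognized correctly. For URL initialization, fetch the HTML with the requests library first and then hand it to PyQuery. For file initialization, open the file with Python's built-in `open` function and then pass its content to PyQuery.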

## Basic CSS selectors

In [33]:
html = """
<div id="container">
    <ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"></a>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>
"""
from pyquery import PyQuery as pq
doc = pq(html)
print(doc('#container .list .active'))

<li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        


## Finding elements

### Child elements

In [34]:
html = """
<div id="container">
    <ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"></a>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>
"""
from pyquery import PyQuery as pq
doc = pq(html)
items = doc('.list')
print(type(items))
print(items)
lis = items.find('li')
print(type(lis))
print(lis)

<class 'pyquery.pyquery.PyQuery'>
<ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"/>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>

<class 'pyquery.pyquery.PyQuery'>
<li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"/>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    


In [35]:
lis2 = items.children()
print(type(lis2))
print(lis2)

<class 'pyquery.pyquery.PyQuery'>
<li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"/>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    


In [36]:
print(items.children('.active'))

<li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        


### Parent elements



In [42]:
html = """
<div id="wraper">
<div id="container">
    <ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"></a>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>
</div>
"""
from pyquery import PyQuery as pq
doc = pq(html)
items = doc('.list')
parent = items.parent()
print(type(parent))
print(parent)


<class 'pyquery.pyquery.PyQuery'>
<div id="container">
    <ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"/>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>



In [43]:
parents = items.parents()
print(parents)

<div id="wraper">
<div id="container">
    <ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"/>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>
</div><div id="container">
    <ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"/>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>



In [45]:
parent = items.parents('#wraper')
print(parent)

<div id="wraper">
<div id="container">
    <ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"/>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>
</div>


### Sibling elements

In [51]:
html = """
<div id="wraper">
<div id="container">
    <ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"></a>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>
</div>
"""
from pyquery import PyQuery as pq
doc = pq(html)
items = doc('.list .item-0.active')
sibling = items.siblings()
print(type(sibling))
print(sibling)

print('\n\n\n',items.siblings('.active'))

<class 'pyquery.pyquery.PyQuery'>
<li class="item-1"><a href="link2.html"/>second item</li>
        <li class="item-0">first item</li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    



 <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        


## Traversal

### Single element

In [53]:
html = """
<div id="wraper">
<div id="container">
    <ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"></a>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>
</div>
"""
from pyquery import PyQuery as pq
doc = pq(html)
item = doc('.list .item-0.active')
print(item)

<li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        


In [75]:
items = doc('li')
print(items.items())
for i in items.items():
    print(i)

<generator object PyQuery.items at 0x00000139169333B8>
<li class="item-0">first item</li>
        
<li class="item-1"><a href="link2.html"/>second item</li>
        
<li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
        
<li class="item-0"><a href="link5.html">fifth item</a></li>
    


## Getting information

### Attributes

In [68]:
html = """
<div id="wraper">
<div id="container">
    <ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"></a>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>
</div>
"""
from pyquery import PyQuery as pq
doc = pq(html)
a = doc('.item-0.active a')
print(a)
print(a.attr['href'])
print(a.attr.href)

<a href="link3.html"><span class="blod">third item</span></a>
link3.html
link3.html


### Text and HTML

In [78]:
print(a.text())
print(a.html())

third item
<span class="blod">third item</span>


## DOM operations

### addClass, removeClass

In [80]:
html = """
<div id="wraper">
<div id="container">
    <ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html"></a>second item</li>
        <li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>
</div>
"""
from pyquery import PyQuery as pq
doc = pq(html)
li = doc('.item-0.active')
print(li)
li.removeClass('active')
print(li)
li.addClass('active')
print(li)

<li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        
<li class="item-0"><a href="link3.html"><span class="blod">third item</span></a></li>
        
<li class="item-0 active"><a href="link3.html"><span class="blod">third item</span></a></li>
        
