#### Briefly describe the basic elements of a tag in the BeautifulSoup class

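Tag: the tag itself, the most basic unit for organizing information; its start and end are marked with `<>` and `</>`

Name: the tag's name; the name of `<p>`…`</p>` is 'p'. Format: `<tag>.name`

Attributes: the tag's attributes, organized as a dictionary. Format: `<tag>.attrs`

NavigableString: the non-attribute string inside a tag, i.e. the string between `<>` and `</>`. Format: `<tag>.string`

Comment: the comment part of a string inside a tag; a special subclass of NavigableString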
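
As a minimal sketch of the five elements above, the following parses a small HTML snippet and reads each one back. The snippet and the variable names are illustrative assumptions, not part of the demo page used later.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

# Hypothetical snippet for illustration only
html = '<p class="title" id="intro"><b>Hello</b></p><p><!--a note--></p>'
soup = BeautifulSoup(html, "html.parser")

first_p = soup.p              # Tag: the first <p> in the document
tag_name = first_p.name       # Name of the tag
tag_attrs = first_p.attrs     # Attributes as a dict; class is multi-valued, so it is a list
tag_string = first_p.string   # NavigableString reached through the single child <b>

second_p = soup.find_all("p")[1]
comment = second_p.string     # Comment object; the <!-- --> markers are not part of its text

print(tag_name, tag_attrs, tag_string, type(comment).__name__)
```

Because `.string` returns a Comment the same way it returns ordinary text, checking `isinstance(node, Comment)` is the usual way to tell the two apart.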


#### Briefly describe what contents, children and descendants do in the BeautifulSoup class

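.contents: a list of child nodes; all direct children of `<tag>` are stored in the list

.children: an iterator over child nodes; similar to .contents, used to loop over the direct children

.descendants: an iterator over descendant nodes; it covers every node below the tag, at any depth, and is used to loop over them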
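
The difference between the three attributes can be sketched on a small tree. The HTML string below is an assumed example; text nodes have no tag name, so they show up as `None`.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration only
html = "<div><p>one <b>bold</b></p><span>two</span></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# .contents is a real list of the direct children
contents_names = [node.name for node in div.contents]

# .children iterates over the same direct children without building a list
children_names = [node.name for node in div.children]

# .descendants walks every node below div, depth first, text nodes included
descendant_names = [node.name for node in div.descendants]

print(contents_names)
print(children_names)
print(descendant_names)
```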


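#### Explain what parallel traversal is

Parallel traversal is traversal among nodes at the same level of the tree structure. In BeautifulSoup it takes place among the nodes that share the same parent, through attributes such as `.next_sibling` and `.previous_sibling`.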
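
A minimal sketch of parallel (sibling) traversal, assuming a small list with no whitespace between the items so that every sibling is a tag. In real pages the siblings of a tag are often whitespace strings rather than tags.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration only
html = "<ul><li>a</li><li>b</li><li>c</li></ul>"
soup = BeautifulSoup(html, "html.parser")
items = soup.find_all("li")
second = items[1]

prev_text = second.previous_sibling.string   # the node just before, under the same parent
next_text = second.next_sibling.string       # the node just after, under the same parent

# The plural forms iterate over all remaining siblings in that direction
following = [li.string for li in soup.li.next_siblings]
preceding = [li.string for li in second.previous_siblings]

# Past the last sibling there is nothing left, so the attribute is None
after_last = items[2].next_sibling

print(prev_text, next_text, following, preceding, after_last)
```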

#### For the page http://python123.io/ws/demo.html, write a program that does the following:

In [9]:
import requests as rts
import bs4

r = bs4.BeautifulSoup(rts.get("http://python123.io/ws/demo.html").text, "html.parser")

print(r)

<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
</body></html>


##### (1) Print the name, attributes, and non-attribute string of the first a tag

In [4]:
a = r.find("a")
print(f"Name: {a.name}, attributes: {a.attrs}, string: {a.string}")

Name: a, attributes: {'href': 'http://www.icourse163.org/course/BIT-268001', 'class': ['py1'], 'id': 'link1'}, string: Basic Python


##### (2) Use contents to print the names of all child tags of the body tag

In [8]:
for i in r.body.contents:
    print(i.name)

None
p
None
p
None
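
The `None` entries in the output above come from the newline strings between the tags, which are NavigableString nodes without a tag name. A possible way to keep only real tags is to filter on the node type. This is a sketch on an assumed inline snippet that mimics the layout of the demo page body, not the page itself.

```python
import bs4

# Hypothetical snippet with newlines between the tags, as in the demo page
html = "<body>\n<p>first</p>\n<p>second</p>\n</body>"
soup = bs4.BeautifulSoup(html, "html.parser")

# Unfiltered: whitespace strings are children too and have no name
all_names = [node.name for node in soup.body.contents]

# Filtered: keep only nodes that are Tag objects
tag_names = [node.name for node in soup.body.contents
             if isinstance(node, bs4.element.Tag)]

print(all_names)
print(tag_names)
```

The same filter works unchanged with `.children` and `.descendants`.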


##### (3) Use children to print the names of all child tags of the body tag

In [10]:
for i in r.body.children:
    print(i.name)

None
p
None
p
None


##### (4) Print the names of all tags on the page

In [11]:
for i in r.body.descendants:
    print(i.name)

None
p
b
None
None
p
None
a
None
None
a
None
None
None
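
The loop above starts from `r.body`, so it only reaches the nodes below `<body>` and also prints `None` for every text node. One alternative that covers the whole document and returns tags only is `find_all(True)`. The sketch below uses an assumed inline snippet rather than the demo page.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration only
html = ("<html><head><title>t</title></head>"
        "<body><p>x <a href='#'>y</a></p></body></html>")
soup = BeautifulSoup(html, "html.parser")

# find_all(True) matches every tag in the document, in document order
names = [tag.name for tag in soup.find_all(True)]

print(names)
```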


##### (5) Print the value of the href attribute of every a tag

In [15]:
for i in r.find_all("a"):
    print(i.attrs["href"])

http://www.icourse163.org/course/BIT-268001
http://www.icourse163.org/course/BIT-1001870001


##### (6) Print all descendant tags of the first a tag

In [16]:
for i in r.find("a").descendants:
    print(i.name)

None


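#### Based on the given "Example 1: targeted crawler for Chinese university rankings", implement the program yourself. Requirement: add one feature, so that the ranking information is not only printed to the screen but also saved locally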

In [None]:

import requests as rts
import bs4

url = "https://www.shanghairanking.cn/rankings/bcur/2022"

r = rts.get(url)
r.encoding = r.apparent_encoding
rb = bs4.BeautifulSoup(r.text, 'html.parser')

# All table cells on the page; each ranking row is taken to span 6 consecutive <td> cells
rank = rb.find_all('td')

result = [["Rank", "University", "Score"]]

for i in range(0, len(rank), 6):
    sub = []
    for j in range(6):
        if j in (2, 3, 4):
            continue  # columns that are not reported
        if j == 1:
            # the university name sits inside an <a> tag
            sub.append(rank[i+j].find('a').text.strip())
            continue
        sub.append(rank[i+j].text.strip())
    result.append(sub)

# Print the rows to the screen
for i in result:
    print(f"{i[0]}\t\t{i[1]}\t\t\t{i[2]}")

# Save the same rows locally; UTF-8 keeps the Chinese university names intact
with open("ranking.txt", "w", encoding="utf-8") as f:
    for i in result:
        f.write(f"{i[0]}\t\t{i[1]}\t\t\t{i[2]}\n")
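
If the saved file is meant to be opened in a spreadsheet or read back by another program, writing CSV instead of tab-padded text may be more convenient. The following is a sketch only: `save_rows`, the sample rows, and the output path are hypothetical names and placeholder data, not values taken from the ranking page.

```python
import csv
import os
import tempfile

def save_rows(rows, path):
    """Write a list of rows to a UTF-8 CSV file (hypothetical helper)."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

# Placeholder data standing in for the scraped result list
sample_rows = [["Rank", "University", "Score"],
               ["1", "University A", "99.9"],
               ["2", "University B", "98.7"]]

out_path = os.path.join(tempfile.gettempdir(), "ranking_sample.csv")
save_rows(sample_rows, out_path)

# Read the file back to confirm the rows survive the round trip
with open(out_path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

print(read_back)
```

In the program above, the same helper could be called as `save_rows(result, "ranking.csv")` after the loop that builds `result`.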