# Modifying the tree

Beautiful Soup's main strength is searching the parse tree, but it can also modify the tree.

## Changing tag names and attributes

```python
from bs4 import BeautifulSoup
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'lxml')
tag = soup.b
tag.name = "blockquote"
tag['class'] = 'verybold'
tag['id'] = 1
tag
```

```
<blockquote class="verybold" id="1">Extremely bold</blockquote>
```

```python
del tag["class"]
del tag["id"]
tag
```

```
<blockquote>Extremely bold</blockquote>
```
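Because a tag's attributes are stored in an ordinary dictionary (tag.attrs), the usual dict operations also work. The sketch below is illustrative; it uses the built-in 'html.parser' instead of 'lxml' so it runs without installing lxml. It shows a bulk update, a multi-valued class attribute assigned as a list, and a safe lookup with .get():

```python
from bs4 import BeautifulSoup

# 'html.parser' is used (instead of 'lxml') so no extra parser is needed.
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
tag = soup.b

# tag.attrs is a plain dict, so dict-style updates work.
tag.attrs.update({"id": "intro", "title": "demo"})

# A list assigned to a multi-valued attribute such as class
# is rendered as space-separated values.
tag["class"] = ["bold", "large"]

# .get() returns a default instead of raising KeyError for a missing attribute.
missing = tag.get("data-x", "n/a")

print(tag)
print(missing)
```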

## Modifying .string

Assigning a value to a tag's .string attribute replaces the tag's existing contents with the new string:

```python
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'lxml')
tag = soup.a
tag.string = "New link text."
tag
```

```
<a href="http://example.com/">New link text.</a>
```

**If the tag contains other tags, assigning to its .string attribute overwrites all of its existing contents, child tags included.**
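A short sketch of both sides of this behavior (using 'html.parser' rather than 'lxml' purely so it runs without lxml): reading .string on a tag with more than one child gives None, and writing .string throws away the child <i> tag along with the text:

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")
tag = soup.a

# The <a> tag has two children (a string and <i>), so .string is None.
before = tag.string
print(before)

# Assigning to .string discards every child, including the <i> tag.
tag.string = "New link text."
print(tag.contents)
print(tag.find("i"))  # no <i> tag is left
```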

## append()

Tag.append() adds content to a tag, just like calling .append() on a Python list:

```python
soup = BeautifulSoup("<a>Foo</a>", 'lxml')
soup.a.append("Bar")
soup
```

```
<html><body><a>FooBar</a></body></html>
```

```python
soup.prettify()
```

```
'<html>\n <body>\n  <a>\n   Foo\n   Bar\n  </a>\n </body>\n</html>'
```

```python
soup.a.contents
```

```
['Foo', 'Bar']
```
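One place where the list analogy breaks down: a node in the tree has exactly one parent, so appending an element that is already in the tree moves it instead of copying it. A minimal sketch, assuming the built-in 'html.parser' (chosen only so the example needs no lxml):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><a>Foo</a></div><p></p>", "html.parser")

# The <a> tag is detached from <div> and re-attached under <p>.
soup.p.append(soup.a)
print(soup)
```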

## BeautifulSoup.new_string() and .new_tag()

### Adding text

To add a string to the document, either pass a Python string to append(), or call the factory method BeautifulSoup.new_string():

```python
soup = BeautifulSoup("<b></b>", 'lxml')
tag = soup.b
tag.append("Hello")
new_string = soup.new_string(" there")
tag.append(new_string)
tag
```

```
<b>Hello there</b>
```

```python
tag.contents
```

```
['Hello', ' there']
```
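Either way, the child ends up as a NavigableString: append() converts a plain Python str for you. The sketch below checks that (it uses 'html.parser' instead of 'lxml' only to avoid the extra dependency):

```python
from bs4 import BeautifulSoup, NavigableString

soup = BeautifulSoup("<b></b>", "html.parser")
tag = soup.b
tag.append("Hello")                    # plain str
tag.append(soup.new_string(" there"))  # explicit factory call

# Both children are NavigableString objects, not plain str.
kinds = [type(child).__name__ for child in tag.contents]
print(kinds)
print(all(isinstance(child, NavigableString) for child in tag.contents))
print(tag.get_text())
```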

### Creating a comment or any other NavigableString subclass

Pass the subclass as the second argument to new_string():

```python
from bs4 import Comment
new_comment = soup.new_string("Nice to see you.", Comment)
tag.append(new_comment)
tag
```

```
<b>Hello there<!--Nice to see you.--></b>
```

```python
tag.contents
```

```
['Hello', ' there', 'Nice to see you.']
```
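The same pattern works for other NavigableString subclasses that bs4 exports, such as CData. A small sketch (with 'html.parser' assumed instead of 'lxml' so it runs without lxml); note that the subclass controls how the string is rendered, while the plain-list view of .contents hides the difference:

```python
from bs4 import BeautifulSoup, CData, Comment

soup = BeautifulSoup("<b></b>", "html.parser")
tag = soup.b

# The second argument selects the NavigableString subclass to create.
tag.append(soup.new_string("A CDATA block", CData))
tag.append(soup.new_string("note", Comment))

print(tag)
print(isinstance(tag.contents[1], Comment))
```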

### The best way to create a new tag is to call the factory method BeautifulSoup.new_tag()

```python
soup = BeautifulSoup("<b></b>", 'lxml')
original_tag = soup.b
```

```python
new_tag = soup.new_tag("a", href="http://www.example.com")
original_tag.append(new_tag)
original_tag
```

```
<b><a href="http://www.example.com"></a></b>
```

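The first argument, the tag name, is required. The remaining arguments are optional; keyword arguments such as href become attributes of the new tag.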
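Attribute names that are not valid Python identifiers (data-id, for example) cannot be passed as keyword arguments. As far as we know, new_tag() also accepts an attrs dictionary for this case; the sketch below assumes that parameter and uses 'html.parser' so it runs without lxml. The names nav, /home and data-id are illustrative only:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<nav></nav>", "html.parser")

# href is passed as a keyword; data-id goes through attrs because of the hyphen.
link = soup.new_tag("a", href="/home", attrs={"data-id": "7"})
link.string = "Home"
soup.nav.append(link)

print(soup)
```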

## insert()

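tag.insert() is like tag.append(), except that the new element does not have to go at the end of the parent's .contents: it is inserted at whatever numeric position you specify, exactly like .insert() on a Python list.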

```python
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'lxml')
tag = soup.a
tag.insert(1, "but did not endorse ")
tag
```

```
<a href="http://example.com/">I linked to but did not endorse <i>example.com</i></a>
```

```python
tag.contents
```

```
['I linked to ', 'but did not endorse ', <i>example.com</i>]
```
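Two edge cases of the list-style indexing, sketched under the assumption of the built-in 'html.parser' (used only so the example runs without lxml): position 0 puts the element first, and position len(tag.contents) behaves like append(). Adjacent strings are not merged; they stay separate entries in .contents:

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")
tag = soup.a

# Position 0: the new string becomes the first child.
tag.insert(0, soup.new_string("Note: "))
# Position len(contents): same effect as append().
tag.insert(len(tag.contents), soup.new_string("!"))

print(tag.contents)
print(tag)
```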

## insert_before() and insert_after()

insert_before() inserts content immediately before the current tag or string in the tree:

```python
soup = BeautifulSoup("<b>stop</b>", 'lxml')
tag = soup.new_tag("i")
tag.string = "Don't"
soup.b.string.insert_before(tag)
soup.b
```

```
<b><i>Don't</i>stop</b>
```

insert_after() inserts content immediately after the current tag or string in the tree:

```python
soup.b.i.insert_after(soup.new_string(" ever "))
soup.b
```

```
<b><i>Don't</i> ever stop</b>
```

```python
soup.b.contents
```

```
[<i>Don't</i>, ' ever ', 'stop']
```
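Unlike append() and insert(), which add children, these two methods add siblings: the new nodes become children of the current element's parent. A minimal sketch (built-in 'html.parser' assumed so no lxml is needed):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>stop</b></p>", "html.parser")
b = soup.b

# Both new strings land next to <b>, inside <p>.
b.insert_before(soup.new_string("Please "))
b.insert_after(soup.new_string(" now"))

print(soup.p)
print(repr(b.previous_sibling), repr(b.next_sibling))
```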

## clear()

tag.clear() removes all of a tag's contents (the tag itself and its attributes stay):

```python
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'lxml')
tag = soup.a
tag.clear()
tag
```

```
<a href="http://example.com/"></a>
```
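With its default arguments, clear() extracts the children rather than destroying them, so a reference you kept to a former child is still a usable, now detached, element. A sketch assuming 'html.parser' (only to avoid the lxml dependency):

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")
tag = soup.a
i_tag = soup.i  # keep a reference to a child before clearing

tag.clear()

print(tag)           # <a> keeps its attributes but has no contents
print(i_tag)         # the former child still exists...
print(i_tag.parent)  # ...but is no longer attached to the tree
```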

## extract()

PageElement.extract() removes a tag or string from the tree and returns the removed element:

```python
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'lxml')
a_tag = soup.a
i_tag = soup.i.extract()
a_tag
```

```
<a href="http://example.com/">I linked to </a>
```

```python
i_tag
```

```
<i>example.com</i>
```

```python
print(i_tag.parent)
```

```
None
```


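This method actually leaves you with two trees: one rooted at the BeautifulSoup object you used to parse the document, and one rooted at the tag that was extracted. You can go on to call extract() on a child of the extracted element: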

```python
my_string = i_tag.string.extract()
my_string
```

```
'example.com'
```

```python
i_tag
```

```
<i></i>
```
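Because extract() hands back the detached element, it can be re-attached anywhere, which makes extract() plus append() a simple way to move content around. A sketch assuming 'html.parser' (used so it runs without lxml):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>one</p><p>two</p></div><footer></footer>", "html.parser")

# Detach the first <p>; it now roots its own small tree.
first = soup.p.extract()
detached_parent = first.parent

# Re-attach it at a different place in the original tree.
soup.footer.append(first)
print(soup)
```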

## decompose()

tag.decompose() removes a tag from the tree and then completely destroys it together with its contents:

```python
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'lxml')
a_tag = soup.a
soup.i.decompose()
a_tag
```

```
<a href="http://example.com/">I linked to </a>
```
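Unlike extract(), decompose() returns nothing useful, so it fits cases where the removed content is never needed again. A typical cleanup pattern, sketched here with 'html.parser' (assumed only so no lxml is required), drops every <script> tag before pulling out the text:

```python
from bs4 import BeautifulSoup

html = "<div><script>track()</script><p>Text</p><script>more()</script></div>"
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list, so destroying tags while looping over it is safe.
for script in soup.find_all("script"):
    script.decompose()

print(soup)
print(soup.get_text())
```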

## replace_with()

PageElement.replace_with() removes an element from the tree and replaces it with the tag or string of your choice:

```python
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'lxml')
a_tag = soup.a
new_tag = soup.new_tag("b")
new_tag.string = "example.net"
a_tag.i.replace_with(new_tag)
a_tag
```

```
<a href="http://example.com/">I linked to <b>example.net</b></a>
```

replace_with() returns the tag or string that was replaced, so you can examine it or add it back to another part of the tree.
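The sketch below captures that return value and also replaces a tag with a plain string instead of a new tag (it uses 'html.parser' rather than 'lxml' only so it runs without lxml):

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")

# The <i> tag is swapped for a string; the detached tag is returned.
old = soup.i.replace_with("example.net")

print(soup.a)
print(old, old.parent)
```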

## wrap()

PageElement.wrap() wraps an element in the tag you specify and returns the new wrapper:

```python
soup = BeautifulSoup("<p>I wish I was bold.</p>", 'lxml')
soup.p.string.wrap(soup.new_tag("b"))
```

```
<b>I wish I was bold.</b>
```

```python
soup.p.wrap(soup.new_tag("div"))
```

```
<div><p><b>I wish I was bold.</b></p></div>
```
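Each call needs its own wrapper tag, so wrapping several elements means creating a new tag per element. A sketch, assuming 'html.parser' (to avoid lxml) and the attrs argument of new_tag(); the section tag and card class are illustrative names only:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>one</p><p>two</p>", "html.parser")

# A fresh <section class="card"> per <p>; wrap() returns that wrapper.
wrappers = [p.wrap(soup.new_tag("section", attrs={"class": "card"}))
            for p in soup.find_all("p")]

print(soup)
print([w.name for w in wrappers])
```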

## unwrap()

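tag.unwrap() is the opposite of wrap(): it replaces a tag with whatever is inside that tag, removing the tag itself while keeping its contents in place. It is often used to strip out markup: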

```python
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'lxml')
a_tag = soup.a
a_tag.i.unwrap()
a_tag
```

```
<a href="http://example.com/">I linked to example.com</a>
```

Like replace_with(), unwrap() returns the tag that was removed.
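A sketch of the markup-stripping use case, assuming 'html.parser' so it runs without lxml: every <span> is removed while its text stays where it was. Since the contents have been moved out into the parent, the tag that unwrap() returns is now empty:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello <span>big</span> <span>world</span></p>", "html.parser")

# Unwrap every <span>; each call returns the (now empty) removed tag.
removed = [span.unwrap() for span in soup.find_all("span")]

print(soup.p)
print(removed[0])
```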