## Manipulating HTML Content

Let us understand how to manipulate HTML content using the APIs provided by BeautifulSoup.

* `decompose` - removes the tag along with its content.
* `unwrap` - removes the tag while retaining its content.
* We can also change the attributes of a tag by assigning values to it using dict-style syntax (for example, `tag['class'] = 'special'`).
* We can also enclose existing content or tags in new tags.


In [93]:
html_str = """
<p>Some Text</p>
<table>
    <tbody>
        <tr>
            <th>Details</th>
            <th>URL</th>
        </tr>
        <tr>
            <td>Video Content</td>
            <td><a href="https://www.youtube.com/itversityin">YouTube Channel</a>
            </td>
        </tr>
        <tr>
            <td>Reference Material</td>
            <td><a href="https://www.github.com/dgadiraju/itversity-books">GitHub Repository</a>
            </td>
        </tr>
    </tbody>
</table>
"""

In [94]:
from IPython.display import HTML, display
display(HTML(html_str))

| Details | URL |
| --- | --- |
| Video Content | YouTube Channel |
| Reference Material | GitHub Repository |


In [95]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_str, 'html.parser')
print(soup.prettify())

<p>
 Some Text
</p>
<table>
 <tbody>
  <tr>
   <th>
    Details
   </th>
   <th>
    URL
   </th>
  </tr>
  <tr>
   <td>
    Video Content
   </td>
   <td>
    <a href="https://www.youtube.com/itversityin">
     YouTube Channel
    </a>
   </td>
  </tr>
  <tr>
   <td>
    Reference Material
   </td>
   <td>
    <a href="https://www.github.com/dgadiraju/itversity-books">
     GitHub Repository
    </a>
   </td>
  </tr>
 </tbody>
</table>



### Using decompose

In [96]:
p = soup.find('p')

In [97]:
p.decompose()

In [98]:
soup



<table>
<tbody>
<tr>
<th>Details</th>
<th>URL</th>
</tr>
<tr>
<td>Video Content</td>
<td><a href="https://www.youtube.com/itversityin">YouTube Channel</a>
</td>
</tr>
<tr>
<td>Reference Material</td>
<td><a href="https://www.github.com/dgadiraju/itversity-books">GitHub Repository</a>
</td>
</tr>
</tbody>
</table>
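
As a side note, `decompose` destroys the tag and returns nothing. When the removed tag is still needed afterwards, BeautifulSoup's `extract` detaches it from the tree and returns it instead. The following is a minimal sketch of the difference; the `demo_html` string and the variable names are made up for illustration and are not part of the notebook above.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet used only for this comparison
demo_html = "<div><p>Some Text</p><span>Keep me</span></div>"

# decompose() removes the tag and its content, and returns None
demo_soup = BeautifulSoup(demo_html, "html.parser")
decompose_result = demo_soup.find("p").decompose()
after_decompose = str(demo_soup)

# extract() removes the tag from the tree, but returns it for later use
demo_soup = BeautifulSoup(demo_html, "html.parser")
extracted = demo_soup.find("p").extract()
after_extract = str(demo_soup)

print(after_decompose)   # <div><span>Keep me</span></div>
print(decompose_result)  # None
print(after_extract)     # <div><span>Keep me</span></div>
print(extracted)         # <p>Some Text</p>
```

Both calls leave the same document behind; they differ only in whether the removed tag remains available.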

### Using unwrap

In [99]:
a = soup.find('a')

In [100]:
a

<a href="https://www.youtube.com/itversityin">YouTube Channel</a>

In [101]:
a.unwrap()

<a href="https://www.youtube.com/itversityin"></a>

In [102]:
soup



<table>
<tbody>
<tr>
<th>Details</th>
<th>URL</th>
</tr>
<tr>
<td>Video Content</td>
<td>YouTube Channel
</td>
</tr>
<tr>
<td>Reference Material</td>
<td><a href="https://www.github.com/dgadiraju/itversity-books">GitHub Repository</a>
</td>
</tr>
</tbody>
</table>

In [103]:
from IPython.display import display, HTML
display(HTML(str(soup)))

| Details | URL |
| --- | --- |
| Video Content | YouTube Channel |
| Reference Material | GitHub Repository |
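
The example above unwraps only the first `a` tag returned by `find`. To strip every link while keeping the link text, one possible approach is to loop over `find_all`. This is a sketch on a small, made-up list (`links_html` is an assumption, reusing the URLs from the table above).

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with two links, used only for this sketch
links_html = (
    '<ul>'
    '<li><a href="https://www.youtube.com/itversityin">YouTube Channel</a></li>'
    '<li><a href="https://www.github.com/dgadiraju/itversity-books">GitHub Repository</a></li>'
    '</ul>'
)
links_soup = BeautifulSoup(links_html, "html.parser")

# find_all returns a list-like result, so the tree can be modified while looping
for link in links_soup.find_all("a"):
    link.unwrap()  # drop the <a> tag, keep its text

print(links_soup)
# <ul><li>YouTube Channel</li><li>GitHub Repository</li></ul>
```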


### Updating Tag Attribute

In [104]:
for tag in soup.find_all('tr'):
    print(tag)

<tr>
<th>Details</th>
<th>URL</th>
</tr>
<tr>
<td>Video Content</td>
<td>YouTube Channel
</td>
</tr>
<tr>
<td>Reference Material</td>
<td><a href="https://www.github.com/dgadiraju/itversity-books">GitHub Repository</a>
</td>
</tr>


In [105]:
for tag in soup.find_all('tr'):
    print(tag['class'])

KeyError: 'class'

In [None]:
for tag in soup.find_all('tr'):
    tag['class'] = 'special'

In [None]:
for tag in soup.find_all('tr'):
    print(tag['class'])

In [None]:
soup
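
The `KeyError` above is raised because the `tr` tags have no `class` attribute yet. Below is a small sketch of reading, setting and deleting attributes without hitting that error; the `row_soup` snippet and the `id` value are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical single-row table used only for this sketch
row_soup = BeautifulSoup(
    "<table><tr><td>Video Content</td></tr></table>", "html.parser"
)
row = row_soup.find("tr")

# get() returns None instead of raising KeyError for a missing attribute
missing = row.get("class")
print(missing)  # None

# Attributes are set with dict-style assignment
row["class"] = "special"
row["id"] = "first-row"
print(row["class"])  # special

# Attributes are removed with del
del row["id"]
print(row)  # <tr class="special"><td>Video Content</td></tr>
```

Using `get` is handy when only some of the tags carry the attribute, as is the case before the assignment loop runs.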

### Wrapping Text

In [106]:
strong = soup.new_tag('strong')

In [107]:
strong

<strong></strong>

In [108]:
type(strong)

bs4.element.Tag

In [109]:
td = soup.find('td')
td

<td>Video Content</td>

In [110]:
td.text

'Video Content'

In [111]:
strong.insert(0, td.text)

In [112]:
strong

<strong>Video Content</strong>

In [113]:
td.string = ''

In [114]:
td

<td></td>

In [115]:
td.insert(0, strong)

In [117]:
soup



<table>
<tbody>
<tr>
<th>Details</th>
<th>URL</th>
</tr>
<tr>
<td><strong>Video Content</strong></td>
<td>YouTube Channel
</td>
</tr>
<tr>
<td>Reference Material</td>
<td><a href="https://www.github.com/dgadiraju/itversity-books">GitHub Repository</a>
</td>
</tr>
</tbody>
</table>

In [118]:
for tag in soup.find_all('td'):
    if not tag.find('a'):
        strong = soup.new_tag('strong')
        strong.insert(0, tag.text)
        tag.string = ''
        tag.insert(0, strong)

In [119]:
soup



<table>
<tbody>
<tr>
<th>Details</th>
<th>URL</th>
</tr>
<tr>
<td><strong>Video Content</strong></td>
<td><strong>YouTube Channel
</strong></td>
</tr>
<tr>
<td><strong>Reference Material</strong></td>
<td><a href="https://www.github.com/dgadiraju/itversity-books">GitHub Repository</a>
</td>
</tr>
</tbody>
</table>

In [None]:
from IPython.display import HTML, display
display(HTML(str(soup)))
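
The steps above build the `strong` tag by hand: create it, copy the text in, clear the cell, then insert the new tag. BeautifulSoup also provides a `wrap` method that encloses an element in a new tag in one call. The sketch below shows this as an alternative, not as the approach used in the notebook; `cell_soup` and its contents are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical row with one plain-text cell and one cell holding a link
cell_soup = BeautifulSoup(
    '<table><tr><td>Video Content</td><td><a href="#">Link</a></td></tr></table>',
    "html.parser",
)

for cell in cell_soup.find_all("td"):
    # Only wrap cells that hold a single text node and no link
    if cell.find("a") is None and cell.string is not None:
        cell.string.wrap(cell_soup.new_tag("strong"))

print(cell_soup)
# <table><tr><td><strong>Video Content</strong></td><td><a href="#">Link</a></td></tr></table>
```

The `cell.string is not None` check skips cells that are empty or contain more than one child, since `string` is `None` in those cases.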