# Web Scraping and Parsing Using Beautiful Soup (bs4) in Python - Tutorial 33 in Anaconda

## Working with objects

In [6]:
# ! pipenv install beautifulsoup4

In [7]:
from bs4 import BeautifulSoup

## The BeautifulSoup Object

In [12]:
filename = 'data/DSFD_Listing.html'

# Read the saved HTML page into a single string
with open(filename, 'r') as f:
    html_doc = f.read()

In [14]:
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup)

<html>
<head>
<title>Best Books</title>
</head>
<body>
<p class="title"><b>DATA SCIENCE FOR DUMMIES</b></p>
<p class="description">Jobs in data science abound, but few people have the data science skills needed to fill these
    increasingly important roles in organizations. Data Science For Dummies is the pe
    <br/><br/>
    Edition 1 of this book:
    <br/>
<ul>
<li>Provides a background in data science fundamentals before moving on to working with relational databases and
        unstructured data and preparing your data for analysis</li>
<li>Details different data visualization techniques that can be used to showcase and summarize your data</li>
<li>Explains both supervised and unsupervised machine learning, including regression, model validation, and
        clustering techniques</li>
<li>Includes coverage of big data processing tools like MapReduce, Hadoop, Storm, and Spark</li>
</ul>
<br/><br/>
    What to do next:
    <br/>
<a class="preview" href="http://www.data-mania.com/b

In [15]:
print(soup.prettify())

<html>
 <head>
  <title>
   Best Books
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    DATA SCIENCE FOR DUMMIES
   </b>
  </p>
  <p class="description">
   Jobs in data science abound, but few people have the data science skills needed to fill these
    increasingly important roles in organizations. Data Science For Dummies is the pe
   <br/>
   <br/>
   Edition 1 of this book:
   <br/>
   <ul>
    <li>
     Provides a background in data science fundamentals before moving on to working with relational databases and
        unstructured data and preparing your data for analysis
    </li>
    <li>
     Details different data visualization techniques that can be used to showcase and summarize your data
    </li>
    <li>
     Explains both supervised and unsupervised machine learning, including regression, model validation, and
        clustering techniques
    </li>
    <li>
     Includes coverage of big data processing tools like MapReduce, Hadoop, Storm, and Spark
    </li>


## Tag Objects

### Working with names

In [17]:
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup('<b body="description">Product Description</b>', 'html.parser')

In [19]:
tag = soup.b

In [21]:
print(tag)

<b body="description">Product Description</b>


In [22]:
tag.name

'b'

In [24]:
tag.name = 'p'
tag

<p body="description">Product Description</p>
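Renaming a tag changes the parsed tree itself, not just the variable that points to it. The short sketch below rebuilds the same one-tag document and checks that after the rename the soup no longer contains a `b` tag. The variable names `rendered`, `old_lookup` and `new_lookup` are illustrative only.

```python
from bs4 import BeautifulSoup

# Same one-tag document as above, parsed with the built-in parser
soup = BeautifulSoup('<b body="description">Product Description</b>', 'html.parser')
tag = soup.b

# Renaming the tag edits the tree in place
tag.name = 'p'

rendered = str(soup)     # the whole document now uses <p>
old_lookup = soup.b      # no <b> tag is left, so this is None
new_lookup = soup.p      # the renamed tag is found under its new name

print(rendered)
print(old_lookup)
print(new_lookup is tag)
```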

### Working with attributes

In [26]:
tag['body']

'description'

In [27]:
tag.attrs

{'body': 'description'}

In [30]:
tag['id'] = 3  # stored as the int 3 in tag.attrs, rendered as id="3"
tag

<p body="description" id="3">Product Description</p>

In [32]:
del tag['body']

In [33]:
tag

<p id="3">Product Description</p>

In [35]:
tag.attrs

{'id': 3}
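Two details of attribute access are easy to miss: indexing a missing attribute with `tag[...]` raises a `KeyError`, and `class` is treated as a multi-valued attribute, so it comes back as a list. The sketch below uses a made-up `<p>` tag (not part of `DSFD_Listing.html`) to show both, along with `get()` and `has_attr()` as safer lookups.

```python
from bs4 import BeautifulSoup

# Hypothetical tag used only for this illustration
soup = BeautifulSoup('<p class="title big" id="3">Product Description</p>', 'html.parser')
tag = soup.p

# get() returns None instead of raising KeyError when the attribute is absent
missing = tag.get('body')

# class can hold several values, so Beautiful Soup returns a list
classes = tag['class']

# Attributes parsed from markup are strings, unlike the int assigned by hand earlier
tag_id = tag['id']

# has_attr() tests for presence without fetching the value
has_id = tag.has_attr('id')

print(missing, classes, tag_id, has_id)
```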

### Using tags to navigate a tree

In [38]:
soup = BeautifulSoup(html_doc, 'html.parser')

In [40]:
soup.head

<head>
<title>Best Books</title>
</head>

In [42]:
soup.title

<title>Best Books</title>

In [43]:
soup.body.b

<b>DATA SCIENCE FOR DUMMIES</b>

In [44]:
soup.ul

<ul>
<li>Provides a background in data science fundamentals before moving on to working with relational databases and
        unstructured data and preparing your data for analysis</li>
<li>Details different data visualization techniques that can be used to showcase and summarize your data</li>
<li>Explains both supervised and unsupervised machine learning, including regression, model validation, and
        clustering techniques</li>
<li>Includes coverage of big data processing tools like MapReduce, Hadoop, Storm, and Spark</li>
</ul>

In [45]:
soup.a

<a class="preview" href="http://www.data-mania.com/blog/books-by-lillian-pierson/" id="link 1">See a preview of the
      book</a>
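Attribute-style navigation such as `soup.a` returns only the first matching tag, and returns `None` when nothing matches. To collect every match, `find_all()` can be used instead. The sketch below runs on a small inline document that loosely imitates the listing page. Its markup and `example.com` URLs are placeholders, not the contents of `DSFD_Listing.html`.

```python
from bs4 import BeautifulSoup

# Placeholder markup, assumed for illustration only
html = """<html><head><title>Best Books</title></head>
<body>
<p class="title"><b>DATA SCIENCE FOR DUMMIES</b></p>
<ul><li>First point</li><li>Second point</li></ul>
<a class="preview" href="http://example.com/one">One</a>
<a class="preview" href="http://example.com/two">Two</a>
</body></html>"""

soup = BeautifulSoup(html, 'html.parser')

first_link = soup.a                           # only the first <a> tag
all_links = soup.find_all('a')                # every <a> tag, in document order
hrefs = [a['href'] for a in all_links]        # pull one attribute from each tag
items = [li.get_text() for li in soup.ul.find_all('li')]
no_table = soup.table                         # no <table> in the document, so None

print(first_link['href'])
print(hrefs)
print(items)
print(no_table)
```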