# Parsing HTML

In [None]:
from urllib.request import urlopen
from bs4 import BeautifulSoup
from urllib.error import HTTPError
from urllib.error import URLError

In [None]:
url = 'http://www.pythonscraping.com/pages/page1.html'
try:
    html = urlopen(url)
except HTTPError as e:
    print(e)
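except URLError as e:
    # URLError means the server itself could not be reached;
    # a missing page on a reachable server raises HTTPError instead.
    print("The server could not be reached:", e.reason)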
else:
    print(html)

In [None]:
bs = BeautifulSoup(html.read(), 'html.parser')

Using this BeautifulSoup object, you can use the `find_all` function to extract a
Python list of proper nouns by selecting only the text within
`<span class="green"></span>` tags. The cell below widens the selection to spans
whose class is either `green` or `red` (`find_all` is an extremely flexible
function you’ll be using a lot later in this book):

In [None]:
# Return the span tags whose class is either green or red
nameList = bs.find_all('span', {'class': {'green', 'red'}})
print(nameList)
heading = bs.find('h1')
print(heading.get_text()) # or use heading.text

## find() and find_all() with BeautifulSoup
The two functions are extremely similar, as evidenced by their definitions in the
BeautifulSoup documentation:
``` 
    find_all(tag, attributes, recursive, string, limit, keywords)
    find(tag, attributes, recursive, string, keywords)
```
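
To make the similarity concrete, here is a minimal sketch on a small inline HTML snippet (the snippet and the variable names are made up for illustration, not taken from the example page). It shows that `find` behaves like `find_all` with a limit of one, except that it returns a single tag (or `None`) rather than a list:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for this illustration
html = ('<p><span class="green">Anna</span> '
        '<span class="green">Pierre</span> '
        '<span class="red">hello</span></p>')
soup = BeautifulSoup(html, 'html.parser')

# find returns the first matching tag
first = soup.find('span', {'class': 'green'})

# find_all with limit=1 returns a list holding that same first match
limited = soup.find_all('span', {'class': 'green'}, limit=1)

# When nothing matches, find returns None and find_all returns an empty list
missing = soup.find('h1')
none_found = soup.find_all('h1')

print(first.get_text(), len(limited), missing, len(none_found))
```

Because `find` can return `None`, it is worth checking the result before calling `.get_text()` on it.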

The `recursive` argument is a boolean that controls how deeply into the document
the search goes. If `recursive` is set to True (the default), the `find_all` function looks into children, and children's children, for tags that match your parameters.
If it is False, it looks only at the top-level tags in your document.
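
A small sketch of the difference, assuming a made-up HTML fragment rather than the example page. Here the search starts from a `<div>` tag, so with `recursive=False` only that tag's direct children are considered:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one span is a direct child of the div,
# the other is nested one level deeper inside a paragraph
html = """
<div id="outer">
  <span class="green">Anna</span>
  <p><span class="green">Pierre</span></p>
</div>
"""
outer = BeautifulSoup(html, 'html.parser').find('div', id='outer')

# Default (recursive=True): searches all descendants of the div
all_spans = outer.find_all('span')

# recursive=False: searches only the div's direct children
direct_spans = outer.find_all('span', recursive=False)

print([s.get_text() for s in all_spans])     # ['Anna', 'Pierre']
print([s.get_text() for s in direct_spans])  # ['Anna']
```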

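The `string` argument matches on the text content of tags rather than on the tags' own properties. For instance, if you want to find the number of times “the prince” is surrounded by tags on the example page, you could replace your `.find_all()` call in the previous example with the following lines: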

In [None]:
nameList = bs.find_all(string='the prince')
print(len(nameList))

The keyword argument allows you to select tags that contain a particular attribute or
set of attributes. For example:

In [None]:
title = bs.find_all(id='title', class_='text')

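Because `class` is a reserved word in Python, the keyword form uses `class_` with a trailing underscore. Alternatively, you can enclose class in quotes as a key in the attributes dictionary: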
`bs.find_all('', {'class':'green'})`
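
The two spellings can be compared side by side. The sketch below uses a made-up snippet and passes the dictionary through the `attrs` keyword instead of an empty tag name; that is a variation chosen for this illustration, not the form used above:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for this illustration
html = ('<div><span class="green" id="a">Anna</span>'
        '<span class="red">hello</span>'
        '<p class="green">text</p></div>')
soup = BeautifulSoup(html, 'html.parser')

# Keyword form: class_ avoids clashing with Python's reserved word
by_keyword = soup.find_all(class_='green')

# Dictionary form: 'class' is just a string key, so no underscore is needed
by_dict = soup.find_all(attrs={'class': 'green'})

# Several keyword filters are combined: a tag must satisfy all of them
both = soup.find_all(id='a', class_='green')

print([t.name for t in by_keyword])  # ['span', 'p']
print([t.name for t in by_dict])     # ['span', 'p']
print([t.get_text() for t in both])  # ['Anna']
```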

There are two more objects in the library that, although less commonly
used, are still important to know about:

### NavigableString objects
Used to represent text within tags, rather than the tags themselves (some functions operate on and produce NavigableStrings, rather than tag objects).

### Comment object
Used to find HTML comments in comment tags, `<!--like this one-->`.
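
As a rough illustration of both object types, the following sketch parses a made-up one-line document (the markup and variable names are assumptions for this example). It reads a tag's text as a `NavigableString` and collects comments by filtering strings on their type:

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Hypothetical markup containing plain text, a nested tag, and a comment
html = '<p>Hello <b>world</b><!--like this one--></p>'
soup = BeautifulSoup(html, 'html.parser')

# The text inside <b> is a NavigableString, not a Tag
text_node = soup.b.string
print(type(text_node).__name__, repr(str(text_node)))

# Comment is a subclass of NavigableString, so comments can be
# collected by filtering strings on their type
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
print([str(c) for c in comments])
```

Since `Comment` objects are also strings, code that walks all text nodes may want to skip them explicitly when only visible text is needed.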