# Exception Handling

In [None]:
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError

The web is messy. Data is poorly formatted, websites go down, and closing tags go
missing. One of the most frustrating experiences in web scraping is to go to sleep
with a scraper running, dreaming of all the data you’ll have in your database the next
day—only to find that the scraper hit an error on some unexpected data format and
stopped execution shortly after you stopped looking at the screen. 

Consider the following line:

```html = urlopen('http://www.pythonscraping.com/pages/page1.html')```

Two main things can go wrong in this line:
1. The page is not found on the server, or there was an error in retrieving it. In this case `urlopen` raises an `HTTPError`.
2. The server is not found at all. In this case `urlopen` raises a `URLError`.

You can handle these exceptions in the following way:

In [None]:
try:
    html = urlopen('http://www.pythonscraping.com/pages/page1.html')
except HTTPError as e:
    print(e)
except URLError as e:
    print("server could not be found")
else:
    print("it worked")
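The order of the `except` clauses above matters: `HTTPError` is a subclass of `URLError`, so it has to be caught first or the more general clause would handle it too. The following is a minimal sketch of that relationship that runs without a network connection. The `classify` helper and the hand-built exception objects are assumptions made for illustration only.

```python
from urllib.error import HTTPError, URLError


def classify(exc):
    """Return a short label for an exception (hypothetical helper)."""
    # Check HTTPError first: it is a subclass of URLError
    if isinstance(exc, HTTPError):
        return f'HTTP error {exc.code}'
    if isinstance(exc, URLError):
        return f'server not reachable: {exc.reason}'
    return 'unexpected error'


print(issubclass(HTTPError, URLError))  # True

# Exceptions built by hand here; urlopen would normally raise them
print(classify(HTTPError('http://example.com', 404, 'Not Found', None, None)))
print(classify(URLError('name resolution failed')))
```

If the two checks were swapped, every `HTTPError` would be reported as an unreachable server and the status code would be lost.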

Every time you access a tag in a BeautifulSoup object, it’s smart to add a check to make sure the tag actually exists. If you attempt to access a tag that does not exist, BeautifulSoup returns `None`.
The problem is that attempting to access a tag on that `None` object raises an `AttributeError`.

In [None]:
from bs4 import BeautifulSoup
bs = BeautifulSoup(html.read(), 'html.parser')
print(bs.find('nonExistentTag'))
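Both behaviors can be reproduced without downloading anything. This is a minimal sketch that assumes a small inline HTML string in place of a real page. The names `sample` and `sample_bs` are hypothetical and chosen so they do not overwrite the `bs` object used in this notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical inline document, used instead of a downloaded page
sample = '<html><body><h1>An Interesting Title</h1></body></html>'
sample_bs = BeautifulSoup(sample, 'html.parser')

# A tag that does not exist comes back as None rather than raising
missing = sample_bs.find('nonExistentTag')
print(missing)  # None

# Calling a method on that None value is what raises the exception
try:
    missing.find('h1')
except AttributeError as e:
    print('AttributeError:', e)
```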

In [None]:
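try:
    # Raises AttributeError if bs.find('body') returns None
    title = bs.find('body').find('h1')
except AttributeError as e:
    print('tag not found')
else:
    # The inner find() can still return None without raising
    if title is None:
        print('tag not found')
    else:
        print(title)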

When writing scrapers, it’s important to think about the overall pattern of your code
in order to handle exceptions and make it readable at the same time. You’ll also likely
want to heavily reuse code. Having generic functions such as `getSiteHTML` and
`getTitle` (complete with thorough exception handling) makes it easy to scrape the web
quickly and reliably.
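One way such reusable functions might look is sketched below. The text names `getSiteHTML` and `getTitle` but does not show them, so the signatures, return values, and error messages here are assumptions rather than a definitive implementation. Both functions return `None` on failure so the caller has only one case to check.

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

from bs4 import BeautifulSoup


def getSiteHTML(url):
    """Return the raw page content, or None if the request fails."""
    try:
        with urlopen(url) as response:
            return response.read()
    except HTTPError as e:
        # The server responded, but with an error status
        print(f'could not retrieve {url}: {e}')
    except URLError as e:
        # The server could not be reached at all
        print(f'server could not be found for {url}: {e.reason}')
    return None


def getTitle(html):
    """Return the first <h1> tag inside <body>, or None if it is missing."""
    if html is None:
        return None
    bs = BeautifulSoup(html, 'html.parser')
    try:
        # bs.body is None when there is no <body>, so .h1 raises AttributeError
        return bs.body.h1
    except AttributeError:
        return None
```

With these in place, `title = getTitle(getSiteHTML('http://www.pythonscraping.com/pages/page1.html'))` followed by a single `if title is None:` check covers a failed request, a missing `<body>`, and a missing `<h1>`.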