# Connecting Reliably and Handling Exceptions

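**Two** main things can go wrong when attempting to connect to a website for scraping:

1. Page not found: the server responds, but with an error status, and `urlopen` raises `HTTPError`.
2. Server not found: the server cannot be reached at all, and `urlopen` raises `URLError`.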

In [2]:
# Import urlopen
from urllib.request import urlopen

In [5]:
# Valid URL
html = urlopen('http://www.pythonscraping.com/pages/page1.html')

In [6]:
# Invalid URL
# Raises HTTPError because the server answered with an error status (404 here)
non_working_html = urlopen('http://www.thisurldoesnotexist.com')

HTTPError: HTTP Error 404: Not Found

In [7]:
# Import HTTPError and URLError
from urllib.error import HTTPError, URLError

# Use try/except to handle both errors.
# HTTPError is caught first because it is a subclass of URLError.
try:
    html = urlopen('http://www.thisurldoesnotexist.com')
except HTTPError as e:
    print(e)
except URLError as e:
    print('The server could not be found!')
else:
    print('It worked!')

HTTP Error 404: Not Found
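The order of the two `except` clauses matters, because `HTTPError` is a subclass of `URLError`. The following is a minimal offline sketch of that relationship. The `classify` helper and the example URL are hypothetical, and the exceptions are constructed by hand instead of coming from a real request.

```python
from urllib.error import HTTPError, URLError


def classify(exc):
    """Return a short label for an exception of the kind urlopen raises.

    Hypothetical helper, for illustration only.
    """
    try:
        raise exc
    except HTTPError as e:
        # Must come first: an HTTPError would also match `except URLError`
        return f'HTTP error {e.code}'
    except URLError as e:
        return f'URL error: {e.reason}'


# Hand-built exceptions (assumed values), so no network access is needed
http_exc = HTTPError('http://example.com/missing', 404, 'Not Found', None, None)
url_exc = URLError('name resolution failed')

print(issubclass(HTTPError, URLError))
print(classify(http_exc))
print(classify(url_exc))
```

If the two clauses were swapped, the `URLError` branch would handle both cases and the status code would never be reported.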


In [10]:
# Install Beautiful Soup (the bs4 package pulls in beautifulsoup4)
%pip install bs4

Collecting bs4
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
Collecting beautifulsoup4
  Downloading beautifulsoup4-4.9.3-py3-none-any.whl (115 kB)
Collecting soupsieve>1.2
  Downloading soupsieve-2.2.1-py3-none-any.whl (33 kB)
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... done
  Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1273 sha256=a99ad3bc245d53d915f6b680fd1296320986c4dca3402cf2b63cfcf33237c923
  Stored in directory: /private/var/folders/qg/nrn8mtv572gcjwl4kbl3gqvr0000gn/T/pip-ephem-wheel-cache-wqbklyfh/wheels/0a/9e/ba/20e5bbc1afef3a491f0b3bb74d508f99403aabe76eda2167ca
Successfully built bs4
Installing collected packages: soupsieve, beautifulsoup4, bs4
Successfully installed beautifulsoup4-4.9.3 bs4-0.0.1 soupsieve-2.2.1
Note: you may need to restart the kernel to use updated packages.


In [11]:
# Import bs4
from bs4 import BeautifulSoup

### Errors can still occur when the connection succeeds but the requested content is missing.  
BeautifulSoup returns `None` when you access a tag that does not exist.  
Accessing an attribute (such as a nested tag) on that `None` raises an `AttributeError`.

In [12]:
bs = BeautifulSoup(html.read(), 'html.parser')

print(bs.invalidTag)

None




In [13]:
print(bs.invalidTag.anotherTag)

AttributeError: 'NoneType' object has no attribute 'anotherTag'

In [14]:
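# Use try/except to check for an invalid content request
try:
    badContent = bs.invalidTag.anotherTag
except AttributeError as e:
    # invalidTag was None, so looking up anotherTag on it failed
    print('Tag not found')
else:
    # The lookup succeeded, but the innermost tag may still be missing
    if badContent is None:
        print('Tag not found')
    else:
        print(badContent)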

Tag not found
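The same check can be written without an exception handler by stopping at the first `None` in the chain. This is a minimal sketch under stated assumptions: `get_nested` is a hypothetical helper, not part of BeautifulSoup, and the `page` object is a plain stand-in for a parsed document so the example runs without any installed packages.

```python
from types import SimpleNamespace


def get_nested(root, *names):
    """Follow a chain of attribute lookups, returning None at the first gap.

    Hypothetical helper, for illustration only.
    """
    current = root
    for name in names:
        if current is None:
            return None
        # A missing attribute is treated the same as a missing tag
        current = getattr(current, name, None)
    return current


# Stand-in for a parsed page (assumption: a real BeautifulSoup object
# would take its place and expose child tags as attributes)
page = SimpleNamespace(
    body=SimpleNamespace(h1='<h1>An Interesting Title</h1>', invalidTag=None)
)

print(get_nested(page, 'body', 'h1'))
print(get_nested(page, 'body', 'invalidTag', 'anotherTag'))
```

The trade-off is that the caller no longer learns which step in the chain was missing, only that the final result is `None`.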




## Cleaner Code

In [28]:
# imports
from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

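def getTitle(url):
    """Return the first <h1> inside <body>, or None if any step fails."""
    # Step 1: connect. Either error means there is nothing to parse.
    try:
        html = urlopen(url)
    except HTTPError as e:
        print('Page not found!')
        return None
    except URLError as e:
        print('Server not found!')
        return None
    # Step 2: parse. bs.body is None when the page has no <body>,
    # so bs.body.h1 raises AttributeError in that case.
    try:
        bs = BeautifulSoup(html.read(), 'html.parser')
        title = bs.body.h1
    except AttributeError as e:
        print('Tag not found!')
        return None
    return title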


title = getTitle('http://www.pythonscraping.com/pages/page1.html')
print(title)

<h1>An Interesting Title</h1>
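The error branches above are hard to exercise against a live site. One possible approach, sketched here under assumptions, is to separate the connection step and let the caller pass in the function that opens the URL. `fetch_bytes`, its `opener` parameter, the three fake openers, and the example URLs are all hypothetical names introduced for this illustration.

```python
import io
from urllib.request import urlopen
from urllib.error import HTTPError, URLError


def fetch_bytes(url, opener=urlopen):
    """Return the response body as bytes, or None if the request fails."""
    try:
        response = opener(url)
    except HTTPError:
        print('Page not found!')
        return None
    except URLError:
        print('Server not found!')
        return None
    return response.read()


# Fake openers (assumptions) that simulate each outcome without a network
def missing_page(url):
    raise HTTPError(url, 404, 'Not Found', None, None)


def missing_server(url):
    raise URLError('no such host')


def working_page(url):
    return io.BytesIO(b'<h1>An Interesting Title</h1>')


print(fetch_bytes('http://example.com/missing', opener=missing_page))
print(fetch_bytes('http://no-such-host.invalid', opener=missing_server))
print(fetch_bytes('http://example.com/page1.html', opener=working_page))
```

Because every failure path returns `None`, callers should compare the result with `is None` before using it.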
