I went through connecting reliably, using BeautifulSoup, and handling exceptions.

In [3]:
# get data from a site
from urllib.request import urlopen

html = urlopen('http://pythonscraping.com/pages/page1.html')
print(html.read())

#Done !!! :)

b'<html>\n<head>\n<title>A Useful Page</title>\n</head>\n<body>\n<h1>An Interesting Title</h1>\n<div>\nLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n</div>\n</body>\n</html>\n'
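
The output above starts with `b'` because `read()` returns bytes, not a string. A minimal sketch of turning those bytes into text, assuming the page is encoded as UTF-8 (the `raw` value is a shortened stand-in for the real response, so no network access is needed):

```python
# Shortened stand-in for the bytes returned by html.read() above.
raw = b'<html>\n<head>\n<title>A Useful Page</title>\n</head>\n</html>\n'

# Assumption: the page is UTF-8 encoded. A different encoding would need
# a different argument here.
text = raw.decode('utf-8')

print(type(raw).__name__)   # bytes
print(type(text).__name__)  # str
print(text)
```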


In [7]:
#After installing beautiful soup.
from bs4 import BeautifulSoup
from urllib.request import urlopen

html = urlopen('http://pythonscraping.com/pages/page1.html')
bs = BeautifulSoup(html.read(),'html.parser')

print(bs.h1)

# bs.h1 returns the first h1 tag on the page.
# Another possible parser is lxml.
# lxml has some advantages over html.parser in that it is generally better at parsing
# “messy” or malformed HTML code.
# One of the disadvantages of lxml is that it has to be installed separately and depends
# on third-party C libraries to function. This can cause problems for portability and
# ease of use, compared to html.parser.


<h1>An Interesting Title</h1>
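
Since lxml may or may not be installed, one option is to try it first and fall back to html.parser. This is only a sketch: `make_soup` is a hypothetical helper name, and the HTML string is a small stand-in for the downloaded page. It relies on BeautifulSoup raising `FeatureNotFound` when the requested parser is missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(markup):
    """Hypothetical helper: prefer lxml, fall back to the built-in parser."""
    try:
        return BeautifulSoup(markup, 'lxml')
    except FeatureNotFound:
        # lxml is not installed, so use the parser that ships with Python
        return BeautifulSoup(markup, 'html.parser')

# Stand-in for the page content, so the sketch runs without a network call
sample = '<html><body><h1>An Interesting Title</h1></body></html>'
soup = make_soup(sample)
print(soup.h1)
```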


## connecting reliably and handling exceptions

Always anticipate exceptions.

Things that may go wrong:
- The page is not found on the server (HTTP error)
- The server is not found.



In [4]:
from urllib.request import urlopen
from urllib.error import HTTPError

#retrieve html.
try:
    html = urlopen('http://pythnscraping.com/pages/page1.html')
    # If the server isn't found, urlopen raises URLError, which the except below does not catch.
except HTTPError as e:
    print("There was an error")
    # return None, break, or do some other "Plan B"


URLError: <urlopen error [Errno -2] Name or service not known>

These errors must be caught and handled.


In [6]:
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
try:
    html = urlopen('https://pythonscrapingthisurldoesnotexist.com')
except HTTPError as e:
    print(e)
except URLError as e:
    print('The server could not be found!')
else:
    print('It Worked!')

The server could not be found!
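
The cell above lists `HTTPError` before `URLError`. The order matters because `HTTPError` is a subclass of `URLError`, so a `URLError` clause placed first would also catch HTTP errors. A small sketch of telling the two apart without touching the network; `describe` is a hypothetical helper, and the error objects are built by hand purely for illustration:

```python
from email.message import Message
from urllib.error import HTTPError, URLError

def describe(error):
    """Hypothetical helper: say which of the two failure cases occurred."""
    # Check the more specific class first, because HTTPError is also a URLError.
    if isinstance(error, HTTPError):
        return f'HTTP error {error.code}'
    if isinstance(error, URLError):
        return f'server not reached: {error.reason}'
    return 'unexpected error'

print(issubclass(HTTPError, URLError))  # True

# Hand-built errors standing in for what urlopen would raise
not_found = HTTPError('http://example.com/missing', 404, 'Not Found', Message(), None)
no_server = URLError('Name or service not known')

print(describe(not_found))
print(describe(no_server))
```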


What if the page is retrieved successfully but isn't what you expected?
Every time you access a tag in a BeautifulSoup object, it's smart to add a check to make sure the tag actually exists.
If the tag doesn't exist, BeautifulSoup returns None, and trying to access a tag on that None object raises an AttributeError.


Handle the errors.

In [9]:
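try:
    # bs.nonExistingTag is None, so asking it for .anotherTag raises AttributeError
    badContent = bs.nonExistingTag.anotherTag
except AttributeError as e:
    print('Tag was not found')
else:
    # the first tag can exist while the last one is still missing
    if badContent is None:
        print('Tag was not found')
    else:
        print(badContent)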

Tag was not found
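
The same check can be written without try/except by stopping at the first missing tag. This is a sketch, not a BeautifulSoup feature: `get_nested` is a hypothetical helper, and the HTML string stands in for a real page.

```python
from bs4 import BeautifulSoup

def get_nested(node, *names):
    """Hypothetical helper: walk down tags by name, returning None
    as soon as one of them is missing."""
    for name in names:
        if node is None:
            return None
        node = node.find(name)
    return node

sample = '<html><body><h1>An Interesting Title</h1></body></html>'
soup = BeautifulSoup(sample, 'html.parser')

print(get_nested(soup, 'body', 'h1'))
print(get_nested(soup, 'nonExistingTag', 'anotherTag'))
```

The first call prints the h1 tag and the second prints None, with no exception raised in either case.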


## Checking and Handling can be tedious.
Like knocking on a stadium door (it's hard and clanky), so let's simplify.

Let's make the handling easier to read and write.



In [None]:
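from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

def getTitle(url):
    # Return the first h1 tag in the page body, or None if anything goes wrong.
    try:
        html = urlopen(url)
    except (HTTPError, URLError) as e:
        # the page was not found on the server, or the server was not found
        return None
    try:
        bs = BeautifulSoup(html.read(), 'html.parser')
        title = bs.body.h1
    except AttributeError as e:
        # bs.body was None, so there is no h1 to read
        return None
    return title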


title = getTitle('http://www.pythonscraping.com/pages/page1.html')
if title is None:
    print('Title could not be found')
else:
    print(title)