# Let's Have a **BeautifulSoup**
*Curtis Miller*

In this document we will see some basic usage of **BeautifulSoup** for parsing a web page and navigating its HTML.

First, let's load in libraries.

In [None]:
import requests
from bs4 import BeautifulSoup

Let's set a header that will make our scraper look more "human" (just to be safe, and to see how it's done), then download a webpage from Wikipedia containing a [list of Nobel laureates](https://en.wikipedia.org/wiki/List_of_Nobel_laureates).

In [None]:
session = requests.Session()
# Our "human" header; go to https://www.whatismybrowser.com/ to see what the Internet can see about your browser,
# including what your header is. Below are the settings for a browser I used.
header = {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
          "Accept-Language": "en-US,en;q=0.5",
          "Connection": "keep-alive",
          "Referer": "https://www.google.com/",    # The HTTP header name is spelled "Referer"
          "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:54.0) Gecko/20100101 Firefox/54.0"}

# The URL we are visiting
url = "https://en.wikipedia.org/wiki/List_of_Nobel_laureates"
# Visit the url, using the header just defined
page = session.get(url, headers=header).text
# A preview of the content (the first 1,000 characters rather than the whole page)
print(page[:1000])
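
To check which headers a session would actually send, the request can be prepared and inspected without contacting the server. The following is a minimal sketch of that idea; `demo_header`, `demo_session`, and `prepared` are illustrative names, and the header dictionary is a shortened stand-in for the one above.

```python
import requests

# Hypothetical, shortened header dictionary used only for this illustration
demo_header = {"Accept-Language": "en-US,en;q=0.5",
               "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:54.0) Gecko/20100101 Firefox/54.0"}

demo_session = requests.Session()
demo_request = requests.Request("GET", "https://en.wikipedia.org/wiki/List_of_Nobel_laureates",
                                headers=demo_header)

# prepare_request merges the session's default headers with ours; nothing is sent yet
prepared = demo_session.prepare_request(demo_request)

for name, value in prepared.headers.items():
    print("%s: %s" % (name, value))
```

Headers passed with the request take precedence over the session defaults, so the custom `User-Agent` replaces the one **requests** would otherwise announce.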

Let's create a `BeautifulSoup` object to parse this document.

In [None]:
nobelList = BeautifulSoup(page)    # A warning is issued because no parser was specified;
                                   # BeautifulSoup then picks the best parser that is installed
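
The warning goes away when a parser is named explicitly. Here is a small sketch using Python's built-in `"html.parser"` on a made-up document; `sample_html` and `demo_soup` are illustrative names, not part of the scraper above.

```python
from bs4 import BeautifulSoup

# A tiny stand-in document (hypothetical; not the Wikipedia page)
sample_html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# Naming the parser avoids the warning and gives the same parse tree on every machine
demo_soup = BeautifulSoup(sample_html, "html.parser")

print(type(demo_soup).__name__)
print(demo_soup.title.string)
```

Other parsers such as `"lxml"` can be named the same way, provided they are installed.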

Now that we have the object, let's look at some common tools.

In [None]:
nobelList.find("a")    # Find a single link

In [None]:
nobelList.findAll("a")    # Find all links (returned as a list-like ResultSet); findAll is the older name for find_all

In [None]:
nobelList.findAll("a", {"class": "internal"})    # Find all links with a particular class

In [None]:
nobelList.findAll("a", {"class": "internal"})[0].attrs["href"]    # For the first link, get its destination
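
The same link lookups can be tried on a small, self-contained snippet. This is only a sketch on made-up HTML (`link_html` and the names derived from it are hypothetical); it uses `find_all`, the current spelling of `findAll`, and shows `.get` as a safer way to read an attribute that may be missing.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration
link_html = """
<p>
  <a class="internal" href="/wiki/Physics">Physics</a>
  <a class="external" href="https://example.org/">Example</a>
  <a class="internal" href="/wiki/Chemistry">Chemistry</a>
  <a name="no-destination">Anchor without href</a>
</p>
"""
link_soup = BeautifulSoup(link_html, "html.parser")

# class_ (with an underscore) avoids clashing with Python's reserved word "class"
internal_links = link_soup.find_all("a", class_="internal")
internal_hrefs = [a["href"] for a in internal_links]
print(internal_hrefs)

# .get returns None instead of raising KeyError when the attribute is absent
all_hrefs = [a.get("href") for a in link_soup.find_all("a")]
print(all_hrefs)
```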

In [None]:
nobelList.find("h1")

In [None]:
nobelList.find("h1").contents    # Get the contents of a tag
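
Since `.contents` returns child nodes rather than plain text, it can help to compare it with `get_text()`. A brief sketch on a made-up heading (`heading_soup` and `heading` are illustrative names):

```python
from bs4 import BeautifulSoup

# Made-up heading with a nested tag
heading_soup = BeautifulSoup('<h1 id="top">List of <i>Nobel</i> laureates</h1>', "html.parser")
heading = heading_soup.find("h1")

# .contents is a list of the direct children: strings and tags mixed together
print(heading.contents)

# get_text() joins all of the text inside the tag, at any depth
print(heading.get_text())
```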

In [None]:
nobelList.table    # Another way to locate elements; this time, a table

In [None]:
nobelList.table.attrs["class"]    # class can hold several values, so a list is returned; this table belongs to two classes

In [None]:
nobelList.find("table", {"class": ["wikitable", "sortable"]})    # A list matches elements with any of the listed classes, not necessarily all of them
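
How a list of classes is matched is easy to check on a toy document. The sketch below, on made-up tables (`table_html`, `any_ids`, and `both_ids` are hypothetical names), contrasts the list form with a CSS selector that requires every class to be present.

```python
from bs4 import BeautifulSoup

# Three made-up tables: one per class, plus one carrying both
table_html = """
<table class="wikitable" id="plain"></table>
<table class="sortable" id="sortable-only"></table>
<table class="wikitable sortable" id="both"></table>
"""
table_soup = BeautifulSoup(table_html, "html.parser")

# A list of classes matches a tag carrying ANY of them
any_ids = [t["id"] for t in table_soup.find_all("table", {"class": ["wikitable", "sortable"]})]
print(any_ids)

# A CSS selector with chained classes matches only tags carrying ALL of them
both_ids = [t["id"] for t in table_soup.select("table.wikitable.sortable")]
print(both_ids)
```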

We can drill down through the DOM of a document; each tag we reach offers the same search methods as the original object.

In [None]:
nobelList.table.children     # Child nodes of this node; gives an iterator

In [None]:
# Let's see what nodes are children
for node in nobelList.table.children:
    print("\nNode:\n-----")
    print("Name: %s" % node.name)    # The HTML name of a tag
    print(node)
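
Some of the children printed this way may have a name of `None`: these are text nodes, typically the whitespace between tags. A small sketch on a made-up table (`tree_html` and the other names are illustrative) shows one way to keep only the tags.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up table with newlines between the rows
tree_html = "<table>\n<tr><th>Year</th></tr>\n<tr><td>1901</td></tr>\n</table>"
tree_soup = BeautifulSoup(tree_html, "html.parser")

children = list(tree_soup.table.children)

# Text nodes report None as their name; tags report their HTML name
child_names = [child.name for child in children]
print(child_names)

# Keeping only Tag instances skips the text nodes
tag_children = [child for child in children if isinstance(child, Tag)]
print([tag.name for tag in tag_children])
```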

In [None]:
nobelList.table.tr    # The first row of the table

In [None]:
nobelList.table.findAll("td")    # All data cells in the table, across every row
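
Row and cell lookups combine naturally into a list of rows. The following is a sketch on a made-up table, not the Wikipedia one; `rows_html` and `records` are hypothetical names, and real tables may need extra handling for merged cells.

```python
from bs4 import BeautifulSoup

# Made-up sample rows for illustration
rows_html = """
<table>
  <tr><th>Year</th><th>Field</th></tr>
  <tr><td>1901</td><td>Physics</td></tr>
  <tr><td>1902</td><td> Chemistry </td></tr>
</table>
"""
rows_soup = BeautifulSoup(rows_html, "html.parser")

records = []
for row in rows_soup.table.find_all("tr"):
    # Passing a list of names matches header cells and data cells alike
    cells = row.find_all(["th", "td"])
    # strip=True trims the whitespace around each cell's text
    records.append([cell.get_text(strip=True) for cell in cells])

print(records)
```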

In [None]:
nobelList.table.parent    # Locating the parent node
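
Moving upward works at any depth, not just one level. A short sketch on made-up nested markup (`nested_html` and `cell` are illustrative names) shows `.parent`, `find_parent`, and `.parents` side by side.

```python
from bs4 import BeautifulSoup

# Made-up nested markup
nested_html = '<div id="content"><table><tr><td>cell</td></tr></table></div>'
nested_soup = BeautifulSoup(nested_html, "html.parser")
cell = nested_soup.find("td")

# The immediate parent of the cell
print(cell.parent.name)

# The nearest enclosing <div>, however many levels up it is
print(cell.find_parent("div")["id"])

# Every ancestor, innermost first
ancestor_names = [ancestor.name for ancestor in cell.parents]
print(ancestor_names)
```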