# Web Scraping with Beautiful Soup

```python
# import required modules
import requests
from bs4 import BeautifulSoup
```

## 1. Make a GET Request and Read in HTML

We use the `requests` library to:
1. make a GET request to the page
2. read in the HTML of the page

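```python
# make a GET request (fill in the URL of the page you want to scrape)
req = requests.get('', timeout=10)

# stop here if the server returned an error status (4xx or 5xx)
req.raise_for_status()

# read the content of the server's response as text
src = req.text
```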
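The same step can be wrapped in a small, slightly more defensive function. This is a minimal sketch: the function name `fetch_html`, the 10-second default timeout, and the `User-Agent` string are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and return its HTML as text (hypothetical helper)."""
    # Hypothetical User-Agent; some sites reject requests that do not send one
    headers = {"User-Agent": "my-scraper/0.1 (example)"}
    # timeout keeps the call from hanging forever on an unresponsive server
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    # .text decodes the body using the encoding requests inferred
    return response.text
```

Calling `fetch_html('')` with the notebook's empty placeholder URL raises `requests.exceptions.MissingSchema` before any network traffic happens, which is a quick reminder to fill in a real address.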

## 2. Soup it

Now we use the `BeautifulSoup` constructor to parse the response text into an HTML tree. It returns an object (called a **soup object**) that contains all of the HTML in the original document.

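```python
# parse the response into an HTML tree
# ('lxml' is a fast third-party parser that must be installed separately;
#  Python's built-in 'html.parser' also works)
soup = BeautifulSoup(src, 'lxml')

# take a look at the first 1,000 characters of the indented HTML
print(soup.prettify()[:1000])
```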
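To experiment with a soup object without hitting a live site, you can parse an HTML string directly. The sketch below uses a made-up snippet and Python's built-in `'html.parser'`, so it runs even if `lxml` is not installed. Accessing a tag name as an attribute (for example `soup.title`) returns the first matching element.

```python
from bs4 import BeautifulSoup

# A small, made-up page standing in for a real server response
html = """
<html>
  <head><title>Example page</title></head>
  <body>
    <h1>Hello</h1>
    <p class="intro">Some <b>bold</b> text.</p>
  </body>
</html>
"""

# 'html.parser' ships with Python, so no extra install is needed
soup = BeautifulSoup(html, "html.parser")

print(type(soup).__name__)  # BeautifulSoup
print(soup.title.string)    # Example page
print(soup.h1.get_text())   # Hello
print(soup.p.get_text())    # Some bold text.
# class is a multi-valued attribute, so it comes back as a list
print(soup.p["class"])      # ['intro']
```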

## 3. Find Elements

Beautiful Soup has a number of methods for finding things on a page. Like other web scraping tools, it lets you find elements by their:

1. HTML tags
2. HTML attributes
3. CSS selectors


Let's search first for **HTML tags**. 

The `find_all` method searches the `soup` tree for all elements with a particular HTML tag and returns them as a list.
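To see what `find_all` returns without fetching a live page, here is a small sketch on a made-up HTML snippet. It also shows two related options: `find()`, which returns only the first match (or `None`), and the `limit` argument, which caps how many results `find_all` collects.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only
html = """
<ul>
  <li><a href="/a">First</a></li>
  <li><a href="/b">Second</a></li>
  <li><a href="/c">Third</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

links = soup.find_all("a")            # every <a> tag, in document order
print(len(links))                     # 3
print([a.get_text() for a in links])  # ['First', 'Second', 'Third']
print(links[0]["href"])               # /a

# find() stops at the first match; limit= caps the number of results
print(soup.find("a").get_text())         # First
print(len(soup.find_all("a", limit=2)))  # 2
print(soup.find("table"))                # None
```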

What does the example below do?

```python
# find all elements with a certain tag (here, every <a> link)
soup.find_all("a")
```

That's a lot! Many elements on a page share the same HTML tag. For instance, if you search for everything with the `a` tag, you're likely to get many elements, most of which you don't want. What if we want to search for HTML tags ONLY with certain attributes, like particular CSS classes?

We can do this by passing an additional argument to `find_all`.

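In the example below, we find all the `a` tags and keep only those with the class `sidemenu`. Note the trailing underscore in `class_`: `class` is a reserved word in Python, so Beautiful Soup uses `class_` as the keyword argument for the HTML `class` attribute.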

```python
# get only the 'a' tags with the 'sidemenu' class
soup.find_all("a", class_="sidemenu")
```
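Two details of attribute filtering are easy to miss, so here is a sketch on made-up markup. First, `class` is multi-valued, so `class_="sidemenu"` also matches an element with `class="sidemenu active"`. Second, the HTML attributes item from the list above covers more than classes: other attributes can be passed as keyword arguments, and names that are not valid Python identifiers (such as the hypothetical `data-section` below) go in an `attrs` dictionary.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only
html = """
<nav id="menu">
  <a class="sidemenu" href="/home">Home</a>
  <a class="sidemenu active" href="/about" data-section="info">About</a>
  <a class="footer" href="/contact">Contact</a>
</nav>
<div class="sidemenu">Not a link</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches any <a> whose class list contains "sidemenu";
# the <div> is excluded because we only asked for <a> tags
side_links = soup.find_all("a", class_="sidemenu")
print([a.get_text() for a in side_links])  # ['Home', 'About']

# The same filter written with an explicit attrs dictionary
same = soup.find_all("a", attrs={"class": "sidemenu"})
print(same == side_links)  # True

# Other attributes work as keyword arguments
print(soup.find(id="menu").name)  # nav

# Hyphenated attribute names must go through attrs
print([a["href"] for a in soup.find_all(attrs={"data-section": "info"})])  # ['/about']
```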

Often a more concise way to find things on a page is by **CSS selector**. For this we use a different method, `select()`: pass it a CSS selector string and it returns a list of all elements that match the selector.

For the previous example, we can use `"a.sidemenu"` as the CSS selector, which matches all `a` tags with the class `sidemenu`.

```python
# get elements matching the "a.sidemenu" CSS selector
soup.select("a.sidemenu")
```
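Where CSS selectors pay off is in expressing structure that would take several `find_all` calls. The sketch below runs on made-up markup and shows a descendant selector, a compound class selector with `select_one()` (which returns the first match or `None`), and an attribute prefix selector. In recent versions of Beautiful Soup, `select()` is backed by the Soup Sieve package, which is installed alongside `bs4`.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only
html = """
<nav id="menu">
  <a class="sidemenu" href="/home">Home</a>
  <a class="sidemenu active" href="/about">About</a>
</nav>
<footer>
  <a class="sidemenu" href="https://example.com/">External</a>
</footer>
"""
soup = BeautifulSoup(html, "html.parser")

# Same filter as find_all("a", class_="sidemenu")
print([a.get_text() for a in soup.select("a.sidemenu")])  # ['Home', 'About', 'External']

# Descendant selector: only sidemenu links inside the element with id "menu"
print([a.get_text() for a in soup.select("#menu a.sidemenu")])  # ['Home', 'About']

# Compound class selector; select_one returns just the first match
print(soup.select_one("a.sidemenu.active")["href"])  # /about

# Attribute selector: links whose href starts with "https"
print([a["href"] for a in soup.select('a[href^="https"]')])  # ['https://example.com/']
```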