# Let's talk about Web Scraping

In [None]:
import requests                 # needed to make a request to the server
from bs4 import BeautifulSoup   # needed to read the soup that is HTML

url = "http://books.toscrape.com/index.html"  # the url we're going to scrape
response = requests.get(url)                  # the request to the server
html = response.content                       # the HTML we get from the request
parsed = BeautifulSoup(html, 'html.parser')   # the BeautifulSoup done with the HTML

## Let's scrape some information!!

### Get an element by its tag name

Let's get the text **inside** the <code>title</code> tag 
(and remove the whitespace surrounding it!)

In [None]:
parsed.title.text.strip()
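
To see what <code>strip()</code> actually removes, here is a minimal sketch that needs no network access. The HTML snippet and the shop name in it are made up for illustration; only the shape (a title padded with newlines and spaces) matters.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: a <title> padded with a newline and some spaces
snippet = "<html><head><title>\n    My Book Shop\n</title></head></html>"
soup = BeautifulSoup(snippet, 'html.parser')

raw_title = soup.title.text        # the text exactly as written in the HTML
clean_title = raw_title.strip()    # the same text without leading/trailing whitespace

print(repr(raw_title))    # repr() makes the hidden whitespace visible
print(repr(clean_title))
```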

What about getting information stored inside an attribute?

In [None]:
parsed.h3.a['title']
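
The difference between a tag's text and its attributes is easier to see on a tiny example. The snippet below is a sketch on made-up HTML (the book name and the <code>data-rating</code> attribute are assumptions, not taken from the site). It also shows <code>.get()</code>, which returns <code>None</code> for a missing attribute, whereas square brackets raise a <code>KeyError</code>.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet, only loosely modelled on the page we are scraping
snippet = '<h3><a href="book-one.html" title="Book One">Book O...</a></h3>'
link = BeautifulSoup(snippet, 'html.parser').h3.a

visible_text = link.text            # the text BETWEEN the opening and closing tags
full_title = link['title']          # the value of the title attribute
missing = link.get('data-rating')   # attribute does not exist, so .get() returns None

print(visible_text)
print(full_title)
print(missing)
```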

### Get an element by something other than its tag name

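We can also search by the element's class using the <code>find</code> method. Because <code>class</code> is a reserved word in Python, BeautifulSoup calls the argument <code>class_</code>.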

In [None]:
parsed.find("article", class_="product_pod")

From the result, we can continue searching

In [None]:
parsed.find("article", class_="product_pod").h3.a['title']
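
One thing to keep in mind when chaining like this: <code>find</code> returns <code>None</code> when nothing matches, and continuing the search on <code>None</code> raises an error. A minimal sketch of a guard, using made-up HTML and a class name (<code>does_not_exist</code>) chosen only to force a miss:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with a single product
snippet = '<article class="product_pod"><h3><a title="Book One">Book O...</a></h3></article>'
soup = BeautifulSoup(snippet, 'html.parser')

found = soup.find('article', class_='product_pod')
not_found = soup.find('article', class_='does_not_exist')  # no such class in the snippet

for result in (found, not_found):
    if result is None:
        # nothing matched, so there is nothing to keep searching in
        print('No matching element')
    else:
        print(result.h3.a['title'])
```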

### Getting multiple elements

Quick review of <code>for ... in ...:</code>

In [None]:
names = ['Rodrigo', 'Paola', 'Vitor', 'Arthur', 'Gabriela', 'Annelise']

for name in names:
    # this code will run for EACH name inside the names list
    print('Hello ' + name)

If we want to get all the elements with the same tag name and/or class, we can use the
<code>find_all</code> method and run some code for **each** element (using the loop <code>for ... in ...:</code>)

In [None]:
all_articles = parsed.find_all('article', class_="product_pod")

# we want to get the book's title from EACH article
for article in all_articles:
    print(article.h3.a['title'])
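
If we would rather keep the titles than print them, the same loop can be written as a list comprehension. This is a sketch on made-up HTML (three articles, only two of which have the <code>product_pod</code> class), so it runs without a request:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: two products and one unrelated article
snippet = """
<article class="product_pod"><h3><a title="Book One">Book O...</a></h3></article>
<article class="product_pod"><h3><a title="Book Two">Book T...</a></h3></article>
<article class="other"><h3><a title="Not a product">Nope</a></h3></article>
"""
soup = BeautifulSoup(snippet, 'html.parser')

# find_all returns every match; the class filter leaves the third article out
articles = soup.find_all('article', class_='product_pod')

# one title per article, collected into a list
titles = [article.h3.a['title'] for article in articles]

print(len(articles))
print(titles)
```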

Can we get the price of each book? (as a float, of course =D)

In [None]:
for article in all_articles:
    price = article.find('p', class_='price_color')
    price = float(price.text.lstrip('£')) # this will remove the £ sign from the left of the text
    print(price)
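
<code>lstrip('£')</code> works as long as the text starts with exactly that sign. If the page is decoded with the wrong encoding, the sign can show up as <code>Â£</code>, and extra spaces would also break <code>float()</code>. A slightly more forgiving sketch uses a regular expression to pull out the number; <code>parse_price</code> is a hypothetical helper name, not part of BeautifulSoup.

```python
import re

def parse_price(text):
    """Hypothetical helper: return the first number found in a price string."""
    match = re.search(r'\d+(?:\.\d+)?', text)   # digits, optionally followed by .digits
    if match is None:
        raise ValueError('no number found in ' + repr(text))
    return float(match.group())

print(parse_price('£51.77'))
print(parse_price('  Â£53.74 '))   # mis-decoded sign and stray spaces are ignored
```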

**Hard!** What if we want to store both the name AND the price for EVERY book? Quick tip:

- **Lists `[]`** are good for storing similar elements and/or when the **position matters**
- **Dictionaries `{}`** are good for storing several pieces of information about one element and/or for **labeling each piece of information**

In [None]:
all_books = []

for article in all_articles:
    title = article.h3.a['title']
    price = article.find('p', class_='price_color')
    price = float(price.text.lstrip('£'))
    book = {'book_title': title, 'book_price': price}
    all_books.append(book)
    
print(all_books)
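
Once the data is stored as a list of dictionaries, reading it back combines both ideas: a position to pick a book, and a label to pick a piece of information. The sketch below uses a hand-written <code>all_books</code> with invented titles and prices, in the same shape as the one built above.

```python
# Hand-written stand-in for the scraped list (values are invented)
all_books = [
    {'book_title': 'Book One', 'book_price': 51.77},
    {'book_title': 'Book Two', 'book_price': 13.99},
    {'book_title': 'Book Three', 'book_price': 50.10},
]

# list index picks the book, dictionary key picks the information
first_title = all_books[0]['book_title']

# the book whose 'book_price' is the smallest
cheapest = min(all_books, key=lambda book: book['book_price'])

# sum of every price in the list
total = sum(book['book_price'] for book in all_books)

print(first_title)
print(cheapest['book_title'])
print(round(total, 2))
```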

We can also use [CSS Selectors](https://www.freecodecamp.org/news/css-selectors-cheat-sheet/) to find elements in our HTML using the `select` method.

In [None]:
all_articles_two = parsed.select('.product_pod')

for article in all_articles_two:
    title = article.h3.a['title']
    price = article.find('p', class_='price_color')
    price = float(price.text.lstrip('£'))
    print(title)
    print(price)
    print('\n')
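
CSS selectors can also describe a whole path in one string, which replaces some of the chained searching. A minimal sketch on made-up HTML (titles and prices are invented): <code>select</code> returns a list of every match, while <code>select_one</code> returns only the first match.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with two products
snippet = """
<article class="product_pod">
  <h3><a title="Book One">Book O...</a></h3>
  <p class="price_color">£51.77</p>
</article>
<article class="product_pod">
  <h3><a title="Book Two">Book T...</a></h3>
  <p class="price_color">£13.99</p>
</article>
"""
soup = BeautifulSoup(snippet, 'html.parser')

# every <a> inside an <h3> inside an <article class="product_pod">
links = soup.select('article.product_pod h3 a')
titles = [link['title'] for link in links]

# only the first <p class="price_color"> on the page
first_price = soup.select_one('p.price_color').text

print(titles)
print(first_price)
```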