# Web Scraping using Python - Try it yourself!

Let's start by importing the necessary libraries and parsing the very first results page for one product (e.g. a notebook for taking notes 📓) on the [TMall](https://list.tmall.com/search_product.htm?q=%B1%BE%D7%D3&type=p&vmarket=&spm=875.7931836%2FB.a2227oh.d100&xl=ben_1&from=mallfp..pc_1_suggest) website.


Every paragraph in this document is a cell that contains either a text description or a snippet of runnable Python code.

To run a cell, select it and click "Run" in the toolbar, or just press Shift-Enter. Double-clicking a cell allows you to edit its contents.

**Pro tip 🤓:** Run your cells often to catch possible errors early!

In [70]:
import requests
from bs4 import BeautifulSoup

url = "https://list.tmall.com/search_product.htm?q=%B1%BE%D7%D3&type=p&vmarket=&spm=875.7931836%2FB.a2227oh.d100&xl=ben_1&from=mallfp..pc_1_suggest"
response = requests.get(url)
html = response.content
scraped = BeautifulSoup(html, 'html.parser')

After **running** the cell above, you'll be able to use the `scraped` variable to look for elements on the page.

To see the page that we just scraped, uncomment `scraped` in the cell below and run it. 👩‍💻

In [73]:
# scraped 
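
The full page can be very long, so it may help to preview only the beginning of it. Here is a minimal sketch of that idea, assuming a tiny made-up HTML string (`sample_html` and `sample` are hypothetical names, not the real TMall markup). It uses BeautifulSoup's `prettify()` method, which returns the document as an indented string.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page (not the real TMall markup).
sample_html = "<html><head><title>Demo shop</title></head><body><p>Hello</p></body></html>"
sample = BeautifulSoup(sample_html, "html.parser")

# prettify() returns the parsed document as an indented string.
pretty = sample.prettify()

# Print only the first 200 characters so a long page does not flood the output.
print(pretty[:200])
```

The same slicing trick should work on `scraped.prettify()` once the cell that defines `scraped` has been run.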

### Challenge 1: Print the title of the page

To print output in Python, you can use the `print()` function. It can take either a literal value as an argument (`print("hello")`, `print(2)`) or a variable, in which case the function prints the value that the variable refers to!

```python
name = "Bob"
print(name) # => Bob
```

Remember, you need to print just the **text** inside the `<title>` tag, not the whole element!
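
To illustrate the difference between an element and its text, here is a small sketch on a made-up snippet (the `<h1>` markup and the names `sample_html`, `sample` and `heading` are assumptions for illustration only). Printing the element shows the tags as well, while `.text` gives only what is inside them.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used only to show element vs. text.
sample_html = "<html><body><h1 class='banner'>Welcome!</h1></body></html>"
sample = BeautifulSoup(sample_html, "html.parser")

heading = sample.h1   # the first <h1> element on the page

print(heading)        # the whole element, tags included
print(heading.text)   # only the text inside the element
```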

In [72]:
# write your code here




<details>
<summary>
    <strong>Reveal answer 🤫</strong>
</summary>
<pre>
page_title = scraped.title.text
print(page_title)
</pre>
</details>

### Challenge 2: Print the *price* of the first product on the page

Remember how to locate a single element with BeautifulSoup.
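
As a reminder, here are three common ways to get the *first* matching element, sketched on a made-up list (the markup and variable names are hypothetical and unrelated to the TMall page). All three return a single element rather than a collection.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used only to demonstrate single-element lookups.
sample_html = """
<ul>
  <li class="fruit">apple</li>
  <li class="fruit">banana</li>
</ul>
"""
sample = BeautifulSoup(sample_html, "html.parser")

by_attribute = sample.li                     # tag-name shortcut: first <li>
by_find = sample.find("li", class_="fruit")  # first <li> with class "fruit"
by_css = sample.select_one("li.fruit")       # first match of a CSS selector

print(by_attribute.text, by_find.text, by_css.text)
```

Which one fits best depends on how specific the lookup needs to be: the tag-name shortcut is the shortest, while `find()` and `select_one()` let you narrow the match by class or other attributes.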

In [74]:
# write your code here




<details>
<summary>
    <strong>Reveal answer 🤫</strong>
</summary>
<pre>
first_price = scraped.em.text
print(first_price)
</pre>
</details>

### Challenge 3: Print *all* prices from the page

Use the BeautifulSoup methods that return a *collection* of elements. Remind yourself how to **loop** over them (the `for ... in ...` construct).

Here's how you can get rid of the currency symbol and convert the text to a numerical value (given that the element is in a variable called `price`):

`price = float(price.text.lstrip("¥"))`
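
To see what that conversion does step by step, here is a sketch on plain strings (the values in `raw_prices` are made up for illustration). Note that `lstrip("¥")` removes any leading `¥` characters, and `float()` then turns the remaining text into a number.

```python
# Hypothetical price strings, standing in for the text of price elements.
raw_prices = ["¥12.50", "¥8.00", "¥199"]

numbers = []
for raw in raw_prices:
    without_symbol = raw.lstrip("¥")  # drop the leading currency symbol
    value = float(without_symbol)     # convert the remaining text to a float
    numbers.append(value)
    print(value)
```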

In [75]:
# write your code here




<details>
<summary>
    <strong>Reveal answer 🤫</strong>
</summary>
<pre>
prices = scraped.find_all("em", title=True)
for price in prices:
    price_float = float(price.text.lstrip("¥"))
    print(price_float)
</pre>
</details>

### Challenge 4: Print the title of the first product

In [76]:
# write your code here




<details>
<summary>
    <strong>Reveal answer 🤫</strong>
</summary>
<pre>
first_title = scraped.find('p', class_ = 'productTitle').a['title']
print(first_title)
</pre>
</details>

### Challenge 4 (continued): Print *all* titles from the page

Let's do the same thing we did with prices: collect every product title and loop over them.

**Pro tip 🤓:** Don't blindly copy-paste code from the cell above; it needs a few corrections.
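
As a reminder of the general pattern, here is a sketch that collects elements by class and reads an attribute from a tag nested inside each one. The markup, class names and variable names are made up for illustration and differ from the TMall page on purpose.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: two "card" blocks and one block that should be skipped.
sample_html = """
<div class="card"><a href="/a" title="First link">one</a></div>
<div class="card"><a href="/b" title="Second link">two</a></div>
<div class="other"><a href="/c" title="Ignored link">three</a></div>
"""
sample = BeautifulSoup(sample_html, "html.parser")

# find_all() returns every matching element, not just the first one.
cards = sample.find_all("div", class_="card")

link_titles = []
for card in cards:
    # card.a is the first <a> inside the card; ["title"] reads its attribute.
    link_titles.append(card.a["title"])

print(link_titles)
```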

In [77]:
# write your code here




<details>
<summary>
    <strong>Reveal answer 🤫</strong>
</summary>
<pre>
titles = scraped.find_all('p', class_ = 'productTitle')
for title in titles:
    print(title.a['title'])
</pre>
</details>

### Challenge 5: Get the corresponding price for each title

This is what the resulting data structure should look like (a list of dictionaries):
```    
    [{'Title of the first product': 12.8}, {'Title of the second product': 9.9}]
```    
Note that the titles and prices here are placeholders; the real titles will be much longer.

A reminder of how you can append a dictionary to a list:

```python
title_prices = []

# Iterate over all products
    # Get the product's title as `title`
    # Get the product's price as `price`
    title_prices.append({title: price})
```    
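
To make the list-of-dictionaries shape concrete, here is a sketch with made-up data (`sample_pairs` and its values are hypothetical). It builds the list with `append()` and then reads every title and price back out.

```python
title_prices = []

# Hypothetical (title, price) pairs, standing in for values read from a page.
sample_pairs = [("Notebook A", 12.5), ("Notebook B", 8.0)]

for title, price in sample_pairs:
    # Each entry is a one-item dictionary: {title: price}
    title_prices.append({title: price})

# Reading the data back: loop over the list, then over each dictionary.
for entry in title_prices:
    for title, price in entry.items():
        print(f"{title}: {price}")
```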

In [78]:
title_prices = []

# write your code here

print(title_prices)

[]


<details>
<summary>
<strong>Reveal answer 🤫</strong>
</summary>
<pre>
title_prices = []

products = scraped.select(".product")

for product in products:
    title = product.find('p', class_ = 'productTitle').a['title']
    price = product.em.text
    price_float = float(price.lstrip("¥"))
    title_prices.append({title: price_float}) # Create a dictionary and append it to the list

print(title_prices)
</pre>
</details>