<p><img alt="Colaboratory logo" height="45px" src="/img/colab_favicon.ico" align="left" hspace="10px" vspace="0px"></p>

<h1>What is Colaboratory?</h1>

Colaboratory, or "Colab" for short, allows you to write and execute Python in your browser, with
- Zero configuration required
- Free access to GPUs
- Easy sharing

Whether you're a **student**, a **data scientist** or an **AI researcher**, Colab can make your work easier. Just get started below!

## **Getting started**

**Save a copy of this document to your Google Drive: Go to the upper left-hand corner of this page, click "File" and choose "Save a copy in Drive".** You will see a folder named "Colab Notebooks" in your Google Drive.

The document you are reading is not a static web page, but an interactive environment called a **Colab notebook** that lets you write and execute code.

For example, here is a **code cell** with a short Python script that computes a value, stores it in a variable, and prints the result:

In [None]:
seconds_in_a_day = 24 * 60 * 60
seconds_in_a_day

86400

To execute the code in the above cell, select it with a click and then either press the play button to the left of the code, or use the keyboard shortcut "Command/Ctrl+Enter". To edit the code, just click the cell and start editing.

Variables that you define in one cell can later be used in other cells:

In [None]:
seconds_in_a_week = 7 * seconds_in_a_day
seconds_in_a_week

604800

Evaluate the formula -2^2. In Python, exponentiation is written `**`; the `^` operator is bitwise XOR:

In [None]:
-2 ** 2

-4
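
Operator precedence matters here: `**` binds more tightly than unary minus, so `-2 ** 2` is read as `-(2 ** 2)`. The short sketch below contrasts that with the parenthesized form and with `^`, which in Python is bitwise XOR rather than a power operator. The variable names are only for illustration:

```python
# Exponentiation vs. bitwise XOR, and how unary minus interacts with each.
power = -2 ** 2        # parsed as -(2 ** 2)
squared = (-2) ** 2    # parentheses square the negative number
xor = -2 ^ 2           # bitwise XOR of -2 and 2, not a power

print(power, squared, xor)  # -4 4 -4
```

The XOR result matching the power result is a coincidence of these particular operands; `2 ^ 3` is `1`, while `2 ** 3` is `8`.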

Print out something:

In [None]:
print('Hello World!')

Hello World!


In [None]:
print(seconds_in_a_week)

604800


Colab notebooks allow you to combine **executable code** and **rich text** in a single document, along with **images**, **HTML**, **LaTeX** and more. When you create your own Colab notebooks, they are stored in your Google Drive account. You can easily share your Colab notebooks with co-workers or friends, allowing them to comment on your notebooks or even edit them.

## **Let's create a Price Tracker app**
You would like to design a price tracker that can help you monitor the price of a product you are interested in. The product (e.g., a gaming laptop) is available from multiple sellers on an e-commerce site (e.g., Amazon).

**Step 1: Install/import the necessary libraries**

In [7]:
import bs4
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd

**Step 2: Connect to a website and download the page content**

In [10]:
# We will scrape the list of MacBooks for sale on uwbookstore.com
# Please do not scrape the Amazon website in class. The UW IP address might be blocked if we send too many requests to Amazon.com
url = 'https://www.uwbookstore.com/Wisconsin-Badgers/Tech/MacBook'
page = requests.get(url)
#page.content
soup = bs(page.content, 'html.parser')

In [11]:
#page.content

In [12]:
soup

<!DOCTYPE html>
<html lang="en-US"><head><title>Just a moment...</title><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/><meta content="IE=Edge" http-equiv="X-UA-Compatible"/><meta content="noindex,nofollow" name="robots"/><meta content="width=device-width,initial-scale=1" name="viewport"/><style>*{box-sizing:border-box;margin:0;padding:0}html{line-height:1.15;-webkit-text-size-adjust:100%;color:#313131;font-family:system-ui,-apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica Neue,Arial,Noto Sans,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol,Noto Color Emoji}body{display:flex;flex-direction:column;height:100vh;min-height:100vh}.main-content{margin:8rem auto;max-width:60rem;padding-left:1.5rem}@media (width <= 720px){.main-content{margin-top:4rem}}.h2{font-size:1.5rem;font-weight:500;line-height:2.25rem}@media (width <= 720px){.h2{font-size:1.25rem;line-height:1.5rem}}#challenge-error-text{background-image:url(data:image/svg+xml;base64,PHN2ZyB4bWx
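
The page title in the output above, "Just a moment...", suggests that the server returned an anti-bot interstitial rather than the product listing, in which case the selectors in the next step find nothing. Below is a minimal sketch of a check that could run before parsing. The helper names `page_title` and `looks_blocked` and the list of status codes are assumptions made for illustration; they are not part of `requests` or BeautifulSoup.

```python
import re

def page_title(html: str) -> str:
    """Return the text of the first <title> element, or '' if there is none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""

def looks_blocked(status_code: int, html: str) -> bool:
    """Heuristic: a denial/throttling status or an interstitial title means 'blocked'."""
    # Assumed status codes; a site may signal blocking differently.
    if status_code in (403, 429, 503):
        return True
    return page_title(html).lower().startswith("just a moment")

# A shortened copy of the response shown above.
sample = '<html lang="en-US"><head><title>Just a moment...</title></head></html>'
print(looks_blocked(200, sample))
```

With the notebook's variables, the call would be `looks_blocked(page.status_code, page.text)`.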

**Step 3: Retrieve the data of interest**

In [21]:
merchsku = soup.find_all('span', class_='merchSKU')
merchsku

[]

In [25]:
product_id_list = []
for sku in merchsku:
  skunum = sku.text.strip()  # drop the padding whitespace around the SKU
  product_id_list.append(skunum)
  print(skunum)

19425308276
19594989902
19594989947
19594989992
19594990037
19594912417
19594912511
19594912605
19594912464
19594990103
19594990149
19594990195


In [26]:
product_name_list = []
merchtitle = soup.find_all('p', class_='merchTitle')
for title in merchtitle:
    itemtitle = title.text
    product_name_list.append(itemtitle)
    print(itemtitle)

MacBook Air 13" M2; 8GB Memory; 512GB SSD (Starlight)
MacBook Air 13" M2; 16GB; 256GB SSD (Space Gray)
MacBook Air 13" M2; 16GB; 256GB SSD (Silver)
MacBook Air 13" M2; 16GB; 256GB SSD (Starlight)
MacBook Air 13" M2; 16GB; 256GB SSD (Midnight)
MacBook Air 13" M3; 8GB Memory; 256GB SSD (Space Gray)
MacBook Air 13" M3; 8GB Memory; 256GB SSD (Silver)
MacBook Air 13" M3; 8GB Memory; 256GB SSD (Starlight)
MacBook Air 13" M3; 8GB Memory; 512GB SSD (Space Gray)
MacBook Air 13" M3; 16GB; 256GB SSD (Space Gray)
MacBook Air 13" M3; 16GB; 256GB SSD (Silver)
MacBook Air 13" M3; 16GB; 256GB SSD (Starlight)


In [27]:
price_list=[]
merchprice = soup.find_all('span', class_='merchPriceCurrent')
for price in merchprice:
    currentprice = price.text
    price_list.append(currentprice)
    print(currentprice)

$1,099.00
$899.00
$899.00
$899.00
$899.00
$999.00
$999.00
$999.00
$1,199.00
$999.00
$999.00
$999.00
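
The prices collected above are strings such as `$1,099.00`, so they cannot be compared or sorted numerically as they stand. Below is a minimal sketch of a converter. It assumes every price has a leading `$` and comma thousands separators, as in the output above; `parse_price` is a hypothetical helper name.

```python
from decimal import Decimal

def parse_price(text: str) -> Decimal:
    """Convert a string like '$1,099.00' to an exact decimal amount."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return Decimal(cleaned)

sample_prices = ["$1,099.00", "$899.00", "$999.00"]  # taken from the output above
amounts = [parse_price(p) for p in sample_prices]
print(min(amounts))  # 899.00
```

`Decimal` is used instead of `float` so that amounts in dollars and cents are stored exactly.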


In [None]:
# Create several lists to store the data we are interested in for each product
# The data includes the product name, price, and product number

In [37]:
image_list=[]
image = soup.find_all('img', class_='merchImage')
image

[<img alt='Image For MacBook Air 13" M2; 8GB Memory; 512GB SSD (Starlight)' class="merchImage img-responsive" src="/storeimages/177-1766043-1.png" width="200"/>,
 <img alt='Image For MacBook Air 13" M2; 16GB; 256GB SSD (Space Gray)' class="merchImage img-responsive" src="/storeimages/177-1872303-1.png" width="200"/>,
 <img alt='Image For MacBook Air 13" M2; 16GB; 256GB SSD (Silver)' class="merchImage img-responsive" src="/storeimages/177-1872306-1.png" width="200"/>,
 <img alt='Image For MacBook Air 13" M2; 16GB; 256GB SSD (Starlight)' class="merchImage img-responsive" src="/storeimages/177-1872309-1.png" width="200"/>,
 <img alt='Image For MacBook Air 13" M2; 16GB; 256GB SSD (Midnight)' class="merchImage img-responsive" src="/storeimages/177-1872312-1.png" width="200"/>,
 <img alt='Image For MacBook Air 13" M3; 8GB Memory; 256GB SSD (Space Gray)' class="merchImage img-responsive" src="/storeimages/177-1852252-4.png" width="200"/>,
 <img alt='Image For MacBook Air 13" M3; 8GB Memory; 2

In [38]:
for im in image:
  link = 'www.uwbookstore.com/' + im.get('src')
  image_list.append(link)
  print(link)

www.uwbookstore.com//storeimages/177-1766043-1.png
www.uwbookstore.com//storeimages/177-1872303-1.png
www.uwbookstore.com//storeimages/177-1872306-1.png
www.uwbookstore.com//storeimages/177-1872309-1.png
www.uwbookstore.com//storeimages/177-1872312-1.png
www.uwbookstore.com//storeimages/177-1852252-4.png
www.uwbookstore.com//storeimages/177-1852255-4.png
www.uwbookstore.com//storeimages/177-1852258-4.png
www.uwbookstore.com//storeimages/177-1852264-4.png
www.uwbookstore.com//storeimages/177-1872122-1.png
www.uwbookstore.com//storeimages/177-1872125-1.png
www.uwbookstore.com//storeimages/177-1872128-1.png
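
Joining the host and the `src` value with `+` gives the double slash seen above and leaves out the scheme, so the result is not a complete URL. Below is a sketch of an alternative using the standard library's `urljoin`, which resolves a relative path against the page URL. The `src` values are copied from the output above.

```python
from urllib.parse import urljoin

page_url = "https://www.uwbookstore.com/Wisconsin-Badgers/Tech/MacBook"
src_values = ["/storeimages/177-1766043-1.png", "/storeimages/177-1872303-1.png"]

# urljoin keeps the scheme and host of page_url and replaces the path,
# because each src starts with a single "/".
absolute_links = [urljoin(page_url, src) for src in src_values]
for link in absolute_links:
    print(link)
```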


**Step 4: Create a data frame to store all product information**

In [39]:
df = pd.DataFrame({'Product':product_name_list,'ID':product_id_list, 'Price':price_list, 'ImageURL': image_list})
df.head(10)  # Display the first 10 rows of the data

| | Product | ID | Price | ImageURL |
|---|---|---|---|---|
| 0 | MacBook Air 13" M2; 8GB Memory; 512GB SSD (Sta... | 19425308276 | $1,099.00 | www.uwbookstore.com//storeimages/177-1766043-1... |
| 1 | MacBook Air 13" M2; 16GB; 256GB SSD (Space Gray) | 19594989902 | $899.00 | www.uwbookstore.com//storeimages/177-1872303-1... |
| 2 | MacBook Air 13" M2; 16GB; 256GB SSD (Silver) | 19594989947 | $899.00 | www.uwbookstore.com//storeimages/177-1872306-1... |
| 3 | MacBook Air 13" M2; 16GB; 256GB SSD (Starlight) | 19594989992 | $899.00 | www.uwbookstore.com//storeimages/177-1872309-1... |
| 4 | MacBook Air 13" M2; 16GB; 256GB SSD (Midnight) | 19594990037 | $899.00 | www.uwbookstore.com//storeimages/177-1872312-1... |
| 5 | MacBook Air 13" M3; 8GB Memory; 256GB SSD (Spa... | 19594912417 | $999.00 | www.uwbookstore.com//storeimages/177-1852252-4... |
| 6 | MacBook Air 13" M3; 8GB Memory; 256GB SSD (Sil... | 19594912511 | $999.00 | www.uwbookstore.com//storeimages/177-1852255-4... |
| 7 | MacBook Air 13" M3; 8GB Memory; 256GB SSD (Sta... | 19594912605 | $999.00 | www.uwbookstore.com//storeimages/177-1852258-4... |
| 8 | MacBook Air 13" M3; 8GB Memory; 512GB SSD (Spa... | 19594912464 | $1,199.00 | www.uwbookstore.com//storeimages/177-1852264-4... |
| 9 | MacBook Air 13" M3; 16GB; 256GB SSD (Space Gray) | 19594990103 | $999.00 | www.uwbookstore.com//storeimages/177-1872122-1... |
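
`pd.DataFrame` raises a `ValueError` when the lists passed in a dictionary have different lengths, which can happen if one selector misses a product. Below is a small sketch of a check that reports each column's length before the data frame is built. `check_equal_lengths` is a hypothetical helper, and the toy lists stand in for the scraped ones.

```python
def check_equal_lengths(**columns) -> int:
    """Return the shared length of all lists, or raise listing each length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)

# Toy data standing in for the scraped lists.
names = ["A", "B", "C"]
prices = ["$1.00", "$2.00", "$3.00"]
print(check_equal_lengths(Product=names, Price=prices))  # 3
```

In the notebook this could be called as `check_equal_lengths(Product=product_name_list, ID=product_id_list, Price=price_list, ImageURL=image_list)` just before creating `df`.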


**Step 5: Save the data to an Excel file**

In [29]:
df.to_excel('University_MacBook.xlsx', index=False)
# With index=True, pandas writes one extra column holding the row index (0, 1, 2, ...)

**Step 6: Download the Excel file to your local machine**

In [30]:
from google.colab import files
files.download("University_MacBook.xlsx")


## **Exercise**

**Can you extend the Python program to also extract the link to the product page for each product?**

A link on a webpage is enclosed in a pair of tags (`<a> </a>`). Inspecting the webpage shows that the tags of interest have the class value `merchLink`. Therefore, we use the line of code below to find the link tag for each product:

In [41]:
link_tag = soup.find_all('a', attrs={'class':'merchLink'})
link_tag

[<a class="merchLink displayb bottom10" href="//www.uwbookstore.com/Wisconsin-Badgers/Tech/MacBook/MacBook-Air-13-M2-8GB-Memory-512GB-SSD-Starlight" tabindex="0">
 <div class="imageWrapper">
 <div class="bottom10 merchImageWrapper">
 <!--Merch Image-->
 <img alt='Image For MacBook Air 13" M2; 8GB Memory; 512GB SSD (Starlight)' class="merchImage img-responsive" src="/storeimages/177-1766043-1.png" width="200"/>
 </div>
 </div>
 <p class="merchTitle top0 textc lead" data-id="1766043">MacBook Air 13" M2; 8GB Memory; 512GB SSD (Starlight)</p>
 </a>,
 <a class="merchLink displayb bottom10" href="//www.uwbookstore.com/MerchDetail?MerchID=1872303&amp;CategoryName=MacBook&amp;CatID=28368&amp;Name=MacBook" tabindex="0">
 <div class="imageWrapper">
 <div class="bottom10 merchImageWrapper">
 <!--Merch Image-->
 <img alt='Image For MacBook Air 13" M2; 16GB; 256GB SSD (Space Gray)' class="merchImage img-responsive" src="/storeimages/177-1872303-1.png" width="200"/>
 </div>
 </div>
 <p class="merchT

The link (or a partial link that does not include the head of a full URL) is the value of the `href` attribute inside the `<a>` tag:

In [43]:
for tag in link_tag:
  link = tag.get('href')
  print(link)

//www.uwbookstore.com/Wisconsin-Badgers/Tech/MacBook/MacBook-Air-13-M2-8GB-Memory-512GB-SSD-Starlight
//www.uwbookstore.com/MerchDetail?MerchID=1872303&CategoryName=MacBook&CatID=28368&Name=MacBook
//www.uwbookstore.com/MerchDetail?MerchID=1872306&CategoryName=MacBook&CatID=28368&Name=MacBook
//www.uwbookstore.com/MerchDetail?MerchID=1872309&CategoryName=MacBook&CatID=28368&Name=MacBook
//www.uwbookstore.com/MerchDetail?MerchID=1872312&CategoryName=MacBook&CatID=28368&Name=MacBook
//www.uwbookstore.com/Wisconsin-Badgers/Tech/MacBook/MacBook-Air-13-M3-8GB-Memory-256GB-SSD-Space-Gray
//www.uwbookstore.com/Wisconsin-Badgers/Tech/MacBook/MacBook-Air-13-M3-8GB-Memory-256GB-SSD-Silver
//www.uwbookstore.com/Wisconsin-Badgers/Tech/MacBook/MacBook-Air-13-M3-8GB-Memory-256GB-SSD-Starlight
//www.uwbookstore.com/Wisconsin-Badgers/Tech/MacBook/MacBook-Air-13-M3-8GB-Memory-512GB-SSD-Space-Gray
//www.uwbookstore.com/MerchDetail?MerchID=1872122&CategoryName=MacBook&CatID=28368&Name=MacBook
//www.uwboo

In [44]:
link

'//www.uwbookstore.com/MerchDetail?MerchID=1872128&CategoryName=MacBook&CatID=28368&Name=MacBook'
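
The `merchLink` output above shows that each `<a>` tag also wraps the product's image and title, so the link, title, and image path can be read from the same tag and stay aligned row by row. Below is a sketch that runs on a trimmed copy of that markup, so it needs no network request. The names `card_html`, `card_soup`, and `rows` are made up for this example, and the `alt` text has been shortened.

```python
from bs4 import BeautifulSoup

# Trimmed copy of one product card from the output above.
card_html = '''
<a class="merchLink displayb bottom10" href="//www.uwbookstore.com/Wisconsin-Badgers/Tech/MacBook/MacBook-Air-13-M2-8GB-Memory-512GB-SSD-Starlight" tabindex="0">
 <img alt="Image" class="merchImage img-responsive" src="/storeimages/177-1766043-1.png" width="200"/>
 <p class="merchTitle top0 textc lead" data-id="1766043">MacBook Air 13" M2; 8GB Memory; 512GB SSD (Starlight)</p>
</a>
'''
card_soup = BeautifulSoup(card_html, "html.parser")

rows = []
for tag in card_soup.find_all("a", class_="merchLink"):
    rows.append({
        # Title and image are looked up inside this card only.
        "Product": tag.find("p", class_="merchTitle").get_text(strip=True),
        "ProductURL": "https:" + tag.get("href"),  # href starts with //
        "ImageSrc": tag.find("img", class_="merchImage").get("src"),
    })
print(rows[0]["ProductURL"])
```

A list of dictionaries like `rows` can be passed directly to `pd.DataFrame`.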

Now, can you write a loop to visit each MacBook product page and scrape its detailed description?

In [63]:
for tag in link_tag:
  link = tag.get('href')
  product_url = 'https:' + link  # the href is protocol-relative (starts with //)
  product_page = requests.get(product_url)
  macbook_page = bs(product_page.content, 'html.parser')
  details = macbook_page.find('div', class_="merchDesc")
  if details is None:  # skip pages that have no description block
    continue
  text_body = details.get_text(separator='\n', strip=True)
  print(text_body)

The price displayed is our special educational price available to UW Students, Alumni, Faculty, Staff, & UW Health Employees.
The regular price was $1199- you’re saving $250!
Super portable. Supercharged for school.
Supercharged by the next-generation M2 chip, the redesigned MacBook Air combines incredible performance and up to 18 hours of battery life into its strikingly thin aluminum enclosure.
1
Choose from four gorgeous colors and fly through any course load with style.
More Info
M2 chip with next-generation CPU, GPU, and machine learning performance
Faster 8-core CPU and 8-core GPU to power through complex school projects
2
16-core Neural Engine for advanced machine learning tasks
Go all day with up to 18 hours of battery life
1
Fanless design for silent operation
13.6-inch Liquid Retina display with 500 nits of brightness and P3 wide color for incredible images
3
1080p FaceTime HD camera with 2x resolution and low-light performance
3-microphone array
4-speaker sound system with S