Basic Scraping Exercise
Take the list of Canadian PGA winners at the Wiki link below and turn it into a list of tuples with the year, first name, and last name as entries in each tuple.
- Example list of tuples: [(2014, "Alex", "Miller"), (2013, "Bob", "Ross"), ...]
I encourage you to do it in 100% Python for practice (but also try out the Google Sheets method).
Here is some boilerplate PyQuery code to get you started:
import requests
from pyquery import PyQuery as PQ

# Insert the URL you want to scrape here
url = ""

# Initialize the PyQuery object
r = requests.get(url)
raw_html = r.text
pq = PQ(raw_html)

# Select just the set of elements that you want to extract
# Use CSS selectors! (Assign the selection to a variable named "elements")

# Once you have the elements identified, extract the text
texts = []
for el in elements:
    # extract text
    # Append to "texts" list
    pass  # replace with your code

# Split the text into year, first name, and last name
data = []
for text in texts:
    # Split text into a tuple of the 3 data points
    # Append the tuple to the "data" list
    pass  # replace with your code
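If the last step is unclear, here is a minimal sketch of splitting one line of text into a tuple. It assumes each extracted string looks like "2014 Alex Miller" (year, then first name, then last name, separated by spaces); the text on the real page may be laid out differently, so adjust accordingly. The helper name split_winner and the sample strings are made up for illustration.

```python
def split_winner(text):
    """Split a string like '2014 Alex Miller' into (year, first_name, last_name)."""
    # Assumption: the fields are whitespace-separated and appear in this order.
    # maxsplit=2 keeps multi-word last names together in the final field.
    year, first_name, last_name = text.strip().split(maxsplit=2)
    return (int(year), first_name, last_name)


# Hypothetical sample strings, not taken from the real page
texts = ["2014 Alex Miller", "2013 Bob Ross"]
data = [split_winner(text) for text in texts]
print(data)  # [(2014, 'Alex', 'Miller'), (2013, 'Bob', 'Ross')]
```

A string with fewer than three fields raises a ValueError here, which is a handy way to spot list items that are not winner entries.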
Try to do the task without looking at this tip, but if you get stuck extracting the list items from the HTML, go ahead and click.
The CSS selector for the list items is...
ul = pq("ul")
list_items = PQ(ul)("li")