Introduction to web scraping with requests and BeautifulSoup.
This workshop is designed for beginner to intermediate programmers with little or no web-scraping experience. It focuses on using Python to scrape online data that can be accessed without an API. Some familiarity with Python is preferred but not required.
Participants will learn how to read HTML pages and how to use the Python libraries "requests" and "BeautifulSoup" to scrape online data. We will cover some of the most common challenges (and their solutions) encountered in static web scraping.
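To give a flavor of how the two libraries fit together, here is a minimal sketch, not taken from the workshop notebook. The helper names `fetch_html` and `extract_links` and the sample HTML are hypothetical and chosen only for illustration; "requests" downloads the page and "BeautifulSoup" parses it.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.text


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


# Hypothetical HTML snippet standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h1>Course list</h1>
  <ul>
    <li><a href="/courses/101">Intro to Sociology</a></li>
    <li><a href="/courses/202">Research Methods</a></li>
    <li><a>No link here</a></li>
  </ul>
</body></html>
"""

links = extract_links(SAMPLE_HTML)
print(links)
```

On a live site, you would pass the result of `fetch_html(url)` to `extract_links` instead of the sample string. The third `<a>` tag is skipped because it has no `href` attribute.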
By the end of this workshop, you will be able to...
- Read an HTML page and evaluate it
- Use the library “requests” to interact with websites
- Use the library “BeautifulSoup” to parse and get data from websites
- Understand some of the key tasks in static web scraping (handling missing data, handling errors, and moving through multiple pages of results)
- Conceptualize web scraping as a process that goes from the website to the cleaned data
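The key tasks listed above (missing data, errors, and turning pages) can be sketched roughly as follows. This is an illustrative example under stated assumptions, not the workshop's own code: the `PAGES` dictionary is a made-up two-page site that stands in for real HTTP responses, and the class names `review`, `title`, `rating`, and `next` are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical two-page site, keyed by URL path, standing in for real responses.
PAGES = {
    "/reviews?page=1": """
        <div class="review"><span class="title">Great film</span>
            <span class="rating">9</span></div>
        <div class="review"><span class="title">No score given</span></div>
        <a class="next" href="/reviews?page=2">Next</a>
    """,
    "/reviews?page=2": """
        <div class="review"><span class="title">Too long</span>
            <span class="rating">4</span></div>
    """,
}


def parse_reviews(html):
    """Return (rows, next_url) for one page; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for review in soup.find_all("div", class_="review"):
        title = review.find("span", class_="title")
        rating = review.find("span", class_="rating")
        rows.append({
            # find() returns None when the tag is absent, so check before use.
            "title": title.get_text(strip=True) if title else None,
            "rating": int(rating.get_text(strip=True)) if rating else None,
        })
    next_link = soup.find("a", class_="next")
    next_url = next_link["href"] if next_link else None
    return rows, next_url


def scrape_all(start_url, get_html):
    """Follow 'next' links from start_url, collecting rows from every page."""
    results, url = [], start_url
    while url is not None:
        try:
            html = get_html(url)
        except KeyError:
            # With the stand-in dictionary, an unknown page raises KeyError.
            print(f"Could not load {url}; stopping.")
            break
        rows, url = parse_reviews(html)
        results.extend(rows)
    return results


reviews = scrape_all("/reviews?page=1", PAGES.__getitem__)
print(reviews)
```

With a real website, `get_html` would be a function built on `requests.get`, and the `except` clause would catch `requests.RequestException` instead of `KeyError`. The resulting list of dictionaries is the "cleaned data" end of the process and can be handed to a library such as pandas.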
In the first part of the workshop, we will learn the basics of the libraries "requests" and "BeautifulSoup". Then, we will apply them by scraping two websites: a University of Arizona website and the IMDb movie review website.
Launch the workshop on MyBinder to run it interactively in your browser (no installation required; the build should take between 30 and 60 seconds). You can also download the Jupyter notebook contained in this repository; in that case, you need Jupyter and Python installed on your computer.
This workshop is licensed under CC-BY-4.0 by Sabrina Nardin