Introduction to web scraping

Overview

This is a one-hour beginner's introduction to web scraping using Python. We'll work through a complete example: scraping a university website that lists course information to build a dataset of almost 10,000 courses. We'll focus on the concepts involved in web scraping rather than on memorizing Python syntax.

What you'll learn

  • Why you'd want to scrape data from the web in the first place
  • A high-level view of how the web works
  • How to make an HTTP request in Python
  • How to parse HTML in Python
  • Why you need to read a website's Terms of Service before you scrape it
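
As a rough preview of the two technical steps listed above (making an HTTP request and parsing HTML), here is a minimal sketch that uses only Python's standard library. It is not taken from the workbook, which may use different libraries. The names `LinkCollector`, `fetch_html`, and `extract_links`, the sample HTML, and the `example.edu` URL are all hypothetical and chosen purely for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are currently inside
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def fetch_html(url):
    """Make an HTTP GET request and return the response body as text."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def extract_links(html, base_url):
    """Parse HTML and return (absolute_url, link_text) pairs."""
    parser = LinkCollector()
    parser.feed(html)
    return [(urljoin(base_url, href), text) for href, text in parser.links]


# Hypothetical page fragment, so the parsing step runs without a network call.
SAMPLE_HTML = """
<ul>
  <li><a href="/courses/stat-101">Intro to Statistics</a></li>
  <li><a href="/courses/ling-100">Intro to Linguistics</a></li>
</ul>
"""

links = extract_links(SAMPLE_HTML, "https://example.edu/catalog/")
for url, title in links:
    print(url, title)
```

Against a real site, `fetch_html(url)` would supply the HTML string in place of `SAMPLE_HTML`, but only after checking that the site's Terms of Service allow scraping.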

Prerequisites

Anyone is welcome at this workshop, whatever their level of programming experience, because we'll focus on the concepts behind web scraping more than the specific syntax. The workshop will be most useful to people who have some familiarity with Python but have never done web scraping before.

IOKN2K

It's OK Not To Know! That's our motto at D-Lab. D-Lab is open to researchers and professionals from all disciplines and levels of experience. Ask any questions.

Contributing

If you spot a problem with these materials, please open an issue describing the problem.

Author

  • Geoff Bacon

Acknowledgments

  • Chris Hench
