Scrape the Gibson
These code snippets are the core of a post I wrote about web scraping in Python. They're aimed at people who have already done a bit of coding but want to explore scraping in
Python in more depth. The workshop will be much easier if you have a Mac or Linux-based computer.
Download repo: https://github.com/abelsonlive/scrape-the-gibson
- If you don't have pip installed, type:
sudo easy_install pip
- Change directories into the downloaded repo
- Now run:
sudo pip install -r requirements.txt
- Getting started with scraping in Python using requests
- Exploring HTML documents and extracting the data with BeautifulSoup
- Saving scraped data to a database with dataset
- Thinking about ETL (Extract, Transform, Load)
- Keeping your source data around
- Running multiple requests in parallel to scrape faster
- Using regular expressions to extract more data
- Programmatic crawling of entire sites
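Several of the topics above (parallel requests, regular expressions, keeping the source data around) can be sketched together in a few lines. This is a minimal illustration, not the code from the repo: the names `LINK_RE`, `fetch`, `transform`, and `scrape`, the link pattern, and the default of four worker threads are all assumptions made for this example.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pattern: pulls href values out of anchor tags.
# A real HTML parser such as BeautifulSoup is more robust than a regex.
LINK_RE = re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)


def fetch(url):
    """Extract step: download one page and return its HTML as text."""
    import requests  # imported here so the rest of the sketch loads without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def transform(url, html):
    """Transform step: keep the raw HTML next to the extracted links."""
    return {"url": url, "raw_html": html, "links": LINK_RE.findall(html)}


def scrape(urls, fetcher=fetch, max_workers=4):
    """Fetch the URLs in parallel threads, then transform each page."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(fetcher, urls))  # map preserves input order
    return [transform(url, html) for url, html in zip(urls, pages)]
```

Calling `scrape(["https://example.com"])` returns one dict per URL. Passing a different `fetcher` (for example, one that reads saved HTML files from disk) lets you rerun the transform step without hitting the site again, which is the main reason to keep the raw source.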
There are plenty of existing resources on scraping. A few links: