Lots and lots of web scrapers
boardgamegeek
cia
craigslist
ncaabb
pastes
podcasts
reddit
trackobot
.gitignore
LICENSE
README.md
requirements.txt

README.md

Practical Webscraping

This repository aims to be a collection of useful web scraping examples written in Python. All example scripts are provided as-is and are free to use according to the terms laid out in the LICENSE file.

Each subdirectory contains scrapers relevant to a single service. For example, the "reddit" subdirectory contains scrapers for Reddit and nothing else.

In most cases there are two scrapers that accomplish the same thing; the only difference between them is the set of libraries used. Most commonly a scraper is written with the Scrapy library. Where a second scraper exists, it is written with requests and BeautifulSoup. A minimal sketch of each style follows.
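
To illustrate the difference between the two styles, here is a minimal sketch of each approach. The URL, spider name, and selectors below are placeholders for illustration only and are not taken from any scraper in this repository.

```python
# requests + BeautifulSoup sketch: fetch a page and pull out every <h2> heading.
import requests
from bs4 import BeautifulSoup

def scrape_headings(url):
    """Return the text of every <h2> element on the given page."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return [h.get_text(strip=True) for h in soup.find_all("h2")]

if __name__ == "__main__":
    for heading in scrape_headings("https://example.com"):
        print(heading)
```

The equivalent Scrapy version defines a spider class and lets the framework handle fetching and output:

```python
# Scrapy sketch: the same scrape expressed as a spider.
import scrapy

class HeadingSpider(scrapy.Spider):
    name = "headings"
    start_urls = ["https://example.com"]

    def parse(self, response):
        # Yield one item per <h2> heading on the page.
        for text in response.css("h2::text").getall():
            yield {"heading": text.strip()}
```

A spider like this can be run without a full Scrapy project via `scrapy runspider heading_spider.py -o headings.json`, which writes the yielded items to a JSON file.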