This project contains the code of the spider described in my blog post "Crawl a website with Scrapy".
This spider crawls the website http://isbullsh.it and extracts the following information about each blog post:
- title
- author
- tag(s)
- release date
- url
- HTML formatted text
- location
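As a rough illustration of the record the spider builds for each post, here is a minimal sketch using a plain dataclass. The class name `BlogPost` and all field names are assumptions made for this example, and the project itself may well define a `scrapy.Item` subclass instead. Scrapy 2.2 and later accept dataclass instances as items, so a structure like this could be yielded from a spider as-is.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class BlogPost:
    """Hypothetical record for one crawled blog post (names are assumptions)."""
    title: str
    author: str
    url: str
    release_date: str  # kept as a raw string; the site's date format is not assumed
    html_text: str     # HTML formatted body of the post
    location: str = ""
    tags: List[str] = field(default_factory=list)


# Build one record with placeholder values and convert it to a plain dict,
# which is the shape a document store such as MongoDB would accept.
post = BlogPost(
    title="Example title",
    author="Example author",
    url="http://isbullsh.it/example",  # placeholder URL, not a real post
    release_date="2012-04-01",         # placeholder date
    html_text="<p>Example body</p>",
    location="Example location",
    tags=["example"],
)
record = asdict(post)
```

Keeping the tags as a list reflects the "tag(s)" entry above: a post may carry one tag or several.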
We implement the spider using Scrapy. Running it requires:
- Scrapy:
    pip install Scrapy
- pymongo:
    pip install pymongo
- An installed MongoDB server
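Before launching the spider, it can help to confirm that these requirements are in place. The following is a small sketch, not part of the project: the helper names `missing_modules` and `port_open` are made up for this example, and it assumes MongoDB listens on `localhost` at its default port 27017.

```python
import importlib.util
import socket


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Scrapy is imported as `scrapy`, pymongo as `pymongo`.
    print("Missing Python packages:", missing_modules(["scrapy", "pymongo"]))
    # Assumption: MongoDB runs locally on its default port.
    print("MongoDB reachable:", port_open("localhost", 27017))
```

An open port only shows that something is listening there. It does not prove that the listener is MongoDB or that it accepts connections from pymongo.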
Release the spider by entering:
    scrapy crawl isbullshit