Ever wanted to learn the basics of writing a Web Crawler / Scraper? Look no further!
Chapter 1

Web Crawling 101 - Ongoing Project

Wondering what this is all about? Take 2 minutes to read our short Open Code, Open Data Manifesto.

This project is structured to work as a series of classes focused on bootstrapping your data-mining / web-crawling knowledge. Some of the topics that are covered here:

How do I Start?

Keep this project's Wiki open at all times: most of the text and references live there, and you will need them as you advance through the chapters/classes of this project.

Start each chapter by going to the Wiki first, and only after reading its text, proceed to the code.

Take your time: read the code comments, run the code, modify it, and run it again to understand the impact of each change.

Happy hacking :)


  1. Install pip (using terminal/command prompt navigate to the "Setup" directory and run python
  2. Reload your terminal/command prompt (close it and open it again)
  3. Make sure pip is installed by running: pip freeze
  4. If it is, you can now install the needed dependencies by running from the root of the project: pip install -U -r Setup/requirements.txt
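If you would rather run steps 3 and 4 from Python than type them by hand, they could be sketched roughly as follows. This is only an illustration, not part of the project: the function names (`pip_is_available`, `build_install_command`, `install_requirements`) are hypothetical, and the sketch assumes it is run from the root of the project so that `Setup/requirements.txt` resolves.

```python
import importlib.util
import subprocess
import sys


def pip_is_available():
    """Return True if the current interpreter can find the pip module."""
    return importlib.util.find_spec("pip") is not None


def build_install_command(requirements_path="Setup/requirements.txt"):
    """Build the command from step 4, invoked through the current interpreter."""
    # "python -m pip" targets the same interpreter that runs this script,
    # which avoids picking up a pip that belongs to another Python install.
    return [sys.executable, "-m", "pip", "install", "-U", "-r", requirements_path]


def install_requirements(requirements_path="Setup/requirements.txt"):
    """Install the dependencies; raise if pip is missing or the install fails."""
    if not pip_is_available():
        raise RuntimeError("pip was not found; complete step 1 first")
    subprocess.run(build_install_command(requirements_path), check=True)
```

Calling `install_requirements()` from the root of the project should have the same effect as the `pip install` command in step 4.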

About Me

Marcello Lins is passionate about technology and crunching data for fun. Feel free to connect with me through LinkedIn and find out more about what I'm working on via my AboutMe Profile. Visit for more awesomeness!


