This is the code used in the blog post on www.databrawl.com. The project contains scripts for scraping www.google.com for the most popular programming blogs, collecting their traffic data by scraping www.statshow.com, and running various analyses on that data.
Note: This project uses Python 3.6.
The following commands create a virtual environment and install the dependencies. Run them from the directory where you want the project to reside. Note: virtualenvwrapper is required.
$ mkvirtualenv blog_analysis -p /usr/bin/python3.6
$ pip install -r requirements.txt
Available commands:
- Scrape the top programming blogs, classified by programming language
$ python blog_analysis/run.py blogs
- Scrape traffic data for the blogs
$ python blog_analysis/run.py traffic
- Analyze the traffic data and run data visualization routines
$ python blog_analysis/run.py analyzer
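All three commands go through a single entry point that takes the command name as a positional argument. The sketch below shows one way such a dispatcher could be written with the standard-library `argparse` module. It is an assumption, not the actual contents of `blog_analysis/run.py`: the handler functions `scrape_blogs`, `scrape_traffic` and `run_analyzer` are hypothetical placeholders that only return a label.

```python
import argparse
from typing import Callable, Dict, List, Optional


# Hypothetical placeholder handlers; the real project's functions are not shown here.
def scrape_blogs() -> str:
    return "scraping blogs"


def scrape_traffic() -> str:
    return "scraping traffic"


def run_analyzer() -> str:
    return "running analyzer"


# Map each command name listed above to its (hypothetical) handler.
COMMANDS: Dict[str, Callable[[], str]] = {
    "blogs": scrape_blogs,
    "traffic": scrape_traffic,
    "analyzer": run_analyzer,
}


def main(argv: Optional[List[str]] = None) -> str:
    parser = argparse.ArgumentParser(description="Programming blog analysis")
    # choices= makes argparse reject unknown commands with a usage error (SystemExit).
    parser.add_argument("command", choices=sorted(COMMANDS))
    # argv=None means argparse reads the real command line from sys.argv.
    args = parser.parse_args(argv)
    return COMMANDS[args.command]()
```

For example, `main(["traffic"])` calls the traffic placeholder. A table of handlers keyed by command name keeps adding a new subcommand down to one dictionary entry, and `choices=` gives a helpful error for typos such as `analyser`.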