Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
Tested on: Linux Mint 12/13, python 2.7, BeautifulSoup 4 How to install: sudo apt-get install python-qt4 sudo easy_install request sudo apt-get install python-mysqldb sudo apt-get install mysql-server sudo easy_install requests sudo easy_install beautifulsoup4 sudo easy_install pyvirtualdisplay sudo apt-get install xvfb depending on you configuration you may need some additional packages Setup: Create a ‘Site Samples Directory’ at the same level as the chains.py file. Install mysql, modify the createDB.sql file as desired and source it in the mysql prompt. At the mysql prompt, source the mywot.sql file. Make sure to modify batch.py, chains.py and mywot/mywot.py according to how you modified How to run: “-batch” tells the script to run batch mode “-skipcomments” tells the script to not collect comments from MYWOT The chains.py takes files in the following format (example): http://bit.ly/jU0kgr nonspam 1 1309265860 1309265860 ./chains.py -batch datasets/URLs_spam.txt 0 10000 -skipcomments | tee -a spamRun.txt Would follow chains starting at the first URL in URLs_spam.txt up until URL 10000 without scraping comments. batch.py is essentially the same interface however it expects a file in one of the following formats: http://bit.ly/jU0kgr http://bit.ly/jU0kgr 1234 where 1234 is the popularity of a given URL. If using popularity with batch.py you must provide the “-hasPop” flag. Located at: https://github.com/dmritard96/mywot-coverage