html
pdf
README.md
cookies.txt
jam.txt
list.txt
load.scr
paper.txt
paper.txt~
proxy.list
result
retractshons.ipynb
search.txt
search.txt~
top10.csv
top5.csv

README.md

This repo contains an IPython notebook that we use to build a list of URLs to download.

We download them with wget, driven by the .scr file, which comes from a bash script I found on the internet.
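
For anyone who would rather drive the download from Python, here is a minimal sketch. It is not the contents of the .scr file. It assumes list.txt holds one URL per line, cookies.txt is the saved cookie file, and the pages go into the html directory. The helper names and the 5-second wait are made up for illustration.

```python
import subprocess


def build_wget_command(url_file="list.txt", cookie_file="cookies.txt",
                       out_dir="html", wait_seconds=5):
    """Assemble a wget invocation (hypothetical helper, not load.scr itself)."""
    return [
        "wget",
        f"--input-file={url_file}",       # read the URLs to fetch from a file
        f"--load-cookies={cookie_file}",  # send the cookies saved from the browser
        f"--directory-prefix={out_dir}",  # store downloaded pages under this directory
        f"--wait={wait_seconds}",         # pause between requests
        "--random-wait",                  # vary the pause so requests are less regular
    ]


def download(**kwargs):
    """Run wget and return its exit code (non-zero if any URL failed)."""
    return subprocess.run(build_wget_command(**kwargs), check=False).returncode
```

Calling `download()` with no arguments uses the assumed file names above. Pass `url_file=...` or `cookie_file=...` to point it elsewhere.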

Every 400 pages or so, Google blocks us and challenges us to prove we're not a robot.
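
One way to live with the block is to split the URL list into batches and refresh the cookie between them. This is a sketch under assumptions: the batch size of 400 is just the rough figure mentioned above, not a known limit, and `batches` is a hypothetical helper.

```python
def batches(urls, size=400):
    """Yield consecutive slices of `urls`, each at most `size` items long."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]
```

Each batch can be written to its own URL file and downloaded separately, with a pause in between to get a fresh cookie.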

When that happens, we delete our existing cookies, prove we're not a robot, obtain a new cookie, and save it to the working directory using a 'save cookie' plugin for Firefox.
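
Before restarting the download it can help to check that the new cookie file actually loads. A minimal sketch, assuming the plugin writes the Netscape cookies.txt format (the format wget's --load-cookies reads). The function name is hypothetical.

```python
from http.cookiejar import MozillaCookieJar


def load_cookie_names(path="cookies.txt"):
    """Return the sorted names of the cookies in a Netscape-format cookie file."""
    jar = MozillaCookieJar(path)
    # Keep session and expired cookies too, so we see everything in the file.
    jar.load(ignore_discard=True, ignore_expires=True)
    return sorted(cookie.name for cookie in jar)
```

If the file is missing or not in the expected format, `load` raises an error, which is a cheap signal that the export went wrong.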

Once we have the pages, we process them in the IPython notebook.
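
The processing step is roughly "read each saved page, pull things out with a regex, tally them up". The sketch below shows that shape only. The regex (four-digit years) is a placeholder, since the notebook's real patterns are not shown here, and it swaps in the standard library's Counter for pandas so it runs without extra dependencies. All names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Placeholder pattern (four-digit years); the notebook's actual regex differs.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def read_pages(directory="html"):
    """Read every file in `directory` as text, replacing undecodable bytes."""
    paths = sorted(p for p in Path(directory).glob("*") if p.is_file())
    return [p.read_text(errors="replace") for p in paths]


def tally_matches(pages, pattern=YEAR_PATTERN):
    """Count every regex match across all page texts."""
    counts = Counter()
    for text in pages:
        counts.update(pattern.findall(text))
    return counts
```

`tally_matches(read_pages()).most_common(10)` would then give a top-ten list from the downloaded pages.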

@thatdavidmiller wrote the functions in the IPython notebook. I wrote the pandas code and the wonky regex.

You can see the IPython notebook here: http://nbviewer.ipython.org/github/drcjar/retractshons/blob/master/retractshons.ipynb