Scrapes citation statistics from Google Scholar
tablesorter
README.md
citations-bmi-yr.html
citations-bmi-yr.js
citations-bmi.html
citations-bmi.js
index-bmi.html
index-hci.html
index-stratosphere.html
index.html
people-bmi-1.json
people-bmi-2.json
people-bmi.json
people-hci.json
people-ir.json
people-stratosphere.json
scrape-citations-yr.js
scrape-citations.js
scrape.js
stats-bmi-1.js
stats-bmi-2.js
stats-bmi.js
stats-hci.js
stats-ir.js
stats-stratosphere.js
test-bmi.json

Scholar Scraper

I wrote this simple utility to scrape citation statistics from researcher profiles on Google Scholar, using it as an opportunity to learn Node.js. I began with a list of information retrieval researchers, and have since added a separate list of researchers in human-computer interaction. The results are here.

Editorial note: This list contains only researchers who have a Google Scholar profile; names were identified by snowball sampling and various other ad hoc techniques. If you wish to see a name added, please email me or send a pull request. I will endeavor to periodically run the crawl to gather updated statistics. Of course, scholarly achievement is only partially measured by citation counts, which are known to be flawed in many ways. Evaluations of scholars should include comprehensive examination of their research contributions.

Rerunning the Scraper

Assuming you have Node.js installed, rerun the scraper as follows:

$ npm install request cheerio async
$ node scrape.js ./people-ir.json > stats-ir.js
$ node scrape.js ./people-hci.json > stats-hci.js
$ node scrape.js ./people-stratosphere.json > stats-stratosphere.js
$ node scrape.js ./people-bmi.json > stats-bmi.js

Then open index.html in a browser; it should display the new statistics.
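
If you rerun all four lists regularly, the scrape commands above can be wrapped in a small shell function. This is only a convenience sketch, not part of the repository: the function name `rerun_all` and the `SCRAPE_CMD` variable are hypothetical, and it assumes `scrape.js` and the `people-*.json` files sit in the current directory.

```shell
# Hypothetical helper: rerun the scraper for every list used above.
# SCRAPE_CMD is an illustrative override (not defined by the repository);
# by default it runs the real scraper.
SCRAPE_CMD="${SCRAPE_CMD:-node scrape.js}"

rerun_all() {
  for list in ir hci stratosphere bmi; do
    # Same input and output naming as the individual commands above.
    # $SCRAPE_CMD is left unquoted on purpose so "node scrape.js" splits
    # into a command and its first argument.
    $SCRAPE_CMD "./people-${list}.json" > "stats-${list}.js" || return 1
  done
}
```

After loading the function into your shell, call `rerun_all` from the repository root. It stops at the first list whose scrape fails, so a stats file from a failed run may be empty or incomplete.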