google_scholar_parser

Scripts to parse "citations page" of Google Scholar

Using scholar.py v1.5.

Requirements

It runs on Python 2.7 and requires the BeautifulSoup library (version 3). You can install BeautifulSoup with the command:

pip install -r requirements.txt

Usage

Beware that Google might block you if you send too many requests in quick succession.
You might want to use Tor or sleep for a random time between requests. For example:

import random  
import time  
random.seed()  
n = random.random() * 5  # random delay of up to 5 seconds  
time.sleep(n)  
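
If you need to look up several publications in a row, the random sleep can be wrapped in a small helper. This is only a sketch: `call_with_random_pauses` and its `max_pause` parameter are hypothetical names, not part of scholar.py, and `func` stands for whatever function you use to make a single request.

```python
import random
import time


def call_with_random_pauses(func, items, max_pause=5.0):
    """Call func on each item, sleeping a random 0..max_pause seconds between calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            # Pause before every call except the first one.
            time.sleep(random.random() * max_pause)
        results.append(func(item))
    return results
```

A random pause is not a guarantee against being blocked; it only spreads the requests out over time.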

Get citations for a publication using its DOI:

python scholar.py -c 1 10.1111/j.1096-3642.2009.00627.x

Output:

         Title The radiation of Satyrini butterflies (Nymphalidae: Satyrinae)...     
           URL http://onlinelibrary.wiley.com/doi/10.1111/j.1096-3642.2009.00627.x/full  
     Citations 14  
      Versions 6  
Citations list http://scholar.google.com/scholar?cites=13407052944292989945&as_sdt=2005&sciodt=0,5&hl=en&num=1  
 Versions list http://scholar.google.com/scholar?cluster=13407052944292989945&hl=en&num=1&as_sdt=0,5  
          Year 2011   

Grab the Citations list page:
http://scholar.google.com/scholar?cites=13407052944292989945
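
The short URL above keeps only the `cites` parameter of the longer "Citations list" URL. If you would rather not copy it by hand, the step might be sketched as follows, assuming the output of scholar.py has been captured as text in the layout shown above. The function name `short_citations_url` is hypothetical and not part of this repository.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2.7


def short_citations_url(scholar_output):
    """Return the short 'cites' URL found in scholar.py output, or None."""
    label = "Citations list"
    for line in scholar_output.splitlines():
        line = line.strip()
        if not line.startswith(label):
            continue
        # Everything after the label is the long citations URL.
        long_url = line[len(label):].strip()
        cites = parse_qs(urlparse(long_url).query).get("cites")
        if cites:
            return "http://scholar.google.com/scholar?cites=" + cites[0]
    return None
```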

And feed it to the script scholar_cites.py:

python scholar_cites.py http://scholar.google.com/scholar?cites=13407052944292989945

The script prints the DOIs of the publications citing your article (up to 100 DOIs):

10.1111/j.1463-6409.2010.00421.x  
10.1146/annurev-ecolsys-102710-145024  
10.1111/j.1420-9101.2011.02352.x  
10.1111/j.1439-0469.2010.00587.x
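
As a possible follow-up, the printed DOIs can be turned into resolver links. The sketch below assumes the output has been captured as text with one DOI per line. `dois_to_urls` and `DOI_PATTERN` are hypothetical names, and the regular expression is only a rough approximation of what a DOI looks like.

```python
import re

# Assumed DOI shape: "10.", a numeric registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def dois_to_urls(script_output):
    """Return a doi.org URL for every line that looks like a DOI."""
    urls = []
    for line in script_output.splitlines():
        doi = line.strip()
        if DOI_PATTERN.match(doi):
            urls.append("https://doi.org/" + doi)
    return urls
```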