Skip to content

A wrapper for the scholarly library to scrape Google Scholar entries through a keyword search, and add suggested DOIs using CrossRef with habanero.

License

Notifications You must be signed in to change notification settings

ac-jorellanaf/Google-Scholar-Scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Google Scholar Scraper

A wrapper for the scholarly library to scrape Google Scholar entries through a keyword search, and add suggested DOIs using CrossRef with habanero.
Note: Scholarly requires the geckodriver browser to allow the user to solve Google captchas.

Help

Scrape Google Scholar entries by keywords.

positional arguments:
  email                 For faster Crossref API queries, we require an e-mail address

optional arguments:
  -h, --help            show this help message and exit
  -q, --querystr
                        Set a custom in-line query, rather than use a query file.
  -qf, --queryfile
                        Set a custom filepath for a .csv file of entries to scrape, rather than use the queries.csv file.
  -n, --numentries
                        Set a custom in-line number of entries to scrape, rather than use the queries.csv file.
  -op, --outdir
                        Set a custom path for the directory where the search .CSV files should be stored.
  -nm, --nomerge
                        Do not merge the .CSV files after searching.
  -nd, --nodoi
                        Do not scrape CrossRef for DOIs after the search.
  -doi, --getdoi
                        Set a directory or file of GS searches for which to find the suggested entry DOIs. Combineable with --nomerge.
  -v, --verbose
                        Verbose mode.

To-Do

  • Write a proper readme.
  • Comment the code.
  • Warm users of the peril of suggested DOIs.
  • Improve iteration method for going through GS entries.
  • Make --nodoi and the email arg mutually exclusive.
  • Warn users about captchas.
  • Add argument to allow for user to input the top n entries to merge

About

A wrapper for the scholarly library to scrape Google Scholar entries through a keyword search, and add suggested DOIs using CrossRef with habanero.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages