A literature parsing tool that compiles and analyzes publication data



Welcome! biblio-reader is a literature parsing tool based on Christian Kreibich's scholar.py that compiles and analyzes publications matched by Google Scholar searches. For publications found on Google Scholar between pages 1 and 99, it can do the following:

  • Compile key information from each publication (such as article title, year, authors, journal title, URL, and citations)
  • Write all key information into a CSV file
  • Look for trends in journal fields, publication growth over the years, publication types, journal impact, citations, and more
  • Find and display author information, including relationships between each author and attributed articles
  • Help users find full-text PDFs for each publication
  • Analyze and categorize the full text extracted from each PDF
  • Map author affiliations on Google Maps
  • Facilitate manual publication review, including assigning articles to separate reviewers and analyzing their input
  • Create a sortable table displaying publications and key information about each article
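As a rough illustration of the first few items in the list above, the sketch below writes publication records to CSV and counts publications per year. It is not taken from biblio-reader's code: the column names in `FIELDS`, the helper functions, and the sample records are all assumptions made for the example.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; the CSV that biblio-reader writes may differ.
FIELDS = ["title", "year", "authors", "journal", "url", "citations"]


def write_publications(records, handle):
    """Write publication records (dicts keyed by FIELDS) as CSV rows."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(record)


def publications_per_year(records):
    """Count publications per year, returned as a dict sorted by year."""
    counts = Counter(record["year"] for record in records)
    return dict(sorted(counts.items()))


# Made-up sample records, for illustration only.
sample = [
    {"title": "Paper A", "year": 2015, "authors": "Doe J",
     "journal": "Journal X", "url": "", "citations": 12},
    {"title": "Paper B", "year": 2016, "authors": "Roe R",
     "journal": "Journal Y", "url": "", "citations": 3},
    {"title": "Paper C", "year": 2016, "authors": "Doe J & Roe R",
     "journal": "Journal X", "url": "", "citations": 7},
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_publications(sample, buffer)
print(buffer.getvalue())
print(publications_per_year(sample))
```

Writing to an in-memory buffer keeps the example self-contained; with a real file, open it with `newline=""` as the `csv` module documentation recommends.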



manager.py is the utilities manager. It provides support for reading and writing files in the inputs, outputs, and working directories, and it updates the main data CSV file through the update_data() method.
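One way such directory support could look is sketched below. This is an assumption-laden illustration, not the actual contents of manager.py: the `path_for` helper and the `DIRECTORIES` tuple are hypothetical names.

```python
from pathlib import Path

# Directory names taken from the description above; the helper is hypothetical.
DIRECTORIES = ("inputs", "outputs", "working")


def path_for(kind, filename, base=Path(".")):
    """Build a path to `filename` inside one of the project directories."""
    if kind not in DIRECTORIES:
        raise ValueError(f"unknown directory kind: {kind!r}")
    return base / kind / filename


print(path_for("outputs", "data.csv"))
```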

This is also where users enter project-specific variables. These include marking which publications are connected to the original work of interest, and defining categories of search terms, as regular expressions, that Google Scholar may have used to find each publication.
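A minimal sketch of how search-term categories might be expressed as regular expressions follows. The category names, the patterns, and the `categorize` function are invented for illustration; the variables actually defined in manager.py may be structured differently.

```python
import re

# Hypothetical categories and patterns; replace with project-specific terms.
SEARCH_CATEGORIES = {
    "dataset_name": re.compile(r"\bexample[- ]dataset\b", re.IGNORECASE),
    "funding": re.compile(r"\bgrant\s+no\.?\s*\d+", re.IGNORECASE),
}


def categorize(text):
    """Return the names of all categories whose pattern occurs in `text`."""
    return [
        name
        for name, pattern in SEARCH_CATEGORIES.items()
        if pattern.search(text)
    ]


print(categorize("We used the Example Dataset in this study."))
print(categorize("Supported by grant no. 12345."))
```

Precompiling the patterns once keeps repeated matching over many publications cheap, and a dictionary makes it easy to add or remove categories per project.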


scholar.py is where the original Google Scholar results are compiled. See Christian Kreibich's scholar.py project for more details.




The inputs directory contains all user inputs. Journal categories and attributes are already included. For full-text analysis, place all PDFs here in a subdirectory named "pdfs".

Manual reviews and categorizations of each publication should be stored here as well, in a subdirectory named "article_review". It should contain CSV files with each reviewer's categorization of each article, which are then analyzed by review_analysis.py in the biblio_reader directory.
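To make the review step more concrete, here is a hedged sketch of how per-reviewer CSV files could be combined into a consensus category per article. The column names `article_id` and `category`, the majority-vote rule, and both functions are assumptions; review_analysis.py may work quite differently.

```python
import csv
import io
from collections import Counter, defaultdict


def tally_reviews(csv_handles):
    """Count, per article, how often each category was assigned."""
    votes = defaultdict(Counter)
    for handle in csv_handles:
        for row in csv.DictReader(handle):
            votes[row["article_id"]][row["category"]] += 1
    return votes


def consensus(votes):
    """Return the most common category per article, or None on a tie."""
    result = {}
    for article_id, counter in votes.items():
        ranked = counter.most_common(2)
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result[article_id] = None  # reviewers disagree; needs follow-up
        else:
            result[article_id] = ranked[0][0]
    return result


# In-memory stand-ins for two reviewers' CSV files (made-up data).
reviewer_a = io.StringIO("article_id,category\n1,uses data\n2,mentions only\n")
reviewer_b = io.StringIO("article_id,category\n1,uses data\n2,uses data\n")

votes = tally_reviews([reviewer_a, reviewer_b])
summary = consensus(votes)
print(summary)
```

Returning None for ties flags articles where reviewers disagree instead of silently picking one answer.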


All final outputs are stored in the outputs directory. These include the matplotlib graphs generated by scholar_reader.py, the main CSV file, and the reviewer assignments.


The working directory holds intermediate files, including PubMed bibliographies, keyword paragraphs, and PDFs converted to TXT.


The table directory provides support for creating a sortable, viewable HTML table based on the CSV file. In data_mg.py, the data can be filtered so that only publications matching user-defined criteria are shown.
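The filtering and table-building idea can be sketched as follows. This is an illustrative stand-in, not the code in data_mg.py: the functions, the column names, and the sample data are assumptions, and client-side sorting (which would need JavaScript) is left out.

```python
import csv
import html
import io


def filter_rows(rows, predicate):
    """Keep only the rows for which `predicate(row)` is true."""
    return [row for row in rows if predicate(row)]


def to_html_table(rows, columns):
    """Render rows (dicts) as a plain HTML table, escaping all cell text."""
    head = "".join(f"<th>{html.escape(col)}</th>" for col in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(row[col]))}</td>" for col in columns)
        + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


# Made-up CSV content standing in for the main data file.
data = io.StringIO("title,year\nPaper A,2015\nPaper <B>,2017\n")
rows = list(csv.DictReader(data))

# Example criterion: only publications from 2016 onward.
recent = filter_rows(rows, lambda row: int(row["year"]) >= 2016)
table_html = to_html_table(recent, ["title", "year"])
print(table_html)
```

Escaping each cell with `html.escape` prevents titles that contain characters such as `<` or `&` from breaking the generated markup.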