A literature parsing tool that compiles and analyzes publication data

biblio-reader

Welcome! biblio-reader is a literature parsing tool, based on Christian Kreibich's scholar.py, that compiles and analyzes publications matched by Google Scholar searches. For publications found on pages 1 through 99 of the Google Scholar results, it can:

  • Compile key information from each publication (such as article title, year, authors, journal title, URL, and citations)
  • Write all key information into a CSV file
  • Look for trends in journal fields, publication growth over the years, publication types, journal impact, citations, and more
  • Find and display author information, including relationships between each author and attributed articles
  • Help users find full-text PDFs for each publication
  • Analyze and categorize the full text of each PDF
  • Map author affiliations on Google Maps
  • Facilitate manual publication review, including assigning articles to separate reviewers and analyzing their input
  • Create a sortable table displaying publications and key information about each article
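As a rough illustration of two items in the list above (writing key information to a CSV file and measuring publication growth over the years), the sketch below writes publication records with Python's standard csv module and then counts publications per year. The column names in FIELDS and the helper names are hypothetical; they are not the project's actual headers or functions.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; the real main CSV may use different headers.
FIELDS = ["title", "year", "authors", "journal", "url", "citations"]


def write_publications(records, stream):
    """Write a header plus one CSV row per publication dict keyed by FIELDS."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # keys missing from a record become empty cells


def publications_per_year(stream):
    """Count publications per year, skipping rows with no year, sorted by year."""
    counts = Counter(row["year"] for row in csv.DictReader(stream) if row["year"])
    return dict(sorted(counts.items()))


# Usage with an in-memory buffer instead of a file on disk.
records = [
    {"title": "Paper A", "year": "2015", "authors": "Smith"},
    {"title": "Paper B", "year": "2016", "authors": "Lee; Wu"},
    {"title": "Paper C", "year": "2016", "authors": "Kim"},
]
buffer = io.StringIO()
write_publications(records, buffer)
buffer.seek(0)
print(publications_per_year(buffer))  # {'2015': 1, '2016': 2}
```

Years are kept as strings here because that is how csv returns every field; convert them with int() before plotting if needed.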

Navigation

manager.py

manager.py is the utilities manager: it handles reading and writing files in the inputs, outputs, and working directories, and it updates the main data CSV file through its update_data() method.
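The body of update_data() is not described here. As a hypothetical sketch of one step such an update might involve, the function below merges newly compiled rows into the existing ones while skipping publications that are already present. The name merge_rows, the default key column "title", and the normalization rule are all assumptions, not the project's implementation.

```python
def merge_rows(existing, incoming, key="title"):
    """Return existing rows followed by incoming rows whose key value is new.

    Key values are compared after collapsing whitespace and case-folding, so
    "Deep  Learning" and "deep learning" count as the same publication.
    Incoming rows with an empty key are skipped.
    """

    def normalize(value):
        return " ".join((value or "").split()).casefold()

    seen = {normalize(row.get(key)) for row in existing}
    merged = list(existing)
    for row in incoming:
        norm_key = normalize(row.get(key))
        if norm_key and norm_key not in seen:
            seen.add(norm_key)  # also deduplicates within the incoming batch
            merged.append(row)
    return merged
```

Matching on titles alone is a simplification; a DOI or URL column, where available, would be a more reliable key.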

manager.py is also where users enter project-specific variables, such as which publications are connected to the original work of interest and the categories of search terms (written as regular expressions) that Google Scholar may have used to find them.
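The search-term categories could be expressed along these lines: a dictionary mapping each category name to a compiled regular expression, plus a helper that reports which categories a piece of text matches. The category names, patterns, and the matching_categories helper are illustrative placeholders, not the project's actual configuration.

```python
import re

# Illustrative placeholders: one compiled pattern per search-term category.
SEARCH_TERM_CATEGORIES = {
    "dataset": re.compile(r"\bexample[- ]dataset\b", re.IGNORECASE),
    "tool": re.compile(r"\bexample[- ]tool\b", re.IGNORECASE),
}


def matching_categories(text, categories=SEARCH_TERM_CATEGORIES):
    """Return the sorted names of all categories whose pattern occurs in text."""
    return sorted(name for name, pattern in categories.items() if pattern.search(text))


print(matching_categories("We used the Example Dataset with example-tool."))
# ['dataset', 'tool']
```

Compiling the patterns once keeps the per-publication check cheap when the same categories are applied to every row of the main CSV.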

scholar

scholar.py, Christian Kreibich's Google Scholar parser on which biblio-reader is based, is where the original Google Scholar results are compiled.

biblio_reader

See the README in the biblio_reader directory.

inputs

Directory containing all user inputs. Journal category and attribute files are already included. For full-text analysis, place all PDFs in a subdirectory named "pdfs".
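A minimal sketch, assuming the inputs/pdfs layout described above, of collecting the PDFs for full-text analysis with pathlib. The function name find_pdfs and the recursive search are assumptions; the project may only look at the top level of the directory.

```python
from pathlib import Path


def find_pdfs(inputs_dir="inputs"):
    """Return sorted paths of every PDF under <inputs_dir>/pdfs, searched recursively.

    The extension check is case-insensitive; a missing pdfs directory yields [].
    """
    pdf_dir = Path(inputs_dir) / "pdfs"
    if not pdf_dir.is_dir():
        return []
    return sorted(
        path for path in pdf_dir.rglob("*")
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

Sorting the result makes repeated runs process the files in the same order, which helps when comparing outputs between runs.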

Manual reviews and categorizations of each publication also belong here, in a subdirectory named "article_review". It should contain CSV files that categorize each article; these are then analyzed by review_analysis.py in the biblio_reader directory.
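The format of the review CSVs is not specified here, so the following is only a hypothetical sketch of how reviewer input could be combined. It assumes every CSV in article_review has the columns "article_id" and "category" (both names are assumptions), counts how many reviewers chose each category per article, and picks the majority category, returning None on a tie. It is not the logic of review_analysis.py.

```python
import csv
from collections import Counter, defaultdict
from pathlib import Path


def tally_reviews(review_dir):
    """Count, per article, how many reviewer rows assigned each category.

    Assumes (hypothetically) that every *.csv in review_dir has the
    columns 'article_id' and 'category'.
    """
    votes = defaultdict(Counter)
    for csv_path in sorted(Path(review_dir).glob("*.csv")):
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                votes[row["article_id"]][row["category"]] += 1
    return votes


def consensus(votes):
    """Map each article to its most common category, or None on a tie."""
    result = {}
    for article_id, counts in votes.items():
        ranked = counts.most_common(2)
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result[article_id] = None  # reviewers disagree evenly
        else:
            result[article_id] = ranked[0][0]
    return result
```

Flagging ties as None, rather than picking one arbitrarily, leaves those articles easy to send back for another review.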

outputs

All final outputs are stored here. This includes matplotlib graphs generated by scholar_reader.py, the main CSV file, and the reviewer assignments.

working

Holds intermediate files, including PubMed bibliographies, keyword paragraphs, and PDFs converted to TXT.
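Keyword paragraphs could, for example, be pulled out of the TXT-converted PDFs by splitting the text on blank lines and keeping the paragraphs that mention a keyword. This is a hypothetical sketch: the paragraph-splitting rule, the whole-word matching, and the name keyword_paragraphs are assumptions, since how the project selects paragraphs is not documented here.

```python
import re


def keyword_paragraphs(text, keywords):
    """Return the paragraphs of text (blocks separated by blank lines) that
    mention at least one keyword as a whole word, ignoring case.

    Assumes keywords begin and end with word characters, so that the
    \\b word boundaries behave as expected.
    """
    if not keywords:
        return []
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b", re.IGNORECASE
    )
    paragraphs = (p.strip() for p in re.split(r"\n\s*\n", text))
    return [p for p in paragraphs if p and pattern.search(p)]
```

Whole-word matching keeps a keyword such as "scholar" from also matching "scholarly"; drop the \b anchors if partial matches are wanted.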

table

Creates a sortable, viewable HTML table from the main CSV file. In data_mg.py, the data can be filtered to show only the publications that meet criteria set by the user.
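The sketch below illustrates the filter-then-render idea using only the standard library: filter_rows keeps the rows that match every given criterion exactly, and rows_to_html emits an HTML table with every cell escaped. Both names are hypothetical and do not reflect the contents of data_mg.py. Client-side sorting, which is usually added with JavaScript, is not shown.

```python
import html


def filter_rows(rows, **criteria):
    """Keep rows whose columns equal every given criterion, e.g. year="2016"."""
    return [row for row in rows if all(row.get(k) == v for k, v in criteria.items())]


def rows_to_html(rows, columns):
    """Render rows as a plain HTML table, escaping headers and cell values."""
    header = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(row.get(c) or '')}</td>" for c in columns)
        + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"
```

Escaping matters here because scraped titles often contain characters such as "&" or "<" that would otherwise break the generated page.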