License: MIT Read the Docs

"The Science of Science (SciSci) is based on a transdisciplinary approach that uses large data sets to study the mechanisms underlying the doing of science—from the choice of a research problem to career trajectories and progress within a field"[1].

The pySciSci package offers a unified interface for analyzing several of the most common bibliometric databases used in the Science of Science, including:

Data Set                           Example
Microsoft Academic Graph (MAG)     Getting Started with MAG
Clarivate Web of Science (WoS)     Getting Started with WOS
DBLP                               Getting Started with DBLP
American Physical Society (APS)    Getting Started with APS
PubMed                             Getting Started with PubMed
OpenAlex                           Getting Started with OpenAlex

The pySciSci package also provides efficient implementations of recent metrics developed to study scientific publications and authors, including:

Publications Metrics
Measure                                     Example
Interdisciplinarity - Simpson's Index       Example of Interdisciplinarity
Interdisciplinarity - Shannon's Index       Example of Interdisciplinarity
Interdisciplinarity - Rao-Stirling Index    Example of Interdisciplinarity
Disruption Index                            Example Publication Citations
Sleeping Beauty Coefficient                 Example Publication Citations
Novelty & Conventionality                   Example Novelty
Long Term Citation                          Example Publication Citations
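
To make the first two interdisciplinarity measures concrete, here is a minimal sketch in plain Python. It is not pySciSci's API: the function names and the toy reference list are hypothetical, and it assumes the Gini-Simpson form of Simpson's index (one minus the sum of squared field shares) and Shannon entropy in nats, both computed over the fields of a paper's references.

```python
from collections import Counter
from math import log


def field_shares(fields):
    """Return the fraction of references that fall in each field."""
    counts = Counter(fields)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def simpson_index(fields):
    """Gini-Simpson diversity: 1 minus the sum of squared field shares."""
    return 1.0 - sum(p * p for p in field_shares(fields))


def shannon_index(fields):
    """Shannon entropy of the field shares, in nats."""
    return -sum(p * log(p) for p in field_shares(fields))


# Hypothetical data: the fields of the six references cited by one paper.
reference_fields = ["physics", "physics", "physics",
                    "biology", "biology", "sociology"]

simpson = simpson_index(reference_fields)
shannon = shannon_index(reference_fields)
print(f"Simpson: {simpson:.3f}, Shannon: {shannon:.3f}")
```

Both values are zero when every reference comes from a single field and grow as the references spread more evenly across fields.
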
Author Metrics
Measure                             Example
H-index                             Example Career Analysis
G-index                             Example Career Analysis
Q-factor                            Example Career Analysis
Annual productivity trajectories    Example Career Analysis
Author Pagerank                     Example of Scientific Credit
Collective credit allocation        Example of Credit Allocation
Career Topic Switching              Example Career Topic Switching
HotStreak                           Example Career Analysis
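
As an illustration of the two simplest author metrics listed above, the following sketch computes the H-index and G-index from a list of per-paper citation counts. It follows the textbook definitions and is not pySciSci's implementation; the function names and the citation counts are hypothetical, and the G-index is assumed to be capped at the number of papers.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g^2 citations.

    Assumption: g is capped at the number of papers in the list.
    """
    ranked = sorted(citations, reverse=True)
    g, running_total = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        running_total += cites
        if running_total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for one author's seven papers.
career_citations = [25, 8, 5, 3, 3, 1, 0]
print(h_index(career_citations), g_index(career_citations))
```

The G-index is never smaller than the H-index because it credits the surplus citations of the most cited papers.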

The package also includes advanced tools for constructing and analyzing network objects (both static and temporal):

Network Analysis
Measure                        Example
Citation Network
Author Citation Network        Example of Diffusion of Scientific Credit
Co-citation network            Example of Cocitation Network
Co-authorship network
Co-mention network             Example of Coword Mention Network
Graph2vec network embedding    Example_Node2vec
Multiscale Backbone            Example of Cocitation Network
Career Topic Switching         Example Career Topic Switching
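
For readers new to these network objects, the sketch below shows how a weighted co-citation network can be derived from a plain citation edge list: two publications are linked each time a third publication cites both. This is a standard-library illustration only. The function name, the `(citing, cited)` tuple layout, and the toy identifiers are assumptions, not pySciSci's interface.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citations):
    """Count how often each pair of publications is cited together.

    citations: iterable of (citing_id, cited_id) pairs.
    Returns a Counter mapping a sorted (cited_a, cited_b) pair to the
    number of citing publications that reference both.
    """
    # Group the cited publications by the publication that cites them.
    references = {}
    for citing, cited in citations:
        references.setdefault(citing, set()).add(cited)

    # Every pair inside one reference list is one co-citation.
    counts = Counter()
    for cited_set in references.values():
        for pair in combinations(sorted(cited_set), 2):
            counts[pair] += 1
    return counts


# Hypothetical citation edges: P1, P2, P3 cite the older papers A, B, C.
edges = [("P1", "A"), ("P1", "B"), ("P1", "C"),
         ("P2", "A"), ("P2", "B"),
         ("P3", "B"), ("P3", "C")]

weights = cocitation_counts(edges)
for (a, b), weight in sorted(weights.items()):
    print(a, b, weight)
```

The resulting counts can serve as edge weights of an undirected graph over the cited publications.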

Natural Language Processing

  • Publication matching
  • Author matching



Latest PyPI stable release

  pip install pyscisci

Latest development release on GitHub

Pull and install in the current directory:

  pip install git+


  • To enable all extra functionality run: pip install pyscisci[nlp,hdf]
  • The requirement to use only HDF tables has been removed, so the dependency on tables is now an extra: pip install pyscisci[hdf]
  • Advanced NLP dependencies can be installed by running: pip install pyscisci[nlp]

Computational Requirements

Currently, pySciSci is built on top of pandas and keeps entire dataframes in working memory. We have found that most large-scale analyses can be performed on a personal computer with extended RAM. If you don't have enough computational power, consider a smaller database (DBLP or APS) or a cloud computing platform (Google Cloud, Microsoft Azure, Amazon Web Services, etc.).
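
Because dataframes are held in memory, it can help to check how much RAM a table actually occupies before loading a full database. The sketch below uses only standard pandas calls (`memory_usage`, `astype`) on a toy table. The column names and sizes are hypothetical and do not reflect pySciSci's schema.

```python
import pandas as pd

# Hypothetical toy publication table; real databases have millions of rows.
n_rows = 1000
pubs = pd.DataFrame({
    "PublicationId": list(range(n_rows)),
    "Year": [2000 + i % 20 for i in range(n_rows)],
    "Title": [f"Paper {i}" for i in range(n_rows)],
})

# deep=True also counts the Python string objects in the Title column.
bytes_before = pubs.memory_usage(deep=True).sum()

# Keep only the columns an analysis needs and use narrower integer types.
slim = pubs[["PublicationId", "Year"]].astype(
    {"PublicationId": "int32", "Year": "int16"}
)
bytes_after = slim.memory_usage(deep=True).sum()

print(f"before: {bytes_before} bytes, after: {bytes_after} bytes")
```

Dropping text columns and downcasting integers are general pandas techniques. Whether they are enough depends on the database and the analysis.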

We also support basic Dask implementations for multiprocessing. An example notebook can be found here.


See the contributing guide for detailed instructions on how to get started with our project.

Help and Support




[1] Fortunato et al. (2018). Science of Science. Science, 359(6379), eaao0185.

[2] Wang & Barabasi (2021). Science of Science. Cambridge University Press.


pySciSci was originally written by Alexander Gates and has been developed with the help of many others. Thanks to everyone who has improved pySciSci by contributing code, bug reports (and fixes), documentation, and input on design and features.

Original Author


Optionally, add your desired name and include a few relevant links. The order is an attempt at historical ordering.


pySciSci and those who have contributed to it have received support throughout the years from a variety of sources. We list them below. If you have provided support to pySciSci and a support acknowledgment does not appear below, please help us remedy the situation. Similarly, please let us know if you'd like something modified or corrected.

Research Groups

pySciSci was developed with full support from the following:


pySciSci acknowledges support from the following grants:

  • Air Force Office of Scientific Research Award FA9550-19-1-0354
  • Templeton Foundation Contract 61066