
Datanator: Toolkit for discovering and aggregating data for whole-cell modeling



Extensive data is needed to build comprehensive predictive models of cells. Although the literature and public repositories contain extensive data about cells, this data is hard to utilize for modeling because it is scattered across a large number of sources; because it is described with inconsistent identifiers, units, and data models; and because there are few tools for finding relevant data for modeling specific species and environmental conditions.

Datanator is a software tool for discovering, aggregating, and integrating the data needed for modeling cells. This includes metabolite, RNA, and protein abundances; protein complex compositions; transcription factor binding motifs; and kinetic parameters. Datanator is particularly useful for building large models, such as whole-cell models, that require large amounts of data to constrain large numbers of parameters.

This package contains the source code for Datanator. The data aggregated with Datanator is available at

Installation instructions and documentation

Please see the documentation for installation instructions, user instructions, and code documentation.

Note: Datanator supports only Python 3.

Testing Datanator

To ensure Datanator works properly, we have developed extensive unit tests that cover every aspect of Datanator. We recommend using pytest to run these tests as follows:

python3 -m pytest tests
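The same run can also be started from Python, which is convenient when only part of the suite is needed. The following is a minimal sketch, not part of Datanator: the helper names `build_pytest_args` and `run_tests` are hypothetical, and it assumes pytest is installed and that the tests live in the `tests` directory used above. It relies only on `pytest.main` and the standard `-k` and `-v` options.

```python
def build_pytest_args(test_path="tests", keyword=None, verbose=False):
    """Assemble the argument list for a pytest run (hypothetical helper)."""
    args = [test_path]
    if keyword:
        # -k selects only the tests whose names match the expression
        args += ["-k", keyword]
    if verbose:
        # -v prints one result line per test
        args.append("-v")
    return args


def run_tests(**options):
    """Run pytest in-process and return its exit code (0 means success)."""
    # Imported here so build_pytest_args works even without pytest installed
    import pytest

    return int(pytest.main(build_pytest_args(**options)))
```

For example, `run_tests(keyword="kinetic", verbose=True)` would run only the tests whose names contain "kinetic"; the keyword here is purely illustrative.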


License

This software is released as open source under the MIT license.

Development team

This package was developed by the Karr Lab at the Icahn School of Medicine at Mount Sinai in New York, USA.

  • Saahith Pochiraju
  • Yosef Roth
  • Balazs Szigeti
  • Jonathan Karr

Questions and comments

Please contact the Karr Lab with any questions or comments.